  1. DSpace
  2. DS-2806

Content Filter / Cleaning

    Details

    • Type: Improvement
    • Status: More Details Needed
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 5.1, 5.3
    • Fix Version/s: None
    • Component/s: JSPUI, OAI-PMH, Solr
    • Labels:
      None
    • Environment:
      Linux
    • Attachments:
      1
    • Comments:
      1
    • Documentation Status:
      Needed

      Description

      We want to share a problem related to the way users enter metadata in the input forms and the issues this causes. In our experience it is mainly due to copy/paste from PDF files (usually the abstract of an article), which brings hidden characters into the metadata text.
      In the user interface (JSPUI) this causes no problem, except when the characters are visible (see attachment), but it sometimes causes problems in Solr or in the OAI-PMH interface, as the XML structure is no longer correct. This breaks harvesting of the repository for the affected item and for the items that come after it.

      Across the many integrations we have developed with DSpace, this problem is very common and prevents good interoperability. We suggest that the content be "cleaned" to avoid these problems right after the user finishes depositing the item. Could this be improved, or is there some configuration we can set to correct it?
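To illustrate the kind of cleaning suggested above, here is a minimal sketch that drops every code point not allowed by the XML 1.0 `Char` production (most control characters, unpaired surrogates, U+FFFE and U+FFFF). The class name `MetadataCleaner` and the method `clean` are hypothetical and not part of DSpace; where such a filter would be called in the submission workflow is an open question and is not shown here.

```java
// Hypothetical helper, not a DSpace class: removes characters that are
// not legal in an XML 1.0 document from a metadata value.
final class MetadataCleaner {

    private MetadataCleaner() {
        // static utility, no instances
    }

    // XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns the value with all XML-illegal code points removed.
    // A null input is returned unchanged.
    static String clean(String value) {
        if (value == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); ) {
            // Walk by code point so supplementary characters stay intact
            // and lone surrogates are seen (and dropped) on their own.
            int cp = value.codePointAt(i);
            i += Character.charCount(cp);
            if (isValidXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }
}
```

Note that this only removes characters that make the XML invalid. Characters that are legal in XML but unwanted (for example soft hyphens or ligatures copied from a PDF) would need a separate, repository-specific rule.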

            People

            Assignee:
            Unassigned
            Reporter:
            josekarvalho Jose Carvalho
            Votes:
            1
            Watchers:
            3
