DSpace / DS-1523

detection of duplicate items during import and submission




      Users have expressed a need for DSpace to detect whether an item they are about to import or submit already exists in the repository. This issue attempts to capture the requirements for this feature.

      The key question is how a duplicate is defined. Some users already have a strict definition, e.g. an equal value of a single metadata field (dc.identifier.uuid). For others, items count as duplicates when several metadata fields (e.g. dc.title, dc.issn) are similar, where similarity may be measured as Levenshtein distance, while the remaining fields may even differ (e.g. different values in dc.contributor.author).
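      To make the two kinds of rule concrete, here is a minimal Java sketch (Java because DSpace itself is written in it). It assumes an item can be reduced to a map from metadata field name to a single value; this is a simplification, since real DSpace items can hold several values per field and are read through the DSpace API rather than a Map. The class DuplicateRules and its methods are hypothetical. sameValue implements the strict single-field rule, levenshtein is the standard dynamic-programming edit distance, and similar reports a duplicate only when every listed field is within a chosen edit distance, ignoring all other fields.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating the two duplicate definitions.
// Assumption: an item is modelled as a map from metadata field to one value;
// this is NOT the DSpace Item API.
final class DuplicateRules {

    private DuplicateRules() {}

    /** Strict rule: duplicate iff the field has an equal, non-null value in both items. */
    static boolean sameValue(Map<String, String> a, Map<String, String> b, String field) {
        String va = a.get(field);
        return va != null && va.equals(b.get(field));
    }

    /** Levenshtein edit distance, computed row by row keeping only two rows. */
    static int levenshtein(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j; // distance from the empty prefix of s
        }
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // the finished row becomes the previous row
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }

    /**
     * Fuzzy rule: duplicate iff every listed field is present in both items and
     * the values are within maxDistance edits. Unlisted fields (for example
     * dc.contributor.author) are ignored and may differ freely.
     */
    static boolean similar(Map<String, String> a, Map<String, String> b,
                           List<String> fields, int maxDistance) {
        for (String field : fields) {
            String va = a.get(field);
            String vb = b.get(field);
            if (va == null || vb == null || levenshtein(va, vb) > maxDistance) {
                return false;
            }
        }
        return true;
    }
}
```

      For example, with the fields dc.title and dc.issn and a threshold of 2, two records whose titles differ by one character and whose ISSNs match would be flagged even if their authors differ. Whether a raw edit distance or one normalized by string length suits titles better is a tuning choice this sketch leaves open.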

      This leads me to conclude that we need to let users define their own comparison method by means of a plugin. The drawback of this approach is that checking each imported item against all existing items with a user-defined (and possibly slow) method may slow down import, so the feature needs to be opt-in. We should, of course, provide implementations for common cases such as those mentioned above. The input to the comparison method should be the item DSO (so that its metadata and bitstreams can be read) with its parent object filled in, so that the search can be restricted to a community or collection to reduce its scope.
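      The plugin approach could look roughly like the following Java sketch. Everything in it is hypothetical: DuplicateDetector, ItemRecord, FieldEqualityDetector and DuplicateCheck are illustrative names, not existing DSpace classes, and ItemRecord is a stand-in for the item DSO that carries only its metadata and parent collection (bitstreams and community-level scoping are left out). Passing a null detector models the feature being switched off, and the scan only considers existing items in the incoming item's parent collection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical plugin contract; not an existing DSpace interface.
interface DuplicateDetector {
    /** Returns true if the incoming item should be reported as a duplicate of existing. */
    boolean isDuplicate(ItemRecord incoming, ItemRecord existing);
}

/** Minimal stand-in for an item DSO: its metadata plus its parent collection. */
final class ItemRecord {
    final String handle;
    final String parentCollection;
    final Map<String, String> metadata;

    ItemRecord(String handle, String parentCollection, Map<String, String> metadata) {
        this.handle = handle;
        this.parentCollection = parentCollection;
        this.metadata = metadata;
    }
}

/** Example plugin: strict equality on one configurable metadata field. */
final class FieldEqualityDetector implements DuplicateDetector {
    private final String field;

    FieldEqualityDetector(String field) {
        this.field = field;
    }

    @Override
    public boolean isDuplicate(ItemRecord incoming, ItemRecord existing) {
        String value = incoming.metadata.get(field);
        return value != null && value.equals(existing.metadata.get(field));
    }
}

final class DuplicateCheck {
    private DuplicateCheck() {}

    /**
     * Opt-in check: returns the existing items the plugin flags as duplicates.
     * A null detector means the feature is disabled and nothing is compared.
     * Only items sharing the incoming item's parent collection are scanned.
     */
    static List<ItemRecord> findDuplicates(ItemRecord incoming, List<ItemRecord> repository,
                                           DuplicateDetector detector) {
        List<ItemRecord> hits = new ArrayList<>();
        if (detector == null) {
            return hits;
        }
        for (ItemRecord existing : repository) {
            if (Objects.equals(existing.parentCollection, incoming.parentCollection)
                    && detector.isDuplicate(incoming, existing)) {
                hits.add(existing);
            }
        }
        return hits;
    }
}
```

      The scan is still linear in the number of items in scope, which is the performance concern raised above; a fuller implementation might pre-select candidates (for instance through a search index) before handing them to the plugin.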

      Here are some recent discussions on this topic:





              • Assignee:
                helix84 Ivan Masár
              • Votes:
                4
              • Watchers:
                6


                • Created: