  1. DSpace
  2. DS-866

OAI requests with resumption token start at the wrong offset if non-public items are included and there are withdrawn items

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.1, 1.6.2, 1.7.0, 1.7.1
    • Fix Version/s: 1.8.0
    • Component/s: DSpace API
    • Labels:
      None
    • Attachments:
      1
    • Comments:
      4
    • Documentation Status:
      Not Required

      Description

      When restricted items are excluded from OAI and the repository contains withdrawn items, some items are never disseminated via OAI.

      OAI responds to listRecords requests with batches of up to 100 records (or another number set via the oai.didl.maxresponse property). A request can include a resumption token to specify which batch is required. The resumption token is then translated into an offset into the database results.
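
      The batching described above can be sketched as follows. This is a minimal illustration, not the DSpace implementation: the function name list_records, the max_response parameter, and the choice to encode the offset directly as the token string are all assumptions made for the example.

```python
def list_records(records, resumption_token=None, max_response=100):
    """Return one batch of records plus the token for the next batch.

    Hypothetical model: the resumption token is simply the offset of the
    next batch, encoded as a string. Real tokens may carry more state.
    """
    # No token means the harvester is asking for the first batch.
    offset = int(resumption_token) if resumption_token else 0
    batch = records[offset:offset + max_response]
    next_offset = offset + len(batch)
    # Only hand out a token while there is something left to fetch.
    next_token = str(next_offset) if next_offset < len(records) else None
    return batch, next_token


if __name__ == "__main__":
    records = list(range(250))
    token = None
    while True:
        batch, token = list_records(records, token)
        print(len(batch), token)
        if token is None:
            break
```

      A harvester keeps sending the returned token until no token comes back, so any error in the offset derived from the token shifts every later batch.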

      OAI listRecords responses contain withdrawn items (marked as "deleted").

      If harvest.includerestricted.oai = false is set in dspace.cfg, only publicly readable items are included in the response, and the offset into the database results is recalculated to skip over restricted items. A withdrawn item is not publicly readable, so the offset recalculation code skips over it as well. It should not, because withdrawn items are still included in the OAI response: the code that actually adds items to the response does not skip them. As a result, each subsequent batch starts n items later in the database results than it should, where n is the number of withdrawn items before the start of the batch.

      This means that a full OAI harvest via consecutive listRecords requests will miss as many items as there are withdrawn items.
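
      The effect can be reproduced with a small simulation. This is a simplified model written for illustration, not the DSpace code or the attached patch: the Row type, the function names, and the skip_withdrawn flag are hypothetical, and the offset is assumed to count records already delivered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    item_id: int
    public: bool = True
    withdrawn: bool = False


def in_response(row):
    # Public items are listed normally; withdrawn items are listed as "deleted".
    return row.public or row.withdrawn


def harvest_batch(rows, offset, limit, skip_withdrawn):
    """Return the item ids of one batch.

    skip_withdrawn=True models the bug: every row that is not publicly
    readable pushes the offset forward, including withdrawn rows.
    """
    batch = []
    for index, row in enumerate(rows):
        if index < offset:
            hidden = (not row.public) if skip_withdrawn else (not in_response(row))
            if hidden:
                offset += 1  # offset recalculation
        elif len(batch) < limit and in_response(row):
            batch.append(row.item_id)
    return batch


def full_harvest(rows, limit, skip_withdrawn):
    """Request consecutive batches until an empty one comes back."""
    harvested, offset = [], 0
    while True:
        batch = harvest_batch(rows, offset, limit, skip_withdrawn)
        if not batch:
            return harvested
        harvested.extend(batch)
        offset += len(batch)


# Items 2 and 6 are withdrawn, item 3 is restricted, the rest are public.
ROWS = [
    Row(1), Row(2, public=False, withdrawn=True), Row(3, public=False),
    Row(4), Row(5), Row(6, public=False, withdrawn=True), Row(7), Row(8),
]

if __name__ == "__main__":
    print(full_harvest(ROWS, 2, skip_withdrawn=False))  # [1, 2, 4, 5, 6, 7, 8]
    print(full_harvest(ROWS, 2, skip_withdrawn=True))   # [1, 2, 5, 6, 8]
```

      In this example the buggy variant loses items 4 and 7, one for each of the two withdrawn items. The count can be lower when withdrawn items sit at the very end of the results, since there is then nothing after them to skip.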

      I first found this in the 1.6.1 code and it is still present in 1.7.1. I didn't check whether this affects earlier versions too. A patch against 1.7.1 is attached.

              People

              Assignee:
              Unassigned
              Reporter:
              schweer Andrea Schweer