DSpace / DS-2367

Improve BitstreamReader Performance and Caching behavior in XMLUI


    Details

    • Attachments: 0
    • Comments: 3
    • Documentation Status: Needed

      Description

      BitstreamReader has a number of limitations in how it handles caching and Not-Modified responses. Improvements are provided in the following pull request:

      https://github.com/DSpace/DSpace/pull/800

      1.) The date comparison evaluates incorrectly when the client stores the Last-Modified date and sends it back in the If-Modified-Since header. When Last-Modified equals If-Modified-Since, the reader should respond with a 304, not a 200.

      See line:
      https://github.com/DSpace/DSpace/blob/master/dspace-xmlui/src/main/java/org/dspace/app/xmlui/cocoon/BitstreamReader.java#L575

      if (modSince != -1 && item != null && item.getLastModified().getTime() < modSince)

      should be

      if (modSince != -1 && item != null && item.getLastModified().getTime() <= modSince)

      2.) The same issue occurs in SitemapReader, causing sitemaps to always be returned with a 200.

      https://github.com/DSpace/DSpace/blob/master/dspace-xmlui/src/main/java/org/dspace/app/xmlui/cocoon/SitemapReader.java#L144

      3.) The existing assumption that a 304 should not be returned to clients that are not bots, because protected content could then be cached in public browsers, is poorly written, erroneous, and unnecessary. The pull request removes the need to add bot agent headers to the sitemap, because a 304 is now allowed for all anonymously accessible content.

      Whether a file is cached in the user's browser is governed by the Cache-Control and Expires headers; the Last-Modified and If-Modified-Since headers are only the "messaging" used to make decisions about a cached copy.
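      A minimal sketch of the corrected check from items 1 to 3, written as a standalone Java helper rather than the actual BitstreamReader code; the class and method names are hypothetical. It assumes modSince follows the HttpServletRequest.getDateHeader convention (epoch milliseconds, -1 when the header is absent) and, following item 3, that only anonymously accessible content is eligible for a 304. The sketch also truncates both values to whole seconds, because HTTP dates carry no milliseconds and an echoed Last-Modified value would otherwise compare as older than the stored date; whether the pull request does the same is not stated here.

```java
/**
 * Hypothetical helper illustrating the If-Modified-Since comparison
 * described in items 1-3. Not the actual DSpace implementation.
 */
final class NotModifiedCheck {

    private NotModifiedCheck() {}

    /**
     * Returns true when a 304 Not Modified may be sent.
     *
     * @param lastModifiedMillis  last-modified time of the item, epoch ms
     * @param modSince            If-Modified-Since value in epoch ms, or -1 if absent
     * @param anonymouslyReadable whether the bitstream is readable without logging in
     */
    public static boolean isNotModified(long lastModifiedMillis, long modSince,
                                        boolean anonymouslyReadable) {
        if (modSince == -1 || !anonymouslyReadable) {
            // No validator was sent, or the content is protected (assumption,
            // per item 3): send a full response.
            return false;
        }
        // HTTP dates have one-second resolution, so compare whole seconds.
        long lastModifiedSeconds = lastModifiedMillis / 1000;
        long modSinceSeconds = modSince / 1000;
        // "<=" rather than "<": an unchanged resource (equal dates) gets a 304.
        return lastModifiedSeconds <= modSinceSeconds;
    }
}
```

      With a strict "<" an unchanged bitstream whose Last-Modified date is echoed back unchanged always gets a 200, which is the behavior reported for both BitstreamReader and SitemapReader.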

      Support for caching rules has been improved. The rules are now as follows:

      a.) Only anonymously accessible content receives headers that permit caching. The expiry time is configurable in dspace.cfg, and the must-revalidate Cache-Control directive forces proxies to revalidate their cached copy (receiving a 304 if they can simply serve the content they already hold).

      b.) If an anonymously accessible, cacheable bitstream is later made private, the resulting 401/403 response forces the browser to inform the user of the access restriction; in DSpace 4.2/5.0, the user is redirected to request a copy or to log in with a password.

      c.) If the repository administrator wishes to prevent search engines and proxies from caching bitstreams, an override is provided.
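      A sketch of how rules a.) and c.) might translate into response headers. CachePolicy, its parameters, and the exact header values are assumptions for illustration, not the pull request's code: maxAgeSeconds stands in for the expiry time configured in dspace.cfg, blockCachingOverride stands in for the override in c.), and responses that must not be cached publicly are assumed to get "Cache-Control: private, no-cache". Rule b.) needs no header logic here: once a cached copy is revalidated, the server answers with 401/403 instead of 304.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of caching rules a.) and c.); not the DSpace implementation. */
final class CachePolicy {

    // IMF-fixdate format used in HTTP headers, e.g. "Sun, 06 Nov 1994 08:49:37 GMT".
    private static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                             .withZone(ZoneOffset.UTC);

    private CachePolicy() {}

    /**
     * @param anonymouslyReadable  whether the bitstream is readable without logging in
     * @param blockCachingOverride hypothetical stand-in for the override in rule c.)
     * @param maxAgeSeconds        stand-in for the expiry time configured in dspace.cfg
     * @param now                  current time, passed in so the result is testable
     */
    public static Map<String, String> cacheHeaders(boolean anonymouslyReadable,
                                                   boolean blockCachingOverride,
                                                   long maxAgeSeconds,
                                                   Instant now) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (anonymouslyReadable && !blockCachingOverride) {
            // Rule a.): cacheable, but caches must revalidate once the copy is stale.
            headers.put("Cache-Control",
                    "public, max-age=" + maxAgeSeconds + ", must-revalidate");
            headers.put("Expires", HTTP_DATE.format(now.plusSeconds(maxAgeSeconds)));
        } else {
            // Protected content, or caching blocked by the override (assumed values).
            headers.put("Cache-Control", "private, no-cache");
        }
        return headers;
    }
}
```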

      Finally, two further efficiency improvements have been added:

      i.) CoverPage calculations: the decision to return a 304 Not Modified is made, and the response returned, before any citation CoverPage is generated, which reduces computation costs.

      ii.) Open DB connections and GC: the reader no longer holds on to the Item during the file download; instead, simpler primitive values carry the results of setup() to the generate() method. This allows the DB connection held by the Item's Context, and the Item object itself, to be released before content is downloaded to the client.
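      A sketch of the ordering in i.) and the primitive hand-off in ii.), with hypothetical names throughout. DownloadPlan stands in for the values passed from setup() to generate(), and renderCoverPage is a stand-in for CoverPage generation that returns the resulting content length. The 304 decision is made first, so the cover page is only rendered for a full response, and because the plan holds only primitives and a String, nothing in it keeps the Item, its Context, or the DB connection reachable.

```java
import java.util.function.LongSupplier;

/** Hypothetical value object handed from setup() to generate(); holds no Item reference. */
final class DownloadPlan {
    public final int status;          // 200 or 304
    public final long contentLength;  // bytes to send; 0 for a 304
    public final String mimeType;     // null for a 304
    public final long lastModified;   // epoch ms, copied from the Item during setup

    private DownloadPlan(int status, long contentLength, String mimeType, long lastModified) {
        this.status = status;
        this.contentLength = contentLength;
        this.mimeType = mimeType;
        this.lastModified = lastModified;
    }

    /**
     * Decides on a 304 before doing any cover page work.
     *
     * @param itemLastModified Item last-modified time in epoch ms
     * @param modSince         If-Modified-Since in epoch ms, or -1 if absent
     * @param anonymous        whether the bitstream is anonymously readable
     * @param mimeType         bitstream MIME type, copied out of the Bitstream
     * @param renderCoverPage  stand-in for the expensive CoverPage generation
     */
    public static DownloadPlan setup(long itemLastModified, long modSince, boolean anonymous,
                                     String mimeType, LongSupplier renderCoverPage) {
        boolean notModified = anonymous && modSince != -1
                && itemLastModified / 1000 <= modSince / 1000; // whole-second comparison
        if (notModified) {
            // Returned before renderCoverPage is ever invoked.
            return new DownloadPlan(304, 0L, null, itemLastModified);
        }
        long length = renderCoverPage.getAsLong(); // only reached for a full response
        return new DownloadPlan(200, length, mimeType, itemLastModified);
    }
}
```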


            People

            • Assignee: Unassigned
              Reporter: Mark Diggory (mdiggory)
            • Votes: 1
              Watchers: 4
