  1. DSpace
  2. DS-2832

Discovery indexing loads full text of documents into Solr's documentCache, which can result in OOM errors


    • Documentation Status:
      Not Required


      This is related to DS-2823, and possibly also DS-2788.

      Our current settings for the Solr "documentCache" do not seem to be optimized for memory usage (especially for smaller sites that wish to run with 1-2GB of memory allocated to Tomcat).

      Currently, the "documentCache" is set to hold 512 documents by default.
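
      For reference, a documentCache entry of this size usually looks roughly like the following in "solrconfig.xml". Only the size of 512 comes from this ticket; the cache class, initialSize and autowarmCount are assumptions based on the stock Solr example configuration and may differ from our file.

      ```xml
      <!-- Sketch only: size="512" is from this ticket, the other attributes are assumed -->
      <documentCache class="solr.LRUCache"
                     size="512"
                     initialSize="512"
                     autowarmCount="0"/>
      ```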

      While this 512 setting may be reasonable for sites with more memory, it may be too large for sites running with just 1GB of memory, especially if their indexed documents are large. (For one of our sites with larger PDFs, the Solr document size seems to be at least 2MB, so a full cache of 512 documents needs at least 512 x 2MB = 1024MB, which already matches or exceeds 1GB of memory.)

      As noted in the Solr Caching documentation: "The more fields you store in your documents, the higher the memory usage of this cache will be."

      The Solr documentation recommends keeping large fields out of the documentCache by enabling lazy loading (enableLazyFieldLoading=true) and by not listing larger fields in the "fl" (field list) parameter.
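
      As a minimal sketch of that recommendation, the fragment below enables lazy field loading and restricts "fl" to a few small fields. The request handler name and the field names in "fl" are illustrative assumptions, not a copy of our configuration.

      ```xml
      <!-- Inside the <query> section of solrconfig.xml -->
      <enableLazyFieldLoading>true</enableLazyFieldLoading>

      <!-- Hypothetical handler defaults: request only small fields, so large
           stored fields are not loaded eagerly. Field names are assumed. -->
      <requestHandler name="/select" class="solr.SearchHandler">
        <lst name="defaults">
          <str name="fl">handle,search.resourcetype,search.resourceid</str>
        </lst>
      </requestHandler>
      ```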

      While our "solrconfig.xml" DOES enable lazy loading of fields, it seems the default field (df) is always set to "search_text", and "search_text" is a copy of all fields.
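
      The two settings involved would look roughly like this. This is a sketch under assumptions: the handler name and the exact form of the copyField rule are guesses at the usual shape of such entries, so check them against the actual "solrconfig.xml" and "schema.xml".

      ```xml
      <!-- solrconfig.xml: default search field in the handler defaults (handler name assumed) -->
      <requestHandler name="/select" class="solr.SearchHandler">
        <lst name="defaults">
          <str name="df">search_text</str>
        </lst>
      </requestHandler>

      <!-- schema.xml: copy every field into search_text (assumed form of the rule) -->
      <copyField source="*" dest="search_text"/>
      ```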

      The end result seems to be that our Solr documentCache ALWAYS includes the full text of indexed documents. If that full text is regularly large, the 512 documents in that cache can quickly use up 1GB (or more) of memory, which can result in OOM (heap size) errors for smaller sites.

      We should either find a way to have the full text be "lazy loaded", or decrease the size of the documentCache by default so that it is less likely for smaller sites to run out of memory when utilizing this cache.

      For the one site where I'm seeing this behavior the most, the OOM errors always occur when running `./dspace index-discovery` (no args). Surprisingly, `./dspace index-discovery -b` runs fine.


              • Assignee:
                tdonohue Tim Donohue