DSpace / DS-4008

Solr leaves item unindexed if a field/term is too big.

    Details

    • Type: Bug
    • Status: Code Review Needed
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 5.9, 6.3
    • Fix Version/s: None
    • Component/s: Solr
    • Attachments: 0
    • Comments: 1
    • Documentation Status: Needed

      Description

      The following exceptions are produced:

       

      {{ERROR org.apache.solr.core.SolrCore @ org.apache.solr.common.SolrException: Exception writing document id <id> to the index; possible analysis error.}}

      Caused by: java.lang.IllegalArgumentException: Document contains at least one immense term in field="abstract_and_notification_and_subject_ac" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[...]...', original message: bytes can be at most 32766 in length; got 38285

      Our client noticed that some new items did not show up on the recent list. After some digging, we found that this is caused by the hardcoded term size limit of the underlying Lucene index ([LUCENE-5472|https://issues.apache.org/jira/browse/LUCENE-5472]). The limit is 32766 bytes of UTF-8 (just under 32 KB) per term, and when a document contains a larger term, indexing fails and the item is left unindexed.
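
      To illustrate the limit described above, here is a minimal sketch in Python (not DSpace code) that checks whether a field value would exceed 32766 bytes once encoded as UTF-8, and truncates it on a character boundary if so. The function names are hypothetical, and the sketch assumes the affected field is indexed as a single untokenized term, so that the whole value counts against the limit. Whether truncation is an acceptable fix for this issue is a separate design decision.

```python
# Lucene's hard limit on the UTF-8 length of a single indexed term,
# as quoted in the exception message above.
MAX_TERM_BYTES = 32766


def utf8_length(value: str) -> int:
    """Return the number of bytes the value occupies when encoded as UTF-8."""
    return len(value.encode("utf-8"))


def is_immense(value: str, limit: int = MAX_TERM_BYTES) -> bool:
    """True if the value, indexed as one term, would exceed the byte limit."""
    return utf8_length(value) > limit


def truncate_to_limit(value: str, limit: int = MAX_TERM_BYTES) -> str:
    """Cut the value so its UTF-8 encoding fits within the limit.

    Hypothetical helper: the limit is in bytes, not characters, so the cut
    is made on the encoded form and any partial trailing character is dropped.
    """
    encoded = value.encode("utf-8")
    if len(encoded) <= limit:
        return value
    # errors="ignore" discards an incomplete multi-byte sequence at the end.
    return encoded[:limit].decode("utf-8", errors="ignore")
```

      Because the limit applies to bytes, a value well under 32766 characters can still be rejected when it contains many multi-byte characters.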

            People

            • Assignee: Unassigned
            • Reporter: Anis Moubarik (anismou)
            • Votes: 0
            • Watchers: 1
