  1. DSpace
  2. DS-1857

Unhandled exception in BTE batch import when uploading CSV files with misconfigured options

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 4.0
    • Fix Version/s: 4.1
    • Component/s: None
    • Attachments: 0
    • Comments: 11
    • Documentation Status: Not Required

      Description

      Demo server encounters downtimes and the log output before the crash is always the same:

      ==================================================
      1290223 [http-bio-80-exec-16] INFO org.apache.solr.core.SolrCore – [statistics] webapp=/solr path=/update params={wt=javabin&version=2} status=0 QTime=5
      INFO: Dataloader will load data from the file specified in the command prompt (and not from the Spring XML configuration file)
      INFO: Dataloader gr.ekt.bteio.loaders.CSVDataLoader@5ec06863 will be used for the import!
      Exception
      java.lang.ArrayIndexOutOfBoundsException: 31
      at gr.ekt.bteio.loaders.CSVDataLoader.getRecords(CSVDataLoader.java:104)
      at gr.ekt.bteio.loaders.CSVDataLoader.getRecords(CSVDataLoader.java:120)
      at gr.ekt.bte.core.TransformationEngine.transform(TransformationEngine.java:87)
      at org.dspace.app.itemimport.ItemImport.addBTEItems(ItemImport.java:712)
      at org.dspace.app.itemimport.ItemImport.access$300(ItemImport.java:93)
      at org.dspace.app.itemimport.ItemImport$3.run(ItemImport.java:2028)
      Adding items from directory: /home/dspace/dspace/imports/20140106_1247_ojX47WYk/.bte_output_dspace
      Generating mapfile: /home/dspace/dspace/imports/20140106_1247_ojX47WYk/mapfile
      Error, cannot open source directory /home/dspace/dspace/imports/20140106_1247_ojX47WYk/.bte_output_dspace
      Jan 06, 2014 12:47:12 PM org.apache.coyote.AbstractProtocol pause
      INFO: Pausing ProtocolHandler ["http-bio-80"]
      Jan 06, 2014 12:47:12 PM org.apache.coyote.AbstractProtocol pause
      INFO: Pausing ProtocolHandler ["ajp-bio-8009"]
      Jan 06, 2014 12:47:12 PM org.apache.catalina.core.StandardService stopInternal
      INFO: Stopping service Catalina
      Jan 06, 2014 12:47:13 PM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
      SEVERE: The web application [/jspui] appears to have started a thread named [Thread-41] but has failed to stop it. This is very likely to create a memory leak.
      Jan 06, 2014 12:47:15 PM org.apache.catalina.loader.WebappClassLoader checkThreadLocalMapForLeaks
      SEVERE: The web application [/xmlui] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@513de7e9]) and a value of type [org.apache.cocoon.environment.internal.EnvironmentStack] (value [[]]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
      Jan 06, 2014 12:47:15 PM org.apache.catalina.loader.WebappClassLoader checkThreadLocalMapForLeaks
      SEVERE: The web application [/xmlui] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@513de7e9]) and a value of type [org.apache.cocoon.environment.internal.EnvironmentStack] (value [[]]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
      Jan 06, 2014 12:47:15 PM org.apache.catalina.loader.WebappClassLoader checkThreadLocalMapForLeaks
      ==================================================

      It seems that the problem is in BTE.
      BTE does not handle a misconfigured CSVDataLoader: it performs no check and raises no descriptive exception when the CSVDataLoader is told to read a column that does not exist in the CSV file.
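
A minimal sketch of the kind of check that is missing, assuming the loader holds a map from column index to field name. The class `ColumnCheckSketch`, the method `extractFields` and the exception `MisconfiguredLoaderException` are hypothetical names used for illustration only; they are not part of BTE or DSpace.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ColumnCheckSketch {

    /** Hypothetical exception type for illustration; not part of BTE. */
    static class MisconfiguredLoaderException extends Exception {
        MisconfiguredLoaderException(String message) {
            super(message);
        }
    }

    /**
     * Maps the configured column indices to field names for one parsed row,
     * rejecting any configured index that the row does not contain.
     */
    static Map<String, String> extractFields(String[] row, Map<Integer, String> fieldMap)
            throws MisconfiguredLoaderException {
        Map<String, String> record = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> entry : fieldMap.entrySet()) {
            int column = entry.getKey();
            // Validate the index before touching the array.
            if (column < 0 || column >= row.length) {
                throw new MisconfiguredLoaderException(
                        "Configured column " + column + " (field '" + entry.getValue()
                        + "') does not exist; the row has " + row.length + " columns");
            }
            record.put(entry.getValue(), row[column]);
        }
        return record;
    }

    public static void main(String[] args) {
        String[] row = {"A title", "An author", "2014"};
        Map<Integer, String> fieldMap = new LinkedHashMap<>();
        fieldMap.put(0, "title");
        fieldMap.put(31, "abstract"); // index beyond the end of the row
        try {
            extractFields(row, fieldMap);
        } catch (MisconfiguredLoaderException e) {
            System.out.println("Import rejected: " + e.getMessage());
        }
    }
}
```

With a check like this, the caller receives a message that names the offending column index instead of a bare ArrayIndexOutOfBoundsException.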

      Moreover, the default configuration for CSVDataLoader in DSpace:

      https://github.com/DSpace/DSpace/blob/master/dspace/config/spring/api/bte.xml#L284

      uses arbitrary numbers (such as the 31 that appears in the error you sent) for the column indices.

      Thus, my guess is that someone is trying to upload an arbitrary CSV file to the demo DSpace without configuring BTE at all (and how could they?), so the ArrayIndexOutOfBoundsException is raised.

      The same applies to the TSVDataLoader.

      Another problem is that the aforementioned exception somehow brings down all the webapps.
      Also, this must have been launched from the UI, so we should make sure that the error bubbles up properly to the UI.
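
One possible way to make such an error reach the UI, sketched under assumptions: the stack trace shows the import running inside a `run()` method, so the sketch below runs a task on a background thread, catches any runtime exception there and stores a message that a UI layer could later display. `ImportErrorSketch`, `runImport` and `lastError` are hypothetical names, not DSpace API, and the lambda syntax assumes Java 8 or later. The sketch does not explain or address why the webapps went down.

```java
import java.util.concurrent.atomic.AtomicReference;

public class ImportErrorSketch {

    /** Holds the most recent failure so that a UI layer could display it. */
    static final AtomicReference<String> lastError = new AtomicReference<>();

    /** Runs an import task on a background thread and records any failure. */
    static Thread runImport(Runnable importTask) {
        Thread worker = new Thread(() -> {
            try {
                importTask.run();
            } catch (RuntimeException e) {
                // Record the failure instead of letting it escape the thread.
                lastError.set(e.getClass().getSimpleName() + ": " + e.getMessage());
            }
        });
        worker.start();
        return worker;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = runImport(() -> {
            String[] row = new String[3];
            String value = row[31]; // simulates the misconfigured column index
        });
        worker.join();
        System.out.println("Error to show in the UI: " + lastError.get());
    }
}
```

A real implementation would more likely attach the error to the import record of the user who started the upload rather than keep it in a single static field; the static field is used here only to keep the example short.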

            People

            • Assignee: Kostas Stamatis (kstamatis)
            • Reporter: Kostas Stamatis (kstamatis)
            • Votes: 0
            • Watchers: 5
