Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Won't Fix
    • Affects Version/s: 1.8.2
    • Fix Version/s: None
    • Component/s: OAI-PMH
    • Labels:
      None
    • Environment:
      Oracle Database
    • Attachments:
      1
    • Comments:
      9

      Description

      Patch to solve this problem:

      http://sourceforge.net/mailarchive/message.php?msg_id=29421592

      Upgrading to the new OAICat version does not solve this problem.
      The attached patch can be applied to the dspace-oai source:

      $ cd [DSPACE-SOURCE]
      $ patch -p0 < [PATH-TO]/patch-dspace-oai-oracle-db.diff


          Activity

          helix84 Ivan Masár added a comment

          Thanks for the workaround. I also think it should get rid of the problem.
          I agree that upgrading OAICat won't help.

          I looked at the latest OAICat source and the from/until dates Oracle can't cope with are set here:

          oaicat-1.5.59/src/ORG/oclc/oai/server/verb/ListRecords.java:146

          So we have these options:
          1) Find out exactly why Oracle rejects this format, then change the format in OAICat to an acceptable one.
          2) File an issue with OAICat and let them solve it (they may well not respond: <http://code.google.com/p/oaicat/issues/list>)
          3) Apply the workaround in DSpace

          lyncode DSpace @ Lyncode added a comment (edited)

          Hi Ivan Masár,

          allow me to comment on your post.

          – OAICAT

          I think the correct way to solve this problem is to change OAICat's behavior; workarounds are not a good solution. Strictly speaking (per the OAI-PMH protocol), OAICat's behavior is incorrect: when the from/until parameters are absent, it should not assume from=0001-01-01 and/or until=9999-12-31. If they are not given, they should be treated as undefined.

          This behavior can generate two useless SQL WHERE clauses that, in practice, only add complexity (and reduce performance) to the OAI implementation.
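To illustrate the point about the extra WHERE clauses, here is a minimal Java sketch (not DSpace's actual Harvest code) that adds a datestamp condition only when the harvester actually supplied from or until. This matches the OAI-PMH rule that an omitted bound extends back to the earliest, or forward to the most recent, datestamp. The class HarvestQuerySketch and the table/column names (item, item_id, last_modified) are hypothetical.

```java
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: build a harvest query that constrains the datestamp
 * only when from/until were actually given. Table and column names are
 * illustrative, not DSpace's real schema.
 */
public class HarvestQuerySketch {
    static final class Query {
        final String sql;
        final List<Object> params;

        Query(String sql, List<Object> params) {
            this.sql = sql;
            this.params = params;
        }
    }

    static Query build(Timestamp from, Timestamp until) {
        StringBuilder sql = new StringBuilder("SELECT item_id FROM item");
        List<String> conditions = new ArrayList<>();
        List<Object> params = new ArrayList<>();
        // Omitted "from": harvest back to the earliest datestamp, so no lower bound.
        if (from != null) {
            conditions.add("last_modified >= ?");
            params.add(from);
        }
        // Omitted "until": harvest up to the most recent datestamp, so no upper bound.
        if (until != null) {
            conditions.add("last_modified <= ?");
            params.add(until);
        }
        if (!conditions.isEmpty()) {
            sql.append(" WHERE ").append(String.join(" AND ", conditions));
        }
        return new Query(sql.toString(), params);
    }

    public static void main(String[] args) {
        System.out.println(build(null, null).sql);
        System.out.println(build(Timestamp.valueOf("2012-01-01 00:00:00"), null).sql);
    }
}
```

With no bounds the query has no WHERE clause at all, so no placeholder dates ever reach the database driver.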

          – ORACLE

          The Oracle problem is actually related to the configured timezone.

          See: http://grepcode.com/file/repo1.maven.org/maven2/org.dspace/dspace-api/1.8.2/org/dspace/search/Harvest.java#Harvest.toTimestamp%28java.lang.String%2Cboolean%29

          The Oracle database should be configured with the UTC timezone.

          – Conclusion

          OAICat is not behaving correctly.
          Some Oracle issues must be documented in the DSpace manual.
          The patch only solves the OAICat problem.

          helix84 Ivan Masár added a comment (edited)

          Thank you for your insights, they are very useful.

          OAICAT:

          I agree with you that when from/until are not specified, OAICat shouldn't make the assumptions it makes. It should leave such assumptions to the application layer (in our case, DSpace).

          "Harvesting is restricted to the range specified by the from and until arguments, extending back to the earliest datestamp if from is omitted, and forward to the most recent datestamp if until is omitted."

          http://www.openarchives.org/OAI/openarchivesprotocol.html#SelectiveHarvestingandDatestamps

          I will file a bug with them and send them a patch removing these assumptions, but I'm afraid they won't be very responsive. I'll also try to contact individual authors.

          Anyway, there is an alternative to waiting for them to fix it: our OAICat is pulled from Maven, so we could put a patched version of OAICat in our Maven repository. However, I don't think this would be better than applying the workaround in DSpace, because anyone using org.dspace.oaicat could be misled into thinking it is the unpatched version.

          ORACLE:

          I understand the OAICat problem, but I still don't fully understand the Oracle problem. Why doesn't Oracle accept these timestamps? What does the timezone have to do with it? OAI-PMH mandates that all dates and times in the protocol be expressed in UTC. But why not convert them to the local timezone for use in the application?

          Would it help to detect the local timezone and use it like this?
          df.setCalendar(Calendar.getInstance(TimeZone.getTimeZone(localTimezone)));
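A small sketch of why converting to the local timezone is risky here, using only java.text and java.util classes. It takes the last millisecond of 9999-12-31 in UTC and renders that same instant as wall-clock time in UTC and in GMT+1, the offset that shows up in the parameters logged in this issue. The class name UntilDateShift is hypothetical, and GMT+1 is assumed only because it matches that log.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Sketch: one instant rendered as wall-clock time in two different zones. */
public class UntilDateShift {
    // Parses a UTC datestamp such as "9999-12-31T23:59:59.999Z".
    static Date parseUtc(String s) {
        SimpleDateFormat in = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'", Locale.US);
        in.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return in.parse(s);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Bad datestamp: " + s, e);
        }
    }

    // Renders the instant as local wall-clock time in the given zone.
    static String format(Date date, String zoneId) {
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS", Locale.US);
        out.setTimeZone(TimeZone.getTimeZone(zoneId));
        return out.format(date);
    }

    public static void main(String[] args) {
        Date until = parseUtc("9999-12-31T23:59:59.999Z");
        System.out.println(format(until, "UTC"));       // 9999-12-31 23:59:59.999
        System.out.println(format(until, "GMT+01:00")); // 10000-01-01 00:59:59.999
    }
}
```

The instant never changes; only its local representation does. This suggests that converting the placeholder until date to a zone east of UTC is what pushes it past year 9999, rather than something that would fix it.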

          MANUAL:

          This is the easiest thing to change, so if you have anything to put in the manual, please attach it here.

          lyncode DSpace @ Lyncode added a comment (edited)

          Hi Ivan Masár,

          My fault, I did not give a good explanation of the Oracle problem, but I'm really not sure about it myself. The problem lies in the interface between DSpace and Oracle: maybe Oracle JDBC driver specifics, maybe the JVM (system)/Oracle timezone.

          Here is what I mean:

          parameters: 341,0001-01-01 01:00:00.0,10000-01-01 00:59:59.999 [1]

          The end date 9999-12-31 (assumed by OAICat) is transformed into 10000-01-01, which makes the Oracle JDBC driver raise a "Year out of range" exception.

          "Oracle Database can store dates in the Julian era, ranging from January 1, 4712 BCE through December 31, 9999 CE (Common Era, or 'AD'). Unless BCE ('BC' in the format mask) is specifically used, CE date entries are the default." [2]
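Given that upper limit, a defensive check could be run before binding a bound to a statement. The sketch below is a hypothetical guard (OracleYearGuard is not part of DSpace or OAICat): it uses java.time to ask whether a UTC bound, once expressed in the zone the conversion is assumed to use, still falls in a year Oracle can store. Which zone the JDBC layer actually applies is an assumption you would need to confirm for your setup.

```java
import java.time.Instant;
import java.time.ZoneId;

/**
 * Hypothetical guard: does a UTC bound still fit Oracle's DATE range
 * (years up to 9999 CE) once expressed in the assumed session zone?
 */
public class OracleYearGuard {
    static final int ORACLE_MAX_YEAR = 9999;

    static boolean fitsOracleRange(Instant bound, ZoneId sessionZone) {
        // Only the upper limit matters here; Oracle accepts years back to 4712 BCE.
        int year = bound.atZone(sessionZone).getYear();
        return year <= ORACLE_MAX_YEAR;
    }

    public static void main(String[] args) {
        Instant until = Instant.parse("9999-12-31T23:59:59.999Z");
        System.out.println(fitsOracleRange(until, ZoneId.of("UTC")));       // true
        System.out.println(fitsOracleRange(until, ZoneId.of("GMT+01:00"))); // false
    }
}
```

A caller could treat a bound that fails this check as "no upper limit" and drop the clause, which has the same effect as the nullifying patch but is triggered by the actual range problem rather than by a specific placeholder string.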

          So the problem is related to timezones. The Oracle database receives the timestamp not in GMT but in another timezone (GMT+1 in this case). I can't fully explain what is happening, but the log in DatabaseManager.query also formats this date in GMT+1 (perhaps because of the JVM's default TimeZone?).

          Nevertheless, in our services all system timezones are configured as GMT, which guarantees that these date conversion problems do not occur. Another workaround might also solve the problem: using the date parsing functions available in Oracle/PostgreSQL, which let one pass the time as a string with an explicit timezone and thus give control over those conversions.
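The "pass the time as a string with an explicit timezone" idea could look roughly like the sketch below, which is untested against a live Oracle instance. It formats the bound in UTC with an explicit +00:00 offset, so the JVM's default zone never touches it, and pairs it with Oracle's TO_TIMESTAMP_TZ and a matching format model. The class ExplicitZoneBound and the column name last_modified are hypothetical; PostgreSQL would need its own cast or parsing function instead of this clause.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/**
 * Sketch: send the bound as a UTC string with an explicit offset and let
 * Oracle parse it. Column name "last_modified" is hypothetical.
 */
public class ExplicitZoneBound {
    // Oracle format model matching the string produced by toUtcString().
    static final String ORACLE_UNTIL_CLAUSE =
            "last_modified <= TO_TIMESTAMP_TZ(?, 'YYYY-MM-DD HH24:MI:SS TZH:TZM')";

    // "xxx" always prints the offset as +HH:MM, i.e. "+00:00" for UTC.
    private static final DateTimeFormatter UTC_STRING =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss xxx", Locale.US)
                    .withZone(ZoneOffset.UTC);

    static String toUtcString(Instant instant) {
        return UTC_STRING.format(instant);
    }

    public static void main(String[] args) {
        Instant until = Instant.parse("9999-12-31T23:59:59Z");
        System.out.println(ORACLE_UNTIL_CLAUSE);
        System.out.println(toUtcString(until)); // 9999-12-31 23:59:59 +00:00
    }
}
```

The string would be bound with PreparedStatement.setString. A JDBC-level alternative that keeps Timestamp parameters is PreparedStatement.setTimestamp(int, Timestamp, Calendar) with a UTC Calendar, which tells the driver which zone to use for the conversion instead of the JVM default.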

          – Resources

          [1] http://sourceforge.net/mailarchive/message.php?msg_id=29421592
          [2] http://docs.oracle.com/cd/B28359_01/server.111/b28318/datatype.htm#i1847

          diglesias Domingo Iglesias added a comment

          Hi

          Another point, not directly related to the bug but to the expected functionality: I have the following property in oaicat.properties:

          Identify.earliestDatestamp=2006-01-01T00:00:00Z

          which is correctly returned when I issue an "Identify" request.

          So, two considerations:

          • I think this should be the default "from" date used in requests.
          • Would it be useful to add an Identify.latestDatestamp property (default null)?

          helix84 Ivan Masár added a comment

          I'm attaching my email to Jeff Young, the developer of OAICat:

          Hi Jeff,

          we're using OAICat in DSpace. We've encountered an issue with OAICat
          presetting from and until dates when they are not specified, which we
          think doesn't conform to the OAI-PMH specification. This is causing us
          a problem, in particular that the "until" date is changed from
          9999-12-31 to 10000-01-01 during timezone conversion and this is out
          of range for Oracle. We were considering a workaround to nullify the
          from/until date you're setting in OAICat in our DSpace code, but as I
          said, we do not think OAICat behaves according to specs.

          You can find the details here:
          https://jira.duraspace.org/browse/DS-1195

          We'd appreciate your feedback. Would you be willing to remove the code
          setting from/until to 0001-01-01/9999-12-31?
          This is in src/ORG/oclc/oai/server/verb/ListRecords.java, lines 147 and 150.

          Thanks in advance.

          Regards,
          ~~helix84

          lyncode DSpace @ Lyncode added a comment

          Domingo Iglesias pointed out another OAICat problem (in terms of the OAI-PMH specification).
          The configured earliestDatestamp may not actually be the earliest datestamp in the database if users set it to a wrong value.

          "earliestDatestamp : a UTCdatetime that is the guaranteed lower limit of all datestamps recording changes, modifications, or deletions in the repository. A repository must not use datestamps lower than the one specified by the content of the earliestDatestamp element. earliestDatestamp must be expressed at the finest granularity supported by the repository." [1]

          As Domingo Iglesias suggested, this value should be calculated at runtime to avoid such problems.
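Calculating the value at runtime amounts to taking the minimum stored datestamp and expressing it in UTC at the repository's granularity. The sketch below assumes the datestamps have already been loaded (in practice they would come from something like a MIN() query over item modification dates, which is hypothetical here) and assumes seconds granularity. EarliestDatestamp is an illustrative class, not part of OAICat.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Arrays;
import java.util.Collection;
import java.util.Optional;

/**
 * Sketch: derive Identify's earliestDatestamp from stored datestamps instead
 * of trusting a configured value. Seconds granularity is assumed.
 */
public class EarliestDatestamp {
    static Optional<String> compute(Collection<Instant> datestamps) {
        return datestamps.stream()
                .min(Instant::compareTo)
                // Truncate to the assumed granularity (YYYY-MM-DDThh:mm:ssZ).
                .map(t -> t.truncatedTo(ChronoUnit.SECONDS).toString());
    }

    public static void main(String[] args) {
        String earliest = compute(Arrays.asList(
                Instant.parse("2008-03-15T10:20:30.456Z"),
                Instant.parse("2006-05-02T08:00:00Z")))
                .orElse("(empty repository)");
        System.out.println(earliest); // 2006-05-02T08:00:00Z
    }
}
```

For an empty repository the result is absent, so a caller would still need a fallback, for example the configured Identify.earliestDatestamp. The computed value could also serve as the default "from" date when a harvester omits it.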

          – References

          [1] http://www.openarchives.org/OAI/openarchivesprotocol.html#Identify

          diglesias Domingo Iglesias added a comment

          One more thing: there is a side effect that causes an invalid resumptionToken to be generated.
          These lines of code:
          // Treat OAICat's placeholder bounds as "not specified"
          if (from != null && from.startsWith("0001-01-01")) from = null;
          if (until != null && until.startsWith("9999-12-31")) until = null;
          generate the following resumptionToken: //hdl_myinstitutionhandle_2/oai_dc/100

          joaomelo João Melo added a comment

          The new OAI 2.0 interface solves it.


            People

            • Assignee:
              joaomelo João Melo
              Reporter:
              lyncode DSpace @ Lyncode
              Reviewer:
              Domingo Iglesias
            • Votes:
              0
              Watchers:
              4
