Islandora / ISLANDORA-1624

Allow HOCR bounding box rendering from sources other than Tesseract

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Fix Version/s: 7.x-1.8
    • Component/s: OCR
    • Labels: None

      Description

      --Use Case--
      Allow HOCR produced by other OCR engines to render bounding boxes within IAV and OpenSeadragon.

      --Technical Approach--
      Strip out the Tesseract-specific details that the HOCR rendering code relies upon but that are never bubbled up to, or used in, the theme layer for either of the viewers that render it. The HOCR specification is very loose about what is required, and the original Islandora implementation used Tesseract's output as its basis.
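
As an illustration of the idea only (not the Islandora code, which is PHP), here is a minimal Python sketch of engine-agnostic bounding box extraction. It assumes that words are marked with the `ocrx_word` class and carry a `bbox x0 y0 x1 y1` property in their `title` attribute, and it deliberately ignores details such as Tesseract-style `id` values or `x_wconf`. The names `HocrWordParser` and `extract_word_boxes` are hypothetical.

```python
import re
from html.parser import HTMLParser

# Matches the "bbox x0 y0 x1 y1" property inside an hOCR title attribute.
BBOX_RE = re.compile(r"\bbbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")


class HocrWordParser(HTMLParser):
    """Collect text and bounding box for each hOCR word element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.words = []     # finished {"text": ..., "bbox": ...} entries
        self._bbox = None   # bbox of the word currently being read
        self._depth = 0     # tag nesting depth inside the current word
        self._text = []     # text fragments of the current word

    def handle_starttag(self, tag, attrs):
        if self._bbox is not None:
            # Nested markup (for example <strong>) inside a word.
            self._depth += 1
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        match = BBOX_RE.search(attr.get("title") or "")
        # Only the class and the bbox are required; ids and other
        # title properties are not inspected.
        if "ocrx_word" in classes and match:
            self._bbox = tuple(int(v) for v in match.groups())
            self._depth = 1
            self._text = []

    def handle_endtag(self, tag):
        if self._bbox is None:
            return
        self._depth -= 1
        if self._depth == 0:
            text = "".join(self._text).strip()
            if text:
                self.words.append({"text": text, "bbox": self._bbox})
            self._bbox = None

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)


def extract_word_boxes(hocr):
    """Return a list of word entries found in an hOCR string."""
    parser = HocrWordParser()
    parser.feed(hocr)
    parser.close()
    return parser.words


# Illustrative inputs: one with Tesseract-style extras, one without.
TESSERACT_STYLE = (
    '<span class="ocrx_word" id="word_1_1" '
    'title="bbox 36 92 96 116; x_wconf 95"><strong>Hello</strong></span>'
)
OTHER_ENGINE = "<span class='ocrx_word' title='bbox 10 20 50 40'>world</span>"
```

Both sample strings yield a word with its coordinates, so a viewer that only needs text and coordinates would not care which engine produced the markup.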

      --Test Case--
      Books:

      Using HOCR generated via Tesseract, verify that bounding boxes are applied correctly when searching for terms within a book in IAV.

      Newspapers:

      When a result for a specific indexed term is clicked in the search results, bounding boxes are rendered correctly on the newspaper page.

      I will provide sample assets (HOCR generated by another engine and a JP2) once I receive an uncopyrighted source copy that I can attach.

      --Impact--
      None.

      ----
      Jordan Dukart
      Developer
      discoverygarden inc. | Managing Digital Content

            People

            • Assignee: Unassigned
            • Reporter: Jordan Dukart (jordandukart)
            • Votes: 0
            • Watchers: 2
