DSpace / DS-1387

Reports that Google Scholar sometimes links to DSpace extracted-text (*.pdf.txt) files instead of the original PDF

    Details

    • Type: Bug
    • Status: More Details Needed
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: XMLUI
    • Labels:
      None
    • Attachments:
      0
    • Comments:
      12
    • Documentation Status:
      Needed

      Description

      This ticket is a placeholder for several recent reports about PDF indexing oddities with Google Scholar and DSpace (seemingly XMLUI-specific, though that is unconfirmed).

      In several cases, users have reported that Google Scholar is mistakenly linking to the internal extracted PDF text files (*.pdf.txt files). These internal ".pdf.txt" files are automatically generated by DSpace for its own indexing, and are not meant to be utilized by external search engines.

      Although the "*.pdf.txt" files are technically publicly accessible, they are currently not linked from the main Item "splash page", so it is uncertain how web spiders are locating them. (Some have speculated that they are found through the OAI interface, or through indexing of the XMLUI's "mets.xml" file.)

      Here are a few threads describing this issue on the dspace-tech mailing list:

      If anyone else has noticed this issue, we'd encourage you to provide examples in this JIRA ticket. It may help us to better track down whether this is a DSpace issue, a Google Scholar issue, or perhaps even a bit of both.

      When you add comments to this ticket, please state the DSpace version you are using, whether you are using XMLUI or JSPUI, and whether you have OAI enabled. If you have examples you can link to in Google Scholar, or have noticed any other oddities, please note those as well.
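
When collecting examples for this ticket, it may help to separate links that point at extracted-text derivatives from links to the original PDFs. The following is a minimal Python sketch, assuming the only signal used is the ".pdf.txt" suffix described above. The function names and the repository.example.edu URLs are hypothetical and are not part of DSpace.

```python
from urllib.parse import urlparse, unquote


def is_extracted_text_url(url: str) -> bool:
    """Return True if the URL path ends in '.pdf.txt' (case-insensitive).

    Only the path is inspected, so a query string such as '?sequence=2'
    does not affect the result.
    """
    path = unquote(urlparse(url).path)
    return path.lower().endswith(".pdf.txt")


def find_extracted_text_links(urls):
    """Return the subset of URLs that look like extracted-text files."""
    return [u for u in urls if is_extracted_text_url(u)]


if __name__ == "__main__":
    # Hypothetical links, e.g. copied by hand from search results.
    sample = [
        "https://repository.example.edu/bitstream/handle/123456789/1/thesis.pdf?sequence=1",
        "https://repository.example.edu/bitstream/handle/123456789/1/thesis.pdf.txt?sequence=2",
    ]
    for link in find_extracted_text_links(sample):
        print(link)
```

Any link this flags would still need to be checked by hand before being reported here, since the suffix alone does not show how the crawler discovered the file.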

              People

              • Assignee:
                Unassigned
                Reporter:
                tdonohue Tim Donohue
              • Votes:
                0
                Watchers:
                7

                Dates

                • Created:
                  Updated: