Here is a pair of media filters that process PDF files with the
XPDF suite (see http://www.foolabs.com/xpdf/ ), replacing the
one based on PDFBox. They invoke an external command, which
must be configured. They have been tested on Unix, and the concept
ought to work on Windows (and certainly on Mac OS X).
XPDF2Text is a replacement for the existing PDF media filter; it
extracts text using the pdftotext program. I've observed that it
is about 3 times as fast as PDFBox, and much more reliable.
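
To illustrate the external-command approach, here is a minimal sketch of calling pdftotext from Java and capturing its output. It is not the code of the XPDF2Text filter itself: the class name PdfToTextSketch, the method names, and the choice of options are assumptions made for illustration, and it assumes Java 9 or later. The trailing "-" argument asks pdftotext to write the text to standard output.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not part of DSpace or of the filters described here.
public class PdfToTextSketch {

    // Builds the argument list for pdftotext.
    // -q suppresses messages, -enc UTF-8 selects the output encoding,
    // and "-" as the text file sends the extracted text to stdout.
    static List<String> buildCommand(String pdftotextPath, Path pdf) {
        return List.of(pdftotextPath, "-q", "-enc", "UTF-8", pdf.toString(), "-");
    }

    // Runs pdftotext and returns the extracted text.
    // pdftotextPath is the configured location of the executable (assumed).
    static String extractText(String pdftotextPath, Path pdf)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(pdftotextPath, pdf));
        // Discard stderr so the child cannot block on a full error pipe.
        pb.redirectError(ProcessBuilder.Redirect.DISCARD);
        Process process = pb.start();

        String text;
        try (InputStream in = process.getInputStream()) {
            text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }

        int status = process.waitFor();
        if (status != 0) {
            throw new IOException("pdftotext exited with status " + status);
        }
        return text;
    }
}
```

A caller would pass the configured path to the executable, for example extractText("/usr/bin/pdftotext", Paths.get("article.pdf")), where both values are placeholders.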
XPDF2Thumbnail creates a thumbnail image for the first page of
the PDF. This is especially effective for 3D PDF renderings of
engineering models, but works fine for any document.
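
As a rough sketch of how a first-page image might be produced, the example below renders page 1 with pdftoppm from the XPDF suite. This is an assumption: the description above does not say which XPDF program the thumbnail filter runs, and the class name PdfFirstPageSketch, the method names, and the resolution parameter are hypothetical. pdftoppm writes PPM files, which the standard Java ImageIO readers do not handle, so reading and scaling the result is left out here.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not the XPDF2Thumbnail filter itself.
public class PdfFirstPageSketch {

    // Builds a pdftoppm command that renders only page 1.
    // -f and -l give the first and last page, -r the resolution in DPI.
    // outputRoot is the file-name prefix pdftoppm uses for its PPM output.
    static List<String> buildCommand(String pdftoppmPath, Path pdf,
                                     Path outputRoot, int dpi) {
        return List.of(pdftoppmPath, "-q",
                "-f", "1", "-l", "1",
                "-r", Integer.toString(dpi),
                pdf.toString(), outputRoot.toString());
    }

    // Runs the command and returns the exit status of pdftoppm.
    static int renderFirstPage(String pdftoppmPath, Path pdf,
                               Path outputRoot, int dpi)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(
                buildCommand(pdftoppmPath, pdf, outputRoot, dpi))
                .inheritIO() // pass the child's output through to this process
                .start();
        return process.waitFor();
    }
}
```
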
See the instructions in xpdf-filters.html to install them.
The thumbnail filter needs an additional image library, but
the text extractor doesn't need anything else.
This code has been tested with DSpace 1.5.1.