[AccessD] Scanning pdfs

Wed Feb 16 15:50:25 CST 2005

On 16 Feb 2005 at 13:50, John W. Colby wrote:

> Can data be extracted from a pdf?  I am looking at getting data into a
> database, but the data is coming in a pdf format.
> 

I use PsToText with Ghostscript to extract the text for my SeachPDF 
utility. Don't know how good it would be at preserving formatted data, but it 
would be worth a try.
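Once the text has been extracted, the next step toward a database is splitting it back into fields. A rough sketch in Python, assuming columns in the extracted text are separated by runs of two or more spaces (not guaranteed; it depends on how the PDF was laid out). The `invoices` table and its columns are hypothetical, for illustration only:

```python
import re
import sqlite3

# Assumption: tabular layout survives extraction as columns separated
# by two or more spaces. Single spaces inside a field are preserved.
COLUMN_SPLIT = re.compile(r" {2,}")


def parse_rows(text, expected_columns):
    """Split extracted text into rows of fields.

    Lines that do not split into exactly `expected_columns` fields
    (blank lines, page headers, wrapped text) are skipped.
    """
    rows = []
    for line in text.splitlines():
        fields = COLUMN_SPLIT.split(line.strip())
        if len(fields) == expected_columns:
            rows.append(tuple(fields))
    return rows


def load_rows(rows, conn):
    """Insert parsed rows into a hypothetical 'invoices' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS invoices "
        "(number TEXT, customer TEXT, amount TEXT)"
    )
    conn.executemany("INSERT INTO invoices VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    # Made-up sample standing in for text extracted from a PDF.
    sample = "Page 1\nINV-001   Acme Ltd      120.50\nINV-002   Widget Co     75.00\n"
    conn = sqlite3.connect(":memory:")
    load_rows(parse_rows(sample, 3), conn)
    print(conn.execute("SELECT COUNT(*) FROM invoices").fetchone()[0])
```

Column headers that split into the same number of fields would also get through, so in practice you would filter those out or load into a staging table first and clean up there.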

<quote>
=================================================================== 
pstotext.txt    5 February 2000
=================================================================== 
pstotext 1.8h - PostScript text extractor.  Requires Ghostscript.

The files pstotxt1.dll (Win16), pstotxt2.dll (OS/2), pstotxt3.dll (Win32), 
and pstotext.zip (sources) constitute the pstotext package, which was 
written by Paul McJones and Andrew Birrell of Digital Equipment 
Corporation's Systems Research Center.  These files are copyright by 
Digital Equipment Corporation.  You may use them subject to the attached 
END USER LICENSE AGREEMENT.

The source files are available as pstotext.zip in the GSview source
distribution, or directly from the authors: 
    http://www.research.digital.com/SRC/virtualpaper/pstotext.html
</quote>

-- 
Stuart