Stuart McLachlan
stuart at lexacorp.com.pg
Wed Feb 16 15:50:25 CST 2005
On 16 Feb 2005 at 13:50, John W. Colby wrote:
> Can data be extracted from a pdf? I am looking at getting data into a
> database, but the data is coming in a pdf format.
>
I use PsToText with Ghostscript to extract the text for my SeachPDF
utility. Don't know how good it would be at preserving formatted data, but it
would be worth a try.
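
A rough sketch of how that could be scripted on the way to a database, in Python. The assumptions here are mine and not part of the pstotext package: that a pstotext executable is on the PATH with Ghostscript installed, that it writes the extracted text to standard output when given a file name, and that columns in the output end up separated by runs of two or more spaces. The helper names (build_command, extract_text, split_columns) are made up for illustration.

```python
import re
import shutil
import subprocess


def build_command(pdf_path, exe="pstotext"):
    """Build the argument list for a pstotext run (hypothetical helper).

    Assumption: with only a file name given, pstotext writes the
    extracted text to standard output.
    """
    return [exe, pdf_path]


def extract_text(pdf_path, exe="pstotext"):
    """Run pstotext on one file and return whatever text it prints."""
    if shutil.which(exe) is None:
        raise FileNotFoundError(f"{exe} not found on PATH")
    result = subprocess.run(
        build_command(pdf_path, exe),
        capture_output=True,
        text=True,
        check=True,  # raise if pstotext or Ghostscript reports an error
    )
    return result.stdout


def split_columns(line):
    """Split one line of extracted text into fields.

    Assumption: fields are separated by two or more whitespace
    characters. Real output may not preserve the layout this neatly.
    """
    return [field for field in re.split(r"\s{2,}", line.strip()) if field]
```

Something like `rows = [split_columns(l) for l in extract_text("report.pdf").splitlines()]` would then give lists of fields to check by eye before loading anything into a table. Whether the splitting rule holds up depends entirely on how the PDF was laid out.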
<quote>
===================================================================
pstotext.txt 5 February 2000
===================================================================
pstotext 1.8h - PostScript text extractor. Requires Ghostscript.
The files pstotxt1.dll (Win16), pstotxt2.dll (OS/2) pstotxt3.dll (Win32),
and pstotext.zip (sources) constitute the pstotext package, which was
written by Paul McJones and Andrew Birrell of Digital Equipment
Corporation's Systems Research Center. These files are copyright by
Digital Equipment Corporation. You may use them subject to the attached
END USER LICENSE AGREEMENT.
The source files are available as pstotext.zip in the GSview source
distribution, or directly from the authors:
http://www.research.digital.com/SRC/virtualpaper/pstotext.html
</quote>
--
Stuart