[dba-Tech] PDF data extractor recommendation

MartyConnelly martyconnelly at shaw.ca
Fri Mar 18 11:42:49 CST 2005


Do you mean text extraction or image extraction? It also depends on whether 
the text has been rendered by OCR software.
I believe Acrobat 7 holds an internal XML file of searchable text if the 
underlying image has been OCRed.
You could use PDF Converter Professional from ScanSoft ($99).
It converts PDF to Word or text files (I think; I'm not sure whether there 
was a Send To command), or vice versa.
 http://www.scansoft.com/pdfconverter/professional/

If you just want to index and search non-OCR'ed PDF files on a disk, such as 
a fax scanned into a PDF, you could use this new beta:
The ScanSoft OmniPage Search Indexer enables you to search text found in 
image files using world-leading optical character recognition (OCR) 
technology. For example, it enables you to search the text found in 
electronic fax documents that you may receive via email, as well as 
other image formats including PDF, TIF, JPG, BMP, and PaperPort.
http://desktop.google.com/plugins/omnipagesearch.html

It requires installing this 3-month beta from ScanSoft plus the Google 
Desktop search engine.
Indexing takes a fair amount of disk space and memory. If you plan to index 
a couple of thousand image PDFs, plan on running it overnight.
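
To decide which of the two routes above a given file needs (plain text 
conversion versus OCR indexing), a rough triage can be scripted. The sketch 
below is only a crude heuristic and not part of any ScanSoft or Adobe tool: 
it assumes that a PDF with no "/Font" entry anywhere in its raw bytes is 
probably a bare scanned image. The function names looks_image_only and 
triage are hypothetical. PDFs that store their objects in compressed streams 
can hide the "/Font" entry and be misreported as image-only, and a scan that 
has already been OCRed will count as text because it carries a font for its 
hidden text layer.

```python
from pathlib import Path


def looks_image_only(pdf_bytes: bytes) -> bool:
    """Crude guess: True when no /Font entry appears in the raw bytes.

    Assumption (not from the original post): image-only scans normally
    declare no fonts, while PDFs with extractable text do.
    """
    return b"/Font" not in pdf_bytes


def triage(folder: str) -> dict:
    """Sort the *.pdf files in a folder into 'text' and 'image_only' lists."""
    result = {"text": [], "image_only": []}
    for path in sorted(Path(folder).glob("*.pdf")):
        key = "image_only" if looks_image_only(path.read_bytes()) else "text"
        result[key].append(path.name)
    return result
```

Calling triage on a folder of faxes returns two lists of file names. Files 
under "image_only" are candidates for the OCR indexer, the rest for a plain 
converter. Treat the result as a first pass and spot-check a few files by 
hand.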

jmoss111 at bellsouth.net wrote:

> Can anyone recommend a reasonably priced or free PDF extraction tool?
>
>Thanks,
>
>Jim

-- 
Marty Connelly
Victoria, B.C.
Canada





