[dba-Tech] PDF data extractor recommendation

jmoss111 at bellsouth.net jmoss111 at bellsouth.net
Fri Mar 18 13:54:17 CST 2005


Thanks James, but that sounds like too much manual labor for what I have to do. I will most likely go ahead and purchase something, because I end up extracting columnar data from PDFs a couple of times a year.

The last time I had to extract from PDF, I had a 30-day demo of a very nice tool that let you define your columnar data by drawing around the columns and saving that as a model, kind of like Monarch. The problem was the price: $249.00. If extracting columns from PDF files were something I did on a daily basis, it wouldn't be a problem. Oh well, no bucks, no Buck Rogers.
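
The "draw around the columns and save it as a model" idea can be roughly approximated in a few lines once the text is already out of the PDF (for example via the Select Text copy-and-paste route or a PDF-to-text converter). This is a minimal Python sketch, assuming the extracted text keeps each column at a fixed character position; the model offsets, column names and sample rows are hypothetical, and real QuickBooks invoice output may not line up this neatly.

```python
import csv
import io

# Hypothetical column "model": (name, start, end) character offsets.
# Defined once and reused, much like saving a drawn column layout.
INVOICE_MODEL = [
    ("item", 0, 12),
    ("qty", 12, 18),
    ("amount", 18, 28),
]


def extract_columns(lines, model):
    """Slice each non-blank text line into fields using fixed character offsets."""
    rows = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines, e.g. gaps between pages
        rows.append([line[start:end].strip() for _, start, end in model])
    return rows


def to_csv(rows, model):
    """Render the extracted rows as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([name for name, _, _ in model])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical sample lines laid out at the model's offsets.
    sample = [
        f"{'Widget':<12}{'2':<6}19.98",
        "",
        f"{'Gadget':<12}{'10':<6}150.00",
    ]
    print(to_csv(extract_columns(sample, INVOICE_MODEL), INVOICE_MODEL))
```

The resulting CSV opens directly in Excel or imports into Access; if the offsets drift from page to page, a commercial tool with a visual column editor is likely the better investment.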

> 
> From: James Barash <James at fcidms.com>
> Date: 2005/03/18 Fri PM 02:42:44 EST
> To: "'Discussion of Hardware and Software issues'"
> 	<dba-tech at databaseadvisors.com>
> Subject: RE: Re: [dba-Tech] PDF data extractor recommendation
> 
> Jim:
> 
> If you only need to do this once, there is a Select Text tool in Acrobat
> Reader 6.0 that will allow you to highlight and copy text, including columns
> within a document, and then paste it into Excel, Word or any other Windows
> program. It's a manual process but it does work. Clearly if you need to do
> this on an ongoing basis, it's not the best solution, but it is free.
> 
> James Barash
> 
> -----Original Message-----
> From: dba-tech-bounces at databaseadvisors.com
> [mailto:dba-tech-bounces at databaseadvisors.com] On Behalf Of
> jmoss111 at bellsouth.net
> Sent: Friday, March 18, 2005 1:01 PM
> To: Discussion of Hardware and Software issues
> Subject: Re: Re: [dba-Tech] PDF data extractor recommendation
> 
> Thanks Marty. I want to extract columnar text from PDFs created by the
> invoicing module in QuickBooks Pro 2004. I would imagine that the PDFs are
> somewhere between v3 and v6, depending on whichever engine Intuit uses to send
> invoices by email. I'm only talking about twenty 25-page PDF files.
> 
> I will look into the ScanSoft product. Once again, Thanks.
> 
> Jim
> 
> 
> > 
> > From: MartyConnelly <martyconnelly at shaw.ca>
> > Date: 2005/03/18 Fri PM 12:42:49 EST
> > To: Discussion of Hardware and Software issues
> > 	<dba-tech at databaseadvisors.com>
> > Subject: Re: [dba-Tech] PDF data extractor recommendation
> > 
> > Do you mean text extraction or image extraction? It also depends on
> > whether the text has been run through OCR software.
> > I believe Acrobat 7 holds an internal XML file of searchable text if the
> > underlying image has been OCRed.
> > You could use PDF Converter Professional from ScanSoft ($99). It
> > converts PDF to Word or text files (I think; I'm not sure there was a
> > Send To command) or vice versa:
> >  http://www.scansoft.com/pdfconverter/professional/
> > 
> > If you just want to index and search non-OCR'ed PDF files on a disk, i.e.
> > a fax scanned into a PDF, you could use this new beta.
> > The ScanSoft OmniPage Search Indexer enables you to search text found in 
> > image files using world-leading optical character recognition (OCR) 
> > technology. For example, it enables you to search the text found in 
> > electronic fax documents that you may receive via email, as well as 
> > other image formats including PDF, TIF, JPG, BMP, and Paperport.
> > http://desktop.google.com/plugins/omnipagesearch.html
> > 
> > It requires installing this 3-month beta from ScanSoft plus the Google
> > Desktop Search engine.
> > Indexing takes a fair amount of disk space and memory; if you plan to
> > index a couple of thousand image PDFs, plan on running it overnight.
> > 
> > jmoss111 at bellsouth.net wrote:
> > 
> > > Can anyone recommend a reasonably priced or free PDF extraction tool?
> > >
> > >Thanks,
> > >
> > >Jim
> > >
> > >
> > >_______________________________________________
> > >dba-Tech mailing list
> > >dba-Tech at databaseadvisors.com
> > >http://databaseadvisors.com/mailman/listinfo/dba-tech
> > >Website: http://www.databaseadvisors.com
> > >
> > >  
> > >
> > 
> > -- 
> > Marty Connelly
> > Victoria, B.C.
> > Canada
> > 
> > 
> > 
> > 
> 
> 
> 




