James Barash
James at fcidms.com
Fri Mar 18 13:42:44 CST 2005
Jim:

If you only need to do this once, there is a Select Text tool in Acrobat Reader 6.0 that will allow you to highlight and copy text, including columns within a document, and then paste it into Excel, Word or any other Windows program. It's a manual process, but it does work. Clearly, if you need to do this on an ongoing basis, it's not the best solution, but it is free.

James Barash

-----Original Message-----
From: dba-tech-bounces at databaseadvisors.com [mailto:dba-tech-bounces at databaseadvisors.com] On Behalf Of jmoss111 at bellsouth.net
Sent: Friday, March 18, 2005 1:01 PM
To: Discussion of Hardware and Software issues
Subject: Re: Re: [dba-Tech] PDF data extractor recommendation

Thanks Marty.

I want to extract columnar text from PDFs created by the invoicing module in QuickBooks Pro 2004. I would imagine that the PDFs are probably somewhere between v3 and v6, whichever engine Intuit uses to send invoices by email. I'm only talking about twenty 25-page PDF files. I will look into the ScanSoft product.

Once again, thanks.

Jim

> From: MartyConnelly <martyconnelly at shaw.ca>
> Date: 2005/03/18 Fri PM 12:42:49 EST
> To: Discussion of Hardware and Software issues <dba-tech at databaseadvisors.com>
> Subject: Re: [dba-Tech] PDF data extractor recommendation
>
> Do you mean text extraction or image extraction? It also depends on whether
> the text has been rendered by OCR software.
> I believe Acrobat 7 holds an internal XML file of searchable text if the
> underlying image has been OCRed.
> You could use PDF Converter Professional from ScanSoft ($99).
> It converts PDF to Word or text files (I think; I'm not sure whether there
> was a SendTo command) or vice versa.
> http://www.scansoft.com/pdfconverter/professional/
>
> If you just want to index and search non-OCR'ed PDF files on a disk, i.e.
> a fax scanned into a PDF, you could use this new beta:
> The ScanSoft OmniPage Search Indexer enables you to search text found in
> image files using world-leading optical character recognition (OCR)
> technology. For example, it enables you to search the text found in
> electronic fax documents that you may receive via email, as well as
> other image formats including PDF, TIF, JPG, BMP, and PaperPort.
> http://desktop.google.com/plugins/omnipagesearch.html
>
> It requires installation of this 3-month beta from ScanSoft plus the Google
> Desktop Search engine.
> This requires a fair amount of disk space and memory to do the indexing.
> If you plan to index a couple of thousand image PDFs,
> plan on running it overnight.
>
> jmoss111 at bellsouth.net wrote:
>
> > Can anyone recommend a reasonably priced or free PDF extraction tool?
> >
> > Thanks,
> >
> > Jim
> >
> > _______________________________________________
> > dba-Tech mailing list
> > dba-Tech at databaseadvisors.com
> > http://databaseadvisors.com/mailman/listinfo/dba-tech
> > Website: http://www.databaseadvisors.com
>
> --
> Marty Connelly
> Victoria, B.C.
> Canada