Java Mailing List Archive

http://www.junlu.com/


Re: [iText-questions] Itextsharp extact text

Mark Storer

2010-09-02

Ouch. If you cannot copy and paste the text out of Adobe Reader successfully, that's a strong sign that extracting it programmatically is Very Hard or impossible.
 
In your case, it is probably impossible. The font used is a subset, and Reader's failure to translate the glyph indexes into characters leads me to believe that the subset doesn't contain any character-mapping information, such as a /ToUnicode CMap (quite legal, just a royal pain).
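To illustrate why missing mapping information is fatal, here is a minimal, hypothetical sketch. It is not iText's API. It assumes a two-byte encoding such as Identity-H, where the bytes between () in the content stream are glyph codes rather than ASCII, so text can be recovered only if a ToUnicode-style table maps those codes to characters. The class name, the sample glyph codes and the table contents are all made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only (not the iText API): decoding two-byte glyph
// codes from a PDF string operand through a ToUnicode-style lookup table.
public class GlyphDecodeSketch {

    // Reads big-endian 2-byte codes; any code with no mapping becomes U+FFFD.
    static String decode(byte[] operand, Map<Integer, String> toUnicode) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i + 1 < operand.length; i += 2) {
            int code = ((operand[i] & 0xFF) << 8) | (operand[i + 1] & 0xFF);
            String mapped = (toUnicode == null) ? null : toUnicode.get(code);
            out.append(mapped != null ? mapped : "\uFFFD");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up glyph IDs, as they might appear between () for a subset font.
        byte[] operand = {0x00, 0x2B, 0x00, 0x48, 0x00, 0x4F};

        // With a (hypothetical) mapping table, the characters come back.
        Map<Integer, String> cmap = new HashMap<>();
        cmap.put(0x2B, "H");
        cmap.put(0x48, "e");
        cmap.put(0x4F, "l");
        System.out.println(decode(operand, cmap)); // prints "Hel"

        // Without one, the glyph codes alone say nothing about the characters.
        System.out.println(decode(operand, null));
    }
}
```

The glyph numbering in a subset font is arbitrary. Without the table, nothing in the bytes themselves tells a parser (or Reader) which character a code stands for, and that is why OCR becomes the fallback.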
 
Your only real recourse in cases like this is OCR (optical character recognition). Fortunately, such cases aren't all that common. It's entirely possible, however, that you're working with nothing but this type of PDF, so that may be small consolation.
 
I wish you luck.
 
--Mark Storer
  Senior Software Engineer
  Cardiff.com
 
import legalese.Disclaimer;
Disclaimer<Cardiff> DisCard = null;
 
 


From: Paul Durrant [mailto:Paul.Durrant@clarksons.com]
Sent: Thursday, September 02, 2010 9:48 AM
To: 'itext-questions@lists.sourceforge.net'
Subject: [iText-questions] Itextsharp extact text

I'm trying to use iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(reader, 1);

on the attached PDF, but I don't get the text back. If I take the byte array and look at the contents, the text blocks are not in ASCII form, although all the coordinate structure is correct; e.g., anything between the () is not in ASCII form. How is it possible to get the text from this PDF?

Thanks,
Paul

_______________________________________________
iText-questions mailing list
iText-questions@(protected)
https://lists.sourceforge.net/lists/listinfo/itext-questions
