Thursday, August 25, 2005

Making a filter - Part 3

Our goal in this series of posts is to make a filter that extracts text from a PDF file. This is going to be far simpler than you might have thought, because we use the PDFDocument class in the Quartz framework to read the PDF content.

PDFDocument is a class newly introduced in Tiger. It wraps up the large amount of code that would otherwise be needed to read and render a PDF file. For our PDF filter, we need only the following two methods of the class:

- (id)initWithData:(NSData *)data
- (NSString *)string
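
To get a feel for these two methods outside of a filter, here is a minimal command-line sketch. It is not part of the filter itself: the main function, the command-line argument handling, and the error messages are assumptions made for illustration, and it assumes manual reference counting (no ARC) and linking against the Foundation and Quartz frameworks.

```objc
#import <stdio.h>
#import <Foundation/Foundation.h>
#import <Quartz/Quartz.h>   // the Quartz umbrella framework includes PDFKit, where PDFDocument lives

// Illustrative test harness: reads the PDF file named on the command line
// and prints whatever text PDFDocument can extract from it.
int main(int argc, const char *argv[])
{
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    int status = 0;

    if (argc < 2) {
        fprintf(stderr, "usage: %s file.pdf\n", argv[0]);
        status = 1;
    } else {
        NSString *path = [NSString stringWithUTF8String:argv[1]];
        NSData *data = [NSData dataWithContentsOfFile:path];

        // -initWithData: returns nil when the data is not a readable PDF.
        PDFDocument *document = nil;
        if (data != nil) {
            document = [[[PDFDocument alloc] initWithData:data] autorelease];
        }

        if (document == nil) {
            fprintf(stderr, "could not read a PDF from %s\n", argv[1]);
            status = 1;
        } else {
            // -string returns the text of the whole document, without layout.
            NSString *text = [document string];
            if (text != nil) {
                printf("%s\n", [text UTF8String]);
            }
        }
    }

    [pool release];
    return status;
}
```

Running this against a few PDF files is a quick way to see what kind of text the filter will hand over before wiring it into anything else.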

For detailed information about the PDFDocument class, see the ADC Reference Library. Using the two methods above, our unarchiveFilter:context: method will look something like this:

- (NSMutableData *)unarchiveFilter:(NSMutableData *)data context:(NSDictionary **)context
{
    // Let PDFDocument parse the incoming PDF data.
    PDFDocument *document = [[[PDFDocument alloc] initWithData:data] autorelease];
    // Hand back the extracted plain text as Unicode-encoded data.
    return [[[[document string] dataUsingEncoding:NSUnicodeStringEncoding] mutableCopy] autorelease];
}

On the other hand, we will not do anything in the archiveFilter:context: method, because it makes no sense to write the content back to PDF without layout information. So this method should look like:

- (NSMutableData *)archiveFilter:(NSMutableData *)data context:(NSDictionary *)context
{
    return data;
}

The other two methods are defined in the same way. If you want a challenge, you can try reworking them to give the content some fancier formatting.

Part 4 will be the last in this series. We will see how this simple filter works with AppleTrans.


Blogger Jamie said...


It is wonderful that all these filters exist, but I'm a translator, not a programmer. I need filters for Word and especially PDF, but I can't make sense of what you're telling us to do with this code. Do I just dump a certain portion of it into a text file, suffix it ".filter" and load it? Couldn't someone just post these filters somewhere as downloadable files for translators who are not programmers?

Thanks for any help or explanation you can give me.

8:53 AM  
Blogger hiruneko said...

Hi Jamie,

No, filters have to be built with Xcode, so you need some programming knowledge. I will post an Xcode project template if anyone would really like to take on the challenge of making one.

As I mentioned in a post titled "Compatibility with Word documents", it is not an easy task to make a filter that fully supports Word documents. At the moment, saving Word files in RTF format is the best approach.

As for the PDF filter, you can find a copy in the Files section of the AppleTrans SIG, although it is only meant to extract text from a PDF file.

Hope it helps.

11:51 AM  
Anonymous Anonymous said...

Thank you for your answer.

The link in your last post gives me a page that says I am not a member of the appletrans_sig group. When I search Yahoo Groups for "appletrans_sig" in order to join it, it tells me there is no such group. I don't know what I'm doing wrong.


11:27 PM  
Blogger hiruneko said...

Ah, I guess the SIG is not listed in Yahoo Groups. Try opening the top page and look for the "Join" button there.

1:58 AM  
Anonymous Anonymous said...

Thanks, Hiruneko, that worked.

10:38 PM  
