Thursday, August 25, 2005

Making a filter - Part 3

Our goal in this series of posts is to make a filter that extracts text from a PDF file. This is going to be far simpler than you might have thought: we use the PDFDocument class from PDF Kit, which is part of the Quartz framework, to read in the PDF content.

PDFDocument is a class newly introduced in Tiger (Mac OS X 10.4). It wraps the large amount of code that would otherwise be needed to parse and render a PDF file. For our PDF filter, we only need the following two methods of the class:

- (id)initWithData:(NSData *)data
- (NSString *)string

For detailed information about the PDFDocument class, see the ADC Reference Library. Using the two methods above, our unarchiveFilter:context: method will look something like this:

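- (NSMutableData *)unarchiveFilter:(NSMutableData *)data context:(NSDictionary **)context
{
    // initWithData: returns nil if the data is not a readable PDF; the messages below then return nil too.
    PDFDocument *document = [[[PDFDocument alloc] initWithData:data] autorelease];
    // -string gives the text of the whole document; hand it back as Unicode-encoded data.
    return [[[[document string] dataUsingEncoding:NSUnicodeStringEncoding] mutableCopy] autorelease];
}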
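
To try the extraction step outside of AppleTrans, the same two PDFDocument calls can be wrapped in a small command-line tool. This is only a minimal sketch under a few assumptions: the helper name ExtractPDFText and the file name extract.m are made up for illustration, and the tool is not part of AppleTrans or its filter API. It uses manual retain/release, like the rest of this post.

```objc
// extract.m - hypothetical test harness, not part of AppleTrans.
#import <Foundation/Foundation.h>
#import <Quartz/Quartz.h>

// Hypothetical helper: returns the text of a PDF as Unicode-encoded data,
// or nil if the data is missing or cannot be read as a PDF.
static NSMutableData *ExtractPDFText(NSData *data)
{
    if (data == nil) return nil;

    PDFDocument *document = [[[PDFDocument alloc] initWithData:data] autorelease];
    if (document == nil) return nil;

    NSString *text = [document string];
    if (text == nil) return nil;

    return [[[text dataUsingEncoding:NSUnicodeStringEncoding] mutableCopy] autorelease];
}

int main(int argc, const char *argv[])
{
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    int status = 1;

    if (argc == 2) {
        NSString *path = [NSString stringWithUTF8String:argv[1]];
        NSData *pdfData = [NSData dataWithContentsOfFile:path];
        NSMutableData *textData = ExtractPDFText(pdfData);

        if (textData != nil) {
            // Decode the data again just to print it to the terminal as UTF-8.
            NSString *text = [[[NSString alloc] initWithData:textData
                                                    encoding:NSUnicodeStringEncoding] autorelease];
            printf("%s\n", [text UTF8String]);
            status = 0;
        } else {
            fprintf(stderr, "Could not extract text from %s\n", argv[1]);
        }
    } else {
        fprintf(stderr, "usage: extract file.pdf\n");
    }

    [pool release];
    return status;
}
```

Assuming the file is saved as extract.m, it should build with something like "gcc extract.m -framework Foundation -framework Quartz -o extract". Running it on a PDF prints the same text that the filter would hand over.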


On the other hand, we will not do anything in the archiveFilter:context: method, because it does not make sense to write the content back to PDF without layout information. So this method simply returns the data unchanged:

- (NSMutableData *)archiveFilter:(NSMutableData *)data context:(NSDictionary *)context
{
    // Pass the data through untouched; we never write back to PDF.
    return data;
}


The other two methods are defined in the same way. If you want a challenge, you can rework them to apply some fancier formatting to the content.

Part 4 will be the last post in this series. We will see how this simple filter works with AppleTrans.

4 Comments:

Blogger hiruneko said...

Hi Jamie,

No, filters have to be built in Xcode, so you need some programming knowledge. I will post an Xcode project template if anyone really wants to take on the challenge of making one.

As I mentioned in a post titled "Compatibility with Word documents", it is not an easy task to make a filter that fully supports Word documents. At the moment, saving Word files in RTF format is the best approach.

As for the PDF filter, you can find a copy in the AppleTrans SIG's Files section, although it is only meant to extract text from a PDF file.

Hope it helps.

11:51 AM  
Anonymous Anonymous said...

Thank you for your answer.

The link in your last post (http://groups.yahoo.com/group/appletrans_sig/files/) gives me a page that says I am not a member of the appletrans_sig group. When I search Yahoo Groups for "appletrans_sig" in order to join it, it tells me there is no such group. I don't know what I'm doing wrong.

Thanks.

11:27 PM  
Blogger hiruneko said...

Ah, I guess the SIG is not listed in the Yahoo Groups directory. Try opening the top page http://groups.yahoo.com/group/appletrans_sig and look for the "Join" button there.

1:58 AM  
Anonymous Anonymous said...

Thanks, Hiruneko, that worked.

10:38 PM  
