Tuesday, August 30, 2005

PDF filter in action

We have seen an overview of the AppleTrans filter API, and you should have a good picture of it by now. If you are used to building loadable bundles in Xcode, the information given here should be enough to write a filter yourself.

To wrap up this session, I would like to show you the PDF filter in action. In the following movie, you will see how to install the plugin, post-edit the PDF content, and do some segmentation. Click the picture to play the movie in a separate view.

click to play the movie

Thursday, August 25, 2005

Making a filter - Part 3

Our goal in this series of posts is to make a filter which extracts text from a PDF file. This is going to be far simpler than you might have thought. In fact, we are using the PDFDocument class in the Quartz framework to read in the PDF content.

PDFDocument is a class newly introduced in Tiger. It wraps the large amount of code that would otherwise be needed to parse and render a PDF file. For our PDF filter, we only need the following two methods of the class:

- (id)initWithData:(NSData *)data
- (NSString *)string

For detailed information about the PDFDocument class, see the ADC Reference Library. Now, using the above methods, our unarchiveFilter:context: method will look something like this:

- (NSMutableData *)unarchiveFilter:(NSMutableData *)data context:(NSDictionary **)context
{
    // Parse the PDF bytes, then hand the extracted text back as Unicode data.
    PDFDocument *document = [[[PDFDocument alloc] initWithData:data] autorelease];
    return [[[[document string] dataUsingEncoding:NSUnicodeStringEncoding] mutableCopy] autorelease];
}

On the other hand, we will not do anything in the archiveFilter:context: method, because it does not make sense to write the content back to PDF without layout information. So this method should look like:

- (NSMutableData *)archiveFilter:(NSMutableData *)data context:(NSDictionary *)context
{
    return data;
}

The other two methods, importFilter:context: and exportFilter:context:, will be defined in the same way and simply return their input. If you want a challenge, you can implement them to apply some formatting to the extracted content.
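For reference, here is a minimal sketch of what those two pass-through methods could look like. The class name PassThroughTextFilter is made up for this example, and the class is kept separate from the PDF code so that it only needs Foundation.

```objc
#import <Foundation/Foundation.h>

// Hypothetical class name, used only for this illustration.
@interface PassThroughTextFilter : NSObject
- (NSMutableAttributedString *)importFilter:(NSMutableAttributedString *)content context:(NSDictionary **)context;
- (NSMutableAttributedString *)exportFilter:(NSMutableAttributedString *)content context:(NSDictionary *)context;
@end

@implementation PassThroughTextFilter

// Called after unarchiveFilter:context:. We do not reformat the text.
- (NSMutableAttributedString *)importFilter:(NSMutableAttributedString *)content context:(NSDictionary **)context
{
    return content;
}

// Called before archiveFilter:context:. Again, the text is returned as is.
- (NSMutableAttributedString *)exportFilter:(NSMutableAttributedString *)content context:(NSDictionary *)context
{
    return content;
}

@end
```

In a real filter, these two methods would sit in the same class as unarchiveFilter:context: and archiveFilter:context:.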

Part 4 is going to be the last of this series. We will see how this simple filter works with AppleTrans.

Tuesday, August 23, 2005

Segment ever faster

Have you ever wondered why AppleTrans segmentation cannot go any faster? Here is a tip. When you segment the content with the segment tool, hold down the Option key while clicking Segment All. That should make a difference. The drawback of the speed gain is that you cannot undo this segmentation.

Segmentation in AppleTrans is a somewhat slow process, because the segment tool is based on pattern matching with regular expressions. This approach, on the other hand, offers more flexibility in parsing the content. The predefined rules are good samples to start from when you make your own custom rules.
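To give a feel for what regular-expression-based segmentation means, here is a small sketch. It is not how AppleTrans is implemented and the pattern is not one of its predefined rules. The function name SegmentsFromText and the sentence pattern are made up for illustration, and NSRegularExpression only became available in later versions of Mac OS X.

```objc
#import <Foundation/Foundation.h>

// Splits text into sentence-like segments with one regular expression.
// The pattern is an illustrative assumption, not an AppleTrans rule:
// a run of non-terminator characters followed by any ".", "!" or "?".
static NSArray *SegmentsFromText(NSString *text)
{
    NSRegularExpression *rule =
        [NSRegularExpression regularExpressionWithPattern:@"[^.!?]+[.!?]*"
                                                  options:0
                                                    error:NULL];
    NSMutableArray *segments = [NSMutableArray array];
    NSArray *matches = [rule matchesInString:text
                                     options:0
                                       range:NSMakeRange(0, [text length])];
    for (NSTextCheckingResult *match in matches) {
        // Trim the whitespace that separated this sentence from the previous one.
        NSString *piece = [[text substringWithRange:[match range]]
            stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
        if ([piece length] > 0) {
            [segments addObject:piece];
        }
    }
    return segments;
}
```

Every segment costs a pattern match over the text, which is one plausible reason why this style of segmentation is slower than a fixed splitter, and also why changing a rule is just a matter of changing a pattern.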

Saturday, August 20, 2005

Making a filter - Part 2

AppleTrans plugins are all written in Objective-C. For a filter plugin, you simply implement the following four methods as defined in the protocol:

@protocol FilterPlugin
- (NSMutableData *)unarchiveFilter:(NSMutableData *)data context:(NSDictionary **)context;
- (NSMutableData *)archiveFilter:(NSMutableData *)data context:(NSDictionary *)context;
- (NSMutableAttributedString *)importFilter:(NSMutableAttributedString *)content context:(NSDictionary **)context;
- (NSMutableAttributedString *)exportFilter:(NSMutableAttributedString *)content context:(NSDictionary *)context;
@end

The first two methods are used to perform binary-text conversion. An unarchiveFilter:context: message is sent right after reading a file, and an archiveFilter:context: message is sent right before writing a file. If you do not need to process the bytes passed in data, just return them unmodified.

The text encoding conversion is done by AppleTrans. If you want to ensure a safe conversion, use Unicode (NSUnicodeStringEncoding) when you convert binary data to text format.

The next two methods are used to reformat the text content. An importFilter:context: message is sent after unarchiveFilter:context:, and an exportFilter:context: message is sent before archiveFilter:context:. If you do not process the data passed in content, just return it unmodified.

Both unarchiveFilter:context: and importFilter:context: pass a pointer to an NSDictionary object in context, where you can store persistent data that lives with the AppleTrans document. At the time you get these messages, context might point to nil. You may then need to create an NSMutableDictionary and set *context to point to the new object.

Well, that's pretty much it. In Part 3, we will design a simple filter which converts PDF to plain text.
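Here is a minimal sketch of that context handling inside importFilter:context:. The class name ContextDemoFilter and the key "OriginalLength" are made up for this example. How AppleTrans retains and stores the dictionary afterwards is not covered here, so treat that part as an assumption.

```objc
#import <Foundation/Foundation.h>

// Hypothetical class name, used only for this illustration.
@interface ContextDemoFilter : NSObject
- (NSMutableAttributedString *)importFilter:(NSMutableAttributedString *)content context:(NSDictionary **)context;
@end

@implementation ContextDemoFilter

- (NSMutableAttributedString *)importFilter:(NSMutableAttributedString *)content context:(NSDictionary **)context
{
    if (context != NULL) {
        // Keep whatever is stored already, or start from an empty
        // dictionary when *context is still nil.
        NSMutableDictionary *info = (*context != nil)
            ? [NSMutableDictionary dictionaryWithDictionary:*context]
            : [NSMutableDictionary dictionary];

        // "OriginalLength" is a made-up key for this example.
        [info setObject:[NSNumber numberWithUnsignedInteger:[content length]]
                 forKey:@"OriginalLength"];

        // Hand the dictionary back to the caller through the pointer.
        *context = info;
    }
    return content;
}

@end
```

The export side receives a plain NSDictionary pointer, so it could read the stored value back with objectForKey: but cannot replace the dictionary.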

Monday, August 15, 2005

Making a filter - Part 1

If you have to translate files in a complicated format, it is sometimes easier to write a filter plugin than to process the files through a series of complicated steps.

Basically, you make a filter to simplify the data structure so that the parts to be translated can be easily parsed by AppleTrans. If you like, you can let it do more complicated tasks.

Version 1.1 of AppleTrans has two filters pre-installed: one for the plist files that I wrote about in an earlier post, and the other for (AppleGlot's) WG files.

The WG filter actually does not alter the file format, but it embeds the hyperlinks that connect each translation entry to a certain item in the original nib file.

The filter program is built as a loadable bundle. Its file name is the target file type (i.e. the extension of the target files) followed by the extension .filter.

The filters should be copied into the PlugIns directory of the application bundle. AppleTrans will load all the filters found there and create an instance of each one.
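To make the loading step a little more concrete, here is a rough sketch of how a host application could discover such bundles with NSBundle. This is only my illustration of the idea, not the actual AppleTrans code, and the function name LoadFiltersInDirectory is made up. It also uses a few Foundation calls that are newer than Tiger.

```objc
#import <Foundation/Foundation.h>

// Scans a directory for "<type>.filter" bundles and returns a dictionary
// that maps each file type to an instance of the bundle's principal class.
// Written for ARC; under manual reference counting, release each instance
// after storing it in the dictionary.
static NSDictionary *LoadFiltersInDirectory(NSString *directory)
{
    NSMutableDictionary *filters = [NSMutableDictionary dictionary];
    NSArray *names = [[NSFileManager defaultManager] contentsOfDirectoryAtPath:directory
                                                                         error:NULL];
    for (NSString *name in names) {
        if (![[name pathExtension] isEqualToString:@"filter"]) {
            continue; // not a filter bundle
        }
        NSString *path = [directory stringByAppendingPathComponent:name];
        NSBundle *bundle = [NSBundle bundleWithPath:path];
        Class filterClass = [bundle principalClass]; // loads the code if needed
        if (filterClass == Nil) {
            continue; // bundle could not be loaded
        }
        id filter = [[filterClass alloc] init];
        if (filter != nil) {
            // The bundle name without ".filter" is the target file type.
            [filters setObject:filter forKey:[name stringByDeletingPathExtension]];
        }
    }
    return filters;
}
```

A host would typically call this with [[NSBundle mainBundle] builtInPlugInsPath], which is the PlugIns directory inside the application bundle.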

Okay, let's stop here today. In Part 2, we will see the AppleTrans filter API.

Saturday, August 06, 2005

How to edit binary plist?

Have you ever tried to edit application preference (.plist) files? Those files are usually found in the Library/Preferences folder in your home directory. If you have a little knowledge of how an XML plist is constructed (and know which options you want to change), it might be easier to edit the file directly than to type defaults commands in Terminal.

Some of you may have already noticed that you cannot open those plist files in your favorite text editor any more since you upgraded to Tiger. That is because Tiger, for the sake of disk space and performance, converts them to binary format.

But don't worry about it. AppleTrans has a built-in filter for plist files. When you import a plist file, the filter will convert it to XML format, which you should be familiar with. And when you save the file as "Original Format", it will convert the file back to binary.
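If you are curious what such a conversion involves, here is a small sketch using NSPropertyListSerialization. It is not the code of the built-in filter, and the function name XMLDataFromPlistData is made up. The two class methods used here were added to Foundation after Tiger. On Tiger itself you would use the older propertyListFromData:mutabilityOption:format:errorDescription: and dataFromPropertyList:format:errorDescription: pair.

```objc
#import <Foundation/Foundation.h>

// Converts property list data in any supported format (binary included)
// to XML plist data. Returns nil if the input cannot be parsed.
static NSData *XMLDataFromPlistData(NSData *plistData)
{
    NSError *error = nil;

    // Step 1: parse the bytes into Foundation objects. The format is
    // detected automatically, so we pass NULL for the format pointer.
    id plist = [NSPropertyListSerialization propertyListWithData:plistData
                                                         options:NSPropertyListImmutable
                                                          format:NULL
                                                           error:&error];
    if (plist == nil) {
        return nil;
    }

    // Step 2: write the same objects out again as XML.
    return [NSPropertyListSerialization dataWithPropertyList:plist
                                                      format:NSPropertyListXMLFormat_v1_0
                                                     options:0
                                                       error:&error];
}
```

Going back to binary is the same two steps with NSPropertyListBinaryFormat_v1_0 as the output format.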

In this blog, we will explore the basics of creating a custom filter for AppleTrans sometime soon.