Wednesday, June 29, 2005

Not just a memory

AppleTrans' corpus offers a little more than you would expect from a translation memory. You can store graphics and movies along with text in a number of languages in a single record.

You can make your own recipe book with a corpus, for example. Type in the ingredients you have in the fridge, and see (by fuzzy match) what dishes you can make with them. It would be nice to have pictures of those dishes too, don't you think?
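
To give a feel for what a fuzzy match does here, below is a minimal sketch in Python. It is not AppleTrans' matching algorithm, which is not documented in this post. The recipe records, the `best_dishes` function, and the use of `difflib` similarity scores are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical corpus records: ingredients text -> dish name.
RECIPES = {
    "tomato, basil, mozzarella": "Caprese salad",
    "egg, bacon, spaghetti, cheese": "Carbonara",
    "rice, egg, soy sauce": "Fried rice",
}

def best_dishes(query, recipes=RECIPES):
    """Rank dishes by how closely their ingredient text resembles the query."""
    scored = [
        (SequenceMatcher(None, query, ingredients).ratio(), dish)
        for ingredients, dish in recipes.items()
    ]
    # Highest similarity first.
    return [dish for score, dish in sorted(scored, reverse=True)]

ranking = best_dishes("tomato, mozzarella")
print(ranking[0])
```

The query does not have to match a record exactly. The record that shares the most text with it comes out on top.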

You can also extend the corpus features by installing a piece of software called a "corpus accessory" plugin. Unfortunately, there is no reference available yet that shows how to create one yourself.

There is an Easter egg in the corpus. If you would like to view multiple accessories at a time (which I really wanted to do), try clicking the accessory icons while holding down the Option key. You can stack up the accessory views in the corpus window.

Saturday, June 25, 2005

Segment different

AppleTrans has a good tool that helps you segment document content in many different ways. Some segmentation rules are built into the application, such as rules for dividing the content by sentence or by paragraph.

There are also special rules for translating program resources extracted from a Mac OS X application bundle. When you look at the rule named "AppleGlot Localization", which is actually made for WG files generated by AppleGlot, you can see how easy it would be to customize a rule yourself.

These rules are written as standard regular expressions. It does not feel like scripting a filtering tool. I have created a rule for my two-column, tab-delimited glossary file, in order to segment only the text in the second column for translation.

Prefix: .*\t
Segment: [^\n]+
Suffix: (blank)
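
If you want to try the three patterns outside the application, here is a rough sketch in Python. How AppleTrans actually combines Prefix, Segment, and Suffix is an assumption on my part: the sketch simply matches the prefix, captures the segment, and then matches the suffix. The function name `find_segments` and the sample glossary are made up for the example.

```python
import re

def find_segments(text, prefix, segment, suffix=""):
    """Return the text captured by `segment` wherever prefix + segment + suffix match.

    Assumption: the three rule fields are joined into one pattern,
    with only the middle part treated as the translatable segment.
    """
    pattern = re.compile(f"(?:{prefix})({segment})(?:{suffix})")
    return [match.group(1) for match in pattern.finditer(text)]

# A two-column, tab-delimited glossary (sample data).
glossary = "File\tFichier\nEdit\tÉdition\n"

# Prefix eats everything up to and including the tab,
# Segment takes the rest of the line, Suffix is left blank.
segments = find_segments(glossary, r".*\t", r"[^\n]+")
print(segments)
```

Only the second column is picked up, and the first column is left alone.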


It is really simple! With the Next or Previous button, you can test how the selection moves through the text. Once you have finished the rule, you can segment everything at once. AppleTrans lets you undo any segmentation you have done in the editor, so you do not need to worry about making a mistake.

Wednesday, June 22, 2005

No worry about text encoding?

How many of you have had trouble with the text encoding of your documents? It may not be an issue for those who can live with a single application. But for localization engineers like me, it can sometimes be a serious issue.

When you open up an application bundle, you see a variety of files that are encoded in different ways: UTF-8, UTF-16, and traditional Mac encodings. Even so-called Unicode files may vary in endianness (byte order) and in the presence of a BOM.

The good news is that you do not need to worry about that with AppleTrans, well, most of the time. The application quickly scans the file before opening it, and determines the encoding, byte order, and line endings.
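
For the curious, a scan like that can be sketched in a few lines of Python. This is only my guess at the general idea, not the way AppleTrans does it. The function names, the 64-byte sample size, and the fallback to Mac Roman are all assumptions.

```python
import codecs

# BOM signatures for the encodings mentioned above (UTF-32 is ignored here).
BOMS = [
    (codecs.BOM_UTF8, "utf-8", None),
    (codecs.BOM_UTF16_BE, "utf-16", "big"),
    (codecs.BOM_UTF16_LE, "utf-16", "little"),
]

def sniff_encoding(data):
    """Return (encoding, byte_order, has_bom) for raw file bytes."""
    for bom, encoding, order in BOMS:
        if data.startswith(bom):
            return encoding, order, True
    # No BOM: mostly-Roman UTF-16 text has a zero byte in every other position.
    sample = data[:64]
    even_nuls = sample[0::2].count(0)
    odd_nuls = sample[1::2].count(0)
    if even_nuls > odd_nuls:
        return "utf-16", "big", False
    if odd_nuls > even_nuls:
        return "utf-16", "little", False
    try:
        data.decode("utf-8")
        return "utf-8", None, False
    except UnicodeDecodeError:
        # Assumed fallback: a traditional Mac encoding.
        return "mac_roman", None, False

def sniff_line_ending(text):
    """Return 'CRLF', 'CR', 'LF', or None if the text has no line break."""
    if "\r\n" in text:
        return "CRLF"
    if "\r" in text:
        return "CR"
    if "\n" in text:
        return "LF"
    return None

print(sniff_encoding("hi\r\n".encode("utf-16-le")))
```

A guess made without a BOM is only a heuristic, which is why a detector like this cannot be right all of the time.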

The most difficult (and sometimes annoying) case, however, is when the file appears to be ASCII text but is actually UTF-8. In that case, you have to tell the application which encoding it is.

It is no wonder that many engineers love to use UTF-8. They believe they can handle the vast range of Unicode characters just like traditional ASCII text. Such engineers always get into serious trouble when people start to feed in non-Roman text in UTF-8 format.
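
One way this can happen, assuming a detector that looks only at the beginning of a file, is sketched below in Python. I do not know whether AppleTrans works this way. The `looks_ascii` function and the 32-byte sample size are hypothetical.

```python
def looks_ascii(data, sample_size=32):
    """Check only the first `sample_size` bytes, as a quick scan might."""
    return data[:sample_size].isascii()

# Sample file: ASCII-only lines first, Japanese text near the end.
data = ("key = value\n" * 4 + "title = 日本語\n").encode("utf-8")

print(looks_ascii(data))   # judged from the first 32 bytes only
print(data.isascii())      # judged from the whole file
```

The head of the file is pure ASCII, so a quick look cannot tell UTF-8 from ASCII or from any other ASCII-compatible encoding. Only the later bytes reveal the difference.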
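
Here is a small Python illustration of that kind of trouble. The two truncation functions are hypothetical examples of mine. The first treats UTF-8 bytes as if each byte were one character, and the second takes multi-byte characters into account.

```python
def truncate_naive(text, max_bytes):
    """Cut the UTF-8 bytes at max_bytes, as if one byte were one character."""
    return text.encode("utf-8")[:max_bytes].decode("utf-8")

def truncate_safe(text, max_bytes):
    """Cut at max_bytes, then drop any incomplete character left at the end."""
    return text.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")

word = "日本語"
print(len(word), len(word.encode("utf-8")))  # characters vs. bytes

print(truncate_safe(word, 4))
try:
    truncate_naive(word, 4)
except UnicodeDecodeError as error:
    print("naive truncation failed:", error.reason)
```

With plain ASCII text both functions behave the same, which is exactly why the problem goes unnoticed until non-Roman text arrives.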

Monday, June 20, 2005

Wanna talk?

If you want to get in touch with the blogger, leave a message here.

Thursday, June 16, 2005

Opening message

Opening a new blog dedicated to AppleTrans, a translation memory application that runs on Mac OS X.