Wednesday, June 22, 2005

No need to worry about text encoding?

How many of you have had trouble with the text encoding of your documents? It may not be an issue for those who can live with a single application, but for localization engineers like me, it can sometimes be a serious one.

When you open up an application bundle, you see a variety of files encoded in different ways: UTF-8, UTF-16, and traditional Mac encodings. Even so-called Unicode files can differ in endianness (byte order) and in whether a BOM (byte order mark) is present.

The good news is that with AppleTrans you do not need to worry about that, well, most of the time. The application quickly scans the file before opening it and determines the encoding, byte order, and line endings.
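To give a rough idea of what such a scan involves, here is a minimal sketch in Python. It is not AppleTrans's actual logic, which I am not describing here; the function names sniff_bom and sniff_line_ending are made up for illustration, and the sketch only recognizes files that start with a BOM.

```python
import codecs

# BOMs for the encodings mentioned above. UTF-32 is deliberately left out;
# supporting it would mean checking its BOM before the UTF-16 LE one.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data):
    """Return (encoding, bom_length), or (None, 0) when no BOM is present."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0

def sniff_line_ending(text):
    """Classify decoded text as 'CRLF', 'CR', 'LF', or 'none'."""
    if "\r\n" in text:
        return "CRLF"
    if "\r" in text:
        return "CR"  # classic Mac OS line ending
    if "\n" in text:
        return "LF"
    return "none"

# Usage: a big-endian UTF-16 sample with a BOM and a Windows line ending.
sample = codecs.BOM_UTF16_BE + "Hello\r\n".encode("utf-16-be")
encoding, skip = sniff_bom(sample)
text = sample[skip:].decode(encoding or "utf-8")
print(encoding, sniff_line_ending(text))
```

A file without a BOM gets (None, 0) here, so a real scanner would need further heuristics for that case.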

The most difficult (and sometimes annoying) case, however, is a file that appears to be ASCII text but is actually UTF-8. In that case, you have to tell the application which encoding it is.
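Here is a small Python sketch of why this case is hard. It assumes a detector that samples only the beginning of a file, which is my simplification and not a description of how AppleTrans works; looks_ascii and the sample data are hypothetical.

```python
def looks_ascii(data, sample_size=512):
    """Return True if the first sample_size bytes are all 7-bit ASCII."""
    return all(b < 0x80 for b in data[:sample_size])

# Hypothetical file: well over 512 bytes of plain ASCII, then one accented word.
data = ("key = value\n" * 100 + "name = café\n").encode("utf-8")

# The sampled head is pure ASCII, so nothing in it reveals the real encoding.
print(looks_ascii(data))

# Guessing a traditional Mac encoding garbles the accented character...
print(data.decode("mac_roman")[-12:])

# ...while decoding with the right encoding, as told by the user, works.
print(data.decode("utf-8")[-12:])
```

The bytes are identical in both cases; only the encoding chosen to read them differs, which is why the hint has to come from you.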

No wonder many engineers love UTF-8: they believe they can handle the vast range of Unicode characters just like traditional ASCII text. Such engineers always get into serious trouble when people start feeding them non-Roman text in UTF-8 format.
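To illustrate the kind of trouble I mean, here is a short Python sketch. The helper truncate_utf8 is a made-up name, and it assumes the input is valid UTF-8 to begin with.

```python
text = "日本語"                 # three characters
encoded = text.encode("utf-8")  # nine bytes: each character takes three

# ASCII habits assume one byte per character. Cutting at byte 4 splits
# the second character and leaves bytes that are no longer valid UTF-8.
try:
    encoded[:4].decode("utf-8")
except UnicodeDecodeError as err:
    print("broken:", err.reason)

def truncate_utf8(data, max_bytes):
    """Cut valid UTF-8 data to at most max_bytes without splitting a character."""
    cut = data[:max_bytes]
    # Dropping the undecodable tail removes the partial character, if any.
    return cut.decode("utf-8", errors="ignore").encode("utf-8")

print(len(text), len(encoded))
print(truncate_utf8(encoded, 4).decode("utf-8"))
```

With pure ASCII input, character count and byte count are the same, so code like this passes every test until the first non-Roman string arrives.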

3 Comments:

Anonymous said...

AppleTrans doesn't seem to have any problems reading various UTF files, but I sure wish it would write a BOM on the TMX exports it does. BBEdit, and probably other text editors, can't tell it's Unicode, so it comes across as gibberish if you double-click it to open it. You can go to BBEdit's File > Open, manually tell it the file is UTF-8, No BOM, and then set it to UTF-8 (with BOM), but it sure would be nice if it just wrote a Unicode BOM like a nice, well-behaved app. :)

12:20 AM  
Blogger hiruneko said...

There is a hidden option that tells AppleTrans whether to write or remove the BOM in UTF-8 files. Quit the application, and in the Terminal, type the following command:

% defaults write com.apple.AppleTrans PrefsRemoveUTF8BOMForNonUnicodeSavvyReader -bool NO

The preference name tells the story...

3:06 AM  
Anonymous said...

Thanks, that works great. I'm adding it to my notes...

1:44 PM  
