Saturday, July 30, 2005

Leave the segments alone

When you finish translating a document, do you clean up all the segments before saving it for delivery? Remember that cleanup is needed only when you save the document as RTF. In other words, you can leave the segments in place when you save it as plain text.

Most people keep the native AppleTrans documents, but mainly as a backup or a snapshot of their work. There are several more reasons to keep the segmented documents:

• Make a fresh copy of the corpus from the documents

• Use the documents as reference corpora in a project

• Make corrections to the translation and refresh the corpus

Keep in mind that segmented documents can be converted to a corpus, which means you do not necessarily have to spend extra time keeping your working corpus up to date.

Sunday, July 24, 2005

Creating Apple glossary

If you are about to translate material related to a Macintosh product, you may need the Apple glossaries so that your terminology stays in tune with Apple's. You can find almost all of the UI strings from Mac OS X Tiger, for several languages, here.

When you open the downloaded disk image for your target language, you will find hundreds of so-called AD files. These files are generated by AppleGlot and are basically intended for use with that tool.

Now, how can you make them accessible in your favorite TM program? The easiest way is to use AppleTrans to compile them into a corpus, which you can export to TMX later if you like. Here are the steps:

1. Create a new project

2. Add all AD files to the reference view
Just drag the volume icon of the disk image.

3. Choose Join Corpora from the Project/Utilities menu
When it finishes, a new untitled corpus opens on the screen.

4. Save the corpus, or export it directly to TMX
If you save it as a corpus, it is best to set the mismatching attributes value in the corpus options to zero.

There are a couple of things to bear in mind. First, you cannot stop the conversion once it has started. Second, you need to keep an eye on the progress, because the application asks you to specify the file encoding from time to time (the AD files are in UTF-8 but lack a BOM).
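
If the encoding prompts get tedious, one possible workaround is to copy the AD files from the read-only disk image into a writable folder and prepend a UTF-8 BOM to each one before adding them to the project. This is untested with AppleTrans: it assumes the application stops asking once a file starts with a BOM. The `*.ad` file-name pattern and both function names below are hypothetical, chosen for illustration.

```python
import codecs
from pathlib import Path

UTF8_BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def add_utf8_bom(path: Path) -> bool:
    """Prepend a UTF-8 BOM to `path` unless it already starts with one.

    Returns True if the file was rewritten, False if it was left alone.
    """
    data = path.read_bytes()
    if data.startswith(UTF8_BOM):
        return False
    # Only label the file as UTF-8 if it really decodes as UTF-8;
    # this raises UnicodeDecodeError for files in some other encoding.
    data.decode("utf-8")
    path.write_bytes(UTF8_BOM + data)
    return True


def add_bom_to_folder(folder: Path, pattern: str = "*.ad") -> list[Path]:
    """Hypothetical helper: walk `folder` recursively and fix every matching file.

    Returns the files that were actually changed.
    """
    changed = []
    for path in sorted(folder.rglob(pattern)):
        if path.is_file() and add_utf8_bom(path):
            changed.append(path)
    return changed
```

Call it on your copy, for example `add_bom_to_folder(Path("~/AD-copies").expanduser())` (the folder name is made up). Running it twice is harmless, because files that already begin with a BOM are skipped.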

Sunday, July 17, 2005

One click to sneak the style

What is Quick Capture? It is AppleTrans's take on stylesheets, so to speak. It is nowhere near as fancy as what you find in high-end word processors, but it lets you style text in a single quick action.

As the name suggests, the basic idea is to copy the style attributes from some text and apply them to the current selection. Quick Capture works in document and corpus views (but not in accessory views), and you can capture styles from a different view than the one you are editing.

If you have a preformatted text sample, just open it next to the working document and use it as a stylesheet. The operation is easy: select the text you want to style, then Command-click the text whose styles you want to capture.

Saturday, July 16, 2005

Tiger kills?

I wonder how many apps were sacrificed for those 200+ new features. AppleTrans, too, has suffered from unexpected changes made deep in the Mac OS X frameworks.

The side effects show up in a variety of ways. In public version 1.1, AppleTrans still has unresolved issues with its Quick Capture feature, background batch processing, and matching of long tokens that were indexed under Panther.

Of these, losing Quick Capture must be a huge blow to version 1.1 users; it is actually my favorite tool. If you run version 1.1 on Tiger, you can try this workaround to bring the feature back:

1. In Terminal, type the following command:

% defaults write com.apple.AppleTrans NSProhibitMultipleTextSelectionByMouse -bool YES

You need to run this command only once for each user.

2. Launch AppleTrans, then choose About AppleTrans from the File menu

You need to do this in each session. You can close the About box right away.

That's it. This should do the trick.

Tuesday, July 12, 2005

Movie in a corpus

When I say "you can put movies in your corpus," most people don't believe it. Maybe they don't see why anyone would want to in the first place. Well, a dictionary with lots of drawings looks more attractive than one without, at least to me.

Again, an AppleTrans corpus is not just a translation memory; you can adapt it to needs other than translation. Here is a sample to show you how that works. The picture below shows my "Movie Finder" corpus, a catalog of the movie trailers in my archive.

It works more like a free-form database. This snapshot shows the corpus pulling up a record by the names of a couple of actors. Note that the movie shown in the record is an alias to the actual file on a server. Cool, don't you think?

Movie Finder snapshot

Thursday, July 07, 2005

Making multilingual corpus

There are a couple of ways to make a multilingual corpus with AppleTrans. One is to align multiple documents translated into different languages. The other is simply to make a tab-delimited glossary table in a spreadsheet program and import it into a corpus. The latter is somewhat limited in that you can enter only plain text.
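
For the spreadsheet route, the table can also be generated by a script. The sketch below assumes a layout with one header row of language codes followed by one record per line; AppleTrans's import may expect a different layout, so check the import options before relying on it. The function name is hypothetical.

```python
def glossary_to_tsv(languages: list[str], records: list[dict[str, str]]) -> str:
    """Build a tab-delimited glossary table (hypothetical layout):
    a header row of language codes, then one row per record.
    Missing translations become empty cells."""

    def clean(text: str) -> str:
        # A tab or line break inside a cell would shift the columns, so collapse
        # every run of whitespace (tabs and newlines included) into one space.
        return " ".join(text.split())

    rows = ["\t".join(languages)]
    for record in records:
        rows.append("\t".join(clean(record.get(lang, "")) for lang in languages))
    return "\n".join(rows) + "\n"


if __name__ == "__main__":
    table = glossary_to_tsv(
        ["en", "fr", "de"],
        [
            {"en": "Trash", "fr": "Corbeille", "de": "Papierkorb"},
            {"en": "Desktop", "fr": "Bureau"},  # German still to be filled in
        ],
    )
    print(table, end="")
```

Because every row has the same number of tab-separated cells, a missing translation simply leaves an empty column rather than shifting the later languages to the left.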

Aligning segments across multiple documents is not as hard as you might think. AppleTrans lets you split or merge segments as you like. You can even reorder segments if necessary (in certain languages, differences in sentence logic call for that).

You can also add text in a new language to existing records in the corpus by switching the target language. In this case, you first have to find the record by its source-language text, and then enter the new translation you want to attach to it.

Monday, July 04, 2005

Porting segment rule

If you want to move a segment rule you wrote from one machine to another, you do not need to copy the definition text from the Segment view. Instead, you can carry the rule over in a project file.

Create a new project, select the segment rule you want to port, and save the project under a name. When you open this file on another user's system, the rule definition is copied into that user's AppleTrans preferences.

What if the target user happens to have a rule with exactly the same name? In that case, AppleTrans temporarily overrides the existing rule with the new one for the rest of the session (as long as you do not open the Segment view, which would make the change persistent).

Sunday, July 03, 2005

Code your dialect

Internally, AppleTrans uses the language specifiers defined by ISO 639, the so-called two-letter language codes, to access a specific translation in a corpus record. Interestingly, those codes never appear in the user interface; the language selection in Preferences presents a list of language names instead.

The application transparently converts the language you select into its two-letter code. You can check this by saving a corpus as a TMX file. Now, what about four-letter language codes (language plus region, such as FR-CA)? You often find them in TMX files generated by other programs.

After importing such a TMX file into a corpus, you might need to specify the language code explicitly. If you want to pull the Canadian French translations tagged "FR-CA" out of the memory, just type "FR-CA" in the language option field. This also works when you export a corpus.
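
Before typing a code into the language option field, it helps to know exactly which codes a TMX file uses. The sketch below counts the `<tuv>` entries per language code with Python's standard ElementTree module. It reads the TMX 1.4 `xml:lang` attribute and falls back to a plain `lang` attribute, which some older TMX files use; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree reports the xml:lang attribute under its namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tmx_language_codes(source) -> dict[str, int]:
    """Count <tuv> variants per language code in a TMX file.

    `source` may be a file name or a binary file object.
    """
    counts: dict[str, int] = {}
    for tuv in ET.parse(source).iter("tuv"):
        # TMX 1.4 uses xml:lang; some older files use a plain "lang" attribute.
        code = tuv.get(XML_LANG) or tuv.get("lang")
        if code:
            counts[code] = counts.get(code, 0) + 1
    return counts
```

If the result for a file contains a key such as `FR-CA`, that exact string (case included) is what you would type into the language option field.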

Using this technique (if you can call it that), you can come up with a language code such as "JA-JP-OS", meaning "Japanese, Japan, Osaka dialect", which is clearly different from Tokyo's. Now you can build a memory for Tokyo-to-Osaka dialect conversion.

By the way, does TMX really need to be so strict about distinguishing languages by region?

Saturday, July 02, 2005

You sure make round trip safe?

There are still many people out there using Carbon-based applications on Mac OS X to work with Unicode text. In the localization business, it is critical to use a Unicode-based editor, because most of the materials you receive come from Cocoa applications.

When you open a Unicode file with a Carbon application, it converts the text into classic encodings, using system services that suggest which script system a particular string belongs to and which font suits it best. Let's assume that works well enough for you for now.

The problem is that an unknown number of Unicode characters (depending on the language you work with) cannot survive the round trip to a classic encoding and back. When you end up with letters that "look the same but don't match," it can be very hard to tell what went wrong.
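
One way to see in advance which characters are at risk is to push each one through a legacy encoding and back, and flag those that either cannot be encoded or come back as something different. The sketch below does this with Python's built-in codecs such as `mac_roman`; those tables approximate, but are not guaranteed to match, the converters a Carbon application actually uses. Both function names are hypothetical.

```python
import unicodedata


def round_trip_failures(text: str, encoding: str) -> list[str]:
    """Return the distinct characters of `text` that do not survive
    text -> `encoding` -> text, in order of first appearance."""
    failures = []
    for ch in dict.fromkeys(text):  # de-duplicate while keeping order
        try:
            back = ch.encode(encoding).decode(encoding)
        except UnicodeError:
            failures.append(ch)  # no mapping in the legacy encoding at all
            continue
        if back != ch:
            failures.append(ch)  # comes back as a different code point
    return failures


def describe(chars: list[str]) -> list[str]:
    """Label characters as 'U+XXXX NAME' so look-alikes can be told apart."""
    return [f"U+{ord(c):04X} {unicodedata.name(c, '<unnamed>')}" for c in chars]
```

For Japanese text you might try `shift_jis` or `cp932` instead of `mac_roman`, then pass the result through `describe` to see exactly which code points are involved, which is usually the quickest way to pin down a "looks the same but doesn't match" letter.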

Or maybe you would say there is no such trip without trouble...