Monday, August 15, 2005

Making a filter - Part 1

If you need to translate files in a complicated format, it is sometimes easier to write a filter plugin than to process the files through a series of complicated steps.

Basically, a filter simplifies the data structure so that AppleTrans can easily parse the translatable parts. If you like, you can have it do more complicated tasks as well.

Version 1.1 of AppleTrans comes with two filters pre-installed: one for the plist files that I wrote about in an earlier post, and the other for AppleGlot's WG files.

The WG filter does not actually alter the file format; instead, it embeds hyperlinks that connect each translation entry to a specific item in the original nib file.

A filter is built as a loadable bundle whose file name is the target file type (i.e. the extension of the target files) followed by the extension .filter.

Filters should be copied into the PlugIns directory of the application bundle. AppleTrans loads all the filters it finds there and creates an instance of each one.

Okay, let's stop here for today. In Part 2, we will look at the AppleTrans filter API.
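To make the naming convention concrete, here is a minimal Python sketch of how a host application could map bundle names to target file types. It is not AppleTrans code. The function name discover_filters and the example bundle names plist.filter and wg.filter are assumptions made for illustration; the real loader works on Cocoa bundles and may behave differently.

```python
from pathlib import Path


def discover_filters(plugins_dir):
    """Return {target extension: bundle path} for every *.filter bundle.

    Hypothetical helper: it only illustrates the "<extension>.filter"
    naming rule described above, not how AppleTrans actually loads code.
    """
    filters = {}
    for bundle in sorted(Path(plugins_dir).glob("*.filter")):
        # Loadable bundles are directories on disk; skip stray files.
        if bundle.is_dir():
            # "plist.filter" -> target extension "plist"
            filters[bundle.stem.lower()] = bundle
    return filters
```

Under these assumptions, a PlugIns directory holding plist.filter and wg.filter would yield one entry for plist files and one for wg files.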

10 Comments:

Anonymous Anonymous said...

Hi Hirunenko!

I was an early adopter of OS X, so I stopped using Alair a long time ago. AppleTrans was recently mentioned on MacLingua (MacLingua@yahoogroups.com), a discussion list for translators who work on Macs.

I was delighted to find that AppleTrans is an update of Alair, and I will start to use it as my main translation tool again as soon as I can work out how to import all my Wordfast corpora and glossaries and get all the functions I need working in AppleTrans.

I am finding the documentation rather confusing, however, and cannot remember how I made everything work years ago in Alair, so progress is slow. The Getting Started documentation and the manual do not really help me with my questions/problems, unfortunately.

If there is no other forum where we can discuss issues related to AppleTrans, I was wondering whether you would care to join us on MacLingua and help out occasionally with answers to questions.

Thanks for a great tool - I am sure I will get it all working very soon.

Le durachd Tam

7:31 PM  
Blogger hiruneko said...

Hello Tam, welcome back!

Yes, I must admit that the manual could be better than it is. I had to give up on covering all the possible scenarios you could follow, since my resources were so limited. Sorry about that.

There is a Yahoo group that one of the AppleTrans fellows helped me set up for sharing support files. We could use it as a communication channel if you like.

I have spent years working with Apple and finally made the app available to the public. But that does not necessarily mean that I am fully committed to future development or user support. Please bear this situation in mind.

Now, it looks like you are having trouble reading a Wordfast TM, is that right? What happened when you attempted to open the file?

12:27 AM  
Anonymous Anonymous said...

I fully appreciate the limitations on time, etc. It also amazes me that Apple don't see the commercial potential in your work.

We could also post enquiries here on the blog if you prefer but that involves potential users making the effort to get here.

I will post 3 very quick questions at the end of this note if you don't mind.

Today, I did manage to import a WordFast .tmx file. The problem seems to have been my patience - not your programming. Even this relatively small corpus took almost an hour to import. I will set the machine to import my huge TM over the weekend.

Another piece of progress today was to import my 65,000 Scandinavian glossary. I can Alt-Click on words and the appropriate selection comes up. BTW - the response times are fantastic!

My questions are

1) Is there a keyboard shortcut for Alt-Clicking on the selected word?

2) Is there a shortcut for adding terms to the glossary as you go along? (At the moment I am copying and pasting into the glossary file, entering the translation and then clicking on record. If there is a simpler way it would be nice to know.)

3) When you enter a segment, is there any way of getting AppleTrans to 'auto-assemble' a translation from the glossary? - i.e. replacing the source language with the target language whenever a word is recognised from the glossary. I am aware I can do this a word at a time but would like to avoid too much mouse-clicking if possible.

Keep up the good work whenever you have the time and relax the rest of the time. Greetings from Edinburgh - Tam

2:48 AM  
Anonymous Anonymous said...

Tam,

I am working on a Community Wiki for AppleTrans, which I hope to get up soon. It will be another place for users to share tips and hints, and will contain some answers that I found when starting to use AppleTrans. Maybe I'll get up what I have so far, tonight.

Hiruneko may not have time to help us answer all our questions, so the more the community can do here, the more time Hiruneko will have to spend working on improving AppleTrans (ie, the important stuff :-)

4:33 AM  
Blogger hiruneko said...

One hour for importing a TMX? Wow, you must have some precious memory, Tam. Well, most of that time went into indexing, which will give you faster retrieval from your memory later on. Letting your machine spin while you rest is a good idea. It is also better to free up some physical memory (by quitting other apps) and secure disk space, as the process is very greedy for computer resources.

Another strategy for managing memory, if I may suggest one, is to split your memory into smaller files by project or category if at all possible. Using corpus sharing, you can still browse all open corpora through a single proxy corpus. This is also good when you run a batch translation in a project, in that you can prioritize the reference corpora by sorting them in order.

Now, here are some answers to your questions.

> keyboard shortcut for the glossary lookup

No shortcut for that operation. Sorry.

> keyboard shortcut for recording a term

No shortcut for this one either. By reserving the terms in your documents (and leaving them untranslated), you can generate a list of all reserved terms afterward using a project. You can then translate the list and convert it to a corpus for use with the glossary tool.

> 'auto-assemble' a translation from the glossary

If you meant word-to-word pre-translation within a segment, the answer is "no" unfortunately. It sure is an opportunity for another corpus plugin. Instead, AppleTrans offers post-translation using the glossary tool. You just reserve and leave the terms in your translation. Once you complete the whole document, let the glossary tool do the post-translation (resolve) for all reserved terms.

Actually, you can use the glossary tool in a corpus view during translation. When you load a new segment into your working corpus, reserve (command + control + s) all terms in the target text, then deselect current selection and invoke "resolve all" (command + control + t). You do not need to open the glossary tool window. That would make it any better?

7:40 PM  
Anonymous Anonymous said...

I have given up on importing the other TMX files. All of them - even the smallest ones of only a couple of hundred entries - made AppleTrans "unexpectedly quit". It is a bit of a shame having to start all over again with new ones but the speed gains involved in using AppleTrans over Wordfast are already making up for that :-)

Pity about the keyboard shortcuts but thanks for the explanations.

You were saying:
"If you meant word-to-word pre-translation within a segment, the answer is "no" unfortunately. It sure is an opportunity for another corpus plugin."

I'm not sure what a corpus 'plug-in' is but such a thing would speed up my work massively :-)

"Instead, AppleTrans offers post-translation using the glossary tool. You just reserve and leave the terms in your translation. Once you complete the whole document, let the glossary tool do the post-translation (resolve) for all reserved terms."

Excellent - it is just a different way of thinking about the same task. Since it is easy in AppleTrans to go back and edit/make changes, etc., this should be a relatively easy way to work - albeit with a lot of keystrokes instead of automation.

"Actually, you can use the glossary tool in a corpus view during translation. When you load a new segment into your working corpus, reserve (command + control + s) all terms in the target text, then deselect current selection and invoke "resolve all" (command + control + t). You do not need to open the glossary tool window. That would make it any better?"

Can you get this to work for you? The reserve all terms works for me but not the resolve all terms. If I only reserve certain terms in the segment then the resolve all command works but only from the pull-down menu.

Thank you for all your help. I am already well pleased with progress. I had my first 'repetitive' text of the week today (the first since I started using AppleTrans). I reckon that AT did the original job in about half the time of WordFast and the repetitive text was generated by AT in seconds but would have taken minutes in WordFast. As I have a major project coming up with a lot of internal revision I have more or less decided already to shift to AT more or less permanently.

Le durachd Tam

1:58 AM  
Anonymous Anonymous said...

Tam,

Check the Wordfast TMX, looking for the language codes. If the lang codes look like this: "<tuv lang="EN-US">", then that may be what's causing AppleTrans to hang up on import. Use BBEdit or something and do a search/replace, so that they read "<tuv xml:lang="EN-US">", and switch the line at the top of the file that says "<tmx version="1.1">" to read "<tmx version="1.4">".

I had trouble, did that little hack, and everything imported pretty quickly. Wordfast and Trados 5.5 don't actually write valid TMX code, that's part of the problem.

(What's important is not the language code, but the format of the tag)
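For anyone who would rather script the search/replace than do it by hand, here is a minimal Python sketch of the same two substitutions. The function name patch_tmx is made up for illustration, and the sketch assumes the file otherwise looks like the examples quoted above; check the result on a copy before importing it.

```python
import re


def patch_tmx(text):
    """Apply the two textual fixes described above to TMX content.

    Hypothetical helper, not part of AppleTrans or Wordfast.
    """
    # Rename a bare lang attribute inside a <tuv ...> tag to xml:lang.
    # The whitespace required before "lang=" means tags that already
    # use xml:lang are left untouched.
    text = re.sub(r'(<tuv\b[^>]*?\s)lang=', r'\1xml:lang=', text)
    # Bump the declared TMX version from 1.1 to 1.4.
    text = text.replace('<tmx version="1.1">', '<tmx version="1.4">')
    return text
```

Running it twice on the same text gives the same result as running it once, so it should be safe to apply to files that have already been fixed.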

5:12 AM  
Anonymous Anonymous said...

Thanks Yak - that did the trick. I look forward to your discussion and tips forum.

My next experiment will be to try and work in "Projects". So far, every time I have tried to launch one, the beachball of doom has spun around on my screen for several minutes before I have had to give up.

3:29 PM  
Blogger hiruneko said...

Thanks, Yak! You hit right on the mark!

Well, I should have been more careful about compatibility with the older versions of TMX. Version 1.4 of the TMX specification was issued only 3 years ago, so there must be lots of legacy TMs left. If you do not mind opening up the application package, here is an alternative workaround:

In the AppleTrans application bundle, you can find Encodings.nib among the other resources. If you open this nib file with Interface Builder, you will find the definition of the TMX parser, written as regular expressions. In that definition, you can locate the following line:

variant="<tuv".*"xml:lang".*["']{language}["'].*">".*{segment}.*"</tuv>"

Change this particular line as shown below (around the "xml:lang" part). This change allows AppleTrans to read TMX files with the old-fashioned lang attribute.

variant="<tuv".*("xml:lang"|"lang").*["']{language}["'].*">".*{segment}.*"</tuv>"

The patch should be fairly easy for people who are familiar with Interface Builder. If you have not installed Xcode, don't bother installing it just for this; you can get a patched sample in the repository. Just copy that file into the English.lproj directory.
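To see what the alternation buys you, here is a rough Python approximation of the two variant lines. It is only a sketch: the translation of the {language} and {segment} placeholders into named groups, the assumption that the segment sits inside a seg element, and the helper name variant_pattern are all guesses made for illustration, not the actual semantics of the definition in Encodings.nib.

```python
import re


def variant_pattern(attr):
    """Build a Python regex loosely mirroring the variant definition.

    attr is the regex fragment for the language attribute name.
    """
    return re.compile(
        r'<tuv.*?' + attr + r'.*?["\'](?P<language>[^"\']*)["\'].*?>'
        r'.*?<seg>(?P<segment>.*?)</seg>.*?</tuv>',
        re.DOTALL,
    )


ORIGINAL = variant_pattern(r'xml:lang')           # TMX 1.4 style only
PATCHED = variant_pattern(r'(?:xml:lang|lang)')   # accepts both spellings

legacy = '<tuv lang="EN-US"><seg>Hello</seg></tuv>'
modern = '<tuv xml:lang="FR-FR"><seg>Bonjour</seg></tuv>'

# The original pattern skips the legacy unit; the patched one reads both.
legacy_hit = PATCHED.search(legacy)
modern_hit = PATCHED.search(modern)
```

In this approximation the original pattern finds nothing in the legacy unit, while the patched pattern captures the language code and segment text from both units.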

> the beachball of doom

Please refer to our earlier discussion on the "batch enabler".

6:00 PM  
Anonymous Anonymous said...

Great - I will check out the batch enabler thread as soon as possible.

FYI - colleagues on MacLingua and on TransMUG have e-mailed me to say they are following my postings about progress with AppleTrans with great interest. Hopefully, some of them will be downloading soon :-)

3:37 AM  
