Wednesday, April 04, 2007

Wordfast TM+X?

Recently I came across an interesting thread on a Yahoo group discussing alternative ways to convert a Wordfast TM to TMX format. Needless to say, Wordfast itself should do the job better than anything else. Or does it?

Of course, if you ask me, my preference is to use AppleTrans (thanks to Steven for summarizing the alignment tool). If you're really running out of time, here's a quick one-liner to run in Terminal:

% (iconv -f UTF-16 -t UTF-8 | tr '\r' '\n' | sed "s/&/\&amp;/g;s/&amp;'/\&#x/g" | awk -F'\t' 'BEGIN{print"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<tmx version=\"1.4\">\n<header creationtool=\"unknown\" creationtoolversion=\"1\" datatype=\"unknown\" segtype=\"sentence\" adminlang=\"en\" srclang=\"en\" o-tmf=\"unknown\">\n</header>\n<body>"}{if(1<NR)print"<tu>\n<tuv xml:lang=\""$4"\"><seg>"$5"</seg></tuv>\n<tuv xml:lang=\""$6"\"><seg>"$7"</seg></tuv>\n</tu>"}END{print "</body>\n</tmx>"}') < YOURSOURCE.TXT > YOURTARGET.TMX

Actually, if you know a few basic Unix commands and the TMX specification, this is not hard to come up with. There's no need to call on the power of Perl or other fancy scripting languages.
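For anyone who finds the one-liner above hard to read, here is the same pipeline spelled out one stage per line, as a rough sketch rather than a finished tool. The function name wf2tmx is made up for illustration. It assumes the input has already been decoded to UTF-8, so the iconv step moves to the call site, and it passes the tab separator to awk as -F'\t' so that it also works with awk implementations that do not treat -Ft as a tab.

```shell
# Sketch: the one-liner's stages, one per line. wf2tmx is a hypothetical name.
# Reads a Wordfast TM (already decoded to UTF-8) on stdin, writes TMX on stdout.
wf2tmx() {
  tr '\r' '\n' |                          # CR -> LF, as in the one-liner
  sed "s/&/\&amp;/g;s/&amp;'/\&#x/g" |    # escape &, then turn &'XX; codes into &#xXX;
  awk -F'\t' '
    BEGIN {
      print "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
      print "<tmx version=\"1.4\">"
      print "<header creationtool=\"unknown\" creationtoolversion=\"1\" datatype=\"unknown\" segtype=\"sentence\" adminlang=\"en\" srclang=\"en\" o-tmf=\"unknown\">"
      print "</header>"
      print "<body>"
    }
    NR > 1 {                                # line 1 is the TM header line, skip it
      print "<tu>"
      print "<tuv xml:lang=\"" $4 "\"><seg>" $5 "</seg></tuv>"
      print "<tuv xml:lang=\"" $6 "\"><seg>" $7 "</seg></tuv>"
      print "</tu>"
    }
    END { print "</body>"; print "</tmx>" }
  '
}
```

Used with the same placeholder file names as above, that would be: iconv -f UTF-16 -t UTF-8 < YOURSOURCE.TXT | wf2tmx > YOURTARGET.TMX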

This one-liner doesn't preserve attributes such as the user ID or creation date. That's not a big deal for AppleTrans users. I will leave those missing pieces as a challenge for you.
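As a starting point for that challenge, here is one possible sketch of the awk stage with the attributes carried over. It rests on several assumptions that are not stated in the post: that the first three tab-separated fields of each TU are the creation date (in the form 20070404~120000), the user ID, and the usage count, and that the timestamp can be treated as UTC when the trailing Z is added. The function name wf_tu_attrs is invented for illustration, and the sketch prints only the tu elements, not the TMX header or footer.

```shell
# Sketch only. wf_tu_attrs is a hypothetical name.
# Assumed field layout: $1 = date (YYYYMMDD~hhmmss), $2 = user ID, $3 = usage count.
# Expects the tab-delimited stream after the tr and sed stages on stdin.
wf_tu_attrs() {
  awk -F'\t' '
    NR > 1 && NF >= 7 {                     # skip the header line and short lines
      stamp = $1
      sub(/~/, "T", stamp)                  # 20070404~120000 -> 20070404T120000
      user = $2
      gsub(/"/, "\\&quot;", user)           # keep the attribute value well-formed
      # Assumption: the timestamp is treated as UTC, hence the trailing Z.
      printf "<tu creationdate=\"%sZ\" creationid=\"%s\" usagecount=\"%d\">\n", stamp, user, $3
      print "<tuv xml:lang=\"" $4 "\"><seg>" $5 "</seg></tuv>"
      print "<tuv xml:lang=\"" $6 "\"><seg>" $7 "</seg></tuv>"
      print "</tu>"
    }
  '
}
```

The creationdate, creationid, and usagecount attributes come from the TMX 1.4 specification. Whether your Wordfast dates are really UTC is something to check against your own TM before trusting the Z.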

For those who want it sugarcoated, I will post a droplet to the SIG, maybe on a good sunny weekend. Stay tuned.


hiruneko said...

There's an AppleScript droplet called WFTM2TMX posted to the SIG. It's worth a look just for some AppleScript tips.

11:11 PM  
