Saturday, July 12, 2008

A little dose of Olifant can save your life

Well, maybe not your life, but certainly your life’s work.

For technical translators, there are few resources as precious as your (or, more likely, your team’s) translation memory. It represents all the work you’ve done since you started using translation memories; for a translation team, that can easily amount to hundreds or even thousands of hours of collective labor, even over a relatively short period.

The makers of computer-assisted translation (CAT) tools often count on this investment of time and effort to keep their customers locked into whichever translation memory format their application uses. The strategy succeeds more often than not because converting translation memory files from one format to another is so difficult. Although there is a standard format for translation memories, known as Translation Memory Exchange (TMX), in practice each vendor’s support for it is often less wholehearted than its users would like. After all, what incentive do these businesses have to make it easy for their customers to move their existing translation memories to a competitor’s tools? Some translation tools do use standard TMX files as their default memory format (OmegaT is one example), but most default to their own proprietary formats, for a variety of reasons.

Most translation tools will accept TMX files as input, but many are quick to convert them to their own proprietary formats. Sometimes the reasons for this are easy to appreciate. Wordfast, for example, has long used tab-delimited text files as its memory format; they are easy to work with, although they suffer quite a bit when it comes to TM stability. Likewise, Felix defaults to its own TM format, which, like TMX, is XML-based, but is a bit more streamlined and probably a lot more comprehensible to everyday users. In other cases, the rationale for a proprietary TM format is less clear, but vendor lock-in is certainly one reason. In any case, anyone who uses more than one translation memory tool can readily attest that transferring memory files from one tool’s format to another’s is not always painless.

Enter Olifant, a component of the Okapi Framework (a suite of open source translation support tools) that exists solely to manage translation memories stored in various formats and to convert between them. The Okapi project aims to promote open standards where they exist and to offer its own open standards where none currently exist. As part of that project, Olifant serves as a general-purpose TM-management tool. It lets users convert TMs from one format to another, merge TMs, edit their contents, filter entries with SQL queries (a very useful function), flag duplicate entries, perform complex search-and-replace operations based on regular expressions (a particularly useful technique that has saved me countless hours on many occasions), find and remove characters that are considered invalid in other formats, and carry out a remarkable number of similar tasks designed to make managing multiple TMs in varying formats less daunting.
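To give a feel for what SQL-based filtering and duplicate flagging do to a memory, here is a minimal sketch using Python’s built-in `sqlite3` module. It is not Olifant’s own schema or query interface: the table name `tu`, its `source`/`target`/`client` columns, the helper functions, and the sample segments are all hypothetical, chosen only to illustrate the idea.

```python
import sqlite3

# Hypothetical sample memory: (source, target, client) triples.
# This column layout is an illustrative assumption, not Olifant's schema.
UNITS = [
    ("Press the power button.", "Appuyez sur le bouton d'alimentation.", "ACME"),
    ("Press the power button.", "Appuyez sur le bouton d'alimentation.", "ACME"),
    ("Close the cover.", "Fermez le couvercle.", "Globex"),
    ("Close the cover.", "Refermez le capot.", "ACME"),
]


def load_memory(units):
    """Load translation units into an in-memory SQLite table named 'tu'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tu (source TEXT, target TEXT, client TEXT)")
    conn.executemany("INSERT INTO tu VALUES (?, ?, ?)", units)
    return conn


def filter_units(conn, client):
    """Return (source, target) pairs for one client, in insertion order."""
    rows = conn.execute(
        "SELECT source, target FROM tu WHERE client = ? ORDER BY rowid", (client,)
    )
    return rows.fetchall()


def find_duplicates(conn):
    """Return (source, target, count) for every pair stored more than once."""
    rows = conn.execute(
        "SELECT source, target, COUNT(*) FROM tu "
        "GROUP BY source, target HAVING COUNT(*) > 1 ORDER BY source"
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = load_memory(UNITS)
    print(filter_units(conn, "Globex"))
    print(find_duplicates(conn))
```

Note that "Close the cover." is not flagged: its two entries have different targets, so they are alternative translations rather than duplicates. Grouping on the source column alone would surface those inconsistencies instead.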

Just recently, I was faced with the task of converting a TM of some 80,000 translation units from Wordfast’s relatively forgiving tab-delimited text format into Déjà Vu X’s much more stringent relational database format. After spending more hours than I care to count trying and failing to convert directly between those two formats, I called in Olifant to mediate the conversion. It did so flawlessly and gracefully, reducing a task I had been struggling with for quite some time to one that took about an hour.
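At its core, a conversion like this maps one record layout onto another, and TMX is the natural neutral format to pass through. The sketch below turns a deliberately simplified two-column `source<TAB>target` file into a TMX 1.4 document and strips characters that XML 1.0 does not allow, which is the kind of “invalid character” cleanup that trips up stricter formats. It assumes that two-column layout (real Wordfast memories carry extra per-line metadata that a real converter would have to map), and the function names and the `creationtool` value are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# XML 1.0 allows only these characters (the spec's "Char" production).
# Anything else, such as stray control characters, would make the TMX
# file unparseable, so it is stripped rather than escaped.
_INVALID_XML = re.compile(
    "[^\t\n\r\u0020-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff]"
)

# ElementTree serializes this Clark-notation name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def clean(text):
    """Remove characters that are not allowed in an XML 1.0 document."""
    return _INVALID_XML.sub("", text)


def tab_to_tmx(lines, srclang, tgtlang):
    """Convert simplified 'source<TAB>target' lines into a TMX 1.4 string.

    The two-column layout is an illustrative assumption, not Wordfast's
    actual file layout.
    """
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "tab2tmx-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "tab-delimited",
        "adminlang": "en-US",
        "srclang": srclang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"expected 2 tab-separated fields, got: {line!r}")
        tu = ET.SubElement(body, "tu")
        for lang, text in ((srclang, fields[0]), (tgtlang, fields[1])):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = clean(text)
    return ET.tostring(tmx, encoding="unicode")


if __name__ == "__main__":
    sample = ["Hello\tBonjour\n", "Close the cover.\tFermez le couvercle.\n"]
    print(tab_to_tmx(sample, "en-US", "fr-FR"))
```

ElementTree escapes markup characters such as `&` and `<` automatically; only the characters XML cannot represent at all need the explicit regex pass.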

Olifant has also helped me satisfy clients who contact me after a job is done to ask for the translation memory along with the finished translation. Sometimes the format they request differs from the one produced by the tool I used for that job, but Olifant makes it easy to provide the memory in the format they want. At other times, the translation units for the job in question are mixed in with those for other jobs from the same client. These cases are a bit trickier, but clever use of Olifant can fish the relevant translations out of the many others in the same memory.
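One way to fish a single job’s units out of a mixed memory, assuming each unit carries a TMX `creationdate` attribute and you know roughly when the job ran, is to keep only the units created within that window. This is a minimal sketch of that idea, not how Olifant itself does it; the function name and the sample memory are hypothetical.

```python
import xml.etree.ElementTree as ET


def extract_by_date(tmx_text, start, end):
    """Keep only the <tu> elements whose creationdate lies in [start, end].

    TMX dates use the fixed-width form YYYYMMDDThhmmssZ, so plain string
    comparison orders them correctly. Units without a creationdate are dropped.
    """
    root = ET.fromstring(tmx_text)
    body = root.find("body")
    for tu in body.findall("tu"):  # findall returns a list, so removal is safe
        date = tu.get("creationdate")
        if date is None or not (start <= date <= end):
            body.remove(tu)
    return ET.tostring(root, encoding="unicode")


# Hypothetical mixed memory: one unit from an earlier job, one from the
# job the client is asking about, and one with no date at all.
SAMPLE = """<tmx version="1.4">
<header creationtool="example" creationtoolversion="1" segtype="sentence"
        o-tmf="example" adminlang="en-US" srclang="en-US" datatype="plaintext"/>
<body>
  <tu creationdate="20080601T090000Z">
    <tuv xml:lang="en-US"><seg>Old job</seg></tuv>
    <tuv xml:lang="fr-FR"><seg>Ancien travail</seg></tuv>
  </tu>
  <tu creationdate="20080705T140000Z">
    <tuv xml:lang="en-US"><seg>This job</seg></tuv>
    <tuv xml:lang="fr-FR"><seg>Ce travail</seg></tuv>
  </tu>
  <tu>
    <tuv xml:lang="en-US"><seg>No date</seg></tuv>
    <tuv xml:lang="fr-FR"><seg>Sans date</seg></tuv>
  </tu>
</body>
</tmx>"""

if __name__ == "__main__":
    print(extract_by_date(SAMPLE, "20080701T000000Z", "20080712T235959Z"))
```

Date filtering only works when the client’s jobs don’t overlap in time. When they do, you would need some other marker, such as a TMX `<prop>` element, if the originating tool wrote one.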

If your workflow is irrevocably wedded to a specific tool, you may never need the kind of functionality Olifant offers. The reality of the translation market, however, is that no single tool offers a complete solution to every problem translators face (not yet, anyway), so the need to move fluidly from one application to another is a common one. For translators who need this flexibility, Olifant is quickly becoming an indispensable resource that no technical translator can afford to be without.

Posted by Sako in Technology