
Machine translation (MT), a novel technology with questionable results less than a decade ago, has expanded dramatically in terms of usability and popularity in recent years. This has inevitably led to a shift in the translation industry, as many language service providers have gradually adopted MT into their workflow and their service roster.

While it is true that most translations benefit from proofreading or revision, machine translations should always be thoroughly edited to ensure accuracy and coherence if they are to be used publicly or in an official capacity. This practice is called post-editing and differs in several key ways from other types of text editing or proofreading. People and machines approach translation in fundamentally different ways, resulting in distinct errors and quirks that a good editor knows how to recognise and mend. Let us have a look at some of the mistakes human translators rarely make, but an MT engine likely will.

Literal translation

You may not be aware of how often we express ourselves in idiomatic language. While we are for the most part perfectly able to recognise a string of words as an idiom or figure of speech, a machine may not be able to connect the dots, resulting in a literal translation. If, for example, your text includes the Slovene saying “Šel je rakom žvižgat.”, which means to die, an MT may translate it literally as something like “He went to whistle to the crabs.” rather than figuratively as “He kicked the bucket.”, which would be an idiomatic English equivalent.

While wrong translations of idioms are usually glaringly obvious, word-for-word translations may also be more subtle. Odd-sounding word order or sentence structure, more or less grammatically correct but translated too literally from one language into another, lowers the quality of the machine-translated text and demands a stylistic touch-up from a good post-editor.

Mistranslation

If you open any dictionary, you will notice that most common words have several entries. The English word “set” has a whopping 430 definitions in the 1989 edition of the Oxford English Dictionary[1]. It is thus not hard to imagine how an MT may confidently opt for the wrong homonym without considering the much-needed context. You have surely encountered such errors before; they are (sadly) quite common in translations of manuals or user interfaces.

The latter are especially prone to mistranslation due to short phrases with little context.

Take command expressions such as:

  • save (a file) – reši (rescue) instead of shrani,
  • (go) back – hrbet (body part) instead of nazaj,
  • run (a program) – teci (move quickly) instead of zaženi,
  • close (the window) – blizu (near) instead of zapri.
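
For readers who handle UI strings in bulk, the examples above can be turned into a simple automated check. What follows is only a minimal sketch in Python: the glossary and the function name are hypothetical and built solely from the four commands listed here, and a check like this can point a post-editor to suspicious strings but cannot replace one.

```python
# Hypothetical glossary built from the examples above: each English UI
# command mapped to the Slovene term expected in a software context.
UI_GLOSSARY = {
    "save": "shrani",
    "back": "nazaj",
    "run": "zaženi",
    "close": "zapri",
}


def flag_ui_mistranslations(pairs):
    """Return (source, translation, expected) for every pair that needs review.

    `pairs` is an iterable of (source, translation) tuples. Source strings
    that are not in the glossary are skipped rather than flagged.
    """
    flagged = []
    for source, translation in pairs:
        expected = UI_GLOSSARY.get(source.strip().lower())
        if expected is None:
            continue
        # Compare case-insensitively so that "Shrani" and "shrani" both pass.
        if translation.strip().lower() != expected:
            flagged.append((source, translation, expected))
    return flagged


# Illustrative MT output: two of the four commands got the wrong sense.
mt_output = [("Save", "Reši"), ("Back", "Nazaj"), ("Run", "Teci"), ("Close", "Zapri")]
for source, translation, expected in flag_ui_mistranslations(mt_output):
    print(f"{source}: got '{translation}', expected '{expected}'")
```

An exact-match comparison is deliberately strict; it suits one-word commands like these, but longer strings would need a looser comparison and, in any case, a human review.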

Technically grammatical

Renowned linguist Noam Chomsky gave a famous example of a sentence that is perfectly grammatical, but entirely nonsensical: Colorless green ideas sleep furiously. If you carefully read an unedited MT, you will doubtless notice that it has a tendency to craft elaborate sentences that at first glance seem fine, but have little to no substance to them. An eagle-eyed post-editor must recognise and rework such a sentence so that it actually says something.

Agreement and negation

Context awareness still presents a big challenge to machine translation engines. In strings of sentences or even one long sentence, where the grammatical gender of the subject is expressed at the beginning and may not be overtly mentioned throughout the paragraph, MT may use pronouns inconsistently. This is an issue especially when translating from a language with few inflections (like English) to a highly inflected language (like Slovene), where most words in a sentence must be in agreement with the subject’s grammatical gender and number. With no context, it is unclear who the pronoun “they” in “They went to the beach.” is referring to – two men, two women, more than two people, a singular non-binary or unknown person…? Similarly, negation may not be expressed explicitly in some languages, which can lead to the opposite meaning in translation.

Inconsistency

Ideally, all texts should be clear and consistent in terms of terminology and tone of voice. A machine translation – again, due to its struggles with context awareness – may fail to use the same register and terminology throughout the text. However, since human writers are not perfect either, it is possible that the source text for translation will already be a tad inconsistent, exacerbating the problem. As terminological consistency is non-negotiable in highly specialised texts, such as legal contracts or clinical trial documentation, a thorough post-edit is crucial before the document can see the light of day.
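
A rough first pass for terminological consistency can be automated before the post-editor steps in. The Python sketch below rests on several assumptions: the term base, the function name and the two example segments are all made up for illustration, and the approved term is matched by its stem because Slovene nouns inflect. Real terminology tools are considerably more sophisticated.

```python
# Hypothetical term base: English source term -> stem of the approved Slovene
# term. A stem is used because Slovene nouns inflect (pogodba, pogodbe, pogodbo).
TERM_BASE = {"contract": "pogodb"}


def find_term_inconsistencies(segments, term_base=TERM_BASE):
    """Return (index, source_term, target) for segments missing the approved term.

    `segments` is a list of (source, target) sentence pairs. A segment is
    flagged when the source contains a term from the term base but the
    target does not contain the approved stem.
    """
    issues = []
    for index, (source, target) in enumerate(segments):
        source_lower = source.lower()
        target_lower = target.lower()
        for source_term, target_stem in term_base.items():
            if source_term in source_lower and target_stem not in target_lower:
                issues.append((index, source_term, target))
    return issues


# Illustrative segments: the second one renders "contract" as "dogovor"
# (agreement) instead of the approved "pogodba".
segments = [
    ("The contract enters into force on the day of signing.",
     "Pogodba začne veljati z dnem podpisa."),
    ("The parties may terminate the contract in writing.",
     "Stranki lahko dogovor odpovesta pisno."),
]
for index, term, target in find_term_inconsistencies(segments):
    print(f"Segment {index}: approved term for '{term}' not found in: {target}")
```

Plain substring matching will miss some cases and raise false alarms in others, so the output is best treated as a list of places to look, not as a verdict.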

Register and cultural nuances

The register or tone of voice of a text must take the target reader into account – is it intended for patients, healthcare professionals, the general public, or a specific age or cultural group? A good MT may be able to emulate the tone of voice of the source document, but may struggle to keep it consistent or adapt it to the target culture, rendering it potentially insensitive. It is generally advisable that any proofreader, reviewer or, indeed, post-editor be a native speaker of the target language, as native speakers are best suited to localise the text.

Proper names or titles

Mistranslations of proper names, especially names that are also common nouns, can have hilarious results (unless you are the person on the receiving end). Perhaps you remember the renovation of the Municipality of Maribor’s website, which made the news a few years ago with its machine translations. The Deputy Mayor, Dr Samo Peter Medved, became Dr Only Peter the Bear in English[2]. While this mishap entertained the Slovene public for the better part of a week, it is nonetheless a testament to the importance of post-editing.

If used incorrectly, any tool can be just as dangerous as it is handy – machine translation is no different. While its role in the modern world is becoming undeniable, it is important to be aware that it is not without its quirks and drawbacks and should always be post-edited by an experienced and qualified linguist. As luck would have it, you are reading the blog of a company that can help you decide on the solution tailored to your needs!
