Not that long ago, machine translation (MT) occupied about the same place in the popular consciousness as a 4K video camera on your phone. Flash forward to the COVID era, when “Zoom” has become a generic name for the video calls we make daily, and machine translation (the translation of text from one language into another using translation software) has finally hit the mainstream. It is now a key function on many global social media platforms and an integral part of how artificial intelligence is used in our daily lives.
How is Machine Translation Already in Your Daily Digital Experience?
One of the most common places to run into machine translation is on social media. Facebook has its own machine translation platform, which it uses to offer automatic translation to its users. You may have seen the “See Translation” link at the bottom of posts from friends who use another language. LinkedIn offers a similar text translation option, as does Twitter with its “Translate Tweet” function. If you are doing Internet research and come across a website in another language, all you need to do is paste the web address into translate.google.com and the whole website will appear, translated into your native language.
Google has leveraged machine learning to enhance the capabilities of Google Translate, its popular translation service. ML algorithms have significantly improved the quality of translations across numerous language pairs, facilitating more seamless communication across linguistic barriers. The service can instantly translate web pages, documents, and even real-time conversations, making it an invaluable tool for users worldwide.
Similarly, Microsoft's translation service, Microsoft Translator, relies heavily on machine learning. It not only provides text translation but also offers features like multi-language group conversations, essentially breaking down language barriers in collaborative settings.
Netflix's recent collaboration with Virginia Tech on an AI model for simplifying and translating subtitles is another example of machine learning being utilized to enhance user experience. This approach not only improves the accessibility of content for international audiences but also paves the way for more advanced translation systems in the media industry.
Furthermore, companies like Amazon are using machine learning for localization, ensuring their product listings and descriptions are accurately translated and culturally appropriate for various markets.
Advancements in machine learning have also led to the development of language learning apps like Duolingo, which uses ML to personalize learning paths for users, helping them to understand and learn new languages more effectively.
What is Machine Translation?
A machine translation tool is computer software that predicts patterns in a target language relative to patterns in the source language. For example, if you run a German newspaper article through an online machine translation engine (such as Google Translate or Microsoft Translator), the application analyzes the source text and, based on the data that was used to train the engine, creates a translation by extrapolating from the patterns it already recognizes.
In the earliest versions of machine translation technology, the MT engines used rules to map structures in one language to structures in another. These systems proved to be highly inaccurate and difficult to use, forcing authors to change their writing for the MT engine to have any chance of producing a somewhat readable automated translation.
Early Advancements
By the end of 2010, statistical machine translation engines represented the cutting edge of MT technology. These machine learning systems relied on statistical translation models to generate translations.
By 2015, neural machine translation had entered the machine translation space after decades of research. Neural networks were first proposed in the 1940s, but their use as the basis of machine translation was not formally presented in academic research until 2014. In the years since, both academic and commercial research has exploded, and neural machine translation has become the language industry standard for automatic translation. Attention mechanisms, which let a model focus on the relevant parts of the input sequence while generating the output, began appearing in neural translation models around the same time. In 2017, Vaswani et al. introduced the "Transformer", a new type of neural network architecture built entirely around attention. It improved the accuracy of translations, particularly for longer sentences.
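To make the idea of attention more concrete, here is a minimal sketch of scaled dot-product attention, the operation at the core of the Transformer, written in plain Python. The vectors are toy values chosen for illustration and are not taken from any real translation model; production systems use learned, high-dimensional vectors and many attention layers.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    Returns the attention weights (one per input position) and the
    weighted average of the value vectors.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context

# Toy example: three source positions, two-dimensional vectors.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [0.0, 2.0]  # most similar to the second and third keys

weights, context = attend(query, keys, values)
```

The weights show how strongly the model "looks at" each input position while producing one output step: positions whose keys match the query receive larger weights and contribute more to the result.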
How Good is Neural Machine Translation, Really?
When Google announced its first publicly available neural MT system in 2016, it did so with some hype. Google claimed that “Human Parity” had been achieved. Regardless of whether this was objectively true, a jaded language translation industry met the claim with a mix of hope and derision. Anyone in the language service field had heard it all before. Previous machine learning systems had been hyped too—only to offer disappointing MT output that led to marginal gains in productivity.
The truth is that statistical machine translation was a critical phase, but for producers of translations, it was fraught with risk. Quite simply, many statistical machine translation engines just could not be trusted. Not until you were well into a large project could you really assess if the engine would make the translation process more productive or not. This, however, was a critical period in the evolution of production models for translation providers—it brought human post-editors to the heart of the process. Without reliable post-editors, a machine translation service just would not work—all content had to be reviewed and revised by a human translator to create content worthy of publication and to provide high-quality data to train the statistical MT engines.
In 2018, Google introduced the 'BERT' model (Bidirectional Encoder Representations from Transformers), which further advanced the field. By modeling the context on both sides of each word in a sentence, BERT significantly improved machine understanding of natural language. Pretrained language models of this kind have since been used to help make machine translation output more precise and natural.
Moving into 2023, the focus has started to shift toward customization and specialization of machine translation systems. More advanced models are being developed that can adapt to specific domains or styles of language, producing higher-quality translations tailored to specific use cases. OpenAI's ChatGPT, GPT-4, and other generative tools are also paving the way, making usable translations easier for people to access as the underlying models become more capable.
At the end of the day, current machine translation still needs a human at the wheel, ready to intervene when the technology fails. The rule best followed is trust, but verify.
Is Machine Translation Right for Your Company’s Content?
This is the question to be asked. Despite the incredible strides that MT has made in the last five years, there are still specific types of content that the machine may not handle very well. One example is software user interface strings. Short phrases that stand alone, without context, in software resource files are difficult even for human translators, and the MT engine will also struggle. Neural MT engines perform best when processing longer sentences that sit within a larger textual context.
One recent example: when using MT to translate the content of a website from English to German, the Google Translate engine rendered “About” as “Etwa” (approximately) instead of the web-standard “Über uns” (“About us”), a label that is also common on English websites.
MT engines tend to not perform quite as well on short phrases and sentences. But this can be influenced by training the translation model engines with good data that fit the context well. The advice here is to evaluate the results first and have a qualified linguist or professional translator who knows the software intimately review the results of the machine translation prior to committing to use MT on your product’s software strings.
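As a rough illustration of that advice, the sketch below separates short, context-free strings from longer ones so a linguist can look at the risky ones first. The function name and the `min_words` cutoff are hypothetical; a word count is only a crude proxy for context and should be tuned against real MT output.

```python
def split_by_context(strings, min_words=4):
    """Separate strings with little context from longer ones.

    min_words is a hypothetical cutoff, not an industry standard.
    """
    needs_review, likely_ok = [], []
    for text in strings:
        if len(text.split()) < min_words:
            needs_review.append(text)  # too short to carry much context
        else:
            likely_ok.append(text)
    return needs_review, likely_ok

ui_strings = [
    "About",
    "Save",
    "Your changes could not be saved because the file is read-only.",
]
needs_review, likely_ok = split_by_context(ui_strings)
```

Here "About" and "Save" land in `needs_review`, which is where a mistranslation like the one described above would be caught.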
This approach should be employed regardless of the type of content you may want to run through an MT engine. In the case of software Help content, there’s a good chance results will be better than just for the strings, but be prepared to have post-editors focus a lot on terminology that needs to appear in the interface. The MT engine will choose whichever terminology was present in its original training data—it will have no way of knowing your firm’s specialized terminology.
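One way post-editors can be pointed at terminology problems is a simple glossary check on the MT output. The sketch below is an assumption about how such a check might look, not a description of any particular tool; the glossary entries are invented, and the substring matching ignores inflection and word boundaries.

```python
def find_term_violations(source, translation, glossary):
    """Return glossary entries whose source term appears in the source text
    but whose approved target term is missing from the translation.

    Matching is a simple case-insensitive substring test; a real check
    would need to handle inflected forms and word boundaries.
    """
    src = source.lower()
    tgt = translation.lower()
    return [
        (term, approved)
        for term, approved in glossary.items()
        if term.lower() in src and approved.lower() not in tgt
    ]

# Hypothetical English-to-German glossary for a product's interface.
glossary = {"dashboard": "Übersicht", "settings": "Einstellungen"}

violations = find_term_violations(
    "Open the dashboard and choose Settings.",
    "Öffnen Sie das Dashboard und wählen Sie Einstellungen.",
    glossary,
)
```

In this example the engine kept "Dashboard" instead of the approved interface term, so that entry is flagged for the post-editor, while "Einstellungen" passes.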
A good rule of thumb for using MT to create translations for your company is that highly sensitive content that involves human safety or entails significant physical or economic risk should only be translated with MT if there is a robust post-editing process in place, and the post-editors have subject matter expertise.
Marketing content may be translatable using MT, but to avoid embarrassment and ensure that the machine translation output is effective, post-editing by a human translation expert is still a requirement. User-created content or otherwise low-risk content, such as knowledge base articles, can often be translated using MT with no post-editing. However, assessing the results before committing to publishing low-risk content is still a critical step to ensure that the content will be acceptable and usable.
MT may work for your organization’s content if it…
- Is not directly involved in user safety
- Involves information that poses no physical or economic risk to users
- Could be important to users, but would otherwise go untranslated
- Does not include heavy jargon or technical terminology not available in the public domain
- Does not need to be translated into rare languages that are predominantly oral and rarely written
When user safety or risk of liability may exist, relying on machine translation without employing human post-editors to review and verify the automatic translation could expose your company to liability.
Low-value content that may be of interest to users and customers may have typically gone untranslated in the past due to the cost, but now with machine translation, this content can be made available to that audience with minimal cost and effort.
If your content is highly specialized and uses arcane terminology, then the likelihood of a machine translation generating a reliable translation is far less. The reason is that the engine will have limited data related to your company’s area of specialization and will not have the information it needs to produce an accurate translation.
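The guidance in this section can be summarized as a small decision rule. The sketch below is one possible encoding of it; the field names and workflow labels are invented for illustration, and routing specialized terminology to post-editing is an interpretation of the advice rather than a fixed rule.

```python
from dataclasses import dataclass

@dataclass
class ContentProfile:
    safety_critical: bool    # directly involved in user safety
    high_risk: bool          # significant physical or economic risk
    brand_sensitive: bool    # e.g. marketing copy
    specialized_terms: bool  # heavy jargon not available in public data

def recommend_workflow(profile):
    """Map a content profile to a suggested translation workflow."""
    if profile.safety_critical or profile.high_risk:
        return "MT + post-editing by subject matter experts"
    if profile.brand_sensitive or profile.specialized_terms:
        return "MT + human post-editing"
    # Low-risk content: still evaluate a sample before publishing.
    return "raw MT after evaluating a sample"

manual = ContentProfile(True, False, False, False)
kb_article = ContentProfile(False, False, False, False)
```

Calling `recommend_workflow(manual)` returns the expert post-editing workflow, while `recommend_workflow(kb_article)` returns the raw MT option with a sample evaluation first.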
Are Machine Translation Systems Secure?
Security is a major consideration when contemplating using MT for your content. There is an infamous case of a corporate privacy breach due to the use of Google Translate by Translate.com in 2017. In this case, employees of Statoil, the state-run oil company of Norway, used Translate.com’s free online machine translation service, which in turn used Google Translate. Google Translate in its user license agreement says that it will use and potentially place in the public domain any information users pass into the system. This ultimately led to sensitive corporate information making its way into Google Search results. The lesson here is that if you use a free public service for machine translation, the provider's terms may give it broad rights over your content, and security cannot be guaranteed.
Many of these tech companies, however, provide secure systems which will require greater technical commitment and tools for processing your content. Translation service providers who use machine translation software in the delivery of their services—if they are professional and responsible—will rely on secure systems to translate your content. If your organization has a large volume of content to be machine-translated, then it may be easier to engage a service provider who already has these systems in place. This will make the overall process smoother and faster, and your organization will not have to develop its own workflow.
In addition to security, you need to consider how much content you may want to process through machine translation. If you have large volumes (more than the occasional email in another language, for example), you will need a way to pass those documents efficiently and securely to the MT engine. You can’t copy & paste your way through this!
Translation service providers have translation management systems that connect to machine translation engines via encrypted API connections. These translation management systems are designed to ingest all different types of content from PDF files to structured XML content. The TMS parses this content, so it is easy to pass to the MT engine and keeps tags and other formatting data, which also eases publishing of the translated content. Even more sophisticated scenarios for delivering the content back to cloud-based publishing systems and corporate intranets can be constructed to not only automate translation but also publishing.
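To illustrate what "keeps tags and other formatting data" can mean in practice, the sketch below swaps inline tags for numbered placeholders before a segment goes to the MT engine and restores them afterward. This is a simplified assumption about the technique, not how any specific TMS works: the `[[n]]` placeholder format is invented, the translated string is a hand-written stand-in for engine output, and real engines may move or alter placeholders.

```python
import re

TAG_PATTERN = re.compile(r"<[^>]+>")

def protect_tags(segment):
    """Replace each inline tag with a numbered placeholder."""
    tags = []

    def stash(match):
        tags.append(match.group(0))
        return f"[[{len(tags) - 1}]]"

    return TAG_PATTERN.sub(stash, segment), tags

def restore_tags(translated, tags):
    """Put the original tags back in place of their placeholders."""
    return re.sub(r"\[\[(\d+)\]\]", lambda m: tags[int(m.group(1))], translated)

source = "Click <b>Save</b> to continue."
protected, tags = protect_tags(source)

# Hand-written stand-in for what an MT engine might return.
translated = "Klicken Sie auf [[0]]Speichern[[1]], um fortzufahren."
restored = restore_tags(translated, tags)
```

Because the engine only ever sees plain text and placeholders, the markup survives translation intact and the result can be published without manual reformatting.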
How to Get Started with Machine Translation of Your Content?
The best way to get started with using machine translation for your content is to engage a translation service provider who uses machine translation as part of their standard process. Even within the language services industry, machine translation adoption varies between service providers. The best thing you can do is talk to translation service companies, find out how they translate content, and ask whether your organization can benefit from their use of machine translation. Note that many translation providers, even if they do use MT as part of their production processes, may not offer to sell you raw machine translation (that is, unedited content produced by a machine translation engine) because of the potential liability. Even if they do, there will be a cost to set up the process for your content and likely a small per-word rate, so do not expect this service to be offered for free. Also, do not be surprised if they ask you to sign a waiver that indemnifies their company against any errors or defects. This is a reminder that machine translation quality, albeit miraculous today compared with just five years ago, is not the “Universal Translator” made famous by Star Trek. The machines still have a way to go to match the uniquely human aspects of language.