Practical strategies for creating localization-friendly content
Embrace Localization-Friendly Design from the Start
If you design your content and layout with localization in mind, anticipating variations in text length, directionality (for languages like Arabic or Hebrew), and the cultural appropriateness of images and icons, you set yourself on a good path to achieving the Dos and avoiding the Don’ts.
When it comes to your documentation, you should be doing the same thing.
If your source content is set up with localization in mind, the translation process becomes far smoother and more predictable.
- Think Terminology
- Optimize Variant Management
- Translate Once, Reuse Everywhere
- Streamline Your Workflows
- Use Metadata & Attributes
Creating content with localization in mind lays a solid foundation. This anticipatory approach simplifies translation and adaptation to new markets.
Think Terminology
Uniform terminology across your content streamlines the translation process. Utilizing a multilingual term base, which includes allowed terms, their disambiguation, and forbidden terms, ensures consistency. This base should be accessible to your Language Service Provider (LSP) and integrated with your Component Content Management System (CCMS) for seamless automation.
If terminology is not harmonized between source and target, your content becomes increasingly difficult to translate, and even more so to localize. Keeping it harmonized at scale requires automating some of the tasks.
- Invest in a multilingual term base
- Terminology in any organization varies from department to department, and sometimes within the same department (for example, “revision” vs. “version”)
- Your term base should contain not only allowed terms but also the disambiguation of each term and the forbidden terms
- For each allowed term, an equivalent translation must exist in every target language
- Provide this term base to your LSP
- Maintain your term bases regularly: clean up the term bases and source content, and update your translation memories
- Because this maintenance is ongoing and a lot of work, get your CCMS to help
- Term bases can be included in your CCMS or may be an external system connected via API
- External specialized systems, such as Acrolinx or Congree, can talk to the Translation Management System (TMS) through the CCMS
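To make the term base rules concrete, here is a minimal Python sketch of the kind of check a CCMS or a connected tool might automate: flag forbidden terms in the source text and list allowed terms that still lack a translation. The `TermEntry` structure, the sample entry, and both functions are hypothetical illustrations, not the API of Acrolinx, Congree, or any CCMS.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TermEntry:
    """One concept in a hypothetical multilingual term base."""
    allowed: str                                      # approved source term
    definition: str                                   # disambiguation note
    forbidden: list = field(default_factory=list)     # terms writers must not use
    translations: dict = field(default_factory=dict)  # language code -> approved target term


# Hypothetical sample data; a real term base lives in the CCMS or an external system.
TERM_BASE = [
    TermEntry(
        allowed="revision",
        definition="A saved state of a single topic in the CCMS.",
        forbidden=["version", "iteration"],
        translations={"fr-FR": "révision", "de-DE": "Revision"},
    ),
]


def check_terminology(text):
    """Return one message per forbidden term found in the source text."""
    issues = []
    for entry in TERM_BASE:
        for bad in entry.forbidden:
            # Whole-word, case-insensitive match, so "conversion" is not flagged.
            for match in re.finditer(rf"\b{re.escape(bad)}\b", text, re.IGNORECASE):
                issues.append(f"'{match.group(0)}' is forbidden; use '{entry.allowed}'")
    return issues


def missing_translations(target_langs):
    """List (term, language) pairs that have no approved translation yet."""
    return [
        (entry.allowed, lang)
        for entry in TERM_BASE
        for lang in target_langs
        if lang not in entry.translations
    ]


print(check_terminology("Open the Version history and compare each iteration."))
print(missing_translations(["fr-FR", "de-DE", "ja-JP"]))
```

Running checks like these on save, or before a translation package is created, catches terminology problems once in the source rather than once per target language.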
Optimize Variant Management
Leveraging DITA's capabilities, such as conditional processing, allows for the creation of dynamic content tailored to specific markets. This not only enhances the user experience but also optimizes costs by localizing only the necessary content for each audience.
Most of us already use conditional processing to deliver dynamic content, but managing variants of the source files before translation gives better results. If you optimize variant management, you localize the right content for the right audience. This also reduces costs, because you do not have to translate all your content into all languages.
- Leverage conditional processing or dynamic branching of your source content to create localization packages for global audiences.
For example, should your French for France (fr-FR) content include translations of US requirements that will never apply to a product destined only for France?
- Use DITA conditional processing to tailor content dynamically for different regions, rather than creating separate versions manually. This approach streamlines the localization process and ensures consistency across versions.
- If you have one set of content but the regulatory requirements differ between markets, it is easier to manage it as branches than with conditional processing.
- One way to do this is Dynamic Release Management (DRM).
Here is an example of where DRM is more useful than simple conditional processing. Suppose you have a product that needs regulatory approval. You send off the registration documents, including the Instructions for Use (IFU), to the US FDA, to the China FDA, and so on. You then obtain clearance for release of the product in one of those markets. Three months later, you finally receive a request from the US FDA asking for modifications, some of them major. If you manage this with conditions, your XML will eventually start to look like spaghetti code: a myriad of markers showing which version goes to which authority, and a plethora of DITAVAL files to drive them. You end up “investing” in an Excel spreadsheet with a matrix to determine which DITAVAL files and/or keys to use to produce the right output. Or you could just create a branch from the original IFU you sent, and make those modifications there. You then manage the US FDA changes in that branch alone. If a change is common to all markets, you make it at the trunk level and flow it down to the respective branches. Wasn’t that easy?
Doing this with DITAVAL files will drive you nuts. I almost went nuts. The CCMS we were using had difficulty with branching.
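The conditional processing side can be illustrated with a simplified Python sketch of how a DITAVAL file removes market-specific content before a build or a translation package is created. It follows the basic DITA rule that an element is excluded when all values of one of its filtering attributes are excluded. The sample topic, the use of `@otherprops` to mark markets, and the helper functions are assumptions for illustration; real processors such as the DITA Open Toolkit also handle flagging, default actions, and grouped attribute values.

```python
import xml.etree.ElementTree as ET

# Hypothetical source topic: one paragraph applies only to the US market.
TOPIC = """<topic id="safety">
  <title>Safety information</title>
  <body>
    <p>Read all instructions before use.</p>
    <p otherprops="us">This device complies with Part 15 of the FCC Rules.</p>
    <p otherprops="eu">This product meets EU CE-marking requirements.</p>
  </body>
</topic>"""

# DITAVAL for a fr-FR build: drop content marked as US-only.
DITAVAL_FR = """<val>
  <prop att="otherprops" val="us" action="exclude"/>
</val>"""

FILTER_ATTS = ("audience", "platform", "product", "otherprops", "props")


def load_exclusions(ditaval_xml):
    """Map each attribute name to the set of values the DITAVAL excludes."""
    excluded = {}
    for prop in ET.fromstring(ditaval_xml).iter("prop"):
        if prop.get("action") == "exclude":
            excluded.setdefault(prop.get("att"), set()).add(prop.get("val"))
    return excluded


def is_excluded(elem, excluded):
    """An element is dropped when all values of one of its attributes are excluded."""
    for att in FILTER_ATTS:
        values = elem.get(att)
        if values and att in excluded and set(values.split()) <= excluded[att]:
            return True
    return False


def filter_tree(parent, excluded):
    """Remove excluded elements (and everything inside them) in place."""
    for child in list(parent):
        if is_excluded(child, excluded):
            parent.remove(child)
        else:
            filter_tree(child, excluded)


def build_for_market(topic_xml, ditaval_xml):
    root = ET.fromstring(topic_xml)
    filter_tree(root, load_exclusions(ditaval_xml))
    return root


for p in build_for_market(TOPIC, DITAVAL_FR).iter("p"):
    print(p.text)
```

Filtering before packaging keeps the US-only paragraph out of the French translation job entirely. The branching approach described above solves a different problem: diverging versions over time, which attribute filtering handles poorly.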
Translate Once, Reuse Everywhere
DITA's reuse capabilities, such as content referencing (conref), facilitate efficiency but come with challenges in localization. Ensuring that each piece of reused content fits seamlessly into its new context requires careful management and adherence to best practices for content referencing.
Handle DITA Content Refs With Care!
The biggest advantage of DITA is reusability through content-referencing attributes such as conref (content reference) and conkeyref (which references content indirectly through a key).
In DITA, an element that carries a conref attribute acts as a placeholder for the element it references. When the DITA content is published, each such placeholder is replaced by the content it references. This way, the content can be reused throughout the documentation.
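The placeholder mechanism can be sketched in a few lines of Python. This is a deliberately simplified model that assumes a `file.dita#topicid/elementid` reference and an in-memory library of files; the `LIBRARY` contents and the helper names are hypothetical, and real processors apply further rules (for example, for attribute precedence and nested references).

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical reuse library: file name -> DITA source.
LIBRARY = {
    "warnings.dita": """<topic id="warnings">
  <title>Shared warnings</title>
  <body>
    <note id="unplug" type="warning">Unplug the device before cleaning it.</note>
  </body>
</topic>""",
}

# Hypothetical topic whose note is only a placeholder pointing at the library.
TASK = """<task id="clean">
  <title>Cleaning the device</title>
  <taskbody>
    <context>
      <note conref="warnings.dita#warnings/unplug"/>
    </context>
  </taskbody>
</task>"""


def find_target(ref):
    """Resolve 'file.dita#topicid/elementid' against the in-memory library."""
    path, fragment = ref.split("#", 1)
    topic_id, elem_id = fragment.split("/", 1)
    topic = ET.fromstring(LIBRARY[path])
    if topic.get("id") != topic_id:
        raise KeyError(f"topic {topic_id!r} not found in {path}")
    for elem in topic.iter():
        if elem.get("id") == elem_id:
            return elem
    raise KeyError(f"element {elem_id!r} not found in {ref}")


def resolve_conrefs(root):
    """Replace each placeholder's content with a copy of the referenced element."""
    for placeholder in [e for e in root.iter() if "conref" in e.attrib]:
        target = find_target(placeholder.get("conref"))
        tail = placeholder.tail          # text after the placeholder belongs to its parent
        placeholder.clear()              # drops attributes, text, children, and tail
        placeholder.text = target.text
        placeholder.extend([copy.deepcopy(child) for child in target])
        for name, value in target.attrib.items():
            if name != "id":             # avoid duplicate IDs in the published output
                placeholder.set(name, value)
        placeholder.tail = tail
    return root


note = resolve_conrefs(ET.fromstring(TASK)).find(".//note")
print(note.get("type"), "-", note.text)
```

Because the referenced fragment is translated once, in its own file, the same translated text is dropped unchanged into every context that references it.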
However, unique problems can arise with conref during localization.
A translated fragment does not always fit back into the context of the translated content that references it. The result can contain grammatical, directional, or cultural errors. For example, “She asked Bob” is correct, but “Bob asked she” is wrong: the pronoun must change form when its role in the sentence changes, and a reused fragment cannot adapt that way.
The best way to avoid errors when reusing content through DITA content references is to avoid complex usage and follow a few good practices:
- At the inline level, use content references only for items such as <uicontrol> elements, company names, and product names
- Avoid using conref for incomplete phrases. Every conreffed sentence should be grammatically complete
- Avoid using conref for common nouns
- When reusing proper nouns, keep the proper noun in the nominative case and use it as the subject of the sentence
- Reuse content at the topic level where possible; avoid reuse at the phrase level, and reuse at the word level only in exceptional cases such as user interface elements
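Guidelines like these are easier to enforce when a script or a CCMS rule flags risky reuse automatically. The Python sketch below assumes a house rule set in which block-level conrefs, and inline conrefs on `<uicontrol>` and `<keyword>`, pass, while any other inline reuse is flagged for review. The element lists and the `review_conrefs` function are hypothetical and would need tuning to your own content model.

```python
import xml.etree.ElementTree as ET

# Hypothetical house rules derived from the guidelines above.
INLINE_ALLOWED = {"uicontrol", "keyword"}   # UI labels, company and product names
BLOCK_ELEMENTS = {"p", "note", "section", "step", "steps", "table", "ul", "ol", "li", "fig"}


def review_conrefs(topic_xml):
    """Flag conref/conkeyref usage that falls outside the allowed levels."""
    findings = []
    for elem in ET.fromstring(topic_xml).iter():
        ref = elem.get("conref") or elem.get("conkeyref")
        if ref is None or elem.tag in BLOCK_ELEMENTS or elem.tag in INLINE_ALLOWED:
            continue
        findings.append(
            f"<{elem.tag}> reuses '{ref}' at phrase level; "
            "check that it reads correctly in every target language"
        )
    return findings


TOPIC = """<concept id="intro">
  <title>Introduction</title>
  <conbody>
    <p>Click <uicontrol conkeyref="ui-strings/save"/> to store your changes.</p>
    <p>The <ph conref="phrases.dita#phrases/adj-portable"/> device starts automatically.</p>
    <note conref="warnings.dita#warnings/unplug"/>
  </conbody>
</concept>"""

for finding in review_conrefs(TOPIC):
    print(finding)
```

Only the `<ph>` that injects an adjective into a sentence is flagged; this is exactly the kind of fragment that breaks when the target language needs a different word order or agreement.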
Streamline Your Workflows
An agile approach to localization—where content is packaged and sent for translation as soon as it is ready—accelerates the process. This method relies on well-defined workflows, clear communication with your translation provider, and meticulous version control to ensure translations remain current with source content updates.
Some rules for streamlining the workflow:
- Let your translation provider know about the DITA version and tools you are using
- Before sending files for translation, make sure you have fixed all errors and validation issues, removed extra spaces, and checked your conref attributes, conkeyref attributes, and DITA folder structure.
- When sending your files for translation, also send a published PDF version of your English content for reference. That way, translators can understand the context of the content.
However, if your CCMS provides a robust collaborative review tool, that is an even better way to go. Translators can sign in and see rendered versions of the source content for context.
- Clarify the format and standard of the files your Language Service Provider (LSP) will deliver. They must not have any broken links or code errors and must retain the original markup of the source document.
- Clarify beforehand whether you or the LSP will set the xml:lang attribute that specifies the language of the translated DITA files.
- Mark elements and topics that should not be translated with the translate attribute set to “no”. You can also create a list of such non-translatable elements.
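Parts of that checklist can be automated. The Python sketch below runs a few pre-flight checks on an in-memory translation package: well-formedness, a missing `xml:lang` on the root element, runs of extra spaces, and `conref` targets that are not included in the package. The package layout and the `preflight` function are hypothetical; validation against the DITA DTDs, which your CCMS or the DITA Open Toolkit can perform, is out of scope here.

```python
import posixpath
import re
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"   # how ElementTree names xml:lang

# Hypothetical translation package: relative path -> file content.
PACKAGE = {
    "topics/install.dita": (
        '<task id="install" xml:lang="en-US"><title>Install  the app</title>'
        '<taskbody><context><note conref="../shared/warnings.dita#w/unplug"/>'
        "</context></taskbody></task>"
    ),
    "shared/warnings.dita": (
        '<topic id="w" xml:lang="en-US"><title>Warnings</title><body>'
        '<note id="unplug">Unplug the device.</note></body></topic>'
    ),
    "topics/broken.dita": '<topic id="broken"><title>Broken</title><body><p>Oops</body></topic>',
}


def preflight(package):
    """Return a list of problems to fix before the package goes to the LSP."""
    problems = []
    for path, source in package.items():
        try:
            root = ET.fromstring(source)
        except ET.ParseError as err:
            problems.append(f"{path}: not well-formed ({err})")
            continue
        if root.get(XML_LANG) is None:
            problems.append(f"{path}: root element has no xml:lang")
        for elem in root.iter():
            for text in (elem.text, elem.tail):
                if text and re.search(r"\S {2,}\S", text):
                    problems.append(f"{path}: extra spaces in {text.strip()!r}")
            ref = elem.get("conref")
            if ref and not ref.startswith("#"):
                # Resolve the file part of the conref relative to the referencing file.
                target = posixpath.normpath(
                    posixpath.join(posixpath.dirname(path), ref.split("#", 1)[0])
                )
                if target not in package:
                    problems.append(f"{path}: conref target {target} is not in the package")
    return problems


for problem in preflight(PACKAGE):
    print(problem)
```

Running a check like this as a gate in the workflow means the LSP never receives a package that would fail on their side.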
Use Metadata & Attributes
Semantic markup enhances the accuracy of machine translations by providing context. This reduces the need for post-translation editing and streamlines the localization process. Elements like UI controls and keywords benefit from being marked up semantically, allowing for automated translation tools to better understand and translate content.
Implement Semantic Markup for Enhanced Machine Translation
Use semantic tags extensively to provide context for machine translation tools. By marking up content semantically (e.g., <uicontrol>, <keyword>), you improve the quality of automated translations, reducing post-translation editing work.
- Elements such as <uicontrol> should be translated separately and then injected at publication
- This means you no longer wait for the UI translation before launching the documentation translation
- Terms should be locked to prevent inadvertent changes during translation
- Translators should be provided with a “pre-translated” version in which those translated terms are already populated, giving them the context of the terms used
You gain in both quality assurance and time to market.
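One way to implement this injection is to give each `<uicontrol>` a key that points into the UI string tables produced by the software localization process, and fill in the text per language at build time. The Python sketch below assumes that design; `UI_STRINGS`, the key names, and `inject_ui_strings` are hypothetical. Running the same step before a topic is sent out is one way to produce the pre-translated version for translators.

```python
import xml.etree.ElementTree as ET

# Hypothetical UI string tables exported from the software localization process.
UI_STRINGS = {
    "en-US": {"menu-file": "File", "btn-save": "Save"},
    "fr-FR": {"menu-file": "Fichier", "btn-save": "Enregistrer"},
}

# Hypothetical task: UI labels are referenced by key, not typed in by the writer.
TASK = """<task id="save">
  <title>Saving your work</title>
  <taskbody>
    <steps>
      <step><cmd>Open the <uicontrol keyref="menu-file"/> menu.</cmd></step>
      <step><cmd>Click <uicontrol keyref="btn-save"/>.</cmd></step>
    </steps>
  </taskbody>
</task>"""


def inject_ui_strings(topic_xml, lang):
    """Fill every keyed <uicontrol> with the approved string for the target language."""
    root = ET.fromstring(topic_xml)
    strings = UI_STRINGS[lang]
    for ui in root.iter("uicontrol"):
        key = ui.get("keyref")
        if key not in strings:
            # Fail the build instead of publishing an untranslated UI label.
            raise KeyError(f"no {lang} string for UI key {key!r}")
        ui.text = strings[key]
    return root


fr = inject_ui_strings(TASK, "fr-FR")
print([ui.text for ui in fr.iter("uicontrol")])
```

Because the labels come from the same source as the software itself, the documentation cannot drift from what users actually see on screen.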
Use Metadata to Streamline Localization Workflows
For this article, I want to identify three types of metadata, though there are probably many more.
- System Metadata: Managed by the CCMS. For example, revision/version numbering, product, author, and last modified date. This metadata can, for example, allow you to easily match the authoring version to the translated version: a visual check in the CCMS lets you quickly confirm that the two correspond.
- Content Metadata: Stored inside the DITA file. For example, audience, conditional processing instructions, product names, and measurement units (imperial vs. metric).
- Build Metadata: Output instructions. For example, publication date, personalization, revision history, and target Content Delivery Platform (CDP). You might deliver service manuals to a system accessible only to field engineers, and even split them by category of engineer, from novice to experienced, or internal versus external engineers, much as Apple does with its Apple Authorized Service Providers and Independent Repair Providers.
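As an example of what system metadata makes possible, the Python sketch below compares the source version each translation was made from with the current source version, and lists the translations that are out of date. The record types and field names are hypothetical; a CCMS would read the same information from its own metadata store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceTopic:
    topic_id: str
    version: int          # current version of the source topic


@dataclass(frozen=True)
class Translation:
    topic_id: str
    lang: str
    source_version: int   # source version the translation was made from


def stale_translations(sources, translations):
    """Return translations whose source has changed (or no longer exists)."""
    current = {s.topic_id: s.version for s in sources}
    return [t for t in translations if t.source_version != current.get(t.topic_id)]


SOURCES = [SourceTopic("install", 3), SourceTopic("safety", 5)]
TRANSLATIONS = [
    Translation("install", "fr-FR", 3),
    Translation("install", "de-DE", 2),
    Translation("safety", "fr-FR", 5),
]

for t in stale_translations(SOURCES, TRANSLATIONS):
    print(f"{t.topic_id} ({t.lang}) was translated from v{t.source_version}")
```

The same comparison could drive the next translation package: only the stale topics need to go back to the LSP.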
DITA elements also have several attributes that support localization and translation. Using these attributes gives the translator better context, but even better is for your CCMS to perform actions based on the attribute values.
- @xml:lang identifies the language of the content, using standard language and country codes. For instance, French Canadian is identified by the value fr-CA. The @xml:lang attribute asserts that all content and attribute values within the element bearing the attribute are in the specified language, except for contained elements that declare a different language. When sending content for translation, you can set the workflow to package and send only the files relevant to each target language. So, my fr-FR topic would go to the French (France) translator, or would invoke the correct translation memory.
- @translate determines whether the element requires translation. A default value can often be inferred from the element type. For example, <uicontrol> might be untranslated by default, whereas <p> might need to be translated by default.
A common case is @translate="no". Set up the workflow to render the content tagged with that attribute as read-only. Do not exclude the content, because it might provide context for the translator. Your workflow could also revert that content to the source language on import, just in case someone did not get the instruction not to translate it.
- @dir determines the direction in which the content should be rendered. Not all languages use a left-to-right (LTR) script like English. Many languages, such as Arabic, Hebrew, Urdu, and a dozen more, use right-to-left (RTL) scripts. It is best practice to set the @dir attribute so that the publication system renders the translations correctly. The @dir attribute comes into play if you are really “thinking global”! Do not wait until you sign that Qatari contract to instruct your writers to set the rendering direction. This will save a lot of pain in the future.
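The import step for @translate, combined with setting @xml:lang and @dir, can be sketched in Python. The sketch assumes that every element marked translate="no" carries an @id, so the source and translated files can be matched element by element. The `import_translation` function and the short list of RTL language codes are hypothetical and incomplete.

```python
import copy
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"   # how ElementTree names xml:lang
RTL_LANGS = {"ar", "fa", "he", "ur"}   # illustrative, not a complete list of RTL languages


def import_translation(source_xml, translated_xml, target_lang):
    """Revert translate="no" content to the source, then set xml:lang and dir."""
    source = ET.fromstring(source_xml)
    translated = ET.fromstring(translated_xml)
    protected = {e.get("id"): e for e in source.iter()
                 if e.get("translate") == "no" and e.get("id")}
    for elem in list(translated.iter()):
        original = protected.get(elem.get("id"))
        if original is None:
            continue
        # Restore the source content in case it was translated anyway; keep the tail.
        tail = elem.tail
        elem.clear()
        elem.attrib.update(original.attrib)
        elem.text = original.text
        elem.extend([copy.deepcopy(child) for child in original])
        elem.tail = tail
    translated.set(XML_LANG, target_lang)
    if target_lang.split("-")[0].lower() in RTL_LANGS:
        translated.set("dir", "rtl")
    return translated


SOURCE = ('<topic id="logs" xml:lang="en-US"><title>Viewing logs</title><body>'
          '<p>Run this command:</p>'
          '<codeblock id="cmd" translate="no">tail -f app.log</codeblock></body></topic>')
TRANSLATED_FR = ('<topic id="logs" xml:lang="en-US"><title>Consulter les journaux</title><body>'
                 '<p>Exécutez cette commande :</p>'
                 '<codeblock id="cmd" translate="no">queue -f app.log</codeblock></body></topic>')

fr = import_translation(SOURCE, TRANSLATED_FR, "fr-FR")
print(fr.find(".//codeblock").text, fr.get(XML_LANG), fr.get("dir"))
```

The sketch covers only the safety net on import; showing protected content as read-only is the job of the translation tool.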
Use permissions to control who does what
Your source control system, usually a CCMS, must allow you to manage permissions at a granular level:
- Who is allowed to create values for attributes?
- Who is allowed to apply them?
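These two questions map naturally onto two separate permissions. The Python sketch below models them as a hypothetical role policy in which only information architects can create new attribute values, while writers can only apply existing ones. The roles, `ALLOWED_VALUES`, and `set_attribute` are illustrations, not the permission model of any particular CCMS.

```python
from enum import Enum, auto


class Action(Enum):
    CREATE_VALUE = auto()   # add a new allowed value, e.g. a new audience
    APPLY_VALUE = auto()    # set an existing value on an element


# Hypothetical role policy.
POLICY = {
    "information-architect": {Action.CREATE_VALUE, Action.APPLY_VALUE},
    "writer": {Action.APPLY_VALUE},
    "translator": set(),
}

# Hypothetical controlled values per attribute.
ALLOWED_VALUES = {"audience": {"novice", "expert"}}


def set_attribute(role, attrs, att, value):
    """Set attrs[att] = value if the role's permissions allow it."""
    granted = POLICY.get(role, set())
    if Action.APPLY_VALUE not in granted:
        raise PermissionError(f"{role} may not apply attribute values")
    if value not in ALLOWED_VALUES.get(att, set()):
        if Action.CREATE_VALUE not in granted:
            raise PermissionError(f"{role} may not create the {att} value {value!r}")
        ALLOWED_VALUES.setdefault(att, set()).add(value)
    attrs[att] = value


topic_attrs = {}
set_attribute("writer", topic_attrs, "audience", "expert")
print(topic_attrs)
```

Keeping value creation with a small group prevents the proliferation of near-duplicate attribute values that later have to be untangled in every language.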
And finally
- Try out controlled language. Simplified Technical English, for example, is far easier to localize than free-form English.
- Invest in the linguistic review process. Retain in-country reviewers or pay the LSP for a localization review.
- Do not forget tools for content quality control. Checks for passive voice or forbidden terminology can be done on the fly.
- Plan how you will maintain your translation memories.
- Plan for managing multilingual screenshots. This is a topic for a full article.
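As an illustration of an on-the-fly quality check, here is a deliberately crude Python heuristic for passive voice: a form of “to be” followed by a word ending in -ed or -en. It is an assumption-laden sketch, not how commercial checkers work; it misses irregular participles such as “built” and flags some adjectives such as “hidden”.

```python
import re

# Crude heuristic: a form of "to be" followed by a word ending in -ed or -en.
PASSIVE = re.compile(
    r"\b(?:am|is|are|was|were|be|been|being)\s+\w+(?:ed|en)\b", re.IGNORECASE
)


def flag_passive(text):
    """Return the phrases that look like passive constructions."""
    return [m.group(0) for m in PASSIVE.finditer(text)]


print(flag_passive("The file is saved automatically. The log was opened earlier."))
```

Even a rough check like this, run while writing, nudges authors toward the active voice that controlled languages such as Simplified Technical English favor.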
Ongoing Localization Maintenance
Localization is not a one-off task but a continuous process that adapts to product updates, legal changes, and cultural shifts. Planning for regular updates and maintenance of localized content is essential for keeping documentation relevant and engaging across all markets.
Planning for localization from the beginning of your documentation process ensures that your content is adaptable, relevant, and effective across global markets. Following these guidelines not only improves user experience but also optimizes your content production workflow for efficiency and compliance. Remember, localization is an ongoing process that requires diligence, a deep understanding of your target audience, and a commitment to continuous improvement and adaptation.
For more insights and resources on localization and translation, explore MadCap Software's blogs and services, where you will find a wealth of information tailored to help you navigate the challenges of creating globally accessible content.