What is Linguistic Quality Estimation for machine translation?
As part of Smartling’s AI Toolkit, Linguistic Quality Estimation (LQE) can help you achieve cost savings for content that goes through a machine translation step, followed by human Post-Edit, Edit, or Review steps (commonly referred to as MTPE).
Linguistic Quality Estimation uses AI to predict the quality of each string translated via machine translation. This means that Smartling can estimate the quality of machine translations, giving insight into how much editing may be required by a linguist in any post-translation steps before the string can be published.
Based on this analysis, human validation can be utilized in the most efficient way, for example by setting up a dynamic workflow based on the predicted quality of the translation.
How are strings assessed?
There are two options available for Linguistic Quality Estimation (LQE) assessment:
Standard
Smartling uses a Large Language Model to assess the machine translation output based on four criteria:
- Grammatical correctness
- Fluency
- Semantic coherence
- Lexical accuracy
The machine translation output is also checked against your Linguistic Assets:
- The machine translation output is compared to matches from your Translation Memory that may be available.
- The machine translation output is evaluated based on the criteria established in your Quality Check Profile. If any quality check errors are detected, it is assumed that a high level of human effort will be needed during the editing process.
- If Glossary Compliance checks are enabled, Linguistic Quality Estimation also verifies if any available glossary terms were applied in the machine translation output.
- If a Style Guide using the Smartling template is available, the machine translation output is compared against elements such as the “Do not translate” language conventions. For optimal performance, it is also recommended to specify the industry (domain) of your organization or brand in the Style Guide.
Fine-tuned model
Smartling offers the option to purchase a custom fine-tuned model for Linguistic Quality Estimation (LQE) assessments. Because the model is trained on your specific data, it provides higher-quality estimation results, and the three levels (High, Medium, and Low) can be assigned with much greater accuracy. Instead of relying on a large language model (LLM) for LQE, fine-tuned models use machine learning through Cross-lingual Language Modeling (XLMR), similar to training a custom MT engine.
However, unlike custom MT engine training, where you need a different model for each locale pair, you only need one fine-tuned model for LQE, which will be used for all your locale pairs based on one source language and your target language(s).
Once a fine-tuned model has been created, you can select it for LQE within the workflow step configuration.
If you are interested in using fine-tuned models for LQE, please reach out to your Customer Success Manager to discuss this option.
Linguistic Quality Estimation levels
During the machine translation step, each string gets assigned a label, based on the predicted quality level of the translation.
- High: High translation quality predicted. For strings with the label "High", the human post-translation step can potentially be skipped as part of a Dynamic Workflow.
- Medium: Medium translation quality predicted. These strings are often understandable in the target language, but require some validation or light editing to ensure an idiomatic translation.
- Low: Low translation quality predicted. For strings with the label "Low", we recommend never skipping the human post-translation step. These strings typically require more extensive human validation and editing.
Where are Linguistic Quality Estimation levels displayed?
Once your content has gone through LQE, the level assigned to each string can be used to analyze and take action on your MTPE workflows.
Linguistic Quality Estimation in the Strings View
The level assigned to each machine translated string can be checked in the Strings View:
- In Smartling's default view, the Linguistic Quality Estimation level is displayed in the Translation column, below the saved translation.
- Alternatively, the Linguistic Quality Estimation level can be displayed as a separate column in the Strings View, by creating a Custom View.
The Linguistic Quality Estimation filter allows you to filter the Strings View by the predicted quality level.
Linguistic Quality Estimation in the Word Count Report
To help you achieve cost savings for content going through a human post-translation step, your Word Count Report displays the Linguistic Quality Estimation levels for content that was submitted from a machine translation step.
Linguistic Quality Estimation Level Discounts in your Fuzzy Match Profile
Based on your vendor agreement with your Language Services Provider, lower post-editing rates may apply for content with a higher estimated quality level.
Once the AI Toolkit has been enabled for your Smartling account, you can create a Fuzzy Match Profile that takes into account potential discounts based on Linguistic Quality Estimation levels. You can enter the payable rate percentage for each level, as per your agreement with your translation vendor.
As soon as the AI Toolkit has been enabled for your account, you also have the option to update your existing Fuzzy Match Profiles with Linguistic Quality Estimation Level Discounts.* Please note that this can be done only once. As soon as Linguistic Quality Estimation Level Discounts have been populated and saved for your existing Fuzzy Match Profile, it is not possible to edit them at a later stage. To avoid altering Fuzzy Match Profiles during an ongoing billing period, a new profile needs to be created if Linguistic Quality Estimation Level Discounts need to be amended at a later point.
For Linguistic Quality Estimation Level Discounts to take effect, make sure that your Fuzzy Match Profile is also applied to the post-translation step.
*It is not possible to add Linguistic Quality Estimation levels to the system-generated default profiles, 'Default' and 'No Discount'. If you need to use one of these default profiles as your account default or for certain workflows and add Linguistic Quality Estimation discounts, you will need to create a copy of the system-generated profile and use that one to add Linguistic Quality Estimation discounts.
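To illustrate how payable rate percentages per level could affect post-editing costs, here is a minimal arithmetic sketch. It is not Smartling's billing logic: the percentages, the per-word rate, the word counts, and the function name `post_edit_cost` are all hypothetical values chosen for illustration. Your actual figures depend on your vendor agreement and your Fuzzy Match Profile.

```python
from decimal import Decimal

# Hypothetical payable rate percentages per LQE level (assumption, not
# Smartling defaults). Real values come from your vendor agreement.
PAYABLE_RATE_PERCENT = {
    "High": Decimal("40"),
    "Medium": Decimal("70"),
    "Low": Decimal("100"),
}


def post_edit_cost(word_counts, base_rate_per_word,
                   payable_rate_percent=PAYABLE_RATE_PERCENT):
    """Sum words * base rate * payable percentage over all LQE levels."""
    total = Decimal("0")
    for level, words in word_counts.items():
        percent = payable_rate_percent[level]
        total += words * base_rate_per_word * percent / Decimal("100")
    return total


# Hypothetical word counts per level, similar to the breakdown a
# Word Count Report might show for machine translated content.
word_counts = {"High": 1000, "Medium": 500, "Low": 200}

# Hypothetical post-editing rate of 0.10 per word.
cost = post_edit_cost(word_counts, Decimal("0.10"))
full_price = post_edit_cost(
    word_counts,
    Decimal("0.10"),
    {level: Decimal("100") for level in word_counts},
)
print("With LQE level discounts:", cost)
print("Without discounts:", full_price)
```

In this made-up scenario, most of the savings come from the "High" strings, which is why the share of content in each level matters more than the total word count.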
Linguistic Quality Estimation in Cost Estimates
Linguistic Quality Estimation levels and discounts are shown in the cost estimate for your translation Job:
- If Linguistic Quality Estimation is enabled for the workflow used to generate the cost estimate, and
- if Linguistic Quality Estimation Level Discounts are set up in the associated Fuzzy Match Profile.
In order for Linguistic Quality Estimation levels to be displayed, the cost estimate is refreshed automatically once all content has gone through the LQE process in the machine translation step.
Linguistic Quality Estimation in the CAT Tool
The linguists working in a post-translation step can see the LQE level assigned to each string in the CAT Tool. This helps them gauge the quality of the translation.
Info: For strings with a fuzzy match discount, only the fuzzy match percentage is displayed in the Word Count Report and in the CAT Tool. The Linguistic Quality Estimation level is not displayed in the Word Count Report or in the CAT Tool if the available translation memory match qualifies for a discount, as per your Fuzzy Match Profile.
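The display rule above can be sketched as a small function. This is only an illustration of the precedence described in the note, not Smartling's implementation: the function name, the single `discounted_min_percent` threshold, and its value are assumptions, since a real Fuzzy Match Profile defines its own discounted match tiers.

```python
def displayed_label(lqe_level, fuzzy_match_percent, discounted_min_percent=75):
    """Return the label shown for a string, following the rule in the note.

    discounted_min_percent is a hypothetical stand-in for "the match
    qualifies for a discount as per your Fuzzy Match Profile".
    """
    has_discounted_match = (
        fuzzy_match_percent is not None
        and fuzzy_match_percent >= discounted_min_percent
    )
    if has_discounted_match:
        # The fuzzy match percentage takes precedence over the LQE level.
        return f"{fuzzy_match_percent}% fuzzy match"
    return lqe_level


print(displayed_label("Medium", 85))    # match qualifies for a discount
print(displayed_label("Medium", 50))    # match below the assumed threshold
print(displayed_label("High", None))    # no translation memory match
```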
Content routing based on Linguistic Quality Estimation levels
When creating a Dynamic Workflow, Linguistic Quality Estimation levels can be used as a criterion to route content to different workflow steps:
- Linguistic Quality Estimation levels are applied during the machine translation step.
- A Dynamic Workflow with a post-translation Decision step can be created to route the machine translated content based on the estimated quality level.
- Based on the quality level, the strings can be sent to different branches in the workflow.
For example, strings with the label “High”, which are unlikely to require any human intervention, could be sent directly to the Published step, while strings with the label “Medium” are likely to require some light editing and could be sent to a Review step for final touches. Strings with the label “Low”, which are very likely to need a medium or high level of editing, could be sent to a Post-Edit step.
Example workflow:
Routing content to the right step based on its estimated quality level can help save translation costs and utilize human editing in the most efficient way.
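The routing logic of such a Decision step can be pictured with the following sketch. In Smartling, Dynamic Workflows are configured in the workflow settings, not written as code, so this is purely a conceptual model. The dictionary fields (`id`, `lqe_level`), the function name `route_strings`, and the fallback to Post-Edit for strings without a level are assumptions made for this example.

```python
from collections import defaultdict

# Level-to-step mapping taken from the example in this article.
ROUTES = {"High": "Published", "Medium": "Review", "Low": "Post-Edit"}


def route_strings(strings, routes=ROUTES, fallback_step="Post-Edit"):
    """Group string IDs by the workflow step their LQE level maps to.

    Strings without a recognized level go to fallback_step. That fallback
    is an assumption of this sketch, chosen to be the most cautious branch.
    """
    branches = defaultdict(list)
    for string in strings:
        step = routes.get(string.get("lqe_level"), fallback_step)
        branches[step].append(string["id"])
    return dict(branches)


# Hypothetical strings leaving the machine translation step.
strings = [
    {"id": "s1", "lqe_level": "High"},
    {"id": "s2", "lqe_level": "Medium"},
    {"id": "s3", "lqe_level": "Low"},
    {"id": "s4", "lqe_level": None},  # no level assigned
]

routed = route_strings(strings)
for step, ids in routed.items():
    print(step, ids)
```

Sending unlabeled strings to the most thorough human step is the conservative choice here; a real workflow may handle them differently.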
How to enable Linguistic Quality Estimation
Linguistic Quality Estimation is only available as part of Smartling’s AI Toolkit. To learn more or to enable the AI Toolkit for your account, please reach out to your Smartling Customer Success Manager.
Once the AI Toolkit has been activated for your account, Linguistic Quality Estimation then needs to be enabled on a workflow step level:
- In your Smartling project, navigate to Settings > Workflows.
- On a workflow that uses machine translation, click Manage Step on the Translation step.
- Ensure that the toggle for Linguistic Quality Estimation is switched On.
By default, this setting is disabled.
Tip: If this setting is not visible for your workflow, make sure to select an MT Profile in the "Translation Provider" dropdown first.
- [Optional Add-On] If you have purchased a fine-tuned model for LQE, select your custom model from the dropdown menu.
Supported workflows
We recommend enabling Linguistic Quality Estimation only on MTPE workflows, i.e. workflows with a machine translation step that is followed by a human Post-Edit, Edit, or Review step.
Linguistic Quality Estimation is not available for workflows managed by Smartling Language Services, including AI-Powered Human Translation.
Important considerations
SmartMatched strings are not evaluated
If a string uses SmartMatch, no Linguistic Quality Estimation level is assigned, as the string bypasses the machine translation step.
Supported languages
While Linguistic Quality Estimation is available for all locales supported by Smartling, we recommend using it for the most common, high-resource languages for the best results.