Smartling's Translation Memory Optimization feature enables you to create an optimized copy of an existing TM or merge multiple TMs into one optimized copy. This process helps generate a clean dataset for translating with Google Adaptive.
Requesting an Optimized TM is a five-step process in which you define the criteria for removing, repairing, or flagging specific TM units:
- Language Selection: Select the languages in the TM you want to retain and optimize.
- Cleaning Preferences: Specify which units should be removed.
- Repair Preferences: Specify which units should be repaired.
- Flag Preferences: Specify which units should be flagged for review.
- Review Configuration: Review a summary of your TM optimization request before submitting.
This article outlines the cleaning, repair, and flag preferences, along with examples to help you better understand how each option works.
TM Optimization is currently available as a beta feature. If you're interested in optimizing your TM for translating with Google Adaptive, please contact your Smartling Customer Success Manager to have this feature enabled in your account.
Cleaning Preferences
Remove unbalanced units
Removes units where the translation has a different number of sentences than the source.
Example: This unit would be removed because the translation is one sentence, while the source is two sentences.
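The sentence-count comparison can be sketched in Python as follows. This is only an illustration: the splitting rule and the names `count_sentences` and `is_unbalanced` are assumptions, not Smartling's actual implementation.

```python
import re

def count_sentences(text: str) -> int:
    # Naive rule (assumption): a sentence ends at ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([p for p in parts if p])

def is_unbalanced(source: str, target: str) -> bool:
    # A unit is unbalanced when source and translation have different sentence counts.
    return count_sentences(source) != count_sentences(target)
```

A real sentence splitter would also need to handle abbreviations and languages that use different sentence-ending punctuation.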
Remove units created before a date
Removes units created before a specified date. You can see when a unit was first created in the Activity tab.
Example: This unit was created on Jan 29, 2024.
Remove units created after a date
Removes units created after a specified date. As with the previous option, you can see when a unit was first created in the Activity tab.
Remove units by user
Removes units first translated by the specified users. You can see who translated a unit by checking the author in the Activity tab.
Example: This unit was first translated by Smartling Auto Select MT.
Remove units by project
Removes units tied to specific projects. One unit can be tied to multiple projects. All units tied to the project(s) you select will be removed from the optimized TM. You can see which project a unit is tied to by checking the source in the Activity tab.
Example: This unit would be removed if you selected either "Test Project" or "Example Project".
Remove units where source and target are the same
Removes units where the target is exactly the same as the source.
Example: This unit would be removed.
Delete old units with the same text and keep only the latest version
If there are multiple units with the same source text or translation, only the most recently translated unit will be kept. You can check when a unit was last translated by viewing the date under "Last Updated."
Example: Only the first entry listed below would be kept; the second entry would be removed.
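The "keep only the latest version" rule can be sketched in Python as follows. The sketch assumes each unit is a dictionary with hypothetical `source`, `target`, and `last_updated` fields, and it groups units by source text only; how duplicates are actually grouped is not specified here.

```python
from datetime import date

def keep_latest(units):
    # Keep one unit per source text: the one with the most recent last_updated date.
    latest = {}
    for unit in units:
        key = unit["source"]  # assumption: duplicates are identified by source text
        if key not in latest or unit["last_updated"] > latest[key]["last_updated"]:
            latest[key] = unit
    return list(latest.values())
```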
Remove units based on length
Removes units where the translation is either too short or too long based on the specified word count.
Remove misaligned target entries according to length
Removes units whose target text is too short or too long (in characters) relative to its source text. Set a minimum length ratio, a maximum length ratio, or both.
ratio = length of translation / length of source string
Example: If the source string is 40 characters and the translation is 2 characters, the length ratio would be 0.05 (2 / 40 = 0.05). This low ratio means there is a high probability that the translation is incorrect.
- If you set a minimum length ratio of 0.05, units with translations shorter than 2 characters would be removed.
- If you set a maximum length ratio of 4, units with translations longer than 160 characters would be removed (160 / 40 = 4).
- If both minimum and maximum length ratios are specified, any units with translations shorter than 2 characters or longer than 160 characters will be removed.
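The ratio check described above can be sketched in Python as follows. The function names are hypothetical, and the handling of values exactly at a threshold (kept rather than removed) is an assumption inferred from the wording "shorter than" and "longer than".

```python
def length_ratio(source: str, target: str) -> float:
    # ratio = length of translation / length of source string (in characters)
    if not source:
        raise ValueError("source string must not be empty")
    return len(target) / len(source)

def should_remove(source, target, min_ratio=None, max_ratio=None):
    ratio = length_ratio(source, target)
    if min_ratio is not None and ratio < min_ratio:
        return True   # translation is too short relative to the source
    if max_ratio is not None and ratio > max_ratio:
        return True   # translation is too long relative to the source
    return False
```

With a 40-character source, `should_remove(source, "a", min_ratio=0.05)` removes the unit because the 1-character translation falls below the 2-character floor.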
Repair Preferences
Fix whitespaces
Removes any extra whitespaces in the source, target, or both.
Example: The extra whitespaces would be removed from this unit's source and translation.
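A whitespace repair of this kind can be sketched in Python as follows, assuming "extra" whitespace means leading and trailing whitespace plus runs of more than one whitespace character. The function name `fix_whitespace` is hypothetical.

```python
import re

def fix_whitespace(text: str) -> str:
    # Collapse every run of whitespace to a single space, then trim both ends.
    return re.sub(r"\s+", " ", text).strip()
```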
Remove list bullet points
If a unit contains bullet points, the bullet points will be removed from the source and from the translation.
Example: The bullet points would be removed from this unit's source and translation.
Remove em and en dashes
If a unit's source or translation begins with em dashes (—) or en dashes (–), the dashes will be removed.
Example: The em dash (—) would be removed from this unit's translation.
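Stripping leading dashes can be sketched in Python as follows. The pattern and the name `strip_leading_dashes` are assumptions for illustration; only dashes at the start of the text are removed, as described above.

```python
import re

# Leading em dashes (U+2014) or en dashes (U+2013), with optional surrounding whitespace.
LEADING_DASHES = re.compile(r"^\s*[\u2014\u2013]+\s*")

def strip_leading_dashes(text: str) -> str:
    return LEADING_DASHES.sub("", text)
```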
Fix HTML entities
Fixes escaped HTML entities such as non-breaking spaces or other symbols in the source, target, or both.
Example: The HTML entity &amp; in the translation would be fixed.
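One plausible way to repair escaped entities is Python's standard `html.unescape`, shown below. Whether the feature decodes entities exactly this way is an assumption; the wrapper name `fix_html_entities` is hypothetical.

```python
import html

def fix_html_entities(text: str) -> str:
    # Decode escaped entities such as &amp; or &nbsp; into the characters they represent.
    return html.unescape(text)
```

Note that `&nbsp;` decodes to the non-breaking space character (U+00A0), not a regular space.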
Flag Preferences
Flag repeated words in both source and target units
Flags units where a word in the source or the target is repeated consecutively.
Example: This unit would be flagged.
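A consecutive-repetition check can be sketched in Python with a backreference. The pattern and function names are assumptions for illustration, and the comparison here is case-insensitive.

```python
import re

# A word followed by whitespace and the same word again.
REPEATED_WORD = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)

def has_repeated_word(text: str) -> bool:
    return REPEATED_WORD.search(text) is not None

def should_flag(source: str, target: str) -> bool:
    # Flag the unit if either side contains a consecutive repetition.
    return has_repeated_word(source) or has_repeated_word(target)
```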
Match tags and placeholders
Flags units where target tags and placeholders do not match those in the source.
Example: This unit would be flagged because the translation is missing the placeholder.
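The comparison can be sketched in Python as follows. The placeholder syntax is an assumption: this sketch recognizes only `{name}`-style placeholders, `%s`/`%d`, and simple HTML-like tags, whereas real placeholder formats depend on the file type and project configuration.

```python
import re
from collections import Counter

# Assumed token shapes: {name}, %s / %d, and tags such as <b> or </b>.
TOKEN = re.compile(r"\{\w+\}|%[sd]|</?\w+[^>]*>")

def tokens_match(source: str, target: str) -> bool:
    # Order may differ between languages, so compare tokens as multisets.
    return Counter(TOKEN.findall(source)) == Counter(TOKEN.findall(target))
```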
Match bracket pairs
Flags units where bracket pairs in the translation do not match those in the source. Brackets include square brackets, curly brackets, or parentheses. This option only considers the quantity of brackets and does not check for inconsistencies in the types of brackets used.
Example: This unit would be flagged.
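Because this option compares only the quantity of brackets, a minimal Python sketch is a simple character count. The function names are hypothetical.

```python
BRACKETS = "()[]{}"

def count_brackets(text: str) -> int:
    # Total number of bracket characters, regardless of type.
    return sum(text.count(ch) for ch in BRACKETS)

def brackets_mismatch(source: str, target: str) -> bool:
    return count_brackets(source) != count_brackets(target)
```

In this sketch, a translation that swaps parentheses for square brackets is not flagged, since the totals still match.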
Match URLs
Flags units where target URLs do not match those in the source. URLs need to have www, http, or https prefixes to be recognized as a URL.
Example: This unit would be flagged because the URL in the translation does not match the URL in the source.
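The URL comparison can be sketched in Python as follows. The pattern follows the prefix rule above (www, http, or https), but its exact shape, the stripping of trailing punctuation, and the function names are assumptions.

```python
import re

# Only text starting with http://, https://, or www. is treated as a URL.
URL = re.compile(r"(?:https?://|www\.)\S+")

def extract_urls(text: str) -> list:
    # Drop sentence punctuation that trails a URL, then sort for comparison.
    return sorted(u.rstrip(".,;:!?)") for u in URL.findall(text))

def urls_mismatch(source: str, target: str) -> bool:
    return extract_urls(source) != extract_urls(target)
```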
Match quotation marks
Flags units where the number of quotation marks in the translation does not match the number in the source. This option only considers the quantity of quotation marks and does not check for inconsistencies in the types of quotation marks used.
Example: This unit would not be flagged.
This unit would be flagged.
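Since only the quantity of quotation marks is compared, the check can be sketched in Python as a character count. Which characters count as quotation marks is an assumption here: straight double quotes, curly double quotes, low double quotes, and guillemets.

```python
# Assumed set: " plus U+201C, U+201D, U+201E, and the guillemets U+00AB, U+00BB.
QUOTE_CHARS = '"\u201c\u201d\u201e\u00ab\u00bb'

def count_quotes(text: str) -> int:
    return sum(text.count(ch) for ch in QUOTE_CHARS)

def quotes_mismatch(source: str, target: str) -> bool:
    # Types are ignored; only the totals are compared.
    return count_quotes(source) != count_quotes(target)
```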
Match numerical tokens
Flags units where the number of numerical tokens in the translation does not match the number in the source.
Example: This unit would be flagged because the number “1050” is missing from the translation.
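Counting numerical tokens can be sketched in Python as follows. The token pattern is an assumption: a run of digits, optionally with `.` or `,` separators, is treated as one token, so "1,050" and "1.050" each count once.

```python
import re

# Digits with optional . or , separators between digit groups.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")

def number_count_mismatch(source: str, target: str) -> bool:
    return len(NUMBER.findall(source)) != len(NUMBER.findall(target))
```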
Match alphanumerical tokens
Flags units where the number of alphanumerical tokens in the translation does not match the number in the source.
Example: This unit would be flagged because the alphanumerical token “T2273W” is missing from the translation.
Match uppercase words
Flags units where the number of uppercase words in the translation does not match the number in the source. Uppercase words are words written entirely in capital letters (e.g., “NEW STYLES”).
Example: This unit would be flagged.
Match CamelCase words
Flags units where the number of camelCase words in the target does not match the number in the source.
Example: This unit would be flagged because the source contains two camelCase words, but none are present in the translation.
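Counting camelCase words can be sketched in Python as follows. The definition used here is an assumption: a word with at least one uppercase letter after its first character, followed by lowercase letters or digits (for example, "userName" or "JavaScript").

```python
import re

# One leading letter, optional lowercase run, then one or more "humps"
# (an uppercase letter followed by lowercase letters or digits).
CAMEL_CASE = re.compile(r"\b[A-Za-z][a-z0-9]*(?:[A-Z][a-z0-9]+)+\b")

def count_camel_case(text: str) -> int:
    return len(CAMEL_CASE.findall(text))

def camel_case_mismatch(source: str, target: str) -> bool:
    return count_camel_case(source) != count_camel_case(target)
```

Ordinary capitalized words ("Hello") and fully uppercase words ("NEW") do not match this pattern.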