🪡 Introduction to Strings & String Uniqueness

The basic unit of content in Smartling is the ‘string.’ Strings are extracted from source files, such as documents and resource files, that are sent to Smartling for translation. Depending on the format of the source content, how the file was prepared, and how it is processed, a string could be a single character, a word, a sentence, or a paragraph of text.

Repetitions

When the same piece of text appears multiple times in the source content, it is referred to as a ‘repetition.’ Repetitions found in source content sometimes need to be deduplicated and translated once as a single string in Smartling. Sometimes they need to be kept separate and translated individually.

For example, on a real-estate website, the word ‘Home’ might refer to both the ‘home page’ and to a ‘house.’ Since the source word ‘Home’ might be different in translation for each of these cases, we must ensure that the same source text can appear as separate strings in Smartling to facilitate different translations depending on context. On the other hand, large numbers of unnecessary repetitions can inflate translation costs and make estimating and managing workloads challenging. So, it also needs to be possible to remove duplicates when they’re not needed.

Continue reading for Recommendations for Controlling Repetition Behavior.

How Smartling Creates Unique Strings

After content is uploaded to Smartling, but before it is made available to translators, it goes through the steps of ‘parsing’ and ‘deduplication’. Parsing breaks the source content into strings. Deduplication removes strings that are considered to be duplicates.

Take the HTML file, sample.html, below as an example:

<html>

<head></head>
<body>

  <div>Here is some text to translate</div>
  <div>Here is some text to translate</div>
  <div>Here is some text to translate</div>
</body>

</html>

When this file is uploaded to Smartling, the parsing process extracts three copies of the string ‘Here is some text to translate’ from the file; then the deduplication process discards two of the copies, and presents a single copy of the string for translation. When that single string is translated, the same translation will be used for each of the three strings found in the original file.

Deduplication happens only within the Smartling platform. The translated file still contains all three occurrences, each with the same translation.
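The parse-then-deduplicate flow described above can be sketched in a few lines of Python. This is an illustration only, not Smartling's actual parser: the `DivTextParser` class and the `deduplicate` helper are hypothetical names, and the sketch assumes that only `div` elements mark string boundaries.

```python
from html.parser import HTMLParser


class DivTextParser(HTMLParser):
    """Collects the text of each <div> as one string (hypothetical stand-in for a real parser)."""

    def __init__(self):
        super().__init__()
        self.strings = []
        self._buffer = None  # None means we are currently outside a <div>

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "div" and self._buffer is not None:
            self.strings.append("".join(self._buffer))
            self._buffer = None


def deduplicate(strings):
    """Keeps the first occurrence of each string, preserving order."""
    return list(dict.fromkeys(strings))


SAMPLE = """<html>
<head></head>
<body>
  <div>Here is some text to translate</div>
  <div>Here is some text to translate</div>
  <div>Here is some text to translate</div>
</body>
</html>"""

parser = DivTextParser()
parser.feed(SAMPLE)
parsed = parser.strings        # three copies, one per <div>
unique = deduplicate(parsed)   # a single string is presented for translation
print(len(parsed), len(unique))
```

Parsing yields three strings and deduplication reduces them to one, which mirrors what happens to sample.html.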

In deciding which strings to discard as duplicates, the following attributes are considered for each string:

Text

Strings with identical text are deduplicated (shared). Any difference in the source text, including tags, placeholders, and spacing, makes it a new string in Smartling. In the above example, the text is: ‘Here is some text to translate’.

Variant

A variant is an attribute of a string that allows you to save two or more different translations for strings that have the same source text. Variants are required to differentiate strings with the same source text but different contextual meanings, e.g., "home" as a house and "home" as the homepage of a website. In short, strings with the same source text but different variants are considered separate strings in Smartling.

Variants can be set to an arbitrary value for some file types; for others, the variant is set automatically based on the string’s key. In the above example, the variant is not set to anything (i.e., it is null). The variant of a string is displayed in the Smartling dashboard in the Strings View, under the source string.

For more information, read Strings Variants.

Namespace

A namespace is additional metadata that acts as a unique identifier for a file (or URL) of content in Smartling. When you upload files manually, Smartling automatically uses the file name as the namespace, and this cannot be modified. However, if you upload files to Smartling via an API integration, you have control over the namespace and can specify it for each file you push to Smartling. For GDN content, the namespace is set to null by default.

How you choose to control the namespace impacts string uniqueness in Smartling. If strings with the same text and the same variant come from files with two different namespaces, they are considered two strings in Smartling.

For more information, read Namespaces.

Source (GDN only)

The type of content the string was ingested from (e.g., HTML or JavaScript).

In the example, the parser presents the following strings to the deduplication process:

Text	Variant	Namespace
Here is some text to translate	null	sample.html
Here is some text to translate	null	sample.html
Here is some text to translate	null	sample.html

Since all three rows are the same, they are considered duplicates of each other and are collapsed into a single unique string in Smartling. To allow different translations for each copy, the variant or namespace of the strings would need to be modified.
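As a rough sketch of this uniqueness rule, each parsed string can be modeled as a (text, variant, namespace) tuple, with duplicates being tuples that compare equal. The `StringKey` type, the `unique_strings` helper, and the `copy-0` style variant values are hypothetical names chosen for illustration. The GDN-only source attribute is left out for brevity.

```python
from typing import NamedTuple, Optional


class StringKey(NamedTuple):
    """Attributes compared during deduplication (illustrative model only)."""
    text: str
    variant: Optional[str]
    namespace: Optional[str]


def unique_strings(parsed):
    """Collapses entries whose text, variant, and namespace all match."""
    return list(dict.fromkeys(parsed))


text = "Here is some text to translate"

# The three rows from the table: identical in every attribute.
rows = [StringKey(text, None, "sample.html") for _ in range(3)]
collapsed = unique_strings(rows)

# Giving each copy its own (hypothetical) variant keeps the copies separate.
with_variants = [StringKey(text, f"copy-{i}", "sample.html") for i in range(3)]
separate = unique_strings(with_variants)

print(len(collapsed), len(separate))
```

With identical attributes the three rows collapse into one entry; with distinct variants all three survive as separate strings.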

For more information, read Creating Unique String Variants in the GDN.

Parsing Process

String creation and variant setting can be influenced through the parsing process. In the example above, the content is broken into strings based on ‘block-level’ tags (‘DIV’ tags in this case) and the variant is set to null.

Parsing rules differ depending on the file type being processed.

For more information, read How Your Content Is Broken Into Strings.

Recommendations for Controlling Repetition Behavior

First, become familiar with your content: what kind of repetitions it contains and how they need to be handled. You may have different needs for different content types.

If duplicate content needs to be kept separate, then it should have variants defined. Ideally, it should be in a file format that allows keys to be specified, so that the key can be customized and used for the variant. Content that has to be kept separate like this often has a key associated with it anyway, because the application using it must identify the correct string. So, it’s best to make sure this key is being captured by Smartling. This type of content should not be translated in Excel: although Excel will keep strings separate, it auto-generates the variant instead of using the key. CSV is usually a better alternative, and the native file format of the source application may be better still.

If repeated content needs to be deduplicated, the simplest approach is to include it all in the same file without setting a variant (or setting the same variant). This can be done using a CSV file without keys defined.

If content needs to be deduplicated across files, then a custom namespace (or no namespace, i.e., ‘string sharing’) will have to be used, in addition to keeping the same variant for those strings.
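The effect of the namespace on cross-file deduplication can be illustrated with a small sketch. The file names, the string values, and the `shared-ui` namespace below are made up for the example, and `count_unique` is a hypothetical helper, not a Smartling function.

```python
def count_unique(entries):
    """Counts distinct (text, variant, namespace) tuples."""
    return len(set(entries))


# Two hypothetical files that both contain the string "Save".
files = {
    "checkout.csv": ["Save", "Cancel"],
    "profile.csv": ["Save"],
}

# Default behavior for manual uploads: the file name is the namespace.
per_file = [
    (text, None, name)
    for name, texts in files.items()
    for text in texts
]

# Custom namespace shared by both files (an assumed value set at upload time).
shared = [
    (text, None, "shared-ui")
    for texts in files.values()
    for text in texts
]

print(count_unique(per_file), count_unique(shared))
```

With per-file namespaces, "Save" counts twice (three unique strings in total); with a shared namespace and the same variant, it counts once (two unique strings).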

Match your content type with the most suitable file format and attributes.

How long are strings stored in Smartling?

Strings are stored indefinitely, until the string is deleted through the Strings View actions (GDN, Connector) or the source file is removed from Smartling. If a string is deleted while it is in the translation step, its translations will not be saved to the translation memory, even if the translator saved them in the CAT Tool.

If the source file containing a string is deleted, or a new version of the file is uploaded without the string, the string is marked as inactive. If the string has no published translation, it is removed from the Smartling dashboard entirely. If the string has a published translation, it remains in the Published queue and is simply marked as inactive. Inactive strings (those with a published translation that are no longer linked to a source file in Smartling or to a website using the GDN) cannot be deleted.
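The rules in the preceding paragraph amount to a small decision table, sketched below. The `Outcome` enum and the `outcome_after_upload` function are hypothetical names used only to restate the rules; they are not part of any Smartling API.

```python
from enum import Enum


class Outcome(Enum):
    ACTIVE = "active"      # string is still present in the source file
    INACTIVE = "inactive"  # kept in the Published queue, marked inactive
    REMOVED = "removed"    # removed from the dashboard entirely


def outcome_after_upload(still_in_source: bool, has_published_translation: bool) -> Outcome:
    """Restates the rules for a string after its file is deleted or re-uploaded."""
    if still_in_source:
        return Outcome.ACTIVE
    if has_published_translation:
        return Outcome.INACTIVE
    return Outcome.REMOVED


# A string dropped from a new file version, with and without a published translation.
print(outcome_after_upload(False, True).value)
print(outcome_after_upload(False, False).value)
```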

The string remains in the Smartling database for at least 6 hours after the delete action is taken in the Strings View. After 6 hours, it is completely removed from our systems. During that window, the string could be recaptured by the GDN, in which case it reappears wherever it was before the action, including its position in the workflow, its job, and the Strings View. In effect, the string does not get deleted.
