Preparing Content for Translation

Strings

The basic unit of content in Smartling is the ‘string.’ Strings are extracted from source content, such as documents and resource files, that is sent to Smartling for translation. Depending on the format of the source content and how it is processed, a string could be a single character, a word, a sentence, or a paragraph of text. Below are some examples of strings:

  • Home
  • 21 January 2020
  • What a beautiful home!
  • This is a <span>beautiful</span> home. It’s where I grew up.

Repetitions

When the same piece of text appears multiple times in the source content, it is referred to as a ‘repetition.’ Sometimes repetitions need to be deduplicated and translated as a single string in Smartling; at other times they need to be kept separate.

For example, on a real-estate website, the word ‘Home’ might refer to both the ‘home page’ and a ‘house.’ Since the source word ‘Home’ might be translated differently in each of these cases, the same source text must be able to appear as separate strings in Smartling, so that it can receive different translations depending on context. On the other hand, large numbers of unnecessary repetitions can inflate translation costs and make estimating and managing workloads challenging, so it must also be possible to remove duplicates when they are not needed.

The following sections explain the elements that influence this behavior and how it can be controlled.

Creation of Unique Strings

After content is uploaded to Smartling, but before it is made available to users, it goes through the steps of ‘parsing’ and ‘deduplication’. Parsing breaks the source content into strings. Deduplication removes strings that are considered to be duplicates. 

Take the HTML file, sample.html, below as an example: 

<html>
<head></head>
<body>
  <div>Here is some text to translate</div>
  <div>Here is some text to translate</div>
  <div>Here is some text to translate</div>
</body>
</html>

When this file is uploaded to Smartling, the parsing process extracts three copies of the string ‘Here is some text to translate’ from the file; the deduplication process then discards two of the copies and presents a single copy of the string for translation. When that single string is translated, the same translation is used for all three occurrences in the original file.

Deduplication is visible only in the Smartling platform: the translated file still contains all three occurrences, each with the shared translation.

In deciding which strings to discard as duplicates, the following attributes are considered for each string: 

  • Text. In the above example, the text is: ‘Here is some text to translate’.
  • Variant. An attribute of a string that can be set to an arbitrary value for some file types and is set automatically for others (for example, copied from the string’s key). In the above example, the variant is not set to anything (i.e., is null). The variant of a string is displayed in the Strings View of the Smartling dashboard, under the source string (see example below).
  • Namespace. Another attribute of a string. The namespace is set to the name of the uploaded file by default, or is set to null in the case of GDN content. The namespace is not visible in the UI.
  • Source (GDN only). The type of content the string was ingested from (e.g., HTML or JavaScript).

In the example, the parser presents the following strings to the deduplication process:

Text | Variant | Namespace
Here is some text to translate | null | sample.html
Here is some text to translate | null | sample.html
Here is some text to translate | null | sample.html

Since all three rows are exactly the same, they are considered duplicates of each other and are collapsed into a single unique string in Smartling. To allow different translations for each copy, the variant or namespace of the strings would need to be modified.
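The deduplication rule described above can be modeled in a few lines of Python. This is a minimal sketch, not Smartling's implementation: it assumes a string is identified only by its text, variant, and namespace (ignoring the GDN-only source attribute), and uses None to stand in for null. The `ParsedString` and `deduplicate` names and the `div-0`, `div-1`, `div-2` variant values are hypothetical.

```python
from typing import NamedTuple, Optional


class ParsedString(NamedTuple):
    """One string as emitted by a parser, before deduplication (illustrative model)."""
    text: str
    variant: Optional[str]    # None stands in for a null variant
    namespace: Optional[str]  # None stands in for a null namespace


def deduplicate(parsed):
    """Collapse strings whose (text, variant, namespace) are all identical.

    Returns the unique strings in first-seen order, plus a mapping from each
    unique string to how many parsed occurrences will share its translation.
    """
    counts = {}
    for s in parsed:
        # NamedTuples compare and hash on all three fields together.
        counts[s] = counts.get(s, 0) + 1
    return list(counts), counts


# The three DIVs from sample.html: same text, null variant, same namespace.
parsed = [ParsedString("Here is some text to translate", None, "sample.html")] * 3
unique, counts = deduplicate(parsed)
print(len(unique))        # 1
print(counts[unique[0]])  # 3

# Giving each copy its own (hypothetical) variant keeps the copies apart.
with_variants = [
    ParsedString("Here is some text to translate", f"div-{i}", "sample.html")
    for i in range(3)
]
print(len(deduplicate(with_variants)[0]))  # 3
```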

Variant

A variant is an attribute of a string that is extracted from the source content during the file parsing process. It can also be set using GDN rules or via the Strings API for strings created in that way. 

In our sample.html example, the variant was not set to anything. This is the default behavior for HTML content, but other file types behave differently (see the table under Parsing Process below). For example, had we placed the same text into three cells of an Excel file, three separate strings would have appeared in Smartling, because the Excel parser automatically assigns a different variant to each string it extracts from the file. For other file types, such as Java Properties, the key in the source file is automatically copied as the variant. For our sample.html example, we could set the variant by adding a variant attribute to the DIV tags.

Read how Smartling creates variants for each content type here.

Using Smartmatch with Variants

You can set up Smartmatch to take variants into account when matching strings against your Translation Memory. Enabling 100% match with variants matches strings only when both the text and the variant metadata are identical. Enabling 100% match without variants ignores variant metadata and matches any strings with identical text.
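A rough model of the two Smartmatch settings, reusing the real-estate example where ‘Home’ can mean either the home page or a house. This Python sketch only illustrates the matching rule, not how Smartmatch is implemented; `TMEntry`, `exact_matches`, and the variant values `nav.home` and `listing.type` are hypothetical.

```python
from typing import NamedTuple, Optional


class TMEntry(NamedTuple):
    text: str
    variant: Optional[str]
    translation: str


def exact_matches(tm, text, variant, match_variants):
    """Return TM entries that count as a 100% match for (text, variant).

    match_variants=True  -> text AND variant must both be identical.
    match_variants=False -> only the text must be identical; variant is ignored.
    """
    return [
        e for e in tm
        if e.text == text and (not match_variants or e.variant == variant)
    ]


tm = [
    TMEntry("Home", "nav.home", "Accueil"),     # 'Home' as the home page
    TMEntry("Home", "listing.type", "Maison"),  # 'Home' as a house
]
print([e.translation for e in exact_matches(tm, "Home", "nav.home", True)])
# ['Accueil']
print([e.translation for e in exact_matches(tm, "Home", "nav.home", False)])
# ['Accueil', 'Maison']
```

With variants taken into account, only the entry for the same meaning of ‘Home’ is offered as an exact match; without them, both translations qualify.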

Variant displayed in Smartling dashboard (New Experience) 

The value of a string’s variant attribute is displayed in the Strings View of the Smartling dashboard, as shown in the screenshots below:

[Screenshot: a string’s variant displayed in the Strings View]

You can also use the Key/Variant Filter to find strings with variants and take bulk actions on them. See Actions in Strings View for more.

[Screenshot: Key/Variant Filter in the Strings View]

Variant displayed in Smartling dashboard (Classic) 

Since Smartling’s List View displays only the plain text of a string, it can be difficult to distinguish one variant from another. Here are a few tips for working with variants.

  1. In the List View, click the gear wheel to turn on Show Key (files only) and Show Variant. This displays any available variant metadata next to each string, which helps you distinguish strings with identical text. Note: for Business Documents, no variant data is available in the List View.
    [Screenshot: Show Key and Show Variant enabled in the List View]
  2. If you have used GDN or a file integration to mark variants, you can search for your variant names in the Key (files only) or Variant search box.
    [Screenshot: Key and Variant search boxes]
  3. If there is no variant metadata, the context and code views in the CAT Tool may reveal the differences in inline tags that caused two unique strings to be created.

Variant displayed in CAT Tool

In the Smartling CAT Tool, the value of a string’s variant attribute is displayed under Translation Resources in the Additional Details panel (top right):

[Screenshot: variant shown in the CAT Tool’s Additional Details panel]

Namespace

Like variant, namespace is an attribute of a string used in deduplication. The namespace of a string is not shown in the dashboard, but by default it is set to the file URI. This means that the same text uploaded in different files, even with the same variant values, is treated as different strings in Smartling.

Take the following two Java Properties files for example:

sample1.properties:
Key1 = A string to translate

sample2.properties:
Key1 = A string to translate

If sample1.properties and sample2.properties are uploaded to Smartling, the parser produces these strings for deduplication:

Text | Variant | Namespace
A string to translate | Key1 | sample1.properties
A string to translate | Key1 | sample2.properties

Although the text and variants are the same, the namespace is different, and so the strings are considered different and are not deduplicated, i.e., two strings are presented for translation. 
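The properties example can be reproduced with a simplified parser. This sketch assumes one `key = value` pair per line, copies the key into the variant, and uses the file URI as the namespace, mirroring the default behavior described above; `parse_properties` is a hypothetical helper, and real .properties syntax (colon separators, line continuations, escapes) is not handled.

```python
def parse_properties(file_uri, content):
    """Very simplified .properties parser: one 'key = value' per line.

    Returns (text, variant, namespace) tuples, with the key as the variant
    and the file URI as the namespace.
    """
    strings = []
    for line in content.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        strings.append((value.strip(), key.strip(), file_uri))
    return strings


parsed = parse_properties("sample1.properties", "Key1 = A string to translate")
parsed += parse_properties("sample2.properties", "Key1 = A string to translate")

# Deduplicate on the full (text, variant, namespace) tuple, keeping order.
unique = list(dict.fromkeys(parsed))
print(len(unique))  # 2, because the namespaces differ
```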

While the default behavior of automatically using the file URI as the namespace is usually desirable, it can be overridden by specifying a namespace via API when uploading the file. 

An example use case for specifying a custom namespace is when multiple developers need to translate the same resource file in different code branches. For each developer to have their own version of the file translated, each developer uploads it separately, using the full path to the file, including the branch, as the file URI in Smartling. This prevents a developer from overwriting files uploaded from other branches, which could cause strings to disappear from the translation workflow. To avoid unnecessary string duplication across all of the copies uploaded from different branches, the namespace can be set to the file path excluding the branch. As a result, strings with the same key and text are deduplicated across all copies of the file in Smartling.
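A sketch of the branch scenario, assuming a hypothetical URI layout of `<branch>/<path>`: each branch keeps its own file URI, while a hypothetical `namespace_without_branch` helper derives a shared namespace by dropping the branch segment. How the namespace is actually passed at upload time is not shown here; check the API documentation for the upload call.

```python
def namespace_without_branch(file_uri):
    """Hypothetical convention: file URIs look like '<branch>/<path/in/repo>'.

    Dropping the first path segment gives every branch's copy of the same
    file the same namespace, so identical strings deduplicate across branches.
    """
    branch, sep, path = file_uri.partition("/")
    if not sep:
        raise ValueError(f"expected '<branch>/<path>', got {file_uri!r}")
    return path


# Two uploads of the same file from different branches (distinct file URIs).
uris = ["main/res/strings.properties", "feature-x/res/strings.properties"]

# (text, variant, namespace) for Key1 in each copy; the namespace is shared.
strings = [("A string to translate", "Key1", namespace_without_branch(u)) for u in uris]
print({namespace_without_branch(u) for u in uris})  # {'res/strings.properties'}
print(len(set(strings)))                            # 1
```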

String Sharing

Some Smartling connectors provide options for overriding the default namespace behavior. It is also possible to configure Smartling to always set the namespace to null for all strings in a project. This is referred to as ‘string sharing’: all strings in a project with the same text and variant are deduplicated, regardless of which files they came from.

For GDN projects, the namespace is set to null for all strings; for the Strings API, it is set in the API call.

Applying variants to identical strings will create unique strings, even if string sharing is enabled.

Ask your Smartling CSM if you want to enable string sharing for an existing account.
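The effect of string sharing on deduplication can be sketched as follows. `dedup_key` is a hypothetical helper and `nav` is an illustrative variant value; the sketch simply treats the namespace as null when sharing is on, which reflects the behavior described above, including the fact that different variants still produce separate strings.

```python
def dedup_key(text, variant, namespace, string_sharing):
    """Key used to decide whether two strings are duplicates.

    With string sharing enabled, the namespace is treated as null, so only
    the text and variant matter.
    """
    return (text, variant, None if string_sharing else namespace)


# (text, variant, namespace) occurrences from two files.
occurrences = [
    ("Home", None, "page1.html"),
    ("Home", None, "page2.html"),
    ("Home", "nav", "page2.html"),  # hypothetical variant value
]

for sharing in (False, True):
    keys = {dedup_key(t, v, ns, sharing) for t, v, ns in occurrences}
    print(sharing, len(keys))
# False 3
# True 2
```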

Parsing Process

String creation and variant setting can be influenced through the parsing process. In the sample.html example above, the content is broken into strings based on ‘block-level’ tags (‘DIV’ tags in this case), and the variant is set to null.

Parsing rules differ depending on the file type being processed. With an Excel file, for example, the content of each cell becomes a separate string in Smartling and each string gets a different variant. The table below shows the basic string segmentation approach for various file types, as well as how the variant attribute is set.

Content format | What the parser uses to define the boundaries of the string | How the variant attribute is set
Excel | Cell | Automatically generated.
Word | Paragraph (but not newline) | Automatically generated.
PowerPoint | Paragraph/text box | Automatically generated.
CSV | Cell. Optionally segment HTML strings into additional strings based on block-level tags. | Copies the key, if defined; otherwise set to null.
Text file | Newline | Automatically generated.
HTML | Block-level tag (e.g., DIV or LI, but not SPAN or A). See Capturing Content for a full list of block-level tags. | Uses the variant attribute of the HTML element, if defined; otherwise set to null.
Key-based file formats, such as Java Properties, XLIFF, iOS, Android | Key. Optionally segment HTML strings into additional strings based on block-level tags. | Copies the key.
JSON | Element or object. Optionally segment HTML strings into additional strings based on block-level tags. | Copies the key, if defined; otherwise set to null.
XML | Element or object. Optionally segment HTML strings into additional strings based on block-level tags. | Copies the key, if defined and variants are enabled; otherwise set to null.
GDN HTML | Block-level tag | Uses the variant attribute of the HTML element, if defined; can also be set using advanced GDN rules; otherwise set to null.
Strings API | Defined explicitly in the API call | Set explicitly in the API call.
Markdown | Markdown syntax corresponding to block-level HTML tags | Set to null.
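To make the HTML row of the table concrete, here is a minimal segmenter built on Python's standard `html.parser` module. It is an approximation, not Smartling's parser: `BLOCK_TAGS` is a small hypothetical subset of block-level tags (see Capturing Content for the real list), and inline tags such as SPAN stay inside the string they appear in, as in the `<span>beautiful</span>` example at the top of this article.

```python
from html.parser import HTMLParser

# Small, illustrative subset of block-level tags (hypothetical, not complete).
BLOCK_TAGS = {"html", "head", "body", "div", "p", "li", "ul", "ol", "h1", "h2", "h3", "td"}


class BlockSegmenter(HTMLParser):
    """Split HTML into strings at block-level tags; inline tags such as
    <span> and <a> are kept inside the current string."""

    def __init__(self):
        super().__init__()
        self.strings = []
        self._buf = []

    def _flush(self):
        text = " ".join("".join(self._buf).split())  # collapse whitespace
        if text:
            self.strings.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()  # a block tag ends the current string
        else:
            self._buf.append(self.get_starttag_text())  # keep inline tag as-is

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>: keep inline ones without adding a close tag.
        if tag in BLOCK_TAGS:
            self._flush()
        else:
            self._buf.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()
        else:
            self._buf.append(f"</{tag}>")

    def handle_data(self, data):
        self._buf.append(data)

    def close(self):
        super().close()
        self._flush()  # emit any trailing text not closed by a block tag


seg = BlockSegmenter()
seg.feed("<div>Home</div><div>This is a <span>beautiful</span> home. It's where I grew up.</div>")
seg.close()
print(seg.strings)
# ['Home', "This is a <span>beautiful</span> home. It's where I grew up."]
```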

Parsing behavior can be modified through the use of directives and rules. For example, specifying which columns of a CSV file to extract as source text for translation and which to use as keys/variants is done with file-parsing directives, e.g.:

# smartling.source_key_paths=1
# smartling.paths=2

Directives can be included within the file itself or supplied through other methods, such as the API. The directives available for each file type are listed on the Smartling help page for that file type. See Preparing Content for Translation.
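A sketch of how the two CSV directives above might be interpreted, assuming `smartling.source_key_paths=1` marks column 1 as the key and `smartling.paths=2` marks column 2 as the source text. The `parse_csv` helper is hypothetical and does not read the directives itself: it skips lines starting with `#` and takes the column numbers as arguments. It also illustrates the contrast drawn in the recommendations below: keyed rows with identical text stay separate, while unkeyed rows collapse into one string.

```python
import csv
import io


def parse_csv(file_uri, content, key_col=1, text_col=2):
    """Extract (text, variant, namespace) tuples from CSV content.

    key_col and text_col are 1-based column numbers, assumed here to play the
    roles of smartling.source_key_paths and smartling.paths. Lines starting
    with '#' (directives) are skipped. If key_col is None, the variant is null.
    """
    lines = (line for line in io.StringIO(content) if not line.startswith("#"))
    strings = []
    for row in csv.reader(lines):
        if len(row) < max(text_col, key_col or 0):
            continue  # skip blank or short rows
        variant = row[key_col - 1] if key_col else None
        strings.append((row[text_col - 1], variant, file_uri))
    return strings


keyed_csv = "# smartling.source_key_paths=1\n# smartling.paths=2\nnav.home,Home\nlisting.type,Home\n"
keyed = parse_csv("a.csv", keyed_csv)
print(len(set(keyed)))    # 2: same text, different keys/variants

unkeyed = parse_csv("b.csv", "Home\nHome\n", key_col=None, text_col=1)
print(len(set(unkeyed)))  # 1: same text, null variant, same namespace
```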

For more on Content Parsing in the CAT Tool, click here.

Recommendations for Controlling Repetition Behavior

First, become familiar with your content: what kind of repetitions it contains and how they need to be handled. You may have different needs for different content types.

If duplicate content needs to be kept separate, it should have variants defined. Ideally, it should be in a file format that allows keys to be specified, so that the key can be customized and used as the variant. Content that needs to be kept separate like this often has a key associated with it anyway, because the application using it needs to identify the correct string, so make sure this key is being captured by Smartling. This type of content should not be translated in Excel: although Excel keeps strings separate, it auto-generates the variant instead of using the key. CSV is usually a better alternative, and the native file format of the source application may be better still.

If repeated content needs to be deduplicated, the simplest approach is to include it all in the same file without setting a variant (or setting the same variant). This can be done using a CSV file without keys defined.

If content needs to be deduplicated across files, then a custom namespace (or no namespace, i.e., ‘string sharing’) will have to be used, in addition to keeping the same variant for those strings. 

Match your content type with the most suitable file format and attributes.

 
