How Your Content Is Broken Into Strings

The process of breaking content up into translatable strings is known as string parsing. Whenever Smartling captures content from your file, app, or website, the system parses it, breaking it down into small, discrete units. This makes it easier to re-order a translation, whether for formatting or linguistic reasons. There are two levels of parsing: strings and segments.

Strings

A string is a unit of content. Smartling extracts strings from the captured content based on the paragraph markers found in your content. Those markers vary depending on the content or file type (see the table under Parsing Per File Format below).

Segments

A large string containing many words may be parsed into segments, which are delimited by end-of-sentence punctuation marks such as periods, semicolons, and question marks. Segments are only visible in the CAT Tool.

Please note:
If a section of text made up of multiple sentences is surrounded by a single set of tags, it cannot be split into multiple segments.
For example:

In order for the string to be parsed into segments, the opening and closing tags must both be contained within the same segment:
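The tag rule above can be sketched in Python. This is a hypothetical illustration only, not Smartling's actual segmentation logic: the regular expressions, the function names `tags_balanced` and `split_into_segments`, and the sample markup are all assumptions made for the example.

```python
import re

# Assumption: a segment ends at ., ;, ? or ! followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.;?!])\s+")
# Matches simple opening and closing tags; self-closing tags are not handled.
TAG = re.compile(r"<(/?)(\w+)[^>]*>")


def tags_balanced(text):
    """Return True if every tag opened in text is also closed in text."""
    stack = []
    for closing, name in TAG.findall(text):
        if not closing:
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
        else:
            return False
    return not stack


def split_into_segments(string):
    """Split at sentence ends, but only where no tag pair would be cut apart."""
    segments, pending = [], ""
    for piece in SENTENCE_END.split(string.strip()):
        pending = f"{pending} {piece}".strip()
        if tags_balanced(pending):
            segments.append(pending)
            pending = ""
    if pending:  # leftover text whose tags never balanced
        segments.append(pending)
    return segments


# One pair of tags around both sentences: stays a single segment.
print(split_into_segments("<b>First sentence. Second sentence.</b>"))
# Each tag pair sits inside one sentence: two segments.
print(split_into_segments("<b>First</b> sentence. <i>Second</i> sentence."))
```

In this sketch a candidate split is accepted only when the text collected so far has balanced tags, which is one simple way to model the requirement that the opening and closing tags fall within the same segment.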

Parsing Per File Format

Parsing rules differ depending on the file type being processed. With an Excel file, for example, the content of each cell becomes a separate string in Smartling and each string gets a different variant. The table below shows the basic string segmentation approach for various file types, as well as how the variant attribute is set.

Content format	What the parser uses to define the boundaries of the string	Notes
Excel	Cell	In the same column is recommended.
Word	Paragraph or line break	Not a new line or sentence ending in a full stop/period.
PowerPoint	Paragraph/text box	The strings are arranged in the order in which the text boxes were created, not by their position on the slide.
InDesign	Paragraph within a text frame
CSV	Cell	Optionally segment HTML strings into additional strings based on block-level tags.
Text file	Newline
HTML	Block-level tag	Examples: DIV or LI, but not SPAN or A. See Capturing Content for a full list of block-level tags.
Key-based file formats, such as Java Properties, XLIFF, iOS, Android	Key	Optionally split HTML strings into additional strings based on block-level tags.
JSON	Element or object	Optionally segment HTML strings into additional strings based on block-level tags.
XML	Element or object	Optionally segment HTML strings into additional strings based on block-level tags.
GDN HTML	Block-level tag
Strings API	Defined explicitly in API call
Markdown	Markdown codes corresponding to block-level HTML tags
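
To make the HTML row of the table more concrete, here is a minimal Python sketch of splitting markup into strings at block-level tags while keeping inline tags such as SPAN or A inside the string. It is not Smartling's parser: the `BLOCK_TAGS` set is a small assumed subset, and the names `BlockStringExtractor` and `html_to_strings` are hypothetical.

```python
from html.parser import HTMLParser

# Assumption: a small illustrative subset of block-level tags.
BLOCK_TAGS = {"div", "p", "ul", "ol", "li", "h1", "h2", "h3"}


class BlockStringExtractor(HTMLParser):
    """Collects one string per block-level element; inline tags are kept."""

    def __init__(self):
        super().__init__()
        self.strings = []
        self._buffer = []

    def _flush(self):
        text = "".join(self._buffer).strip()
        if text:
            self.strings.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()  # a block boundary ends the current string
        else:
            self._buffer.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()
        else:
            self._buffer.append(f"</{tag}>")

    def handle_data(self, data):
        self._buffer.append(data)


def html_to_strings(html):
    parser = BlockStringExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()  # capture any trailing text outside a block
    return parser.strings


sample = "<div>Hello <span>world</span></div><ul><li>One</li><li>Two</li></ul>"
print(html_to_strings(sample))
# ['Hello <span>world</span>', 'One', 'Two']
```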

Directives

Parsing behavior can be modified through the use of directives and rules. For example, file-parsing directives specify which columns of a CSV file to extract as source text for translation and which to use as keys or variants:

# smartling.source_key_paths=1
# smartling.paths=2
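
As a rough illustration of what these two directives express, the Python sketch below reads them from comment lines at the top of a small CSV sample and maps the key column to the source-text column. It rests on several assumptions: that the directives appear inline as `#` comment lines, that column numbers count from 1, and that the sample rows and the function names `read_directives` and `extract_strings` are hypothetical. It only mimics the effect and is not how Smartling processes the file.

```python
import csv

# Hypothetical CSV content with inline directives as comment lines.
SAMPLE = """# smartling.source_key_paths=1
# smartling.paths=2
greeting,Hello
farewell,Goodbye
"""


def read_directives(text):
    """Collect '# smartling.<name>=<value>' lines into a dict."""
    directives = {}
    for line in text.splitlines():
        if line.startswith("# smartling."):
            name, _, value = line[2:].partition("=")
            directives[name.strip()] = value.strip()
    return directives


def extract_strings(text):
    """Return {key: source text} using the columns named by the directives."""
    directives = read_directives(text)
    # Assumption: directive values are 1-based column numbers.
    key_col = int(directives["smartling.source_key_paths"]) - 1
    source_col = int(directives["smartling.paths"]) - 1
    data_lines = [l for l in text.splitlines() if l and not l.startswith("#")]
    return {row[key_col]: row[source_col] for row in csv.reader(data_lines)}


print(extract_strings(SAMPLE))
# {'greeting': 'Hello', 'farewell': 'Goodbye'}
```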

Another important file-parsing directive, which can modify the format of strings, is:

# smartling.string_format_paths= [file format type, e.g.: HTML]

This directive can be used to specify a single path for a format, or a comma-separated list of paths. You might use the latter if you have a resource file containing disparate kinds of strings that are used in different contexts. An example is a JSON file that contains application UI strings, with standard string formatting and placeholders, together with one or a few keys whose values are large HTML documents (for example, Terms of Service or a Privacy Policy).

The advantage of specifying the string format as HTML for paths in a non-HTML file is that large strings are broken down into smaller strings. This benefits both your translation resources and your translation memory leverage. Note that in doing so you lose any "keys", although you still have "variants". In most cases, such as translation Jobs, this is acceptable. It is not acceptable if you are importing translations from a file, because keys are critical for alignment.

Furthermore, if you specify multiple string formats and choose both ICU and HTML, ICU formatting rules take precedence over HTML.

Strings and Segments in the CAT Tool

Larger strings may be further divided into segments, which are only visible in the CAT Tool. A segment is usually a sentence: a sentence-ending punctuation mark such as a period (.), exclamation point (!), or question mark (?) ends one segment and starts the next.

The following example shows an entire string (denoted with a green vertical bar) that has been parsed or broken down into two segments. A Translator, Editor, or Reviewer will then be able to translate or edit each of the corresponding segments.

Merge Segments

If you're translating a string with multiple segments, you have the option to merge segments. Mouse over the Merge segment into next icon. Alternatively, you can use the shortcut that you've set in your keyboard settings.
