Preparing Content for Translation

Content Parsing

Whenever Smartling captures content from your file, app, or website, the system parses or breaks it down into small, discrete entities. This makes it easier to re-order a translation, whether for formatting or linguistic reasons. There are two levels of parsing: strings and segments.

Strings

A string is a unit of content. Smartling extracts strings from the captured content based on the paragraph markers found in your content. Those markers vary depending on the content or file type (see the table below).

Segments

Strings are parsed into segments, which are defined by end-of-sentence punctuation marks such as periods, semicolons, and question marks.
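As a rough illustration of the two levels, the sketch below splits plain text into strings at newlines (the boundary used for text files) and then splits each string into segments after a period, semicolon, or question mark. The function names and the regular expression are assumptions made for this example; they are not Smartling's actual segmentation rules.

```python
import re

# Assumed segment boundary: whitespace that follows a period, semicolon, or question mark.
SEGMENT_BOUNDARY = re.compile(r"(?<=[.;?])\s+")


def parse_strings(text: str) -> list[str]:
    """Split plain text into strings, one per non-empty line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def parse_segments(string: str) -> list[str]:
    """Split one string into segments at sentence-ending punctuation."""
    return [seg for seg in SEGMENT_BOUNDARY.split(string) if seg]


text = "Welcome back. Ready to continue?\nYour cart is empty."
for string in parse_strings(text):
    print(parse_segments(string))
```

The first line yields two segments and the second line yields one, while each line remains a single string.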

Parsing Process

Parsing rules differ depending on the file type being processed. With an Excel file, for example, the content of each cell becomes a separate string in Smartling and each string gets a different variant. The table below shows the basic string segmentation approach for various file types, as well as how the variant attribute is set.

| Content format | What the parser uses to define the boundaries of the string | Notes |
| --- | --- | --- |
| Excel | Cell | Placing content in the same column is recommended. |
| Word | Paragraph or line break | Not a newline. |
| PowerPoint | Paragraph/text box | The strings are arranged in order of when each text box was created, not where it is placed on the slide. |
| InDesign | Paragraph within a text frame | |
| CSV | Cell | Optionally segment HTML strings into additional strings based on block-level tags. |
| Text file | Newline | |
| HTML | Block-level tag | Examples: DIV or LI, but not SPAN or A. See Capturing Content for a full list of block-level tags. |
| Key-based file formats, such as Java Properties, XLIFF, iOS, Android | Key | Optionally split HTML strings into additional strings based on block-level tags. |
| JSON | Element or object | Optionally segment HTML strings into additional strings based on block-level tags. |
| XML | Element or object | Optionally segment HTML strings into additional strings based on block-level tags. |
| GDN HTML | Block-level tag | |
| Strings API | Defined explicitly in API call | |
| Markdown | Markdown codes corresponding to block-level HTML tags | |

 

 

Directives

Parsing behavior can be modified through directives and rules. For example, file-parsing directives specify which columns of a CSV file to extract as source text for translation and which to use as keys or variants:

# smartling.source_key_paths=1

# smartling.paths=2
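To make the effect of these two directives concrete, here is a small sketch that reads them from the comment lines of a CSV file and pairs each key with its source text. It assumes that column numbers are 1-based, that each directive names a single column, and that directives appear as `# smartling.name=value` lines at the top of the file. `read_directives` and `extract_strings` are hypothetical helpers written for this example, not part of any Smartling library.

```python
import csv

SAMPLE = """# smartling.source_key_paths=1
# smartling.paths=2
greeting,Hello there
farewell,See you soon
"""


def read_directives(lines: list[str]) -> dict[str, str]:
    """Collect `# smartling.<name>=<value>` comment lines into a dict."""
    directives = {}
    for line in lines:
        body = line.lstrip("#").strip()
        if line.startswith("#") and body.startswith("smartling.") and "=" in body:
            name, value = body[len("smartling."):].split("=", 1)
            directives[name.strip()] = value.strip()
    return directives


def extract_strings(csv_text: str) -> dict[str, str]:
    """Return {key: source text} using the columns named by the directives."""
    lines = csv_text.splitlines()
    directives = read_directives(lines)
    key_col = int(directives["source_key_paths"]) - 1  # assumption: 1-based column number
    text_col = int(directives["paths"]) - 1            # assumption: 1-based column number
    data = [line for line in lines if line.strip() and not line.startswith("#")]
    return {row[key_col]: row[text_col] for row in csv.reader(data)}


print(extract_strings(SAMPLE))
```

Under these assumptions, the first column supplies the key for each row and the second column supplies the text to translate.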

 

Another important file-parsing directive, which modifies the format of strings, is:

# smartling.string_format_paths= [file format type, e.g.: HTML]

This directive can specify a single path for a format or a comma-separated list of paths. You might use the latter if a resource file contains disparate kinds of strings that are used in different contexts: for example, a JSON file containing application UI strings with standard string formatting and placeholders, together with one or a few keys whose values are large HTML documents (such as Terms of Service or a Privacy Policy).

The advantage of specifying the string format as HTML in a non-HTML file is that large strings are broken down into smaller strings. This benefits both your translation resources and your translation memory leverage. Note that in doing so you lose any "keys", although you still have "variants". In most cases, such as translation Jobs, this is acceptable. It is not acceptable if you are importing translations from a file, because keys are critical for alignment.

Furthermore, if you specify multiple string formats and choose both ICU and HTML, ICU formatting rules take precedence over HTML.

 

Strings and Segments in the CAT Tool

Larger strings may be further divided into segments, which are visible only in the CAT tool. A segment is usually a sentence: a sentence-ending punctuation mark such as a period (.), exclamation point (!), or question mark (?) marks the start of a new segment.

The following example shows an entire string (denoted with a green vertical bar) that has been parsed or broken down into two segments. A Translator, Editor, or Reviewer will then be able to translate or edit each of the corresponding segments.

Merge Segments 

If you're translating a string with multiple segments, you have the option to merge segments. Mouse over the Merge segment into next icon. Alternatively, you can use the shortcut that you've set in your keyboard settings.

