Preparing Content for Translation

Content Parsing

Whenever Smartling captures content from your file, app, or website, the system parses or breaks it down into small, discrete entities. This makes it easier to re-order a translation, whether for formatting or linguistic reasons. There are two levels of parsing: strings and segments.

Strings

A string is a unit of content. Smartling extracts strings from the captured content based on the paragraph markers found in that content. Those markers vary depending on the content or file type (see the table below).

Segments

Strings are parsed into segments, which are defined by end-of-sentence punctuation marks such as periods, semicolons, and question marks.
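As a rough illustration of how a string breaks down into segments, the sketch below splits text after sentence-ending punctuation. It is not Smartling's actual segmentation logic: the function name `split_into_segments` and the exact set of punctuation marks are assumptions made for this example.

```python
import re

# Assumed set of sentence-ending marks; the real segmentation rules may differ.
SEGMENT_END = re.compile(r"(?<=[.;?!])\s+")


def split_into_segments(string: str) -> list[str]:
    """Split one string into segments after sentence-ending punctuation."""
    return [part for part in SEGMENT_END.split(string.strip()) if part]


segments = split_into_segments(
    "Welcome back! Your order has shipped. Track it here?"
)
for segment in segments:
    print(segment)
```

A naive rule like this also splits after abbreviations such as "e.g.", which is one reason production segmenters use more elaborate rules.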

Parsing Process

Parsing rules differ depending on the file type being processed. With an Excel file, for example, the content of each cell becomes a separate string in Smartling and each string gets a different variant. The table below shows the basic string segmentation approach for various file types, as well as how the variant attribute is set.

| Content format | What the parser uses to define the boundaries of the string | Notes |
| --- | --- | --- |
| Excel | Cell | In the same column is recommended. |
| Word | Paragraph or line break | Not a newline. |
| PowerPoint | Paragraph/text box | The strings are arranged in the order in which each text box was created, not by where it is placed on the slide. |
| InDesign | Paragraph within a text frame | |
| CSV | Cell | Optionally segment HTML strings into additional strings based on block-level tags. |
| Text file | Newline | |
| HTML | Block-level tag | Examples: DIV or LI, but not SPAN or A. See Capturing Content for a full list of block-level tags. |
| Key-based file formats, such as Java Properties, XLIFF, iOS, Android | Key | Optionally split HTML strings into additional strings based on block-level tags. |
| JSON | Element or object | Optionally segment HTML strings into additional strings based on block-level tags. |
| XML | Element or object | Optionally segment HTML strings into additional strings based on block-level tags. |
| GDN HTML | Block-level tag | |
| Strings API | Defined explicitly in API call | |
| Markdown | Markdown codes corresponding to block-level HTML tags | |
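To make the HTML rule concrete, here is a minimal sketch of splitting markup into strings at block-level tags while keeping inline tags such as SPAN and A inside the string. It uses Python's standard `html.parser` module. The `BlockSplitter` class, the `split_html` helper, and the short `BLOCK_TAGS` set are hypothetical names chosen for this example, and the tag set is only a subset of the full list of block-level tags.

```python
from html.parser import HTMLParser

# Illustrative subset only; not the full list of block-level tags.
BLOCK_TAGS = {"div", "li", "p", "ul", "ol", "h1", "h2", "h3"}


class BlockSplitter(HTMLParser):
    """Collect one string per block-level element, keeping inline tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.strings = []
        self._buffer = []

    def _flush(self):
        # Close the current string, if it contains any text.
        text = "".join(self._buffer).strip()
        if text:
            self.strings.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()  # a block-level tag starts a new string
        else:
            # Inline tags stay inside the current string, attributes included.
            self._buffer.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()  # a block-level close tag ends the string
        else:
            self._buffer.append(f"</{tag}>")

    def handle_data(self, data):
        self._buffer.append(data)


def split_html(html: str) -> list[str]:
    parser = BlockSplitter()
    parser.feed(html)
    parser.close()
    parser._flush()  # emit any trailing text outside a block element
    return parser.strings


html_strings = split_html(
    '<div>Read the <a href="/docs">docs</a> first.</div>'
    "<ul><li>One <span>item</span></li><li>Two</li></ul>"
)
for s in html_strings:
    print(s)
```

In this sketch the DIV and each LI become separate strings, while the A and SPAN tags travel with the text around them.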

Parsing behavior can be modified through directives and rules. For example, file-parsing directives specify which columns of a CSV file to extract as source text for translation and which to use as keys or variants:

# smartling.source_key_paths=1

# smartling.paths=2

Directives can be included within the file itself or supplied through other methods, such as the API. The directives available for each file type are listed on the Smartling help page for that file type; see Preparing Content for Translation.
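The following sketch illustrates what the two CSV directives above express: one column supplies the key and another supplies the source text. It is not Smartling's parser. The function `parse_csv_with_directives` is a hypothetical helper, the sample rows are invented, and the sketch assumes that column numbers are 1-based and that each directive names a single column.

```python
import csv
import io

DIRECTIVE_PREFIX = "# smartling."


def parse_csv_with_directives(text: str) -> dict[str, str]:
    """Return {key: source text} using the columns named by the directives."""
    directives = {}
    data_lines = []
    for line in text.splitlines():
        if line.startswith(DIRECTIVE_PREFIX):
            # "# smartling.paths=2" becomes directives["paths"] = "2"
            name, _, value = line[len(DIRECTIVE_PREFIX):].partition("=")
            directives[name.strip()] = value.strip()
        elif line.strip():
            data_lines.append(line)

    # Assumption: column numbers are 1-based, so subtract 1 for indexing.
    key_col = int(directives["source_key_paths"]) - 1
    text_col = int(directives["paths"]) - 1

    # Note: this simple version does not handle cells with embedded newlines.
    rows = csv.reader(io.StringIO("\n".join(data_lines)))
    return {row[key_col]: row[text_col] for row in rows}


sample = """# smartling.source_key_paths=1
# smartling.paths=2
greeting,Hello world
farewell,"Goodbye, and thanks"
"""

csv_strings = parse_csv_with_directives(sample)
for key, source_text in csv_strings.items():
    print(key, "->", source_text)
```

Each cell in the source-text column becomes its own string, which matches the CSV row in the table above.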

Strings and Segments in the CAT Tool

Larger strings may be further divided into segments, which are visible only in the CAT Tool. A segment is usually a sentence: a sentence-ending punctuation mark such as a period (.), exclamation point (!), or question mark (?) starts a new segment.

The following example shows an entire string (denoted with a green vertical bar) that has been parsed or broken down into two segments. A Translator, Editor, or Reviewer will then be able to translate or edit each of the corresponding segments.

Merge Segments 

If you're translating a string with multiple segments, you have the option to merge segments. Mouse over the Merge segment into next icon. Alternatively, you can use the shortcut that you've set in your keyboard settings.

