Smartling supports the translation of subtitles via SubRip (SRT) files and WebVTT files. These files are typically uploaded to a Media project, but they can be uploaded to any Files project. Subtitle files contain dialogue and other information to be displayed on-screen during a video, making audio content accessible to viewers who are deaf or hard of hearing, as well as to viewers who speak other languages.
Subtitle files include subtitles and their timings, also known as timestamp entries. Timestamp entries contain timecodes that indicate when each subtitle should appear and disappear on the screen. These timecodes ensure that the text is synchronized with the corresponding audio and visual content in the video. Each timestamp entry specifies the exact start and end times for displaying a particular subtitle line or block.
Oftentimes, linguistic units (e.g., a sentence) are spread across more than one timestamp entry. As a result, sentences are separated into multiple strings in Smartling, limiting the ability to use machine translation (MT) or leverage Translation Memory. With the new Enhanced Subtitle Parsing mode, you can control how content from subtitle files is parsed, allowing you to translate subtitle files using MT and effectively utilize your Translation Memory.
Standard Subtitle Parsing vs. Enhanced Subtitle Parsing
When you upload a subtitle file to Smartling, you have the option to use Standard Subtitle Parsing or Enhanced Subtitle Parsing. With Standard Subtitle Parsing, each timestamp entry in the subtitle file will be parsed as one string.
With Enhanced Subtitle Parsing, strings in the subtitle file will be parsed based on sentence markers, instead of by timestamp entry. This parsing mode is optimized for using machine translation.
MT Mode Directive
Subtitle parsing can also be controlled using the mt_mode directive.
This directive can be applied via API file upload; it is not supported inline:
SRT - srt_mt_mode
WebVTT - vtt_mt_mode
As of July 29, 2024 all accounts have access to Enhanced Subtitle Parsing.
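As a rough illustration, the directive can be supplied as an extra form parameter when uploading the file through the Files API. The sketch below only assembles the request fields; the endpoint path and the "smartling." parameter prefix reflect how Smartling directives are typically passed at upload time, but verify the exact names against the API reference for your account before relying on them:

```python
def build_upload_request(project_id: str, file_uri: str, mt_mode: str = "on") -> dict:
    """Assemble the form fields for a Files API upload (sketch only, not sent)."""
    return {
        # Files API upload endpoint (check the current API reference).
        "url": f"https://api.smartling.com/files-api/v2/projects/{project_id}/file",
        "data": {
            "fileUri": file_uri,
            "fileType": "srt",
            # Directive enabling Enhanced Subtitle Parsing; "smartling.srt_mt_mode"
            # follows the usual "smartling.<directive>" convention (assumption).
            "smartling.srt_mt_mode": mt_mode,
        },
    }

request = build_upload_request("my-project-id", "videos/intro.srt")
print(request["data"]["smartling.srt_mt_mode"])
```

For WebVTT files, the same sketch would use fileType "vtt" and the vtt_mt_mode directive instead.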
How Content Is Parsed Into Strings
When Enhanced Subtitle Parsing is enabled, if a sentence is split across multiple timestamp entries in the subtitle file, those entries will be combined to form a single string in Smartling.
For example, your SRT file might look like this:
1
00:00:00,300 --> 00:00:04,066
The ocean covers more than 70% of the Earth's surface.
2
00:00:04,066 --> 00:00:07,566
Reaching as far down as 36,000 feet in some places,
3
00:00:07,566 --> 00:00:11,100
the waters of our planet occupy a staggering volume.
If you select Enhanced Subtitle Parsing, or use the directive srt_mt_mode=on, each sentence will be parsed as one string, resulting in two strings in Smartling. This allows the content to be machine-translated and generates effective translation memory leverage.
String 1: The ocean covers more than 70% of the Earth's surface.
String 2: Reaching as far down as 36,000 feet in some places, the waters of our planet occupy a staggering volume.
If you select Standard Subtitle Parsing, or use the directive srt_mt_mode=off, each timestamp entry will be parsed as one string, resulting in three strings in Smartling.
String 1: The ocean covers more than 70% of the Earth's surface.
String 2: Reaching as far down as 36,000 feet in some places,
String 3: the waters of our planet occupy a staggering volume.
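The sentence-based combining described above can be sketched in Python. This is a simplified illustration of the idea (merge consecutive timestamp entries until sentence-ending punctuation is reached), not Smartling's actual parser:

```python
import re

# The three-entry SRT example from above.
SRT = """\
1
00:00:00,300 --> 00:00:04,066
The ocean covers more than 70% of the Earth's surface.

2
00:00:04,066 --> 00:00:07,566
Reaching as far down as 36,000 feet in some places,

3
00:00:07,566 --> 00:00:11,100
the waters of our planet occupy a staggering volume.
"""

def parse_entries(srt_text):
    """Split an SRT file into (timecode, text) pairs, one per timestamp entry."""
    entries = []
    for block in srt_text.strip().split("\n\n"):
        lines = block.splitlines()
        # lines[0] is the sequence number, lines[1] the timecode, the rest is text.
        entries.append((lines[1], " ".join(lines[2:])))
    return entries

def merge_sentences(entries):
    """Combine consecutive entries until a sentence marker (., !, ?) is seen."""
    strings, buffer = [], []
    for _, text in entries:
        buffer.append(text)
        if re.search(r"[.!?]\s*$", text):  # simplistic sentence-end detection
            strings.append(" ".join(buffer))
            buffer = []
    if buffer:  # keep any trailing fragment that never reached a sentence marker
        strings.append(" ".join(buffer))
    return strings

print(merge_sentences(parse_entries(SRT)))
# Two strings: entry 1 stands alone; entries 2 and 3 merge into one sentence.
```

Running this on the example file yields the same two strings shown for Enhanced Subtitle Parsing, since entry 2 ends mid-sentence and is joined with entry 3.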
How Translated Files Are Generated
When text from multiple timestamp entries is combined to create strings, Smartling retains the original timestamps and the number of entries from the source file. This information is later used to generate the translated file. Smartling uses an algorithmic decision-making process to map each string back to the timestamp entry (or entries) it came from. The length of the translation is assessed, and the text is broken up and distributed across the original timestamps.
This process includes logic to ensure the translated output is congruent with the source file: the translated file does not contain overly long lines, appropriate line wrapping is used, timestamp entries contain no more than two lines of text, words are not split, and strings are broken at appropriate word boundaries.
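Smartling's actual redistribution algorithm is not public, but the core idea of splitting a translation back across its original timestamp entries at word boundaries can be sketched as follows. This is a minimal, hypothetical illustration that divides the translated words roughly evenly; the real logic also accounts for line length and per-entry line limits:

```python
def distribute_translation(translation: str, entry_count: int) -> list[str]:
    """Split a translated string across its original timestamp entries,
    breaking only at word boundaries and keeping segments roughly equal
    in word count (simplified stand-in for Smartling's internal logic)."""
    words = translation.split()
    base, extra = divmod(len(words), entry_count)
    segments, start = [], 0
    for i in range(entry_count):
        # Spread any remainder words over the earlier entries.
        size = base + (1 if i < extra else 0)
        segments.append(" ".join(words[start:start + size]))
        start += size
    return segments

# A translation of the merged sentence, mapped back onto its two
# original timestamp entries (example translation for illustration only).
segments = distribute_translation(
    "Las aguas de nuestro planeta ocupan un volumen asombroso en algunos lugares",
    2,
)
print(segments)
```

Each returned segment would then be written back under its original timecode when the translated SRT or WebVTT file is generated.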
Visual Context
When translating subtitle files, you should always add a video file for Visual Context; otherwise, translators will not see any Visual Context in the CAT Tool. See Add a Video for Subtitle Translation. Adding a video allows linguists to visualize how the translated subtitles will appear to the end user once the subtitle file is translated.
When the MT-optimized parsing mode is enabled, the source text from multiple timestamp entries can be combined to form one string in Smartling. In these cases, the context service retains the original timecode data from the source file. It then uses this data to determine when to display the appropriate video snippet and subtitles in the CAT tool context viewer.
When a linguist selects a string by clicking within the target field, the context viewer will play the video snippet with the subtitles displayed at the bottom of the video. When the translation length exceeds what can be displayed in one line, the video context viewer will display the second line of text only when the cursor is moved to the text that would be part of that second line.
Additionally, if a linguist uses the Play button to play the video for this subtitle, the video player will loop through the entire section of the video linked to these subtitles and will correctly display each line of the subtitle.
Without the MT-optimized parsing mode, a separate string is created for each individual timestamp entry from the source file. When a linguist works in the CAT tool, the context viewer will simply display the corresponding video snippet and subtitle text based on the timestamp entry timecodes.