Capturing Visual Context

Video Upload for Image Context Extraction

This article describes how to use video screen recordings to provide visual context for mobile, desktop, and embedded apps. For translating video subtitles, see the video subtitle article.

Not all application user interfaces are rendered as HTML. Native mobile and desktop apps often have no HTML version at all. Smartling supports Image Context (screenshots) when HTML is not available. However, taking individual screenshots may not be the best option if your application has many screens and user interfaces where the strings can display.

An easy solution is to generate a high-quality video screen capture of your product. Screen capture/recording functions are built into Android, iOS, and macOS, and a number of third-party apps are also available. Recording a video while running your app to display the various UIs and screens is much easier than stopping to take screenshots as you navigate from screen to screen.

Smartling now supports screen capture video upload for image context extraction. Smartling automatically processes the screen capture video to extract Image Context (screenshots) and connects the strings in your project to the images extracted from the video. Translators see the context for those strings as screenshots (still images) in the CAT Tool.

Translators will not see the video in the CAT Tool, just the extracted screenshots.

Because the process produces standard Image Context resources, all Image Context features are available after your video is processed. You can edit the Image Context items to add or remove the strings associated with each image, correcting anything the automation didn't get quite right.

See Review and Troubleshooting OCR for more information.

Optical Character Recognition

Smartling uses OCR (Optical Character Recognition) to automatically match the text in the context image to your content. You can choose to match the text in the image with strings in your entire project or in a specific file in the project. Matching can be reviewed and modified manually after the automatic recognition has taken place.

For OCR to work, make sure your source content (strings) has been uploaded to Smartling before you upload your video.

Technical Requirements

Video length and resolution

  • Up to 5 minutes for 2000x1100px (or 1100x2000px)
  • Up to 2 minutes for 3200x2100px (or 2100x3200px)

Video file size

  • Up to 500MB

Frame rate

  • Up to 60fps

File format

For API upload, the video must be posted at a publicly available URL; no login should be required to access the video file. YouTube or other hosted solutions are not supported.
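The limits above can be sketched as a small validation helper. This is a hypothetical illustration: only the numeric limits (5 minutes at 2000x1100px, 2 minutes at 3200x2100px, 500MB, 60fps) come from this article; the function itself is not part of Smartling.

```python
# Hypothetical helper encoding the video requirements listed above.
# Only the numeric limits come from this article; the function is illustrative.
def meets_requirements(duration_s: float, width: int, height: int,
                       size_mb: float, fps: float) -> bool:
    if size_mb > 500 or fps > 60:
        return False
    # Orientation does not matter, so compare the long and short edges.
    long_edge, short_edge = max(width, height), min(width, height)
    if long_edge <= 2000 and short_edge <= 1100:
        return duration_s <= 5 * 60   # up to 5 minutes at 2000x1100px
    if long_edge <= 3200 and short_edge <= 2100:
        return duration_s <= 2 * 60   # up to 2 minutes at 3200x2100px
    return False                      # larger resolutions are not listed
```

For example, a 5-minute 1100x2000px recording passes, while a 2.5-minute 3200x2100px recording exceeds the 2-minute limit for that resolution.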

How to generate visual context for your native apps

Record the App

  1. Using the screen recording app of your choice, step through your application while recording the entire screen or just the app.
    * Use test/dummy data to avoid exposing sensitive or confidential information.
  2. Decide on your method of upload: via the API or the Smartling Dashboard.


API

  1. Post the video at a publicly available URL.
  2. Use the API reference to upload the video to the correct Smartling project.
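The two API steps can be sketched in Python. This is a hypothetical illustration that builds (but does not send) the upload request: the host, bearer token, and video URL are placeholders, and the "name" form field follows the example POST call shown later in this article.

```python
import urllib.request

# Hypothetical sketch: build (but do not send) the upload-and-match request.
# The host, bearer token, and video URL are placeholders; the "name" form
# field follows the example POST call shown later in this article.
def build_upload_request(project_id: str, video_url: str,
                         token: str) -> urllib.request.Request:
    endpoint = ("https://api.smartling.com/context-api/v2/"
                f"projects/{project_id}/contexts/upload-and-match-async")
    boundary = "----FormBoundary"
    body = (f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="name"\r\n'
            "\r\n"
            f"{video_url}\r\n"
            f"--{boundary}--\r\n").encode()
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

The resulting `Request` object could then be passed to `urllib.request.urlopen` to perform the actual call.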

Smartling Dashboard

  1. Save or download the video to your local drive
  2. Log into Smartling and go to the Project you want to upload the context to
  3. Click the Context tab
  4. Click Upload Context
  5. Drag and drop the video file directly into the wizard, or click Select File to select the file to upload from your local drive
    * Multi-select and bulk-upload are supported
  6. Click Upload
  7. Next, select Convert the video to images
  8. Alternatively, if you want to use this video file as visual context for SRT translation, select Use full video for subtitle files
  9. Choose to match text in the context file with all strings in the current project, or select a specific file from the dropdown list of all files in the project
  10. Click Upload
  11. A success message will appear at the bottom of the screen, confirming the number of strings that have been matched to the context file

What happens next?

After processing is complete, a set of Image Contexts will appear in your project. They will all use the same URL to identify the source file, and each will have a unique title such as "Frame 1 – 01:10.300".

These will have strings matched using OCR. You can then update the strings as usual via the Image Context API or dashboard features.
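Because each extracted frame's title embeds its timestamp, the titles can be parsed programmatically. A hypothetical sketch, assuming titles follow the "Frame 1 – 01:10.300" pattern from the example above:

```python
import re

# Hypothetical parser for extracted-frame titles such as "Frame 1 – 01:10.300".
# The exact title format is assumed from the example in this article.
def parse_frame_title(title: str):
    m = re.match(r"Frame (\d+) – (\d+):(\d+)\.(\d+)$", title)
    if m is None:
        return None
    frame, minutes, seconds, millis = (int(g) for g in m.groups())
    # Return the frame number and its timestamp in seconds.
    return frame, minutes * 60 + seconds + millis / 1000
```

This could be useful, for example, to sort or filter the generated Image Contexts by their position in the recording.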

Reviewing Matched Strings

You can review the strings in one of two ways:

  1. On the Context tab for your Project, all the extracted images use the video URL plus a time stamp as their title; enter the original URL in the context Title/URL search field to narrow the context assets down to just those from the video.
  2. On the Strings tab for your Project, enter the original URL in the Context Search filter (do *not* click "exact") to narrow the results to the strings that have been associated with Image Context from the video.

See Review and Troubleshooting OCR for more information.

API reference

You will make a single POST API call with the video's URL to the /context-api/v2/projects/{projectId}/contexts/upload-and-match-async API endpoint. The API response will contain a matchId you can use to check on the video processing progress and get results from the /context-api/v2/projects/{projectId}/match/{matchId} endpoint.

Example POST call

(The boundary value and video URL below are placeholders.)

POST /context-api/v2/projects/{projectId}/contexts/upload-and-match-async
Content-Type: multipart/form-data; boundary=----FormBoundary

------FormBoundary
Content-Disposition: form-data; name="name"

https://example.com/your-screen-recording.mp4
------FormBoundary--

If no strings are matched to the video, the original video is deleted and no screenshots are created. In this case, the /match/{matchId} endpoint will return empty bindings:

"bindings": []
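Putting the status check together, a hypothetical helper for interpreting the match response; only the endpoint path and the empty-bindings behavior come from this article, and the host is a placeholder:

```python
# Hypothetical helpers around the /match/{matchId} status endpoint.
# The host is a placeholder; the empty-"bindings" meaning is described above.
def match_status_url(project_id: str, match_id: str) -> str:
    return ("https://api.smartling.com/context-api/v2/"
            f"projects/{project_id}/match/{match_id}")

def has_matches(response_body: dict) -> bool:
    # An empty "bindings" list means no strings were matched, the video
    # was deleted, and no screenshots were created.
    return len(response_body.get("bindings", [])) > 0
```

A client polling the match endpoint could use `has_matches` on the parsed JSON body to decide whether any Image Contexts were produced.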
