Translating PDF documents
This article is for Account Owners and Project Managers.
Learn what to expect when translating PDF documents and how to get the best results.
Use the original file when possible
For the best results, Smartling recommends using the original file if it is available in a format that Smartling supports. For example, if the original document was a Microsoft Word or InDesign document, translating the file in its original format typically produces better results than translating a PDF version of the same document.
If you are translating infographics, user interfaces, or other highly stylized or formatted content, it is always better to use a supported file format. Smartling supports a number of design tools for these use cases.
Sometimes the PDF document you want to translate was not created with a supported application, or the original file is simply not available. Some PDF documents might even be "scans" or "images" of written documents for which no original digital file exists at all. In these cases, you can take advantage of Smartling's support for PDF documents.
As with any other document, make sure the document's language matches the source language of the Smartling project you upload it to. When you upload a PDF document, here is what you and your translators can expect:
Translation flow for PDF files
Smartling handles PDF documents by first converting them to Microsoft Word format. Translation then follows the standard flow for Word documents. As a result, when translation is complete you receive a Word document, not a PDF.
As a convenience, we attach the PDF file you upload to the converted document as a reference, so translators can download and review it. The converted document uses the same file name as the PDF with a “docx” extension appended to it. Visual context in the Smartling CAT tool depends on the layout of the converted Word document.
Native PDFs vs. scanned documents
A native PDF is created from digital source content and then saved, printed, or exported as a PDF from a software application. A scanned document is produced by a document scanner or digital camera; it is effectively an image.
For native PDFs, the extracted strings should be highly accurate, and the formatting and layout of the converted document should closely resemble what you see in the PDF, including headings and titles, lists, paragraphs, and even tables. As with all our supported file formats, text embedded in images is not available for translation; only the native text in these files is extracted when they are converted to Word documents.
Scanned documents are processed with optical character recognition (OCR) to extract the text. Layout is not retained for these documents; the extracted content is formatted simply as a series of paragraphs. The extracted strings may not exactly match the content you see in the original PDF file.
Previewing the strings and formatting
After you upload a PDF and Smartling has converted its content into a Word document, you may want to review the strings or the Word document before authorizing translation. This is an opportunity to confirm that the strings are accurate and that the formatting is suitable before translation begins. If you find that the content is not ready for translation, download the converted Word document in the source language, edit it, and re-upload it before authorizing translation. You can do this from the Smartling project or job. Alternatively, you can review the source-language strings in the Smartling strings view, but to make changes you need to download and edit the file.
Post-translation formatting (desktop publishing, or DTP)
While not unique to PDF documents, DTP after translation may matter more for PDFs than for other file formats. As a best practice for PDFs, first make sure the strings extracted from your document are accurate and complete before authorizing translation, as described above. Don't worry too much about formatting at that stage. Once translations are complete, you can make final adjustments to formatting and layout if they are important for your document.