LLM Fine-Tuning by Smartling

LLM fine-tuning is a service offered by Smartling that customizes a large language model (LLM) to produce translations tailored to your brand's voice, terminology, and style. Rather than relying on a generic translation engine, fine-tuning teaches a powerful foundation model to translate the way your brand speaks: accurately, and with the right voice and style.

Smartling also offers Custom MT Engine Training for traditional neural machine translation (NMT) engines. LLM fine-tuning offers several advantages over that approach, summarized in the comparison below. In particular, fine-tuning combines long-term memory (from training on your data) with short-term, segment-level memory (from translation memory (TM) and glossary matches retrieved via retrieval-augmented generation, or RAG, at translation time).

	Custom NMT	Fine-tuned LLM
Language understanding	Translates each sentence independently, without broader context	Understands context beyond the sentence, including tone, register, and stylistic consistency across a document
Customization	Limited to what was in the training data	Can be adjusted on the fly through prompts and RAG, with style instructions and terminology injected into each translation request, without retraining the model
Training data	Needs tens of thousands of parallel segments	Effective with smaller, carefully curated datasets
Strengths	High-volume, straightforward translation	Brand voice, idiomatic adaptation, and cultural localization

How it works

Smartling offers LLM fine-tuning as a full-service solution. The process includes:

Asset collection & scoping: You provide your translation memory, glossary, style rules, and the language pairs you want to fine-tune. Your Customer Success Manager (CSM) will walk you through the intake checklist (see below).
Data curation: The Smartling AI team cleans and curates your training data to ensure quality and consistency. If your data requires cleaning beyond the standard scope, that work will be scoped and agreed upon separately with your CSM.
Prompt & RAG configuration: Your brand voice, formality preferences, and translation instructions are captured in a targeted prompt. At translation time, that prompt is augmented with examples retrieved from your translation memory and glossary, so the model sees relevant prior translations and approved terminology as in-context examples for each segment.
Model training: The LLM is fine-tuned on your curated data so it learns not just how to translate, but how to follow your specific translation instructions.
Evaluation & deployment: The fine-tuned model is evaluated for quality, then deployed and monitored within your Smartling workflows.
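
To make the prompt and RAG step more concrete, here is a minimal sketch of how a translation request could be augmented with TM and glossary matches. It is an illustration only, not Smartling's implementation: the sample data, the function names (top_tm_matches, glossary_hits, build_prompt), the prompt layout, and the use of Python's difflib for fuzzy matching are all assumptions made for this example.

```python
from difflib import SequenceMatcher

# Hypothetical sample assets; not Smartling's actual data format.
TM = [
    ("Add to cart", "Ajouter au panier"),
    ("Remove from cart", "Retirer du panier"),
    ("Contact support", "Contacter l'assistance"),
]
GLOSSARY = {"cart": "panier", "checkout": "paiement"}
STYLE = "Use the formal register (vous). Keep product names in English."


def top_tm_matches(source, tm, k=2, threshold=0.5):
    """Return up to k TM entries whose source text is most similar to `source`."""
    scored = [
        (SequenceMatcher(None, source.lower(), src.lower()).ratio(), src, tgt)
        for src, tgt in tm
    ]
    scored = [item for item in scored if item[0] >= threshold]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(src, tgt) for _, src, tgt in scored[:k]]


def glossary_hits(source, glossary):
    """Return glossary terms that appear as whole words in the source segment."""
    words = set(source.lower().split())
    return {term: tgt for term, tgt in glossary.items() if term in words}


def build_prompt(source, tm, glossary, style):
    """Assemble style rules, glossary terms, and TM examples into one prompt."""
    lines = [f"Instructions: {style}"]
    for term, tgt in glossary_hits(source, glossary).items():
        lines.append(f"Glossary: {term} -> {tgt}")
    for src, tgt in top_tm_matches(source, tm):
        lines.append(f"Example: {src} -> {tgt}")
    lines.append(f"Translate: {source}")
    return "\n".join(lines)


prompt = build_prompt("Add item to cart", TM, GLOSSARY, STYLE)
print(prompt)
```

Because the examples are looked up per segment, adding a new TM entry or glossary term changes the next prompt immediately, with no retraining involved.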

What you need to provide

The quality of your fine-tuned model depends directly on the quality and completeness of the assets used for training. Below is an overview of what you'll be asked to provide.

1. Translation memory (required)

Your translation memory is the primary data source used to train the model. The Smartling AI team will perform standard cleaning and curation before training begins. Provide the name(s) of the TM(s) you want used for training.

Important: The fine-tuned model will reflect whatever is in your training data. Mixed, inconsistent, or low-quality data (for example, TM entries that use both formal and informal registers) will produce inconsistent output. Before training begins:

Review your TM for consistency, particularly around formality register, style, and accuracy.
Flag any known data quality concerns to your CSM so they can be addressed in advance (e.g., "We switched translation providers on a specific date; only include TM entries after the switch").
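
If you can export your TM, a quick pre-check along these lines may help you spot issues before talking to your CSM. This is a rough sketch under assumptions: the record layout (source, target, last-modified date), the cutoff date, and the helper names are hypothetical and do not reflect a Smartling export format or API.

```python
from collections import defaultdict
from datetime import date

# Hypothetical TM records: (source, target, last-modified date).
entries = [
    ("Welcome back", "Bon retour", date(2021, 3, 1)),
    ("Welcome back", "Content de vous revoir", date(2023, 6, 10)),
    ("Sign in", "Connectez-vous", date(2023, 7, 2)),
    ("Sign in", "Connecte-toi", date(2023, 8, 15)),
]
CUTOFF = date(2022, 1, 1)  # assumed date of the provider switch


def filter_after(records, cutoff):
    """Keep only records last modified on or after the cutoff date."""
    return [rec for rec in records if rec[2] >= cutoff]


def conflicting_sources(records):
    """Map each source segment to its targets when more than one target exists."""
    targets = defaultdict(set)
    for source, target, _ in records:
        targets[source].add(target)
    return {src: sorted(tgts) for src, tgts in targets.items() if len(tgts) > 1}


recent = filter_after(entries, CUTOFF)
conflicts = conflicting_sources(recent)
print(len(recent), "entries kept")
print(conflicts)
```

In this sample, the two remaining "Sign in" translations mix formal and informal registers, which is the kind of inconsistency worth flagging before training.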

2. Glossary (highly recommended)

Your active glossary entries teach the model your brand-specific terminology and ensure key terms are translated consistently. If your glossary is of poor quality, we recommend cleaning it up or excluding it from training. Provide the link or name of the glossary (or glossaries) to include.

3. RAG prompt information form & style rules (required)

The RAG Prompt Information Form, provided by the Smartling team, captures the additional context used to augment the LLM's translation prompt, including brand tone, formality, specific instructions, and constraints for the model. You will need to complete and return this form. If you have additional rules or requirements beyond what the form covers, add them there as well.

4. Language pairs (required)

Fine-tuning is scoped per language pair. Confirm the full list of source → target locale pairs you want included (e.g., en-US → fr-FR, en-US → de-DE).

If multiple content types have been agreed upon, specify the applicable languages for each content type and provide separate assets and requirements per type.
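
One way to double-check your scope before sending it is to write the pairs down per content type and validate them. The sketch below is only an illustration: the content type names, the parse_pair helper, and the simplified language-REGION pattern are assumptions, and real locale codes can take other forms.

```python
import re

# Simplified check for codes shaped like "en-US"; real locale codes vary more.
LOCALE = re.compile(r"^[a-z]{2,3}-[A-Z]{2}$")


def parse_pair(text):
    """Split 'source → target' into a validated (source, target) tuple."""
    source, arrow, target = (part.strip() for part in text.partition("→"))
    if not arrow or not LOCALE.match(source) or not LOCALE.match(target):
        raise ValueError(f"Not a valid locale pair: {text!r}")
    return source, target


# Hypothetical scope: language pairs listed separately for each content type.
scope = {
    "marketing": ["en-US → fr-FR", "en-US → de-DE"],
    "support": ["en-US → fr-FR"],
}

parsed = {
    content_type: [parse_pair(pair) for pair in pairs]
    for content_type, pairs in scope.items()
}
print(parsed)
```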

What you get

A custom fine-tuned model that:

Reflects your brand voice and terminology consistently across every language and content type
Typically shows a reduction of 10% or more in Translation Edit Rate (TER) compared to generic LLMs of the same family, meaning less post-editing to reach publish-ready quality
Can be steered at inference time (when the model processes a translation request) through prompts and RAG, with no retraining needed to adjust instructions
Improves dynamically as your TM and glossary grow, since updated matches are retrieved via RAG at translation time. Smartling also recommends retraining the fine-tuned model every 6–12 months to incorporate new data.
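
For readers unfamiliar with TER, the sketch below shows the basic idea: count the word-level edits needed to turn the machine output into the post-edited reference, then divide by the number of reference words. It is a simplification for illustration. Full TER also treats a shifted block of words as a single edit, which this version omits, so it can overstate the score. The example sentences are invented, and reading the reduction as a relative change is only one possible interpretation of the figure quoted above.

```python
def word_edit_distance(hypothesis, reference):
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    hyp, ref = hypothesis.split(), reference.split()
    prev = list(range(len(ref) + 1))
    for i, hyp_word in enumerate(hyp, start=1):
        curr = [i]
        for j, ref_word in enumerate(ref, start=1):
            cost = 0 if hyp_word == ref_word else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def simple_ter(hypothesis, reference):
    """Edits divided by reference length; block shifts are not modeled."""
    return word_edit_distance(hypothesis, reference) / len(reference.split())


# Invented example: one post-edited reference and two machine outputs.
reference = "Veuillez ajouter l'article à votre panier"
generic = "Ajoute l'article dans ton panier"               # 4 word edits
tuned = "Veuillez ajouter l'article dans votre panier"     # 1 word edit

generic_ter = simple_ter(generic, reference)
tuned_ter = simple_ter(tuned, reference)
relative_reduction = (generic_ter - tuned_ter) / generic_ter

print(f"generic: {generic_ter:.3f}  tuned: {tuned_ter:.3f}")
print(f"relative reduction: {relative_reduction:.0%}")
```

A lower score means fewer edits per reference word, which is why a drop in TER translates into less post-editing effort.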

Questions?

Reach out to your Smartling CSM to get started or to learn more about whether LLM fine-tuning is the right fit for your translation program.
