🧮 Word Counts & Estimates Explained

Word counts represent the sum of words in source strings, excluding tags and placeholders. They represent the quantity of work done by translators, editors and reviewers in a Smartling account over a particular time period. Word count reports are intended for tracking and invoicing of work completed. The counts reflect the number of source words and weighted words handled by each user and are broken out by language, workflow step type, and fuzzy tier. SmartMatch information does not appear on word count reports, since it does not constitute work done by users.

Sample Word Count Report

Account	Project/Job	Translation Resource	Agency	Target Language	Workflow Step	Fuzzy Profile	Fuzzy Breakdown	Word Count	Weighted Words	Character Count
Company	Mobile App Release X	John X	Agency XYZ	French (France)	Translation	Agency XYZ profile	95 - 99.9%	120	36	480
Company	Mobile App Release X	Mary Y		French (France)	Edit			120		500

This report is available in the Smartling dashboard, where it can also be downloaded in CSV format. Agencies can download reports for anything their team worked on, and individual translators, editors, and reviewers can download word count reports on their own work.

Character counts

Like word counts, character counts represent the sum of characters in source strings, excluding tags and placeholders. Character counts are particularly useful to represent the quantity of work done by translators, editors and reviewers translating into Chinese and Japanese, as these languages do not use spaces to separate words.

The way your files are ingested into Smartling (via file upload with directives, or via API) can affect how characters are counted, for example whether entities in the file are captured as entities or as characters. The following table clarifies these rules:

Code Point	Rule Description
Simple Whitespace	Sequential base whitespace characters [ \t\n\x0B\f\r] are collapsed to one character. All other raw Unicode whitespace characters, such as NBSP, are not collapsed.
Tags	Valid XML tags are not counted as characters. Invalid XML tags are counted as characters.
Placeholders	Valid placeholders are not counted as characters. Any other invalid or unprocessed code or placeholder is counted as characters.
Unicode Characters / HTML Entities	A specific set of entities is not counted as characters: No-break space, En space, Thin space, Zero width non-joiner, Pahawj Hmong Sign Zwj Thaj, Zero width joiner, Left-to-right mark, Right-to-left mark (`(&(?:nbsp|#160|#xA0|ensp|#8194|#x2002|emsp|#8195|#x2003|thinsp|#8201|#x2009|zwnj|#8204|#x200C|zwj|#8205|#x200D|lrm|#8206|#x200E|rlm|#8207|#x200F);)`). Other entities are counted as characters.
All other code points	All other code points are counted as characters
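
As a rough illustration of the whitespace and entity rules in the table, here is a minimal Python sketch. It is not Smartling's implementation. The function name `count_characters` is hypothetical, tags and placeholders are assumed to have been removed already, and the order of the two steps is an assumption. Raw (non-entity) characters and all other entities are simply left in place and counted as written, which is also an assumption.

```python
import re

# Base whitespace characters that collapse to a single character.
BASE_WHITESPACE = re.compile(r"[ \t\n\x0B\f\r]+")

# Entity forms listed in the table as "not counted as characters".
UNCOUNTED_ENTITIES = re.compile(
    r"&(?:nbsp|#160|#xA0|ensp|#8194|#x2002|emsp|#8195|#x2003"
    r"|thinsp|#8201|#x2009|zwnj|#8204|#x200C|zwj|#8205|#x200D"
    r"|lrm|#8206|#x200E|rlm|#8207|#x200F);"
)


def count_characters(text: str) -> int:
    """Count characters in text that has no tags or placeholders left in it."""
    # Assumption: uncounted entities are dropped before whitespace is collapsed.
    text = UNCOUNTED_ENTITIES.sub("", text)
    # Runs of base whitespace count as one character.
    text = BASE_WHITESPACE.sub(" ", text)
    return len(text)


print(count_characters("Hello \t\n world"))  # whitespace run counts once
print(count_characters("A&nbsp;B"))          # the entity is not counted
```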

Smartling does not process mixed character-based and word-based languages for word counts. It chooses either character-based or word-based counting, based on the source locale of the project, and applies it to all characters and words in the project. For this reason, there may be a difference between the word count displayed here and the count in a document tool, such as Microsoft Word.

Fuzzy matches

When assessing the size of a piece of translation work, it’s customary in the industry to consider previous translations that were available to the translator at the time they did the work. For example, translating the string “Once upon a time” when a previous translation was available for the string “Once upon a time,” (with a comma) would be considered less work for the translator than if it had to be translated from scratch. A pre-existing entry in the translation memory similar to the string being translated is called a ‘fuzzy match’.

Fuzzy scores

The measure of similarity between one string and another is known as the ‘edit distance’ and is calculated using an industry-standard method called Levenshtein. Edit distances for fuzzy matches are surfaced in Smartling as percentages called ‘fuzzy scores’, with higher percentages signifying greater similarity. In the example above, the fuzzy score for the string being translated is above 90%. A high fuzzy score means that the translation memory already contains a translation for a string very similar to the one being translated, and usually results in reduced translation costs.
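
To make the idea concrete, here is a minimal Python sketch of a Levenshtein-based similarity percentage. The normalization used here (edit distance divided by the length of the longer string, at character level) is an assumption for illustration; Smartling's exact scoring formula is not described in this article, and the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def fuzzy_score(source: str, tm_entry: str) -> float:
    """Similarity as a percentage; assumes normalization by the longer string."""
    longest = max(len(source), len(tm_entry))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(source, tm_entry) / longest)


# The example strings from the text differ by one character (the comma).
print(levenshtein("Once upon a time", "Once upon a time,"))
print(round(fuzzy_score("Once upon a time", "Once upon a time,"), 1))
```

Under this assumed normalization, the two example strings score a little over 94%, which is consistent with the "above 90%" statement.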

Fuzzy profiles

Pricing for translation work is typically based on the number of source words translated along with the fuzzy scores of the strings when translated. Fuzzy score ranges are organized into pricing bands or ‘tiers’, with higher-score tiers receiving progressively greater discounts. This is illustrated below.

Fuzzy Tier (fuzzy score range)	Fuzzy discount example
0% - 84.9%	Full per-word translation rate (no discount)
85% - 94.9%	60% of full per-word translation rate
95% - 99.9%	30% of full per-word translation rate
100% - 100%	10% of full per-word translation rate

Sample Fuzzy Pricing Tiers

The definition of a set of fuzzy pricing tiers like the above is known in Smartling as a ‘fuzzy profile’. Multiple fuzzy profiles can be defined for an account, but currently a single profile must be chosen for each language in a given project. This may change in the future to allow fuzzy profiles to be associated with agencies instead of languages in a project.

Your Smartling Customer Success Manager can assist with configuring new fuzzy profiles for your projects if required.

The fuzzy profile that was active when a string was translated is included in the word-count report, along with the string’s fuzzy tier (called ‘fuzzy breakdown’ in the report).

By default, fuzzy tier-based pricing applies only to work done in the translation step, with editing charged at the full per-word editing rate, regardless of fuzzy matches. This is the industry standard.

Language Quality Estimation levels

The Fuzzy Breakdown column in the Word Count report will display either the fuzzy tier for available TM matches or the Language Quality Estimation level (Low, Medium, or High) if a translated string underwent Language Quality Estimation. See Language Quality Estimation Agent for Machine Translation for more details.

Weighted words

The concept of ‘weighted words’ exists to simplify price calculations and thus reduce the chance of errors. Instead of multiplying the number of source words in each fuzzy tier by the tier’s discount and the per-word price, simply multiply the weighted word count by the full per-word price. The weighted-word count has the discount built in. For the sample word count report above, assuming a per-word price of 10 cents, here is how to calculate the price of the work represented in the first row of the report using weighted words:

Calculate price: weighted word count x price per word:
36 x 10 = 360 cents

Weighted word counts are rounded up to the nearest whole word.
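
The calculation can be sketched in Python as follows. This is an illustration only: it uses the sample fuzzy pricing tiers shown earlier in this article rather than any real fuzzy profile, and the names `SAMPLE_PROFILE`, `tier_percent` and `weighted_words` are hypothetical. Integer arithmetic is used so that rounding up is exact.

```python
# Sample tiers from this article: (lower bound of fuzzy score, % of full rate).
SAMPLE_PROFILE = [
    (100.0, 10),
    (95.0, 30),
    (85.0, 60),
    (0.0, 100),
]


def tier_percent(fuzzy_score: float) -> int:
    """Return the percentage of the full rate for the tier containing the score."""
    for lower_bound, percent in SAMPLE_PROFILE:
        if fuzzy_score >= lower_bound:
            return percent
    raise ValueError("fuzzy score must be between 0 and 100")


def weighted_words(word_count: int, fuzzy_score: float) -> int:
    """Apply the tier discount and round up to the nearest whole word."""
    discounted = word_count * tier_percent(fuzzy_score)
    return -(-discounted // 100)  # integer ceiling division


# First row of the sample report: 120 words in the 95 - 99.9% tier.
words = weighted_words(120, 97.0)
print(words, "weighted words")
print(words * 10, "cents at 10 cents per word")
```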

Page counts for DTP

Desktop Publishing (DTP) usually incurs an additional cost, invoiced as a separate line item based on the number of pages that require DTP. To make budgeting easier, Smartling lets you include the estimated DTP cost directly in your cost estimates. Learn more in Cost Estimates for Desktop Publishing.

How is work credited?

The main rules for how work is credited are:

Translation work is credited to the translator who submits it, at the time of submission. If two translators work on a string, the one who submits it gets the credit. However, if any other user submits a translator’s work, or moves it out of the step, the translator gets the credit.
For editing and review steps, work is credited on submission, regardless of whether any changes were made to the translations. Work is not credited when translations are rejected back to earlier steps; it’s credited when it is eventually submitted.
If an Account Owner, Project Manager, Agency Account Owner, or Translation Resource Manager (that is not assigned to the step) submits an editor’s or reviewer’s work, the submitter gets the credit in the word count report—not the editor (even if the editor made changes to the translation). If a Smartling Admin (such as your CSM or SA) submits an editor’s or reviewer’s work, nobody gets credit as Admins are not counted in word count reports.
If a string is rejected, re-worked and submitted again, it is credited once. However, if a second freelancer or a translator from a different agency works on the rejected translation, both translators receive credit.
If a string is unauthorized before a saved translation is submitted (e.g., when a job is cancelled or a file is replaced by an updated version without the string), the following applies:
- If a string was saved and subsequently unauthorized while it was still in the Translation step, the last translator to save it receives credit at the time the string disappears - unless that user was an Account Owner, Project Manager, or Smartling Admin, in which case, no one gets credit.
- If a string was saved and subsequently unauthorized in a post-translation step (Edit, Review or Post-Edit), the last linguist to save the translation receives credit only if they had made an edit to the translation before saving it. If an existing translation is saved in a post-translation step without any changes and the string is then unauthorized before being submitted, the linguist who had saved the unedited translation will not be credited.
If the string subsequently reappears (e.g., becomes active again), and is retranslated and submitted by the original translator, it’s counted only once. If a second translator translates and submits the re-activated string, then both translators receive credit.

Plural forms in the Word Count Report

The additional work required of translators when translating strings ingested with plural forms is reflected in the Word Count Report. Plurals are counted with a locale-specific multiplier: the number of source words is multiplied by the number of plural forms for each target locale.
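
A minimal Python sketch of the multiplier follows. The per-locale counts are illustrative assumptions based on the CLDR cardinal plural categories, not multipliers published by Smartling, and the names used are hypothetical.

```python
# Illustrative number of plural forms per target locale (assumed, CLDR-style).
ILLUSTRATIVE_PLURAL_FORMS = {
    "ja-JP": 1,  # other
    "de-DE": 2,  # one, other
    "ru-RU": 4,  # one, few, many, other
    "ar-SA": 6,  # zero, one, two, few, many, other
}


def plural_word_count(source_words: int, target_locale: str) -> int:
    """Source word count multiplied by the number of plural forms of the locale."""
    return source_words * ILLUSTRATIVE_PLURAL_FORMS[target_locale]


# A 5-word plural string translated into Russian under these assumptions.
print(plural_word_count(5, "ru-RU"))
```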

Cost estimates

A cost estimate is a prediction of a job’s final word-count totals and cost if the job were to be translated at the point when the estimate is run. It's based on the fuzzy estimate for the content in the job, along with the appropriate fuzzy profiles and rate cards. It gives an approximate indication of how large a job is and what it would likely cost if translated now. Below is an example of a cost estimate.

[Image: estimate details]

Cost estimates like the above are available only to Account Owner and Project Manager roles. They are accessible in the Job Summary pane of a Job.

In addition to the estimate for the whole job that is available in the Job Summary, a high-level estimate is automatically run during the process of authorizing a job. This estimate includes only the unauthorized content, and so might differ from the total estimate for the job if some content has already been authorized. This estimate only displays the top-level totals for source words, weighted words and cost.

It’s possible to compare estimates for different workflows by selecting them in the dropdown on the Authorization dialog; separate workflows per language can be chosen using the ‘show details’ option.

Agencies and translators can run fuzzy estimates for all workflow steps they have access to once content reaches those steps. Fuzzy tiers will only be applied if the workflow step has a Fuzzy Match Profile associated with it; otherwise, the estimate will simply display the Word Count for that step. Published content is not included in estimates.

Actuals not included

With one exception, actual historical data from the job is not factored into the estimate. For example, suppose an estimate is run on a completed job in which a user not assigned to the edit step moved the content past that step, removing the cost of editing. The estimate ignores this and proceeds on the assumption that, if the job were translated now, the content would go through editing. The one exception is that for any translated string, the fuzzy score that was available at the time of translation is used in the estimate instead of the fuzzy score available in the current leveraged translation memory.

Cost and prices

Cost figures are included in the estimate if the appropriate rate cards have been added to the platform. A valid rate card, i.e., one that includes a rate corresponding to the workflow step and content type, must be available for at least one of the assigned resources in every step of every workflow included in the estimate. If some rate cards are missing, the figures are displayed where possible (e.g., for certain languages or workflow steps), and a warning is displayed next to the overall cost estimate amount. If the relevant rate cards include multiple currencies, the total is not calculated and a ‘multiple currency’ message is displayed instead.

Only Smartling Admins and Account Owners can enter rates.

Internal fuzzies and repetitions

As translation work on a job progresses, the translation memory is gradually populated with entries from the job itself, resulting in these entries becoming available as potential fuzzy matches for future translations in the same job. Future fuzzy matches that come from the job itself are known as ‘internal matches’ and are included in the estimate in the appropriate fuzzy tiers.

Identical (100%) internal matches are known as ‘repetitions’, and are broken out separately in the estimate as it’s common for a separate price to be agreed for repetitions.
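
As a simple illustration of repetitions, the Python sketch below counts the words in strings that are identical to a string appearing earlier in the same job. It assumes the strings are listed in the order they would be translated and uses whitespace splitting as a stand-in for real word counting; the function name is hypothetical.

```python
def count_repetition_words(job_strings: list[str]) -> int:
    """Total words in strings that exactly repeat an earlier string in the job."""
    seen: set[str] = set()
    repeated_words = 0
    for text in job_strings:
        if text in seen:
            # Identical to an earlier string: a 100% internal match.
            repeated_words += len(text.split())
        else:
            seen.add(text)
    return repeated_words


job = ["Save", "Cancel", "Save", "Save changes", "Save"]
print(count_repetition_words(job))  # "Save" repeats twice, one word each
```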

Repetitions and internal matching are only displayed for untranslated content. If repetitions have not yet been translated in the CAT tool, they will appear in the estimate, even after MT translation. However, once a linguist saves a translation for a repetition in the CAT tool, the repetition will no longer be visible in the estimate if it is regenerated.

Understanding 100% matches: estimates vs. CAT Tool

You may notice that estimates show 100% matches, but none appear when you open the job in the CAT Tool.

This is because estimates include internal "fuzzy" matches—segments that match within the same job. These only appear in the CAT Tool as you translate and save segments.

Estimates assume all possible internal matches and repetitions (100% matches) will be used with auto-propagation. In the CAT Tool, these matches show up after a segment is translated and auto-propagation is applied.

Key points:

Estimates assume all repetitions use auto-propagation.
Work is logged as a repetition in the Word Count report only if auto-propagation is used. Without it, you may still get a 100% internal match.
Internal matches don't appear in the CAT Tool until translation work is done and segments are saved.
Bulk-saving repetitions without auto-propagation may miss internal matches, as all saves happen at once.

SmartMatch

Strings that can SmartMatch are broken out separately in the estimate and are counted as zero weighted words for the translation step, since the translation will be automatically applied. If the strings will SmartMatch to Published, they are not included in the counts for any post-translation steps. However, if the strings will SmartMatch to a post-translation step other than Published, they are assumed to go to the first post-translation step, and thus will be included in the counts for all post-translation steps. This can result in an over-estimate for this content.

Dynamic workflows

Estimates assume that all content destined for a dynamic workflow will travel through the default branch of the workflow. This is true even if some of the content is already in a different branch. Therefore, consideration should be given to how best to estimate dynamic workflows.

Machine or AI Translation steps

No costs are estimated for machine translation steps, where the translation provider is an MT engine or LLM, or for AI Translation steps, such as those in Smartling Language Services AIT and AIHT workflows, as no human linguists are involved. Although the fuzzy breakdown is still visible for these steps, it is not used in cost calculations.

Plural forms in cost estimates

Plural forms are not reflected in cost estimates. Instead, the string is counted as if it were a single source string with a corresponding single translation. Please note that in the Word Count Report, plural forms are counted and a multiplier is applied to the source word count.

How estimates are calculated

The following steps are used in calculating estimates:

Determine workflows. For each string in the job, identify a workflow for each language in order to establish which steps to include and which rate cards to use:
- For unauthorized content, the default workflow for each language is used for the estimate. If the default workflow is a dynamic workflow, the default branch of the dynamic workflow is used.
- For authorized content, the actual workflows in which the string sits are used for the estimate. If the workflow is a dynamic workflow, the default branch is used in the estimate even if the content is currently in a different branch.
Calculate TM match scores and weighted words. If the string in question already has a translation, then the match score that was available at the time of translation is used in the estimate. For untranslated strings, the following are checked and then the weighted word counts for the string can be calculated:
- SmartMatch: SmartMatches to Published are counted at zero cost. SmartMatches to a post-translation step (other than Published) are counted as full rate words and will not appear under the SmartMatch line item for post-translation steps in the cost estimate. If the post-translation step applies fuzzy match weighting (e.g., AI Review, Post-Editing, or any other post-translation step depending on the Fuzzy Match Profile configuration), SmartMatches are mapped to the 100% tier.
  
  To summarize, if SmartMatch is configured to send matches to a post-translation step instead of directly to Published, the estimate shows them in the first post-translation step, regardless of the specific step configured. There they are either counted as full-rate words or, if fuzzy match weighting is applied to the step, mapped to the 100% tier. In both cases the SmartMatch word count for that step is 0.
- Fuzzy match: Check the fuzzy score of the (untranslated) string using the leveraged translation memories as they are at the time the estimate is run. Determine which fuzzy tier this falls into and calculate the corresponding weighted words. Large strings (>10K characters) are excluded from fuzzy-match estimates due to the expense of calculating; they’re assumed to have no fuzzy matches but are included in the total word count of the estimate.
- Internal Match: Check the fuzzy score of each string against the strings in the same job that will be translated before it. The higher of the internal and TM match scores is used.
Calculate cost. Use the rate card corresponding to the workflow content type and the rate corresponding to the step type to determine the cost (or cost range if multiple differing rate cards could apply) for each workflow step. The calculation is done to four decimal places and then rounded up to two at the end.
Add up totals. Sum the counts across all strings, steps and languages: source word count and weighted word count for each string that is likely to be translated, and source word count for each string that is likely to pass through editing and review steps.
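
Two details of the steps above can be sketched in Python: choosing which match score applies to a string, and rounding the cost. This is a hedged illustration, not Smartling's code. The function names are hypothetical, rates are assumed to be per weighted word, and the rounding mode used for the four-decimal intermediate value (half-up) is an assumption; only the final "rounded up to two" behavior comes from the text.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_HALF_UP
from typing import Optional


def effective_score(tm_score: float, internal_score: float,
                    score_at_translation: Optional[float] = None) -> float:
    """Pick the match score the estimate would use for one string."""
    if score_at_translation is not None:
        # Already translated: the score available at translation time is used.
        return score_at_translation
    # Untranslated: the higher of the TM and internal match scores is used.
    return max(tm_score, internal_score)


def step_cost(weighted_words: int, rate_per_word: str) -> Decimal:
    """Cost of one step for one line, kept to four decimal places."""
    cost = Decimal(weighted_words) * Decimal(rate_per_word)
    return cost.quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)


def total_cost(line_costs: list[Decimal]) -> Decimal:
    """Sum the four-decimal costs and round up to two decimals at the end."""
    total = sum(line_costs, Decimal("0"))
    return total.quantize(Decimal("0.01"), rounding=ROUND_CEILING)


print(effective_score(tm_score=80.0, internal_score=100.0))
print(total_cost([step_cost(36, "0.1033"), step_cost(120, "0.0412")]))
```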

Why estimates can change over time

Estimates that are re-run at different times over the life of a job can produce significantly different results. This is because an estimate is a real-time prediction of the job’s final word counts based on the current situation, which may have changed from when previous estimates were run. Below are some of the factors that contribute to estimates changing:

Translation memory changes. As translation work proceeds in an account across multiple translation jobs, the translation memory is continually updated, and these updates can provide new fuzzy-match and SmartMatch options to the current job, resulting in reduced estimates. In addition, new TMs can be imported to the leverage configuration. It’s also possible, though less likely, that entries get deleted from a TM, resulting in fewer match options than were originally available, which could result in an estimate increasing.
Job content changes. Files in a job can be updated while the job is under way resulting in content being added or removed from the job, which can change the estimate.
Workflow changes. Content might be authorized into or moved to different workflows than assumed in the original estimate, resulting in different translators and editors with potentially different rates, or different workflow steps, being included in the estimate.
Configuration changes. Changes to which translation memories are included in the leverage configuration, SmartMatch settings, workflow steps and settings, rates, etc. can all affect an estimate.

With the exception of TM changes that occur as a result of normal translation work, the changes above can be managed to reduce their impact on active translation jobs. Nonetheless, it’s advisable to save a copy of the original estimate, and the estimate when the content has entered its workflows, as well as any estimates run after significant changes of the types mentioned above.

Why actual word counts can differ from estimates

Actual word counts can differ from what was estimated for various reasons as described below:

Translation memory changes. The state of the translation memory at the moment a translation is saved can be different from how it was at the time the estimate was run. This can happen for various unpredictable reasons, but is most commonly due to new translations being saved to the TM in other translation jobs, in which case the actual cost is lower than estimated - a benefit of using a cloud-based translation platform. Note that once a string is translated, the match score that was available at the time of translation is used in subsequent estimates, even if the translation memory has since changed.
Order of submission. Once a translation is published, it becomes available for SmartMatching, which could result in something that was counted as a repetition in the estimate being translated by SmartMatch, potentially reducing the cost.
SmartMatch to post-Edit step. When estimates are calculated, SmartMatching to non-Published steps is assumed to move the content to the first post-Translation step. This over-estimates the cost for strings that will SmartMatch to a post-Edit Review or Hold step.
Skipping steps. Content can be manually moved past a step, removing the actual cost for that step; or it can automatically skip an editing step due to the workflow ‘skip edit’ or be automatically moved out of a step due to the ‘idle strings’ configuration option. In all these cases, the estimated cost of the step will not be incurred.
Dynamic Workflow effects. Estimates currently assume that all content goes through the default branch of a dynamic workflow. However, some of the content is likely to traverse different branches with likely different and potentially higher costs.
Rejected SmartMatches. If SmartMatched translations are subsequently rejected and worked on by a translator, the word count reports will reflect this work, but an estimate will not.
Translating before SmartMatch. If a translator enters a translation for a string before SmartMatch does, then the cost will be that of the translator’s work rather than the SmartMatch, and thus could be higher than what was estimated.

Using estimates for budgeting

Estimates serve this use case reasonably well because deviations that occur in the actual word counts will generally lower the cost from the estimated amount. However, a number of exceptions to this need to be borne in mind:

Changes to the job contents can increase the size and cost of the job.
Workflow changes could introduce additional steps resulting in increased cost.
Configuration changes, such as removing a TM from the leverage configuration, could result in increased cost.
Dynamic workflows are not fully accounted for in estimates and could result in an under-estimate.

Ideally, the first three items above can simply be avoided once the job is in progress. If they can’t be avoided, a re-estimate may be required along with a potential budget adjustment. For the dynamic workflows issue, it may be advisable to configure the workflow such that the default branch is the most expensive to ensure that dynamic-workflow translations won’t result in cost increases from what was estimated.

One challenge with this use case is how to handle a significant deviation between the estimate and the actual cost. If the actual cost is higher, then either you or your translation vendor will need to bridge the gap in order to pay translators for their actual work. On the other hand, a lower actual cost can present a challenge in managing translators.

Stop words and fuzzy matching

Smartling filters out common stop words when indexing strings for search. These words are not stored in the index and are therefore not considered for fuzzy matching in the CAT Tool or for leverage in estimates.

If a string contains both stop words and non-stop words, only the non-stop words are indexed. If a string consists only of stop words, such as "no" or "not," it will not generate fuzzy matches or leverage.

For English, the stop-word list includes:
a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, with
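
The effect of stop-word filtering can be illustrated with a short Python sketch. The tokenization shown (lowercasing and splitting on non-alphanumeric characters) is an assumption for illustration and is not a description of Smartling's indexer; the function name is hypothetical.

```python
import re

# The English stop-word list given above.
ENGLISH_STOP_WORDS = frozenset(
    "a an and are as at be but by for if in into is it no not of on or such "
    "that the their then there these they this to was will with".split()
)


def indexable_terms(text: str) -> list[str]:
    """Return the terms that would remain after stop words are filtered out."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [token for token in tokens if token not in ENGLISH_STOP_WORDS]


print(indexable_terms("Add to cart"))  # stop word "to" is dropped
print(indexable_terms("No"))           # nothing left to index
```

A string that yields an empty list here corresponds to the case described above, where no fuzzy matches or leverage are generated.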

Considerations

A key point to note is that estimates are ‘real time’ and can give different results when run at different times for the same job. Because of this, the following is recommended:
- Refresh the estimate before using it to be certain you have the latest.
- Make a copy of any estimate that you intend to use or reference in the future. A download CSV option is available in the estimate details, but taking a screenshot as well is recommended.
- If it’s important that the estimate not change much or that the final word counts be close to the estimate, limit the kinds of changes that can affect this while jobs are in progress (see sections above).
- If changes must be made that will affect estimates and word counts, make sure that they are well communicated.
A small exception to the ‘real time’ point above is that once a translation is saved, the fuzzy score that was available when saved gets used henceforth in the estimate.
Warnings next to total cost figures should be taken seriously as they can indicate a very significant underestimate of total cost. The details of the estimate should be examined (including all languages) to determine which parts are missing, and this should be addressed before using the estimate.
Estimates for dynamic workflows assume that all content goes through the default branch. Consider using the default branch for estimating only. For example, if using language-based branching, add all language resources to the default branch so that they’re reflected in the estimate, but configure rules so that each language goes through a non-default branch.
Since MT steps currently are treated the same as other steps, i.e., they’re expected to have a rate card, consider creating a dummy user with a suitable rate (e.g., 0 or ‘unpayable’) and assigning it to MT steps, to avoid warnings about missing rates for those steps.
Remember that SmartMatch to non-Published steps can overestimate cost since it always assumes that the content will SmartMatch to the first post-translation step.
