Word Counts & Estimates Explained

This article explains the meaning and use of word counts, character counts, fuzzy matches, weighted words, and cost estimates.

Word Counts

Word counts represent the sum of words in source strings, excluding tags and placeholders. They represent the quantity of work done by translators, editors and reviewers in a Smartling account over a particular time period. Word count reports are intended for tracking and invoicing of work completed. The counts reflect the number of source words and weighted words handled by each user and are broken out by language, workflow step type, and fuzzy tier. SmartMatch information does not appear on word count reports, since it does not constitute work done by users.

Sample Word Count Report

[Table: sample word count report. Surviving cell values:]
Row 1: Mobile App, Release X | John X | Agency XYZ | French (France) | Agency XYZ | 95 - 99.9%
Row 2: Mobile App, Release X | Mary Y | French (France)

This report is available in the Smartling dashboard, where it can also be downloaded in CSV format. Agencies can download reports for anything their team worked on, and individual translators, editors, and reviewers can download word count reports on their own work.

Character Counts

Like word counts, character counts represent the sum of characters in source strings, excluding tags and placeholders. Character counts are particularly useful to represent the quantity of work done by translators, editors and reviewers translating into Chinese and Japanese, as these languages do not use spaces to separate words.

The method by which your files are ingested into Smartling (file upload with directives, or API) can affect how characters are counted; for example, whether entities in the file are captured as entities or as characters. The following table clarifies some of these rules:

Code Point | Rule Description
Simple Whitespace
  • Sequential base whitespace characters [ \t\n\x0B\f\r] are collapsed to one character
  • All other raw Unicode whitespace characters, such as NBSP, are not collapsed
  • Valid XML tags are not counted as characters
  • Invalid XML tags are counted as characters
  • Valid placeholders are not counted as characters
  • Any other invalid or unprocessed code or placeholder is counted as characters
  • A specific set of entities is not counted as characters
  • Other entities are counted as characters
All other code points | All other code points are counted as characters
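
The whitespace rule can be illustrated with a minimal Python sketch. It covers only whitespace collapsing and assumes tags, placeholders, and entities have already been handled; the function name `count_characters` is hypothetical and not part of any Smartling API.

```python
import re

# Runs of the base whitespace characters listed in the table.
BASE_WHITESPACE_RUN = re.compile(r"[ \t\n\x0B\f\r]+")


def count_characters(text: str) -> int:
    """Count characters, treating each run of base whitespace as one character.

    Assumption: tags, placeholders, and entities were removed beforehand.
    """
    collapsed = BASE_WHITESPACE_RUN.sub(" ", text)
    return len(collapsed)


# Three regular spaces collapse to one character.
print(count_characters("Hello   world"))
# Non-breaking spaces (NBSP) are not collapsed.
print(count_characters("Hello\u00a0\u00a0world"))
```

Only the length matters here, so the sketch substitutes a single space for each run; which character a run collapses to is not stated in the rules.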

Fuzzy Matches

When assessing the size of a piece of translation work, it’s customary in the industry to consider previous translations that were available to the translator at the time they did the work. For example, translating the string “Once upon a time” when a previous translation was available for the string “Once upon a time,” (with a comma) would be considered less work for the translator than if it had to be translated from scratch. A pre-existing entry in the translation memory similar to the string being translated is called a ‘fuzzy match’.

Fuzzy Scores

The measure of similarity between one string and another is known as the ‘edit distance’ and is calculated using an industry-standard method, the Levenshtein distance. Edit distances for fuzzy matches are surfaced in Smartling as percentages called ‘fuzzy scores’, with higher percentages signifying greater similarity. In the example above, the fuzzy score for the string being translated is above 90%. A high fuzzy score means that the translation memory already contains a translation for a string very similar to the one being translated, and usually results in reduced translation costs.
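
As a rough illustration, the sketch below computes a character-level Levenshtein distance and turns it into a percentage. The normalization used (distance relative to the longer string) is an assumption made for this example; the exact formula Smartling uses to derive fuzzy scores is not described in this article, and the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def fuzzy_score(source: str, candidate: str) -> float:
    """Assumed normalization: 100% minus distance relative to the longer string."""
    longest = max(len(source), len(candidate))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(source, candidate) / longest)


print(round(fuzzy_score("Once upon a time", "Once upon a time,"), 1))
```

Under this assumed normalization, the two “Once upon a time” strings differ by one character out of 17, which gives a score of roughly 94%.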

Fuzzy Profiles

Pricing for translation work is typically based on the number of source words translated along with the fuzzy scores of the strings when translated. Fuzzy score ranges are organized into pricing bands or ‘tiers’, with higher-score tiers receiving progressively greater discounts. This is illustrated below.

Fuzzy Tier (fuzzy score range) | Fuzzy discount example
0% - 84.9% | Full per-word translation rate (no discount)
85% - 94.9% | 60% of full per-word translation rate
95% - 99.9% | 30% of full per-word translation rate
100% | 10% of full per-word translation rate

Sample Fuzzy Pricing Tiers
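
A tier lookup for the sample tiers could be sketched as follows. The tier boundaries and rates come from the sample table; the names `SAMPLE_TIERS` and `rate_percent_for_score` are hypothetical, and a real fuzzy profile may define different bands.

```python
# (lower bound of fuzzy score, percent of the full per-word rate), highest tier first.
SAMPLE_TIERS = [
    (100.0, 10),
    (95.0, 30),
    (85.0, 60),
    (0.0, 100),
]


def rate_percent_for_score(fuzzy_score: float) -> int:
    """Return the percent of the full per-word rate for a fuzzy score (0-100)."""
    if not 0.0 <= fuzzy_score <= 100.0:
        raise ValueError("fuzzy score must be between 0 and 100")
    for lower_bound, rate_percent in SAMPLE_TIERS:
        if fuzzy_score >= lower_bound:
            return rate_percent
    raise AssertionError("unreachable: the last tier starts at 0")


for score in (42.0, 90.0, 97.5, 100.0):
    print(score, rate_percent_for_score(score))
```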

The definition of a set of fuzzy pricing tiers like the above is known in Smartling as a ‘fuzzy profile’. Multiple fuzzy profiles can be defined for an account, but currently a single profile must be chosen for each language in a given project. This may change in the future to allow fuzzy profiles to be associated with agencies instead of languages in a project.

Your Smartling Customer Success Manager can assist with configuring new fuzzy profiles for your projects if required.

The fuzzy profile that was active when a string was translated is included in the word-count report, along with the string’s fuzzy tier (called ‘fuzzy breakdown’ in the report).

By default, fuzzy tier-based pricing applies only to work done in the translation step, with editing charged at the full per-word editing rate, regardless of fuzzy matches. This is the industry standard.

Weighted Words

The concept of ‘weighted words’ exists to simplify price calculations and thus reduce the chance of errors. Instead of multiplying the number of source words in each fuzzy tier by the tier’s discount and the per-word price, simply multiply the weighted word count by the full per-word price. The weighted-word count has the discount built in. For the sample word count report above, assuming a per-word price of 10 cents, here is how to calculate the price of the work represented in the first row of the report using weighted words:

  1. Calculate price: weighted word count x price per word:
    36 x 10 = 360 cents

Weighted word counts are rounded up to the nearest whole word.
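
The weighted-word arithmetic can be sketched in a few lines of Python. The 120 source words in the example are a hypothetical figure chosen so that the result matches the 36 weighted words used above; applying the rounding to the row total is also an assumption, as are the function names.

```python
def weighted_words(source_words: int, rate_percent: int) -> int:
    """Source words scaled by the tier's rate, rounded up to a whole word."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-source_words * rate_percent // 100)


def price_in_cents(weighted: int, cents_per_word: int) -> int:
    """Weighted words multiplied by the full per-word price."""
    return weighted * cents_per_word


# Hypothetical row: 120 source words in the 95% - 99.9% tier (30% of full rate).
weighted = weighted_words(120, 30)
print(weighted, price_in_cents(weighted, 10))
```

Because the discount is already folded into the weighted count, the price step needs only the full per-word rate.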

How is work credited?

The main rules for how work is credited are:

  • Translation work is credited to the translator who submits it, at the time of submission. If two translators work on a string, the one who submits it gets the credit. However, if any other user submits a translator’s work, or moves it out of the step, the translator gets the credit.

  • For editing and review steps, work is credited on submission, regardless of whether any changes were made to the translations. Work is not credited when translations are rejected back to earlier steps; it’s credited when it is eventually submitted.

  • If an Account Owner, Project Manager, Agency Account Owner, or Translation Resource Manager who is not assigned to the step submits an editor’s or reviewer’s work, the submitter, not the editor, gets the credit in the word count report (even if the editor made changes to the translation). If a Smartling Admin (such as your CSM or SA) submits an editor’s or reviewer’s work, nobody gets credit, as Admins are not counted in word count reports.

  • If a string is rejected, re-worked and submitted again, it is credited once. However, if a second freelancer or a translator from a different agency works on the rejected translation, both translators receive credit.

  • If a string is unauthorized or goes inactive before the translator submits a saved translation (e.g., the job is cancelled, or the file is replaced by an updated version without the string), the last translator to save it receives credit at the time the string disappears, unless that user was an Account Owner, Project Manager, or Smartling Admin, in which case no one gets credit.

  • The same is not true for editors and reviewers: if a string disappears before they submit it, they will not receive credit. 

  • If the string subsequently reappears (e.g., becomes active again), and is retranslated and submitted by the original translator, it’s counted only once. If a second translator translates and submits the re-activated string, then both translators receive credit.

Plural Forms

The additional work required of translators when translating strings ingested with plural forms is not currently reflected in word counts. Instead, these strings are counted as if they had a single source form and corresponding translation.


Cost Estimates

A cost estimate is a prediction of a job’s final word-count totals and cost if the job were to be translated at the point in time when the estimate is run. It's based on the fuzzy estimate for the content in the job, along with the appropriate fuzzy profiles and rate cards. It indicates approximately how large a job is and what it would be likely to cost if translated now. Below is an example of a cost estimate.



Cost estimates like the above are available only to Account Owner and Project Manager roles. They are accessible in the Job Summary pane of a Job.

In addition to the estimate for the whole job that is available in the Job Summary, a high-level estimate is automatically run during the process of authorizing a job. This estimate includes only the unauthorized content, and so might differ from the total estimate for the job if some content has already been authorized. This estimate only displays the top-level totals for source words, weighted words and cost. 

It’s possible to compare estimates for different workflows by selecting them in the dropdown on the Authorization dialog; separate workflows per language can be chosen using the ‘show details’ option.

Agencies and translators can run fuzzy estimates on the content they have access to. These estimates are limited to the translation step only, and also to the content that the user running the estimate has access to. When agencies use fuzzy estimates to produce price quotes, they need to add to the fuzzy-estimate numbers any non-translation steps that they’re responsible for in the job.

Actuals not included

With one exception, actual historical data from the job is not factored into the estimate. For example, it’s possible to run an estimate on a completed job; and if it happened that in this job a user not assigned to the step moved the content past the edit step and so removed the cost of editing from it, this would be ignored by the estimate. Instead, the estimate process would proceed on the assumption that if the job were translated now, the content would go through editing. The one exception to this rule about ignoring actuals is that for any translated strings, the fuzzy score that was available at the time of translation is used in the estimate instead of the fuzzy score available in the current leveraged translation memory.

Cost and prices

Cost figures are included in the estimate if the appropriate rate cards have been added to the platform. A valid rate card, i.e., one that includes a rate corresponding to the workflow step and content type, must be available for at least one of the assigned resources in every step of every workflow included in the estimate. If some rate cards are missing, the figures are displayed where possible (e.g., for certain languages or workflow steps), and a warning is displayed next to the overall cost estimate amount. If the relevant rate cards include multiple currencies, the total is not calculated and a ‘multiple currency’ message is displayed instead. 
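
The totalling behaviour described above could be modelled roughly as follows. This is a sketch of the rules as stated in this section, not Smartling’s implementation: the types `StepCost` and `EstimateTotal`, the function `summarize`, and the warning strings are all hypothetical.

```python
from decimal import Decimal
from typing import NamedTuple, Optional


class StepCost(NamedTuple):
    step: str
    amount: Optional[Decimal]    # None when no valid rate card covers the step
    currency: Optional[str]


class EstimateTotal(NamedTuple):
    total: Optional[Decimal]     # None when the rate cards mix currencies
    currency: Optional[str]
    warning: Optional[str]


def summarize(costs: list[StepCost]) -> EstimateTotal:
    """Total the priced steps, flagging missing rates or mixed currencies."""
    priced = [c for c in costs if c.amount is not None]
    currencies = {c.currency for c in priced}
    if len(currencies) > 1:
        # No total is calculated when more than one currency is involved.
        return EstimateTotal(None, None, "multiple currency")
    warning = "missing rate card" if len(priced) < len(costs) else None
    total = sum((c.amount for c in priced), Decimal("0"))
    currency = next(iter(currencies), None)
    return EstimateTotal(total, currency, warning)


result = summarize([
    StepCost("translation", Decimal("10.00"), "USD"),
    StepCost("edit", None, None),
])
print(result)
```

In the usage example the edit step has no rate, so the total covers only the translation step and carries a warning, which is why a flagged total can be a significant underestimate.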

Only Smartling Admins and Account Owners can enter rates.

Internal fuzzies and repetitions

As translation work on a job progresses, the translation memory is gradually populated with entries from the job itself, resulting in these entries becoming available as potential fuzzy matches for future translations in the same job. Future fuzzy matches that come from the job itself are known as ‘internal matches’ and are included in the estimate in the appropriate fuzzy tiers. 

Identical (100%) internal matches are known as ‘repetitions’, and are broken out separately in the estimate as it’s common for a separate price to be agreed for repetitions. However, fuzzy profiles don’t currently support a separate tier for repetitions—the 100% tier is used instead. If separate pricing for repetitions is required, it has to be calculated separately.
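
One way to do that separate calculation is sketched below: remove what the estimate charged for repetitions at the 100% tier and add them back at the agreed repetition rate. All names and figures are hypothetical, and the sketch ignores the rounding of weighted words, so treat the result as an approximation.

```python
from decimal import Decimal


def cost_with_repetition_rate(
    estimated_cost: Decimal,
    repetition_words: int,
    full_rate: Decimal,
    tier_100_percent: int,
    repetition_rate: Decimal,
) -> Decimal:
    """Swap the 100%-tier charge for repetitions with an agreed repetition rate."""
    charged_in_estimate = repetition_words * full_rate * tier_100_percent / 100
    charged_at_repetition_rate = repetition_words * repetition_rate
    return estimated_cost - charged_in_estimate + charged_at_repetition_rate


# Hypothetical figures: 200 repetition words, full rate 0.10 per word,
# 100% tier charged at 10% of the full rate, repetitions agreed at 0.005 per word.
adjusted = cost_with_repetition_rate(
    Decimal("50.00"), 200, Decimal("0.10"), 10, Decimal("0.005")
)
print(adjusted)
```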


SmartMatch

Strings that can SmartMatch are broken out separately in the estimate and are counted as zero weighted words for the translation step, since the translation will be automatically applied. If the strings will SmartMatch to Published, they are not included in the counts for any post-translation steps. However, if the strings will SmartMatch to a post-translation step other than Published, they are assumed to go to the first post-translation step, and thus will be included in the counts for all post-translation steps. This can result in an over-estimate for this content.

Dynamic Workflows

Estimates assume that all content destined for a dynamic workflow will travel through the default branch of the workflow. This is true even if some of the content is already in a different branch. Therefore, consideration should be given to how best to estimate dynamic workflows. 

Machine Translation Steps

Machine translation steps are currently treated as standard translation steps by the estimating process; that is, they need a valid rate card associated with them in order to be counted in the estimate and not produce a warning on the total cost. In the future, we may default to zero cost for MT steps. A workaround to remove the warning is to assign a dummy resource with the appropriate rate card to all languages on the step.


Plural Forms

Estimates for plural strings reflect how they are currently counted in word count reports: the plural forms are not counted in the estimate. Instead, the string is counted as if it were a single source string with a corresponding single translation.

How estimates are calculated

The following steps are used in calculating estimates:

  1. Determine workflows. For each string in the job, identify a workflow for each language in order to establish which steps to include and which rate cards to use:

    1. For unauthorized content, the default workflow for each language is used for the estimate. If the default workflow is a dynamic workflow, the default branch of the dynamic workflow is used.

    2. For authorized content, the actual workflows in which the string sits are used for the estimate. If the workflow is a dynamic workflow, the default branch is used in the estimate even if the content is currently in a different branch.

  2. Calculate TM match scores and weighted words. If the string in question already has a translation, then the match score that was available at the time of translation is used in the estimate. For untranslated strings, the following are checked and then the weighted word counts for the string can be calculated:

    1. SmartMatch. SmartMatch to Published is counted at zero cost, whereas SmartMatch to a post-translation step other than Published is counted as zero weighted words for the translation step and counted as normal for post-translation steps.

    2. Fuzzy Match. Check the fuzzy score of the (untranslated) string using the leveraged translation memories as they are at the time the estimate is run. Determine which fuzzy tier this falls into and calculate the corresponding weighted words. Large strings (>10K characters) are excluded from fuzzy-match estimates due to the expense of calculating; they’re assumed to have no fuzzy matches but are included in the total word count of the estimate.

    3. Internal Match. Check the fuzzy score of each string against the strings in the same job that will be translated before it. The higher of the internal and TM match scores is used.

  3. Calculate cost. Use the rate card corresponding to the workflow content type and the rate corresponding to the step type to determine the cost (or cost range if multiple differing rate cards could apply) for each workflow step. The calculation is done to four decimal places and then rounded up to two at the end.

  4. Add up totals. Add up counts for all strings, steps and languages: the source word count and weighted word count for each string that is likely to be translated, and the source word count for each string that is likely to pass through editing and review steps.
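
The steps above can be sketched for the translation step only. This is a simplified illustration, not Smartling’s implementation: it reuses the sample tiers from this article, assumes weighted words are rounded up per string, and uses hypothetical names (`JobString`, `estimate_translation_cost`). Workflow selection, post-translation steps, and the large-string exclusion are left out.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_CEILING

# Sample tiers from this article: (lower bound of score, percent of full rate).
TIERS = [(100.0, 10), (95.0, 30), (85.0, 60), (0.0, 100)]


@dataclass
class JobString:
    source_words: int
    tm_score: float          # best fuzzy score from the leveraged TMs (0-100)
    internal_score: float    # best score against earlier strings in the same job
    smartmatch: bool = False


def rate_percent(score: float) -> int:
    """Percent of the full rate for a score between 0 and 100."""
    return next(pct for lower, pct in TIERS if score >= lower)


def translation_weighted_words(s: JobString) -> int:
    if s.smartmatch:
        return 0  # the translation is applied automatically
    # The higher of the internal and TM match scores is used.
    pct = rate_percent(max(s.tm_score, s.internal_score))
    return -(-s.source_words * pct // 100)  # round up to a whole word


def estimate_translation_cost(strings: list[JobString], rate: Decimal) -> Decimal:
    total = Decimal("0")
    for s in strings:
        cost = translation_weighted_words(s) * rate
        total += cost.quantize(Decimal("0.0001"))  # work to four decimal places
    # Round up to two decimal places at the end.
    return total.quantize(Decimal("0.01"), rounding=ROUND_CEILING)


job = [
    JobString(10, tm_score=90.0, internal_score=0.0),
    JobString(7, tm_score=50.0, internal_score=100.0),   # a repetition
    JobString(20, tm_score=0.0, internal_score=0.0, smartmatch=True),
]
print(estimate_translation_cost(job, Decimal("0.1234")))
```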

Why estimates can change over time

Estimates that are re-run at different times over the life of a job can produce significantly different results. This is because an estimate is a real-time prediction of the job’s final word counts based on the current situation, which may have changed from when previous estimates were run. Below are some of the factors that contribute to estimates changing:

  • Translation memory changes. As translation work proceeds in an account across multiple translation jobs, the translation memory is continually updated, and these updates can provide new fuzzy-match and SmartMatch options to the current job, resulting in reduced estimates. In addition, new TMs can be imported to the leverage configuration. It’s also possible, though less likely, that entries get deleted from a TM, resulting in fewer match options than were originally available, which could result in an estimate increasing.

  • Job content changes. Files in a job can be updated while the job is under way resulting in content being added or removed from the job, which can change the estimate. 

  • Workflow changes. Content might be authorized into or moved to different workflows than assumed in the original estimate, resulting in different translators and editors with potentially different rates, or different workflow steps, being included in the estimate.

  • Configuration changes. Changes to which translation memories are included in the leverage configuration, SmartMatch settings, workflow steps and settings, rates, and so on can all affect an estimate.

With the exception of TM changes that occur as a result of normal translation work, the changes above can be managed to reduce their impact on active translation jobs. Nonetheless, it’s advisable to save a copy of the original estimate, and the estimate when the content has entered its workflows, as well as any estimates run after significant changes of the types mentioned above.

Why actual word counts can differ from estimates

Actual word counts can differ from what was estimated for various reasons as described below:

  • Translation memory changes. The state of the translation memory at the moment a translation is saved can be different from how it was at the time the estimate was run. This can happen for various unpredictable reasons, but is most commonly due to new translations being saved to the TM in other translation jobs, and in that case results in the actual cost being lower than estimated - a benefit of using a cloud-based translation platform. Note that once a string is translated, the match score that was available at the time of translation is used in subsequent estimates, even if the translation memory has since changed.

  • Order of submission. Once a translation is published, it becomes available for SmartMatching, which could result in something that was counted as a repetition in the estimate being translated by SmartMatch, potentially reducing the cost.

  • SmartMatch to post-Edit step. When estimates are calculated, SmartMatching to non-Published steps is assumed to move the content to the first post-Translation step. This over-estimates the cost for strings that will SmartMatch to a post-Edit Review or Hold step.

  • Skipping steps. Content can be manually moved past a step, removing the actual cost for that step; or it can automatically skip an editing step due to the workflow ‘skip edit’ or be automatically moved out of a step due to the ‘idle strings’ configuration option. In all these cases, the estimated cost of the step will not be incurred.

  • Dynamic Workflow effects. Estimates currently assume that all content goes through the default branch of a dynamic workflow. However, some of the content is likely to traverse different branches with likely different and potentially higher costs. 

  • Rejected SmartMatches. If SmartMatched translations are subsequently rejected and worked on by a translator, the word count reports will reflect this work, but an estimate will not.

  • Translating before SmartMatch. If a translator enters a translation for a string before SmartMatch does, then the cost will be that of the translator’s work rather than the SmartMatch, and thus could be higher than what was estimated. 

Using estimates for budgeting

Estimates serve this use case reasonably well because deviations that occur in the actual word counts will generally lower the cost from the estimated amount. However, a number of exceptions to this need to be borne in mind:

  • Changes to the job contents can increase the size and cost of the job.
  • Workflow changes could introduce additional steps resulting in increased cost.
  • Configuration changes, such as removing a TM from the leverage configuration, could result in increased cost.
  • Dynamic workflows are not fully accounted for in estimates and could result in an under-estimate.

Ideally, the first three items above can simply be avoided once the job is in progress. If they can’t be avoided, a re-estimate may be required along with a potential budget adjustment. For the dynamic workflows issue, it may be advisable to configure the workflow such that the default branch is the most expensive to ensure that dynamic-workflow translations won’t result in cost increases from what was estimated.

One challenge with this use case is how to handle a significant deviation between the estimate and the actual cost. If the actual cost is higher, then either you or your translation vendor will need to bridge the gap in order to pay translators for their actual work. On the other hand, a lower actual cost can present a challenge in managing translators. 



  • A key point to note is that estimates are ‘real time’ and can give different results when run at different times for the same job. Because of this, the following is recommended:
    • Refresh the estimate before using it to be certain you have the latest.
    • Make a copy of any estimate that you intend to use or reference in the future. A download CSV option is available in the estimate details, but we suggest taking a screenshot too.
    • If it’s important that the estimate not change much or that the final word counts be close to the estimate, limit the kinds of changes that can affect this while jobs are in progress (see sections above).
    • If changes must be made that will affect estimates and word counts, make sure that they are well communicated.
  • A small exception to the ‘real time’ point above is that once a translation is saved, the fuzzy score that was available when saved gets used henceforth in the estimate.

  • Warnings next to total cost figures should be taken seriously as they can indicate a very significant underestimate of total cost. The details of the estimate should be examined (including all languages) to determine which parts are missing, and this should be addressed before using the estimate.

  • Estimates for dynamic workflows assume that all content goes through the default branch. Consider using the default branch for estimating only. For example, if using language-based branching, add all language resources to the default branch so that they’re reflected in the estimate, but configure rules so that each language goes through a non-default branch.

  • Since MT steps are currently treated the same as other steps, i.e., they’re expected to have a rate card, consider creating a dummy user with a suitable rate (e.g., 0 or ‘unpayable’) and assigning it to MT steps, to avoid warnings about missing rates for those steps.

  • Remember that SmartMatch to non-Published steps can overestimate cost, since the estimate always assumes that the content will SmartMatch to the first post-translation step.
