Word counts represent the sum of words in source strings, excluding tags and placeholders. They represent the quantity of work done by translators, editors and reviewers in a Smartling account over a particular time period. Word count reports are intended for tracking and invoicing of work completed. The counts reflect the number of source words and weighted words handled by each user and are broken out by language, workflow step type, and fuzzy tier. SmartMatch information does not appear on word count reports, since it does not constitute work done by users.
Sample Word Count Report
Account | Project/ | Translation | Agency | Target | Workflow | Fuzzy Profile | Fuzzy Breakdown | Word Count | Weighted Words | Character Count |
Company | Mobile App | John X | Agency XYZ | French (France) | Translation | Agency XYZ profile | 95 - 99.9% | 120 | 36 | 480 |
Company | Mobile App | Mary Y | - | French (France) | Edit | - | - | 120 | 120 | 500 |
This report is available in the Smartling dashboard, where it can also be downloaded in CSV format. Agencies can download reports for anything their team worked on, and individual translators, editors, and reviewers can download word count reports on their own work.
Character Counts
Like word counts, character counts represent the sum of characters in source strings, excluding tags and placeholders. Character counts are particularly useful to represent the quantity of work done by translators, editors and reviewers translating into Chinese and Japanese, as these languages do not use spaces to separate words.
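As a rough illustration of counting only translatable text, the sketch below strips tags and placeholders before counting. The regular expressions, the function names, and the decision to ignore whitespace when counting characters are all assumptions made for this example; the actual parsing rules depend on the file type and directives and are not reproduced here.

```python
import re

# Hypothetical patterns: real tag and placeholder syntax depends on the
# file type and parser directives, which this sketch does not model.
TAG_RE = re.compile(r"<[^>]+>")
PLACEHOLDER_RE = re.compile(r"\{[^{}]*\}|%(?:\d+\$)?[sd]")


def strip_markup(source: str) -> str:
    """Replace tags and placeholders with spaces, leaving translatable text."""
    without_tags = TAG_RE.sub(" ", source)
    return PLACEHOLDER_RE.sub(" ", without_tags)


def word_count(source: str) -> int:
    """Count whitespace-separated words after removing tags and placeholders."""
    return len(strip_markup(source).split())


def character_count(source: str) -> int:
    """Count remaining characters (assumption: whitespace is not counted)."""
    return sum(1 for ch in strip_markup(source) if not ch.isspace())


sample = "<b>Save</b> your changes to {filename}"
print(word_count(sample), character_count(sample))  # 4 17
```

Whitespace-separated counting only makes sense for word-based source languages; for character-based languages, only the character count would be meaningful.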
The way your files are ingested into Smartling (via file upload with directives, or via API) can affect how characters are counted; for example, whether entities in the file are captured as entities or as characters. The following table clarifies some of these rules:
Code Point | Rule Description |
Simple Whitespace |
|
Tags |
|
Placeholders |
|
 Unicode Characters / HTML Entities |
(&(?:nbsp|#160|#xA0|ensp|#8194|#x2002|emsp|
|
All other code points | All other code points are counted as characters |
Note: Smartling does not process mixed character-based and word-based languages for word counts. It chooses either character-based or word-based counting, based on the source locale of the project, and applies it to all characters and words in the project. For this reason, there may be a difference between the word count displayed here and the count shown in a document tool, such as Microsoft Word.
Fuzzy Matches
When assessing the size of a piece of translation work, it's customary in the industry to consider previous translations that were available to the translator at the time they did the work. For example, translating the string "Once upon a time" when a previous translation was available for the string "Once upon a time," (with a comma) would be considered less work for the translator than if it had to be translated from scratch. A pre-existing entry in the translation memory similar to the string being translated is called a "fuzzy match".
Fuzzy Scores
The measure of similarity between one string and another is known as the "edit distance" and is calculated using an industry-standard method called Levenshtein. Edit distances for fuzzy matches are surfaced in Smartling as percentages called "fuzzy scores", with higher percentages signifying greater similarity. In the example above, the fuzzy score for the string being translated is above 90%. A high fuzzy score means that the translation memory already contains a translation for a string very similar to the one being translated, and usually results in reduced translation costs.
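To make the idea concrete, here is a minimal sketch of a character-level Levenshtein distance turned into a percentage. How Smartling normalizes the distance into a fuzzy score is not described here, so dividing by the length of the longer string is an assumption of this example, as are the function names.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def fuzzy_score(new_string: str, tm_string: str) -> float:
    """Similarity as a percentage; normalizing by the longer string is an assumption."""
    longest = max(len(new_string), len(tm_string))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(new_string, tm_string) / longest)


print(round(fuzzy_score("Once upon a time", "Once upon a time,"), 1))  # 94.1
```

Under this normalization the two example strings differ by a single inserted character out of 17, which is consistent with a score above 90%.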
Fuzzy Profiles
Pricing for translation work is typically based on the number of source words translated along with the fuzzy scores of the strings when translated. Fuzzy score ranges are organized into pricing bands or "tiers", with higher-score tiers receiving progressively greater discounts. This is illustrated below.
Fuzzy Tier (fuzzy score range) | Fuzzy discount example |
0% - 84.9% | Full per-word translation rate (no discount) |
85% - 94.9% | 60% of full per-word translation rate |
95% - 99.9% | 30% of full per-word translation rate |
100% - 100% | 10% of full per-word translation rate |
Sample Fuzzy Pricing Tiers
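The sample tiers can be read as a simple lookup from fuzzy score to the share of the full rate that is charged. The sketch below encodes only the sample table; the data layout and the function name are illustrative and do not reflect how fuzzy profiles are stored in Smartling.

```python
# Sample tiers: (lowest score in the tier, percent of the full per-word rate).
# Ordered from the highest tier down so the first match wins.
SAMPLE_FUZZY_PROFILE = [
    (100.0, 10),
    (95.0, 30),
    (85.0, 60),
    (0.0, 100),
]


def rate_percent_for_score(fuzzy_score: float) -> int:
    """Return the percent of the full per-word rate charged for a fuzzy score."""
    if not 0.0 <= fuzzy_score <= 100.0:
        raise ValueError("fuzzy score must be between 0 and 100")
    for lower_bound, rate_percent in SAMPLE_FUZZY_PROFILE:
        if fuzzy_score >= lower_bound:
            return rate_percent
    raise AssertionError("unreachable: the 0% tier catches every valid score")


print(rate_percent_for_score(97.5))  # 30
```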
The definition of a set of fuzzy pricing tiers like the above is known in Smartling as a "fuzzy profile". Multiple fuzzy profiles can be defined for an account, but currently a single profile must be chosen for each language in a given project. This may change in the future to allow fuzzy profiles to be associated with agencies instead of languages in a project.
Your Smartling Customer Success Manager can assist with configuring new fuzzy profiles for your projects if required.
The fuzzy profile that was active when a string was translated is included in the word-count report, along with the string's fuzzy tier (called "fuzzy breakdown" in the report).
By default, fuzzy tier-based pricing applies only to work done in the translation step, with editing charged at the full per-word editing rate, regardless of fuzzy matches. This is the industry standard.
Edit Effort Estimation Levels
The Fuzzy Breakdown column in the Word Count report will display either the fuzzy tier for available TM matches or the Edit Effort Level (Level 1, Level 2, or Level 3) if a translated string underwent Edit Effort Estimation. See Edit Effort Estimation for Machine Translation for more details.
Weighted Words
The concept of "weighted words" exists to simplify price calculations and thus reduce the chance of errors. Instead of multiplying the number of source words in each fuzzy tier by the tier's discount and the per-word price, simply multiply the weighted word count by the full per-word price. The weighted-word count has the discount built in. For the sample word count report above, assuming a per-word price of 10 cents, here is how to calculate the price of the work represented in the first row of the report using weighted words:
- Calculate price (weighted word count x price per word): 36 x 10 = 360 cents
Weighted word counts are rounded up to the nearest whole word.
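The same arithmetic can be sketched in a few lines, using the first row of the sample report (120 source words in the 95% - 99.9% tier, charged at 30% of the full rate). The function names are illustrative, and applying the round-up to a single tier total, rather than per string, is an assumption of this sketch.

```python
def weighted_words(source_words: int, rate_percent: int) -> int:
    """Apply a tier's rate to a source word count, rounding up to a whole word."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-source_words * rate_percent // 100)


def price_in_cents(weighted_word_count: int, cents_per_word: int) -> int:
    """Price is simply weighted words multiplied by the full per-word price."""
    return weighted_word_count * cents_per_word


first_row = weighted_words(120, 30)  # 95% - 99.9% tier at 30% of the full rate
print(first_row, price_in_cents(first_row, 10))  # 36 360
```

The second row of the sample report (an Edit step with no fuzzy discount) corresponds to a rate of 100%, which leaves the 120 source words unchanged.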
How is work credited?
The main rules for how work is credited are:
- Translation work is credited to the translator who submits it, at the time of submission. If two translators work on a string, the one who submits it gets the credit. However, if any other user submits a translator's work, or moves it out of the step, the translator gets the credit.
- For editing and review steps, work is credited on submission, regardless of whether any changes were made to the translations. Work is not credited when translations are rejected back to earlier steps; it's credited when it is eventually submitted.
- If an Account Owner, Project Manager, Agency Account Owner, or Translation Resource Manager (that is not assigned to the step) submits an editor's or reviewer's work, the submitter gets the credit in the word count report, not the editor (even if the editor made changes to the translation). If a Smartling Admin (such as your CSM or SA) submits an editor's or reviewer's work, nobody gets credit as Admins are not counted in word count reports.
- If a string is rejected, re-worked and submitted again, it is credited once. However, if a second freelancer or a translator from a different agency works on the rejected translation, both translators receive credit.
- If a string is unauthorized before a saved translation is submitted (e.g., when a job is cancelled or a file is replaced by an updated version without the string), the following applies:
  - If a string was saved and subsequently unauthorized while it was still in the Translation step, the last translator to save it receives credit at the time the string disappears, unless that user was an Account Owner, Project Manager, or Smartling Admin, in which case no one gets credit.
  - If a string was saved and subsequently unauthorized in a post-translation step (Edit, Review or Post-Edit), the last linguist to save the translation receives credit only if they had made an edit to the translation before saving it. If an existing translation is saved in a post-translation step without any changes and the string is then unauthorized before being submitted, the linguist who had saved the unedited translation will not be credited.
- If the string subsequently reappears (e.g., becomes active again), and is retranslated and submitted by the original translator, it's counted only once. If a second translator translates and submits the re-activated string, then both translators receive credit.
Plural Forms
The additional work required of translators when translating strings ingested with plural forms is reflected in the Word Count Report. Plurals are counted with a locale-specific multiplier: the number of source words is multiplied by the number of plural forms for each target locale.
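As a small illustration of the multiplier, the sketch below uses the number of cardinal plural categories that CLDR defines for a few languages. Whether Smartling uses exactly these counts, and the locale codes shown, are assumptions made for the example.

```python
# Illustrative plural-form counts per target locale, based on CLDR cardinal
# categories. Treating these as the multipliers is an assumption.
PLURAL_FORMS = {
    "ja-JP": 1,  # other
    "de-DE": 2,  # one, other
    "ru-RU": 4,  # one, few, many, other
    "ar-SA": 6,  # zero, one, two, few, many, other
}


def plural_word_count(source_words: int, target_locale: str) -> int:
    """Multiply source words by the number of plural forms of the target locale."""
    return source_words * PLURAL_FORMS[target_locale]


for locale in PLURAL_FORMS:
    print(locale, plural_word_count(5, locale))
```

A five-word plural string would therefore count as 5 words for Japanese but 30 for Arabic under these illustrative multipliers.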
Cost Estimates
A cost estimate is a prediction of a job's final word-count totals and cost if the job were to be translated at the point when the estimate is run. It's based on the fuzzy estimate for the content in the job, along with the appropriate fuzzy profiles and rate cards. It indicates how large a job is and what it would be likely to cost if translated now, approximately. Below is an example of a cost estimate.
Cost estimates like the above are available only to Account Owner and Project Manager roles. They are accessible in the Job Summary pane of a Job.
In addition to the estimate for the whole job that is available in the Job Summary, a high-level estimate is automatically run during the process of authorizing a job. This estimate includes only the unauthorized content, and so might differ from the total estimate for the job if some content has already been authorized. This estimate only displays the top-level totals for source words, weighted words and cost.
It's possible to compare estimates for different workflows by selecting them in the dropdown on the Authorization dialog; separate workflows per language can be chosen using the "show details" option.
Agencies and translators can run fuzzy estimates for all workflow steps they have access to once content reaches those steps. Fuzzy tiers will only be applied if the workflow step has a Fuzzy Match Profile associated with it; otherwise, the estimate will simply display the Word Count for that step. Published content is not included in estimates.
Actuals not included
With one exception, actual historical data from the job is not factored into the estimate. For example, it's possible to run an estimate on a completed job; and if it happened that in this job a user not assigned to the step moved the content past the edit step and so removed the cost of editing from it, this would be ignored by the estimate. Instead, the estimate process would proceed on the assumption that if the job were translated now, the content would go through editing. The one exception to this rule about ignoring actuals is that for any translated strings, the fuzzy score that was available at the time of translation is used in the estimate instead of the fuzzy score available in the current leveraged translation memory.
Cost and prices
Cost figures are included in the estimate if the appropriate rate cards have been added to the platform. A valid rate card, i.e., one that includes a rate corresponding to the workflow step and content type, must be available for at least one of the assigned resources in every step of every workflow included in the estimate. If some rate cards are missing, the figures are displayed where possible (e.g., for certain languages or workflow steps), and a warning is displayed next to the overall cost estimate amount. If the relevant rate cards include multiple currencies, the total is not calculated and a "multiple currency" message is displayed instead.
Only Smartling Admins and Account Owners can enter rates.
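The display rules for missing rate cards and mixed currencies can be summarized in a short sketch. The `StepCost` type, the `summarize` function, and the message strings are hypothetical and only model the behavior described above; they are not part of any Smartling API.

```python
from decimal import Decimal
from typing import NamedTuple, Optional


class StepCost(NamedTuple):
    step: str
    cost: Optional[Decimal]      # None when no valid rate card covers the step
    currency: Optional[str]      # None when no valid rate card covers the step


def summarize(step_costs):
    """Return (total, message) following the display rules described above."""
    priced = [s for s in step_costs if s.cost is not None]
    currencies = {s.currency for s in priced}
    if len(currencies) > 1:
        # Mixed currencies: no total is calculated.
        return None, "multiple currency"
    total = sum((s.cost for s in priced), Decimal("0"))
    missing = len(priced) < len(step_costs)
    return total, "warning: some rate cards are missing" if missing else ""


print(summarize([
    StepCost("Translation", Decimal("10.00"), "USD"),
    StepCost("Edit", None, None),
]))
```

A total accompanied by a warning covers only the steps that could be priced, so it may understate the full cost.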
Internal fuzzies and repetitions
As translation work on a job progresses, the translation memory is gradually populated with entries from the job itself, resulting in these entries becoming available as potential fuzzy matches for future translations in the same job. Future fuzzy matches that come from the job itself are known as "internal matches" and are included in the estimate in the appropriate fuzzy tiers.
Identical (100%) internal matches are known as "repetitions", and are broken out separately in the estimate as it's common for a separate price to be agreed for repetitions.
SmartMatch
Strings that can SmartMatch are broken out separately in the estimate and are counted as zero weighted words for the translation step, since the translation will be automatically applied. If the strings will SmartMatch to Published, they are not included in the counts for any post-translation steps. However, if the strings will SmartMatch to a post-translation step other than Published, they are assumed to go to the first post-translation step, and thus will be included in the counts for all post-translation steps. This can result in an over-estimate for this content.
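The counting rule for SmartMatch strings, and why it can over-estimate, can be shown with a small sketch. The function name and the example step names are illustrative only.

```python
def smartmatch_counts(source_words, target_step, post_translation_steps):
    """Estimated word counts per step for a string that will SmartMatch.

    target_step is "Published" or the name of a post-translation step.
    """
    # The translation is applied automatically, so translation work is zero.
    counts = {"Translation": 0}
    for step in post_translation_steps:
        if target_step == "Published":
            # Content lands in Published and skips every post-translation step.
            counts[step] = 0
        else:
            # The estimate assumes the string enters the first post-translation
            # step, so every post-translation step is counted.
            counts[step] = source_words
    return counts


steps = ["Edit", "Review"]  # hypothetical workflow
print(smartmatch_counts(50, "Published", steps))
print(smartmatch_counts(50, "Review", steps))
```

In the second call the string will actually land in Review, yet the Edit step is still counted, which is the over-estimate described above.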
Dynamic Workflows
Estimates assume that all content destined for a dynamic workflow will travel through the default branch of the workflow. This is true even if some of the content is already in a different branch. Therefore, consideration should be given to how best to estimate dynamic workflows.
Machine Translation Steps
Machine translation steps are currently treated as standard translation steps by the estimating process. That is, they need a valid rate card associated with them in order to be counted in the estimate and not produce a warning on the total cost. In the future, we may default to zero cost for MT steps. A workaround to remove the warning is to assign a dummy resource with the appropriate rate card to all languages on the step.
Plurals
Plural forms are not counted in the estimate. Instead, the string is counted as if it were a single source string with a corresponding single translation. Please note that in the Word Count Report, plural forms are counted and a multiplier is applied to the source word count.
How estimates are calculated
The following steps are used in calculating estimates:
- Determine workflows. For each string in the job, identify a workflow for each language in order to establish which steps to include and which rate cards to use:
  - For unauthorized content, the default workflow for each language is used for the estimate. If the default workflow is a dynamic workflow, the default branch of the dynamic workflow is used.
  - For authorized content, the actual workflows in which the string sits are used for the estimate. If the workflow is a dynamic workflow, the default branch is used in the estimate even if the content is currently in a different branch.
- Calculate TM match scores and weighted words. If the string in question already has a translation, then the match score that was available at the time of translation is used in the estimate. For untranslated strings, the following are checked and then the weighted word counts for the string can be calculated:
  - SmartMatch. SmartMatch to Published is counted at zero cost, whereas SmartMatch to a post-translation step other than Published is counted as zero weighted words for the translation step and counted as normal for post-translation steps.
  - Fuzzy Match. Check the fuzzy score of the (untranslated) string using the leveraged translation memories as they are at the time the estimate is run. Determine which fuzzy tier this falls into and calculate the corresponding weighted words. Large strings (>10K characters) are excluded from fuzzy-match estimates due to the expense of calculating; they're assumed to have no fuzzy matches but are included in the total word count of the estimate.
  - Internal Match. Check each string against the strings in the same job that will be translated before it to obtain an internal match score. The higher of the internal and TM match scores is used.
- Calculate cost. Use the rate card corresponding to the workflow content type and the rate corresponding to the step type to determine the cost (or cost range if multiple differing rate cards could apply) for each workflow step. The calculation is done to four decimal places and then rounded up to two at the end.
- Add up totals. Add up counts for all strings, steps and languages: the source word count and weighted word count for each string that is likely to be translated, and the source word count for each string that is likely to pass through editing and review steps.
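The steps above can be sketched for a single translation step as follows. This is a simplified illustration under stated assumptions, not the actual estimating code: it reuses the sample fuzzy tiers, rounds weighted words up per string, ignores SmartMatch, repetitions, and post-translation steps, and every name and rate in it is hypothetical.

```python
from decimal import Decimal, ROUND_CEILING

# Sample tiers: (lowest score in the tier, percent of the full per-word rate).
TIERS = [(100.0, 10), (95.0, 30), (85.0, 60), (0.0, 100)]


def rate_percent(score: float) -> int:
    """Percent of the full rate for a score between 0 and 100."""
    return next(pct for lower, pct in TIERS if score >= lower)


def estimate_translation_cost(strings, rate_per_word: Decimal):
    """strings: iterable of (source_words, tm_score, internal_score) tuples."""
    total_source = 0
    total_weighted = 0
    for source_words, tm_score, internal_score in strings:
        # The higher of the TM and internal match scores is used.
        score = max(tm_score, internal_score)
        total_source += source_words
        # Ceiling division: weighted words are rounded up (per string here,
        # which is an assumption of this sketch).
        total_weighted += -(-source_words * rate_percent(score) // 100)
    # Work to four decimal places, then round up to two at the end.
    cost = (Decimal(total_weighted) * rate_per_word).quantize(Decimal("0.0001"))
    cost = cost.quantize(Decimal("0.01"), rounding=ROUND_CEILING)
    return total_source, total_weighted, cost


job = [
    (120, 96.0, 0.0),   # TM fuzzy match in the 95% - 99.9% tier
    (40, 50.0, 100.0),  # identical internal match wins over a weak TM match
    (10, 0.0, 0.0),     # no match at all
]
print(estimate_translation_cost(job, Decimal("0.1235")))
```

With these inputs the job has 170 source words and 50 weighted words; 50 x 0.1235 = 6.1750, which rounds up to 6.18.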
Why estimates can change over time
Estimates that are re-run at different times over the life of a job can produce significantly different results. This is because an estimate is a real-time prediction of the job's final word counts based on the current situation, which may have changed from when previous estimates were run. Below are some of the factors that contribute to estimates changing:
- Translation memory changes. As translation work proceeds in an account across multiple translation jobs, the translation memory is continually updated, and these updates can provide new fuzzy-match and SmartMatch options to the current job, resulting in reduced estimates. In addition, new TMs can be imported to the leverage configuration. It's also possible, though less likely, that entries get deleted from a TM resulting in fewer match options than were originally available, which could result in an estimate increasing.
- Job content changes. Files in a job can be updated while the job is under way resulting in content being added or removed from the job, which can change the estimate.
- Workflow changes. Content might be authorized into or moved to different workflows than assumed in the original estimate, resulting in different translators and editors with potentially different rates, or different workflow steps, being included in the estimate.
- Configuration changes. Changes to which translation memories are included in the leverage configuration, SmartMatch settings, workflow steps and settings, rates, etc. can all affect an estimate.
With the exception of TM changes that occur as a result of normal translation work, the changes above can be managed to reduce their impact on active translation jobs. Nonetheless, it's advisable to save a copy of the original estimate, and the estimate when the content has entered its workflows, as well as any estimates run after significant changes of the types mentioned above.
Why actual word counts can differ from estimates
Actual word counts can differ from what was estimated for various reasons as described below:
- Translation memory changes. The state of the translation memory at the moment a translation is saved can be different from how it was at the time the estimate was run. This can happen for various unpredictable reasons, but is most commonly due to new translations being saved to the TM in other translation jobs, and in that case results in the actual cost being lower than estimated, a benefit of using a cloud-based translation platform. Note that once a string is translated, the match score that was available at the time of translation is used in subsequent estimates, even if the translation memory has since changed.
- Order of submission. Once a translation is published, it becomes available for SmartMatching, which could result in something that was counted as a repetition in the estimate being translated by SmartMatch, potentially reducing the cost.
- SmartMatch to post-Edit step. When estimates are calculated, SmartMatching to non-Published steps is assumed to move the content to the first post-Translation step. This over-estimates the cost for strings that will SmartMatch to a post-Edit Review or Hold step.
- Skipping steps. Content can be manually moved past a step, removing the actual cost for that step; or it can automatically skip an editing step due to the workflow "skip edit" or be automatically moved out of a step due to the "idle strings" configuration option. In all these cases, the estimated cost of the step will not be incurred.
- Dynamic Workflow effects. Estimates currently assume that all content goes through the default branch of a dynamic workflow. However, some of the content is likely to traverse different branches with likely different and potentially higher costs.
- Rejected SmartMatches. If SmartMatched translations are subsequently rejected and worked on by a translator, the word count reports will reflect this work, but an estimate will not.
- Translating before SmartMatch. If a translator enters a translation for a string before SmartMatch does, then the cost will be that of the translator's work rather than the SmartMatch, and thus could be higher than what was estimated.
Using estimates for budgeting
Estimates serve this use case reasonably well because deviations that occur in the actual word counts will generally lower the cost from the estimated amount. However, a number of exceptions to this need to be borne in mind:
- Changes to the job contents can increase the size and cost of the job.
- Workflow changes could introduce additional steps resulting in increased cost.
- Configuration changes, such as removing a TM from the leverage configuration, could result in increased cost.
- Dynamic workflows are not fully accounted for in estimates and could result in an under-estimate.
Ideally, the first three items above can simply be avoided once the job is in progress. If they can't be avoided, a re-estimate may be required along with a potential budget adjustment. For the dynamic workflows issue, it may be advisable to configure the workflow such that the default branch is the most expensive to ensure that dynamic-workflow translations won't result in cost increases from what was estimated.
One challenge with this use case is how to handle a significant deviation between the estimate and the actual cost. If the actual cost is higher, then either you or your translation vendor will need to bridge the gap in order to pay translators for their actual work. On the other hand, a lower actual cost can present a challenge in managing translators.
Considerations
- A key point to note is that estimates are "real time" and can give different results when run at different times for the same job. Because of this, the following is recommended:
  - Refresh the estimate before using it to be certain you have the latest.
  - Make a copy of any estimate that you intend to use or reference in the future. A download CSV option is available in the estimate details, but taking a screenshot as well is suggested.
  - If it's important that the estimate not change much or that the final word counts be close to the estimate, limit the kinds of changes that can affect this while jobs are in progress (see sections above).
  - If changes must be made that will affect estimates and word counts, make sure that they are well communicated.
- A small exception to the "real time" point above is that once a translation is saved, the fuzzy score that was available when saved gets used henceforth in the estimate.
- Warnings next to total cost figures should be taken seriously as they can indicate a very significant underestimate of total cost. The details of the estimate should be examined (including all languages) to determine which parts are missing, and this should be addressed before using the estimate.
- Estimates for dynamic workflows assume that all content goes through the default branch. Consider using the default branch for estimating only. For example, if using language-based branching, add all language resources to the default branch so that they're reflected in the estimate, but configure rules so that each language goes through a non-default branch.
- Since MT steps currently are treated the same as other steps, i.e., they're expected to have a rate card, consider creating a dummy user with a suitable rate (e.g., 0 or "unpayable") and assigning to MT steps, to avoid warnings about missing rates for those steps.
- Remember that SmartMatch to non-Published steps can lead to an overestimated cost, since the estimate always assumes that the content will SmartMatch to the first post-translation step.