Smartling allows you to set a length limit on a string’s translation to help ensure translations stay within a certain length. String length can be measured in characters or bytes. Bytes are counted based on the Unicode block to which a character belongs. Byte counts can also vary depending on the content encoding (e.g. UTF-8 or UTF-16).
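As a quick illustration of how byte counts vary with encoding, the sketch below uses Python's standard codecs to measure the same characters in UTF-8 and UTF-16. It shows general encoding behavior only and is not Smartling's own counting method; the helper name `encoded_lengths` is made up for this example.

```python
def encoded_lengths(text: str) -> dict:
    """Return the encoded size of `text` in bytes for UTF-8 and UTF-16."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-le" is used so that no byte-order mark is added to the count.
        "utf-16": len(text.encode("utf-16-le")),
    }

for sample in ["A", "é", "Я", "日", "😀"]:
    print(sample, encoded_lengths(sample))
# A  {'utf-8': 1, 'utf-16': 2}
# é  {'utf-8': 2, 'utf-16': 2}
# Я  {'utf-8': 2, 'utf-16': 2}
# 日 {'utf-8': 3, 'utf-16': 2}
# 😀 {'utf-8': 4, 'utf-16': 4}
```

The same character can therefore take anywhere from one to four bytes depending on the encoding chosen.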
The following table outlines the byte count of any character (including emojis) within Smartling, as of December 20, 2023:
Unicode Block | Range | Byte count in Smartling |
Hangul Jamo | U+1100 to U+11FF | 2 |
Kana Extended B | U+1AFF0 to U+1AFFF | 2 |
Kana Supplement | U+1B000 to U+1B0FF | 2 |
Kana Extended A | U+1B100 to U+1B12F | 2 |
Small Kana Extension | U+1B130 to U+1B16F | 2 |
CJK Unified Ideographs Extension B | U+20000 to U+2A6DF | 2 |
CJK Unified Ideographs Extension C | U+2A700 to U+2B738 | 2 |
CJK Unified Ideographs Extension D | U+2B740 to U+2B81D | 2 |
CJK Unified Ideographs Extension E | U+2B820 to U+2CEA1 | 2 |
CJK Unified Ideographs Extension F | U+2CEB0 to U+2EBE0 | 2 |
CJK Radicals Supplement | U+2E80 to U+2EFF | 2 |
Kangxi Radicals | U+2F00 to U+2FDF | 2 |
CJK Compatibility Ideographs Supplement | U+2F800 to U+2FA1F | 2 |
CJK Unified Ideographs Extension G | U+30000 to U+3134A | 2 |
Hiragana | U+3040 to U+309F | 2 |
Katakana | U+30A0 to U+30FF | 2 |
Hangul Compatibility Jamo | U+3130 to U+318F | 2 |
Kanbun | U+3190 to U+319F | 2 |
CJK Strokes | U+31C0 to U+31EF | 2 |
Katakana Phonetic Extensions | U+31F0 to U+31FF | 2 |
CJK Unified Ideographs Extension A | U+3400 to U+4DBF | 2 |
CJK Unified Ideographs | U+4E00 to U+9FFF | 2 |
Hangul Jamo Extended-A | U+A960 to U+A97F | 2 |
Hangul Syllables | U+AC00 to U+D7AF | 2 |
Hangul Jamo Extended-B | U+D7B0 to U+D7FF | 2 |
CJK Compatibility Ideographs | U+F900 to U+FAFF | 2 |
Halfwidth and Fullwidth Forms | U+FF00 to U+FFEF | 2 |
Latin Characters | U+0000 to U+007F | 1 |
Cyrillic Characters | U+0400 to U+04FF | 1 |
Mahjong Tiles | U+1F000 to U+1F02B | 2 |
Playing Cards | U+1F0A0 to U+1F0F5 | 2 |
Enclosed Alphanumeric Supplement | U+1F100 to U+1F1FF | 2 |
Enclosed Ideographic Supplement | U+1F200 to U+1F265 | 2 |
Miscellaneous Symbols and Pictographs | U+1F300 to U+1F5FF | 2 |
Emoticons | U+1F600 to U+1F64F | 2 |
Transport and Map Symbols | U+1F680 to U+1F6FC | 2 |
Geometric Shapes Extended | U+1F780 to U+1F7F0 | 2 |
Supplemental Symbols and Pictographs | U+1F900 to U+1F9FF | 2 |
Symbols and Pictographs Extended-A | U+1FA70 to U+1FAF8 | 2 |
General Punctuation | U+2000 to U+206F | 1 |
Combining Diacritical Marks for Symbols | U+20D0 to U+20F0 | 1 |
Letterlike Symbols | U+2100 to U+214F | 1 |
Arrows | U+2190 to U+21FF | 1 |
Miscellaneous Technical | U+2300 to U+23FF | 1 |
Enclosed Alphanumerics | U+2460 to U+24FF | 1 |
Geometric Shapes | U+25A0 to U+25FF | 1 |
Miscellaneous Symbols | U+2600 to U+26FF | 1 |
Dingbats | U+2700 to U+27BF | 1 |
Supplemental Arrows-B | U+2900 to U+297F | 1 |
Miscellaneous Symbols and Arrows | U+2B00 to U+2BFF | 1 |
CJK Symbols and Punctuation | U+3000 to U+303F | 2 |
Enclosed CJK Letters and Months | U+3200 to U+32FF | 1 |
Latin-1 Supplement | U+0080 to U+00FF | 1 |
Tags | U+E0001 to U+E007F | 2 |
Variation Selectors | U+FE00 to U+FE0F | 1 |
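
The table can be read as a range lookup: find the block containing each character's code point and add up the listed byte counts. The following is a minimal sketch of that idea in Python, using only a subset of the rows above. The names `RANGES`, `table_bytes`, and `count_bytes` are hypothetical, and the handling of code points not listed in the table (here, raising an error) is an assumption, since the table does not say how Smartling treats them.

```python
from bisect import bisect_right
from typing import Optional

# (first code point, last code point, byte count): a subset of the table rows.
RANGES = sorted([
    (0x0000, 0x007F, 1),    # Latin Characters
    (0x0080, 0x00FF, 1),    # Latin-1 Supplement
    (0x0400, 0x04FF, 1),    # Cyrillic Characters
    (0x2000, 0x206F, 1),    # General Punctuation
    (0x3000, 0x303F, 2),    # CJK Symbols and Punctuation
    (0x3040, 0x309F, 2),    # Hiragana
    (0x30A0, 0x30FF, 2),    # Katakana
    (0x4E00, 0x9FFF, 2),    # CJK Unified Ideographs (U+4E00 to U+9FFF)
    (0xAC00, 0xD7AF, 2),    # Hangul Syllables
    (0xFE00, 0xFE0F, 1),    # Variation Selectors
    (0x1F600, 0x1F64F, 2),  # Emoticons
])
STARTS = [start for start, _, _ in RANGES]

def table_bytes(ch: str) -> Optional[int]:
    """Return the table's byte count for one character, or None if unlisted."""
    cp = ord(ch)
    i = bisect_right(STARTS, cp) - 1  # last range starting at or before cp
    if i >= 0 and cp <= RANGES[i][1]:
        return RANGES[i][2]
    return None

def count_bytes(text: str) -> int:
    """Sum per-character byte counts; unlisted characters raise (an assumption)."""
    total = 0
    for ch in text:
        n = table_bytes(ch)
        if n is None:
            raise ValueError(f"U+{ord(ch):04X} is not covered by this subset")
        total += n
    return total

print(count_bytes("Hello"))   # 5
print(count_bytes("Привет"))  # 6
print(count_bytes("日本語"))   # 6
print(count_bytes("Hi😀"))    # 4
```

A length check against a byte limit would then compare `count_bytes(translation)` with the configured limit. Extending `RANGES` with the remaining rows of the table follows the same pattern.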