How To Use RegEx In Your Translation Memory Search

This article is for accounts with RegEx search enabled on their Translation Memory.

RegEx, short for regular expression, is a sequence of characters that specifies a search pattern in text. It is a powerful way to search, but it isn't quick or easy to understand, and sometimes you just need it broken down into a language you are familiar with. Don't worry, we've got you. Here is a collection of example RegEx patterns you can use to find translation units (TUs) in the TM, in a language we all understand (somewhat): emojis 😊

Exact String Match

^(🌶️)$

Where you want to find units which are the exact text represented by 🌶️, avoiding any strings where 🌶️ exists with other characters.

Example use case: you are searching the translation memory for TUs that correspond to the "Next" button in your product UI text, and you need to exclude any results that contain additional text, such as "Next Up":

^(Next)$
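
To see how the anchors behave, here is a minimal sketch that runs the same pattern with Python's re module. Python is only a stand-in here: the TM search may use a different RegEx engine, and the sample strings are made up for illustration.

```python
import re

# Hypothetical TM source strings, for illustration only.
units = ["Next", "Next Up", "Up Next", "next"]

# ^ and $ pin the match to the start and end of the unit.
exact = re.compile(r"^(Next)$")

matches = [u for u in units if exact.search(u)]
print(matches)  # ['Next']
```

"Next Up" and "Up Next" are rejected because of the extra text, and "next" is rejected because the pattern is case-sensitive.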

Does Not Contain

^((?!🕶️).)*$

Where you want to find units which do not contain the keyword or phrase represented by 🕶️.

Example use case: search for a glossary source term and find any translations which don’t contain the proper glossary translation.

This search is case-sensitive. To perform a case-insensitive "does not contain" search, use:
^(?i)((?!🕶️).)*$
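
A rough sketch of both variants in Python follows. It assumes Python's re module as a stand-in for the TM search engine, and the sample units and the term "Panel" are hypothetical. Recent Python versions only accept a global (?i) at the very start of a pattern, so the sketch passes re.IGNORECASE instead of writing ^(?i) inline.

```python
import re

# Hypothetical target units; "Panel" stands in for the required glossary translation.
units = ["Open the Panel", "Open the panel", "Open the dashboard"]

# (?!Panel) is checked before every character, so the match fails
# as soon as "Panel" starts anywhere in the unit.
missing = re.compile(r"^((?!Panel).)*$")
missing_any_case = re.compile(r"^((?!Panel).)*$", re.IGNORECASE)

case_sensitive = [u for u in units if missing.search(u)]
case_insensitive = [u for u in units if missing_any_case.search(u)]

print(case_sensitive)    # ['Open the panel', 'Open the dashboard']
print(case_insensitive)  # ['Open the dashboard']
```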

Combination

(?:(💙|🌻).(🐸|🍒))

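Where you would like to find any combination of keywords or phrases. The . between the two groups matches exactly one character (a space, for example), so the two terms must be separated by a single character. The possible combinations are as follows: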

💙🐸
💙🍒
🌻🐸
🌻🍒

Example use case: search for any combination of good or bad pets:

(?:(good|bad).(dog|cat|lizard|guinea pig))

Note: Keywords are case-sensitive. Each set of parentheses can hold more than two alternatives, separated by vertical bars (|).
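
The following sketch tries the pet pattern against a few made-up strings, assuming Python's re module as a stand-in for the TM search engine (the real engine may differ in details).

```python
import re

combo = re.compile(r"(?:(good|bad).(dog|cat|lizard|guinea pig))")

# Hypothetical units, for illustration only.
units = [
    "What a good dog",    # "good" + space + "dog"
    "bad-cat energy",     # "bad" + hyphen + "cat"
    "A good guinea pig",  # multi-word alternative
    "gooddog",            # no character between the two terms
    "Good dog",           # wrong case
]

matches = [u for u in units if combo.search(u)]
print(matches)  # ['What a good dog', 'bad-cat energy', 'A good guinea pig']
```

The single . accepts any one separator character, which is why the hyphenated string matches while "gooddog" does not.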

Character Count Ranges

^(?=[\S\s]{1,9}$)[\S\s]*

Where the unit has between 1 and 9 characters in total. Because [\S\s] matches any character, whitespace and line breaks count toward the total. This pattern can be used for any character count range by changing the numbers inside the braces. For example:

.{8,} can be used for units with 8 or more consecutive characters on one line (. does not match line breaks)
^(?=[\S\s]{00,8}$)[\S\s]* can be used for units with 0-8 characters, including whitespace
Similarly, ^(?=[\S\s]{10,20}$)[\S\s]* can be used for units with 10-20 characters, including whitespace

Example use case 1: your product screen previously only supported 10 characters; now it can support 12. You’d like to find all units with 10 or fewer characters, so you can evaluate the level of effort required to update all 10-character translations to 12-character translations:
^(?=[\S\s]{00,10}$)[\S\s]*

Example use case 2: you want to find short units with 8 or fewer characters, e.g. strings used for mobile app push notifications. You might use this sequence:

^(?=[\S\s]{00,8}$)[\S\s]*

Note: If starting with 0 as the lower limit, use 00 (not just 0)
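
As a quick check of how the counting works, here is a small sketch using Python's re module, which is an assumed stand-in for the TM search engine. The sample units are hypothetical. In Python, at least, [\S\s] matches spaces and line breaks too, so they count toward the total, while . stops at a line break.

```python
import re

# 1 to 9 characters in total; the lookahead checks the length
# before the rest of the pattern consumes the unit.
short = re.compile(r"^(?=[\S\s]{1,9}$)[\S\s]*")

# 8 or more consecutive characters without a line break.
long_line = re.compile(r".{8,}")

# Hypothetical units, for illustration only.
units = ["OK", "Save file", "Save files", "Ln 1\nLn2", ""]

short_units = [u for u in units if short.search(u)]
long_units = [u for u in units if long_line.search(u)]

print(short_units)  # ['OK', 'Save file', 'Ln 1\nLn2']
print(long_units)   # ['Save file', 'Save files']
```

"Save file" is exactly 9 characters with its space, so it still matches the 1-9 range, and "Save files" (10 characters) does not. The two-line unit has 8 characters including the line break, but no single line reaches 8, so .{8,} skips it.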

The Roots of Words

\b(🌲).*\B

Where 🌲 is the root of a word.

Example use case: a French linguist is looking for all translations containing a verb (e.g. manger, "to eat"), but the verb is conjugated with various endings depending on the subject (je mange, tu manges, ils mangent, etc.):

\b(mang).*\B
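
Here is a minimal sketch of the root search, again assuming Python's re module as a stand-in for the TM search engine, with made-up French units.

```python
import re

# \b requires "mang" to start a word; the trailing \B requires the
# match to end inside a word, so the root must be followed by more letters.
root = re.compile(r"\b(mang).*\B")

# Hypothetical units, for illustration only.
units = ["je mange", "tu manges", "ils mangent", "une mangue", "bon voyage"]

matches = [u for u in units if root.search(u)]
print(matches)  # ['je mange', 'tu manges', 'ils mangent', 'une mangue']
```

Note that "une mangue" (a mango) also matches: a root search finds every word that starts with the root, so expect some results to need a manual check.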

This Word, That Word, and This Other Word

(?:(🔵|🟡|🔴))

Where 🔵, 🟡, and 🔴 are three separate keywords.

Example use case: you need to locate all units that reference MTV, VH1, or Disney Channel:

(?:(MTV|VH1|Disney Channel))

Note: Phrases (such as Disney Channel) work as long as each term is separated from the next by a vertical bar (|).
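
To show which keyword was found in each unit, the sketch below uses findall from Python's re module. Python is an assumed stand-in for the TM search engine, and the units are hypothetical.

```python
import re

channels = re.compile(r"(?:(MTV|VH1|Disney Channel))")

# Hypothetical units, for illustration only.
units = [
    "Watch MTV tonight",
    "Now on Disney Channel",
    "Streaming on VH1 and MTV",
    "Disney movies",  # only part of the phrase
    "mtv",            # wrong case
]

# findall returns the text captured by the inner group for every hit.
found = {u: channels.findall(u) for u in units}

print(found["Streaming on VH1 and MTV"])  # ['VH1', 'MTV']
print(found["Disney movies"])             # []
```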
