This article is for accounts with RegEx search enabled on their Translation Memory.
RegEx, or regular expression, is a sequence of characters that specifies a search pattern in text. Although it is a super powerful method of searching, RegEx isn't quick and easy to understand. Sometimes you just need it broken down into a language you are familiar with. Don't worry, we've got you. Here is a collection of example RegEx patterns you can use to find your strings in the TM, in a language we all understand (somewhat): emojis 😊
Does Not Contain
^((?!🕶️).)*$
Where you want to find strings which do not contain the keyword or phrase represented by 🕶️.
Example use case: search for a glossary source term and find any translations which don’t contain the proper glossary translation.
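To try the pattern outside the TM, here is a minimal sketch in Python. It assumes Python's built-in re module behaves like the TM search engine for this pattern, which may not hold in every detail. The glossary term and the sample translations are made up for illustration.

```python
import re

# Hypothetical glossary translation that every target string should contain.
REQUIRED_TERM = "tableau de bord"

# Same shape as ^((?!🕶️).)*$ with the keyword substituted for the emoji.
does_not_contain = re.compile(rf"^((?!{re.escape(REQUIRED_TERM)}).)*$")

# Made-up translations to search through.
translations = [
    "Ouvrir le tableau de bord",
    "Ouvrir le panneau",
    "Votre tableau de bord est prêt",
]

# Keep only the strings that never mention the required term.
missing_term = [t for t in translations if does_not_contain.search(t)]
print(missing_term)  # ['Ouvrir le panneau']
```

The negative lookahead (?!...) is checked before every character, so the match only reaches the end of the string when the keyword appears nowhere in it.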
Combination
(?:(💙|🌻).(🐸|🍒))
Where you would like to find any combination of two keywords or phrases, separated by any single character (such as a space), as follows:
- 💙🐸
- 💙🍒
- 🌻🐸
- 🌻🍒
Example use case: search for any combination of good or bad pets:
(?:(good|bad).(dog|cat|lizard|guinea pig))
Note: Keywords are case-sensitive. Each set of parentheses can hold more than two alternatives, separated by a vertical bar (|).
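The pets example can be checked with a short Python sketch. This assumes Python's re module as a stand-in for the TM search; the sample strings are invented.

```python
import re

# The pets pattern from above, unchanged.
combo = re.compile(r"(?:(good|bad).(dog|cat|lizard|guinea pig))")

# Made-up strings to test against.
samples = [
    "He is a good dog",
    "What a bad guinea pig",
    "Good dog",        # capital G: keywords are case-sensitive
    "a good, dog",     # two characters between the keywords
    "bad-cat energy",  # the dot matches any single character, not only a space
]

results = {s: bool(combo.search(s)) for s in samples}
for text, found in results.items():
    print(f"{found!s:5} {text}")
```

In Python, the first two strings and the last one match. "Good dog" fails on case, and "a good, dog" fails because the dot stands for exactly one character between the two keywords.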
Character Count Ranges
^(?=[\S\s]{1️⃣,9️⃣}$)[\S\s]*
Where the string has between 1️⃣ and 9️⃣ characters in total. [\S\s] matches any character, so whitespace and line breaks count toward the total. This pattern can be used for any character count range by simply defining the numerical range. For example:
.{8,}
can be used for strings containing a run of 8 or more characters with no line break (the dot does not match line breaks).
^(?=[\S\s]{00,8}$)[\S\s]*
can be used for strings with 0-8 characters. Similarly,
^(?=[\S\s]{10,20}$)[\S\s]*
can be used for strings with 10-20 characters.
Example use case 1: your product screen previously only supported 10 characters; now it can support 12. You’d like to find all strings with 10 or fewer characters, so you can evaluate the level of effort required to update all 10-character translations to 12-character translations:
^(?=[\S\s]{00,10}$)[\S\s]*
Example use case 2: to find short strings with 8 or fewer characters, such as strings used for mobile app push notifications, you might use this sequence:
^(?=[\S\s]{00,8}$)[\S\s]*
Note: If starting with 0 as the lower limit, use 00 (not just 0)
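The range patterns can be tried out in Python with the sketch below. It assumes Python's re module, where a plain 0 works as the lower limit; the 00 convention above is specific to the TM search. The length_between helper and the sample labels are hypothetical.

```python
import re


def length_between(lo: int, hi: int) -> re.Pattern:
    """Build ^(?=[\\S\\s]{lo,hi}$)[\\S\\s]* for the given range (hypothetical helper)."""
    return re.compile(rf"^(?=[\S\s]{{{lo},{hi}}}$)[\S\s]*")


ten_or_fewer = length_between(0, 10)

# Made-up UI labels of 2, 10, 12, and 11 characters.
labels = ["OK", "Save draft", "Save changes", "a b c d e f"]
fits = [s for s in labels if ten_or_fewer.search(s)]
print(fits)  # ['OK', 'Save draft']

# .{8,} needs 8 or more characters in a row with no line break between them.
eight_or_more = re.compile(r".{8,}")
print(bool(eight_or_more.search("Save changes")))  # True
print(bool(eight_or_more.search("Save\nnow")))     # False
```

In Python's re, "a b c d e f" is rejected by the 0-10 range because its five spaces are counted, which brings it to 11 characters.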
The Roots of Words
\b(🌲).*\B
Where 🌲 is the root of a word.
Example use case: a French linguist is looking for all translations containing a verb (e.g., manger / to eat), but the verb is conjugated with various endings based on the subject (je mange, tu manges, ils mangent, etc.):
\b(mang).*\B
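As a rough check of the French example, the following Python sketch runs the pattern over a few invented strings. It assumes Python's re module treats \b and \B the same way the TM search does, which is not guaranteed.

```python
import re

# The root pattern from above, with "mang" as the root.
root = re.compile(r"\b(mang).*\B")

# Made-up translations to search through.
phrases = [
    "je mange",
    "tu manges",
    "ils mangent",
    "nous dormons",           # different verb, no match expected
    "Manger est un plaisir",  # capital M, so the lowercase root is not found
]

with_root = [p for p in phrases if root.search(p)]
print(with_root)  # ['je mange', 'tu manges', 'ils mangent']
```

The leading \b means the root must sit at the start of a word, so in this sketch any word beginning with "mang" is found, whatever its ending.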
This Word, That Word, and This Other Word
(?:(🔵|🟡|🔴))
Where 🔵, 🟡, and 🔴 are three separate keywords.
Example use case: you need to locate all strings that reference MTV, VH1, or Disney Channel:
(?:(MTV|VH1|Disney Channel))
Note: Phrases (such as Disney Channel) work as long as each term is separated by a vertical bar (|).
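Here is the same keyword search as a small Python sketch, again assuming Python's re module as a stand-in for the TM search and using made-up strings.

```python
import re

# The three-keyword pattern from above, unchanged.
channels = re.compile(r"(?:(MTV|VH1|Disney Channel))")

# Made-up strings to search through.
strings = [
    "Watch it tonight on MTV",
    "Now streaming on Disney Channel",
    "New on disney channel",     # lowercase, so it is not found
    "Available on Nickelodeon",
]

mentions = [s for s in strings if channels.search(s)]
print(mentions)  # ['Watch it tonight on MTV', 'Now streaming on Disney Channel']

# findall lists every keyword that appears in a single string.
print(channels.findall("VH1 and MTV specials"))  # ['VH1', 'MTV']
```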