search blog

Regex is Really Useful

Regex is a really useful tool that I have not utilised enough, hopefully with this post as a guide I can incorporate it more into my workflow.

What is Regex?

Regex is short for "regular expressions". It is a powerful tool for matching patterns in text. It is used in many programming languages and text editors to search for and manipulate text. It can also be used for replacing strings within a string.

How does Regex Work?

Regex works by using a pattern to match text. The pattern is made up of special characters that represent different types of characters. For example, \s would match a single whitespace character, and \d would match a single digit character.

Patterns

What I mean by patterns is what you can use to identify a string.

Start & End

There are 2 really important characters that you should be aware of: ^ and $. The caret ^ matches the start of a string and the dollar sign $ matches the end of a string. This is useful for when you want to match a string that starts with a certain character or ends with a certain character.

const xp = new RegExp(/^word$/);

const str = "word";

const doesMatch = xp.test(str);
// doesMatch = true

Brackets

There are 3 types of brackets to be aware of:

{} - These are used to quantify the preceding expression

[] - These are used to match a single character from a list of characters

() - These are used to group expressions together

const xp = new RegExp(/^[a-z]{3}$/);
// This will match any 3 lowercase English letters

const xp = new RegExp(/^(a|b)c$/);
// This will match:
// - "ac"
// - "bc"

Negation

Negation is used to match any character that is not in a list of characters. This can be done using the caret, For example, [^a] will match any character that is not a.

const xp = new RegExp(/^[^a]$/);
// This will match any single character that is not "a"

Quantifiers

Quantifiers are used to match a certain number of characters.

? Matches the preceding pattern 0 or 1 times

* Matches the preceding pattern 0 or more times

+ Matches the preceding pattern 1 or more times

{n} Matches the preceding pattern exactly n times

{n,} Matches the preceding pattern n or more times

{n,m} Matches the preceding pattern between n and m times

By default all matches have to pass for a test to return true, think of it using the and operator, to use the or operator you can use the | character.

const xp = new RegExp(/^(bob){2}(tom)?$/);
// will match
// - "bobbob"
// - "bobbobtom"

const str = "bobbob";

const doesMatch = xp.test(str);
// doesMatch = true

Sinlge Characters

Single characters are the most basic type of pattern. They match a single character in the text. For example, a would match a and a only. There are special cases where you can apply a range to a character. For example, A-Z which will match all upper case english letters. The same will apply for numbers.

const xp = new RegExp(/^[a-z]{2}A$/);
// will match any two lowercase English characters followed by an uppercase "A"

Special Characters

Special characters are characters that have a special meaning in regex. For example, the dot . will match any character except a newline character. The backslash \ is used to escape special characters. For example, \\ will match a single backslash character. Important ones to learn are:

\b Word boundary (start or end of a word)

\B Non-word boundary

\w Word character (alphanumeric & underscore)

\W Non-word character

\d Digit character (0-9)

\D Non-digit character

\s Whitespace character ( )

\S Non-whitespace character

. Any character except a newline character

const xp = new RegExp(/\bword\b/);
// will match the word "word" but not "words"

const xp = new RegExp(/\d{3}/);
// will match any 3 digit number

const xp = new RegExp(/(\s)*/);
// will match any number of whitespace characters

const xp = new RegExp(/(\w){3,18}/);
// will match any word that is between 3 and 18 characters long
// this would be quite useful for something like a username

Flags

Flags are used to change the behavior of a regular expression. There are 6 flags that can be used:

g Global - Matches all occurrences of the pattern

i Case-insensitive - Matches the pattern regardless of case

m Multiline - Matches the pattern across multiple lines

u Unicode - Matches the pattern as a Unicode string

y Sticky - Matches the pattern only at the start of the string

s Dotall - Matches the pattern across multiple lines

const xp = new RegExp(/word/g);
// will match all occurrences of the word "word"

const xp = new RegExp(/word/i);
// will match the word "word" regardless of case

Lookaround

Lookaround is used to match a pattern only if it is followed or preceded by another pattern. There are 4 types of lookaround:

(?=...) Positive lookahead

(?!...) Negative lookahead

(?<=...) Positive lookbehind

(?<!...) Negative lookbehind

const xp = new RegExp(/^a(?=b{1,2})$/);
// This will match the single letter `a` if it is followed by 1 or 2 occurrences
// of the letter `b`

Something like positive lookhead is really useful for doing complex find and replace where you would want to keep the center of something the same.

In this example we are replacing the bad words if the name they are being called is 'bob' or 'Bob'

const str =
  "Bob is bad. Will is naughty, Tom is naughty, Tom is bad, Bob is good, bob is naughty.";

const xp = new RegExp(/(?<=(B|b)ob\sis\s)(bad|naughty)/, "g");

const newStr = str.replace(xp, "good");

console.log(newStr);
// "Bob is good. Will is naughty, Tom is naughty, Tom is bad, Bob is good, bob is good."

In this example we want to return all the names that come before 'BAD'

const names = `Bob BAD
Tom BAD
Will GOOD
Bob GOOD
Greg BAD`;

const xp = new RegExp(/\w+(?=\sBAD)/, "g");

const matches = names.match(xp);
// ["Bob", "Tom", "Greg"]

Note that if there are no matches it will return null


Published 1/9/2025