Regex

A regular expression (regex) is a sequence of characters that defines a pattern for searching through text. These patterns are often used for finding and/or replacing text.

The most basic regex consists of the exact characters you want to search for. If you are trying to find "hello" in the text of a play, you can simply use the pattern hello. One thing to note is that regex is case sensitive by default, so while hello will find what you are looking for, Hello will not.
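As a rough illustration of literal matching and case sensitivity, here is a minimal sketch using Python's built-in re module. The choice of Python and the sample text are assumptions made for this example, not part of the original guide.

```python
import re

# Sample text invented for this example.
text = "Hello there. I said hello to the crowd."

# A literal pattern matches exactly the characters it contains.
lowercase_hits = re.findall(r"hello", text)

# Matching is case sensitive by default, so "hello" does not find "Hello".
# re.IGNORECASE is an opt-in flag that relaxes this.
any_case_hits = re.findall(r"hello", text, flags=re.IGNORECASE)

print(lowercase_hits)  # ['hello']
print(any_case_hits)   # ['Hello', 'hello']
```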

Operators

* Operator

Use the * to match a particular character or set 0 or more times. To use this operator, put it directly after the character or character set. To match the following

soooper
sooper
soper
sper

you can use the pattern so*per and it will match all four.
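The behaviour of * can be checked with a short sketch in Python's re module (Python is an assumption here; the guide itself is engine-neutral). The word "super" is added only as a counterexample.

```python
import re

words = ["sooper", "soper", "sper", "super"]

# o* allows zero or more "o" characters between "s" and "per".
# re.fullmatch requires the whole word to fit the pattern.
star_matches = [w for w in words if re.fullmatch(r"so*per", w)]

print(star_matches)  # ['sooper', 'soper', 'sper']
```

"super" is rejected because * only repeats the o; it does not allow a different letter in that position.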

+ Operator

Use the + to match a particular character or set 1 or more times. To use this operator, put it directly after the character or character set. To match the following

Ahwoogah
Ahwoooogah
Ahwoooooogah

you can use the pattern Ahwo+gah and it will match all three.
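A minimal Python sketch of the same idea follows, assuming the built-in re module. "Ahwgah" is a made-up counterexample showing that + needs at least one occurrence.

```python
import re

words = ["Ahwoogah", "Ahwoooogah", "Ahwoooooogah", "Ahwgah"]

# o+ requires at least one "o" between "Ahw" and "gah".
plus_matches = [w for w in words if re.fullmatch(r"Ahwo+gah", w)]

print(plus_matches)  # ['Ahwoogah', 'Ahwoooogah', 'Ahwoooooogah']
```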

? Operator

Use the ? to match a particular character or set 0 or 1 times. To use this operator, put it directly after the character or character set. To match the following

sper
soper
sooper

you can use the pattern so?o?per to match all three.
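The following sketch, again assuming Python's re module, shows that each ? permits at most one o, so a word with three o's falls outside the pattern. "soooper" is included only as a counterexample.

```python
import re

words = ["sper", "soper", "sooper", "soooper"]

# Each o? matches zero or one "o", so the pattern allows at most two.
optional_matches = [w for w in words if re.fullmatch(r"so?o?per", w)]

print(optional_matches)  # ['sper', 'soper', 'sooper']
```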

Character Classes

Character classes can be used when any one of several characters could appear in a given spot. To create a character class, put the characters you want to match in square brackets []. A class matches exactly one character from the set. To match the following

sper
soper
super

you can use the pattern s[ou]?per to match all three. The class [ou] matches either o or u, and the ? makes that character optional so that sper matches as well.
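To see exactly which words a class-based pattern accepts, here is a small sketch in Python's re module (an assumption for illustration). It compares a bare class with the same class made optional; "saper" is a made-up counterexample.

```python
import re

words = ["sper", "soper", "super", "saper"]

# [ou] consumes exactly one character, which must be "o" or "u".
one_required = [w for w in words if re.fullmatch(r"s[ou]per", w)]

# Adding ? makes that single character optional, so "sper" is accepted too.
one_optional = [w for w in words if re.fullmatch(r"s[ou]?per", w)]

print(one_required)  # ['soper', 'super']
print(one_optional)  # ['sper', 'soper', 'super']
```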

You can use operators with character classes as well. To match the following

AZD
EDV
asdf

you can use the pattern [a-z]*[A-Z]*, which matches zero or more lowercase letters followed by zero or more uppercase letters.
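A short Python sketch (assuming the re module) shows what this pattern does and does not accept. The extra inputs "ASdf" and the empty string are added here to show the edges of the pattern.

```python
import re

pattern = r"[a-z]*[A-Z]*"
candidates = ["AZD", "EDV", "asdf", "ASdf", ""]

# True when the whole string is lowercase letters followed by uppercase letters.
results = {c: re.fullmatch(pattern, c) is not None for c in candidates}

print(results)
# {'AZD': True, 'EDV': True, 'asdf': True, 'ASdf': False, '': True}
```

Because both parts use *, the pattern also matches the empty string, which is worth keeping in mind when searching inside longer text.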

Ranges

You can use ranges inside square brackets to create classes. [A-Z] matches any capital letter. [a-z] matches any lowercase letter. [0-9] matches any digit.
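As a quick sketch of ranges in practice, assuming Python's re module and a made-up sample string:

```python
import re

text = "Gate B7 opens at 9"

capitals = re.findall(r"[A-Z]", text)   # each capital letter
digits = re.findall(r"[0-9]", text)     # each digit

# Several ranges can share one class; + then groups consecutive matches.
tokens = re.findall(r"[A-Za-z0-9]+", text)

print(capitals)  # ['G', 'B']
print(digits)    # ['7', '9']
print(tokens)    # ['Gate', 'B7', 'opens', 'at', '9']
```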

Excluding

If you want to match everything but a certain character, you can use the ^ operator. To use it this way, the ^ must be the first character inside the []. To match any character that is not a digit, you can use [^0-9].
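Here is a minimal sketch of a negated class such as [^0-9], assuming Python's re module. The sample strings are invented for the example.

```python
import re

# [^0-9]+ matches runs of characters that are not digits.
non_digit_runs = re.findall(r"[^0-9]+", "abc123def")

# Replacing every non-digit with nothing leaves only the digits.
digits_only = re.sub(r"[^0-9]", "", "Call 555-0199")

# When ^ is not the first character in the class, it is a literal caret.
literal_caret = re.findall(r"[0-9^]", "2^8")

print(non_digit_runs)  # ['abc', 'def']
print(digits_only)     # 5550199
print(literal_caret)   # ['2', '^', '8']
```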

Shorthand Character Classes

Shorthand character classes always start with a backslash \. You can use shorthand classes inside character classes, for example [\d\w]. You can also use operators with shorthand classes, for example \d?.
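The sketch below tries a few common shorthands (\d, \w, \s) in Python's re module. Python is an assumption here, and other engines may support a different set of shorthands.

```python
import re

# \d+ matches runs of digits.
numbers = re.findall(r"\d+", "Order 66 shipped 3 items")

# \w+ matches runs of letters, digits and underscores; "-" is not included.
words = re.findall(r"\w+", "user_1 logged-in")

# \s+ matches runs of whitespace, which makes it handy for splitting.
parts = re.split(r"\s+", "a  b\tc")

# \d? accepts zero or one digit, so "42" as a whole does not fit.
optional_digit = [s for s in ["", "7", "42"] if re.fullmatch(r"\d?", s)]

print(numbers)         # ['66', '3']
print(words)           # ['user_1', 'logged', 'in']
print(parts)           # ['a', 'b', 'c']
print(optional_digit)  # ['', '7']
```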

| POSIX | Description | ASCII | Unicode | Shorthand |
| --- | --- | --- | --- | --- |
| [:alnum:] | Alphanumeric characters | [a-zA-Z0-9] | [\p{L}\p{Nl}\p{Nd}] | |
| [:alpha:] | Alphabetic characters | [a-zA-Z] | [\p{L}\p{Nl}] | |
| [:ascii:] | ASCII characters | [\x00-\x7F] | \p{InBasicLatin} | |
| [:blank:] | Space and tab | [ \t] | [\p{Zs}\t] | \h |
| [:cntrl:] | Control characters | [\x00-\x1F\x7F] | \p{Cc} | |
| [:digit:] | Digits | [0-9] | \p{Nd} | \d |
| [:graph:] | Visible characters. Anything other than spaces and control characters. | [\x21-\x7E] | [^\p{Z}\p{C}] | |
| [:lower:] | Lowercase letters | [a-z] | \p{Ll} | \l |
| [:print:] | Visible characters and spaces. Anything other than control characters. | [\x20-\x7E] | \P{C} | |
| [:punct:] | Punctuation and symbols. | [!"\#$%&'()*+,\-./:;<=>?@\[\\\]^_`{\|}~] | \p{P} | |
| [:space:] | All whitespace characters including line breaks. | [ \t\r\n\v\f] | [\p{Z}\t\r\n\v\f] | \s |
| [:upper:] | Uppercase letters | [A-Z] | \p{Lu} | \u |
| [:word:] | Word characters: letters, numbers and underscores. | [A-Za-z0-9_] | [\p{L}\p{Nl}\p{Nd}\p{Pc}] | \w |
| [:xdigit:] | Hexadecimal digits. | [A-Fa-f0-9] | [A-Fa-f0-9] | |

Note: This table comes from: https://www.regular-expressions.info/posixbrackets.html
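Not every engine accepts the POSIX names directly. Python's built-in re module, for instance, does not support the [:name:] syntax, so the ASCII column is the portable fallback there. The sketch below copies a few ASCII equivalents from the table; the POSIX_ASCII dictionary and the consists_of helper are hypothetical names introduced only for this example.

```python
import re

# ASCII equivalents copied from the table above (a subset, for illustration).
POSIX_ASCII = {
    "alnum": r"[a-zA-Z0-9]",
    "digit": r"[0-9]",
    "space": r"[ \t\r\n\v\f]",
    "upper": r"[A-Z]",
    "xdigit": r"[A-Fa-f0-9]",
}


def consists_of(posix_name: str, text: str) -> bool:
    """Return True if text is one or more characters from the named class."""
    return re.fullmatch(POSIX_ASCII[posix_name] + "+", text) is not None


print(consists_of("xdigit", "DEADbeef42"))  # True
print(consists_of("xdigit", "xyz"))         # False
print(consists_of("space", " \t\n"))        # True
```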
