Regex
A regular expression (regex) is a sequence of characters that defines a pattern for searching through text. These patterns are often used for finding and/or replacing text.
The most basic regex is the exact sequence of characters you want to find. If you are trying to find "hello" in the text of a play, you can simply use the pattern hello. Note that regex is case sensitive by default, so while hello will find what you are looking for, Hello will not.
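As a minimal sketch of the literal match described above, the following uses Python's standard re module. The sample sentence is made up for illustration, and the case-insensitive flag at the end is Python-specific; other engines spell it differently.

```python
import re

# Hypothetical sample text, not taken from any particular play.
text = "She said hello to the audience."

# A literal pattern matches exactly those characters, case included.
lower_match = re.search(r"hello", text)
upper_match = re.search(r"Hello", text)

print(lower_match is not None)  # True: "hello" appears in the text
print(upper_match is not None)  # False: "Hello" with a capital H does not

# Python can relax case sensitivity with the re.IGNORECASE flag.
relaxed = re.search(r"Hello", text, re.IGNORECASE)
print(relaxed.group())  # hello
```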
Operators
* Operator
Use * to match the preceding character or set 0 or more times. Place the operator directly after the character or character set it applies to. For example, the pattern so*per matches sper, soper, sooper, and soooper.
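A short Python sketch of the * operator, assuming the standard re module. The candidate strings are illustrative choices, not a list from the original page.

```python
import re

pattern = re.compile(r"so*per")

# Illustrative inputs (assumed for this example).
candidates = ["sper", "soper", "sooper", "soooper", "super"]

# fullmatch requires the whole string to fit the pattern.
star_results = {word: bool(pattern.fullmatch(word)) for word in candidates}

for word, matched in star_results.items():
    print(word, matched)
```

Only super fails here, because u is not an o and the * applies to the o alone.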
+ Operator
Use + to match the preceding character or set 1 or more times. Place the operator directly after the character or character set it applies to. For example, the pattern Ahwo+gah matches Ahwogah, Ahwoogah, and Ahwooogah, but not Ahwgah, because at least one o is required.
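The same idea for the + operator, sketched in Python with the standard re module. The inputs are assumed examples chosen to show that at least one o is required.

```python
import re

pattern = re.compile(r"Ahwo+gah")

# Illustrative inputs (assumed for this example).
candidates = ["Ahwogah", "Ahwoogah", "Ahwooogah", "Ahwgah"]

# The + needs one or more o characters, so "Ahwgah" is rejected.
plus_results = {word: bool(pattern.fullmatch(word)) for word in candidates}

for word, matched in plus_results.items():
    print(word, matched)
```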
? Operator
Use ? to match the preceding character or set 0 or 1 times. Place the operator directly after the character or character set it applies to. For example, the pattern so?o?per matches sper, soper, and sooper.
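A small Python sketch of the ? operator using the standard re module. The candidate list is an assumption made for illustration; it includes one string with three o characters to show where the pattern stops matching.

```python
import re

pattern = re.compile(r"so?o?per")

# Illustrative inputs (assumed for this example).
candidates = ["sper", "soper", "sooper", "soooper"]

# Each o? allows zero or one o, so at most two o characters fit.
optional_results = {word: bool(pattern.fullmatch(word)) for word in candidates}

for word, matched in optional_results.items():
    print(word, matched)
```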
Character Classes
Character classes can be used instead of ? when any one of several characters could appear in one spot. To create a character class, put the characters you want to match inside square brackets []. For example, the pattern s[ou]per matches soper and super.
You can use operators with character classes as well. For example, the pattern [a-z]*[A-Z]* matches zero or more lowercase letters followed by zero or more uppercase letters.
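The two character class patterns above can be tried out with a short Python sketch using the standard re module. All input strings are assumptions chosen for illustration.

```python
import re

# A class matches exactly one character out of the listed set.
vowel_choice = re.compile(r"s[ou]per")
class_results = {w: bool(vowel_choice.fullmatch(w)) for w in ["soper", "super", "sper", "saper"]}
print(class_results)

# Operators apply to the whole class: lowercase run, then uppercase run.
lower_then_upper = re.compile(r"[a-z]*[A-Z]*")
run_results = {w: bool(lower_then_upper.fullmatch(w)) for w in ["abcXYZ", "abc", "XYZ", "ABCxyz"]}
print(run_results)
```

Note that s[ou]per rejects sper, since a class without an operator must match exactly one character.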
Ranges
You can use ranges inside a character class. [A-Z] matches any uppercase letter, [a-z] matches any lowercase letter, and [0-9] matches any digit.
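A brief Python sketch of ranges, assuming the standard re module. The sample string is invented for the example.

```python
import re

# Hypothetical sample text.
sample = "Room 101, Floor 3"

# Ranges only have their special meaning inside square brackets.
digits = re.findall(r"[0-9]+", sample)
capitals = re.findall(r"[A-Z]", sample)
lower_runs = re.findall(r"[a-z]+", sample)

print(digits)      # ['101', '3']
print(capitals)    # ['R', 'F']
print(lower_runs)  # ['oom', 'loor']
```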
Excluding
If you want to match everything but a character, use ^ as the first character inside the []. Placed there, it negates the whole class. For example, you can use [^0-9] to match any single character that is not a digit.
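A minimal Python sketch of a negated class using the standard re module. The pattern [^0-9] and the input strings are assumptions picked to illustrate the rule that ^ must come first inside the brackets.

```python
import re

# [^0-9] matches any single character that is not a digit.
non_digit_runs = re.findall(r"[^0-9]+", "abc123def")
print(non_digit_runs)  # ['abc', 'def']

# ^ only negates when it is the first character inside the brackets;
# anywhere else in the class it is an ordinary character.
literal_caret = re.findall(r"[0-9^]", "2^3")
print(literal_caret)  # ['2', '^', '3']
```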
Shorthand Character Classes
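Shorthand character classes always start with a backslash \ followed by a letter. For example, \d matches a digit and \w matches a word character. You can use shorthand classes inside character classes, as in [\d\w]. You can also use operators with shorthand classes, as in \d? for an optional digit.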
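The shorthand classes \d and \w can be sketched in Python with the standard re module. The sample strings and the item prefix are made up for illustration. In Python 3, \d and \w on str patterns also match non-ASCII digits and letters unless the re.ASCII flag is set.

```python
import re

# \d+ finds runs of digits; \w+ finds runs of word characters.
numbers = re.findall(r"\d+", "Order 66 shipped 2 days ago")
words = re.findall(r"\w+", "snake_case and CamelCase9")
print(numbers)  # ['66', '2']
print(words)    # ['snake_case', 'and', 'CamelCase9']

# Shorthand classes can sit inside square brackets.
mixed = re.fullmatch(r"[\d\w]+", "abc_123") is not None
print(mixed)  # True

# \d? allows zero or one digit after the hypothetical "item" prefix.
optional_digit = [bool(re.fullmatch(r"item\d?", s)) for s in ["item", "item7", "item77"]]
print(optional_digit)  # [True, True, False]
```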
| POSIX | Description | ASCII | Unicode | Shorthand |
| --- | --- | --- | --- | --- |
| [:alnum:] | Alphanumeric characters | [a-zA-Z0-9] | [\p{L}\p{Nl}\p{Nd}] | |
| [:alpha:] | Alphabetic characters | [a-zA-Z] | [\p{L}\p{Nl}] | |
| [:ascii:] | ASCII characters | [\x00-\x7F] | \p{InBasicLatin} | |
| [:blank:] | Space and tab | [ \t] | [\p{Zs}\t] | \h |
| [:cntrl:] | Control characters | [\x00-\x1F\x7F] | \p{Cc} | |
| [:digit:] | Digits | [0-9] | \p{Nd} | \d |
| [:graph:] | Visible characters. Anything other than spaces and control characters. | [\x21-\x7E] | [^\p{Z}\p{C}] | |
| [:lower:] | Lowercase letters | [a-z] | \p{Ll} | \l |
| [:print:] | Visible characters and spaces. Anything other than control characters. | [\x20-\x7E] | \P{C} | |
| [:punct:] | Punctuation and symbols | [!"\#$%&'()*+,\-./:;<=>?@\[\\\]^_`{\|}~] | \p{P} | |
| [:space:] | All whitespace characters, including line breaks | [ \t\r\n\v\f] | [\p{Z}\t\r\n\v\f] | \s |
| [:upper:] | Uppercase letters | [A-Z] | \p{Lu} | \u |
| [:word:] | Word characters: letters, numbers, and underscores | [A-Za-z0-9_] | [\p{L}\p{Nl}\p{Nd}\p{Pc}] | \w |
| [:xdigit:] | Hexadecimal digits | [A-Fa-f0-9] | [A-Fa-f0-9] | |
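Not every engine accepts every column of this table. Python's standard re module, for instance, supports neither POSIX bracket names nor \p{...} properties, so the ASCII column is the portable choice there. The sketch below assumes that situation; the dictionary and the helper name all_in_class are hypothetical, introduced only for this example.

```python
import re

# ASCII-column equivalents for a few POSIX classes from the table.
ASCII_CLASSES = {
    "alnum": r"[a-zA-Z0-9]",
    "alpha": r"[a-zA-Z]",
    "blank": r"[ \t]",
    "digit": r"[0-9]",
    "lower": r"[a-z]",
    "upper": r"[A-Z]",
    "xdigit": r"[A-Fa-f0-9]",
}

def all_in_class(name: str, text: str) -> bool:
    """Return True if text is non-empty and every character is in the named class."""
    return re.fullmatch(ASCII_CLASSES[name] + "+", text) is not None

print(all_in_class("xdigit", "DEADbeef42"))  # True
print(all_in_class("xdigit", "xyz"))         # False
print(all_in_class("alpha", "abc123"))       # False
```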