These are the main regular expression characters that you should know.
Character classes | |
Character classes specify a set of characters, exactly one of which must match. The set can be listed explicitly inside []. It can also be defined by what it excludes, by beginning the set with a caret, "^". A number of sets are predefined (e.g., \d, \s). The hyphen, "-", indicates a range of character values. Although a character class matches only one character, a quantifier following it can be used to match multiple characters. | |
[abc] | a, b, or c (simple class) |
[^abc] | Any character except a, b, or c (negation) |
[a-zA-Z] | a through z or A through Z, inclusive (range) |
Predefined character classes | |
---|---|
. | Any character (may or may not match line terminators) |
\d | A digit: [0-9] |
\D | A non-digit: [^0-9] |
\s | A whitespace character: [ \t\n\x0B\f\r] |
\S | A non-whitespace character: [^\s] |
\w | A word character: [a-zA-Z_0-9] |
\W | A non-word character: [^\w] |
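
The character classes above can be tried out with a short sketch. It assumes Python's `re` module, which is used here only for illustration and agrees with the table for these constructs on plain ASCII input. The sample strings and variable names are made up for the example.

```python
import re

# Explicit set: each match is exactly one of a, b, or c.
simple = re.findall(r"[abc]", "cabbage")

# Negated set: any single character except a, b, or c.
negated = re.findall(r"[^abc]", "cabbage")

# A range followed by a quantifier matches runs of letters.
words = re.findall(r"[a-zA-Z]+", "Room 101, Floor 3")

# Predefined classes: \d for digits, \s for whitespace, \W for non-word characters.
digits = re.findall(r"\d+", "Room 101, Floor 3")
parts = re.split(r"\s+", "a  b\tc")
non_word = re.findall(r"\W", "a-b c!")

print(simple, negated, words, digits, parts, non_word)
```

Note that in Python, \d and \w also match non-ASCII digits and letters unless the `re.ASCII` flag is given, so the bracketed equivalents in the table hold only for ASCII text there.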
Quantifiers (repeating the previous element) | |
Greedy quantifiers - Expand as much as possible | |
X? | X, once or not at all |
X* | X, zero or more times |
X+ | X, one or more times |
X{n} | X, exactly n times |
X{n,} | X, at least n times |
X{n,m} | X, at least n but not more than m times |
Reluctant quantifiers - Expand only if forced by later failure to match | |
X?? | X, once or not at all |
X*? | X, zero or more times |
X+? | X, one or more times |
X{n}? | X, exactly n times |
X{n,}? | X, at least n times |
X{n,m}? | X, at least n but not more than m times |
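
The difference between greedy and reluctant quantifiers is easiest to see side by side. The following is a minimal sketch, again assuming Python's `re` module; the sample text is invented for the example.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ expands as far as it can, so one match spans the whole string.
greedy = re.findall(r"<.+>", text)

# Reluctant: .+? expands only as far as needed to reach the next ">".
reluctant = re.findall(r"<.+?>", text)

# Bounded repetition: greedy takes the maximum, reluctant the minimum.
bounded_greedy = re.match(r"\d{2,4}", "123456").group()
bounded_reluctant = re.match(r"\d{2,4}?", "123456").group()

print(greedy)
print(reluctant)
print(bounded_greedy, bounded_reluctant)
```

Both forms accept the same set of strings. They differ only in which match is tried first, and that matters whenever the rest of the pattern could succeed at more than one length.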
Boundary matchers - Zero-width matches. | |
^ | The beginning of a line. Very useful. |
$ | The end of a line. Very useful. ^$ matches all empty lines. |
\b | A word boundary |
\B | A non-word boundary |
\A | The beginning of the input |
\G | The end of the previous match |
\Z | The end of the input but for the final terminator, if any |
\z | The end of the input |
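
A few of the boundary matchers can be sketched as follows, assuming Python's `re` module. In that module, ^ and $ match at every line only when the `re.MULTILINE` flag is set. \G is not available there and \Z behaves like the table's \z, so those three are left out of the example. The sample strings are made up.

```python
import re

text = "first\n\nthird"

# With MULTILINE, ^ matches at the start of every line.
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)

# \A matches only at the start of the whole input, even with MULTILINE.
input_start = re.findall(r"\A\w+", text, flags=re.MULTILINE)

# ^$ is a zero-width match on an empty line; record where it occurs.
empty_line_offsets = [m.start() for m in re.finditer(r"^$", text, flags=re.MULTILINE)]

# \b requires a word boundary, so "concat" and "cats" are left alone.
whole_word = re.sub(r"\bcat\b", "dog", "cat concat cats cat.")

# \B requires the absence of a boundary, so only the "cat" inside "concat" matches.
inside_word = [m.start() for m in re.finditer(r"\Bcat", "cat concat")]

print(line_starts, input_start, empty_line_offsets, whole_word, inside_word)
```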
Other | |
Logical operators | |
XY | X followed by Y |
X|Y | Either X or Y |
Grouping - Parentheses both group and create a numbered element that can be used later. | |
(X) | X. This capturing group is remembered so it can be referenced later. Numbered starting at 1. |
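
Alternation and capturing groups can be illustrated with a small sketch, assuming Python's `re` module. The replacement syntax for referring to a group differs between engines: Python writes `\1`, while some others write `$1`. The sample strings are invented.

```python
import re

# Alternation: either alternative may match at each position.
pets = re.findall(r"cat|dog", "cat, dog, bird")

# Capturing groups are numbered from 1 in order of their opening parenthesis.
m = re.match(r"(\d{4})-(\d{2})-(\d{2})", "2024-03-15")
year, month, day = m.group(1), m.group(2), m.group(3)

# Groups can be referenced later in a replacement string...
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")

# ...or inside the pattern itself, here to find a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test").group(1)

print(pets, (year, month, day), swapped, doubled)
```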
Quotation | |
\ | Nothing, but quotes the following character. |
Characters | |
x | The character x |
\\ | The backslash character |
\t | The tab character ('\u0009') |
\n | The newline (line feed) character ('\u000A') |
\r | The carriage-return character ('\u000D') |
\f | The form-feed character ('\u000C') |
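
Quoting and the escaped characters above can be sketched as follows, assuming Python's `re` module. Raw strings (`r"..."`) are used so that each backslash reaches the regex engine unchanged. `re.escape` is a Python helper, not part of the table, and the sample strings are made up.

```python
import re

# An unquoted "." matches any character; "\." matches only a literal dot.
unquoted = re.findall(r"3.14", "3.14 3x14")
quoted = re.findall(r"3\.14", "3.14 3x14")

# "\\" in the pattern matches one literal backslash in the input.
backslash_count = len(re.findall(r"\\", r"C:\temp\new"))

# "\t" in the pattern matches a tab character.
fields = re.split(r"\t", "a\tb\tc")

# re.escape quotes every metacharacter so a string can be matched literally.
literal = re.fullmatch(re.escape("a+b"), "a+b") is not None

print(unquoted, quoted, backslash_count, fields, literal)
```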