Regular Expressions

Regular expressions are patterns built from ordinary characters and meta-characters that identify parts of text. With meta-characters, a pattern can be very specific or very general. Regular expressions are used to find text and take actions on it, such as extracting or replacing it. The following is not a comprehensive list, but it does cover the majority of operations used when parsing logs, text, and files.
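As a minimal sketch of "find and take actions", assuming Python's standard `re` module and a made-up log line, the snippet below finds every IPv4-shaped address and then masks them. The pattern only checks the shape of an address, not whether each number is in range.

```python
import re

# A made-up log line, used only for illustration.
line = "Failed login from 10.0.0.5, retried from 192.168.1.20"

# Four runs of 1-3 digits separated by literal dots. This only checks the
# shape of an IPv4 address; it would also accept 999.999.999.999.
pattern = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"

# Find: collect every match.
addresses = re.findall(pattern, line)
print(addresses)  # ['10.0.0.5', '192.168.1.20']

# Take action: replace every match with a placeholder.
masked = re.sub(pattern, "x.x.x.x", line)
print(masked)  # Failed login from x.x.x.x, retried from x.x.x.x
```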

Note: For quickly building and testing regular expressions, sites such as RegExr.com can really help.

Specifying Position

These are used to specify a position within a string or line.

^

Start of a string (or of each line in multiline mode)

$

End of a string (or of each line in multiline mode)

\A

Start of a string

\Z

End of a string
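A short sketch of the position anchors ^, $, \A and \Z, assuming Python's `re` module (other engines differ in details, for example how \Z treats a trailing newline). It shows how `re.MULTILINE` changes ^ and $ but not \A and \Z.

```python
import re

text = "first line\nsecond line\n"

# By default ^ matches only at the start of the whole string.
starts = re.findall(r"^\w+", text)                       # ['first']
# With re.MULTILINE, ^ also matches right after each newline.
line_starts = re.findall(r"^\w+", text, re.MULTILINE)    # ['first', 'second']
# With re.MULTILINE, $ matches just before each newline.
line_ends = re.findall(r"\w+$", text, re.MULTILINE)      # ['line', 'line']

# \A ignores re.MULTILINE and only matches at the very start.
string_start = re.findall(r"\A\w+", text, re.MULTILINE)  # ['first']
# In Python, $ (without MULTILINE) also matches before a trailing newline...
dollar_hit = re.search(r"line$", text) is not None       # True
# ...but \Z does not, because the string really ends with "\n".
z_hit = re.search(r"line\Z", text) is not None           # False

print(starts, line_starts, line_ends, string_start, dollar_hit, z_hit)
```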

Specifying Characters

These are used to specify a particular type of character.

.

Any character except a newline

\cX

Control character Ctrl-X (e.g. \cI is a tab)

\d

Digit character (0-9)

\D

Non-digit character

\n

New line character

[0-7]

Octal digit

\r

Carriage return character

\s

Whitespace character (tab/space/etc)

\S

Non-whitespace character

\t

Tab character

\w

Word character (letter, digit, or underscore)

\W

Non-word character

[0-9A-Fa-f]

Hexadecimal digit
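The character types can be tried out with a sketch like the one below, assuming Python's `re` module and a made-up key=value record. Python's `re` does not support \cX, so that escape is left out here.

```python
import re

# A made-up key=value record with a tab separator and a CRLF line ending.
record = "user=alice\tid=42\r\n"

digits = re.findall(r"\d", record)        # ['4', '2']
words = re.findall(r"\w+", record)        # ['user', 'alice', 'id', '42']
spaces = re.findall(r"\s", record)        # ['\t', '\r', '\n']
non_words = re.findall(r"\W", record)     # ['=', '\t', '=', '\r', '\n']
# Strip the \r\n line ending, then split on the tab character.
fields = re.split(r"\t", record.rstrip("\r\n"))  # ['user=alice', 'id=42']

# . matches any character except a newline.
dot_hits = re.findall(r"a.e", "ace\na\ne")  # ['ace']

# [0-7] and [0-9A-Fa-f] used as octal and hexadecimal digit classes.
is_octal = re.fullmatch(r"[0-7]+", "0755") is not None        # True
is_hex = re.fullmatch(r"[0-9A-Fa-f]+", "1A2b3C") is not None  # True
not_hex = re.fullmatch(r"[0-9A-Fa-f]+", "1G") is not None     # False

print(digits, words, spaces, non_words, fields, dot_hits)
```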

Specifying POSIX Character Classes

These are alternative names for character types defined by the POSIX standard. They are only valid inside a bracket expression, so a digit is written [[:digit:]] rather than [:digit:].

[:upper:]

Uppercase characters [A-Z]

[:lower:]

Lowercase characters [a-z]

[:digit:]

Any digit character [0-9]

[:space:]

Any space character (space/tab/etc)

[:alpha:]

Any uppercase or lowercase alphabetical character [A-Za-z]

[:alnum:]

Any uppercase, lowercase, or digit character [A-Za-z0-9]

[:punct:]

Any punctuation character

[:xdigit:]

Any hexadecimal digit

[:cntrl:]

Any control character
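Tools such as grep understand POSIX class names, but Python's `re` module does not. The sketch below uses a hypothetical helper, `posix_to_re`, and a hypothetical `POSIX_ASCII` table to rewrite `[:name:]` into plain ASCII ranges so the same patterns can be tried in Python. It assumes the C/ASCII locale; real POSIX classes can match more characters in other locales.

```python
import re
import string

# Hypothetical table: ASCII (C locale) equivalents of the POSIX classes,
# written as the inside of a Python character class.
POSIX_ASCII = {
    "upper": "A-Z",
    "lower": "a-z",
    "digit": "0-9",
    "space": r" \t\n\r\f\v",
    "alpha": "A-Za-z",
    "alnum": "A-Za-z0-9",
    "punct": re.escape(string.punctuation),
    "xdigit": "0-9A-Fa-f",
    "cntrl": r"\x00-\x1f\x7f",
}


def posix_to_re(pattern: str) -> str:
    """Hypothetical helper: replace each [:name:] with its ASCII range.

    Raises KeyError for class names missing from POSIX_ASCII.
    """
    return re.sub(r"\[:([a-z]+):\]", lambda m: POSIX_ASCII[m.group(1)], pattern)


digits = posix_to_re("[[:digit:]]+")                                   # '[0-9]+'
shouting = re.findall(posix_to_re("[[:upper:]]+"), "ERROR at LINE 7")  # ['ERROR', 'LINE']
marks = re.findall(posix_to_re("[[:punct:]]"), "a,b;c!")               # [',', ';', '!']
print(digits, shouting, marks)
```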

Specifying Quantity

These are used to specify how many times the preceding pattern has to match. For example:

\d{3}-?\d{2}-?\d{4}

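Matches a Social Security Number (SSN) format either with or without dashes. Because each dash is optional on its own, it also accepts a number with only one of the two dashes.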
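A quick way to check the SSN pattern, assuming Python's `re` module and made-up sample values (not real SSNs). `fullmatch()` requires the whole string to fit; `search()` is shown as a caution, since without anchors the pattern also finds a match inside a longer run of digits.

```python
import re

# The SSN-shaped pattern from above, compiled once.
SSN = re.compile(r"\d{3}-?\d{2}-?\d{4}")

# Made-up sample values (not real SSNs).
samples = ["123-45-6789", "123456789", "12345-6789", "12-345-6789"]

# fullmatch() requires the whole string to fit the pattern.
results = {s: SSN.fullmatch(s) is not None for s in samples}
print(results)
# {'123-45-6789': True, '123456789': True, '12345-6789': True, '12-345-6789': False}

# search() with no anchors finds "123456789" inside a 10-digit number.
inside_longer = SSN.search("acct 1234567890") is not None  # True
```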

*

Zero or more instances of the previous pattern (or single character)

+

One or more instances of the previous pattern (or single character)

?

Zero or one instance (never more) of the previous pattern (or single character)

{NUMBER}

Exactly NUMBER instances

{NUMBER,}

NUMBER or more instances

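{NUMBER_A,NUMBER_B}

NUMBER_A to NUMBER_B instances, inclusive (do not put a space after the comma; Python's re, for example, treats {2, 4} as literal text)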
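Each quantifier can be checked against the whole of a string with a small sketch like this one, assuming Python's `re` module; `matches` is a hypothetical helper name used only here.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None


cases = [
    ("ab*c", "ac"),          # * allows zero b's
    ("ab+c", "ac"),          # + needs at least one b
    ("colou?r", "color"),    # ? makes the u optional
    ("colou?r", "colouur"),  # ...but allows at most one u
    (r"\d{4}", "2024"),      # {4} means exactly four digits
    (r"\d{2,}", "7"),        # {2,} means two or more digits
    (r"\d{2,4}", "12345"),   # {2,4} means two to four digits
]
results = [matches(p, t) for p, t in cases]
print(results)  # [True, False, True, False, True, False, False]
```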

Specifying Logic

These are used to control how matching occurs and to build more complex patterns.

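[ ]

Specify a set of characters, any one of which can match; a hyphen between two characters inside the brackets specifies a range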

[A-M]

Single character in the inclusive range "A" to "M" (e.g. "A", "B", "C", "D", ... "K", "L", "M")

[1-4]

Single digit in the inclusive range 1 to 4 (i.e. 1, 2, 3, 4)

(A|B)

Either "A" or "B" (alternation). Each side can be a longer pattern, such as (cat|dog), and the parentheses also group the alternatives.

[ABC]

Single character that is either "A" or "B" or "C"

[^ABC]

Single character that is not "A" and not "B" and not "C"
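The ranges, sets, and alternation can be compared side by side with a sketch like this, assuming Python's `re` module and made-up input strings. Note that when the pattern contains a capturing group such as (WARN|ERROR), `re.findall` returns the group's text.

```python
import re

letters = re.findall(r"[A-M]", "AMNZ")          # ['A', 'M']: N and Z are past M
digits = re.findall(r"[1-4]", "0 1 5 3")        # ['1', '3']
# Alternation: either whole word, not a single character.
levels = re.findall(r"(WARN|ERROR)", "INFO ok; WARN disk; ERROR fan")  # ['WARN', 'ERROR']
in_set = re.findall(r"[ABC]", "ABCD")           # ['A', 'B', 'C']
not_in_set = re.findall(r"[^ABC]", "ABCD")      # ['D']

print(letters, digits, levels, in_set, not_in_set)
```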
