Regular Expression Syntax

Syntax of the builtin regular expression library

These regular expressions are implemented using the package written by Henry Spencer, based on the POSIX 1003.2 specification and some (not quite all) of the Perl5 extensions (thanks, Henry!). Much of the description of regular expressions below is copied verbatim from his manual entry.

An ARE (advanced regular expression) is one or more branches, separated by `|', matching anything that matches any of the branches.

A branch is zero or more constraints or quantified atoms, concatenated. It matches a match for the first, followed by a match for the second, and so on; an empty branch matches the empty string.
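As a rough illustration, the sketch below uses Python's standard `re` module as a stand-in. This is an assumption: `re` is a different engine from Henry Spencer's package, but it handles `|`-separated branches, concatenation, and empty branches the same way.

```python
import re

# Two branches: the RE matches anything that either branch matches.
pet = re.compile(r"cat|dog")
print(bool(pet.fullmatch("dog")))  # True
print(bool(pet.fullmatch("cow")))  # False

# Each branch is a concatenation: "ab" means "a" followed by "b".
print(bool(re.fullmatch(r"ab|cd", "cd")))  # True

# An empty branch matches the empty string, so "a|" also matches "".
print(bool(re.fullmatch(r"a|", "")))  # True
```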

A quantified atom is an atom possibly followed by a single quantifier. Without a quantifier, it matches a match for the atom. The quantifiers, and what a so-quantified atom matches, are:

* a sequence of 0 or more matches of the atom
+ a sequence of 1 or more matches of the atom
? a sequence of 0 or 1 matches of the atom
{m} a sequence of exactly m matches of the atom
{m,} a sequence of m or more matches of the atom
{m,n} a sequence of m through n (inclusive) matches of the atom; m may not exceed n
*? +? ?? {m}? {m,}? {m,n}? non-greedy quantifiers, which match the same possibilities, but prefer the smallest number rather than the largest number of matches (see Matching)
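A minimal sketch of the quantifiers, again assuming Python's `re` module as a stand-in for the shared syntax. Two differences are worth noting: Python does not enforce the 0 to 255 limit on bounds, and the package's own preference rules (see Matching) can give different results from Python's when greedy and non-greedy quantifiers are mixed in one RE.

```python
import re

s = "aaaa"
# Greedy quantifiers take as many repetitions as they can.
print(re.match(r"a*", s).group())       # aaaa
print(re.match(r"a+", s).group())       # aaaa
print(re.match(r"a?", s).group())       # a
# Bounds: exactly m, m or more, m through n.
print(re.match(r"a{2}", s).group())     # aa
print(re.match(r"a{2,}", s).group())    # aaaa
print(re.match(r"a{1,3}", s).group())   # aaa
# Non-greedy forms prefer the smallest number of repetitions.
print(repr(re.match(r"a*?", s).group()))  # ''
print(re.match(r"a+?", s).group())      # a
print(re.match(r"a{1,3}?", s).group())  # a

# The difference is most visible when something must follow the atom.
html = "<b>bold</b>"
print(re.match(r"<.+>", html).group())   # <b>bold</b>
print(re.match(r"<.+?>", html).group())  # <b>
```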

The forms using { and } are known as bounds. The numbers m and n are unsigned decimal integers with permissible values from 0 to 255 inclusive. An atom is one of:

(re) (where re is any regular expression) matches a match for re, with the match noted for possible reporting
(?:re) as previous, but does no reporting (a "non-capturing" set of parentheses)
() matches an empty string, noted for possible reporting
(?:) matches an empty string, without reporting
[chars] a bracket expression, matching any one of the chars (see Bracket Expressions for more detail)
. matches any single character
\k (where k is a non-alphanumeric character) matches that character taken as an ordinary character, e.g. \\ matches a backslash character
\c where c is alphanumeric (possibly followed by other characters), an escape (AREs only); see Escapes below
{ when followed by a character other than a digit, matches the left-brace character `{'; when followed by a digit, it is the beginning of a bound (see above)
x where x is a single character with no other significance, matches that character.
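The atoms above can be tried out with Python's `re` module, used here only as an assumed stand-in for the syntax the two engines share. Escapes of the form \c are left out on purpose, because several ARE escapes have no Python equivalent.

```python
import re

# (re) captures; (?:re) groups without capturing.
m = re.match(r"(ab)(?:cd)(ef)", "abcdef")
print(m.groups())  # ('ab', 'ef')

# () matches the empty string and is still reported as a group.
print(re.match(r"()x", "x").group(1) == "")  # True

# A bracket expression matches any one of the listed characters.
print(re.findall(r"[aeiou]", "regex"))  # ['e', 'e']

# "." matches any single character; "\." matches only a literal dot.
print(bool(re.fullmatch(r"a.c", "abc")))   # True
print(bool(re.fullmatch(r"a\.c", "abc")))  # False
print(bool(re.fullmatch(r"a\.c", "a.c")))  # True

# "\\" in the pattern matches a single backslash character.
print(bool(re.fullmatch(r"\\", "\\")))  # True

# "{" followed by a non-digit is an ordinary character.
print(bool(re.fullmatch(r"a{b", "a{b")))  # True
```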

A constraint matches an empty string when specific conditions are met. A constraint may not be followed by a quantifier. The simple constraints are as follows; some more constraints are described later, under Escapes.

^ matches at the beginning of a line
$ matches at the end of a line
(?=re) positive lookahead (AREs only), matches at any point where a substring matching re begins
(?!re) negative lookahead (AREs only), matches at any point where no substring matching re begins

The lookahead constraints may not contain back references (see later), and all parentheses within them are considered non-capturing.
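A short sketch of the constraints, assuming Python's `re` module as a stand-in. Note that in Python `^` and `$` match only at the start and end of the whole string unless the `re.MULTILINE` flag is given, and Python relaxes both restrictions in the paragraph above: its lookaheads may capture and may contain back references.

```python
import re

text = "alpha\nbeta"
# With re.MULTILINE, ^ and $ match at every line boundary.
print(re.findall(r"^\w+", text, re.MULTILINE))  # ['alpha', 'beta']
print(re.findall(r"\w+$", text, re.MULTILINE))  # ['alpha', 'beta']
# Without the flag, ^ matches only at the start of the string.
print(re.findall(r"^\w+", text))                # ['alpha']

# Lookahead constraints match an empty string, so the text they
# test is not part of the reported match.
print(re.findall(r"\w+(?=!)", "hi! there yo!"))  # ['hi', 'yo']
print(bool(re.search(r"foo(?!bar)", "foobar")))  # False
print(bool(re.search(r"foo(?!bar)", "foobaz")))  # True
```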

An RE may not end with `\'.
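Python's `re` module, assumed here as a stand-in, enforces the same rule and rejects such a pattern when compiling it. The helper `compiles` below is hypothetical and exists only for this illustration. Ordinary (non-raw) string literals are used because a Python raw string cannot end with a single backslash.

```python
import re

def compiles(pattern: str) -> bool:
    """Hypothetical helper: True if Python's re accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True

print(compiles("abc\\\\"))  # True: the pattern ends with an escaped backslash
print(compiles("abc\\"))    # False: the pattern ends with a lone backslash
```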

ymasuda, November 19, 2005 (Heisei 17)