Syntax of the builtin regular expression library

In addition to the main syntax described above, there are some special forms and miscellaneous syntactic facilities available.

Normally the flavor of RE being used is specified by application-dependent means. However, this can be overridden by a director. If an RE of any flavor begins with `***:', the rest of the RE is an ARE. If an RE of any flavor begins with `***=', the rest of the RE is taken to be a literal string, with all characters considered ordinary characters.
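As a rough illustration, the director check amounts to testing for two fixed four-character prefixes before anything else is parsed. The Python sketch below is hypothetical and not the library's own code: `split_director` and its `default_flavor` parameter (standing in for the "application-dependent means") are names invented here, and Python's `re.escape` is used only to model "all characters considered ordinary".

```python
import re

def split_director(pattern: str, default_flavor: str = "ERE"):
    """Return (flavor, body) after honouring a leading ***: or ***= director.

    default_flavor stands in for whatever the application would have chosen.
    """
    if pattern.startswith("***:"):
        return "ARE", pattern[4:]      # rest of the RE is an ARE
    if pattern.startswith("***="):
        return "literal", pattern[4:]  # rest of the RE is a literal string
    return default_flavor, pattern     # no director: application decides

flavor, body = split_director("***=a.b*")
# In the literal flavor every character is ordinary, so in Python terms the
# body behaves like re.escape(body): "." and "*" match only themselves.
literal = re.compile(re.escape(body))
```

With this model, `literal.fullmatch("a.b*")` succeeds while `literal.fullmatch("aXb")` does not, because `.` is no longer a wildcard.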

An ARE may begin with embedded options: a sequence (?xyz) (where xyz is one or more alphabetic characters) specifies options affecting the rest of the RE. These supplement, and can override, any options specified by the application. The available option letters are:

b rest of RE is a BRE
c case-sensitive matching (usual default)
e rest of RE is an ERE
i case-insensitive matching (see Matching, below)
m historical synonym for n
n newline-sensitive matching (see Matching, below)
p partial newline-sensitive matching (see Matching, below)
q rest of RE is a literal (``quoted'') string, all ordinary characters
s non-newline-sensitive matching (usual default)
t tight syntax (usual default; see below)
w inverse partial newline-sensitive (``weird'') matching (see Matching, below)
x expanded syntax (see below)

Embedded options take effect at the ) terminating the sequence. They are available only at the start of an ARE, and may not be used later within it.
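A minimal sketch of how a leading `(?xyz)` sequence could be split off and folded into a set of settings, assuming the option letters are applied left to right so that a later letter overrides an earlier one on the same setting (for example, `(?ic)` ends up case-sensitive). The names `Options`, `split_embedded_options` and `apply_options` are hypothetical; the four newline modes are modelled as a single value because `n`, `p`, `s` and `w` each select one mode.

```python
from dataclasses import dataclass, replace

OPTION_LETTERS = set("bceimnpqstwx")

@dataclass(frozen=True)
class Options:
    flavor: str = "ARE"          # "ARE", "BRE", "ERE" or "literal"
    case_sensitive: bool = True  # 'c' (default) vs 'i'
    newline_mode: str = "s"      # one of "s", "n", "p", "w"
    expanded: bool = False       # 't' (default) vs 'x'

def split_embedded_options(are: str):
    """Return (letters, rest) for a leading (?xyz); ("", are) if there is none."""
    if not are.startswith("(?"):
        return "", are
    end = are.find(")", 2)
    if end == -1:
        raise ValueError("unterminated embedded-option sequence")
    letters = are[2:end]
    if not letters.isalpha():
        # e.g. "(?:" or "(?#": other ARE constructs, not embedded options
        return "", are
    unknown = set(letters) - OPTION_LETTERS
    if unknown:
        raise ValueError("unknown option letter(s): " + "".join(sorted(unknown)))
    return letters, are[end + 1:]

def apply_options(letters: str, base: Options = Options()) -> Options:
    """Fold option letters left to right on top of the application's options."""
    opts = base
    for ch in letters:
        if ch == "b":
            opts = replace(opts, flavor="BRE")
        elif ch == "e":
            opts = replace(opts, flavor="ERE")
        elif ch == "q":
            opts = replace(opts, flavor="literal")
        elif ch == "c":
            opts = replace(opts, case_sensitive=True)
        elif ch == "i":
            opts = replace(opts, case_sensitive=False)
        elif ch in "nmpsw":
            # 'm' is a historical synonym for 'n'
            opts = replace(opts, newline_mode="n" if ch == "m" else ch)
        elif ch == "t":
            opts = replace(opts, expanded=False)
        elif ch == "x":
            opts = replace(opts, expanded=True)
        else:
            raise ValueError("unknown option letter: " + ch)
    return opts
```

For instance, `split_embedded_options("(?ix)a b")` yields `("ix", "a b")`, and `apply_options("ix")` gives case-insensitive, expanded-syntax settings. The sketch only looks at the very start of the string, mirroring the rule that embedded options may not appear later in the ARE.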

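In addition to the usual (tight) RE syntax, in which all characters are significant, there is an expanded syntax, available in AREs with the embedded x option. In the expanded syntax, white-space characters are ignored and all characters between a # and the following newline (or the end of the RE) are ignored, so that a complex RE can be laid out in paragraphs and commented. There are three exceptions to that basic rule:

- a white-space character or `#' preceded by `\' is retained
- white space or `#' within a bracket expression is retained
- white space and comments are illegal within multi-character symbols like the ARE `(?:' or the BRE `\('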

Expanded-syntax white-space characters are blank, tab, newline, and any character that belongs to the space character class.
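The expanded-syntax rule can be sketched as a preprocessing pass that drops white space and #-comments while copying escaped characters and whole bracket expressions through unchanged. This is a hypothetical helper (`strip_expanded`), not the library's implementation. It assumes its input is an ARE body with the `(?x)` sequence already removed, approximates the space class with Python's `str.isspace()`, treats `\` inside brackets as an ARE escape, and does not reject white space inside multi-character symbols such as `(?:`.

```python
def strip_expanded(pattern: str) -> str:
    """Drop expanded-syntax white space and #-comments from an ARE body."""
    out = []
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\" and i + 1 < n:
            # An escaped character, including an escaped blank or '#', is kept.
            out.append(pattern[i:i + 2])
            i += 2
        elif ch == "[":
            # Copy the whole bracket expression unchanged.
            j = i + 1
            if j < n and pattern[j] == "^":
                j += 1
            if j < n and pattern[j] == "]":
                j += 1  # a leading ']' is an ordinary member
            while j < n and pattern[j] != "]":
                if pattern[j] == "\\":
                    j += 2  # ARE escape inside brackets, e.g. \]
                elif pattern[j] == "[" and j + 1 < n and pattern[j + 1] in ":.=":
                    # [:class:], [.coll.] or [=equiv=] contains its own ']'
                    close = pattern.find(pattern[j + 1] + "]", j + 2)
                    j = n if close == -1 else close + 2
                else:
                    j += 1
            out.append(pattern[i:j + 1])
            i = j + 1
        elif ch == "#":
            # Comment: skip to the next newline or the end of the RE.
            newline = pattern.find("\n", i)
            i = n if newline == -1 else newline + 1
        elif ch.isspace():
            i += 1  # ignorable expanded-syntax white space
        else:
            out.append(ch)
            i += 1
    return "".join(out)

pattern = r"""
    \d{3}      # area code
    [ -]       # separator: blank or hyphen, kept inside brackets
    \d{4}
"""
print(strip_expanded(pattern))  # \d{3}[ -]\d{4}
```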

Finally, in an ARE, outside bracket expressions, the sequence `(?#ttt)' (where ttt is any text not containing a `)') is a comment and is completely ignored. Such a comment is not allowed between the characters of multi-character symbols like `(?:'. These comments are more a historical artifact than a useful facility, and their use is deprecated; use the expanded syntax instead.
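For comparison only: Python's `re` module is a different engine, but it happens to accept the same `(?#...)` comment form and a similar `(?x)` verbose mode, so it can show the deprecated style next to the recommended one. The details of what each engine ignores are not identical; this is an illustration, not a description of the library above.

```python
import re

# Deprecated style: each (?#...) group is simply ignored.
inline = re.compile(r"\d+(?# digits)-(?# separator)\d+")

# Preferred style: expanded syntax, switched on by an embedded (?x)
# at the very start of the pattern.
expanded = re.compile(r"""(?x)
    \d+    # digits
    -      # separator
    \d+    # more digits
""")

for rx in (inline, expanded):
    print(bool(rx.fullmatch("12-345")))  # True for both
```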

None of these metasyntax extensions is available if the application (or an initial ***= director) has specified that the user's input be treated as a literal string rather than as an RE.

ymasuda, 19 November 2005 (Heisei 17)