Syntax of the builtin regular expression library
A bracket expression is a list of characters enclosed in `[]'. It normally matches any single character from the list (but see below). If the list begins with `^', it matches any single character (but see below) not from the rest of the list.
If two characters in the list are separated by `-', this is shorthand for the full range of characters between those two (inclusive) in the collating sequence, e.g. [0-9] in ASCII matches any decimal digit. Two ranges may not share an endpoint, so e.g. a-c-e is illegal. Ranges depend heavily on the collating sequence, and portable programs should avoid relying on them.
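
The list, negation and range forms described above can be tried out with a short sketch. It uses Python's `re` module, which is an assumption made for illustration and is not the engine documented here; it accepts the same basic bracket syntax, but orders ranges by code point rather than by a locale's collating sequence.

```python
import re

# Plain list: matches any single character from the list.
vowels = re.compile(r"[aeiou]")
# Leading '^': matches any single character NOT in the rest of the list.
non_vowels = re.compile(r"[^aeiou]")
# Range: shorthand for every character between the two endpoints (inclusive).
digits = re.compile(r"[0-9]")

print(vowels.findall("bracket"))    # ['a', 'e']
print(non_vowels.findall("idea"))   # ['d']
print(digits.findall("room 42b"))   # ['4', '2']
```

Because range contents depend on the ordering in use, listing the characters explicitly is the safer choice when portability matters.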
To include a literal ] or - in the list, the simplest method is to enclose it in [. and .] to make it a collating element (see below). Alternatively, make it the first character (following a possible `^'), or (AREs only) precede it with `\'. Alternatively, for `-', make it the last character, or the second endpoint of a range. To use a literal - as the first endpoint of a range, make it a collating element or (AREs only) precede it with `\'. With the exception of these, some combinations using [ (see next paragraphs), and escapes, all other special characters lose their special significance within a bracket expression.
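
As a rough illustration of the placement rules above, the following sketch again uses Python's `re` module (an assumption; it is a different engine, but it happens to follow the same conventions for a leading `]`, a trailing `-`, and backslash escapes inside brackets).

```python
import re

# ']' as the first character of the list is taken literally.
close_first = re.compile(r"[]a]")
# The same holds when it follows a leading '^'.
negated = re.compile(r"[^]a]")
# '-' as the last character of the list is taken literally.
hyphen_last = re.compile(r"[a-]")
# Backslash escapes (the ARE-style alternative) work in any position.
escaped = re.compile(r"[\]\-]")

print(close_first.findall("a]b"))   # ['a', ']']
print(negated.findall("a]b"))       # ['b']
print(hyphen_last.findall("a-b"))   # ['a', '-']
print(escaped.findall("x]-y"))      # [']', '-']
```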
Within a bracket expression, a collating element (a character, a multi-character sequence that collates as if it were a single character, or a collating-sequence name for either) enclosed in [. and .] stands for the sequence of characters of that collating element.
wxWidgets: Currently no multi-character collating elements are defined.
So in [.X.], X can either be a single character literal or the name of a character. For example, [[.0.]-[.9.]] and [[.zero.]-[.nine.]] are identical, and both mean the same as [0-9].
See Character Names.
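
Since only single-character collating elements are defined, a collating element is in effect another way to spell one literal character. The sketch below shows that reading in Python. The helper `expand_collating` and the table `CHAR_NAMES` are hypothetical names introduced for illustration, the character names `zero` and `nine` are assumed POSIX-style names, and Python's `re` module does not itself understand the `[.X.]` notation.

```python
import re

# Hypothetical, deliberately tiny name table (assumed POSIX-style names).
CHAR_NAMES = {"zero": "0", "nine": "9"}

def expand_collating(pattern):
    """Rewrite each [.X.] as an escaped literal character."""
    def repl(match):
        x = match.group(1)
        ch = x if len(x) == 1 else CHAR_NAMES[x]
        return re.escape(ch)
    return re.sub(r"\[\.(.+?)\.\]", repl, pattern)

print(expand_collating("[[.0.]-[.9.]]"))        # [0-9]
print(expand_collating("[[.zero.]-[.nine.]]"))  # [0-9]
print(expand_collating("[[.].]a]"))             # [\]a]
```

The last call shows why a collating element is a convenient way to place a literal `]` anywhere in the list.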
Within a bracket expression, a collating element enclosed in [= and =] is an equivalence class, standing for the sequences of characters of all collating elements equivalent to that one, including itself. An equivalence class may not be an endpoint of a range.
wxWidgets: Currently no equivalence classes are defined, so [=X=] stands for just the single character X. X can either be a single character literal or the name of a character, see Character Names.
Within a bracket expression, the name of a character class enclosed in [: and :] stands for the list of all characters (not all collating elements!) belonging to that class. Standard character classes are:
alpha | A letter. |
upper | An upper-case letter. |
lower | A lower-case letter. |
digit | A decimal digit. |
xdigit | A hexadecimal digit. |
alnum | An alphanumeric (letter or digit). |
print | An alphanumeric (same as alnum). |
blank | A space or tab character. |
space | A character producing white space in displayed text. |
punct | A punctuation character. |
graph | A character with a visible representation. |
cntrl | A control character. |
A character class may not be used as an endpoint of a range.
wxWidgets: In a non-Unicode build, these character classifications depend on the current locale, and correspond to the values returned by the ANSI C 'is' functions: isalpha, isupper, etc. In Unicode mode they are based on Unicode classifications, and are not affected by the current locale.
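
To make the class table more concrete, here is a sketch of what most of the classes contain under plain ASCII (roughly the "C" locale of the ANSI C 'is' functions). It is written in Python, whose `re` module has no `[:name:]` syntax, so the table `POSIX_ASCII` and the helper `class_to_set` are hypothetical names that spell each class out as an explicit list. `print` and `cntrl` are left out for brevity, and other locales or Unicode mode will classify more characters.

```python
import re
import string

# ASCII-only contents of some standard classes (assumed "C" locale).
POSIX_ASCII = {
    "alpha": string.ascii_letters,
    "upper": string.ascii_uppercase,
    "lower": string.ascii_lowercase,
    "digit": string.digits,
    "xdigit": string.hexdigits,
    "alnum": string.ascii_letters + string.digits,
    "blank": " \t",
    "space": string.whitespace,
    "punct": string.punctuation,
    "graph": string.ascii_letters + string.digits + string.punctuation,
}

def class_to_set(name):
    """Return a bracket expression listing every character of the class."""
    return "[" + "".join(re.escape(c) for c in POSIX_ASCII[name]) + "]"

print(re.findall(class_to_set("blank"), "a b\tc\n"))               # [' ', '\t']
print(bool(re.fullmatch(class_to_set("xdigit") + "+", "DEADbeef42")))  # True
print(bool(re.search(class_to_set("punct"), "abc123")))            # False
```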
There are two special cases of bracket expressions: the bracket expressions [[:<:]] and [[:>:]] are constraints, matching empty strings at the beginning and end of a word respectively. A word is defined as a sequence of word characters that is neither preceded nor followed by word characters. A word character is an alnum character or an underscore (_). These special bracket expressions are deprecated; users of AREs should use constraint escapes instead (see Escapes below).
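
The word definition above can be illustrated with lookaround assertions. The sketch uses Python's `re` module, which is an assumption made for illustration: it does not accept the two special bracket expressions, so the patterns `BEGIN_WORD` and `END_WORD` are hypothetical equivalents built from the stated definition of a word character.

```python
import re

# With re.ASCII, \w is a letter, a digit or an underscore, which matches
# the word-character definition given in the text.
BEGIN_WORD = re.compile(r"(?<!\w)(?=\w)", re.ASCII)  # empty match at word start
END_WORD = re.compile(r"(?<=\w)(?!\w)", re.ASCII)    # empty match at word end

text = "foo bar_1"
starts = [m.start() for m in BEGIN_WORD.finditer(text)]
ends = [m.start() for m in END_WORD.finditer(text)]

print(starts)  # [0, 4]
print(ends)    # [3, 9]
```

Both patterns match empty strings only, so they constrain where a match may occur without consuming any characters.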
ymasuda, 19 November 2005 (Heisei 17)