* Back-reference Operator:: \digit
* Anchoring Operators:: ^ $
Repetition Operators
* Match-zero-or-more Operator:: *
* Match-one-or-more Operator:: +
* Character Class Operators:: [:class:]
* Range Operator:: start-end
Anchoring Operators
* Match-beginning-of-line Operator:: ^
* Match-end-of-line Operator:: $
* Match-word-constituent Operator:: \w
* Match-non-word-constituent Operator:: \W
Buffer Operators
* Match-beginning-of-buffer Operator:: \`
* Match-end-of-buffer Operator:: \'
@itemize @bullet
@item
see if a string matches a specified pattern as a whole, and
@item
search within a string for a substring matching a specified pattern.
number of times.
The Regex library consists of two source files: @file{regex.h} and
@file{regex.c}.
@pindex regex.h
@pindex regex.c
Regex provides three groups of functions with which you can operate on
@node Syntax Bits, Predefined Syntaxes, , Regular Expression Syntax
@section Syntax Bits
@cindex syntax bits
of bits; we refer to these bits as @dfn{syntax bits}. In most cases,
they affect which characters represent which operators. We describe the
meanings of the operators to which we refer in @ref{Common Operators},
@ref{GNU Operators}, and @ref{GNU Emacs Operators}.
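As a minimal sketch of that effect, the following C fragment compiles the same pattern under two predefined syntaxes (@pxref{Predefined Syntaxes}) and reports how much of a string it matches. It assumes the @sc{gnu} C library's @file{regex.h}, which declares the @sc{gnu} interface only when @code{_GNU_SOURCE} is defined; @code{match_length} is a hypothetical helper written for this illustration.

```c
#define _GNU_SOURCE  /* glibc declares the GNU regex interface only with this */
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: compile PATTERN using syntax bits SYNTAX, then
   report how many characters of TEXT it matches starting at index 0.
   Returns -1 if it does not match there and -2 on any error. */
int match_length(reg_syntax_t syntax, const char *pattern, const char *text)
{
    struct re_pattern_buffer buf;
    memset(&buf, 0, sizeof buf);   /* no buffer, fastmap, or translate table */

    re_syntax_options = syntax;    /* re_compile_pattern reads this variable */
    if (re_compile_pattern(pattern, strlen(pattern), &buf) != NULL)
        return -2;                 /* a non-NULL result is an error message */

    int len = re_match(&buf, text, (int) strlen(text), 0, NULL);
    regfree(&buf);
    return len;
}
```

Under @code{RE_SYNTAX_POSIX_EXTENDED}, @samp{a+} matches all of @samp{aaa}. @code{RE_SYNTAX_POSIX_BASIC} sets @code{RE_BK_PLUS_QM}, so there a bare @samp{+} is an ordinary character and you write @samp{a\+} to get the match-one-or-more operator.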
For reference, here is the complete list of syntax bits, in alphabetical
order:
@node Predefined Syntaxes, Collating Elements vs. Characters, Syntax Bits, Regular Expression Syntax
@section Predefined Syntaxes
If you're programming with Regex, you can set a pattern buffer's
(@pxref{GNU Pattern Buffers}, and @ref{POSIX Pattern Buffers})
(@pxref{Syntax Bits}) or else to the configurations defined by Regex.
These configurations define the syntaxes used by certain
programs---@sc{gnu} Emacs,
@cindex Emacs
@sc{posix} Awk,
@cindex POSIX Awk
traditional Awk,
@cindex Awk
Grep,
@cindex Grep
@end example
@node Collating Elements vs. Characters, The Backslash Character, Predefined Syntaxes, Regular Expression Syntax
@section Collating Elements vs.@: Characters
@sc{posix} generalizes the notion of a character to that of a
collating element. It defines a @dfn{collating element} to be ``a
represents the open-group operator. Which one does depends on the
setting of a syntax bit, in this case @code{RE_NO_BK_PARENS}. Why is
this so? Historical reasons dictate some of the varying
representations, while @sc{posix} dictates others.
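A minimal sketch of the @code{RE_NO_BK_PARENS} case: the hypothetical helper @code{count_groups} below compiles a pattern and returns the pattern buffer's @code{re_nsub} field, the number of groups the compiler recognized. It assumes the @sc{gnu} C library's @file{regex.h}, with @code{_GNU_SOURCE} defined so that the @sc{gnu} interface is declared.

```c
#define _GNU_SOURCE  /* needed for the GNU regex interface in glibc */
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: compile PATTERN using syntax bits SYNTAX and
   return the number of groups found (the re_nsub field), or -1 if
   the pattern does not compile. */
long count_groups(reg_syntax_t syntax, const char *pattern)
{
    struct re_pattern_buffer buf;
    memset(&buf, 0, sizeof buf);

    re_syntax_options = syntax;
    if (re_compile_pattern(pattern, strlen(pattern), &buf) != NULL)
        return -1;

    long groups = (long) buf.re_nsub;
    regfree(&buf);
    return groups;
}
```

@code{RE_SYNTAX_POSIX_EXTENDED} sets @code{RE_NO_BK_PARENS}, so @samp{(ab)} is a group and @samp{\(ab\)} matches literal parentheses; @code{RE_SYNTAX_EMACS} leaves the bit unset, which reverses the two.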
Finally, almost all characters lose any special meaning inside a list
(@pxref{List Operators}).
example, @samp{xy} (two match-self operators) matches @samp{xy}.
@node Repetition Operators, Alternation Operator, Concatenation Operator, Common Operators
@section Repetition Operators
Repetition operators repeat the preceding regular expression a specified
number of times.
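The following sketch compares the repetition operators using the @sc{posix} interface (@pxref{POSIX Regex Functions}) with @code{REG_EXTENDED}, where @samp{*}, @samp{+}, @samp{?}, and intervals such as @samp{@{2,3@}} are all available without backslashes. The helper @code{match_len} is hypothetical; each test pattern begins with a literal @samp{x} so that the leftmost match starts there rather than being an empty match at index 0.

```c
#include <assert.h>
#include <regex.h>

/* Hypothetical helper: return the length of the leftmost match of the
   POSIX extended PATTERN in TEXT, or -1 if there is no match or the
   pattern does not compile. */
int match_len(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m[1];
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    int len = -1;
    if (regexec(&re, text, 1, m, 0) == 0)
        len = (int) (m[0].rm_eo - m[0].rm_so);
    regfree(&re);
    return len;
}
```

For example, @samp{xa*} matches 4 characters of @samp{xaaab}, @samp{xa+} doesn't match @samp{xb} at all, and @samp{xa@{2,3@}} stops after three @samp{a}s in @samp{xaaaa}.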
case when it:
@itemize @bullet
@item
is first in a regular expression, or
@item
follows a match-beginning-of-line, open-group, or alternation
operator.
@cindex backtracking
The matcher processes a match-zero-or-more operator by first matching as
many repetitions of the smallest preceding regular expression as it can.
Then it continues to match the rest of the pattern.
If it can't match the rest of the pattern, it backtracks (as many times
as necessary), each time discarding one of the matches until it can
@node Match-one-or-more Operator, Match-zero-or-one Operator, Match-zero-or-more Operator, Repetition Operators
@subsection The Match-one-or-more Operator (@code{+} or @code{\+})
@cindex @samp{+}
If the syntax bit @code{RE_LIMITED_OPS} is set, then Regex doesn't recognize
this operator. Otherwise, if the syntax bit @code{RE_BK_PLUS_QM} isn't
@itemize @bullet
@item
@var{min} is greater than @var{max}, or
@item
any of @var{count}, @var{min}, or @var{max} are outside the range
match @samp{foo} or @samp{bar}.)
@cindex backtracking
The matcher usually tries all combinations of alternatives so as to
match the longest possible string. For example, when matching
@samp{(fooq|foo)*(qbarquux|bar)} against @samp{fooqbarquux}, it cannot
take, say, the first (``depth-first'') combination it could match, since
then it would be content to match just @samp{fooqbar}.
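You can check this example with a short sketch using the @sc{posix} interface. It assumes a matcher that, as @sc{posix} requires, reports the longest match starting at the leftmost position (we assume the @sc{gnu} C library's @code{regexec} here); @code{match_end} is a hypothetical helper that returns the index just past the overall match.

```c
#include <assert.h>
#include <regex.h>

/* Hypothetical helper: return the index just past the overall match of
   the POSIX extended PATTERN in TEXT, or -1 if there is none. */
int match_end(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m[1];
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    int end = -1;
    if (regexec(&re, text, 1, m, 0) == 0)
        end = (int) m[0].rm_eo;
    regfree(&re);
    return end;
}
```

Against @samp{fooqbarquux}, the whole string (11 characters) matches, via @samp{foo} followed by @samp{qbarquux}, rather than the 7-character @samp{fooqbar}.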
@comment xx something about leftmost-longest
more items. An @dfn{item} is a character,
@ignore
(These get added when they get implemented.)
a collating symbol, an equivalence class expression,
@end ignore
a character class expression, or a range expression. The syntax bits
affect which kinds of items you can put in a list. We explain the last
A @dfn{matching list} matches a single character represented by one of
the list items. You form a matching list by enclosing one or more items
within an @dfn{open-matching-list operator} (represented by @samp{[})
and a @dfn{close-list operator} (represented by @samp{]}).
For example, @samp{[ab]} matches either @samp{a} or @samp{b}.
@samp{[ad]*} matches the empty string and any string composed of just
the first character in the list. If you put a @samp{^} character first
in (what you think is) a matching list, you'll turn it into a
nonmatching list.}) instead of an open-matching-list operator to start a
nonmatching list.
For example, @samp{[^ab]} matches any character except @samp{a} or
@samp{b}.
If the @code{posix_newline} field in the pattern buffer (@pxref{GNU
Pattern Buffers}) is set, then nonmatching lists do not match a newline.
@code{RE_CHAR_CLASSES} is set and what precedes it is an
open-character-class operator followed by a valid character class name.
@item -
represents the range operator (@pxref{Range Operator}) if it's
not first or last in a list or the ending point of a range.
@end table
@noindent
All other characters are ordinary. For example, @samp{[.*]} matches
@samp{.} and @samp{*}.
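A quick way to confirm which characters are special inside a list is to match single characters against a few lists. This sketch assumes the @sc{posix} interface and the default ``C'' locale (so that @samp{[a-c]} means @samp{a}, @samp{b}, and @samp{c}); @code{list_matches} is a hypothetical helper.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: return 1 if the list LIST (for example "[a-c]")
   matches the single character C, 0 if not, -1 on error. */
int list_matches(const char *list, char c)
{
    char pattern[64];
    char text[2] = { c, '\0' };
    regex_t re;

    /* Anchor the list so it has to match the whole one-character string. */
    int n = snprintf(pattern, sizeof pattern, "^%s$", list);
    if (n < 0 || (size_t) n >= sizeof pattern)
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    int found = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return found;
}
```

@samp{[.*]} matches only a period or an asterisk, and a @samp{-} placed first or last in a list, as in @samp{[-ac]} or @samp{[ac-]}, is an ordinary character rather than a range operator.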
@menu
* Character Class Operators:: [:class:]
@table @code
@item alnum
letters and digits
@item alpha
@item graph
same as @code{print} except omits space
@item lower
lowercase letters
@item print
printable characters (in the @sc{ascii} encoding, space through
tilde, that is, codes 040 through 0176)
@item punct
Regex recognizes @dfn{range expressions} inside a list. They represent
those characters
that fall between two elements in the current collating sequence. You
form a range expression by putting a @dfn{range operator} between two
@ignore
(If these get implemented, then substitute this for ``characters.'')
of any of the following: characters, collating elements, collating symbols,
Operator}) or a repetition operator (@pxref{Repetition
Operators}).
@item
keep track of the indices of the substring that matched a given group.
@xref{Using Registers}, for a precise explanation.
This lets you:
@item
use the back-reference operator (@pxref{Back-reference Operator}).
@item
use registers (@pxref{Using Registers}).
@end itemize
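Both uses can be seen in a few lines with the @sc{posix} interface in basic syntax (no @code{REG_EXTENDED}), where @samp{\(} and @samp{\)} delimit a group and @samp{\1} refers back to it. The helper @code{group1} is hypothetical; instead of registers it reads group 1's offsets from a @code{regmatch_t} array (@pxref{Using Byte Offsets}).

```c
#include <assert.h>
#include <regex.h>

/* Hypothetical helper: search TEXT for the POSIX basic PATTERN and store
   the span of group 1 in *SO and *EO.  Returns 1 on a match, 0 if there
   is none, -1 if the pattern does not compile. */
int group1(const char *pattern, const char *text, regoff_t *so, regoff_t *eo)
{
    regex_t re;
    regmatch_t m[2];                   /* m[0]: whole match, m[1]: group 1 */
    if (regcomp(&re, pattern, 0) != 0) /* 0: basic syntax, \( \) groups */
        return -1;

    int found = regexec(&re, text, 2, m, 0) == 0;
    if (found) {
        *so = m[1].rm_so;
        *eo = m[1].rm_eo;
    }
    regfree(&re);
    return found;
}
```

@samp{\(ab*\)c\1} matches @samp{abbcabb}, with group 1 spanning indices 0 to 3, but not @samp{abbcab}, because the back reference must repeat exactly the text @samp{abb} that the group matched.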
@node Anchoring Operators, , Back-reference Operator, Common Operators
@section Anchoring Operators
@cindex anchoring
@cindex regexp anchoring
@end menu
@node Non-Emacs Syntax Tables, Match-word-boundary Operator, , Word Operators
@subsection Non-Emacs Syntax Tables
A @dfn{syntax table} is an array indexed by the characters in your
character set. In the @sc{ascii} encoding, therefore, a syntax table
@node Buffer Operators, , Word Operators, GNU Operators
@section Buffer Operators
Following are operators which work on buffers. In Emacs, a @dfn{buffer}
is, naturally, an Emacs buffer. For other programs, Regex considers the
Following are operators that @sc{gnu} defines (and @sc{posix} doesn't)
that you can use only when Regex is compiled with the preprocessor
symbol @code{emacs} defined.
@menu
* Syntactic Class Operators::
unsigned long allocated;
/* Number of bytes actually used in `buffer'. */
unsigned long used;
/* Syntax setting with which the pattern was compiled. */
reg_syntax_t syntax;
unsigned no_sub : 1;
/* If set, a beginning-of-line anchor doesn't match at the
beginning of the string. */
unsigned not_bol : 1;
/* Similarly for an end-of-line anchor. */
@findex re_compile_pattern
@example
char *
re_compile_pattern (const char *@var{regex}, const int @var{regex_size},
struct re_pattern_buffer *@var{pattern_buffer})
@end example
@vindex fastmap_accurate @r{field, set by @code{re_compile_pattern}}
to zero on the theory that the pattern you're compiling is different
from the one previously compiled into @code{buffer}; in that case (since
you can't make a fastmap without a compiled pattern),
@code{fastmap} would contain either an incompatible fastmap or nothing
at all.
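A minimal sketch of calling @code{re_compile_pattern}, assuming the @sc{gnu} C library's @file{regex.h} with @code{_GNU_SOURCE} defined; note that in that library the function returns @code{const char *} rather than @code{char *}. The hypothetical helper @code{try_compile} zeroes the pattern buffer, supplies a fastmap, and reports the error string (or @code{NULL}) together with @code{fastmap_accurate} as it stands after compiling. That @code{regfree} also frees the caller-supplied fastmap is glibc behavior we assume, not something this manual promises.

```c
#define _GNU_SOURCE  /* glibc declares re_compile_pattern only with this */
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: compile PATTERN with POSIX extended syntax bits.
   Returns NULL on success or the error string from re_compile_pattern,
   and stores the fastmap_accurate field as it stands after compiling. */
const char *try_compile(const char *pattern, int *fastmap_accurate)
{
    struct re_pattern_buffer buf;
    memset(&buf, 0, sizeof buf);   /* buffer, translate, etc. start out empty */
    buf.fastmap = malloc(256);     /* room for one entry per byte value */
    if (buf.fastmap == NULL)
        return "memory exhausted";

    re_syntax_options = RE_SYNTAX_POSIX_EXTENDED;
    const char *err = re_compile_pattern(pattern, strlen(pattern), &buf);
    *fastmap_accurate = buf.fastmap_accurate;  /* reset by the compile */

    regfree(&buf);  /* glibc: frees the compiled pattern and the fastmap */
    return err;
}
```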
@node GNU Matching, GNU Searching, GNU Regular Expression Compiling, GNU Regex Functions
@subsection GNU Matching
@cindex matching with GNU functions
@findex re_match
@example
int
re_match (struct re_pattern_buffer *@var{pattern_buffer},
const char *@var{string}, const int @var{size},
const int @var{start}, struct re_registers *@var{regs})
@end example
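In the following sketch (hypothetical helper @code{match_at}; the @sc{gnu} C library's @file{regex.h} with @code{_GNU_SOURCE} is assumed), @code{re_match} is anchored at @var{start}: it returns the number of characters matched beginning exactly there, @math{-1} if the pattern doesn't match at that index, and @math{-2} for an internal error.

```c
#define _GNU_SOURCE  /* needed for the GNU regex interface in glibc */
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: compile PATTERN (POSIX extended syntax bits) and
   call re_match on TEXT at index START.  Returns the number of characters
   matched, -1 for no match at START, or -2 on any error. */
int match_at(const char *pattern, const char *text, int start)
{
    struct re_pattern_buffer buf;
    memset(&buf, 0, sizeof buf);
    re_syntax_options = RE_SYNTAX_POSIX_EXTENDED;
    if (re_compile_pattern(pattern, strlen(pattern), &buf) != NULL)
        return -2;

    int n = re_match(&buf, text, (int) strlen(text), start, NULL);
    regfree(&buf);
    return n;
}
```

Unlike a search, @samp{ab+} fails against @samp{xxabbb} at index 0 even though it would match at index 2.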
@node GNU Searching, Matching/Searching with Split Data, GNU Matching, GNU Regex Functions
@subsection GNU Searching
@cindex searching with GNU functions
@findex re_search
@example
int
re_search (struct re_pattern_buffer *@var{pattern_buffer},
const char *@var{string}, const int @var{size},
const int @var{start}, const int @var{range},
struct re_registers *@var{regs})
@end example
that fails, and so on, up to @math{@var{start} + @var{range}}; if
@var{range} is negative, then it attempts a match starting first at
index @var{start}, then at @math{@var{start} - 1} if that fails, and so
on.
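As a sketch (hypothetical helper @code{search_from}; the @sc{gnu} C library's @file{regex.h} with @code{_GNU_SOURCE} is assumed), searching for @samp{ab} in @samp{abxab} shows how the sign of @var{range} sets the direction of the search.

```c
#define _GNU_SOURCE  /* needed for the GNU regex interface in glibc */
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: compile PATTERN and call re_search on TEXT with
   START and RANGE.  A positive RANGE tries START, START + 1, and so on;
   a negative one tries START, START - 1, and so on.  Returns the index
   where the match begins, -1 if there is none, or -2 on error. */
int search_from(const char *pattern, const char *text, int start, int range)
{
    struct re_pattern_buffer buf;
    memset(&buf, 0, sizeof buf);
    re_syntax_options = RE_SYNTAX_POSIX_EXTENDED;
    if (re_compile_pattern(pattern, strlen(pattern), &buf) != NULL)
        return -2;

    int pos = re_search(&buf, text, (int) strlen(text), start, range, NULL);
    regfree(&buf);
    return pos;
}
```

Searching forward from index 1 finds the second @samp{ab} at index 3; searching backward from index 2 finds the first one at index 0.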
If @var{start} is not between zero and @var{size}, then @code{re_search}
returns @math{-1}. When @var{range} is positive, @code{re_search}
@subsection Matching and Searching with Split Data
Using the functions @code{re_match_2} and @code{re_search_2}, you can
match or search in data that is divided into two strings.
The function:
@findex re_match_2
@example
int
re_match_2 (struct re_pattern_buffer *@var{buffer},
const char *@var{string1}, const int @var{size1},
const char *@var{string2}, const int @var{size2},
const int @var{start},
struct re_registers *@var{regs},
const int @var{stop})
@end example
characters of @var{string} it matched. Regard @var{string1} and
@var{string2} as concatenated when you set the arguments @var{start} and
@var{stop} and use the contents of @var{regs}; @code{re_match_2} never
returns a value larger than @math{@var{size1} + @var{size2}}.
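A minimal sketch of @code{re_match_2} (hypothetical helper @code{match_split}; the @sc{gnu} C library's @file{regex.h} with @code{_GNU_SOURCE} is assumed). The pattern @samp{foobar} matches across the boundary between @samp{foob} and @samp{ar}, but not when @var{stop} is 5, since then the matcher doesn't look at the final @samp{r}.

```c
#define _GNU_SOURCE  /* needed for the GNU regex interface in glibc */
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: match PATTERN against the virtual concatenation
   of S1 and S2, anchored at index 0 and not looking past index STOP.
   Returns the number of characters matched, -1 for none, -2 on error. */
int match_split(const char *pattern, const char *s1, const char *s2, int stop)
{
    struct re_pattern_buffer buf;
    memset(&buf, 0, sizeof buf);
    re_syntax_options = RE_SYNTAX_POSIX_EXTENDED;
    if (re_compile_pattern(pattern, strlen(pattern), &buf) != NULL)
        return -2;

    int n = re_match_2(&buf, s1, (int) strlen(s1), s2, (int) strlen(s2),
                       0, NULL, stop);
    regfree(&buf);
    return n;
}
```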
The function:
@findex re_search_2
@example
int
re_search_2 (struct re_pattern_buffer *@var{buffer},
const char *@var{string1}, const int @var{size1},
const char *@var{string2}, const int @var{size2},
const int @var{start}, const int @var{range},
struct re_registers *@var{regs},
const int @var{stop})
@end example
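A minimal sketch of @code{re_search_2} (hypothetical helper @code{search_split}; the @sc{gnu} C library's @file{regex.h} with @code{_GNU_SOURCE} is assumed). The returned position indexes the two strings as if they were concatenated, so @samp{bar} found at the start of the second string @samp{barxx}, after the five-character first string @samp{xxfoo}, is reported at index 5.

```c
#define _GNU_SOURCE  /* needed for the GNU regex interface in glibc */
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: search for PATTERN in the virtual concatenation
   of S1 and S2, trying every starting index.  Returns the index (counted
   across both strings) where the match begins, -1 if none, -2 on error. */
int search_split(const char *pattern, const char *s1, const char *s2)
{
    struct re_pattern_buffer buf;
    memset(&buf, 0, sizeof buf);
    re_syntax_options = RE_SYNTAX_POSIX_EXTENDED;
    if (re_compile_pattern(pattern, strlen(pattern), &buf) != NULL)
        return -2;

    int size1 = (int) strlen(s1), size2 = (int) strlen(s2);
    int total = size1 + size2;
    /* start 0 and range total: try indices 0 through total; stop at the end */
    int pos = re_search_2(&buf, s1, size1, s2, size2, 0, total, NULL, total);
    regfree(&buf);
    return pos;
}
```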
address to the pattern buffer's @code{fastmap} field. You can either
compile the fastmap yourself or have @code{re_search} do it for you;
when @code{fastmap} is nonzero, it automatically compiles a fastmap the
first time you search using a particular compiled pattern.
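A minimal sketch of compiling a fastmap yourself. It assumes the @sc{gnu} C library, where the function is declared as @code{int re_compile_fastmap (struct re_pattern_buffer *)} and @code{regfree} also frees a caller-supplied fastmap; @code{can_start_with} is a hypothetical helper. A nonzero fastmap entry means a match might begin with that byte; a zero entry means it cannot.

```c
#define _GNU_SOURCE  /* needed for the GNU regex interface in glibc */
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: compile PATTERN, supply a 256-byte fastmap, fill it
   with re_compile_fastmap, and report whether a match could start with
   byte C (nonzero entry).  Returns -1 on error. */
int can_start_with(const char *pattern, unsigned char c)
{
    struct re_pattern_buffer buf;
    memset(&buf, 0, sizeof buf);
    buf.fastmap = malloc(256);            /* one flag per possible byte */
    if (buf.fastmap == NULL)
        return -1;

    re_syntax_options = RE_SYNTAX_POSIX_EXTENDED;
    if (re_compile_pattern(pattern, strlen(pattern), &buf) != NULL
        || re_compile_fastmap(&buf) != 0) {
        regfree(&buf);                    /* glibc: also frees the fastmap */
        return -1;
    }

    int result = buf.fastmap[c] != 0;
    regfree(&buf);
    return result;
}
```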
To compile a fastmap yourself, use:
@sc{posix}, on the other hand, requires a different interface: the
caller is supposed to pass in a fixed-length array that the matcher
fills. Therefore, if @code{regs_allocated} is @code{REGS_FIXED}
@vindex REGS_FIXED
the matcher simply fills that array.
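The sketch below shows the @code{REGS_FIXED} case with @code{re_search}, assuming the @sc{gnu} C library's @file{regex.h} with @code{_GNU_SOURCE} defined; @code{search_fixed} is a hypothetical helper. The caller owns the arrays. The helper sets @code{regs_allocated} only after compiling, because compiling resets it to @code{REGS_UNALLOCATED}, and it checks that the arrays have room for the whole match plus every group.

```c
#define _GNU_SOURCE  /* needed for the GNU regex interface in glibc */
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: search TEXT for PATTERN (POSIX extended syntax
   bits) and have the matcher fill the caller's arrays STARTS and ENDS,
   each N elements long.  Returns the match position, -1 for no match,
   or -2 on error (including arrays too small for every group). */
int search_fixed(const char *pattern, const char *text,
                 regoff_t *starts, regoff_t *ends, unsigned n)
{
    struct re_pattern_buffer buf;
    struct re_registers regs;
    memset(&buf, 0, sizeof buf);

    re_syntax_options = RE_SYNTAX_POSIX_EXTENDED;
    if (re_compile_pattern(pattern, strlen(pattern), &buf) != NULL)
        return -2;
    if (n < buf.re_nsub + 1) {         /* need the whole match plus each group */
        regfree(&buf);
        return -2;
    }

    buf.regs_allocated = REGS_FIXED;   /* set after compiling, which resets it */
    regs.num_regs = n;
    regs.start = starts;
    regs.end = ends;

    int len = (int) strlen(text);
    int pos = re_search(&buf, text, len, 0, len, &regs);
    regfree(&buf);                     /* the arrays stay with the caller */
    return pos;
}
```

Entries beyond the last group come back as @math{-1}.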
@itemize @bullet
@item
If the regular expression has an @w{@var{i}-th}
group not contained within another group that matches a
substring of @var{string}, then the function sets
@itemize
@item
0 in @code{@w{@var{regs}->}start[0]} and 2 in @code{@w{@var{regs}->}end[0]}
@item
0 in @code{@w{@var{regs}->}start[1]} and 2 in @code{@w{@var{regs}->}end[1]}
@item
0 in @code{@w{@var{regs}->}start[2]} and 1 in @code{@w{@var{regs}->}end[2]}
@item
1 in @code{@w{@var{regs}->}start[3]} and 2 in @code{@w{@var{regs}->}end[3]}
@end itemize
@item
@itemize
@item
0 in @code{@w{@var{regs}->}start[0]} and 2 in @code{@w{@var{regs}->}end[0]}
@item
1 in @code{@w{@var{regs}->}start[1]} and 2 in @code{@w{@var{regs}->}end[1]}
@end itemize
@item
@itemize
@item
0 in @code{@w{@var{regs}->}start[0]} and 1 in @code{@w{@var{regs}->}end[0]}
@item
@math{-1} in @code{@w{@var{regs}->}start[1]} and @math{-1} in @code{@w{@var{regs}->}end[1]}
@end itemize
@item
If the @w{@var{i}-th} group matches a zero-length string, then the
function sets @code{@w{@var{regs}->}start[@var{i}]} and
@code{@w{@var{regs}->}end[@var{i}]} to the index just beyond that
zero-length string.
For example, when you match the pattern @samp{(a*)b} against the string
@samp{b}, you get:
@itemize
@item
0 in @code{@w{@var{regs}->}start[0]} and 1 in @code{@w{@var{regs}->}end[0]}
@item
0 in @code{@w{@var{regs}->}start[1]} and 0 in @code{@w{@var{regs}->}end[1]}
@end itemize
@ignore
@itemize
@item
0 in @code{@w{@var{regs}->}start[0]} and 0 in @code{@w{@var{regs}->}end[0]}
@item
0 in @code{@w{@var{regs}->}start[1]} and 0 in @code{@w{@var{regs}->}end[1]}
@end itemize
@end ignore
@item
If an @w{@var{i}-th} group contains a @w{@var{j}-th} group
in turn not contained within any other group within group @var{i} and
the function reports a match of the @w{@var{i}-th} group, then it
records in @code{@w{@var{regs}->}start[@var{j}]} and
@itemize
@item
0 in @code{@w{@var{regs}->}start[0]} and 3 in @code{@w{@var{regs}->}end[0]}
@item
2 in @code{@w{@var{regs}->}start[1]} and 3 in @code{@w{@var{regs}->}end[1]}
@item
2 in @code{@w{@var{regs}->}start[2]} and 2 in @code{@w{@var{regs}->}end[2]}
@end itemize
When you match the pattern @samp{((a)*b)*} against the string
@itemize
@item
0 in @code{@w{@var{regs}->}start[0]} and 3 in @code{@w{@var{regs}->}end[0]}
@item
2 in @code{@w{@var{regs}->}start[1]} and 3 in @code{@w{@var{regs}->}end[1]}
@item
0 in @code{@w{@var{regs}->}start[2]} and 1 in @code{@w{@var{regs}->}end[2]}
@end itemize
@item
If an @w{@var{i}-th} group contains a @w{@var{j}-th} group
in turn not contained within any other group within group @var{i}
and the function sets
@code{@w{@var{regs}->}start[@var{i}]} and
@code{@w{@var{regs}->}end[@var{i}]} to @math{-1}, then it also sets
@code{@w{@var{regs}->}start[@var{j}]} and
@code{@w{@var{regs}->}end[@var{j}]} to @math{-1}.
@itemize
@item
0 in @code{@w{@var{regs}->}start[0]} and 1 in @code{@w{@var{regs}->}end[0]}
@item
@math{-1} in @code{@w{@var{regs}->}start[1]} and @math{-1} in @code{@w{@var{regs}->}end[1]}
@item
@math{-1} in @code{@w{@var{regs}->}start[2]} and @math{-1} in @code{@w{@var{regs}->}end[2]}
@end itemize
@end itemize
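Here is a minimal sketch that checks two of these rules with @code{re_search}, letting the matcher allocate the registers (@code{REGS_UNALLOCATED}), which means the caller must free @code{start} and @code{end} afterward. It assumes the @sc{gnu} C library's @file{regex.h} with @code{_GNU_SOURCE} defined; @code{group_span} is a hypothetical helper, and the pattern @samp{(a)|b} is our own example of a group that doesn't participate in the match.

```c
#define _GNU_SOURCE  /* needed for the GNU regex interface in glibc */
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: search TEXT for PATTERN (POSIX extended syntax
   bits) and store the start and end of register REG in *START and *END.
   Returns the match position, -1 for no match, or -2 on error. */
int group_span(const char *pattern, const char *text, unsigned reg,
               regoff_t *start, regoff_t *end)
{
    struct re_pattern_buffer buf;
    struct re_registers regs = { 0 };   /* the matcher allocates the arrays */
    memset(&buf, 0, sizeof buf);        /* regs_allocated == REGS_UNALLOCATED */

    re_syntax_options = RE_SYNTAX_POSIX_EXTENDED;
    if (re_compile_pattern(pattern, strlen(pattern), &buf) != NULL)
        return -2;

    int len = (int) strlen(text);
    int pos = re_search(&buf, text, len, 0, len, &regs);
    if (pos >= 0 && reg < regs.num_regs) {
        *start = regs.start[reg];
        *end = regs.end[reg];
    }
    free(regs.start);                   /* allocated by re_search on a match */
    free(regs.end);
    regfree(&buf);
    return pos;
}
```

Matching @samp{(a*)b} against @samp{b} gives 0 and 0 for group 1 (the zero-length case), while in @samp{(a)|b} against @samp{b} group 1 doesn't participate, so both of its registers are @math{-1}.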
@node POSIX Matching, Reporting Errors, POSIX Regular Expression Compiling, POSIX Regex Functions
@subsection POSIX Matching
Matching the @sc{posix} way means trying to match a null-terminated
string starting at its first character. Once you've compiled a pattern
@findex regexec
@example
int
regexec (const regex_t *@var{preg}, const char *@var{string},
size_t @var{nmatch}, regmatch_t @var{pmatch}[], int @var{eflags})
@end example
@noindent
@var{preg} is the address of a pattern buffer for a compiled pattern.
@var{string} is the string you want to match.
@xref{Using Byte Offsets}, for an explanation of @var{pmatch}. If you
pass zero for @var{nmatch} or you compiled @var{preg} with the
corresponding to @var{errcode} (including its terminating null). If
@var{errbuf} and @var{errbuf_size} are nonzero, it also returns in
@var{errbuf} the first @math{@var{errbuf_size} - 1} characters of the
error string, followed by a null.
@var{errbuf_size} must be a nonnegative number less than or equal to the
size in bytes of @var{errbuf}.
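A common idiom is to call @code{regerror} twice: first with a zero @var{errbuf_size} to learn the size of the message (including its terminating null), then with a buffer of that size. A minimal sketch, with @code{compile_error} as a hypothetical helper:

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: compile PATTERN as a POSIX extended regex.  On
   failure, return a malloc'd copy of the error message (the caller frees
   it); on success, free the compiled pattern and return NULL. */
char *compile_error(const char *pattern)
{
    regex_t re;
    int err = regcomp(&re, pattern, REG_EXTENDED);
    if (err == 0) {
        regfree(&re);
        return NULL;
    }

    /* A zero-size buffer only asks for the size needed. */
    size_t size = regerror(err, &re, NULL, 0);
    char *msg = malloc(size);
    if (msg != NULL)
        regerror(err, &re, msg, size);
    return msg;   /* no regfree: the failed regcomp left nothing to free */
}
```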
@findex regfree
@example
void
regfree (regex_t *@var{preg})
@end example
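Putting the @sc{posix} functions together, the sketch below compiles a pattern, calls @code{regexec} with one @code{regmatch_t} element and caller-supplied @var{eflags}, and releases the pattern with @code{regfree}. @code{posix_match} is a hypothetical helper; @code{REG_NOTBOL}, which tells @code{regexec} that the string does not start at the beginning of a line, is a standard @sc{posix} flag.

```c
#include <assert.h>
#include <regex.h>

/* Hypothetical helper: compile the POSIX extended PATTERN, run regexec on
   TEXT with EFLAGS, and store the offsets of the whole match in *SO and
   *EO.  Returns 0 on a match, REG_NOMATCH if there is none, or -1 if the
   pattern does not compile. */
int posix_match(const char *pattern, const char *text, int eflags,
                regoff_t *so, regoff_t *eo)
{
    regex_t re;
    regmatch_t pmatch[1];              /* nmatch = 1: only the whole match */
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    int rc = regexec(&re, text, 1, pmatch, eflags);
    if (rc == 0) {
        *so = pmatch[0].rm_so;
        *eo = pmatch[0].rm_eo;
    }
    regfree(&re);                      /* release what regcomp allocated */
    return rc;
}
```

With @code{REG_NOTBOL}, @samp{^ab} no longer matches @samp{ab}.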
If you're writing code that has to be Berkeley @sc{unix} compatible,
you'll need to use these functions whose interfaces are the same as those
in Berkeley @sc{unix}.
@menu
* BSD Regular Expression Compiling:: re_comp ()
With Berkeley @sc{unix}, you can only search for a given regular
expression; you can't match one. To search for it, you must first
compile it. Before you compile it, you must choose the regular
expression syntax it should be compiled with by setting the
variable @code{re_syntax_options} (declared in @file{regex.h}) to some
syntax (@pxref{Regular Expression Syntax}).
Compiling}).
@node BSD Searching, , BSD Regular Expression Compiling, BSD Regex Functions
@subsection BSD Searching
Searching the Berkeley @sc{unix} way means searching in a string
starting at its first character and trying successive positions within