
regexp

Section: C Library Functions (3)
 

NAME

advance, advance_r, compile, compile_r, step, step_r - Regular expression compile and match routines  

SYNOPSIS

#define INIT declarations
#define GETC() getc code
#define PEEKC() peek code
#define UNGETC(c) ungetc code
#define RETURN(ptr) return code
#define ERROR(val) error code

#include <regexp.h>

char *compile(
        char *instring,
        char *expbuf,
        const char *endbuf,
        int eof);

int step(
        const char *string,
        const char *expbuf);

int advance(
        const char *string,
        const char *expbuf);

extern char *loc1, *loc2, *locs;

The following functions do not conform to current standards and are supported only for backward compatibility:

char *compile_r(
        char *instring,
        char *expbuf,
        char *endbuf,
        int eof,
        struct regexp_data *regexp_data);

int advance_r(
        char *string,
        char *expbuf,
        struct regexp_data *regexp_data);

int step_r(
        char *string,
        char *expbuf,
        struct regexp_data *regexp_data);  

STANDARDS

Interfaces documented on this reference page conform to industry standards as follows:

advance(), compile(), step():  XPG4, XPG4-UNIX

Refer to the standards(5) reference page for more information about industry standards and associated tags.  

PARAMETERS

c
        The value of the next character (byte) in the regular expression pattern. Returned by the next call to the GETC() and PEEKC() macros.

ptr
        Specifies a pointer to the character following the last character of the compiled regular expression.

val
        Specifies an error value.

instring
        Specifies a string to be passed to the compile() function.

        The instring parameter is never used explicitly by the compile() function, but you can use it in your macros. For example, you may want to pass the string containing a pattern as the instring parameter to the compile() function and use the INIT macro to set a pointer to the beginning of this string. When your macros do not use instring, call the compile() function with a value of ((char *) 0) for this parameter.

expbuf
        Points to a character array where the compiled regular expression is stored.

endbuf
        Points to the location that immediately follows the character array where the compiled regular expression is stored. When the compiled expression cannot be contained in (endbuf-expbuf) bytes, a call to the ERROR(_BIGREGEXP) macro is made (see the ERRORS section).

eof
        Specifies the character that marks the end of the regular expression. For example, in ed this character is usually a / (slash).

string
        Points to a null-terminated string of characters, in the step() function, to be searched for a match.

regexp_data
        Is data for the compile_r(), step_r(), and advance_r() functions.
 

DESCRIPTION

The compile(), advance(), and step() functions are used for general-purpose expression matching.

The compile() function takes a simple regular expression as input and produces a compiled expression that can be used with the step() and advance() functions.

The following six macros, used in the compile() function, must be defined before the #include <regexp.h> statement in programs. The GETC(), PEEKC(), and UNGETC() macros operate on the regular expression provided as input for the compile() function.

INIT
        The INIT macro is used for dependent declarations and initializations. In the regexp.h header file this macro is located right after the compile() function declarations and opening { (left brace). Your INIT declarations must end with a ; (semicolon).

        The INIT macro is frequently used to set a register variable to point to the beginning of the regular expression, so that this pointer can be used in declarations for GETC(), PEEKC(), and UNGETC(). Alternatively, you can use INIT to declare external variables that GETC(), PEEKC(), and UNGETC() need.

GETC()
        The GETC() macro returns the value of the next character (byte) in the regular-expression pattern. Successive calls to GETC() return successive characters of the regular expression.

PEEKC()
        The PEEKC() macro returns the next character (byte) in the regular expression. Immediate subsequent calls to this macro return the same byte, which is also the next character returned by the GETC() macro.

UNGETC(c)
        The UNGETC() macro causes the c parameter to be returned by the next call to the GETC() and PEEKC() macros. No more than one character of pushback is ever needed because this character is guaranteed to be the last character read by the GETC() macro. The value of the UNGETC() macro is always ignored.

RETURN(ptr)
        The RETURN() macro is used for normal exit of the compile() function. The value of the ptr parameter is a pointer to the character following the last character of the compiled regular expression. This is useful in programs that manage memory allocation.

ERROR(val)
        The ERROR() macro is the abnormal return from the compile() function. A call to this macro should never return a value. In this macro, val is an error number, which is described in the ERRORS section of this reference page.

The step() function finds the first substring of the string parameter that matches the compiled expression pointed to by the expbuf parameter. When there is no match, the step() function returns a value of 0 (zero). When there is a match, the step() function returns a nonzero value and sets two global character pointers: loc1, which points to the first character of the matching substring, and loc2, which points to the character immediately following the matching substring. When the regular expression matches the entire string, loc1 points to the first character of the string parameter and loc2 points to the null character that terminates it.

The step() function uses the integer variable circf, which is set by the compile() function when the regular expression begins with a ^ (circumflex). When this variable is set, the step() function only tries to match the regular expression to the beginning of the string. When you compile more than one regular expression before executing the first one, save the value of circf for each compiled expression and set circf to the saved value before each call to step().

The advance() function tests whether an initial substring of the string parameter matches the expression pointed to by the expbuf parameter. The step() function calls the advance() function, initially with the same parameters that were passed to it. It then increments a pointer through the characters of the string parameter and calls advance() at each position until advance() returns a nonzero value, which indicates a match, or until the end of the string is reached. To constrain the match to the beginning of the string unconditionally, call the advance() function directly instead of calling step().

When the advance() function encounters an * (asterisk) or a \{\} sequence in the regular expression, it advances its pointer to the string to be matched as far as possible and recursively calls itself, trying to match the remainder of the regular expression. As long as there is no match, the advance() function backs up along the string until the function finds a match or reaches the point in the string where the initial match with the * or \{\} sequence occurred.

It is sometimes desirable to stop this backing up before the initial pointer position in the string is reached. When the pointer position in the string becomes equal to the locs global character pointer during the backing-up process, the advance() function breaks out of the loop that backs up and returns the value 0 (zero).

The compile_r(), step_r(), and advance_r() functions are the reentrant versions of the compile(), step(), and advance() functions. They are supported in order to maintain backward compatibility with operating system versions prior to Tru64 UNIX Version 4.0.

The regexp.h header file defines the regexp_data structure.  

EXAMPLES

The following is an example of the regular expression macros and calls from the grep command:

#define INIT register char *sp=instring;
#define GETC() (*sp++)
#define PEEKC() (*sp)
#define UNGETC(c) (--sp)
#define RETURN(c) return;
#define ERROR(c) regerr()

#include <regexp.h>
        . . .
        compile (patstr, expbuf, &expbuf[ESIZE], '\0');
        . . .
        if (step (linebuf, expbuf))
                succeed( );
        . . .

NOTES

This interface has been deprecated in favor of the regcomp() interface specified by the POSIX and X/Open standards. The regexp interface is provided to support System V applications. Traditional BSD applications use different functions for regular expression handling. See the re_comp(3) and re_exec(3) reference pages.

The advance(), compile(), and step() functions are scheduled to be withdrawn from a future version of the X/Open CAE Specification.  

RETURN VALUES

Upon successful completion, the compile() function calls the RETURN() macro. Upon failure, this function calls the ERROR() macro.

Whenever a successful match occurs, the step() and advance() functions return a nonzero value. Upon failure, these functions return a value of 0 (zero).

[Digital]  The compile_r(), step_r(), and advance_r() functions return the same values as their non-reentrant counterparts.  

ERRORS

If any of the following conditions occurs, the compile() or compile_r() functions call the ERROR() macro with an error value as its argument:

        The range endpoint is too large.
        A bad number was received.
        The number in \digit is out of range.
        There is an illegal or missing delimiter.
        There is no remembered search string.
        The use of a pair of \( and \) is unbalanced.
        There are too many \( and \) pairs (maximum is 9).
        More than two numbers are given in the \{ and \} pair.
        A } character was expected after a \.
        The first number exceeds the second in the \{ and \} pair.
        There is a [ ] pair imbalance.
        There is a regular expression overflow.
        [Digital]  There was an unknown error.

RELATED INFORMATION

Functions: ctype(3), fnmatch(3), glob(3), regcomp(3), re_comp(3)

Commands: ed(1), sed(1), grep(1)

Standards: standards(5)


 


This document was created by man2html, using the manual pages.
Time: 02:42:06 GMT, October 02, 2010