grep, egrep, fgrep - Searches a file for patterns
grep [-E|-F] [-c|-l|-q] [-bhinsvwxy] [-pparagraph_separator] -e pattern_list [-e pattern_list]... [-f pattern_file]... [file...]
grep [-E|-F] [-c|-l|-q] [-bhinsvwxy] [-pparagraph_separator] [-e pattern_list]... -f pattern_file [-f pattern_file]... [file...]
grep [-E|-F] [-c|-l|-q] [-bhinsvwxy] [-pparagraph_separator] pattern_list [file...]
The commands grep -E and grep -F are equivalent to the obsolescent commands egrep and fgrep, respectively.
The grep command searches the specified files (standard input by default) for lines containing characters that match the specified patterns, and then writes matching lines to standard output.
Interfaces documented on this reference page conform to industry standards as follows:
grep: XPG4, XPG4-UNIX
egrep: XPG4, XPG4-UNIX
fgrep: XPG4, XPG4-UNIX
Refer to the standards(5) reference page for more information about industry standards and associated tags.
Although most options can be combined, some combinations result in one option overriding another. For example, if you specify -n and -l, the output includes file names only (as specified by -l) and thus does not include line numbers (as specified by -n).
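The interaction between -n and -l can be checked with a small sketch like the one below. The scratch directory and the file names one.txt and two.txt are invented for illustration, and mktemp is assumed to be available.

```shell
# Scratch directory and file names are illustrative only.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'alpha\nbeta\n' > one.txt
printf 'gamma\nbeta\n' > two.txt

# -n alone: each matching line is prefixed with file name and line number.
with_n=$(grep -n 'beta' one.txt two.txt)

# -n together with -l: only the file names are listed.
with_nl=$(grep -n -l 'beta' one.txt two.txt)

printf '%s\n' "$with_n" "$with_nl"
```

With -l present, the line numbers requested by -n never appear, because no matching lines are written at all.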
-E  Treats patterns as extended regular expressions and is equivalent to the obsolescent egrep command.
-F  Treats patterns as fixed strings and is equivalent to the obsolescent fgrep command.
-b  [Compaq] Precedes each line by the block number on which it was found. Use this option to help find disk block numbers by context.
-c  Displays only a count of matching lines.
-e pattern_list  Specifies one or more patterns to match. If more than one pattern is specified in pattern_list, the patterns must be separated by newline characters. The -e option is useful for specifying a pattern that begins with a - (dash).
-f pattern_file  Specifies a file that contains patterns to match, one per line.
-h  [Compaq] Suppresses reporting of file names when multiple files are processed. That is, it prevents the name of the file containing the matching line from being prefixed to that line.
-i  Ignores the case of letters in pattern matching; that is, uppercase and lowercase in the input are considered to be identical.
-l  Lists only the name of each file containing matched lines. Each file name is listed only once; file names are separated by newline characters. The grep command returns (standard input) (or the local equivalent) in place of a file name if -l is specified with standard input.
-n  Precedes each line with its relative line number in the file.
-pparagraph_separator  [Compaq] Displays the entire paragraph containing matched lines. Paragraphs are delimited by paragraph separators, paragraph_separator, which are patterns in the same form as the search pattern. Lines containing the paragraph separators are used only as separators; they are never included in the output. The default paragraph separator is a blank line.
-q  Suppresses all output except error messages. This is useful for checking status.
-s  Suppresses error messages arising from non-existent or unreadable files. Other error messages are still displayed.
-v  Displays all lines except those that match the specified pattern. Useful for filtering unwanted lines out of a file.
-w  [Compaq] Matches only if the expression is found as a separate word in the text. A word is any string of alphanumeric characters (letters, numerals, and underscores) delimited by nonalphanumeric characters (punctuation or white space) or by the beginning or end of the line. See ex(1).
-x  Displays a line only if the pattern matches the entire line.
-y  [Compaq] Same as the -i option.
pattern_list  Specifies one or more patterns to be used when searching the input. This operand is treated as if it were specified as -e pattern_list.
file  A path name of a file to be searched for the patterns. If no file operands are specified, the standard input is used.
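As a rough illustration of several of the options and operands described above, the following sketch runs grep against a small scratch file. The file names and contents are made up for the example, and only options that are not marked [Compaq] are used.

```shell
# Scratch files; names and contents are illustrative only.
dir=$(mktemp -d)
f="$dir/sample"
printf 'Apple\napple pie\n-x flag\nbanana\n' > "$f"
printf 'banana\npie\n' > "$dir/patterns"

count=$(grep -c 'apple' "$f")              # count of matching lines
nocase=$(grep -i -c 'apple' "$f")          # same count, ignoring case
whole=$(grep -i -x 'apple' "$f")           # pattern must match the entire line
inverted=$(grep -v 'a' "$f")               # lines that do not contain "a"
dashed=$(grep -e '-x' "$f")                # -e protects a pattern that begins with a dash
from_file=$(grep -f "$dir/patterns" "$f")  # patterns read from a file, one per line

printf '%s\n' "$count" "$nocase" "$whole" "$inverted" "$dashed" "$from_file"
```

Without -e, the pattern -x in the fifth command would be taken as the -x option rather than as a pattern.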
By default, the grep command treats a pattern as a basic regular expression (BRE). With the -E option, the pattern is treated as an extended regular expression (ERE). With the -F option, the pattern is considered a fixed string. See the following discussion of regular expressions.
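The difference between the three pattern types can be seen with a pattern containing a plus sign, which is an ordinary character in a BRE and in a fixed string but a special character in an ERE. The two input lines are invented for this sketch.

```shell
# Two illustrative input lines: a literal "a+b" and the string "aab".
lines() { printf 'a+b\naab\n'; }

bre=$(lines | grep 'a+b')       # BRE: + is ordinary, so only the literal a+b matches
ere=$(lines | grep -E 'a+b')    # ERE: a+ means one or more a, so aab matches
fixed=$(lines | grep -F 'a+b')  # fixed string: no character is special

printf 'BRE: %s\nERE: %s\nfixed: %s\n' "$bre" "$ere" "$fixed"
```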
In the output of the grep command, a matched line is preceded with the name of the file in which it was found if you specify more than one file (except when the -h option is specified).
[Compaq] You are strongly encouraged to single quote patterns to protect them from unwanted shell substitutions. In some cases, such as in multiline pattern lists and subexpressions, quoting is essential. When using the C shell interactively, you must enter a backslash before terminating a line in a multiline pattern.
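A minimal sketch of the quoting advice, using a Bourne-type shell: the single quotes keep the shell from interpreting the dollar sign, and they allow a pattern list to span two lines. The scratch file is illustrative only.

```shell
# Scratch file with three illustrative lines.
dir=$(mktemp -d)
printf 'ab\nacbcbc\n12356\n' > "$dir/test"

# Quoted pattern ending in $: the shell passes it to grep unchanged.
ends_in_6=$(grep '6$' "$dir/test")

# Multiline pattern list: two patterns separated by a newline inside the quotes.
either=$(grep 'ab
12' "$dir/test")

printf '%s\n' "$ends_in_6" "$either"
```

The second command selects lines that match either pattern, so it behaves like two -e options.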
[Compaq] Running grep on a file that is not a text file (for example, an .o file) produces unpredictable results and is discouraged.
Regular expressions (REs) provide a powerful way to specify patterns to search for in text files (or in the standard input). This section explains the rules for constructing such patterns.
On Tru64 UNIX (and XPG4-conforming systems) there are two standard types of REs, and thus two sets of rules for building patterns: basic regular expressions (BREs) and extended regular expressions (EREs). There is much in common between BREs and EREs, but there are important differences as well.
A variety of commands and utilities use one or the other type of RE, or both. Thus the rules described below are applicable in many contexts. Nonetheless, the grep command is used illustratively here.
The term regular expression, or RE, is used when there is no need to distinguish between BREs and EREs. The terms pattern and regular expression can be used interchangeably. The term match is used to describe a string in a file (or standard input) that is successfully specified by a pattern or RE. A pattern or an RE may also be referred to as a string. The matched string might also be termed a substring or a sequence (of characters).
Simple REs match a single character. More complex REs are built from
other REs as explained in the rules below. REs are defined recursively; for
example, if you concatenate two REs, the resultant string is an RE.
The concept of a character is generalized to the concept of a collating element. For many purposes, especially in English-speaking locales, the term collating element may be considered synonymous with character. Collating elements are relevant to bracket expressions, and are discussed in the following sections.
A collating element is the smallest unit used to determine how to order characters. Collating elements are necessary for languages that treat some strings as individual collating elements. For example, in Spanish, the strings ch and ll are each collating symbols (that is, the Spanish primary sort order is a, b, c, ch, d,...,k, l, ll, m,...).
As an example, suppose we have a file test that contains these three lines:
ab
acbcbc
12356
The command grep 'b' test results in this output:
ab
acbcbc
because the RE b, the pattern, matches the letter b in the first and second lines of the file, and there is no b in the third line. The RE c would match just the second line. The RE bc, built by concatenating the prior two REs, would match just the second line.
There are two instances of bc in the second line, so the pattern matches the line. However, in using some of the rules that build REs, it is important to understand exactly what substrings are matched by a pattern.
Those rules are given in the following sections, but for illustration, consider the RE c.*b. This pattern means match a string beginning with c, ending with b, and with any number of characters between, including none. Thus this pattern matches lines containing cb, cxb, and canythingb, for example.
The search for a match starts at the beginning of a string and stops when the first sequence matching the pattern is found scanning from left to right. If there is more than one possible leftmost match, the longest match is used. For example, in the file test above, the pattern c.*b matches the second through third characters of the second line, and also the second through the fifth characters. The latter, being the longer, is the actual match. However, a longer substring that is not the leftmost match is not a match.
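Because grep writes whole lines, it does not show which substring was matched. One way to make the leftmost-longest rule visible is to use sed(1), which also uses BREs, to bracket the matched text. This is only a sketch, and the second input string is invented for the illustration.

```shell
# & in the sed replacement stands for the matched substring.
# Longest match: c.*b takes characters 2 through 5 of acbcbc, not just 2 through 3.
longest=$(printf 'acbcbc\n' | sed 's/c.*b/[&]/')

# Leftmost match: the earlier, shorter cb wins over the later, longer cbbb.
leftmost=$(printf 'cbxcbbb\n' | sed 's/cb*/[&]/')

printf '%s\n' "$longest" "$leftmost"
```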
A null pattern will match any character, so the command
grep '' test
matches all three lines.
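A quick way to confirm this is to count the selected lines with -c. The input below reproduces the three lines of the sample file test.

```shell
# The null pattern selects every line, so the count equals the number of lines.
selected=$(printf 'ab\nacbcbc\n12356\n' | grep -c '')
printf '%s\n' "$selected"
```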
A multicharacter collating element is considered a single character in the rules below that describe how to form a bracket expression, which matches a single character. However, when considering what the longest sequence is in a match involving a multicharacter collating element, the element counts not as one character but as the number of characters it matches.
Pattern matching can be done in a case-insensitive manner. Case-insensitive processing permits matching of multicharacter collating elements as well as characters. For example, in
grep -i '[[.Ch.]]' file
the RE [[.Ch.]] would match ch, Ch, cH, or CH. The notation is explained below.
Some utilities that use regular expressions, including grep, process a file line by line. A line ends with a newline character. In general (but not with grep) the newline character is regarded as an ordinary character and both a period and a nonmatching list can match one. (See discussion below.) Some utilities, including grep, do not allow newline characters in a pattern to be matched.
Basic regular expressions (BREs) are built by concatenating simpler BREs. BREs can be classified as those that can match a single character in the search string, and those that can match multiple characters.
The following BREs match a single character (or collating element):
An ordinary character, a special character preceded by a backslash, or a period (.), matches a single character. A bracket expression matches a single character or a single collating element. These terms are defined in the following sections.
BRE Ordinary Characters
Any character, except for those listed in the section ``BRE Special Characters,'' below, is an ordinary character and is a BRE that matches itself.
Except for the following, do not quote ordinary characters with a backslash (\):
The characters (, ), {, and }. The use of these characters quoted with backslashes is explained in the sections on subexpressions and interval expressions under the heading ``BREs Matching Multiple Characters,'' following.
The digits 1 to 9 inclusive. The use of these numerals quoted with backslashes is explained in the section on back-reference expressions under the heading ``BREs Matching Multiple Characters,'' below.
You cannot use a backslash to quote a character inside a bracket expression; inside a bracket expression a backslash is an ordinary character.
These characters, (, ), {, }, and 1 - 9 are considered ``ordinary characters'' (see next section) because they do not have to be quoted with a backslash to match themselves as do ``special characters.''
BRE Special Characters
Some characters have special meaning when used in a BRE in some contexts, defined next. Outside such contexts, or in the context but quoted with a preceding backslash, these characters have no special meaning, and each is a BRE that matches itself. The BRE special characters and contexts are:
The period, left bracket, and backslash are special except when used in a bracket expression (discussed below). A pattern containing a [ that is not preceded by a backslash and is not part of a bracket expression is not valid.
The asterisk is special except when used in a bracket expression, as the first character of a complete pattern (after an initial ^, if any), or as the first character of a subexpression (after an initial ^, if any).
The circumflex is special when used as an anchor or as the first character of a bracket expression. These concepts are explained below.
The dollar sign is special when used as an anchor.
Periods in BREs
A period (.), when used outside a bracket expression, is a BRE that matches any character.
BRE Bracket Expression
A non-null string enclosed in [ ] (brackets) is called a Bracket Expression. It is a BRE that matches any single character (or collating element) in the enclosed string. For example, using the sample file test described above, the command grep '[a3][c5]' test
outputs the second and third lines, acbcbc and 12356, because the two contiguous bracket expressions in the pattern match the substrings ac and 35 in those lines.
A bracket expression is either a matching list expression or a nonmatching list expression. It consists of one or more collating elements, collating symbols, equivalence classes, character classes or range expressions.
The right bracket (]) loses its special meaning and represents itself in a bracket expression if it occurs first in the list (after an initial circumflex (^), if any). Otherwise, it terminates the bracket expression, unless it appears in a collating symbol (such as [.].] ) or is the ending right bracket for a collating symbol, equivalence class, or character class. The special characters . * [ \ (period, asterisk, left bracket and backslash) lose their special meanings within a bracket expression.
The character sequences [., [=, and [: (left bracket followed by a period, equal sign, or colon) are special inside a bracket expression and are used to delimit collating symbols, equivalence class expressions and character class expressions. These symbols must be followed by a valid expression and the matching terminating sequence .], =], or :], as defined next.
The rules follow for creating and using matching and nonmatching list expressions, collating symbols, equivalence class expressions, character class expressions, and range expressions in bracket expressions.
A matching list expression, such as [a3] in the example above, specifies a list that matches any character or collating element in the list. The first character in the list cannot be a circumflex. [a3] matches either the character a or the character 3.
A nonmatching list expression begins with a circumflex (^), and specifies a list that matches any character or collating element except for the expressions in the list after the leading circumflex. For example, [^abc] is a BRE that matches any character or collating element except the characters a, b, or c. If the circumflex does not appear immediately following the left bracket, it loses its special meaning.
A collating symbol is a collating element enclosed within bracket-period ([. .]) delimiters. The concept is introduced above under the heading ``Regular Expression Concepts.''
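The bracket expression rules above can be tried against the sample file test. The sketch below also uses the character class [:alpha:], which is assumed here to be the standard POSIX class for letters; the extra one-off input lines are invented for the illustration.

```shell
# Recreate the sample file in a scratch directory.
dir=$(mktemp -d)
t="$dir/test"
printf 'ab\nacbcbc\n12356\n' > "$t"

pair=$(grep '[a3][c5]' "$t")        # matching lists: ac or 35 selects lines 2 and 3
other=$(grep '[^abc]' "$t")         # nonmatching list: a character other than a, b, c
letters=$(grep '[[:alpha:]]' "$t")  # character class inside a bracket expression

# ] placed first in the list is an ordinary member of the list.
bracket=$(printf 'x]y\nxy\n' | grep '[]]')
# ^ that is not first in the list is an ordinary member of the list.
caret=$(printf 'a^b\nab\n' | grep '[x^]')

printf '%s\n' "$pair" "$other" "$letters" "$bracket" "$caret"
```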
BREs Matching Multiple Characters
The rules above describe how to construct a BRE that matches a single character. In some of the examples above, patterns that match multiple characters were given based on the intuitive concept of concatenation. This, and the other rules used to build BREs which match multiple characters from BREs matching single characters, follow.
The concatenation of BREs matches the concatenation of the strings matched by each component of the BRE.
A subexpression can be defined within a BRE by enclosing it between the character pairs \( and \). Such a subexpression matches whatever it would have matched without the \( and \).
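A brief sketch of subexpressions follows. The last two commands also use the back-reference \1, which the section on ordinary characters refers to; its behavior here (matching the same text that the first subexpression matched) is assumed from the standard BRE definition rather than taken from this page. The one-off input lines are illustrative.

```shell
dir=$(mktemp -d)
t="$dir/test"
printf 'ab\nacbcbc\n12356\n' > "$t"

plain=$(grep 'bc' "$t")            # concatenation of the REs b and c
grouped=$(grep '\(bc\)' "$t")      # same match with the subexpression delimiters
twice=$(grep '\(bc\)\1' "$t")      # bc followed by whatever \(bc\) matched: bcbc
same_char=$(printf 'abba\nabcd\n' | grep '\(.\)\1')  # any character repeated: bb

printf '%s\n' "$plain" "$grouped" "$twice" "$same_char"
```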
BRE Expression Anchoring--Restricting What Patterns Match
A pattern (an entire BRE) can be restricted to match from the beginning of a line, restricted to match up to the end of the line, or restricted to match the entire line. This is done by anchoring the search pattern.
A ^ (circumflex) at the beginning of an expression or subexpression causes the pattern to match only a string that begins in the first character position on a line. For example, the pattern ^bc matches bc in the line bcdef but doesn't match bc in abcdef. The subexpression \(^bc\) also matches bcdef.
A $ (dollar sign) at the end of a pattern causes that pattern to match only if the last matched character is the last character (not including the newline character) on a line.
The construction ^pattern$ restricts the pattern to matching only an entire line. For example, the BRE ^abcd$ matches lines containing the string abcd, where a is the first character on the line and d the last.
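The three kinds of anchoring can be compared on a handful of lines. The input lines below extend the bcdef and abcdef examples from the text with two invented ones.

```shell
# Illustrative input: four lines that differ in where bc and cd occur.
lines() { printf 'bcdef\nabcdef\nabcd\nxabcd\n'; }

at_start=$(lines | grep '^bc')     # bc must begin the line
at_end=$(lines | grep 'cd$')       # cd must end the line
entire=$(lines | grep '^abcd$')    # the whole line must be abcd

printf '%s\n' "$at_start" "$at_end" "$entire"
```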
BRE Precedence
The order of precedence, from high to low, is as shown in the following table:
collation-related bracket symbols | [= =] [: :] [. .] |
escaped characters | \<special character> |
bracket expressions | [ ] |
subexpressions/back-references | \( \) \n |
single-character duplication | * \{i,j\} |
concatenation | |
anchoring | ^ $ |
Like BREs, extended regular expressions (EREs) are built by concatenating simpler EREs. EREs can be classified as those that can match a single character, and those that can match multiple characters.
An ERE ordinary character, an ERE special character preceded by a backslash, or a period matches a single character. A bracket expression matches a single character or a single collating element. An ERE matching a single character enclosed in parentheses (a group) matches the same strings as the ERE without parentheses.
ERE Ordinary Characters
Any character, except for special characters listed below, is an ordinary character and is an ERE that matches itself.
ERE Special Characters
Some characters have special meaning when used in an ERE in some contexts, defined next. Outside such contexts, or in the context but quoted with a preceding backslash, these characters have no special meaning, and each is an ERE that matches itself. The ERE special characters and contexts are:
The period, left bracket, backslash, and left parenthesis are special except when used in a bracket expression. Outside a bracket expression, do not use a left parenthesis, (, unless it is quoted with a backslash, \(.
The right parenthesis is special when matched with a preceding left parenthesis, outside a bracket expression. To search for the string (), use the quoted form \().
The asterisk, plus sign, question mark, and left brace are special except when used in a bracket expression. Outside of a bracket expression, it is invalid to use any of them as the first character in an ERE, or immediately following a vertical line, a circumflex, or a left parenthesis. It is invalid to use a left brace that is not part of an interval expression. (Of course, quoting with a backslash removes such invalidity.)
The vertical line is special except when used in a bracket expression. It is invalid to use a vertical line first or last in an ERE, or immediately following another vertical line or a left parenthesis, or immediately preceding a right parenthesis.
The circumflex is special when used as an anchor or as the first character of a bracket expression.
The dollar sign is special when used as an anchor.
Periods in EREs
A period (.), when used outside a bracket expression, is an ERE that matches any character.
ERE Bracket Expression
The rules for ERE Bracket Expressions are the same as for the BRE bracket expressions discussed above.
EREs Matching Multiple Characters
The rules above describe how to construct an ERE that matches a single character. The rules used to build EREs which match multiple characters from EREs matching single characters follow.
A concatenation of EREs matches the concatenation of the strings matched by each component of the ERE. A concatenation of EREs enclosed in parentheses matches whatever the concatenation without the parentheses matches. For example, both EREs ab and (ab) match the second and third characters of the string cabcdabc.
An ERE matching a single character or an ERE enclosed in parentheses followed by the special character plus sign (+) matches what one or more consecutive occurrences of the ERE would match. For example, the ERE (ab)a+ matches the second to sixth characters in the string cabaaabc and c(ab)+ matches the first to seventh characters in the string cabababc.
An ERE matching a single character or an ERE enclosed in parentheses followed by the special character asterisk (*) matches what zero or more consecutive occurrences of the ERE would match. For example, the ERE b*c matches the first character in the string cabbbcde, and the ERE c*de matches the second to sixth characters in the string dcccdec. The EREs [cd]+ and [cd][cd]* are equivalent and [cd]* and [cd][cd] are equivalent when matching the string cd.
An ERE matching a single character or an ERE enclosed in parentheses followed by the special character question mark (?) matches what zero or one consecutive occurrence of the ERE would match. For example, the ERE c?d matches the third character in the string abdbcccde.
An ERE matching a single character or an ERE enclosed in parentheses followed by an interval expression of the format {i}, {i,}, or {i,j} matches what repeated consecutive occurrences of the ERE would match. The rules for matching are the same as for BRE interval expressions (discussed above) except for the notational difference.
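The duplication operators can be exercised with grep -E. In this sketch the first input line is taken from the examples above, and the other two lines are invented so that each pattern selects exactly one line.

```shell
# Illustrative input lines.
lines() { printf 'cabaaabc\ncabc\nxyz\n'; }

plus=$(lines | grep -E '(ab)a+')         # ab followed by one or more a
group_plus=$(lines | grep -E 'c(ab)+c')  # one or more ab between two c
star=$(lines | grep -E 'ba*b')           # zero or more a between two b
optional=$(lines | grep -E 'xa?y')       # zero or one a between x and y
interval=$(lines | grep -E 'a{3}')       # exactly three consecutive a

printf '%s\n' "$plus" "$group_plus" "$star" "$optional" "$interval"
```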
ERE Alternation
If x and y are EREs, then x|y is an ERE that matches any string that is matched by either x or y. For example, the ERE ((cd)|e)b matches the string cdb and the string eb. Single characters, or expressions matching single characters, separated by the vertical bar and enclosed in parentheses, match a single character.
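Alternation can be tried with grep -E as follows. The strings cdb and eb come from the example above; the remaining input lines are invented to show non-matching cases.

```shell
# ((cd)|e)b selects lines containing cdb or eb, but not ab.
alt=$(printf 'cdb\neb\nab\n' | grep -E '((cd)|e)b')

# Single characters separated by a vertical bar inside parentheses.
single=$(printf 'ax\nbx\ncx\n' | grep -E '(a|b)x')

printf '%s\n' "$alt" "$single"
```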
ERE Expression Anchoring
ERE anchoring is the same as BRE anchoring, discussed above.
ERE Precedence
The order of precedence, from high to low, is as shown in the following table:
collation-related bracket symbols | [= =] [: :] [. .] |
escaped characters | \<special character> |
bracket expression | [ ] |
grouping | ( ) |
single-character duplication | * + ? {i,j} |
concatenation | |
anchoring | ^ $ |
alternation | | |
For example, the pattern ab|cd is the same as (ab)|(cd) and is not equivalent to a(b|c)d.
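The effect of precedence is easiest to see when the pattern must match the entire line, so this sketch combines -E with -x. The four input lines are invented for the illustration.

```shell
# Illustrative input lines.
lines() { printf 'ab\ncd\nabd\nacd\n'; }

loose=$(lines | grep -E -x 'ab|cd')         # alternation binds last: ab or cd
explicit=$(lines | grep -E -x '(ab)|(cd)')  # same pattern with explicit grouping
tight=$(lines | grep -E -x 'a(b|c)d')       # grouping changes the meaning: abd or acd

printf '%s\n' "$loose" "$explicit" "$tight"
```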
The exit values of the grep command are:
0  A match was found.
1  No match was found.
>1  A syntax error was found or a file was inaccessible, even if matches were found.
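In a Bourne-type shell the exit value is available in $? after the command completes. The sketch below captures it for each of the three cases, using -q to suppress normal output and -s to suppress the message about the missing file. The file names are illustrative, and the value for the error case is only assumed to be greater than 1.

```shell
dir=$(mktemp -d)
printf 'ab\n' > "$dir/f"

found=0;   grep -q 'b' "$dir/f" || found=$?                   # a match exists
missing=0; grep -q 'z' "$dir/f" || missing=$?                 # no line matches
failed=0;  grep -q -s 'b' "$dir/no-such-file" || failed=$?    # file cannot be read

printf 'found=%s missing=%s failed=%s\n' "$found" "$missing" "$failed"
```

A script can branch on these values directly, for example with if grep -q pattern file.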
To search several C-language source files for the pattern strcpy, enter:
grep 'strcpy' *.c
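To show what such a search writes, the following sketch creates three small files and runs the same command, along with the -l and -h variants. The file names and contents are invented, and -h is marked [Compaq] above, so its availability on other systems is an assumption.

```shell
# Scratch directory with three illustrative source files.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'strcpy(a, b);\nreturn 0;\n' > one.c
printf 'puts(s);\n' > two.c
printf 'strcpy(c, d);\n' > three.c

matches=$(grep 'strcpy' *.c)     # several files: each line is prefixed with its file name
names=$(grep -l 'strcpy' *.c)    # only the names of files that contain a match
bare=$(grep -h 'strcpy' *.c)     # matching lines without the file name prefix

printf '%s\n' "$matches" "$names" "$bare"
```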
The following environment variables affect the execution of grep, egrep, and fgrep:
LANG  Provides a default value for the internationalization variables that are unset or null. If LANG is unset or null, the corresponding value from the default locale is used. If any of the internationalization variables contain an invalid setting, the utility behaves as if none of the variables had been defined.
LC_ALL  If set to any string value, overrides the values of all the other internationalization variables.
LC_CTYPE  Determines the locale for the interpretation of sequences of bytes of text data as characters (for example, single-byte as opposed to multibyte characters in arguments and input files) and the behavior of character classes within regular expressions.
LC_MESSAGES  Determines the locale for the format and contents of diagnostic messages written to standard error.
NLSPATH  Determines the location of message catalogues for the processing of LC_MESSAGES.
Commands: ed(1), ex(1), ksh(1), sed(1), Bourne shell sh(1b), POSIX shell sh(1p)
Standards: standards(5)