Man page of grep

grep

Section: User Commands (1)
 

NAME

grep, egrep, fgrep - Searches a file for patterns  

SYNOPSIS

grep [-E|-F] [-c|-l|-q] [-bhinsvwxy] [-pparagraph_separator] -e pattern_list [-e pattern_list]... [-f pattern_file]... [file...]

grep [-E|-F] [-c|-l|-q] [-bhinsvwxy] [-pparagraph_separator] [-e pattern_list]... -f pattern_file [-f pattern_file]... [file...]

grep [-E|-F] [-c|-l|-q] [-bhinsvwxy] [-pparagraph_separator] pattern_list [file...]

The commands grep -E and grep -F are equivalent to the obsolescent commands egrep and fgrep, respectively.

The grep command searches the specified files (standard input by default) for lines containing characters that match the specified patterns, and then writes matching lines to standard output.
 

STANDARDS

Interfaces documented on this reference page conform to industry standards as follows:

grep:  XPG4, XPG4-UNIX

egrep:  XPG4, XPG4-UNIX

fgrep:  XPG4, XPG4-UNIX

Refer to the standards(5) reference page for more information about industry standards and associated tags.
 

OPTIONS

Although most options can be combined, some combinations result in one option overriding another. For example, if you specify -n and -l, the output includes file names only (as specified by -l) and thus does not include line numbers (as specified by -n).

-E  Treats patterns as extended regular expressions and is equivalent to the obsolescent egrep command.

-F  Treats patterns as fixed strings and is equivalent to the obsolescent fgrep command.

-b  [Compaq]  Precedes each line by the block number on which it was found. Use this option to help find disk block numbers by context.

-c  Displays only a count of matching lines.

-e pattern_list  Specifies one or more patterns to match. If more than one pattern is specified in pattern_list, the patterns must be separated by newline characters. The -e option is useful for specifying a pattern that begins with a - (dash).

-f pattern_file  Specifies a file that contains patterns to match, one per line.

-h  [Compaq]  Suppresses reporting of file names when multiple files are processed. That is, it prevents the name of the file containing the matching line from being prefixed to that line.

-i  Ignores the case of letters during pattern matching; that is, uppercase and lowercase in the input are considered to be identical.

-l  Lists only the name of each file containing matched lines. Each file name is listed only once; file names are separated by newline characters. The grep command returns (standard input) (or the local equivalent) in place of a file name if -l is specified with standard input.

-n  Precedes each line with its relative line number in the file.

-pparagraph_separator  [Compaq]  Displays the entire paragraph containing matched lines. Paragraphs are delimited by paragraph separators, paragraph_separator, which are patterns in the same form as the search pattern. Lines containing the paragraph separators are used only as separators; they are never included in the output. The default paragraph separator is a blank line.

-q  Suppresses all output except error messages. This is useful for checking status.

-s  Suppresses error messages arising from non-existent or unreadable files. Other error messages are still displayed.

-v  Displays all lines except those that match the specified pattern. Useful for filtering unwanted lines out of a file.

-w  [Compaq]  Matches only if the expression is found as a separate word in the text. A word is any string of alphanumeric characters (letters, numerals, and underscores) delimited by nonalphanumeric characters (punctuation or white space) or by the beginning or end of the line. See ex(1).

-x  Displays a line only if the pattern matches the entire line.

-y  [Compaq]  Same as the -i option.
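The following sketch exercises several of the options described above. The file names fruits.txt and veg.txt and their contents are made up for illustration, and the [Compaq] options are left out because they may not be available on other systems.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
tmp=$(mktemp -d)
printf 'apple\nBanana\ncherry\n' > "$tmp/fruits.txt"   # hypothetical sample data
printf 'carrot\nleek\n' > "$tmp/veg.txt"               # hypothetical sample data

count=$(grep -c 'an' "$tmp/fruits.txt")        # -c: count of matching lines
numbered=$(grep -n 'e' "$tmp/fruits.txt")      # -n: matching lines with line numbers
nocase=$(grep -i 'banana' "$tmp/fruits.txt")   # -i: ignore case
inverted=$(grep -v 'a' "$tmp/fruits.txt")      # -v: lines that do not match
whole=$(grep -x 'apple' "$tmp/fruits.txt")     # -x: pattern must match the entire line
names=$(cd "$tmp" && grep -l 'rr' fruits.txt veg.txt)   # -l: names of files with a match

printf '%s\n' "$count" "$numbered" "$nocase" "$inverted" "$whole" "$names"
rm -rf "$tmp"
```

Each result is captured in a shell variable only to keep the sketch easy to inspect; in interactive use the commands would normally write straight to the terminal.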
 

OPERANDS

pattern_list  Specifies one or more patterns to be used during the search for input. This operand is treated as if it were specified as -e pattern_list.

file  A path name of a file to be searched for the patterns. If no file operands are specified, the standard input is used.
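As a rough illustration of the difference between giving patterns as an operand, with -e, and with -f, consider the sketch below. The files notes.txt and patterns.txt are hypothetical and exist only for this example.

```shell
tmp=$(mktemp -d)
# Hypothetical input file; the first line begins with a dash on purpose.
printf '%s\n' '-v flag' 'plain text' 'other line' > "$tmp/notes.txt"
# Hypothetical pattern file, one pattern per line.
printf '%s\n' 'plain' 'other' > "$tmp/patterns.txt"

# -e lets the pattern start with a dash without being taken for an option.
dash_hit=$(grep -e '-v' "$tmp/notes.txt")

# Several -e options: a line is selected if it matches any of the patterns.
either=$(grep -e 'plain' -e 'flag' "$tmp/notes.txt")

# -f reads the patterns from a file instead of the command line.
file_hits=$(grep -f "$tmp/patterns.txt" "$tmp/notes.txt")

printf '%s\n' "$dash_hit" "$either" "$file_hits"
rm -rf "$tmp"
```

Matching lines are written in the order in which they appear in the input, not in the order of the patterns.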
 

DESCRIPTION

By default, the grep command treats a pattern as a basic regular expression (BRE). With the -E option, the pattern is treated as an extended regular expression (ERE). With the -F option, the pattern is considered a fixed string. See the following discussion of regular expressions.

In the output of the grep command, a matched line is preceded with the name of the file in which it was found if you specify more than one file (except when the -h option is specified).

[Compaq]  You are strongly encouraged to single quote patterns to protect them from unwanted shell substitutions. In some cases, such as in multiline pattern lists and subexpressions, quoting is essential. When using the C shell interactively, you must enter a backslash before terminating a line in a multiline pattern.

[Compaq]  Running grep on a file that is not a text file (for example, an .o file) produces unpredictable results and is discouraged.
 

REGULAR EXPRESSIONS

Regular expressions (RE's) provide a powerful way to specify patterns to search for in text files (or in the standard input). This section explains the rules for constructing such patterns.

On Tru64 UNIX (and XPG4 conforming systems) there are two standard types of REs, and thus two sets of rules for building patterns. A regular expression built by using these rules is termed either a basic regular expression (BRE) or an extended regular expression (ERE). There is much in common between BREs and EREs, but there are important differences as well.

A variety of commands and utilities use one or the other type of RE, or both. Thus the rules described below are applicable in many contexts. Nonetheless, the grep command is used illustratively here.

The term regular expression, or RE, is used when there is no need to distinguish between BREs and EREs. The terms pattern and regular expression can be used interchangeably. The term match is used to describe a string in a file (or standard input) that is successfully specified by a pattern or RE. A pattern or an RE may also be referred to as a string. The matched string might also be termed a substring or a sequence (of characters).

Simple REs match a single character. More complex REs are built from other REs as explained in the rules below. REs are defined recursively; for example, if you concatenate two REs, the resultant string is an RE.
 

Regular Expression Concepts

The concept of a character is generalized to the concept of a collating element. For many purposes, especially in English-speaking locales, the term collating element may be considered synonymous with character. Collating elements are relevant to bracket expressions, and are discussed in the following sections.

A collating element is the smallest unit used to determine how to order characters. Collating elements are necessary for languages that treat some strings as individual units when sorting. For example, in Spanish, the strings ch and ll are each collating elements (that is, the Spanish primary sort order is a, b, c, ch, d,...,k, l, ll, m,...).

As an example, suppose we have a file test that contains these three lines:

ab
acbcbc
12356

The command grep 'b' test results in this output:

ab
acbcbc

because the RE b, the pattern, matches the letter b in the first and second lines of the file, and there is no b in the third line. The RE c would match just the second line. The RE bc, built by concatenating the prior two REs, would match just the second line.
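The sample file and the three patterns just discussed can be reproduced with the following sketch. The scratch directory is an assumption made only so the example does not touch existing files.

```shell
tmp=$(mktemp -d)
# Recreate the three-line sample file named test.
printf 'ab\nacbcbc\n12356\n' > "$tmp/test"

out_b=$(grep 'b' "$tmp/test")     # lines containing b
out_c=$(grep 'c' "$tmp/test")     # lines containing c
out_bc=$(grep 'bc' "$tmp/test")   # lines containing b immediately followed by c

printf '%s\n' "$out_b" "$out_c" "$out_bc"
rm -rf "$tmp"
```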

There are two instances of bc in the second line, so the pattern matches the line. However, in using some of the rules that build REs, it is important to understand exactly what substrings are matched by a pattern.

Those rules are given in the following sections, but for illustration, consider the RE c.*b. This pattern means match a string beginning with c, ending with b, and with any number of characters between, including none. Thus this pattern matches lines containing cb, cxb, and canythingb, for example.

The search for a match starts at the beginning of a string and stops when the first sequence matching the pattern is found scanning from left to right. If there is more than one possible leftmost match, the longest match is used. For example, in the file test above, the pattern c.*b matches the second through third characters of the second line, and also the second through the fifth characters. The latter, being the longer, is the actual match. However, a longer substring that is not the leftmost match is not a match.
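Because grep prints whole lines, it does not show which substring was matched. One way to make the leftmost-longest rule visible is to use sed(1), which also uses BREs, and wrap the matched text in brackets. This is only a sketch; the second input string is made up for illustration.

```shell
# c.*b can match "cb" or "cbcb" starting at the second character;
# the longer candidate is the one that is used.
longest=$(printf 'acbcbc\n' | sed 's/c.*b/[&]/')

# The leftmost match wins even when a longer match exists further right:
# "ab" starts earlier than "aab", so "ab" is the match.
leftmost=$(printf 'xab_aab\n' | sed 's/a*b/[&]/')

printf '%s\n' "$longest" "$leftmost"
```

In the replacement text, & stands for the matched string, so the brackets mark exactly the characters that the pattern matched.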

A null pattern will match any character, so the command

grep '' test

matches all three lines.

A multicharacter collating element is considered a single character in the rules below that describe how to form a bracket expression, which matches a single character. However, when considering what the longest sequence is in a match involving a multicharacter collating element, the element counts not as one character but as the number of characters it matches.

Pattern matching can be done in a case-insensitive manner. Case-insensitive processing permits matching of multicharacter collating elements as well as characters. For example, in the command

grep -i '[[.Ch.]]' file

the RE [[.Ch.]] would match ch, Ch, cH, or CH. The notation is explained below.

Some utilities that use regular expressions, including grep, process a file line by line. A line ends with a newline character. In general (but not with grep) the newline character is regarded as an ordinary character, and both a period and a nonmatching list can match one. (See discussion below.) Some utilities, including grep, do not allow newline characters in a pattern to be matched.
 

Basic Regular Expressions

Basic regular expressions (BREs) are built by concatenating simpler BREs. BREs can be classified as those that can match a single character in the search string, and those that can match multiple characters.

The following BREs match a single character (or collating element):

An ordinary character, a special character preceded by a backslash, or a period (.), matches a single character. A bracket expression matches a single character or a single collating element. These terms are defined in the following sections.

BRE Ordinary Characters

Any character, except for those listed in the section ``BRE Special Characters,'' below, is an ordinary character and is a BRE that matches itself.

Except for the following, do not quote ordinary characters with a backslash (\):

The characters (, ), { and }. The use of these characters quoted with backslashes is explained in the sections on subexpressions and interval expressions under the heading ``BREs Matching Multiple Characters,'' following.

The digits 1 to 9 inclusive. The use of these numerals quoted with backslashes is explained in the section on back-reference expressions under the heading ``BREs Matching Multiple Characters,'' below.

You cannot use a backslash to quote a character inside a bracket expression; inside a bracket expression a backslash is an ordinary character.

These characters, (, ), {, }, and 1 - 9 are considered ``ordinary characters'' (see next section) because they do not have to be quoted with a backslash to match themselves as do ``special characters.''

BRE Special Characters

Some characters have special meaning when used in a BRE in some contexts, defined next. Outside such contexts, or in the context but quoted with a preceding backslash, these characters have no special meaning, and each is a BRE that matches itself. The BRE special characters and contexts are:

The period, left bracket, and backslash are special except when used in a bracket expression (discussed below). A pattern containing a [ that is not preceded by a backslash and is not part of a bracket expression is not valid.

The asterisk is special except when used in a bracket expression, as the first character of a complete pattern (after an initial ^, if any), or as the first character of a subexpression (after an initial ^, if any).

The circumflex is special when used as an anchor or as the first character of a bracket expression. These concepts are explained below.

The dollar sign is special when used as an anchor.

Periods in BREs

A period (.), when used outside a bracket expression, is a BRE that matches any character.

BRE Bracket Expression

A non-null string enclosed in [ ] (brackets) is called a bracket expression. It is a BRE that matches any single character (or collating element) in the enclosed string. For example, using the sample file test described above, the command

grep '[a3][c5]' test

outputs the second and third lines, acbcbc and 12356, because the two contiguous bracket expressions in the pattern match the substrings ac and 35 in those lines.
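A minimal sketch of this example follows. It recreates the sample file in a scratch directory (an assumption of the sketch) and also shows a single bracket expression for comparison.

```shell
tmp=$(mktemp -d)
printf 'ab\nacbcbc\n12356\n' > "$tmp/test"

# Two adjacent bracket expressions: (a or 3) followed by (c or 5).
pair=$(grep '[a3][c5]' "$tmp/test")

# A single bracket expression: any line containing a or 3.
single=$(grep '[a3]' "$tmp/test")

printf '%s\n' "$pair" "$single"
rm -rf "$tmp"
```

The first line, ab, is selected by [a3] alone but not by [a3][c5], because its a is followed by b rather than by c or 5.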

A bracket expression is either a matching list expression or a nonmatching list expression. It consists of one or more collating elements, collating symbols, equivalence classes, character classes or range expressions.

The right bracket (]) loses its special meaning and represents itself in a bracket expression if it occurs first in the list (after an initial circumflex (^), if any). Otherwise, it terminates the bracket expression, unless it appears in a collating symbol (such as [.].] ) or is the ending right bracket for a collating symbol, equivalence class, or character class. The special characters . * [ \ (period, asterisk, left bracket and backslash) lose their special meanings within a bracket expression.

The character sequences [., [=, and [: (left bracket followed by a period, equal sign, or colon) are special inside a bracket expression and are used to delimit collating symbols, equivalence class expressions and character class expressions. These symbols must be followed by a valid expression and the matching terminating sequence .], =], or :], as defined next.

The rules for creating and using matching and nonmatching list expressions, collating symbols, equivalence class expressions, character class expressions, and range expressions in bracket expressions follow.

A matching list expression, such as [a3] in the example above, specifies a list that matches any character or collating element in the list. The first character in the list cannot be a circumflex. [a3] matches either the character a or the character 3.

A nonmatching list expression begins with a circumflex (^), and specifies a list that matches any character or collating element except for those in the list after the leading circumflex. For example, [^abc] is a BRE that matches any character or collating element except the characters a, b or c. If the circumflex does not appear immediately following the left bracket, it loses its special meaning.

A collating symbol is a collating element enclosed within bracket-period ([. .]) delimiters. The concept is introduced above under the heading ``Regular Expression Concepts.''

Multicharacter collating elements are represented as collating symbols to distinguish them from the individual characters in the collating symbol. For example, when using Spanish collation rules, [[.ch.]] is treated as a BRE matching the sequence ch, while [ch] is treated as a BRE matching c or h. In addition, [a-[.ch.]] matches a, b, c, and ch. (See range expressions, below.) Collating symbols are valid only inside bracket expressions.

An equivalence class expression specifies a set of collating elements that all sort to the same primary location. An equivalence class is enclosed in bracket-equal ([= =]) delimiters.

An equivalence class generally is designed to deal with primary-secondary sorting; that is, for languages like French that define groups of characters as sorting to the same primary location and then apply a tie-breaking, secondary sort.

For example, if x, y, and z are collating elements that belong to the same equivalence class, then the bracket expressions [[=x=]a], [[=y=]a], and [[=z=]a] are equivalent to [xyza]. (Here we use x, y, and z as variables representing characters in the same equivalence class; in a typical example, x might be the collating element e, and y and z the characters e with an acute accent and e with a grave accent.) If the collating element within [= =] delimiters does not belong to an equivalence class, the equivalence class expression is treated as a collating symbol, that is, the delimiters are ignored.

A character class expression enclosed in bracket-colon ([: :]) delimiters matches any of the set of characters in the named class. Members of each of the sets are determined by the current setting of the LC_CTYPE environment variable. The supported classes are: alpha, upper, lower, digit, alnum, xdigit, space, print, punct, graph, and cntrl. Here is an example of how to specify one of these classes:

[[:lower:]]

This matches any single lowercase character for the current locale.

A range expression represents the set of collating elements that fall between two elements in the current collation sequence, inclusively. It is expressed as starting and ending points separated by a hyphen (-). For example, the BRE 1[a-d]2, which includes the bracket expression [a-d], containing the range expression a-d, represents a pattern that will match any of these strings: 1a2, 1b2, 1c2, and 1d2.

Range expressions should not be used in portable applications because their behavior depends on collating sequences.

A construction such as [a-d-g] is invalid.

The hyphen character loses its special meaning in a bracket expression if it occurs first (after an initial ^, if any) or last, or as an ending range point in a range expression. For example, the expressions [-df] and [df-] are equivalent and match any of the characters d, f, or -. The expressions [^-df] and [^df-] are equivalent and match any characters except d, f and -; the expression [&--] matches any character between & and - inclusive; the expression [--;] matches any of the characters between - and ; inclusive; and the expression [A--] is invalid, because A follows - in the collation sequence. A hyphen or right bracket may be represented as collating symbols, [.-.] or [.].], anywhere in a bracket expression. Otherwise, if both - and ] are required in a bracket expression, the right bracket must be first (after an optional initial ^) and the hyphen last.
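The bracket expression rules can be tried out with a sketch like the one below. The sample lines are invented for illustration, and LC_ALL=C is set as an assumption so that the range a-d follows plain ASCII order regardless of the caller's locale.

```shell
LC_ALL=C; export LC_ALL          # assumption: fix the locale so ranges are predictable
tmp=$(mktemp -d)
printf '%s\n' '1a2' '1e2' 'ABC' 'a-b' 'x]y' > "$tmp/sample"   # hypothetical data

range=$(grep '1[a-d]2' "$tmp/sample")                # range expression a-d
upper=$(grep '[[:upper:]]' "$tmp/sample")            # character class expression
nondigit=$(grep -c '^[^[:digit:]]' "$tmp/sample")    # nonmatching list at line start
punct=$(grep '[]-]' "$tmp/sample")                   # ] first and - last are both literal

printf '%s\n' "$range" "$upper" "$nondigit" "$punct"
rm -rf "$tmp"
```

The last pattern follows the rule that the right bracket must come first and the hyphen last when both are wanted as ordinary members of the list.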

BREs Matching Multiple Characters

The rules above describe how to construct a BRE that matches a single character. In some of the examples above, patterns that match multiple characters were given based on the intuitive concept of concatenation. This, and the other rules used to build BREs which match multiple characters from BREs matching single characters, follow.

The concatenation of BREs matches the concatenation of the strings matched by each component of the BRE.

A subexpression can be defined within a BRE by enclosing it between the character pairs \( and \). Such a subexpression matches whatever it would have matched without the \( and \).

Up to nine subexpressions are saved into numbered holding spaces. Counting from left to right on the line, the first pattern saved is placed in the first holding space, the second pattern is placed in the second holding space, and so on.

The character sequence \n, called a back-reference expression, matches the nth saved pattern, which is in the nth holding space. (The value of n is a digit, 1-9.) Thus, the pattern:

\(a\)\(b\)c\2\1

matches the string abcba. You can nest patterns to be saved in holding spaces. Whether the enclosed patterns are nested or are in a series, \n refers to the nth occurrence, counting from the left, of the delimiting characters \). In utilities that have replacement as well as search patterns, you can use \n expressions in the replacement strings as well as in the search patterns.

A back-reference expression is invalid if fewer than n subexpressions precede the \n. Finally, any number of subexpressions are allowed in a search pattern even though the number of back-reference expressions is limited to nine.

If a BRE x matches a single character, or is a subexpression or a back-reference, then the pattern x* (x followed by an asterisk) matches zero or more occurrences of the character that the BRE x matches. For example, this pattern:

ab*cd

matches each of these strings:

acd
abcd
abbcd
abbbcd

but not this string:

abd

A BRE that matches a single character, or that is a subexpression or a back-reference, followed by an interval expression of the format \{i\}, \{i,\} or \{i,j\}, matches what repeated consecutive occurrences of the BRE would match. Such a BRE followed by:

\{i\}  matches exactly i occurrences of the character matched by the BRE

\{i,\}  matches at least i occurrences of the character matched by the BRE

\{i,j\}  matches any number of occurrences of the character matched by the BRE from i to j, inclusive.

The values of i and j must be integers in the range 0 <= i <= j <= 255. Whenever a choice exists, the pattern matches as many occurrences as possible.
Note that if i is 0 (zero), the interval expression is equivalent to the null BRE.
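The back-reference, asterisk, and interval rules can be checked with the sketch below. The input strings other than those quoted in the text above are made up for illustration.

```shell
# Back-references: \2 repeats what the second subexpression matched (b),
# \1 what the first matched (a), so the pattern needs the substring abcba.
backref=$(printf '%s\n' 'abcba' 'abcab' | grep '\(a\)\(b\)c\2\1')

# Asterisk: zero or more b characters between a and cd.
star=$(printf '%s\n' 'acd' 'abcd' 'abbcd' 'abbbcd' 'abd' | grep -c 'ab*cd')

# Interval expression: two or three a characters between x and y.
interval=$(printf '%s\n' 'xay' 'xaay' 'xaaay' 'xaaaay' | grep 'xa\{2,3\}y')

printf '%s\n' "$backref" "$star" "$interval"
```

The string abd is the only one of the five that ab*cd rejects, so the count is four.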

BRE Expression Anchoring--Restricting What Patterns Match

A pattern (an entire BRE) can be restricted to match from the beginning of a line, restricted to match up to the end of the line, or restricted to match the entire line. This is done by anchoring the search pattern.

A ^ (circumflex) at the beginning of an expression or subexpression causes the pattern to match only a string that begins in the first character position on a line. For example, the pattern ^bc matches bc in the line bcdef but doesn't match bc in abcdef. The subexpression \(^bc\) also matches bcdef.

A $ (dollar sign) at the end of a pattern causes that pattern to match only if the last matched character is the last character (not including the newline character) on a line.

The construction ^pattern$ restricts the pattern to matching only an entire line. For example, the BRE ^abcd$ matches lines containing the string abcd, where a is the first character on the line and d the last.
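A short sketch of the three anchoring forms follows; the file name lines and its contents are hypothetical.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'bcdef' 'abcdef' 'abcd' 'xabcd' > "$tmp/lines"   # hypothetical data

at_start=$(grep '^bc' "$tmp/lines")      # bc must begin the line
at_end=$(grep 'cd$' "$tmp/lines")        # cd must end the line
entire=$(grep '^abcd$' "$tmp/lines")     # the line must be exactly abcd

printf '%s\n' "$at_start" "$at_end" "$entire"
rm -rf "$tmp"
```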

BRE Precedence

The order of precedence, from high to low, is as shown in the following table:

collation-related bracket symbols   [= =]  [: :]  [. .]
escaped characters                  \<special character>
bracket expressions                 [ ]
subexpressions/back-references      \( \)  \n
single-character duplication        *  \{i,j\}
concatenation
anchoring                           ^  $


 

Extended Regular Expressions

Like BREs, extended regular expressions (EREs) are built by concatenating simpler EREs. EREs can be classified as those that can match a single character, and those that can match multiple characters.

An ERE ordinary character, an ERE special character preceded by a backslash, or a period matches a single character. A bracket expression matches a single character or a single collating element. An ERE matching a single character enclosed in parentheses (a group) matches the same strings as the ERE without parentheses.

ERE Ordinary Characters

Any character, except for special characters listed below, is an ordinary character and is an ERE that matches itself.

ERE Special Characters

Some characters have special meaning when used in an ERE in some contexts, defined next. Outside such contexts, or in the context but quoted with a preceding backslash, these characters have no special meaning, and each is an ERE that matches itself. The ERE special characters and contexts are:

The period, left bracket, backslash and left parenthesis are special except when used in a bracket expression. Outside a bracket expression, do not use a left parenthesis, (, unless it is quoted with a backslash, \(.

The right parenthesis is special when matched with a preceding left parenthesis, outside a bracket expression. To search for the string (), use the quoted form \().

The asterisk, plus sign, question mark, and left brace are special except when used in a bracket expression. Outside of a bracket expression, it is invalid to use any of them as the first character in an ERE, or immediately following a vertical line, a circumflex, or a left parenthesis. It is invalid to use a left brace that is not part of an interval expression. (Of course, quoting with a backslash removes such invalidity.)

The vertical line is special except when used in a bracket expression. It is invalid to use a vertical line first or last in an ERE, or immediately following another vertical line or a left parenthesis, or immediately preceding a right parenthesis.

The circumflex is special when used as an anchor or as the first character of a bracket expression.

The dollar sign is special when used as an anchor.

Periods in EREs

A period (.), when used outside a bracket expression, is an ERE that matches any character.

ERE Bracket Expression

The rules for ERE Bracket Expressions are the same as for the BRE bracket expressions discussed above.

EREs Matching Multiple Characters

The rules above describe how to construct an ERE that matches a single character. The rules used to build EREs which match multiple characters from EREs matching single characters follow.

A concatenation of EREs matches the concatenation of the strings matched by each component of the ERE. A concatenation of EREs enclosed in parentheses matches whatever the concatenation without the parentheses matches. For example, both EREs ab and (ab) match the second and third characters of the string cabcdabc.

An ERE matching a single character or an ERE enclosed in parentheses followed by the special character plus sign (+) matches what one or more consecutive occurrences of the ERE would match. For example, the ERE (ab)a+ matches the second to sixth characters in the string cabaaabc and c(ab)+ matches the first to seventh characters in the string cabababc.

An ERE matching a single character or an ERE enclosed in parentheses followed by the special character asterisk (*) matches what zero or more consecutive occurrences of the ERE would match. For example, the ERE b*c matches the first character in the string cabbbcde, and the ERE c*de matches the second to sixth characters in the string dcccdec. The EREs [cd]+ and [cd][cd]* are equivalent and [cd]* and [cd][cd] are equivalent when matching the string cd.

An ERE matching a single character or an ERE enclosed in parentheses followed by the special character question mark (?) matches what zero or one consecutive occurrence of the ERE would match. For example, the ERE c?d matches the third character in the string abdbcccde.

An ERE matching a single character or an ERE enclosed in parentheses followed by an interval expression of the format {i}, {i,}, or {i,j}, matches what repeated consecutive occurrences of the ERE would match. The rules for matching are the same as for BRE interval expressions (discussed above) except for the notational difference.

For example, the ERE d{3} matches characters 8 through 10 in the string abcbcbcddddde and the ERE (bc){2,} matches characters 2 through 7.
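The ERE duplication operators can be exercised with grep -E as in the sketch below. Since grep reports whole lines rather than character positions, each pattern is paired with one line that should match and at least one invented line that should not.

```shell
# +: ab followed by at least one a.
plus=$(printf '%s\n' 'cabaaabc' 'cabc' | grep -E '(ab)a+')

# *: zero or more c characters before de; -x requires a whole-line match.
star=$(printf '%s\n' 'de' 'cccde' 'ce' | grep -E -x 'c*de')

# ?: the b is optional.
optional=$(printf '%s\n' 'ac' 'abc' 'abbc' | grep -E -x 'ab?c')

# Interval expressions use plain braces in an ERE.
three_d=$(printf '%s\n' 'abcbcbcddddde' 'abcdd' | grep -E 'd{3}')
two_bc=$(printf '%s\n' 'abcbcbcddddde' 'abcdd' | grep -E '(bc){2,}')

printf '%s\n' "$plus" "$star" "$optional" "$three_d" "$two_bc"
```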

ERE Alternation

If x and y are EREs, then x|y is an ERE that matches any string that is matched by either x or y. For example, the ERE ((cd)|e)b matches the string cdb and the string eb. Single characters, or expressions matching single characters, separated by the vertical bar and enclosed in parentheses, match a single character.

ERE Expression Anchoring

ERE anchoring is the same as BRE anchoring, discussed above.

ERE Precedence

The order of precedence, from high to low, is as shown in the following table:

collation-related bracket symbols   [= =]  [: :]  [. .]
escaped characters                  \<special character>
bracket expression                  [ ]
grouping                            ( )
single-character duplication        *  +  ?  {i,j}
concatenation
anchoring                           ^  $
alternation                         |

For example, the pattern ab|cd is the same as (ab)|(cd) and is not equivalent to a(b|c)d.
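The effect of precedence on alternation can be seen in the following sketch, which uses the -x option so that only whole-line matches count. The input lines are made up for illustration.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'ab' 'cd' 'abd' 'acd' 'cdb' 'eb' > "$tmp/words"   # hypothetical data

# Concatenation binds tighter than |, so this means (ab)|(cd).
ungrouped=$(grep -E -x 'ab|cd' "$tmp/words")

# Parentheses limit the alternation to the middle character.
grouped=$(grep -E -x 'a(b|c)d' "$tmp/words")

# Nested groups: either cd or e, followed by b.
nested=$(grep -E -x '((cd)|e)b' "$tmp/words")

printf '%s\n' "$ungrouped" "$grouped" "$nested"
rm -rf "$tmp"
```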
 

EXIT STATUS

The exit values of the grep command are:

0  A match was found.

1  No match was found.

>1  A syntax error was found or a file was inaccessible, even if matches were found.
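A script can test these exit values directly, typically together with -q so that no lines are written. The sketch below assumes a scratch file named words and a path, no-such-file, that does not exist.

```shell
tmp=$(mktemp -d)
printf 'alpha\nbeta\n' > "$tmp/words"   # hypothetical data

grep -q 'alpha' "$tmp/words"
found=$?        # a match was found

grep -q 'gamma' "$tmp/words"
missing=$?      # no match was found

# -s hides the message about the unreadable file; the status still reports it.
grep -q -s 'alpha' "$tmp/no-such-file"
failed=$?       # the file was inaccessible

printf '%s\n' "$found" "$missing" "$failed"
rm -rf "$tmp"
```

The value saved in failed is only guaranteed to be greater than 1, so scripts should compare against that threshold rather than against a specific number.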
 

EXAMPLES

To search several C-language source files for the pattern strcpy, enter:

grep 'strcpy' *.c

This searches for the string strcpy in all files in the current directory with names ending in .c.

To count the number of lines that match a pattern, enter:

grep -F -c '{' pgm.c
grep -F -c '}' pgm.c

This displays the number of lines in pgm.c that contain left and right braces.
If you do not put more than one { or } on a line in your C programs, and if the braces are properly balanced, then the two numbers displayed will be the same. If the numbers are not the same, then you can display the lines that contain braces with the command:

grep -n -E '\{|}' pgm.c

To display all lines in a file that begin with an ASCII letter, enter:

grep '^[a-zA-Z]' pgm.s

Note that because grep -F searches only for fixed strings and does not use regular expressions such as bracket expressions or anchoring, the following command causes grep to search only for the literal string ^[a-zA-Z] in pgm.s:

grep -F '^[a-zA-Z]' pgm.s

To display all lines that contain ASCII letters in parentheses or digits in parentheses (with spaces optionally preceding and following the letters or digits), but not letter-digit combinations in parentheses, enter:

grep -E '\( *([a-zA-Z]*|[0-9]*) *\)' my.txt

This command displays lines in my.txt such as ( 783902) or (y), but not (alpha19c).
Note that with grep -E, \( and \) match parentheses in the text and ( and ) are special characters that group parts of the pattern. With grep without the -E option, the reverse is true; use ( and ) to match parentheses and \( and \) to group characters.

To display all lines that do not match a pattern, enter:

grep -v '^#'

This displays all lines that do not begin with a # (number sign).

To display the names of files that contain a pattern, enter:

grep -F -l 'rose' *.list

This searches the files in the current directory that end with .list and displays the names of those files that contain at least one line containing the string rose.

To display all lines that contain uppercase characters, enter:

grep '[[:upper:]]' pgm.s

To display all lines that begin with a range of characters that includes a multicharacter collating symbol, enter:

grep '^[a-[.ch.]]' pgm.s

With your locale set to a Spanish locale, this command matches all lines that begin with a, b, c, or ch.
 

ENVIRONMENT VARIABLES

The following environment variables affect the execution of grep, egrep, and fgrep:

LANG  Provides a default value for the internationalization variables that are unset or null. If LANG is unset or null, the corresponding value from the default locale is used. If any of the internationalization variables contain an invalid setting, the utility behaves as if none of the variables had been defined.

LC_ALL  If set to any string value, overrides the values of all the other internationalization variables.

LC_CTYPE  Determines the locale for the interpretation of sequences of bytes of text data as characters (for example, single-byte as opposed to multibyte characters in arguments and input files) and the behavior of character classes within regular expressions.

LC_MESSAGES  Determines the locale for the format and contents of diagnostic messages written to standard error.

NLSPATH  Determines the location of message catalogues for the processing of LC_MESSAGES.
 

SEE ALSO

Commands:  ed(1), ex(1), ksh(1), sed(1), Bourne shell sh(1b), POSIX shell sh(1p)

Standards:  standards(5)


 


This document was created by man2html, using the manual pages.
Time: 02:42:48 GMT, October 02, 2010