
awk

Section: User Commands (1)
 

NAME

awk - Pattern scanning and processing language  

SYNOPSIS

awk [-F ERE] [-f program_file]... [-v var=val]... [argument]...

awk [-F ERE] [-v var=val]... ['program_text'] [argument]...


 

STANDARDS

Interfaces documented on this reference page conform to industry standards as follows:

awk:  XPG4, XPG4-UNIX

Refer to the standards(5) reference page for more information about industry standards and associated tags.
 

OPTIONS

-F ERE
        Defines ERE (extended regular expression) as the value of the input field separator before any input is read. Using this option is comparable to assigning a value to the built-in variable FS.

-f program_file
        Specifies the pathname (program_file) of a file containing an awk program. If multiple instances of this option are specified, the awk program is the concatenation of the files specified as program_file, in the order specified. The awk program can alternatively be specified on the command line as the single argument program_text.

-v var=val
        The var=val argument is an assignment operand that specifies a value (val) for a variable (var). The specified variable assignment occurs prior to executing the awk program, including the actions associated with BEGIN patterns (if any are in the program). Multiple occurrences of the -v option can be specified on the awk command line.
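As a rough illustration of the field separator and variable assignment options described above, the following sketch filters colon-separated records. The sample data and the variable name min are made up for this example.

```shell
# Hypothetical sample data: three colon-separated fields per record.
data='alice:x:1001
bob:x:1002'

# -F: sets the input field separator to a colon;
# -v min=1002 assigns the variable min before the program starts.
out=$(printf '%s\n' "$data" | awk -F: -v min=1002 '$3 >= min { print $1 }')
printf '%s\n' "$out"
```

Only the record whose third field is at least min is selected, so the command prints bob.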
 

OPERANDS

If -f program_file is not specified, the first parameter to awk is program_text, delimited by single quotation (') characters.

See the DESCRIPTION section for the processing of this parameter. The following two types of argument can be intermixed:

input_file
        A pathname of a file that contains the input to be read, which is matched against the set of patterns in the program. If no input_file operands are specified, or if the input_file argument is -, standard input is used.

assignment (var=val)
        The characters before the = represent the name of an awk variable. If that name is an awk reserved word, the behavior is undefined. The characters following the = are interpreted as if they appeared in the awk program preceded and followed by a double quotation (") character, in other words, as a string value. If the value is considered a numeric string, the variable is assigned a numeric value. Each such variable assignment occurs just prior to the processing of the following input_file, if any. Thus, an assignment before the first input_file argument is executed after the BEGIN actions (if any), while an assignment after the last input_file argument occurs before the END actions (if any). If there are no input_file arguments, assignments are executed before processing the standard input.
 

DESCRIPTION

The awk command executes programs written in the awk programming language, a powerful pattern matching utility for textual data manipulation. An awk program is a sequence of patterns and corresponding actions; an action is carried out when its pattern matches an input record. The awk command is a more powerful tool for text manipulation than either sed or grep.

The awk command:

        Performs convenient numeric processing
        Allows variables within actions
        Allows general selection of patterns
        Allows control flow in the actions
        Does not require any compiling of programs

The pattern-matching and action statements of the awk language can be specified either on the command line or in a program file. In either case, the awk command first reads all program statements.

If -f program_file is not specified, the first operand to awk is program_text, delimited by single quotation (') characters.

Execution of an awk program starts by executing the actions associated with all BEGIN patterns in the order they occur in the program. Then, each input_file operand (or standard input if an input file is not specified) is processed in turn by:

        1. Reading input data until a record separator is seen (a newline character by default)
        2. Splitting the current record into fields using the current value of FS
        3. Evaluating each pattern in the program in the order of occurrence
        4. Executing the action associated with each pattern that matches the current record

The action for a matching pattern is executed before evaluating subsequent patterns. After all input has been processed, the actions associated with all END patterns are executed in program order.
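The execution order described above can be observed with a small sketch. The two input records here are made up for illustration.

```shell
# BEGIN runs before any input, the unnamed action runs once per record,
# and END runs after the last record has been read.
out=$(printf 'a\nb\n' | awk '
BEGIN { print "start" }
      { print "record", NR }
END   { print "total", NR }')
printf '%s\n' "$out"
```

The output is start, then one line per record, then the total line.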

Refer to the EXAMPLES section for an example that demonstrates the results of specifying a variable assignment as a flag argument or command argument in different positions on the awk command line.

The awk command reads input data in the order stated on the command line. If you specify input_file as a - (dash) or do not specify a filename, awk reads standard input.

The awk command reads input data from any of the following sources:

        Any input_file operands or their equivalents, which can be affected by modifying the awk variables ARGV and ARGC
        Standard input, in the absence of any input_file operands
        Arguments to the getline function

Input files must be text files. When the built-in variable RS is set to a value other than a newline character, awk supports records terminated with the specified separator up to LINE_MAX bytes.

Pattern-action statements on the command line are enclosed in ' (single quote characters) to protect them from interpretation by the shell. Consecutive pattern-action statements on the same command line are separated by a ; (semicolon), within one set of quote delimiters.

By default, the awk command treats each input line as a record. A record consists of fields separated by spaces, tabs, or a field separator you set with the FS variable. (When a space character is the field separator, multiple spaces are recognized as a single separator.) Fields are referenced as $1, $2, and so on. The reference $0 specifies the entire record (by default, a line).
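A minimal sketch of default field splitting follows; the input line is invented, and it deliberately mixes leading spaces, repeated spaces, and a tab.

```shell
# With the default separator, runs of spaces and tabs delimit fields,
# and leading or trailing white space does not create empty fields.
out=$(printf '  one   two\tthree  \n' | awk '{ print NF ": " $3 "," $1 }')
printf '%s\n' "$out"
```

The record has three fields, so the command prints 3: three,one.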
 

Program Structure

An awk program is composed of pairs of the form:

        pattern { action }

Either the pattern or the action (including the enclosing brace characters) can be omitted.

If pattern lacks a corresponding action, awk writes the entire record that contains the pattern to standard output. If action lacks a corresponding pattern, awk applies the action to every record.
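The two omission rules above can be tried with the following sketch, which uses made-up sample records.

```shell
# Hypothetical sample data: a name and a count per record.
lines='apple 3
banana 12
cherry 7'

# Pattern only: each record whose second field exceeds 5 is written whole.
matched=$(printf '%s\n' "$lines" | awk '$2 > 5')

# Action only: the action is applied to every record.
names=$(printf '%s\n' "$lines" | awk '{ print $1 }')

printf '%s\n' "$matched" "$names"
```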
 

Actions

An action is a sequence of statements that follow C language syntax. Any single statement can be replaced by a statement list enclosed in braces. When statement is a list of statements, they must be separated by newline characters or semicolons, and are executed sequentially in order of appearance. Statements in the awk language include:

        break
        continue
        delete array [expression]
        exit [expression]
        for (expression;expression;expression) statement
        for (variable in array) statement
        if (expression) statement [else statement]
        next
        print [expression_list][>file|>>file][| command]
        printf format[,expression_list][>file|>>file][| command]
        while (expression) statement
        variable=expression

Statements can end with a semicolon, a newline character, or the right brace enclosing the action:

{ [ statement ... ] }

Expressions can have string or numeric values and are built using the operators +, -, *, /, %, a space for string concatenation, and the C operators ++, --, +=, -=, *=, /=, =, ^=, ?:, >, >=, <, <=, ==, $, (), ~, !~, in, ||, &&, !, and !=.

Because the actions process fields, input white space is not preserved in the output.

The file and command arguments in awk statements can be literal names or expressions enclosed in double quotation (") characters. Identical string values in different statements refer to the same open file.

The print statement writes its arguments to standard output (or to a file if > file or >> file is present), separated by the current output field separator and terminated by the current output record separator.

The printf statement formats its expression list according to the format of the printf subroutine, and writes the result to standard output. Unlike print, printf does not insert the output field separator between arguments or append the output record separator; any separators or newline characters must appear in the format itself. You can redirect the output into a file using the print ... > file or printf ... > file statements.
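The difference between the two output statements can be seen in a short sketch. The input record and the choice of "-" as output field separator are arbitrary.

```shell
# print joins its arguments with OFS and appends ORS (a newline by default);
# printf writes exactly what the format string describes.
out=$(echo 'widget 2.5' | awk '
BEGIN { OFS = "-" }
{
    print $1, $2
    printf "%s costs %.2f\n", $1, $2
}')
printf '%s\n' "$out"
```

The first output line is joined with the "-" separator, while the second line is laid out entirely by the printf format.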
 

Variables

Variables can be scalars, array elements (denoted x[i]), or fields. With the exception of function parameters, variables are not explicitly declared.

Variable names can consist of uppercase and lowercase alphabetic letters, the underscore character, the digits (0 to 9), and extended characters. Variable names cannot begin with a digit. Field variables are designated by $ (dollar sign), followed by a number or numerical expression. The effect of the field number expression evaluating to anything other than a non-negative integer is unspecified.

Variables are initialized to the null string. Array subscripts can be any string; they do not have to be numeric. This allows for a form of associative memory. Enclose string constants in expressions in double quotation (") characters.
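As a small sketch of string subscripts, the following counts how often each word occurs. The array name count and the input words are chosen only for illustration.

```shell
# count is never declared; each element starts out null, which acts
# as zero when incremented. The subscripts are the input words themselves.
out=$(printf 'red\nblue\nred\n' | awk '
{ count[$1]++ }
END { print count["red"], count["blue"] }')
printf '%s\n' "$out"
```

The elements are printed by explicit subscript because the traversal order of a for (variable in array) loop is not specified.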

There are several variables with special meaning to awk. They include:

ARGC
        The number of elements in the ARGV array.

ARGV
        An array of command line arguments, excluding options and the program_file arguments, numbered from zero to ARGC-1.

        The arguments in ARGV can be modified or added to; ARGC can be altered. As each input file ends, awk treats the next non-null element of ARGV, up to and including the current value of ARGC-1, as the name of the next input file. Therefore, setting an element of ARGV to null means that it is not treated as an input file. When the element is the character -, standard input is specified. When the element matches the format for an assignment (variable=value), the element is treated as an assignment rather than as the name of an awk input file.

CONVFMT
        The printf format for converting numbers to strings (except for output statements, where OFMT is used); %.6g by default.

ENVIRON
        The variable ENVIRON is an array representing the value of the environment. The indexes of the array are strings consisting of the names of the environment variables, and the value of each array element is a string consisting of the value of that variable.

FILENAME
        The name of the current input file. Inside a BEGIN action, the FILENAME value is undefined. Inside an END action, the value is the name of the last input file processed.

FNR
        The ordinal number of the current input line (record) in the current file. Inside a BEGIN action, the value is zero. Inside an END action, the value is the number of the last record processed in the last file processed.

FS
        Input field separator (default is a space). If it is a space, then any number of spaces and tabs can separate fields.

NF
        The number of fields in the current input line (record) with a limit of 99.

NR
        The number of the current input line (record).

OFS
        The print statement output field separator (default is a space).

ORS
        The print statement output record separator (default is a newline character).

OFMT
        The printf format for converting numbers to strings in output statements (default is %.6g).

RLENGTH
        The length of the string matched by the match function.

RS
        Input record separator (default is a newline character).

RSTART
        The starting position of the string matched by the match function, numbering from 1. This is always equivalent to the return value of the match function.

SUBSEP
        The subscript separator string for multi-dimensional arrays.
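The following sketch prints a few of these special variables. The operand names first.txt and n=3 are hypothetical; the first program consists only of a BEGIN action, so no input file is opened.

```shell
# ARGV holds the operands; ARGV[0] (the command name) is skipped here.
args=$(awk 'BEGIN { for (i = 1; i < ARGC; i++) print i, ARGV[i] }' first.txt n=3)

# NR counts records, NF counts fields, and OFS separates print arguments.
counts=$(printf 'a b c\nd e\n' | awk 'BEGIN { OFS = ":" } { print NR, NF }')

printf '%s\n' "$args" "$counts"
```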
 

Functions

There are a variety of built-in functions that can be used in awk actions.
 

Arithmetic Functions

The arithmetic functions, except for int, are based on the ISO C standard. The behavior is undefined in cases where the ISO C standard specifies that an error be returned or that the behavior is undefined.

atan2(y,x)
        Returns the arctangent of y/x.

cos(x)
        Returns the cosine of x, where x is in radians.

sin(x)
        Returns the sine of x, where x is in radians.

exp(x)
        Returns the exponential function of x.

log(x)
        Returns the natural logarithm of x.

sqrt(x)
        Returns the square root of x.

int(x)
        Truncates its argument to an integer. It is truncated toward 0 when x > 0.

rand()
        Returns a random number n, such that 0 <= n < 1.

srand([expr])
        Sets the seed value for rand to expr or uses the time of day if expr is omitted. The previous seed value is returned.
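A brief sketch of several arithmetic functions follows. The argument values are arbitrary, and the result for the negative argument to int reflects the truncation toward zero that common awk implementations perform.

```shell
# All computation happens in BEGIN, so no input is read.
out=$(awk 'BEGIN {
    print int(3.9), int(-3.9), sqrt(16), exp(0), log(1)
    printf "%.4f\n", atan2(0, -1)    # an approximation of pi
}')
printf '%s\n' "$out"
```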
 

String Functions

gsub(ere, repl[, in])
        Behaves like sub (see below), except that it replaces all occurrences of the regular expression (like the ed utility global substitute) in $0 or in the in argument, when specified.

index(s, t)
        Returns the position, in characters, numbering from 1, in string s where string t first occurs, or zero if it does not occur at all.

length[([s])]
        Returns the length, in characters, of its argument taken as a string, or of the whole record, $0, if there is no argument.

match(s, ere)
        Returns the position, in characters, numbering from 1, in string s where the extended regular expression ere occurs, or zero if it does not occur at all. RSTART is set to the starting position, zero if no match is found; RLENGTH is set to the length of the matched string, -1 if no match is found.

split(s, a[, fs])
        Splits the string s into array elements a[1], a[2], ... a[n], and returns n. The separation is done with the extended regular expression fs or with the field separator FS if fs is not given. Each array element has a string value when created. If the string assigned to any array element, with any occurrence of the decimal point character from the current locale changed to a period character, would be considered a numeric string, the array element also has the numeric value of the numeric string. The effect of a null string as the value of fs is unspecified.

sprintf(fmt, expr, expr, ...)
        Formats the expressions according to the printf format given by fmt and returns the resulting string.

sub(ere, repl[, in])
        Substitutes the string repl in place of the first instance of the extended regular expression ere in string in and returns the number of substitutions. An ampersand (&) appearing in the string repl is replaced by the string from in that matches the regular expression. For each occurrence of backslash (\) encountered when scanning the string repl from beginning to end, the next character is taken literally and loses its special meaning (for example, \& is interpreted as a literal ampersand character). Except for & and \, it is unspecified what the special meaning of any such character is. If in is specified and it is not an lvalue, the behavior is undefined. If in is omitted, awk substitutes in the current record ($0).

substr(s, m[, n])
        Returns the at most n character substring of s that begins at position m, numbering from 1. If n is missing, the length of the substring is limited by the length of the string s.

tolower(s)
        Returns a string based on the string s. Each character in s that is an uppercase letter specified to have a tolower mapping by the LC_CTYPE category of the current locale is replaced in the returned string by the lowercase letter specified by the mapping. Other characters in s are unchanged in the returned string.

toupper(s)
        Returns a string based on the string s. Each character in s that is a lowercase letter specified to have a toupper mapping by the LC_CTYPE category of the current locale is replaced in the returned string by the uppercase letter specified by the mapping. Other characters in s are unchanged in the returned string.
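The string functions can be exercised with a sketch like the one below. The strings and the array name parts are invented for the example.

```shell
out=$(awk 'BEGIN {
    s = "hello world"
    print index(s, "wor"), length(s), substr(s, 7, 3)
    print toupper(s)
    n = gsub(/o/, "0", s)            # replace every "o"; n is the count
    print n, s
    sub(/w0/, "[&]", s)              # & stands for the matched text
    print s
    if (match(s, /l+/))              # sets RSTART and RLENGTH
        print RSTART, RLENGTH
    n = split("a,b,c", parts, ",")   # fills parts[1] through parts[n]
    print n, parts[2]
}')
printf '%s\n' "$out"
```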
 

Input/Output and General Functions

close(expression)
        Closes the file or pipe opened by a print or printf statement or a call to getline with the same string-valued expression. If the close was successful, the function returns zero; otherwise, it returns non-zero.

expression | getline [var]
        Reads a record of input from a stream piped from the output of a command. The stream is created if no stream is currently open with the value of expression as its command name. The stream created is equivalent to one created by a call to the popen function with the value of expression as the command argument and a value of r as the mode argument. As long as the stream remains open, subsequent calls in which expression evaluates to the same string read subsequent records from the stream. The stream remains open until the close function is called with an expression that evaluates to the same string value. At that time, the stream is closed as if by a call to the pclose function. If var is missing, $0 and NF are set; otherwise, var is set.

getline
        Sets $0 to the next input record from the current input file. This form of getline sets the NF, NR, and FNR variables.

getline var
        Sets variable var to the next input record from the current input file. This form of getline sets the FNR and NR variables.

getline [var] < expression
        Reads the next record of input from a named file. The expression is evaluated to produce a string that is used as a pathname. If the file of that name is not currently open, it is opened. As long as the stream remains open, subsequent calls in which expression evaluates to the same string value read subsequent records from the file. The file remains open until the close function is called with an expression that evaluates to the same string value. If var is missing, $0 and NF are set; otherwise, var is set.

system(expression)
        Executes the command given by expression in a manner equivalent to the system function and returns the exit status of the command.

All forms of getline return 1 for successful input, zero for end of file, and -1 for an error.

In summary, getline sets $0 to the next input record from the current input file; getline < file sets $0 to the next record from file; getline x sets the variable x instead of $0; and command | getline reads the next line of output from command on each call.

Where strings are used as the name of a file or pipeline, the strings must be textually identical. The terminology "same string value" implies that "equivalent strings", even those that differ only by space characters, represent different files.
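A minimal sketch of reading from a command with getline follows. The command string is an arbitrary example, and the variable names cmd, line, lines, and again are made up.

```shell
out=$(awk 'BEGIN {
    cmd = "echo one; echo two"
    # Each call reads the next record from the same open stream.
    while ((cmd | getline line) > 0)
        lines[++n] = line
    close(cmd)
    # After close, the same string value starts a fresh stream.
    cmd | getline again
    close(cmd)
    print n, lines[2], again
}')
printf '%s\n' "$out"
```

Comparing the return value with > 0 ends the loop on end of file as well as on error, since getline returns -1 in the error case.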
 

User-defined Functions

The awk language also provides user-defined functions. Such functions can be defined as:

        function name(args,...) { statements }

A function can be referred to anywhere in an awk program; in particular, the function's use can precede the function definition. The scope of a function is global.

Function arguments can be either scalars or arrays; the behavior is undefined if an array name is passed as an argument that the function uses as a scalar, or if a scalar expression is passed as an argument that the function uses as an array. Function arguments are passed by value if scalar and by reference if array name. Argument names are local to the function; all other variable names are global. The same name must not be used as both an argument name and as the name of a function or special awk variable. The same name must not be used both as a variable name with global scope and as the name of a function. The same name must not be used within the same scope both as a scalar variable and as an array.

The number of parameters in the function definition need not match the number of parameters in the function call. Excess formal parameters can be used as local variables. If fewer arguments are supplied in a function call than are in the function definition, the extra parameters that are used in the function body as scalars are initialized with a string value of the null string and a numeric value of zero, and the extra parameters that are used in the function body as arrays are initialized as empty arrays. If more arguments are supplied in a function call than are in the function definition, the behavior is undefined.

When invoking a function, no white space can be placed between the function name and the opening parenthesis. Function calls can be nested and recursive calls can be made upon functions. Upon return from any nested or recursive function call, the values of all the calling function's parameters are unchanged, except for array parameters passed by reference. The return statement can be used to return a value.
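The rules above on recursion, argument passing, and excess formal parameters can be sketched as follows. The function names fact, fill, and bump are hypothetical.

```shell
out=$(awk '
# Recursive call: each invocation has its own copy of n.
function fact(n) { return (n <= 1) ? 1 : n * fact(n - 1) }

# arr is passed by reference; i is an excess formal parameter
# that serves as a local variable.
function fill(arr, n,    i) { for (i = 1; i <= n; i++) arr[i] = i * i }

# x is a scalar passed by value, so the caller does not see the change.
function bump(x) { x++ }

BEGIN {
    v = 5
    bump(v)
    fill(sq, 3)
    print fact(5), v, sq[3]
}')
printf '%s\n' "$out"
```

Writing the extra parameter after additional white space in the parameter list is only a common convention to set local variables apart from real arguments.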
 

Patterns

Patterns are arbitrary Boolean combinations of regular expressions and relational expressions (using the !, ||, and && operators and parentheses for grouping). You must start and end regular expressions with slashes. You can use regular expressions as described for grep, including the following special characters:

+
        One or more occurrences of the pattern.

?
        Zero or one occurrence of the pattern.

|
        Either of two patterns.

( )
        Grouping of expressions.

Isolated regular expressions in a pattern apply to the entire line. Regular expressions can occur in relational expressions. Any string (constant or variable) can be used as a regular expression, except in the position of an isolated regular expression in a pattern.

If two patterns are separated by a comma, the action is performed on all lines between an occurrence of the first pattern and the next occurrence of the second.

There are two types of relational expressions that you can use. The first type has the form: expression match_operator pattern

where match_operator is either: ~ (for contains) or !~ (for does not contain).

The second type has the form: expression relational_operator expression

where relational_operator is any of the six C relational operators: <, >, <=, >=, ==, and !=. An expression can be an arithmetic expression, a relational expression, or a Boolean combination of these.
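A short sketch of a range pattern and of a Boolean combination of the two relational forms follows, using made-up input records.

```shell
# Hypothetical sample data: a number and a word per record.
data='1 alpha
2 start
3 beta
4 stop
5 gamma'

# Range pattern: from a record matching /start/ through the next one matching /stop/.
range=$(printf '%s\n' "$data" | awk '/start/,/stop/ { print $1 }')

# A match expression combined with a relational expression.
combo=$(printf '%s\n' "$data" | awk '$2 ~ /a$/ && $1 > 1 { print $2 }')

printf '%s\n' "$range" "$combo"
```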
 

Special Patterns

You can use the BEGIN and END special patterns to capture control before the first and after the last input line is read, respectively.

Each BEGIN pattern is matched once and its associated action executed before the first record of input is read and before command line assignment operands are processed. Each END pattern is matched once and its associated action executed after the last record of input has been read. These two patterns must have associated actions.

BEGIN and END do not combine with other patterns. Multiple BEGIN and END patterns are allowed. The actions associated with the BEGIN patterns are executed in the order specified in the program, as are the END actions. An END pattern can precede a BEGIN pattern in a program.

You have two ways to designate an extended regular expression other than white space to separate fields. You can use the -Fere option on the command line, or you can assign a string with the expression to the built-in variable FS. Either action changes the field separator to ere.

There are no explicit conversions between numbers and strings. To force an expression to be treated as a number, add 0 to it. To force it to be treated as a string, append a null string ("").
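The effect of forcing a numeric interpretation can be sketched as follows; the values are arbitrary.

```shell
out=$(awk 'BEGIN {
    a = "10"; b = "9"           # string constants
    print (a < b)               # compared as strings: "10" sorts before "9"
    print (a + 0 < b + 0)       # adding 0 forces a numeric comparison
}')
printf '%s\n' "$out"
```

The first comparison is true (1) and the second is false (0), even though both involve the same two values.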
 

EXIT STATUS

The following exit values are returned:

0
        Successful completion.

>0
        An error occurred.
 

EXAMPLES

To display the file lines that are longer than 72 bytes, enter:

        % awk 'length > 72' chapter1

This command selects each line of the file chapter1 that is longer than 72 bytes. The command then writes these lines to standard output because no action is specified.

To display all lines between the words start and stop, enter:

        % awk '/start/,/stop/' chapter1

To run an awk program (sum2.awk) that processes a file (chapter1), enter:

        % awk -f sum2.awk chapter1

The following awk program computes the sum and average of the numbers in the second column of the input file:

        {
                sum += $2
        } END {
                print "Sum: ", sum;
                print "Average:", sum/NR;
        }

The first action adds the value of the second field of each line to the sum variable. Because an uninitialized variable has the numeric value 0 (zero), sum needs no explicit initialization. The keyword END before the second action causes awk to perform that action after all of the input file is read. The NR variable, which is used to calculate the average, is a special variable containing the number of records (lines) that were read.

To print the names of the users who have the C shell as the initial shell, enter:

        % awk -F: '$7 ~ /csh/ {print $1}' /etc/passwd

To print the first two fields in reversed order, enter:

        % awk '{ print $2, $1 }'

The following awk program prints the first two fields of the input file in reversed order, with input fields separated by a comma, then adds up the first column and prints the sum and average:

        BEGIN { FS = "," }
              { print $2, $1 }
              { s += $1 }
        END   { print "sum is", s, "average is", s/NR }

The following example shows how command line assignments synchronize with awk program statements. Consider the following set of awk statements that make up a program named test_program:

        BEGIN { if (RS == ":")
                print "Assignment in effect for BEGIN statements"
              }
              { if (RS == ":")
                print "Assignment in effect for middle statements"
              }
        END   { if (RS == ":")
                print "Assignment in effect for END statements"
              }

Notice the different results that are produced by different ways of assigning a value to RS on the awk command line. The file text_file contains the line "Hello, Hello".

        % awk -f test_program -v RS=: text_file

        Assignment in effect for BEGIN statements
        Assignment in effect for middle statements
        Assignment in effect for END statements

        % awk -f test_program RS=: text_file

        Assignment in effect for middle statements
        Assignment in effect for END statements

        % awk -f test_program text_file RS=:

        Assignment in effect for END statements


 

ENVIRONMENT VARIABLES

The following environment variables affect the execution of awk:

LANG
        Provides a default value for the internationalization variables that are unset or null. If LANG is unset or null, the corresponding value from the default locale is used. If any of the internationalization variables contain an invalid setting, the utility behaves as if none of the variables had been defined.

LC_ALL
        If set to a non-empty string value, overrides the values of all the other internationalization variables.

LC_CTYPE
        Determines the locale for the interpretation of sequences of bytes of text data as characters (for example, single-byte as opposed to multi-byte characters in arguments).

LC_MESSAGES
        Determines the locale for the format and contents of diagnostic messages written to standard error.

NLSPATH
        Determines the location of message catalogs for the processing of LC_MESSAGES.
 

SEE ALSO

Commands:  grep(1), lex(1), sed(1)

Routines:  printf(3)

Programming Support Tools


 

This document was created by man2html, using the manual pages.
Time: 02:42:56 GMT, October 02, 2010