Demonstrate proficiency with regular expressions

Author: name contact BSD flavour

Reviewer: name contact BSD flavour

Reviewer: name contact BSD flavour


Concept

Regular expressions(((expressions)))(((regular expressions))) are part of the daily life of a system administrator. Be able to match text patterns when analyzing program output or searching through files. Be able to specify a range of characters within brackets [], specify a literal, use a repetition operator, recognize a metacharacter(((metacharacter))) and create an inverse filter.

Introduction

Regular expressions are used for matching patterns in a line of text. The placement and patterns of characters are represented by special characters (^ $ . * [ ] ). Other ordinary characters are also used for matching. Regular expressions are used by many different Unix command-line tools, interactive applications, and programming languages. Some common tools included in the base BSD systems that provide regular expression support include grep, sed, awk, ed, and vi.

The following describes the basic syntax and common usage of regular expressions.

A circumflex "^" at the beginning of a regular expression anchors the match to the beginning of a line.

A dollar sign "$" at the end of a regular expression anchors the match to the end of a line.
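As a small sketch of the two anchors, the commands below run grep over three made-up lines held in a shell variable. The sample text and the variable names are assumptions for illustration, not real system data.

```shell
# Three made-up lines (illustrative sample data only).
sample='root:admin
toor:root
beetroot'

# "^root" matches only where "root" starts the line.
starts=$(printf '%s\n' "$sample" | grep '^root')

# "root$" matches only where "root" ends the line.
ends=$(printf '%s\n' "$sample" | grep 'root$')

printf 'lines starting with root:\n%s\n' "$starts"
printf 'lines ending with root:\n%s\n' "$ends"
```

Only the first line starts with "root", while the second and third lines end with it.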

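A period "." matches any single character. Because these tools process their input one line at a time, it does not match the newline (Ctrl-J) at the end of a line.

An asterisk "*" matches zero or more adjacent occurrences of the item that precedes it, such as a literal character, a period, or a bracketed string.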
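The difference between the period and the asterisk can be sketched with a few made-up words. The word list and variable names below are assumptions chosen only to show which lines each pattern selects.

```shell
# Three made-up words, one per line (illustrative sample data only).
words='ct
cat
coot'

# "c.t": exactly one character of any kind between c and t.
one_char=$(printf '%s\n' "$words" | grep 'c.t')

# "co*t": zero or more "o" characters between c and t.
zero_or_more_o=$(printf '%s\n' "$words" | grep 'co*t')

# "c.*t": zero or more characters of any kind between c and t.
anything=$(printf '%s\n' "$words" | grep 'c.*t')

printf 'c.t matches:\n%s\n' "$one_char"
printf 'co*t matches:\n%s\n' "$zero_or_more_o"
printf 'c.*t matches:\n%s\n' "$anything"
```

Note that "co*t" matches "ct" because the asterisk also allows zero repetitions.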

Brackets "[]" containing a string of characters match any single character from that string. A range of characters can be given with a hyphen, as in [0-9] or [a-z]. If the bracketed string begins with a circumflex "^", the matching is reversed: it matches any single character (other than the newline at the end of the line) that is not listed in the brackets.
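A range and its reversed form might be tried out as follows. The device-style names in the sample are made up for this sketch and are not taken from any real system.

```shell
# Made-up device-style names (illustrative sample data only).
names='ad0
ad3
ad7
da0'

# "ad[0-3]": "ad" followed by one digit in the range 0 through 3.
in_range=$(printf '%s\n' "$names" | grep 'ad[0-3]')

# "ad[^0-3]": "ad" followed by one character that is NOT 0 through 3.
out_of_range=$(printf '%s\n' "$names" | grep 'ad[^0-3]')

printf 'ad[0-3] matches:\n%s\n' "$in_range"
printf 'ad[^0-3] matches:\n%s\n' "$out_of_range"
```

The name "da0" appears in neither result because it does not contain the literal "ad" at all.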

A backslash "\" placed before one of the special characters makes that character match literally.
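The effect of the backslash can be sketched by comparing an unescaped and an escaped period. The three sample strings below are made up for illustration.

```shell
# Made-up strings (illustrative sample data only).
values='1.5
105
1x5'

# Unescaped: the period matches any character, so every line matches.
unescaped=$(printf '%s\n' "$values" | grep '1.5')

# Escaped: the backslash makes the period match only a literal period.
escaped=$(printf '%s\n' "$values" | grep '1\.5')

printf 'unescaped matches:\n%s\n' "$unescaped"
printf 'escaped matches:\n%s\n' "$escaped"
```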

TODO: Be able to match text patterns when analyzing program output

TODO: searching through files.

TODO: Be able to specify a range of characters within brackets []

TODO: specify a literal

TODO: use a repetition operator

TODO: recognize a metacharacter(((metacharacter)))

TODO: create an inverse filter

Some programs also have extended regular expression features; these are covered neither in the BSDA exam nor in this book. Note that some regular expression implementations treat the brace "{" as a special character. TODO

Note that regular expressions are different from shell file name matching, which also uses characters such as the asterisk and brackets but gives them other meanings. Use quotes around regular expressions on command lines so that they are not processed by the shell first.
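The following sketch illustrates why the quotes matter. It creates two empty files in a scratch directory (the file names "ab" and "abb" are arbitrary choices for this example) and compares what the shell does with a quoted and an unquoted pattern.

```shell
# Scratch directory with two empty files so the glob has something to expand to.
dir=$(mktemp -d)
touch "$dir/ab" "$dir/abb"

# Unquoted: the shell replaces ab* with matching file names before the command runs.
unquoted=$(cd "$dir" && echo ab*)

# Quoted: the pattern reaches the command unchanged.
quoted=$(cd "$dir" && echo 'ab*')

# As a regular expression, ab* means "a" followed by zero or more "b" characters.
matched=$(printf 'a\nabbb\nxyz\n' | grep 'ab*')

printf 'unquoted: %s\nquoted: %s\n' "$unquoted" "$quoted"
printf 'regular expression matches:\n%s\n' "$matched"

# Clean up the scratch directory.
rm -rf "$dir"
```

As a file name pattern, the asterisk stands for any run of characters on its own. As a regular expression, it only repeats the character before it, which is why the line "a" matches.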

The grep command prints the complete lines that match a pattern. To print the lines that do not match instead, use grep's -v switch, which inverts the match.
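An inverse filter is often used to hide comment lines. The sketch below uses a made-up, hosts-style sample rather than a real file.

```shell
# Made-up hosts-style lines (illustrative sample data only).
lines='# loopback entries
127.0.0.1 localhost
# IPv6 loopback
::1 localhost'

# Without -v: print only the lines that start with "#".
comments=$(printf '%s\n' "$lines" | grep '^#')

# With -v: print only the lines that do NOT start with "#".
active=$(printf '%s\n' "$lines" | grep -v '^#')

printf 'comment lines:\n%s\n' "$comments"
printf 'other lines:\n%s\n' "$active"
```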

Examples

Note that there may be many different ways to match the same pattern. The following examples show some basic ways of pattern matching.

The following regular expression (just the special period) will match all non-empty lines in the hosts database:

""$ grep . /etc/hosts

The following regular expression matches the entry for the root username in the password database:

""$ grep "^root:" /etc/passwd

The following regular expression matches the accounts that use the csh shell:

""$ grep ':/bin/csh$' /etc/passwd

The following regular expression can find the service name for TCP port 80 in the services database:

""$ grep " 80/tcp" /etc/services

The following examples may be used to estimate the number of emails in a standard mbox mailbox:

""$ grep "^From " /var/mail/reed | wc -l

or use grep's -c switch to count the matches:

""$ grep -c "^From " /var/mail/reed

The following regular expressions can be used to list the short group names of three or fewer characters in length:

""$ grep "^.:" /etc/group

""$ grep "^..:" /etc/group

""$ grep "^...:" /etc/group

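All usernames containing a vowel can be listed with the following example. Note that the bracketed string [^:] matches any character except a colon, so the pattern can only match within the first field, before the first colon. The asterisks allow any number of such characters on either side of the vowel.

""$ grep "^[^:]*[aeiouAEIOU][^:]*:" /etc/passwd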

The following example shows how to match a literal special character, the period, by placing a backslash before it. This example lists the lines in the hosts database that begin with an IPv4 address, which leaves out any commented entries:

""$ grep "^[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*" /etc/hosts

More information

grep(1), egrep(1), fgrep(1), re_format(7)