With most Unix shells, you can easily chain several commands together, having each command use the output of the previous one as input.
$ grep -i '^q' /usr/share/dict/words | grep -iv 'qu' | tr A-Z a-z | uniq
q
qasida
qere
qeri
qintar
qoheleth
qoph
As far as results go, this is equivalent to running the commands one after another, leaving intermediate results in temporary files:
$ grep -i '^q' /usr/share/dict/words > /tmp/1
$ < /tmp/1 grep -iv 'qu' > /tmp/2
$ < /tmp/2 tr A-Z a-z > /tmp/3
$ < /tmp/3 uniq
q
qasida
qere
qeri
qintar
qoheleth
qoph
This gives the same result, but Unix pipes are smarter than that: they start all the processes at once, and have them wait for each other:
$ perl -E 'say ++$n while 1' | less # won't block indefinitely
Of course, that only works for programs that can start outputting something without having all the input first.
$ perl -E 'say ++$n while 1' | sort # will block indefinitely
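A quick way to see that the processes really do run side by side is to pair an endless producer with a consumer that quits early. This is only a minimal sketch, assuming the yes and head utilities are installed (they are on most Unix systems, though yes is not part of POSIX); the variable name is just for illustration.

```shell
# yes prints "y" forever; head exits after three lines and closes the pipe.
# On its next write, yes receives SIGPIPE and stops, so the pipeline ends.
first_lines=$(yes | head -n 3)
echo "$first_lines"
```

Had the shell run yes to completion before starting head, this would never finish.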
Anyway, here's the important part: Unix tools are developed around the idea of files being passed around as streams, each being made up of records (lines) containing zero or more fields. (What constitutes a field varies with the application and the problem.) Unix pipes help leverage this by making any command potentially play along with any other command.
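To make the records-and-fields idea concrete, here is a small sketch in which the fields are separated by colons rather than whitespace. The sample data and the variable name are made up for illustration; sort and cut are standard tools that happen to let you choose the field separator.

```shell
# Each line is a record with two colon-separated fields: name and age.
# sort orders the records numerically by field 2; cut keeps only field 1.
names=$(printf 'alice:30\nbob:25\ncarol:41\n' | sort -t: -k2,2n | cut -d: -f1)
echo "$names"
```

Neither tool knows anything about the other; they cooperate only because both agree on lines and fields.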
sed
With sed, one can see the stream philosophy showing. Lines come in on the
assembly line, we execute various commands on them, and spit them out.
$ grep ^proo /usr/share/dict/words | sed 's/oo/(O_O)/'
pr(O_O)
pr(O_O)emiac
pr(O_O)emion
pr(O_O)emium
pr(O_O)f
pr(O_O)fer
pr(O_O)fful
pr(O_O)fing
pr(O_O)fless
pr(O_O)flessly
pr(O_O)fness
pr(O_O)fread
pr(O_O)freader
pr(O_O)freading
pr(O_O)froom
pr(O_O)fy
The s means "substitute". You can use any character after the s, not just
slashes.
$ sed 's:foo:bar:'
$ sed 's!foo!bar!'
$ sed 's#foo#bar#'
Sed recognizes the quantifier * for zero-or-more matches. In its default
(basic) regex syntax, it doesn't recognize + or ? as quantifiers.
$ echo 'f fo foo FOO' | sed 's/fo*/bar/g'
bar bar bar FOO
Oh, and the /g flag makes the substitution "global", so it's applied to every
match on the line rather than only the first.
It's only done in a non-overlapping way, though:
$ echo 'abababa' | sed 's/aba/aCa/g'
aCabaCa
Otherwise there would be a risk of infinite regress.
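As for the missing + quantifier mentioned earlier, basic regex syntax offers two ways to say "one or more". The following is a sketch rather than the only approach; the variable names are illustrative.

```shell
# foo* means: f, one o, then zero or more further o's.
with_star=$(echo 'f fo foo FOO' | sed 's/foo*/bar/g')
echo "$with_star"

# fo\{1,\} uses an interval expression: f followed by at least one o.
with_interval=$(echo 'f fo foo FOO' | sed 's/fo\{1,\}/bar/g')
echo "$with_interval"
```

In both cases the lone "f" is left untouched, unlike with the fo* pattern.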
Here's an example that shows captures (\(...\)), character classes ([...]),
and capture references (\1, \2 etc.). The sed script deletes duplicate
words from a text.
$ echo 'going to to feed the the cat' | sed 's!\([a-zA-Z]*\) \1 !\1 !g'
going to feed the cat
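Capture references can also appear in a different order in the replacement than in the pattern. As a small sketch (the input string and variable name are made up), this swaps two words:

```shell
# \1 is the first captured word and \2 the second; using them in reverse
# order in the replacement swaps the two words.
swapped=$(echo 'hello world' | sed 's/\([a-z]*\) \([a-z]*\)/\2 \1/')
echo "$swapped"
```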
You can make several substitutions inside one sed script:
$ echo 'the Great White North' | sed 's/North/Whale/; s/White/Angry/'
the Great Angry Whale
Actually, there are a number of other commands in sed besides s. They
allow the insertion or deletion of lines, or the manipulation of a so-called
"hold buffer". With these commands, the scope of sed is actually quite
broad. However, more advanced scripts in sed are probably better written
as scripts in more advanced languages. :-)
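To give a taste of those other commands, here is a brief sketch using d (delete) and p (print) on made-up input. It covers only two of the simpler commands, and the variable names are illustrative.

```shell
# d deletes every line matching the address, here lines starting with "#".
no_comments=$(printf 'alpha\n# a comment\nbeta\n' | sed '/^#/d')
echo "$no_comments"

# With -n, sed prints nothing by default; 2p prints only line 2.
second_line=$(printf 'alpha\n# a comment\nbeta\n' | sed -n '2p')
echo "$second_line"
```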
awk
With awk, one can find and manipulate fields.
$ ps
PID TTY TIME CMD
5063 ttys000 0:00.05 -bash
5301 ttys003 0:00.22 -bash
$ ps | awk '{ print $3 }' # prints the third column
TIME
0:00.05
0:00.22
0:00.00
$ ps | awk '!/TIME/ { print $3 }' # prints third column but not 'TIME'
0:00.05
0:00.22
0:00.00
The general form of an awk program is '<pattern> { <code> }'. Patterns
can be regexes within /.../ (and the regexes can be negated with a !, as
in the example above); they can be expressions in general (such as NR == 10
to only match line 10); or they can be BEGIN or END.
$ ls -l | awk '{ s += $5 }; END { print s }' # sums all the byte sizes
1572695
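The different kinds of pattern can be tried out on a few lines of made-up input. This is a minimal sketch; the variable names are only for illustration.

```shell
# An expression pattern: act only on the second record.
second=$(printf 'one\ntwo\nthree\n' | awk 'NR == 2 { print }')
echo "$second"

# BEGIN runs before any input and END after all of it; the regex pattern
# in between counts the lines that contain an "e".
count=$(printf 'one\ntwo\nthree\n' | awk 'BEGIN { n = 0 } /e/ { n++ } END { print n }')
echo "$count"
```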
The regexes in awk are similar to those in sed, though capturing parentheses
((...)) are no longer backslash-escaped. Unlike sed, awk recognizes the
+ and ? quantifiers, matching one-or-more things and zero-or-one things,
respectively.
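Here is a short sketch of both quantifiers at work. It uses awk's built-in gsub function for the substitution; the sample input and variable names are made up.

```shell
# fo+ needs at least one o, so the bare "f" is left alone.
plus=$(echo 'f fo foo FOO' | awk '{ gsub(/fo+/, "bar"); print }')
echo "$plus"

# u? makes the u optional, so both spellings match but "cooler" does not.
optional=$(printf 'color\ncolour\ncooler\n' | awk '/^colou?r$/ { n++ } END { print n }')
echo "$optional"
```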
awk can also be used in more advanced situations. It has if statements,
while loops, and for loops.
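As a sketch of what such a script might look like, the following combines a for loop and an if statement to add up the even numbers on each line. The input is made up, and NF is awk's built-in count of fields in the current record.

```shell
# For each record, loop over all fields and sum only the even ones.
even_sums=$(printf '1 2 3 4\n10 11 12\n' | awk '{
    sum = 0
    for (i = 1; i <= NF; i++) {
        if ($i % 2 == 0) {
            sum += $i
        }
    }
    print sum
}')
echo "$even_sums"
```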
perl
Perl was originally made to occupy much the same niche as sed and awk.
Here are the previous sed and awk examples with perl instead:
$ grep ^proo /usr/share/dict/words | perl -wpe 's/oo/(O_O)/'
pr(O_O)
pr(O_O)emiac
pr(O_O)emion
pr(O_O)emium
pr(O_O)f
pr(O_O)fer
pr(O_O)fful
pr(O_O)fing
pr(O_O)fless
pr(O_O)flessly
pr(O_O)fness
pr(O_O)fread
pr(O_O)freader
pr(O_O)freading
pr(O_O)froom
pr(O_O)fy
$ echo 'abababa' | perl -wpe 's/aba/aCa/g'
aCabaCa
$ ps | perl -wanle 'print $F[2]'
TIME
0:00.05
0:00.27
0:00.00
$ ps | perl -wanle 'if (!/TIME/) { print $F[2] }'
0:00.05
0:00.27
0:00.00
$ ls -l | perl -wanle '$s += $F[4]; END { print $s }'
1572695
Note the use of the -e flag for command-line oneliners, and of -p and -n
to emulate normal sed and awk streaming behaviour, respectively.
The -l flag means "automatically handle line endings"; these are then
removed from the end of each incoming line, and added after each print
statement.
The -w flag turns on warnings. That's just good style, even for a oneliner.
In Perl, you can use the -a flag to automatically split the record into
fields, to be stored in a @F array. The indices are zero-based, so each index
above is one smaller than the corresponding awk field variable. Note the use
of @ for referring to the whole array, and $ for referring to only one of
its elements.
You have to explicitly use an if statement to emulate awk's pattern
matching. Just like awk, Perl has BEGIN and END.
Use grep on /usr/share/dict/words to find all words with three
consecutive double letters. An example would be "bookkeeper".
Write an awk script that expects integers in the first two fields, and
outputs their sum, difference, product, and quotient.
Use either perldoc on the command line or perldoc online to find out what
the Perl equivalent of awk's NR variable is. Also find out which arguments,
if any, split accepts.