Pipes and Filters
Overview
Teaching: 15 min
Exercises: 0 min
Questions
How can I combine existing commands to do new things?
Objectives
Using Wildcards
When run in the molecules directory, which ls command(s) will produce this output?
ethane.pdb methane.pdb
1. ls *t*ane.pdb
2. ls *t?ne.*
3. ls *t??ne.pdb
4. ls ethane.*
Solution
The solution is 3.
1. shows all files whose names contain any number and combination of characters, followed by the letter t, any number of further characters, and then ane.pdb. This also matches octane.pdb and pentane.pdb.
2. shows all files whose names contain any number and combination of characters, then t, another single character, and ne., followed by any number and combination of characters. This gives us octane.pdb and pentane.pdb but doesn’t match anything that ends in thane.pdb.
3. fixes the problems of option 2 by matching two characters between t and ne. This is the solution.
4. only shows files starting with ethane.
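The four patterns can be checked in a scratch directory. This is a minimal sketch: it creates empty stand-in files named after the four molecules mentioned above, which is an assumption, since the real molecules directory may contain other files. It uses echo rather than ls so that each expansion prints on one line.

```shell
# Scratch directory with empty stand-in files (assumed names, not the real data).
scratch=$(mktemp -d)
cd "$scratch"
touch ethane.pdb methane.pdb octane.pdb pentane.pdb

# echo prints whatever the shell expanded each pattern to.
echo "1:" *t*ane.pdb    # all four files
echo "2:" *t?ne.*       # octane.pdb pentane.pdb
echo "3:" *t??ne.pdb    # ethane.pdb methane.pdb
echo "4:" ethane.*      # ethane.pdb
```

Because the shell expands the pattern before the command runs, ls receives exactly the names that echo prints here.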
What Does sort -n Do?
If we run sort on this file:
10
2
19
22
6
the output is:
10
19
2
22
6
If we run sort -n on the same input, we get this instead:
2
6
10
19
22
Explain why -n has this effect.
Solution
The -n flag specifies a numeric sort, rather than alphabetical.
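Without -n, sort compares lines character by character, so 10 comes before 2 because the character 1 sorts before the character 2. The sketch below reproduces both runs. It feeds the five numbers in with printf instead of reading a file, which is a convenience of this example and not part of the exercise.

```shell
# Default sort compares text character by character.
printf '10\n2\n19\n22\n6\n' | sort      # prints 10 19 2 22 6, one per line

# -n compares the lines by their numeric value.
printf '10\n2\n19\n22\n6\n' | sort -n   # prints 2 6 10 19 22, one per line
```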
Piping Commands Together
In our current directory, we want to find the 3 files which have the fewest lines. Which command listed below would work?
1. wc -l * > sort -n > head -n 3
2. wc -l * | sort -n | head -n 1-3
3. wc -l * | head -n 3 | sort -n
4. wc -l * | sort -n | head -n 3
Solution
Option 4 is the solution. The pipe character | is used to feed the standard output from one process to the standard input of another. > is used to redirect standard output to a file. Try it in the data-shell/molecules directory!
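For readers without the data-shell files to hand, the working pipeline can be tried on throwaway input. The sketch below is illustrative only: the four file names and their contents are made up for this example.

```shell
# Scratch directory holding hypothetical files of 1 to 4 lines each.
scratch=$(mktemp -d)
cd "$scratch"
printf 'x\n'          > one.txt
printf 'x\nx\n'       > two.txt
printf 'x\nx\nx\n'    > three.txt
printf 'x\nx\nx\nx\n' > four.txt

# Count lines per file, order the counts numerically, keep the first three.
wc -l * | sort -n | head -n 3   # lists one.txt, two.txt, three.txt
```

The "total" line that wc prints has the largest count, so the numeric sort places it last and head -n 3 does not pick it up as long as the directory holds at least three files.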
Why Does uniq Only Remove Adjacent Duplicates?
The command uniq removes adjacent duplicated lines from its input. For example, the file data-shell/data/salmon.txt contains:
coho
coho
steelhead
coho
steelhead
steelhead
Running the command uniq salmon.txt from the data-shell/data directory produces:
coho
steelhead
coho
steelhead
Why do you think uniq only removes adjacent duplicated lines? (Hint: think about very large data sets.) What other command could you combine with it in a pipe to remove all duplicated lines?
Solution
$ sort salmon.txt | uniq
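One way to read the hint: uniq only has to compare each line with the one immediately before it, so it can work through very large input in a single pass without remembering every line it has seen. Sorting first puts identical lines next to each other, which is what lets uniq remove them all. The sketch below recreates the six lines shown above in a temporary file (the file location is an assumption of this example) and runs both commands.

```shell
# Recreate the six lines shown for salmon.txt in a temporary file.
salmon=$(mktemp)
printf 'coho\ncoho\nsteelhead\ncoho\nsteelhead\nsteelhead\n' > "$salmon"

uniq "$salmon"          # coho, steelhead, coho, steelhead: only adjacent repeats collapse
sort "$salmon" | uniq   # coho, steelhead: sorting makes every duplicate adjacent
```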
Removing Unneeded Files
Suppose you want to delete your processed data files, and only keep your raw files and processing script to save storage. The raw files end in .dat and the processed files end in .txt. Which of the following would remove all the processed data files, and only the processed data files?
1. rm ?.txt
2. rm *.txt
3. rm * .txt
4. rm *.*
Solution
1. This would remove .txt files with one-character names.
2. This is the correct answer.
3. The shell would expand * to match everything in the current directory, so the command would try to remove all matched files and an additional file called .txt.
4. The shell would expand *.* to match all files with any extension, so this command would delete every file that has an extension, including the raw data.
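Since rm does not ask for confirmation by default, it can help to preview what a pattern expands to before deleting anything. A minimal sketch, run in a scratch directory with made-up file names (none of them come from the lesson data):

```shell
# Scratch directory with hypothetical raw (.dat) and processed (.txt) files plus a script.
scratch=$(mktemp -d)
cd "$scratch"
touch a.dat b.dat a.txt b.txt long-name.txt process.sh

# Putting echo in front shows the expanded command without running it.
echo rm ?.txt   # rm a.txt b.txt (misses long-name.txt)
echo rm *.txt   # rm a.txt b.txt long-name.txt

rm *.txt        # removes only the processed files
ls              # a.dat b.dat process.sh remain
```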
Wildcard Expressions
Wildcard expressions can be very complex, but you can sometimes write them in ways that only use simple syntax, at the expense of being a bit more verbose.
Consider the directory data-shell/north-pacific-gyre/2012-07-03: the wildcard expression *[AB].txt matches all files ending in A.txt or B.txt. Imagine you forgot about this.
1. Can you match the same set of files with basic wildcard expressions that do not use the [] syntax? Hint: You may need more than one expression.
2. The expression that you found and the expression from the lesson match the same set of files in this example. What is the small difference between the outputs?
3. Under what circumstances would your new expression produce an error message where the original one would not?
Solution
1.
```
$ ls *A.txt
$ ls *B.txt
```
2. The output from the new commands is separated because there are two commands.
3. When there are no files ending in A.txt, or there are no files ending in B.txt.
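The third answer can be observed directly. The sketch below assumes bash or sh with default globbing settings, where a pattern that matches nothing is passed to the command unchanged. The two file names are hypothetical stand-ins, not the real files in 2012-07-03.

```shell
# Scratch directory that holds only files ending in A.txt (hypothetical names).
scratch=$(mktemp -d)
cd "$scratch"
touch sample1A.txt sample2A.txt

ls *[AB].txt   # succeeds: the pattern matches the two A files
ls *A.txt      # succeeds: same two files

# No file ends in B.txt, so ls is handed the literal text *B.txt and fails.
ls *B.txt || echo "ls failed: nothing matched *B.txt"
```

The bracket form only fails when neither ending is present, whereas the two-command form fails as soon as one of them is missing.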
Which Pipe?
The file data-shell/data/animals.txt contains 586 lines of data formatted as follows:
2012-11-05,deer
2012-11-05,rabbit
2012-11-05,raccoon
2012-11-06,rabbit
...
Assuming your current directory is data-shell/data/, what command would you use to produce a table that shows the total count of each type of animal in the file?
1. grep {deer, rabbit, raccoon, deer, fox, bear} animals.txt | wc -l
2. sort animals.txt | uniq -c
3. sort -t, -k2,2 animals.txt | uniq -c
4. cut -d, -f 2 animals.txt | uniq -c
5. cut -d, -f 2 animals.txt | sort | uniq -c
6. cut -d, -f 2 animals.txt | sort | uniq -c | wc -l
Solution
Option 5 is the correct answer. If you have difficulty understanding why, try running the commands, or sub-sections of the pipelines (make sure you are in the data-shell/data directory).
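To see why the sort step matters, options 4 and 5 can be compared on a small sample. This is a sketch only: the first four lines are the ones shown above, the fifth is invented for the example, and the temporary file stands in for animals.txt.

```shell
# Small stand-in for animals.txt; the last line is hypothetical.
animals=$(mktemp)
printf '2012-11-05,deer\n2012-11-05,rabbit\n2012-11-05,raccoon\n2012-11-06,rabbit\n2012-11-06,deer\n' > "$animals"

# Option 4: no animal repeats on adjacent lines, so every count is 1.
cut -d, -f 2 "$animals" | uniq -c

# Option 5: sorting groups each animal together before counting.
cut -d, -f 2 "$animals" | sort | uniq -c   # 2 deer, 2 rabbit, 1 raccoon
```

Option 6 would pipe that table into wc -l and report only the number of distinct animals, not the count for each.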
Key Points