Shell Scripts
Overview
Teaching: 15 min
Exercises: 0 minQuestions
How can I save and re-use commands?
Objectives
Variables in Shell Scripts
In the
molecules
directory, imagine you have a shell script calledscript.sh
containing the following commands:head -n $2 $1 tail -n $3 $1
While you are in the
molecules
directory, you type the following command:bash script.sh '*.pdb' 1 1
Which of the following outputs would you expect to see?
- All of the lines between the first and the last lines of each file ending in
.pdb
in themolecules
directory- The first and the last line of each file ending in
.pdb
in themolecules
directory- The first and the last line of each file in the
molecules
directory- An error because of the quotes around
*.pdb
Solution
The correct answer is 2.
The special variables $1, $2 and $3 represent the command line arguments given to the script, such that the commands run are:
$ head -n 1 cubane.pdb ethane.pdb octane.pdb pentane.pdb propane.pdb $ tail -n 1 cubane.pdb ethane.pdb octane.pdb pentane.pdb propane.pdb
The shell does not expand
'*.pdb'
because it is enclosed by quote marks. As such, the first argument to the script is'*.pdb'
which gets expanded within the script byhead
andtail
.
List Unique Species
Leah has several hundred data files, each of which is formatted like this:
2013-11-05,deer,5 2013-11-05,rabbit,22 2013-11-05,raccoon,7 2013-11-06,rabbit,19 2013-11-06,deer,2 2013-11-06,fox,1 2013-11-07,rabbit,18 2013-11-07,bear,1
An example of this type of file is given in
data-shell/data/animals.txt
.Write a shell script called
species.sh
that takes any number of filenames as command-line arguments, and usescut
,sort
, anduniq
to print a list of the unique species appearing in each of those files separately.Solution
# Script to find unique species in csv files where species is the second data field # This script accepts any number of file names as command line arguments # Loop over all files for file in $@ do echo "Unique species in $file:" # Extract species names cut -d , -f 2 $file | sort | uniq done
Find the Longest File With a Given Extension
Write a shell script called
longest.sh
that takes the name of a directory and a filename extension as its arguments, and prints out the name of the file with the most lines in that directory with that extension. For example:$ bash longest.sh /tmp/data pdb
would print the name of the
.pdb
file in/tmp/data
that has the most lines.Solution
# Shell script which takes two arguments: # 1. a directory name # 2. a file extension # and prints the name of the file in that directory # with the most lines which matches the file extension. wc -l $1/*.$2 | sort -n | tail -n 2 | head -n 1
Why Record Commands in the History Before Running Them?
If you run the command:
$ history | tail -n 5 > recent.sh
the last command in the file is the
history
command itself, i.e., the shell has addedhistory
to the command log before actually running it. In fact, the shell always adds commands to the log before running them. Why do you think it does this?Solution
If a command causes something to crash or hang, it might be useful to know what that command was, in order to investigate the problem. Were the command only be recorded after running it, we would not have a record of the last command run in the event of a crash.
Script Reading Comprehension
For this question, consider the
data-shell/molecules
directory once again. This contains a number of.pdb
files in addition to any other files you may have created. Explain what a script calledexample.sh
would do when run asbash example.sh *.pdb
if it contained the following lines:# Script 1 echo *.*
# Script 2 for filename in $1 $2 $3 do cat $filename done
# Script 3 echo $@.pdb
Solutions
Script 1 would print out a list of all files containing a dot in their name.
Script 2 would print the contents of the first 3 files matching the file extension. The shell expands the wildcard before passing the arguments to the
example.sh
script.Script 3 would print all the arguments to the script (i.e. all the
.pdb
files), followed by.pdb
. cubane.pdb ethane.pdb methane.pdb octane.pdb pentane.pdb propane.pdb.pdb
Debugging Scripts
Suppose you have saved the following script in a file called
do-errors.sh
in Nelle’snorth-pacific-gyre/2012-07-03
directory:# Calculate reduced stats for data files at J = 100 c/bp. for datafile in "$@" do echo $datfile bash goostats -J 100 -r $datafile stats-$datafile done
When you run it:
$ bash do-errors.sh *[AB].txt
the output is blank. To figure out why, re-run the script using the
-x
option:bash -x do-errors.sh *[AB].txt
What is the output showing you? Which line is responsible for the error?
Solution
The
-x
flag causesbash
to run in debug mode. This prints out each command as it is run, which will help you to locate errors. In this example, we can see thatecho
isn’t printing anything. We have made a typo in the loop variable name, and the variabledatfile
doesn’t exist, hence returning an empty string.
Key Points