July 23, 2004

Scripting guidelines    [ Software ]

One of the junior admins at work just hacked up a MySQL/PostgreSQL backup script and was looking for some feedback on it.  I probably wrote 10x more than needed, but it's stuff worth saying about shell scripting on Unix-style systems.

A few suggestions on your script:

* The world is not a Linux box.  Program accordingly.  Ten years ago, the same
admonition would have been 'The world is not a SysV box'; twenty years ago,
'The world is not a BSD box'.  However, a careful programmer could have
written a script ten or twenty years ago on those platforms that would run
today with no modifications.

It's good that you're using /bin/bash as your interpreter in case you utilize some
bashisms (which you don't appear to use in this script), but what happens if
someone wants to run this on a Solaris box without bash installed?  Or HP-UX?
Or some ancient 4.2BSD or SVR5 system that can't even spell bash?  Better to
get in the habit of writing strictly Bourne-compatible scripts for the sake of
portability, or use Perl (with 'require x.y.z' statements which match your use of
the language) if that doesn't suffice.
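
As a rough illustration (not taken from your script), here is one common bashism
and a Bourne-style replacement.  The function name is_bz2 is made up for the
example, and it assumes a /bin/sh that supports functions, which the very oldest
shells did not:

```shell
# bash-only pattern match:
#     if [[ $file == *.bz2 ]]; then ...
#
# Bourne-style equivalent: 'case' does the same glob match in any sh.
# is_bz2 is a hypothetical helper name used only for this sketch.
is_bz2() {
    case "$1" in
        *.bz2) return 0 ;;   # name ends in .bz2
        *)     return 1 ;;   # anything else
    esac
}
```

Used as 'if is_bz2 "$file"; then ...', this behaves the same under bash, dash or
ksh, so the script no longer cares which shell /bin/sh happens to be.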

* It's nice to handle 'input decision' information (which databases am I backing
up, where are the backups stored, what's the password for this database?) via
command-line arguments.  Look into using getopt and friends to parse
arguments passed to the script; this makes it easily extensible and fits the Unix
philosophy of "Many small programs which do one thing well".  It also makes your
scripts easier to automate and integrate with other scripts (read: "Spend less
time talking to other people about using or modifying your old scripts, and spend
more time writing new ones").  If you're feeling object-oriented, think of it as
publishing methods, or exposing a public interface instead of a private one.
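
A minimal sketch of what that could look like, using the shell's built-in
getopts (one of the "friends").  The option letters -d and -o, the variable
names and the default directory are all assumptions made up for this example,
not anything your script currently defines:

```shell
# Hypothetical interface: -d DATABASE (repeatable), -o BACKUP_DIR.
usage() {
    echo "usage: backup [-d database]... [-o backup_dir]" >&2
}

# parse_args: fill DATABASES and BACKUP_DIR from the arguments given.
# Returns 2 on an unknown option or a missing option argument.
parse_args() {
    DATABASES=""
    BACKUP_DIR="/var/backups/sql"   # assumed default; adjust locally
    OPTIND=1                        # reset so the function can be re-run
    while getopts "d:o:" opt; do
        case "$opt" in
            d) DATABASES="${DATABASES:+$DATABASES }$OPTARG" ;;
            o) BACKUP_DIR="$OPTARG" ;;
            *) usage; return 2 ;;
        esac
    done
}

# In the script itself:
#     parse_args "$@" || exit 2
```

Anything a caller might want to vary becomes a flag, so cron jobs and other
scripts can drive the backup without anyone editing the file.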

* Beware the -j flag to tar.  The Debian man page for 'tar' says this:

       -j, --bzip2
              filter archive through bzip2,  use  to  decompress  .bz2  files.
              WARNING:  some previous versions of tar used option -I to filter
              through bzip2.  When writing scripts, use --bzip2 instead of  -j
              so that both older and newer tar versions will work.

For maximum portability and compatibility, it may be better to do something
like:

BZIP2=/path/to/bzip2
if [ ! -x "${BZIP2}" ]; then
    # Report on stderr so the error is not lost in redirected output
    echo "FATAL ERROR: configured bzip2 file ${BZIP2} is not executable, quitting." >&2
    echo "You may need to check the path for BZIP2 in this script!" >&2
    exit 1
fi

TAR=/path/to/tar
if [ ! -x "${TAR}" ]; then
    echo "FATAL ERROR: configured tar file ${TAR} is not executable, quitting." >&2
    echo "You may need to check the path for TAR in this script!" >&2
    exit 1
fi

[ ... ]

${TAR} cf - -C "$TEMP_DIR" pg_${x}.dump | ${BZIP2} --options > \
sqlbackup-${DATE}.tar.bz2

You can use loops to reduce the code count / automate the executable path 
checks, but this should communicate the basic idea.  Don't trust your 
environment until you've checked it thoroughly.
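
The loop version might look something like the sketch below.  The function name
check_executables is invented for the example, and the /path/to/... values are
the same placeholders as above:

```shell
# Placeholder paths; set these for the local system.
BZIP2=/path/to/bzip2
TAR=/path/to/tar

# check_executables: return 1 if any argument is not an executable file.
# Every bad path is reported before returning, not just the first one.
check_executables() {
    status=0
    for tool in "$@"; do
        if [ ! -f "$tool" ] || [ ! -x "$tool" ]; then
            echo "FATAL ERROR: $tool is not an executable file." >&2
            echo "You may need to check its path in this script!" >&2
            status=1
        fi
    done
    return $status
}

# In the script itself:
#     check_executables "$TAR" "$BZIP2" || exit 1
```

Adding another external tool then costs one more variable and one more
argument, not another copy of the if block.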

* Glad to see you're using mktemp; I'm frequently too lazy to do that unless I'm
writing a production-quality script. :)  However, you always need to check the
return values from anything you call.  What happens to your script if /usr/tmp is
full?  The man page for mktemp shows this example:

       The  following  sh(1) fragment illustrates a simple use of mktemp where
       the script should quit if it cannot get a safe temporary file.

              TMPFILE=`mktemp /tmp/example.XXXXXX` || exit 1
              echo "program output" >> $TMPFILE

* On the subject of checking return values, consider these two lines from your 
script:

cp $TEMP_DIR/sqlbackup-$DATE.tar.bz2 /usr/local/sqlbackups/

rm -rf $TEMP_DIR

What happens if /usr/local/sqlbackups/ doesn't exist, resides on a partition with 
0 free blocks, or is a network drive and the remote server times out?  To say 
nothing of what may happen if you can't access the first file argument to 'cp'.  I 
would strongly suggest appending a '&&' to the end of the first line to make 
execution of the second command contingent on the first line executing 
successfully.
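
In its shortest form that is just 'cp ... && rm -rf ...'.  The sketch below
spells out the same contingency with an if, so that a failed copy also gets
reported; the function name copy_then_clean and its argument order are made up
for the example:

```shell
# copy_then_clean ARCHIVE DEST_DIR TEMP_DIR
# Copy ARCHIVE into DEST_DIR, and remove TEMP_DIR only if the copy worked.
# On failure TEMP_DIR is left alone so the backup is not lost.
copy_then_clean() {
    archive=$1
    dest_dir=$2
    temp_dir=$3
    if cp "$archive" "$dest_dir/"; then
        rm -rf "$temp_dir"
    else
        echo "ERROR: could not copy $archive to $dest_dir;" \
             "leaving $temp_dir in place" >&2
        return 1
    fi
}

# In the script itself, something like:
#     copy_then_clean "$TEMP_DIR/sqlbackup-$DATE.tar.bz2" \
#         /usr/local/sqlbackups "$TEMP_DIR" || exit 1
```

Either way, the only copy of the backup never gets deleted just because the
destination was missing, full or unreachable.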

* It is NOT good practice to store your data files with your executables.  Any 
reasonable filesystem hierarchy would keep backup data way far away from the
scripts that generate it - you don't want to blow both away by accident or
system failure.  I'd suggest keeping your scripts in a bin/ directory somewhere 
(preferably a common one in the standard $PATH, like /usr/local/bin, so as to 
avoid the "where the hell did that file go?" problem) and your data somewhere 
under /home, /export or /var depending on your local conventions.

Personal preference point - I'd also break out the backup directory to a variable,
defined at the top of the script, to make it blatantly obvious where the heck
the data ended up.  It wouldn't hurt to spit out some concise all-is-well output
too, such as 'echo "Databases backed up to $destination_file"', if
everything goes the way you expect.

* I like using `date +%Y%m%d` - it's partly personal preference, but it also 
means that an 'ls' or other listing will automatically sort from most significant 
'byte' (years) to least significant 'byte' (days).  Makes it easier to tell at a glance
which is oldest/newest, and makes it easy to find a particular date in a long list 
of files, since it'll be sorted logically.  Using '%m%d%y' means that months 
come first, then days, then years, which makes sense if you're dealing with less 
than a year's worth of logs.  However, if you run this over the course of (say) 
three years, you'll end up with entries like:

072906
073005
073104
080506
080605
080704
081206
081305
081404
081906
082005
082104
082606
082705
082804
090206
090305
090404

Compare this to '%Y%m%d':

20040731
20040807
20040814
20040821
20040828
20040904
[ ... ]
20050730
20050806
20050813
20050820
20050827
20050903
[ ... ]
20060729
20060805
20060812
20060819
20060826
20060902

Especially if you're using shell completion (as bash offers by default) and you're
trying to restore a backup from August 6, 2005, it's waaay too easy to accidentally
use the wrong file with the first date style.
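
The sorting property is also handy inside scripts.  As a small sketch (the
function name newest_stamp is invented here), picking the most recent backup is
just "sort and take the last line", which only works with the year-first style:

```shell
# newest_stamp: read one date stamp per line on stdin and print the one
# that sorts last.  LC_ALL=C forces a plain byte-wise comparison.
newest_stamp() {
    LC_ALL=C sort | tail -n 1
}

# With %Y%m%d stamps, last in sort order is also latest in time:
printf '%s\n' 20050806 20040807 20060805 | newest_stamp   # prints 20060805

# With %m%d%y stamps, the "last" entry is August 7, 2004, the oldest:
printf '%s\n' 080605 080704 080506 | newest_stamp         # prints 080704
```

The same trick applies to whole file names such as sqlbackup-20050806.tar.bz2,
as long as everything before the date stamp is identical.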


In general, this isn't a bad script, but it suffers from a lack of error checking and
portability.  I take a very different view of quick test/hack scripts (which is what
this reads like) and production-quality scripts (which is what this will be used 
as).  If you or anyone else depends upon a script working, then it needs to be 
production quality.  Try to write your scripts (or any kind of code) as if they'll be
used in extremely hostile conditions on platforms that you've never seen and 
for ten times longer than seems reasonable.  If you do this, you'll run into many
fewer problems now, and your tools (and fellow/future coworkers) will enjoy 
years of trouble-free use.
Posted by edobbs at July 23, 2004 01:32 PM