Provided by: safecat_1.13+git20170317.185e8bf-1_amd64

NAME

       safecat - safely write data to a file

SYNOPSIS

       safecat tempdir destdir

INTRODUCTION

       safecat  is  a  program which implements Professor Daniel Bernstein's maildir algorithm to
       copy stdin safely to a file in a specified directory.  With safecat, the user  is  offered
       two  assurances.   First,  if  safecat  returns a successful exit status, then all data is
       guaranteed to be saved in the destination directory.  Second, if  a  file  exists  in  the
       destination  directory,  placed  there  by  safecat,  then  the  file  is guaranteed to be
       complete.

       When saving data with safecat, the user specifies a destination directory, but not a  file
       name.   The  file name is selected by safecat to ensure that no filename collisions occur,
       even if many safecat processes and other programs implementing the maildir  algorithm  are
       writing  to  the  directory simultaneously.  If particular filenames are desired, then the
       user should rename the file after safecat completes.  In general, when spooling data  with
       safecat,  a  single, separate process should handle naming, collecting, and deleting these
       files.  Examples of such a process are daemons, cron jobs, and mail readers.

RELIABILITY ISSUES

       A machine may crash while data is being written to disk.   For  many  programs,  including
       many  mail  delivery  agents,  this means that the data will be silently truncated.  Using
       Professor Bernstein's maildir algorithm, every file is guaranteed complete or nonexistent.

       Many people or programs may write data to a common "spool" directory.   Systems  like  mh-
       mail  store  files  using  numeric  names in a directory.  Incautious writing to files can
       result in a collision, in which one write succeeds and the other appears  to  succeed  but
       fails.   Common strategies to resolve this problem involve creation of lock files or other
       synchronizing mechanisms, but such mechanisms are subject  to  failure.   Anyone  who  has
       deleted  $HOME/.netscape/lock  in order to start netscape can attest to this.  The maildir
       algorithm is immune to this problem because it uses no locks at all.

THE MAILDIR ALGORITHM

       As described in maildir(5), safecat applies the maildir algorithm by writing data  in  six
       steps.   First,  it stat()s the two directories tempdir and destdir, and exits unless both
       directories exist and are writable.  Second, it stat()s  the  name  tempdir/time.pid.host,
       where  time is the number of seconds since the beginning of 1970 GMT, pid is the program's
       process ID, and host is the host name.  Third, if  stat()  returned  anything  other  than
       ENOENT,  the  program  sleeps for two seconds, updates time, and tries the stat() again, a
       limited number of times.  Fourth, the program creates tempdir/time.pid.host.   Fifth,  the
       program  NFS-writes  the  message  to  the  file.   Sixth, the program link()s the file to
       destdir/time.pid.host.  At that instant the data has been successfully written.

       In addition, safecat starts a 24-hour timer  before  creating  tempdir/time.pid.host,  and
       aborts the write if the timer expires.  Upon error, timeout, or normal completion, safecat
       attempts to unlink() tempdir/time.pid.host.
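
       The six steps above can be approximated with ordinary shell tools.  The following is
       only an illustrative sketch, not safecat's actual implementation: the function name
       maildir_write and the retry limit of 5 are assumptions, it relies on a date command
       that understands +%s, it omits the 24-hour timer, and it does not force data to disk
       as a real implementation would.

```shell
#!/bin/sh
# Illustrative sketch of the maildir steps; not safecat's implementation.
maildir_write() {
    tempdir=$1
    destdir=$2

    # Step 1: both directories must exist and be writable.
    for dir in "$tempdir" "$destdir"; do
        [ -d "$dir" ] && [ -w "$dir" ] || return 1
    done

    # Steps 2 and 3: choose time.pid.host, retrying while the name is taken.
    tries=0
    while :; do
        name="$(date +%s).$$.$(uname -n)"
        [ -e "$tempdir/$name" ] || break
        tries=$((tries + 1))
        [ "$tries" -lt 5 ] || return 1   # retry limit of 5 is an assumption
        sleep 2
    done

    # Steps 4 and 5: create the temporary file and copy stdin into it.
    cat > "$tempdir/$name" || { rm -f "$tempdir/$name"; return 1; }

    # Step 6: link into destdir; the data counts as delivered only now.
    if ln "$tempdir/$name" "$destdir/$name"; then
        status=0
        echo "$name"
    else
        status=1
    fi

    # Remove the temporary name whether or not the link succeeded.
    rm -f "$tempdir/$name"
    return "$status"
}
```

       Because the file only appears in destdir through the final link, a reader of destdir
       never sees a partially written file.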

EXIT STATUS

       An exit status of 0 (success) implies that all data has been safely committed to disk.   A
       non-zero  exit  status  should  be  considered to mean failure, though there is an outside
       chance that safecat wrote the data successfully, but didn't think so.

       Note again that if a file appears in the destination directory, then it is  guaranteed  to
       be complete.

       If  safecat  completes successfully, then it will print the name of the newly created file
       (without its path) to standard output.
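
       Since the file name is printed on success, a caller can capture it and rename the file
       afterwards.  The following is a minimal sketch; spool_as and its three arguments are
       hypothetical names chosen for this example.

```shell
#!/bin/sh
# Sketch: spool stdin with safecat, then give the file a chosen name.
# spool_as is a hypothetical helper, not part of safecat.
spool_as() {
    tempdir=$1
    destdir=$2
    wanted=$3

    # safecat prints the new file's name (without its path) on success;
    # a non-zero exit status is treated as failure.
    name=$(safecat "$tempdir" "$destdir") || return 1

    # Rename the completed file inside destdir.
    mv "$destdir/$name" "$destdir/$wanted"
}
```

       For example, "some_command | spool_as /var/spool/x/tmp /var/spool/x/new report.txt"
       (paths hypothetical).  Note that mv replaces an existing file of the same name, so the
       caller is responsible for choosing names that do not collide.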

SUGGESTED APPLICATIONS

       Exciting uses for safecat abound, obviously, but a word may be in order to suggest what
       they are.

       If you run Linux and use qmail instead of sendmail, you should consider converting your
       inbox to maildir for its superior reliability.  If your home directory is NFS mounted,
       qmail forces you to use maildir.

       If you write CGI applications to collect data over the World Wide Web, you might find
       safecat useful.  Web applications suffer from two major problems.  First, their
       performance suffers from every stoppage or bottleneck in the internet, so they cannot
       afford to introduce performance problems of their own.  Second, web applications should
       NEVER leave the server and database in an inconsistent state.  This is likely, however,
       if CGI scripts directly frob some database--particularly if the database is overloaded
       or slow.  What happens when users get bored and click "Stop" or "Back"?  Maybe the
       database activity completes.  Maybe the CGI script is killed, leaving the database in
       an inconsistent state.

       Consider the following strategy.  Make your CGI script dump its request to a spool
       directory using safecat, and immediately return a receipt to the browser.  The user now
       has a guarantee that the submission was received, and the perceived performance of your
       web application is optimal.

       Meanwhile, a spooler daemon notices the fresh request, picks it up, and updates the
       database.  Users can be informed that their request will be fulfilled in X minutes.
       The result is good performance despite a capricious internet.  In addition, users can
       be offered nearly 100% reliability.
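
       The consumer side of such a spool might look like the following sketch.  The name
       drain_spool and the handler convention (a command that receives the file path as its
       last argument) are hypothetical; a real daemon would also need logging and a policy
       for requests that keep failing.

```shell
#!/bin/sh
# Sketch of a spool consumer; drain_spool is a hypothetical helper.
drain_spool() {
    spooldir=$1
    shift                       # remaining arguments: the handler command

    for file in "$spooldir"/*; do
        [ -f "$file" ] || continue      # nothing to do when the spool is empty

        # Any file visible in the spool is complete, so process it right away.
        if "$@" "$file"; then
            rm -f "$file"               # handled: remove it from the spool
        fi                              # on failure, leave it for the next pass
    done
}
```

       A daemon could call this in a loop, for instance "drain_spool /var/spool/x/new
       update_database" followed by a short sleep, where the path and update_database are
       placeholders for your own spool directory and handler.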

EXAMPLES

       To convince sendmail to use maildir for message delivery, add the following line  to  your
       .forward file:

       |SAFECAT HOME/Maildir/tmp HOME/Maildir/new || exit 75 #USERNAME

       where SAFECAT is the complete path of the safecat program, HOME is the complete path to
       your home directory, and USERNAME is your login name. Making this change is likely to pay
       off; many campuses and companies mount user home directories with NFS.  Using maildir to
       deliver to your inbox folder helps ensure that your mail will not be lost due to some NFS
       error.  Of course, if you are a System Administrator, you should consider switching to
       qmail.

       To run a program and catch its output safely into some directory, you can use a shell
       script like the following.

       #!/bin/bash

       MYPROGRAM=cat              # The program you want to run
       TEMPDIR=/tmp               # The name of a temporary directory
       DESTDIR=$HOME/work/data    # The directory for storing information

       # Run a command quietly; on failure, write the marker NO to stderr.
       try() { "$@" 2>/dev/null || echo NO 1>&2; }

       set -- `( try $MYPROGRAM | try safecat "$TEMPDIR" "$DESTDIR" ) 2>&1`
       test "$?" = "0"  || exit 1
       if test "$1" = "NO"; then rm -f "$DESTDIR/$2"; exit 1; fi

       This script illustrates the pitfalls of writing secure programs with the shell.  The
       script assumes that your program might generate some output, but then fail to complete.
       There is no way for safecat to know whether your program completed successfully or not,
       because of the semantics of the shell.  As a result, safecat might create a file in the
       data directory which is "complete" but not useful.  The shell script deletes the file in
       that case.
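
       Another approach, which is not taken from the script above, is to stage the program's
       output in a scratch file and hand it to safecat only once the program has exited
       successfully.  The sketch below assumes a mktemp command is available; run_then_spool
       is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: run a program, and spool its output only if it succeeded.
# run_then_spool is a hypothetical helper, not part of safecat.
run_then_spool() {
    tempdir=$1
    destdir=$2
    shift 2                     # remaining arguments: the program to run

    stage=$(mktemp) || return 1

    # Capture the output first so the program's exit status can be checked.
    if "$@" > "$stage" 2>/dev/null; then
        safecat "$tempdir" "$destdir" < "$stage"
        status=$?
    else
        status=1                # program failed: nothing reaches destdir
    fi

    rm -f "$stage"
    return "$status"
}
```

       The trade-off is that the output is written twice, once to the scratch file and once
       by safecat, in exchange for never creating a "complete but not useful" file.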

       More generally, the safest way to use safecat is from within a C program which invokes
       safecat with fork() and execve().  The parent process can then simply kill() the safecat
       process if any problems develop, and optionally can try again.  Whether to go to this
       trouble depends upon how serious you are about protecting your data.  Either way, safecat
       will not be the weak link in your data flow.

BUGS

       In order to perform the last step and link() the temporary file into the destination
       directory, both directories must reside in the same file system.  If they do not,
       safecat will quietly fail every time.  In Professor Bernstein's implementation of
       maildir, the temporary and destination directories are required to belong to the same
       parent directory, which essentially avoids this problem.  We relax this requirement to
       provide some flexibility, at the cost of some risk.  Caveat emptor.
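
       A script can guard against this mistake before invoking safecat.  The sketch below
       assumes GNU coreutils stat, whose -c %d option prints a file's device number; other
       systems spell this differently.  The name same_filesystem is hypothetical.  Matching
       device numbers are a reasonable first check, though not proof that link() will
       succeed in every mount configuration.

```shell
#!/bin/sh
# Sketch: succeed only if both directories report the same device number.
# Assumes GNU coreutils stat (-c %d); same_filesystem is a hypothetical name.
same_filesystem() {
    dev_a=$(stat -c %d "$1") || return 2    # 2: could not stat the directory
    dev_b=$(stat -c %d "$2") || return 2
    [ "$dev_a" = "$dev_b" ]
}
```

       For example: same_filesystem "$TEMPDIR" "$DESTDIR" || exit 1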

       Although safecat cleans up after itself, it may sometimes fail to delete the temporary
       file located in tempdir.  Since safecat times out after 24 hours, you may freely delete
       any temporary files older than 36 hours.  Files newer than 36 hours should be left
       alone.  A system of data flow involving safecat should include a cron job to clean up
       temporary files, or should obligate consumers of the data to do the cleanup, or both.
       In the case of qmail, mail readers using maildir are expected to scan and clean up the
       temporary directory.
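
       A cleanup job along these lines could be run from cron.  This is a sketch that assumes
       a find supporting -maxdepth, -mmin, and -delete (GNU and BSD find do); clean_tempdir
       is a hypothetical name.  It should only be pointed at a directory dedicated to
       safecat, never at a shared directory such as /tmp.

```shell
#!/bin/sh
# Sketch: delete regular files older than 36 hours (2160 minutes) from a
# dedicated safecat temporary directory. clean_tempdir is a hypothetical name.
clean_tempdir() {
    # -maxdepth 1 keeps find from descending into subdirectories.
    find "$1" -maxdepth 1 -type f -mmin +2160 -delete
}
```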

       The guarantee of safe delivery of data is only "as certain as UNIX will allow."  In
       particular, a disk hardware failure could result in safecat concluding that the data
       was safe, when it was not.  Similarly, a successful exit status from safecat is of no
       value if the computer, its disks and backups all explode at some subsequent time.

       In other words, if your data is vital to you, then you won't just use safecat.  You'll
       also invest in good equipment (possibly including a RAID disk), a UPS for the server
       and drives, a regular backup schedule, and competent system administration.  For many
       purposes, however, safecat can be considered 100% reliable.

       Also note that safecat was designed for spooling email messages; it is not the right
       tool for spooling large files--files larger than 2GB, for example.  Some operating
       systems have a bug which causes safecat to fail silently when spooling files larger
       than 2GB.  When building safecat, you can take advantage of conditional support for
       large files on Linux; see conf-cc for further information.

CREDITS

       The maildir algorithm was devised by Professor Daniel Bernstein, the author of qmail.
       Parts of this manpage borrow directly from maildir(5) by Professor Bernstein.  In
       particular, the section "THE MAILDIR ALGORITHM" transplants his explanation of the maildir
       algorithm in order to illustrate that safecat complies with it.

       The original code for safecat was written by the present author, but was since augmented
       with heavy borrowings from qmail code.  However, under no circumstances should the author
       of qmail be contacted concerning safecat bugs; all are the fault, and the responsibility,
       of the present author.

       Copyright (c) 2000, Len Budney. All rights reserved.