Provided by: fssync_1.7-1build1_all

NAME

       fssync - File system synchronization tool (1-way, over SSH)

SYNOPSIS

       fssync -d db -r root [option...] host

DESCRIPTION

       fssync  is  a  1-way file-synchronization tool that tracks inodes and maintains a local database of files
       that are on the remote side, making it able to:

       • efficiently handle a huge number of dirs/files

       • detect renames/moves and hard-links
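
       As a rough illustration of why tracking inodes makes renames and hard links detectable, the sketch
       below compares two snapshots of one file system. It is not fssync's actual code: the function
       diff_by_inode and the {path: inode} snapshot format are hypothetical, and the sketch ignores inode
       reuse after deletion.

```python
def diff_by_inode(before, after):
    """Compare two {path: inode} snapshots of one file system.

    Returns (renames, links): renames is a list of (old_path, new_path)
    pairs, links is a list of (existing_path, new_path) hard-link pairs.
    Hypothetical helper for illustration only.
    """
    # Index the previous snapshot by inode number.
    by_inode = {}
    for path, inode in before.items():
        by_inode.setdefault(inode, []).append(path)

    renames, links = [], []
    for path, inode in sorted(after.items()):
        if before.get(path) == inode:
            continue  # same path, same inode: nothing to detect
        for old in sorted(by_inode.get(inode, [])):
            if old not in after:
                # The inode is known but its old path is gone: a rename/move.
                renames.append((old, path))
                by_inode[inode].remove(old)
                break
            if after[old] == inode:
                # The old path still points to this inode: a new hard link.
                links.append((old, path))
                break
    return renames, links
```

       With such a result, a moved file can be renamed on the remote side instead of being sent again.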

       It aims at minimizing network traffic and synchronizing every detail of a file system:

       • all types of inode: file, dir, block/character/fifo, socket, symlink

       • preserve hard links

       • modification time, ownership/permission/ACL, extended attributes

       • sparse files

       Other features:

       • it can be configured to exclude files from synchronization

       • fssync can be interrupted and resumed at any time, making it tolerant to random failures (e.g.  network
         error)

       • the algorithm that synchronizes file content is designed to handle big files like VM images
         efficiently, by updating fixed-size modified blocks in-place
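
       The idea of updating only the modified fixed-size blocks in-place can be sketched as follows. This is
       a minimal local illustration, not fssync's implementation: update_in_place and the 4096-byte
       BLOCK_SIZE are assumptions, and both files are local here, whereas a network tool would have to
       compare blocks without transferring them (for example with checksums).

```python
import os

BLOCK_SIZE = 4096  # hypothetical value; the block size used by fssync is not stated


def update_in_place(src_path, dst_path, block_size=BLOCK_SIZE):
    """Rewrite only the blocks of dst_path that differ from src_path.

    Returns the number of blocks written. Hypothetical helper for
    illustration only.
    """
    written = 0
    # Create the destination if needed, without truncating existing data.
    fd = os.open(dst_path, os.O_RDWR | os.O_CREAT, 0o600)
    with open(src_path, "rb") as src, os.fdopen(fd, "r+b") as dst:
        offset = 0
        while True:
            new = src.read(block_size)
            if not new:
                break
            dst.seek(offset)
            old = dst.read(len(new))
            if old != new:
                # Only a differing block is rewritten, at the same offset.
                dst.seek(offset)
                dst.write(new)
                written += 1
            offset += len(new)
        # Drop trailing data if the source is shorter than the destination.
        dst.truncate(offset)
    return written
```

       For a large VM image in which a few blocks changed, only those blocks are rewritten and the rest of
       the destination file is left untouched.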

       The main use of fssync is to prevent data loss in case of hardware failure, where RAID1 is not possible
       (e.g. in laptops).

       On Btrfs <https://btrfs.wiki.kernel.org/> [1] file systems, fssync is a useful alternative to btrfs send
       (and receive) commands, thanks to filtering capabilities. This can be combined with Btrfs snapshotting at
       destination side for a full backup solution.

USAGE

       Use fssync --help to get the complete list of options.

       The  most  important  thing  to  remember  is  that  the  local database must match exactly what's on the
       destination host:

       • Files that are copied to the destination host must not be modified, and nothing should be manually
         created inside destination directories. If you still want to access data on the remote host, you
         should do it through a read-only bind mount (requires Linux >= 2.6.26).

       • You  must  have  1  database  per  destination,  if  you plan to have several copies of the same source
         directory.

       See the -c option if you are unsure whether your database matches the destination directory.

       First run of fssync:

       • The easiest way is to let fssync do everything. Specify a non-existing file path to the -d option and
         an empty or non-existing destination directory (see -R option). fssync will automatically create the
         database and copy all dirs/files to the remote host.

       • A faster way may be to do the initial copy by other means, like a raw copy of a  partition.  If  you're
         absolutely  sure  the  source  and destination are exactly the same, you can initialize the database by
         specifying - as host. If inode numbers are the same on both sides, which  is  the  case  if  data  were
         copied  at  block  level,  you can modify the source partition while you are initializing the DB on the
         destination one, and get back the DB locally.

       An example of a wrapper around fssync, with a filter, can be found at examples/fssync_home.

       fssync never descends into directories on other filesystems. Inodes masked by mount points are also
       skipped, so they should be unmounted temporarily if you want them to be synchronized. The same result can
       be achieved by synchronizing from a bind mount.
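
       A common way to stay on one filesystem is to compare the device number (st_dev) of each directory
       with that of the root. The sketch below shows that technique; walk_one_filesystem is a hypothetical
       name, and whether fssync implements the check this way is an assumption. In this sketch the mount
       point directory itself is skipped as well.

```python
import os


def walk_one_filesystem(root):
    """Yield paths under root without descending into other filesystems.

    Hypothetical helper for illustration only.
    """
    root_dev = os.lstat(root).st_dev
    for dirpath, dirnames, filenames in os.walk(root):
        kept = []
        for name in dirnames:
            path = os.path.join(dirpath, name)
            # A directory on another device is a mount point: skip it.
            if os.lstat(path).st_dev == root_dev:
                kept.append(name)
        # Pruning dirnames in place stops os.walk from descending further.
        dirnames[:] = kept
        for name in kept + filenames:
            yield os.path.join(dirpath, name)
```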

       See  also  the NONE cipher switching <https://github.com/rapier1/openssh-portable> [3] patch if you don't
       need encryption and you want to speed up your SSH connection.

HOW IT WORKS

       fssync maintains a single SQLite table of all dirs/files that are on the remote side. Each row matches  a
       path, with its inode (on local side), other metadata (on remote side) and a checked flag.

       When  running,  fssync  iterates  recursively  through all local dirs/files and for each path that is not
       ignored (see -f option), it queries the DB to decide what to do. If  already  checked,  path  is  skipped
       immediately. When a path is synchronized, it is marked as checked. At the end, all rows that are not
       checked correspond to paths that no longer exist. Once they are deleted on the remote side, all
       checked flags are reset.
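
       The checked-flag pass described above can be sketched with the sqlite3 module. This is a simplified
       illustration under stated assumptions, not fssync's real schema or code: the table name entry, its
       columns, and the function sync_pass are hypothetical, remote metadata is left out, and local_paths is
       assumed to be already filtered.

```python
import sqlite3


def sync_pass(db, local_paths):
    """Run one pass over (path, inode) pairs.

    Returns (to_send, stale): paths to synchronize and paths to delete on
    the remote side. Hypothetical helper for illustration only.
    """
    db.execute(
        "CREATE TABLE IF NOT EXISTS entry ("
        "path TEXT PRIMARY KEY, inode INTEGER, checked INTEGER DEFAULT 0)"
    )
    to_send = []
    for path, inode in local_paths:
        row = db.execute(
            "SELECT inode, checked FROM entry WHERE path = ?", (path,)
        ).fetchone()
        if row and row[1]:
            continue  # already checked, e.g. by an interrupted run
        if row is None or row[0] != inode:
            to_send.append(path)
        # Mark the path as checked once it has been handled.
        db.execute(
            "INSERT OR REPLACE INTO entry (path, inode, checked) VALUES (?, ?, 1)",
            (path, inode),
        )
    # Rows left unchecked are paths that no longer exist locally.
    stale = [r[0] for r in db.execute(
        "SELECT path FROM entry WHERE checked = 0 ORDER BY path")]
    db.execute("DELETE FROM entry WHERE checked = 0")
    # Reset all flags so that the next pass starts from scratch.
    db.execute("UPDATE entry SET checked = 0")
    db.commit()
    return to_send, stale
```

       Because the flag is stored in the database, a pass that is interrupted and restarted can skip the
       paths it has already handled.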

   Failure tolerance
       In fact, fssync doesn't require that the database perfectly matches the destination. It tolerates some
       differences in order to recover any interrupted synchronization caused  by  a  network  failure,  a  file
       operation error, or anything other than an operating system crash of the local host (or something similar
       like a power failure).

       In most cases, this is done by the remote host, which automatically creates (or overwrites) an inode of
       the expected type if necessary. The only exception is that the remote will never delete a non-empty
       directory on its own. For the most complex cases, fssync journals the operation in the database: in
       case of failure, fssync will be able to recover on the next sync.

   Race conditions
       A race condition means that other processes on the  local  host  are  modifying  inodes  that  fssync  is
       synchronizing.  fssync  handles  any  kind of race condition.  In fact, fssync has nothing to do for most
       cases.

       When a race condition happens, fssync does not guarantee that the remote data is in a consistent state.
       Each sync always fixes existing inconsistencies but may introduce others, so fssync is not suitable for
       hot backups of databases.

       With Btrfs, you can get consistency by snapshotting at source side.

SIMILAR PROJECTS

       The idea of maintaining a local database actually comes from csync2 <http://oss.linbit.com/csync2/>  [4].
       I  was about to adopt it when I realized that I really needed a tool that always detects renames/moves of
       big files. That's why I see fssync as a partial rewrite  of  csync2,  with  inode  tracking  and  without
       bidirectional  synchronization.   The  local  database  really  makes  fssync  &  csync2  faster than the
       well-known rsync <http://rsync.samba.org/> [5].

SEE ALSO

       sqlite3(1), ssh(1)

BUGS/LIMITATIONS/TODO

       1. For performance reasons, the SQLite database is never flushed to disk while fssync is running. This
          means that if the operating system crashes, the DB might become corrupted. Even if it is not, it may
          no longer reflect the status of the remote host, and later runs may fail (for example, fssync
          refuses to replace a non-empty folder it doesn't know about with a non-folder). So in any case, it
          is advised to rebuild the DB.

          If the DB is not corrupted and you don't want to rebuild it, you can try to update it by running
          fssync again as soon as possible, so that the same changes are replayed. fssync should be able to
          detect that all remote operations have already been performed. See also -c and -F options.

       2. fssync should avoid trashing the page cache, by using posix_fadvise(2). Unfortunately, Linux does not
          implement POSIX_FADV_NOREUSE yet (see <https://lkml.org/lkml/2011/6/24/136> for more information). We
          could do like Bup <https://github.com/bup/bup> [2], which uses information returned by mincore(2) in
          order to eject pages after save more selectively
          <https://github.com/bup/bup/commit/b062252a5bca9b64d7b3034b6fd181424641f61e> [6].

       3. The fssync process on the remote side might leave parent directories with wrong permissions or
          modification times if it is terminated during specific operations such as recovery (at the very
          beginning), cleanup (at the end), or rename (if a directory is moved). That is, all operations that
          need to temporarily alter a directory that is not being checked. "Wontfix" for now, because it is
          unlikely to happen and any solution would be quite heavy, for little benefit.

       4. What is not synchronized:

          • access & change times: I won't implement it.

          • inode  flags  (see  chattr(1)  and lsattr(1)): some flags like C or c are important on Btrfs so this
            could be a nice improvement, at least if it was implemented partially.

          • file-system specific properties?

       5. Don't rely on permission settings to prevent access to inodes on the destination side. This is because
          metadata is synchronized after data (in the case of a directory, this means all inodes under the
          directory are synchronized before its metadata), and in some cases an attacker could access
          sensitive data while fssync is running. Access should be denied on a parent directory of your
          destination tree (or at the root of this tree if you're careful enough to keep it secure on the
          source side).

NOTES

       [1]  <https://btrfs.wiki.kernel.org/>

       [2]  <https://github.com/bup/bup>

       [3]  <https://github.com/rapier1/openssh-portable>

       [4]  <http://oss.linbit.com/csync2/>

       [5]  <http://rsync.samba.org/>

       [6]  <https://github.com/bup/bup/commit/b062252a5bca9b64d7b3034b6fd181424641f61e>

AUTHOR

       Julien Muchembled <jm@jmuchemb.eu>

                                                                                                       fssync(1)