Provided by: bup-doc_0.33.6~git20241212-1_all

NAME

       bup-split - save individual files to bup backup sets

SYNOPSIS

       bup split [-t] [-c] [-n name] COMMON_OPTIONS

       bup split -b COMMON_OPTIONS

       bup split --copy COMMON_OPTIONS

       bup split --noop [-t|-b] COMMON_OPTIONS

       COMMON_OPTIONS
              [-r host:path] [-v] [-q] [-d seconds-since-epoch] [--bench] [--max-pack-size=bytes]
              [-#] [--bwlimit=bytes] [--max-pack-objects=n] [--fanout=count]  [--keep-boundaries]
              [--git-ids | filenames...]

DESCRIPTION

       bup  split  concatenates  the  contents  of the given files (or if no filenames are given,
       reads from stdin), splits the content into chunks of around 8k using  a  rolling  checksum
       algorithm,  and saves the chunks into a bup repository.  Chunks which have previously been
       stored are not stored again (i.e., they are `deduplicated').
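
       The following is a minimal sketch of content-defined chunking with deduplication. It
       assumes a 64-byte window and a split whenever the low 13 bits of an Adler-style rolling
       sum are all ones (2^13 = 8192, so chunks average around 8k). It is not bup's own bupsplit
       code. The names split_chunks and store are hypothetical, and so is the in-memory dict
       standing in for a repository. SHA-1 keys stand in for git object ids.

```python
import hashlib
import random

WINDOW = 64                  # bytes covered by the rolling sum (assumed value)
BLOB_BITS = 13               # split when the low 13 bits are all ones: ~8 KiB average
MASK = (1 << BLOB_BITS) - 1


def split_chunks(data: bytes) -> list[bytes]:
    """Split data at content-defined boundaries using an Adler-style rolling sum.

    Illustrative sketch only; not bup's exact bupsplit implementation.
    """
    chunks: list[bytes] = []
    start = 0
    s1 = s2 = 0
    for i, byte in enumerate(data):
        # Byte sliding out of the window (treated as 0 until the window fills).
        old = data[i - WINDOW] if i >= WINDOW else 0
        s1 = (s1 + byte - old) & 0xFFFFFFFF
        s2 = (s2 + s1 - WINDOW * old) & 0xFFFFFFFF
        # The sums depend only on the last WINDOW bytes, so boundaries resync after edits.
        if (s2 & MASK) == MASK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def store(chunks: list[bytes], repo: dict[str, bytes]) -> int:
    """Add chunks to a toy content-addressed store; return the bytes newly written."""
    written = 0
    for chunk in chunks:
        key = hashlib.sha1(chunk).hexdigest()
        if key not in repo:          # deduplication: chunks already present are skipped
            repo[key] = chunk
            written += len(chunk)
    return written


if __name__ == "__main__":
    rng = random.Random(0)
    original = rng.randbytes(200_000)
    edited = b"a small insertion" + original   # change near the start of the input

    repo: dict[str, bytes] = {}
    first = store(split_chunks(original), repo)
    second = store(split_chunks(edited), repo)
    print(f"first run wrote {first} bytes, second run wrote {second} bytes")
```

       The second run typically writes only the chunk containing the insertion. Every later
       boundary is found at the same content as before, so all later chunks are duplicates.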

       Because of the way the rolling checksum works,  chunks  tend  to  be  very  stable  across
       changes to a given file, including adding, deleting, and changing bytes.

       For  example,  if you use bup split to back up an XML dump of a database, and the XML file
       changes slightly from one run to the next, nearly all the data will still be  deduplicated
       and the size of each backup after the first will typically be quite small.

       Another  technique  is  to pipe the output of the tar(1) or cpio(1) programs to bup split.
       When individual files in the tarball change slightly or are added or  removed,  bup  still
       processes the remainder of the tarball efficiently.  (Note, however, that bup save is usually
       a more efficient way to accomplish this.)

       To get the data back, use bup-join(1).

MODES

       These options select the primary behavior of the command, with -n being  the  most  likely
       choice.

       -n, --name=name
              after  creating  the  dataset,  create  a  git  branch named name so that it can be
              accessed using that name.   If  name  already  exists,  the  new  dataset  will  be
              considered  a  descendant  of  the old name.  (Thus, you can continually create new
              datasets with the same name, and later view the history of that dataset to see  how
              it has changed over time.)  The original data will also be available as a top-level
              file named “data” in the VFS, accessible via bup fuse, bup ftp, etc.
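
               The history behaviour of -n can be pictured with a toy in-memory model. Each save
               under a name records a commit whose parent is the branch's previous head. The
               Commit class and the save_named and history helpers are hypothetical. bup itself
               writes real git commits and refs into the repository.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    tree: str                   # id of the dataset's tree (placeholder string here)
    parent: Optional["Commit"]  # previous head of the branch, or None for the first save
    date: int                   # seconds since the epoch


def save_named(refs: dict[str, Commit], name: str, tree: str, date: int) -> Commit:
    """Record a new dataset under `name`, making it a descendant of the old head."""
    commit = Commit(tree=tree, parent=refs.get(name), date=date)
    refs[name] = commit
    return commit


def history(refs: dict[str, Commit], name: str) -> list[str]:
    """Walk the branch from newest to oldest and return the tree ids."""
    trees = []
    commit = refs.get(name)
    while commit is not None:
        trees.append(commit.tree)
        commit = commit.parent
    return trees
```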

       -t, --tree
              output the git tree id of the resulting dataset.

       -c, --commit
              output the git commit id of the resulting dataset.

       -b, --blobs
              output a series of git blob ids that correspond  to  the  chunks  in  the  dataset.
              Incompatible with -n, -t, and -c.
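
               Each id printed by -b is an ordinary git blob id, so it can be reproduced from a
               chunk's bytes. Git hashes a blob as SHA-1 over the header "blob <size>" plus a NUL
               byte, followed by the content. The helper names below are hypothetical, and the
               sketch assumes the chunk contents are already at hand.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the git object id of a blob with the given content.

    Matches what `git hash-object` prints for a file with these bytes.
    """
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def blob_ids(chunks: list[bytes]) -> list[str]:
    """One blob id per chunk, in input order."""
    return [git_blob_id(chunk) for chunk in chunks]


if __name__ == "__main__":
    for blob_id in blob_ids([b"hello\n", b""]):
        print(blob_id)
```

               For b"hello\n" this yields ce013625030ba8dba906f756967f9e9ca394464a, the same value
               as `echo hello | git hash-object --stdin`.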

       --noop read  the  data  and  split it into blocks based on the “bupsplit” rolling checksum
              algorithm, but don’t store anything in the repo.  Can be combined with -b or -t  to
              compute  (but not store) the git blobs or tree ids for the dataset.  This is mostly
              useful for benchmarking and validating the bupsplit algorithm.   Incompatible  with
              -n and -c.

       --copy like  --noop,  but  also  write  the  data  to  stdout.   This  can  be  useful for
              benchmarking  the  speed  of  read+bupsplit+write  for  large  amounts   of   data.
              Incompatible with -n, -t, -c, and -b.

OPTIONS

       -r, --remote=host:path
              save  the  backup  set  to  the  given remote server.  If path is omitted, uses the
              default path on the remote server  (you  still  need  to  include  the  `:').   The
              connection  to  the remote server is made with SSH.  If you’d like to specify which
              port, user or private key to use for the SSH connection, we recommend you  use  the
              ~/.ssh/config  file.  Even though the destination is remote, a local bup repository
              is still required.
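
               The rule that the `:' is required even when path is omitted can be stated as a
               small parser. parse_remote is a hypothetical helper that handles only the plain
               host:path form described here. bup's own argument handling may accept other forms.

```python
from typing import Optional, Tuple


def parse_remote(spec: str) -> Tuple[str, Optional[str]]:
    """Split a -r argument of the form host:path.

    The ':' is mandatory; an empty path means "use the server's default path".
    """
    host, sep, path = spec.partition(":")
    if not sep:
        raise ValueError(f"missing ':' in remote {spec!r}")
    if not host:
        raise ValueError(f"missing host in remote {spec!r}")
    return host, (path or None)
```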

       -d, --date=seconds-since-epoch
              specify the date inscribed in the commit (seconds since 1970-01-01).
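
               The value for -d is an integer count of seconds since 1970-01-01 UTC. The sketch
               below converts a calendar date into that form. epoch_seconds is a hypothetical
               helper, and it insists on a timezone-aware datetime so the result does not depend
               on the local timezone.

```python
from datetime import datetime, timezone


def epoch_seconds(when: datetime) -> int:
    """Convert a timezone-aware datetime to whole seconds since the epoch."""
    if when.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid local-time ambiguity")
    return int(when.timestamp())


if __name__ == "__main__":
    print(epoch_seconds(datetime(2024, 1, 1, tzinfo=timezone.utc)))  # 1704067200
```

               The result would then be passed as, for example, -d 1704067200.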

       -q, --quiet
              disable progress messages.

       -v, --verbose
              increase verbosity (can be used more than once).

       --git-ids
              stdin is a list of git object ids instead of raw data.  bup  split  will  read  the
              contents  of  each  named git object (if it exists in the bup repository) and split
              it.  This might be useful for converting a git repository with large  binary  files
              to  use  bup-style hashsplitting instead.  This option is probably most useful when
              combined with --keep-boundaries.
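
               With --git-ids, stdin carries object ids rather than data. The sketch below checks
               such a list before it is fed to bup split. It rests on two assumptions that are not
               taken from this page. First, that each id is a full 40-character hexadecimal SHA-1.
               Second, that blank lines may be skipped. read_git_ids is a hypothetical helper.

```python
import re

# Assumption: full 40-character SHA-1 ids, one per line.
_SHA1 = re.compile(r"[0-9a-f]{40}")


def read_git_ids(text: str) -> list[str]:
    """Parse one object id per non-blank line, normalizing to lowercase."""
    ids = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        oid = line.strip().lower()
        if not oid:
            continue  # assumed: blank lines carry no id
        if not _SHA1.fullmatch(oid):
            raise ValueError(f"line {lineno}: not a full SHA-1 object id: {line!r}")
        ids.append(oid)
    return ids
```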

       --keep-boundaries
              if multiple filenames are given on the command line, they are normally concatenated
              together  as  if  the  content  all  came  from a single file.  That is, the set of
              blobs/trees produced is identical to what it would have been if there  had  been  a
              single  input  file.   However,  if  you  use --keep-boundaries, each file is split
              separately.  You still only get a single tree or commit or  series  of  blobs,  but
              each  blob  comes  from  only  one  of the files; the end of one of the input files
              always ends a blob.
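
               The difference between the default and --keep-boundaries can be shown with a toy
               boundary rule. Here the rule is a hypothetical stand-in for the bupsplit checksum
               that splits after every newline byte. The default splits the concatenation of all
               inputs. The second mode splits each file on its own, so a file's end always ends a
               chunk. split and split_files are hypothetical names.

```python
from typing import Callable, Iterable


def split(data: bytes, is_boundary: Callable[[int], bool]) -> list[bytes]:
    """Split after every byte for which is_boundary() returns True."""
    chunks, start = [], 0
    for i, byte in enumerate(data):
        if is_boundary(byte):
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def split_files(files: Iterable[bytes], keep_boundaries: bool,
                is_boundary: Callable[[int], bool] = lambda b: b == 0x0A) -> list[bytes]:
    files = list(files)
    if not keep_boundaries:
        # Default: behave as if all inputs were one concatenated file.
        return split(b"".join(files), is_boundary)
    # --keep-boundaries: split each file separately; no chunk spans two files.
    chunks: list[bytes] = []
    for content in files:
        chunks.extend(split(content, is_boundary))
    return chunks
```

               With inputs b"abc" and b"def\nghi\n", the default yields b"abcdef\n" and b"ghi\n".
               With keep_boundaries=True it yields b"abc", b"def\n" and b"ghi\n".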

       --bench
              print benchmark timings to stderr.

       --max-pack-size=bytes
              never create git packfiles larger than the given number of  bytes.   Default  is  1
              billion bytes.  Usually there is no reason to change this.

       --max-pack-objects=numobjs
              never  create git packfiles with more than the given number of objects.  Default is
              200 thousand objects.  Usually there is no reason to change this.
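
               The two pack limits can be modelled as a rule for starting a new packfile. A new
               pack begins whenever adding the next object would push the current pack past either
               limit. PackSplitter is a hypothetical model, and sizes are the stored object sizes.
               This page does not say whether bup checks the limits before or after each write.
               Here, an object that alone exceeds the size limit still gets a pack of its own.

```python
from dataclasses import dataclass, field

DEFAULT_MAX_PACK_SIZE = 1_000_000_000   # 1 billion bytes, per --max-pack-size
DEFAULT_MAX_PACK_OBJECTS = 200_000      # per --max-pack-objects


@dataclass
class PackSplitter:
    max_size: int = DEFAULT_MAX_PACK_SIZE
    max_objects: int = DEFAULT_MAX_PACK_OBJECTS
    pack_sizes: list[int] = field(default_factory=lambda: [0])
    pack_counts: list[int] = field(default_factory=lambda: [0])

    def add(self, size: int) -> int:
        """Place one object; return the index of the pack that received it."""
        full = self.pack_counts[-1] > 0 and (
            self.pack_sizes[-1] + size > self.max_size
            or self.pack_counts[-1] + 1 > self.max_objects
        )
        if full:
            # Close the current pack and start a fresh one.
            self.pack_sizes.append(0)
            self.pack_counts.append(0)
        self.pack_sizes[-1] += size
        self.pack_counts[-1] += 1
        return len(self.pack_sizes) - 1
```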

       --fanout=numobjs
               when splitting very large files, try to keep the number of elements in trees to an
               average of numobjs.
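
               The effect of a fanout can be approximated by grouping a long list of chunk ids into
               nested trees, where each tree holds at most fanout entries. The sketch uses
               fixed-size groups, which is a simplification. bup's grouping is, as understood
               here, driven by the data, which is why the option speaks of an average.
               build_tree and depth are hypothetical names.

```python
def build_tree(chunk_ids: list[str], fanout: int) -> list:
    """Group ids into nested lists with at most `fanout` entries per list."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level: list = list(chunk_ids)
    while len(level) > fanout:
        # Each pass adds one tree level above the previous one.
        level = [level[i:i + fanout] for i in range(0, len(level), fanout)]
    return level


def depth(tree: list) -> int:
    """Number of tree levels, counting the list that holds the chunk ids."""
    return 1 + depth(tree[0]) if tree and isinstance(tree[0], list) else 1
```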

       --bwlimit=bytes/sec
               don’t transmit more than bytes/sec bytes per second to the server.  This keeps your
               backups from consuming all of your network bandwidth.  Use a suffix like k, M, or G
               to specify multiples of 1024, 1024*1024, or 1024*1024*1024, respectively.
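
               The suffix rule above can be sketched as a small parser, together with the minimum
               transfer time a limit implies. The sketch makes two assumptions. Values are an
               integer with an optional case-sensitive k, M or G suffix. The limit is a simple
               average rate. bup's actual parser and throttling may differ, and both helper names
               are hypothetical.

```python
import re

_MULTIPLIERS = {"": 1, "k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_bwlimit(value: str) -> int:
    """Turn a value such as '512k' or '2M' into bytes per second."""
    m = re.fullmatch(r"(\d+)([kMG]?)", value.strip())
    if not m:
        raise ValueError(f"bad bandwidth limit: {value!r}")
    return int(m.group(1)) * _MULTIPLIERS[m.group(2)]


def min_transfer_seconds(nbytes: int, limit: int) -> float:
    """Shortest time in which nbytes can be sent without exceeding limit bytes/sec."""
    return nbytes / limit
```

               For example, sending 1 MiB under --bwlimit=512k takes at least 2 seconds.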

       -#, --compress=#
              set  the  compression level to # (a value from 0-9, where 9 is the highest and 0 is
               no compression).  The default is 1 (fast, loose compression).
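
               Objects in git packfiles are zlib-compressed. The sketch below shows the trade-off
               the level controls by compressing the same data at several zlib levels. It assumes
               that -# maps directly onto zlib's 0-9 level. compressed_sizes is a hypothetical
               helper.

```python
import zlib


def compressed_sizes(data: bytes, levels=(0, 1, 9)) -> dict[int, int]:
    """Compress data at each zlib level and report the compressed size."""
    return {level: len(zlib.compress(data, level)) for level in levels}


if __name__ == "__main__":
    sample = b"bup split deduplicates chunks of data. " * 2000
    for level, size in compressed_sizes(sample).items():
        # Level 0 stores the data uncompressed (plus framing), so it is the largest.
        print(f"level {level}: {len(sample)} -> {size} bytes")
```

               Higher levels generally trade CPU time for smaller output. Since chunks are
               deduplicated before compression, the default of 1 favours speed.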

EXAMPLES

              $ tar -cf - /etc | bup split -r myserver: -n mybackup-tar
               tar: Removing leading `/' from member names
              Indexing objects: 100% (196/196), done.

              $ bup join -r myserver: mybackup-tar | tar -tf - | wc -l
              1961

SEE ALSO

       bup-join(1), bup-index(1), bup-save(1), bup-on(1), ssh_config(5)

BUP

       Part of the bup(1) suite.

AUTHORS

       Avery Pennarun ⟨apenwarr@gmail.com⟩