Provided by: datalad_0.19.3-2_all

NAME

       datalad create-sibling-ria - creates a sibling to a dataset in a RIA store

SYNOPSIS

       datalad create-sibling-ria [-h] -s NAME [-d DATASET] [--storage-name NAME] [--alias ALIAS]
              [--post-update-hook]  [--shared  {false|true|umask|group|all|world|everybody|0xxx}]
              [--group   GROUP]   [--storage-sibling  MODE]  [--existing  MODE]  [--new-store-ok]
              [--trust-level TRUST-LEVEL] [-r]  [-R  LEVELS]  [--no-storage-sibling]  [--push-url
              ria+<ssh|file>://<host>[/path]] [--version] ria+<ssh|file|http(s)>://<host>[/path]

DESCRIPTION

       Communication with a dataset in a RIA store is implemented via two siblings: a regular Git
       remote (repository sibling) and a git-annex special remote for data transfer (storage
       sibling), with the former having a publication dependency on the latter. By default, the
       name of the storage sibling is derived from the repository sibling's name by appending
       "-storage".

       The store's base path must either not exist, be an empty directory, or be a valid RIA
       store.

       Notes

   *RIA URL format*
       Interactions with new or existing RIA stores require RIA URLs to  identify  the  store  or
       specific datasets inside of it.

       The general structure of a RIA URL pointing to a store takes the form
       ``ria+[scheme]://<storelocation>`` (e.g.,
       ``ria+ssh://[user@]hostname:/absolute/path/to/ria-store``, or
       ``ria+file:///absolute/path/to/ria-store``).

       The general structure of a RIA URL pointing to a dataset in a store (for example, for
       cloning) takes a similar form, but appends either the dataset's UUID or a "~" symbol
       followed by the dataset's alias name: ``ria+[scheme]://<storelocation>#<dataset-UUID>`` or
       ``ria+[scheme]://<storelocation>#~<aliasname>``. In addition, a specific version
       identifier can be appended to the URL with an additional "@" symbol:
       ``ria+[scheme]://<storelocation>#<dataset-UUID>@<dataset-version>``, where
       ``dataset-version`` refers to a branch or tag.
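
       The URL forms above can be taken apart with a few string operations. The following is
       a minimal sketch, not DataLad's own parser: the names ``RiaUrl`` and ``parse_ria_url``
       are hypothetical, and accepting "@<dataset-version>" after an alias is an assumption
       (the text above only shows it after a UUID).

```python
from typing import NamedTuple, Optional


class RiaUrl(NamedTuple):
    """Hypothetical container for the parts of a RIA URL."""
    scheme: str                # transport after "ria+", e.g. "ssh" or "file"
    store: str                 # store location without the "ria+" prefix
    dataset_id: Optional[str]  # dataset UUID given after "#", if any
    alias: Optional[str]       # alias name given after "#~", if any
    version: Optional[str]     # branch or tag given after "@", if any


def parse_ria_url(url: str) -> RiaUrl:
    """Split a RIA URL into store location, dataset reference, and version."""
    if not url.startswith("ria+"):
        raise ValueError(f"not a RIA URL: {url!r}")
    # Split at "#" first, so that an "@" in "user@host" stays in the store part.
    store, _, fragment = url[len("ria+"):].partition("#")
    scheme, sep, _ = store.partition("://")
    if not sep:
        raise ValueError(f"missing '://' in RIA URL: {url!r}")
    dataset_id = alias = version = None
    if fragment:
        target, _, ver = fragment.partition("@")
        version = ver or None
        if target.startswith("~"):
            alias = target[1:]
        else:
            dataset_id = target
    return RiaUrl(scheme, store, dataset_id, alias, version)
```

       For example, ``parse_ria_url("ria+ssh://user@host:/abs/store#~myalias")`` yields the
       store ``ssh://user@host:/abs/store`` and the alias ``myalias``.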

   *RIA store layout*
       A RIA store is a directory tree with a dedicated subdirectory  for  each  dataset  in  the
       store.   The   subdirectory  name  is  constructed  from  the  DataLad  dataset  ID,  e.g.
       ``124/68afe-59ec-11ea-93d7-f0d5bf7b5561``, where the first three characters of the ID  are
       used for an intermediate subdirectory in order to mitigate file system limitations for
       stores containing a large number of datasets.
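
       The mapping from dataset ID to subdirectory can be sketched as follows. This is an
       illustration only; the function name ``dataset_dir`` and the store path used in the
       example are hypothetical.

```python
from pathlib import PurePosixPath


def dataset_dir(store_base: str, dataset_id: str) -> PurePosixPath:
    """Return the dataset's subdirectory inside a RIA store.

    The first three characters of the ID name an intermediate directory,
    the remaining characters name the dataset directory itself.
    """
    if len(dataset_id) <= 3:
        raise ValueError("dataset ID is too short to be split")
    return PurePosixPath(store_base) / dataset_id[:3] / dataset_id[3:]


print(dataset_dir("/data/ria-store", "12468afe-59ec-11ea-93d7-f0d5bf7b5561"))
# prints /data/ria-store/124/68afe-59ec-11ea-93d7-f0d5bf7b5561
```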

       By default, a dataset in a RIA store consists of two components: A Git repository (for all
       dataset  contents  stored  in  Git)  and  a storage sibling (for dataset content stored in
       git-annex).

       It is possible to selectively disable either component using ``storage-sibling 'only'``
       or ``storage-sibling 'off'``, respectively. If neither component is disabled, a dataset's
       subdirectory layout in a RIA store contains a standard bare Git repository and an
       ``annex/`` subdirectory inside of it. The latter holds a git-annex object store and
       comprises the storage sibling. Disabling the standard Git remote
       (``storage-sibling 'only'``) results in not having the bare Git repository; disabling
       the storage sibling (``storage-sibling 'off'``) results in not having the ``annex/``
       subdirectory.

       Optionally, there can be a further subdirectory ``archives`` with (compressed) 7z archives
       of annex objects. The storage remote is able to pull annex objects from these archives if
       it cannot find them in the regular annex object store. This feature can be useful for storing
       large collections of rarely changing data on systems that limit the number of  files  that
       can be stored.

       Each  dataset  directory  also  contains a ``ria-layout-version`` file that identifies the
       data organization (as, for example, described above).

       Lastly, there is a global ``ria-layout-version``  file  at  the  store's  base  path  that
       identifies where dataset subdirectories themselves are located. At present, this file must
       contain a single line stating the version (currently "1").  This  line  MUST  end  with  a
       newline character.
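
       The constraints on the store-level file can be checked with a few lines of code. This
       is a minimal sketch under stated assumptions, not DataLad's own validation: the
       function name ``check_store_layout_version`` is hypothetical, and anything after a
       pipe symbol is treated as an optional suffix and ignored.

```python
def check_store_layout_version(content: str, known_versions=("1",)) -> str:
    """Return the version stated in a store-level ria-layout-version file.

    `content` is the complete text of the file.
    """
    if not content.endswith("\n"):
        raise ValueError("the version line must end with a newline character")
    line = content[:-1]
    if "\n" in line:
        raise ValueError("expected a single line")
    # Assumption: a suffix introduced by "|" is not part of the version itself.
    version = line.split("|", 1)[0]
    if version not in known_versions:
        raise ValueError(f"unsupported layout version: {version!r}")
    return version
```

       With this sketch, the content ``"1\n"`` is accepted, while ``"1"`` without the
       trailing newline is rejected.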

       It  is  possible  to  define  an  alias  for an individual dataset in a store by placing a
       symlink to the dataset location into an ``alias/`` directory in the  root  of  the  store.
       This        enables       dataset       access       via       URLs       of       format:
       ``ria+<protocol>://<storelocation>#~<aliasname>``.
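
       Creating such an alias by hand amounts to placing one symlink. The sketch below
       assumes a store on the local file system; the function name ``add_alias`` is
       hypothetical, and using a relative link target (so that the store can be moved as a
       whole) is an assumption rather than documented behavior.

```python
from pathlib import Path


def add_alias(store_base: Path, dataset_id: str, alias: str) -> Path:
    """Create alias/<alias> in the store root, pointing to the dataset directory."""
    alias_dir = store_base / "alias"
    alias_dir.mkdir(parents=True, exist_ok=True)
    link = alias_dir / alias
    # The dataset lives under <first three ID characters>/<rest of the ID>.
    target = Path("..") / dataset_id[:3] / dataset_id[3:]
    link.symlink_to(target, target_is_directory=True)
    return link
```

       Within DataLad itself, the ``--alias`` option of this command adds the necessary
       symlink when the sibling is created.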

       Compared to standard git-annex  object  stores,  the  ``annex/``  subdirectories  used  as
       storage  siblings  follow  a  different  layout  naming  scheme ('dirhashmixed' instead of
       'dirhashlower').  This is mostly noted as a technical detail, but also  serves  to  remind
       git-annex power users to refrain from running git-annex commands directly in-store, as it
       can cause severe damage due to the layout difference. Interactions should be  handled  via
       the ORA special remote instead.

   *Error logging*
       To enable error logging at the remote end, append a pipe symbol and an "l" to the version
       number in ria-layout-version (like so: ``1|l\n``).

       Error logging will create files in an "error_log" directory whenever the git-annex special
       remote  (storage  sibling)  raises  an  exception, storing the Python traceback of it. The
       logfiles are named according to the scheme ``<dataset id>.<annex uuid of the remote>.log``
       showing "who" ran into this issue with which dataset. Because logging can potentially leak
       personal data (like local file paths for example),  it  can  be  disabled  client-side  by
       setting                    the                    configuration                   variable
       ``annex.ora-remote.<storage-sibling-name>.ignore-remote-config``.

OPTIONS

       ria+<ssh|file|http(s)>://<host>[/path]
              URL identifying the target RIA store and access protocol. If ``--push-url`` is
              given in addition, this is used for read access only. Otherwise it will be used for
              write access too and to create the repository sibling in the RIA store. Note that
              HTTP(S) is currently valid for consumption only, so ``--push-url`` must be provided
              as well. Constraints: value must be a string or value must be NONE

       -h, --help, --help-np
              show this help message. --help-np forcefully  disables  the  use  of  a  pager  for
              displaying the help message

       -s NAME, --name NAME
              Name  of  the  sibling. With RECURSIVE, the same name will be used to label all the
              subdatasets' siblings. Constraints: value must be a string or value must be NONE

       -d DATASET, --dataset DATASET
              specify the dataset to process. If no dataset is  given,  an  attempt  is  made  to
              identify  the  dataset  based  on the current working directory. Constraints: Value
              must be a Dataset or a valid identifier of a Dataset (e.g. a path) or value must be
              NONE

       --storage-name NAME
              Name  of  the  storage sibling (git-annex special remote). Must not be identical to
              the sibling name. If not specified, defaults to the sibling  name  plus  '-storage'
              suffix.  If  only  a  storage  sibling is created, this setting is ignored, and the
              primary sibling name is used. Constraints: value must be a string or value must  be
              NONE

       --alias ALIAS
              Alias  for  the  dataset  in  the RIA store. Add the necessary symlink so that this
              dataset can be cloned from the RIA store using the given ALIAS instead of  its  ID.
              With ``--recursive``, only the top dataset will be aliased. Constraints: value
              must be a string or value must be NONE

       --post-update-hook
              Enable Git's default post-update-hook for the created sibling. This is useful  when
              the  sibling  is  made  accessible  via  a "dumb server" that requires running 'git
              update-server-info' to let Git interact properly with it.

       --shared {false|true|umask|group|all|world|everybody|0xxx}
              If given, configures the permissions in the RIA store for multi-user access.
              Possible  values  for this option are identical to those of `git init --shared` and
              are described in its documentation. Constraints: value must be a  string  or  value
              must be convertible to type bool or value must be NONE

       --group GROUP
              Filesystem group for the repository. Specifying the group is crucial when
              --shared=group is used. Constraints: value must be a string or value must be NONE

       --storage-sibling MODE
              By default, an ORA storage sibling and a Git repository sibling are created (on).
              Alternatively, creation of the storage sibling can be disabled (off), or only a
              storage sibling can be created, without a Git sibling (only). In the latter mode,
              no Git installation is required on the target host. Constraints: value must be one
              of ('only',) or value must be convertible to type bool or value must be NONE
              [Default: True]

       --existing MODE
              Action to perform if a (storage) sibling is already configured under the given
              name and/or a target already exists. In this case, a dataset can be skipped
              ('skip'), an existing target repository can be forcefully re-initialized and the
              sibling (re-)configured ('reconfigure'), or the command can be instructed to fail
              ('error'). Constraints: value must be one of ('skip', 'error', 'reconfigure')
              [Default: 'error']

       --new-store-ok
              When set, a new store will be created, if necessary. Otherwise, a sibling will only
              be created if the URL points to an existing RIA store.

       --trust-level TRUST-LEVEL
              specify a trust level for the storage sibling. If not specified, the default
              git-annex trust level is used. 'trust' should be used with care (see the
              git-annex-trust man page). Constraints: value must be one of ('trust', 'semitrust',
              'untrust')

       -r, --recursive
              if set, recurse into potential subdatasets.

       -R LEVELS, --recursion-limit LEVELS
              limit recursion into subdatasets to the given number of levels. Constraints:  value
              must be convertible to type 'int' or value must be NONE

       --no-storage-sibling
              This option is deprecated. Use '--storage-sibling off' instead.

       --push-url ria+<ssh|file>://<host>[/path]
              URL  identifying  the  target RIA store and access protocol for write access to the
              storage sibling. If given, this will also be used for creation of the repository
              sibling in the RIA store. Constraints: value must be a string or value must be NONE

       --version
              show the module and its version which provides the command
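
       To illustrate how the options combine, the following sketch assembles a command line
       from the option names in the SYNOPSIS. It only builds and prints the command and does
       not run it. The helper name ``create_sibling_ria_cmd``, the sibling name
       ``ria-backup``, and the store path are hypothetical.

```python
import shlex


def create_sibling_ria_cmd(name, url, *, dataset=None, alias=None,
                           new_store_ok=False, push_url=None):
    """Build an argument list for `datalad create-sibling-ria`."""
    argv = ["datalad", "create-sibling-ria", "-s", name]
    if dataset:
        argv += ["-d", dataset]
    if alias:
        argv += ["--alias", alias]
    if new_store_ok:
        argv.append("--new-store-ok")
    if push_url:
        argv += ["--push-url", push_url]
    argv.append(url)  # the RIA URL is the positional argument
    return argv


cmd = create_sibling_ria_cmd("ria-backup", "ria+file:///tmp/my-store",
                             new_store_ok=True)
print(shlex.join(cmd))
# prints: datalad create-sibling-ria -s ria-backup --new-store-ok ria+file:///tmp/my-store
```

       Without ``--new-store-ok``, the same call only creates a sibling if the URL points to
       an existing RIA store.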

AUTHORS

        datalad is developed by The DataLad Team and Contributors <team@datalad.org>.