Ubuntu Manpage: datalad foreach-dataset - run a command or Python code on the dataset and/or each of its

NAME

       datalad  foreach-dataset  - run a command or Python code on the dataset and/or each of its
       sub-datasets.

SYNOPSIS

       datalad  foreach-dataset  [-h]   [--cmd-type   {auto|external|exec|eval}]   [-d   DATASET]
              [--state {present|absent|any}] [-r] [-R LEVELS] [--contains PATH] [--bottomup] [-s]
              [--output-streams {capture|pass-through|relpath}]  [--chpwd  {ds|pwd}]  [--safe-to-
              consume {auto|all-subds-done|superds-done|always}] [-J NJOBS] [--version] ...

DESCRIPTION

       This  command  provides  a  convenience for the cases were no dedicated DataLad command is
       provided to operate across the hierarchy of datasets. It is very similar to `git submodule
       foreach` command with the following major differences

       -  by  default  (unless  --subdatasets-only)  it  would  include operation on the original
       dataset as well, - subdatasets could be  traversed  in  bottom-up  order,  -  can  execute
       commands in parallel (see JOBS option), but would account for the order, e.g. in bottom-up
       order command is executed in super-dataset only after it is executed in all subdatasets.

       Additional notes:

       - for execution of "external" commands we use the environment used to execute external git
       and git-annex commands.

   Command format
       --cmd-type  external:  A  few  placeholders are supported in the command via Python format
       specification:

       - "{pwd}" will be replaced with the full path of the current working directory.  -  "{ds}"
       and  "{refds}"  will  provide  instances  of  the  dataset  currently  operated on and the
       reference "context" dataset which was provided via  ``dataset``  argument.   -  "{tmpdir}"
       will be replaced with the full path of a temporary directory.

   Examples
       Aggressively  git clean  all datasets, running 5 parallel jobs::

        % datalad foreach-dataset -r -J 5 git clean -dfx

OPTIONS

       COMMAND
              command for execution. A leading '--' can be used to disambiguate this command from
              the preceding options to DataLad. For --cmd-type exec or eval only a single command
              argument (Python code) is supported.

       -h, --help, --help-np
              show  this  help  message.  --help-np  forcefully  disables  the use of a pager for
              displaying the help message

       --cmd-type {auto|external|exec|eval}
              type of the command. EXTERNAL: to be run in a child process using dataset's runner;
              'exec':  Python  source  code  to execute using 'exec(), no value returned; 'eval':
              Python source code to evaluate using 'eval()', return value is placed into 'result'
              field.  'auto': If used via Python API, and `cmd` is a Python function, it will use
              'eval', and otherwise would assume 'external'. Constraints: value must  be  one  of
              ('auto', 'external', 'exec', 'eval') [Default: 'auto']

       -d DATASET, --dataset DATASET
              specify  the  dataset  to operate on. If no dataset is given, an attempt is made to
              identify the dataset based on the  input  and/or  the  current  working  directory.
              Constraints:  Value  must  be  a Dataset or a valid identifier of a Dataset (e.g. a
              path) or value must be NONE

       --state {present|absent|any}
              indicate which (sub)datasets to consider: either only locally present,  absent,  or
              any  of  those  two  kinds. Constraints: value must be one of ('present', 'absent',
              'any') [Default: 'present']

       -r, --recursive
              if set, recurse into potential subdatasets.

       -R LEVELS, --recursion-limit LEVELS
              limit recursion into subdatasets to the given number of levels. Constraints:  value
              must be convertible to type 'int' or value must be NONE

       --contains PATH
              limit  to the subdatasets containing the given path. If a root path of a subdataset
              is given, the last considered dataset will be the subdataset  itself.  This  option
              can  be  given multiple times, in which case datasets that contain any of the given
              paths will be considered. Constraints: value must be a string or value must be NONE

       --bottomup
              whether to report subdatasets in bottom-up order along each branch in  the  dataset
              tree, and not top-down.

       -s, --subdatasets-only
              whether  to  exclude  top  level  dataset. It is implied if a non-empty CONTAINS is
              used.

       --output-streams {capture|pass-through|relpath}, --o-s {capture|pass-through|relpath}
              ways to handle outputs. 'capture' and return  outputs  from  'cmd'  in  the  record
              ('stdout',  'stderr');  'pass-through' to the screen (and thus absent from returned
              record); prefix with 'relpath' captured output (similar  to  like  grep  does)  and
              write  to  stdout and stderr. In 'relpath', relative path is relative to the top of
              the dataset if DATASET is specified, and if not - relative  to  current  directory.
              Constraints:  value must be one of ('capture', 'pass-through', 'relpath') [Default:
              'pass-through']

       --chpwd {ds|pwd}

       --safe-to-consume {auto|all-subds-done|superds-done|always}
              Important only in the case of parallel (jobs greater than 1) execution. 'all-subds-
              done'  instructs  to  not consider superdataset until command finished execution in
              all subdatasets (it is the value in case  of  'auto'  if  traversal  is  bottomup).
              'superds-done'  instructs  to not process subdatasets until command finished in the
              super-dataset (it is the value in case of 'auto' in traversal  is  not  bottom  up,
              which is the default). With 'always' there is no constraint on either to execute in
              sub or super dataset. Constraints: value must be one of ('auto',  'all-subds-done',
              'superds-done', 'always') [Default: 'auto']

       -J NJOBS, --jobs NJOBS
              how  many  parallel  jobs (where possible) to use. "auto" corresponds to the number
              defined by 'datalad.runtime.max-annex-jobs' configuration item  NOTE:  This  option
              can  only  parallelize  input  retrieval (get) and output recording (save). DataLad
              does NOT parallelize your scripts for you. Constraints: value must  be  convertible
              to type 'int' or value must be NONE or value must be one of ('auto',)

       --version
              show the module and its version which provides the command

AUTHORS

        datalad is developed by The DataLad Team and Contributors <team@datalad.org>.