Provided by: datalad_0.11.6-1ubuntu2_all


       datalad addurls - create and update a dataset from a list of URLs.


       datalad  addurls  [-h]  [-d DATASET] [-t TYPE] [-x REGEXP] [-m FORMAT] [--message MESSAGE]
              [-n] [--fast] [--ifexists ACTION] [--missing-value  VALUE]  [--nosave]  [--version-


   Format specification
       Several  arguments take format strings.  These are similar to normal Python format strings
       where the names from URL-FILE (column  names  for  a  CSV  or  properties  for  JSON)  are
       available as placeholders.  If URL-FILE is a CSV file, a positional index can also be used
       (i.e., "{0}" for the first column).  Note that a placeholder cannot contain a ':' or '!'.
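
       The placeholder lookup described above can be sketched with plain Python string
       formatting. This is only an illustration, not DataLad's implementation; the CSV
       content, the example.com URLs, and the helper name format_rows are made up.

```python
import csv
import io

# Hypothetical URL-FILE content; column names and URLs are illustrative only.
CSV_TEXT = """who,ext,link
neurodebian,png,https://example.com/a/1.png
datalad,png,https://example.com/b/2.png
"""


def format_rows(fmt, csv_text):
    """Apply a format string to every CSV row, by column name or by index."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # the first row holds the column names
    results = []
    for row in reader:
        named = dict(zip(header, row))
        # Positional arguments serve "{0}"-style placeholders,
        # keyword arguments serve "{who}"-style placeholders.
        results.append(fmt.format(*row, **named))
    return results


print(format_rows("{who}.{ext}", CSV_TEXT))  # names built from two columns
print(format_rows("{0}", CSV_TEXT))          # first column by position
```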

       In addition, the FILENAME-FORMAT argument has a few special placeholders.

       - _repindex

         The constructed file names must be unique across all rows.  To
         avoid collisions, the special placeholder "_repindex" can be added to
         the formatter.  Its value will start at 0 and increment every time a
         file name repeats.

       - _url_hostname, _urlN, _url_basename*

         Various parts of the formatted URL are available.  Take
         "" as an example.

         "" is stored as "_url_hostname".  Components of the URL's
         path can be referenced as "_urlN".  "_url0" and "_url1" would map to
         "asciicast" and "", respectively.  The final
         part of the path is also available as "_url_basename".

         This name is broken down further.  "_url_basename_root" and
         "_url_basename_ext" provide access to the root name and extension.
         These values are similar to the result of os.path.splitext, but, in the
         case of multiple periods, the extension is identified using the same
         length heuristic that git-annex uses.  As a result, the extension of
         "file.tar.gz" would be ".tar.gz", not ".gz".  In addition, the fields
         "_url_basename_root_py" and "_url_basename_ext_py" provide access to
         the result of os.path.splitext.

       - _url_filename*

         These are similar to the _url_basename* fields, but they are obtained with
         a server request.  This is useful if the file name is set in the
         Content-Disposition header.
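
       As a rough sketch of how the special placeholders above could be derived, the
       following uses only the Python standard library. It is a simplification, not
       DataLad's code: the git-annex-style extension heuristic and the server request
       behind _url_filename* are left out, and the helper names and the example.com
       URL are hypothetical.

```python
import os.path
from urllib.parse import urlparse


def url_placeholders(url):
    """Return a dict of URL-derived placeholder values (simplified)."""
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    fields = {"_url_hostname": parsed.netloc}
    # Each path component becomes _url0, _url1, ...
    for index, part in enumerate(parts):
        fields["_url{}".format(index)] = part
    basename = parts[-1] if parts else ""
    fields["_url_basename"] = basename
    # Only the plain os.path.splitext variants are modelled here.
    root_py, ext_py = os.path.splitext(basename)
    fields["_url_basename_root_py"] = root_py
    fields["_url_basename_ext_py"] = ext_py
    return fields


def add_repindex(names):
    """Pair each file name with the number of times it appeared before."""
    seen = {}
    indexed = []
    for name in names:
        count = seen.get(name, 0)  # 0 for the first occurrence
        indexed.append((name, count))
        seen[name] = count + 1
    return indexed


fields = url_placeholders("https://example.com/data/sub/file.tar.gz")
print(fields["_url_hostname"], fields["_url0"], fields["_url_basename_ext_py"])
print(add_repindex(["a.png", "b.png", "a.png"]))
```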

       Consider a file "avatars.csv" that contains::


       To download each link into a file name composed of the 'who' and 'ext'  fields,  we  could run::

       $ datalad addurls -d avatar_ds --fast avatars.csv '{link}' '{who}.{ext}'

       The `-d avatar_ds` is used to create a new dataset in "$PWD/avatar_ds".

       If  we  were  already  in  a dataset and wanted to create a new subdataset in an "avatars"
       subdirectory, we could use "//" in the FILENAME-FORMAT argument::

       $ datalad addurls --fast avatars.csv '{link}' 'avatars//{who}.{ext}'
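
       One way to picture the "//" separator is as a marker that splits a formatted
       file name into subdataset directories and a regular path. The sketch below is an
       assumption about that reading, not DataLad's implementation; split_subdataset
       and the sample file name are hypothetical.

```python
def split_subdataset(filename):
    """Split 'sub//path' into (subdataset directories, plain file path)."""
    pieces = filename.split("//")
    subdatasets = []
    # Every prefix that ends at a "//" separator names a subdataset directory.
    for end in range(1, len(pieces)):
        subdatasets.append("/".join(pieces[:end]))
    # The file itself lives at the path with "//" collapsed to "/".
    return subdatasets, "/".join(pieces)


print(split_subdataset("avatars//neurodebian.png"))
```

       A name without "//" yields an empty list of subdatasets, so the same helper
       covers both of the commands shown above.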


        For users familiar with 'git annex addurl': A large part of this
        plugin's functionality can be viewed as transforming data from
        URL-FILE into a "url filename" format that is fed to 'git annex addurl
        --batch --with-files'.
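
        The transformation mentioned above can be sketched as turning each row into one
        "url filename" line. This is an illustration under assumptions, not the actual
        plugin code; batch_lines and the sample row are hypothetical.

```python
def batch_lines(rows, url_format, filename_format):
    """Build one 'url filename' line per row from two format strings."""
    lines = []
    for row in rows:
        url = url_format.format(**row)
        filename = filename_format.format(**row)
        lines.append("{} {}".format(url, filename))
    return lines


# Hypothetical row, shaped like one record of a URL-FILE.
rows = [{"who": "neurodebian", "ext": "png", "link": "https://example.com/a/1.png"}]
for line in batch_lines(rows, "{link}", "{who}.{ext}"):
    print(line)
```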


       URL-FILE
              A file that contains URLs or information  that  can  be  used  to  construct  URLs.
              Depending on the value of --input-type, this should be a CSV file (with a header as
              the first row) or a JSON file (structured as a list of objects with string values).

       URL-FORMAT
              A  format  string  that  specifies  the  URL  for  each  entry.  See  the   'Format
              Specification' section above.

       FILENAME-FORMAT
              Like  URL-FORMAT,  but  this  format  string  specifies the file to which the URL's
              content will be downloaded. The file name may contain  directories.  The  separator
              "//"  can  be  used to indicate that the left-side directory should be created as a
              new subdataset. See the 'Format Specification' section above.

       -h, --help, --help-np
              show this help message. --help-np forcefully  disables  the  use  of  a  pager  for
              displaying the help message

       -d DATASET, --dataset DATASET
              Add the URLs to this dataset (or possibly subdatasets of this dataset). An empty or
              non-existent directory is passed to create a new dataset. New  subdatasets  can  be
              specified  with  FILENAME-FORMAT.  Constraints:  Value must be a Dataset or a valid
              identifier of a Dataset (e.g. a path)

       -t TYPE, --input-type TYPE
              Whether URL-FILE should be considered a CSV file or a JSON file. The default value,
              "ext",  means  to  consider  URL-FILE  as  a  JSON  file  if  it ends with ".json".
              Otherwise, treat it as a CSV file. Constraints: value must be one of ('ext', 'csv',
              'json') [Default: 'ext']
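
              The "ext" rule above amounts to a check on the file name suffix. A minimal
              sketch, assuming hypothetical helper names and standard-library parsers
              rather than DataLad's own reader:

```python
import csv
import io
import json


def resolve_input_type(url_file, input_type="ext"):
    """Map 'ext' to 'json' or 'csv' based on the file name; pass others through."""
    if input_type == "ext":
        return "json" if url_file.endswith(".json") else "csv"
    return input_type


def parse_rows(text, input_type):
    """Parse URL-FILE text into a list of dicts, one per entry."""
    if input_type == "json":
        # Expected shape: a list of objects with string values.
        return json.loads(text)
    # CSV: the first row is the header that supplies the field names.
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


print(resolve_input_type("avatars.json"))
print(parse_rows("who,ext\nneurodebian,png\n", resolve_input_type("avatars.csv")))
```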

       -x REGEXP, --exclude-autometa REGEXP
              By  default,  metadata  field=value  pairs are constructed with each column in URL-
              FILE, excluding any single column that is specified via URL-FORMAT.  This  argument
              can be used to exclude columns that match a regular expression. If set to '*' or an
              empty string, automatic metadata extraction is disabled completely.  This  argument
              does not affect metadata set explicitly with --meta. [Default: None]

       -m FORMAT, --meta FORMAT
              A   format   string   that   specifies   metadata.   It  should  be  structured  as
              "<field>=<value>". As an example, "location={3}" would mean that the value for  the
              "location" metadata field should be set to the value of the fourth column. This option
              can be given multiple times. [Default: None]
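
              The interplay of automatic and explicit metadata can be sketched as
              follows. This reflects one reading of the descriptions above, not
              DataLad's code: the helper names are hypothetical, a column is skipped
              only when URL-FORMAT consists of that single placeholder, and
              re.search is an assumed matching mode.

```python
import re


def auto_metadata(row, url_format, exclude=None):
    """Build field=value pairs from a row, honoring an exclusion regexp."""
    if exclude in ("*", ""):
        return []  # automatic metadata disabled completely
    pairs = []
    for column, value in row.items():
        # Skip a column that is, by itself, the whole URL-FORMAT.
        if url_format == "{" + column + "}":
            continue
        # Assumption: the regexp is searched for anywhere in the column name.
        if exclude and re.search(exclude, column):
            continue
        pairs.append("{}={}".format(column, value))
    return pairs


def explicit_metadata(meta_formats, row):
    """Expand '<field>=<value>' format strings, by column name or index."""
    values = list(row.values())
    return [fmt.format(*values, **row) for fmt in meta_formats]


row = {"who": "neurodebian", "ext": "png", "link": "https://example.com/a.png"}
print(auto_metadata(row, "{link}"))
print(explicit_metadata(["kind={1}"], row))
```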

       --message MESSAGE
              Use this message when committing the URL  additions.  Constraints:  value  must  be
              NONE, or value must be a string [Default: None]

       -n, --dry-run
              Report  which  URLs  would  be  downloaded  to which files and then exit. [Default:

       --fast If True, add the URLs, but don't download their content.  Underneath,  this  passes
              the --fast flag to `git annex addurl`. [Default: False]

       --ifexists ACTION
              What  to  do  if a constructed file name already exists. The default behavior is to
              proceed with the `git annex addurl`, which will fail if the file size has  changed.
              If  set  to  'overwrite',  remove the old file before adding the new one. If set to
              'skip', do not add the new file. Constraints: value must be NONE, or value must  be
              one of ('overwrite', 'skip') [Default: None]

       --missing-value VALUE
              When  an  empty  string  is encountered, use this value instead. Constraints: value
              must be NONE, or value must be a string [Default: None]
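
              The substitution can be pictured as a pass over each row before any
              formatting happens. A minimal sketch with a hypothetical helper name;
              leaving rows untouched when no value is given is an assumption made
              here, not documented behavior:

```python
def fill_missing(rows, missing_value=None):
    """Replace empty strings in every row with missing_value, if one is given."""
    if missing_value is None:
        return rows  # assumption: without a replacement, rows pass through as-is
    return [
        {key: (missing_value if value == "" else value) for key, value in row.items()}
        for row in rows
    ]


print(fill_missing([{"who": "datalad", "ext": ""}], missing_value="unknown"))
```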

       --nosave
              By default, all modifications to a dataset are immediately saved. Giving this option
              will disable this behavior. [Default: True]

              Try  to  add a version ID to the URL. This currently only has an effect on URLs for
              AWS S3 buckets. [Default: False]


        datalad is developed by The DataLad Team and Contributors.

                                            2019-08-19                         datalad addurls(1)