Provided by: charliecloud-runtime_0.26-1_amd64

NAME

       ch-run - Run a command in a Charliecloud container

SYNOPSIS

          $ ch-run [OPTION...] IMAGE -- CMD [ARG...]

DESCRIPTION

       Run  command CMD in a fully unprivileged Charliecloud container using the image located at
       IMAGE, which can be either a directory or, if the proper support is  enabled,  a  SquashFS
       archive.

          -b, --bind=SRC[:DST]
                 Bind-mount  SRC at guest DST. The default destination if not specified is to use
                 the same path as the host; i.e., the default is --bind=SRC:SRC. Can be repeated.

                 If --write is given and DST does not exist, it  will  be  created  as  an  empty
                 directory.  However,  DST  must  be entirely within the image itself; DST cannot
                 enter a previous bind  mount.   For  example,  --bind  /foo:/tmp/foo  will  fail
                 because  /tmp  is  shared with the host via bind-mount (unless $TMPDIR is set to
                 something else or --private-tmp is given).

                 Most images do have  ten  directories  /mnt/[0-9]  already  available  as  mount
                 points.

                 Symlinks  in  DST are followed, and absolute links can have surprising behavior.
                 Bind-mounting happens  after  namespace  setup  but  before  pivoting  into  the
                 container  image,  so absolute links use the host root. For example, suppose the
                 image has a symlink /foo -> /mnt.  Then, --bind=/bar:/foo will bind-mount on the
                 host’s  /mnt,  which  is inaccessible on the host because namespaces are already
                 set up and also inaccessible in the container because of  the  subsequent  pivot
                 into  the  image.  Currently, this problem is only detected when DST needs to be
                 created: ch-run will refuse to follow absolute symlinks in this case,  to  avoid
                 directory creation surprises.

          -c, --cd=DIR
                 Initial working directory in container.

          --ch-ssh
                 Bind ch-ssh(1) into container at /usr/bin/ch-ssh.

          --env-no-expand
                 Don’t expand variables when using --set-env.

          -g, --gid=GID
                 Run as group GID within container.

          -j, --join
                 Use the same container (namespaces) as peer ch-run invocations.

          --join-pid=PID
                 Join the namespaces of an existing process.

          --join-ct=N
                 Number of ch-run peers (implies --join; default: see below).

          --join-tag=TAG
                 Label for ch-run peer group (implies --join; default: see below).

          -m, --mount=DIR
                 Use  DIR  for  the  SquashFS  mount  point,  which  must  already  exist. If not
                 specified, the default  is  /var/tmp/$USER.ch/mnt,  which  will  be  created  if
                 needed.

          --no-home
                 By  default,  your  host  home  directory (i.e., $HOME) is bind-mounted at guest
                 /home/$USER. This is accomplished by mounting a new tmpfs at /home, which  hides
                 any image content under that path. If this is specified, neither of these things
                 happens and the image’s /home is exposed unaltered.

          --no-passwd
                 By default, temporary /etc/passwd and /etc/group files are created according  to
                 the  UID  and  GID  maps  for the container and bind-mounted into it. If this is
                 specified, no such temporary  files  are  created  and  the  image’s  files  are
                 exposed.

          -t, --private-tmp
                 By  default,  the  host’s  /tmp (or $TMPDIR if set) is bind-mounted at container
                 /tmp. If this is specified, a new tmpfs  is  mounted  on  the  container’s  /tmp
                 instead.

          --set-env, --set-env=FILE, --set-env=NAME=VALUE
                 Set environment variable(s). With:

                     • no  argument: as listed in file /ch/environment within the image. It is an
                       error if the file does not exist or  cannot  be  read.   (Note  that  with
                       SquashFS  images,  it  is not currently possible to use other files within
                       the image.)

                     • FILE (i.e., no equals in argument): as specified  in  file  at  host  path
                       FILE. Again, it is an error if the file cannot be read.

                     • NAME=VALUE (i.e., equals sign in argument): set variable NAME to VALUE.

                 See below for details on how environment variables work in ch-run.

          -u, --uid=UID
                 Run as user UID within container.

          --unset-env=GLOB
                 Unset environment variables whose names match GLOB.

          -v, --verbose
                 Be more verbose (can be repeated).

          -w, --write
                 Mount image read-write (by default, the image is mounted read-only).

          -?, --help
                 Print help and exit.

          --usage
                 Print a short usage message and exit.

          -V, --version
                 Print version and exit.

       Note:  Because  ch-run  is  fully unprivileged, it is not possible to change UIDs and GIDs
       within the container (the relevant system calls fail). In particular, setuid, setgid,  and
       setcap  executables  do not work. As a precaution, ch-run calls prctl(PR_SET_NO_NEW_PRIVS,
       1) to disable these executables within the container. This does not  reduce  functionality
       but  is  a  “belt  and  suspenders” precaution to reduce the attack surface should bugs in
       these system calls or elsewhere arise.
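       One way to confirm the flag from inside a process is the NoNewPrivs: line of
       /proc/self/status, which Linux reports since kernel 4.10. The bash function below is a
       minimal sketch that prints that field (1 if set, 0 if not); the name no_new_privs is
       hypothetical and not part of Charliecloud.

```shell
#!/usr/bin/env bash
# Print this process's no_new_privs flag from /proc/self/status:
# 1 = set, 0 = not set. Returns 1 if the field is missing
# (no /proc, or a kernel older than 4.10).
no_new_privs () {
    local key value rest
    [[ -r /proc/self/status ]] || return 1
    while read -r key value rest; do
        if [[ $key == NoNewPrivs: ]]; then
            printf '%s\n' "$value"
            return 0
        fi
    done < /proc/self/status
    return 1
}
```

       The same field can be read inside a container, for example with grep NoNewPrivs
       /proc/self/status as the user command if the image provides grep; given the prctl(2)
       call described above, the value there should be 1.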

IMAGE FORMAT

       ch-run supports two different image formats.

       The first is a simple directory that  contains  a  Linux  filesystem  tree.  This  can  be
       accomplished by:

       • ch-convert directly from ch-image or another builder to a directory.

       • Charliecloud’s  tarball  workflow:  build or pull the image, ch-convert it to a tarball,
         transfer the tarball to the target system, then ch-convert the tarball to a directory.

       • Manually mounting a SquashFS image, e.g. with squashfuse(1), and then unmounting it after
         the run with fusermount -u.

       • Any other workflow that produces an appropriate directory tree.

       The  second  is  a  SquashFS image archive mounted internally by ch-run, available if it’s
       linked with the optional libsquashfuse_ll.  ch-run mounts the image  filesystem,  services
       all  FUSE requests, and unmounts it, all within ch-run. See --mount above to set the mount
       point location.

       Prior versions of Charliecloud provided wrappers  for  the  squashfuse  and  squashfuse_ll
       SquashFS  mount  commands  and  fusermount -u unmount command. We removed these because we
       concluded they had minimal value-add over the standard, unwrapped commands.

       WARNING:
          Currently, Charliecloud unmounts  the  SquashFS  filesystem  when  user  command  CMD’s
          process  exits.  It does not monitor any of its child processes. Therefore, if the user
          command spawns child processes and then exits before them (e.g., some  daemons),  those
          children  will  have  the  image  unmounted  from  underneath  them.  In this case, the
          workaround is  to  mount/unmount  using  external  tools.  We  expect  to  remove  this
          limitation in a future version.

HOST FILES AND DIRECTORIES AVAILABLE IN CONTAINER VIA BIND MOUNTS

       In addition to any directories specified by the user with --bind, ch-run has standard host
       files and directories that are bind-mounted in as well.

       The following host files and directories are bind-mounted at  the  same  location  in  the
       container.  These give access to the host’s devices and various kernel facilities. (Recall
       that Charliecloud provides minimal isolation and containerized processes are mostly normal
       unprivileged  processes.)  They cannot be disabled and are required; i.e., they must exist
       both on host and within the image.

          • /dev

          • /proc

          • /sys

       The following are optional; each is bind-mounted only if the path exists both on the host
       and within the image, with no error or warning if not.

          • /etc/hosts  and  /etc/resolv.conf.  Because  Charliecloud  containers  share the host
            network namespace, they need the same hostname resolution configuration.

          • /etc/machine-id. Provides a unique ID for the  OS  installation;  matching  the  host
            works  for  most  situations.  Needed  to  support  D-Bus,  some  software  licensing
            situations, and likely other use cases. See also issue #1050.

          • /var/lib/hugetlbfs at guest  /var/opt/cray/hugetlbfs,  and  /var/opt/cray/alps/spool.
            These support Cray MPI.

          • $PREFIX/bin/ch-ssh   at   guest   /usr/bin/ch-ssh.  SSH  wrapper  that  automatically
            containerizes after connecting.

       The following bind mounts are done by default but can be disabled; see the options above.

          • $HOME at /home/$USER (and image /home is hidden).  Makes user  data  and  init  files
            available.

          • /tmp  (or $TMPDIR if set) at guest /tmp. Provides a temporary directory that persists
            between container runs and is shared with non-containerized application components.

          • temporary files at /etc/passwd and /etc/group. Usernames and group names need  to  be
            customized for each container run.

MULTIPLE PROCESSES IN THE SAME CONTAINER WITH --JOIN

       By  default,  different  ch-run invocations use different user and mount namespaces (i.e.,
       different containers). While  this  has  no  impact  on  sharing  most  resources  between
       invocations, there are a few important exceptions.  These include:

       1. ptrace(2),  used by debuggers and related tools. One can attach a debugger to processes
          in descendant namespaces, but not sibling namespaces.  The practical effect of this  is
          that (without --join), you can’t run a command with ch-run and then attach to it with a
          debugger also run with ch-run.

       2. Cross-memory attach (CMA) is used by cooperating processes  to  communicate  by  simply
          reading  and  writing  one another’s memory. This is also not permitted between sibling
          namespaces. This affects various MPI implementations that  use  CMA  to  pass  messages
          between ranks on the same node, because it’s faster than traditional shared memory.

       --join  is  designed to address this by placing related ch-run commands (the “peer group”)
       in the same container. This is done by one of  the  peers  creating  the  namespaces  with
       unshare(2) and the others joining with setns(2).
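       Whether two processes ended up in the same container can be checked from the host by
       comparing their namespace links under /proc/PID/ns/ (see namespaces(7)). The bash function
       below is a minimal sketch of that check for the user and mount namespaces; same_container is
       a hypothetical name, and the two PIDs stand for, e.g., the user commands of two peers.

```shell
#!/usr/bin/env bash
# Succeed (status 0) if processes $1 and $2 share both their user and
# mount namespaces; fail with 1 if they differ, 2 if a link is unreadable.
same_container () {
    local ns a b
    for ns in user mnt; do
        a=$(readlink "/proc/$1/ns/$ns") || return 2
        b=$(readlink "/proc/$2/ns/$ns") || return 2
        # Links look like "user:[4026531837]"; equal links = same namespace.
        [[ $a == "$b" ]] || return 1
    done
    return 0
}
```

       Without --join, two separate ch-run invocations would be expected to fail this check,
       since each creates its own user and mount namespaces.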

       To  do  so,  we  need  to  know  the  number  of peers and a name for the group. These are
       specified by additional arguments that can (hopefully) be left at default values  in  most
       cases:

       • --join-ct  sets  the  number  of  peers.  The  default  is the value of the first of the
         following   environment   variables   that   is   defined:   OMPI_COMM_WORLD_LOCAL_SIZE,
         SLURM_STEP_TASKS_PER_NODE, SLURM_CPUS_ON_NODE.

       • --join-tag  sets  the tag that names the peer group. The default is environment variable
         SLURM_STEP_ID, if defined; otherwise, the PID of ch-run’s parent. Tags can be re-used
         for peer groups that start at different times, i.e., once all peer ch-run processes have
         replaced themselves with the user command, the tag can be re-used.
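       The default selection just described can be sketched in bash as follows. The function
       names are hypothetical, and the sketch only decides which value would be used; it does not
       validate or parse that value the way ch-run might.

```shell
#!/usr/bin/env bash
# Default peer count: value of the first of these environment variables
# that is defined (printenv succeeds for defined variables, even empty ones).
join_ct_default () {
    local var
    for var in OMPI_COMM_WORLD_LOCAL_SIZE SLURM_STEP_TASKS_PER_NODE SLURM_CPUS_ON_NODE; do
        if printenv "$var"; then
            return 0
        fi
    done
    return 1    # none of the variables is defined
}

# Default tag: $SLURM_STEP_ID if defined, else the PID of ch-run's parent.
# Here that is $$, the PID of this shell, assuming it starts ch-run as a child.
join_tag_default () {
    printenv SLURM_STEP_ID || printf '%s\n' "$$"
}
```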

       Caveats:

       • One cannot currently add peers after the namespaces are set up, for example to start a
         debugger later. (This is only required for code with bugs and is thus an unusual use
         case.)

       • ch-run instances race. The winner of this race sets up the  namespaces,  and  the  other
         peers  use  the winner to find the namespaces to join. Therefore, if the user command of
         the winner exits, any remaining peers will not be able to join the namespaces,  even  if
         they  are still active. There is currently no general way to specify which ch-run should
         be the winner.

       • If --join-ct is too high, the winning ch-run’s user command exits before all peers join,
         or  ch-run  itself  crashes, IPC resources such as semaphores and shared memory segments
         will be leaked. These appear as files in /dev/shm/ and can be removed with rm(1).

       • Many of the arguments given to the race losers, such as the image path and --bind,  will
         be ignored in favor of what was given to the winner.
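       To find candidates for leaked IPC resources, one can list the files in /dev/shm owned by
       the current user, as in the bash sketch below (list_my_shm is a hypothetical helper). It
       only prints; other programs also keep files in /dev/shm, so review the list before removing
       anything with rm(1).

```shell
#!/usr/bin/env bash
# Print regular files directly in DIR (default /dev/shm) owned by the
# current user. Nothing is deleted.
list_my_shm () {
    local dir=${1:-/dev/shm}
    find "$dir" -maxdepth 1 -type f -user "$(id -u)" -print
}
```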

ENVIRONMENT VARIABLES

       ch-run leaves environment variables unchanged, i.e. the host environment is passed through
       unaltered, except:

       • limited tweaks to avoid significant guest breakage;

       • user-set variables via --set-env;

       • user-unset variables via --unset-env; and

       • set CH_RUNNING.

       This section describes these features.

       The default tweaks happen first, then --set-env and --unset-env in the order specified  on
       the  command  line,  and then CH_RUNNING. The two options can be repeated arbitrarily many
       times, e.g. to add/remove multiple variable sets or add only some variables in a file.

   Default behavior
       By default, ch-run makes the following environment variable changes:

       • $CH_RUNNING: Set to Weird Al Yankovic. While a process can figure out that  it’s  in  an
         unprivileged  container  and  what  namespaces are active without this hint, that can be
         messy, and there is no way to tell that it’s a Charliecloud container specifically. This
         variable  makes  such a test simple and well-defined. (Note: This variable is unaffected
         by --unset-env.)

       • $HOME: If the path to your home directory is  not  /home/$USER  on  the  host,  then  an
         inherited $HOME will be incorrect inside the guest. This confuses some software, such as
         Spack. Thus, we change $HOME to /home/$USER, unless --no-home  is  specified,  in  which
         case it is left unchanged.

       • $PATH: Newer Linux distributions replace some root-level directories, such as /bin, with
         symlinks to their counterparts in /usr.

         Some of these distributions (e.g., Fedora 24) have also dropped /bin  from  the  default
         $PATH.  This  is a problem when the guest OS does not have a merged /usr (e.g., Debian 8
         “Jessie”). Thus, we add /bin to $PATH if it’s not already present.

         Further reading:

            • The case for the /usr Merge

            • Fedora

            • Debian

       • $TMPDIR: Unset, because this is almost certainly a host path, and that host path is made
         available in the guest at /tmp unless --private-tmp is given.
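       The $PATH adjustment can be sketched as the bash function below (add_bin is a hypothetical
       name). The text above does not say where ch-run places /bin; this sketch appends it, which
       is an assumption.

```shell
#!/usr/bin/env bash
# Print PATH-style string $1 with /bin added, unless one of its
# colon-separated elements is already exactly /bin.
# Appending (rather than prepending) is an assumption of this sketch.
add_bin () {
    case ":$1:" in
        *:/bin:*) printf '%s\n' "$1" ;;
        *)        printf '%s\n' "${1:+$1:}/bin" ;;
    esac
}
```

       For example, add_bin /usr/local/bin:/usr/bin prints /usr/local/bin:/usr/bin:/bin, while
       /usr/bin:/bin is returned unchanged.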

   Setting variables with --set-env
       The  purpose  of  --set-env  is  to set environment variables within the container. Values
       given replace any already in the environment (i.e., inherited from the host shell) or  set
       by earlier --set-env. This flag takes an optional argument with two possible forms:

       1. If  the  argument  contains  an  equals  sign  (=,  ASCII 61), that sets an environment
          variable directly. For example, to set FOO to the string value bar:

             $ ch-run --set-env=FOO=bar ...

          Single straight quotes around the value (', ASCII 39) are stripped, though be aware
          that both single and double quotes are also interpreted by the shell. For example, the
          following is similar in form to the prior one; the double quotes are removed by the
          shell and the single quotes around the value are removed by ch-run:

             $ ch-run --set-env="BAZ='qux'" ...

       2. If the argument does not contain an equals sign, it is a host path to a file containing
          zero or more variables using the same syntax as  above  (except  with  no  prior  shell
          processing).  This file contains a sequence of assignments separated by newlines. Empty
          lines are ignored, and no comments are interpreted. (This syntax is designed to  accept
          the output of printenv and be easily produced by other simple mechanisms.) For example:

             $ cat /tmp/env.txt
             FOO=bar
             BAZ='qux'
             $ ch-run --set-env=/tmp/env.txt ...

          For directory images only (because the file is read before containerizing), guest paths
          can be given by prepending the image path.

       3. If there is no argument, the file /ch/environment within the image is used.  This  file
          is  commonly populated by ENV instructions in the Dockerfile. For example, equivalently
          to form 2:

             $ cat Dockerfile
             [...]
             ENV FOO=bar
             ENV BAZ=qux
             [...]
             $ ch-image build -t foo .
             $ ch-convert foo /var/tmp/foo.sqfs
             $ ch-run --set-env /var/tmp/foo.sqfs -- ...

          (Note the image path is interpreted correctly, not as the --set-env argument.)

          At present, there is no way to use files other  than  /ch/environment  within  SquashFS
          images.
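       The assignment syntax of forms 1 and 2 can be sketched in bash as below: split at the
       first equals sign, reject an empty name, and strip one pair of single quotes around the
       value; in a file, empty lines are skipped and nothing else (comments, backslashes) is
       special. The function names and the NAME<TAB>VALUE output format are hypothetical, and
       variable expansion is left out.

```shell
#!/usr/bin/env bash
# Parse one NAME=VALUE assignment; print NAME<TAB>VALUE or fail.
parse_assignment () {
    local line=$1 name value
    if [[ $line != *=* ]]; then
        echo "no equals separator: $line" >&2; return 1
    fi
    name=${line%%=*}                        # text before the first '='
    value=${line#*=}                        # text after it, may contain '='
    if [[ -z $name ]]; then
        echo "name cannot be empty: $line" >&2; return 1
    fi
    if (( ${#value} >= 2 )) && [[ $value == \'*\' ]]; then
        value=${value:1:${#value}-2}        # strip one pair of single quotes
    fi
    printf '%s\t%s\n' "$name" "$value"
}

# Parse a --set-env FILE: one assignment per line, empty lines ignored.
parse_env_file () {
    local line
    while IFS= read -r line || [[ -n $line ]]; do
        [[ -z $line ]] && continue
        parse_assignment "$line" || return 1
    done < "$1"
}
```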

       Unless --env-no-expand is given before --set-env, ch-run expands environment variables in
       values that look like search paths. When expansion is on, the value is treated as a
       sequence of zero or more possibly-empty items separated by colon (:, ASCII 58). If an item
       begins with dollar sign ($, ASCII 36), then the rest of the item is the name of an
       environment variable. If this variable is set to a non-empty value, that value is
       substituted for the item; otherwise (i.e., the variable is unset or the empty string), the
       item is deleted, including a delimiter colon. The purpose of omitting empty expansions is
       to avoid surprising behavior such as an empty element in $PATH meaning the current
       directory.
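       The expansion rule can be sketched in bash as follows (expand_path_value is a hypothetical
       name; this is not ch-run’s C implementation). Each colon-separated item is kept as-is
       unless it starts with $; such an item is replaced by the named environment variable, or
       dropped together with its colon if that variable is unset or empty.

```shell
#!/usr/bin/env bash
# Expand a --set-env value: items are separated by ':'; an item "$NAME"
# becomes the value of environment variable NAME, and is dropped together
# with its delimiter colon when NAME is unset or empty. Other items,
# including empty ones, are kept unchanged.
expand_path_value () {
    local rest=$1 item val
    local -a kept=()
    while :; do
        item=${rest%%:*}                    # text before the next ':'
        if [[ $item == '$'* ]]; then
            # printenv fails (and prints nothing) if the variable is unset.
            val=$(printenv "${item#\$}" 2>/dev/null) || val=""
            if [[ -n $val ]]; then
                kept+=("$val")
            fi
        else
            kept+=("$item")
        fi
        [[ $rest == *:* ]] || break         # that was the last item
        rest=${rest#*:}                     # drop the item and its ':'
    done
    local IFS=:
    printf '%s\n' "${kept[*]}"              # rejoin kept items with ':'
}
```

       With BAR exported as bar and UNSET unset, this reproduces the expansion rows of the table
       below, e.g. baz:$UNSET:qux becomes baz:qux and :bar:baz:: is left unchanged.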

       For example, to set HOSTPATH to the search path in the current shell (this is expanded  by
       ch-run, though letting the shell do it happens to be equivalent):

          $ ch-run --set-env='HOSTPATH=$PATH' ...

       To prepend /opt/bin to this current search path:

          $ ch-run --set-env='PATH=/opt/bin:$PATH' ...

       To prepend /opt/bin to the search path set by the Dockerfile, as retrieved from guest file
       /ch/environment (here we really cannot let the shell expand $PATH):

          $ ch-run --set-env --set-env='PATH=/opt/bin:$PATH' ...

       Examples of valid assignment, assuming that environment variable BAR is  set  to  bar  and
       UNSET is unset or set to the empty string:

                         ┌───────────────────┬───────┬────────────────────────┐
                         │Assignment         │ Name  │ Value                  │
                         ├───────────────────┼───────┼────────────────────────┤
                         │FOO=bar            │ FOO   │ bar                    │
                         ├───────────────────┼───────┼────────────────────────┤
                         │FOO=bar=baz        │ FOO   │ bar=baz                │
                         ├───────────────────┼───────┼────────────────────────┤
                         │FLAGS=-march=foo   │ FLAGS │ -march=foo -mtune=bar  │
                         │-mtune=bar         │       │                        │
                         ├───────────────────┼───────┼────────────────────────┤
                         │FLAGS='-march=foo  │ FLAGS │ -march=foo -mtune=bar  │
                         │-mtune=bar'        │       │                        │
                         ├───────────────────┼───────┼────────────────────────┤
                         │FOO=$BAR           │ FOO   │ bar                    │
                         ├───────────────────┼───────┼────────────────────────┤
                         │FOO=$BAR:baz       │ FOO   │ bar:baz                │
                         ├───────────────────┼───────┼────────────────────────┤
                         │FOO=               │ FOO   │ empty string           │
                         ├───────────────────┼───────┼────────────────────────┤
                         │FOO=$UNSET         │ FOO   │ empty string           │
                         ├───────────────────┼───────┼────────────────────────┤
                         │FOO=baz:$UNSET:qux │ FOO   │ baz:qux (not baz::qux) │
                         ├───────────────────┼───────┼────────────────────────┤
                         │FOO=:bar:baz::     │ FOO   │ :bar:baz::             │
                         ├───────────────────┼───────┼────────────────────────┤
                         │FOO=''             │ FOO   │ empty string           │
                         ├───────────────────┼───────┼────────────────────────┤
                         │FOO=''''           │ FOO   │ '' (two single quotes) │
                         └───────────────────┴───────┴────────────────────────┘

       Example invalid assignments:

                                  ┌───────────┬──────────────────────┐
                                  │Assignment │ Problem              │
                                  ├───────────┼──────────────────────┤
                                  │FOO bar    │ no equals separator  │
                                  ├───────────┼──────────────────────┤
                                  │=bar       │ name cannot be empty │
                                  └───────────┴──────────────────────┘

       Example valid assignments that are probably not what you want:

               ┌─────────────────┬───────┬───────────┬───────────────────────────────┐
               │Assignment       │ Name  │ Value     │ Problem                       │
               ├─────────────────┼───────┼───────────┼───────────────────────────────┤
               │FOO="bar"        │ FOO   │ "bar"     │ double quotes aren’t stripped │
               ├─────────────────┼───────┼───────────┼───────────────────────────────┤
               │FOO=bar # baz    │ FOO   │ bar # baz │ comments not supported        │
               ├─────────────────┼───────┼───────────┼───────────────────────────────┤
               │FOO=bar\tbaz     │ FOO   │ bar\tbaz  │ backslashes are not special   │
               ├─────────────────┼───────┼───────────┼───────────────────────────────┤
               │␣FOO=bar         │ ␣FOO  │ bar       │ leading space in key          │
               ├─────────────────┼───────┼───────────┼───────────────────────────────┤
               │FOO=␣bar         │ FOO   │ ␣bar      │ leading space in value        │
               ├─────────────────┼───────┼───────────┼───────────────────────────────┤
               │$FOO=bar         │ $FOO  │ bar       │ variables not expanded in key │
               ├─────────────────┼───────┼───────────┼───────────────────────────────┤
               │FOO=$BAR baz:qux │ FOO   │ qux       │ variable "BAR baz" not set    │
               └─────────────────┴───────┴───────────┴───────────────────────────────┘

   Removing variables with --unset-env
       The purpose of --unset-env=GLOB is to remove unwanted environment variables. The  argument
       GLOB  is  a  glob  pattern (dialect fnmatch(3) with no flags); all variables with matching
       names are removed from the environment.

       WARNING:
          Because the shell also interprets glob patterns, if  any  wildcard  characters  are  in
          GLOB, it is important to put it in single quotes to avoid surprises.

       GLOB must be a non-empty string.
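       The removal step can be approximated in bash as below (unset_env_glob is a hypothetical
       name). It uses bash pattern matching in place of fnmatch(3); the two agree for the usual *,
       ?, and [...] wildcards, but this is a sketch, not ch-run’s code. As on the command line,
       quote the pattern when calling it.

```shell
#!/usr/bin/env bash
# Unset every exported variable whose name matches the glob in $1.
# Bash [[ == ]] matching stands in for fnmatch(3) with no flags.
unset_env_glob () {
    local glob=$1 name
    local -a names=()
    if [[ -z $glob ]]; then
        echo "GLOB must be a non-empty string" >&2; return 1
    fi
    # Collect matching names first, then unset them.
    while IFS= read -r name; do
        if [[ $name == $glob ]]; then
            names+=("$name")
        fi
    done < <(compgen -e)                    # names of exported variables
    for name in "${names[@]}"; do
        unset -v "$name"
    done
}
```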

       Example 1: Remove the single environment variable FOO:

          $ export FOO=bar
          $ env | fgrep FOO
          FOO=bar
          $ ch-run --unset-env=FOO $CH_TEST_IMGDIR/chtest -- env | fgrep FOO
          $

       Example  2:  Hide  from  a  container the fact that it’s running in a Slurm allocation, by
       removing all variables beginning with SLURM. You might want to do  this  to  test  an  MPI
       program with one rank and no launcher:

          $ salloc -N1
          $ env | egrep '^SLURM' | wc
             44      44    1092
          $ ch-run $CH_TEST_IMGDIR/mpihello-openmpi -- /hello/hello
          [... long error message ...]
          $ ch-run --unset-env='SLURM*' $CH_TEST_IMGDIR/mpihello-openmpi -- /hello/hello
          0: MPI version:
          Open MPI v3.1.3, package: Open MPI root@c897a83f6f92 Distribution, ident: 3.1.3, repo rev: v3.1.3, Oct 29, 2018
          0: init ok cn001.localdomain, 1 ranks, userns 4026532530
          0: send/receive ok
          0: finalize ok

       Example 3: Clear the environment completely (remove all variables):

          $ ch-run --unset-env='*' $CH_TEST_IMGDIR/chtest -- env
          $

       Note  that  some  programs, such as shells, set some environment variables even if started
       with no init files:

          $ ch-run --unset-env='*' $CH_TEST_IMGDIR/debian9 -- bash --noprofile --norc -c env
          SHLVL=1
          PWD=/
          _=/usr/bin/env
          $

EXAMPLES

       Run the command echo hello inside a Charliecloud container using  the  unpacked  image  at
       /data/foo:

          $ ch-run /data/foo -- echo hello
          hello

       Run an MPI job that can use CMA to communicate:

          $ srun ch-run --join /data/foo -- bar

SYSLOG

       By  default,  ch-run logs its command line to syslog. (This can be disabled by configuring
       with --disable-syslog.) This includes: (1) the  invoking  real  UID,  (2)  the  number  of
       command line arguments, and (3) the arguments, separated by spaces. For example:

          Dec 10 18:19:08 mybox ch-run: uid=1000 args=7: ch-run -v /var/tmp/00_tiny -- echo hello "wor l}\$d"

       Logging is one of the first things done during program initialization, even before command
       line parsing. That is, almost all command lines are logged, even if erroneous,  and  there
       is no logging of program success or failure.

       Arguments  are  serialized  with  the  following  procedure.  The  purpose is to provide a
       human-readable reconstruction of the command line while also allowing each argument to  be
       recovered byte-for-byte.

          • If  an  argument  contains  only printable ASCII bytes that are not whitespace, shell
            metacharacters, double quote (", ASCII 34 decimal), or backslash (\, ASCII 92), then
            log it unchanged.

          • Otherwise,  (a) enclose the argument in double quotes and (b) backslash-escape double
            quotes, backslashes, and characters interpreted  by  Bash  (including  POSIX  shells)
            within double quotes.

       The  verbatim  command  line  typed  in  the shell cannot be recovered, because not enough
       information is provided to UNIX programs. For example, echo  'foo' is given to programs as
       a sequence of two arguments, echo and foo; the two spaces and single quotes are removed by
       the shell. The zero byte, ASCII NUL, cannot appear in arguments because it would terminate
       the string.
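       The serialization procedure can be sketched in bash as follows; the function names are
       hypothetical. Two details are assumptions, since the text above does not spell them out:
       “shell metacharacters” is taken to mean Bash’s word-separating metacharacters (| & ; ( ) <
       > plus whitespace), and the characters escaped inside double quotes are double quote,
       backslash, dollar sign, and backquote. With these choices the sketch reproduces the example
       log line above.

```shell
#!/usr/bin/env bash
# Serialize one argument for the log, following the two rules above.
log_quote () {
    local LC_ALL=C                          # inspect bytes, not characters
    local arg=$1 c i quote=0 out=""
    local unsafe='|&;()<>"\'                # metacharacters, ", and backslash
    for (( i = 0; i < ${#arg}; i++ )); do
        c=${arg:i:1}
        # Non-printable bytes, whitespace, or unsafe characters force quoting.
        if [[ $c != [[:graph:]] || $unsafe == *"$c"* ]]; then
            quote=1
        fi
        case $c in
            '"'|'\'|'$'|'`') out+="\\$c" ;; # special inside double quotes
            *)               out+=$c ;;
        esac
    done
    if (( quote )); then
        printf '"%s"\n' "$out"
    else
        printf '%s\n' "$arg"
    fi
}

# Build the "args=N: ..." part of the log line from a whole command line.
log_line () {
    local a line=""
    for a in "$@"; do
        line+="${line:+ }$(log_quote "$a")"
    done
    printf 'args=%d: %s\n' "$#" "$line"
}
```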

EXIT STATUS

       If there is an error during containerization, ch-run exits with a non-zero status. If the
       user command is started successfully, the exit status is that of the  user  command,  with
       one  exception:  if  the  image  is an internally mounted SquashFS filesystem and the user
       command is killed by a signal, the exit status is 1 regardless of the signal value.

REPORTING BUGS

       If Charliecloud was obtained from your Linux distribution,  use  your  distribution’s  bug
       reporting procedures.

       Otherwise, report bugs to: https://github.com/hpc/charliecloud/issues

SEE ALSO

       charliecloud(7)

       Full documentation at: <https://hpc.github.io/charliecloud>

COPYRIGHT

       2014–2021, Triad National Security, LLC