Ubuntu Manpage: ch-image - Build and manage images; completely unprivileged

Provided by: charliecloud-builders_0.37-1build1_amd64

NAME

       ch-image - Build and manage images; completely unprivileged

SYNOPSIS

          $ ch-image [...] build [-t TAG] [-f DOCKERFILE] [...] CONTEXT
          $ ch-image [...] build-cache [...]
          $ ch-image [...] delete IMAGE_GLOB [IMAGE_GLOB ...]
          $ ch-image [...] gestalt [SELECTOR]
          $ ch-image [...] import PATH IMAGE_REF
          $ ch-image [...] list [-l] [IMAGE_REF]
          $ ch-image [...] pull [...] IMAGE_REF [DEST_REF]
          $ ch-image [...] push [--image DIR] IMAGE_REF [DEST_REF]
          $ ch-image [...] reset
          $ ch-image [...] undelete IMAGE_REF
          $ ch-image { --help | --version | --dependencies }

DESCRIPTION

ch-image is a tool for building and manipulating container images, but not running them
(for that you want ch-run). It is completely unprivileged, with no setuid/setgid/setcap
helpers. Many operations can use caching for speed. The action to take is specified by a
sub-command.

Options that print brief information and then exit:

-h, --help
Print help and exit successfully. If specified before the sub-command, print
general help and list of sub-commands; if after the sub-command, print help
specific to that sub-command.

--dependencies
Report dependency problems on standard output, if any, and exit. If all is well,
there is no output and the exit is successful; in case of problems, the exit is
unsuccessful.

--version
Print version number and exit successfully.

Common options placed before or after the sub-command:

-a, --arch ARCH
Use ARCH for architecture-aware registry operations. (See section “Architecture”
below for details.)

--always-download
Download all files when pulling, even if they are already in builder storage.
Note that ch-image pull will always retrieve the most up-to-date image; this
option is mostly for debugging.

--auth
Authenticate with the remote repository, then (if successful) make all
subsequent requests in authenticated mode. For most subcommands, the default is
to never authenticate, i.e., make all requests anonymously. The exception is
push, which implies --auth.

--break MODULE:LINE
Set a PDB breakpoint at line number LINE of module named MODULE (typically the
filename with .py removed, or __main__ for ch-image itself). That is, a PDB
debugger shell will open before executing the specified line.

This is accomplished by re-parsing the module, injecting import pdb;
pdb.set_trace() into the parse tree, re-compiling the tree, and replacing the
module’s code with the result. This has various gotchas, including
(1) module-level code in the target module is executed twice, (2) the option is
parsed with bespoke early code so command line argument parsing itself can be
debugged, (3) breakpoints on function definition will trigger while the module
is being re-executed, not when the function is called (break on the first line
of the function body instead), and (4) other weirdness we haven’t yet
characterized.

--cache
Enable build cache. Default if a sufficiently new Git is available. See section
Build cache for details.

--cache-large SIZE
Set the cache’s large file threshold to SIZE MiB, or 0 for no large files, which
is the default. Values greater than zero can speed up many builds but can also
cause performance degradation. Experimental. See section Large file threshold
for details.

--debug
Add a stack trace to fatal error hints. This can also be done by setting the
environment variable CH_IMAGE_DEBUG.

--no-cache
Disable build cache. Default if a sufficiently new Git is not available. This
option turns off the cache completely; if you want to re-execute a Dockerfile
and store the new results in cache, use --rebuild instead.

--no-lock
Disable storage directory locking. This lets you run as many concurrent ch-image
instances as you want against the same storage directory, which risks corruption
but may be OK for some workloads.

--no-xattrs
Enforce default handling of xattrs, i.e. do not save them in the build cache or
restore them on rebuild. This is the default, but the option is provided to
override the $CH_XATTRS environment variable.

--password-many
Re-prompt the user every time a registry password is needed.

--profile
Dump profile to files /tmp/chofile.p (cProfile dump format) and /tmp/chofile.txt
(text summary). You can convert the former to a PDF call graph with gprof2dot -f
pstats /tmp/chofile.p | dot -Tpdf -o /tmp/chofile.pdf. This excludes time spent
in subprocesses. Profile data should still be written on fatal errors, but not
if the program crashes.

-q, --quiet
Be quieter; can be repeated. Incompatible with -v and suppresses --debug
regardless of option order. See the FAQ entry on verbosity for details.

--rebuild
Execute all instructions, even if they are build cache hits, except for FROM,
which is retrieved from cache on a hit.

-s, --storage DIR
Set the storage directory (see below for important details).

--tls-no-verify
Don’t verify TLS certificates of the repository. (Do not use this option unless
you understand the risks.)

-v, --verbose
Print extra chatter; can be repeated. See the FAQ entry on verbosity for
details.

--xattrs
Save xattrs and ACLs in the build cache, and restore them when rebuilding from
the cache.

ARCHITECTURE

Charliecloud provides the option --arch ARCH to specify the architecture for
architecture-aware registry operations. The argument ARCH can be: (1) yolo, to bypass
architecture-aware code and use the registry’s default architecture; (2) host, to use the
host’s architecture, obtained with the equivalent of uname -m (default if --arch not
specified); or (3) an architecture name. If the specified architecture is not available,
the error message will list which ones are.

Notes:

1. ch-image is limited to one image per image reference in builder storage at a time,
regardless of architecture. For example, if you say ch-image pull --arch=foo baz and
then ch-image pull --arch=bar baz, builder storage will contain one image called “baz”,
with architecture “bar”.

2. Images’ default architecture is usually amd64, so this is usually what you get with
--arch=yolo. Similarly, if a registry image is architecture-unaware, it will still be
pulled with --arch=amd64 and --arch=host on x86-64 hosts (other host architectures must
specify --arch=yolo to pull architecture-unaware images).

3. uname -m and image registries often use different names for the same architecture. For
example, what uname -m reports as “x86_64” is known to registries as “amd64”.
--arch=host should translate if needed, but it’s useful to know this is happening.
Directly specified architecture names are passed to the registry without translation.

4. Registries treat architecture as a pair of items, architecture and sometimes variant
(e.g., “arm” and “v7”). Charliecloud treats architecture as a simple string and
converts to/from the registry view transparently.
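Notes 3 and 4 can be sketched as follows. This is a minimal illustration, not Charliecloud's code: only the x86_64/amd64 pair comes from the text above, while the other table entries, the function names, and the use of "/" to join architecture and variant are assumptions.

```python
import platform

# Hypothetical translation table from `uname -m` names to registry names.
# Only the x86_64 -> amd64 pair comes from the text above; the other
# entries are assumptions for illustration.
UNAME_TO_REGISTRY = {
    "x86_64": "amd64",
    "aarch64": "arm64/v8",
    "armv7l": "arm/v7",
}


def host_arch(machine=None):
    """Return a registry-style name for the host architecture."""
    if machine is None:
        machine = platform.machine()  # roughly what `uname -m` reports
    # Fall back to the untranslated name if the table has no entry.
    return UNAME_TO_REGISTRY.get(machine, machine)


def to_registry_pair(arch):
    """Split a simple string such as "arm/v7" into (architecture, variant)."""
    name, _, variant = arch.partition("/")
    return name, (variant or None)


def from_registry_pair(name, variant=None):
    """Join (architecture, variant) back into a simple string."""
    return name if variant is None else f"{name}/{variant}"
```

A directly specified name would skip host_arch() entirely, which matches the statement that such names are passed to the registry without translation.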

AUTHENTICATION

Charliecloud does not have configuration files; thus, it has no separate login subcommand
to store secrets. Instead, Charliecloud will prompt for a username and password when
authentication is needed. Note that some repositories refer to the secret as something
other than a “password”; e.g., GitLab calls it a “personal access token (PAT)”, Quay calls
it an “application token”, and nVidia NGC calls it an “API token”.

For non-interactive authentication, you can use environment variables CH_IMAGE_USERNAME
and CH_IMAGE_PASSWORD. Only do this if you fully understand the implications for your
specific use case, because it is difficult to securely store secrets in environment
variables.

By default for most subcommands, all registry access is anonymous. To instead use
authenticated access for everything, specify --auth or set the environment variable
$CH_IMAGE_AUTH=yes. The exception is push, which always runs in authenticated mode. Even
for pulling public images, it can be useful to authenticate for registries that have
per-user rate limits, such as Docker Hub. (Older versions of Charliecloud started with
anonymous access, then tried to upgrade to authenticated if it seemed necessary. However,
this turned out to be brittle; see issue #1318.)

The username and password are remembered for the life of the process and silently
re-offered to the registry if needed. One case when this happens is on push to a private
registry: many registries will first offer a read-only token when ch-image checks if
something exists, then re-authenticate when upgrading the token to read-write for upload.
If your site uses one-time passwords such as provided by a security device, you can
specify --password-many to provide a new secret each time.

These values are not saved persistently, e.g. in a file. Note that we do use normal Python
variables for this information, without pinning them into physical RAM with mlock(2) or
any other special treatment, so we cannot guarantee they will never reach non-volatile
storage.

Technical details

Most registries use something called Bearer authentication, where the client
(e.g., Charliecloud) includes a token in the headers of every HTTP request.

The authorization dance is different from the typical UNIX approach, where there
is a separate login sequence before any content requests are made. The client
starts by simply making the HTTP request it wants (e.g., to GET an image
manifest), and if the registry doesn’t like the client’s token (or if there is
no token because the client doesn’t have one yet), it replies with HTTP 401
Unauthorized, but crucially it also provides instructions in the response header
on how to get a token. The client then follows those instructions, obtains a
token, re-tries the request, and (hopefully) all is well. This approach also
allows a client to upgrade a token if needed, e.g. when transitioning from
asking if a layer exists to uploading its content.

The distinction between Charliecloud’s anonymous and authenticated modes is
that it will only ask for anonymous tokens in anonymous mode and authenticated
tokens in authenticated mode. That is, anonymous mode does involve an
authentication procedure to obtain a token, but this “authentication” is done
anonymously. (Yes, it’s confusing.)

Registries also often reply HTTP 401 when an image does not exist, rather than
the seemingly more correct HTTP 404 Not Found. This is to avoid information
leakage about the existence of images the client is not allowed to pull, and
it’s why Charliecloud never says an image simply does not exist.
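The “instructions in the response header” can be illustrated with a small parser. This sketch assumes the registry follows the widely used registry token authentication convention, in which the WWW-Authenticate header carries realm, service, and scope parameters; the host names and the helper function names are hypothetical, and no network request is made.

```python
import re
from urllib.parse import urlencode


def parse_bearer_challenge(header):
    """Parse a `WWW-Authenticate: Bearer ...` value into a dict of parameters."""
    scheme, _, rest = header.partition(" ")
    if scheme.lower() != "bearer":
        raise ValueError(f"not a Bearer challenge: {scheme!r}")
    # Parameters look like key="value", separated by commas.
    return dict(re.findall(r'(\w+)="([^"]*)"', rest))


def token_url(challenge):
    """Build the URL a client would GET to obtain a token."""
    query = {k: v for k, v in challenge.items() if k != "realm"}
    return challenge["realm"] + "?" + urlencode(query)


# Hypothetical header as it might appear on an HTTP 401 response.
example = ('Bearer realm="https://auth.example.com/token",'
           'service="registry.example.com",'
           'scope="repository:library/alpine:pull"')
challenge = parse_bearer_challenge(example)
print(challenge["realm"])
print(token_url(challenge))
```

In anonymous mode the client would request this URL without credentials; in authenticated mode it would add the username and password to the same request.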

STORAGE DIRECTORY

ch-image maintains state using normal files and directories located in its storage
directory; contents include various caches and temporary images used for building.

In descending order of priority, this directory is located at:

-s, --storage DIR
Command line option.

$CH_IMAGE_STORAGE
Environment variable. The path must be absolute, because the variable is likely
set in a very different context than the one where it is used, which makes it
unclear what a relative path would be relative to.

/var/tmp/$USER.ch
Default. (Previously, the default was /var/tmp/$USER/ch-image. If a valid
storage directory is found at the old default path, ch-image tries to move it to
the new default path.)
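The priority order above can be written as a short function. This is a sketch only; the function name and parameters are hypothetical, and the migration from the old default path is not modeled.

```python
import os


def storage_dir(cli_value=None, environ=None, user=None):
    """Pick the storage directory using the priority order listed above."""
    environ = os.environ if environ is None else environ
    if cli_value is not None:            # 1. -s / --storage
        return cli_value
    env_value = environ.get("CH_IMAGE_STORAGE")
    if env_value:                        # 2. $CH_IMAGE_STORAGE
        if not os.path.isabs(env_value):
            raise ValueError("CH_IMAGE_STORAGE must be an absolute path")
        return env_value
    if user is None:                     # 3. default: /var/tmp/$USER.ch
        user = environ.get("USER", "unknown")
    return f"/var/tmp/{user}.ch"
```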

Unlike many container implementations, there is no notion of storage drivers, graph
drivers, etc., to select and/or configure.

The storage directory can reside on any single filesystem (i.e., it cannot be split across
multiple filesystems). However, it contains lots of small files and metadata traffic can
be intense. For example, the Charliecloud test suite uses approximately 400,000 files and
directories in the storage directory as of this writing. Place it on a filesystem
appropriate for this; tmpfs’es such as /var/tmp are a good choice if you have enough RAM
(/tmp is not recommended because ch-run bind-mounts it into containers by default).

While you can currently poke around in the storage directory and find unpacked images
runnable with ch-run, this is not a supported use case. The supported workflow uses
ch-convert to obtain a packed image; see the tutorial for details.

The storage directory format changes on no particular schedule. ch-image is normally able
to upgrade directories produced by a given Charliecloud version up to one year after that
version’s release. Upgrades outside this window and downgrades are not supported. In these
cases, ch-image will refuse to run until you delete and re-initialize the storage
directory with ch-image reset.

WARNING:
Network filesystems, especially Lustre, are typically bad choices for the storage
directory. This is a site-specific question and your local support will likely have
strong opinions.

BUILD CACHE

Overview
Subcommands that create images, such as build and pull, can use a build cache to speed
repeated operations. That is, an image is created by starting from the empty image and
executing a sequence of instructions, largely Dockerfile instructions but also some others
like “pull” and “import”. Some instructions are expensive to execute (e.g., RUN wget
http://slow.example.com/bigfile or transferring data billed by the byte), so it’s often
cheaper to retrieve their results from cache instead.

The build cache uses a relatively new Git under the hood; see the installation
instructions for version requirements. Charliecloud implements workarounds for Git’s
various storage limitations, so things like file metadata and Git repositories within the
image should work. Important exception: No files named .git* or other Git metadata are
permitted in the image’s root directory.

Extended attributes (xattrs) are ignored by the build cache by default. Cache support for
xattrs belonging to unprivileged xattr namespaces (e.g. user) can be enabled by specifying
the --xattrs option or by setting the CH_XATTRS environment variable. If CH_XATTRS is set,
you can override it with --no-xattrs. Note that extended attributes in privileged xattr
namespaces (e.g. trusted) cannot be read by ch-image and will always be lost without
warning.

The cache has three modes: enabled, disabled, and a hybrid mode called rebuild where the
cache is fully enabled for FROM instructions, but all other operations re-execute and
re-cache their results. The purpose of rebuild is to do a clean rebuild of a Dockerfile
atop a known-good base image.

Enabled mode is selected with --cache or setting $CH_IMAGE_CACHE to enabled, disabled mode
with --no-cache or disabled, and rebuild mode with --rebuild or rebuild. The default mode
is enabled if an appropriate Git is installed, otherwise disabled.

Compared to other implementations
NOTE:
This section is a lightly edited excerpt from our paper “Charliecloud’s layer-free,
Git-based container build cache”.

Existing tools such as Docker and Podman implement their build cache with a layered
(union) filesystem such as OverlayFS or FUSE-OverlayFS and tar archives to represent the
content of each layer; this approach is standardized by OCI. The layered cache works, but
it has drawbacks in three critical areas:

1. Diff format. The tar format is poorly standardized and not designed for diffs.
Notably, tar cannot represent file deletion. The workaround used for OCI layers is
specially named whiteout files, which means the tar archives cannot be unpacked by
standard UNIX tools and require special container-specific processing.

2. Cache overhead. Each time a Dockerfile instruction is started, a new overlay filesystem
is mounted atop the existing layer stack. File metadata operations in the instruction
then start at the top layer and descend the stack until the layer containing the
desired file is reached. The cost of these operations is therefore proportional to the
number of layers, i.e., the number of instructions between the empty root image and the
instruction being executed. This results in a best practice of large, complex
instructions to minimize their number, which can conflict with simpler, more numerous
instructions the user might prefer.

3. De-duplication. Identical files on layers with an ancestry relationship (i.e.,
instruction A precedes B in a build) are stored only once. However, identical files on
layers without this relationship are stored multiple times. For example, if
instructions B and B’ both follow A — perhaps because B was modified and the image
rebuilt — then any files created by both B and B’ will be stored twice.

Also, similar files are never de-duplicated, regardless of ancestry. For example, if
instruction A creates a file and subsequently instruction B modifies a single bit in
that file, both versions are stored in their entirety.

Our Git-based cache addresses the three drawbacks: (1) Git is purpose-built to store
changing directory trees, (2) cache overhead is imposed only at instruction commit time,
and (3) Git de-duplicates both identical and similar files. Also, it is based on an
extremely widely used tool that enjoys development support from well-resourced actors, in
particular on scaling (e.g., Microsoft’s large-repository accelerator Scalar was recently
merged into Git).

In addition to these structural advantages, performance experiments reported in our paper
above show that the Git-based approach is as good as (and sometimes better than)
overlay-based caches. On build time, the two approaches are broadly similar, with one or
the other being faster depending on context. Both had performance problems on NFS.
Notably, however, the Git-based cache was much faster for a 129-instruction Dockerfile. On
disk usage, the winner depended on the condition. For example, we saw the layered cache
storing large sibling layers redundantly; on the other hand, the Git-based cache has some
obvious redundancies as well, and one must compact it for full de-duplication benefit.
However, Git’s de-duplication was quite effective in some conditions and we suspect will
prove even better in more realistic scenarios.

That is, we believe our results show that the Git-based build cache is highly competitive
with the layered approach, with no obvious inferiority so far and hints that it may be
superior on important dimensions. We have ongoing work to explore these questions in more
detail.

De-duplication and garbage collection
Charliecloud’s build cache takes advantage of Git’s file de-duplication features. This
operates across the entire build cache, i.e., files are de-duplicated no matter where in
the cache they are found or the relationship between their container images. Files are
de-duplicated at different times depending on whether they are identical or merely
similar.

Identical files are de-duplicated at git add time; in ch-image build terms, that’s upon
committing a successful instruction. That is, it’s impossible to store two files with the
same content in the build cache. If you try — say with RUN yum install -y foo in one
Dockerfile and RUN yum install -y foo bar in another, which are different instructions but
both install RPM foo’s files — the content is stored once and each copy gets its own
metadata and a pointer to the content, much like filesystem hard links.
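The reason identical content cannot be stored twice is that Git names each object by a hash of its content. The sketch below computes a Git blob ID (SHA-1 object format) and uses it as the key of a toy in-memory store. The ObjectStore class and its fields are hypothetical and only model the “content once, metadata per copy” idea, not Charliecloud’s actual layout.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object ID Git assigns to file content (SHA-1 object format)."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


class ObjectStore:
    """Toy content-addressed store: content is keyed by its object ID."""

    def __init__(self):
        self.objects = {}   # object ID -> content
        self.entries = []   # (image, path, mode, object ID) metadata records

    def add(self, image, path, mode, content):
        oid = git_blob_id(content)
        self.objects.setdefault(oid, content)  # identical content stored once
        self.entries.append((image, path, mode, oid))
        return oid


store = ObjectStore()
a = store.add("img1", "usr/bin/foo", 0o755, b"same bytes\n")
b = store.add("img2", "usr/bin/foo", 0o755, b"same bytes\n")
print(a == b, len(store.objects), len(store.entries))
```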

Similar files, however, are only de-duplicated during Git’s garbage collection process.
When files are initially added to a Git repository (with git add), they are stored inside
the repository as (possibly compressed) individual files, called objects in Git jargon.
Upon garbage collection, which happens both automatically when certain parameters are met
and explicitly with git gc, these files are archived and (re-)compressed together into a
single file called a packfile. Also, existing packfiles may be re-written into the new
one.

During this process, similar files are identified, and each set of similar files is stored
as one base file plus diffs to recover the others. (Similarity detection seems to be based
primarily on file size.) This delta process is agnostic to alignment, which is an
advantage over alignment-sensitive block-level de-duplicating filesystems. Exception:
“Large” files are not compressed or de-duplicated. We use the Git default threshold of 512
MiB (as of this writing).

Charliecloud runs Git garbage collection at two different times. First, a lighter-weight
garbage collection pass runs automatically when the number of loose files (objects) grows
beyond a limit. This limit is in flux as we learn more about build cache performance, but
it’s quite a bit higher than the Git default. This pass runs in the background and can
continue after the build completes; you may see Git processes using a lot of CPU.

An important limitation of the automatic garbage collection is that large packfiles (again,
this is in flux, but it’s several GiB) will not be re-packed, limiting the scope of similar
file detection. To address this, a heavier garbage collection can be run manually with
ch-image build-cache --gc. This will re-pack (and re-write) the entire build cache,
de-duplicating all similar files. In both cases, garbage collection uses all available
cores.

ch-image build-cache prints the specific garbage collection parameters in use, and -v can be
added for more detail.

Large file threshold
Because Git uses content-addressed storage, upon commit, it must read in full all files
modified by an instruction. This I/O cost can be a significant fraction of build time for
some images. To mitigate this, regular files larger than the experimental large file
threshold are stored outside the Git repository, somewhat like Git Large File Storage.

ch-image copies large files in and out of images at each instruction commit. It tries to
do this with a fast metadata-only copy-on-write operation called “reflink”, but that is
only supported with the right Python version, Linux kernel version, and filesystem. If
unsupported, Charliecloud falls back to an expensive standard copy, which is likely slower
than letting Git deal with the files. See File copy performance for details.

Every version of a large file is stored verbatim and uncompressed (e.g., a large file with
a one-byte change will be stored in full twice), so Git’s de-duplication does not apply.
However, on filesystems with reflink support, files can share extents (e.g., each of the
two files will have its own extent containing the changed byte, but the rest of the
extents will remain shared). This provides de-duplication between large files in images that
share ancestry. Also, unused large files are deleted by ch-image build-cache --gc.

A final caveat: Large files in any image with the same path, mode, size, and mtime (to
nanosecond precision if possible) are considered identical, even if their content is not
actually identical (e.g., touch(1) shenanigans can corrupt an image).
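This caveat can be demonstrated with a metadata-only identity key. The sketch assumes the key is exactly the tuple (path, mode, size, mtime in nanoseconds); the function names and file names are hypothetical. Two files with different content but matching metadata compare equal.

```python
import os
import tempfile


def large_file_key(image_root, relpath):
    """Identity key built from path, mode, size, and mtime; content is not read."""
    st = os.stat(os.path.join(image_root, relpath))
    return (relpath, st.st_mode, st.st_size, st.st_mtime_ns)


def make_file(root, relpath, content, mtime_ns):
    """Write a file, then force a known mode and mtime on it."""
    path = os.path.join(root, relpath)
    with open(path, "wb") as f:
        f.write(content)
    os.chmod(path, 0o644)
    os.utime(path, ns=(mtime_ns, mtime_ns))


with tempfile.TemporaryDirectory() as img1, tempfile.TemporaryDirectory() as img2:
    stamp = 1_700_000_000_000_000_000  # arbitrary timestamp in nanoseconds
    make_file(img1, "big.bin", b"AAAA", stamp)
    make_file(img2, "big.bin", b"BBBB", stamp)  # same size, different content
    same_key = large_file_key(img1, "big.bin") == large_file_key(img2, "big.bin")
    print("keys equal:", same_key)
```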

Option --cache-large sets the threshold in MiB; if not set, environment variable
CH_IMAGE_CACHE_LARGE is used; if that is not set either, the default value 0 indicates
that no files are considered large.
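The lookup order for the threshold can be sketched as below. The function names are hypothetical, and whether a file of exactly the threshold size counts as large is an assumption.

```python
import os

MIB = 1024 * 1024


def large_threshold_mib(cli_value=None, environ=None):
    """Resolve the threshold: option, then environment variable, then 0."""
    environ = os.environ if environ is None else environ
    if cli_value is not None:
        return int(cli_value)
    return int(environ.get("CH_IMAGE_CACHE_LARGE", 0))


def is_large(size_bytes, threshold_mib):
    """A threshold of 0 means no file is considered large."""
    # Treating the comparison as strict at the threshold is an assumption.
    return threshold_mib > 0 and size_bytes > threshold_mib * MIB
```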

(Note that Git has an unrelated setting called core.bigFileThreshold.)

Example
Suppose we have this Dockerfile:

$ cat a.df
FROM alpine:3.17
RUN echo foo
RUN echo bar

On our first build, we get:

$ ch-image build -t foo -f a.df .
1. FROM alpine:3.17
[ ... pull chatter omitted ... ]
2. RUN echo foo
copying image ...
foo
3. RUN echo bar
bar
grown in 3 instructions: foo

Note the dot after each instruction’s line number. This means that the instruction was
executed. You can also see this by the output of the two echo commands.

But on our second build, we get:

$ ch-image build -t foo -f a.df .
1* FROM alpine:3.17
2* RUN echo foo
3* RUN echo bar
copying image ...
grown in 3 instructions: foo

Here, instead of being executed, each instruction’s results were retrieved from cache.
(Charliecloud uses lazy retrieval; nothing is actually retrieved until the end, as seen by
the “copying image” message.) Cache hit for each instruction is indicated by an asterisk
(*) after the line number. Even for such a small and short Dockerfile, this build is
noticeably faster than the first.

We can also try a second, slightly different Dockerfile. Note that the first two
instructions are the same, but the third is different:

$ cat c.df
FROM alpine:3.17
RUN echo foo
RUN echo qux
$ ch-image build -t c -f c.df .
1* FROM alpine:3.17
2* RUN echo foo
3. RUN echo qux
copying image ...
qux
grown in 3 instructions: c

Here, the first two instructions are hits from the first Dockerfile, but the third is a
miss, so Charliecloud retrieves the cached state after the second instruction and continues
building from there.

We can also inspect the cache:

$ ch-image build-cache --tree
* (c) RUN echo qux
| * (foo) RUN echo bar
|/
* RUN echo foo
* (alpine+3.17) PULL alpine:3.17
* (root) ROOT

named images: 4
state IDs: 5
commits: 5
files: 317
disk used: 3 MiB

Here there are four named images: foo and c that we built, the base image alpine:3.17
(written as alpine+3.17 because colon is not allowed in Git branch names), and the empty
base of everything root. Also note how foo and c diverge after the last common instruction
RUN echo foo.
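The hit and miss pattern in this example can be reproduced with a toy model. It assumes each state is identified by a hash of its parent state plus the instruction text, which is an illustrative simplification; this page does not describe how ch-image actually computes state IDs, and all names below are hypothetical.

```python
import hashlib


def state_id(parent_id, instruction):
    """Toy state ID: hash of the parent state plus the instruction text."""
    return hashlib.sha256((parent_id + "\n" + instruction).encode()).hexdigest()


def build(instructions, cache):
    """Return (line number, marker, instruction); '*' is a hit, '.' a miss."""
    log = []
    sid = "root"
    for lineno, inst in enumerate(instructions, start=1):
        sid = state_id(sid, inst)
        if sid in cache:
            marker = "*"          # result already cached; nothing to execute
        else:
            marker = "."          # execute, then record the new state
            cache.add(sid)
        log.append((lineno, marker, inst))
    return log


cache = set()
a_df = ["FROM alpine:3.17", "RUN echo foo", "RUN echo bar"]
c_df = ["FROM alpine:3.17", "RUN echo foo", "RUN echo qux"]
for name, dockerfile in [("a.df", a_df), ("a.df", a_df), ("c.df", c_df)]:
    print(name)
    for lineno, marker, inst in build(dockerfile, cache):
        print(f"  {lineno}{marker} {inst}")
```

Because the third instruction of c.df has the same parent state as the third instruction of a.df but different text, it gets a new state ID, which corresponds to the branch point in the tree.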

BUILD

Build an image from a Dockerfile and put it in the storage directory.

Synopsis
$ ch-image [...] build [-t TAG] [-f DOCKERFILE] [...] CONTEXT

Description
See below for differences with other Dockerfile interpreters. Charliecloud supports an
extended instruction (RSYNC), a few other instructions behave slightly differently, and a
few are ignored.

Note that FROM implicitly pulls the base image if needed, so you may want to read about
the pull subcommand below as well.

Required argument:

CONTEXT
Path to context directory. This is the root of COPY instructions in the
Dockerfile. If a single hyphen (-) is specified: (a) read the Dockerfile from
standard input, (b) specifying --file is an error, and (c) there is no context,
so COPY will fail. (See --file for how to provide the Dockerfile on standard
input while also having a context.)

Options:

-b, --bind SRC[:DST]
For RUN instructions only, bind-mount SRC at guest DST. The default destination
if not specified is to use the same path as the host; i.e., the default is
equivalent to --bind=SRC:SRC. If DST does not exist, try to create it as an
empty directory, though images do have ten directories /mnt/[0-9] already
available as mount points. Can be repeated.

Note: See documentation for ch-run --bind for important caveats and gotchas.

Note: Other instructions that modify the image filesystem, e.g. COPY, can only
access host files from the context directory, regardless of this option.

--build-arg KEY[=VALUE]
Set build-time variable KEY defined by ARG instruction to VALUE. If VALUE not
specified, use the value of environment variable KEY.

-f, --file DOCKERFILE
Use DOCKERFILE instead of CONTEXT/Dockerfile. If a single hyphen (-) is
specified, read the Dockerfile from standard input; like docker build, the
context directory is still available in this case.

--force[=MODE]
Use unprivileged build with root emulation mode MODE, which can be fakeroot,
seccomp (the default), or none. See section “Privilege model” below for details
on what this does and when you might need it.

--force-cmd=CMD,ARG1[,ARG2...]
If command CMD is found in a RUN instruction, add the comma-separated ARGs to
it. For example, --force-cmd=foo,-a,--bar=baz would transform RUN foo -c into
RUN foo -a --bar=baz -c. This is intended to suppress validation that defeats
--force=seccomp and implies that option. Can be repeated. If specified,
replaces (does not extend) the default suppression options. Literal commas can
be escaped with backslash; importantly however, backslash will need to be
protected from the shell also. Section “Privilege model” below explains why you
might need this.

-n, --dry-run
Don’t actually execute any Dockerfile instructions.

--parse-only
Stop after parsing the Dockerfile.

-t, --tag TAG
Name of image to create. If not specified, infer the name:

1. If the Dockerfile is named Dockerfile with an extension: use the extension with
invalid characters stripped, e.g. Dockerfile.@FOO.bar → foo.bar.

2. If the Dockerfile has extension df or dockerfile: use the basename with the same
transformation, e.g. baz.@QUX.dockerfile → baz.qux.

3. If the context directory is not /: use its name, i.e. the last component of the
absolute path to the context directory, with the same transformation.

4. Otherwise (context directory is /): use root.

If no colon present in the name, append :latest.
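The name inference rules for --tag can be sketched as follows, assuming POSIX paths. The set of characters treated as valid is an assumption (the text only shows that upper case is folded and @ is dropped), and the function names are hypothetical.

```python
import os
import re


def sanitize(name):
    """Lower-case and drop characters assumed invalid in an image name."""
    # The exact set of valid characters is an assumption for this sketch.
    return re.sub(r"[^a-z0-9._-]", "", name.lower())


def infer_name(dockerfile, context):
    """Apply rules 1 to 4 above, in order."""
    base = os.path.basename(dockerfile)
    stem, ext = os.path.splitext(base)
    prefix = "Dockerfile."
    if base.startswith(prefix) and len(base) > len(prefix):
        return sanitize(base[len(prefix):])      # rule 1: use the extension
    if ext in (".df", ".dockerfile"):
        return sanitize(stem)                    # rule 2: use the basename
    ctx = os.path.abspath(context)
    if ctx != "/":
        return sanitize(os.path.basename(ctx))   # rule 3: context directory name
    return "root"                                # rule 4


def resolve_tag(tag, dockerfile, context):
    """Use the given tag or infer one, then append :latest if there is no colon."""
    name = tag if tag is not None else infer_name(dockerfile, context)
    return name if ":" in name else name + ":latest"
```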

Uses ch-run -w -u0 -g0 --no-passwd --unsafe to execute RUN instructions.
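The transformation described under --force-cmd can be sketched as below. This is an illustration only: how ch-image locates CMD inside a RUN instruction is not specified here, so the word-by-word matching is an assumption, the function names are hypothetical, and the sketch handles only simple commands without shell operators.

```python
import re
import shlex


def parse_force_cmd(value):
    """Split "CMD,ARG1[,ARG2...]" on commas not escaped with a backslash."""
    parts = re.split(r"(?<!\\),", value)
    parts = [p.replace("\\,", ",") for p in parts]
    return parts[0], parts[1:]


def inject_args(run_command, cmd, args):
    """Insert args right after each word equal to cmd in a simple command."""
    out = []
    for word in shlex.split(run_command):
        out.append(word)
        if word == cmd:
            out.extend(args)
    return " ".join(shlex.quote(w) for w in out)


cmd, args = parse_force_cmd("foo,-a,--bar=baz")
print(inject_args("foo -c", cmd, args))
```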

Privilege model
Overview
ch-image is a fully unprivileged image builder. It does not use any setuid or setcap
helper programs, and it does not use configuration files /etc/subuid or /etc/subgid. This
contrasts with the “rootless” or “fakeroot” modes of some competing builders, which do
require privileged supporting code or utilities.

Without root emulation, this approach does confuse programs that expect to have real root
privileges, most notably distribution package installers. This subsection describes why
that happens and what you can do about it.

ch-image executes all instructions as the normal user who invokes it. For RUN, this is
accomplished with ch-run arguments including -w --uid=0 --gid=0. That is, your host EUID
and EGID are both mapped to zero inside the container, and only one UID (zero) and GID
(zero) are available inside the container. Under this arrangement, processes running in
the container for each RUN appear to be running as root, but many privileged system calls
will fail without the root emulation methods described below. This affects any fully
unprivileged container build, not just Charliecloud.

The most common time to see this is installing packages. For example, here is RPM failing
to chown(2) a file, which makes the package update fail:

Updating : 1:dbus-1.10.24-13.el7_6.x86_64 2/4
Error unpacking rpm package 1:dbus-1.10.24-13.el7_6.x86_64
error: unpacking of archive failed on file /usr/libexec/dbus-1/dbus-daemon-launch-helper;5cffd726: cpio: chown
Cleanup : 1:dbus-libs-1.10.24-12.el7.x86_64 3/4
error: dbus-1:1.10.24-13.el7_6.x86_64: install failed

This one is (ironically) apt-get failing to drop privileges:

E: setgroups 65534 failed - setgroups (1: Operation not permitted)
E: setegid 65534 failed - setegid (22: Invalid argument)
E: seteuid 100 failed - seteuid (22: Invalid argument)
E: setgroups 0 failed - setgroups (1: Operation not permitted)

Charliecloud provides two different mechanisms to avoid these problems. Both involve lying
to the containerized process about privileged system calls, but at very different levels
of complexity.

Root emulation mode fakeroot
This mode uses fakeroot(1) to maintain an elaborate web of deceit that is internally
consistent. This program intercepts both privileged system calls (e.g., setuid(2)) and
other system calls whose return values depend on those calls (e.g., getuid(2)), faking
success for privileged system calls (perhaps making no system call at all) and altering
return values to be consistent with earlier fake success. Charliecloud automatically
installs the fakeroot(1) program inside the container and then wraps RUN instructions
having known privilege needs with it. Thus, this mode is only available for certain
distributions.

The advantage of this mode is its consistency; e.g., careful programs that check the new
UID after attempting to change it will not notice anything amiss. Its disadvantage is
complexity: detailed knowledge and procedures for multiple Linux distributions.

This mode has three basic steps:

1. After FROM, analyze the image to see what distribution it contains, which determines
the specific workarounds.

2. Before the user command in the first RUN instruction where the injection seems
needed, install fakeroot(1) in the image, if one is not already installed, as well
as any other necessary initialization commands. For example, we turn off the apt
sandbox (for Debian Buster) and configure EPEL but leave it disabled (for
CentOS/RHEL).

3. Prepend fakeroot to RUN instructions that seem to need it, e.g. ones that contain
apt, apt-get, dpkg for Debian derivatives and dnf, rpm, or yum for RPM-based
distributions.

RUN instructions that do not seem to need modification are unaffected by this mode.

The details are specific to each distribution. ch-image analyzes image content (e.g.,
grepping /etc/debian_version) to select a configuration; see lib/force.py for details.
ch-image prints exactly what it is doing.
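
Step 3 above can be pictured with the following Python sketch. It is an illustration
only: the keyword table is copied from the examples in the text, the names KEYWORDS and
wrap_run are hypothetical, and the whole-word matching rule is our assumption. The real
per-distribution configuration is in lib/force.py.

```python
import re

# Hypothetical keyword table, taken from the examples in the text above.
KEYWORDS = {
    "debian": ("apt", "apt-get", "dpkg"),
    "rpm": ("dnf", "rpm", "yum"),
}

def wrap_run(command, family):
    """Prepend fakeroot if command mentions a package tool of this family."""
    words = "|".join(re.escape(w) for w in KEYWORDS[family])
    # Whole words only; "-" is treated as part of a word, so that
    # e.g. "rpmbuild" or "my-apt" do not trigger the wrapper.
    pattern = r"(?<![\w-])(?:" + words + r")(?![\w-])"
    if re.search(pattern, command):
        return "fakeroot " + command
    return command

print(wrap_run("apt-get update", "debian"))
print(wrap_run("echo hello", "debian"))
```

Only the first command is wrapped; the second is passed through unchanged, matching the
statement that RUN instructions without apparent privilege needs are unaffected.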

WARNING:
Because of fakeroot mode’s complexity, we plan to remove it if seccomp mode performs
well enough. If you have a situation where fakeroot mode works and seccomp does not,
please let us know.

Root emulation mode seccomp (default)
This mode uses the kernel’s seccomp(2) system call filtering to intercept certain
privileged system calls, do absolutely nothing, and return success to the program.

Some system calls are quashed regardless of their arguments: capset(2); chown(2) and
friends; kexec_load(2) (used to validate the filter itself); and setuid(2), setgid(2),
and setgroups(2) along with the other system calls that change user or group. mknod(2) and
mknodat(2) are quashed if they try to create a device file (e.g., creating FIFOs works
normally).
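
The argument-dependent rule for mknod(2) and mknodat(2) can be sketched in Python as
follows. This assumes that "device file" means a character or block device; the function
name is hypothetical, and the real decision is made by the seccomp(2) filter in the
kernel, not by Python code.

```python
import stat

def is_quashed_mknod(mode):
    """True if a mknod(2) call with this mode asks for a device file."""
    return stat.S_ISCHR(mode) or stat.S_ISBLK(mode)

print(is_quashed_mknod(stat.S_IFCHR | 0o600))  # character device
print(is_quashed_mknod(stat.S_IFBLK | 0o600))  # block device
print(is_quashed_mknod(stat.S_IFIFO | 0o600))  # FIFO
```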

The advantages of this approach are that it’s much simpler, it’s faster, it’s completely
agnostic to libc, and it’s mostly agnostic to distribution. The disadvantage is that it’s
a very lazy liar; even the most cursory consistency checks will fail, e.g., getuid(2)
after setuid(2).
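
The difference between the two modes can be simulated without any kernel involvement.
In this Python sketch, LazyLiar and ConsistentLiar are hypothetical stand-ins for the
seccomp and fakeroot behavior respectively, and drop_privileges plays the role of a
careful program that verifies its UID change.

```python
class PrivilegeDropError(RuntimeError):
    pass

def drop_privileges(target_uid, setuid, getuid):
    """Change UID, then verify the change, as a careful program would."""
    setuid(target_uid)
    actual = getuid()
    if actual != target_uid:
        raise PrivilegeDropError(
            "wanted UID %d, still UID %d" % (target_uid, actual))
    return actual

class LazyLiar:
    """Stand-in for seccomp mode: setuid does nothing and reports success."""
    def __init__(self, uid=0):
        self.uid = uid
    def setuid(self, uid):
        return None        # "success", but nothing changed
    def getuid(self):
        return self.uid    # truthful, hence inconsistent with the fake success

class ConsistentLiar(LazyLiar):
    """Stand-in for fakeroot mode: remembers the faked UID."""
    def setuid(self, uid):
        self.uid = uid

def survives(kernel, target_uid=100):
    """Return True if the careful program is satisfied."""
    try:
        drop_privileges(target_uid, kernel.setuid, kernel.getuid)
        return True
    except PrivilegeDropError:
        return False

print(survives(ConsistentLiar()))
print(survives(LazyLiar()))
```

The careful program is satisfied by the consistent liar and fails against the lazy one,
which is the situation apt-get is in when it tries to drop privileges.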

While this mode does not provide consistency, it does offer a hook to help prevent
programs asking for consistency. For example, apt-get -o APT::Sandbox::User=root will
prevent apt-get from attempting to drop privileges, which it verifies, exiting with
failure if the correct IDs are not found (which they won’t be under this approach). This
can be expressed with --force-cmd=apt-get,-o,APT::Sandbox::User=root, though this
particular case is built-in and does not need to be specified. The full default
configuration, which is applied regardless of the image distribution, can be examined in
the source file force.py. If any --force-cmd options are specified, they replace (rather
than extend) the default configuration.

Note that because the substitutions are a simple regex with no knowledge of shell syntax,
they can cause unwanted modifications. For example, RUN apt-get install -y apt-get will be
run as /bin/sh -c "apt-get -o APT::Sandbox::User=root install -y apt-get -o
APT::Sandbox::User=root". One workaround is to add escape syntax transparent to the shell;
e.g., RUN apt-get install -y apt-\get.
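
The over-matching is easy to reproduce. The Python sketch below is a rough model, not
the code in force.py: the regular expression and the names parse_force_cmd and
apply_hook are assumptions made for illustration. The second call places a backslash
inside the package name, one escape that the shell removes but a plain text match does
not see through.

```python
import re

def parse_force_cmd(value):
    """Split 'CMD,ARG1,ARG2,...' into (command, [args])."""
    command, *args = value.split(",")
    return command, args

def apply_hook(run_text, command, args):
    """Insert args after every whole-word occurrence of command."""
    replacement = " ".join([command] + args)
    pattern = r"\b" + re.escape(command) + r"\b"
    # A function replacement avoids backslash processing in the new text.
    return re.sub(pattern, lambda m: replacement, run_text)

cmd, args = parse_force_cmd("apt-get,-o,APT::Sandbox::User=root")
print(apply_hook("apt-get install -y apt-get", cmd, args))
print(apply_hook("apt-get install -y apt-\\get", cmd, args))
```

In the first result both occurrences are modified, including the package name; in the
second only the command itself is.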

This mode executes all RUN instructions with the seccomp(2) filter and has no knowledge of
which instructions actually used the intercepted system calls. Therefore, the printed
“instructions modified” number is only a count of instructions with a hook applied as
described above.

RUN logging
In terminal output, image metadata, and the build cache, the RUN instruction is always
logged as RUN.S, RUN.F, or RUN.N. The letter appended to the instruction reflects the
root emulation mode used during the build in which the instruction was executed. RUN.S
indicates seccomp, RUN.F indicates fakeroot, and RUN.N indicates that neither form of root
emulation was used (--force=none).

Compatibility and behavior differences
ch-image is an independent implementation and shares no code with other Dockerfile
interpreters. It uses a formal Dockerfile parsing grammar developed from the Dockerfile
reference documentation and miscellaneous other sources, which you can examine in the
source code.

We believe this independence is valuable for several reasons. First, it helps the
community examine Dockerfile syntax and semantics critically, think rigorously about what
is really needed, and build a more robust standard. Second, it yields disjoint sets of
bugs (note that Podman, Buildah, and Docker all share the same Dockerfile parser). Third,
because it is a much smaller code base, it illustrates how Dockerfiles work more clearly.
Finally, it allows straightforward extensions if needed to support scientific computing.

ch-image tries hard to be compatible with Docker and other interpreters, though as an
independent implementation, it is not bug-compatible.

The following subsections describe differences from the Dockerfile reference that we
expect to be approximately permanent. For not-yet-implemented features and bugs in this
area, see related issues on GitHub.

None of these are set in stone. We are very interested in feedback on our assessments and
open questions. This helps us prioritize new features and revise our thinking about what
is needed for HPC containers.

Context directory
The context directory is bind-mounted into the build, rather than copied like Docker.
Thus, the size of the context is immaterial, and the build reads directly from storage
like any other local process would (i.e., it is reasonable to use / for the context).
However, you still can’t access anything outside the context directory.

Variable substitution
Variable substitution happens for all instructions, not just the ones listed in the
Dockerfile reference.

ARG and ENV cause cache misses upon definition, in contrast with Docker where these
variables miss upon use, except for certain cache-excluded variables that never cause
misses, listed below.

Note that ARG and ENV have different syntax despite very similar semantics.

ch-image passes the following proxy environment variables in to the build. Changes to
these variables do not cause a cache miss. They do not require an ARG instruction, as
documented in the Dockerfile reference. Unlike Docker, they are available if the
same-named environment variable is defined; --build-arg is not required.

HTTP_PROXY
http_proxy
HTTPS_PROXY
https_proxy
FTP_PROXY
ftp_proxy
NO_PROXY
no_proxy

In addition to those listed in the Dockerfile reference, these environment variables are
passed through in the same way:

SSH_AUTH_SOCK
USER

Finally, these variables are also pre-defined but are unrelated to the host environment:

PATH=/ch/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
TAR_OPTIONS=--no-same-owner
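
Taken together, the three lists above describe the initial build environment. A minimal
Python sketch, assuming the pre-defined variables take precedence over the host's and
using the hypothetical name initial_build_env:

```python
# Variables copied from the host if (and only if) they are set there.
PASSED_THROUGH = (
    "HTTP_PROXY", "http_proxy", "HTTPS_PROXY", "https_proxy",
    "FTP_PROXY", "ftp_proxy", "NO_PROXY", "no_proxy",
    "SSH_AUTH_SOCK", "USER",
)

# Variables set regardless of the host environment.
PREDEFINED = {
    "PATH": "/ch/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
    "TAR_OPTIONS": "--no-same-owner",
}

def initial_build_env(host_env):
    """Return the variables visible to the build before any ARG or ENV."""
    env = {k: host_env[k] for k in PASSED_THROUGH if k in host_env}
    env.update(PREDEFINED)
    return env

host = {"https_proxy": "http://proxy.example:3128", "HOME": "/home/alice"}
print(sorted(initial_build_env(host)))
```

Here HOME is not passed through, while https_proxy is available without --build-arg.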

ARG
Variables set with ARG are available anywhere in the Dockerfile, unlike Docker, where they
only work in FROM instructions, and possibly in other ARG before the first FROM.

FROM
The FROM instruction accepts option --arg=NAME=VALUE, which serves the same purpose as the
ARG instruction. It can be repeated.

LABEL
The LABEL instruction accepts key=value pairs to add metadata for an image. Unlike Docker,
multiline values are not supported; see issue #1512. The instruction can be repeated.

COPY
NOTE:
The behavior described here matches Docker’s now-deprecated legacy builder. Docker’s
new builder, BuildKit, has different behavior in some cases, which we have not
characterized.

Especially for people used to UNIX cp(1), the semantics of the Dockerfile COPY instruction
can be confusing.

Most notably, when a source of the copy is a directory, the contents of that directory,
not the directory itself, are copied. This is documented, but it’s a real gotcha because
that’s not what cp(1) does, and it means that many things you can do in one cp(1) command
require multiple COPY instructions.
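
The contrast with cp(1) can be shown with plain Python file operations. This sketch
models only the directory-source rule described above; the function names are
hypothetical and none of the other COPY corner cases are covered.

```python
import os
import shutil
import tempfile

def dockerfile_copy_dir(src_dir, dst_dir):
    """Like COPY src dst: the contents of src_dir land directly in dst_dir."""
    shutil.copytree(src_dir, dst_dir, dirs_exist_ok=True)

def cp_recursive_dir(src_dir, dst_dir):
    """Like cp -r src dst with dst existing: src_dir itself lands in dst_dir."""
    target = os.path.join(dst_dir, os.path.basename(src_dir))
    shutil.copytree(src_dir, target)

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src")
    os.mkdir(src)
    open(os.path.join(src, "a.txt"), "w").close()
    for label, copy in (("COPY", dockerfile_copy_dir), ("cp", cp_recursive_dir)):
        dst = os.path.join(tmp, "dst-" + label)
        os.mkdir(dst)
        copy(src, dst)
        print(label, sorted(os.listdir(dst)))
```

The COPY-like destination contains a.txt, while the cp-like destination contains the
directory src.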

Also, the reference documentation is incomplete. In our experience, Docker also behaves as
follows; ch-image does the same in an attempt to be bug-compatible.

1. You can use absolute paths in the source; the root is the context directory.

2. Destination directories are created if they don’t exist in the following situations:

1. If the destination path ends in slash. (Documented.)

2. If the number of sources is greater than 1, either by wildcard or explicitly,
regardless of whether the destination ends in slash. (Not documented.)

3. If there is a single source and it is a directory. (Not documented.)

3. Symbolic links behave differently depending on how deep in the copied tree they are.
(Not documented.)

1. Symlinks at the top level — i.e., named as the destination or the source, either
explicitly or by wildcards — are dereferenced. They are followed, and whatever they
point to is used as the destination or source, respectively.

2. Symlinks at deeper levels are not dereferenced, i.e., the symlink itself is copied.

4. If a directory appears at the same path in source and destination, and is at the 2nd
level or deeper, the source directory’s metadata (e.g., permissions) are copied to the
destination directory. (Not documented.)

5. If an object (a) appears in both the source and destination, (b) is at the 2nd level or
deeper, and (c) is different file types in source and destination, the source object
will overwrite the destination object. (Not documented.)

We expect the following differences to be permanent:

• Wildcards use Python glob semantics, not the Go semantics.

• COPY --chown is ignored, because it doesn’t make sense in an unprivileged build.

Features we do not plan to support
• Parser directives are not supported. We have not identified a need for any of them.

• EXPOSE: Charliecloud does not use the network namespace, so containerized processes can
simply listen on a host port like other unprivileged processes.

• HEALTHCHECK: This instruction’s main use case is monitoring server processes rather than
applications. Also, it requires a container supervisor daemon, which we have no plans to
add.

• MAINTAINER is deprecated.

• STOPSIGNAL requires a container supervisor daemon process, which we have no plans to
add.

• USER does not make sense for unprivileged builds.

• VOLUME: Charliecloud has good support for bind mounts; we anticipate that it will
continue to focus on that and will not introduce the volume management features that
Docker has.

RSYNC (Dockerfile extension)
WARNING:
This instruction is experimental and may change or be removed.

Overview
Copying files is often simple but has numerous difficult corner cases, e.g. when dealing
with symbolic or hard links. The standard instruction COPY deals with many of these corner
cases differently from other UNIX utilities, lacks complete documentation, and behaves
inconsistently between different Dockerfile interpreters (e.g., Docker’s legacy builder
vs. BuildKit), as detailed above. On the other hand, rsync(1) is an extremely capable,
widely used file copy tool, with detailed options to specify behavior and 25 years of
history dealing with weirdness.

RSYNC (also spelled NSYNC) is a Charliecloud extension that gives copying behavior
identical to rsync(1). In fact, Charliecloud’s current implementation literally calls the
host’s rsync(1) to do the copy, though this may change in the future. There is no list
form of RSYNC.

The two key usage challenges are trailing slashes on paths and symlink handling. In
particular, the default symlink handling seemed reasonable to us, but you may want
something different. See the arguments and examples below. Importantly, COPY is no less
fraught, and it gives you no choice about what to do with symlinks.

Arguments
RSYNC takes the same arguments as rsync(1), so refer to its man page for a detailed
explanation of all the options (with possible emphasis on its symlink options). Sources
are relative to the context directory even if they look absolute with a leading slash. Any
globbed sources are processed by ch-image(1) using Python rules, i.e., rsync(1) sees the
expanded sources with no wildcards. Relative destinations are relative to the image’s
current working directory, while absolute destinations refer to the image’s root.
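
The source handling described above might look roughly like this in Python. The function
name expand_sources is hypothetical, and dropping patterns that match nothing is our
assumption rather than documented behavior.

```python
import glob
import os
import tempfile

def expand_sources(context, sources):
    """Expand wildcards in sources relative to context, using Python glob rules."""
    expanded = []
    for src in sources:
        # A leading slash still means "relative to the context directory".
        pattern = os.path.join(context, src.lstrip("/"))
        expanded.extend(sorted(glob.glob(pattern)))
    return expanded

with tempfile.TemporaryDirectory() as ctx:
    for name in ("basic1", "basic2", "other"):
        os.mkdir(os.path.join(ctx, name))
    for path in expand_sources(ctx, ["/basic*"]):
        print(os.path.relpath(path, ctx))
```

rsync(1) would then receive the two expanded paths and never see the wildcard.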

For arguments that read input from a file (e.g. --exclude-from or --files-from), relative
paths are relative to the context directory, absolute paths refer to the image root, and -
(standard input) is an error.

For example,

WORKDIR /foo
RSYNC --foo src1 src2 dst

is translated to (the equivalent of):

$ mkdir -p /foo
$ rsync -@=-1 -AHSXpr --info=progress2 -l --safe-links \
--foo /context/src1 /context/src2 /storage/imgroot/foo/dst

Note the extensive default arguments to rsync(1). RSYNC takes a single instruction option
beginning with + (plus) that is shorthand for a group of rsync(1) options. This single
option is one of:

+m Preserves metadata and directory structure. Symlinks are skipped with a warning.
Equivalent to all of:

• -@=-1: use nanosecond precision when comparing timestamps.

• -A: preserve ACLs.

• -H: preserve hard link groups.

• -S: preserve file sparseness when possible.

• -X: preserve xattrs in user.* namespace.

• -p: preserve permissions.

• -r: recurse into directories.

• --info=progress2 (only if stderr is a terminal): show progress meter (note
subtleties in interpretation).

+l (default)
Like +u, but silently skips “unsafe” symlinks whose target is outside the
top-of-transfer directory. Preserves:

• Metadata.

• Directory structure.

• Symlinks, if a link’s target is within the “top-of-transfer directory”. This
is not the context directory and often not the source either. Also, this
creates broken symlinks if the target is not within the source but is within
the top-of-transfer. See examples below.

Equivalent to the rsync(1) options listed for +m plus --links (copy symlinks as
symlinks unless otherwise specified) and --safe-links (silently skip unsafe
symlinks).

+u Like +l, but replaces “unsafe” symlinks (those whose target is outside the
top-of-transfer directory) with their targets, and thus can copy data outside the
context directory into the image. Preserves:

• Metadata.

• Directory structure.

Equivalent to the rsync(1) options listed for +m plus --links (copy symlinks as
symlinks unless otherwise specified) and --copy-unsafe-links (copy the target of
unsafe symlinks).

+z No default arguments. Directories will not be descended, no metadata will be
preserved, and both hard and symbolic links will be ignored, except as otherwise
specified by rsync(1) options starting with a hyphen. (Note that -a/--archive
is discouraged because it omits some metadata and handles symlinks
inappropriately for containers.)
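
The four + options can be summarized as a lookup table. The following Python sketch
restates the equivalences listed above; the names PLUS_OPTIONS and default_rsync_args
are hypothetical, and the position of --info=progress2 simply follows the translated
command line shown earlier.

```python
# rsync(1) options shared by +m, +l, and +u.
BASE = ["-@=-1", "-AHSXpr"]

PLUS_OPTIONS = {
    "+m": BASE,
    "+l": BASE + ["--links", "--safe-links"],
    "+u": BASE + ["--links", "--copy-unsafe-links"],
    "+z": [],
}

def default_rsync_args(plus="+l", stderr_is_tty=False):
    """Return the rsync(1) options implied by one + option."""
    args = list(PLUS_OPTIONS[plus])
    if plus != "+z" and stderr_is_tty:
        args.insert(2, "--info=progress2")  # progress meter on terminals only
    return args

for plus in ("+m", "+l", "+u", "+z"):
    print(plus, default_rsync_args(plus, stderr_is_tty=True))
```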

NOTE:
rsync(1) supports a configuration file ~/.popt that alters its command line processing.
Currently, this configuration is respected for RSYNC arguments, but that may change
without notice.

Disallowed rsync(1) features
A small number of rsync(1) features are actively disallowed:

1. rsync: and ssh: transports are an error. Charliecloud needs access to the entire
input to compute cache hit or miss, and these transports make that impossible. It is
possible these will become available in the future (please let us know if that is
your use case!). For now, the workaround is to install rsync(1) in the image and
use it in a RUN instruction, though only the instruction text will be considered for
the cache.

2. Option arguments must be delimited with = (equals). For example, to set the block
size to 4 MiB, you must say --block-size=4M or -B=4M. -B4M will be interpreted as
the three arguments -B, -4, and -M; --block-size 4M will be interpreted as
--block-size with no argument and a copy source named 4M. This is so Charliecloud
can process rsync(1) options without knowing which ones take an argument.

3. Invalid rsync(1) options:

--daemon
Running rsync(1) in daemon mode does not make sense for container build.

-n, --dry-run
This makes the copy a no-op, and Charliecloud may want to use it internally
in the future.

--remove-source-files
This would let the instruction alter the context directory.

Note that there are likely other flags that don’t make sense and/or cause undesirable
behavior. We have not characterized this problem.
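
The = delimiter rule makes it possible to separate options from paths without a table
of which options take arguments. A minimal Python sketch of that idea, with hypothetical
names split_args and check, and covering only the restrictions listed above:

```python
DISALLOWED = {"--daemon", "-n", "--dry-run", "--remove-source-files"}

def split_args(args):
    """Return (options, paths) without knowing which options take values."""
    options, paths = [], []
    for arg in args:
        if arg.startswith("--") or (arg.startswith("-") and "=" in arg):
            options.append(arg)                         # --foo, --foo=bar, -B=4M
        elif arg.startswith("-") and len(arg) > 1:
            options.extend("-" + ch for ch in arg[1:])  # -B4M -> -B -4 -M
        else:
            paths.append(arg)                           # source or destination
    return options, paths

def check(args):
    """Reject the disallowed options and transports; return (options, paths)."""
    options, paths = split_args(args)
    for opt in options:
        if opt.split("=", 1)[0] in DISALLOWED:
            raise ValueError("disallowed option: " + opt)
    for path in paths:
        if path.startswith(("rsync:", "ssh:")):
            raise ValueError("disallowed transport: " + path)
    return options, paths

print(split_args(["--block-size=4M", "src", "dst"]))
print(split_args(["--block-size", "4M", "dst"]))
print(split_args(["-B4M", "src", "dst"]))
```

The second call shows why the space-separated form is a mistake: 4M ends up among the
paths and would be treated as a copy source.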

Build cache
The instruction is a cache hit if the metadata of all source files is unchanged
(specifically: filename, file type and permissions, xattrs, size, and last modified time).
Unlike Docker, Charliecloud does not use file contents. This has two implications. First,
it is possible to fool the cache by manually restoring the last-modified time. Second,
RSYNC is I/O-intensive even when it hits, because it must stat(2) every source file before
checking the cache. However, this is still less I/O than reading the file content too.

Notably, Charliecloud’s cache ignores rsync(1)’s own internal notion of whether anything
would be transferred (e.g., rsync -ni). This may change in the future.
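
A metadata-only cache key, and the way it can be fooled, can be sketched in Python. This
is not Charliecloud's implementation: the hashing scheme and function names are
assumptions, and only xattr names (not values) are recorded to keep the example short.

```python
import hashlib
import os
import tempfile

def metadata_key(paths):
    """Hash the metadata, never the contents, of the given source files."""
    digest = hashlib.sha256()
    for path in sorted(paths):
        st = os.lstat(path)
        try:
            xattrs = sorted(os.listxattr(path, follow_symlinks=False))
        except (AttributeError, OSError):
            xattrs = []  # no xattr support on this platform or file system
        # st_mode encodes both file type and permissions.
        record = (path, st.st_mode, xattrs, st.st_size, st.st_mtime_ns)
        digest.update(repr(record).encode())
    return digest.hexdigest()

def rewrite_keeping_mtime(path, data):
    """Replace file contents, then restore the previous timestamps."""
    st = os.lstat(path)
    with open(path, "wb") as f:
        f.write(data)
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file-basic1")
    with open(path, "wb") as f:
        f.write(b"hello world\n")
    before = metadata_key([path])
    rewrite_keeping_mtime(path, b"HELLO WORLD\n")    # same size, new content
    print("hit after same-size edit:", metadata_key([path]) == before)
    rewrite_keeping_mtime(path, b"a longer file\n")  # size differs
    print("hit after size change:", metadata_key([path]) == before)
```

The same-size edit with a restored timestamp still counts as a hit, while the size
change is detected even though the timestamp was restored.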

Examples and tutorial
All of these examples use the same input, whose content will be introduced gradually,
using edited output of ls -oghR (which is like ls -lhR but omits user and group). Examples
assume a umask of 0007. The Dockerfile instructions listed also assume a preceding:

FROM alpine:3.17
RUN mkdir /dst

i.e., a simple base image containing a top-level directory dst.

Many additional examples are available in the source code in the file
test/build/50_rsync.bats.

We begin by copying regular files. The context directory ctx contains, in part, two
directories containing one regular file each. Note that one of these files (file-basic1)
and one of the directories (basic1) have strange permissions.

./ctx:
drwx---r-x 2 60 Oct 11 13:20 basic1
drwxrwx--- 2 60 Oct 11 13:20 basic2

./ctx/basic1:
-rw----r-- 1 12 Oct 11 13:20 file-basic1

./ctx/basic2:
-rw-rw---- 1 12 Oct 11 13:20 file-basic2

The simplest form of RSYNC is to copy a single file into a specified directory:

RSYNC /basic1/file-basic1 /dst

resulting in:

$ ls -oghR dst
dst:
-rw----r-- 1 12 Oct 11 13:26 file-basic1

Note that file-basic1’s metadata — here its odd permissions — are preserved. 1 is the
number of hard links to the file, and 12 is the file size.

One can also rename the destination by specifying a new file name, and with +z, not copy
metadata (from here on the ls command is omitted for brevity):

RSYNC +z /basic1/file-basic1 /dst/file-basic1_nom

dst:
-rw------- 1 12 Sep 21 15:51 file-basic1_nom

A trailing slash on the destination creates a new directory and places the source file
within:

RSYNC /basic1/file-basic1 /dst/new/

dst:
drwxrwx--- 1 22 Oct 11 13:26 new

dst/new:
-rw----r-- 1 12 Oct 11 13:26 file-basic1

With multiple source files, the destination trailing slash is optional:

RSYNC /basic1/file-basic1 /basic2/file-basic2 /dst/newB

dst:
drwxrwx--- 1 44 Oct 11 13:26 newB

dst/newB:
-rw----r-- 1 12 Oct 11 13:26 file-basic1
-rw-rw---- 1 12 Oct 11 13:26 file-basic2

For directory sources, the presence or absence of a trailing slash is highly significant.
Without one, the directory itself is placed in the destination (recall that this would
rename a source file):

RSYNC /basic1 /dst/basic1_new

dst:
drwxrwx--- 1 12 Oct 11 13:28 basic1_new

dst/basic1_new:
drwx---r-x 1 22 Oct 11 13:28 basic1

dst/basic1_new/basic1:
-rw----r-- 1 12 Oct 11 13:28 file-basic1

A source trailing slash means copy the contents of a directory rather than the directory
itself. Importantly, however, the directory’s metadata is copied to the destination
directory.

RSYNC /basic1/ /dst/basic1_renamed

dst:
drwx---r-x 1 22 Oct 11 13:28 basic1_renamed

dst/basic1_renamed:
-rw----r-- 1 12 Oct 11 13:28 file-basic1

One gotcha is that RSYNC +z is a no-op if the source is a directory:

RSYNC +z /basic1 /dst/basic1_newC

dst:

At least -r is needed with +z in this case:

RSYNC +z -r /basic1/ /dst/basic1_newD

dst:
drwx------ 1 22 Oct 11 13:28 basic1_newD

dst/basic1_newD:
-rw------- 1 12 Oct 11 13:28 file-basic1

Multiple source directories can be specified, including with wildcards. This example also
illustrates that copied files are by default merged with content already existing in the
image.

RUN mkdir /dst/dstC && echo file-dstC > /dst/dstC/file-dstC
RSYNC /basic* /dst/dstC

dst:
drwxrwx--- 1 42 Oct 11 13:33 dstC

dst/dstC:
drwx---r-x 1 22 Oct 11 13:33 basic1
drwxrwx--- 1 22 Oct 11 13:33 basic2
-rw-rw---- 1 10 Oct 11 13:33 file-dstC

dst/dstC/basic1:
-rw----r-- 1 12 Oct 11 13:33 file-basic1

dst/dstC/basic2:
-rw-rw---- 1 12 Oct 11 13:33 file-basic2

Trailing slashes can be specified independently for each source:

RUN mkdir /dst/dstF && echo file-dstF > /dst/dstF/file-dstF
RSYNC /basic1 /basic2/ /dst/dstF

dst:
drwxrwx--- 1 52 Oct 11 13:33 dstF

dst/dstF:
drwx---r-x 1 22 Oct 11 13:33 basic1
-rw-rw---- 1 12 Oct 11 13:33 file-basic2
-rw-rw---- 1 10 Oct 11 13:33 file-dstF

dst/dstF/basic1:
-rw----r-- 1 12 Oct 11 13:33 file-basic1

Bare / (i.e., the entire context directory) is considered to have a trailing slash:

RSYNC / /dst

dst:
drwx---r-x 1 22 Oct 11 13:33 basic1
drwxrwx--- 1 22 Oct 11 13:33 basic2

dst/basic1:
-rw----r-- 1 12 Oct 11 13:33 file-basic1

dst/basic2:
-rw-rw---- 1 12 Oct 11 13:33 file-basic2

To replace (rather than merge with) existing content, use --delete. Note also that
wildcards can be combined with trailing slashes and that the directory gets the metadata
of the first slashed directory.

RUN mkdir /dst/dstG && echo file-dstG > /dst/dstG/file-dstG
RSYNC --delete /basic*/ /dst/dstG

dst:
drwx---r-x 1 44 Oct 11 14:00 dstG

dst/dstG:
-rw----r-- 1 12 Oct 11 14:00 file-basic1
-rw-rw---- 1 12 Oct 11 14:00 file-basic2

Symbolic links in the source(s) add significant complexity. Like rsync(1), RSYNC can do
one of three things with a given symlink:

1. Ignore it, silently or with a warning.

2. Preserve it: copy as a symlink, with the same target.

3. Dereference it: copy the target instead.

These actions are selected independently for safe symlinks and unsafe symlinks. Safe
symlinks are those which point to a target within the top-of-transfer, which is the
deepest directory in the source path with a trailing slash. For example, /foo/bar’s
top-of-transfer is /foo (regardless of whether bar is a directory or file), while
/foo/bar/’s top-of-transfer is /foo/bar.

For the symlink examples, the context contains two sub-directories with a variety of
symlinks, as well as a sibling file and directory outside the context. All of these links
are valid on the host. In this listing, the absolute path to the parent of the context
directory is replaced with /....

.:
drwxrwx--- 9 200 Oct 11 14:00 ctx
drwxrwx--- 2 60 Oct 11 14:00 dir-out
-rw-rw---- 1 9 Oct 11 14:00 file-out

./ctx:
drwxrwx--- 3 320 Oct 11 14:00 sym1

./ctx/sym1:
lrwxrwxrwx 1 13 Oct 11 14:00 dir-out_rel -> ../../dir-out
drwxrwx--- 2 60 Oct 11 14:00 dir-sym1
lrwxrwxrwx 1 8 Oct 11 14:00 dir-sym1_direct -> dir-sym1
lrwxrwxrwx 1 10 Oct 11 14:00 dir-top_rel -> ../dir-top
lrwxrwxrwx 1 47 Oct 11 14:00 file-out_abs -> /.../file-out
lrwxrwxrwx 1 14 Oct 11 14:00 file-out_rel -> ../../file-out
-rw-rw---- 1 10 Oct 11 14:00 file-sym1
lrwxrwxrwx 1 57 Oct 11 14:00 file-sym1_abs -> /.../ctx/sym1/file-sym1
lrwxrwxrwx 1 9 Oct 11 14:00 file-sym1_direct -> file-sym1
lrwxrwxrwx 1 17 Oct 11 14:00 file-sym1_upover -> ../sym1/file-sym1
lrwxrwxrwx 1 51 Oct 11 14:00 file-top_abs -> /.../ctx/file-top
lrwxrwxrwx 1 11 Oct 11 14:00 file-top_rel -> ../file-top

./ctx/sym1/dir-sym1:
-rw-rw---- 1 14 Oct 11 14:00 dir-sym1.file

./dir-out:
-rw-rw---- 1 13 Oct 11 14:00 dir-out.file

By default, safe symlinks are preserved while unsafe symlinks are silently ignored:

RSYNC /sym1 /dst

dst:
drwxrwx--- 1 206 Oct 11 17:10 sym1

dst/sym1:
drwxrwx--- 1 26 Oct 11 17:10 dir-sym1
lrwxrwxrwx 1 8 Oct 11 17:10 dir-sym1_direct -> dir-sym1
lrwxrwxrwx 1 10 Oct 11 17:10 dir-top_rel -> ../dir-top
-rw-rw---- 1 10 Oct 11 17:10 file-sym1
lrwxrwxrwx 1 9 Oct 11 17:10 file-sym1_direct -> file-sym1
lrwxrwxrwx 1 17 Oct 11 17:10 file-sym1_upover -> ../sym1/file-sym1
lrwxrwxrwx 1 17 Oct 11 17:10 file-sym2_upover -> ../sym2/file-sym2
lrwxrwxrwx 1 11 Oct 11 17:10 file-top_rel -> ../file-top

dst/sym1/dir-sym1:
-rw-rw---- 1 14 Oct 11 17:10 dir-sym1.file

The source files have four rough fates:

1. Regular files and directories (file-sym1 and dir-sym1). These are copied into the
image unchanged, including metadata.

2. Safe symlinks, now broken. This is one of the gotchas of RSYNC’s top-of-transfer
directory (here host path ./ctx, image path /) differing from the source directory
(./ctx/sym1, /sym1), because the latter lacks a trailing slash. dir-top_rel,
file-sym2_upover, and file-top_rel all ascend only as high as ./ctx (host path, /
image) before re-descending. This is within the top-of-transfer, so the symlinks are
safe and thus copied unchanged, but their targets were not included in the copy.

3. Safe symlinks, still valid.

1. dir-sym1_direct and file-sym1_direct point directly to files in the same directory.

2. dir-sym1_upover and file-sym1_upover point to files in the same directory, but by
first ascending into their parent — within the top-of-transfer, so they are safe —
and then re-descending. If sym1 were renamed during the copy, these links would
break.

4. Unsafe symlinks, which are ignored by the copy and do not appear in the image.

1. Absolute symlinks are always unsafe (*_abs).

2. dir-out_rel and file-out_rel are relative symlinks that ascend above the
top-of-transfer, in this case to targets outside the context, and are thus unsafe.
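
The classification in the list above can be reproduced with a short Python sketch. It
follows the rule as described in this section (absolute targets are unsafe; relative
targets are unsafe if they ever ascend above the top-of-transfer); rsync(1) itself may
handle further corner cases, and the function names are hypothetical. Paths are written
relative to the context directory.

```python
import posixpath

def top_of_transfer(source):
    """Deepest directory in the source path that has a trailing slash."""
    if source.endswith("/"):
        return posixpath.normpath(source)     # "/foo/bar/" -> "/foo/bar"
    return posixpath.dirname(source) or "/"   # "/foo/bar"  -> "/foo"

def is_safe_symlink(link_path, target, top):
    """True if the link's target never leaves the top-of-transfer."""
    if target.startswith("/"):
        return False                          # absolute links are always unsafe
    rel = posixpath.relpath(posixpath.dirname(link_path), top)
    depth = 0 if rel == "." else len(rel.split("/"))
    for part in target.split("/"):
        if part in ("", "."):
            continue
        if part == "..":
            depth -= 1
            if depth < 0:                     # ascended above top-of-transfer
                return False
        else:
            depth += 1
    return True

links = {
    "/sym1/file-sym1_direct": "file-sym1",
    "/sym1/file-sym1_upover": "../sym1/file-sym1",
    "/sym1/file-top_rel": "../file-top",
    "/sym1/file-out_rel": "../../file-out",
}
for source in ("/sym1", "/sym1/"):
    top = top_of_transfer(source)
    for link, target in links.items():
        print(source, link, is_safe_symlink(link, target, top))
```

Comparing the two sources shows how a trailing slash changes the verdict for the
upover and top_rel links.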

The top-of-transfer can be changed to sym1 with a trailing slash. This also adds sym1 to
the destination so the resulting directory structure is the same.

RSYNC /sym1/ /dst/sym1

dst:
drwxrwx--- 1 96 Oct 11 17:10 sym1

dst/sym1:
drwxrwx--- 1 26 Oct 11 17:10 dir-sym1
lrwxrwxrwx 1 8 Oct 11 17:10 dir-sym1_direct -> dir-sym1
-rw-rw---- 1 10 Oct 11 17:10 file-sym1
lrwxrwxrwx 1 9 Oct 11 17:10 file-sym1_direct -> file-sym1

dst/sym1/dir-sym1:
-rw-rw---- 1 14 Oct 11 17:10 dir-sym1.file

*_upover and *-top_rel are now unsafe and are therefore silently skipped.

Another common use case is to follow unsafe symlinks and copy their targets in place of
the links. This is accomplished with +u:

RSYNC +u /sym1/ /dst/sym1

dst:
drwxrwx--- 1 352 Oct 11 17:10 sym1

dst/sym1:
drwxrwx--- 1 24 Oct 11 17:10 dir-out_rel
drwxrwx--- 1 26 Oct 11 17:10 dir-sym1
lrwxrwxrwx 1 8 Oct 11 17:10 dir-sym1_direct -> dir-sym1
drwxrwx--- 1 24 Oct 11 17:10 dir-top_rel
-rw-rw---- 1 9 Oct 11 17:10 file-out_abs
-rw-rw---- 1 9 Oct 11 17:10 file-out_rel
-rw-rw---- 1 10 Oct 11 17:10 file-sym1
-rw-rw---- 1 10 Oct 11 17:10 file-sym1_abs
lrwxrwxrwx 1 9 Oct 11 17:10 file-sym1_direct -> file-sym1
-rw-rw---- 1 10 Oct 11 17:10 file-sym1_upover
-rw-rw---- 1 10 Oct 11 17:10 file-sym2_abs
-rw-rw---- 1 10 Oct 11 17:10 file-sym2_upover
-rw-rw---- 1 9 Oct 11 17:10 file-top_abs
-rw-rw---- 1 9 Oct 11 17:10 file-top_rel

dst/sym1/dir-out_rel:
-rw-rw---- 1 13 Oct 11 17:10 dir-out.file

dst/sym1/dir-sym1:
-rw-rw---- 1 14 Oct 11 17:10 dir-sym1.file

dst/sym1/dir-top_rel:
-rw-rw---- 1 13 Oct 11 17:10 dir-top.file

Now all the unsafe symlinks noted above are present in the image, but they have changed to
the normal files and directories pointed to.

WARNING:
This feature lets you copy files outside the context into the image, unlike other
container builders where COPY can never access anything outside the context.

The sources themselves, if symlinks, do not get special treatment:

RSYNC /sym1/file-sym1_direct /sym1/file-sym1_upover /dst

dst:
lrwxrwxrwx 1 9 Oct 11 17:10 file-sym1_direct -> file-sym1

Note that file-sym1_upover does not appear in the image, despite being named explicitly in
the instruction, because it is an unsafe symlink.

If the destination is a symlink to a file, and the source is a file, the link is replaced
and the target is unchanged. (If the source is a directory, that is an error.)

RUN touch /dst/file-dst && ln -s file-dst /dst/file-dst_direct
RSYNC /file-top /dst/file-dst_direct

dst:
-rw-rw---- 1 0 Oct 11 17:42 file-dst
-rw-rw---- 1 9 Oct 11 17:42 file-dst_direct

If the destination is a symlink to a directory, the link is followed:

RUN mkdir /dst/dir-dst && ln -s dir-dst /dst/dir-dst_direct
RSYNC /file-top /dst/dir-dst_direct

dst:
drwxrwx--- 1 16 Oct 11 17:50 dir-dst
lrwxrwxrwx 1 7 Oct 11 17:50 dir-dst_direct -> dir-dst

dst/dir-dst:
-rw-rw---- 1 9 Oct 11 17:50 file-top

Examples
Build image bar using ./foo/bar/Dockerfile and context directory ./foo/bar:

$ ch-image build -t bar -f ./foo/bar/Dockerfile ./foo/bar
[...]
grown in 4 instructions: bar

Same, but infer the image name and Dockerfile from the context directory path:

$ ch-image build ./foo/bar
[...]
grown in 4 instructions: bar

Build using humongous vendor compilers you want to bind-mount instead of installing into
the image:

$ ch-image build --bind /opt/bigvendor:/opt .
$ cat Dockerfile
FROM centos:7

RUN /opt/bin/cc hello.c
#COPY /opt/lib/*.so /usr/local/lib # fail: COPY doesn’t bind mount
RUN cp /opt/lib/*.so /usr/local/lib # possible workaround
RUN ldconfig

BUILD-CACHE

          $ ch-image [...] build-cache [...]

       Print  basic  information  about the cache. If -v is given, also print some Git statistics
       and the Git repository configuration.

       If any of the following options are given, do the corresponding operation before printing.
       Multiple options can be given, in which case they happen in this order.

          --dot  Create  a  DOT  export  of  the tree named ./build-cache.dot and a PDF rendering
                 ./build-cache.pdf. Requires graphviz and git2dot.

          --gc   Run Git garbage collection  on  the  cache,  including  full  de-duplication  of
                 similar  files.  This  will  immediately  remove all cache entries not currently
                 reachable from a named branch (which is likely to cause corruption if the  build
                 cache is being accessed concurrently by another process). The operation can take
                 a long time on large caches.

          --reset
                 Clear and re-initialize the build cache.

          --tree Print a text tree of the cache using Git’s git log --graph  feature.  If  -v  is
                 also given, the tree has more detail.

DELETE

          $ ch-image [...] delete IMAGE_GLOB [IMAGE_GLOB ... ]

       Delete the image(s) described by each IMAGE_GLOB from the storage directory (including all
       build stages).

       IMAGE_GLOB can be either  a  plain  image  reference  or  an  image  reference  with  glob
       characters  to  match multiple images. For example, ch-image delete 'foo*' will delete all
       images whose names start with foo.  Multiple images and/or globs can also be  given  in  a
       single command line.

       Importantly,  this  sub-command  does  not  also  remove  the  image from the build cache.
       Therefore, it can be used to reduce the size of the storage  directory,  trading  off  the
       time needed to retrieve an image from cache.

       WARNING:
          Glob  characters  must  be  quoted  or  otherwise  protected from the shell, which also
          desires to interpret them and will do so incorrectly.
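
       The effect of a glob on a set of image names can be previewed with a few lines of
       Python. This sketch assumes shell-style matching as provided by the standard
       fnmatch module; how ch-image matches internally is not specified here, and the
       image names are made up.

```python
import fnmatch

def select_images(image_names, image_globs):
    """Return the names matched by at least one glob, in input order."""
    selected = []
    for name in image_names:
        if any(fnmatch.fnmatchcase(name, g) for g in image_globs):
            selected.append(name)
    return selected

print(select_images(["foo", "foo:v2", "bar"], ["foo*"]))
```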

GESTALT

          $ ch-image [...] gestalt [SELECTOR]

       Provide information about the configuration and available features of ch-image. End  users
       generally will not need this; it is intended for testing and debugging.

       SELECTOR is one of:

          • bucache.  Exit  successfully  if the build cache is available, unsuccessfully with an
            error message otherwise. With -v, also print version information about dependencies.

          • bucache-dot.  Exit  successfully  if  build  cache  DOT   trees   can   be   written,
            unsuccessfully  with  an  error  message  otherwise.  With  -v,  also  print  version
            information about dependencies.

          • python-path. Print the path to the Python interpreter in use and exit successfully.

          • storage-path. Print the storage directory path and exit successfully.

LIST

Print information about images. If no argument is given, list the images in builder storage.

Synopsis
$ ch-image [...] list [-l] [IMAGE_REF]

Description
Optional arguments:

-l, --long
Use long format (name, last change timestamp) when listing images.

-u, --undeletable
List images that can be undeleted. Can also be spelled --undeleteable.

IMAGE_REF
Print details of what’s known about IMAGE_REF, both locally and in the remote
registry, if any.

Examples
List images in builder storage:

$ ch-image list
alpine:3.17 (amd64)
alpine:latest (amd64)
debian:buster (amd64)

Print details about Debian Buster image:

$ ch-image list debian:buster
details of image: debian:buster
in local storage: no
full remote ref: registry-1.docker.io:443/library/debian:buster
available remotely: yes
remote arch-aware: yes
host architecture: amd64
archs available: 386 bae2738ed83
amd64 98285d32477
arm/v7 97247fd4822
arm64/v8 122a0342878

For remotely available images like Debian Buster, the associated digest is listed beside
each available architecture. Importantly, this feature does not provide the hash of the
local image, which is only calculated on push.
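
       A script can check whether an image is in builder storage by scanning the plain
       listing. This is a minimal sketch, assuming the one-image-per-line "name (arch)"
       layout shown in the first example; image_in_storage is a hypothetical helper, and
       the sample listing is copied from that example rather than produced by ch-image.

```shell
# Hypothetical helper: read "ch-image list" output on standard input and
# succeed if the named image appears in the first column.
image_in_storage() {
    awk -v want="$1" '$1 == want { found = 1 } END { exit !found }'
}

# Sample listing copied from the example above; in real use, pipe in the
# output of "ch-image list" instead.
listing='alpine:3.17 (amd64)
alpine:latest (amd64)
debian:buster (amd64)'

if printf '%s\n' "$listing" | image_in_storage debian:buster; then
    echo "debian:buster is in builder storage"
fi
```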

IMPORT

          $ ch-image [...] import PATH IMAGE_REF

       Copy the image at PATH into builder storage with name IMAGE_REF. PATH can be:

       • an image directory

       • a tarball with no top-level directory (a.k.a. a “tarbomb”)

       • a standard tarball with one top-level directory

       If  the  imported  image  contains Charliecloud metadata, that will be imported unchanged,
       i.e., images exported from ch-image builder storage will be  functionally  identical  when
       re-imported.

       WARNING:
          Descendant images (i.e., FROM the imported IMAGE_REF) are linked using IMAGE_REF only.
          If a new image is imported under the same IMAGE_REF, all instructions descending from
          that IMAGE_REF will still hit, even if the new image is different.
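
       The two tarball layouts accepted for PATH differ only in whether the members sit
       under a single enclosing directory. The following is a minimal sketch that builds
       one of each from a toy directory tree; the directory and file names are made up for
       illustration, and top_level is a hypothetical helper that lists the distinct
       top-level entries of an archive.

```shell
# Toy image tree (hypothetical contents).
work=$(mktemp -d)
mkdir -p "$work/img/bin" "$work/img/etc"
: > "$work/img/etc/os-release"

# Standard tarball: everything lives under one top-level directory, "img".
tar -C "$work" -cf "$work/standard.tar" img

# Tarbomb: members are stored relative to the image root, with no
# enclosing directory.
tar -C "$work/img" -cf "$work/tarbomb.tar" bin etc

# Hypothetical helper: print the distinct top-level entries of a tarball.
top_level() {
    tar -tf "$1" | cut -d/ -f1 | sort -u
}

echo "standard: $(top_level "$work/standard.tar" | tr '\n' ' ')"
echo "tarbomb:  $(top_level "$work/tarbomb.tar" | tr '\n' ' ')"

# Either archive, or the directory "$work/img" itself, could then be
# given as PATH, e.g.:
#   ch-image import "$work/standard.tar" foo
```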

PULL

       Pull  the  image described by the image reference IMAGE_REF from a repository to the local
       filesystem.

   Synopsis
          $ ch-image [...] pull [...] IMAGE_REF [DEST_REF]

       See the FAQ for the gory details on specifying image references.

   Description
       Destination:

          DEST_REF
                 If  specified,  use  this  as  the  destination  image  reference,  rather  than
                 IMAGE_REF.  This  lets  you  pull  an  image  with a complicated reference while
                 storing it locally with a simpler one.

       Options:

          --last-layer N
                 Unpack only N layers, leaving an incomplete image. This option is  intended  for
                 debugging.

          --parse-only
                 Parse  IMAGE_REF, print a parse report, and exit successfully without talking to
                 the internet or touching the storage directory.

       This script does a fair amount of validation and  fixing  of  the  layer  tarballs  before
       flattening  in  order to support unprivileged use despite image problems we frequently see
       in the wild. For example, device files are ignored, and directory and file permissions are
       increased to a minimum of rwx------ and rw------- respectively. Note, however, that
       symlinks pointing outside the image are permitted, because they  are  not  resolved  until
       runtime within a container.
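
       To illustrate what increasing permissions to a minimum means, the sketch below
       applies the same idea with chmod to a scratch directory and file. It assumes that
       directories receive at least rwx------ and regular files at least rw-------; it is
       an analogy using standard tools, not the code ch-image runs.

```shell
scratch=$(mktemp -d)
mkdir "$scratch/dir"
touch "$scratch/file"

# Start with modes that give the owner no access at all.
chmod 0050 "$scratch/dir"
chmod 0044 "$scratch/file"

# Raise the owner bits to the minimum; group and other bits are untouched.
chmod u+rwx "$scratch/dir"
chmod u+rw  "$scratch/file"

# Characters 1-10 of "ls -ld" are the file type and permission string.
dir_mode=$(ls -ld "$scratch/dir" | cut -c1-10)
file_mode=$(ls -ld "$scratch/file" | cut -c1-10)
echo "directory: $dir_mode"
echo "file:      $file_mode"
```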

       The  following  metadata  in the pulled image is retained; all other metadata is currently
       ignored. (If you have a need for additional metadata, please let us know!)

          • Current working directory set with WORKDIR is effective in downstream Dockerfiles.

          • Environment variables set with ENV are effective in downstream Dockerfiles  and  also
            written to /ch/environment for use in ch-run --set-env.

          • Mount  point directories specified with VOLUME are created in the image if they don’t
            exist, but no other action is taken.

       Note that some images (e.g., those with a “version 1 manifest”) do not contain metadata. A
       warning is printed in this case.

   Examples
       Download  the  Debian  Buster  image  matching the host’s architecture and place it in the
       storage directory:

          $ uname -m
          aarch64
          $ ch-image pull debian:buster
          pulling image:    debian:buster
          requesting arch:  arm64/v8
          manifest list: downloading
          manifest: downloading
          config: downloading
          layer 1/1: c54d940: downloading
          flattening image
          layer 1/1: c54d940: listing
          validating tarball members
          resolving whiteouts
          layer 1/1: c54d940: extracting
          image arch: arm64
          done

       Same, specifying the architecture explicitly:

          $ ch-image --arch=arm/v7 pull debian:buster
          pulling image:    debian:buster
          requesting arch:  arm/v7
          manifest list: downloading
          manifest: downloading
          config: downloading
          layer 1/1: 8947560: downloading
          flattening image
          layer 1/1: 8947560: listing
          validating tarball members
          resolving whiteouts
          layer 1/1: 8947560: extracting
          image arch: arm (may not match host arm64/v8)

PUSH

       Push the image described by the image reference IMAGE_REF from the local filesystem  to  a
       repository.

   Synopsis
          $ ch-image [...] push [--image DIR] IMAGE_REF [DEST_REF]

       See the FAQ for the gory details on specifying image references.

   Description
       Destination:

          DEST_REF
                 If  specified,  use  this  as  the  destination  image  reference,  rather  than
                 IMAGE_REF. This lets you push to a repository without permanently adding  a  tag
                 to the image.

       Options:

          --image DIR
                 Use  the  unpacked  image  located  at  DIR  rather than an image in the storage
                 directory named IMAGE_REF.

       Because Charliecloud is fully unprivileged, the owner and group of files in its images are
       not  meaningful  in  the  broader ecosystem. Thus, when pushed, everything in the image is
       flattened to  user:group  root:root.  Also,  setuid/setgid  bits  are  removed,  to  avoid
       surprises if the image is pulled by a privileged container implementation.

   Examples
       Push a local image to the registry example.com:5000 at path /foo/bar with tag latest. Note
       that in this form, the local image must be named to match that remote reference.

          $ ch-image push example.com:5000/foo/bar:latest
          pushing image:   example.com:5000/foo/bar:latest
          layer 1/1: gathering
          layer 1/1: preparing
          preparing metadata
          starting upload
          layer 1/1: a1664c4: checking if already in repository
          layer 1/1: a1664c4: not present, uploading
          config: 89315a2: checking if already in repository
          config: 89315a2: not present, uploading
          manifest: uploading
          cleaning up
          done

       Same, except use local image alpine:3.17. In this form, the local image name does not have
       to match the destination reference.

          $ ch-image push alpine:3.17 example.com:5000/foo/bar:latest
          pushing image:   alpine:3.17
          destination:     example.com:5000/foo/bar:latest
          layer 1/1: gathering
          layer 1/1: preparing
          preparing metadata
          starting upload
          layer 1/1: a1664c4: checking if already in repository
          layer 1/1: a1664c4: not present, uploading
          config: 89315a2: checking if already in repository
          config: 89315a2: not present, uploading
          manifest: uploading
          cleaning up
          done

       Same, except use unpacked image located at /var/tmp/image rather than an image in ch-image
       storage. (Also, the sole layer is already present in the  remote  registry,  so  we  don’t
       upload it again.)

          $ ch-image push --image /var/tmp/image example.com:5000/foo/bar:latest
          pushing image:   example.com:5000/foo/bar:latest
          image path:      /var/tmp/image
          layer 1/1: gathering
          layer 1/1: preparing
          preparing metadata
          starting upload
          layer 1/1: 892e38d: checking if already in repository
          layer 1/1: 892e38d: already present
          config: 546f447: checking if already in repository
          config: 546f447: not present, uploading
          manifest: uploading
          cleaning up
          done

RESET

          $ ch-image [...] reset

       Delete all images and cache from ch-image builder storage.

UNDELETE

          $ ch-image [...] undelete IMAGE_REF

       If IMAGE_REF has been deleted but is in the build cache, recover it from the cache. This
       sub-command is only available when the cache is enabled, and it will not overwrite
       IMAGE_REF if it exists.

ENVIRONMENT VARIABLES

       CH_IMAGE_USERNAME, CH_IMAGE_PASSWORD
              Username and password for registry authentication. See important caveats in section
              “Authentication” above.

       CH_LOG_FILE
              If set, append log chatter to this file, rather than standard error. This is useful
              for debugging situations where standard error is consumed or lost.

              Also sets verbose mode if not already set (equivalent to --verbose).

       CH_LOG_FESTOON
              If set, prepend PID and timestamp to logged chatter.

       CH_XATTRS
              If set, save xattrs in the build cache and restore them when  rebuilding  from  the
              cache (equivalent to --xattrs).
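
       As a brief illustration of the logging variables, the sketch below sets them for the
       current shell session before invoking ch-image. The log path and the value "yes" are
       arbitrary choices for the example; the text above only requires that the variables
       be set.

```shell
# Send log chatter to a file instead of standard error, and prefix each
# logged line with PID and timestamp. The path is an arbitrary example.
export CH_LOG_FILE="${TMPDIR:-/tmp}/ch-image.log"
export CH_LOG_FESTOON=yes

# Any sub-command run from this shell now logs to the file. The call is
# guarded so the snippet is harmless where ch-image is not installed.
if command -v ch-image >/dev/null 2>&1; then
    ch-image gestalt storage-path
fi

echo "logging to: $CH_LOG_FILE"
```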

REPORTING BUGS

       If  Charliecloud  was  obtained  from your Linux distribution, use your distribution’s bug
       reporting procedures.

       Otherwise, report bugs to: https://github.com/hpc/charliecloud/issues

COPYRIGHT

       2014–2023, Triad National Security, LLC and others
