Provided by: qemu-system-common_7.2+dfsg-5ubuntu2_amd64

NAME

       virtiofsd - QEMU virtio-fs shared file system daemon

SYNOPSIS

       virtiofsd [OPTIONS]

DESCRIPTION

       Share  a  host  directory tree with a guest through a virtio-fs device.  This program is a
       vhost-user backend that implements the virtio-fs device.  Each virtio-fs  device  instance
       requires its own virtiofsd process.

       This  program  is  designed to work with QEMU's --device vhost-user-fs-pci but should work
       with any virtual machine monitor (VMM) that supports vhost-user.  See the Examples section
       below.

       This  program  must  be run as the root user.  The program drops privileges where possible
       during startup although it must be able to create and access files with any uid/gid:

       • The ability to invoke syscalls is limited using seccomp(2).

       • Linux capabilities(7) are dropped.

       In "namespace" sandbox mode the program switches into a new file system (mount) namespace
       and invokes pivot_root(2) to make the shared directory tree its root.  New pid and net
       namespaces are also created to isolate the process.

       In "chroot" sandbox mode the program invokes chroot(2) to make the shared  directory  tree
       its root. This mode is intended for container environments where the container runtime has
       already set up the  namespaces  and  the  program  does  not  have  permission  to  create
       namespaces itself.

       Both  sandbox  modes  prevent  "file system escapes" due to symlinks and other file system
       objects that might lead to files outside the shared directory.
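
       As a rough sketch of choosing between the two sandbox modes, the snippet below builds a
       virtiofsd command line for each case.  The socket path, the share path, and the
       sandbox_cmd helper are hypothetical names used only for illustration.  The commands are
       printed rather than executed, since the daemon itself must be started as root.

```shell
#!/bin/sh
# Hypothetical socket and share paths, chosen only for illustration.
sock=/run/virtiofsd-demo.sock
share=/srv/share

# sandbox_cmd IN_CONTAINER
# Prints a virtiofsd command line. Pass "yes" when the daemon will run inside
# a container that cannot create namespaces; anything else keeps the default.
sandbox_cmd() {
    if [ "$1" = yes ]; then
        mode=chroot
    else
        mode=namespace
    fi
    printf '%s\n' "virtiofsd --socket-path=$sock -o source=$share -o sandbox=$mode"
}

sandbox_cmd no    # command line for a plain host
sandbox_cmd yes   # command line for a container runtime
```

       Passing -o sandbox=namespace explicitly is redundant because it is the default; it is
       spelled out here only to make the contrast visible.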

OPTIONS

       -h, --help
              Print help.

       -V, --version
              Print version.

       -d     Enable debug output.

       --syslog
              Print log messages to syslog instead of stderr.

       -o OPTION

              • debug - Enable debug output.

              • flock|no_flock - Enable/disable flock.  The default is no_flock.

              • modcaps=CAPLIST - Modify the list of capabilities allowed; CAPLIST is a
                colon-separated list of capabilities, each preceded by either + or -, e.g.
                "+sys_admin:-chown".

              • log_level=LEVEL - Print only log messages matching LEVEL or more  severe.   LEVEL
                is one of err, warn, info, or debug.  The default is info.

              • posix_lock|no_posix_lock  -  Enable/disable  remote  POSIX locks.  The default is
                no_posix_lock.

              • readdirplus|no_readdirplus  -  Enable/disable  readdirplus.    The   default   is
                readdirplus.

              • sandbox=namespace|chroot - Sandbox mode.  The default is "namespace".

                - namespace: Create mount, pid, and net namespaces and pivot_root(2) into the
                  shared directory.

                - chroot: chroot(2) into the shared directory (use in containers).

              • source=PATH  -  Share  host  directory  tree  located  at  PATH.   This option is
                required.

              • timeout=TIMEOUT - I/O timeout in seconds.  The default depends on the cache=
                option.

              • writeback|no_writeback - Enable/disable writeback cache.  The  cache  allows  the
                FUSE client to buffer and merge write requests.  The default is no_writeback.

              • xattr|no_xattr   -  Enable/disable  extended  attributes  (xattr)  on  files  and
                directories.  The default is no_xattr.

              • posix_acl|no_posix_acl - Enable/disable POSIX ACL support.  POSIX ACLs are
                disabled by default.

              • security_label|no_security_label - Enable/disable security label support.
                Security labels are disabled by default.  When enabled, the client can send a
                MAC label for a file during file creation; typically this is an SELinux
                security label.  The server tries to set that label on the newly created file
                atomically wherever possible.

              • killpriv_v2|no_killpriv_v2 - Enable/disable FUSE_HANDLE_KILLPRIV_V2 support.
                KILLPRIV_V2 is enabled by default as long as the client supports it.  Enabling
                this option helps with performance in the write path.

       --socket-path=PATH
              Listen on vhost-user UNIX domain socket at PATH.

       --socket-group=GROUP
              Set the vhost-user UNIX domain socket gid to GROUP.

       --fd=FDNUM
              Accept  connections  from vhost-user UNIX domain socket file descriptor FDNUM.  The
              file descriptor must already be listening for connections.

       --thread-pool-size=NUM
              Restrict the number of worker threads per request queue to NUM.  The default is 0.

       --cache=none|auto|always
              Select the desired trade-off between coherency and performance.  none  forbids  the
              FUSE  client  from  caching  to  achieve best coherency at the cost of performance.
              auto behaves similarly to NFS, with a 1 second metadata cache timeout.  always sets a
              long cache lifetime at the expense of coherency.  The default is auto.
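
       To illustrate how several of the options above combine, here is a minimal sketch that
       assembles one command line.  The paths and the group name "kvm" are hypothetical, and
       the sketch assumes that -o may be repeated once per sub-option.  The command is printed
       rather than executed.

```shell
#!/bin/sh
# Hypothetical values, chosen only for illustration.
sock=/run/virtiofsd-demo.sock
share=/srv/share
group=kvm

# Socket location and the group that should own it.
cmd="virtiofsd --socket-path=$sock --socket-group=$group"
# Favour coherency over speed and cap the worker threads per request queue.
cmd="$cmd --cache=none --thread-pool-size=4"
# Required source directory, plus xattr support and quieter logging.
cmd="$cmd -o source=$share -o xattr -o log_level=warn"

# Printed only; run the resulting command as root to start the daemon.
printf '%s\n' "$cmd"
```

       --cache=none is picked here on the assumption that the host also modifies the shared
       tree while the guest is running; auto or always trade some of that coherency for
       performance.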

EXTENDED ATTRIBUTE (XATTR) MAPPING

       By default the names of xattrs used by the client are passed through to the server file
       system.  This can be a problem where either those xattr names are used by something on the
       server (e.g. SELinux client/server confusion) or where virtiofsd is running in a
       container with restricted privileges where it cannot access some attributes.

   Mapping syntax
       A mapping of xattr names can be made using -o xattrmap=mapping where  the  mapping  string
       consists of a series of rules.

       The  first  matching  rule  terminates  the  mapping.   The  set  of  rules must include a
       terminating rule to match any remaining attributes at the end.

       Each rule consists of a number of fields separated with a  separator  that  is  the  first
       non-white  space  character  in  the rule.  This separator must then be used for the whole
       rule.  White space may be added before and after each rule.

       Using ':' as the separator a rule is of the form:

       :type:scope:key:prepend:

       scope is:

       • 'client' - match 'key' against an xattr name from the client for
         setxattr/getxattr/removexattr

       • 'server' - match 'prepend' against an xattr name from the server for listxattr

       • 'all' - can be used to make a single rule where both the server and client matches
         are triggered.

       type is one of:

       • 'prefix' - Prepends and strips a prefix; the modified attribute names are then passed
         on to the client/server.

       • 'ok' - Causes the rule set to be terminated when a match is found while allowing
         matching xattrs through unchanged.  It is intended both as a way of explicitly
         terminating the list of rules, and to allow some xattrs to skip following rules.

       • 'bad' - If a client tries to use a name matching 'key' it is denied using EPERM; when
         the server passes an attribute name matching 'prepend' it is hidden.  In many ways its
         use is very like 'ok', as either an explicit terminator or for special handling of
         certain patterns.

       • 'unsupported' - If a client tries to use a name matching 'key' it is denied using
         ENOTSUP; when the server passes an attribute name matching 'prepend' it is hidden.  In
         many ways its use is very like 'ok', as either an explicit terminator or for special
         handling of certain patterns.

       key is a string tested as a prefix on an attribute name originating on the client.  It
       may be empty, in which case a 'client' rule will always match on client names.

       prepend is a string tested as a prefix on an attribute name originating on the server, and
       used  as a new prefix.  It may be empty in which case a 'server' rule will always match on
       all names from the server.

       e.g.:
          :prefix:client:trusted.:user.virtiofs.:

          will match 'trusted.' attributes in client calls and prefix them before passing them to
          the server.

          :prefix:server::user.virtiofs.:

          will strip 'user.virtiofs.' from all server replies.

          :prefix:all:trusted.:user.virtiofs.:

          combines the previous two cases into a single rule.

          :ok:client:user.::

          will allow get/set xattr for 'user.' xattrs and ignore following rules.

          :ok:server::security.:

          will pass 'security.' xattrs in listxattr from the server and ignore following rules.

          :ok:all:::

          will terminate the rule search passing any remaining attributes in both directions.

          :bad:server::security.:

          would hide 'security.' xattrs in listxattr from the server.
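
       The way a rule breaks down into its fields can be sketched in plain shell.  This is
       only an illustration of the syntax described above, not the daemon's parser, and it
       assumes the rule has no leading white space.  The variable names lead and rest are
       introduced here for the sketch.

```shell
#!/bin/sh
rule=':prefix:client:trusted.:user.virtiofs.:'

# The separator is the first character of the rule.
sep=$(printf '%s' "$rule" | cut -c1)

# Split the rule on that separator. "lead" receives the empty string in front
# of the first separator; "rest" receives anything after the fifth field.
IFS=$sep read -r lead type scope key prepend rest <<EOF
$rule
EOF

printf 'type=%s scope=%s key=%s prepend=%s\n' "$type" "$scope" "$key" "$prepend"
```

       Because the separator is taken from the rule itself, the same sketch handles rules
       written with / or any other non-white-space separator, and an empty key or prepend
       simply comes out as an empty field.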

       A simpler 'map' type provides a shorter syntax for the common case:

       :map:key:prepend:

       The  'map'  type adds a number of separate rules to add prepend as a prefix to the matched
       key (or all attributes if key is empty).  There may be at most one 'map' rule and it  must
       be the last rule in the set.

       Note: When the 'security.capability' xattr is remapped, the daemon has to do extra work to
       remove it during many operations, which the host kernel normally does itself.

   Security considerations
       Operating systems  typically  partition  the  xattr  namespace  using  well  defined  name
       prefixes. Each partition may have different access controls applied. For example, on Linux
       there are multiple partitions:

          • system.* - access varies depending on attribute & filesystem

          • security.* - only processes with CAP_SYS_ADMIN

          • trusted.* - only processes with CAP_SYS_ADMIN

          • user.* - any process granted by file permissions / ownership

       Other operating systems, such as FreeBSD, have different name prefixes and access
       control rules.

       When remapping attributes on the host, it is important to ensure that the  remapping  does
       not allow a guest user to evade the guest access control rules.

       Consider what happens if trusted.* from the guest is remapped to user.virtiofs.trusted.* on the host.
       An unprivileged user in a Linux guest has the ability to write  to  xattrs  under  user.*.
       Thus  the user can evade the access control restriction on trusted.* by instead writing to
       user.virtiofs.trusted.*.
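
       The collision can be made concrete with a small simulation.  The to_host function below
       is a hypothetical stand-in for a rule set consisting of
       ':prefix:client:trusted.:user.virtiofs.:' followed by ':ok:all:::'; it only mimics the
       name translation and does not touch any real xattrs.

```shell
#!/bin/sh
# to_host NAME
# Mimics the client-to-host name mapping: names starting with "trusted." get
# "user.virtiofs." prepended, every other name passes through unchanged.
to_host() {
    case "$1" in
        trusted.*) printf '%s\n' "user.virtiofs.$1" ;;
        *)         printf '%s\n' "$1" ;;
    esac
}

# What a privileged guest process writes, and where it lands on the host.
a=$(to_host trusted.foo)
# What an unprivileged guest user can write directly under user.*.
b=$(to_host user.virtiofs.trusted.foo)

printf '%s\n%s\n' "$a" "$b"   # both lines print user.virtiofs.trusted.foo
```

       Both guest names resolve to the same host attribute, which is why direct client access
       to the target prefix has to be blocked when only a subset of prefixes is remapped.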

       As noted above, the partitions used and the access controls applied vary across guest
       operating systems, so it is not wise to try to predict what the guest OS will use.

       The  simplest  way to avoid an insecure configuration is to remap all xattrs at once, to a
       given fixed prefix.  This is shown in example (1) below.

       If selectively mapping only a subset of xattr  prefixes,  then  rules  must  be  added  to
       explicitly  block  direct  access to the target of the remapping. This is shown in example
       (2) below.

   Mapping examples
       1. Prefix all attributes with 'user.virtiofs.'

          -o xattrmap=":prefix:all::user.virtiofs.::bad:all:::"

       This uses two rules, using : as the field separator; the first rule  prefixes  and  strips
       'user.virtiofs.', the second rule hides any non-prefixed attributes that the host set.

       This is equivalent to the 'map' rule:

          -o xattrmap=":map::user.virtiofs.:"

       2. Prefix 'trusted.' attributes, allow others through

          "/prefix/all/trusted./user.virtiofs./
           /bad/server//trusted./
           /bad/client/user.virtiofs.//
           /ok/all///"

       Here there are four rules, using / as the field separator, and also demonstrating that new
       lines can be included between rules.  The first rule is the prefixing  of  'trusted.'  and
       stripping  of 'user.virtiofs.'.  The second rule hides unprefixed 'trusted.' attributes on
       the host.  The third rule stops a guest from explicitly setting the 'user.virtiofs.'  path
       directly  to  prevent access control bypass on the target of the earlier prefix remapping.
       Finally, the fourth rule lets all remaining attributes through.

       This is equivalent to the 'map' rule:

          -o xattrmap="/map/trusted./user.virtiofs./"

       3. Hide 'security.' attributes, and allow everything else

          "/bad/all/security./security./
           /ok/all///"

       The first rule combines what could be separate client and server rules into a single 'all'
       rule,  matching  'security.'  in  either client arguments or lists returned from the host.
       This stops the client seeing any 'security.' attributes on the server and stops it setting
       any.
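
       Examples 2 and 3 show only the rule string.  As a sketch of how such a string might be
       handed to the daemon, the snippet below writes the two rules of example 3 on one line
       and places them after -o xattrmap=.  The paths are hypothetical, and adding -o xattr is
       an assumption made here because extended attributes are disabled by default.  The
       command is printed rather than executed.

```shell
#!/bin/sh
# Hypothetical socket and share paths, chosen only for illustration.
sock=/run/virtiofsd-demo.sock
share=/srv/share

# The two rules of example 3, concatenated without the optional line break.
map='/bad/all/security./security.//ok/all///'

cmd="virtiofsd --socket-path=$sock -o source=$share -o xattr -o xattrmap=$map"

# Printed only; run the resulting command as root to start the daemon.
printf '%s\n' "$cmd"
```

       When the rule string contains white space or new lines, as in the multi-line form of
       the examples, it needs to be quoted so that the shell passes it as a single argument.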

SELINUX SUPPORT

       SELinux support can be enabled by running virtiofsd with the option "-o security_label".
       However, this tries to save the guest's security context in the security.selinux xattr on
       the host, which might fail if the host's SELinux policy does not permit virtiofsd to
       perform this operation.

       Hence, it is preferable to remap the guest's "security.selinux" xattr to, say,
       "trusted.virtiofs.security.selinux" on the host.

       "-o xattrmap=:map:security.selinux:trusted.virtiofs.:"

       This ensures that the guest's and the host's SELinux xattrs on the same file remain
       separate and do not interfere with each other.  It also allows both host and guest to
       implement their own separate SELinux policies.

       Setting a trusted xattr on the host requires CAP_SYS_ADMIN, so this capability must be
       added to the daemon.

       "-o modcaps=+sys_admin"

       Granting CAP_SYS_ADMIN increases the risk to the system.  virtiofsd becomes more powerful
       and, if it is compromised, it can do a lot of damage to the host system.  Keep this
       trade-off in mind when making a decision.
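
       Putting the pieces of this section together, a full invocation might look like the
       sketch below.  The socket and share paths are hypothetical, and including -o xattr is
       an assumption made here because extended attributes are disabled by default.  The
       command is printed rather than executed.

```shell
#!/bin/sh
# Hypothetical socket and share paths, chosen only for illustration.
sock=/run/virtiofsd-demo.sock
share=/srv/share

cmd="virtiofsd --socket-path=$sock -o source=$share"
# Enable xattrs (assumed prerequisite) and security label support.
cmd="$cmd -o xattr -o security_label"
# Keep the guest's SELinux labels under trusted.virtiofs. on the host.
cmd="$cmd -o xattrmap=:map:security.selinux:trusted.virtiofs.:"
# trusted.* xattrs need CAP_SYS_ADMIN, so add it back to the daemon.
cmd="$cmd -o modcaps=+sys_admin"

# Printed only; run the resulting command as root to start the daemon.
printf '%s\n' "$cmd"
```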

EXAMPLES

       Export /var/lib/fs/vm001/ on vhost-user UNIX domain socket /var/run/vm001-vhost-fs.sock:

          host# virtiofsd --socket-path=/var/run/vm001-vhost-fs.sock -o source=/var/lib/fs/vm001
          host# qemu-system-x86_64 \
                -chardev socket,id=char0,path=/var/run/vm001-vhost-fs.sock \
                -device vhost-user-fs-pci,chardev=char0,tag=myfs \
                -object memory-backend-memfd,id=mem,size=4G,share=on \
                -numa node,memdev=mem \
                ...
          guest# mount -t virtiofs myfs /mnt

AUTHOR

       Stefan Hajnoczi <stefanha@redhat.com>, Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>

COPYRIGHT

       2023, The QEMU Project Developers