Provided by: libfabric-dev_1.17.0-3ubuntu1_amd64

NAME

       fi_shm - The SHM Fabric Provider

OVERVIEW

       The  SHM  provider  is  a  complete  provider that can be used on Linux systems supporting
       shared memory and process_vm_readv/process_vm_writev calls.  The provider is  intended  to
       provide high-performance communication between processes on the same system.

SUPPORTED FEATURES

       This  release  contains  an  initial  implementation  of  the SHM provider that offers the
       following support:

       Endpoint types
              The provider supports only endpoint type FI_EP_RDM.

       Endpoint capabilities
              Endpoints can support any combination of the following data transfer  capabilities:
              FI_MSG,  FI_TAGGED,  FI_RMA,  and  FI_ATOMICS.   These  capabilities can be further
              defined by FI_SEND, FI_RECV, FI_READ, FI_WRITE, FI_REMOTE_READ, and FI_REMOTE_WRITE
              to limit the direction of operations.

       Modes  The provider does not require the use of any mode bits.

       Progress
              The  SHM  provider  supports FI_PROGRESS_MANUAL.  Receive side data buffers are not
              modified  outside  of  completion  processing  routines.   The  provider  processes
              messages  using  three  different  methods,  based on the size of the message.  For
              messages smaller than 4096 bytes, tx completions are  generated  immediately  after
              the  send.   For  larger  messages,  tx  completions  are  not  generated until the
              receiving side has processed the message.

       Address Format
              The SHM provider uses the address format FI_ADDR_STR,  which  follows  the  general
              format  pattern “[prefix]://[addr]”.  The application can provide addresses through
              the node or hints parameter.  As long as the address  is  in  a  valid  FI_ADDR_STR
              format  (contains “://”), the address will be used as is.  If the application input
              is incorrectly formatted or no input was provided, the SHM provider will resolve it
              according to the following SHM provider standards:

       (flags & FI_SOURCE) ? src_addr : dest_addr =
              - if (node && service) : “fi_ns://node:service”
              - if (service) : “fi_ns://service”
              - if (node && !service) : “fi_shm://node”
              - if (!node && !service) : “fi_shm://PID”

       !(flags & FI_SOURCE)
              - src_addr = “fi_shm://PID”
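
       The fallback rules above can be sketched as a small C function.  This is an illustration
       only: shm_default_addr() is a hypothetical helper, not a libfabric call, and it covers
       only input that is not already in FI_ADDR_STR format.  The pid argument stands in for
       the process ID the provider would use.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of libfabric): builds the default
 * FI_ADDR_STR address described by the rules above. `node` and `service`
 * may be NULL. Returns the value of snprintf(). */
int shm_default_addr(const char *node, const char *service, long pid,
                     char *out, size_t len)
{
    if (node && service)            /* node and service: unique name */
        return snprintf(out, len, "fi_ns://%s:%s", node, service);
    if (service)                    /* service alone is also unique */
        return snprintf(out, len, "fi_ns://%s", service);
    if (node)                       /* node alone is not unique */
        return snprintf(out, len, "fi_shm://%s", node);
    return snprintf(out, len, "fi_shm://%ld", pid); /* nothing given */
}
```

       For example, a node of "my_node" with no service would yield “fi_shm://my_node”, while
       adding a service of "my_service" would yield “fi_ns://my_node:my_service”.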

       In  other  words,  if  the  application provides a source and/or destination address in an
       acceptable FI_ADDR_STR format (contains “://”), the call to util_getinfo will successfully
       fill  in  src_addr  and  dest_addr  with  the  provided  input.   If  the  input is not in
       FI_ADDR_STR format, the shared memory provider will then create a proper FI_ADDR_STR address
       with  either  the  “fi_ns://”  (node/service  format) or “fi_shm://” (shm format) prefixes
       signaling whether the addr is a “unique” address and  does  or  does  not  need  an  extra
       endpoint  name  identifier  appended  in  order  to make it unique.  For the shared memory
       provider, we assume that the service (with or without a node) is enough to make it unique,
       but a node alone is not sufficient.  If only a node is provided, the “fi_shm://” prefix is
       used to signify that it is not a unique address.  If neither a node nor a service is
       provided (or when the src address is set without FI_SOURCE and without hints), the process
       ID will be used as a default address.  On endpoint creation, if the src_addr has the
       “fi_shm://”  prefix,  the provider will append “:[uid]:[ep_idx]” as a unique endpoint name
       (essentially, in place of a service).  In the case of the “fi_ns://” prefix (or any  other
       prefix if one was provided by the application), no supplemental information is required to
       make it unique and it will remain with only the application-defined  address.   Note  that
       the actual endpoint name will not include the FI_ADDR_STR "*://" prefix since it cannot be
       included in any shared memory region names.  The provider will strip off the prefix before
       setting  the  endpoint name.  As a result, the addresses “fi_prefix1://my_node:my_service”
       and “fi_prefix2://my_node:my_service” would result in endpoints and regions  of  the  same
       name.   The  application  can  also  override the endpoint name after creating an endpoint
       using setname() without any address format restrictions.
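
       The endpoint naming described above can be sketched as follows.  This is a minimal
       illustration under stated assumptions, not the provider's implementation: shm_ep_name()
       is a hypothetical helper, and uid and ep_idx are plain integers standing in for the
       “[uid]” and “[ep_idx]” fields.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of libfabric): derives an endpoint name
 * from an FI_ADDR_STR address by stripping the "*://" prefix and, for the
 * "fi_shm://" prefix only, appending ":uid:ep_idx" to make it unique. */
int shm_ep_name(const char *addr, unsigned uid, unsigned ep_idx,
                char *out, size_t len)
{
    const char *sep = strstr(addr, "://");
    const char *body = sep ? sep + 3 : addr;   /* text after the prefix */
    int needs_suffix = strncmp(addr, "fi_shm://", 9) == 0;

    if (needs_suffix)
        return snprintf(out, len, "%s:%u:%u", body, uid, ep_idx);
    return snprintf(out, len, "%s", body);
}
```

       With this sketch, “fi_prefix1://my_node:my_service” and “fi_prefix2://my_node:my_service”
       both map to the name "my_node:my_service", which is the collision the text warns about.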

       Msg flags
              The provider currently only supports the FI_REMOTE_CQ_DATA msg flag.

       MR registration mode
              The provider implements FI_MR_VIRT_ADDR memory mode.

       Atomic operations
              The provider supports all combinations of datatype and operations as  long  as  the
              message is less than 4096 bytes (or 2048 for compare operations).
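
       As a rough illustration of the size limit on atomic operations, the check below takes
       the phrase "less than" literally.  shm_atomic_fits() and the two constants are
       hypothetical names introduced here; the limits the provider reports at run time remain
       authoritative.

```c
#include <stdbool.h>
#include <stddef.h>

/* Limits quoted in the text above; the names are hypothetical. */
enum { SHM_ATOMIC_LIMIT = 4096, SHM_ATOMIC_COMPARE_LIMIT = 2048 };

/* Returns true if `count` elements of `datatype_size` bytes each stay
 * strictly below the quoted limit for the given kind of operation. */
bool shm_atomic_fits(size_t datatype_size, size_t count, bool is_compare)
{
    size_t limit = is_compare ? SHM_ATOMIC_COMPARE_LIMIT : SHM_ATOMIC_LIMIT;

    /* Reject zero-sized types and avoid overflow in the multiplication. */
    if (datatype_size == 0 || count > limit / datatype_size)
        return false;
    return datatype_size * count < limit;
}
```

       Under this reading, 511 elements of an 8-byte datatype fit in a non-compare operation,
       while 512 elements do not.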

DSA

       Intel  Data  Streaming  Accelerator  (DSA)  is  an  integrated  accelerator  in Intel Xeon
       processors starting with Sapphire Rapids generation.  One of the capabilities of DSA is to
       offload  memory  copy operations from the CPU.  A system may have one or more DSA devices.
       Each DSA device may have one or more work queues.  The  DSA  specification  can  be  found
       here.

       The  SAR  protocol  of  SHM provider is enabled to take advantage of DSA to offload memory
       copy operations into and out of SAR buffers in  shared  memory  regions.   To  fully  take
       advantage   of   the   DSA  offload  capability,  memory  copy  operations  are  performed
       asynchronously.  The copy initiator thread constructs the DSA commands and submits them to
       the work queues.  A copy operation may consist of more than one DSA command.  In that
       case, commands are spread across all available work queues in round-robin  fashion.   The
       progress thread checks for DSA command completions.  If the copy command completes
       successfully, it then notifies the peer to consume the data.  If DSA  encounters  a  page
       fault during command execution, the page fault is reported via completion records.  In
       that case, the
       progress thread accesses the page to resolve the page  fault  and  resubmits  the  command
       after  adjusting  for  partial  completions.   One  of  the benefits of making memory copy
       operations asynchronous is that now data transfers between different target endpoints  can
       be initiated in parallel.  Use of Intel DSA in SAR protocol is disabled by default and can
       be enabled using an environment variable.  Note that CMA must be disabled, e.g. with
       FI_SHM_DISABLE_CMA=1, in order for DSA to be used.  See the RUNTIME PARAMETERS
       section.
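
       The round-robin distribution and the resubmission after a page fault can be sketched as
       below.  This is a simplified model, not the provider's code: struct copy_cmd,
       plan_copy(), resume_after_fault(), and the per-command size limit max_cmd are all
       hypothetical, and no real DSA descriptors are built or submitted.

```c
#include <stddef.h>

/* Hypothetical stand-in for one DSA copy command. */
struct copy_cmd {
    size_t offset;  /* byte offset into the overall copy */
    size_t len;     /* bytes covered by this command */
    unsigned wq;    /* index of the work queue it is assigned to */
};

/* Splits `total` bytes into commands of at most `max_cmd` bytes and assigns
 * them to `num_wqs` work queues in round-robin order, starting at *next_wq.
 * Returns the number of commands written to `cmds` (at most `max_cmds`). */
size_t plan_copy(size_t total, size_t max_cmd, unsigned num_wqs,
                 unsigned *next_wq, struct copy_cmd *cmds, size_t max_cmds)
{
    size_t n = 0, off = 0;

    if (max_cmd == 0 || num_wqs == 0)
        return 0;
    while (off < total && n < max_cmds) {
        size_t len = total - off < max_cmd ? total - off : max_cmd;

        cmds[n].offset = off;
        cmds[n].len = len;
        cmds[n].wq = *next_wq;
        *next_wq = (*next_wq + 1) % num_wqs;  /* advance round robin */
        off += len;
        n++;
    }
    return n;
}

/* Adjusts a command for partial completion before it is resubmitted:
 * skips the `bytes_done` bytes that were copied before the page fault. */
void resume_after_fault(struct copy_cmd *cmd, size_t bytes_done)
{
    if (bytes_done > cmd->len)
        bytes_done = cmd->len;
    cmd->offset += bytes_done;
    cmd->len -= bytes_done;
}
```

       Keeping next_wq outside the function lets successive copy operations continue the
       rotation rather than always starting at the first work queue.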

       Compiling with DSA capabilities depends on the accel-config library  which  can  be  found
       here.  Running with DSA requires using Linux Kernel 5.19.0-rc3 or later.

       DSA devices need to be set up just once before  runtime.   This  configuration  file
       (https://github.com/intel/idxd-config/blob/stable/contrib/configs/os_profile.conf) can  be
       used as a template with the accel-config utility to configure the DSA devices.
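
       Once the devices are configured, an application could request DSA offload by setting the
       relevant variables itself.  The sketch below assumes that the variables are read when
       the provider initializes, so they must be set before the first libfabric call, and that
       a value of 1 is accepted as true.  Because FI_SHM_DISABLE_CMA is described under RUNTIME
       PARAMETERS as disabling CMA with a default of false, it is set to 1 here.
       shm_request_dsa() is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() */
#include <stdlib.h>

/* Hypothetical helper: sets the environment variables named in this page
 * so that the SAR protocol may use DSA. Returns 0 on success, -1 if
 * setenv() fails. Call it before any libfabric initialization. */
int shm_request_dsa(void)
{
    if (setenv("FI_SHM_DISABLE_CMA", "1", 1) != 0)   /* DSA needs CMA off */
        return -1;
    if (setenv("FI_SHM_USE_DSA_SAR", "1", 1) != 0)   /* enable DSA in SAR */
        return -1;
    return 0;
}
```

       Exporting the same two variables in the shell before launching the application would
       have the same effect.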

LIMITATIONS

       The  SHM  provider  has  hard-coded maximums for supported queue sizes and data transfers.
       These values are reflected in the related fabric attribute structures.

       EPs must be bound to both RX and TX CQs.

       No support for counters.

RUNTIME PARAMETERS

       The shm provider checks for the following environment variables:

       FI_SHM_SAR_THRESHOLD
              Maximum message size to use segmentation protocol before switching  to  mmap  (only
              valid when CMA is not available).  Default: SIZE_MAX (18446744073709551615)

       FI_SHM_TX_SIZE
              Maximum number of outstanding tx operations.  Default 1024

       FI_SHM_RX_SIZE
              Maximum number of outstanding rx operations.  Default 1024

       FI_SHM_DISABLE_CMA
              Manually disables CMA.  Default false

       FI_SHM_USE_DSA_SAR
              Enables memory copy offload to Intel DSA in SAR protocol.  Default false

       FI_SHM_ENABLE_DSA_PAGE_TOUCH
              Enables  CPU  touching  of  memory  pages in a DSA command descriptor when the page
              fault is reported, so that there is valid address  translation  for  the  remaining
              addresses  in  the  command.   This minimizes DSA page faults.  Default false

SEE ALSO

       fabric(7), fi_provider(7), fi_getinfo(3)

AUTHORS

       OpenFabrics.