Provided by: perftest_24.01.0+0.38-1build2_amd64

NAME

       ib_write_bw, ib_read_bw, ib_send_bw, ib_atomic_bw, ib_write_lat, ib_read_lat, ib_send_lat,
       ib_atomic_lat, raw_ethernet_bw, raw_ethernet_lat, raw_ethernet_burst_lat,
       raw_ethernet_fs_rate - benchmarks for various types of InfiniBand performance

DESCRIPTION

       Perftest is a package of benchmarks that measure various metrics and verbs performance,
       with many different options and modes.

   RUNNING TESTS
       Server:
               ./<test name> <options>

       Client:
               ./<test name> <options> <server IP address>

       Examples:
               1- Running a bidirectional bandwidth test using the Write verb for 5 seconds with
               a message size of 8388608 bytes and 3 QPs:
                  Server: ./ib_write_bw -s 8388608 -b -D 5 -q 3
                  Client: ./ib_write_bw -s 8388608 -b -D 5 -q 3 1.1.1.2

               2- Running a latency test using the Read verb for 5000 iterations with a message
               size of 32 bytes:
                  Server: ./ib_read_lat -s 32 -n 5000
                  Client: ./ib_read_lat -s 32 -n 5000 192.168.0.1

   IMPORTANT NOTES
               1- Options that are specific to modes in perftest must be the same on both the
               server and the client.
               2- Perftest applications may need to be run with sudo when running as a non-root
               user.
               3- Perftest applications are usually installed in /usr/bin/.
               4- Perftest may print some failures with syndromes to stderr; perftest gets those
               errors from rdma-core.

OPTIONS

       -h, --help
               Lists the available options to the screen.

       -a, --all
               Run sizes from 2 to 2^23 bytes.
               Not relevant for Atomic and RawEth.
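               As a sketch of what the sweep covers, assuming -a doubles the size at each step
               (the text above gives only the range), the hypothetical helper all_sizes lists the
               sizes; the 8388608-byte message in example 1 is the largest of them (2^23):

```python
def all_sizes(max_exp: int = 23) -> list[int]:
    """Power-of-two message sizes from 2 bytes up to 2**max_exp bytes (assumed sweep)."""
    return [1 << exp for exp in range(1, max_exp + 1)]


sizes = all_sizes()
print(len(sizes), sizes[0], sizes[-1])  # 23 2 8388608
```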

       -A, --atomic_type=<type>
               Type of atomic operation from {CMP_AND_SWAP,FETCH_AND_ADD} (default FETCH_AND_ADD).
               Relevant only for Atomic.

       -b, --bidirectional
               Measure bidirectional bandwidth (default unidirectional).
               Relevant only for BW.

       -c, --connection=<RC/XRC/UC/UD/DC/SRD>
               Connection type RC/XRC/UC/UD/DC/SRD (default RC).
               UD relevant only for Send verb.
               SRD relevant only for Read, Write and Send verbs.
               UC relevant only for Write and Send verbs.
               Not relevant for RawEth.

       --log_dci_streams=<log_num_dci_stream_channels> (default 0)
               Run the DC initiator as DCS instead of DCI with <log_num_dci_stream_channels>.
               Not relevant for RawEth.
               System support required.

       --log_active_dci_streams=<log_num_active_dci_stream_channels> (default
       log_num_dci_stream_channels)
               Not relevant for RawEth.
               System support required.

       --aes_xts
               Runs traffic with AES_XTS feature (encryption).
               Not relevant for RawEth and Write latency.
               System support required.

       --encrypt_on_tx
               Runs traffic with encryption on tx (default decryption on tx).
               Not relevant for RawEth and Write latency.
               System support required.

       --sig_before
               Puts signature on data before encrypting it (default after).
               Not relevant for RawEth and Write latency.
               System support required.

       --aes_block_size=<512,520,4048,4096,4160> (default 512)
               Not relevant for RawEth and Write latency.
               System support required.

       --data_enc_keys_number=<number of data encryption keys> (default 1)
               Not relevant for RawEth and Write latency.
               System support required.

       --kek_path <path to the key encryption key file>
               Not relevant for RawEth and Write latency.
               System support required.

       --credentials_path <path to the credentials file>
               Not relevant for RawEth and Write latency.
               System support required.

       --data_enc_key_app_path <path to the data encryption key app>
               Not relevant for RawEth and Write latency.
               System support required.

       -C, --report-cycles
               Report times in cpu cycle units (default microseconds).
               Relevant only for latency.

       -d, --ib-dev=<dev>
               Use IB device <dev> (default first device found).

       -D, --duration
               Run the test for the specified number of seconds.

       -e, --events
               Sleep on CQ events (default poll).
               Not relevant for Write and RawEth.

       -X, --vector=<completion vector>
               Set <completion vector> used for events.
               Not relevant for Write and RawEth.

       -f, --margin
               Measure results within margins (default 2 sec).

       -F, --CPU-freq
               Do not show a warning even if the cpufreq_ondemand module is loaded and the CPU
               frequency is not at its maximum.

       -g, --mcg
               Send messages to multicast group with 1 QP attached to it.
               When no multicast GID is specified, a default IPv6-type GID
               '255:1:0:0:0:2:201:133:0:0:0:0:0:0:0:0' will be used.
               Relevant only for Send (not for raw_ethernet_fs_rate).

       -H, --report-histogram
               Print out all results (default print summary only).
               Relevant only for latency and raw_ethernet_fs_rate.

       -i, --ib-port=<port>
               Use port <port> of IB device (default 1).

       -I, --inline_size=<size>
               Maximum size of a message to be sent inline.
               Not relevant for Read and Atomic.

       -l, --post_list=<list size>
               Post list of send WQEs of <list size> size (instead of single post).
               Relevant only for BW and raw_ethernet_burst_lat.

       --recv_post_list=<list size>
               Post list of receive WQEs of <list size> size (instead of single post).
               Relevant only for BW and raw_ethernet_burst_lat.

       -L, --hop_limit=<hop_limit>
               Set hop limit value (ttl for IPv4 RawEth QP). Values 0-255 (default 64).
               Relevant only for RawEth.
               Not relevant for raw_ethernet_fs_rate.

       -m, --mtu=<mtu>
               MTU size: 64 - 9600 for RawEth, otherwise 256 - 4096 (default: port MTU).
               Not relevant for raw_ethernet_fs_rate.

       -M, --MGID=<multicast_gid>
               In multicast, uses <multicast_gid> as the group MGID.
               <multicast_gid> can be either decimal or hexadecimal, e.g. for the IPv4 address
               224.0.0.30:
               Decimal:     0:0:0:0:0:0:0:0:0:0:255:255:224:0:0:30
               Hexadecimal: 0:0:0:0:0:0:0:0:0:0:0xff:0xff:0xe0:0:0:0x1e
               Relevant only for Send (not for raw_ethernet_fs_rate).
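               A minimal sketch, assuming the IPv4-mapped IPv6 layout (::ffff:a.b.c.d) seen in
               the 224.0.0.30 example, that builds both MGID forms from an IPv4 group and turns a
               byte-list GID, such as the -g default, back into standard IPv6 text. The helpers
               ipv4_mcast_to_mgid and mgid_to_ipv6 are hypothetical, not part of perftest:

```python
import ipaddress


def ipv4_mcast_to_mgid(ipv4: str) -> tuple[str, str]:
    """Return the decimal and hexadecimal MGID forms of an IPv4 multicast group."""
    group = ipaddress.IPv4Address(ipv4)
    if not group.is_multicast:
        raise ValueError(f"{ipv4} is not an IPv4 multicast address")
    # Embed the group as an IPv4-mapped IPv6 address: 10 zero bytes, ff, ff, a.b.c.d.
    raw = ipaddress.IPv6Address(f"::ffff:{group}").packed
    decimal = ":".join(str(b) for b in raw)
    # Zero bytes are written as plain "0", the others as 0x-prefixed hex.
    hexadecimal = ":".join("0" if b == 0 else hex(b) for b in raw)
    return decimal, hexadecimal


def mgid_to_ipv6(mgid: str) -> str:
    """Convert 16 colon-separated byte values (decimal or 0x hex) to IPv6 text."""
    values = [int(part, 0) for part in mgid.split(":")]
    if len(values) != 16 or not all(0 <= v <= 255 for v in values):
        raise ValueError("expected 16 byte values in the range 0-255")
    return str(ipaddress.IPv6Address(bytes(values)))


dec, hexa = ipv4_mcast_to_mgid("224.0.0.30")
print(dec)   # 0:0:0:0:0:0:0:0:0:0:255:255:224:0:0:30
print(hexa)  # 0:0:0:0:0:0:0:0:0:0:0xff:0xff:0xe0:0:0:0x1e
print(mgid_to_ipv6("255:1:0:0:0:2:201:133:0:0:0:0:0:0:0:0"))  # ff01:0:2:c985::
```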

       -n, --iters=<iters>
               Number of exchanges (at least 5; default 5000 for Write, otherwise 1000).

       -N, --noPeak
               Cancel the peak BW calculation (by default, the peak is calculated for up to
               iters=20000).
               Relevant only for bandwidth.

       -o, --outs=<num>
               Number of outstanding Read/Atomic operations.
               Relevant only for Read and Atomic.

       -O, --dualport
               Run test in dual-port mode.
               Not relevant for RawEth.
               Relevant only for bandwidth.
               System support required.

       -p, --port=<port>
               Listen on/connect to port <port> (default 18515).

       -q, --qp=<num of qp's>
               Number of QPs (default 1).
               Relevant only for bandwidth.

       -Q, --cq-mod
               Generate a CQE only after every <--cq-mod> completions.
               Relevant only for bandwidth.

       -r, --rx-depth=<dep>
               Rx queue size (default 512). If using SRQ, rx-depth controls the max-wr size of
               the SRQ.
               Relevant only for Send (not for raw_ethernet_fs_rate).

       -R, --rdma_cm
               Connect QPs with rdma_cm and run test on those QPs.
               Not relevant for RawEth.

       -s, --size=<size>
               Size of message to exchange (default 65536 for BW, 2 for latency).
               Not relevant for Atomic.

       -S, --sl=<sl>
               Service Level (SL) (default 0).
               Not relevant for raw_ethernet_fs_rate.

       -t, --tx-depth=<dep>
               Size of the tx queue (default 128 for BW, otherwise 1).
               Relevant only for BW and raw_ethernet_burst_lat.

       -T, --tos=<tos value>
               Set <tos_value> on RDMA-CM QPs. Available only with the -R flag. Values 0-256
               (default off).
               Not relevant for RawEth.

       -u, --qp-timeout=<timeout>
               QP timeout; the timeout value is 4.096 usec * 2^(timeout) (default 14).
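               To see what a given -u value means in wall-clock time, here is a small sketch
               (qp_timeout_us is a hypothetical helper). It uses the 4.096 usec base unit from the
               InfiniBand specification, which is often rounded to 4 usec, and treats 0 as an
               infinite timeout (timer disabled):

```python
import math

# Base unit from the InfiniBand specification (often rounded to 4 usec).
IB_TIMEOUT_UNIT_US = 4.096


def qp_timeout_us(timeout: int) -> float:
    """Local ACK timeout in microseconds for a 5-bit QP timeout value."""
    if not 0 <= timeout <= 31:
        raise ValueError("timeout must be in the range 0-31")
    if timeout == 0:
        return math.inf  # 0 disables the timer (infinite timeout)
    return IB_TIMEOUT_UNIT_US * (1 << timeout)


print(f"{qp_timeout_us(14) / 1000:.1f} ms")  # 67.1 ms for the default value 14
```

Each increment of the value doubles the timeout, so moving from 14 to 18 raises it from
roughly 67 ms to roughly 1.07 s.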

       -U, --report-unsorted
               Print out unsorted results (implies -H; default sorted).
               Relevant only for latency, raw_ethernet_burst_lat and raw_ethernet_fs_rate.

       -V, --version
               Display perftest version number.

       -W, --report-counters=<list of counter names>
               Report performance counter change (example:
               counters/port_xmit_data,hw_counters/out_of_buffer).
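               The change reported by -W can be reproduced by hand by reading the same counter
               files before and after a run. A minimal sketch, assuming the usual sysfs layout
               /sys/class/infiniband/<dev>/ports/<port>/ and that the counter names given to -W
               are paths relative to that directory; read_counters and counter_delta are
               hypothetical helpers:

```python
from pathlib import Path

# Assumed default sysfs location of RDMA devices.
SYSFS_IB = Path("/sys/class/infiniband")


def read_counters(dev: str, port: int, names: list[str],
                  root: Path = SYSFS_IB) -> dict[str, int]:
    """Read counters such as 'counters/port_xmit_data' for one port."""
    port_dir = root / dev / "ports" / str(port)
    return {name: int((port_dir / name).read_text().strip()) for name in names}


def counter_delta(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Change of each counter between two snapshots."""
    return {name: after[name] - before[name] for name in before}


# Example (requires an RDMA device; "mlx5_0" is only an example name):
# names = ["counters/port_xmit_data", "hw_counters/out_of_buffer"]
# before = read_counters("mlx5_0", 1, names)
# ... run the benchmark ...
# print(counter_delta(before, read_counters("mlx5_0", 1, names)))
```

In the InfiniBand specification, port_xmit_data is counted in 4-byte units, so its delta
usually has to be multiplied by 4 to get bytes.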

       -x, --gid-index=<index>
               Use the GID at the given GID index.
               Not relevant for RawEth.
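               To choose a value for -x, the GID table of a port can be listed from sysfs. A
               minimal sketch, assuming the common layout in which ports/<port>/gids/<index> holds
               the GID and ports/<port>/gid_attrs/types/<index> its type (for example "RoCE v2");
               list_gids is a hypothetical helper:

```python
from pathlib import Path

# Assumed default sysfs location of RDMA devices.
SYSFS_IB = Path("/sys/class/infiniband")
ZERO_GID = "0000:0000:0000:0000:0000:0000:0000:0000"


def list_gids(dev: str, port: int,
              root: Path = SYSFS_IB) -> list[tuple[int, str, str]]:
    """Return (index, gid, type) for each populated GID table entry of a port."""
    port_dir = root / dev / "ports" / str(port)
    entries = []
    for gid_file in sorted((port_dir / "gids").iterdir(), key=lambda p: int(p.name)):
        gid = gid_file.read_text().strip()
        if gid == ZERO_GID:
            continue  # unused slot in the GID table
        type_file = port_dir / "gid_attrs" / "types" / gid_file.name
        try:
            gid_type = type_file.read_text().strip()
        except OSError:
            gid_type = "unknown"  # attribute missing or unreadable on this device
        entries.append((int(gid_file.name), gid, gid_type))
    return entries


# Example (requires an RDMA device; "mlx5_0" is only an example name):
# for index, gid, gid_type in list_gids("mlx5_0", 1):
#     print(index, gid, gid_type)
```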

       -z, --comm_rdma_cm
               Communicate with rdma_cm module to exchange data - use regular QPs.
               Not relevant for RawEth.

       --out_json
               Save the report in a json file.

       --out_json_file=<file>
               Name of the report JSON file (default: "perftest_out.json" in the working
               directory).

       --cpu_util
               Show CPU Utilization in report, valid only in Duration mode.

       --dlid
               Set a Destination LID instead of getting it from the other side.
               Not relevant for raw_ethernet_fs_rate.

       --dont_xchg_versions
               Do not exchange versions and MTU with other side.
               Not relevant for RawEth.

       --force-link=<value>
               Force the link(s) to a specific type: IB or Ethernet.
               Not relevant for raw_ethernet_fs_rate.

       --use-srq
               Use a Shared Receive Queue. --rx-depth controls max-wr size of the SRQ.
               Relevant only for Send.

       --ipv6
               Use IPv6 GID. Default is IPv4.
               Not relevant for RawEth.

       --ipv6-addr=<IPv6>
               Use IPv6 address for parameters negotiation. Default is IPv4.
               Not relevant for RawEth.

       --bind_source_ip
               Source IP of the interface used for connection establishment. By default, it is
               taken from the routing table.
               Not relevant for RawEth.

       --latency_gap=<delay_time>
               Delay time between each post send.
               Relevant only for latency.

       --mmap=file
               Use an mmap'd file as the buffer for testing P2P transfers.
               Not relevant for RawEth.

       --mmap-offset=<offset>
               The mmap offset.
               Not relevant for RawEth.

       --mr_per_qp
               Create a memory region for each QP.
               Relevant only for bandwidth.

       --odp
               Use On Demand Paging instead of Memory Registration.
               System support required.

       --output=<units>
               Set the verbosity output level: bandwidth, message_rate or latency.
               The latency value is an average.
               bandwidth and message_rate apply to BW tests; latency applies to latency tests.

       --payload_file_path=<payload_txt_file_path>
               Set the payload by passing a txt file containing a pattern in the following form
               (little endian): '0xaaaaaaaa, 0xbbbbbbbb, ...'
               Not relevant for RawEth and Write latency.
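               A minimal sketch for producing a payload file in the form shown above, assuming
               the file contains nothing but the comma-separated 32-bit words; how perftest treats
               extra whitespace or line breaks is not covered here. write_payload_file and
               little_endian_bytes are hypothetical helpers, and the second one only illustrates
               how a word is laid out in memory on a little-endian host:

```python
import struct
from pathlib import Path


def write_payload_file(path: Path, words: list[int]) -> None:
    """Write 32-bit words as '0xaaaaaaaa, 0xbbbbbbbb, ...' (lowercase hex)."""
    if not all(0 <= w <= 0xFFFFFFFF for w in words):
        raise ValueError("each word must fit in 32 bits")
    path.write_text(", ".join(f"0x{w:08x}" for w in words))


def little_endian_bytes(words: list[int]) -> bytes:
    """Bytes of the words as stored in memory on a little-endian host."""
    return b"".join(struct.pack("<I", w) for w in words)


print(little_endian_bytes([0x11223344]).hex())  # 44332211
```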

       --use_old_post_send
               Use old post send flow (ibv_post_send).

       --perform_warm_up
               Perform some iterations before starting to measure, in order to warm up the memory
               cache.
               Not relevant for raw_ethernet_fs_rate.

       --pkey_index=<pkey index>
               PKey index to use for QP.
               Not relevant for raw_ethernet_fs_rate.

       --report-both
               Report RX & TX results separately on Bidirectional BW tests.
               Relevant only for bidirectional bandwidth.

       --report_gbits
               Report Max/Average BW of test in Gbit/sec (instead of MiB/sec).
               Relevant only for bandwidth.
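               To compare --report_gbits output with the default MiB/sec report, the numbers can
               be converted by hand. A sketch, assuming 1 Gbit = 10^9 bits and 1 MiB = 2^20 bytes
               (perftest's own rounding is not reproduced); the function names are hypothetical:

```python
MIB = 2**20  # bytes in one MiB


def mib_per_s_to_gbit_per_s(mib_per_s: float) -> float:
    """Convert a bandwidth in MiB/sec to Gbit/sec (1 Gbit = 10**9 bits)."""
    return mib_per_s * MIB * 8 / 1e9


def gbit_per_s_to_mib_per_s(gbit_per_s: float) -> float:
    """Convert a bandwidth in Gbit/sec to MiB/sec."""
    return gbit_per_s * 1e9 / 8 / MIB


print(f"{mib_per_s_to_gbit_per_s(10000):.2f}")  # 83.89
print(f"{gbit_per_s_to_mib_per_s(100):.2f}")    # 11920.93
```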

       --report-per-port
               Report BW data on both ports when running Dualport and Duration mode.
               Not relevant for RawEth.
               System support required.

       --reversed
               Reverse the traffic direction: the server sends to the client.

       --run_infinitely
               Run test forever, print results every <duration> seconds.

       --retry_count=<value>
               Set retry count value in rdma_cm mode.
               Relevant only for rdma_cm mode.
               Not relevant for RawEth.

       --tclass=<value>
               Set the Traffic Class in GRH (if GRH is in use).
               Not relevant for raw_ethernet_fs_rate.

       --use-null-mr
               Allocate a null memory region for the client with ibv_alloc_null_mr(3).

       --use_cuda=<cuda device id>
               Use a specific CUDA device for GPUDirect RDMA testing.
               Not relevant for raw_ethernet_fs_rate.
               System support required.

       --use_cuda_bus_id=<cuda full BUS id>
               Use a specific CUDA device, based on its full PCIe address, for GPUDirect RDMA
               testing.
               Not relevant for raw_ethernet_fs_rate.
               System support required.

       --use_cuda_dmabuf
               Use CUDA DMA-BUF for GPUDirect RDMA testing.
               Not relevant for raw_ethernet_fs_rate.
               System support required.

       --use_hl=<hl device id>
               Use a specific HabanaLabs device for HW accelerator direct RDMA testing.
               System support required.

       --use_neuron=<logical neuron core id>
               Use a specific Neuron device for HW accelerator direct RDMA testing.
               System support required.

       --use_neuron_dmabuf
               Use Neuron DMA-BUF for HW accelerator direct RDMA testing.
               System support required.

       --use_rocm=<rocm device id>
               Use selected ROCm device for GPUDirect RDMA testing.
               Not relevant for raw_ethernet_fs_rate.
               System support required.

       --use_hugepages
               Use hugepages instead of contiguous or memalign allocations.
               Not relevant for raw_ethernet_fs_rate.

       --wait_destroy=<seconds>
               Wait <seconds> before destroying allocated resources (QP/CQ/PD/MR..).
               Relevant only for bandwidth and raw_ethernet_burst_lat.

       --disable_pcie_relaxed
               Disable PCIe relaxed ordering.
               Relevant only for bandwidth and raw_ethernet_burst_lat.
               System support required.

       --burst_size=<size>
               Set the number of messages to send in a burst when using the rate limiter.
               Relevant only for bandwidth and raw_ethernet_burst_lat.

       --typical_pkt_size=<bytes>
               Set the size of the packets to send in a burst. Supported only with the PP rate
               limiter.
               Relevant only for bandwidth and raw_ethernet_burst_lat.

       --rate_limit=<rate>
               Set the maximum rate of sent packets. The default unit is Gbps; use --rate_units
               to change it.
               Relevant only for bandwidth and raw_ethernet_burst_lat.

       --rate_units=<units>
               [Mgp] Set the units for the rate limit to MiBps (M), Gbps (g) or pps (p). Default
               is Gbps (g).
               Relevant only for bandwidth and raw_ethernet_burst_lat.
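               When picking a --rate_limit value it can help to know the packet rate it implies.
               A sketch, assuming M means 2^20 bytes/sec, g means 10^9 bits/sec and that header
               overhead is ignored; rate_to_bytes_per_s and rate_to_pps are hypothetical helpers,
               and pkt_size stands for the packet size in bytes (for example the
               --typical_pkt_size value):

```python
MIB = 2**20  # bytes in one MiB


def rate_to_bytes_per_s(rate: float, unit: str, pkt_size: int) -> float:
    """Convert a rate limit to bytes/sec.

    unit is 'M' (MiBps), 'g' (Gbps, 10**9 bits/sec) or 'p' (packets/sec);
    pkt_size (bytes) is needed only to relate packets/sec to bytes/sec.
    """
    if unit == "M":
        return rate * MIB
    if unit == "g":
        return rate * 1e9 / 8
    if unit == "p":
        return rate * pkt_size
    raise ValueError(f"unknown rate unit {unit!r}")


def rate_to_pps(rate: float, unit: str, pkt_size: int) -> float:
    """Packets per second implied by a rate limit, ignoring header overhead."""
    return rate_to_bytes_per_s(rate, unit, pkt_size) / pkt_size


print(rate_to_pps(25, "g", 4096))  # 762939.453125
```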

       --rate_limit_type=<type>
               [HW/SW/PP] Limit the QPs by HW, PP or SW. Disabled by default. When rate_limit is
               not specified, HW limit is the default.
               Relevant only for bandwidth and raw_ethernet_burst_lat.

       --use_ooo
               Use out of order data placement.
               System support required.

       --write_with_imm
               Use write-with-immediate verb instead of write.
               Write tests only.

   RawEth only options
       -B, --source_mac
               Source MAC address in the format XX:XX:XX:XX:XX:XX. **MUST** be entered.

       -E, --dest_mac
               Destination MAC address in the format XX:XX:XX:XX:XX:XX. **MUST** be entered.
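               A wrapper script may want to check the -B/-E arguments before starting a RawEth
               test. A minimal sketch of such a check for the XX:XX:XX:XX:XX:XX format, assuming
               either hex case is acceptable; is_valid_mac is a hypothetical helper:

```python
import re

# Six pairs of hex digits separated by colons.
MAC_RE = re.compile(r"[0-9A-Fa-f]{2}(?::[0-9A-Fa-f]{2}){5}")


def is_valid_mac(text: str) -> bool:
    """True if text matches the XX:XX:XX:XX:XX:XX format used by -B/-E."""
    return MAC_RE.fullmatch(text) is not None


print(is_valid_mac("00:1a:2b:3c:4d:5e"))  # True
print(is_valid_mac("00-1a-2b-3c-4d-5e"))  # False
```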

       -G, --use_rss
               Use RSS on the server side. You need to open 2^x QPs (using the -q flag; default
               is -q 2) and open 2^x clients that transmit to this server.

       -J, --dest_ip
               Destination IP address in the format X.X.X.X for IPv4 or X:X:X:X:X:X:X:X for IPv6
               (used to send packets with an IP header).
               System support required for IPv6.

       -j, --source_ip
               Source IP address in the format X.X.X.X for IPv4 or X:X:X:X:X:X:X:X for IPv6
               (used to send packets with an IP header).
               System support required for IPv6.

       -K, --dest_port
               Destination port number (used to send packets with a UDP header by default; use
               the --tcp flag to send a TCP header).

       -k, --source_port
               Source port number (used to send packets with a UDP header by default; use the
               --tcp flag to send a TCP header).

       -Y, --ethertype
               Ethertype value in the Ethernet frame, in the format 0xXXXX.

       -Z, --server
               Run the current machine as the server (one of --server/--client must be selected).

       --vlan_en
               Insert a VLAN tag in the Ethernet header.

       --vlan_pcp
               Specify the vlan_pcp value for the VLAN tag, 0~7. 8 means a different vlan_pcp
               for each packet.

       -P, --client
               Run the current machine as the client (one of --server/--client must be selected).
               Not relevant for raw_ethernet_fs_rate.

       -v, --mac_fwd
               Run the MAC forwarding test.
               Not relevant for raw_ethernet_fs_rate.

       --flows
               Set number of TCP/UDP flows, starting from <src_port, dst_port>.
               Not relevant for raw_ethernet_fs_rate.

       --flows_burst
               Set the burst size per TCP/UDP flow.
               Not relevant for raw_ethernet_fs_rate.

       --promiscuous
               Run in promiscuous mode.
               Not relevant for raw_ethernet_fs_rate.

       --reply_every
               In latency tests, the receiver sends a pong after the given number of received
               pings.
               Not relevant for raw_ethernet_fs_rate.

       --sniffer
               Run in sniffer mode.
               Not relevant for raw_ethernet_fs_rate.
               System support required.

       --flow_label
               IPv6 flow label.
               Not relevant for raw_ethernet_fs_rate.

       --tcp
               Send TCP packets. IP and port information must be included.

       --raw_ipv6
               Send IPv6 Packets.
               System support required.

       --raw_mcast
               Relevant only for bandwidth.

AUTHORS

       Hassan Khadour <hkhadour@nvidia.com>

       Talat Batheesh <talatb@nvidia.com>

                                                                                      perftest(1)