Provided by: freebsd-manpages_11.1-3_all

NAME

     PCBGROUP — Distributed Protocol Control Block Groups

SYNOPSIS

     options PCBGROUP

     #include <sys/param.h>
     #include <netinet/in.h>
     #include <netinet/in_pcb.h>

     void
     in_pcbgroup_init(struct inpcbinfo *pcbinfo, u_int hashfields, int hash_nelements);

     void
     in_pcbgroup_destroy(struct inpcbinfo *pcbinfo);

     struct inpcbgroup *
     in_pcbgroup_byhash(struct inpcbinfo *pcbinfo, u_int hashtype, uint32_t hash);

     struct inpcbgroup *
     in_pcbgroup_byinpcb(struct inpcb *inp);

     void
     in_pcbgroup_update(struct inpcb *inp);

     void
     in_pcbgroup_update_mbuf(struct inpcb *inp, struct mbuf *m);

     void
     in_pcbgroup_remove(struct inpcb *inp);

     int
     in_pcbgroup_enabled(struct inpcbinfo *pcbinfo);

     #include <netinet6/in6_pcb.h>

     struct inpcbgroup *
     in6_pcbgroup_byhash(struct inpcbinfo *pcbinfo, u_int hashtype, uint32_t hash);

DESCRIPTION

     This implementation introduces a notion of affinity for connections and distributes work so
     as to reduce lock contention.  It is designed to work with hardware work distribution
     strategies such as RSS.  In this construction, connection groups supplement, rather than
     replace, existing reservation tables for protocol 4-tuples, offering CPU-affine lookup
     tables with minimal cache line migration and lock contention during steady state operation.

     Internet protocols like UDP and TCP register to use connection groups by providing an
     ipi_hashfields value other than IPI_HASHFIELDS_NONE.  This indicates to the connection group
     code whether a 2-tuple or 4-tuple is used as an argument to hashes that assign a connection
     to a particular group.  This must be aligned with any hardware-offloaded distribution model,
     such as RSS or similar approaches taken in embedded network boards.  Wildcard sockets
     require special handling, as in Willmann 2006, and are shared between connection groups
     while being protected by group-local locks.  Connection establishment and teardown can be
     significantly more expensive than without connection groups, but steady-state processing
     can be significantly faster.
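
     The difference between 2-tuple and 4-tuple hashing can be illustrated with a small
     userspace sketch.  Everything in it is an assumption made for illustration: the
     sk_hashfields enum, struct sk_tuple and sk_hash() are hypothetical names, not kernel
     interfaces, and the XOR mixing is a toy function rather than the RSS hash.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the ipi_hashfields choices; not kernel names. */
enum sk_hashfields {
	SK_HASHFIELDS_NONE,	/* protocol does not use connection groups */
	SK_HASHFIELDS_2TUPLE,	/* hash over addresses only */
	SK_HASHFIELDS_4TUPLE	/* hash over addresses and ports */
};

struct sk_tuple {
	uint32_t laddr, faddr;	/* local and foreign IPv4 addresses */
	uint16_t lport, fport;	/* local and foreign ports */
};

/* Toy mixing function (assumed, not RSS): ports contribute only for 4-tuple. */
static uint32_t
sk_hash(enum sk_hashfields fields, const struct sk_tuple *t)
{
	uint32_t h;

	assert(fields != SK_HASHFIELDS_NONE);
	h = t->laddr ^ t->faddr;
	if (fields == SK_HASHFIELDS_4TUPLE)
		h ^= ((uint32_t)t->lport << 16) | t->fport;
	return (h);
}
```

     With a 2-tuple hash, two connections between the same pair of hosts always hash alike and
     so land in the same group; with a 4-tuple hash they can be spread across groups.  Whichever
     is chosen has to match what the NIC computes, or packets will arrive on a CPU whose group
     does not hold the connection.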

     Enabling PCBGROUP in the kernel only provides the infrastructure required to create and
     manage multiple PCB groups.  An implementation needs to fill in a few functions to provide
     PCB group hash information in order for PCBs to be placed in a PCB group.

   Operation
     By default, each PCB info block (struct inpcbinfo) has a single hash table for all PCB
     entries for the given protocol with a single lock protecting it.  This can be a significant
     source of lock contention on SMP hardware.  When a PCBGROUP is created, an array of separate
     hash tables is created, each with its own lock.  A separate table for wildcard PCBs is provided.
     By default, a PCBGROUP table is created for each available CPU.  The PCBGROUP code attempts
     to calculate a hash value from the given PCB or mbuf when looking up a PCBGROUP.  While
     processing a received frame, in_pcbgroup_byhash() can be used in conjunction with either a
     hardware-provided hash value (e.g., the RSS(9) calculated hash value provided by some NICs) or
     a software-provided hash value in order to choose a PCBGROUP table to query.  A single table
     lock is held while performing a wildcard match.  However, all of the table locks are
     acquired before modifying the wildcard table.  The PCBGROUP tables operate in conjunction
     with the normal single PCB list in a PCB info block.  Thus, inserting and removing a PCB
     will still incur the same costs as without PCBGROUP.  A protocol which uses PCBGROUP should
     fall back to the normal PCB list lookup if a call to the PCBGROUP layer does not yield a
     lookup hit.
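
     The lookup order described above (group table first, global list as fallback) can be
     sketched in userspace C.  This is a minimal model under stated assumptions, not the kernel
     implementation: all sk_ names are hypothetical, the tables are fixed-size arrays rather
     than hash tables, connections are matched on the hash value alone, and the per-group and
     global locks are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_NGROUPS	4	/* stands in for "one group per CPU" */
#define SK_MAXCONN	8	/* capacity of each toy table */

struct sk_conn {
	uint32_t hash;		/* hash of the connection's tuple */
	int id;			/* payload, for demonstration only */
};

struct sk_group {		/* the kernel pairs each group with a lock */
	struct sk_conn *conns[SK_MAXCONN];
	size_t n;
};

struct sk_info {
	struct sk_conn *all[SK_NGROUPS * SK_MAXCONN];	/* global list */
	size_t nall;
	struct sk_group groups[SK_NGROUPS];
};

/* Always record the connection globally; optionally also in its group. */
static void
sk_insert(struct sk_info *info, struct sk_conn *c, int grouped)
{
	struct sk_group *g = &info->groups[c->hash % SK_NGROUPS];

	assert(info->nall < (size_t)(SK_NGROUPS * SK_MAXCONN));
	info->all[info->nall++] = c;
	if (grouped) {
		assert(g->n < SK_MAXCONN);
		g->conns[g->n++] = c;
	}
}

/* Try the group chosen by the hash first, then fall back to the global list. */
static struct sk_conn *
sk_lookup(struct sk_info *info, uint32_t hash, int *group_hit)
{
	struct sk_group *g = &info->groups[hash % SK_NGROUPS];
	size_t i;

	for (i = 0; i < g->n; i++) {
		if (g->conns[i]->hash == hash) {
			*group_hit = 1;
			return (g->conns[i]);
		}
	}
	*group_hit = 0;
	for (i = 0; i < info->nall; i++) {
		if (info->all[i]->hash == hash)
			return (info->all[i]);
	}
	return (NULL);
}
```

     A connection inserted with grouped set to 0 models a PCB that never received a group
     assignment: sk_lookup() misses in the group table and finds it only through the global
     list, which is why the fallback path has to remain in place.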

   Usage
     Initialize a PCBGROUP in a PCB info block (struct inpcbinfo) by calling in_pcbgroup_init().

     Add a connection to a PCBGROUP with in_pcbgroup_update().  Connections are removed with
     in_pcbgroup_remove().  These in turn will determine which PCBGROUP bucket the given PCB is
     placed into and calculate the hash value appropriately.

     Wildcard PCBs are hashed differently and placed in a single wildcard PCB list.  If RSS(9) is
     enabled and in use, RSS-aware wildcard PCBs are placed in a single PCBGROUP based on RSS
     information.  Protocols may look up the PCB entry in a PCBGROUP by using the lookup
     functions in_pcbgroup_byhash() and in_pcbgroup_byinpcb().

IMPLEMENTATION NOTES

     The PCB code in sys/netinet and sys/netinet6 is aware of PCBGROUP and will call into the
     PCBGROUP code to do PCBGROUP assignment and lookup, preferring a PCBGROUP lookup to the
     default global PCB info table.

     An implementor wishing to experiment with or modify the PCBGROUP assignment should modify this
     set of functions:

           in_pcbgroup_getbucket() and in6_pcbgroup_getbucket()
                     Map a given 32-bit hash value to a PCBGROUP.  By default this is hash %
                     number_of_pcbgroups.  However, this distribution may not align with NIC
                     receive queues or the netisr(9) configuration.

           in_pcbgroup_byhash() and in6_pcbgroup_byhash()
                     Map a 32-bit hash value and a hash type identifier to a PCBGROUP.  By
                     default, this simply returns NULL.  This function is used by the mbuf(9)
                     receive path in sys/netinet/in_pcb.c to map an mbuf to a PCBGROUP.

           in_pcbgroup_bytuple() and in6_pcbgroup_bytuple()
                     Map the source and destination address and port details to a PCBGROUP.  By
                     default, this does a very simple XOR hash.  This function is used by both
                     the PCB lookup code and as a fallback in the mbuf(9) receive path in
                     sys/netinet/in_pcb.c.
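
     The default bucket mapping and a tuple hash of the kind described above can be sketched as
     follows.  Both functions are hypothetical userspace illustrations: sketch_getbucket()
     shows the hash % number_of_pcbgroups rule, and sketch_tuple_hash() is one assumed form of
     "a very simple XOR hash", not a copy of the kernel's in_pcbgroup_bytuple().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: map a 32-bit hash to a group index (hash % ngroups). */
static unsigned int
sketch_getbucket(uint32_t hash, unsigned int ngroups)
{
	assert(ngroups > 0);
	return (hash % ngroups);
}

/* Assumed XOR hash over the 4-tuple; the kernel's exact mixing may differ. */
static uint32_t
sketch_tuple_hash(uint32_t laddr, uint16_t lport, uint32_t faddr,
    uint16_t fport)
{
	return (laddr ^ faddr ^ (uint32_t)lport ^ (uint32_t)fport);
}
```

     An experiment that aligns groups with NIC receive queues would replace the modulo in
     sketch_getbucket() with whatever mapping the hardware uses to steer a hash to a queue, so
     that the CPU servicing a queue is also the one that owns the matching group.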

SEE ALSO

     mbuf(9), netisr(9), RSS(9)

     Paul Willmann, Scott Rixner, and Alan L. Cox, “An Evaluation of Network Stack
     Parallelization Strategies in Modern Operating Systems”, 2006 USENIX Annual Technical
     Conference, http://www.ece.rice.edu/~willmann/pubs/paranet_usenix.pdf, 2006.

HISTORY

     PCBGROUP first appeared in FreeBSD 9.0.

AUTHORS

     The PCBGROUP implementation was written by Robert N. M. Watson <rwatson@FreeBSD.org> under
     contract to Juniper Networks, Inc.

     This manual page was written by Adrian Chadd <adrian@FreeBSD.org>.

NOTES

     The RSS(9) implementation currently uses #ifdef blocks to tie into PCBGROUP.  This is a sign
     that a more abstract programming API is needed.

     There is currently no support for re-balancing the PCBGROUP assignment, nor is there any
     support for overriding which PCBGROUP a socket/PCB should be in.

     No statistics are kept to indicate how often PCBGROUP lookups succeed or fail.