NAME

       ovn-architecture - Open Virtual Network architecture

DESCRIPTION

       OVN,  the  Open  Virtual Network, is a system to support virtual network abstraction.  OVN
       complements the existing capabilities of OVS to add native  support  for  virtual  network
       abstractions,  such  as  virtual L2 and L3 overlays and security groups.  Services such as
       DHCP are also desirable features.   Just  like  OVS,  OVN’s  design  goal  is  to  have  a
       production-quality implementation that can operate at significant scale.

       An OVN deployment consists of several components:

              •      A  Cloud  Management  System  (CMS), which is OVN’s ultimate client (via its
                     users and administrators).   OVN  integration  requires  installing  a  CMS-
                     specific  plugin  and  related  software (see below).  OVN initially targets
                      OpenStack as its CMS.

                     We generally speak of ``the’’ CMS, but one can imagine  scenarios  in  which
                     multiple CMSes manage different parts of an OVN deployment.

              •      An OVN Database physical or virtual node (or, eventually, cluster) installed
                     in a central location.

              •      One or more (usually many) hypervisors.  Hypervisors must run  Open  vSwitch
                     and  implement  the  interface  described  in IntegrationGuide.md in the OVS
                     source  tree.   Any  hypervisor  platform  supported  by  Open  vSwitch   is
                     acceptable.

              •      Zero  or  more  gateways.   A gateway extends a tunnel-based logical network
                     into a  physical  network  by  bidirectionally  forwarding  packets  between
                     tunnels  and a physical Ethernet port.  This allows non-virtualized machines
                     to participate in logical networks.  A gateway may be  a  physical  host,  a
                     virtual  machine, or an ASIC-based hardware switch that supports the vtep(5)
                     schema.  (Support for the latter will come later in OVN implementation.)

                      Hypervisors and gateways are together called transport nodes or chassis.

       The diagram below shows how the major components of OVN  and  related  software  interact.
       Starting at the top of the diagram, we have:

              •      The Cloud Management System, as defined above.

              •      The  OVN/CMS  Plugin is the component of the CMS that interfaces to OVN.  In
                     OpenStack, this is a Neutron  plugin.   The  plugin’s  main  purpose  is  to
                     translate  the  CMS’s notion of logical network configuration, stored in the
                     CMS’s configuration database in a CMS-specific format, into an  intermediate
                     representation understood by OVN.

                     This  component  is  necessarily  CMS-specific,  so a new plugin needs to be
                     developed for each CMS that is integrated with OVN.  All of  the  components
                     below this one in the diagram are CMS-independent.

              •      The  OVN  Northbound  Database  receives  the intermediate representation of
                     logical network configuration  passed  down  by  the  OVN/CMS  Plugin.   The
                     database  schema is meant to be ``impedance matched’’ with the concepts used
                     in a CMS, so that it directly supports notions of logical switches, routers,
                     ACLs, and so on.  See ovn-nb(5) for details.

                     The  OVN  Northbound Database has only two clients: the OVN/CMS Plugin above
                     it and ovn-northd below it.

              •      ovn-northd(8) connects to the OVN Northbound Database above it and  the  OVN
                     Southbound   Database   below   it.    It  translates  the  logical  network
                     configuration in terms of conventional network concepts, taken from the  OVN
                     Northbound  Database,  into  logical  datapath  flows  in the OVN Southbound
                     Database below it.

              •      The OVN Southbound Database is the center of the system.   Its  clients  are
                     ovn-northd(8)  above  it and ovn-controller(8) on every transport node below
                     it.

                     The OVN Southbound Database contains three kinds of data:  Physical  Network
                     (PN)  tables  that  specify how to reach hypervisor and other nodes, Logical
                     Network (LN) tables that describe the logical network in terms of  ``logical
                     datapath  flows,’’  and Binding tables that link logical network components’
                     locations to the physical network.  The  hypervisors  populate  the  PN  and
                     Port_Binding tables, whereas ovn-northd(8) populates the LN tables.

                     OVN  Southbound Database performance must scale with the number of transport
                     nodes.  This  will  likely  require  some  work  on  ovsdb-server(1)  as  we
                     encounter bottlenecks.  Clustering for availability may be needed.

       The remaining components are replicated onto each hypervisor:

              •      ovn-controller(8)  is  OVN’s  agent on each hypervisor and software gateway.
                     Northbound, it connects to the OVN Southbound Database to  learn  about  OVN
                     configuration and status and to populate the PN table and the Chassis column
                      in the Binding table with the hypervisor’s status.  Southbound, it connects to
                     ovs-vswitchd(8) as an OpenFlow controller, for control over network traffic,
                     and to the local ovsdb-server(1) to allow it to  monitor  and  control  Open
                     vSwitch configuration.

              •      ovs-vswitchd(8)  and  ovsdb-server(1)  are  conventional  components of Open
                     vSwitch.

                                         CMS
                                          |
                                          |
                              +-----------|-----------+
                              |           |           |
                              |     OVN/CMS Plugin    |
                              |           |           |
                              |           |           |
                              |   OVN Northbound DB   |
                              |           |           |
                              |           |           |
                              |       ovn-northd      |
                              |           |           |
                              +-----------|-----------+
                                          |
                                          |
                                +-------------------+
                                | OVN Southbound DB |
                                +-------------------+
                                          |
                                          |
                       +------------------+------------------+
                       |                  |                  |
         HV 1          |                  |    HV n          |
       +---------------|---------------+  .  +---------------|---------------+
       |               |               |  .  |               |               |
       |        ovn-controller         |  .  |        ovn-controller         |
       |         |          |          |  .  |         |          |          |
       |         |          |          |     |         |          |          |
       |  ovs-vswitchd   ovsdb-server  |     |  ovs-vswitchd   ovsdb-server  |
       |                               |     |                               |
       +-------------------------------+     +-------------------------------+

   Chassis Setup
       Each chassis in an OVN deployment must be configured with an Open vSwitch bridge dedicated
       for  OVN’s  use,  called  the  integration bridge.  System startup scripts may create this
       bridge prior to starting ovn-controller if desired.  If this bridge does  not  exist  when
       ovn-controller  starts,  it  will  be created automatically with the default configuration
       suggested below.  The ports on the integration bridge include:

              •      On any chassis, tunnel ports that  OVN  uses  to  maintain  logical  network
                     connectivity.  ovn-controller adds, updates, and removes these tunnel ports.

              •      On  a hypervisor, any VIFs that are to be attached to logical networks.  The
                     hypervisor  itself,  or  the  integration  between  Open  vSwitch  and   the
                     hypervisor  (described in IntegrationGuide.md) takes care of this.  (This is
                     not part of OVN or new to OVN; this is pre-existing  integration  work  that
                     has already been done on hypervisors that support OVS.)

              •      On  a  gateway,  the  physical  port  used for logical network connectivity.
                     System startup scripts add  this  port  to  the  bridge  prior  to  starting
                     ovn-controller.   This  can  be a patch port to another bridge, instead of a
                     physical port, in more sophisticated setups.

       Other ports should not be attached to the integration  bridge.   In  particular,  physical
       ports  attached  to  the underlay network (as opposed to gateway ports, which are physical
       ports attached to logical networks) must  not  be  attached  to  the  integration  bridge.
       Underlay physical ports should instead be attached to a separate Open vSwitch bridge (they
       need not be attached to any bridge at all, in fact).

       The integration bridge should be configured as described below.  The  effect  of  each  of
       these settings is documented in ovs-vswitchd.conf.db(5):

              fail-mode=secure
                     Avoids   switching   packets   between   isolated  logical  networks  before
                     ovn-controller starts up.  See Controller Failure Settings  in  ovs-vsctl(8)
                     for more information.

              other-config:disable-in-band=true
                     Suppresses  in-band  control  flows for the integration bridge.  It would be
                     unusual for such  flows  to  show  up  anyway,  because  OVN  uses  a  local
                     controller (over a Unix domain socket) instead of a remote controller.  It’s
                     possible, however, for some other bridge in the same system to have  an  in-
                     band  remote controller, and in that case this suppresses the flows that in-
                     band control would ordinarily set up.  See In-Band Control in DESIGN.md  for
                     more information.

       The customary name for the integration bridge is br-int, but another name may be used.
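
        The sketch below shows one way a startup script might apply the settings above with
        ovs-vsctl, using Python only as a convenient scripting language.  It is illustrative
        rather than part of OVN; as noted above, ovn-controller will create the bridge with this
        configuration itself if the bridge does not already exist.

               import subprocess

               def run(*cmd):
                   # Run a command, raising an exception if it fails.
                   subprocess.check_call(cmd)

               def setup_integration_bridge(bridge="br-int"):
                   # Create the integration bridge if it does not already exist.
                   run("ovs-vsctl", "--may-exist", "add-br", bridge)
                   # Avoid switching packets between isolated logical networks
                   # before ovn-controller starts up.
                   run("ovs-vsctl", "set-fail-mode", bridge, "secure")
                   # Suppress in-band control flows for the integration bridge.
                   run("ovs-vsctl", "set", "bridge", bridge,
                       "other_config:disable-in-band=true")

               if __name__ == "__main__":
                   setup_integration_bridge()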

   Logical Networks
        Logical networks implement the same concepts as physical networks, but they are
       insulated from the physical network with tunnels or  other  encapsulations.   This  allows
       logical  networks  to  have  separate  IP  and  other address spaces that overlap, without
       conflicting, with those used for physical networks.  Logical  network  topologies  can  be
       arranged without regard for the topologies of the physical networks on which they run.

       Logical network concepts in OVN include:

              •      Logical switches, the logical version of Ethernet switches.

              •      Logical  routers,  the  logical version of IP routers.  Logical switches and
                     routers can be connected into sophisticated topologies.

              •      Logical datapaths are the logical version of an  OpenFlow  switch.   Logical
                     switches and routers are both implemented as logical datapaths.

   Life Cycle of a VIF
       Tables  and  their  schemas presented in isolation are difficult to understand.  Here’s an
       example.

        A VIF on a hypervisor is a virtual network interface attached either to a VM or to a
        container running directly on that hypervisor (this is different from the interface of
        a container running inside a VM).

       The steps in this example refer often to details of the OVN and  OVN  Northbound  database
       schemas.   Please  see  ovn-sb(5) and ovn-nb(5), respectively, for the full story on these
       databases.

              1.
                A VIF’s life cycle begins when a CMS administrator creates a new  VIF  using  the
                CMS  user  interface  or API and adds it to a switch (one implemented by OVN as a
                logical  switch).   The  CMS  updates  its  own  configuration.   This   includes
                 associating a unique, persistent identifier vif-id and an Ethernet address mac
                 with the VIF.

              2.
                The CMS plugin updates the OVN Northbound database to include  the  new  VIF,  by
                adding  a  row to the Logical_Port table.  In the new row, name is vif-id, mac is
                mac, switch points to the OVN logical switch’s Logical_Switch record,  and  other
                columns are initialized appropriately.

              3.
                 ovn-northd receives the OVN Northbound database update.  In turn, it makes the
                 corresponding updates to the OVN Southbound database, by adding rows to the OVN
                 Southbound database Logical_Flow table to reflect the new port, e.g., adding a
                 flow to recognize that packets destined to the new port’s MAC address should be
                 delivered to it, and updating the flow that delivers broadcast and multicast
                 packets to include the new port.  It also creates a record in the Binding table
                 and populates all its columns except the column that identifies the chassis.
                 (The sketch after this list illustrates this flow of state.)

              4.
                On  every hypervisor, ovn-controller receives the Logical_Flow table updates that
                ovn-northd made in the previous step.  As long as the VM that  owns  the  VIF  is
                powered  off,  ovn-controller  cannot do much; it cannot, for example, arrange to
                send packets to or receive packets  from  the  VIF,  because  the  VIF  does  not
                actually exist anywhere.

              5.
                Eventually,  a  user powers on the VM that owns the VIF.  On the hypervisor where
                the VM is powered on, the integration between the  hypervisor  and  Open  vSwitch
                (described in IntegrationGuide.md) adds the VIF to the OVN integration bridge and
                stores vif-id in external-ids:iface-id to  indicate  that  the  interface  is  an
                instantiation  of  the  new  VIF.  (None of this code is new in OVN; this is pre-
                existing integration work that has already been done on hypervisors that  support
                OVS.)

              6.
                On   the   hypervisor   where  the  VM  is  powered  on,  ovn-controller  notices
                external-ids:iface-id in the new Interface.  In response, it  updates  the  local
                hypervisor’s  OpenFlow  tables  so  that packets to and from the VIF are properly
                handled.  Afterward, in the OVN Southbound DB, it  updates  the  Binding  table’s
                chassis column for the row that links the logical port from external-ids:iface-id
                to the hypervisor.

              7.
                Some CMS systems, including OpenStack, fully start a VM only when its  networking
                is ready.  To support this, ovn-northd notices the chassis column updated for the
                 row in the Binding table and pushes this upward by updating the up column in the OVN
                Northbound database’s Logical_Port table to indicate that the VIF is now up.  The
                CMS, if it uses this feature, can then react by allowing the  VM’s  execution  to
                proceed.

              8.
                On every hypervisor but the one where the VIF resides, ovn-controller notices the
                completely populated row in the Binding table.  This provides ovn-controller  the
                physical  location  of  the  logical  port, so each instance updates the OpenFlow
                tables of its switch (based on logical datapath flows in the OVN DB  Logical_Flow
                table) so that packets to and from the VIF can be properly handled via tunnels.

              9.
                Eventually,  a user powers off the VM that owns the VIF.  On the hypervisor where
                the VM was powered off, the VIF is deleted from the OVN integration bridge.

              10.
                On the hypervisor where the VM was powered off, ovn-controller notices  that  the
                VIF  was  deleted.   In  response,  it  removes the Chassis column content in the
                Binding table for the logical port.

              11.
                On every hypervisor, ovn-controller notices  the  empty  Chassis  column  in  the
                Binding  table’s  row  for  the  logical port.  This means that ovn-controller no
                longer knows the physical location of the logical port, so each instance  updates
                its OpenFlow table to reflect that.

              12.
                Eventually,  when  the  VIF  (or its entire VM) is no longer needed by anyone, an
                administrator deletes the VIF using the CMS  user  interface  or  API.   The  CMS
                updates its own configuration.

              13.
                The  CMS plugin removes the VIF from the OVN Northbound database, by deleting its
                row in the Logical_Port table.

              14.
                ovn-northd receives the OVN  Northbound  update  and  in  turn  updates  the  OVN
                Southbound  database  accordingly,  by removing or updating the rows from the OVN
                Southbound database Logical_Flow table and Binding table that were related to the
                now-destroyed VIF.

              15.
                On  every hypervisor, ovn-controller receives the Logical_Flow table updates that
                ovn-northd made in the previous step.  ovn-controller updates OpenFlow tables  to
                reflect  the  update,  although  there  may  not be much to do, since the VIF had
                already become unreachable when it was  removed  from  the  Binding  table  in  a
                previous step.
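
        The following Python sketch models the flow of state in steps 2 through 7 above, with
        plain dictionaries standing in for the OVN Northbound and Southbound databases.  The
        function names and example identifiers (vif-1234, ls0, hv1) are invented for
        illustration and are not part of any OVN API.

               # Toy model of the VIF life cycle (steps 2-7 above).
               nb_logical_port = {}  # vif-id -> {"mac", "switch", "up"}
               sb_binding = {}       # logical port name -> {"chassis"}

               def nb_add_vif(vif_id, mac, switch):
                   # Step 2: the CMS plugin adds a Logical_Port row for the new VIF.
                   nb_logical_port[vif_id] = {"mac": mac, "switch": switch, "up": False}

               def northd_sync():
                   # Step 3: ovn-northd creates a binding for each logical port,
                   # leaving the chassis unset (the logical flows are not modeled).
                   for name in nb_logical_port:
                       sb_binding.setdefault(name, {"chassis": None})
                   # Step 7: once a chassis has claimed the binding, report the
                   # logical port as up in the Northbound database.
                   for name, binding in sb_binding.items():
                       if binding["chassis"] is not None:
                           nb_logical_port[name]["up"] = True

               def controller_claim(chassis, local_iface_ids):
                   # Steps 5-6: on the hypervisor where the VM powers on, the new
                   # interface's external-ids:iface-id matches the logical port
                   # name, so ovn-controller records its chassis in the binding.
                   for name, binding in sb_binding.items():
                       if name in local_iface_ids:
                           binding["chassis"] = chassis

               nb_add_vif("vif-1234", "00:11:22:33:44:55", "ls0")
               northd_sync()
               controller_claim("hv1", {"vif-1234"})
               northd_sync()
               assert nb_logical_port["vif-1234"]["up"]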

   Life Cycle of a Container Interface Inside a VM
        OVN provides virtual network abstractions by converting information written in the
        OVN_NB database into OpenFlow flows on each hypervisor.  Secure virtual networking for
        multiple tenants can only be provided if ovn-controller is the only entity that can
        modify flows in Open vSwitch.  When the Open vSwitch integration bridge resides in the
        hypervisor, it is a fair assumption to make that tenant workloads running inside VMs
        cannot make any changes to Open vSwitch flows.

        If the infrastructure provider trusts the applications inside the containers not to
        break out and modify the Open vSwitch flows, then containers can be run directly on
        hypervisors.  This is also the case when containers are run inside VMs and the Open
        vSwitch integration bridge, with flows added by ovn-controller, resides in the same VM.
        For both of the above cases, the workflow is the same as explained with an example in
        the previous section ("Life Cycle of a VIF").

       This section talks about the life cycle of a container interface (CIF) when containers are
       created in the VMs and the Open vSwitch integration bridge resides inside the  hypervisor.
       In  this  case, even if a container application breaks out, other tenants are not affected
       because the containers running inside the VMs cannot modify the flows in the Open  vSwitch
       integration bridge.

        When multiple containers are created inside a VM, there are multiple CIFs associated
        with them.  The network traffic associated with these CIFs needs to reach the Open
        vSwitch integration bridge running in the hypervisor for OVN to support virtual network
        abstractions.  OVN should also be able to distinguish network traffic coming from
        different CIFs.  There are two ways to distinguish network traffic of CIFs.

       One way is to provide one VIF for every CIF (1:1 model).  This means that there could be a
       lot of network devices in the hypervisor.  This would slow down OVS  because  of  all  the
       additional  CPU cycles needed for the management of all the VIFs.  It would also mean that
       the entity creating the containers in a VM should also be able to create the corresponding
       VIFs in the hypervisor.

        The second way is to provide a single VIF for all the CIFs (1:many model).  OVN could
        then distinguish network traffic coming from different CIFs via a tag written in every
        packet.  OVN uses this mechanism, with VLAN as the tagging mechanism.
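
        The sketch below illustrates the 1:many model in Python: a CIF is identified by the
        parent VIF it shares and its VLAN tag, mirroring the parent_name and tag columns
        described in the steps that follow.  The table contents are invented examples.

               # Toy demultiplexer for the 1:many model: traffic arriving on a
               # parent VIF is assigned to a container interface (CIF) by its
               # VLAN tag.  Keys mirror the Logical_Port columns parent_name
               # and tag.
               cif_ports = {
                   # (parent vif-id, VLAN tag) -> CIF logical port name
                   ("vif-1234", 10): "cif-web",
                   ("vif-1234", 20): "cif-db",
               }

               def classify(parent_vif_id, vlan_tag):
                   # Return the CIF logical port for a tagged frame, or None if
                   # the tag is unknown (such traffic matches no CIF in OVN).
                   return cif_ports.get((parent_vif_id, vlan_tag))

               assert classify("vif-1234", 10) == "cif-web"
               assert classify("vif-1234", 99) is None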

              1.
                 A CIF’s life cycle begins when a container is spawned inside a VM by either the
                 same CMS that created the VM, a tenant that owns that VM, or even a container
                 orchestration system different from the CMS that initially created the VM.
                 Whoever the entity is, it will need to know the vif-id that is associated with
                 the network interface of the VM through which the container interface’s network
                 traffic is expected to go.  The entity that creates the container interface
                 will also need to choose an unused VLAN inside that VM.

              2.
                 The container spawning entity (either directly or through the CMS that manages
                 the underlying infrastructure) updates the OVN Northbound database to include
                 the new CIF, by adding a row to the Logical_Port table.  In the new row, name
                 is any unique identifier, parent_name is the vif-id of the VM through which the
                 CIF’s network traffic is expected to go, and tag is the VLAN tag that
                 identifies the network traffic of that CIF.

              3.
                ovn-northd receives the OVN Northbound database update.  In turn,  it  makes  the
                corresponding  updates  to the OVN Southbound database, by adding rows to the OVN
                Southbound database’s Logical_Flow table to reflect the  new  port  and  also  by
                creating a new row in the Binding table and populating all its columns except the
                column that identifies the chassis.

              4.
                 On every hypervisor, ovn-controller subscribes to the changes in the Binding
                 table.  When ovn-northd creates a new row that includes a value in the
                 parent_port column of the Binding table, the ovn-controller on the hypervisor
                 whose OVN integration bridge has an interface whose external-ids:iface-id
                 matches that value (the parent VM’s vif-id) updates the local hypervisor’s
                 OpenFlow tables so that packets to and from the VIF with the particular VLAN
                 tag are properly handled.  Afterward it updates the chassis column of the
                 Binding table to reflect the physical location.

              5.
                 One can only start the application inside the container after the underlying
                 network is ready.  To support this, ovn-northd notices the updated chassis
                 column in the Binding table and updates the up column in the OVN Northbound
                 database’s Logical_Port table to indicate that the CIF is now up.  The entity
                 responsible for starting the container application queries this value and
                 starts the application.

              6.
                 Eventually, the entity that created and started the container stops it.  The
                 entity, through the CMS (or directly), deletes the container’s row in the
                 Logical_Port table.

              7.
                ovn-northd  receives  the  OVN  Northbound  update  and  in  turn updates the OVN
                Southbound database accordingly, by removing or updating the rows  from  the  OVN
                Southbound  database  Logical_Flow  table  that were related to the now-destroyed
                CIF.  It also deletes the row in the Binding table for that CIF.

              8.
                On every hypervisor, ovn-controller receives the Logical_Flow table updates  that
                ovn-northd  made in the previous step.  ovn-controller updates OpenFlow tables to
                reflect the update.

   Architectural Physical Life Cycle of a Packet
       This section describes how a packet travels from  one  virtual  machine  or  container  to
       another  through OVN.  This description focuses on the physical treatment of a packet; for
       a description of the logical life cycle of a packet,  please  refer  to  the  Logical_Flow
       table in ovn-sb(5).

        This section mentions several data and metadata fields, summarized here for clarity (a
        short sketch of where OVN stores these fields follows the list):

              tunnel key
                     When  OVN  encapsulates  a  packet  in Geneve or another tunnel, it attaches
                     extra data to  it  to  allow  the  receiving  OVN  instance  to  process  it
                     correctly.    This   takes  different  forms  depending  on  the  particular
                     encapsulation, but in each case we refer to it here as the  ``tunnel  key.’’
                     See Tunnel Encapsulations, below, for details.

              logical datapath field
                     A  field  that  denotes the logical datapath through which a packet is being
                     processed.  OVN uses the field that OpenFlow 1.1+ simply  (and  confusingly)
                     calls  ``metadata’’  to  store  the logical datapath.  (This field is passed
                     across tunnels as part of the tunnel key.)

              logical input port field
                     A field that denotes the logical port from  which  the  packet  entered  the
                     logical datapath.  OVN stores this in Nicira extension register number 6.

                     Geneve  and STT tunnels pass this field as part of the tunnel key.  Although
                     VXLAN tunnels do not explicitly carry a logical input port,  OVN  only  uses
                     VXLAN  to  communicate  with gateways that from OVN’s perspective consist of
                     only a single logical port, so that OVN can set the logical input port field
                     to this one on ingress to the OVN logical pipeline.

              logical output port field
                     A  field  that denotes the logical port from which the packet will leave the
                     logical datapath.  This is initialized to 0 at the beginning of the  logical
                     ingress pipeline.  OVN stores this in Nicira extension register number 7.

                     Geneve  and  STT  tunnels  pass this field as part of the tunnel key.  VXLAN
                     tunnels do not transmit the logical output port field.

              conntrack zone field
                     A field that denotes the connection tracking zone.  The value only has local
                     significance  and is not meaningful between chassis.  This is initialized to
                     0 at the beginning of the logical ingress  pipeline.   OVN  stores  this  in
                     Nicira extension register number 5.

              VLAN ID
                     The VLAN ID is used as an interface between OVN and containers nested inside
                     a VM (see Life Cycle of a container interface inside a VM, above,  for  more
                     information).
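
        The small Python dictionary below restates where OVN stores each of these fields; it is
        only a summary of the descriptions above, not an interface exported by OVN.

               # Where OVN keeps its logical metadata, as described above.
               # Geneve and STT carry the datapath and port fields across
               # tunnels as part of the tunnel key; VXLAN does not carry the
               # port fields.  The conntrack zone is local to each chassis.
               LOGICAL_FIELDS = {
                   "logical datapath":     "OpenFlow 1.1+ metadata field",
                   "logical input port":   "Nicira extension register 6 (reg6)",
                   "logical output port":  "Nicira extension register 7 (reg7)",
                   "conntrack zone":       "Nicira extension register 5 (reg5)",
                   "nested-container tag": "VLAN ID on the parent VIF",
               }

               for field, location in LOGICAL_FIELDS.items():
                   print(f"{field:22} -> {location}")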

       Initially,  a  VM or container on the ingress hypervisor sends a packet on a port attached
       to the OVN integration bridge.  Then:

              1.
                OpenFlow table  0  performs  physical-to-logical  translation.   It  matches  the
                packet’s ingress port.  Its actions annotate the packet with logical metadata, by
                setting the logical datapath field to identify  the  logical  datapath  that  the
                packet  is  traversing  and  the logical input port field to identify the ingress
                port.  Then it resubmits to table 16 to enter the logical ingress pipeline.

                It’s possible that a single ingress physical port maps to multiple logical  ports
                with  a type of localnet. The logical datapath and logical input port fields will
                be reset and the packet will be resubmitted to table 16 multiple times.

                Packets that originate from a container nested within  a  VM  are  treated  in  a
                slightly  different way.  The originating container can be distinguished based on
                the  VIF-specific  VLAN  ID,  so  the   physical-to-logical   translation   flows
                additionally  match  on VLAN ID and the actions strip the VLAN header.  Following
                this step, OVN treats packets from containers just like any other packets.

                Table 0 also processes packets that arrive from other chassis.  It  distinguishes
                them from other packets by ingress port, which is a tunnel.  As with packets just
                entering the OVN pipeline,  the  actions  annotate  these  packets  with  logical
                datapath  and  logical  ingress  port metadata.  In addition, the actions set the
                logical output port field, which is available because  in  OVN  tunneling  occurs
                after  the  logical  output port is known.  These three pieces of information are
                obtained from the tunnel encapsulation metadata (see  Tunnel  Encapsulations  for
                encoding  details).   Then  the actions resubmit to table 33 to enter the logical
                egress pipeline.

              2.
                OpenFlow tables 16 through 31 execute  the  logical  ingress  pipeline  from  the
                Logical_Flow  table  in  the OVN Southbound database.  These tables are expressed
                entirely in terms of logical concepts like logical ports and  logical  datapaths.
                 A big part of ovn-controller’s job is to translate them into equivalent
                 OpenFlow (in particular it translates the table numbers: Logical_Flow tables 0
                 through 15 become OpenFlow tables 16 through 31; the sketch after this numbered
                 list restates this mapping).  For a given packet, the logical ingress pipeline
                 eventually executes zero or more output actions:

                •      If the  pipeline  executes  no  output  actions  at  all,  the  packet  is
                       effectively dropped.

                •      Most   commonly,   the   pipeline   executes   one  output  action,  which
                       ovn-controller implements by resubmitting the packet to table 32.

                •      If the pipeline can execute more than one output action, then each one  is
                       separately  resubmitted  to  table  32.  This can be used to send multiple
                       copies of the packet to multiple ports.  (If the packet was  not  modified
                       between  the  output  actions,  and some of the copies are destined to the
                       same hypervisor, then using a logical multicast  output  port  would  save
                       bandwidth between hypervisors.)

              3.
                OpenFlow  tables 32 through 47 implement the output action in the logical ingress
                pipeline.  Specifically, table 32 handles packets to remote hypervisors, table 33
                handles  packets  to  the  local  hypervisor, and table 34 discards packets whose
                logical ingress and egress port are the same.

                Logical patch ports are a special case.   Logical  patch  ports  do  not  have  a
                physical  location  and effectively reside on every hypervisor.  Thus, flow table
                33, for output to ports on the local hypervisor, naturally implements  output  to
                unicast  logical  patch ports too.  However, applying the same logic to a logical
                patch port that is part of a logical multicast group yields  packet  duplication,
                because  each hypervisor that contains a logical port in the multicast group will
                also output the packet  to  the  logical  patch  port.   Thus,  multicast  groups
                implement output to logical patch ports in table 32.

                Each  flow  in table 32 matches on a logical output port for unicast or multicast
                logical ports that include a logical port on a remote  hypervisor.   Each  flow’s
                actions  implement  sending a packet to the port it matches.  For unicast logical
                output ports on remote hypervisors, the actions set the tunnel key to the correct
                value,  then send the packet on the tunnel port to the correct hypervisor.  (When
                the remote hypervisor receives the packet, table 0 there will recognize it  as  a
                tunneled  packet  and  pass  it along to table 33.)  For multicast logical output
                ports, the actions send one copy of the packet to each remote hypervisor, in  the
                same  way  as  for unicast destinations.  If a multicast group includes a logical
                port or ports on the local hypervisor, then its actions also  resubmit  to  table
                33.   Table  32 also includes a fallback flow that resubmits to table 33 if there
                is no other match.

                Flows in table 33 resemble those in table 32 but for logical  ports  that  reside
                locally  rather  than  remotely.   For  unicast logical output ports on the local
                hypervisor, the actions just resubmit to table 34.  For  multicast  output  ports
                that  include  one  or  more logical ports on the local hypervisor, for each such
                logical port P, the actions change the logical output port to P, then resubmit to
                table 34.

                Table  34  matches and drops packets for which the logical input and output ports
                are the same.  It resubmits other packets to table 48.

              4.
                OpenFlow tables 48 through 63  execute  the  logical  egress  pipeline  from  the
                Logical_Flow  table  in  the  OVN  Southbound  database.  The egress pipeline can
                perform a final stage of validation before packet delivery.  Eventually,  it  may
                execute  an  output  action,  which  ovn-controller implements by resubmitting to
                table 64.  A packet for which the pipeline never executes output  is  effectively
                dropped (although it may have been transmitted through a tunnel across a physical
                network).

                The egress pipeline cannot change  the  logical  output  port  or  cause  further
                tunneling.

              5.
                OpenFlow table 64 performs logical-to-physical translation, the opposite of table
                0.  It matches the packet’s logical egress port.  Its actions output  the  packet
                to  the  port attached to the OVN integration bridge that represents that logical
                 port.  If the logical egress port is a container nested within a VM, then before
                sending the packet the actions push on a VLAN header with an appropriate VLAN ID.

                If  the  logical egress port is a logical patch port, then table 64 outputs to an
                OVS patch port that represents the logical patch port.  The packet re-enters  the
                OpenFlow  flow  table from the OVS patch port’s peer in table 0, which identifies
                the logical datapath and logical  input  port  based  on  the  OVS  patch  port’s
                OpenFlow port number.
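
        The Python sketch below restates the OpenFlow table layout used in the steps above and
        the translation of Logical_Flow table numbers that ovn-controller performs.  The numbers
        mirror the description above for this version of OVN; they are an internal detail of
        ovn-controller, not a stable interface.

               # OpenFlow table layout described above:
               #   0      physical-to-logical translation
               #   16-31  logical ingress pipeline (Logical_Flow tables 0-15)
               #   32-34  output: remote (32), local (33), same-port drop (34)
               #   48-63  logical egress pipeline (Logical_Flow tables 0-15)
               #   64     logical-to-physical translation
               INGRESS_FIRST_TABLE = 16
               EGRESS_FIRST_TABLE = 48
               LOGICAL_TABLES = 16       # Logical_Flow tables 0 through 15

               def openflow_table(logical_table, pipeline):
                   # Translate a Logical_Flow table number into the OpenFlow
                   # table used for the given pipeline ("ingress" or "egress").
                   if not 0 <= logical_table < LOGICAL_TABLES:
                       raise ValueError("Logical_Flow table out of range")
                   if pipeline == "ingress":
                       return INGRESS_FIRST_TABLE + logical_table
                   return EGRESS_FIRST_TABLE + logical_table

               assert openflow_table(0, "ingress") == 16
               assert openflow_table(15, "ingress") == 31
               assert openflow_table(0, "egress") == 48
               assert openflow_table(15, "egress") == 63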

   Life Cycle of a VTEP gateway
       A  gateway  is  a  chassis that forwards traffic between the OVN-managed part of a logical
       network and a physical VLAN,  extending a tunnel-based logical  network  into  a  physical
       network.

       The  steps  below refer often to details of the OVN and VTEP database schemas.  Please see
       ovn-sb(5), ovn-nb(5) and vtep(5), respectively, for the full story on these databases.

              1.
                 A VTEP gateway’s life cycle begins with the administrator registering the VTEP
                 gateway as a Physical_Switch table entry in the VTEP database.  The
                 ovn-controller-vtep connected to this VTEP database will recognize the new VTEP
                 gateway and create a new Chassis table entry for it in the OVN_Southbound
                 database.

              2.
                 The administrator can then create a new Logical_Switch table entry, and bind a
                 particular VLAN on a VTEP gateway’s port to any VTEP logical switch.  Once a
                 VTEP logical switch is bound to a VTEP gateway, the ovn-controller-vtep will
                 detect it and add its name to the vtep_logical_switches column of the Chassis
                 table in the OVN_Southbound database.  Note that the tunnel_key column of the
                 VTEP logical switch is not filled at creation.  The ovn-controller-vtep will
                 set the column when the corresponding VTEP logical switch is bound to an OVN
                 logical network.

              3.
                 Now, the administrator can use the CMS to add a VTEP logical switch to the OVN
                 logical network.  To do that, the CMS must first create a new Logical_Port
                 table entry in the OVN_Northbound database.  Then, the type column of this
                 entry must be set to "vtep".  Next, the vtep-logical-switch and
                 vtep-physical-switch keys in the options column must also be specified, since
                 multiple VTEP gateways can attach to the same VTEP logical switch.  (The sketch
                 after this list shows such an entry.)

              4.
                 The newly created logical port in the OVN_Northbound database and its
                 configuration will be passed down to the OVN_Southbound database as a new
                 Port_Binding table entry.  The ovn-controller-vtep will recognize the change
                 and bind the logical port to the corresponding VTEP gateway chassis.  Binding
                 the same VTEP logical switch to different OVN logical networks is not allowed,
                 and a warning will be generated in the log.

              5.
                 Besides binding to the VTEP gateway chassis, the ovn-controller-vtep will update
                the   tunnel_key   column  of  the  VTEP  logical  switch  to  the  corresponding
                Datapath_Binding table entry’s tunnel_key for the bound OVN logical network.

              6.
                 Next, the ovn-controller-vtep will keep reacting to configuration changes in
                 the Port_Binding table in the OVN_Southbound database, and updating the
                 Ucast_Macs_Remote table in the VTEP database.  This allows the VTEP gateway to
                 understand where to forward the unicast traffic coming from the extended
                 external network.

              7.
                Eventually, the VTEP gateway’s life cycle ends when the administrator unregisters
                the  VTEP gateway from the VTEP database.  The ovn-controller-vtep will recognize
                the event and remove all related configurations (Chassis  table  entry  and  port
                bindings) in the OVN_Southbound database.

              8.
                When  the  ovn-controller-vtep  is  terminated, all related configurations in the
                OVN_Southbound database and the VTEP database will be cleaned, including  Chassis
                table  entries  for all registered VTEP gateways and their port bindings, and all
                Ucast_Macs_Remote table entries and the Logical_Switch tunnel keys.
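
        A minimal sketch of the Logical_Port entry described in step 3 above, written as a
        Python dictionary rather than an actual OVSDB transaction; the name, switch, and key
        values are invented examples.

               # Toy representation of the OVN_Northbound Logical_Port entry
               # that attaches a VTEP logical switch to an OVN logical network.
               vtep_logical_port = {
                   "name": "lport-vtep-example",   # any unique name (example)
                   "type": "vtep",
                   "options": {
                       # Both keys must be specified, since multiple VTEP
                       # gateways can attach to the same VTEP logical switch.
                       "vtep-logical-switch": "vtep-ls0",       # example
                       "vtep-physical-switch": "tor-switch-1",  # example
                   },
               }

               # ovn-northd turns this into a Port_Binding entry in the
               # OVN_Southbound database; ovn-controller-vtep then binds it to
               # the corresponding VTEP gateway chassis.
               print(vtep_logical_port)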

DESIGN DECISIONS

   Tunnel Encapsulations
       OVN annotates logical network packets that it sends from one hypervisor  to  another  with
       the  following  three  pieces  of metadata, which are encoded in an encapsulation-specific
       fashion:

              •      24-bit logical datapath identifier, from the tunnel_key column  in  the  OVN
                     Southbound Datapath_Binding table.

              •      15-bit  logical  ingress port identifier.  ID 0 is reserved for internal use
                     within OVN.  IDs 1 through 32767, inclusive,  may  be  assigned  to  logical
                     ports (see the tunnel_key column in the OVN Southbound Port_Binding table).

              •      16-bit  logical  egress  port identifier.  IDs 0 through 32767 have the same
                     meaning as for logical ingress ports.  IDs 32768 through  65535,  inclusive,
                     may  be  assigned  to logical multicast groups (see the tunnel_key column in
                     the OVN Southbound Multicast_Group table).

       For hypervisor-to-hypervisor traffic, OVN supports only Geneve and STT encapsulations, for
       the following reasons:

              •      Only  STT and Geneve support the large amounts of metadata (over 32 bits per
                     packet) that OVN uses (as described above).

               •      STT and Geneve use randomized UDP or TCP source ports that allow efficient
                     distribution  among  multiple  paths  in environments that use ECMP in their
                     underlay.

              •      NICs  are  available  to  offload   STT   and   Geneve   encapsulation   and
                     decapsulation.

       Due  to  its  flexibility, the preferred encapsulation between hypervisors is Geneve.  For
       Geneve encapsulation, OVN transmits the logical datapath identifier  in  the  Geneve  VNI.
       OVN  transmits  the  logical  ingress and logical egress ports in a TLV with class 0x0102,
       type 0, and a 32-bit value encoded as follows, from MSB to LSB:

               •      1 bit: rsv (0)

              •      15 bits: ingress port

              •      16 bits: egress port

        Environments whose NICs lack Geneve offload may prefer STT encapsulation for performance
        reasons.  For STT encapsulation, OVN encodes all three pieces of logical metadata in the
        STT 64-bit tunnel ID as follows, from MSB to LSB (the sketch after this list encodes
        both layouts):

              •      9 bits: reserved (0)

              •      15 bits: ingress port

              •      16 bits: egress port

              •      24 bits: datapath
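
        The bit layouts above can be made concrete with a short Python sketch that packs and
        unpacks the Geneve option value and the STT tunnel ID.  This illustrates the documented
        layouts only; it is not code taken from OVN.

               # Geneve: 32-bit option value (class 0x0102, type 0):
               #   1-bit reserved, 15-bit ingress port, 16-bit egress port.
               #   (The 24-bit logical datapath travels in the Geneve VNI.)
               # STT: 64-bit tunnel ID, from MSB to LSB:
               #   9 bits reserved, 15-bit ingress port, 16-bit egress port,
               #   24-bit datapath.

               def geneve_option(ingress_port, egress_port):
                   assert 0 <= ingress_port < 1 << 15
                   assert 0 <= egress_port < 1 << 16
                   return (ingress_port << 16) | egress_port

               def stt_tunnel_id(datapath, ingress_port, egress_port):
                   assert 0 <= datapath < 1 << 24
                   assert 0 <= ingress_port < 1 << 15
                   assert 0 <= egress_port < 1 << 16
                   return (ingress_port << 40) | (egress_port << 24) | datapath

               def stt_unpack(tunnel_id):
                   # Returns (datapath, ingress port, egress port).
                   return (tunnel_id & 0xFFFFFF,
                           (tunnel_id >> 40) & 0x7FFF,
                           (tunnel_id >> 24) & 0xFFFF)

               assert geneve_option(5, 32769) == (5 << 16) | 32769
               assert stt_unpack(stt_tunnel_id(0xABCDEF, 5, 32769)) == \
                      (0xABCDEF, 5, 32769)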

       For connecting to gateways, in addition to Geneve and STT,  OVN  supports  VXLAN,  because
       only  VXLAN  support  is common on top-of-rack (ToR) switches.  Currently, gateways have a
       feature set that matches the capabilities as defined by the VTEP schema, so fewer bits  of
       metadata  are  necessary.  In the future, gateways that do not support encapsulations with
       large amounts of metadata may continue to have a reduced feature set.