information that can be used to let us know how we can make Open vSwitch
more generally useful.
+Asynchronous Messages
+=====================
+
+Over time, Open vSwitch has added many knobs that control whether a
+given controller receives OpenFlow asynchronous messages. This
+section describes how all of these features interact.
+
+First, a service controller never receives any asynchronous messages
+unless it changes its miss_send_len from the service controller
+default of zero in one of the following ways:
+
+ - Sending an OFPT_SET_CONFIG message with nonzero miss_send_len.
+
+ - Sending any NXT_SET_ASYNC_CONFIG message: as a side effect, this
+ message changes the miss_send_len to
+ OFP_DEFAULT_MISS_SEND_LEN (128) for service controllers.
+
+Second, OFPT_FLOW_REMOVED and NXT_FLOW_REMOVED messages are generated
+only if the flow that was removed had the OFPFF_SEND_FLOW_REM flag
+set.
+
+Third, OFPT_PACKET_IN and NXT_PACKET_IN messages are sent only to
+OpenFlow controller connections that have the correct connection ID
+(see "struct nx_controller_id" and "struct nx_action_controller"):
+
+ - For packet-in messages generated by a NXAST_CONTROLLER action,
+ the controller ID specified in the action.
+
+ - For other packet-in messages, controller ID zero. (This is the
+ default ID when an OpenFlow controller does not configure one.)
+
+Finally, Open vSwitch consults a per-connection table indexed by the
+message type, reason code, and current role. The following table
+shows how this table is initialized by default when an OpenFlow
+connection is made. An entry labeled "yes" means that the message is
+sent, an entry labeled "---" means that the message is suppressed.
+
+ master/
+ message and reason code other slave
+ ---------------------------------------- ------- -----
+ OFPT_PACKET_IN / NXT_PACKET_IN
+ OFPR_NO_MATCH yes ---
+ OFPR_ACTION yes ---
+ OFPR_INVALID_TTL --- ---
+
+ OFPT_FLOW_REMOVED / NXT_FLOW_REMOVED
+ OFPRR_IDLE_TIMEOUT yes ---
+ OFPRR_HARD_TIMEOUT yes ---
+ OFPRR_DELETE yes ---
+
+ OFPT_PORT_STATUS
+ OFPPR_ADD yes yes
+ OFPPR_DELETE yes yes
+ OFPPR_MODIFY yes yes
+
+The NXT_SET_ASYNC_CONFIG message directly sets all of the values in
+this table for the current connection. The
+OFPC_INVALID_TTL_TO_CONTROLLER bit in the OFPT_SET_CONFIG message
+controls the setting for OFPR_INVALID_TTL for the "master" role.
+
+
+OFPAT_ENQUEUE
+=============
+
+The OpenFlow 1.0 specification requires the output port of the OFPAT_ENQUEUE
+action to "refer to a valid physical port (i.e. < OFPP_MAX) or OFPP_IN_PORT".
+Although OFPP_LOCAL is not less than OFPP_MAX, it is an 'internal' port which
+can have QoS applied to it in Linux. Since we allow the OFPAT_ENQUEUE to apply
+to 'internal' ports whose port numbers are less than OFPP_MAX, we interpret
+OFPP_LOCAL as a physical port and support OFPAT_ENQUEUE on it as well.
+
+
+OFPT_FLOW_MOD
+=============
+
+The OpenFlow 1.0 specification for the behavior of OFPT_FLOW_MOD is
+confusing. The following table summarizes the Open vSwitch
+implementation of its behavior in the following categories:
+
+ - "match on priority": Whether the flow_mod acts only on flows
+ whose priority matches that included in the flow_mod message.
+
+ - "match on out_port": Whether the flow_mod acts only on flows
+ that output to the out_port included in the flow_mod message (if
+ out_port is not OFPP_NONE).
+
+ - "updates flow_cookie": Whether the flow_mod changes the
+ flow_cookie of the flow or flows that it matches to the
+ flow_cookie included in the flow_mod message.
+
+ - "updates OFPFF_ flags": Whether the flow_mod changes the
+ OFPFF_SEND_FLOW_REM flag of the flow or flows that it matches to
+ the setting included in the flags of the flow_mod message.
+
+ - "honors OFPFF_CHECK_OVERLAP": Whether the OFPFF_CHECK_OVERLAP
+ flag in the flow_mod is significant.
+
+ - "updates idle_timeout" and "updates hard_timeout": Whether the
+ idle_timeout and hard_timeout in the flow_mod, respectively,
+ have an effect on the flow or flows matched by the flow_mod.
+
+ - "updates idle timer": Whether the flow_mod resets the per-flow
+ timer that measures how long a flow has been idle.
+
+ - "updates hard timer": Whether the flow_mod resets the per-flow
+ timer that measures how long it has been since a flow was
+ modified.
+
+ - "zeros counters": Whether the flow_mod resets per-flow packet
+ and byte counters to zero.
+
+ - "sends flow_removed message": Whether the flow_mod generates a
+ flow_removed message for the flow or flows that it affects.
+
+An entry labeled "yes" means that the flow mod type does have the
+indicated behavior, "---" means that it does not, an empty cell means
+that the property is not applicable, and other values are explained
+below the table.
+
+ MODIFY DELETE
+ ADD MODIFY STRICT DELETE STRICT
+ === ====== ====== ====== ======
+match on priority --- --- yes --- yes
+match on out_port --- --- --- yes yes
+updates flow_cookie yes yes yes
+updates OFPFF_SEND_FLOW_REM yes + +
+honors OFPFF_CHECK_OVERLAP yes + +
+updates idle_timeout yes + +
+updates hard_timeout yes + +
+resets idle timer yes + +
+resets hard timer yes yes yes
+zeros counters yes + +
+sends flow_removed message --- --- --- % %
+
+(+) "modify" and "modify-strict" only take these actions when they
+ create a new flow, not when they update an existing flow.
+
+(%) "delete" and "delete_strict" generates a flow_removed message if
+ the deleted flow or flows have the OFPFF_SEND_FLOW_REM flag set.
+ (Each controller can separately control whether it wants to
+ receive the generated messages.)
+
+
+Flow Cookies
+============
+
+OpenFlow 1.0 and later versions have the concept of a "flow cookie",
+which is a 64-bit integer value attached to each flow. The treatment
+of the flow cookie has varied greatly across OpenFlow versions,
+however.
+
+In OpenFlow 1.0:
+
+ - OFPFC_ADD set the cookie in the flow that it added.
+
+ - OFPFC_MODIFY and OFPFC_MODIFY_STRICT updated the cookie for
+ the flow or flows that it modified.
+
+ - OFPST_FLOW messages included the flow cookie.
+
+ - OFPT_FLOW_REMOVED messages reported the cookie of the flow
+ that was removed.
+
+OpenFlow 1.1 made the following changes:
+
+ - Flow mod operations OFPFC_MODIFY, OFPFC_MODIFY_STRICT,
+ OFPFC_DELETE, and OFPFC_DELETE_STRICT, plus flow stats
+ requests and aggregate stats requests, gained the ability to
+ match on flow cookies with an arbitrary mask.
+
+ - OFPFC_MODIFY and OFPFC_MODIFY_STRICT were changed to add a
+ new flow, in the case of no match, only if the flow table
+ modification operation did not match on the cookie field.
+ (In OpenFlow 1.0, modify operations always added a new flow
+ when there was no match.)
+
+ - OFPFC_MODIFY and OFPFC_MODIFY_STRICT no longer updated flow
+ cookies.
+
+OpenFlow 1.2 made the following changes:
+
+ - OFPC_MODIFY and OFPFC_MODIFY_STRICT were changed to never
+ add a new flow, regardless of whether the flow cookie was
+ used for matching.
+
+Open vSwitch support for OpenFlow 1.0 implements the OpenFlow 1.0
+behavior with the following extensions:
+
+ - An NXM extension field NXM_NX_COOKIE(_W) allows the NXM
+ versions of OFPFC_MODIFY, OFPFC_MODIFY_STRICT, OFPFC_DELETE,
+ and OFPFC_DELETE_STRICT flow_mods, plus flow stats requests
+ and aggregate stats requests, to match on flow cookies with
+ arbitrary masks. This is much like the equivalent OpenFlow
+ 1.1 feature.
+
+ - Like OpenFlow 1.1, OFPC_MODIFY and OFPFC_MODIFY_STRICT add a
+ new flow if there is no match and the mask is zero (or not
+ given).
+
+ - The "cookie" field in OFPT_FLOW_MOD and NXT_FLOW_MOD messages
+ is used as the cookie value for OFPFC_ADD commands, as
+ described in OpenFlow 1.0. For OFPFC_MODIFY and
+ OFPFC_MODIFY_STRICT commands, the "cookie" field is used as a
+ new cookie for flows that match unless it is UINT64_MAX, in
+ which case the flow's cookie is not updated.
+
+ - NXT_PACKET_IN (the Nicira extended version of
+ OFPT_PACKET_IN) reports the cookie of the rule that
+ generated the packet, or all-1-bits if no rule generated the
+ packet. (Older versions of OVS used all-0-bits instead of
+ all-1-bits.)
+
+The following table shows the handling of different protocols when
+receiving OFPFC_MODIFY and OFPFC_MODIFY_STRICT messages. A mask of 0
+indicates either an explicit mask of zero or an implicit one by not
+specifying the NXM_NX_COOKIE(_W) field.
+
+ Match Update Add on miss Add on miss
+ cookie cookie mask!=0 mask==0
+ ====== ====== =========== ===========
+OpenFlow 1.0 no yes <always add on miss>
+OpenFlow 1.1 yes no no yes
+OpenFlow 1.2 yes no no no
+NXM yes yes* no yes
+
+* Updates the flow's cookie unless the "cookie" field is UINT64_MAX.
+
+
+Multiple Table Support
+======================
+
+OpenFlow 1.0 has only rudimentary support for multiple flow tables.
+Notably, OpenFlow 1.0 does not allow the controller to specify the
+flow table to which a flow is to be added. Open vSwitch adds an
+extension for this purpose, which is enabled on a per-OpenFlow
+connection basis using the NXT_FLOW_MOD_TABLE_ID message. When the
+extension is enabled, the upper 8 bits of the 'command' member in an
+OFPT_FLOW_MOD or NXT_FLOW_MOD message designates the table to which a
+flow is to be added.
+
+The Open vSwitch software switch implementation offers 255 flow
+tables. On packet ingress, only the first flow table (table 0) is
+searched, and the contents of the remaining tables are not considered
+in any way. Tables other than table 0 only come into play when an
+NXAST_RESUBMIT_TABLE action specifies another table to search.
+
+Tables 128 and above are reserved for use by the switch itself.
+Controllers should use only tables 0 through 127.
+
IPv6
====
Open vSwitch supports stateless handling of IPv6 packets. Flows can be
written to support matching TCP, UDP, and ICMPv6 headers within an IPv6
-packet.
+packet. Deeper matching of some Neighbor Discovery messages is also
+supported.
IPv6 was not designed to interact well with middle-boxes. This,
combined with Open vSwitch's stateless nature, have affected the
vSwitch doesn't process jumbograms.
+In-Band Control
+===============
+
+Motivation
+----------
+
+An OpenFlow switch must establish and maintain a TCP network
+connection to its controller. There are two basic ways to categorize
+the network that this connection traverses: either it is completely
+separate from the one that the switch is otherwise controlling, or its
+path may overlap the network that the switch controls. We call the
+former case "out-of-band control", the latter case "in-band control".
+
+Out-of-band control has the following benefits:
+
+ - Simplicity: Out-of-band control slightly simplifies the switch
+ implementation.
+
+ - Reliability: Excessive switch traffic volume cannot interfere
+ with control traffic.
+
+ - Integrity: Machines not on the control network cannot
+ impersonate a switch or a controller.
+
+ - Confidentiality: Machines not on the control network cannot
+ snoop on control traffic.
+
+In-band control, on the other hand, has the following advantages:
+
+ - No dedicated port: There is no need to dedicate a physical
+ switch port to control, which is important on switches that have
+ few ports (e.g. wireless routers, low-end embedded platforms).
+
+ - No dedicated network: There is no need to build and maintain a
+ separate control network. This is important in many
+ environments because it reduces proliferation of switches and
+ wiring.
+
+Open vSwitch supports both out-of-band and in-band control. This
+section describes the principles behind in-band control. See the
+description of the Controller table in ovs-vswitchd.conf.db(5) to
+configure OVS for in-band control.
+
+Principles
+----------
+
+The fundamental principle of in-band control is that an OpenFlow
+switch must recognize and switch control traffic without involving the
+OpenFlow controller. All the details of implementing in-band control
+are special cases of this principle.
+
+The rationale for this principle is simple. If the switch does not
+handle in-band control traffic itself, then it will be caught in a
+contradiction: it must contact the controller, but it cannot, because
+only the controller can set up the flows that are needed to contact
+the controller.
+
+The following points describe important special cases of this
+principle.
+
+ - In-band control must be implemented regardless of whether the
+ switch is connected.
+
+ It is tempting to implement the in-band control rules only when
+ the switch is not connected to the controller, using the
+ reasoning that the controller should have complete control once
+ it has established a connection with the switch.
+
+ This does not work in practice. Consider the case where the
+ switch is connected to the controller. Occasionally it can
+ happen that the controller forgets or otherwise needs to obtain
+ the MAC address of the switch. To do so, the controller sends a
+ broadcast ARP request. A switch that implements the in-band
+ control rules only when it is disconnected will then send an
+ OFPT_PACKET_IN message up to the controller. The controller will
+ be unable to respond, because it does not know the MAC address of
+ the switch. This is a deadlock situation that can only be
+ resolved by the switch noticing that its connection to the
+ controller has hung and reconnecting.
+
+ - In-band control must override flows set up by the controller.
+
+ It is reasonable to assume that flows set up by the OpenFlow
+ controller should take precedence over in-band control, on the
+ basis that the controller should be in charge of the switch.
+
+ Again, this does not work in practice. Reasonable controller
+ implementations may set up a "last resort" fallback rule that
+ wildcards every field and, e.g., sends it up to the controller or
+ discards it. If a controller does that, then it will isolate
+ itself from the switch.
+
+ - The switch must recognize all control traffic.
+
+ The fundamental principle of in-band control states, in part,
+ that a switch must recognize control traffic without involving
+ the OpenFlow controller. More specifically, the switch must
+ recognize *all* control traffic. "False negatives", that is,
+ packets that constitute control traffic but that the switch does
+ not recognize as control traffic, lead to control traffic storms.
+
+ Consider an OpenFlow switch that only recognizes control packets
+ sent to or from that switch. Now suppose that two switches of
+ this type, named A and B, are connected to ports on an Ethernet
+ hub (not a switch) and that an OpenFlow controller is connected
+ to a third hub port. In this setup, control traffic sent by
+ switch A will be seen by switch B, which will send it to the
+ controller as part of an OFPT_PACKET_IN message. Switch A will
+ then see the OFPT_PACKET_IN message's packet, re-encapsulate it
+ in another OFPT_PACKET_IN, and send it to the controller. Switch
+ B will then see that OFPT_PACKET_IN, and so on in an infinite
+ loop.
+
+ Incidentally, the consequences of "false positives", where
+ packets that are not control traffic are nevertheless recognized
+ as control traffic, are much less severe. The controller will
+ not be able to control their behavior, but the network will
+ remain in working order. False positives do constitute a
+ security problem.
+
+ - The switch should use echo-requests to detect disconnection.
+
+ TCP will notice that a connection has hung, but this can take a
+ considerable amount of time. For example, with default settings
+ the Linux kernel TCP implementation will retransmit for between
+ 13 and 30 minutes, depending on the connection's retransmission
+ timeout, according to kernel documentation. This is far too long
+ for a switch to be disconnected, so an OpenFlow switch should
+ implement its own connection timeout. OpenFlow OFPT_ECHO_REQUEST
+ messages are the best way to do this, since they test the
+ OpenFlow connection itself.
+
+Implementation
+--------------
+
+This section describes how Open vSwitch implements in-band control.
+Correctly implementing in-band control has proven difficult due to its
+many subtleties, and has thus gone through many iterations. Please
+read through and understand the reasoning behind the chosen rules
+before making modifications.
+
+Open vSwitch implements in-band control as "hidden" flows, that is,
+flows that are not visible through OpenFlow, and at a higher priority
+than wildcarded flows can be set up through OpenFlow. This is done so
+that the OpenFlow controller cannot interfere with them and possibly
+break connectivity with its switches. It is possible to see all
+flows, including in-band ones, with the ovs-appctl "bridge/dump-flows"
+command.
+
+The Open vSwitch implementation of in-band control can hide traffic to
+arbitrary "remotes", where each remote is one TCP port on one IP address.
+Currently the remotes are automatically configured as the in-band OpenFlow
+controllers plus the OVSDB managers, if any. (The latter is a requirement
+because OVSDB managers are responsible for configuring OpenFlow controllers,
+so if the manager cannot be reached then OpenFlow cannot be reconfigured.)
+
+The following rules (with the OFPP_NORMAL action) are set up on any bridge
+that has any remotes:
+
+ (a) DHCP requests sent from the local port.
+ (b) ARP replies to the local port's MAC address.
+ (c) ARP requests from the local port's MAC address.
+
+In-band also sets up the following rules for each unique next-hop MAC
+address for the remotes' IPs (the "next hop" is either the remote
+itself, if it is on a local subnet, or the gateway to reach the remote):
+
+ (d) ARP replies to the next hop's MAC address.
+ (e) ARP requests from the next hop's MAC address.
+
+In-band also sets up the following rules for each unique remote IP address:
+
+ (f) ARP replies containing the remote's IP address as a target.
+ (g) ARP requests containing the remote's IP address as a source.
+
+In-band also sets up the following rules for each unique remote (IP,port)
+pair:
+
+ (h) TCP traffic to the remote's IP and port.
+ (i) TCP traffic from the remote's IP and port.
+
+The goal of these rules is to be as narrow as possible to allow a
+switch to join a network and be able to communicate with the
+remotes. As mentioned earlier, these rules have higher priority
+than the controller's rules, so if they are too broad, they may
+prevent the controller from implementing its policy. As such,
+in-band actively monitors some aspects of flow and packet processing
+so that the rules can be made more precise.
+
+In-band control monitors attempts to add flows into the datapath that
+could interfere with its duties. The datapath only allows exact
+match entries, so in-band control is able to be very precise about
+the flows it prevents. Flows that miss in the datapath are sent to
+userspace to be processed, so preventing these flows from being
+cached in the "fast path" does not affect correctness. The only type
+of flow that is currently prevented is one that would prevent DHCP
+replies from being seen by the local port. For example, a rule that
+forwarded all DHCP traffic to the controller would not be allowed,
+but one that forwarded to all ports (including the local port) would.
+
+As mentioned earlier, packets that miss in the datapath are sent to
+the userspace for processing. The userspace has its own flow table,
+the "classifier", so in-band checks whether any special processing
+is needed before the classifier is consulted. If a packet is a DHCP
+response to a request from the local port, the packet is forwarded to
+the local port, regardless of the flow table. Note that this requires
+L7 processing of DHCP replies to determine whether the 'chaddr' field
+matches the MAC address of the local port.
+
+It is interesting to note that for an L3-based in-band control
+mechanism, the majority of rules are devoted to ARP traffic. At first
+glance, some of these rules appear redundant. However, each serves an
+important role. First, in order to determine the MAC address of the
+remote side (controller or gateway) for other ARP rules, we must allow
+ARP traffic for our local port with rules (b) and (c). If we are
+between a switch and its connection to the remote, we have to
+allow the other switch's ARP traffic to through. This is done with
+rules (d) and (e), since we do not know the addresses of the other
+switches a priori, but do know the remote's or gateway's. Finally,
+if the remote is running in a local guest VM that is not reached
+through the local port, the switch that is connected to the VM must
+allow ARP traffic based on the remote's IP address, since it will
+not know the MAC address of the local port that is sending the traffic
+or the MAC address of the remote in the guest VM.
+
+With a few notable exceptions below, in-band should work in most
+network setups. The following are considered "supported' in the
+current implementation:
+
+ - Locally Connected. The switch and remote are on the same
+ subnet. This uses rules (a), (b), (c), (h), and (i).
+
+ - Reached through Gateway. The switch and remote are on
+ different subnets and must go through a gateway. This uses
+ rules (a), (b), (c), (h), and (i).
+
+ - Between Switch and Remote. This switch is between another
+ switch and the remote, and we want to allow the other
+ switch's traffic through. This uses rules (d), (e), (h), and
+ (i). It uses (b) and (c) indirectly in order to know the MAC
+ address for rules (d) and (e). Note that DHCP for the other
+ switch will not work unless an OpenFlow controller explicitly lets this
+ switch pass the traffic.
+
+ - Between Switch and Gateway. This switch is between another
+ switch and the gateway, and we want to allow the other switch's
+ traffic through. This uses the same rules and logic as the
+ "Between Switch and Remote" configuration described earlier.
+
+ - Remote on Local VM. The remote is a guest VM on the
+ system running in-band control. This uses rules (a), (b), (c),
+ (h), and (i).
+
+ - Remote on Local VM with Different Networks. The remote
+ is a guest VM on the system running in-band control, but the
+ local port is not used to connect to the remote. For
+ example, an IP address is configured on eth0 of the switch. The
+ remote's VM is connected through eth1 of the switch, but an
+ IP address has not been configured for that port on the switch.
+ As such, the switch will use eth0 to connect to the remote,
+ and eth1's rules about the local port will not work. In the
+ example, the switch attached to eth0 would use rules (a), (b),
+ (c), (h), and (i) on eth0. The switch attached to eth1 would use
+ rules (f), (g), (h), and (i).
+
+The following are explicitly *not* supported by in-band control:
+
+ - Specify Remote by Name. Currently, the remote must be
+ identified by IP address. A naive approach would be to permit
+ all DNS traffic. Unfortunately, this would prevent the
+ controller from defining any policy over DNS. Since switches
+ that are located behind us need to connect to the remote,
+ in-band cannot simply add a rule that allows DNS traffic from
+ the local port. The "correct" way to support this is to parse
+ DNS requests to allow all traffic related to a request for the
+ remote's name through. Due to the potential security
+ problems and amount of processing, we decided to hold off for
+ the time-being.
+
+ - Differing Remotes for Switches. All switches must know
+ the L3 addresses for all the remotes that other switches
+ may use, since rules need to be set up to allow traffic related
+ to those remotes through. See rules (f), (g), (h), and (i).
+
+ - Differing Routes for Switches. In order for the switch to
+ allow other switches to connect to a remote through a
+ gateway, it allows the gateway's traffic through with rules (d)
+ and (e). If the routes to the remote differ for the two
+ switches, we will not know the MAC address of the alternate
+ gateway.
+
+
+Action Reproduction
+===================
+
+It seems likely that many controllers, at least at startup, use the
+OpenFlow "flow statistics" request to obtain existing flows, then
+compare the flows' actions against the actions that they expect to
+find. Before version 1.8.0, Open vSwitch always returned exact,
+byte-for-byte copies of the actions that had been added to the flow
+table. The current version of Open vSwitch does not always do this in
+some exceptional cases. This section lists the exceptions that
+controller authors must keep in mind if they compare actual actions
+against desired actions in a bytewise fashion:
+
+ - Open vSwitch zeros padding bytes in action structures,
+ regardless of their values when the flows were added.
+
+ - Open vSwitch "normalizes" the instructions in OpenFlow 1.1
+ (and later) in the following way:
+
+ * OVS sorts the instructions into the following order:
+ Apply-Actions, Clear-Actions, Write-Actions,
+ Write-Metadata, Goto-Table.
+
+ * OVS drops Apply-Actions instructions that have empty
+ action lists.
+
+ * OVS drops Write-Actions instructions that have empty
+ action sets.
+
+Please report other discrepancies, if you notice any, so that we can
+fix or document them.
+
+
Suggestions
===========