X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;ds=sidebyside;f=DESIGN;h=211292569ff038dfea63738497417728384bef25;hb=c9f716683d1d4302f026764effc17554c93a8c9f;hp=56e2605321ec215e1dea519f4a42d3307312e15d;hpb=d31f1109f10e5ffb9bf266306b913ebf23781666;p=openvswitch

diff --git a/DESIGN b/DESIGN
index 56e26053..21129256 100644
--- a/DESIGN
+++ b/DESIGN
@@ -9,13 +9,172 @@ successful deployment. The end of this document contains contact
 information that can be used to let us know how we can make Open
 vSwitch more generally useful.
 
+Asynchronous Messages
+=====================
+
+Over time, Open vSwitch has added many knobs that control whether a
+given controller receives OpenFlow asynchronous messages. This
+section describes how all of these features interact.
+
+First, a service controller never receives any asynchronous messages
+unless it explicitly configures a miss_send_len greater than zero with
+an OFPT_SET_CONFIG message.
+
+Second, OFPT_FLOW_REMOVED and NXT_FLOW_REMOVED messages are generated
+only if the flow that was removed had the OFPFF_SEND_FLOW_REM flag
+set.
+
+Third, OFPT_PACKET_IN and NXT_PACKET_IN messages are sent only to
+OpenFlow controller connections that have the correct connection ID
+(see "struct nx_controller_id" and "struct nx_action_controller"):
+
+    - For packet-in messages generated by an NXAST_CONTROLLER action,
+      the controller ID specified in the action.
+
+    - For other packet-in messages, controller ID zero. (This is the
+      default ID when an OpenFlow controller does not configure one.)
+
+Finally, Open vSwitch consults a per-connection table indexed by the
+message type, reason code, and current role. The following table
+shows how this table is initialized by default when an OpenFlow
+connection is made. An entry labeled "yes" means that the message is
+sent; an entry labeled "---" means that the message is suppressed.
+
+                                             master/
+    message and reason code                  other     slave
+    ---------------------------------------- -------   -----
+    OFPT_PACKET_IN / NXT_PACKET_IN
+      OFPR_NO_MATCH                            yes      ---
+      OFPR_ACTION                              yes      ---
+      OFPR_INVALID_TTL                         ---      ---
+
+    OFPT_FLOW_REMOVED / NXT_FLOW_REMOVED
+      OFPRR_IDLE_TIMEOUT                       yes      ---
+      OFPRR_HARD_TIMEOUT                       yes      ---
+      OFPRR_DELETE                             yes      ---
+
+    OFPT_PORT_STATUS
+      OFPPR_ADD                                yes      yes
+      OFPPR_DELETE                             yes      yes
+      OFPPR_MODIFY                             yes      yes
+
+The NXT_SET_ASYNC_CONFIG message directly sets all of the values in
+this table for the current connection. The
+OFPC_INVALID_TTL_TO_CONTROLLER bit in the OFPT_SET_CONFIG message
+controls the setting for OFPR_INVALID_TTL for the "master" role.
+
+
+OFPAT_ENQUEUE
+=============
+
+The OpenFlow 1.0 specification requires the output port of the OFPAT_ENQUEUE
+action to "refer to a valid physical port (i.e. < OFPP_MAX) or OFPP_IN_PORT".
+Although OFPP_LOCAL is not less than OFPP_MAX, it is an 'internal' port which
+can have QoS applied to it in Linux. Since we allow the OFPAT_ENQUEUE to apply
+to 'internal' ports whose port numbers are less than OFPP_MAX, we interpret
+OFPP_LOCAL as a physical port and support OFPAT_ENQUEUE on it as well.
+
+
+OFPT_FLOW_MOD
+=============
+
+The OpenFlow 1.0 specification for the behavior of OFPT_FLOW_MOD is
+confusing. The following table summarizes the Open vSwitch
+implementation of its behavior in the following categories:
+
+    - "match on priority": Whether the flow_mod acts only on flows
+      whose priority matches that included in the flow_mod message.
+
+    - "match on out_port": Whether the flow_mod acts only on flows
+      that output to the out_port included in the flow_mod message (if
+      out_port is not OFPP_NONE).
+
+    - "updates flow_cookie": Whether the flow_mod changes the
+      flow_cookie of the flow or flows that it matches to the
+      flow_cookie included in the flow_mod message.
+
+    - "updates OFPFF_SEND_FLOW_REM": Whether the flow_mod changes the
+      OFPFF_SEND_FLOW_REM flag of the flow or flows that it matches to
+      the setting included in the flags of the flow_mod message.
+
+    - "honors OFPFF_CHECK_OVERLAP": Whether the OFPFF_CHECK_OVERLAP
+      flag in the flow_mod is significant.
+
+    - "updates idle_timeout" and "updates hard_timeout": Whether the
+      idle_timeout and hard_timeout in the flow_mod, respectively,
+      have an effect on the flow or flows matched by the flow_mod.
+
+    - "resets idle timer": Whether the flow_mod resets the per-flow
+      timer that measures how long a flow has been idle.
+
+    - "resets hard timer": Whether the flow_mod resets the per-flow
+      timer that measures how long it has been since a flow was
+      modified.
+
+    - "zeros counters": Whether the flow_mod resets per-flow packet
+      and byte counters to zero.
+
+    - "sends flow_removed message": Whether the flow_mod generates a
+      flow_removed message for the flow or flows that it affects.
+
+An entry labeled "yes" means that the flow mod type does have the
+indicated behavior, "---" means that it does not, an empty cell means
+that the property is not applicable, and other values are explained
+below the table.
+
+                                          MODIFY          DELETE
+                             ADD  MODIFY  STRICT  DELETE  STRICT
+                             ===  ======  ======  ======  ======
+match on priority            ---    ---     yes     ---     yes
+match on out_port            ---    ---     ---     yes     yes
+updates flow_cookie          yes    yes     yes
+updates OFPFF_SEND_FLOW_REM  yes     +       +
+honors OFPFF_CHECK_OVERLAP   yes     +       +
+updates idle_timeout         yes     +       +
+updates hard_timeout         yes     +       +
+resets idle timer            yes     +       +
+resets hard timer            yes    yes     yes
+zeros counters               yes     +       +
+sends flow_removed message   ---    ---     ---      %       %
+
+(+) "modify" and "modify-strict" only take these actions when they
+    create a new flow, not when they update an existing flow.
+
+(%) "delete" and "delete-strict" generate a flow_removed message if
+    the deleted flow or flows have the OFPFF_SEND_FLOW_REM flag set.
+    (Each controller can separately control whether it wants to
+    receive the generated messages.)
+
+
+Multiple Table Support
+======================
+
+OpenFlow 1.0 has only rudimentary support for multiple flow tables.
+Notably, OpenFlow 1.0 does not allow the controller to specify the
+flow table to which a flow is to be added. Open vSwitch adds an
+extension for this purpose, which is enabled on a per-OpenFlow
+connection basis using the NXT_FLOW_MOD_TABLE_ID message. When the
+extension is enabled, the upper 8 bits of the 'command' member in an
+OFPT_FLOW_MOD or NXT_FLOW_MOD message designate the table to which a
+flow is to be added.
+
+The Open vSwitch software switch implementation offers 255 flow
+tables. On packet ingress, only the first flow table (table 0) is
+searched, and the contents of the remaining tables are not considered
+in any way. Tables other than table 0 only come into play when an
+NXAST_RESUBMIT_TABLE action specifies another table to search.
+
+Tables 128 and above are reserved for use by the switch itself.
+Controllers should use only tables 0 through 127.
+
 
 IPv6
 ====
 
 Open vSwitch supports stateless handling of IPv6 packets. Flows can be
 written to support matching TCP, UDP, and ICMPv6 headers within an IPv6
-packet.
+packet. Deeper matching of some Neighbor Discovery messages is also
+supported.
 
 IPv6 was not designed to interact well with middle-boxes. This,
 combined with Open vSwitch's stateless nature, have affected the
@@ -70,6 +229,169 @@ nodes that do not connect to link with such large MTUs. Currently, Open
 vSwitch doesn't process jumbograms.
 
 
+In-Band Control
+===============
+
+In-band control allows a single network to be used for OpenFlow traffic and
+other data traffic. See ovs-vswitchd.conf.db(5) for a description of
+configuring in-band control.
+
+This section is an attempt to describe how in-band control works at a
+wire- and implementation-level.
+Correctly implementing in-band
+control has proven difficult due to its many subtleties, and has thus
+gone through many iterations. Please read through and understand the
+reasoning behind the chosen rules before making modifications.
+
+In Open vSwitch, in-band control is implemented as "hidden" flows (in that
+they are not visible through OpenFlow) and at a higher priority than
+wildcarded flows can be set up through OpenFlow. This is done so that
+the OpenFlow controller cannot interfere with them and possibly break
+connectivity with its switches. It is possible to see all flows, including
+in-band ones, with the ovs-appctl "bridge/dump-flows" command.
+
+The Open vSwitch implementation of in-band control can hide traffic to
+arbitrary "remotes", where each remote is one TCP port on one IP address.
+Currently the remotes are automatically configured as the in-band OpenFlow
+controllers plus the OVSDB managers, if any. (The latter is a requirement
+because OVSDB managers are responsible for configuring OpenFlow controllers,
+so if the manager cannot be reached then OpenFlow cannot be reconfigured.)
+
+The following rules (with the OFPP_NORMAL action) are set up on any bridge
+that has any remotes:
+
+   (a) DHCP requests sent from the local port.
+   (b) ARP replies to the local port's MAC address.
+   (c) ARP requests from the local port's MAC address.
+
+In-band also sets up the following rules for each unique next-hop MAC
+address for the remotes' IPs (the "next hop" is either the remote
+itself, if it is on a local subnet, or the gateway to reach the remote):
+
+   (d) ARP replies to the next hop's MAC address.
+   (e) ARP requests from the next hop's MAC address.
+
+In-band also sets up the following rules for each unique remote IP address:
+
+   (f) ARP replies containing the remote's IP address as a target.
+   (g) ARP requests containing the remote's IP address as a source.
+
+In-band also sets up the following rules for each unique remote (IP,port)
+pair:
+
+   (h) TCP traffic to the remote's IP and port.
+   (i) TCP traffic from the remote's IP and port.
+
+The goal of these rules is to be as narrow as possible to allow a
+switch to join a network and be able to communicate with the
+remotes. As mentioned earlier, these rules have higher priority
+than the controller's rules, so if they are too broad, they may
+prevent the controller from implementing its policy. As such,
+in-band actively monitors some aspects of flow and packet processing
+so that the rules can be made more precise.
+
+In-band control monitors attempts to add flows into the datapath that
+could interfere with its duties. The datapath only allows exact
+match entries, so in-band control is able to be very precise about
+the flows it prevents. Flows that miss in the datapath are sent to
+userspace to be processed, so preventing these flows from being
+cached in the "fast path" does not affect correctness. The only type
+of flow that is currently prevented is one that would prevent DHCP
+replies from being seen by the local port. For example, a rule that
+forwarded all DHCP traffic to the controller would not be allowed,
+but one that forwarded to all ports (including the local port) would.
+
+As mentioned earlier, packets that miss in the datapath are sent to
+the userspace for processing. The userspace has its own flow table,
+the "classifier", so in-band checks whether any special processing
+is needed before the classifier is consulted. If a packet is a DHCP
+response to a request from the local port, the packet is forwarded to
+the local port, regardless of the flow table. Note that this requires
+L7 processing of DHCP replies to determine whether the 'chaddr' field
+matches the MAC address of the local port.
+
+It is interesting to note that for an L3-based in-band control
+mechanism, the majority of rules are devoted to ARP traffic.
+At first glance, some of these rules appear redundant. However, each serves an
+important role. First, in order to determine the MAC address of the
+remote side (controller or gateway) for other ARP rules, we must allow
+ARP traffic for our local port with rules (b) and (c). If we are
+between a switch and its connection to the remote, we have to
+allow the other switch's ARP traffic through. This is done with
+rules (d) and (e), since we do not know the addresses of the other
+switches a priori, but do know the remote's or gateway's. Finally,
+if the remote is running in a local guest VM that is not reached
+through the local port, the switch that is connected to the VM must
+allow ARP traffic based on the remote's IP address, since it will
+not know the MAC address of the local port that is sending the traffic
+or the MAC address of the remote in the guest VM.
+
+With a few notable exceptions below, in-band should work in most
+network setups. The following are considered "supported" in the
+current implementation:
+
+   - Locally Connected. The switch and remote are on the same
+     subnet. This uses rules (a), (b), (c), (h), and (i).
+
+   - Reached through Gateway. The switch and remote are on
+     different subnets and must go through a gateway. This uses
+     rules (a), (b), (c), (h), and (i).
+
+   - Between Switch and Remote. This switch is between another
+     switch and the remote, and we want to allow the other
+     switch's traffic through. This uses rules (d), (e), (h), and
+     (i). It uses (b) and (c) indirectly in order to know the MAC
+     address for rules (d) and (e). Note that DHCP for the other
+     switch will not work unless an OpenFlow controller explicitly lets this
+     switch pass the traffic.
+
+   - Between Switch and Gateway. This switch is between another
+     switch and the gateway, and we want to allow the other switch's
+     traffic through. This uses the same rules and logic as the
+     "Between Switch and Remote" configuration described earlier.
+
+   - Remote on Local VM. The remote is a guest VM on the
+     system running in-band control. This uses rules (a), (b), (c),
+     (h), and (i).
+
+   - Remote on Local VM with Different Networks. The remote
+     is a guest VM on the system running in-band control, but the
+     local port is not used to connect to the remote. For
+     example, an IP address is configured on eth0 of the switch. The
+     remote's VM is connected through eth1 of the switch, but an
+     IP address has not been configured for that port on the switch.
+     As such, the switch will use eth0 to connect to the remote,
+     and eth1's rules about the local port will not work. In the
+     example, the switch attached to eth0 would use rules (a), (b),
+     (c), (h), and (i) on eth0. The switch attached to eth1 would use
+     rules (f), (g), (h), and (i).
+
+The following are explicitly *not* supported by in-band control:
+
+   - Specify Remote by Name. Currently, the remote must be
+     identified by IP address. A naive approach would be to permit
+     all DNS traffic. Unfortunately, this would prevent the
+     controller from defining any policy over DNS. Since switches
+     that are located behind us need to connect to the remote,
+     in-band cannot simply add a rule that allows DNS traffic from
+     the local port. The "correct" way to support this is to parse
+     DNS requests to allow all traffic related to a request for the
+     remote's name through. Due to the potential security
+     problems and amount of processing, we decided to hold off for
+     the time being.
+
+   - Differing Remotes for Switches. All switches must know
+     the L3 addresses for all the remotes that other switches
+     may use, since rules need to be set up to allow traffic related
+     to those remotes through. See rules (f), (g), (h), and (i).
+
+   - Differing Routes for Switches. In order for the switch to
+     allow other switches to connect to a remote through a
+     gateway, it allows the gateway's traffic through with rules (d)
+     and (e).
+     If the routes to the remote differ for the two
+     switches, we will not know the MAC address of the alternate
+     gateway.
+
+
 Suggestions
 ===========