/depcomp
/install-sh
/missing
+/package.m4
/stamp-h1
Module.symvers
TAGS
Omit parameter names from function prototypes when the names do not
give useful information, e.g.:
- int netdev_get_mtu(const struct netdev *);
+ int netdev_get_mtu(const struct netdev *, int *mtup);
STATEMENTS
precedence makes it necessary, or unless the operands are themselves
expressions that use && and ||. Thus:
- if (!isdigit(s[0]) || !isdigit(s[1]) || !isdigit(s[2])) {
+ if (!isdigit((unsigned char)s[0])
+ || !isdigit((unsigned char)s[1])
+ || !isdigit((unsigned char)s[2])) {
printf("string %s does not start with 3-digit code\n", s);
}
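The cast matters because, where plain `char` is signed, a byte such as 0xE9 becomes a negative `int`. Passing a negative value other than `EOF` to `isdigit()` is undefined behavior. Below is a minimal sketch of the corrected check as a standalone function; the name `starts_with_3_digits` is hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>

/* Returns true if 's' begins with three decimal digits.  Each byte is
 * converted to unsigned char before calling isdigit(), whose argument
 * must be EOF or a value representable as unsigned char.  The && chain
 * stops at the first non-digit, so a short string is never read past
 * its terminating null byte. */
static bool
starts_with_3_digits(const char *s)
{
    return (isdigit((unsigned char)s[0])
            && isdigit((unsigned char)s[1])
            && isdigit((unsigned char)s[2]));
}
```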
====================================
This document describes how to build and install Open vSwitch on a
-generic Linux host host. If you want to install Open vSwitch on a
-Citrix XenServer 5.5.0, see INSTALL.XenServer instead.
+generic Linux host. If you want to install Open vSwitch on a Citrix
+XenServer version 5.5.0, see INSTALL.XenServer instead.
This version of Open vSwitch should be built manually with "configure"
and "make". Debian packaging for Open vSwitch is also included, but
At runtime, you may make ovs-vswitchd reload its configuration file
and update its configuration accordingly by sending it a SIGHUP
-signal. The ovs-appctl utility can also be used to do this with a
-command such as:
+signal. The ovs-appctl utility can also be used to do this:
- % ovs-appctl -t <pid> -e vswitchd/reload
+ % ovs-appctl vswitchd/reload
-where <pid> is ovs-vswitchd's process ID. In the latter case,
-ovs-appctl will not exit until the reload and reconfiguration is
-complete.
+In the latter case, ovs-appctl will wait for ovs-vswitchd to finish
+reloading before it exits.
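A SIGHUP-driven reload is commonly implemented by having the signal handler only set a flag, which the main loop then polls. The sketch below shows that general pattern in POSIX C. It is not ovs-vswitchd's actual code, and the names `reload_requested`, `install_sighup_handler` and `take_reload_request` are hypothetical. An administrator would trigger it with "kill -HUP <pid>".

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set asynchronously by the signal handler, polled by the main loop. */
static volatile sig_atomic_t reload_requested;

static void
handle_sighup(int sig)
{
    (void) sig;
    reload_requested = 1;       /* Only async-signal-safe work here. */
}

/* Installs the SIGHUP handler.  Returns 0 on success, -1 on error. */
static int
install_sighup_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handle_sighup;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGHUP, &sa, NULL);
}

/* Called from the main loop: returns 1 (and clears the flag) if a
 * reload was requested since the last call, otherwise 0. */
static int
take_reload_request(void)
{
    if (reload_requested) {
        reload_requested = 0;
        return 1;
    }
    return 0;
}
```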
Bug Reporting
-------------
-Please report problems to ovs-bugs@openvswitch.org.
+Please report problems to bugs@openvswitch.org.
--- /dev/null
+ Using Open vSwitch as a Simple OpenFlow Switch
+ ==============================================
+
+Open vSwitch uses OpenFlow as its preferred method of remote flow table
+configuration. This is the simplest method of using it with an OpenFlow
+controller. All that is required is to follow the instructions in
+INSTALL.Linux and add the bridge.<name>.controller set of parameters to the
+ovs-vswitchd(8) configuration file as described in ovs-vswitchd.conf(5).
+We recommend using OpenFlow in this manner. However, it is also possible to
+use Open vSwitch as a simple OpenFlow switch like that provided by the
+OpenFlow reference implementation [1]. The remainder of this file describes
+how to use it in that manner.
+
+What is OpenFlow?
+-----------------
+
+OpenFlow is a flow-based switch specification designed to enable
+researchers to run experiments in live networks. OpenFlow is based on a
+simple Ethernet flow switch that exposes a standardized interface for
+adding and removing flow entries.
+
+An OpenFlow switch consists of three parts: (1) A "flow table" in
+which each flow entry is associated with an action telling the switch
+how to process the flow, (2) a "secure channel" that connects the switch
+to a remote process (a controller), allowing commands and packets to
+be sent between the controller and the switch, and (3) an OpenFlow
+protocol implementation, providing an open and standard way for a
+controller to talk to the switch.
+
+An OpenFlow switch can thus serve as a simple datapath element that
+forwards packets between ports according to flow actions defined by
+the controller using OpenFlow commands. Example actions are:
+
+ - Forward this flow's packets to the given port(s)
+ - Drop this flow's packets
+ - Encapsulate and forward this flow's packets to the controller.
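The flow table described above can be pictured as a list of entries, each pairing a match with an action. The sketch below is a deliberately simplified model for illustration only. It matches on just the ingress port and destination MAC, and its types and names (`struct flow_entry`, `ACT_OUTPUT`, `flow_table_lookup`) are hypothetical; they are not the OpenFlow wire format or Open vSwitch's internal structures. In this sketch, packets that match no entry go to the controller, which is the table-miss behavior that OpenFlow 1.0 specifies.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum action_type { ACT_OUTPUT, ACT_DROP, ACT_CONTROLLER };

struct action {
    enum action_type type;
    uint16_t port;              /* Only meaningful for ACT_OUTPUT. */
};

/* A drastically simplified flow entry: match on ingress port and
 * destination MAC only. */
struct flow_entry {
    uint16_t in_port;
    uint8_t dl_dst[6];
    struct action action;
};

/* Returns the action of the first entry matching the packet, or a
 * "send to controller" action if no entry matches. */
static struct action
flow_table_lookup(const struct flow_entry *table, size_t n,
                  uint16_t in_port, const uint8_t dl_dst[6])
{
    struct action miss = { ACT_CONTROLLER, 0 };
    size_t i;

    for (i = 0; i < n; i++) {
        if (table[i].in_port == in_port
            && !memcmp(table[i].dl_dst, dl_dst, 6)) {
            return table[i].action;
        }
    }
    return miss;
}
```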
+
+The OpenFlow switch is defined in detail in the OpenFlow switch
+Specification [2].
+
+Installation Procedure
+----------------------
+
+The procedure below explains how to use Open vSwitch as a simple
+OpenFlow switch.
+
+1. Build and install the Open vSwitch kernel modules and userspace
+ programs as described in INSTALL.Linux.
+
+ It is important to run "make install", because some Open vSwitch
+ programs expect to find files in locations selected at installation
+ time.
+
+2. Load the openvswitch kernel module (which was built in step 1), e.g.:
+
+ % insmod datapath/linux-2.6/openvswitch_mod.ko
+
+ This kernel module cannot be loaded if the Linux bridge module is
+ already loaded. Thus, you may need to remove any existing bridges
+ and unload the bridge module with "rmmod bridge" before you can do
+ this.
+
+3. Create a datapath instance. The command below creates a datapath
+ identified as dp0 (see ovs-dpctl(8) for more detailed usage
+ information).
+
+ # ovs-dpctl add-dp dp0
+
+ Creating datapath dp0 creates a new network device, also named dp0.
+ This network device, called the datapath's "local port", will be
+ bridged to the physical switch ports by ovs-openflowd(8). It is
+ optionally used for in-band control as described in step 5.
+
+4. Use ovs-dpctl to attach the datapath to physical interfaces on the
+ machine. Say, for example, you want to create a trivial 2-port
+ switch using interfaces eth1 and eth2, you would issue the following
+ commands:
+
+ # ovs-dpctl add-if dp0 eth1
+ # ovs-dpctl add-if dp0 eth2
+
+ You can verify that the interfaces were successfully added by asking
+ ovs-dpctl to print the current status of datapath dp0:
+
+ # ovs-dpctl show dp0
+
+5. Arrange for the switch to be able to reach the controller over the network.
+ This can be done in two ways. The switch may be configured for
+ out-of-band control, which means it uses a network separate from the
+ data traffic that it controls. Alternatively, the switch may be
+ configured to contact the controller over one of the network devices
+ under its control. In-band control is often more convenient than
+ out-of-band, because it is not necessary to maintain two independent
+ networks.
+
+ - If you are using out-of-band control, at this point make sure
+ that the switch machine can reach the controller over the
+ network.
+
+ - If you are using in-band control, then at this point you must
+ configure the dp0 network device created in step 3. This
+ device is not yet bridged to any physical network (because
+ ovs-openflowd does that, and it is not yet running), so the next
+ step depends on whether connectivity is required to configure
+ the device's IP address:
+
+ * If the switch has a static IP address, you may configure
+ its IP address now, e.g.:
+
+ # ifconfig dp0 192.168.1.1
+
+ * If the switch does not have a static IP address, e.g. its
+ IP address is obtained dynamically via DHCP, then proceed
+ to the next step. The DHCP client will not be able to
+ contact the DHCP server until the secure channel has
+ started. The address will be obtained in step 7.
+
+ - If you are using in-band control with controller discovery, no
+ configuration is required at this point. You may proceed to
+ the next step.
+
+6. Run ovs-openflowd to start the secure channel connecting the datapath to
+ a remote controller. If the controller is running on host
+ 192.168.1.2 port 6633 (the default port), the ovs-openflowd invocation
+ would look like this:
+
+ # ovs-openflowd dp0 tcp:192.168.1.2
+
+ - If you are using in-band control with controller discovery, omit
+ the second argument to the ovs-openflowd command.
+
+ - If you are using out-of-band control, add --out-of-band to the
+ command line.
+
+ Using the "tcp:<controller_ip>" argument causes the switch to connect
+ in an insecure manner. Please see INSTALL.SSL for a description of
+ how to connect securely using SSL.
+
+7. If you are using in-band control with manual configuration, and the
+ switch obtains its IP address dynamically, then you may now obtain
+ the switch's IP address, e.g. by invoking a DHCP client. The
+ secure channel will only be able to connect to the controller after
+ an IP address has been obtained.
+
+8. The secure channel should connect to the controller within a few
+ seconds. It may take a little longer if controller discovery is in
+ use, because the switch must then also obtain its own IP address
+ and the controller's location via DHCP.
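Step 6 relies on the controller port defaulting to 6633 when "tcp:<controller_ip>" gives only an address. To illustrate that convention, here is a sketch that splits such a target string into an address and a port. The function `parse_tcp_target` is hypothetical and accepts only dotted-quad IPv4 addresses; it is not ovs-openflowd's actual parser.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define DEFAULT_OFP_PORT 6633   /* Default controller port from step 6. */

/* Parses "tcp:IP" or "tcp:IP:PORT".  On success stores the address and
 * the port (host byte order) and returns true; otherwise returns false. */
static bool
parse_tcp_target(const char *target, struct in_addr *ip, int *port)
{
    char host[INET_ADDRSTRLEN];
    const char *rest, *colon;
    size_t len;

    if (strncmp(target, "tcp:", 4)) {
        return false;
    }
    rest = target + 4;
    colon = strchr(rest, ':');
    len = colon ? (size_t) (colon - rest) : strlen(rest);
    if (len == 0 || len >= sizeof host) {
        return false;
    }
    memcpy(host, rest, len);
    host[len] = '\0';
    if (inet_pton(AF_INET, host, ip) != 1) {
        return false;
    }

    if (!colon) {
        *port = DEFAULT_OFP_PORT;
    } else {
        char *end;
        long value = strtol(colon + 1, &end, 10);

        if (end == colon + 1 || *end != '\0' || value <= 0 || value > 65535) {
            return false;
        }
        *port = (int) value;
    }
    return true;
}
```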
+
+References
+----------
+
+ [1] OpenFlow Reference Implementation.
+ <http://www.openflowswitch.org/wp/downloads/>
+
+ [2] OpenFlow Switch Specification.
+ <http://openflowswitch.org/documents/openflow-spec-latest.pdf>
Reporting Bugs
--------------
-Please report problems to ovs-bugs@openvswitch.org.
+Please report problems to bugs@openvswitch.org.
===============================================
This document describes how to build and install Open vSwitch on a
-Citrix XenServer 5.5.0 host. If you want to install Open vSwitch on a
+Citrix XenServer host. If you want to install Open vSwitch on a
generic Linux host, see INSTALL.Linux instead.
+These instructions have been tested with XenServer versions 5.5.0 and
+5.5.900.
+
Building Open vSwitch for XenServer
-----------------------------------
Reporting Bugs
--------------
-Please report problems to ovs-bugs@openvswitch.org.
+Please report problems to bugs@openvswitch.org.
--- /dev/null
+ Replacing a Linux Bridge with Open vSwitch
+ ==========================================
+
+This file documents how Open vSwitch may be used as a drop-in
+replacement for a Linux kernel bridge in an environment that includes
+elements that are tightly tied to the Linux bridge tools
+(e.g. "brctl") and architecture. We recommend directly using the
+management tools provided with Open vSwitch rather than these
+compatibility hooks for environments that are not tightly tied to the
+Linux bridging tools; they are more efficient and better reflect the
+actual operation and status.
+
+
+Installation Procedure
+----------------------
+
+The procedure below explains how to use the Open vSwitch bridge
+compatibility support. This procedure is written from the perspective
+of a system administrator manually loading and starting Open vSwitch
+in bridge compatibility mode, but of course in practice one would want
+to update system scripts to follow these steps.
+
+1. Build and install the Open vSwitch kernel modules and userspace
+ programs as described in INSTALL.Linux.
+
+ It is important to run "make install", because some Open vSwitch
+ programs expect to find files in locations selected at installation
+ time.
+
+2. Load both the openvswitch and brcompat kernel modules (which were
+ built in step 1), e.g.:
+
+ % insmod datapath/linux-2.6/openvswitch_mod.ko
+ % insmod datapath/linux-2.6/brcompat_mod.ko
+
+ These kernel modules cannot be loaded if the Linux bridge module is
+ already loaded. Thus, you may need to remove any existing bridges
+ and unload the bridge module with "rmmod bridge" before you can do
+ this. In addition, if you edit your system configuration files to
+ load these modules at boot time, it should happen before any bridge
+ configuration (e.g. before any calls to "brctl" or "ifup" of any
+ bridge interfaces), to ensure that the Open vSwitch kernel modules
+ are loaded before the Linux kernel bridge module.
+
+3. Create an initial version of the configuration file, for example
+ /etc/ovs-vswitchd.conf. This file may be empty initially or may
+ contain any valid configuration directives described in
+ ovs-vswitchd.conf(5). However, it must exist when you start
+ ovs-vswitchd.
+
+ To create an empty configuration file:
+
+ % touch /etc/ovs-vswitchd.conf
+
+4. Start ovs-vswitchd and ovs-brcompatd, e.g.:
+
+ % ovs-vswitchd -P -D -vANY:console:EMER /etc/ovs-vswitchd.conf
+ % ovs-brcompatd -P -D -vANY:console:EMER /etc/ovs-vswitchd.conf
+
+5. Now you should be able to manage Open vSwitch using brctl and
+ related tools. For example, you can create an Open vSwitch bridge,
+ add interfaces to it, then print information about bridges with the
+ commands:
+
+ % brctl addbr br0
+ % brctl addif br0 eth0
+ % brctl addif br0 eth1
+ % brctl show
+
+ Each of these commands actually uses or modifies the Open vSwitch
+ configuration file, then notifies the ovs-vswitchd daemon of the
+ change. For example, after executing the commands above starting
+ from an empty configuration file, "cat /etc/ovs-vswitchd.conf"
+ should show that the configuration file now contains the following:
+
+ bridge.br0.port=br0
+ bridge.br0.port=eth0
+ bridge.br0.port=eth1
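The configuration file shown above uses simple "key=value" lines, and a key of the form bridge.<name>.port repeats once per port. The sketch below illustrates only that format. ovs-vswitchd's real parser, documented in ovs-vswitchd.conf(5), handles much more. The hypothetical helper counts the ports configured for one bridge.

```c
#include <stdio.h>
#include <string.h>

/* Counts lines of the form "bridge.<br_name>.port=<device>" in 'config',
 * a NUL-terminated buffer holding the whole configuration file.  Lines
 * for other bridges or other keys are ignored. */
static int
count_bridge_ports(const char *config, const char *br_name)
{
    char prefix[256];
    const char *line = config;
    size_t prefix_len;
    int n = 0;

    snprintf(prefix, sizeof prefix, "bridge.%s.port=", br_name);
    prefix_len = strlen(prefix);

    while (*line) {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t) (eol - line) : strlen(line);

        /* Require a non-empty device name after the '='. */
        if (len > prefix_len && !strncmp(line, prefix, prefix_len)) {
            n++;
        }
        line += len;
        if (*line == '\n') {
            line++;
        }
    }
    return n;
}
```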
ACLOCAL_AMFLAGS = -I m4
SUBDIRS = datapath
-if ENABLE_USERSPACE
AM_CPPFLAGS = $(SSL_CFLAGS)
AM_CPPFLAGS += $(NCURSES_CFLAGS)
AM_CPPFLAGS += $(PCRE_CFLAGS)
CLEANFILES =
DISTCLEANFILES =
-EXTRA_DIST = INSTALL.Linux INSTALL.XenServer INSTALL.SSL
-TESTS =
-TESTS_ENVIRONMENT =
+EXTRA_DIST = INSTALL.bridge \
+ INSTALL.Linux \
+ INSTALL.OpenFlow \
+ INSTALL.SSL \
+ INSTALL.XenServer
bin_PROGRAMS =
sbin_PROGRAMS =
bin_SCRIPTS =
SUFFIXES = .in
.in:
$(PERL) $(srcdir)/soexpand.pl -I$(srcdir) < $< | \
- sed -e 's,[@]LOGDIR[@],$(LOGDIR),g' \
+ sed \
-e 's,[@]PKIDIR[@],$(PKIDIR),g' \
+ -e 's,[@]LOGDIR[@],$(LOGDIR),g' \
+ -e 's,[@]PERL[@],$(PERL),g' \
+ -e 's,[@]PYTHON[@],$(PYTHON),g' \
-e 's,[@]RUNDIR[@],$(RUNDIR),g' \
+ -e 's,[@]VERSION[@],$(VERSION),g' \
+ -e 's,[@]localstatedir[@],$(localstatedir),g' \
-e 's,[@]pkgdatadir[@],$(pkgdatadir),g' \
- -e 's,[@]PERL[@],$(PERL),g' > $@
+ -e 's,[@]sysconfdir[@],$(sysconfdir),g' \
+ > $@.tmp
+ @if head -n 1 $@.tmp | grep -q '#!'; then \
+ echo chmod +x $@.tmp; \
+ chmod +x $@.tmp; \
+ fi
+ mv $@.tmp $@
include lib/automake.mk
-include secchan/automake.mk
+include ofproto/automake.mk
include utilities/automake.mk
include tests/automake.mk
include include/automake.mk
include debian/automake.mk
include vswitchd/automake.mk
include xenserver/automake.mk
-if HAVE_CURSES
-if HAVE_PCRE
include extras/ezio/automake.mk
-endif
-endif
-endif # ENABLE_USERSPACE
What is Open vSwitch?
---------------------
-Open vSwitch is an Ethernet switch for virtual servers with the
-following features:
-
- * NIC bonding with automatic fail-over and source MAC-based TX
- load balancing ("SLB").
-
- * 802.1Q VLAN support.
-
- * Port mirroring, with optional VLAN tagging.
-
- * NetFlow v5 flow logging.
-
- * Connectivity to an external OpenFlow controller, such as
- NOX.
+Open vSwitch is a multilayer software switch licensed under the open
+source Apache 2 license. Our goal is to implement a production-quality
+switch platform that supports standard management interfaces
+(e.g. NetFlow, RSPAN, ERSPAN, IOS-like CLI), and opens the forwarding
+functions to programmatic extension and control.
+
+Open vSwitch is well suited to function as a virtual switch in VM
+environments. In addition to exposing standard control and visibility
+interfaces to the virtual networking layer, it was designed to support
+distribution across multiple physical servers. Open vSwitch supports
+multiple Linux-based virtualization technologies including
+Xen/XenServer, KVM, and VirtualBox.
+
+The bulk of the code is written in platform-independent C and is
+easily ported to other environments. The current release of Open
+vSwitch supports the following features:
+
+ * Visibility into inter-VM communication via NetFlow, SPAN, and RSPAN
+ * Standard 802.1Q VLAN model with trunking
+ * Per VM policing
+ * NIC bonding with source-MAC load balancing
+ * Kernel-based forwarding
+ * Support for OpenFlow
+ * Compatibility layer for the Linux bridging code
Open vSwitch supports Linux 2.6.15 and up, with testing focused on
2.6.18 with Centos and Xen patches and version 2.6.26 from kernel.org.
The main components of this distribution are:
- - ovs-vswitchd, a daemon that implements the virtual switch,
- along with a companion Linux kernel module for flow-based
- switching.
+ * ovs-vswitchd, a daemon that implements the switch, along with
+ a companion Linux kernel module for flow-based switching.
+
+ * ovs-brcompatd, a daemon that allows ovs-vswitchd to act as a
+ drop-in replacement for the Linux bridge in many environments,
+ along with a companion Linux kernel module to intercept bridge
+ ioctls.
- - ovs-brcompatd, a daemon that allows ovs-vswitchd to act as a
- drop-in replacement for the Linux bridge in many
- environments, along with a companion Linux kernel module to
- intercept bridge ioctls.
+ * ovs-dpctl, a tool for configuring the switch kernel module.
- - ovs-dpctl, a tool for configuring the virtual switch kernel
- module.
+ * Scripts and specs for building RPMs that allow Open vSwitch
+ to be installed on a Citrix XenServer host as a drop-in
+ replacement for its switch, with additional functionality.
- - Scripts and specs for building RPMs that allow Open vSwitch
- to be installed on a Citrix XenServer host as a drop-in
- replacement for its virtual switch, with additional
- functionality.
+ * ovs-vsctl, a utility for querying and updating the configuration
+ of ovs-vswitchd.
- - vlog-appctl, a utility that can control Open vSwitch daemons,
- adjusting their logging levels among other uses.
+ * ovs-appctl, a utility that sends commands to running Open
+ vSwitch daemons.
Open vSwitch also provides an OpenFlow implementation and tools for
those interested in OpenFlow but not additional Open vSwitch features:
- - secchan, a program that implements a simple OpenFlow switch
- (without the special features provided by ovs-vswitchd) using
- the same kernel module as ovs-vswitchd.
+ * ovs-openflowd, a program that implements a simple OpenFlow
+ switch (without the special features provided by ovs-vswitchd)
+ using the same kernel module as ovs-vswitchd.
- - ovs-controller, a simple OpenFlow controller.
+ * ovs-controller, a simple OpenFlow controller.
- - ovs-ofctl, a utility for querying and controlling OpenFlow
- switches and controllers.
+ * ovs-ofctl, a utility for querying and controlling OpenFlow
+ switches and controllers.
- - ovs-pki, a utility for creating and managing the public-key
- infrastructure for OpenFlow switches.
+ * ovs-pki, a utility for creating and managing the public-key
+ infrastructure for OpenFlow switches.
- - A patch to tcpdump that enables it to parse OpenFlow
- messages.
+ * A patch to tcpdump that enables it to parse OpenFlow messages.
What other documentation is available?
--------------------------------------
To install Open vSwitch on a regular Linux machine, read INSTALL.Linux.
+To use Open vSwitch as a drop-in replacement for the Linux bridge,
+read INSTALL.bridge.
+
To build RPMs for installing Open vSwitch on a Citrix XenServer host
or resource pool, read INSTALL.XenServer.
Contact
-------
-ovs-bugs@openvswitch.org
+bugs@openvswitch.org
http://openvswitch.org/
--- /dev/null
+Building with gcov support
+==========================
+
+The Open vSwitch "configure" script supports the following
+code-coverage related options:
+
+ --disable-coverage
+ --enable-coverage=no
+
+ Do not build with gcov code coverage support.
+
+ This is the default if no coverage option is passed to
+ "configure".
+
+ --enable-coverage
+ --enable-coverage=yes
+
+ Build with gcov code coverage support, but do not assume that any
+ coverage-related tools are installed and do not add special
+ coverage support to the test suite.
+
+ --enable-coverage=lcov
+
+ Build with gcov code coverage support, as above, but also add
+ support for coverage analysis to the test suite. Running "make
+ check" will produce a directory "tests/coverage.html" in the build
+ directory with an analysis of the test suite's coverage.
+
+ This setting requires the lcov suite of utilities to be installed.
+ The "lcov" and "genhtml" programs from lcov must be in PATH. lcov
+ is available at: http://ltp.sourceforge.net/coverage/lcov.php
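The spellings above collapse into three modes. The sketch below only summarizes that table. The real handling lives in the configure script's OVS_CHECK_COVERAGE macro, whose internals are not shown here. The enum and function names are hypothetical, and so is treating any other value as invalid.

```c
#include <string.h>

enum coverage_mode {
    COVERAGE_NONE,              /* No gcov support (the default). */
    COVERAGE_GCOV,              /* gcov support only. */
    COVERAGE_LCOV,              /* gcov plus lcov analysis in "make check". */
    COVERAGE_INVALID            /* Assumed: any other value is rejected. */
};

/* Maps a "configure" coverage argument to the mode described above.
 * A NULL argument means no coverage option was given. */
static enum coverage_mode
coverage_mode_from_arg(const char *arg)
{
    if (!arg
        || !strcmp(arg, "--disable-coverage")
        || !strcmp(arg, "--enable-coverage=no")) {
        return COVERAGE_NONE;
    } else if (!strcmp(arg, "--enable-coverage")
               || !strcmp(arg, "--enable-coverage=yes")) {
        return COVERAGE_GCOV;
    } else if (!strcmp(arg, "--enable-coverage=lcov")) {
        return COVERAGE_LCOV;
    }
    return COVERAGE_INVALID;
}
```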
--- /dev/null
+Reporting Bugs in Open vSwitch
+==============================
+
+We are eager to hear from users about problems that they have
+encountered with Open vSwitch. This file documents how best to report
+bugs so as to ensure that they can be fixed as quickly as possible.
+
+Please report bugs by sending email to bugs@openvswitch.org. Include
+as much of the following information as you can in your report:
+
+ * The Open vSwitch version number (as output by "ovs-vswitchd
+ --version").
+
+ * The Git commit number (as output by "git rev-parse HEAD"),
+ if you built from a Git snapshot.
+
+ * Any local patches or changes you have applied.
+
+ * The kernel version on which Open vSwitch is running (from
+ /proc/version) and the distribution and version number of
+ your OS (e.g. "CentOS 5.0").
+
+ * The contents of the vswitchd configuration file (usually
+ /etc/ovs-vswitchd.conf).
+
+ * The output of "ovs-dpctl show".
+
+ * If you have Open vSwitch configured to connect to an
+ OpenFlow controller, the output of "ovs-ofctl show <bridge>"
+ for each <bridge> configured in the vswitchd configuration
+ file.
+
+ * A description of the problem, which should include:
+
+ - What you did that made the problem appear.
+
+ - What you expected to happen.
+
+ - What actually happened.
+
+ * A fix or workaround, if you have one.
+
+ * Any other information that you think might be relevant.
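If you collect this information with a script or tool, you can read the kernel release programmatically. Below is a minimal sketch using the POSIX uname() call; on Linux it reports the same kernel release that appears in /proc/version. The helper name `describe_kernel` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/utsname.h>

/* Writes "<sysname> <release> <machine>" (kernel name, kernel release
 * and hardware type) into 'buf'.  Returns 0 on success, -1 if uname()
 * fails. */
static int
describe_kernel(char *buf, size_t size)
{
    struct utsname u;

    if (uname(&u) < 0) {
        return -1;
    }
    snprintf(buf, size, "%s %s %s", u.sysname, u.release, u.machine);
    return 0;
}
```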
+
+bugs@openvswitch.org is a public mailing list, to which anyone can
+subscribe, so please do not include confidential information in your
+bug report.
+
+Contact
+-------
+
+bugs@openvswitch.org
+http://openvswitch.org/
--- /dev/null
+How to Submit Patches for Open vSwitch
+======================================
+
+Send changes to Open vSwitch as patches to discuss@openvswitch.org.
+One patch per email, please. More details are included below.
+
+If you are using Git, then "git format-patch" takes care of most of
+the mechanics described below for you.
+
+Before You Start
+----------------
+
+Before you send patches at all, make sure that each patch makes sense.
+In particular:
+
+ - A given patch should not break anything, even if later
+ patches fix the problems that it causes. The source tree
+ should still build and work after each patch is applied.
+ (This enables "git bisect" to work best.)
+
+ - A patch should make one logical change. Don't make
+ multiple, logically unconnected changes to disparate
+ subsystems in a single patch.
+
+ - A patch that adds or removes user-visible features should
+ also update the appropriate user documentation or manpages.
+
+Testing is also important:
+
+ - A patch that adds or deletes files should be tested with
+ "make distcheck" before submission.
+
+ - A patch that modifies Linux kernel code should be at least
+ build-tested on various Linux kernel versions before
+ submission. I suggest versions 2.6.18, 2.6.27, and whatever
+ the current latest release version is at the time.
+
+ - A patch that modifies the ofproto or vswitchd code should be
+ tested in at least simple cases before submission.
+
+ - A patch that modifies xenserver code should be tested on
+ XenServer before submission.
+
+Email Subject
+-------------
+
+The subject line of your email should be in the following format:
+[PATCH <n>/<m>] <area>: <summary>
+
+ - [PATCH <n>/<m>] indicates that this is the nth of a series
+ of m patches. It helps reviewers to read patches in the
+ correct order. You may omit this prefix if you are sending
+ only one patch.
+
+ - <area>: indicates the area of the Open vSwitch to which the
+ change applies (often the name of a source file or a
+ directory). You may omit it if the change crosses multiple
+ distinct pieces of code.
+
+ - <summary> briefly describes the change.
+
+The subject, minus the [PATCH <n>/<m>] prefix, becomes the first line
+of the commit's change log message.
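To make the format concrete, here is a sketch that builds a subject line from its parts. In practice "git format-patch" generates these lines for you. The function is hypothetical, and it uses a bare "[PATCH]" for a single patch, as the example at the end of this file does.

```c
#include <stdio.h>

/* Formats "[PATCH <n>/<m>] <area>: <summary>" into 'buf'.  For a single
 * patch (m <= 1) the prefix is just "[PATCH]"; 'area' may be NULL when
 * a change spans several distinct pieces of code.  Returns the length
 * that snprintf() reports. */
static int
format_patch_subject(char *buf, size_t size, int n, int m,
                     const char *area, const char *summary)
{
    char prefix[32];

    if (m > 1) {
        snprintf(prefix, sizeof prefix, "[PATCH %d/%d]", n, m);
    } else {
        snprintf(prefix, sizeof prefix, "[PATCH]");
    }
    if (area) {
        return snprintf(buf, size, "%s %s: %s", prefix, area, summary);
    }
    return snprintf(buf, size, "%s %s", prefix, summary);
}
```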
+
+Description
+-----------
+
+The body of the email should start with a more thorough description of
+the change. This becomes the body of the commit message, following
+the subject. There is no need to duplicate the summary given in the
+subject.
+
+Please limit lines in the description to 79 characters in width.
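A quick way to check the 79-character limit before sending is a short scan over the description. The hypothetical helper below returns the 1-based number of the first line that is too long. It counts bytes, so multibyte UTF-8 characters are over-counted.

```c
#include <stddef.h>

#define MAX_DESC_WIDTH 79

/* Returns the 1-based number of the first line in 'text' longer than
 * MAX_DESC_WIDTH bytes, or 0 if every line fits. */
static int
first_overlong_line(const char *text)
{
    int line_no = 1;
    size_t width = 0;
    const char *p;

    for (p = text; ; p++) {
        if (*p == '\n' || *p == '\0') {
            if (width > MAX_DESC_WIDTH) {
                return line_no;
            }
            if (*p == '\0') {
                return 0;
            }
            line_no++;
            width = 0;
        } else {
            width++;
        }
    }
}
```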
+
+The description should include:
+
+ - The rationale for the change.
+
+ - Design description and rationale (but this might be better
+ added as code comments).
+
+ - Testing that you performed (or testing that should be done
+ but you could not for whatever reason).
+
+There is no need to describe what the patch actually changed, if the
+reader can see it for himself.
+
+If the patch refers to a commit already in the Open vSwitch
+repository, please include both the commit number and the subject of
+the patch, e.g. 'commit 632d136c "vswitch: Remove restriction on
+datapath names."'.
+
+If you, the person sending the patch, did not write the patch
+yourself, then the very first line of the body should take the form
+"From: <author name> <author email>", followed by a blank line. This
+will automatically cause the named author to be credited with
+authorship in the repository. If others contributed to the patch, but
+are not the main authors, then please credit them as part of the
+description (e.g. "Thanks to Bob J. User for reporting this bug.").
+
+Comments
+--------
+
+If you want to include any comments in your email that should not be
+part of the commit's change log message, put them after the
+description, separated by a line that contains just "---". It may be
+helpful to include a diffstat here for changes that touch multiple
+files.
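Patch-applying tools such as "git am" end the commit message at that "---" line. The sketch below shows the split, assuming the email body is in one NUL-terminated buffer. The helper is hypothetical; it returns the length of the part that belongs in the change log.

```c
#include <stddef.h>
#include <string.h>

/* Returns the number of bytes of 'body' that precede the first line
 * consisting of exactly "---", i.e. the part that becomes the commit
 * message.  If there is no such line, returns the length of the whole
 * body. */
static size_t
commit_message_length(const char *body)
{
    const char *line = body;

    while (*line) {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t) (eol - line) : strlen(line);

        if (len == 3 && !strncmp(line, "---", 3)) {
            return (size_t) (line - body);
        }
        if (!eol) {
            break;
        }
        line = eol + 1;
    }
    return strlen(body);
}
```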
+
+Patch
+-----
+
+The patch should be in the body of the email following the description,
+separated by a blank line.
+
+Patches should be in "diff -up" format. We recommend that you use Git
+to produce your patches, in which case you should use the -M -C
+options to "git diff" (or other Git tools) if your patch renames or
+copies files. Quilt (http://savannah.nongnu.org/projects/quilt) might
+be useful if you do not want to use Git.
+
+Patches should be inline in the email message. Some email clients
+corrupt white space or wrap lines in patches. There are hints on how
+to configure many email clients to avoid this problem at:
+ http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob_plain;f=Documentation/email-clients.txt
+If you cannot convince your email client not to mangle patches, then
+sending the patch as an attachment is a second choice.
+
+Please follow the style used in the code that you are modifying. The
+CodingStyle file describes the coding style used in most of Open
+vSwitch. Use Linux kernel coding style for Linux kernel code.
+
+Example
+-------
+
+From 632d136c7b108cd3d39a2e64fe6230e23977caf8 Mon Sep 17 00:00:00 2001
+From: Ben Pfaff <blp@nicira.com>
+Date: Mon, 6 Jul 2009 10:17:54 -0700
+Subject: [PATCH] vswitch: Remove restriction on datapath names.
+
+Commit f4b96c92c "vswitch: Disallow bridges named "dpN" or "nl:N"" disabled
+naming bridges "dpN" because the vswitchd code made the bad assumption that
+the bridge's local port has the same name as the bridge, which was not
+true (at the time) for bridges named dpN. Now that assumption has been
+eliminated, so this commit eliminates the restriction too.
+
+This change is also a cleanup in that it eliminates one form of the
+vswitch's dependence on specifics of the dpif implementation.
+---
+ vswitchd/bridge.c | 23 +++++------------------
+ vswitchd/ovs-vswitchd.conf.5.in | 3 +--
+ 2 files changed, 6 insertions(+), 20 deletions(-)
+
+diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
+index 32647ea..00cffbc 100644
+--- a/vswitchd/bridge.c
++++ b/vswitchd/bridge.c
+@@ -351,32 +351,19 @@ bridge_configure_ssl(void)
+ void
+ bridge_reconfigure(void)
+ {
+- struct svec old_br, new_br, raw_new_br;
++ struct svec old_br, new_br;
+ struct bridge *br, *next;
+ size_t i, j;
+
+ COVERAGE_INC(bridge_reconfigure);
+
+- /* Collect old bridges. */
++ /* Collect old and new bridges. */
+ svec_init(&old_br);
++ svec_init(&new_br);
+ LIST_FOR_EACH (br, struct bridge, node, &all_bridges) {
+ svec_add(&old_br, br->name);
+ }
+-
+- /* Collect new bridges. */
+- svec_init(&raw_new_br);
+- cfg_get_subsections(&raw_new_br, "bridge");
+- svec_init(&new_br);
+- for (i = 0; i < raw_new_br.n; i++) {
+- const char *name = raw_new_br.names[i];
+- if (!strncmp(name, "dp", 2) && isdigit((unsigned char)name[2])) {
+- VLOG_ERR("%s is not a valid bridge name (bridges may not be "
+- "named \"dp\" followed by a digit)", name);
+- } else {
+- svec_add(&new_br, name);
+- }
+- }
+- svec_destroy(&raw_new_br);
++ cfg_get_subsections(&new_br, "bridge");
+
+ /* Get rid of deleted bridges and add new bridges. */
+ svec_sort(&old_br);
+@@ -793,7 +780,7 @@ bridge_create(const char *name)
+ br = xcalloc(1, sizeof *br);
+
+ error = dpif_create(name, &br->dpif);
+- if (error == EEXIST) {
++ if (error == EEXIST || error == EBUSY) {
+ error = dpif_open(name, &br->dpif);
+ if (error) {
+ VLOG_ERR("datapath %s already exists but cannot be opened: %s",
+diff --git a/vswitchd/ovs-vswitchd.conf.5.in b/vswitchd/ovs-vswitchd.conf.5.in
+index 5483ad5..d82a08a 100644
+--- a/vswitchd/ovs-vswitchd.conf.5.in
++++ b/vswitchd/ovs-vswitchd.conf.5.in
+@@ -50,8 +50,7 @@ configure \fBovs\-vswitchd\fR.
+ .SS "Bridge Configuration"
+ A bridge (switch) with a given \fIname\fR is configured by specifying
+ the names of its network devices as values for key
+-\fBbridge.\fIname\fB.port\fR. (The specified \fIname\fR may not begin
+-with \fBdp\fR followed by a digit.)
++\fBbridge.\fIname\fB.port\fR.
+ .PP
+ The names given on \fBbridge.\fIname\fB.port\fR must be the names of
+ existing network devices, except for ``internal ports.'' An internal
+--
+1.6.3.3
+
# See the License for the specific language governing permissions and
# limitations under the License.
-dnl Checks for --disable-userspace.
-AC_DEFUN([OVS_CHECK_USERSPACE],
- [AC_ARG_ENABLE(
- [userspace],
- [AC_HELP_STRING([--disable-userspace],
- [Disable building userspace components.])],
- [case "${enableval}" in
- (yes) build_userspace=true ;;
- (no) build_userspace=false ;;
- (*) AC_MSG_ERROR([bad value ${enableval} for --enable-userspace]) ;;
- esac],
- [build_userspace=true])
- AM_CONDITIONAL([ENABLE_USERSPACE], [$build_userspace])])
-
dnl OVS_CHECK_LINUX(OPTION, VERSION, VARIABLE, CONDITIONAL)
dnl
dnl Configure linux kernel source tree
# See the License for the specific language governing permissions and
# limitations under the License.
-AC_PREREQ(2.60)
+AC_PREREQ(2.63)
AC_INIT(openvswitch, 0.90.6, ovs-bugs@openvswitch.org)
NX_BUILDNR
AC_CONFIG_SRCDIR([datapath/datapath.c])
AC_CONFIG_MACRO_DIR([m4])
AC_CONFIG_AUX_DIR([build-aux])
AC_CONFIG_HEADERS([config.h])
+AC_CONFIG_TESTDIR([tests])
AM_INIT_AUTOMAKE
AC_PROG_CC
AC_C_BIGENDIAN
AC_SYS_LARGEFILE
-OVS_CHECK_USERSPACE
+OVS_CHECK_COVERAGE
OVS_CHECK_NDEBUG
OVS_CHECK_NETLINK
OVS_CHECK_OPENSSL
OVS_CHECK_CURSES
OVS_CHECK_LINUX_VT_H
OVS_CHECK_PCRE
+OVS_CHECK_PYTHON
OVS_CHECK_IF_PACKET
OVS_CHECK_STRTOK_R
-if $build_userspace; then
- OVS_CHECK_PKIDIR
- OVS_CHECK_RUNDIR
- OVS_CHECK_MALLOC_HOOKS
- OVS_CHECK_VALGRIND
- OVS_CHECK_TTY_LOCK_DIR
- OVS_CHECK_SOCKET_LIBS
- OVS_CHECK_FAULT_LIBS
+OVS_CHECK_PKIDIR
+OVS_CHECK_RUNDIR
+OVS_CHECK_MALLOC_HOOKS
+OVS_CHECK_VALGRIND
+OVS_CHECK_TTY_LOCK_DIR
+OVS_CHECK_SOCKET_LIBS
+OVS_CHECK_FAULT_LIBS
- AC_CHECK_FUNCS([strsignal])
+AC_CHECK_FUNCS([strsignal])
- OVS_ENABLE_OPTION([-Wall])
- OVS_ENABLE_OPTION([-Wno-sign-compare])
- OVS_ENABLE_OPTION([-Wpointer-arith])
- OVS_ENABLE_OPTION([-Wdeclaration-after-statement])
- OVS_ENABLE_OPTION([-Wformat-security])
- OVS_ENABLE_OPTION([-Wswitch-enum])
- OVS_ENABLE_OPTION([-Wunused-parameter])
- OVS_ENABLE_OPTION([-Wstrict-aliasing])
- OVS_ENABLE_OPTION([-Wbad-function-cast])
- OVS_ENABLE_OPTION([-Wcast-align])
- OVS_ENABLE_OPTION([-Wstrict-prototypes])
- OVS_ENABLE_OPTION([-Wold-style-definition])
- OVS_ENABLE_OPTION([-Wmissing-prototypes])
- OVS_ENABLE_OPTION([-Wmissing-field-initializers])
- OVS_ENABLE_OPTION([-Wno-override-init])
-fi
+OVS_ENABLE_OPTION([-Wall])
+OVS_ENABLE_OPTION([-Wno-sign-compare])
+OVS_ENABLE_OPTION([-Wpointer-arith])
+OVS_ENABLE_OPTION([-Wdeclaration-after-statement])
+OVS_ENABLE_OPTION([-Wformat-security])
+OVS_ENABLE_OPTION([-Wswitch-enum])
+OVS_ENABLE_OPTION([-Wunused-parameter])
+OVS_ENABLE_OPTION([-Wstrict-aliasing])
+OVS_ENABLE_OPTION([-Wbad-function-cast])
+OVS_ENABLE_OPTION([-Wcast-align])
+OVS_ENABLE_OPTION([-Wstrict-prototypes])
+OVS_ENABLE_OPTION([-Wold-style-definition])
+OVS_ENABLE_OPTION([-Wmissing-prototypes])
+OVS_ENABLE_OPTION([-Wmissing-field-initializers])
+OVS_ENABLE_OPTION([-Wno-override-init])
AC_ARG_VAR(KARCH, [Kernel Architecture String])
AC_SUBST(KARCH)
datapath/Makefile
datapath/linux-2.6/Kbuild
datapath/linux-2.6/Makefile
-datapath/linux-2.6/Makefile.main])
+datapath/linux-2.6/Makefile.main
+tests/atlocal])
AC_OUTPUT
* when we send the packet out on the wire, and it will fail at
* that point because skb_checksum_setup() will not look inside
* an 802.1Q header. */
- skb_checksum_setup(skb);
+ vswitch_skb_checksum_setup(skb);
/* GSO is not implemented for packets with an 802.1Q header, so
* we have to do segmentation before we add that header.
* then freeing the original skbuff is wasteful. So the following code
* is slightly obscure just to avoid that. */
int prev_port = -1;
- int err = 0;
+ int err;
for (; n_actions > 0; a++, n_actions--) {
WARN_ON_ONCE(skb_shared(skb));
if (prev_port != -1) {
do_output(dp, skb, prev_port);
else
kfree_skb(skb);
- return err;
+ return 0;
}
EXPORT_SYMBOL(dp_ioctl_hook);
/* Datapaths. Protected on the read side by rcu_read_lock, on the write side
- * by dp_mutex. dp_mutex is almost completely redundant with genl_mutex
- * maintained by the Generic Netlink code, but the timeout path needs mutual
- * exclusion too.
+ * by dp_mutex.
*
* dp_mutex nests inside the RTNL lock: if you need both you must take the RTNL
* lock first.
/* Initialize kobject for bridge. This will be added as
* /sys/class/net/<devname>/brif later, if sysfs is enabled. */
- kobject_set_name(&dp->ifobj, SYSFS_BRIDGE_PORT_SUBDIR); /* "brif" */
dp->ifobj.kset = NULL;
- dp->ifobj.parent = NULL;
kobject_init(&dp->ifobj, &dp_ktype);
/* Allocate table. */
mutex_unlock(&dp_mutex);
rtnl_unlock();
-#ifdef SUPPORT_SYSFS
dp_sysfs_add_dp(dp);
-#endif
return 0;
if (p->port_no != ODPP_LOCAL)
dp_del_port(p);
-#ifdef SUPPORT_SYSFS
dp_sysfs_del_dp(dp);
-#endif
rcu_assign_pointer(dps[dp->dp_idx], NULL);
}
struct kobj_type brport_ktype = {
-#ifdef SUPPORT_SYSFS
+#ifdef CONFIG_SYSFS
.sysfs_ops = &brport_sysfs_ops,
#endif
.release = release_nbp
/* Initialize kobject for bridge. This will be added as
* /sys/class/net/<devname>/brport later, if sysfs is enabled. */
- kobject_set_name(&p->kobj, SYSFS_BRIDGE_PORT_ATTR); /* "brport" */
p->kobj.kset = NULL;
- p->kobj.parent = &p->dev->NETDEV_DEV_MEMBER.kobj;
kobject_init(&p->kobj, &brport_ktype);
dp_ifinfo_notify(RTM_NEWLINK, p);
if (copy_from_user(&port, portp, sizeof port))
goto out;
port.devname[IFNAMSIZ - 1] = '\0';
- port_no = port.port;
-
- err = -EINVAL;
- if (port_no < 0 || port_no >= DP_MAX_PORTS)
- goto out;
rtnl_lock();
dp = get_dp_locked(dp_idx);
if (!dp)
goto out_unlock_rtnl;
- err = -EEXIST;
- if (dp->ports[port_no])
- goto out_unlock_dp;
+ for (port_no = 1; port_no < DP_MAX_PORTS; port_no++)
+ if (!dp->ports[port_no])
+ goto got_port_no;
+ err = -EFBIG;
+ goto out_unlock_dp;
+got_port_no:
if (!(port.flags & ODP_PORT_INTERNAL)) {
err = -ENODEV;
dev = dev_get_by_name(&init_net, port.devname);
if (err)
goto out_put;
-#ifdef SUPPORT_SYSFS
dp_sysfs_add_if(dp->ports[port_no]);
-#endif
+
+ err = __put_user(port_no, &portp->port);
out_put:
dev_put(dev);
{
ASSERT_RTNL();
-#ifdef SUPPORT_SYSFS
if (p->port_no != ODPP_LOCAL)
dp_sysfs_del_if(p);
-#endif
dp_ifinfo_notify(RTM_DELLINK, p);
p->dp->n_ports--;
#error
#endif
-#ifdef CONFIG_XEN
+#if defined(CONFIG_XEN) && LINUX_VERSION_CODE == KERNEL_VERSION(2,6,18)
/* This code is copied verbatim from net/core/dev.c in Xen's
* linux-2.6.18-92.1.10.el5.xs5.0.0.394.644. We can't call those functions
* directly because they aren't exported. */
}
}
-int skb_checksum_setup(struct sk_buff *skb)
+int vswitch_skb_checksum_setup(struct sk_buff *skb)
{
if (skb->proto_csum_blank) {
if (skb->protocol != htons(ETH_P_IP))
out:
return -EPROTO;
}
+#else
+int vswitch_skb_checksum_setup(struct sk_buff *skb) { return 0; }
+#endif /* CONFIG_XEN && linux == 2.6.18 */
+
+/* Append each packet in 'skb' list to 'queue'. There will be only one packet
+ * unless we broke up a GSO packet. */
+static int
+queue_control_packets(struct sk_buff *skb, struct sk_buff_head *queue,
+ int queue_no, u32 arg)
+{
+ struct sk_buff *nskb;
+ int port_no;
+ int err;
+
+ port_no = ODPP_LOCAL;
+ if (skb->dev) {
+ if (skb->dev->br_port)
+ port_no = skb->dev->br_port->port_no;
+ else if (is_dp_dev(skb->dev))
+ port_no = dp_dev_priv(skb->dev)->port_no;
+ }
+
+ do {
+ struct odp_msg *header;
+
+ nskb = skb->next;
+ skb->next = NULL;
+
+ /* If a checksum-deferred packet is forwarded to the
+ * controller, correct the pointers and checksum. This happens
+ * on a regular basis only on Xen, on which VMs can pass up
+ * packets that do not have their checksum computed.
+ */
+ err = vswitch_skb_checksum_setup(skb);
+ if (err)
+ goto err_kfree_skbs;
+#ifndef CHECKSUM_HW
+ if (skb->ip_summed == CHECKSUM_PARTIAL) {
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,22)
+ /* Until 2.6.22, the start of the transport header was
+ * also the start of data to be checksummed. Linux
+ * 2.6.22 introduced the csum_start field for this
+ * purpose, but we should point the transport header to
+ * it anyway for backward compatibility, as
+ * dev_queue_xmit() does even in 2.6.28. */
+ skb_set_transport_header(skb, skb->csum_start -
+ skb_headroom(skb));
+#endif
+ err = skb_checksum_help(skb);
+ if (err)
+ goto err_kfree_skbs;
+ }
+#else
+ if (skb->ip_summed == CHECKSUM_HW) {
+ err = skb_checksum_help(skb, 0);
+ if (err)
+ goto err_kfree_skbs;
+ }
#endif
+ err = skb_cow(skb, sizeof *header);
+ if (err)
+ goto err_kfree_skbs;
+
+ header = (struct odp_msg*)__skb_push(skb, sizeof *header);
+ header->type = queue_no;
+ header->length = skb->len;
+ header->port = port_no;
+ header->reserved = 0;
+ header->arg = arg;
+ skb_queue_tail(queue, skb);
+
+ skb = nskb;
+ } while (skb);
+ return 0;
+
+err_kfree_skbs:
+ kfree_skb(skb);
+ while ((skb = nskb) != NULL) {
+ nskb = skb->next;
+ kfree_skb(skb);
+ }
+ return err;
+}
+
int
dp_output_control(struct datapath *dp, struct sk_buff *skb, int queue_no,
u32 arg)
{
struct dp_stats_percpu *stats;
struct sk_buff_head *queue;
- int port_no;
int err;
WARN_ON_ONCE(skb_shared(skb));
if (skb_queue_len(queue) >= DP_MAX_QUEUE_LEN)
goto err_kfree_skb;
- /* If a checksum-deferred packet is forwarded to the controller,
- * correct the pointers and checksum. This happens on a regular basis
- * only on Xen (the CHECKSUM_HW case), on which VMs can pass up packets
- * that do not have their checksum computed. We also implement it for
- * the non-Xen case, but it is difficult to trigger or test this case
- * there, hence the WARN_ON_ONCE().
- */
- err = skb_checksum_setup(skb);
- if (err)
- goto err_kfree_skb;
-#ifndef CHECKSUM_HW
- if (skb->ip_summed == CHECKSUM_PARTIAL) {
- WARN_ON_ONCE(1);
-#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,22)
- /* Until 2.6.22, the start of the transport header was also the
- * start of data to be checksummed. Linux 2.6.22 introduced
- * the csum_start field for this purpose, but we should point
- * the transport header to it anyway for backward
- * compatibility, as dev_queue_xmit() does even in 2.6.28. */
- skb_set_transport_header(skb, skb->csum_start -
- skb_headroom(skb));
-#endif
- err = skb_checksum_help(skb);
- if (err)
- goto err_kfree_skb;
- }
-#else
- if (skb->ip_summed == CHECKSUM_HW) {
- err = skb_checksum_help(skb, 0);
- if (err)
- goto err_kfree_skb;
- }
-#endif
-
/* Break apart GSO packets into their component pieces. Otherwise
* userspace may try to stuff a 64kB packet into a 1500-byte MTU. */
if (skb_is_gso(skb)) {
}
}
- /* Figure out port number. */
- port_no = ODPP_LOCAL;
- if (skb->dev) {
- if (skb->dev->br_port)
- port_no = skb->dev->br_port->port_no;
- else if (is_dp_dev(skb->dev))
- port_no = dp_dev_priv(skb->dev)->port_no;
- }
-
- /* Append each packet to queue. There will be only one packet unless
- * we broke up a GSO packet above. */
- do {
- struct odp_msg *header;
- struct sk_buff *nskb = skb->next;
- skb->next = NULL;
-
- err = skb_cow(skb, sizeof *header);
- if (err) {
- while (nskb) {
- kfree_skb(skb);
- skb = nskb;
- nskb = skb->next;
- }
- goto err_kfree_skb;
- }
-
- header = (struct odp_msg*)__skb_push(skb, sizeof *header);
- header->type = queue_no;
- header->length = skb->len;
- header->port = port_no;
- header->reserved = 0;
- header->arg = arg;
- skb_queue_tail(queue, skb);
-
- skb = nskb;
- } while (skb);
-
+ err = queue_control_packets(skb, queue, queue_no, arg);
wake_up_interruptible(&dp->waitqueue);
- return 0;
+ return err;
err_kfree_skb:
kfree_skb(skb);
stats->n_bytes = flow->byte_count;
stats->ip_tos = flow->ip_tos;
stats->tcp_flags = flow->tcp_flags;
+ stats->error = 0;
}
static void clear_stats(struct sw_flow *flow)
if (!n_actions)
return 0;
- if (ufp->n_actions > INT_MAX / sizeof(union odp_action))
- return -EINVAL;
sf_acts = rcu_dereference(flow->sf_acts);
if (__put_user(sf_acts->n_actions, &ufp->n_actions) ||
return put_actions(flow, ufp);
}
-static int del_or_query_flow(struct datapath *dp,
- struct odp_flow __user *ufp,
- unsigned int cmd)
+static int del_flow(struct datapath *dp, struct odp_flow __user *ufp)
{
struct dp_table *table = rcu_dereference(dp->table);
struct odp_flow uf;
if (!flow)
goto error;
- if (cmd == ODP_FLOW_DEL) {
- /* XXX redundant lookup */
- error = dp_table_delete(table, flow);
- if (error)
- goto error;
+ /* XXX redundant lookup */
+ error = dp_table_delete(table, flow);
+ if (error)
+ goto error;
- /* XXX These statistics might lose a few packets, since other
- * CPUs can be using this flow. We used to synchronize_rcu()
- * to make sure that we get completely accurate stats, but that
- * blows our performance, badly. */
- dp->n_flows--;
- error = answer_query(flow, 0, ufp);
- flow_deferred_free(flow);
- } else {
- error = answer_query(flow, uf.flags, ufp);
- }
+ /* XXX These statistics might lose a few packets, since other CPUs can
+ * be using this flow. We used to synchronize_rcu() to make sure that
+ * we get completely accurate stats, but that blows our performance,
+ * badly. */
+ dp->n_flows--;
+ error = answer_query(flow, 0, ufp);
+ flow_deferred_free(flow);
error:
return error;
}
-static int query_multiple_flows(struct datapath *dp,
- const struct odp_flowvec *flowvec)
+static int query_flows(struct datapath *dp, const struct odp_flowvec *flowvec)
{
struct dp_table *table = rcu_dereference(dp->table);
int i;
flow = dp_table_lookup(table, &uf.key);
if (!flow)
- error = __clear_user(&ufp->stats, sizeof ufp->stats);
+ error = __put_user(ENOENT, &ufp->stats.error);
else
- error = answer_query(flow, 0, ufp);
+ error = answer_query(flow, uf.flags, ufp);
if (error)
return -EFAULT;
}
return err;
}
-static int
-get_dp_stats(struct datapath *dp, struct odp_stats __user *statsp)
+static int get_dp_stats(struct datapath *dp, struct odp_stats __user *statsp)
{
struct odp_stats stats;
int i;
break;
}
}
- return put_user(idx, &pvp->n_ports);
+ return put_user(dp->n_ports, &pvp->n_ports);
}
/* RCU callback for freeing a dp_port_group */
/* Handle commands with special locking requirements up front. */
switch (cmd) {
case ODP_DP_CREATE:
- return create_dp(dp_idx, (char __user *)argp);
+ err = create_dp(dp_idx, (char __user *)argp);
+ goto exit;
case ODP_DP_DESTROY:
- return destroy_dp(dp_idx);
+ err = destroy_dp(dp_idx);
+ goto exit;
case ODP_PORT_ADD:
- return add_port(dp_idx, (struct odp_port __user *)argp);
+ err = add_port(dp_idx, (struct odp_port __user *)argp);
+ goto exit;
case ODP_PORT_DEL:
err = get_user(port_no, (int __user *)argp);
- if (err)
- break;
- return del_port(dp_idx, port_no);
+ if (!err)
+ err = del_port(dp_idx, port_no);
+ goto exit;
}
dp = get_dp_locked(dp_idx);
+ err = -ENODEV;
if (!dp)
- return -ENODEV;
+ goto exit;
switch (cmd) {
case ODP_DP_STATS:
break;
case ODP_FLOW_DEL:
- case ODP_FLOW_GET:
- err = del_or_query_flow(dp, (struct odp_flow __user *)argp,
- cmd);
+ err = del_flow(dp, (struct odp_flow __user *)argp);
break;
- case ODP_FLOW_GET_MULTIPLE:
- err = do_flowvec_ioctl(dp, argp, query_multiple_flows);
+ case ODP_FLOW_GET:
+ err = do_flowvec_ioctl(dp, argp, query_flows);
break;
case ODP_FLOW_LIST:
break;
}
mutex_unlock(&dp->mutex);
+exit:
return err;
}
};
static int major;
+
+#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,27)
static struct llc_sap *dp_stp_sap;
static int dp_stp_rcv(struct sk_buff *skb, struct net_device *dev,
return 0;
}
-static int __init dp_init(void)
+static int dp_avoid_bridge_init(void)
{
- int err;
-
- printk("Open vSwitch %s, built "__DATE__" "__TIME__"\n", VERSION BUILDNR);
-
/* Register to receive STP packets because the bridge module also
* attempts to do so. Since there can only be a single listener for a
* given protocol, this provides mutual exclusion against the bridge
printk(KERN_ERR "openvswitch: can't register sap for STP (probably the bridge module is loaded)\n");
return -EADDRINUSE;
}
+ return 0;
+}
+
+static void dp_avoid_bridge_exit(void)
+{
+ llc_sap_put(dp_stp_sap);
+}
+#else /* Linux 2.6.27 or later. */
+static int dp_avoid_bridge_init(void)
+{
+ /* Linux 2.6.27 introduces a way for multiple clients to register for
+ * STP packets, which interferes with what we try to do above.
+ * Instead, just check whether there's a bridge hook defined. This is
+ * not as safe--the bridge module is willing to load over the top of
+ * us--but it provides a little bit of protection. */
+ if (br_handle_frame_hook) {
+ printk(KERN_ERR "openvswitch: bridge module is loaded, cannot load over it\n");
+ return -EADDRINUSE;
+ }
+ return 0;
+}
+
+static void dp_avoid_bridge_exit(void)
+{
+ /* Nothing to do. */
+}
+#endif /* Linux 2.6.27 or later */
+
+static int __init dp_init(void)
+{
+ int err;
+
+ printk("Open vSwitch %s, built "__DATE__" "__TIME__"\n", VERSION BUILDNR);
+
+ err = dp_avoid_bridge_init();
+ if (err)
+ return err;
err = flow_init();
if (err)
unregister_netdevice_notifier(&dp_device_notifier);
flow_exit();
br_handle_frame_hook = NULL;
- llc_sap_put(dp_stp_sap);
+ dp_avoid_bridge_exit();
}
module_init(dp_init);
#include <asm/page.h>
#include <linux/kernel.h>
#include <linux/mutex.h>
-#include <linux/netlink.h>
#include <linux/netdevice.h>
#include <linux/workqueue.h>
#include <linux/skbuff.h>
+#include <linux/version.h>
#include "flow.h"
#include "dp_sysfs.h"
}
#endif
+int vswitch_skb_checksum_setup(struct sk_buff *skb);
+
#endif /* datapath.h */
{
struct dp_dev *dp_dev = dp_dev_priv(netdev);
strcpy(info->driver, "openvswitch");
- sprintf(info->bus_info, "%d", dp_dev->dp->dp_idx);
+ sprintf(info->bus_info, "%d.%d", dp_dev->dp->dp_idx, dp_dev->port_no);
}
static struct ethtool_ops dp_ethtool_ops = {
int dp_sysfs_add_if(struct net_bridge_port *p);
int dp_sysfs_del_if(struct net_bridge_port *p);
-#include <linux/version.h>
-#if LINUX_VERSION_CODE == KERNEL_VERSION(2,6,18)
-#define SUPPORT_SYSFS 1
-#else
-/* We only support sysfs on Linux 2.6.18 because that's the only place we
- * really need it (on Xen, for brcompat) and it's a big pain to try to support
- * multiple versions. */
-#endif
-
-#ifdef SUPPORT_SYSFS
+#ifdef CONFIG_SYSFS
extern struct sysfs_ops brport_sysfs_ops;
#endif
#include "datapath.h"
#include "dp_dev.h"
-#ifdef SUPPORT_SYSFS
+#ifdef CONFIG_SYSFS
#define to_dev(obj) container_of(obj, struct device, kobj)
/* Hack to attempt to build on more platforms. */
#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,21)
-#define to_kobj(d) &(d)->class_dev.kobj
#define DP_DEVICE_ATTR CLASS_DEVICE_ATTR
+#define DEVICE_PARAMS struct class_device *d
+#define DEVICE_ARGS d
+#define DEV_ATTR(NAME) class_device_attr_##NAME
#else
-#define to_kobj(d) &(d)->dev.kobj
#define DP_DEVICE_ATTR DEVICE_ATTR
+#define DEVICE_PARAMS struct device *d, struct device_attribute *attr
+#define DEVICE_ARGS d, attr
+#define DEV_ATTR(NAME) dev_attr_##NAME
#endif
/*
* Common code for storing bridge parameters.
*/
-static ssize_t store_bridge_parm(struct class_device *d,
+static ssize_t store_bridge_parm(DEVICE_PARAMS,
const char *buf, size_t len,
void (*set)(struct datapath *, unsigned long))
{
}
-static ssize_t show_forward_delay(struct class_device *d,
- char *buf)
+static ssize_t show_forward_delay(DEVICE_PARAMS, char *buf)
{
#if 0
struct datapath *dp = dp_dev_get_dp(to_net_dev(d));
#endif
}
-static ssize_t store_forward_delay(struct class_device *d,
+static ssize_t store_forward_delay(DEVICE_PARAMS,
const char *buf, size_t len)
{
- return store_bridge_parm(d, buf, len, set_forward_delay);
+ return store_bridge_parm(DEVICE_ARGS, buf, len, set_forward_delay);
}
static DP_DEVICE_ATTR(forward_delay, S_IRUGO | S_IWUSR,
show_forward_delay, store_forward_delay);
-static ssize_t show_hello_time(struct class_device *d, char *buf)
+static ssize_t show_hello_time(DEVICE_PARAMS, char *buf)
{
#if 0
return sprintf(buf, "%lu\n",
#endif
}
-static ssize_t store_hello_time(struct class_device *d,
+static ssize_t store_hello_time(DEVICE_PARAMS,
const char *buf,
size_t len)
{
- return store_bridge_parm(d, buf, len, set_hello_time);
+ return store_bridge_parm(DEVICE_ARGS, buf, len, set_hello_time);
}
static DP_DEVICE_ATTR(hello_time, S_IRUGO | S_IWUSR, show_hello_time,
store_hello_time);
-static ssize_t show_max_age(struct class_device *d,
- char *buf)
+static ssize_t show_max_age(DEVICE_PARAMS, char *buf)
{
#if 0
return sprintf(buf, "%lu\n",
#endif
}
-static ssize_t store_max_age(struct class_device *d,
+static ssize_t store_max_age(DEVICE_PARAMS,
const char *buf, size_t len)
{
- return store_bridge_parm(d, buf, len, set_max_age);
+ return store_bridge_parm(DEVICE_ARGS, buf, len, set_max_age);
}
static DP_DEVICE_ATTR(max_age, S_IRUGO | S_IWUSR, show_max_age, store_max_age);
-static ssize_t show_ageing_time(struct class_device *d,
- char *buf)
+static ssize_t show_ageing_time(DEVICE_PARAMS, char *buf)
{
#if 0
struct datapath *dp = dp_dev_get_dp(to_net_dev(d));
#endif
}
-static ssize_t store_ageing_time(struct class_device *d,
+static ssize_t store_ageing_time(DEVICE_PARAMS,
const char *buf, size_t len)
{
- return store_bridge_parm(d, buf, len, set_ageing_time);
+ return store_bridge_parm(DEVICE_ARGS, buf, len, set_ageing_time);
}
static DP_DEVICE_ATTR(ageing_time, S_IRUGO | S_IWUSR, show_ageing_time,
store_ageing_time);
-static ssize_t show_stp_state(struct class_device *d,
- char *buf)
+static ssize_t show_stp_state(DEVICE_PARAMS, char *buf)
{
#if 0
struct datapath *dp = dp_dev_get_dp(to_net_dev(d));
}
-static ssize_t store_stp_state(struct class_device *d,
+static ssize_t store_stp_state(DEVICE_PARAMS,
const char *buf,
size_t len)
{
static DP_DEVICE_ATTR(stp_state, S_IRUGO | S_IWUSR, show_stp_state,
store_stp_state);
-static ssize_t show_priority(struct class_device *d,
- char *buf)
+static ssize_t show_priority(DEVICE_PARAMS, char *buf)
{
#if 0
struct datapath *dp = dp_dev_get_dp(to_net_dev(d));
#endif
}
-static ssize_t store_priority(struct class_device *d,
+static ssize_t store_priority(DEVICE_PARAMS,
const char *buf, size_t len)
{
- return store_bridge_parm(d, buf, len, set_priority);
+ return store_bridge_parm(DEVICE_ARGS, buf, len, set_priority);
}
static DP_DEVICE_ATTR(priority, S_IRUGO | S_IWUSR, show_priority, store_priority);
-static ssize_t show_root_id(struct class_device *d,
- char *buf)
+static ssize_t show_root_id(DEVICE_PARAMS, char *buf)
{
#if 0
return br_show_bridge_id(buf, &to_bridge(d)->designated_root);
}
static DP_DEVICE_ATTR(root_id, S_IRUGO, show_root_id, NULL);
-static ssize_t show_bridge_id(struct class_device *d,
- char *buf)
+static ssize_t show_bridge_id(DEVICE_PARAMS, char *buf)
{
struct datapath *dp = dp_dev_get_dp(to_net_dev(d));
const unsigned char *addr = dp->ports[ODPP_LOCAL]->dev->dev_addr;
}
static DP_DEVICE_ATTR(bridge_id, S_IRUGO, show_bridge_id, NULL);
-static ssize_t show_root_port(struct class_device *d,
- char *buf)
+static ssize_t show_root_port(DEVICE_PARAMS, char *buf)
{
#if 0
return sprintf(buf, "%d\n", to_bridge(d)->root_port);
}
static DP_DEVICE_ATTR(root_port, S_IRUGO, show_root_port, NULL);
-static ssize_t show_root_path_cost(struct class_device *d,
- char *buf)
+static ssize_t show_root_path_cost(DEVICE_PARAMS, char *buf)
{
#if 0
return sprintf(buf, "%d\n", to_bridge(d)->root_path_cost);
}
static DP_DEVICE_ATTR(root_path_cost, S_IRUGO, show_root_path_cost, NULL);
-static ssize_t show_topology_change(struct class_device *d,
- char *buf)
+static ssize_t show_topology_change(DEVICE_PARAMS, char *buf)
{
#if 0
return sprintf(buf, "%d\n", to_bridge(d)->topology_change);
}
static DP_DEVICE_ATTR(topology_change, S_IRUGO, show_topology_change, NULL);
-static ssize_t show_topology_change_detected(struct class_device *d,
- char *buf)
+static ssize_t show_topology_change_detected(DEVICE_PARAMS, char *buf)
{
#if 0
struct datapath *dp = dp_dev_get_dp(to_net_dev(d));
static DP_DEVICE_ATTR(topology_change_detected, S_IRUGO,
show_topology_change_detected, NULL);
-static ssize_t show_hello_timer(struct class_device *d,
- char *buf)
+static ssize_t show_hello_timer(DEVICE_PARAMS, char *buf)
{
#if 0
struct datapath *dp = dp_dev_get_dp(to_net_dev(d));
}
static DP_DEVICE_ATTR(hello_timer, S_IRUGO, show_hello_timer, NULL);
-static ssize_t show_tcn_timer(struct class_device *d,
- char *buf)
+static ssize_t show_tcn_timer(DEVICE_PARAMS, char *buf)
{
#if 0
struct datapath *dp = dp_dev_get_dp(to_net_dev(d));
}
static DP_DEVICE_ATTR(tcn_timer, S_IRUGO, show_tcn_timer, NULL);
-static ssize_t show_topology_change_timer(struct class_device *d,
- char *buf)
+static ssize_t show_topology_change_timer(DEVICE_PARAMS, char *buf)
{
#if 0
struct datapath *dp = dp_dev_get_dp(to_net_dev(d));
static DP_DEVICE_ATTR(topology_change_timer, S_IRUGO, show_topology_change_timer,
NULL);
-static ssize_t show_gc_timer(struct class_device *d,
- char *buf)
+static ssize_t show_gc_timer(DEVICE_PARAMS, char *buf)
{
#if 0
struct datapath *dp = dp_dev_get_dp(to_net_dev(d));
}
static DP_DEVICE_ATTR(gc_timer, S_IRUGO, show_gc_timer, NULL);
-static ssize_t show_group_addr(struct class_device *d,
- char *buf)
+static ssize_t show_group_addr(DEVICE_PARAMS, char *buf)
{
#if 0
struct datapath *dp = dp_dev_get_dp(to_net_dev(d));
#endif
}
-static ssize_t store_group_addr(struct class_device *d,
+static ssize_t store_group_addr(DEVICE_PARAMS,
const char *buf, size_t len)
{
struct datapath *dp = dp_dev_get_dp(to_net_dev(d));
show_group_addr, store_group_addr);
static struct attribute *bridge_attrs[] = {
- &class_device_attr_forward_delay.attr,
- &class_device_attr_hello_time.attr,
- &class_device_attr_max_age.attr,
- &class_device_attr_ageing_time.attr,
- &class_device_attr_stp_state.attr,
- &class_device_attr_priority.attr,
- &class_device_attr_bridge_id.attr,
- &class_device_attr_root_id.attr,
- &class_device_attr_root_path_cost.attr,
- &class_device_attr_root_port.attr,
- &class_device_attr_topology_change.attr,
- &class_device_attr_topology_change_detected.attr,
- &class_device_attr_hello_timer.attr,
- &class_device_attr_tcn_timer.attr,
- &class_device_attr_topology_change_timer.attr,
- &class_device_attr_gc_timer.attr,
- &class_device_attr_group_addr.attr,
+ &DEV_ATTR(forward_delay).attr,
+ &DEV_ATTR(hello_time).attr,
+ &DEV_ATTR(max_age).attr,
+ &DEV_ATTR(ageing_time).attr,
+ &DEV_ATTR(stp_state).attr,
+ &DEV_ATTR(priority).attr,
+ &DEV_ATTR(bridge_id).attr,
+ &DEV_ATTR(root_id).attr,
+ &DEV_ATTR(root_path_cost).attr,
+ &DEV_ATTR(root_port).attr,
+ &DEV_ATTR(topology_change).attr,
+ &DEV_ATTR(topology_change_detected).attr,
+ &DEV_ATTR(hello_timer).attr,
+ &DEV_ATTR(tcn_timer).attr,
+ &DEV_ATTR(topology_change_timer).attr,
+ &DEV_ATTR(gc_timer).attr,
+ &DEV_ATTR(group_addr).attr,
NULL
};
*/
int dp_sysfs_add_dp(struct datapath *dp)
{
- struct kobject *kobj = to_kobj(dp->ports[ODPP_LOCAL]->dev);
+ struct kobject *kobj = &dp->ports[ODPP_LOCAL]->dev->NETDEV_DEV_MEMBER.kobj;
int err;
/* Create /sys/class/net/<devname>/bridge directory. */
}
/* Create /sys/class/net/<devname>/brif directory. */
- dp->ifobj.parent = kobj;
- err = kobject_add(&dp->ifobj);
+ err = kobject_add(&dp->ifobj, kobj, SYSFS_BRIDGE_PORT_SUBDIR);
if (err) {
pr_info("%s: can't add kobject (directory) %s/%s\n",
- __FUNCTION__, dp_name(dp), dp->ifobj.name);
+ __FUNCTION__, dp_name(dp), kobject_name(&dp->ifobj));
goto out2;
}
kobject_uevent(&dp->ifobj, KOBJ_ADD);
int dp_sysfs_del_dp(struct datapath *dp)
{
- struct kobject *kobj = to_kobj(dp->ports[ODPP_LOCAL]->dev);
+ struct kobject *kobj = &dp->ports[ODPP_LOCAL]->dev->NETDEV_DEV_MEMBER.kobj;
kobject_del(&dp->ifobj);
sysfs_remove_group(kobj, &bridge_group);
return 0;
}
-#else /* !SUPPORT_SYSFS */
+#else /* !CONFIG_SYSFS */
int dp_sysfs_add_dp(struct datapath *dp) { return 0; }
int dp_sysfs_del_dp(struct datapath *dp) { return 0; }
int dp_sysfs_add_if(struct net_bridge_port *p) { return 0; }
-int dp_sysfs_del_if(struct net_bridge_port *p)
-{
- dev_put(p->dev);
- kfree(p);
- return 0;
-}
-#endif /* !SUPPORT_SYSFS */
+int dp_sysfs_del_if(struct net_bridge_port *p) { return 0; }
+#endif /* !CONFIG_SYSFS */
#include "dp_sysfs.h"
#include "datapath.h"
-#ifdef SUPPORT_SYSFS
+#ifdef CONFIG_SYSFS
struct brport_attribute {
struct attribute attr;
int err;
/* Create /sys/class/net/<devname>/brport directory. */
- err = kobject_add(&p->kobj);
+ err = kobject_add(&p->kobj, &p->dev->NETDEV_DEV_MEMBER.kobj,
+ SYSFS_BRIDGE_PORT_ATTR);
if (err)
- goto err_put;
+ goto err;
/* Create symlink from /sys/class/net/<devname>/brport/bridge to
* /sys/class/net/<bridgename>. */
kobject_uevent(&p->kobj, KOBJ_ADD);
- return err;
+ return 0;
err_del:
kobject_del(&p->kobj);
-err_put:
- kobject_put(&p->kobj);
-
- /* Ensure that dp_sysfs_del_if becomes a no-op. */
- p->kobj.dentry = NULL;
+err:
+ p->linkname[0] = 0;
return err;
}
{
if (p->linkname[0]) {
sysfs_remove_link(&p->dp->ifobj, p->linkname);
- p->linkname[0] = '\0';
- }
- if (p->kobj.dentry) {
kobject_uevent(&p->kobj, KOBJ_REMOVE);
kobject_del(&p->kobj);
+ p->linkname[0] = '\0';
}
return 0;
}
-#endif /* SUPPORT_SYSFS */
+#endif /* CONFIG_SYSFS */
#include_next <linux/kobject.h>
#include <linux/version.h>
+
#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,25)
#define kobject_init(kobj, ktype) rpl_kobject_init(kobj, ktype)
static inline void rpl_kobject_init(struct kobject *kobj, struct kobj_type *ktype)
kobj->ktype = ktype;
(kobject_init)(kobj);
}
+
+#define kobject_add(kobj, parent, name) rpl_kobject_add(kobj, parent, name)
+static inline int rpl_kobject_add(struct kobject *kobj,
+ struct kobject *parent,
+ const char *name)
+{
+ int err = kobject_set_name(kobj, "%s", name);
+ if (err)
+ return err;
+ kobj->parent = parent;
+ return (kobject_add)(kobj);
+}
#endif
+
#endif /* linux/kobject.h wrapper */
* Development version.
- -- Open vSwitch developers <ovs-dev@openvswitch.org> Mon, 19 Nov 2007 14:57:52 -0800
+ -- Open vSwitch developers <dev@openvswitch.org> Mon, 19 Nov 2007 14:57:52 -0800
Source: openvswitch
Section: net
Priority: extra
-Maintainer: Open vSwitch developers <ovs-dev@openvswitch.org>
+Maintainer: Open vSwitch developers <dev@openvswitch.org>
Build-Depends: debhelper (>= 5), autoconf (>= 2.60), automake1.10, libssl-dev, pkg-config (>= 0.21), po-debconf, bzip2, openssl, libncurses5-dev, libpcre3-dev
Standards-Version: 3.7.3
from it using module-assistant or make-kpkg. README.Debian in this
package provides further instructions.
.
- Open vSwitch is a software-based Ethernet switch targeted at virtual
- servers.
+ Open vSwitch is a full-featured software-based Ethernet switch.
Package: openvswitch-common
Architecture: any
openvswitch-common provides components required by both openvswitch-switch
and openvswitch-controller.
.
- Open vSwitch is a software-based Ethernet switch targeted at virtual
- servers.
+ Open vSwitch is a full-featured software-based Ethernet switch.
Package: openvswitch-switch
Architecture: any
Suggests: openvswitch-datapath-module
-Depends: ${shlibs:Depends}, ${misc:Depends}, openvswitch-common, dhcp3-client, module-init-tools, dmidecode, procps, debianutils
+Depends: ${shlibs:Depends}, ${misc:Depends}, openvswitch-common (= ${source:Version}), dhcp3-client, module-init-tools, dmidecode, procps, debianutils
Description: Open vSwitch switch implementations
openvswitch-switch provides the userspace components and utilities for
the Open vSwitch kernel-based switch.
.
- Open vSwitch is a software-based Ethernet switch targeted at virtual
- servers.
+ Open vSwitch is a full-featured software-based Ethernet switch.
Package: openvswitch-switch-config
Architecture: any
-Depends: ${shlibs:Depends}, ${misc:Depends}, openvswitch-switch, libwww-perl, libdigest-sha1-perl
+Depends: ${shlibs:Depends}, ${misc:Depends}, openvswitch-switch (= ${source:Version}), libwww-perl, libdigest-sha1-perl
Description: Open vSwitch switch implementations
openvswitch-switch-config provides a utility for interactively configuring
the Open vSwitch switch provided in the openvswitch-switch package.
.
- Open vSwitch is a software-based Ethernet switch targeted at virtual
- servers.
+ Open vSwitch is a full-featured software-based Ethernet switch.
Package: openvswitch-switchui
Architecture: any
Package: openvswitch-pki
Architecture: all
-Depends: ${shlibs:Depends}, ${misc:Depends}, openvswitch-common
+Depends: ${shlibs:Depends}, ${misc:Depends}, openvswitch-common (= ${source:Version})
Description: Open vSwitch public key infrastructure
openvswitch-pki provides PKI (public key infrastructure) support for
Open vSwitch switches and controllers, reducing the risk of
man-in-the-middle attacks on the Open vSwitch network infrastructure.
.
- Open vSwitch is a software-based Ethernet switch targeted at virtual
- servers.
+ Open vSwitch is a full-featured software-based Ethernet switch.
Package: openvswitch-pki-server
Architecture: all
-Depends: ${shlibs:Depends}, ${misc:Depends}, ${perl:Depends}, openvswitch-pki, apache2
+Depends: ${shlibs:Depends}, ${misc:Depends}, ${perl:Depends}, openvswitch-pki (= ${source:Version}), apache2
Description: Open vSwitch public key infrastructure (HTTP server support)
openvswitch-pki-server provides HTTP access to the Open vSwitch PKI (public
key infrastructure) maintained on the local machine by the
convenient OpenFlow switch setup using the ovs-switch-setup program
in the openvswitch-switch package.
.
- Open vSwitch is a software-based Ethernet switch targeted at virtual
- servers.
+ Open vSwitch is a full-featured software-based Ethernet switch.
Package: openvswitch-controller
Architecture: any
-Depends: ${shlibs:Depends}, openvswitch-common, openvswitch-pki
+Depends: ${shlibs:Depends}, openvswitch-common (= ${source:Version}), openvswitch-pki (= ${source:Version})
Description: Open vSwitch controller implementation
The Open vSwitch controller enables OpenFlow switches that connect to it
to act as MAC-learning Ethernet switches.
.
- Open vSwitch is a software-based Ethernet switch targeted at virtual
- servers.
+ Open vSwitch is a full-featured software-based Ethernet switch.
Package: corekeeper
Architecture: all
Recommends: openvswitch-switch
Depends: ${shlibs:Depends}, ${misc:Depends}
Description: Monitor utility for Open vSwitch switches
- The ovs-monitor utility included in this package monitors the secure
- channel and datapath. If either become unresponsive, the switch is
- rebooted.
+ The ovs-monitor utility included in this package monitors the
+ ovs-openflowd process and the kernel datapath. If either becomes
+ unresponsive, it reboots the machine.
Package: openvswitch-wdt
Architecture: any
Source: openvswitch
Section: net
Priority: extra
-Maintainer: Open vSwitch developers <ovs-dev@openvswitch.org>
+Maintainer: Open vSwitch developers <dev@openvswitch.org>
Build-Depends: debhelper (>= 5.0.37)
Standards-Version: 3.7.3
* To enable OpenFlow switches to automatically discover the location
of the controller, you must install and configure a DHCP server.
- The secchan(8) manpage (found in the openvswitch-switch package) gives
- a working example configuration file for the ISC DHCP server.
+ The ovs-openflowd(8) manpage (found in the openvswitch-switch
+ package) gives a working example configuration file for the ISC DHCP
+ server.
- -- Ben Pfaff <blp@nicira.com>, Mon, 11 May 2009 13:26:38 -0700
+ -- Ben Pfaff <blp@nicira.com>, Wed, 8 Jul 2009 09:39:53 -0700
# it reboots the system. A value of zero disables the monitor.
THRESHOLD=3
-# INTERVAL: The number of seconds to wait between probing secchan and
-# the datapath.
+# INTERVAL: The number of seconds to wait between probing
+# ovs-openflowd and the datapath.
INTERVAL=1
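The fragment above only says that the monitor probes ovs-openflowd and the datapath every INTERVAL seconds and reboots once THRESHOLD is reached, with zero disabling it. The C sketch below models that rule under two assumptions the fragment does not spell out: THRESHOLD counts consecutive failed probes, and a single successful probe resets the count. `struct monitor` and `monitor_record_probe()` are hypothetical names, not part of ovs-monitor.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the monitor's decision rule.  Assumes THRESHOLD
 * counts consecutive failed probes, one probe every INTERVAL seconds. */
struct monitor {
    int threshold;      /* THRESHOLD; 0 disables the monitor. */
    int failures;       /* Consecutive failed probes so far. */
};

/* Records the outcome of one probe of ovs-openflowd and the datapath.
 * Returns true if the machine should be rebooted. */
static bool
monitor_record_probe(struct monitor *m, bool ovs_openflowd_ok,
                     bool datapath_ok)
{
    if (m->threshold == 0) {
        return false;               /* Monitoring disabled. */
    }
    if (ovs_openflowd_ok && datapath_ok) {
        m->failures = 0;            /* A healthy probe resets the count. */
        return false;
    }
    /* Either component unresponsive counts as a failed probe. */
    return ++m->failures >= m->threshold;
}
```

With THRESHOLD=3 and INTERVAL=1, this model reboots after roughly three seconds of continuous failure.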
# LOG_FILE: File to log messages related to monitoring.
LOG_FILE="/var/log/openvswitch/monitor"
-# SWITCH_VCONN: The vconn used to connect to the switch (secchan).
-# The secchan must be configured to listen to this vconn. The default
-# here set is also listened to by default by the openvswitch-switch
-# package, so ordinarily there is no need to modify this.
-SWITCH_VCONN="/var/run/secchan.mgmt"
+# SWITCH_VCONN: The vconn used to connect to the switch
+# (ovs-openflowd). ovs-openflowd must be configured to listen on this
+# vconn. The openvswitch-switch package configures ovs-openflowd to
+# listen on this vconn by default, so ordinarily there is no need to
+# modify this.
+SWITCH_VCONN="/var/run/ovs-openflowd.mgmt"
The setup program will now attempt to discover the OpenFlow controller.
Controller discovery may take up to 30 seconds. Please be patient.
.
- See secchan(8) for instructions on how to configure a DHCP server for
+ See ovs-openflowd(8) for instructions on how to configure a DHCP server for
controller discovery.
Template: openvswitch-switch/discovery-failure
The controller's location could not be determined automatically.
.
Ensure that the OpenFlow DHCP server is properly configured. See
- secchan(8) for instructions on how to configure a DHCP server for
+ ovs-openflowd(8) for instructions on how to configure a DHCP server for
controller discovery.
Template: openvswitch-switch/discovery-success
### END INIT INFO
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
-DAEMON=/usr/sbin/secchan
-NAME=secchan
-DESC=secchan
+DAEMON=/usr/sbin/ovs-openflowd
+NAME=ovs-openflowd
+DESC=ovs-openflowd
test -x $DAEMON || exit 0
# let some servers to die gracefully and
# 'restart' will not work
-# Include secchan defaults if available
+# Include ovs-openflowd defaults if available
unset NETDEVS
unset MODE
unset SWITCH_IP
check_op "Setting core limit to $CORE_LIMIT" ulimit -c "$CORE_LIMIT"
fi
- # Compose secchan options.
+ # Compose ovs-openflowd options.
set --
set -- "$@" --verbose=ANY:console:emer --verbose=ANY:syslog:err
set -- "$@" --log-file
-_debian/secchan/secchan usr/sbin
+_debian/utilities/ovs-openflowd usr/sbin
_debian/utilities/ovs-dpctl usr/sbin
_debian/utilities/ovs-discover usr/sbin
_debian/utilities/ovs-kill usr/sbin
-/var/log/openvswitch/secchan.log {
+/var/log/openvswitch/ovs-openflowd.log {
daily
compress
create 640 root adm
missingok
rotate 30
postrotate
- ovs-appctl --target /var/run/secchan.pid --reopen
+ ovs-appctl --target=ovs-openflowd vlog/reopen
endscript
}
-_debian/secchan/secchan.8
+_debian/utilities/ovs-openflowd.8
_debian/utilities/ovs-discover.8
_debian/utilities/ovs-dpctl.8
_debian/utilities/ovs-kill.8
# This is a POSIX shell fragment -*- sh -*-
-# To configure the secure channel, fill in the following properly and
-# uncomment them. Afterward, the secure channel will come up
+# To configure the OpenFlow switch, fill in the following properly and
+# uncomment them. Afterward, the switch will come up
# automatically at boot time. It can be started immediately with
# /etc/init.d/openvswitch-switch start
# Alternatively, use the ovs-switch-setup program (from the
# Set CACERT_MODE to 'secure' or 'bootstrap' for these respective cases.
#CACERT_MODE=secure
-# MGMT_VCONNS: List of vconns (space-separated) on which secchan
+# MGMT_VCONNS: List of vconns (space-separated) on which ovs-openflowd
# should listen for management connections from ovs-ofctl, etc.
# openvswitch-switchui by default connects to
-# unix:/var/run/secchan.mgmt, so do not disable this if you want to
+# unix:/var/run/ovs-openflowd.mgmt, so do not disable this if you want to
# use openvswitch-switchui.
-MGMT_VCONNS="punix:/var/run/secchan.mgmt"
+MGMT_VCONNS="punix:/var/run/ovs-openflowd.mgmt"
# COMMANDS: Access control list for the commands that can be executed
# remotely over the OpenFlow protocol, as a comma-separated list of
#DISCONNECTED_MODE=switch
# STP: Enable or disable 802.1D-1998 Spanning Tree Protocol. Set to
-# 'yes' to enable STP, 'no' to disable it. If unset, secchan's
+# 'yes' to enable STP, 'no' to disable it. If unset, ovs-openflowd's
# current default is 'no' (but this may change in the future).
#STP=no
#RATE_LIMIT=1000
# INACTIVITY_PROBE: The maximum number of seconds of inactivity on the
-# controller connection before secchan sends an inactivity probe
+# controller connection before ovs-openflowd sends an inactivity probe
# message to the controller. The valid range is 5 and up. If unset,
-# secchan defaults to 5 seconds.
+# ovs-openflowd defaults to 5 seconds.
#INACTIVITY_PROBE=5
-# MAX_BACKOFF: The maximum time that secchan will wait between
+# MAX_BACKOFF: The maximum time that ovs-openflowd will wait between
# attempts to connect to the controller. The valid range is 1 and up.
-# If unset, secchan defaults to 8 seconds.
+# If unset, ovs-openflowd defaults to 8 seconds.
#MAX_BACKOFF=8
-# DAEMON_OPTS: Additional options to pass to secchan, e.g. "--fail=open"
+# DAEMON_OPTS: Additional options to pass to ovs-openflowd, e.g. "--fail=open"
DAEMON_OPTS=""
# CORE_LIMIT: Maximum size for core dumps.
# This is a POSIX shell fragment -*- sh -*-
-# To configure the switch monitor, modify the following. Afterward,
-# the secure channel will come up automatically at boot time. It can
+# To configure the switch UI, modify the following. Afterward,
+# the switch UI will come up automatically at boot time. It can
# be restarted immediately with
# /etc/init.d/openvswitch-switchui start
# sourced by /etc/init.d/openvswitch-switchui
# installed at /etc/default/openvswitch-switchui by the maintainer scripts
-# SWITCH_VCONN: The vconn used to connect to the switch (secchan).
-# The secchan must be configured to listen to this vconn. The default
+# SWITCH_VCONN: The vconn used to connect to the switch (ovs-openflowd).
+# The ovs-openflowd must be configured to listen to this vconn. The default
# here set is also listened to by default by the openvswitch-switch
# package, so ordinarily there is no need to modify this.
-SWITCH_VCONN="unix:/var/run/secchan.mgmt"
+SWITCH_VCONN="unix:/var/run/ovs-openflowd.mgmt"
# EZIO3_DEVICE: To display the switch monitor on an EZIO3 (aka
# MTB-134) 16x2 LCD displays found on server appliances made by
.BR ovs\-dpctl (8),
.BR ovs-pki (8),
-.BR secchan (8)
+.BR ovs-openflowd (8)
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
-"Report-Msgid-Bugs-To: ovs-dev@openvswitch.org\n"
+"Report-Msgid-Bugs-To: dev@openvswitch.org\n"
"POT-Creation-Date: 2009-05-11 13:38-0700\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
#. Description
#: ../openvswitch-switch-config.templates:5001
msgid ""
-"See secchan(8) for instructions on how to configure a DHCP server for "
+"See ovs-openflowd(8) for instructions on how to configure a DHCP server for "
"controller discovery."
msgstr ""
#. Description
#: ../openvswitch-switch-config.templates:6001
msgid ""
-"Ensure that the OpenFlow DHCP server is properly configured. See secchan(8) "
+"Ensure that the OpenFlow DHCP server is properly configured. See ovs-openflowd(8) "
"for instructions on how to configure a DHCP server for controller discovery."
msgstr ""
# without warranty of any kind.
EXTRA_DIST += extras/ezio/ezio3.ti
+
+if HAVE_CURSES
+if HAVE_PCRE
install-data-hook:
@echo tic -x $(srcdir)/extras/ezio/ezio3.ti
@if ! tic -x $(srcdir)/extras/ezio/ezio3.ti; then \
$(PCRE_LIBS) \
$(SSL_LIBS) \
-lm
+endif # HAVE_PCRE
+endif # HAVE_CURSES
static void show_flows(struct rconn *);
static void show_dpid_ip(struct rconn *, const struct dict *);
-static void show_secchan_state(const struct dict *);
+static void show_ofproto_state(const struct dict *);
static void show_fail_open_state(const struct dict *);
static void show_discovery_state(const struct dict *);
static void show_remote_state(const struct dict *);
if (!show_reboot_state()) {
show_flows(rconn);
show_dpid_ip(rconn, dict);
- show_secchan_state(dict);
+ show_ofproto_state(dict);
show_fail_open_state(dict);
show_discovery_state(dict);
show_remote_state(dict);
}
static void
-show_secchan_state(const struct dict *dict)
+show_ofproto_state(const struct dict *dict)
{
static struct message *msg;
const char *is_connected;
retval = netdev_open(name, NETDEV_ETH_TYPE_NONE, &netdev);
if (!retval) {
- bool exclude = netdev_get_in4(netdev, NULL, NULL);
+ bool exclude = netdev_get_in4(netdev, NULL, NULL) == 0;
netdev_close(netdev);
if (exclude) {
continue;
out = prompt("Ctlr rate limit:", in,
"^(Disabled|("NUM100_TO_99999_RE")/s)$");
free(in);
- config.rate_limit = isdigit(out[0]) ? atoi(out) : -1;
+ config.rate_limit
+ = isdigit((unsigned char)out[0]) ? atoi(out) : -1;
free(out);
break;
out = prompt("Activity probe:", in,
"^(Default|("NUM5_TO_99999_RE") s)$");
free(in);
- config.inactivity_probe = isdigit(out[0]) ? atoi(out) : -1;
+ config.inactivity_probe
+ = isdigit((unsigned char)out[0]) ? atoi(out) : -1;
free(out);
break;
out = prompt("Max backoff:", in,
"^(Default|("NUM1_TO_99999_RE") s)$");
free(in);
- config.max_backoff = isdigit(out[0]) ? atoi(out) : -1;
+ config.max_backoff
+ = isdigit((unsigned char)out[0]) ? atoi(out) : -1;
free(out);
break;
}
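Each prompt above accepts either a number (for example "1000/s" or "30 s") or a word such as "Disabled" or "Default", and a leading digit decides whether atoi() is called. Passing a plain `char` to isdigit() is undefined when `char` is signed and the byte is above 127, which is why the patch adds the `(unsigned char)` cast. A minimal sketch of the conversion; `parse_prompt_number()` is a hypothetical helper, not a function in the switch UI.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Converts an already-validated prompt answer such as "1000/s", "30 s",
 * "Disabled" or "Default" into a number, using -1 for the non-numeric
 * choices.  The cast keeps isdigit() defined for bytes above 127 on
 * platforms where char is signed. */
static int
parse_prompt_number(const char *out)
{
    return isdigit((unsigned char) out[0]) ? atoi(out) : -1;
}
```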
/* Action structure for OFPAT_OUTPUT, which sends packets out 'port'.
* When the 'port' is the OFPP_CONTROLLER, 'max_len' indicates the max
- * number of bytes to send. A 'max_len' of zero means the entire packet
- * should be sent. */
+ * number of bytes to send. A 'max_len' of zero means no bytes of the
+ * packet should be sent. */
struct ofp_action_output {
uint16_t type; /* OFPAT_OUTPUT. */
uint16_t len; /* Length is 8. */
* ----------------------------------------------------------------------
*/
-/* Protocol between secchan and datapath. */
+/* Protocol between userspace and kernel datapath. */
#ifndef OPENVSWITCH_DATAPATH_PROTOCOL_H
#define OPENVSWITCH_DATAPATH_PROTOCOL_H 1
#define ODP_PORT_GROUP_GET _IOWR('O', 12, struct odp_port_group)
-#define ODP_FLOW_GET _IOWR('O', 13, struct odp_flow)
+#define ODP_FLOW_GET _IOWR('O', 13, struct odp_flowvec)
-#define ODP_FLOW_GET_MULTIPLE _IOWR('O', 14, struct odp_flowvec)
+#define ODP_FLOW_PUT _IOWR('O', 14, struct odp_flow)
#define ODP_FLOW_LIST _IOWR('O', 15, struct odp_flowvec)
-
#define ODP_FLOW_FLUSH _IO('O', 16)
-#define ODP_FLOW_PUT _IOWR('O', 17, struct odp_flow)
-#define ODP_FLOW_DEL _IOWR('O', 18, struct odp_flow)
+#define ODP_FLOW_DEL _IOWR('O', 17, struct odp_flow)
-#define ODP_EXECUTE _IOR('O', 19, struct odp_execute)
+#define ODP_EXECUTE _IOR('O', 18, struct odp_execute)
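On Linux, `_IOWR()` packs the direction, the type character, the command number and the argument size into one command value. Renumbering these ioctls therefore changes the userspace/kernel ABI. In the hunk above, the old ODP_FLOW_PUT (`'O'`, 17, `struct odp_flow`) and the new ODP_FLOW_DEL (`'O'`, 17, `struct odp_flow`) encode to the same value. The sketch below shows this with the kernel's `_IOC_*` decoding macros. It is Linux-specific, and `struct fake_odp_flow` is a hypothetical stand-in: only its size matters here, and both commands use the same struct type.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/ioctl.h>   /* Linux: _IOWR() and the _IOC_* decoding macros. */

/* Hypothetical stand-in for struct odp_flow; only its size is encoded. */
struct fake_odp_flow {
    uint32_t words[16];
};

/* Old and new numbering from the hunk above. */
#define OLD_FLOW_PUT _IOWR('O', 17, struct fake_odp_flow)
#define NEW_FLOW_PUT _IOWR('O', 14, struct fake_odp_flow)
#define NEW_FLOW_DEL _IOWR('O', 17, struct fake_odp_flow)
```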
struct odp_stats {
/* Flows. */
__u32 used_nsec;
__u8 tcp_flags;
__u8 ip_tos;
- __u16 reserved;
+ __u16 error; /* Used by ODP_FLOW_GET. */
};
struct odp_flow_key {
lib/dhcp.h \
lib/dhparams.h \
lib/dirs.h \
+ lib/dpif-linux.c \
+ lib/dpif-netdev.c \
+ lib/dpif-provider.h \
+ lib/dpif.c \
+ lib/dpif.h \
lib/dynamic-string.c \
lib/dynamic-string.h \
lib/fatal-signal.c \
lib/list.h \
lib/mac-learning.c \
lib/mac-learning.h \
+ lib/netdev-linux.c \
+ lib/netdev-provider.h \
lib/netdev.c \
lib/netdev.h \
lib/odp-util.c \
lib/random.h \
lib/rconn.c \
lib/rconn.h \
+ lib/rtnetlink.c \
+ lib/rtnetlink.h \
lib/sat-math.h \
lib/sha1.c \
lib/sha1.h \
if HAVE_NETLINK
lib_libopenvswitch_a_SOURCES += \
- lib/dpif.c \
- lib/dpif.h \
lib/netlink-protocol.h \
lib/netlink.c \
lib/netlink.h
lib/hmap.c \
lib/mac-learning.c \
lib/netdev.c \
+ lib/netdev-linux.c \
lib/netlink.c \
lib/odp-util.c \
lib/poll-loop.c \
lib/process.c \
lib/rconn.c \
+ lib/rtnetlink.c \
lib/timeval.c \
lib/unixctl.c \
lib/util.c \
lib/vconn.c \
- secchan/ofproto.c \
- secchan/pktbuf.c \
+ ofproto/ofproto.c \
+ ofproto/pktbuf.c \
vswitchd/bridge.c \
vswitchd/mgmt.c \
vswitchd/ovs-brcompatd.c
/*
- * Copyright (c) 2008 Nicira Networks.
+ * Copyright (c) 2008, 2009 Nicira Networks.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
static inline unsigned long *
bitmap_allocate(size_t n_bits)
{
- return xcalloc(1, ROUND_UP(n_bits, BITMAP_ULONG_BITS));
+ size_t n_longs = DIV_ROUND_UP(n_bits, BITMAP_ULONG_BITS);
+ return xcalloc(sizeof(unsigned long int), n_longs);
}
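The old bitmap_allocate() passed `ROUND_UP(n_bits, BITMAP_ULONG_BITS)`, a bit count, as the byte count to xcalloc(). The result was not too small, but it was about eight times too large. For 100 bits on a host with 64-bit longs, it allocated 128 bytes instead of 16. The fixed version first converts bits to a count of `unsigned long`s. The sketch below re-implements the idea with plain calloc(). The macros are restated locally, and `bitmap_set1()`/`bitmap_is_set()` are local helpers written for this example, not copies of OVS code.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define BITMAP_ULONG_BITS (sizeof(unsigned long) * CHAR_BIT)
#define DIV_ROUND_UP(X, Y) (((X) + ((Y) - 1)) / (Y))

/* Allocates a zeroed bitmap that can hold 'n_bits' bits.  The bit count is
 * converted to a count of unsigned longs before sizing the allocation. */
static unsigned long *
bitmap_allocate(size_t n_bits)
{
    size_t n_longs = DIV_ROUND_UP(n_bits, BITMAP_ULONG_BITS);
    return calloc(n_longs, sizeof(unsigned long));
}

/* Sets the bit at 'offset'. */
static void
bitmap_set1(unsigned long *bitmap, size_t offset)
{
    bitmap[offset / BITMAP_ULONG_BITS] |= 1UL << (offset % BITMAP_ULONG_BITS);
}

/* Returns true if the bit at 'offset' is set. */
static bool
bitmap_is_set(const unsigned long *bitmap, size_t offset)
{
    return (bitmap[offset / BITMAP_ULONG_BITS]
            & (1UL << (offset % BITMAP_ULONG_BITS))) != 0;
}
```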
static inline void
}
if (!error && router.s_addr) {
- error = netdev_add_router(router);
+ error = netdev_add_router(cli->netdev, router);
if (error) {
VLOG_ERR("failed to add default route to "IP_FMT" on %s: %s",
IP_ARGS(&router), netdev_get_name(cli->netdev),
msg->xid = cli->xid;
msg->secs = cli->secs;
msg->type = type;
- memcpy(msg->chaddr, netdev_get_etheraddr(cli->netdev), ETH_ADDR_LEN);
+ netdev_get_etheraddr(cli->netdev, msg->chaddr);
}
/* If time goes backward this returns a large number, which makes it look like
static bool
do_receive_msg(struct dhclient *cli, struct dhcp_msg *msg)
{
+ uint8_t cli_mac[ETH_ADDR_LEN];
struct ofpbuf b;
+ int mtu;
- ofpbuf_init(&b, netdev_get_mtu(cli->netdev) + VLAN_ETH_HEADER_LEN);
+ if (netdev_get_mtu(cli->netdev, &mtu)) {
+ mtu = 1500; /* Assume the standard Ethernet MTU. */
+ }
+ ofpbuf_init(&b, mtu + VLAN_ETH_HEADER_LEN);
+ netdev_get_etheraddr(cli->netdev, cli_mac);
for (; cli->received < 50; cli->received++) {
const struct ip_header *ip;
const struct dhcp_header *dhcp;
|| flow.nw_proto != IP_TYPE_UDP
|| flow.tp_dst != htons(DHCP_CLIENT_PORT)
|| !(eth_addr_is_broadcast(flow.dl_dst)
- || eth_addr_equals(flow.dl_dst,
- netdev_get_etheraddr(cli->netdev)))) {
+ || eth_addr_equals(flow.dl_dst, cli_mac))) {
continue;
}
dhcp_assemble(msg, &b);
- memcpy(eh.eth_src, netdev_get_etheraddr(cli->netdev), ETH_ADDR_LEN);
+ netdev_get_etheraddr(cli->netdev, eh.eth_src);
memcpy(eh.eth_dst, eth_addr_broadcast, ETH_ADDR_LEN);
eh.eth_type = htons(ETH_TYPE_IP);
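After this change, netdev_get_etheraddr() copies the address into a buffer the caller owns instead of returning a pointer. netdev_get_mtu() reports the MTU through an out-parameter and returns an error code. The sketch below shows that calling pattern against a hypothetical `struct fake_netdev`, including the receive-buffer sizing (MTU plus VLAN_ETH_HEADER_LEN) used by do_receive_msg(). The 1500-byte fallback when the MTU query fails is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define ETH_ADDR_LEN 6
#define VLAN_ETH_HEADER_LEN 18

/* Hypothetical device record standing in for struct netdev. */
struct fake_netdev {
    uint8_t etheraddr[ETH_ADDR_LEN];
    int mtu;                    /* Negative means "MTU unavailable". */
};

/* Copies the device's Ethernet address into caller-owned 'mac'. */
static int
fake_netdev_get_etheraddr(const struct fake_netdev *dev,
                          uint8_t mac[ETH_ADDR_LEN])
{
    memcpy(mac, dev->etheraddr, ETH_ADDR_LEN);
    return 0;
}

/* Stores the MTU in '*mtup' and returns 0, or returns an errno value. */
static int
fake_netdev_get_mtu(const struct fake_netdev *dev, int *mtup)
{
    if (dev->mtu < 0) {
        return EOPNOTSUPP;
    }
    *mtup = dev->mtu;
    return 0;
}

/* Sizes a receive buffer as MTU plus a VLAN-tagged Ethernet header,
 * falling back to a 1500-byte MTU if the query fails. */
static int
receive_buffer_size(const struct fake_netdev *dev)
{
    int mtu;
    if (fake_netdev_get_mtu(dev, &mtu)) {
        mtu = 1500;
    }
    return mtu + VLAN_ETH_HEADER_LEN;
}
```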
--- /dev/null
+/*
+ * Copyright (c) 2008, 2009 Nicira Networks.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <config.h>
+#include "dpif.h"
+
+#include <assert.h>
+#include <ctype.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <inttypes.h>
+#include <net/if.h>
+#include <linux/ethtool.h>
+#include <linux/rtnetlink.h>
+#include <linux/sockios.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "dpif-provider.h"
+#include "ofpbuf.h"
+#include "poll-loop.h"
+#include "rtnetlink.h"
+#include "svec.h"
+#include "util.h"
+
+#include "vlog.h"
+#define THIS_MODULE VLM_dpif_linux
+
+/* Datapath interface for the openvswitch Linux kernel module. */
+struct dpif_linux {
+ struct dpif dpif;
+ int fd;
+
+ /* Used by dpif_linux_get_all_names(). */
+ char *local_ifname;
+ int minor;
+
+ /* Change notification. */
+ int local_ifindex; /* Ifindex of local port. */
+ struct svec changed_ports; /* Ports that have changed. */
+ struct rtnetlink_notifier port_notifier;
+ bool change_error;
+};
+
+static struct vlog_rate_limit error_rl = VLOG_RATE_LIMIT_INIT(9999, 5);
+
+static int do_ioctl(const struct dpif *, int cmd, const void *arg);
+static int lookup_minor(const char *name, int *minor);
+static int finish_open(struct dpif *, const char *local_ifname);
+static int get_openvswitch_major(void);
+static int create_minor(const char *name, int minor, struct dpif **dpifp);
+static int open_minor(int minor, struct dpif **dpifp);
+static int make_openvswitch_device(int minor, char **fnp);
+static void dpif_linux_port_changed(const struct rtnetlink_change *,
+ void *dpif);
+
+static struct dpif_linux *
+dpif_linux_cast(const struct dpif *dpif)
+{
+ dpif_assert_class(dpif, &dpif_linux_class);
+ return CONTAINER_OF(dpif, struct dpif_linux, dpif);
+}
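`struct dpif_linux` embeds the generic `struct dpif`. After dpif_assert_class() has checked the class, dpif_linux_cast() recovers the outer structure from the embedded member with CONTAINER_OF. The sketch below rebuilds such a macro from offsetof(); it is modeled on the OVS macro, not copied from it. The struct names are hypothetical. The embedded member is deliberately not the first field, to show that the subtraction also handles a nonzero offset.

```c
#include <assert.h>
#include <stddef.h>

/* Recovers a pointer to the enclosing STRUCT from POINTER, which points to
 * the member named MEMBER inside it. */
#define CONTAINER_OF(POINTER, STRUCT, MEMBER) \
    ((STRUCT *) (void *) ((char *) (POINTER) - offsetof(STRUCT, MEMBER)))

/* Hypothetical generic handle, playing the role of struct dpif. */
struct generic_handle {
    const char *name;
};

/* Hypothetical provider wrapper, playing the role of struct dpif_linux. */
struct provider_handle {
    int fd;                         /* Provider-private state... */
    struct generic_handle handle;   /* ...then the embedded generic part. */
};

static struct provider_handle *
provider_cast(struct generic_handle *h)
{
    return CONTAINER_OF(h, struct provider_handle, handle);
}
```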
+
+static int
+dpif_linux_enumerate(struct svec *all_dps)
+{
+ int major;
+ int error;
+ int i;
+
+ /* Check that the Open vSwitch module is loaded. */
+ major = get_openvswitch_major();
+ if (major < 0) {
+ return -major;
+ }
+
+ error = 0;
+ for (i = 0; i < ODP_MAX; i++) {
+ struct dpif *dpif;
+ char devname[16];
+ int retval;
+
+ sprintf(devname, "dp%d", i);
+ retval = dpif_open(devname, &dpif);
+ if (!retval) {
+ svec_add(all_dps, devname);
+ dpif_close(dpif);
+ } else if (retval != ENODEV && !error) {
+ error = retval;
+ }
+ }
+ return error;
+}
+
+static int
+dpif_linux_open(const char *name, char *suffix, bool create,
+ struct dpif **dpifp)
+{
+ int minor;
+
+ minor = !strncmp(name, "dp", 2)
+ && isdigit((unsigned char)name[2]) ? atoi(name + 2) : -1;
+ if (create) {
+ if (minor >= 0) {
+ return create_minor(suffix, minor, dpifp);
+ } else {
+ /* Scan for unused minor number. */
+ for (minor = 0; minor < ODP_MAX; minor++) {
+ int error = create_minor(suffix, minor, dpifp);
+ if (error != EBUSY) {
+ return error;
+ }
+ }
+
+ /* All datapath numbers in use. */
+ return ENOBUFS;
+ }
+ } else {
+ struct dpif_linux *dpif;
+ struct odp_port port;
+ int error;
+
+ if (minor < 0) {
+ error = lookup_minor(suffix, &minor);
+ if (error) {
+ return error;
+ }
+ }
+
+ error = open_minor(minor, dpifp);
+ if (error) {
+ return error;
+ }
+ dpif = dpif_linux_cast(*dpifp);
+
+ /* We need the local port's ifindex for the poll function. Start by
+ * getting the local port's name. */
+ memset(&port, 0, sizeof port);
+ port.port = ODPP_LOCAL;
+ if (ioctl(dpif->fd, ODP_PORT_QUERY, &port)) {
+ error = errno;
+ if (error != ENODEV) {
+ VLOG_WARN("%s: probe returned unexpected error: %s",
+ dpif_name(*dpifp), strerror(error));
+ }
+ dpif_close(*dpifp);
+ return error;
+ }
+
+ /* Then use that to finish up opening. */
+ return finish_open(&dpif->dpif, port.devname);
+ }
+}
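dpif_linux_open() takes the minor number directly from a name of the form "dpN". When creating a datapath without such a name, it tries minors in order until create_minor() returns something other than EBUSY. It returns ENOBUFS when every number is taken. The sketch below isolates both steps. `MAX_DATAPATHS` (256) is a hypothetical stand-in for ODP_MAX, and the callback replaces create_minor() so the scan can be exercised without the kernel module.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MAX_DATAPATHS 256           /* Hypothetical stand-in for ODP_MAX. */

/* Returns N for a name of the form "dpN", otherwise -1. */
static int
parse_dp_minor(const char *name)
{
    return (!strncmp(name, "dp", 2) && isdigit((unsigned char) name[2])
            ? atoi(name + 2) : -1);
}

/* Tries minors 0, 1, 2, ... and stops at the first one for which
 * 'try_create' returns something other than EBUSY, storing that result in
 * '*errorp' and returning the minor.  Returns -1 with ENOBUFS in '*errorp'
 * if every minor is busy. */
static int
scan_for_free_minor(int (*try_create)(int minor), int *errorp)
{
    int minor;
    for (minor = 0; minor < MAX_DATAPATHS; minor++) {
        int error = try_create(minor);
        if (error != EBUSY) {
            *errorp = error;
            return minor;
        }
    }
    *errorp = ENOBUFS;
    return -1;
}

/* Hypothetical creators for the usage example. */
static int
first_three_busy(int minor)
{
    return minor < 3 ? EBUSY : 0;
}

static int
all_busy(int minor)
{
    (void) minor;
    return EBUSY;
}
```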
+
+static void
+dpif_linux_close(struct dpif *dpif_)
+{
+ struct dpif_linux *dpif = dpif_linux_cast(dpif_);
+ rtnetlink_notifier_unregister(&dpif->port_notifier);
+ svec_destroy(&dpif->changed_ports);
+ free(dpif->local_ifname);
+ close(dpif->fd);
+ free(dpif);
+}
+
+static int
+dpif_linux_get_all_names(const struct dpif *dpif_, struct svec *all_names)
+{
+ struct dpif_linux *dpif = dpif_linux_cast(dpif_);
+
+ svec_add_nocopy(all_names, xasprintf("dp%d", dpif->minor));
+ svec_add(all_names, dpif->local_ifname);
+ return 0;
+}
+
+static int
+dpif_linux_delete(struct dpif *dpif_)
+{
+ return do_ioctl(dpif_, ODP_DP_DESTROY, NULL);
+}
+
+static int
+dpif_linux_get_stats(const struct dpif *dpif_, struct odp_stats *stats)
+{
+ return do_ioctl(dpif_, ODP_DP_STATS, stats);
+}
+
+static int
+dpif_linux_get_drop_frags(const struct dpif *dpif_, bool *drop_fragsp)
+{
+ int drop_frags;
+ int error;
+
+ error = do_ioctl(dpif_, ODP_GET_DROP_FRAGS, &drop_frags);
+ if (!error) {
+ *drop_fragsp = drop_frags & 1;
+ }
+ return error;
+}
+
+static int
+dpif_linux_set_drop_frags(struct dpif *dpif_, bool drop_frags)
+{
+ int drop_frags_int = drop_frags;
+ return do_ioctl(dpif_, ODP_SET_DROP_FRAGS, &drop_frags_int);
+}
+
+static int
+dpif_linux_port_add(struct dpif *dpif_, const char *devname, uint16_t flags,
+ uint16_t *port_no)
+{
+ struct odp_port port;
+ int error;
+
+ memset(&port, 0, sizeof port);
+ strncpy(port.devname, devname, sizeof port.devname);
+ port.flags = flags;
+ error = do_ioctl(dpif_, ODP_PORT_ADD, &port);
+ if (!error) {
+ *port_no = port.port;
+ }
+ return error;
+}
+
+static int
+dpif_linux_port_del(struct dpif *dpif_, uint16_t port_no)
+{
+ int tmp = port_no;
+ return do_ioctl(dpif_, ODP_PORT_DEL, &tmp);
+}
+
+static int
+dpif_linux_port_query_by_number(const struct dpif *dpif_, uint16_t port_no,
+ struct odp_port *port)
+{
+ memset(port, 0, sizeof *port);
+ port->port = port_no;
+ return do_ioctl(dpif_, ODP_PORT_QUERY, port);
+}
+
+static int
+dpif_linux_port_query_by_name(const struct dpif *dpif_, const char *devname,
+ struct odp_port *port)
+{
+ memset(port, 0, sizeof *port);
+ strncpy(port->devname, devname, sizeof port->devname);
+ return do_ioctl(dpif_, ODP_PORT_QUERY, port);
+}
+
+static int
+dpif_linux_flow_flush(struct dpif *dpif_)
+{
+ return do_ioctl(dpif_, ODP_FLOW_FLUSH, NULL);
+}
+
+static int
+dpif_linux_port_list(const struct dpif *dpif_, struct odp_port *ports, int n)
+{
+ struct odp_portvec pv;
+ int error;
+
+ pv.ports = ports;
+ pv.n_ports = n;
+ error = do_ioctl(dpif_, ODP_PORT_LIST, &pv);
+ return error ? -error : pv.n_ports;
+}
+
+static int
+dpif_linux_port_poll(const struct dpif *dpif_, char **devnamep)
+{
+ struct dpif_linux *dpif = dpif_linux_cast(dpif_);
+
+ if (dpif->change_error) {
+ dpif->change_error = false;
+ svec_clear(&dpif->changed_ports);
+ return ENOBUFS;
+ } else if (dpif->changed_ports.n) {
+ *devnamep = dpif->changed_ports.names[--dpif->changed_ports.n];
+ return 0;
+ } else {
+ return EAGAIN;
+ }
+}
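dpif_linux_port_poll() reports pending port changes from a deduplicated set of device names plus an error flag. It returns ENOBUFS (and clears the set) after notifications were lost, hands back one name per call with 0 while names remain, and returns EAGAIN when nothing is pending. The sketch below models that contract with a fixed-size array. Running out of room is this sketch's own trigger for the error flag; the real code sets `change_error` when the rtnetlink notifier reports a NULL change. Presumably a caller that sees ENOBUFS rescans every port.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CHANGES 8               /* Hypothetical fixed capacity. */

/* Simplified model of the changed_ports/change_error pair. */
struct change_queue {
    const char *names[MAX_CHANGES];
    size_t n;
    bool overflow;                  /* Plays the role of change_error. */
};

/* Records that 'name' changed, ignoring duplicates. */
static void
change_queue_add(struct change_queue *q, const char *name)
{
    size_t i;
    for (i = 0; i < q->n; i++) {
        if (!strcmp(q->names[i], name)) {
            return;
        }
    }
    if (q->n < MAX_CHANGES) {
        q->names[q->n++] = name;
    } else {
        q->overflow = true;
    }
}

/* Same return-value contract as dpif_linux_port_poll(). */
static int
change_queue_poll(struct change_queue *q, const char **namep)
{
    if (q->overflow) {
        q->overflow = false;
        q->n = 0;
        return ENOBUFS;
    } else if (q->n) {
        *namep = q->names[--q->n];
        return 0;
    } else {
        return EAGAIN;
    }
}
```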
+
+static void
+dpif_linux_port_poll_wait(const struct dpif *dpif_)
+{
+ struct dpif_linux *dpif = dpif_linux_cast(dpif_);
+ if (dpif->changed_ports.n || dpif->change_error) {
+ poll_immediate_wake();
+ } else {
+ rtnetlink_notifier_wait();
+ }
+}
+
+static int
+dpif_linux_port_group_get(const struct dpif *dpif_, int group,
+ uint16_t ports[], int n)
+{
+ struct odp_port_group pg;
+ int error;
+
+ assert(n <= UINT16_MAX);
+ pg.group = group;
+ pg.ports = ports;
+ pg.n_ports = n;
+ error = do_ioctl(dpif_, ODP_PORT_GROUP_GET, &pg);
+ return error ? -error : pg.n_ports;
+}
+
+static int
+dpif_linux_port_group_set(struct dpif *dpif_, int group,
+ const uint16_t ports[], int n)
+{
+ struct odp_port_group pg;
+
+ assert(n <= UINT16_MAX);
+ pg.group = group;
+ pg.ports = (uint16_t *) ports;
+ pg.n_ports = n;
+ return do_ioctl(dpif_, ODP_PORT_GROUP_SET, &pg);
+}
+
+static int
+dpif_linux_flow_get(const struct dpif *dpif_, struct odp_flow flows[], int n)
+{
+ struct odp_flowvec fv;
+ fv.flows = flows;
+ fv.n_flows = n;
+ return do_ioctl(dpif_, ODP_FLOW_GET, &fv);
+}
+
+static int
+dpif_linux_flow_put(struct dpif *dpif_, struct odp_flow_put *put)
+{
+ return do_ioctl(dpif_, ODP_FLOW_PUT, put);
+}
+
+static int
+dpif_linux_flow_del(struct dpif *dpif_, struct odp_flow *flow)
+{
+ return do_ioctl(dpif_, ODP_FLOW_DEL, flow);
+}
+
+static int
+dpif_linux_flow_list(const struct dpif *dpif_, struct odp_flow flows[], int n)
+{
+ struct odp_flowvec fv;
+ int error;
+
+ fv.flows = flows;
+ fv.n_flows = n;
+ error = do_ioctl(dpif_, ODP_FLOW_LIST, &fv);
+ return error ? -error : fv.n_flows;
+}
+
+static int
+dpif_linux_execute(struct dpif *dpif_, uint16_t in_port,
+ const union odp_action actions[], int n_actions,
+ const struct ofpbuf *buf)
+{
+ struct odp_execute execute;
+ memset(&execute, 0, sizeof execute);
+ execute.in_port = in_port;
+ execute.actions = (union odp_action *) actions;
+ execute.n_actions = n_actions;
+ execute.data = buf->data;
+ execute.length = buf->size;
+ return do_ioctl(dpif_, ODP_EXECUTE, &execute);
+}
+
+static int
+dpif_linux_recv_get_mask(const struct dpif *dpif_, int *listen_mask)
+{
+ return do_ioctl(dpif_, ODP_GET_LISTEN_MASK, listen_mask);
+}
+
+static int
+dpif_linux_recv_set_mask(struct dpif *dpif_, int listen_mask)
+{
+ return do_ioctl(dpif_, ODP_SET_LISTEN_MASK, &listen_mask);
+}
+
+static int
+dpif_linux_recv(struct dpif *dpif_, struct ofpbuf **bufp)
+{
+ struct dpif_linux *dpif = dpif_linux_cast(dpif_);
+ struct ofpbuf *buf;
+ int retval;
+ int error;
+
+ buf = ofpbuf_new(65536);
+ retval = read(dpif->fd, ofpbuf_tail(buf), ofpbuf_tailroom(buf));
+ if (retval < 0) {
+ error = errno;
+ if (error != EAGAIN) {
+ VLOG_WARN_RL(&error_rl, "%s: read failed: %s",
+ dpif_name(dpif_), strerror(error));
+ }
+ } else if (retval >= sizeof(struct odp_msg)) {
+ struct odp_msg *msg = buf->data;
+ if (msg->length <= retval) {
+ buf->size += retval;
+ *bufp = buf;
+ return 0;
+ } else {
+ VLOG_WARN_RL(&error_rl, "%s: discarding message truncated "
+ "from %"PRIu32" bytes to %d",
+ dpif_name(dpif_), msg->length, retval);
+ error = ERANGE;
+ }
+ } else if (!retval) {
+ VLOG_WARN_RL(&error_rl, "%s: unexpected end of file", dpif_name(dpif_));
+ error = EPROTO;
+ } else {
+ VLOG_WARN_RL(&error_rl,
+ "%s: discarding too-short message (%d bytes)",
+ dpif_name(dpif_), retval);
+ error = ERANGE;
+ }
+
+ *bufp = NULL;
+ ofpbuf_delete(buf);
+ return error;
+}
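dpif_linux_recv() accepts a read only if it returned at least a full message header and the header's own length field fits within the bytes actually read. Otherwise it returns the read error, EPROTO for end of file, or ERANGE for a short or truncated message. The sketch below reproduces that classification. `struct fake_odp_msg` is a hypothetical stand-in whose layout is not claimed to match `struct odp_msg`. The header is copied with memcpy() rather than cast in place, to avoid alignment assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for struct odp_msg. */
struct fake_odp_msg {
    uint32_t type;
    uint32_t length;        /* Total message length, header included. */
};

/* Classifies a read() that returned 'retval' bytes into 'buf' (with
 * 'read_errno' holding errno on failure).  Returns 0 if the message is
 * usable, otherwise the error dpif_linux_recv() would report. */
static int
classify_read(const void *buf, ssize_t retval, int read_errno)
{
    if (retval < 0) {
        return read_errno;                      /* read() itself failed. */
    } else if ((size_t) retval >= sizeof(struct fake_odp_msg)) {
        struct fake_odp_msg msg;
        memcpy(&msg, buf, sizeof msg);
        /* A length larger than what was read means truncation. */
        return msg.length <= (size_t) retval ? 0 : ERANGE;
    } else if (retval == 0) {
        return EPROTO;                          /* Unexpected end of file. */
    } else {
        return ERANGE;                          /* Shorter than a header. */
    }
}
```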
+
+static void
+dpif_linux_recv_wait(struct dpif *dpif_)
+{
+ struct dpif_linux *dpif = dpif_linux_cast(dpif_);
+ poll_fd_wait(dpif->fd, POLLIN);
+}
+
+const struct dpif_class dpif_linux_class = {
+ "", /* This is the default class. */
+ "linux",
+ NULL,
+ NULL,
+ dpif_linux_enumerate,
+ dpif_linux_open,
+ dpif_linux_close,
+ dpif_linux_get_all_names,
+ dpif_linux_delete,
+ dpif_linux_get_stats,
+ dpif_linux_get_drop_frags,
+ dpif_linux_set_drop_frags,
+ dpif_linux_port_add,
+ dpif_linux_port_del,
+ dpif_linux_port_query_by_number,
+ dpif_linux_port_query_by_name,
+ dpif_linux_port_list,
+ dpif_linux_port_poll,
+ dpif_linux_port_poll_wait,
+ dpif_linux_port_group_get,
+ dpif_linux_port_group_set,
+ dpif_linux_flow_get,
+ dpif_linux_flow_put,
+ dpif_linux_flow_del,
+ dpif_linux_flow_flush,
+ dpif_linux_flow_list,
+ dpif_linux_execute,
+ dpif_linux_recv_get_mask,
+ dpif_linux_recv_set_mask,
+ dpif_linux_recv,
+ dpif_linux_recv_wait,
+};
+\f
+static int get_major(const char *target);
+
+static int
+do_ioctl(const struct dpif *dpif_, int cmd, const void *arg)
+{
+ struct dpif_linux *dpif = dpif_linux_cast(dpif_);
+ return ioctl(dpif->fd, cmd, arg) ? errno : 0;
+}
+
+static int
+lookup_minor(const char *name, int *minorp)
+{
+ struct ethtool_drvinfo drvinfo;
+ int minor, port_no;
+ struct ifreq ifr;
+ int error;
+ int sock;
+
+ sock = socket(AF_INET, SOCK_DGRAM, 0);
+ if (sock < 0) {
+ error = errno;
+ VLOG_WARN("socket(AF_INET) failed: %s", strerror(error));
+ goto error;
+ }
+
+ memset(&ifr, 0, sizeof ifr);
+ strncpy(ifr.ifr_name, name, sizeof ifr.ifr_name);
+ ifr.ifr_data = (caddr_t) &drvinfo;
+
+ memset(&drvinfo, 0, sizeof drvinfo);
+ drvinfo.cmd = ETHTOOL_GDRVINFO;
+ if (ioctl(sock, SIOCETHTOOL, &ifr)) {
+ error = errno;
+ VLOG_WARN("ioctl(SIOCETHTOOL) failed: %s", strerror(error));
+ goto error_close_sock;
+ }
+
+ if (strcmp(drvinfo.driver, "openvswitch")) {
+ VLOG_WARN("%s is not an openvswitch device", name);
+ error = EOPNOTSUPP;
+ goto error_close_sock;
+ }
+
+ if (sscanf(drvinfo.bus_info, "%d.%d", &minor, &port_no) != 2) {
+ VLOG_WARN("%s ethtool bus_info has unexpected format", name);
+ error = EPROTOTYPE;
+ goto error_close_sock;
+ } else if (port_no != ODPP_LOCAL) {
+ /* This is an Open vSwitch device but not the local port. We
+ * intentionally support only using the name of the local port as the
+ * name of a datapath; otherwise, it would be too difficult to
+ * enumerate all the names of a datapath. */
+ error = EOPNOTSUPP;
+ goto error_close_sock;
+ }
+
+ *minorp = minor;
+ close(sock);
+ return 0;
+
+error_close_sock:
+ close(sock);
+error:
+ return error;
+}
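lookup_minor() identifies a datapath by the ethtool driver information of its local port. The driver must be "openvswitch", and `bus_info` must have the form "MINOR.PORT" with PORT equal to the local port. The parsing step is sketched below. `LOCAL_PORT` is a hypothetical stand-in for ODPP_LOCAL, and its value of 0 is an assumption. The error codes mirror the ones lookup_minor() returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#define LOCAL_PORT 0    /* Hypothetical stand-in for ODPP_LOCAL. */

/* Parses an ethtool bus_info string of the form "MINOR.PORT".  Returns 0
 * and stores MINOR in '*minorp' only if PORT is the local port. */
static int
parse_bus_info(const char *bus_info, int *minorp)
{
    int minor, port_no;

    if (sscanf(bus_info, "%d.%d", &minor, &port_no) != 2) {
        return EPROTOTYPE;      /* Unexpected bus_info format. */
    } else if (port_no != LOCAL_PORT) {
        return EOPNOTSUPP;      /* Only the local port names a datapath. */
    }
    *minorp = minor;
    return 0;
}
```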
+
+static int
+make_openvswitch_device(int minor, char **fnp)
+{
+ const char dirname[] = "/dev/net";
+ int major;
+ dev_t dev;
+ struct stat s;
+ char fn[128];
+
+ major = get_openvswitch_major();
+ if (major < 0) {
+ return -major;
+ }
+ dev = makedev(major, minor);
+
+ *fnp = NULL;
+ sprintf(fn, "%s/dp%d", dirname, minor);
+ if (!stat(fn, &s)) {
+ if (!S_ISCHR(s.st_mode)) {
+ VLOG_WARN_RL(&error_rl, "%s is not a character device, fixing",
+ fn);
+ } else if (s.st_rdev != dev) {
+ VLOG_WARN_RL(&error_rl,
+ "%s is device %u:%u but should be %u:%u, fixing",
+ fn, major(s.st_rdev), minor(s.st_rdev),
+ major(dev), minor(dev));
+ } else {
+ goto success;
+ }
+ if (unlink(fn)) {
+ int error = errno;
+ VLOG_WARN_RL(&error_rl, "%s: unlink failed (%s)",
+ fn, strerror(error));
+ return error;
+ }
+ } else if (errno == ENOENT) {
+ if (stat(dirname, &s)) {
+ if (errno == ENOENT) {
+ if (mkdir(dirname, 0755)) {
+ int error = errno;
+ VLOG_WARN_RL(&error_rl, "%s: mkdir failed (%s)",
+ dirname, strerror(error));
+ return error;
+ }
+ } else {
+ int error = errno;
+ VLOG_WARN_RL(&error_rl, "%s: stat failed (%s)",
+ dirname, strerror(error));
+ return error;
+ }
+ }
+ } else {
+ int error = errno;
+ VLOG_WARN_RL(&error_rl, "%s: stat failed (%s)", fn, strerror(error));
+ return error;
+ }
+
+ /* The device needs to be created. */
+ if (mknod(fn, S_IFCHR | 0700, dev)) {
+ int error = errno;
+ VLOG_WARN_RL(&error_rl,
+ "%s: creating character device %u:%u failed (%s)",
+ fn, major(dev), minor(dev), strerror(error));
+ return error;
+ }
+
+success:
+ *fnp = xstrdup(fn);
+ return 0;
+}
+
+/* Return the major device number of the Open vSwitch device. If it
+ * cannot be determined, a negative errno is returned. */
+static int
+get_openvswitch_major(void)
+{
+ static int openvswitch_major = -1;
+ if (openvswitch_major < 0) {
+ openvswitch_major = get_major("openvswitch");
+ }
+ return openvswitch_major;
+}
+
+static int
+get_major(const char *target)
+{
+ const char fn[] = "/proc/devices";
+ char line[128];
+ FILE *file;
+ int ln;
+
+ file = fopen(fn, "r");
+ if (!file) {
+ int error = errno;
+ VLOG_ERR("opening %s failed (%s)", fn, strerror(error));
+ return -error;
+ }
+
+ for (ln = 1; fgets(line, sizeof line, file); ln++) {
+ char name[64];
+ int major;
+
+ if (!strncmp(line, "Character", 9)
+ || line[0] == '\n' || line[0] == '\0') {
+ /* Nothing to do. */
+ } else if (!strncmp(line, "Block", 5)) {
+ /* We only want character devices, so skip the rest of the file. */
+ break;
+ } else if (sscanf(line, "%d %63s", &major, name) == 2) {
+ if (!strcmp(name, target)) {
+ fclose(file);
+ return major;
+ }
+ } else {
+ static bool warned;
+ if (!warned) {
+ VLOG_WARN("%s:%d: syntax error", fn, ln);
+ }
+ warned = true;
+ }
+ }
+
+ fclose(file);
+ VLOG_ERR("%s: %s major not found (is the module loaded?)", fn, target);
+ return -ENODEV;
+}
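get_major() scans the character-device section of /proc/devices for a driver name and stops at the "Block devices:" header. fgets() keeps the trailing newline, so the blank separator line arrives as "\n" rather than "". On such a line sscanf() returns EOF (-1), which is nonzero. The sketch below therefore treats "\n" as blank and requires sscanf() to return 2. It reads from an in-memory array of lines instead of the file, drops the syntax-error warning, and `find_char_major()` is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Looks up 'target' in the character-device section of /proc/devices-style
 * text given as 'n' lines.  Returns its major number, or -ENODEV if it is
 * not listed before the "Block devices:" section. */
static int
find_char_major(const char *const lines[], size_t n, const char *target)
{
    size_t i;

    for (i = 0; i < n; i++) {
        const char *line = lines[i];
        char name[64];
        int major;

        if (!strncmp(line, "Character", 9)
            || line[0] == '\n' || line[0] == '\0') {
            continue;                   /* Header or blank separator. */
        } else if (!strncmp(line, "Block", 5)) {
            break;                      /* Block devices follow; stop. */
        } else if (sscanf(line, "%d %63s", &major, name) == 2
                   && !strcmp(name, target)) {
            return major;
        }
    }
    return -ENODEV;
}
```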
+
+static int
+finish_open(struct dpif *dpif_, const char *local_ifname)
+{
+ struct dpif_linux *dpif = dpif_linux_cast(dpif_);
+ dpif->local_ifname = xstrdup(local_ifname);
+ dpif->local_ifindex = if_nametoindex(local_ifname);
+ if (!dpif->local_ifindex) {
+ int error = errno;
+ VLOG_WARN("could not get ifindex of %s device: %s",
+ local_ifname, strerror(error));
+ dpif_close(dpif_);
+ return error;
+ }
+ return 0;
+}
+
+static int
+create_minor(const char *name, int minor, struct dpif **dpifp)
+{
+ int error = open_minor(minor, dpifp);
+ if (!error) {
+ error = do_ioctl(*dpifp, ODP_DP_CREATE, name);
+ if (!error) {
+ error = finish_open(*dpifp, name);
+ } else {
+ dpif_close(*dpifp);
+ }
+ }
+ return error;
+}
+
+static int
+open_minor(int minor, struct dpif **dpifp)
+{
+ int error;
+ char *fn;
+ int fd;
+
+ error = make_openvswitch_device(minor, &fn);
+ if (error) {
+ return error;
+ }
+
+ fd = open(fn, O_RDONLY | O_NONBLOCK);
+ if (fd >= 0) {
+ struct dpif_linux *dpif = xmalloc(sizeof *dpif);
+ error = rtnetlink_notifier_register(&dpif->port_notifier,
+ dpif_linux_port_changed, dpif);
+ if (!error) {
+ char *name;
+
+ name = xasprintf("dp%d", minor);
+ dpif_init(&dpif->dpif, &dpif_linux_class, name, minor, minor);
+ free(name);
+
+ dpif->fd = fd;
+ dpif->local_ifname = NULL;
+ dpif->minor = minor;
+ dpif->local_ifindex = 0;
+ svec_init(&dpif->changed_ports);
+ dpif->change_error = false;
+ *dpifp = &dpif->dpif;
+ } else {
+ close(fd);
+ free(dpif);
+ }
+ } else {
+ error = errno;
+ VLOG_WARN("%s: open failed (%s)", fn, strerror(error));
+ }
+ free(fn);
+
+ return error;
+}
+
+static void
+dpif_linux_port_changed(const struct rtnetlink_change *change, void *dpif_)
+{
+ struct dpif_linux *dpif = dpif_;
+
+ if (change) {
+ if (change->master_ifindex == dpif->local_ifindex
+ && (change->nlmsg_type == RTM_NEWLINK
+ || change->nlmsg_type == RTM_DELLINK))
+ {
+ /* Our datapath changed, either adding a new port or deleting an
+ * existing one. */
+ if (!svec_contains(&dpif->changed_ports, change->ifname)) {
+ svec_add(&dpif->changed_ports, change->ifname);
+ svec_sort(&dpif->changed_ports);
+ }
+ }
+ } else {
+ dpif->change_error = true;
+ }
+}
--- /dev/null
+/*
+ * Copyright (c) 2009 Nicira Networks.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <config.h>
+#include "dpif.h"
+
+#include <assert.h>
+#include <ctype.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <inttypes.h>
+#include <net/if.h>
+#include <netinet/in.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/ioctl.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "csum.h"
+#include "dpif-provider.h"
+#include "flow.h"
+#include "hmap.h"
+#include "list.h"
+#include "netdev.h"
+#include "odp-util.h"
+#include "ofp-print.h"
+#include "ofpbuf.h"
+#include "packets.h"
+#include "poll-loop.h"
+#include "queue.h"
+#include "timeval.h"
+#include "util.h"
+
+#include "vlog.h"
+#define THIS_MODULE VLM_dpif_netdev
+
+/* Configuration parameters. */
+enum { N_QUEUES = 2 }; /* Number of queues for dpif_recv(). */
+enum { MAX_QUEUE_LEN = 100 }; /* Maximum number of packets per queue. */
+enum { N_GROUPS = 16 }; /* Number of port groups. */
+enum { MAX_PORTS = 256 }; /* Maximum number of ports. */
+enum { MAX_FLOWS = 65536 }; /* Maximum number of flows in flow table. */
+
+/* Enough headroom to add a VLAN tag, plus an extra 2 bytes to allow IP
+ * headers to be aligned on a 4-byte boundary. */
+enum { DP_NETDEV_HEADROOM = 2 + VLAN_HEADER_LEN };
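The headroom arithmetic can be checked with a small standalone sketch. It assumes the usual 14-byte Ethernet header and 4-byte 802.1Q tag (the real ETH_HEADER_LEN and VLAN_HEADER_LEN come from packets.h, which is not shown here), and the SKETCH_* names are hypothetical. Starting the frame 6 bytes into the buffer puts the IP header at offset 20; pushing a tag moves the frame start back 4 bytes, so the IP header stays at offset 20.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed standard sizes, not taken from packets.h: a 14-byte Ethernet
 * header and a 4-byte 802.1Q tag. */
enum { SKETCH_ETH_HEADER_LEN = 14, SKETCH_VLAN_HEADER_LEN = 4 };
enum { SKETCH_HEADROOM = 2 + SKETCH_VLAN_HEADER_LEN };

/* Offset of the IP header from the start of the buffer, given the offset
 * at which the Ethernet frame begins and whether it carries a VLAN tag. */
static size_t
ip_header_offset(size_t frame_start, int tagged)
{
    return frame_start + SKETCH_ETH_HEADER_LEN
           + (tagged ? SKETCH_VLAN_HEADER_LEN : 0);
}
```

Without the extra 2 bytes, an untagged frame would start at offset 4 and its IP header would sit at offset 18, which is only 2-byte aligned.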
+
+/* Datapath based on the network device interface from netdev.h. */
+struct dp_netdev {
+ struct list node;
+ int dp_idx;
+ int open_cnt;
+ bool deleted;
+
+ bool drop_frags; /* Drop all IP fragments, if true. */
+ struct ovs_queue queues[N_QUEUES]; /* Messages queued for dpif_recv(). */
+ struct hmap flow_table; /* Flow table. */
+ struct odp_port_group groups[N_GROUPS];
+
+ /* Statistics. */
+ long long int n_frags; /* Number of dropped IP fragments. */
+ long long int n_hit; /* Number of flow table matches. */
+ long long int n_missed; /* Number of flow table misses. */
+ long long int n_lost; /* Number of misses not passed to client. */
+
+ /* Ports. */
+ int n_ports;
+ struct dp_netdev_port *ports[MAX_PORTS];
+ struct list port_list;
+ unsigned int serial;
+};
+
+/* A port in a netdev-based datapath. */
+struct dp_netdev_port {
+ int port_no; /* Index into dp_netdev's 'ports'. */
+ struct list node; /* Element in dp_netdev's 'port_list'. */
+ struct netdev *netdev;
+ bool internal; /* Internal port (as ODP_PORT_INTERNAL)? */
+};
+
+/* A flow in dp_netdev's 'flow_table'. */
+struct dp_netdev_flow {
+ struct hmap_node node; /* Element in dp_netdev's 'flow_table'. */
+ flow_t key;
+
+ /* Statistics. */
+ struct timeval used; /* Last used time. */
+ long long int packet_count; /* Number of packets matched. */
+ long long int byte_count; /* Number of bytes matched. */
+ uint8_t ip_tos; /* IP TOS value. */
+ uint16_t tcp_ctl; /* Bitwise-OR of seen tcp_ctl values. */
+
+ /* Actions. */
+ union odp_action *actions;
+ unsigned int n_actions;
+};
+
+/* Interface to netdev-based datapath. */
+struct dpif_netdev {
+ struct dpif dpif;
+ struct dp_netdev *dp;
+ int listen_mask;
+ unsigned int dp_serial;
+};
+
+/* All netdev-based datapaths. */
+static struct dp_netdev *dp_netdevs[256];
+static struct list dp_netdev_list = LIST_INITIALIZER(&dp_netdev_list);
+enum { N_DP_NETDEVS = ARRAY_SIZE(dp_netdevs) };
+
+/* Maximum port MTU seen so far. */
+static int max_mtu = ETH_PAYLOAD_MAX;
+
+static int get_port_by_number(struct dp_netdev *, uint16_t port_no,
+ struct dp_netdev_port **portp);
+static int get_port_by_name(struct dp_netdev *, const char *devname,
+ struct dp_netdev_port **portp);
+static void dp_netdev_free(struct dp_netdev *);
+static void dp_netdev_flow_flush(struct dp_netdev *);
+static int do_add_port(struct dp_netdev *, const char *devname, uint16_t flags,
+ uint16_t port_no);
+static int do_del_port(struct dp_netdev *, uint16_t port_no);
+static int dp_netdev_output_control(struct dp_netdev *, const struct ofpbuf *,
+ int queue_no, int port_no, uint32_t arg);
+static int dp_netdev_execute_actions(struct dp_netdev *,
+ struct ofpbuf *, flow_t *,
+ const union odp_action *, int n);
+
+static struct dpif_netdev *
+dpif_netdev_cast(const struct dpif *dpif)
+{
+ dpif_assert_class(dpif, &dpif_netdev_class);
+ return CONTAINER_OF(dpif, struct dpif_netdev, dpif);
+}
+
+static struct dp_netdev *
+get_dp_netdev(const struct dpif *dpif)
+{
+ return dpif_netdev_cast(dpif)->dp;
+}
+
+static int
+name_to_dp_idx(const char *name)
+{
+ if (!strncmp(name, "dp", 2) && isdigit((unsigned char)name[2])) {
+ int dp_idx = atoi(name + 2);
+ if (dp_idx >= 0 && dp_idx < N_DP_NETDEVS) {
+ return dp_idx;
+ }
+ }
+ return -1;
+}
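name_to_dp_idx() uses atoi(), which stops at the first non-digit, so a name such as "dp1x" maps to datapath 1. If stricter parsing were wanted, a sketch using strtol() might look like the following; strict_name_to_dp_idx() is a hypothetical helper, not part of this patch.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stricter variant of name_to_dp_idx(): accepts only "dp"
 * followed by decimal digits and nothing else, and returns the index if it
 * is below 'n_slots', otherwise -1. */
static int
strict_name_to_dp_idx(const char *name, int n_slots)
{
    char *end;
    long idx;

    if (strncmp(name, "dp", 2) || !isdigit((unsigned char) name[2])) {
        return -1;
    }
    errno = 0;
    idx = strtol(name + 2, &end, 10);
    if (errno || *end != '\0' || idx >= n_slots) {
        return -1;              /* Overflow, trailing junk, or out of range. */
    }
    return (int) idx;           /* Non-negative: the first char is a digit. */
}
```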
+
+static struct dp_netdev *
+find_dp_netdev(const char *name)
+{
+ int dp_idx;
+ size_t i;
+
+ dp_idx = name_to_dp_idx(name);
+ if (dp_idx >= 0) {
+ return dp_netdevs[dp_idx];
+ }
+
+ for (i = 0; i < N_DP_NETDEVS; i++) {
+ struct dp_netdev *dp = dp_netdevs[i];
+ if (dp) {
+ struct dp_netdev_port *port;
+ if (!get_port_by_name(dp, name, &port)) {
+ return dp;
+ }
+ }
+ }
+ return NULL;
+}
+
+static struct dpif *
+create_dpif_netdev(struct dp_netdev *dp)
+{
+ struct dpif_netdev *dpif;
+ char *dpname;
+
+ dp->open_cnt++;
+
+ dpname = xasprintf("netdev:dp%d", dp->dp_idx);
+ dpif = xmalloc(sizeof *dpif);
+ dpif_init(&dpif->dpif, &dpif_netdev_class, dpname, dp->dp_idx, dp->dp_idx);
+ dpif->dp = dp;
+ dpif->listen_mask = 0;
+ dpif->dp_serial = dp->serial;
+ free(dpname);
+
+ return &dpif->dpif;
+}
+
+static int
+create_dp_netdev(const char *name, int dp_idx, struct dpif **dpifp)
+{
+ struct dp_netdev *dp;
+ int error;
+ int i;
+
+ if (dp_netdevs[dp_idx]) {
+ return EBUSY;
+ }
+
+ /* Create datapath. */
+ dp_netdevs[dp_idx] = dp = xcalloc(1, sizeof *dp);
+ list_push_back(&dp_netdev_list, &dp->node);
+ dp->dp_idx = dp_idx;
+ dp->open_cnt = 0;
+ dp->drop_frags = false;
+ for (i = 0; i < N_QUEUES; i++) {
+ queue_init(&dp->queues[i]);
+ }
+ hmap_init(&dp->flow_table);
+ for (i = 0; i < N_GROUPS; i++) {
+ dp->groups[i].ports = NULL;
+ dp->groups[i].n_ports = 0;
+ dp->groups[i].group = i;
+ }
+ list_init(&dp->port_list);
+ error = do_add_port(dp, name, ODP_PORT_INTERNAL, ODPP_LOCAL);
+ if (error) {
+ dp_netdev_free(dp);
+ return error;
+ }
+
+ *dpifp = create_dpif_netdev(dp);
+ return 0;
+}
+
+static int
+dpif_netdev_open(const char *name UNUSED, char *suffix, bool create,
+ struct dpif **dpifp)
+{
+ if (create) {
+ if (find_dp_netdev(suffix)) {
+ return EEXIST;
+ } else {
+ int dp_idx = name_to_dp_idx(suffix);
+ if (dp_idx >= 0) {
+ return create_dp_netdev(suffix, dp_idx, dpifp);
+ } else {
+ /* Scan for unused dp_idx number. */
+ for (dp_idx = 0; dp_idx < N_DP_NETDEVS; dp_idx++) {
+ int error = create_dp_netdev(suffix, dp_idx, dpifp);
+ if (error != EBUSY) {
+ return error;
+ }
+ }
+
+ /* All datapath numbers in use. */
+ return ENOBUFS;
+ }
+ }
+ } else {
+ struct dp_netdev *dp = find_dp_netdev(suffix);
+ if (dp) {
+ *dpifp = create_dpif_netdev(dp);
+ return 0;
+ } else {
+ return ENODEV;
+ }
+ }
+}
+
+static void
+dp_netdev_free(struct dp_netdev *dp)
+{
+ int i;
+
+ dp_netdev_flow_flush(dp);
+ while (dp->n_ports > 0) {
+ struct dp_netdev_port *port = CONTAINER_OF(
+ dp->port_list.next, struct dp_netdev_port, node);
+ do_del_port(dp, port->port_no);
+ }
+ for (i = 0; i < N_QUEUES; i++) {
+ queue_destroy(&dp->queues[i]);
+ }
+ hmap_destroy(&dp->flow_table);
+ for (i = 0; i < N_GROUPS; i++) {
+ free(dp->groups[i].ports);
+ }
+ dp_netdevs[dp->dp_idx] = NULL;
+ list_remove(&dp->node);
+ free(dp);
+}
+
+static void
+dpif_netdev_close(struct dpif *dpif)
+{
+ struct dp_netdev *dp = get_dp_netdev(dpif);
+ assert(dp->open_cnt > 0);
+ if (--dp->open_cnt == 0 && dp->deleted) {
+ dp_netdev_free(dp);
+ }
+ free(dpif);
+}
+
+static int
+dpif_netdev_delete(struct dpif *dpif)
+{
+ struct dp_netdev *dp = get_dp_netdev(dpif);
+ dp->deleted = true;
+ return 0;
+}
+
+static int
+dpif_netdev_get_stats(const struct dpif *dpif, struct odp_stats *stats)
+{
+ struct dp_netdev *dp = get_dp_netdev(dpif);
+ memset(stats, 0, sizeof *stats);
+ stats->n_flows = hmap_count(&dp->flow_table);
+ stats->cur_capacity = hmap_capacity(&dp->flow_table);
+ stats->max_capacity = MAX_FLOWS;
+ stats->n_ports = dp->n_ports;
+ stats->max_ports = MAX_PORTS;
+ stats->max_groups = N_GROUPS;
+ stats->n_frags = dp->n_frags;
+ stats->n_hit = dp->n_hit;
+ stats->n_missed = dp->n_missed;
+ stats->n_lost = dp->n_lost;
+ stats->max_miss_queue = MAX_QUEUE_LEN;
+ stats->max_action_queue = MAX_QUEUE_LEN;
+ return 0;
+}
+
+static int
+dpif_netdev_get_drop_frags(const struct dpif *dpif, bool *drop_fragsp)
+{
+ struct dp_netdev *dp = get_dp_netdev(dpif);
+ *drop_fragsp = dp->drop_frags;
+ return 0;
+}
+
+static int
+dpif_netdev_set_drop_frags(struct dpif *dpif, bool drop_frags)
+{
+ struct dp_netdev *dp = get_dp_netdev(dpif);
+ dp->drop_frags = drop_frags;
+ return 0;
+}
+
+static int
+do_add_port(struct dp_netdev *dp, const char *devname, uint16_t flags,
+ uint16_t port_no)
+{
+ bool internal = (flags & ODP_PORT_INTERNAL) != 0;
+ struct dp_netdev_port *port;
+ struct netdev *netdev;
+ int mtu;
+ int error;
+
+ /* XXX reject devices already in some dp_netdev. */
+
+ /* Open and validate network device. */
+ if (!internal) {
+ error = netdev_open(devname, NETDEV_ETH_TYPE_ANY, &netdev);
+ } else {
+ char *tapname = xasprintf("tap:%s", devname);
+ error = netdev_open(tapname, NETDEV_ETH_TYPE_ANY, &netdev);
+ free(tapname);
+ }
+ if (error) {
+ return error;
+ }
+ /* XXX reject loopback devices */
+ /* XXX reject non-Ethernet devices */
+
+ error = netdev_turn_flags_on(netdev, NETDEV_PROMISC, false);
+ if (error) {
+ netdev_close(netdev);
+ return error;
+ }
+
+ port = xmalloc(sizeof *port);
+ port->port_no = port_no;
+ port->netdev = netdev;
+ port->internal = internal;
+
+ error = netdev_get_mtu(netdev, &mtu);
+ if (!error && mtu > max_mtu) {
+ max_mtu = mtu;
+ }
+
+ list_push_back(&dp->port_list, &port->node);
+ dp->ports[port_no] = port;
+ dp->n_ports++;
+ dp->serial++;
+
+ return 0;
+}
+
+static int
+dpif_netdev_port_add(struct dpif *dpif, const char *devname, uint16_t flags,
+ uint16_t *port_nop)
+{
+ struct dp_netdev *dp = get_dp_netdev(dpif);
+ int port_no;
+
+ for (port_no = 0; port_no < MAX_PORTS; port_no++) {
+ if (!dp->ports[port_no]) {
+ *port_nop = port_no;
+ return do_add_port(dp, devname, flags, port_no);
+ }
+ }
+ return EFBIG;
+}
+
+static int
+dpif_netdev_port_del(struct dpif *dpif, uint16_t port_no)
+{
+ struct dp_netdev *dp = get_dp_netdev(dpif);
+ return port_no == ODPP_LOCAL ? EINVAL : do_del_port(dp, port_no);
+}
+
+static bool
+is_valid_port_number(uint16_t port_no)
+{
+ return port_no < MAX_PORTS;
+}
+
+static int
+get_port_by_number(struct dp_netdev *dp,
+ uint16_t port_no, struct dp_netdev_port **portp)
+{
+ if (!is_valid_port_number(port_no)) {
+ *portp = NULL;
+ return EINVAL;
+ } else {
+ *portp = dp->ports[port_no];
+ return *portp ? 0 : ENOENT;
+ }
+}
+
+static int
+get_port_by_name(struct dp_netdev *dp,
+ const char *devname, struct dp_netdev_port **portp)
+{
+ struct dp_netdev_port *port;
+
+ LIST_FOR_EACH (port, struct dp_netdev_port, node, &dp->port_list) {
+ if (!strcmp(netdev_get_name(port->netdev), devname)) {
+ *portp = port;
+ return 0;
+ }
+ }
+ return ENOENT;
+}
+
+static int
+do_del_port(struct dp_netdev *dp, uint16_t port_no)
+{
+ struct dp_netdev_port *port;
+ int error;
+
+ error = get_port_by_number(dp, port_no, &port);
+ if (error) {
+ return error;
+ }
+
+ list_remove(&port->node);
+ dp->ports[port->port_no] = NULL;
+ dp->n_ports--;
+ dp->serial++;
+
+ netdev_close(port->netdev);
+ free(port);
+
+ return 0;
+}
+
+static void
+answer_port_query(const struct dp_netdev_port *port, struct odp_port *odp_port)
+{
+ memset(odp_port, 0, sizeof *odp_port);
+ ovs_strlcpy(odp_port->devname, netdev_get_name(port->netdev),
+ sizeof odp_port->devname);
+ odp_port->port = port->port_no;
+ odp_port->flags = port->internal ? ODP_PORT_INTERNAL : 0;
+}
+
+static int
+dpif_netdev_port_query_by_number(const struct dpif *dpif, uint16_t port_no,
+ struct odp_port *odp_port)
+{
+ struct dp_netdev *dp = get_dp_netdev(dpif);
+ struct dp_netdev_port *port;
+ int error;
+
+ error = get_port_by_number(dp, port_no, &port);
+ if (!error) {
+ answer_port_query(port, odp_port);
+ }
+ return error;
+}
+
+static int
+dpif_netdev_port_query_by_name(const struct dpif *dpif, const char *devname,
+ struct odp_port *odp_port)
+{
+ struct dp_netdev *dp = get_dp_netdev(dpif);
+ struct dp_netdev_port *port;
+ int error;
+
+ error = get_port_by_name(dp, devname, &port);
+ if (!error) {
+ answer_port_query(port, odp_port);
+ }
+ return error;
+}
+
+static void
+dp_netdev_free_flow(struct dp_netdev *dp, struct dp_netdev_flow *flow)
+{
+ hmap_remove(&dp->flow_table, &flow->node);
+ free(flow->actions);
+ free(flow);
+}
+
+static void
+dp_netdev_flow_flush(struct dp_netdev *dp)
+{
+ struct dp_netdev_flow *flow, *next;
+
+ HMAP_FOR_EACH_SAFE (flow, next, struct dp_netdev_flow, node,
+ &dp->flow_table) {
+ dp_netdev_free_flow(dp, flow);
+ }
+}
+
+static int
+dpif_netdev_flow_flush(struct dpif *dpif)
+{
+ struct dp_netdev *dp = get_dp_netdev(dpif);
+ dp_netdev_flow_flush(dp);
+ return 0;
+}
+
+static int
+dpif_netdev_port_list(const struct dpif *dpif, struct odp_port *ports, int n)
+{
+ struct dp_netdev *dp = get_dp_netdev(dpif);
+ struct dp_netdev_port *port;
+ int i;
+
+ i = 0;
+ LIST_FOR_EACH (port, struct dp_netdev_port, node, &dp->port_list) {
+ struct odp_port *odp_port = &ports[i];
+ if (i >= n) {
+ break;
+ }
+ answer_port_query(port, odp_port);
+ i++;
+ }
+ return dp->n_ports;
+}
+
+static int
+dpif_netdev_port_poll(const struct dpif *dpif_, char **devnamep UNUSED)
+{
+ struct dpif_netdev *dpif = dpif_netdev_cast(dpif_);
+ if (dpif->dp_serial != dpif->dp->serial) {
+ dpif->dp_serial = dpif->dp->serial;
+ return ENOBUFS;
+ } else {
+ return EAGAIN;
+ }
+}
+
+static void
+dpif_netdev_port_poll_wait(const struct dpif *dpif_)
+{
+ struct dpif_netdev *dpif = dpif_netdev_cast(dpif_);
+ if (dpif->dp_serial != dpif->dp->serial) {
+ poll_immediate_wake();
+ }
+}
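The dp_serial/serial pair above is a simple change-detection scheme: the shared datapath bumps a counter on every port change, and each open handle compares it with the value it last reported. A minimal model, with hypothetical struct names:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for dp_netdev and dpif_netdev: the shared object
 * bumps 'serial' on every port add or delete, and each handle remembers the
 * last serial it reported. */
struct shared_dp { unsigned int serial; };
struct dp_handle { struct shared_dp *dp; unsigned int seen_serial; };

/* Returns ENOBUFS once after any number of changes ("something changed,
 * re-read the port list"), otherwise EAGAIN. */
static int
handle_poll(struct dp_handle *h)
{
    if (h->seen_serial != h->dp->serial) {
        h->seen_serial = h->dp->serial;
        return ENOBUFS;
    }
    return EAGAIN;
}
```

Several changes between polls collapse into a single ENOBUFS, which is presumably why callers are expected to re-read the whole port list rather than rely on a device name.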
+
+static int
+get_port_group(const struct dpif *dpif, int group_no,
+ struct odp_port_group **groupp)
+{
+ struct dp_netdev *dp = get_dp_netdev(dpif);
+
+ if (group_no >= 0 && group_no < N_GROUPS) {
+ *groupp = &dp->groups[group_no];
+ return 0;
+ } else {
+ *groupp = NULL;
+ return EINVAL;
+ }
+}
+
+static int
+dpif_netdev_port_group_get(const struct dpif *dpif, int group_no,
+ uint16_t ports[], int n)
+{
+ struct odp_port_group *group;
+ int error;
+
+ if (n < 0) {
+ return -EINVAL;
+ }
+
+ error = get_port_group(dpif, group_no, &group);
+ if (!error) {
+ memcpy(ports, group->ports, MIN(n, group->n_ports) * sizeof *ports);
+ return group->n_ports;
+ } else {
+ return -error;
+ }
+}
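dpif_netdev_port_group_get() reports errors as negative errno values and otherwise returns the group's full size, which may exceed 'n'. A caller can compare the return value with its buffer size to detect truncation. The sketch below models that contract with a hypothetical group_get() over a plain array:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the port_group_get contract: copies up to 'n'
 * members into 'ports' and returns the full group size, or a negative
 * errno value on error. */
static int
group_get(const uint16_t *members, int n_members, uint16_t ports[], int n)
{
    if (n < 0) {
        return -EINVAL;
    }
    memcpy(ports, members,
           (size_t) (n < n_members ? n : n_members) * sizeof *ports);
    return n_members;
}
```

A return value greater than the buffer size tells the caller to retry with a larger buffer.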
+
+static int
+dpif_netdev_port_group_set(struct dpif *dpif, int group_no,
+ const uint16_t ports[], int n)
+{
+ struct odp_port_group *group;
+ int error;
+
+ if (n < 0 || n > MAX_PORTS) {
+ return EINVAL;
+ }
+
+ error = get_port_group(dpif, group_no, &group);
+ if (!error) {
+ free(group->ports);
+ group->ports = xmemdup(ports, n * sizeof *group->ports);
+ group->n_ports = n;
+ group->group = group_no;
+ }
+ return error;
+}
+
+static struct dp_netdev_flow *
+dp_netdev_lookup_flow(const struct dp_netdev *dp, const flow_t *key)
+{
+ struct dp_netdev_flow *flow;
+
+ assert(key->reserved == 0);
+ HMAP_FOR_EACH_WITH_HASH (flow, struct dp_netdev_flow, node,
+ flow_hash(key, 0), &dp->flow_table) {
+ if (flow_equal(&flow->key, key)) {
+ return flow;
+ }
+ }
+ return NULL;
+}
+
+static void
+answer_flow_query(struct dp_netdev_flow *flow, uint32_t query_flags,
+ struct odp_flow *odp_flow)
+{
+ if (flow) {
+ odp_flow->key = flow->key;
+ odp_flow->stats.n_packets = flow->packet_count;
+ odp_flow->stats.n_bytes = flow->byte_count;
+ odp_flow->stats.used_sec = flow->used.tv_sec;
+ odp_flow->stats.used_nsec = flow->used.tv_usec * 1000;
+ odp_flow->stats.tcp_flags = TCP_FLAGS(flow->tcp_ctl);
+ odp_flow->stats.ip_tos = flow->ip_tos;
+ odp_flow->stats.error = 0;
+ if (odp_flow->n_actions > 0) {
+ unsigned int n = MIN(odp_flow->n_actions, flow->n_actions);
+ memcpy(odp_flow->actions, flow->actions,
+ n * sizeof *odp_flow->actions);
+ odp_flow->n_actions = flow->n_actions;
+ }
+
+ if (query_flags & ODPFF_ZERO_TCP_FLAGS) {
+ flow->tcp_ctl = 0;
+ }
+
+ } else {
+ odp_flow->stats.error = ENOENT;
+ }
+}
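answer_flow_query() reports the last-used time as whole seconds plus nanoseconds, converting from the microseconds in 'struct timeval'. A standalone sketch of that conversion, assuming 32-bit used_sec/used_nsec fields (their actual types are declared in the datapath headers, which are not shown here):

```c
#include <assert.h>
#include <stdint.h>
#include <sys/time.h>

/* Splits a 'struct timeval' (seconds plus microseconds) into seconds and
 * nanoseconds.  tv_usec is below 1,000,000, so the product fits in 32 bits. */
static void
timeval_to_sec_nsec(const struct timeval *tv, uint32_t *sec, uint32_t *nsec)
{
    *sec = (uint32_t) tv->tv_sec;
    *nsec = (uint32_t) tv->tv_usec * 1000;
}
```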
+
+static int
+dpif_netdev_flow_get(const struct dpif *dpif, struct odp_flow flows[], int n)
+{
+ struct dp_netdev *dp = get_dp_netdev(dpif);
+ int i;
+
+ for (i = 0; i < n; i++) {
+ struct odp_flow *odp_flow = &flows[i];
+ answer_flow_query(dp_netdev_lookup_flow(dp, &odp_flow->key),
+ odp_flow->flags, odp_flow);
+ }
+ return 0;
+}
+
+static int
+dpif_netdev_validate_actions(const union odp_action *actions, int n_actions,
+ bool *mutates)
+{
+ int i;
+
+ *mutates = false;
+ for (i = 0; i < n_actions; i++) {
+ const union odp_action *a = &actions[i];
+ switch (a->type) {
+ case ODPAT_OUTPUT:
+ if (a->output.port >= MAX_PORTS) {
+ return EINVAL;
+ }
+ break;
+
+ case ODPAT_OUTPUT_GROUP:
+ *mutates = true;
+ if (a->output_group.group >= N_GROUPS) {
+ return EINVAL;
+ }
+ break;
+
+ case ODPAT_CONTROLLER:
+ break;
+
+ case ODPAT_SET_VLAN_VID:
+ *mutates = true;
+ if (a->vlan_vid.vlan_vid & htons(~VLAN_VID_MASK)) {
+ return EINVAL;
+ }
+ break;
+
+ case ODPAT_SET_VLAN_PCP:
+ *mutates = true;
+ if (a->vlan_pcp.vlan_pcp & ~(VLAN_PCP_MASK >> 13)) {
+ return EINVAL;
+ }
+ break;
+
+ case ODPAT_STRIP_VLAN:
+ case ODPAT_SET_DL_SRC:
+ case ODPAT_SET_DL_DST:
+ case ODPAT_SET_NW_SRC:
+ case ODPAT_SET_NW_DST:
+ case ODPAT_SET_TP_SRC:
+ case ODPAT_SET_TP_DST:
+ *mutates = true;
+ break;
+
+ default:
+ return EOPNOTSUPP;
+ }
+ }
+ return 0;
+}
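The VLAN checks above follow the 802.1Q TCI layout: a 12-bit VID in bits 0-11 and a 3-bit PCP in bits 13-15. Because the PCP action carries the unshifted 3-bit value and dp_netdev_execute_actions() shifts it left by 13 before applying VLAN_PCP_MASK, a range check on that value has to use the mask shifted down. The sketch below assumes VLAN_VID_MASK is 0x0fff and VLAN_PCP_MASK is 0xe000 (the usual values; packets.h is not shown) and works in host byte order, whereas the datapath compares VIDs in network order via htons():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed 802.1Q TCI layout: PCP in bits 13-15, VID in bits 0-11. */
#define SKETCH_VLAN_VID_MASK  0x0fff
#define SKETCH_VLAN_PCP_MASK  0xe000
#define SKETCH_VLAN_PCP_SHIFT 13

/* A VID is valid if it fits in 12 bits. */
static bool
vid_is_valid(uint16_t vid)
{
    return !(vid & ~SKETCH_VLAN_VID_MASK);
}

/* A PCP is valid if it fits in 3 bits before being shifted into place, so
 * it is checked against the TCI mask shifted down to bit 0. */
static bool
pcp_is_valid(uint8_t pcp)
{
    return !(pcp & ~(SKETCH_VLAN_PCP_MASK >> SKETCH_VLAN_PCP_SHIFT));
}
```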
+
+static int
+set_flow_actions(struct dp_netdev_flow *flow, struct odp_flow *odp_flow)
+{
+ size_t n_bytes;
+ bool mutates;
+ int error;
+
+ if (odp_flow->n_actions >= 4096 / sizeof *odp_flow->actions) {
+ return EINVAL;
+ }
+ error = dpif_netdev_validate_actions(odp_flow->actions,
+ odp_flow->n_actions, &mutates);
+ if (error) {
+ return error;
+ }
+
+ n_bytes = odp_flow->n_actions * sizeof *flow->actions;
+ flow->actions = xrealloc(flow->actions, n_bytes);
+ flow->n_actions = odp_flow->n_actions;
+ memcpy(flow->actions, odp_flow->actions, n_bytes);
+ return 0;
+}
+
+static int
+add_flow(struct dpif *dpif, struct odp_flow *odp_flow)
+{
+ struct dp_netdev *dp = get_dp_netdev(dpif);
+ struct dp_netdev_flow *flow;
+ int error;
+
+ flow = xcalloc(1, sizeof *flow);
+ flow->key = odp_flow->key;
+ flow->key.reserved = 0;
+
+ error = set_flow_actions(flow, odp_flow);
+ if (error) {
+ free(flow);
+ return error;
+ }
+
+ hmap_insert(&dp->flow_table, &flow->node, flow_hash(&flow->key, 0));
+ return 0;
+}
+
+static void
+clear_stats(struct dp_netdev_flow *flow)
+{
+ flow->used.tv_sec = 0;
+ flow->used.tv_usec = 0;
+ flow->packet_count = 0;
+ flow->byte_count = 0;
+ flow->ip_tos = 0;
+ flow->tcp_ctl = 0;
+}
+
+static int
+dpif_netdev_flow_put(struct dpif *dpif, struct odp_flow_put *put)
+{
+ struct dp_netdev *dp = get_dp_netdev(dpif);
+ struct dp_netdev_flow *flow;
+
+ flow = dp_netdev_lookup_flow(dp, &put->flow.key);
+ if (!flow) {
+ if (put->flags & ODPPF_CREATE) {
+ if (hmap_count(&dp->flow_table) < MAX_FLOWS) {
+ return add_flow(dpif, &put->flow);
+ } else {
+ return EFBIG;
+ }
+ } else {
+ return ENOENT;
+ }
+ } else {
+ if (put->flags & ODPPF_MODIFY) {
+ int error = set_flow_actions(flow, &put->flow);
+ if (!error && (put->flags & ODPPF_ZERO_STATS)) {
+ clear_stats(flow);
+ }
+ return error;
+ } else {
+ return EEXIST;
+ }
+ }
+}
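dpif_netdev_flow_put() chooses its result from whether the flow already exists and which of ODPPF_CREATE and ODPPF_MODIFY are set. The decision table can be written as a small pure function; PUT_CREATE and PUT_MODIFY are hypothetical stand-ins for the real flag values, and 'table_full' models the MAX_FLOWS check:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag values standing in for ODPPF_CREATE and ODPPF_MODIFY. */
enum { PUT_CREATE = 1 << 0, PUT_MODIFY = 1 << 1 };

/* Returns 0 if the put should go ahead, otherwise the errno value that the
 * flow_put logic above would report. */
static int
put_decision(bool exists, unsigned int flags, bool table_full)
{
    if (!exists) {
        if (!(flags & PUT_CREATE)) {
            return ENOENT;
        }
        return table_full ? EFBIG : 0;
    }
    return (flags & PUT_MODIFY) ? 0 : EEXIST;
}
```

ODPPF_ZERO_STATS only matters on the modify path, so it is left out here.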
+
+static int
+dpif_netdev_flow_del(struct dpif *dpif, struct odp_flow *odp_flow)
+{
+ struct dp_netdev *dp = get_dp_netdev(dpif);
+ struct dp_netdev_flow *flow;
+
+ flow = dp_netdev_lookup_flow(dp, &odp_flow->key);
+ if (flow) {
+ answer_flow_query(flow, 0, odp_flow);
+ dp_netdev_free_flow(dp, flow);
+ return 0;
+ } else {
+ return ENOENT;
+ }
+}
+
+static int
+dpif_netdev_flow_list(const struct dpif *dpif, struct odp_flow flows[], int n)
+{
+ struct dp_netdev *dp = get_dp_netdev(dpif);
+ struct dp_netdev_flow *flow;
+ int i;
+
+ i = 0;
+ HMAP_FOR_EACH (flow, struct dp_netdev_flow, node, &dp->flow_table) {
+ if (i >= n) {
+ break;
+ }
+ answer_flow_query(flow, 0, &flows[i++]);
+ }
+ return hmap_count(&dp->flow_table);
+}
+
+static int
+dpif_netdev_execute(struct dpif *dpif, uint16_t in_port,
+ const union odp_action actions[], int n_actions,
+ const struct ofpbuf *packet)
+{
+ struct dp_netdev *dp = get_dp_netdev(dpif);
+ struct ofpbuf copy;
+ bool mutates;
+ flow_t flow;
+ int error;
+
+ if (packet->size < ETH_HEADER_LEN || packet->size > UINT16_MAX) {
+ return EINVAL;
+ }
+
+ error = dpif_netdev_validate_actions(actions, n_actions, &mutates);
+ if (error) {
+ return error;
+ }
+
+ if (mutates) {
+ /* We need a deep copy of 'packet' since we're going to modify its
+ * data. */
+ ofpbuf_init(&copy, DP_NETDEV_HEADROOM + packet->size);
+ copy.data = (char*)copy.base + DP_NETDEV_HEADROOM;
+ ofpbuf_put(&copy, packet->data, packet->size);
+ } else {
+ /* We still need a shallow copy of 'packet', even though we won't
+ * modify its data, because flow_extract() modifies packet->l2, etc.
+ * We could probably get away with modifying those but it's more polite
+ * if we don't. */
+ copy = *packet;
+ }
+ flow_extract(&copy, in_port, &flow);
+ error = dp_netdev_execute_actions(dp, &copy, &flow, actions, n_actions);
+ if (mutates) {
+ ofpbuf_uninit(&copy);
+ }
+ return error;
+}
+
+static int
+dpif_netdev_recv_get_mask(const struct dpif *dpif, int *listen_mask)
+{
+ struct dpif_netdev *dpif_netdev = dpif_netdev_cast(dpif);
+ *listen_mask = dpif_netdev->listen_mask;
+ return 0;
+}
+
+static int
+dpif_netdev_recv_set_mask(struct dpif *dpif, int listen_mask)
+{
+ struct dpif_netdev *dpif_netdev = dpif_netdev_cast(dpif);
+ if (!(listen_mask & ~ODPL_ALL)) {
+ dpif_netdev->listen_mask = listen_mask;
+ return 0;
+ } else {
+ return EINVAL;
+ }
+}
+
+static struct ovs_queue *
+find_nonempty_queue(struct dpif *dpif)
+{
+ struct dpif_netdev *dpif_netdev = dpif_netdev_cast(dpif);
+ struct dp_netdev *dp = get_dp_netdev(dpif);
+ int mask = dpif_netdev->listen_mask;
+ int i;
+
+ for (i = 0; i < N_QUEUES; i++) {
+ struct ovs_queue *q = &dp->queues[i];
+ if (q->n && mask & (1u << i)) {
+ return q;
+ }
+ }
+ return NULL;
+}
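find_nonempty_queue() treats bit i of 'listen_mask' as selecting queue i (that is what the 1u << i test implies) and returns the first queue that is both selected and non-empty. A minimal model over queue lengths, with hypothetical names:

```c
#include <assert.h>

enum { SKETCH_N_QUEUES = 2 };  /* Mirrors N_QUEUES above. */

/* Returns the index of the first queue that is non-empty and whose bit is
 * set in 'listen_mask', or -1 if there is none. */
static int
first_ready_queue(const int queue_len[SKETCH_N_QUEUES], int listen_mask)
{
    int i;

    for (i = 0; i < SKETCH_N_QUEUES; i++) {
        if (queue_len[i] && (listen_mask & (1u << i))) {
            return i;
        }
    }
    return -1;
}
```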
+
+static int
+dpif_netdev_recv(struct dpif *dpif, struct ofpbuf **bufp)
+{
+ struct ovs_queue *q = find_nonempty_queue(dpif);
+ if (q) {
+ *bufp = queue_pop_head(q);
+ return 0;
+ } else {
+ return EAGAIN;
+ }
+}
+
+static void
+dpif_netdev_recv_wait(struct dpif *dpif)
+{
+ struct ovs_queue *q = find_nonempty_queue(dpif);
+ if (q) {
+ poll_immediate_wake();
+ } else {
+ /* No messages ready to be received, and dp_wait() will ensure that we
+ * wake up to queue new messages, so there is nothing to do. */
+ }
+}
+\f
+static void
+dp_netdev_flow_used(struct dp_netdev_flow *flow, const flow_t *key,
+ const struct ofpbuf *packet)
+{
+ time_timeval(&flow->used);
+ flow->packet_count++;
+ flow->byte_count += packet->size;
+ if (key->dl_type == htons(ETH_TYPE_IP)) {
+ struct ip_header *nh = packet->l3;
+ flow->ip_tos = nh->ip_tos;
+
+ if (key->nw_proto == IPPROTO_TCP) {
+ struct tcp_header *th = packet->l4;
+ flow->tcp_ctl |= th->tcp_ctl;
+ }
+ }
+}
+
+static void
+dp_netdev_port_input(struct dp_netdev *dp, struct dp_netdev_port *port,
+ struct ofpbuf *packet)
+{
+ struct dp_netdev_flow *flow;
+ flow_t key;
+
+ if (flow_extract(packet, port->port_no, &key) && dp->drop_frags) {
+ dp->n_frags++;
+ return;
+ }
+
+ flow = dp_netdev_lookup_flow(dp, &key);
+ if (flow) {
+ dp_netdev_flow_used(flow, &key, packet);
+ dp_netdev_execute_actions(dp, packet, &key,
+ flow->actions, flow->n_actions);
+ dp->n_hit++;
+ } else {
+ dp->n_missed++;
+ dp_netdev_output_control(dp, packet, _ODPL_MISS_NR, port->port_no, 0);
+ }
+}
+
+static void
+dp_netdev_run(void)
+{
+ struct ofpbuf packet;
+ struct dp_netdev *dp;
+
+ ofpbuf_init(&packet, DP_NETDEV_HEADROOM + max_mtu);
+ LIST_FOR_EACH (dp, struct dp_netdev, node, &dp_netdev_list) {
+ struct dp_netdev_port *port;
+
+ LIST_FOR_EACH (port, struct dp_netdev_port, node, &dp->port_list) {
+ int error;
+
+ /* Reset packet contents. */
+ packet.data = (char*)packet.base + DP_NETDEV_HEADROOM;
+ packet.size = 0;
+
+ error = netdev_recv(port->netdev, &packet);
+ if (!error) {
+ dp_netdev_port_input(dp, port, &packet);
+ } else if (error != EAGAIN) {
+ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
+ VLOG_ERR_RL(&rl, "error receiving data from %s: %s",
+ netdev_get_name(port->netdev), strerror(error));
+ }
+ }
+ }
+ ofpbuf_uninit(&packet);
+}
+
+static void
+dp_netdev_wait(void)
+{
+ struct dp_netdev *dp;
+
+ LIST_FOR_EACH (dp, struct dp_netdev, node, &dp_netdev_list) {
+ struct dp_netdev_port *port;
+ LIST_FOR_EACH (port, struct dp_netdev_port, node, &dp->port_list) {
+ netdev_recv_wait(port->netdev);
+ }
+ }
+}
+
+static void
+dp_netdev_modify_vlan_tci(struct ofpbuf *packet, flow_t *key,
+ uint16_t tci, uint16_t mask)
+{
+ struct vlan_eth_header *veh;
+
+ if (key->dl_vlan != htons(ODP_VLAN_NONE)) {
+ /* Modify 'mask' bits, but maintain other TCI bits. */
+ veh = packet->l2;
+ veh->veth_tci &= ~htons(mask);
+ veh->veth_tci |= htons(tci);
+ } else {
+ /* Insert new 802.1Q header. */
+ struct eth_header *eh = packet->l2;
+ struct vlan_eth_header tmp;
+ memcpy(tmp.veth_dst, eh->eth_dst, ETH_ADDR_LEN);
+ memcpy(tmp.veth_src, eh->eth_src, ETH_ADDR_LEN);
+ tmp.veth_type = htons(ETH_TYPE_VLAN);
+ tmp.veth_tci = htons(tci);
+ tmp.veth_next_type = eh->eth_type;
+
+ veh = ofpbuf_push_uninit(packet, VLAN_HEADER_LEN);
+ memcpy(veh, &tmp, sizeof tmp);
+ packet->l2 = (char*)packet->l2 - VLAN_HEADER_LEN;
+ }
+
+ key->dl_vlan = veh->veth_tci & htons(VLAN_VID_MASK);
+}
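For a frame that is already tagged, dp_netdev_modify_vlan_tci() clears the bits selected by 'mask' and ORs in the new bits, leaving the rest of the TCI intact. A host-byte-order sketch of that read-modify-write; tci_set_bits() is hypothetical and, unlike the code above, also masks the incoming bits:

```c
#include <assert.h>
#include <stdint.h>

/* Replaces the bits of 'tci' selected by 'mask' with the corresponding
 * bits of 'bits', keeping all other TCI bits unchanged. */
static uint16_t
tci_set_bits(uint16_t tci, uint16_t bits, uint16_t mask)
{
    return (uint16_t) ((tci & ~mask) | (bits & mask));
}
```

For example, a TCI of 0xa064 (PCP 5, VID 100) becomes 0xa0c8 when the VID is set to 200 under mask 0x0fff, and 0x6064 when the PCP is set to 3 (0x6000) under mask 0xe000.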
+
+static void
+dp_netdev_strip_vlan(struct ofpbuf *packet, flow_t *key)
+{
+ struct vlan_eth_header *veh = packet->l2;
+ if (veh->veth_type == htons(ETH_TYPE_VLAN)) {
+ struct eth_header tmp;
+
+ memcpy(tmp.eth_dst, veh->veth_dst, ETH_ADDR_LEN);
+ memcpy(tmp.eth_src, veh->veth_src, ETH_ADDR_LEN);
+ tmp.eth_type = veh->veth_next_type;
+
+ packet->size -= VLAN_HEADER_LEN;
+ packet->data = (char*)packet->data + VLAN_HEADER_LEN;
+ packet->l2 = (char*)packet->l2 + VLAN_HEADER_LEN;
+ memcpy(packet->data, &tmp, sizeof tmp);
+
+ key->dl_vlan = htons(ODP_VLAN_NONE);
+ }
+}
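dp_netdev_strip_vlan() removes the tag by advancing the start of the frame by VLAN_HEADER_LEN and rewriting the Ethernet header there, so the payload never moves. The same idea on a raw byte buffer, assuming the standard 802.1Q TPID 0x8100 at offset 12 (strip_vlan_tag() is a hypothetical helper):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { MAC_PAIR_LEN = 12, TAG_LEN = 4 };  /* dst+src MACs; TPID+TCI. */

/* Strips an 802.1Q tag from the 'len'-byte frame at 'frame', in place.
 * Returns the new start of the frame and stores its length in '*new_len';
 * an untagged or too-short frame is returned unchanged. */
static uint8_t *
strip_vlan_tag(uint8_t *frame, size_t len, size_t *new_len)
{
    if (len < MAC_PAIR_LEN + TAG_LEN + 2
        || frame[12] != 0x81 || frame[13] != 0x00) {
        *new_len = len;
        return frame;
    }
    /* Slide the MAC addresses forward over the tag; the inner EtherType
     * and the payload stay where they are. */
    memmove(frame + TAG_LEN, frame, MAC_PAIR_LEN);
    *new_len = len - TAG_LEN;
    return frame + TAG_LEN;
}
```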
+
+static void
+dp_netdev_set_dl_src(struct ofpbuf *packet,
+ const uint8_t dl_addr[ETH_ADDR_LEN])
+{
+ struct eth_header *eh = packet->l2;
+ memcpy(eh->eth_src, dl_addr, sizeof eh->eth_src);
+}
+
+static void
+dp_netdev_set_dl_dst(struct ofpbuf *packet,
+ const uint8_t dl_addr[ETH_ADDR_LEN])
+{
+ struct eth_header *eh = packet->l2;
+ memcpy(eh->eth_dst, dl_addr, sizeof eh->eth_dst);
+}
+
+static void
+dp_netdev_set_nw_addr(struct ofpbuf *packet, flow_t *key,
+ const struct odp_action_nw_addr *a)
+{
+ if (key->dl_type == htons(ETH_TYPE_IP)) {
+ struct ip_header *nh = packet->l3;
+ uint32_t *field;
+
+ field = a->type == ODPAT_SET_NW_SRC ? &nh->ip_src : &nh->ip_dst;
+ if (key->nw_proto == IPPROTO_TCP) {
+ struct tcp_header *th = packet->l4;
+ th->tcp_csum = recalc_csum32(th->tcp_csum, *field, a->nw_addr);
+ } else if (key->nw_proto == IPPROTO_UDP) {
+ struct udp_header *uh = packet->l4;
+ if (uh->udp_csum) {
+ uh->udp_csum = recalc_csum32(uh->udp_csum, *field, a->nw_addr);
+ if (!uh->udp_csum) {
+ uh->udp_csum = 0xffff;
+ }
+ }
+ }
+ nh->ip_csum = recalc_csum32(nh->ip_csum, *field, a->nw_addr);
+ *field = a->nw_addr;
+ }
+}
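dp_netdev_set_nw_addr() does not recompute checksums from scratch; it adjusts them for the one field that changed through recalc_csum32() from csum.h, whose body is not shown here. The sketch below shows the standard incremental update from RFC 1624 (equation 3) and checks it against a full RFC 1071 checksum. It illustrates the technique and is not the csum.h implementation; it uses host-order 16-bit words, which is valid because the one's-complement sum is byte-order independent (RFC 1071).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Full Internet checksum (RFC 1071) over 'n' 16-bit words. */
static uint16_t
full_csum(const uint16_t *words, size_t n)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        sum += words[i];
    }
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);   /* End-around carry. */
    }
    return (uint16_t) ~sum;
}

/* RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m'). */
static uint16_t
update_csum16(uint16_t old_csum, uint16_t old_word, uint16_t new_word)
{
    uint32_t sum = (uint16_t) ~old_csum;

    sum += (uint16_t) ~old_word;
    sum += new_word;
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) ~sum;
}

/* A 32-bit field such as an IPv4 address is two 16-bit words. */
static uint16_t
update_csum32(uint16_t old_csum, uint32_t old_val, uint32_t new_val)
{
    old_csum = update_csum16(old_csum, old_val >> 16, new_val >> 16);
    return update_csum16(old_csum, old_val & 0xffff, new_val & 0xffff);
}
```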
+
+static void
+dp_netdev_set_tp_port(struct ofpbuf *packet, flow_t *key,
+ const struct odp_action_tp_port *a)
+{
+ if (key->dl_type == htons(ETH_TYPE_IP)) {
+ uint16_t *field;
+ if (key->nw_proto == IPPROTO_TCP) {
+ struct tcp_header *th = packet->l4;
+ field = a->type == ODPAT_SET_TP_SRC ? &th->tcp_src : &th->tcp_dst;
+ th->tcp_csum = recalc_csum16(th->tcp_csum, *field, a->tp_port);
+ *field = a->tp_port;
+ } else if (key->nw_proto == IPPROTO_UDP) {
+ struct udp_header *uh = packet->l4;
+ field = a->type == ODPAT_SET_TP_SRC ? &uh->udp_src : &uh->udp_dst;
+ if (uh->udp_csum) {
+ uh->udp_csum = recalc_csum16(uh->udp_csum, *field, a->tp_port);
+ if (!uh->udp_csum) {
+ uh->udp_csum = 0xffff;
+ }
+ }
+ *field = a->tp_port;
+ }
+ }
+}
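UDP over IPv4 treats a checksum of 0 as "no checksum", which is why dp_netdev_set_nw_addr() leaves a zero UDP checksum alone and turns a computed 0 into 0xffff. A hypothetical helper that applies the same rule to any incrementally updated UDP checksum:

```c
#include <assert.h>
#include <stdint.h>

/* Applies an already computed incremental update to a UDP checksum field.
 * A field of 0 means "no checksum" and is left as 0; a computed result of
 * 0 is sent as 0xffff so it is not mistaken for "no checksum" (RFC 768). */
static uint16_t
udp_csum_apply(uint16_t old_csum, uint16_t updated_csum)
{
    if (old_csum == 0) {
        return 0;
    }
    return updated_csum ? updated_csum : 0xffff;
}
```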
+
+static void
+dp_netdev_output_port(struct dp_netdev *dp, struct ofpbuf *packet,
+ uint16_t out_port)
+{
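+ struct dp_netdev_port *p;
+
+ /* Port groups are not validated against MAX_PORTS when they are set. */
+ if (out_port >= MAX_PORTS) {
+ return;
+ }
+ p = dp->ports[out_port];
+ if (p) {
+ netdev_send(p->netdev, packet);
+ }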
+}
+
+static void
+dp_netdev_output_group(struct dp_netdev *dp, uint16_t group, uint16_t in_port,
+ struct ofpbuf *packet)
+{
+ struct odp_port_group *g = &dp->groups[group];
+ int i;
+
+ for (i = 0; i < g->n_ports; i++) {
+ uint16_t out_port = g->ports[i];
+ if (out_port != in_port) {
+ dp_netdev_output_port(dp, packet, out_port);
+ }
+ }
+}
+
+static int
+dp_netdev_output_control(struct dp_netdev *dp, const struct ofpbuf *packet,
+ int queue_no, int port_no, uint32_t arg)
+{
+ struct ovs_queue *q = &dp->queues[queue_no];
+ struct odp_msg *header;
+ struct ofpbuf *msg;
+ size_t msg_size;
+
+ if (q->n >= MAX_QUEUE_LEN) {
+ dp->n_lost++;
+ return ENOBUFS;
+ }
+
+ msg_size = sizeof *header + packet->size;
+ msg = ofpbuf_new(msg_size);
+ header = ofpbuf_put_uninit(msg, sizeof *header);
+ header->type = queue_no;
+ header->length = msg_size;
+ header->port = port_no;
+ header->arg = arg;
+ ofpbuf_put(msg, packet->data, packet->size);
+ queue_push_tail(q, msg);
+
+ return 0;
+}
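dp_netdev_output_control() bounds each upcall queue at MAX_QUEUE_LEN and counts overflow in n_lost instead of blocking. A fixed-array model of that policy, with a hypothetical bounded_queue type and a much smaller limit:

```c
#include <assert.h>
#include <errno.h>

enum { SKETCH_MAX_QUEUE_LEN = 3 };  /* Small stand-in for MAX_QUEUE_LEN. */

struct bounded_queue {
    int items[SKETCH_MAX_QUEUE_LEN];
    int n;                /* Number of queued items. */
    long long n_lost;     /* Items rejected because the queue was full. */
};

/* Appends 'item', or counts it as lost and returns ENOBUFS if full. */
static int
bq_push(struct bounded_queue *q, int item)
{
    if (q->n >= SKETCH_MAX_QUEUE_LEN) {
        q->n_lost++;
        return ENOBUFS;
    }
    q->items[q->n++] = item;
    return 0;
}
```

Dropping at the tail keeps the datapath from stalling when the client reads slowly, at the cost of losing the newest upcalls.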
+
+static int
+dp_netdev_execute_actions(struct dp_netdev *dp,
+ struct ofpbuf *packet, flow_t *key,
+ const union odp_action *actions, int n_actions)
+{
+ int i;
+ for (i = 0; i < n_actions; i++) {
+ const union odp_action *a = &actions[i];
+
+ switch (a->type) {
+ case ODPAT_OUTPUT:
+ dp_netdev_output_port(dp, packet, a->output.port);
+ break;
+
+ case ODPAT_OUTPUT_GROUP:
+ dp_netdev_output_group(dp, a->output_group.group, key->in_port,
+ packet);
+ break;
+
+ case ODPAT_CONTROLLER:
+ dp_netdev_output_control(dp, packet, _ODPL_ACTION_NR,
+ key->in_port, a->controller.arg);
+ break;
+
+ case ODPAT_SET_VLAN_VID:
+ dp_netdev_modify_vlan_tci(packet, key, ntohs(a->vlan_vid.vlan_vid),
+ VLAN_VID_MASK);
+ break;
+
+ case ODPAT_SET_VLAN_PCP:
+ dp_netdev_modify_vlan_tci(packet, key, a->vlan_pcp.vlan_pcp << 13,
+ VLAN_PCP_MASK);
+ break;
+
+ case ODPAT_STRIP_VLAN:
+ dp_netdev_strip_vlan(packet, key);
+ break;
+
+ case ODPAT_SET_DL_SRC:
+ dp_netdev_set_dl_src(packet, a->dl_addr.dl_addr);
+ break;
+
+ case ODPAT_SET_DL_DST:
+ dp_netdev_set_dl_dst(packet, a->dl_addr.dl_addr);
+ break;
+
+ case ODPAT_SET_NW_SRC:
+ case ODPAT_SET_NW_DST:
+ dp_netdev_set_nw_addr(packet, key, &a->nw_addr);
+ break;
+
+ case ODPAT_SET_TP_SRC:
+ case ODPAT_SET_TP_DST:
+ dp_netdev_set_tp_port(packet, key, &a->tp_port);
+ break;
+ }
+ }
+ return 0;
+}
+
+const struct dpif_class dpif_netdev_class = {
+ "netdev",
+ "netdev",
+ dp_netdev_run,
+ dp_netdev_wait,
+ NULL, /* enumerate */
+ dpif_netdev_open,
+ dpif_netdev_close,
+ NULL, /* get_all_names */
+ dpif_netdev_delete,
+ dpif_netdev_get_stats,
+ dpif_netdev_get_drop_frags,
+ dpif_netdev_set_drop_frags,
+ dpif_netdev_port_add,
+ dpif_netdev_port_del,
+ dpif_netdev_port_query_by_number,
+ dpif_netdev_port_query_by_name,
+ dpif_netdev_port_list,
+ dpif_netdev_port_poll,
+ dpif_netdev_port_poll_wait,
+ dpif_netdev_port_group_get,
+ dpif_netdev_port_group_set,
+ dpif_netdev_flow_get,
+ dpif_netdev_flow_put,
+ dpif_netdev_flow_del,
+ dpif_netdev_flow_flush,
+ dpif_netdev_flow_list,
+ dpif_netdev_execute,
+ dpif_netdev_recv_get_mask,
+ dpif_netdev_recv_set_mask,
+ dpif_netdev_recv,
+ dpif_netdev_recv_wait,
+};
--- /dev/null
+/*
+ * Copyright (c) 2009 Nicira Networks.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef DPIF_PROVIDER_H
+#define DPIF_PROVIDER_H 1
+
+/* Provider interface to dpifs, which provide an interface to an Open vSwitch
+ * datapath. */
+
+#include <assert.h>
+#include "dpif.h"
+
+/* Open vSwitch datapath interface.
+ *
+ * This structure should be treated as opaque by dpif implementations. */
+struct dpif {
+ const struct dpif_class *class;
+ char *name;
+ uint8_t netflow_engine_type;
+ uint8_t netflow_engine_id;
+};
+
+void dpif_init(struct dpif *, const struct dpif_class *, const char *name,
+ uint8_t netflow_engine_type, uint8_t netflow_engine_id);
+static inline void dpif_assert_class(const struct dpif *dpif,
+ const struct dpif_class *class)
+{
+ assert(dpif->class == class);
+}
+
+/* Datapath interface class structure, to be defined by each implementation of
+ * a datapath interface.
+ *
+ * These functions return 0 if successful or a positive errno value on failure,
+ * except where otherwise noted.
+ *
+ * These functions are expected to execute synchronously, that is, to block as
+ * necessary to obtain a result. Thus, except where otherwise noted, they
+ * may not return EAGAIN, EWOULDBLOCK, or EINPROGRESS. We may relax this
+ * requirement in the future if and when we encounter performance problems. */
+struct dpif_class {
+ /* Prefix for names of dpifs in this class, without the separating colon,
+ * e.g. "udatapath" for names such as "udatapath:/var/run/mypath".
+ *
+ * One dpif class may have the empty string "" as its prefix, in which case
+ * that dpif class is associated with dpif names that do not contain a
+ * colon. */
+ const char *prefix;
+
+ /* Class name, for use in error messages. */
+ const char *name;
+
+ /* Performs periodic work needed by dpifs of this class, if any is
+ * necessary. */
+ void (*run)(void);
+
+ /* Arranges for poll_block() to wake up if the "run" member function needs
+ * to be called. */
+ void (*wait)(void);
+
+ /* Enumerates the names of all known created datapaths, if possible, into
+ * 'all_dps'. The caller has already initialized 'all_dps' and other dpif
+ * classes might already have added names to it.
+ *
+ * This is used by the vswitch at startup, so that it can delete any
+ * datapaths that are not configured.
+ *
+ * Some kinds of datapaths might not be practically enumerable, in which
+ * case this function may be a null pointer. */
+ int (*enumerate)(struct svec *all_dps);
+
+ /* Attempts to open an existing dpif, if 'create' is false, or to open an
+ * existing dpif or create a new one, if 'create' is true. 'name' is the
+ * full dpif name provided by the user, e.g. "udatapath:/var/run/mypath".
+ * This name is useful for error messages but must not be modified.
+ *
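+ * 'suffix' is a writable copy of the part of 'name' that follows the
+ * dpif's 'prefix' and the colon after it (or of all of 'name', if 'name'
+ * contains no colon).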
+ *
+ * If successful, stores a pointer to the new dpif in '*dpifp'. On failure
+ * there are no requirements on what is stored in '*dpifp'. */
+ int (*open)(const char *name, char *suffix, bool create,
+ struct dpif **dpifp);
+
+ /* Closes 'dpif' and frees associated memory. */
+ void (*close)(struct dpif *dpif);
+
+ /* Enumerates all names that may be used to open 'dpif' into 'all_names'.
+ * The Linux datapath, for example, supports opening a datapath both by
+ * number, e.g. "dp0", and by the name of the datapath's local port. For
+ * some datapaths, this might be an infinite set (e.g. in a file name,
+ * slashes may be duplicated any number of times), in which case only the
+ * names most likely to be used should be enumerated.
+ *
+ * The caller has already initialized 'all_names' and might already have
+ * added some names to it. This function should not disturb any existing
+ * names in 'all_names'.
+ *
+ * If a datapath class does not support multiple names for a datapath, this
+ * function may be a null pointer.
+ *
+ * This is used by the vswitch at startup. */
+ int (*get_all_names)(const struct dpif *dpif, struct svec *all_names);
+
+ /* Attempts to destroy the dpif underlying 'dpif'.
+ *
+ * If successful, 'dpif' will not be used again except as an argument for
+ * the 'close' member function. */
+ int (*delete)(struct dpif *dpif);
+
+ /* Retrieves statistics for 'dpif' into 'stats'. */
+ int (*get_stats)(const struct dpif *dpif, struct odp_stats *stats);
+
+ /* Retrieves 'dpif''s current treatment of IP fragments into '*drop_frags':
+ * true indicates that fragments are dropped, false indicates that
+ * fragments are treated in the same way as other IP packets (except that
+ * the L4 header cannot be read). */
+ int (*get_drop_frags)(const struct dpif *dpif, bool *drop_frags);
+
+ /* Changes 'dpif''s treatment of IP fragments to 'drop_frags', whose
+ * meaning is the same as for the get_drop_frags member function. */
+ int (*set_drop_frags)(struct dpif *dpif, bool drop_frags);
+
+ /* Creates a new port in 'dpif' connected to network device 'devname'.
+ * 'flags' is a set of ODP_PORT_* flags. If successful, sets '*port_no'
+ * to the new port's port number. */
+ int (*port_add)(struct dpif *dpif, const char *devname, uint16_t flags,
+ uint16_t *port_no);
+
+ /* Removes port numbered 'port_no' from 'dpif'. */
+ int (*port_del)(struct dpif *dpif, uint16_t port_no);
+
+ /* Queries 'dpif' for a port with the given 'port_no' or 'devname'. Stores
+ * information about the port into '*port' if successful. */
+ int (*port_query_by_number)(const struct dpif *dpif, uint16_t port_no,
+ struct odp_port *port);
+ int (*port_query_by_name)(const struct dpif *dpif, const char *devname,
+ struct odp_port *port);
+
+ /* Stores in 'ports' information about up to 'n' ports attached to 'dpif',
+ * in no particular order. Returns the number of ports attached to 'dpif'
+ * (not the number stored), if successful, otherwise a negative errno
+ * value. */
+ int (*port_list)(const struct dpif *dpif, struct odp_port *ports, int n);
+
+ /* Polls for changes in the set of ports in 'dpif'. If the set of ports in
+ * 'dpif' has changed, then this function should do one of the
+ * following:
+ *
+ * - Preferably: store the name of the device that was added to or deleted
+ * from 'dpif' in '*devnamep' and return 0. The caller is responsible
+ * for freeing '*devnamep' (with free()) when it no longer needs it.
+ *
+ * - Alternatively: return ENOBUFS, without indicating the device that was
+ * added or deleted.
+ *
+ * Occasional 'false positives', in which the function returns 0 while
+ * indicating a device that was not actually added or deleted or returns
+ * ENOBUFS without any change, are acceptable.
+ *
+ * If the set of ports in 'dpif' has not changed, returns EAGAIN. May also
+ * return other positive errno values to indicate that something has gone
+ * wrong. */
+ int (*port_poll)(const struct dpif *dpif, char **devnamep);
+
+ /* Arranges for the poll loop to wake up when 'port_poll' will return a
+ * value other than EAGAIN. */
+ void (*port_poll_wait)(const struct dpif *dpif);
+
+ /* Stores in 'ports' the port numbers of up to 'n' ports that belong to
+ * 'group' in 'dpif'. Returns the number of ports in 'group' (not the
+ * number stored), if successful, otherwise a negative errno value. */
+ int (*port_group_get)(const struct dpif *dpif, int group,
+ uint16_t ports[], int n);
+
+ /* Changes port group 'group' in 'dpif' to consist of the 'n' ports whose
+ * numbers are given in 'ports'.
+ *
+ * Use the get_stats member function to obtain the number of supported port
+ * groups. */
+ int (*port_group_set)(struct dpif *dpif, int group,
+ const uint16_t ports[], int n);
+
+ /* For each flow 'flow' in the 'n' flows in 'flows':
+ *
+ * - If a flow matching 'flow->key' exists in 'dpif':
+ *
+ * Stores 0 into 'flow->stats.error' and stores statistics for the flow
+ * into 'flow->stats'.
+ *
+ * If 'flow->n_actions' is zero, then 'flow->actions' is ignored. If
+ * 'flow->n_actions' is nonzero, then 'flow->actions' should point to
+ * an array of the specified number of actions. At most that many of
+ * the flow's actions will be copied into that array.
+ * 'flow->n_actions' will be updated to the number of actions actually
+ * present in the flow, which may be greater than the number stored if
+ * the flow has more actions than space available in the array.
+ *
+ * - Flow-specific errors are indicated by a positive errno value in
+ * 'flow->stats.error'. In particular, ENOENT indicates that no flow
+ * matching 'flow->key' exists in 'dpif'. When an error value is stored,
+ * the contents of 'flow->key' are preserved but other members of 'flow'
+ * should be treated as indeterminate.
+ *
+ * Returns 0 if all 'n' flows in 'flows' were updated (whether they were
+ * individually successful or not is indicated by 'flow->stats.error',
+ * however). Returns a positive errno value if an error that prevented
+ * this update occurred, in which case the caller must not depend on any
+ * elements in 'flows' being updated or not updated.
+ */
+ int (*flow_get)(const struct dpif *dpif, struct odp_flow flows[], int n);
+
+ /* Adds or modifies a flow in 'dpif' as specified in 'put':
+ *
+ * - If the flow specified in 'put->flow' does not exist in 'dpif', then
+ * behavior depends on whether ODPPF_CREATE is specified in 'put->flags':
+ * if it is, the flow will be added, otherwise the operation will fail
+ * with ENOENT.
+ *
+ * - Otherwise, the flow specified in 'put->flow' does exist in 'dpif'.
+ * Behavior in this case depends on whether ODPPF_MODIFY is specified in
+ * 'put->flags': if it is, the flow's actions will be updated, otherwise
+ * the operation will fail with EEXIST. If the flow's actions are
+ * updated, then its statistics will be zeroed if ODPPF_ZERO_STATS is set
+ * in 'put->flags', left as-is otherwise.
+ */
+ int (*flow_put)(struct dpif *dpif, struct odp_flow_put *put);
+
+ /* Deletes a flow matching 'flow->key' from 'dpif' or returns ENOENT if
+ * 'dpif' does not contain such a flow.
+ *
+ * If successful, updates 'flow->stats', 'flow->n_actions', and
+ * 'flow->actions' as described in more detail under the flow_get member
+ * function above. */
+ int (*flow_del)(struct dpif *dpif, struct odp_flow *flow);
+
+ /* Deletes all flows from 'dpif' and clears all of its queues of received
+ * packets. */
+ int (*flow_flush)(struct dpif *dpif);
+
+ /* Stores up to 'n' flows in 'dpif' into 'flows', updating their statistics
+ * and actions as described under the flow_get member function. If
+ * successful, returns the number of flows actually present in 'dpif',
+ * which might be greater than the number stored (if 'dpif' has more than
+ * 'n' flows). On failure, returns a negative errno value. */
+ int (*flow_list)(const struct dpif *dpif, struct odp_flow flows[], int n);
+
+ /* Performs the 'n_actions' actions in 'actions' on the Ethernet frame
+ * specified in 'packet'.
+ *
+ * Pretends that the frame was originally received on the port numbered
+ * 'in_port'. This affects only ODPAT_OUTPUT_GROUP actions, which will not
+ * send a packet out their input port. Specify the number of an unused
+ * port (e.g. UINT16_MAX is currently always unused) to avoid this
+ * behavior. */
+ int (*execute)(struct dpif *dpif, uint16_t in_port,
+ const union odp_action actions[], int n_actions,
+ const struct ofpbuf *packet);
+
+ /* Retrieves 'dpif''s "listen mask" into '*listen_mask'. Each ODPL_* bit
+ * set in '*listen_mask' indicates that 'dpif' will receive messages of the
+ * corresponding type when the recv member function is called. */
+ int (*recv_get_mask)(const struct dpif *dpif, int *listen_mask);
+
+ /* Sets 'dpif''s "listen mask" to 'listen_mask'. Each ODPL_* bit set in
+ * 'listen_mask' indicates that 'dpif' will receive messages of the
+ * corresponding type when the recv member function is called. */
+ int (*recv_set_mask)(struct dpif *dpif, int listen_mask);
+
+ /* Attempts to receive a message from 'dpif'. If successful, stores the
+ * message into '*packetp'. The message, if one is received, must begin
+ * with 'struct odp_msg' as a header. Only messages of the types selected
+ * with the recv_set_mask member function should be received.
+ *
+ * This function must not block. If no message is ready to be received
+ * when it is called, it should return EAGAIN without blocking. */
+ int (*recv)(struct dpif *dpif, struct ofpbuf **packetp);
+
+ /* Arranges for the poll loop to wake up when 'dpif' has a message queued
+ * to be received with the recv member function. */
+ void (*recv_wait)(struct dpif *dpif);
+};
+
+extern const struct dpif_class dpif_linux_class;
+extern const struct dpif_class dpif_netdev_class;
+
+#endif /* dpif-provider.h */
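The flow_put contract above is effectively a small decision table over "does the flow exist?" and three flags. The sketch below encodes that table as a pure function so each documented outcome can be checked. The SKETCH_PF_* values are hypothetical placeholders for ODPPF_CREATE, ODPPF_MODIFY, and ODPPF_ZERO_STATS, whose real values come from the datapath protocol header. sketch_flow_put_decision() is likewise illustrative, not part of the dpif API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag values; the real ODPPF_* constants may differ. */
enum {
    SKETCH_PF_CREATE = 1 << 0,     /* Allow adding a new flow. */
    SKETCH_PF_MODIFY = 1 << 1,     /* Allow replacing an existing flow's actions. */
    SKETCH_PF_ZERO_STATS = 1 << 2  /* Zero statistics when actions are replaced. */
};

struct sketch_put_result {
    int error;          /* 0, ENOENT, or EEXIST. */
    bool created;       /* A new flow was added. */
    bool modified;      /* An existing flow's actions were replaced. */
    bool zeroed_stats;  /* The existing flow's statistics were reset. */
};

/* Applies the documented flow_put rules to a flow that does or does not
 * already exist, given the combination of flags in 'flags'. */
static struct sketch_put_result
sketch_flow_put_decision(bool flow_exists, int flags)
{
    struct sketch_put_result r = { 0, false, false, false };

    if (!flow_exists) {
        if (flags & SKETCH_PF_CREATE) {
            r.created = true;
        } else {
            r.error = ENOENT;
        }
    } else if (flags & SKETCH_PF_MODIFY) {
        r.modified = true;
        r.zeroed_stats = (flags & SKETCH_PF_ZERO_STATS) != 0;
    } else {
        r.error = EEXIST;
    }
    return r;
}
```

A caller that wants "add or replace" semantics would pass both the create and modify flags. Passing only one of them turns the put into a strict insert or a strict update.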
*/
#include <config.h>
-#include "dpif.h"
+#include "dpif-provider.h"
#include <assert.h>
#include <ctype.h>
#include <errno.h>
-#include <fcntl.h>
#include <inttypes.h>
-#include <net/if.h>
-#include <linux/rtnetlink.h>
-#include <linux/ethtool.h>
-#include <linux/sockios.h>
-#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>
-#include <sys/ioctl.h>
-#include <sys/stat.h>
-#include <sys/sysmacros.h>
-#include <unistd.h>
#include "coverage.h"
#include "dynamic-string.h"
#include "vlog.h"
#define THIS_MODULE VLM_dpif
+static const struct dpif_class *dpif_classes[] = {
+ &dpif_linux_class,
+ &dpif_netdev_class,
+};
+enum { N_DPIF_CLASSES = ARRAY_SIZE(dpif_classes) };
+
/* Rate limit for individual messages going to or from the datapath, output at
* DBG level. This is very high because, if these are enabled, it is because
* we really need to see them. */
/* Not really much point in logging many dpif errors. */
static struct vlog_rate_limit error_rl = VLOG_RATE_LIMIT_INIT(9999, 5);
-static int get_minor_from_name(const char *name, unsigned int *minor);
-static int name_to_minor(const char *name, unsigned int *minor);
-static int lookup_minor(const char *name, unsigned int *minor);
-static int open_by_minor(unsigned int minor, struct dpif *);
-static int make_openvswitch_device(unsigned int minor, char **fnp);
+static void log_operation(const struct dpif *, const char *operation,
+ int error);
+static void log_flow_operation(const struct dpif *, const char *operation,
+ int error, struct odp_flow *flow);
+static void log_flow_put(struct dpif *, int error,
+ const struct odp_flow_put *);
+static bool should_log_flow_message(int error);
static void check_rw_odp_flow(struct odp_flow *);
+/* Performs periodic work needed by all the various kinds of dpifs.
+ *
+ * If your program opens any dpifs, it must call both this function and
+ * netdev_run() within its main poll loop. */
+void
+dp_run(void)
+{
+ int i;
+ for (i = 0; i < N_DPIF_CLASSES; i++) {
+ const struct dpif_class *class = dpif_classes[i];
+ if (class->run) {
+ class->run();
+ }
+ }
+}
-/* Clears 'all_dps' and enumerates the names of all known created
- * datapaths into it. Returns 0 if successful, otherwise a positive
- * errno value. */
+/* Arranges for poll_block() to wake up when dp_run() needs to be called.
+ *
+ * If your program opens any dpifs, it must call both this function and
+ * netdev_wait() within its main poll loop. */
+void
+dp_wait(void)
+{
+ int i;
+ for (i = 0; i < N_DPIF_CLASSES; i++) {
+ const struct dpif_class *class = dpif_classes[i];
+ if (class->wait) {
+ class->wait();
+ }
+ }
+}
+
+/* Clears 'all_dps' and enumerates the names of all known created datapaths,
+ * where possible, into it. The caller must first initialize 'all_dps'.
+ * Returns 0 if successful, otherwise a positive errno value.
+ *
+ * Some kinds of datapaths might not be practically enumerable. This is not
+ * considered an error. */
int
dp_enumerate(struct svec *all_dps)
{
svec_clear(all_dps);
error = 0;
- for (i = 0; i < ODP_MAX; i++) {
- struct dpif dpif;
- char devname[16];
- int retval;
-
- sprintf(devname, "dp%d", i);
- retval = dpif_open(devname, &dpif);
- if (!retval) {
- svec_add(all_dps, devname);
- dpif_close(&dpif);
- } else if (retval != ENODEV && !error) {
- error = retval;
+ for (i = 0; i < N_DPIF_CLASSES; i++) {
+ const struct dpif_class *class = dpif_classes[i];
+ int retval = class->enumerate ? class->enumerate(all_dps) : 0;
+ if (retval) {
+ VLOG_WARN("failed to enumerate %s datapaths: %s",
+ class->name, strerror(retval));
+ if (!error) {
+ error = retval;
+ }
}
}
return error;
}
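dp_enumerate() walks the class table and treats a null 'enumerate' member as "nothing to enumerate" rather than as a failure. It also keeps going after an error and reports the first one. The sketch below isolates that pattern with a hypothetical cut-down class structure. struct sketch_class and its helpers are illustrative only, and the sketch counts names instead of filling a struct svec.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical provider class with a single optional hook. */
struct sketch_class {
    const char *name;
    int (*enumerate)(int *n_names);   /* May be NULL. */
};

/* Calls each class's 'enumerate' hook, skipping null hooks.  Continues past
 * failures and returns the first positive errno value seen, or 0. */
static int
sketch_enumerate_all(const struct sketch_class *classes[], size_t n_classes,
                     int *n_names)
{
    int error = 0;
    size_t i;

    assert(n_names != NULL);
    for (i = 0; i < n_classes; i++) {
        const struct sketch_class *class = classes[i];
        int retval = class->enumerate ? class->enumerate(n_names) : 0;
        if (retval && !error) {
            error = retval;
        }
    }
    return error;
}

/* Example hooks: one that reports two datapaths and one that fails. */
static int
sketch_two_names(int *n_names)
{
    *n_names += 2;
    return 0;
}

static int
sketch_broken(int *n_names)
{
    (void) n_names;
    return EIO;
}
```

Continuing after a failure means that one unreadable provider does not hide datapaths that the other providers can still list.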
-int
-dpif_open(const char *name, struct dpif *dpif)
+static int
+do_open(const char *name_, bool create, struct dpif **dpifp)
{
- int listen_mask;
+ char *name = xstrdup(name_);
+ char *prefix, *suffix, *colon;
+ struct dpif *dpif = NULL;
int error;
+ int i;
- dpif->fd = -1;
-
- error = name_to_minor(name, &dpif->minor);
- if (error) {
- return error;
- }
-
- error = open_by_minor(dpif->minor, dpif);
- if (error) {
- return error;
+ colon = strchr(name, ':');
+ if (colon) {
+ *colon = '\0';
+ prefix = name;
+ suffix = colon + 1;
+ } else {
+ prefix = "";
+ suffix = name;
}
- /* We can open the device, but that doesn't mean that it's been created.
- * If it hasn't been, then any command other than ODP_DP_CREATE will
- * return ENODEV. Try something innocuous. */
- listen_mask = 0; /* Make Valgrind happy. */
- if (ioctl(dpif->fd, ODP_GET_LISTEN_MASK, &listen_mask)) {
- error = errno;
- if (error != ENODEV) {
- VLOG_WARN("dp%u: probe returned unexpected error: %s",
- dpif->minor, strerror(error));
+ for (i = 0; i < N_DPIF_CLASSES; i++) {
+ const struct dpif_class *class = dpif_classes[i];
+ if (!strcmp(prefix, class->prefix)) {
+ error = class->open(name_, suffix, create, &dpif);
+ goto exit;
}
- dpif_close(dpif);
- return error;
}
- return 0;
+ error = EAFNOSUPPORT;
+
+exit:
+ free(name);
+ *dpifp = error ? NULL : dpif;
+ return error;
+}
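do_open() selects a provider by splitting the user-supplied name at its first colon. The text before the colon is matched against each class's 'prefix'. A name with no colon matches the class whose prefix is "". The hypothetical helper below reproduces only that split. Like do_open(), it modifies its argument in place, so the caller must pass a writable copy.

```c
#include <assert.h>
#include <string.h>

/* Splits 'name' (modified in place) at its first colon.  On return,
 * '*prefixp' and '*suffixp' point into 'name', except that a name with no
 * colon yields the empty prefix "" and the whole name as the suffix. */
static void
sketch_split_dp_name(char *name, const char **prefixp, const char **suffixp)
{
    char *colon;

    assert(name != NULL);
    colon = strchr(name, ':');
    if (colon) {
        *colon = '\0';
        *prefixp = name;
        *suffixp = colon + 1;
    } else {
        *prefixp = "";
        *suffixp = name;
    }
}
```

Only the first colon splits, so a suffix may itself contain colons. With "udatapath:/var/run/mypath", the prefix is "udatapath" and the suffix is "/var/run/mypath".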
+
+/* Tries to open an existing datapath named 'name'. Will fail if no datapath
+ * named 'name' exists. Returns 0 if successful, otherwise a positive errno
+ * value. On success stores a pointer to the datapath in '*dpifp', otherwise a
+ * null pointer. */
+int
+dpif_open(const char *name, struct dpif **dpifp)
+{
+ return do_open(name, false, dpifp);
+}
+
+/* Tries to create and open a new datapath with the given 'name'. Will fail if
+ * a datapath named 'name' already exists. Returns 0 if successful, otherwise
+ * a positive errno value. On success stores a pointer to the datapath in
+ * '*dpifp', otherwise a null pointer. */
+int
+dpif_create(const char *name, struct dpif **dpifp)
+{
+ return do_open(name, true, dpifp);
}
+/* Closes and frees the connection to 'dpif'. Does not destroy the datapath
+ * itself; to destroy it as well, call dpif_delete() first. */
void
dpif_close(struct dpif *dpif)
{
if (dpif) {
- close(dpif->fd);
- dpif->fd = -1;
+ char *name = dpif->name;
+ dpif->class->close(dpif);
+ free(name);
}
}
-static int
-do_ioctl(const struct dpif *dpif, int cmd, const char *cmd_name,
- const void *arg)
+/* Returns the name of datapath 'dpif' (for use in log messages). */
+const char *
+dpif_name(const struct dpif *dpif)
{
- int error = ioctl(dpif->fd, cmd, arg) ? errno : 0;
- if (cmd_name) {
- if (error) {
- VLOG_WARN_RL(&error_rl, "dp%u: ioctl(%s) failed (%s)",
- dpif->minor, cmd_name, strerror(error));
- } else {
- VLOG_DBG_RL(&dpmsg_rl, "dp%u: ioctl(%s): success",
- dpif->minor, cmd_name);
- }
- }
- return error;
+ return dpif->name;
}
+/* Enumerates all names that may be used to open 'dpif' into 'all_names'. The
+ * Linux datapath, for example, supports opening a datapath both by number,
+ * e.g. "dp0", and by the name of the datapath's local port. For some
+ * datapaths, this might be an infinite set (e.g. in a file name, slashes may
+ * be duplicated any number of times), in which case only the names most likely
+ * to be used will be enumerated.
+ *
+ * The caller must already have initialized 'all_names'. Any existing names in
+ * 'all_names' will not be disturbed. */
int
-dpif_create(const char *name, struct dpif *dpif)
+dpif_get_all_names(const struct dpif *dpif, struct svec *all_names)
{
- unsigned int minor;
- int error;
-
- if (!get_minor_from_name(name, &minor)) {
- /* Minor was specified in 'name', go ahead and create it. */
- error = open_by_minor(minor, dpif);
+ if (dpif->class->get_all_names) {
+ int error = dpif->class->get_all_names(dpif, all_names);
if (error) {
- return error;
- }
-
- if (!strncmp(name, "nl:", 3)) {
- char devname[128];
- sprintf(devname, "of%u", minor);
- error = ioctl(dpif->fd, ODP_DP_CREATE, devname) < 0 ? errno : 0;
- } else {
- error = ioctl(dpif->fd, ODP_DP_CREATE, name) < 0 ? errno : 0;
- }
- if (error) {
- dpif_close(dpif);
+ VLOG_WARN_RL(&error_rl,
+ "failed to retrieve names for datapath %s: %s",
+ dpif_name(dpif), strerror(error));
}
return error;
} else {
- for (minor = 0; minor < ODP_MAX; minor++) {
- error = open_by_minor(minor, dpif);
- if (error) {
- return error;
- }
-
- error = ioctl(dpif->fd, ODP_DP_CREATE, name) < 0 ? errno : 0;
- if (!error) {
- return 0;
- }
- dpif_close(dpif);
- if (error != EBUSY) {
- return error;
- }
- }
- return ENOBUFS;
+ svec_add(all_names, dpif_name(dpif));
+ return 0;
}
}
+/* Destroys the datapath that 'dpif' is connected to, first removing all of its
+ * ports. After calling this function, it does not make sense to pass 'dpif'
+ * to any functions other than dpif_name() or dpif_close(). */
int
-dpif_get_name(struct dpif *dpif, char *name, size_t name_size)
+dpif_delete(struct dpif *dpif)
{
- struct odp_port port;
int error;
- assert(name_size > 0);
- *name = '\0';
+ COVERAGE_INC(dpif_destroy);
- error = dpif_port_query_by_number(dpif, ODPP_LOCAL, &port);
- if (!error) {
- ovs_strlcpy(name, port.devname, name_size);
- }
+ error = dpif->class->delete(dpif);
+ log_operation(dpif, "delete", error);
return error;
}
-int
-dpif_delete(struct dpif *dpif)
-{
- COVERAGE_INC(dpif_destroy);
- return do_ioctl(dpif, ODP_DP_DESTROY, "ODP_DP_DESTROY", NULL);
-}
-
+/* Retrieves statistics for 'dpif' into 'stats'. Returns 0 if successful,
+ * otherwise a positive errno value. */
int
dpif_get_dp_stats(const struct dpif *dpif, struct odp_stats *stats)
{
- memset(stats, 0, sizeof *stats);
- return do_ioctl(dpif, ODP_DP_STATS, "ODP_DP_STATS", stats);
-}
-
-int
-dpif_get_drop_frags(const struct dpif *dpif, bool *drop_frags)
-{
- int tmp;
- int error = do_ioctl(dpif, ODP_GET_DROP_FRAGS, "ODP_GET_DROP_FRAGS", &tmp);
- *drop_frags = error ? tmp & 1 : false;
+ int error = dpif->class->get_stats(dpif, stats);
+ if (error) {
+ memset(stats, 0, sizeof *stats);
+ }
+ log_operation(dpif, "get_stats", error);
return error;
}
+/* Retrieves the current IP fragment handling policy for 'dpif' into
+ * '*drop_frags': true indicates that fragments are dropped, false indicates
+ * that fragments are treated in the same way as other IP packets (except that
+ * the L4 header cannot be read). Returns 0 if successful, otherwise a
+ * positive errno value. */
int
-dpif_set_drop_frags(struct dpif *dpif, bool drop_frags)
-{
- int tmp = drop_frags;
- return do_ioctl(dpif, ODP_SET_DROP_FRAGS, "ODP_SET_DROP_FRAGS", &tmp);
-}
-
-int
-dpif_get_listen_mask(const struct dpif *dpif, int *listen_mask)
+dpif_get_drop_frags(const struct dpif *dpif, bool *drop_frags)
{
- int error = do_ioctl(dpif, ODP_GET_LISTEN_MASK, "ODP_GET_LISTEN_MASK",
- listen_mask);
+ int error = dpif->class->get_drop_frags(dpif, drop_frags);
if (error) {
- *listen_mask = 0;
+ *drop_frags = false;
}
+ log_operation(dpif, "get_drop_frags", error);
return error;
}
+/* Changes 'dpif''s treatment of IP fragments to 'drop_frags', whose meaning is
+ * the same as for dpif_get_drop_frags(). Returns 0 if successful, otherwise
+ * a positive errno value. */
int
-dpif_set_listen_mask(struct dpif *dpif, int listen_mask)
+dpif_set_drop_frags(struct dpif *dpif, bool drop_frags)
{
- return do_ioctl(dpif, ODP_SET_LISTEN_MASK, "ODP_SET_LISTEN_MASK",
- &listen_mask);
+ int error = dpif->class->set_drop_frags(dpif, drop_frags);
+ log_operation(dpif, "set_drop_frags", error);
+ return error;
}
+/* Attempts to add 'devname' as a port on 'dpif', given the combination of
+ * ODP_PORT_* flags in 'flags'. If successful, returns 0 and sets '*port_nop'
+ * to the new port's port number (if 'port_nop' is non-null). On failure,
+ * returns a positive errno value and sets '*port_nop' to UINT16_MAX (if
+ * 'port_nop' is non-null). */
int
-dpif_purge(struct dpif *dpif)
+dpif_port_add(struct dpif *dpif, const char *devname, uint16_t flags,
+ uint16_t *port_nop)
{
- struct odp_stats stats;
- unsigned int i;
+ uint16_t port_no;
int error;
- COVERAGE_INC(dpif_purge);
-
- error = dpif_get_dp_stats(dpif, &stats);
- if (error) {
- return error;
- }
-
- for (i = 0; i < stats.max_miss_queue + stats.max_action_queue; i++) {
- struct ofpbuf *buf;
- error = dpif_recv(dpif, &buf);
- if (error) {
- return error == EAGAIN ? 0 : error;
- }
- ofpbuf_delete(buf);
- }
- return 0;
-}
-
-int
-dpif_port_add(struct dpif *dpif, const char *devname, uint16_t port_no,
- uint16_t flags)
-{
- struct odp_port port;
-
COVERAGE_INC(dpif_port_add);
- memset(&port, 0, sizeof port);
- strncpy(port.devname, devname, sizeof port.devname);
- port.port = port_no;
- port.flags = flags;
- if (!ioctl(dpif->fd, ODP_PORT_ADD, &port)) {
- VLOG_DBG_RL(&dpmsg_rl, "dp%u: added %s as port %"PRIu16,
- dpif->minor, devname, port_no);
- return 0;
+
+ error = dpif->class->port_add(dpif, devname, flags, &port_no);
+ if (!error) {
+ VLOG_DBG_RL(&dpmsg_rl, "%s: added %s as port %"PRIu16,
+ dpif_name(dpif), devname, port_no);
} else {
- VLOG_WARN_RL(&error_rl, "dp%u: failed to add %s as port "
- "%"PRIu16": %s", dpif->minor, devname, port_no,
- strerror(errno));
- return errno;
+ VLOG_WARN_RL(&error_rl, "%s: failed to add %s as port: %s",
+ dpif_name(dpif), devname, strerror(error));
+ port_no = UINT16_MAX;
+ }
+ if (port_nop) {
+ *port_nop = port_no;
}
+ return error;
}
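dpif_port_add() makes its out-parameter optional and writes a well-defined sentinel, UINT16_MAX, on failure. It does not expose whatever the provider may have left in its local variable. The sketch below shows the same contract with a hypothetical provider stub. sketch_provider_port_add() and sketch_port_add() are illustrative, and the stub assigns port numbers sequentially purely for demonstration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical provider: accepts any non-empty device name and hands out
 * port numbers 1, 2, 3, ... in order. */
static int
sketch_provider_port_add(const char *devname, uint16_t *port_no)
{
    static uint16_t next_port = 1;

    if (!devname || !devname[0]) {
        return EINVAL;
    }
    *port_no = next_port++;
    return 0;
}

/* Wrapper following the dpif_port_add() contract: 'port_nop' may be NULL,
 * and on failure it receives UINT16_MAX. */
static int
sketch_port_add(const char *devname, uint16_t *port_nop)
{
    uint16_t port_no;
    int error = sketch_provider_port_add(devname, &port_no);

    if (error) {
        port_no = UINT16_MAX;
    }
    if (port_nop) {
        *port_nop = port_no;
    }
    return error;
}
```

UINT16_MAX works as a sentinel here because, as the execute documentation notes, it is currently never a valid port number.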
+/* Attempts to remove 'dpif''s port number 'port_no'. Returns 0 if successful,
+ * otherwise a positive errno value. */
int
dpif_port_del(struct dpif *dpif, uint16_t port_no)
{
- int tmp = port_no;
+ int error;
+
COVERAGE_INC(dpif_port_del);
- return do_ioctl(dpif, ODP_PORT_DEL, "ODP_PORT_DEL", &tmp);
+
+ error = dpif->class->port_del(dpif, port_no);
+ log_operation(dpif, "port_del", error);
+ return error;
}
+/* Looks up port number 'port_no' in 'dpif'. On success, returns 0 and
+ * initializes '*port' appropriately; on failure, returns a positive errno
+ * value. */
int
dpif_port_query_by_number(const struct dpif *dpif, uint16_t port_no,
struct odp_port *port)
{
- memset(port, 0, sizeof *port);
- port->port = port_no;
- if (!ioctl(dpif->fd, ODP_PORT_QUERY, port)) {
- VLOG_DBG_RL(&dpmsg_rl, "dp%u: port %"PRIu16" is device %s",
- dpif->minor, port_no, port->devname);
- return 0;
+ int error = dpif->class->port_query_by_number(dpif, port_no, port);
+ if (!error) {
+ VLOG_DBG_RL(&dpmsg_rl, "%s: port %"PRIu16" is device %s",
+ dpif_name(dpif), port_no, port->devname);
} else {
- VLOG_WARN_RL(&error_rl, "dp%u: failed to query port %"PRIu16": %s",
- dpif->minor, port_no, strerror(errno));
- return errno;
+ memset(port, 0, sizeof *port);
+ VLOG_WARN_RL(&error_rl, "%s: failed to query port %"PRIu16": %s",
+ dpif_name(dpif), port_no, strerror(error));
}
+ return error;
}
+/* Looks up port named 'devname' in 'dpif'. On success, returns 0 and
+ * initializes '*port' appropriately; on failure, returns a positive errno
+ * value. */
int
dpif_port_query_by_name(const struct dpif *dpif, const char *devname,
struct odp_port *port)
{
- memset(port, 0, sizeof *port);
- strncpy(port->devname, devname, sizeof port->devname);
- if (!ioctl(dpif->fd, ODP_PORT_QUERY, port)) {
- VLOG_DBG_RL(&dpmsg_rl, "dp%u: device %s is on port %"PRIu16,
- dpif->minor, devname, port->port);
- return 0;
+ int error = dpif->class->port_query_by_name(dpif, devname, port);
+ if (!error) {
+ VLOG_DBG_RL(&dpmsg_rl, "%s: device %s is on port %"PRIu16,
+ dpif_name(dpif), devname, port->port);
} else {
- VLOG_WARN_RL(&error_rl, "dp%u: failed to query port %s: %s",
- dpif->minor, devname, strerror(errno));
- return errno;
+ memset(port, 0, sizeof *port);
+
+ /* Log level is DBG here because all the current callers are only
+ * checking whether 'dpif' has a port named 'devname', so a missing
+ * port is not worth a warning. */
+ VLOG_DBG_RL(&error_rl, "%s: failed to query port %s: %s",
+ dpif_name(dpif), devname, strerror(error));
}
+ return error;
}
/* Looks up port number 'port_no' in 'dpif'. On success, returns 0 and copies
return error;
}
+/* Obtains a list of all the ports in 'dpif'.
+ *
+ * If successful, returns 0 and sets '*portsp' to point to an array of
+ * appropriately initialized port structures and '*n_portsp' to the number of
+ * ports in the array. The caller is responsible for freeing '*portsp' by
+ * calling free().
+ *
+ * On failure, returns a positive errno value and sets '*portsp' to NULL and
+ * '*n_portsp' to 0. */
int
dpif_port_list(const struct dpif *dpif,
- struct odp_port **ports, size_t *n_ports)
+ struct odp_port **portsp, size_t *n_portsp)
{
- struct odp_portvec pv;
- struct odp_stats stats;
+ struct odp_port *ports;
+ size_t n_ports = 0;
int error;
- do {
+ for (;;) {
+ struct odp_stats stats;
+ int retval;
+
error = dpif_get_dp_stats(dpif, &stats);
if (error) {
- goto error;
+ goto exit;
}
- *ports = xcalloc(1, stats.n_ports * sizeof **ports);
- pv.ports = *ports;
- pv.n_ports = stats.n_ports;
- error = do_ioctl(dpif, ODP_PORT_LIST, "ODP_PORT_LIST", &pv);
- if (error) {
- free(*ports);
- goto error;
+ ports = xcalloc(stats.n_ports, sizeof *ports);
+ retval = dpif->class->port_list(dpif, ports, stats.n_ports);
+ if (retval < 0) {
+ /* Hard error. */
+ error = -retval;
+ free(ports);
+ goto exit;
+ } else if (retval <= stats.n_ports) {
+ /* Success. */
+ error = 0;
+ n_ports = retval;
+ goto exit;
+ } else {
+ /* Soft error: port count increased behind our back. Try again. */
+ free(ports);
}
- } while (pv.n_ports != stats.n_ports);
- *n_ports = pv.n_ports;
- return 0;
+ }
-error:
- *ports = NULL;
- *n_ports = 0;
+exit:
+ if (error) {
+ *portsp = NULL;
+ *n_portsp = 0;
+ } else {
+ *portsp = ports;
+ *n_portsp = n_ports;
+ }
+ log_operation(dpif, "port_list", error);
return error;
}
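Both dpif_port_list() and dpif_port_group_get() rely on the same provider convention. The provider stores up to 'n' items and returns the total count available, so a return value larger than 'n' means the buffer was too small and the caller should retry. The sketch below factors that loop out against a hypothetical list callback. Unlike dpif_port_list(), which re-reads the datapath statistics before each attempt, this version retries at the size the provider reported, as dpif_port_group_get() does. All sketch_* names are illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical provider call: stores up to 'n' items in 'out' and returns
 * the total number available (possibly more than 'n'), or a negative errno
 * value on failure. */
typedef int sketch_list_fn(void *aux, int out[], int n);

/* Fetches the complete list, starting from a size guess of 'hint'.  On
 * success returns 0 and a malloc()'d array that the caller must free(). */
static int
sketch_list_all(sketch_list_fn *list, void *aux, int hint,
                int **itemsp, int *n_itemsp)
{
    int n = hint > 0 ? hint : 0;

    *itemsp = NULL;
    *n_itemsp = 0;
    for (;;) {
        /* Allocate at least one element so calloc() never sees size 0. */
        int *items = calloc(n > 0 ? n : 1, sizeof *items);
        int retval;

        if (!items) {
            return ENOMEM;
        }
        retval = list(aux, items, n);
        if (retval < 0) {
            /* Hard error from the provider. */
            free(items);
            return -retval;
        } else if (retval <= n) {
            /* Everything fit. */
            *itemsp = items;
            *n_itemsp = retval;
            return 0;
        }
        /* Soft error: more items than expected.  Retry at the new size. */
        free(items);
        n = retval;
    }
}

/* Example provider that pretends to hold items 0, 1, ..., count - 1. */
static int
sketch_counting_list(void *aux, int out[], int n)
{
    int count = *(int *) aux;
    int i;

    for (i = 0; i < n && i < count; i++) {
        out[i] = i;
    }
    return count;
}
```

The loop terminates as long as the set eventually stops growing between calls. That is the same assumption the dpif functions make.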
+/* Polls for changes in the set of ports in 'dpif'. If the set of ports in
+ * 'dpif' has changed, this function does one of the following:
+ *
+ * - Stores the name of the device that was added to or deleted from 'dpif' in
+ * '*devnamep' and returns 0. The caller is responsible for freeing
+ * '*devnamep' (with free()) when it no longer needs it.
+ *
+ * - Returns ENOBUFS and sets '*devnamep' to NULL.
+ *
+ * This function may also return 'false positives', where it returns 0 and
+ * '*devnamep' names a device that was not actually added or deleted or it
+ * returns ENOBUFS without any change.
+ *
+ * Returns EAGAIN if the set of ports in 'dpif' has not changed. May also
+ * return other positive errno values to indicate that something has gone
+ * wrong. */
int
-dpif_port_group_set(struct dpif *dpif, uint16_t group,
- const uint16_t ports[], size_t n_ports)
+dpif_port_poll(const struct dpif *dpif, char **devnamep)
{
- struct odp_port_group pg;
+ int error = dpif->class->port_poll(dpif, devnamep);
+ if (error) {
+ *devnamep = NULL;
+ }
+ return error;
+}
- COVERAGE_INC(dpif_port_group_set);
- assert(n_ports <= UINT16_MAX);
- pg.group = group;
- pg.ports = (uint16_t *) ports;
- pg.n_ports = n_ports;
- return do_ioctl(dpif, ODP_PORT_GROUP_SET, "ODP_PORT_GROUP_SET", &pg);
+/* Arranges for the poll loop to wake up when dpif_port_poll(dpif) will return
+ * a value other than EAGAIN. */
+void
+dpif_port_poll_wait(const struct dpif *dpif)
+{
+ dpif->class->port_poll_wait(dpif);
}
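The dpif_port_poll() contract has three non-error outcomes: 0 with a device name, ENOBUFS when notifications were lost, and EAGAIN when nothing has changed. The sketch below shows one plausible way a caller could drain them in a single pass. It falls back to a full rescan after ENOBUFS. The poll callback type, the scripted poller, and sketch_drain_port_changes() are hypothetical. A real caller would pass the dpif to dpif_port_poll() and then call dpif_port_poll_wait() before blocking.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical poll callback with the dpif_port_poll() return conventions. */
typedef int sketch_poll_fn(void *aux, char **devnamep);

/* Drains pending port-change notifications.  Counts each named device in
 * '*n_changed' and sets '*rescan' if any notifications were lost.  Returns 0
 * once the poller reports EAGAIN, or the first unexpected error. */
static int
sketch_drain_port_changes(sketch_poll_fn *poll_cb, void *aux,
                          int *n_changed, bool *rescan)
{
    for (;;) {
        char *devname = NULL;
        int error = poll_cb(aux, &devname);

        if (!error) {
            /* A real caller would look up 'devname' in the datapath here. */
            (*n_changed)++;
            free(devname);
        } else if (error == ENOBUFS) {
            /* Changes were lost: re-read the whole port list afterward. */
            *rescan = true;
        } else if (error == EAGAIN) {
            return 0;           /* Nothing more has changed. */
        } else {
            return error;       /* Something went wrong. */
        }
    }
}

/* Example poller that replays a fixed script of return values. */
struct sketch_script {
    const int *results;
    size_t n, next;
};

static int
sketch_scripted_poll(void *aux, char **devnamep)
{
    static const char name[] = "eth0";
    struct sketch_script *s = aux;
    int result = s->next < s->n ? s->results[s->next++] : EAGAIN;

    *devnamep = NULL;
    if (!result) {
        *devnamep = malloc(sizeof name);
        if (!*devnamep) {
            return ENOMEM;
        }
        memcpy(*devnamep, name, sizeof name);
    }
    return result;
}
```

Because false positives are permitted, a caller written this way must tolerate a reported device that turns out to be unchanged. Looking the device up again, rather than assuming it was added or deleted, handles that case.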
-/* Careful: '*n_out' can be greater than 'n_ports' on return, if 'n_ports' is
- * less than the number of ports in 'group'. */
+/* Retrieves a list of the port numbers in port group 'group' in 'dpif'.
+ *
+ * On success, returns 0 and points '*ports' to a newly allocated array of
+ * integers, each of which is a 'dpif' port number for a port in
+ * 'group'. Stores the number of elements in the array in '*n_ports'. The
+ * caller is responsible for freeing '*ports' by calling free().
+ *
+ * On failure, returns a positive errno value and sets '*ports' to NULL and
+ * '*n_ports' to 0. */
int
dpif_port_group_get(const struct dpif *dpif, uint16_t group,
- uint16_t ports[], size_t n_ports, size_t *n_out)
+ uint16_t **ports, size_t *n_ports)
+{
+ int error;
+
+ *ports = NULL;
+ *n_ports = 0;
+ for (;;) {
+ int retval = dpif->class->port_group_get(dpif, group,
+ *ports, *n_ports);
+ if (retval < 0) {
+ /* Hard error. */
+ error = -retval;
+ free(*ports);
+ *ports = NULL;
+ *n_ports = 0;
+ break;
+ } else if (retval <= *n_ports) {
+ /* Success. */
+ error = 0;
+ *n_ports = retval;
+ break;
+ } else {
+ /* Soft error: there were more ports than we expected in the
+ * group. Try again. */
+ free(*ports);
+ *ports = xcalloc(retval, sizeof **ports);
+ *n_ports = retval;
+ }
+ }
+ log_operation(dpif, "port_group_get", error);
+ return error;
+}
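The loop in dpif_port_group_get() is a grow-and-retry pattern. The provider copies as many ports as fit and reports the true total, so the wrapper keeps reallocating until everything fits. The standalone sketch below isolates that pattern. `fake_group_get()`, `group_ports`, and `group_get_all()` are hypothetical, and the explicit `(size_t)` cast avoids a signed/unsigned comparison warning.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical group contents for the fake provider below. */
static const uint16_t group_ports[] = { 1, 4, 7, 9 };

/* Hypothetical provider call with the contract the loop expects: copy up to
 * 'n' ports into 'ports' and return the total number in the group, or a
 * negative errno value. */
static int
fake_group_get(uint16_t ports[], size_t n)
{
    size_t total = sizeof group_ports / sizeof group_ports[0];
    size_t i;

    for (i = 0; i < n && i < total; i++) {
        ports[i] = group_ports[i];
    }
    return (int) total;
}

/* Grow-and-retry: start with an empty buffer, and whenever the provider
 * reports more ports than fit, reallocate to that size and ask again. */
static int
group_get_all(uint16_t **portsp, size_t *n_portsp)
{
    uint16_t *ports = NULL;
    size_t n_ports = 0;

    for (;;) {
        int retval = fake_group_get(ports, n_ports);

        if (retval < 0) {                           /* Hard error. */
            free(ports);
            *portsp = NULL;
            *n_portsp = 0;
            return -retval;
        } else if ((size_t) retval <= n_ports) {    /* Everything fit. */
            *portsp = ports;
            *n_portsp = retval;
            return 0;
        }

        /* Too small.  The group may grow again before the next call, so
         * loop instead of assuming one retry is enough. */
        free(ports);
        ports = calloc(retval, sizeof *ports);
        if (!ports) {
            *portsp = NULL;
            *n_portsp = 0;
            return ENOMEM;
        }
        n_ports = retval;
    }
}
```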
+
+/* Updates port group 'group' in 'dpif', making it contain the 'n_ports' ports
+ * whose 'dpif' port numbers are given in 'ports'. Returns 0 if
+ * successful, otherwise a positive errno value.
+ *
+ * Behavior is undefined if the values in ports[] are not unique. */
+int
+dpif_port_group_set(struct dpif *dpif, uint16_t group,
+ const uint16_t ports[], size_t n_ports)
{
- struct odp_port_group pg;
int error;
- assert(n_ports <= UINT16_MAX);
- pg.group = group;
- pg.ports = ports;
- pg.n_ports = n_ports;
- error = do_ioctl(dpif, ODP_PORT_GROUP_GET, "ODP_PORT_GROUP_GET", &pg);
- *n_out = error ? 0 : pg.n_ports;
+ COVERAGE_INC(dpif_port_group_set);
+
+ error = dpif->class->port_group_set(dpif, group, ports, n_ports);
+ log_operation(dpif, "port_group_set", error);
return error;
}
+/* Deletes all flows from 'dpif'. Returns 0 if successful, otherwise a
+ * positive errno value. */
int
dpif_flow_flush(struct dpif *dpif)
{
+ int error;
+
COVERAGE_INC(dpif_flow_flush);
- return do_ioctl(dpif, ODP_FLOW_FLUSH, "ODP_FLOW_FLUSH", NULL);
-}
-static enum vlog_level
-flow_message_log_level(int error)
-{
- return error ? VLL_WARN : VLL_DBG;
+ error = dpif->class->flow_flush(dpif);
+ log_operation(dpif, "flow_flush", error);
+ return error;
}
-static bool
-should_log_flow_message(int error)
+/* Queries 'dpif' for a flow entry matching 'flow->key'.
+ *
+ * If a flow matching 'flow->key' exists in 'dpif', stores statistics for the
+ * flow into 'flow->stats'. If 'flow->n_actions' is zero, then 'flow->actions'
+ * is ignored. If 'flow->n_actions' is nonzero, then 'flow->actions' should
+ * point to an array of the specified number of actions. At most that many of
+ * the flow's actions will be copied into that array. 'flow->n_actions' will
+ * be updated to the number of actions actually present in the flow, which may
+ * be greater than the number stored if the flow has more actions than space
+ * available in the array.
+ *
+ * If no flow matching 'flow->key' exists in 'dpif', returns ENOENT. On other
+ * failure, returns a positive errno value. */
+int
+dpif_flow_get(const struct dpif *dpif, struct odp_flow *flow)
{
- return !vlog_should_drop(THIS_MODULE, flow_message_log_level(error),
- error ? &error_rl : &dpmsg_rl);
-}
+ int error;
-static void
-log_flow_message(const struct dpif *dpif, int error,
- const char *operation,
- const flow_t *flow, const struct odp_flow_stats *stats,
- const union odp_action *actions, size_t n_actions)
-{
- struct ds ds = DS_EMPTY_INITIALIZER;
- ds_put_format(&ds, "dp%u: ", dpif->minor);
- if (error) {
- ds_put_cstr(&ds, "failed to ");
- }
- ds_put_format(&ds, "%s ", operation);
- if (error) {
- ds_put_format(&ds, "(%s) ", strerror(error));
- }
- flow_format(&ds, flow);
- if (stats) {
- ds_put_cstr(&ds, ", ");
- format_odp_flow_stats(&ds, stats);
+ COVERAGE_INC(dpif_flow_get);
+
+ check_rw_odp_flow(flow);
+ error = dpif->class->flow_get(dpif, flow, 1);
+ if (!error) {
+ error = flow->stats.error;
}
- if (actions || n_actions) {
- ds_put_cstr(&ds, ", actions:");
- format_odp_actions(&ds, actions, n_actions);
+ if (should_log_flow_message(error)) {
+ log_flow_operation(dpif, "flow_get", error, flow);
}
- vlog(THIS_MODULE, flow_message_log_level(error), "%s", ds_cstr(&ds));
- ds_destroy(&ds);
+ return error;
}
-static int
-do_flow_ioctl(const struct dpif *dpif, int cmd, struct odp_flow *flow,
- const char *operation, bool show_stats)
+/* For each flow 'flow' in the 'n' flows in 'flows':
+ *
+ * - If a flow matching 'flow->key' exists in 'dpif':
+ *
+ * Stores 0 into 'flow->stats.error' and stores statistics for the flow
+ * into 'flow->stats'.
+ *
+ * If 'flow->n_actions' is zero, then 'flow->actions' is ignored. If
+ * 'flow->n_actions' is nonzero, then 'flow->actions' should point to an
+ * array of the specified number of actions. At most that many of the
+ * flow's actions will be copied into that array. 'flow->n_actions' will
+ * be updated to the number of actions actually present in the flow, which
+ * may be greater than the number stored if the flow has more actions than
+ * space available in the array.
+ *
+ * - Flow-specific errors are indicated by a positive errno value in
+ * 'flow->stats.error'. In particular, ENOENT indicates that no flow
+ * matching 'flow->key' exists in 'dpif'. When an error value is stored, the
+ * contents of 'flow->key' are preserved but other members of 'flow' should
+ * be treated as indeterminate.
+ *
+ * Returns 0 if all 'n' flows in 'flows' were updated (each element's
+ * 'flow->stats.error' still indicates whether that particular lookup
+ * succeeded). Returns a positive errno value if an error prevented the
+ * update, in which case the caller must not depend on any elements in
+ * 'flows' being updated or not updated.
+ */
+int
+dpif_flow_get_multiple(const struct dpif *dpif,
+ struct odp_flow flows[], size_t n)
{
- int error = do_ioctl(dpif, cmd, NULL, flow);
- if (error && show_stats) {
- flow->n_actions = 0;
- }
- if (should_log_flow_message(error)) {
- log_flow_message(dpif, error, operation, &flow->key,
- show_stats && !error ? &flow->stats : NULL,
- flow->actions, flow->n_actions);
+ int error;
+ size_t i;
+
+ COVERAGE_ADD(dpif_flow_get, n);
+
+ for (i = 0; i < n; i++) {
+ check_rw_odp_flow(&flows[i]);
}
+
+ error = dpif->class->flow_get(dpif, flows, n);
+ log_operation(dpif, "flow_get_multiple", error);
return error;
}
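dpif_flow_get_multiple() reports errors at two levels: the return value covers the call as a whole, and 'flow->stats.error' covers each element. The sketch below shows a caller that honors both levels. `struct toy_flow`, `toy_table`, `toy_get_multiple()`, and `total_packets()` are hypothetical simplifications, not the real odp_flow layout.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified flow record: 'error' plays the role of
 * 'flow->stats.error' and 'n_packets' stands in for the statistics. */
struct toy_flow {
    int key;
    int error;
    unsigned long n_packets;
};

/* Hypothetical datapath contents: only keys 1 and 3 exist. */
static const struct { int key; unsigned long n_packets; } toy_table[] = {
    { 1, 10 }, { 3, 30 },
};

/* Looks up every element.  A missing flow is a per-element ENOENT, not a
 * failure of the call as a whole, so this always returns 0. */
static int
toy_get_multiple(struct toy_flow flows[], size_t n)
{
    size_t i, j;

    for (i = 0; i < n; i++) {
        flows[i].error = ENOENT;
        for (j = 0; j < sizeof toy_table / sizeof toy_table[0]; j++) {
            if (toy_table[j].key == flows[i].key) {
                flows[i].error = 0;
                flows[i].n_packets = toy_table[j].n_packets;
                break;
            }
        }
    }
    return 0;
}

/* Caller side: check the overall result first, then each element's error
 * before trusting that element's statistics. */
static unsigned long
total_packets(struct toy_flow flows[], size_t n)
{
    unsigned long total = 0;
    size_t i;

    if (toy_get_multiple(flows, n)) {
        return 0;               /* Nothing in 'flows' can be trusted. */
    }
    for (i = 0; i < n; i++) {
        if (!flows[i].error) {
            total += flows[i].n_packets;
        }
    }
    return total;
}
```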
+/* Adds or modifies a flow in 'dpif' as specified in 'put':
+ *
+ * - If the flow specified in 'put->flow' does not exist in 'dpif', then
+ * behavior depends on whether ODPPF_CREATE is specified in 'put->flags': if
+ * it is, the flow will be added, otherwise the operation will fail with
+ * ENOENT.
+ *
+ * - Otherwise, the flow specified in 'put->flow' does exist in 'dpif'.
+ * Behavior in this case depends on whether ODPPF_MODIFY is specified in
+ * 'put->flags': if it is, the flow's actions will be updated, otherwise the
+ * operation will fail with EEXIST. If the flow's actions are updated, then
+ * its statistics will be zeroed if ODPPF_ZERO_STATS is set in 'put->flags',
+ * left as-is otherwise.
+ *
+ * Returns 0 if successful, otherwise a positive errno value.
+ */
int
dpif_flow_put(struct dpif *dpif, struct odp_flow_put *put)
{
- int error = do_ioctl(dpif, ODP_FLOW_PUT, NULL, put);
+ int error;
+
COVERAGE_INC(dpif_flow_put);
+
+ error = dpif->class->flow_put(dpif, put);
if (should_log_flow_message(error)) {
- struct ds operation = DS_EMPTY_INITIALIZER;
- ds_put_cstr(&operation, "put");
- if (put->flags & ODPPF_CREATE) {
- ds_put_cstr(&operation, "[create]");
- }
- if (put->flags & ODPPF_MODIFY) {
- ds_put_cstr(&operation, "[modify]");
- }
- if (put->flags & ODPPF_ZERO_STATS) {
- ds_put_cstr(&operation, "[zero]");
- }
-#define ODPPF_ALL (ODPPF_CREATE | ODPPF_MODIFY | ODPPF_ZERO_STATS)
- if (put->flags & ~ODPPF_ALL) {
- ds_put_format(&operation, "[%x]", put->flags & ~ODPPF_ALL);
- }
- log_flow_message(dpif, error, ds_cstr(&operation), &put->flow.key,
- !error ? &put->flow.stats : NULL,
- put->flow.actions, put->flow.n_actions);
- ds_destroy(&operation);
+ log_flow_put(dpif, error, put);
}
return error;
}
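The dpif_flow_put() comment describes a small decision table over two inputs: whether a matching flow exists, and which flags are set. The sketch below encodes that table directly. The `PF_*` flag values and `decide_put()` are hypothetical stand-ins for the ODPPF_* constants, chosen only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag values for this sketch only. */
enum {
    PF_CREATE = 1 << 0,
    PF_MODIFY = 1 << 1,
    PF_ZERO_STATS = 1 << 2
};

enum put_action { PUT_FAIL, PUT_ADD, PUT_UPDATE, PUT_UPDATE_ZERO };

/* Decides what a put does, given whether a matching flow already exists.
 * Stores the resulting errno value (0, ENOENT, or EEXIST) in '*error'. */
static enum put_action
decide_put(bool exists, unsigned int flags, int *error)
{
    *error = 0;
    if (!exists) {
        if (flags & PF_CREATE) {
            return PUT_ADD;
        }
        *error = ENOENT;
        return PUT_FAIL;
    }
    if (flags & PF_MODIFY) {
        /* Statistics are zeroed only when the actions are updated. */
        return flags & PF_ZERO_STATS ? PUT_UPDATE_ZERO : PUT_UPDATE;
    }
    *error = EEXIST;
    return PUT_FAIL;
}
```

Passing both create and modify flags gives "add or update" behavior, which is convenient when the caller does not know whether the flow is already installed.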
+/* Deletes a flow matching 'flow->key' from 'dpif' or returns ENOENT if 'dpif'
+ * does not contain such a flow.
+ *
+ * If successful, updates 'flow->stats', 'flow->n_actions', and 'flow->actions'
+ * as described for dpif_flow_get(). */
int
dpif_flow_del(struct dpif *dpif, struct odp_flow *flow)
{
+ int error;
+
COVERAGE_INC(dpif_flow_del);
- check_rw_odp_flow(flow);
- memset(&flow->stats, 0, sizeof flow->stats);
- return do_flow_ioctl(dpif, ODP_FLOW_DEL, flow, "delete flow", true);
-}
-int
-dpif_flow_get(const struct dpif *dpif, struct odp_flow *flow)
-{
- COVERAGE_INC(dpif_flow_query);
check_rw_odp_flow(flow);
memset(&flow->stats, 0, sizeof flow->stats);
- return do_flow_ioctl(dpif, ODP_FLOW_GET, flow, "get flow", true);
-}
-
-int
-dpif_flow_get_multiple(const struct dpif *dpif,
- struct odp_flow flows[], size_t n)
-{
- struct odp_flowvec fv;
- size_t i;
- COVERAGE_ADD(dpif_flow_query_multiple, n);
- fv.flows = flows;
- fv.n_flows = n;
- for (i = 0; i < n; i++) {
- check_rw_odp_flow(&flows[i]);
+ error = dpif->class->flow_del(dpif, flow);
+ if (should_log_flow_message(error)) {
+ log_flow_operation(dpif, "delete flow", error, flow);
}
- return do_ioctl(dpif, ODP_FLOW_GET_MULTIPLE, "ODP_FLOW_GET_MULTIPLE",
- &fv);
+ return error;
}
+/* Stores up to 'n' flows in 'dpif' into 'flows', including their statistics
+ * but not including any information about their actions. If successful,
+ * returns 0 and sets '*n_out' to the number of flows actually stored in
+ * 'flows', which is at most 'n' and may be less than the number of flows in
+ * 'dpif' (if 'dpif' has more than 'n' flows). On failure, returns a positive
+ * errno value and sets '*n_out' to 0. */
int
dpif_flow_list(const struct dpif *dpif, struct odp_flow flows[], size_t n,
size_t *n_out)
{
- struct odp_flowvec fv;
uint32_t i;
- int error;
+ int retval;
COVERAGE_INC(dpif_flow_query_list);
- fv.flows = flows;
- fv.n_flows = n;
if (RUNNING_ON_VALGRIND) {
memset(flows, 0, n * sizeof *flows);
} else {
flows[i].n_actions = 0;
}
}
- error = do_ioctl(dpif, ODP_FLOW_LIST, NULL, &fv);
- if (error) {
+ retval = dpif->class->flow_list(dpif, flows, n);
+ if (retval < 0) {
*n_out = 0;
- VLOG_WARN_RL(&error_rl, "dp%u: flow list failed (%s)",
- dpif->minor, strerror(error));
+ VLOG_WARN_RL(&error_rl, "%s: flow list failed (%s)",
+ dpif_name(dpif), strerror(-retval));
+ return -retval;
} else {
- COVERAGE_ADD(dpif_flow_query_list_n, fv.n_flows);
- *n_out = fv.n_flows;
- VLOG_DBG_RL(&dpmsg_rl, "dp%u: listed %zu flows", dpif->minor, *n_out);
+ COVERAGE_ADD(dpif_flow_query_list_n, retval);
+ *n_out = MIN(n, retval);
+ VLOG_DBG_RL(&dpmsg_rl, "%s: listed %zu flows (of %d)",
+ dpif_name(dpif), *n_out, retval);
+ return 0;
}
- return error;
}
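In the new dpif_flow_list(), the provider returns either the total flow count or a negative errno value. The wrapper turns that into the usual positive-errno convention and clamps '*n_out' to the array size. The helper below isolates just that conversion. `convert_list_result()` is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Converts a provider's "total count, or negative errno" result into the
 * wrapper convention: a positive errno value on failure and, on success,
 * '*n_out' clamped to 'n', the number of entries that fit in the caller's
 * array. */
static int
convert_list_result(int retval, size_t n, size_t *n_out)
{
    if (retval < 0) {
        *n_out = 0;
        return -retval;
    }
    *n_out = (size_t) retval < n ? (size_t) retval : n;
    return 0;
}
```

Because of the clamp, this call alone does not tell the caller that flows were left out (the total appears only in the debug log). Callers that need every flow appear to be served by dpif_flow_list_all(), which sizes its array from the datapath statistics.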
+/* Retrieves all of the flows in 'dpif'.
+ *
+ * If successful, returns 0 and stores in '*flowsp' a pointer to a newly
+ * allocated array of flows, including their statistics but not including any
+ * information about their actions, and sets '*np' to the number of flows in
+ * '*flowsp'. The caller is responsible for freeing '*flowsp' by calling
+ * free().
+ *
+ * On failure, returns a positive errno value and sets '*flowsp' to NULL and
+ * '*np' to 0. */
int
dpif_flow_list_all(const struct dpif *dpif,
struct odp_flow **flowsp, size_t *np)
}
if (stats.n_flows != n_flows) {
- VLOG_WARN_RL(&error_rl, "dp%u: datapath stats reported %"PRIu32" "
+ VLOG_WARN_RL(&error_rl, "%s: datapath stats reported %"PRIu32" "
"flows but flow listing reported %zu",
- dpif->minor, stats.n_flows, n_flows);
+ dpif_name(dpif), stats.n_flows, n_flows);
}
*flowsp = flows;
*np = n_flows;
return 0;
}
+/* Causes 'dpif' to perform the 'n_actions' actions in 'actions' on the
+ * Ethernet frame specified in 'packet'.
+ *
+ * Pretends that the frame was originally received on the port numbered
+ * 'in_port'. This affects only ODPAT_OUTPUT_GROUP actions, which will not
+ * send a packet out their input port. Specify the number of an unused port
+ * (e.g. UINT16_MAX is currently always unused) to avoid this behavior.
+ *
+ * Returns 0 if successful, otherwise a positive errno value. */
int
dpif_execute(struct dpif *dpif, uint16_t in_port,
const union odp_action actions[], size_t n_actions,
COVERAGE_INC(dpif_execute);
if (n_actions > 0) {
- struct odp_execute execute;
- memset(&execute, 0, sizeof execute);
- execute.in_port = in_port;
- execute.actions = (union odp_action *) actions;
- execute.n_actions = n_actions;
- execute.data = buf->data;
- execute.length = buf->size;
- error = do_ioctl(dpif, ODP_EXECUTE, NULL, &execute);
+ error = dpif->class->execute(dpif, in_port, actions, n_actions, buf);
} else {
error = 0;
}
if (!(error ? VLOG_DROP_WARN(&error_rl) : VLOG_DROP_DBG(&dpmsg_rl))) {
struct ds ds = DS_EMPTY_INITIALIZER;
char *packet = ofp_packet_to_string(buf->data, buf->size, buf->size);
- ds_put_format(&ds, "dp%u: execute ", dpif->minor);
+ ds_put_format(&ds, "%s: execute ", dpif_name(dpif));
format_odp_actions(&ds, actions, n_actions);
if (error) {
ds_put_format(&ds, " failed (%s)", strerror(error));
return error;
}
+/* Retrieves 'dpif''s "listen mask" into '*listen_mask'. Each ODPL_* bit set
+ * in '*listen_mask' indicates that dpif_recv() will receive messages of that
+ * type. Returns 0 if successful, otherwise a positive errno value. */
int
-dpif_recv(struct dpif *dpif, struct ofpbuf **bufp)
+dpif_recv_get_mask(const struct dpif *dpif, int *listen_mask)
{
- struct ofpbuf *buf;
- int retval;
- int error;
-
- buf = ofpbuf_new(65536);
- retval = read(dpif->fd, ofpbuf_tail(buf), ofpbuf_tailroom(buf));
- if (retval < 0) {
- error = errno;
- if (error != EAGAIN) {
- VLOG_WARN_RL(&error_rl, "dp%u: read failed: %s",
- dpif->minor, strerror(error));
- }
- } else if (retval >= sizeof(struct odp_msg)) {
- struct odp_msg *msg = buf->data;
- if (msg->length <= retval) {
- buf->size += retval;
- if (VLOG_IS_DBG_ENABLED()) {
- void *payload = msg + 1;
- size_t length = buf->size - sizeof *msg;
- char *s = ofp_packet_to_string(payload, length, length);
- VLOG_DBG_RL(&dpmsg_rl, "dp%u: received %s message of length "
- "%zu on port %"PRIu16": %s", dpif->minor,
- (msg->type == _ODPL_MISS_NR ? "miss"
- : msg->type == _ODPL_ACTION_NR ? "action"
- : "<unknown>"),
- msg->length - sizeof(struct odp_msg),
- msg->port, s);
- free(s);
- }
- *bufp = buf;
- COVERAGE_INC(dpif_recv);
- return 0;
- } else {
- VLOG_WARN_RL(&error_rl, "dp%u: discarding message truncated "
- "from %"PRIu32" bytes to %d",
- dpif->minor, msg->length, retval);
- error = ERANGE;
- }
- } else if (!retval) {
- VLOG_WARN_RL(&error_rl, "dp%u: unexpected end of file", dpif->minor);
- error = EPROTO;
- } else {
- VLOG_WARN_RL(&error_rl,
- "dp%u: discarding too-short message (%d bytes)",
- dpif->minor, retval);
- error = ERANGE;
+ int error = dpif->class->recv_get_mask(dpif, listen_mask);
+ if (error) {
+ *listen_mask = 0;
}
-
- *bufp = NULL;
- ofpbuf_delete(buf);
+ log_operation(dpif, "recv_get_mask", error);
return error;
}
-void
-dpif_recv_wait(struct dpif *dpif)
-{
- poll_fd_wait(dpif->fd, POLLIN);
-}
-\f
-struct dpifmon {
- struct dpif dpif;
- struct nl_sock *sock;
- int local_ifindex;
-};
-
+/* Sets 'dpif''s "listen mask" to 'listen_mask'. Each ODPL_* bit set in
+ * 'listen_mask' requests that dpif_recv() receive messages of that type.
+ * Returns 0 if successful, otherwise a positive errno value. */
int
-dpifmon_create(const char *datapath_name, struct dpifmon **monp)
+dpif_recv_set_mask(struct dpif *dpif, int listen_mask)
{
- struct dpifmon *mon;
- char local_name[IFNAMSIZ];
- int error;
-
- mon = *monp = xmalloc(sizeof *mon);
-
- error = dpif_open(datapath_name, &mon->dpif);
- if (error) {
- goto error;
- }
- error = dpif_get_name(&mon->dpif, local_name, sizeof local_name);
- if (error) {
- goto error_close_dpif;
- }
-
- mon->local_ifindex = if_nametoindex(local_name);
- if (!mon->local_ifindex) {
- error = errno;
- VLOG_WARN("could not get ifindex of %s device: %s",
- local_name, strerror(errno));
- goto error_close_dpif;
- }
-
- error = nl_sock_create(NETLINK_ROUTE, RTNLGRP_LINK, 0, 0, &mon->sock);
- if (error) {
- VLOG_WARN("could not create rtnetlink socket: %s", strerror(error));
- goto error_close_dpif;
- }
-
- return 0;
-
-error_close_dpif:
- dpif_close(&mon->dpif);
-error:
- free(mon);
- *monp = NULL;
+ int error = dpif->class->recv_set_mask(dpif, listen_mask);
+ log_operation(dpif, "recv_set_mask", error);
return error;
}
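The listen mask is a bitwise OR of per-type bits, and dpif_recv_set_mask() replaces the whole mask. A caller that wants to enable one message type without disturbing the others would therefore pair it with dpif_recv_get_mask() in a read-modify-write. In this sketch, `MSG_MISS`/`MSG_ACTION` stand in for the ODPL_* constants under an assumed one-bit-per-type layout. `stored_mask`, `fake_get_mask()`, `fake_set_mask()`, and `enable_message_types()` are hypothetical.

```c
#include <assert.h>

/* Assumed layout for this sketch: one bit per message type, indexed by the
 * type number (the real constants are the ODPL_* values). */
enum { MSG_MISS_NR = 0, MSG_ACTION_NR = 1 };
#define MSG_MISS   (1 << MSG_MISS_NR)
#define MSG_ACTION (1 << MSG_ACTION_NR)

static int stored_mask;   /* Hypothetical stand-in for the datapath's mask. */

static int fake_get_mask(int *mask) { *mask = stored_mask; return 0; }
static int fake_set_mask(int mask) { stored_mask = mask; return 0; }

/* Read-modify-write: turns on 'bits' without disturbing other types. */
static int
enable_message_types(int bits)
{
    int mask;
    int error = fake_get_mask(&mask);

    return error ? error : fake_set_mask(mask | bits);
}
```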
-void
-dpifmon_destroy(struct dpifmon *mon)
+/* Attempts to receive a message from 'dpif'. If successful, stores the
+ * message into '*packetp'. The message, if one is received, will begin with
+ * 'struct odp_msg' as a header. Only messages of the types selected with
+ * dpif_recv_set_mask() will ordinarily be received (but if a message type is
+ * enabled and then later disabled, some stragglers might pop up).
+ *
+ * Returns 0 if successful, otherwise a positive errno value. Returns EAGAIN
+ * if no message is immediately available. */
+int
+dpif_recv(struct dpif *dpif, struct ofpbuf **packetp)
{
- if (mon) {
- dpif_close(&mon->dpif);
- nl_sock_destroy(mon->sock);
+ int error = dpif->class->recv(dpif, packetp);
+ if (!error) {
+ if (VLOG_IS_DBG_ENABLED()) {
+ struct ofpbuf *buf = *packetp;
+ struct odp_msg *msg = buf->data;
+ void *payload = msg + 1;
+ size_t payload_len = buf->size - sizeof *msg;
+ char *s = ofp_packet_to_string(payload, payload_len, payload_len);
+ VLOG_DBG_RL(&dpmsg_rl, "%s: received %s message of length "
+ "%zu on port %"PRIu16": %s", dpif_name(dpif),
+ (msg->type == _ODPL_MISS_NR ? "miss"
+ : msg->type == _ODPL_ACTION_NR ? "action"
+ : "<unknown>"),
+ payload_len, msg->port, s);
+ free(s);
+ }
+ } else {
+ *packetp = NULL;
}
+ return error;
}
+/* Discards all messages that would otherwise be received by dpif_recv() on
+ * 'dpif'. Returns 0 if successful, otherwise a positive errno value. */
int
-dpifmon_poll(struct dpifmon *mon, char **devnamep)
+dpif_recv_purge(struct dpif *dpif)
{
- static struct vlog_rate_limit slow_rl = VLOG_RATE_LIMIT_INIT(1, 5);
- static const struct nl_policy rtnlgrp_link_policy[] = {
- [IFLA_IFNAME] = { .type = NL_A_STRING },
- [IFLA_MASTER] = { .type = NL_A_U32, .optional = true },
- };
- struct nlattr *attrs[ARRAY_SIZE(rtnlgrp_link_policy)];
- struct ofpbuf *buf;
+ struct odp_stats stats;
+ unsigned int i;
int error;
- *devnamep = NULL;
-again:
- error = nl_sock_recv(mon->sock, &buf, false);
- switch (error) {
- case 0:
- if (!nl_policy_parse(buf, NLMSG_HDRLEN + sizeof(struct ifinfomsg),
- rtnlgrp_link_policy,
- attrs, ARRAY_SIZE(rtnlgrp_link_policy))) {
- VLOG_WARN_RL(&slow_rl, "received bad rtnl message");
- error = ENOBUFS;
- } else {
- const char *devname = nl_attr_get_string(attrs[IFLA_IFNAME]);
- bool for_us;
-
- if (attrs[IFLA_MASTER]) {
- uint32_t master_ifindex = nl_attr_get_u32(attrs[IFLA_MASTER]);
- for_us = master_ifindex == mon->local_ifindex;
- } else {
- /* It's for us if that device is one of our ports. This is
- * open-coded instead of using dpif_port_query_by_name() to
- * avoid logging a warning on failure. */
- struct odp_port port;
- memset(&port, 0, sizeof port);
- strncpy(port.devname, devname, sizeof port.devname);
- for_us = !ioctl(mon->dpif.fd, ODP_PORT_QUERY, &port);
- }
+ COVERAGE_INC(dpif_purge);
- if (!for_us) {
- /* Not for us, try again. */
- ofpbuf_delete(buf);
- COVERAGE_INC(dpifmon_poll_false_wakeup);
- goto again;
- }
- COVERAGE_INC(dpifmon_poll_changed);
- *devnamep = xstrdup(devname);
+ error = dpif_get_dp_stats(dpif, &stats);
+ if (error) {
+ return error;
+ }
+
+ for (i = 0; i < stats.max_miss_queue + stats.max_action_queue; i++) {
+ struct ofpbuf *buf;
+ error = dpif_recv(dpif, &buf);
+ if (error) {
+ return error == EAGAIN ? 0 : error;
}
ofpbuf_delete(buf);
- break;
-
- case EAGAIN:
- /* Nothing to do. */
- break;
-
- case ENOBUFS:
- VLOG_WARN_RL(&slow_rl, "dpifmon socket overflowed");
- break;
-
- default:
- VLOG_WARN_RL(&slow_rl, "error on dpifmon socket: %s", strerror(error));
- break;
}
- return error;
+ return 0;
}
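dpif_recv_purge() drains the receive queue with a bounded loop. The bound is the sum of the queue limits from the datapath statistics, and EAGAIN counts as success. The bound presumably keeps a steady stream of new messages from making the purge spin forever. The sketch below reproduces the loop shape. `queued`, `fake_recv()`, and `purge()` are hypothetical.

```c
#include <assert.h>
#include <errno.h>

static int queued;   /* Hypothetical: number of messages currently queued. */

/* Hypothetical stand-in for dpif_recv(): consumes one queued message. */
static int
fake_recv(void)
{
    if (!queued) {
        return EAGAIN;
    }
    queued--;
    return 0;
}

/* Receives and discards at most 'limit' messages.  An empty queue (EAGAIN)
 * ends the purge successfully; any other error is passed back. */
static int
purge(unsigned int limit)
{
    unsigned int i;

    for (i = 0; i < limit; i++) {
        int error = fake_recv();
        if (error) {
            return error == EAGAIN ? 0 : error;
        }
    }
    return 0;
}
```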
+/* Arranges for the poll loop to wake up when 'dpif' has a message queued to be
+ * received with dpif_recv(). */
void
-dpifmon_run(struct dpifmon *mon UNUSED)
+dpif_recv_wait(struct dpif *dpif)
{
- /* Nothing to do in this implementation. */
+ dpif->class->recv_wait(dpif);
}
+/* Obtains the NetFlow engine type and engine ID for 'dpif' into '*engine_type'
+ * and '*engine_id', respectively. */
void
-dpifmon_wait(struct dpifmon *mon)
+dpif_get_netflow_ids(const struct dpif *dpif,
+ uint8_t *engine_type, uint8_t *engine_id)
{
- nl_sock_wait(mon->sock, POLLIN);
+ *engine_type = dpif->netflow_engine_type;
+ *engine_id = dpif->netflow_engine_id;
}
\f
-static int get_openvswitch_major(void);
-static int get_major(const char *target, int default_major);
-
-static int
-lookup_minor(const char *name, unsigned int *minor)
+void
+dpif_init(struct dpif *dpif, const struct dpif_class *class, const char *name,
+ uint8_t netflow_engine_type, uint8_t netflow_engine_id)
{
- struct ethtool_drvinfo drvinfo;
- struct ifreq ifr;
- int error;
- int sock;
-
- *minor = -1;
- sock = socket(AF_INET, SOCK_DGRAM, 0);
- if (sock < 0) {
- VLOG_WARN("socket(AF_INET) failed: %s", strerror(errno));
- error = errno;
- goto error;
- }
-
- memset(&ifr, 0, sizeof ifr);
- strncpy(ifr.ifr_name, name, sizeof ifr.ifr_name);
- ifr.ifr_data = (caddr_t) &drvinfo;
-
- memset(&drvinfo, 0, sizeof drvinfo);
- drvinfo.cmd = ETHTOOL_GDRVINFO;
- if (ioctl(sock, SIOCETHTOOL, &ifr)) {
- VLOG_WARN("ioctl(SIOCETHTOOL) failed: %s", strerror(errno));
- error = errno;
- goto error_close_sock;
- }
-
- if (strcmp(drvinfo.driver, "openvswitch")) {
- VLOG_WARN("%s is not an openvswitch device", name);
- error = EOPNOTSUPP;
- goto error_close_sock;
- }
-
- if (!isdigit(drvinfo.bus_info[0])) {
- VLOG_WARN("%s ethtool info does not contain an openvswitch minor",
- name);
- error = EPROTOTYPE;
- goto error_close_sock;
- }
-
- *minor = atoi(drvinfo.bus_info);
- close(sock);
- return 0;
-
-error_close_sock:
- close(sock);
-error:
- return error;
+ dpif->class = class;
+ dpif->name = xstrdup(name);
+ dpif->netflow_engine_type = netflow_engine_type;
+ dpif->netflow_engine_id = netflow_engine_id;
}
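dpif_init() stores a pointer to a 'struct dpif_class'. The public dpif_*() functions then dispatch through that table of function pointers, normalize output arguments on failure, and log the result (as dpif_recv_get_mask() does above). The toy version below shows that pattern with a single operation. `struct toy_dpif`, `struct toy_dpif_class`, and the two provider functions are hypothetical, not the real dpif_class layout.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

struct toy_dpif;

/* Table of provider operations: one function pointer per operation. */
struct toy_dpif_class {
    int (*recv_get_mask)(const struct toy_dpif *, int *listen_mask);
};

struct toy_dpif {
    const struct toy_dpif_class *class;
    const char *name;
};

/* Generic wrapper: dispatch through the class, zero the output on failure
 * so callers never see leftovers, and log the outcome. */
static int
toy_recv_get_mask(const struct toy_dpif *dpif, int *listen_mask)
{
    int error = dpif->class->recv_get_mask(dpif, listen_mask);
    if (error) {
        *listen_mask = 0;
    }
    printf("%s: recv_get_mask %s\n", dpif->name, error ? "failed" : "success");
    return error;
}

static int
broken_get_mask(const struct toy_dpif *dpif, int *listen_mask)
{
    (void) dpif;
    *listen_mask = 0x5a;        /* Leftover value the wrapper must clear. */
    return EOPNOTSUPP;
}

static int
working_get_mask(const struct toy_dpif *dpif, int *listen_mask)
{
    (void) dpif;
    *listen_mask = 3;
    return 0;
}

static const struct toy_dpif_class broken_class = { broken_get_mask };
static const struct toy_dpif_class working_class = { working_get_mask };
```

Keeping the cleanup and logging in the wrapper means each provider only has to implement the operation itself.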
-
-static int
-make_openvswitch_device(unsigned int minor, char **fnp)
+\f
+static void
+log_operation(const struct dpif *dpif, const char *operation, int error)
{
- dev_t dev = makedev(get_openvswitch_major(), minor);
- const char dirname[] = "/dev/net";
- struct stat s;
- char fn[128];
-
- *fnp = NULL;
- sprintf(fn, "%s/dp%d", dirname, minor);
- if (!stat(fn, &s)) {
- if (!S_ISCHR(s.st_mode)) {
- VLOG_WARN_RL(&error_rl, "%s is not a character device, fixing",
- fn);
- } else if (s.st_rdev != dev) {
- VLOG_WARN_RL(&error_rl,
- "%s is device %u:%u instead of %u:%u, fixing",
- fn, major(s.st_rdev), minor(s.st_rdev),
- major(dev), minor(dev));
- } else {
- goto success;
- }
- if (unlink(fn)) {
- VLOG_WARN_RL(&error_rl, "%s: unlink failed (%s)",
- fn, strerror(errno));
- return errno;
- }
- } else if (errno == ENOENT) {
- if (stat(dirname, &s)) {
- if (errno == ENOENT) {
- if (mkdir(dirname, 0755)) {
- VLOG_WARN_RL(&error_rl, "%s: mkdir failed (%s)",
- dirname, strerror(errno));
- return errno;
- }
- } else {
- VLOG_WARN_RL(&error_rl, "%s: stat failed (%s)",
- dirname, strerror(errno));
- return errno;
- }
- }
+ if (!error) {
+ VLOG_DBG_RL(&dpmsg_rl, "%s: %s success", dpif_name(dpif), operation);
} else {
- VLOG_WARN_RL(&error_rl, "%s: stat failed (%s)", fn, strerror(errno));
- return errno;
- }
-
- /* The device needs to be created. */
- if (mknod(fn, S_IFCHR | 0700, dev)) {
- VLOG_WARN_RL(&error_rl,
- "%s: creating character device %u:%u failed (%s)",
- fn, major(dev), minor(dev), strerror(errno));
- return errno;
+ VLOG_WARN_RL(&error_rl, "%s: %s failed (%s)",
+ dpif_name(dpif), operation, strerror(error));
}
-
-success:
- *fnp = xstrdup(fn);
- return 0;
}
-
-static int
-get_openvswitch_major(void)
+static enum vlog_level
+flow_message_log_level(int error)
{
- static unsigned int openvswitch_major;
- if (!openvswitch_major) {
- enum { DEFAULT_MAJOR = 248 };
- openvswitch_major = get_major("openvswitch", DEFAULT_MAJOR);
- }
- return openvswitch_major;
+ return error ? VLL_WARN : VLL_DBG;
}
-static int
-get_major(const char *target, int default_major)
+static bool
+should_log_flow_message(int error)
{
- const char fn[] = "/proc/devices";
- char line[128];
- FILE *file;
- int ln;
-
- file = fopen(fn, "r");
- if (!file) {
- VLOG_ERR("opening %s failed (%s)", fn, strerror(errno));
- goto error;
- }
-
- for (ln = 1; fgets(line, sizeof line, file); ln++) {
- char name[64];
- int major;
-
- if (!strncmp(line, "Character", 9) || line[0] == '\0') {
- /* Nothing to do. */
- } else if (!strncmp(line, "Block", 5)) {
- /* We only want character devices, so skip the rest of the file. */
- break;
- } else if (sscanf(line, "%d %63s", &major, name)) {
- if (!strcmp(name, target)) {
- fclose(file);
- return major;
- }
- } else {
- static bool warned;
- if (!warned) {
- VLOG_WARN("%s:%d: syntax error", fn, ln);
- }
- warned = true;
- }
- }
-
- VLOG_ERR("%s: %s major not found (is the module loaded?), using "
- "default major %d", fn, target, default_major);
-error:
- VLOG_INFO("using default major %d for %s", default_major, target);
- return default_major;
+ return !vlog_should_drop(THIS_MODULE, flow_message_log_level(error),
+ error ? &error_rl : &dpmsg_rl);
}
-static int
-name_to_minor(const char *name, unsigned int *minor)
+static void
+log_flow_message(const struct dpif *dpif, int error, const char *operation,
+ const flow_t *flow, const struct odp_flow_stats *stats,
+ const union odp_action *actions, size_t n_actions)
{
- if (!get_minor_from_name(name, minor)) {
- return 0;
+ struct ds ds = DS_EMPTY_INITIALIZER;
+ ds_put_format(&ds, "%s: ", dpif_name(dpif));
+ if (error) {
+ ds_put_cstr(&ds, "failed to ");
+ }
+ ds_put_format(&ds, "%s ", operation);
+ if (error) {
+ ds_put_format(&ds, "(%s) ", strerror(error));
+ }
+ flow_format(&ds, flow);
+ if (stats) {
+ ds_put_cstr(&ds, ", ");
+ format_odp_flow_stats(&ds, stats);
}
- return lookup_minor(name, minor);
+ if (actions || n_actions) {
+ ds_put_cstr(&ds, ", actions:");
+ format_odp_actions(&ds, actions, n_actions);
+ }
+ vlog(THIS_MODULE, flow_message_log_level(error), "%s", ds_cstr(&ds));
+ ds_destroy(&ds);
}
-static int
-get_minor_from_name(const char *name, unsigned int *minor)
+static void
+log_flow_operation(const struct dpif *dpif, const char *operation, int error,
+ struct odp_flow *flow)
{
- if (!strncmp(name, "dp", 2) && isdigit(name[2])) {
- *minor = atoi(name + 2);
- return 0;
- } else if (!strncmp(name, "nl:", 3) && isdigit(name[3])) {
- /* This is for compatibility only and will be dropped. */
- *minor = atoi(name + 3);
- return 0;
- } else {
- return EINVAL;
+ if (error) {
+ flow->n_actions = 0;
}
+ log_flow_message(dpif, error, operation, &flow->key,
+ !error ? &flow->stats : NULL,
+ flow->actions, flow->n_actions);
}
-static int
-open_by_minor(unsigned int minor, struct dpif *dpif)
+static void
+log_flow_put(struct dpif *dpif, int error, const struct odp_flow_put *put)
{
- int error;
- char *fn;
- int fd;
+ enum { ODPPF_ALL = ODPPF_CREATE | ODPPF_MODIFY | ODPPF_ZERO_STATS };
+ struct ds s;
- dpif->minor = -1;
- dpif->fd = -1;
- error = make_openvswitch_device(minor, &fn);
- if (error) {
- return error;
+ ds_init(&s);
+ ds_put_cstr(&s, "put");
+ if (put->flags & ODPPF_CREATE) {
+ ds_put_cstr(&s, "[create]");
}
-
- fd = open(fn, O_RDONLY | O_NONBLOCK);
- if (fd < 0) {
- error = errno;
- VLOG_WARN("%s: open failed (%s)", fn, strerror(error));
- free(fn);
- return error;
+ if (put->flags & ODPPF_MODIFY) {
+ ds_put_cstr(&s, "[modify]");
}
-
- free(fn);
- dpif->minor = minor;
- dpif->fd = fd;
- return 0;
+ if (put->flags & ODPPF_ZERO_STATS) {
+ ds_put_cstr(&s, "[zero]");
+ }
+ if (put->flags & ~ODPPF_ALL) {
+ ds_put_format(&s, "[%x]", put->flags & ~ODPPF_ALL);
+ }
+ log_flow_message(dpif, error, ds_cstr(&s), &put->flow.key,
+ !error ? &put->flow.stats : NULL,
+ put->flow.actions, put->flow.n_actions);
+ ds_destroy(&s);
}
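log_flow_put() builds an operation label such as "put[create][modify]". It adds one tag per known flag and shows any unrecognized bits in hex. The standalone approximation below uses snprintf() in place of the dynamic-string helpers. The `PF_*` values and `format_put_flags()` are hypothetical stand-ins for the ODPPF_* constants.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag values for this sketch only. */
enum {
    PF_CREATE = 1 << 0,
    PF_MODIFY = 1 << 1,
    PF_ZERO_STATS = 1 << 2,
    PF_ALL = PF_CREATE | PF_MODIFY | PF_ZERO_STATS
};

/* Writes "put", one bracketed tag per known flag and, if any unknown bits
 * are set, their value in hex.  Output is truncated to fit 'size'. */
static void
format_put_flags(unsigned int flags, char *buf, size_t size)
{
    int len = snprintf(buf, size, "put%s%s%s",
                       flags & PF_CREATE ? "[create]" : "",
                       flags & PF_MODIFY ? "[modify]" : "",
                       flags & PF_ZERO_STATS ? "[zero]" : "");

    if ((flags & ~PF_ALL) != 0 && len >= 0 && (size_t) len < size) {
        snprintf(buf + len, size - len, "[%x]", flags & ~PF_ALL);
    }
}
```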
-\f
+
/* There is a tendency to construct odp_flow objects on the stack and to
* forget to properly initialize their "actions" and "n_actions" members.
* When this happens, we get memory corruption because the kernel
#include <stddef.h>
#include <stdint.h>
+struct dpif;
struct ofpbuf;
struct svec;
-/* A datapath interface. Opaque. */
-struct dpif {
- unsigned int minor; /* For use in error messages. */
- int fd;
-};
-
+void dp_run(void);
+void dp_wait(void);
int dp_enumerate(struct svec *);
-int dpif_open(const char *name, struct dpif *);
-int dpif_create(const char *name, struct dpif *);
+int dpif_open(const char *name, struct dpif **);
+int dpif_create(const char *name, struct dpif **);
void dpif_close(struct dpif *);
-static inline unsigned int dpif_id(const struct dpif *dpif);
-int dpif_get_name(struct dpif *, char *name, size_t name_size);
+const char *dpif_name(const struct dpif *);
+int dpif_get_all_names(const struct dpif *, struct svec *);
int dpif_delete(struct dpif *);
int dpif_get_drop_frags(const struct dpif *, bool *drop_frags);
int dpif_set_drop_frags(struct dpif *, bool drop_frags);
-int dpif_get_listen_mask(const struct dpif *, int *listen_mask);
-int dpif_set_listen_mask(struct dpif *, int listen_mask);
-int dpif_purge(struct dpif *);
-
-int dpif_port_add(struct dpif *, const char *devname, uint16_t port_no,
- uint16_t flags);
+int dpif_port_add(struct dpif *, const char *devname, uint16_t flags,
+ uint16_t *port_no);
int dpif_port_del(struct dpif *, uint16_t port_no);
int dpif_port_query_by_number(const struct dpif *, uint16_t port_no,
struct odp_port *);
int dpif_port_query_by_name(const struct dpif *, const char *devname,
struct odp_port *);
-int dpif_port_get_name(struct dpif *dpif, uint16_t port_no,
+int dpif_port_get_name(struct dpif *, uint16_t port_no,
char *name, size_t name_size);
int dpif_port_list(const struct dpif *, struct odp_port **, size_t *n_ports);
+int dpif_port_poll(const struct dpif *, char **devnamep);
+void dpif_port_poll_wait(const struct dpif *);
+
+int dpif_port_group_get(const struct dpif *, uint16_t group,
+ uint16_t **ports, size_t *n_ports);
int dpif_port_group_set(struct dpif *, uint16_t group,
const uint16_t ports[], size_t n_ports);
-int dpif_port_group_get(const struct dpif *, uint16_t group,
- uint16_t ports[], size_t n_ports, size_t *n_out);
int dpif_flow_flush(struct dpif *);
int dpif_flow_put(struct dpif *, struct odp_flow_put *);
const union odp_action[], size_t n_actions,
const struct ofpbuf *);
+int dpif_recv_get_mask(const struct dpif *, int *listen_mask);
+int dpif_recv_set_mask(struct dpif *, int listen_mask);
int dpif_recv(struct dpif *, struct ofpbuf **);
+int dpif_recv_purge(struct dpif *);
void dpif_recv_wait(struct dpif *);
-static inline unsigned int
-dpif_id(const struct dpif *dpif)
-{
- return dpif->minor;
-}
-\f
-struct dpifmon;
-
-int dpifmon_create(const char *datapath_name, struct dpifmon **);
-void dpifmon_destroy(struct dpifmon *);
-
-int dpifmon_poll(struct dpifmon *, char **devnamep);
-
-void dpifmon_run(struct dpifmon *);
-void dpifmon_wait(struct dpifmon *);
+void dpif_get_netflow_ids(const struct dpif *,
+ uint8_t *engine_type, uint8_t *engine_id);
#endif /* dpif.h */
The name of the network device associated with the datapath's local
port. (\fB\*(PN\fR internally converts this into a datapath number,
as above.)
-
-.TP
-\fBnl:\fIN\fR
-This is an obsolete synonym for \fBdp\fIN\fR.
-.RE
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
+#include "shash.h"
#include "util.h"
+#define THIS_MODULE VLM_fatal_signal
+#include "vlog.h"
+
/* Signals to catch. */
static const int fatal_signals[] = { SIGTERM, SIGINT, SIGHUP, SIGALRM };
/* Registers 'hook' to be called when a process termination signal is raised.
* If 'run_at_exit' is true, 'hook' is also called during normal process
- * termination, e.g. when exit() is called or when main() returns. */
+ * termination, e.g. when exit() is called or when main() returns.
+ *
+ * 'func' will be invoked from an asynchronous signal handler, so it must be
+ * written appropriately. For example, it must not call most C library
+ * functions, including malloc() or free(). */
void
fatal_signal_add_hook(void (*func)(void *aux), void *aux, bool run_at_exit)
{
}
}
\f
-static char **files;
-static size_t n_files, max_files;
+static struct shash files = SHASH_INITIALIZER(&files);
static void unlink_files(void *aux);
static void do_unlink_files(void);
}
fatal_signal_block();
- if (n_files >= max_files) {
- files = x2nrealloc(files, &max_files, sizeof *files);
+ if (!shash_find(&files, file)) {
+ shash_add(&files, file, NULL);
}
- files[n_files++] = xstrdup(file);
fatal_signal_unblock();
}
void
fatal_signal_remove_file_to_unlink(const char *file)
{
- size_t i;
+ struct shash_node *node;
fatal_signal_block();
- for (i = 0; i < n_files; i++) {
- if (!strcmp(files[i], file)) {
- free(files[i]);
- files[i] = files[--n_files];
- break;
- }
+ node = shash_find(&files, file);
+ if (node) {
+ shash_delete(&files, node);
}
fatal_signal_unblock();
}
+/* Like fatal_signal_remove_file_to_unlink(), but also unlinks 'file'.
+ * Returns 0 if successful, otherwise a positive errno value. */
+int
+fatal_signal_unlink_file_now(const char *file)
+{
+ int error = unlink(file) ? errno : 0;
+ if (error) {
+ VLOG_WARN("could not unlink \"%s\" (%s)", file, strerror(error));
+ }
+
+ fatal_signal_remove_file_to_unlink(file);
+
+ return error;
+}
+
static void
unlink_files(void *aux UNUSED)
{
do_unlink_files();
}
+/* This is a fatal_signal_add_hook() callback (via unlink_files()). It will be
+ * invoked from an asynchronous signal handler, so it cannot call most C
+ * library functions (unlink() is an explicit exception, see
+ * http://www.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_04.html).
+ * That includes free(), so it doesn't try to free the 'files' data
+ * structure. */
static void
do_unlink_files(void)
{
- size_t i;
+ struct shash_node *node;
- for (i = 0; i < n_files; i++) {
- unlink(files[i]);
+ SHASH_FOR_EACH (node, &files) {
+ unlink(node->name);
}
}
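The hook comments above say that code run from a fatal-signal handler may call unlink() but not malloc() or free(). Here is a standalone sketch of that constraint. It assumes a fixed-size static table instead of the shash used here, and `register_file()`, `unlink_registered_files()`, `MAX_FILES`, and `MAX_PATH_LEN` are all hypothetical. The hook only walks preallocated storage and calls unlink():

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MAX_FILES 16      /* Hypothetical fixed capacity. */
#define MAX_PATH_LEN 256  /* Hypothetical per-path limit. */

static char files[MAX_FILES][MAX_PATH_LEN];
static size_t n_files;

/* Registers 'file' for removal at exit.  Runs in normal context.
 * Returns 0 on success, or -1 if the table is full or the path is too long. */
static int
register_file(const char *file)
{
    size_t len = strlen(file);

    if (n_files >= MAX_FILES || len >= MAX_PATH_LEN) {
        return -1;
    }
    memcpy(files[n_files], file, len + 1);
    n_files++;
    return 0;
}

/* Hook body that is safe to run from a signal handler: it reads only static
 * storage and calls only unlink(), which POSIX lists as async-signal-safe.
 * It frees nothing. */
static void
unlink_registered_files(void)
{
    size_t i;

    for (i = 0; i < n_files; i++) {
        unlink(files[i]);
    }
}
```

Real code must also keep the handler from observing a half-updated table. fatal_signal_block() and fatal_signal_unblock() provide that protection around the shash changes above.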
\f
/*
- * Copyright (c) 2008 Nicira Networks.
+ * Copyright (c) 2008, 2009 Nicira Networks.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* exit(). */
void fatal_signal_add_file_to_unlink(const char *);
void fatal_signal_remove_file_to_unlink(const char *);
+int fatal_signal_unlink_file_now(const char *);
/* Interface for other code that catches one of our signals and needs to pass
* it through. */
return hmap->n;
}
+/* Returns the maximum number of nodes that 'hmap' may hold before it should be
+ * rehashed. */
+static inline size_t
+hmap_capacity(const struct hmap *hmap)
+{
+ return hmap->mask * 2 + 1;
+}
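hmap_capacity() allows just under two nodes per bucket before a rehash. With `mask + 1` buckets the limit is `mask * 2 + 1`, so 8 buckets (mask 7) hold up to 15 nodes. The sketch below shows that arithmetic and assumes a power-of-two bucket count. `capacity_for_mask()` and `needs_expand()` are hypothetical helpers, not part of hmap.h:

```c
#include <stdbool.h>
#include <stddef.h>

/* Same formula as hmap_capacity(): (mask + 1) buckets, about 2 nodes each. */
static size_t
capacity_for_mask(size_t mask)
{
    return mask * 2 + 1;
}

/* Hypothetical helper: true if a table holding 'n' nodes should grow. */
static bool
needs_expand(size_t n, size_t mask)
{
    return n > capacity_for_mask(mask);
}
```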
+
/* Returns true if 'hmap' currently contains no nodes,
* false otherwise. */
static inline bool
--- /dev/null
+/*
+ * Copyright (c) 2009 Nicira Networks.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <config.h>
+#include <assert.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <arpa/inet.h>
+#include <inttypes.h>
+#include <linux/if_tun.h>
+#include <linux/types.h>
+#include <linux/ethtool.h>
+#include <linux/rtnetlink.h>
+#include <linux/sockios.h>
+#include <linux/version.h>
+#include <sys/types.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <netpacket/packet.h>
+#include <net/ethernet.h>
+#include <net/if.h>
+#include <net/if_arp.h>
+#include <net/if_packet.h>
+#include <net/route.h>
+#include <netinet/in.h>
+#include <poll.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+#include "coverage.h"
+#include "dynamic-string.h"
+#include "fatal-signal.h"
+#include "netdev-provider.h"
+#include "netlink.h"
+#include "ofpbuf.h"
+#include "openflow/openflow.h"
+#include "packets.h"
+#include "poll-loop.h"
+#include "rtnetlink.h"
+#include "socket-util.h"
+#include "shash.h"
+#include "svec.h"
+
+#define THIS_MODULE VLM_netdev_linux
+#include "vlog.h"
+\f
+/* These were introduced in Linux 2.6.14, so they might be missing if we have
+ * old headers. */
+#ifndef ADVERTISED_Pause
+#define ADVERTISED_Pause (1 << 13)
+#endif
+#ifndef ADVERTISED_Asym_Pause
+#define ADVERTISED_Asym_Pause (1 << 14)
+#endif
+
+struct netdev_linux {
+ struct netdev netdev;
+
+ /* File descriptors. For ordinary network devices, the two fds below are
+ * the same; for tap devices, they differ. */
+ int netdev_fd; /* Network device. */
+ int tap_fd; /* TAP character device, if any, otherwise the
+ * network device. */
+
+ struct netdev_linux_cache *cache;
+};
+
+enum {
+ VALID_IFINDEX = 1 << 0,
+ VALID_ETHERADDR = 1 << 1,
+ VALID_IN4 = 1 << 2,
+ VALID_IN6 = 1 << 3,
+ VALID_MTU = 1 << 4,
+ VALID_CARRIER = 1 << 5,
+ VALID_IS_INTERNAL = 1 << 6
+};
+
+/* Cached network device information. */
+struct netdev_linux_cache {
+ struct shash_node *shash_node;
+ unsigned int valid;
+ int ref_cnt;
+
+ int ifindex;
+ uint8_t etheraddr[ETH_ADDR_LEN];
+ struct in_addr address, netmask;
+ struct in6_addr in6;
+ int mtu;
+ int carrier;
+ bool is_internal;
+};
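Each `VALID_*` bit records whether the matching member of struct netdev_linux_cache is current. A getter fills the member on a miss and sets the bit, and netdev_linux_cache_cb() clears `valid` when rtnetlink reports a change. The sketch below shows that pattern in isolation. `struct mini_cache`, `cache_get_mtu()`, `cache_invalidate()`, and `fetch_mtu()` are hypothetical, and `fetch_mtu()` stands in for the SIOCGIFMTU ioctl:

```c
enum { VALID_MTU = 1 << 4 };     /* Same bit value as in the enum above. */

struct mini_cache {
    unsigned int valid;          /* Bitmap of VALID_* flags. */
    int mtu;
};

static int n_fetches;            /* Counts simulated ioctl calls. */

/* Hypothetical stand-in for the SIOCGIFMTU ioctl. */
static int
fetch_mtu(int *mtup)
{
    n_fetches++;
    *mtup = 1500;
    return 0;
}

/* Returns the cached MTU, fetching it only if VALID_MTU is clear. */
static int
cache_get_mtu(struct mini_cache *cache, int *mtup)
{
    if (!(cache->valid & VALID_MTU)) {
        int error = fetch_mtu(&cache->mtu);
        if (error) {
            return error;
        }
        cache->valid |= VALID_MTU;
    }
    *mtup = cache->mtu;
    return 0;
}

/* What the rtnetlink callback does on a change: drop every cached member. */
static void
cache_invalidate(struct mini_cache *cache)
{
    cache->valid = 0;
}
```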
+
+static struct shash cache_map = SHASH_INITIALIZER(&cache_map);
+static struct rtnetlink_notifier netdev_linux_cache_notifier;
+
+/* An AF_INET socket (used for ioctl operations). */
+static int af_inet_sock = -1;
+
+struct netdev_linux_notifier {
+ struct netdev_notifier notifier;
+ struct list node;
+};
+
+static struct shash netdev_linux_notifiers =
+ SHASH_INITIALIZER(&netdev_linux_notifiers);
+static struct rtnetlink_notifier netdev_linux_poll_notifier;
+
+/* This is set pretty low because we probably won't learn anything from the
+ * additional log messages. */
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
+
+static int netdev_linux_do_ethtool(struct netdev *, struct ethtool_cmd *,
+ int cmd, const char *cmd_name);
+static int netdev_linux_do_ioctl(const struct netdev *, struct ifreq *,
+ int cmd, const char *cmd_name);
+static int netdev_linux_get_ipv4(const struct netdev *, struct in_addr *,
+ int cmd, const char *cmd_name);
+static int get_flags(const struct netdev *, int *flagsp);
+static int set_flags(struct netdev *, int flags);
+static int do_get_ifindex(const char *netdev_name);
+static int get_ifindex(const struct netdev *, int *ifindexp);
+static int do_set_addr(struct netdev *netdev,
+ int ioctl_nr, const char *ioctl_name,
+ struct in_addr addr);
+static int get_etheraddr(const char *netdev_name, uint8_t ea[ETH_ADDR_LEN]);
+static int set_etheraddr(const char *netdev_name, int hwaddr_family,
+ const uint8_t[ETH_ADDR_LEN]);
+static int get_stats_via_netlink(int ifindex, struct netdev_stats *stats);
+static int get_stats_via_proc(const char *netdev_name, struct netdev_stats *stats);
+
+static struct netdev_linux *
+netdev_linux_cast(const struct netdev *netdev)
+{
+ netdev_assert_class(netdev, &netdev_linux_class);
+ return CONTAINER_OF(netdev, struct netdev_linux, netdev);
+}
+
+static int
+netdev_linux_init(void)
+{
+ static int status = -1;
+ if (status < 0) {
+ af_inet_sock = socket(AF_INET, SOCK_DGRAM, 0);
+ status = af_inet_sock >= 0 ? 0 : errno;
+ if (status) {
+ VLOG_ERR("failed to create inet socket: %s", strerror(status));
+ }
+ }
+ return status;
+}
+
+static void
+netdev_linux_run(void)
+{
+ rtnetlink_notifier_run();
+}
+
+static void
+netdev_linux_wait(void)
+{
+ rtnetlink_notifier_wait();
+}
+
+static void
+netdev_linux_cache_cb(const struct rtnetlink_change *change,
+ void *aux UNUSED)
+{
+ struct netdev_linux_cache *cache;
+ if (change) {
+ cache = shash_find_data(&cache_map, change->ifname);
+ if (cache) {
+ cache->valid = 0;
+ }
+ } else {
+ struct shash_node *node;
+ SHASH_FOR_EACH (node, &cache_map) {
+ cache = node->data;
+ cache->valid = 0;
+ }
+ }
+}
+
+static int
+netdev_linux_open(const char *name, char *suffix, int ethertype,
+ struct netdev **netdevp)
+{
+ struct netdev_linux *netdev;
+ enum netdev_flags flags;
+ int error;
+
+ /* Allocate network device. */
+ netdev = xcalloc(1, sizeof *netdev);
+ netdev_init(&netdev->netdev, suffix, &netdev_linux_class);
+ netdev->netdev_fd = -1;
+ netdev->tap_fd = -1;
+ netdev->cache = shash_find_data(&cache_map, suffix);
+ if (!netdev->cache) {
+ if (shash_is_empty(&cache_map)) {
+ error = rtnetlink_notifier_register(
+ &netdev_linux_cache_notifier, netdev_linux_cache_cb, NULL);
+ if (error) {
+ netdev_close(&netdev->netdev);
+ return error;
+ }
+ }
+ netdev->cache = xmalloc(sizeof *netdev->cache);
+ netdev->cache->shash_node = shash_add(&cache_map, suffix,
+ netdev->cache);
+ netdev->cache->valid = 0;
+ netdev->cache->ref_cnt = 0;
+ }
+ netdev->cache->ref_cnt++;
+
+ if (!strncmp(name, "tap:", 4)) {
+ static const char tap_dev[] = "/dev/net/tun";
+ struct ifreq ifr;
+
+ /* Open tap device. */
+ netdev->tap_fd = open(tap_dev, O_RDWR);
+ if (netdev->tap_fd < 0) {
+ error = errno;
+ VLOG_WARN("opening \"%s\" failed: %s", tap_dev, strerror(error));
+ goto error;
+ }
+
+ /* Create tap device. */
+ ifr.ifr_flags = IFF_TAP | IFF_NO_PI;
+ strncpy(ifr.ifr_name, suffix, sizeof ifr.ifr_name);
+ if (ioctl(netdev->tap_fd, TUNSETIFF, &ifr) == -1) {
+ error = errno;
+ VLOG_WARN("%s: creating tap device failed: %s", suffix,
+ strerror(error));
+ goto error;
+ }
+
+ /* Make non-blocking. */
+ error = set_nonblocking(netdev->tap_fd);
+ if (error) {
+ goto error;
+ }
+ }
+
+ error = netdev_get_flags(&netdev->netdev, &flags);
+ if (error == ENODEV) {
+ goto error;
+ }
+
+ if (netdev->tap_fd >= 0 || ethertype != NETDEV_ETH_TYPE_NONE) {
+ struct sockaddr_ll sll;
+ int protocol;
+ int ifindex;
+
+ /* Create file descriptor. */
+ protocol = (ethertype == NETDEV_ETH_TYPE_ANY ? ETH_P_ALL
+ : ethertype == NETDEV_ETH_TYPE_802_2 ? ETH_P_802_2
+ : ethertype);
+ netdev->netdev_fd = socket(PF_PACKET, SOCK_RAW, htons(protocol));
+ if (netdev->netdev_fd < 0) {
+ error = errno;
+ goto error;
+ }
+ if (netdev->tap_fd < 0) {
+ netdev->tap_fd = netdev->netdev_fd;
+ }
+
+ /* Set non-blocking mode. */
+ error = set_nonblocking(netdev->netdev_fd);
+ if (error) {
+ goto error;
+ }
+
+ /* Get ethernet device index. */
+ error = get_ifindex(&netdev->netdev, &ifindex);
+ if (error) {
+ goto error;
+ }
+
+ /* Bind to specific ethernet device. */
+ memset(&sll, 0, sizeof sll);
+ sll.sll_family = AF_PACKET;
+ sll.sll_ifindex = ifindex;
+ if (bind(netdev->netdev_fd,
+ (struct sockaddr *) &sll, sizeof sll) < 0) {
+ error = errno;
+ VLOG_ERR("bind to %s failed: %s", suffix, strerror(error));
+ goto error;
+ }
+
+ /* Between the socket() and bind() calls above, the socket receives all
+ * packets of the requested type on all system interfaces. We do not
+ * want to receive that data, but there is no way to avoid it. So we
+ * must now drain out the receive queue. */
+ error = drain_rcvbuf(netdev->netdev_fd);
+ if (error) {
+ goto error;
+ }
+ }
+
+ *netdevp = &netdev->netdev;
+ return 0;
+
+error:
+ netdev_close(&netdev->netdev);
+ return error;
+}
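netdev_linux_open() copies the device name into `ifr.ifr_name` with strncpy(). If the name is IFNAMSIZ bytes or longer, the buffer ends up without a terminating null byte. One way to guard against that is to reject names that do not fit before issuing the ioctl. The sketch below uses a hypothetical `set_ifr_name()` helper that is not part of this patch:

```c
#define _DEFAULT_SOURCE          /* Exposes struct ifreq on glibc. */
#include <errno.h>
#include <net/if.h>
#include <string.h>

/* Hypothetical helper: zeroes 'ifr' and copies 'name' into 'ifr->ifr_name'.
 * Rejects names that would not fit together with their null terminator. */
static int
set_ifr_name(struct ifreq *ifr, const char *name)
{
    size_t len = strlen(name);

    if (len >= sizeof ifr->ifr_name) {
        return ENAMETOOLONG;
    }
    memset(ifr, 0, sizeof *ifr);
    memcpy(ifr->ifr_name, name, len + 1);
    return 0;
}
```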
+
+/* Closes and destroys 'netdev'. */
+static void
+netdev_linux_close(struct netdev *netdev_)
+{
+ struct netdev_linux *netdev = netdev_linux_cast(netdev_);
+
+ if (netdev->cache && !--netdev->cache->ref_cnt) {
+ shash_delete(&cache_map, netdev->cache->shash_node);
+ free(netdev->cache);
+
+ if (shash_is_empty(&cache_map)) {
+ rtnetlink_notifier_unregister(&netdev_linux_cache_notifier);
+ }
+ }
+ if (netdev->netdev_fd >= 0) {
+ close(netdev->netdev_fd);
+ }
+ if (netdev->tap_fd >= 0 && netdev->netdev_fd != netdev->tap_fd) {
+ close(netdev->tap_fd);
+ }
+ free(netdev);
+}
+
+/* Initializes 'svec' with a list of the names of all known network devices. */
+static int
+netdev_linux_enumerate(struct svec *svec)
+{
+ struct if_nameindex *names;
+
+ names = if_nameindex();
+ if (names) {
+ size_t i;
+
+ for (i = 0; names[i].if_name != NULL; i++) {
+ svec_add(svec, names[i].if_name);
+ }
+ if_freenameindex(names);
+ return 0;
+ } else {
+ int error = errno;
+ VLOG_WARN("could not obtain list of network device names: %s",
+ strerror(error));
+ return error;
+ }
+}
+
+static int
+netdev_linux_recv(struct netdev *netdev_, void *data, size_t size)
+{
+ struct netdev_linux *netdev = netdev_linux_cast(netdev_);
+
+ if (netdev->tap_fd < 0) {
+ /* Device was opened with NETDEV_ETH_TYPE_NONE. */
+ return -EAGAIN;
+ }
+
+ for (;;) {
+ ssize_t retval = read(netdev->tap_fd, data, size);
+ if (retval >= 0) {
+ return retval;
+ } else if (errno != EINTR) {
+ int error = errno;
+ if (error != EAGAIN) {
+ VLOG_WARN_RL(&rl, "error receiving Ethernet packet on %s: %s",
+ netdev_get_name(netdev_), strerror(error));
+ }
+ return -error;
+ }
+ }
+}
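netdev_linux_recv() retries read() on EINTR and reports other failures as a negative errno value. A logging call made between the failed read() and the return may overwrite errno, so this standalone sketch of the same loop copies errno into a local first. `recv_retrying()` and `make_nonblocking()` are hypothetical names. A pipe stands in for the packet socket so the behavior can be tried without privileges:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Puts 'fd' into nonblocking mode.  Returns 0 or a positive errno value. */
static int
make_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        return errno;
    }
    return 0;
}

/* Reads one packet from 'fd' into 'data' and retries if a signal interrupts
 * the read.  Returns the number of bytes read, or a negative errno value
 * (for example -EAGAIN when a nonblocking 'fd' has nothing queued). */
static ssize_t
recv_retrying(int fd, void *data, size_t size)
{
    for (;;) {
        ssize_t retval = read(fd, data, size);
        if (retval >= 0) {
            return retval;
        } else {
            int error = errno;  /* Copy before anything can overwrite it. */
            if (error != EINTR) {
                return -error;
            }
        }
    }
}
```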
+
+/* Registers with the poll loop to wake up from the next call to poll_block()
+ * when a packet is ready to be received with netdev_recv() on 'netdev'. */
+static void
+netdev_linux_recv_wait(struct netdev *netdev_)
+{
+ struct netdev_linux *netdev = netdev_linux_cast(netdev_);
+ if (netdev->tap_fd >= 0) {
+ poll_fd_wait(netdev->tap_fd, POLLIN);
+ }
+}
+
+/* Discards all packets waiting to be received from 'netdev'. */
+static int
+netdev_linux_drain(struct netdev *netdev_)
+{
+ struct netdev_linux *netdev = netdev_linux_cast(netdev_);
+ if (netdev->tap_fd < 0 && netdev->netdev_fd < 0) {
+ return 0;
+ } else if (netdev->tap_fd != netdev->netdev_fd) {
+ struct ifreq ifr;
+ int error = netdev_linux_do_ioctl(netdev_, &ifr,
+ SIOCGIFTXQLEN, "SIOCGIFTXQLEN");
+ if (error) {
+ return error;
+ }
+ drain_fd(netdev->tap_fd, ifr.ifr_qlen);
+ return 0;
+ } else {
+ return drain_rcvbuf(netdev->netdev_fd);
+ }
+}
+
+/* Sends the 'size' bytes in 'data' on 'netdev'. Returns 0 if successful,
+ * otherwise a positive errno value. Returns EAGAIN without blocking if the
+ * packet cannot be queued immediately. Returns EMSGSIZE if a partial packet
+ * was transmitted or if the packet is too big or too small to transmit on
+ * the device.
+ *
+ * The caller retains ownership of 'data' in all cases.
+ *
+ * The kernel maintains a packet transmission queue, so the caller is not
+ * expected to do additional queuing of packets. */
+static int
+netdev_linux_send(struct netdev *netdev_, const void *data, size_t size)
+{
+ struct netdev_linux *netdev = netdev_linux_cast(netdev_);
+
+ /* XXX should support sending even if 'ethertype' was NETDEV_ETH_TYPE_NONE.
+ */
+ if (netdev->tap_fd < 0) {
+ return EPIPE;
+ }
+
+ for (;;) {
+ ssize_t retval = write(netdev->tap_fd, data, size);
+ if (retval < 0) {
+ int error = errno;
+
+ /* The Linux AF_PACKET implementation never blocks waiting for room
+ * for packets, instead returning ENOBUFS. Translate this into
+ * EAGAIN for the caller. */
+ if (error == ENOBUFS) {
+ return EAGAIN;
+ } else if (error == EINTR) {
+ continue;
+ } else if (error != EAGAIN) {
+ VLOG_WARN_RL(&rl, "error sending Ethernet packet on %s: %s",
+ netdev_get_name(netdev_), strerror(error));
+ }
+ return error;
+ } else if (retval != size) {
+ VLOG_WARN_RL(&rl, "sent partial Ethernet packet (%zd bytes of "
+ "%zu) on %s", retval, size, netdev_get_name(netdev_));
+ return EMSGSIZE;
+ } else {
+ return 0;
+ }
+ }
+}
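netdev_linux_send() handles the result of write() in three ways: an ENOBUFS failure from AF_PACKET becomes EAGAIN, a short write becomes EMSGSIZE, and a full write succeeds. The hypothetical `translate_send_result()` below isolates that decision as a pure function. It assumes the caller captured errno immediately after write() failed and retries on its own when it gets EINTR:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* 'retval' is write()'s return value, 'saved_errno' the errno captured right
 * after a failed write(), and 'size' the number of bytes requested.  Returns
 * 0, EAGAIN, EMSGSIZE, or another positive errno value (EINTR means the
 * caller should retry). */
static int
translate_send_result(ssize_t retval, int saved_errno, size_t size)
{
    if (retval < 0) {
        /* AF_PACKET reports a full queue as ENOBUFS instead of blocking. */
        return saved_errno == ENOBUFS ? EAGAIN : saved_errno;
    } else if ((size_t) retval != size) {
        return EMSGSIZE;        /* Partial packet: treat as an error. */
    } else {
        return 0;
    }
}
```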
+
+/* Registers with the poll loop to wake up from the next call to poll_block()
+ * when the packet transmission queue has sufficient room to transmit a packet
+ * with netdev_send().
+ *
+ * The kernel maintains a packet transmission queue, so the client is not
+ * expected to do additional queuing of packets. Thus, this function is
+ * unlikely to ever be used. It is included for completeness. */
+static void
+netdev_linux_send_wait(struct netdev *netdev_)
+{
+ struct netdev_linux *netdev = netdev_linux_cast(netdev_);
+ if (netdev->tap_fd < 0 && netdev->netdev_fd < 0) {
+ /* Nothing to do. */
+ } else if (netdev->tap_fd == netdev->netdev_fd) {
+ poll_fd_wait(netdev->tap_fd, POLLOUT);
+ } else {
+ /* TAP device always accepts packets.*/
+ poll_immediate_wake();
+ }
+}
+
+/* Attempts to set 'netdev''s MAC address to 'mac'. Returns 0 if successful,
+ * otherwise a positive errno value. */
+static int
+netdev_linux_set_etheraddr(struct netdev *netdev_,
+ const uint8_t mac[ETH_ADDR_LEN])
+{
+ struct netdev_linux *netdev = netdev_linux_cast(netdev_);
+ int error;
+
+ if (!(netdev->cache->valid & VALID_ETHERADDR)
+ || !eth_addr_equals(netdev->cache->etheraddr, mac)) {
+ error = set_etheraddr(netdev_get_name(netdev_), ARPHRD_ETHER, mac);
+ if (!error) {
+ netdev->cache->valid |= VALID_ETHERADDR;
+ memcpy(netdev->cache->etheraddr, mac, ETH_ADDR_LEN);
+ }
+ } else {
+ error = 0;
+ }
+ return error;
+}
+
+/* Copies 'netdev''s MAC address into 'mac'. Returns 0 if successful,
+ * otherwise a positive errno value. */
+static int
+netdev_linux_get_etheraddr(const struct netdev *netdev_,
+ uint8_t mac[ETH_ADDR_LEN])
+{
+ struct netdev_linux *netdev = netdev_linux_cast(netdev_);
+ if (!(netdev->cache->valid & VALID_ETHERADDR)) {
+ int error = get_etheraddr(netdev_get_name(netdev_),
+ netdev->cache->etheraddr);
+ if (error) {
+ return error;
+ }
+ netdev->cache->valid |= VALID_ETHERADDR;
+ }
+ memcpy(mac, netdev->cache->etheraddr, ETH_ADDR_LEN);
+ return 0;
+}
+
+/* Stores in '*mtup' the maximum size of transmitted (and received) packets
+ * on 'netdev', in bytes, not including the hardware header; this is typically
+ * 1500 bytes for Ethernet devices. Returns 0 if successful, otherwise a
+ * positive errno value. */
+static int
+netdev_linux_get_mtu(const struct netdev *netdev_, int *mtup)
+{
+ struct netdev_linux *netdev = netdev_linux_cast(netdev_);
+ if (!(netdev->cache->valid & VALID_MTU)) {
+ struct ifreq ifr;
+ int error;
+
+ error = netdev_linux_do_ioctl(netdev_, &ifr, SIOCGIFMTU, "SIOCGIFMTU");
+ if (error) {
+ return error;
+ }
+ netdev->cache->mtu = ifr.ifr_mtu;
+ netdev->cache->valid |= VALID_MTU;
+ }
+ *mtup = netdev->cache->mtu;
+ return 0;
+}
+
+static int
+netdev_linux_get_carrier(const struct netdev *netdev_, bool *carrier)
+{
+ struct netdev_linux *netdev = netdev_linux_cast(netdev_);
+ int error = 0;
+ char *fn = NULL;
+ int fd = -1;
+
+ if (!(netdev->cache->valid & VALID_CARRIER)) {
+ char line[8];
+ int retval;
+
+ fn = xasprintf("/sys/class/net/%s/carrier", netdev_get_name(netdev_));
+ fd = open(fn, O_RDONLY);
+ if (fd < 0) {
+ error = errno;
+ VLOG_WARN_RL(&rl, "%s: open failed: %s", fn, strerror(error));
+ goto exit;
+ }
+
+ retval = read(fd, line, sizeof line);
+ if (retval < 0) {
+ error = errno;
+ if (error == EINVAL) {
+ /* This is the normal return value when we try to check carrier
+ * if the network device is not up. */
+ } else {
+ VLOG_WARN_RL(&rl, "%s: read failed: %s", fn, strerror(error));
+ }
+ goto exit;
+ } else if (retval == 0) {
+ error = EPROTO;
+ VLOG_WARN_RL(&rl, "%s: unexpected end of file", fn);
+ goto exit;
+ }
+
+ if (line[0] != '0' && line[0] != '1') {
+ error = EPROTO;
+ VLOG_WARN_RL(&rl, "%s: value is %c (expected 0 or 1)",
+ fn, line[0]);
+ goto exit;
+ }
+ netdev->cache->carrier = line[0] != '0';
+ netdev->cache->valid |= VALID_CARRIER;
+ }
+ *carrier = netdev->cache->carrier;
+ error = 0;
+
+exit:
+ if (fd >= 0) {
+ close(fd);
+ }
+ free(fn);
+ return error;
+}
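netdev_linux_get_carrier() reads a few bytes from /sys/class/net/<dev>/carrier and accepts only '0' or '1' as the first character. The hypothetical `parse_carrier()` helper below separates that check from the file I/O, so it can be tested without a real device:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Interprets the 'n' bytes in 'buf' read from the carrier file.  Returns 0
 * and sets '*carrier' if the first byte is '0' or '1', otherwise EPROTO
 * (including for an empty read). */
static int
parse_carrier(const char *buf, size_t n, bool *carrier)
{
    if (n == 0 || (buf[0] != '0' && buf[0] != '1')) {
        return EPROTO;
    }
    *carrier = buf[0] == '1';
    return 0;
}
```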
+
+/* Checks whether we can use RTM_GETLINK to get network device statistics.
+ * In pre-2.6.19 kernels, this was only available if wireless extensions were
+ * enabled. */
+static bool
+check_for_working_netlink_stats(void)
+{
+ /* Decide on the netdev_get_stats() implementation to use. Netlink is
+ * preferable, so if that works, we'll use it. */
+ int ifindex = do_get_ifindex("lo");
+ if (ifindex < 0) {
+ VLOG_WARN("failed to get ifindex for lo, "
+ "obtaining netdev stats from proc");
+ return false;
+ } else {
+ struct netdev_stats stats;
+ int error = get_stats_via_netlink(ifindex, &stats);
+ if (!error) {
+ VLOG_DBG("obtaining netdev stats via rtnetlink");
+ return true;
+ } else {
+ VLOG_INFO("RTM_GETLINK failed (%s), obtaining netdev stats "
+ "via proc (you are probably running a pre-2.6.19 "
+ "kernel)", strerror(error));
+ return false;
+ }
+ }
+}
+
+/* Retrieves current device stats for 'netdev'.
+ *
+ * XXX All of the members of struct netdev_stats are 64 bits wide, but on
+ * 32-bit architectures the Linux network stats are only 32 bits. */
+static int
+netdev_linux_get_stats(const struct netdev *netdev_, struct netdev_stats *stats)
+{
+ struct netdev_linux *netdev = netdev_linux_cast(netdev_);
+ static int use_netlink_stats = -1;
+ int error;
+ struct netdev_stats raw_stats;
+ struct netdev_stats *collect_stats = stats;
+
+ COVERAGE_INC(netdev_get_stats);
+
+ if (!(netdev->cache->valid & VALID_IS_INTERNAL)) {
+ netdev->cache->is_internal = (netdev->tap_fd != -1);
+
+ if (!netdev->cache->is_internal) {
+ struct ethtool_drvinfo drvinfo;
+
+ memset(&drvinfo, 0, sizeof drvinfo);
+ error = netdev_linux_do_ethtool(&netdev->netdev,
+ (struct ethtool_cmd *)&drvinfo,
+ ETHTOOL_GDRVINFO,
+ "ETHTOOL_GDRVINFO");
+
+ if (!error) {
+ netdev->cache->is_internal = !strcmp(drvinfo.driver,
+ "openvswitch");
+ }
+ }
+
+ netdev->cache->valid |= VALID_IS_INTERNAL;
+ }
+
+ if (netdev->cache->is_internal) {
+ collect_stats = &raw_stats;
+ }
+
+ if (use_netlink_stats < 0) {
+ use_netlink_stats = check_for_working_netlink_stats();
+ }
+ if (use_netlink_stats) {
+ int ifindex;
+
+ error = get_ifindex(&netdev->netdev, &ifindex);
+ if (!error) {
+ error = get_stats_via_netlink(ifindex, collect_stats);
+ }
+ } else {
+ error = get_stats_via_proc(netdev->netdev.name, collect_stats);
+ }
+
+ /* If this port is an internal port then the transmit and receive stats
+ * will appear to be swapped relative to the other ports since we are the
+ * one sending the data, not a remote computer. For consistency, we swap
+ * them back here. */
+ if (netdev->cache->is_internal) {
+ stats->rx_packets = raw_stats.tx_packets;
+ stats->tx_packets = raw_stats.rx_packets;
+ stats->rx_bytes = raw_stats.tx_bytes;
+ stats->tx_bytes = raw_stats.rx_bytes;
+ stats->rx_errors = raw_stats.tx_errors;
+ stats->tx_errors = raw_stats.rx_errors;
+ stats->rx_dropped = raw_stats.tx_dropped;
+ stats->tx_dropped = raw_stats.rx_dropped;
+ stats->multicast = raw_stats.multicast;
+ stats->collisions = raw_stats.collisions;
+ stats->rx_length_errors = 0;
+ stats->rx_over_errors = 0;
+ stats->rx_crc_errors = 0;
+ stats->rx_frame_errors = 0;
+ stats->rx_fifo_errors = 0;
+ stats->rx_missed_errors = 0;
+ stats->tx_aborted_errors = 0;
+ stats->tx_carrier_errors = 0;
+ stats->tx_fifo_errors = 0;
+ stats->tx_heartbeat_errors = 0;
+ stats->tx_window_errors = 0;
+ }
+
+ return error;
+}
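The swap at the end of netdev_linux_get_stats() reads from a separate `raw_stats` buffer. Swapping in place would overwrite each counter before its partner is copied. Below is a reduced sketch of the same step, using a hypothetical four-counter `struct mini_stats` in place of struct netdev_stats:

```c
#include <stdint.h>

/* Hypothetical subset of struct netdev_stats. */
struct mini_stats {
    uint64_t rx_packets, tx_packets;
    uint64_t rx_bytes, tx_bytes;
};

/* On an internal port the switch, not a remote computer, is at the far end,
 * so each direction is counted the other way round.  Swap them back.
 * 'raw' and 'out' must not alias. */
static void
swap_internal_stats(const struct mini_stats *raw, struct mini_stats *out)
{
    out->rx_packets = raw->tx_packets;
    out->tx_packets = raw->rx_packets;
    out->rx_bytes = raw->tx_bytes;
    out->tx_bytes = raw->rx_bytes;
}
```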
+
+/* Stores the features supported by 'netdev' into each of '*current',
+ * '*advertised', '*supported', and '*peer' that are non-null. Each value is a
+ * bitmap of "enum ofp_port_features" bits, in host byte order. Returns 0 if
+ * successful, otherwise a positive errno value. On failure, all of the
+ * passed-in values are set to 0. */
+static int
+netdev_linux_get_features(struct netdev *netdev,
+ uint32_t *current, uint32_t *advertised,
+ uint32_t *supported, uint32_t *peer)
+{
+ struct ethtool_cmd ecmd;
+ int error;
+
+ memset(&ecmd, 0, sizeof ecmd);
+ error = netdev_linux_do_ethtool(netdev, &ecmd,
+ ETHTOOL_GSET, "ETHTOOL_GSET");
+ if (error) {
+ return error;
+ }
+
+ /* Supported features. */
+ *supported = 0;
+ if (ecmd.supported & SUPPORTED_10baseT_Half) {
+ *supported |= OFPPF_10MB_HD;
+ }
+ if (ecmd.supported & SUPPORTED_10baseT_Full) {
+ *supported |= OFPPF_10MB_FD;
+ }
+ if (ecmd.supported & SUPPORTED_100baseT_Half) {
+ *supported |= OFPPF_100MB_HD;
+ }
+ if (ecmd.supported & SUPPORTED_100baseT_Full) {
+ *supported |= OFPPF_100MB_FD;
+ }
+ if (ecmd.supported & SUPPORTED_1000baseT_Half) {
+ *supported |= OFPPF_1GB_HD;
+ }
+ if (ecmd.supported & SUPPORTED_1000baseT_Full) {
+ *supported |= OFPPF_1GB_FD;
+ }
+ if (ecmd.supported & SUPPORTED_10000baseT_Full) {
+ *supported |= OFPPF_10GB_FD;
+ }
+ if (ecmd.supported & SUPPORTED_TP) {
+ *supported |= OFPPF_COPPER;
+ }
+ if (ecmd.supported & SUPPORTED_FIBRE) {
+ *supported |= OFPPF_FIBER;
+ }
+ if (ecmd.supported & SUPPORTED_Autoneg) {
+ *supported |= OFPPF_AUTONEG;
+ }
+ if (ecmd.supported & SUPPORTED_Pause) {
+ *supported |= OFPPF_PAUSE;
+ }
+ if (ecmd.supported & SUPPORTED_Asym_Pause) {
+ *supported |= OFPPF_PAUSE_ASYM;
+ }
+
+ /* Advertised features. */
+ *advertised = 0;
+ if (ecmd.advertising & ADVERTISED_10baseT_Half) {
+ *advertised |= OFPPF_10MB_HD;
+ }
+ if (ecmd.advertising & ADVERTISED_10baseT_Full) {
+ *advertised |= OFPPF_10MB_FD;
+ }
+ if (ecmd.advertising & ADVERTISED_100baseT_Half) {
+ *advertised |= OFPPF_100MB_HD;
+ }
+ if (ecmd.advertising & ADVERTISED_100baseT_Full) {
+ *advertised |= OFPPF_100MB_FD;
+ }
+ if (ecmd.advertising & ADVERTISED_1000baseT_Half) {
+ *advertised |= OFPPF_1GB_HD;
+ }
+ if (ecmd.advertising & ADVERTISED_1000baseT_Full) {
+ *advertised |= OFPPF_1GB_FD;
+ }
+ if (ecmd.advertising & ADVERTISED_10000baseT_Full) {
+ *advertised |= OFPPF_10GB_FD;
+ }
+ if (ecmd.advertising & ADVERTISED_TP) {
+ *advertised |= OFPPF_COPPER;
+ }
+ if (ecmd.advertising & ADVERTISED_FIBRE) {
+ *advertised |= OFPPF_FIBER;
+ }
+ if (ecmd.advertising & ADVERTISED_Autoneg) {
+ *advertised |= OFPPF_AUTONEG;
+ }
+ if (ecmd.advertising & ADVERTISED_Pause) {
+ *advertised |= OFPPF_PAUSE;
+ }
+ if (ecmd.advertising & ADVERTISED_Asym_Pause) {
+ *advertised |= OFPPF_PAUSE_ASYM;
+ }
+
+ /* Current settings. */
+ if (ecmd.speed == SPEED_10) {
+ *current = ecmd.duplex ? OFPPF_10MB_FD : OFPPF_10MB_HD;
+ } else if (ecmd.speed == SPEED_100) {
+ *current = ecmd.duplex ? OFPPF_100MB_FD : OFPPF_100MB_HD;
+ } else if (ecmd.speed == SPEED_1000) {
+ *current = ecmd.duplex ? OFPPF_1GB_FD : OFPPF_1GB_HD;
+ } else if (ecmd.speed == SPEED_10000) {
+ *current = OFPPF_10GB_FD;
+ } else {
+ *current = 0;
+ }
+
+ if (ecmd.port == PORT_TP) {
+ *current |= OFPPF_COPPER;
+ } else if (ecmd.port == PORT_FIBRE) {
+ *current |= OFPPF_FIBER;
+ }
+
+ if (ecmd.autoneg) {
+ *current |= OFPPF_AUTONEG;
+ }
+
+ /* Peer advertisements. */
+ *peer = 0; /* XXX */
+
+ return 0;
+}
+
+/* Set the features advertised by 'netdev' to 'advertise'. */
+static int
+netdev_linux_set_advertisements(struct netdev *netdev, uint32_t advertise)
+{
+ struct ethtool_cmd ecmd;
+ int error;
+
+ memset(&ecmd, 0, sizeof ecmd);
+ error = netdev_linux_do_ethtool(netdev, &ecmd,
+ ETHTOOL_GSET, "ETHTOOL_GSET");
+ if (error) {
+ return error;
+ }
+
+ ecmd.advertising = 0;
+ if (advertise & OFPPF_10MB_HD) {
+ ecmd.advertising |= ADVERTISED_10baseT_Half;
+ }
+ if (advertise & OFPPF_10MB_FD) {
+ ecmd.advertising |= ADVERTISED_10baseT_Full;
+ }
+ if (advertise & OFPPF_100MB_HD) {
+ ecmd.advertising |= ADVERTISED_100baseT_Half;
+ }
+ if (advertise & OFPPF_100MB_FD) {
+ ecmd.advertising |= ADVERTISED_100baseT_Full;
+ }
+ if (advertise & OFPPF_1GB_HD) {
+ ecmd.advertising |= ADVERTISED_1000baseT_Half;
+ }
+ if (advertise & OFPPF_1GB_FD) {
+ ecmd.advertising |= ADVERTISED_1000baseT_Full;
+ }
+ if (advertise & OFPPF_10GB_FD) {
+ ecmd.advertising |= ADVERTISED_10000baseT_Full;
+ }
+ if (advertise & OFPPF_COPPER) {
+ ecmd.advertising |= ADVERTISED_TP;
+ }
+ if (advertise & OFPPF_FIBER) {
+ ecmd.advertising |= ADVERTISED_FIBRE;
+ }
+ if (advertise & OFPPF_AUTONEG) {
+ ecmd.advertising |= ADVERTISED_Autoneg;
+ }
+ if (advertise & OFPPF_PAUSE) {
+ ecmd.advertising |= ADVERTISED_Pause;
+ }
+ if (advertise & OFPPF_PAUSE_ASYM) {
+ ecmd.advertising |= ADVERTISED_Asym_Pause;
+ }
+ return netdev_linux_do_ethtool(netdev, &ecmd,
+ ETHTOOL_SSET, "ETHTOOL_SSET");
+}
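netdev_linux_get_features() and netdev_linux_set_advertisements() spell out the same ethtool-to-OpenFlow correspondence as two long if-chains. A table-driven sketch of that mapping for the advertising bits follows. It assumes a Linux system with <linux/ethtool.h>, and it copies the OpenFlow 1.0 `OFPPF_*` values locally instead of including openflow/openflow.h. `feature_map`, `ethtool_to_ofp()`, and `ofp_to_ethtool()` are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/types.h>
#include <linux/ethtool.h>

/* OpenFlow 1.0 port feature bits (enum ofp_port_features), copied locally. */
enum {
    OFPPF_10MB_HD = 1 << 0,  OFPPF_10MB_FD = 1 << 1,
    OFPPF_100MB_HD = 1 << 2, OFPPF_100MB_FD = 1 << 3,
    OFPPF_1GB_HD = 1 << 4,   OFPPF_1GB_FD = 1 << 5,
    OFPPF_10GB_FD = 1 << 6,  OFPPF_COPPER = 1 << 7,
    OFPPF_FIBER = 1 << 8,    OFPPF_AUTONEG = 1 << 9,
    OFPPF_PAUSE = 1 << 10,   OFPPF_PAUSE_ASYM = 1 << 11
};

/* Pairs an ethtool ADVERTISED_* bit with the matching OFPPF_* bit. */
struct feature_pair {
    uint32_t ethtool;
    uint32_t ofp;
};

static const struct feature_pair feature_map[] = {
    { ADVERTISED_10baseT_Half,    OFPPF_10MB_HD },
    { ADVERTISED_10baseT_Full,    OFPPF_10MB_FD },
    { ADVERTISED_100baseT_Half,   OFPPF_100MB_HD },
    { ADVERTISED_100baseT_Full,   OFPPF_100MB_FD },
    { ADVERTISED_1000baseT_Half,  OFPPF_1GB_HD },
    { ADVERTISED_1000baseT_Full,  OFPPF_1GB_FD },
    { ADVERTISED_10000baseT_Full, OFPPF_10GB_FD },
    { ADVERTISED_TP,              OFPPF_COPPER },
    { ADVERTISED_FIBRE,           OFPPF_FIBER },
    { ADVERTISED_Autoneg,         OFPPF_AUTONEG },
    { ADVERTISED_Pause,           OFPPF_PAUSE },
    { ADVERTISED_Asym_Pause,      OFPPF_PAUSE_ASYM },
};

#define N_FEATURES (sizeof feature_map / sizeof feature_map[0])

/* Converts an ethtool advertising mask to OFPPF_* bits. */
static uint32_t
ethtool_to_ofp(uint32_t ethtool_bits)
{
    uint32_t ofp = 0;
    size_t i;

    for (i = 0; i < N_FEATURES; i++) {
        if (ethtool_bits & feature_map[i].ethtool) {
            ofp |= feature_map[i].ofp;
        }
    }
    return ofp;
}

/* Converts OFPPF_* bits to an ethtool advertising mask. */
static uint32_t
ofp_to_ethtool(uint32_t ofp_bits)
{
    uint32_t ethtool = 0;
    size_t i;

    for (i = 0; i < N_FEATURES; i++) {
        if (ofp_bits & feature_map[i].ofp) {
            ethtool |= feature_map[i].ethtool;
        }
    }
    return ethtool;
}
```

With a single table, a bit missing from one direction cannot quietly go missing from the other.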
+
+/* If 'netdev_name' is the name of a VLAN network device (e.g. one created with
+ * vconfig(8)), sets '*vlan_vid' to the VLAN VID associated with that device
+ * and returns 0. Otherwise returns a positive errno value (specifically ENOENT if
+ * 'netdev_name' is the name of a network device that is not a VLAN device) and
+ * sets '*vlan_vid' to -1. */
+static int
+netdev_linux_get_vlan_vid(const struct netdev *netdev, int *vlan_vid)
+{
+ const char *netdev_name = netdev_get_name(netdev);
+ struct ds line = DS_EMPTY_INITIALIZER;
+ FILE *stream = NULL;
+ int error;
+ char *fn;
+
+ COVERAGE_INC(netdev_get_vlan_vid);
+ fn = xasprintf("/proc/net/vlan/%s", netdev_name);
+ stream = fopen(fn, "r");
+ if (!stream) {
+ error = errno;
+ goto done;
+ }
+
+ if (ds_get_line(&line, stream)) {
+ if (ferror(stream)) {
+ error = errno;
+ VLOG_ERR_RL(&rl, "error reading \"%s\": %s", fn, strerror(error));
+ } else {
+ error = EPROTO;
+ VLOG_ERR_RL(&rl, "unexpected end of file reading \"%s\"", fn);
+ }
+ goto done;
+ }
+
+ if (sscanf(ds_cstr(&line), "%*s VID: %d", vlan_vid) != 1) {
+ error = EPROTO;
+ VLOG_ERR_RL(&rl, "parse error reading \"%s\" line 1: \"%s\"",
+ fn, ds_cstr(&line));
+ goto done;
+ }
+
+ error = 0;
+
+done:
+ free(fn);
+ if (stream) {
+ fclose(stream);
+ }
+ ds_destroy(&line);
+ if (error) {
+ *vlan_vid = -1;
+ }
+ return error;
+}
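sscanf() returns the number of conversions it stored, or EOF on an input failure such as an empty string. The reliable success test for a single `%d` is therefore a comparison against 1; a plain `!sscanf(...)` check treats EOF as success. Below is a hypothetical standalone VID parser. It assumes the first line of /proc/net/vlan/<dev> names the device and then gives `VID: <n>`, as in the sample used with it:

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: extracts the VID from the first line of
 * /proc/net/vlan/<dev>.  Returns 0 on success, otherwise EPROTO with
 * '*vlan_vid' set to -1. */
static int
parse_vlan_vid(const char *line, int *vlan_vid)
{
    if (sscanf(line, "%*s VID: %d", vlan_vid) != 1) {
        *vlan_vid = -1;
        return EPROTO;
    }
    return 0;
}
```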
+
+#define POLICE_ADD_CMD "/sbin/tc qdisc add dev %s handle ffff: ingress"
+#define POLICE_CONFIG_CMD "/sbin/tc filter add dev %s parent ffff: protocol ip prio 50 u32 match ip src 0.0.0.0/0 police rate %"PRIu32"kbit burst %"PRIu32"k mtu 65535 drop flowid :1"
+/* We redirect stderr to /dev/null because we often want to remove all
+ * traffic control configuration on a port so it's in a known state. If
+ * this is done when there is no such configuration, tc complains, so we just
+ * always ignore it.
+ */
+#define POLICE_DEL_CMD "/sbin/tc qdisc del dev %s handle ffff: ingress 2>/dev/null"
+
+/* Attempts to set input rate limiting (policing) policy. */
+static int
+netdev_linux_set_policing(struct netdev *netdev,
+ uint32_t kbits_rate, uint32_t kbits_burst)
+{
+ const char *netdev_name = netdev_get_name(netdev);
+ char command[1024];
+
+ COVERAGE_INC(netdev_set_policing);
+ if (kbits_rate) {
+ if (!kbits_burst) {
+ /* Default to 10 kilobits if not specified. */
+ kbits_burst = 10;
+ }
+
+ /* xxx This should be more careful about only deleting the ingress
+ * xxx qdisc if it actually exists, as opposed to always deleting it. */
+ snprintf(command, sizeof(command), POLICE_DEL_CMD, netdev_name);
+ if (system(command) == -1) {
+ VLOG_WARN_RL(&rl, "%s: problem removing policing", netdev_name);
+ }
+
+ snprintf(command, sizeof(command), POLICE_ADD_CMD, netdev_name);
+ if (system(command) != 0) {
+ VLOG_WARN_RL(&rl, "%s: problem adding policing", netdev_name);
+ return -1;
+ }
+
+ snprintf(command, sizeof(command), POLICE_CONFIG_CMD, netdev_name,
+ kbits_rate, kbits_burst);
+ if (system(command) != 0) {
+ VLOG_WARN_RL(&rl, "%s: problem configuring policing",
+ netdev_name);
+ return -1;
+ }
+ } else {
+ snprintf(command, sizeof(command), POLICE_DEL_CMD, netdev_name);
+ if (system(command) == -1) {
+ VLOG_WARN_RL(&rl, "%s: problem removing policing", netdev_name);
+ }
+ }
+
+ return 0;
+}
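netdev_linux_set_policing() builds its tc commands with snprintf() into a fixed 1024-byte buffer and then passes them to system(). Below is a sketch of the formatting step with an explicit truncation check. `format_police_config()` and `POLICE_CONFIG_FMT` are hypothetical, and returning ENOSPC on truncation is our own choice. The template uses PRIu32 so that the conversions match the uint32_t arguments:

```c
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define POLICE_CONFIG_FMT "/sbin/tc filter add dev %s parent ffff: protocol ip prio 50 u32 match ip src 0.0.0.0/0 police rate %" PRIu32 "kbit burst %" PRIu32 "k mtu 65535 drop flowid :1"

/* Formats the policing command for 'dev' into 'buf', applying the same
 * 10-kilobit default burst as netdev_linux_set_policing().  Returns 0, or
 * ENOSPC if the command does not fit in 'size' bytes. */
static int
format_police_config(char *buf, size_t size, const char *dev,
                     uint32_t kbits_rate, uint32_t kbits_burst)
{
    int n;

    if (!kbits_burst) {
        kbits_burst = 10;
    }
    n = snprintf(buf, size, POLICE_CONFIG_FMT, dev, kbits_rate, kbits_burst);
    return n < 0 || (size_t) n >= size ? ENOSPC : 0;
}
```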
+
+static int
+netdev_linux_get_in4(const struct netdev *netdev_,
+ struct in_addr *address, struct in_addr *netmask)
+{
+ struct netdev_linux *netdev = netdev_linux_cast(netdev_);
+ if (!(netdev->cache->valid & VALID_IN4)) {
+ int error;
+
+ error = netdev_linux_get_ipv4(netdev_, &netdev->cache->address,
+ SIOCGIFADDR, "SIOCGIFADDR");
+ if (error) {
+ return error;
+ }
+
+ error = netdev_linux_get_ipv4(netdev_, &netdev->cache->netmask,
+ SIOCGIFNETMASK, "SIOCGIFNETMASK");
+ if (error) {
+ return error;
+ }
+
+ netdev->cache->valid |= VALID_IN4;
+ }
+ *address = netdev->cache->address;
+ *netmask = netdev->cache->netmask;
+ return address->s_addr == INADDR_ANY ? EADDRNOTAVAIL : 0;
+}
+
+static int
+netdev_linux_set_in4(struct netdev *netdev_, struct in_addr address,
+ struct in_addr netmask)
+{
+ struct netdev_linux *netdev = netdev_linux_cast(netdev_);
+ int error;
+
+ error = do_set_addr(netdev_, SIOCSIFADDR, "SIOCSIFADDR", address);
+ if (!error) {
+ netdev->cache->valid |= VALID_IN4;
+ netdev->cache->address = address;
+ netdev->cache->netmask = netmask;
+ if (address.s_addr != INADDR_ANY) {
+ error = do_set_addr(netdev_, SIOCSIFNETMASK,
+ "SIOCSIFNETMASK", netmask);
+ }
+ }
+ return error;
+}
+
+static bool
+parse_if_inet6_line(const char *line,
+ struct in6_addr *in6, char ifname[16 + 1])
+{
+ uint8_t *s6 = in6->s6_addr;
+#define X8 "%2"SCNx8
+ return sscanf(line,
+ " "X8 X8 X8 X8 X8 X8 X8 X8 X8 X8 X8 X8 X8 X8 X8 X8
+ "%*x %*x %*x %*x %16s\n",
+ &s6[0], &s6[1], &s6[2], &s6[3],
+ &s6[4], &s6[5], &s6[6], &s6[7],
+ &s6[8], &s6[9], &s6[10], &s6[11],
+ &s6[12], &s6[13], &s6[14], &s6[15],
+ ifname) == 17;
+}
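Each line of /proc/net/if_inet6 starts with the 32-hex-digit IPv6 address, followed by the interface index, prefix length, scope, and flags (all in hex), and ends with the interface name. The standalone sketch below copies the format string used by parse_if_inet6_line() so it can be tried in isolation. The function name parse_inet6_line_sketch() and the sample line in the note are hypothetical, not part of Open vSwitch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical standalone copy of parse_if_inet6_line(): reads the 16
 * address bytes (two hex digits each, no separators), skips the ifindex,
 * prefix length, scope, and flags fields, then reads the interface name. */
static bool
parse_inet6_line_sketch(const char *line, uint8_t addr[16],
                        char ifname[16 + 1])
{
#define X8 "%2" SCNx8
    return sscanf(line,
                  " " X8 X8 X8 X8 X8 X8 X8 X8 X8 X8 X8 X8 X8 X8 X8 X8
                  "%*x %*x %*x %*x %16s",
                  &addr[0], &addr[1], &addr[2], &addr[3],
                  &addr[4], &addr[5], &addr[6], &addr[7],
                  &addr[8], &addr[9], &addr[10], &addr[11],
                  &addr[12], &addr[13], &addr[14], &addr[15],
                  ifname) == 17;   /* 16 bytes + name; %*x is not counted */
#undef X8
}
```

For a line such as `fe800000000000000211aafffebbccdd 02 40 20 80     eth0`, the sketch yields the bytes of fe80::211:aaff:febb:ccdd and the name "eth0".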
+
+/* Sets '*in6' to the IPv6 address assigned to 'netdev', or to in6addr_any if
+ * 'netdev' has no IPv6 address (or it cannot be determined), and returns 0. */
+static int
+netdev_linux_get_in6(const struct netdev *netdev_, struct in6_addr *in6)
+{
+ struct netdev_linux *netdev = netdev_linux_cast(netdev_);
+ if (!(netdev->cache->valid & VALID_IN6)) {
+ FILE *file;
+ char line[128];
+
+ netdev->cache->in6 = in6addr_any;
+
+ file = fopen("/proc/net/if_inet6", "r");
+ if (file != NULL) {
+ const char *name = netdev_get_name(netdev_);
+ while (fgets(line, sizeof line, file)) {
+ struct in6_addr in6;
+ char ifname[16 + 1];
+ if (parse_if_inet6_line(line, &in6, ifname)
+ && !strcmp(name, ifname))
+ {
+ netdev->cache->in6 = in6;
+ break;
+ }
+ }
+ fclose(file);
+ }
+ netdev->cache->valid |= VALID_IN6;
+ }
+ *in6 = netdev->cache->in6;
+ return 0;
+}
+
+static void
+make_in4_sockaddr(struct sockaddr *sa, struct in_addr addr)
+{
+ struct sockaddr_in sin;
+ memset(&sin, 0, sizeof sin);
+ sin.sin_family = AF_INET;
+ sin.sin_addr = addr;
+ sin.sin_port = 0;
+
+ memset(sa, 0, sizeof *sa);
+ memcpy(sa, &sin, sizeof sin);
+}
+
+static int
+do_set_addr(struct netdev *netdev,
+ int ioctl_nr, const char *ioctl_name, struct in_addr addr)
+{
+ struct ifreq ifr;
+ strncpy(ifr.ifr_name, netdev->name, sizeof ifr.ifr_name);
+ make_in4_sockaddr(&ifr.ifr_addr, addr);
+ return netdev_linux_do_ioctl(netdev, &ifr, ioctl_nr, ioctl_name);
+}
+
+/* Adds 'router' as a default IP gateway. */
+static int
+netdev_linux_add_router(struct netdev *netdev UNUSED, struct in_addr router)
+{
+ struct in_addr any = { INADDR_ANY };
+ struct rtentry rt;
+ int error;
+
+ memset(&rt, 0, sizeof rt);
+ make_in4_sockaddr(&rt.rt_dst, any);
+ make_in4_sockaddr(&rt.rt_gateway, router);
+ make_in4_sockaddr(&rt.rt_genmask, any);
+ rt.rt_flags = RTF_UP | RTF_GATEWAY;
+ COVERAGE_INC(netdev_add_router);
+ error = ioctl(af_inet_sock, SIOCADDRT, &rt) < 0 ? errno : 0;
+ if (error) {
+ VLOG_WARN("ioctl(SIOCADDRT): %s", strerror(error));
+ }
+ return error;
+}
+
+static int
+netdev_linux_get_next_hop(const struct in_addr *host, struct in_addr *next_hop,
+ char **netdev_name)
+{
+ static const char fn[] = "/proc/net/route";
+ FILE *stream;
+ char line[256];
+ int ln;
+
+ *netdev_name = NULL;
+ stream = fopen(fn, "r");
+ if (stream == NULL) {
+ VLOG_WARN_RL(&rl, "%s: open failed: %s", fn, strerror(errno));
+ return errno;
+ }
+
+ ln = 0;
+ while (fgets(line, sizeof line, stream)) {
+ if (++ln >= 2) {
+ char iface[17];
+ uint32_t dest, gateway, mask;
+ int refcnt, metric, mtu;
+ unsigned int flags, use, window, irtt;
+
+ if (sscanf(line,
+ "%16s %"SCNx32" %"SCNx32" %04X %d %u %d %"SCNx32
+ " %d %u %u\n",
+ iface, &dest, &gateway, &flags, &refcnt,
+ &use, &metric, &mask, &mtu, &window, &irtt) != 11) {
+
+ VLOG_WARN_RL(&rl, "%s: could not parse line %d: %s",
+ fn, ln, line);
+ continue;
+ }
+ if (!(flags & RTF_UP)) {
+ /* Skip routes that aren't up. */
+ continue;
+ }
+
+ /* 'dest', 'mask', and 'gateway' are given in network byte order,
+ * so we don't need any endian conversions here. */
+ if ((dest & mask) == (host->s_addr & mask)) {
+ if (!gateway) {
+ /* The host is directly reachable. */
+ next_hop->s_addr = 0;
+ } else {
+ /* To reach the host, we must go through a gateway. */
+ next_hop->s_addr = gateway;
+ }
+ *netdev_name = xstrdup(iface);
+ fclose(stream);
+ return 0;
+ }
+ }
+ }
+
+ fclose(stream);
+ return ENXIO;
+}
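The matching rule in netdev_linux_get_next_hop() can be exercised on a single line of /proc/net/route. The sketch below is a hypothetical helper (route_line_covers() and SKETCH_RTF_UP are illustrative names); it assumes, as the comment in the function does, that the hex fields are compared exactly as read, so 'host' must be supplied in the same representation.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* RTF_UP is 0x0001 in Linux's <net/route.h>; redefined here so the sketch
 * builds without that header. */
#define SKETCH_RTF_UP 0x0001

/* Hypothetical helper: returns true if one /proc/net/route data line is an
 * "up" route whose destination/mask covers 'host'.  'host' must use the same
 * representation as the hex fields in the file (network byte order, read
 * verbatim).  On a match, stores the gateway in '*gateway' (0 means 'host'
 * is directly reachable). */
static bool
route_line_covers(const char *line, uint32_t host, uint32_t *gateway)
{
    uint32_t dest, gw, mask;
    unsigned int flags;

    /* Columns: Iface Destination Gateway Flags RefCnt Use Metric Mask ... */
    if (sscanf(line, "%*s %" SCNx32 " %" SCNx32 " %X %*d %*u %*d %" SCNx32,
               &dest, &gw, &flags, &mask) != 4) {
        return false;
    }
    if (!(flags & SKETCH_RTF_UP) || (dest & mask) != (host & mask)) {
        return false;
    }
    *gateway = gw;
    return true;
}
```

On a little-endian host the kernel prints 192.168.0.0/24 as `0000A8C0` with mask `00FFFFFF`; the value 0x0500A8C0 (192.168.0.5 read the same way) matches that route, while 0x0501A8C0 (192.168.1.5) does not. A default route (destination and mask both 0) covers every host.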
+
+/* Looks up the ARP table entry for 'ip' on 'netdev'. If one exists and can be
+ * successfully retrieved, it stores the corresponding MAC address in 'mac' and
+ * returns 0. Otherwise, it returns a positive errno value; in particular,
+ * ENXIO indicates that there is no ARP table entry for 'ip' on 'netdev'. */
+static int
+netdev_linux_arp_lookup(const struct netdev *netdev,
+ uint32_t ip, uint8_t mac[ETH_ADDR_LEN])
+{
+ struct arpreq r;
+ struct sockaddr_in *pa;
+ int retval;
+
+ memset(&r, 0, sizeof r);
+ pa = (struct sockaddr_in *) &r.arp_pa;
+ pa->sin_family = AF_INET;
+ pa->sin_addr.s_addr = ip;
+ pa->sin_port = 0;
+ r.arp_ha.sa_family = ARPHRD_ETHER;
+ r.arp_flags = 0;
+ strncpy(r.arp_dev, netdev->name, sizeof r.arp_dev);
+ COVERAGE_INC(netdev_arp_lookup);
+ retval = ioctl(af_inet_sock, SIOCGARP, &r) < 0 ? errno : 0;
+ if (!retval) {
+ memcpy(mac, r.arp_ha.sa_data, ETH_ADDR_LEN);
+ } else if (retval != ENXIO) {
+ VLOG_WARN_RL(&rl, "%s: could not look up ARP entry for "IP_FMT": %s",
+ netdev->name, IP_ARGS(&ip), strerror(retval));
+ }
+ return retval;
+}
+
+static int
+nd_to_iff_flags(enum netdev_flags nd)
+{
+ int iff = 0;
+ if (nd & NETDEV_UP) {
+ iff |= IFF_UP;
+ }
+ if (nd & NETDEV_PROMISC) {
+ iff |= IFF_PROMISC;
+ }
+ return iff;
+}
+
+static int
+iff_to_nd_flags(int iff)
+{
+ enum netdev_flags nd = 0;
+ if (iff & IFF_UP) {
+ nd |= NETDEV_UP;
+ }
+ if (iff & IFF_PROMISC) {
+ nd |= NETDEV_PROMISC;
+ }
+ return nd;
+}
+
+static int
+netdev_linux_update_flags(struct netdev *netdev, enum netdev_flags off,
+ enum netdev_flags on, enum netdev_flags *old_flagsp)
+{
+ int old_flags, new_flags;
+ int error;
+
+ error = get_flags(netdev, &old_flags);
+ if (!error) {
+ *old_flagsp = iff_to_nd_flags(old_flags);
+ new_flags = (old_flags & ~nd_to_iff_flags(off)) | nd_to_iff_flags(on);
+ if (new_flags != old_flags) {
+ error = set_flags(netdev, new_flags);
+ }
+ }
+ return error;
+}
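netdev_linux_update_flags() computes the new interface flags as "(old & ~off) | on" after translating 'off' and 'on' with nd_to_iff_flags(). The tiny sketch below isolates that arithmetic; the SK_* bit values and apply_flag_update() are hypothetical stand-ins for the IFF_* flags and the inline expression.

```c
#include <assert.h>

/* Hypothetical flag bits standing in for IFF_UP, IFF_PROMISC, and an
 * unrelated flag that the update must leave untouched. */
enum {
    SK_UP = 1 << 0,
    SK_PROMISC = 1 << 1,
    SK_OTHER = 1 << 2
};

/* Clears the bits in 'off', sets the bits in 'on', and preserves every
 * other bit of 'old_flags'. */
static int
apply_flag_update(int old_flags, int off, int on)
{
    return (old_flags & ~off) | on;
}
```

Because the result equals 'old_flags' when nothing actually changes, the `new_flags != old_flags` test lets the function skip the SIOCSIFFLAGS ioctl in that case.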
+
+static void
+poll_notify(struct list *list)
+{
+ struct netdev_linux_notifier *notifier;
+ LIST_FOR_EACH (notifier, struct netdev_linux_notifier, node, list) {
+ struct netdev_notifier *n = &notifier->notifier;
+ n->cb(n);
+ }
+}
+
+static void
+netdev_linux_poll_cb(const struct rtnetlink_change *change,
+ void *aux UNUSED)
+{
+ if (change) {
+ struct list *list = shash_find_data(&netdev_linux_notifiers,
+ change->ifname);
+ if (list) {
+ poll_notify(list);
+ }
+ } else {
+ struct shash_node *node;
+ SHASH_FOR_EACH (node, &netdev_linux_notifiers) {
+ poll_notify(node->data);
+ }
+ }
+}
+
+static int
+netdev_linux_poll_add(struct netdev *netdev,
+ void (*cb)(struct netdev_notifier *), void *aux,
+ struct netdev_notifier **notifierp)
+{
+ const char *netdev_name = netdev_get_name(netdev);
+ struct netdev_linux_notifier *notifier;
+ struct list *list;
+
+ if (shash_is_empty(&netdev_linux_notifiers)) {
+ int error = rtnetlink_notifier_register(&netdev_linux_poll_notifier,
+ netdev_linux_poll_cb, NULL);
+ if (error) {
+ return error;
+ }
+ }
+
+ list = shash_find_data(&netdev_linux_notifiers, netdev_name);
+ if (!list) {
+ list = xmalloc(sizeof *list);
+ list_init(list);
+ shash_add(&netdev_linux_notifiers, netdev_name, list);
+ }
+
+ notifier = xmalloc(sizeof *notifier);
+ netdev_notifier_init(&notifier->notifier, netdev, cb, aux);
+ list_push_back(list, &notifier->node);
+ *notifierp = &notifier->notifier;
+ return 0;
+}
+
+static void
+netdev_linux_poll_remove(struct netdev_notifier *notifier_)
+{
+ struct netdev_linux_notifier *notifier =
+ CONTAINER_OF(notifier_, struct netdev_linux_notifier, notifier);
+ struct list *list;
+
+ /* Remove 'notifier' from its list. */
+ list = list_remove(&notifier->node);
+ if (list_is_empty(list)) {
+ /* The list is now empty. Remove it from the hash and free it. */
+ const char *netdev_name = netdev_get_name(notifier->notifier.netdev);
+ shash_delete(&netdev_linux_notifiers,
+ shash_find(&netdev_linux_notifiers, netdev_name));
+ free(list);
+ }
+ free(notifier);
+
+ /* If that was the last notifier, unregister. */
+ if (shash_is_empty(&netdev_linux_notifiers)) {
+ rtnetlink_notifier_unregister(&netdev_linux_poll_notifier);
+ }
+}
+
+const struct netdev_class netdev_linux_class = {
+ "", /* prefix */
+ "linux", /* name */
+
+ netdev_linux_init,
+ netdev_linux_run,
+ netdev_linux_wait,
+
+ netdev_linux_open,
+ netdev_linux_close,
+
+ netdev_linux_enumerate,
+
+ netdev_linux_recv,
+ netdev_linux_recv_wait,
+ netdev_linux_drain,
+
+ netdev_linux_send,
+ netdev_linux_send_wait,
+
+ netdev_linux_set_etheraddr,
+ netdev_linux_get_etheraddr,
+ netdev_linux_get_mtu,
+ netdev_linux_get_carrier,
+ netdev_linux_get_stats,
+
+ netdev_linux_get_features,
+ netdev_linux_set_advertisements,
+ netdev_linux_get_vlan_vid,
+ netdev_linux_set_policing,
+
+ netdev_linux_get_in4,
+ netdev_linux_set_in4,
+ netdev_linux_get_in6,
+ netdev_linux_add_router,
+ netdev_linux_get_next_hop,
+ netdev_linux_arp_lookup,
+
+ netdev_linux_update_flags,
+
+ netdev_linux_poll_add,
+ netdev_linux_poll_remove,
+};
+
+const struct netdev_class netdev_tap_class = {
+ "tap", /* prefix */
+ "tap", /* name */
+
+ netdev_linux_init,
+ NULL, /* run */
+ NULL, /* wait */
+
+ netdev_linux_open,
+ netdev_linux_close,
+
+ netdev_linux_enumerate,
+
+ netdev_linux_recv,
+ netdev_linux_recv_wait,
+ netdev_linux_drain,
+
+ netdev_linux_send,
+ netdev_linux_send_wait,
+
+ netdev_linux_set_etheraddr,
+ netdev_linux_get_etheraddr,
+ netdev_linux_get_mtu,
+ netdev_linux_get_carrier,
+ netdev_linux_get_stats,
+
+ netdev_linux_get_features,
+ netdev_linux_set_advertisements,
+ netdev_linux_get_vlan_vid,
+ netdev_linux_set_policing,
+
+ netdev_linux_get_in4,
+ netdev_linux_set_in4,
+ netdev_linux_get_in6,
+ netdev_linux_add_router,
+ netdev_linux_get_next_hop,
+ netdev_linux_arp_lookup,
+
+ netdev_linux_update_flags,
+
+ netdev_linux_poll_add,
+ netdev_linux_poll_remove,
+};
+\f
+static int
+get_stats_via_netlink(int ifindex, struct netdev_stats *stats)
+{
+ /* Policy for RTNLGRP_LINK messages.
+ *
+ * There are *many* more fields in these messages, but currently we only
+ * care about these fields. */
+ static const struct nl_policy rtnlgrp_link_policy[] = {
+ [IFLA_IFNAME] = { .type = NL_A_STRING, .optional = false },
+ [IFLA_STATS] = { .type = NL_A_UNSPEC, .optional = true,
+ .min_len = sizeof(struct rtnl_link_stats) },
+ };
+
+ static struct nl_sock *rtnl_sock;
+ struct ofpbuf request;
+ struct ofpbuf *reply;
+ struct ifinfomsg *ifi;
+ const struct rtnl_link_stats *rtnl_stats;
+ struct nlattr *attrs[ARRAY_SIZE(rtnlgrp_link_policy)];
+ int error;
+
+ if (!rtnl_sock) {
+ error = nl_sock_create(NETLINK_ROUTE, 0, 0, 0, &rtnl_sock);
+ if (error) {
+ VLOG_ERR_RL(&rl, "failed to create rtnetlink socket: %s",
+ strerror(error));
+ return error;
+ }
+ }
+
+ ofpbuf_init(&request, 0);
+ nl_msg_put_nlmsghdr(&request, rtnl_sock, sizeof *ifi,
+ RTM_GETLINK, NLM_F_REQUEST);
+ ifi = ofpbuf_put_zeros(&request, sizeof *ifi);
+ ifi->ifi_family = PF_UNSPEC;
+ ifi->ifi_index = ifindex;
+ error = nl_sock_transact(rtnl_sock, &request, &reply);
+ ofpbuf_uninit(&request);
+ if (error) {
+ return error;
+ }
+
+ if (!nl_policy_parse(reply, NLMSG_HDRLEN + sizeof(struct ifinfomsg),
+ rtnlgrp_link_policy,
+ attrs, ARRAY_SIZE(rtnlgrp_link_policy))) {
+ ofpbuf_delete(reply);
+ return EPROTO;
+ }
+
+ if (!attrs[IFLA_STATS]) {
+ VLOG_WARN_RL(&rl, "RTM_GETLINK reply lacks stats");
+ ofpbuf_delete(reply);
+ return EPROTO;
+ }
+
+ rtnl_stats = nl_attr_get(attrs[IFLA_STATS]);
+ stats->rx_packets = rtnl_stats->rx_packets;
+ stats->tx_packets = rtnl_stats->tx_packets;
+ stats->rx_bytes = rtnl_stats->rx_bytes;
+ stats->tx_bytes = rtnl_stats->tx_bytes;
+ stats->rx_errors = rtnl_stats->rx_errors;
+ stats->tx_errors = rtnl_stats->tx_errors;
+ stats->rx_dropped = rtnl_stats->rx_dropped;
+ stats->tx_dropped = rtnl_stats->tx_dropped;
+ stats->multicast = rtnl_stats->multicast;
+ stats->collisions = rtnl_stats->collisions;
+ stats->rx_length_errors = rtnl_stats->rx_length_errors;
+ stats->rx_over_errors = rtnl_stats->rx_over_errors;
+ stats->rx_crc_errors = rtnl_stats->rx_crc_errors;
+ stats->rx_frame_errors = rtnl_stats->rx_frame_errors;
+ stats->rx_fifo_errors = rtnl_stats->rx_fifo_errors;
+ stats->rx_missed_errors = rtnl_stats->rx_missed_errors;
+ stats->tx_aborted_errors = rtnl_stats->tx_aborted_errors;
+ stats->tx_carrier_errors = rtnl_stats->tx_carrier_errors;
+ stats->tx_fifo_errors = rtnl_stats->tx_fifo_errors;
+ stats->tx_heartbeat_errors = rtnl_stats->tx_heartbeat_errors;
+ stats->tx_window_errors = rtnl_stats->tx_window_errors;
+
+ ofpbuf_delete(reply);
+
+ return 0;
+}
+
+static int
+get_stats_via_proc(const char *netdev_name, struct netdev_stats *stats)
+{
+ static const char fn[] = "/proc/net/dev";
+ char line[1024];
+ FILE *stream;
+ int ln;
+
+ stream = fopen(fn, "r");
+ if (!stream) {
+ VLOG_WARN_RL(&rl, "%s: open failed: %s", fn, strerror(errno));
+ return errno;
+ }
+
+ ln = 0;
+ while (fgets(line, sizeof line, stream)) {
+ if (++ln >= 3) {
+ char devname[16];
+#define X64 "%"SCNu64
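+ /* Receive columns: bytes packets errs drop fifo frame compressed
+ * multicast.  Transmit columns: bytes packets errs drop fifo colls
+ * carrier compressed.  Both compressed counts are skipped. */
+ if (sscanf(line,
+ " %15[^:]:"
+ X64 X64 X64 X64 X64 X64 "%*u" X64
+ X64 X64 X64 X64 X64 X64 X64 "%*u",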
+ devname,
+ &stats->rx_bytes,
+ &stats->rx_packets,
+ &stats->rx_errors,
+ &stats->rx_dropped,
+ &stats->rx_fifo_errors,
+ &stats->rx_frame_errors,
+ &stats->multicast,
+ &stats->tx_bytes,
+ &stats->tx_packets,
+ &stats->tx_errors,
+ &stats->tx_dropped,
+ &stats->tx_fifo_errors,
+ &stats->collisions,
+ &stats->tx_carrier_errors) != 15) {
+ VLOG_WARN_RL(&rl, "%s:%d: parse error", fn, ln);
+ } else if (!strcmp(devname, netdev_name)) {
+ stats->rx_length_errors = UINT64_MAX;
+ stats->rx_over_errors = UINT64_MAX;
+ stats->rx_crc_errors = UINT64_MAX;
+ stats->rx_missed_errors = UINT64_MAX;
+ stats->tx_aborted_errors = UINT64_MAX;
+ stats->tx_heartbeat_errors = UINT64_MAX;
+ stats->tx_window_errors = UINT64_MAX;
+ fclose(stream);
+ return 0;
+ }
+ }
+ }
+ VLOG_WARN_RL(&rl, "%s: no stats for %s", fn, netdev_name);
+ fclose(stream);
+ return ENODEV;
+}
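The column layout of /proc/net/dev is easy to get wrong: on the receive side, the compressed count comes before multicast. The sketch below parses a few columns of one line so the layout can be checked against a sample; struct dev_stats_sketch and parse_proc_net_dev_line() are hypothetical names, not part of Open vSwitch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of struct netdev_stats. */
struct dev_stats_sketch {
    uint64_t rx_bytes, rx_packets, multicast, tx_bytes, tx_packets;
};

/* Parses one data line of /proc/net/dev.  Receive columns are: bytes
 * packets errs drop fifo frame compressed multicast; transmit columns
 * start with bytes packets.  Returns 0 on success, -1 on a parse error. */
static int
parse_proc_net_dev_line(const char *line, char devname[16],
                        struct dev_stats_sketch *s)
{
#define U64 "%" SCNu64
#define SKIP "%*" SCNu64
    int n = sscanf(line,
                   " %15[^:]:" U64 U64 SKIP SKIP SKIP SKIP SKIP U64
                   U64 U64,
                   devname, &s->rx_bytes, &s->rx_packets, &s->multicast,
                   &s->tx_bytes, &s->tx_packets);
#undef U64
#undef SKIP
    return n == 6 ? 0 : -1;
}
```

With a sample line whose compressed count (7) differs from its multicast count (3), the sketch stores 3 in 'multicast', which confirms that the eighth receive column is the one being read.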
+\f
+static int
+get_flags(const struct netdev *netdev, int *flags)
+{
+ struct ifreq ifr;
+ int error;
+
+ error = netdev_linux_do_ioctl(netdev, &ifr, SIOCGIFFLAGS, "SIOCGIFFLAGS");
+ *flags = ifr.ifr_flags;
+ return error;
+}
+
+static int
+set_flags(struct netdev *netdev, int flags)
+{
+ struct ifreq ifr;
+
+ ifr.ifr_flags = flags;
+ return netdev_linux_do_ioctl(netdev, &ifr, SIOCSIFFLAGS, "SIOCSIFFLAGS");
+}
+
+static int
+do_get_ifindex(const char *netdev_name)
+{
+ struct ifreq ifr;
+
+ strncpy(ifr.ifr_name, netdev_name, sizeof ifr.ifr_name);
+ COVERAGE_INC(netdev_get_ifindex);
+ if (ioctl(af_inet_sock, SIOCGIFINDEX, &ifr) < 0) {
+ VLOG_WARN_RL(&rl, "ioctl(SIOCGIFINDEX) on %s device failed: %s",
+ netdev_name, strerror(errno));
+ return -errno;
+ }
+ return ifr.ifr_ifindex;
+}
+
+static int
+get_ifindex(const struct netdev *netdev_, int *ifindexp)
+{
+ struct netdev_linux *netdev = netdev_linux_cast(netdev_);
+ *ifindexp = 0;
+ if (!(netdev->cache->valid & VALID_IFINDEX)) {
+ int ifindex = do_get_ifindex(netdev_get_name(netdev_));
+ if (ifindex < 0) {
+ return -ifindex;
+ }
+ netdev->cache->valid |= VALID_IFINDEX;
+ netdev->cache->ifindex = ifindex;
+ }
+ *ifindexp = netdev->cache->ifindex;
+ return 0;
+}
+
+static int
+get_etheraddr(const char *netdev_name, uint8_t ea[ETH_ADDR_LEN])
+{
+ struct ifreq ifr;
+ int hwaddr_family;
+
+ memset(&ifr, 0, sizeof ifr);
+ strncpy(ifr.ifr_name, netdev_name, sizeof ifr.ifr_name);
+ COVERAGE_INC(netdev_get_hwaddr);
+ if (ioctl(af_inet_sock, SIOCGIFHWADDR, &ifr) < 0) {
+ VLOG_ERR("ioctl(SIOCGIFHWADDR) on %s device failed: %s",
+ netdev_name, strerror(errno));
+ return errno;
+ }
+ hwaddr_family = ifr.ifr_hwaddr.sa_family;
+ if (hwaddr_family != AF_UNSPEC && hwaddr_family != ARPHRD_ETHER) {
+ VLOG_WARN("%s device has unknown hardware address family %d",
+ netdev_name, hwaddr_family);
+ }
+ memcpy(ea, ifr.ifr_hwaddr.sa_data, ETH_ADDR_LEN);
+ return 0;
+}
+
+static int
+set_etheraddr(const char *netdev_name, int hwaddr_family,
+ const uint8_t mac[ETH_ADDR_LEN])
+{
+ struct ifreq ifr;
+
+ memset(&ifr, 0, sizeof ifr);
+ strncpy(ifr.ifr_name, netdev_name, sizeof ifr.ifr_name);
+ ifr.ifr_hwaddr.sa_family = hwaddr_family;
+ memcpy(ifr.ifr_hwaddr.sa_data, mac, ETH_ADDR_LEN);
+ COVERAGE_INC(netdev_set_hwaddr);
+ if (ioctl(af_inet_sock, SIOCSIFHWADDR, &ifr) < 0) {
+ VLOG_ERR("ioctl(SIOCSIFHWADDR) on %s device failed: %s",
+ netdev_name, strerror(errno));
+ return errno;
+ }
+ return 0;
+}
+
+static int
+netdev_linux_do_ethtool(struct netdev *netdev, struct ethtool_cmd *ecmd,
+ int cmd, const char *cmd_name)
+{
+ struct ifreq ifr;
+
+ memset(&ifr, 0, sizeof ifr);
+ strncpy(ifr.ifr_name, netdev->name, sizeof ifr.ifr_name);
+ ifr.ifr_data = (caddr_t) ecmd;
+
+ ecmd->cmd = cmd;
+ COVERAGE_INC(netdev_ethtool);
+ if (ioctl(af_inet_sock, SIOCETHTOOL, &ifr) == 0) {
+ return 0;
+ } else {
+ if (errno != EOPNOTSUPP) {
+ VLOG_WARN_RL(&rl, "ethtool command %s on network device %s "
+ "failed: %s", cmd_name, netdev->name,
+ strerror(errno));
+ } else {
+ /* The device doesn't support this operation. That's pretty
+ * common, so there's no point in logging anything. */
+ }
+ return errno;
+ }
+}
+
+static int
+netdev_linux_do_ioctl(const struct netdev *netdev, struct ifreq *ifr,
+ int cmd, const char *cmd_name)
+{
+ strncpy(ifr->ifr_name, netdev_get_name(netdev), sizeof ifr->ifr_name);
+ if (ioctl(af_inet_sock, cmd, ifr) == -1) {
+ VLOG_DBG_RL(&rl, "%s: ioctl(%s) failed: %s",
+ netdev_get_name(netdev), cmd_name, strerror(errno));
+ return errno;
+ }
+ return 0;
+}
+
+static int
+netdev_linux_get_ipv4(const struct netdev *netdev, struct in_addr *ip,
+ int cmd, const char *cmd_name)
+{
+ struct ifreq ifr;
+ int error;
+
+ ifr.ifr_addr.sa_family = AF_INET;
+ error = netdev_linux_do_ioctl(netdev, &ifr, cmd, cmd_name);
+ if (!error) {
+ const struct sockaddr_in *sin = (struct sockaddr_in *) &ifr.ifr_addr;
+ *ip = sin->sin_addr;
+ }
+ return error;
+}
--- /dev/null
+/*
+ * Copyright (c) 2009 Nicira Networks.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef NETDEV_PROVIDER_H
+#define NETDEV_PROVIDER_H 1
+
+/* Generic interface to network devices. */
+
+#include <assert.h>
+#include "netdev.h"
+#include "list.h"
+
+/* A network device (e.g. an Ethernet device).
+ *
+ * This structure should be treated as opaque by network device
+ * implementations. */
+struct netdev {
+ const struct netdev_class *class;
+ char *name; /* e.g. "eth0" */
+ enum netdev_flags save_flags; /* Initial device flags. */
+ enum netdev_flags changed_flags; /* Flags that we changed. */
+ struct list node; /* Element in global list. */
+};
+
+void netdev_init(struct netdev *, const char *name,
+ const struct netdev_class *);
+static inline void netdev_assert_class(const struct netdev *netdev,
+ const struct netdev_class *class)
+{
+ assert(netdev->class == class);
+}
+
+/* A network device notifier.
+ *
+ * Network device implementations should use netdev_notifier_init() to
+ * initialize this structure, but they may freely read its members after
+ * initialization. */
+struct netdev_notifier {
+ struct netdev *netdev;
+ void (*cb)(struct netdev_notifier *);
+ void *aux;
+};
+void netdev_notifier_init(struct netdev_notifier *, struct netdev *,
+ void (*cb)(struct netdev_notifier *), void *aux);
+
+/* Network device class structure, to be defined by each implementation of a
+ * network device.
+ *
+ * These functions return 0 if successful or a positive errno value on failure,
+ * except where otherwise noted. */
+struct netdev_class {
+ /* Prefix for names of netdevs in this class, e.g. "ndunix:".
+ *
+ * One netdev class may have the empty string "" as its prefix, in which
+ * case that netdev class is associated with netdev names that do not
+ * contain a colon. */
+ const char *prefix;
+
+ /* Class name, for use in error messages. */
+ const char *name;
+
+ /* Called only once, at program startup. Returning an error from this
+ * function will prevent any network device in this class from being
+ * opened.
+ *
+ * This function may be set to null if a network device class needs no
+ * initialization at program startup. */
+ int (*init)(void);
+
+ /* Performs periodic work needed by netdevs of this class. May be null if
+ * no periodic work is necessary. */
+ void (*run)(void);
+
+ /* Arranges for poll_block() to wake up if the "run" member function needs
+ * to be called. May be null if nothing is needed here. */
+ void (*wait)(void);
+
+ /* Attempts to open a network device. On success, sets '*netdevp' to the
+ * new network device. 'name' is the full network device name provided by
+ * the user. This name is useful for error messages but must not be
+ * modified.
+ *
+ * 'suffix' is a copy of 'name' following the netdev's 'prefix'.
+ *
+ * 'ethertype' may be a 16-bit Ethernet protocol value in host byte order
+ * to capture frames of that type received on the device. It may also be
+ * one of the 'enum netdev_pseudo_ethertype' values to receive frames in
+ * one of those categories. */
+ int (*open)(const char *name, char *suffix, int ethertype,
+ struct netdev **netdevp);
+
+ /* Closes 'netdev'. */
+ void (*close)(struct netdev *netdev);
+
+ /* Enumerates the names of all network devices of this class.
+ *
+ * The caller has already initialized 'all_names' and might already have
+ * added some names to it. This function should not disturb any existing
+ * names in 'all_names'.
+ *
+ * If this netdev class does not support enumeration, this may be a null
+ * pointer. */
+ int (*enumerate)(struct svec *all_names);
+
+ /* Attempts to receive a packet from 'netdev' into the 'size' bytes in
+ * 'buffer'. If successful, returns the number of bytes in the received
+ * packet, otherwise a negative errno value. Returns -EAGAIN immediately
+ * if no packet is ready to be received. */
+ int (*recv)(struct netdev *netdev, void *buffer, size_t size);
+
+ /* Registers with the poll loop to wake up from the next call to
+ * poll_block() when a packet is ready to be received with netdev_recv() on
+ * 'netdev'. */
+ void (*recv_wait)(struct netdev *netdev);
+
+ /* Discards all packets waiting to be received from 'netdev'. */
+ int (*drain)(struct netdev *netdev);
+
+ /* Sends the 'size'-byte packet in 'buffer' on 'netdev'. Returns 0 if
+ * successful, otherwise a positive errno value. Returns EAGAIN without
+ * blocking if the packet cannot be queued immediately. Returns EMSGSIZE
+ * if a partial packet was transmitted or if the packet is too big or too
+ * small to transmit on the device.
+ *
+ * The caller retains ownership of 'buffer' in all cases.
+ *
+ * The network device is expected to maintain a packet transmission queue,
+ * so that the caller does not ordinarily have to do additional queuing of
+ * packets. */
+ int (*send)(struct netdev *netdev, const void *buffer, size_t size);
+
+ /* Registers with the poll loop to wake up from the next call to
+ * poll_block() when the packet transmission queue for 'netdev' has
+ * sufficient room to transmit a packet with netdev_send().
+ *
+ * The network device is expected to maintain a packet transmission queue,
+ * so that the caller does not ordinarily have to do additional queuing of
+ * packets. Thus, this function is unlikely to ever be useful. */
+ void (*send_wait)(struct netdev *netdev);
+
+ /* Sets 'netdev''s Ethernet address to 'mac'. */
+ int (*set_etheraddr)(struct netdev *netdev, const uint8_t mac[6]);
+
+ /* Retrieves 'netdev''s Ethernet address into 'mac'. */
+ int (*get_etheraddr)(const struct netdev *netdev, uint8_t mac[6]);
+
+ /* Retrieves 'netdev''s MTU into '*mtup'.
+ *
+ * The MTU is the maximum size of transmitted (and received) packets, in
+ * bytes, not including the hardware header; thus, this is typically 1500
+ * bytes for Ethernet devices. */
+ int (*get_mtu)(const struct netdev *, int *mtup);
+
+ /* Sets 'carrier' to true if carrier is active (link light is on) on
+ * 'netdev'. */
+ int (*get_carrier)(const struct netdev *netdev, bool *carrier);
+
+ /* Retrieves current device stats for 'netdev' into 'stats'.
+ *
+ * If a network device supports some statistics but not others, it should
+ * set the values of the unsupported statistics to all-1-bits
+ * (UINT64_MAX). */
+ int (*get_stats)(const struct netdev *netdev, struct netdev_stats *stats);
+
+ /* Stores the features supported by 'netdev' into each of '*current',
+ * '*advertised', '*supported', and '*peer'. Each value is a bitmap of
+ * "enum ofp_port_features" bits, in host byte order. */
+ int (*get_features)(struct netdev *netdev,
+ uint32_t *current, uint32_t *advertised,
+ uint32_t *supported, uint32_t *peer);
+
+ /* Sets the features advertised by 'netdev' to 'advertise', which is a
+ * bitmap of "enum ofp_port_features" bits, in host byte order.
+ *
+ * This function may be set to null for a network device that does not
+ * support configuring advertisements. */
+ int (*set_advertisements)(struct netdev *, uint32_t advertise);
+
+ /* If 'netdev' is a VLAN network device (e.g. one created with vconfig(8)),
+ * sets '*vlan_vid' to the VLAN VID associated with that device and returns
+ * 0.
+ *
+ * Returns ENOENT if 'netdev' is a network device that is not a VLAN
+ * device.
+ *
+ * This function should be set to null if it doesn't make any sense for
+ * your network device (it probably doesn't). */
+ int (*get_vlan_vid)(const struct netdev *netdev, int *vlan_vid);
+
+ /* Attempts to set input rate limiting (policing) policy, such that up to
+ * 'kbits_rate' kbps of traffic is accepted, with a maximum accumulative
+ * burst size of 'kbits_burst' kb.
+ *
+ * This function may be set to null if policing is not supported. */
+ int (*set_policing)(struct netdev *netdev, unsigned int kbits_rate,
+ unsigned int kbits_burst);
+
+ /* If 'netdev' has an assigned IPv4 address, sets '*address' to that
+ * address and '*netmask' to the associated netmask.
+ *
+ * The following error values have well-defined meanings:
+ *
+ * - EADDRNOTAVAIL: 'netdev' has no assigned IPv4 address.
+ *
+ * - EOPNOTSUPP: No IPv4 network stack attached to 'netdev'.
+ *
+ * This function may be set to null if it would always return EOPNOTSUPP
+ * anyhow. */
+ int (*get_in4)(const struct netdev *netdev, struct in_addr *address,
+ struct in_addr *netmask);
+
+ /* Assigns 'addr' as 'netdev''s IPv4 address and 'mask' as its netmask. If
+ * 'addr' is INADDR_ANY, 'netdev''s IPv4 address is cleared.
+ *
+ * This function may be set to null if it would always return EOPNOTSUPP
+ * anyhow. */
+ int (*set_in4)(struct netdev *, struct in_addr addr, struct in_addr mask);
+
+ /* If 'netdev' has an assigned IPv6 address, sets '*in6' to that address.
+ *
+ * The following error values have well-defined meanings:
+ *
+ * - EADDRNOTAVAIL: 'netdev' has no assigned IPv6 address.
+ *
+ * - EOPNOTSUPP: No IPv6 network stack attached to 'netdev'.
+ *
+ * This function may be set to null if it would always return EOPNOTSUPP
+ * anyhow. */
+ int (*get_in6)(const struct netdev *netdev, struct in6_addr *in6);
+
+ /* Adds 'router' as a default IP gateway for the TCP/IP stack that
+ * corresponds to 'netdev'.
+ *
+ * This function may be set to null if it would always return EOPNOTSUPP
+ * anyhow. */
+ int (*add_router)(struct netdev *netdev, struct in_addr router);
+
+ /* Looks up the next hop for 'host'. If successful, stores the next hop
+ * gateway's address (0 if 'host' is on a directly connected network) in
+ * '*next_hop' and a copy of the name of the device to reach 'host' in
+ * '*netdev_name', and returns 0. The caller is responsible for freeing
+ * '*netdev_name' (by calling free()).
+ *
+ * This function may be set to null if it would always return EOPNOTSUPP
+ * anyhow. */
+ int (*get_next_hop)(const struct in_addr *host, struct in_addr *next_hop,
+ char **netdev_name);
+
+ /* Looks up the ARP table entry for 'ip' on 'netdev' and stores the
+ * corresponding MAC address in 'mac'. A return value of ENXIO, in
+ * particular, indicates that there is no ARP table entry for 'ip' on
+ * 'netdev'.
+ *
+ * This function may be set to null if it would always return EOPNOTSUPP
+ * anyhow. */
+ int (*arp_lookup)(const struct netdev *, uint32_t ip, uint8_t mac[6]);
+
+ /* Retrieves the current set of flags on 'netdev' into '*old_flags'. Then,
+ * turns off the flags that are set to 1 in 'off' and turns on the flags
+ * that are set to 1 in 'on'. (No bit will be set to 1 in both 'off' and
+ * 'on'; that is, off & on == 0.)
+ *
+ * This function may be invoked from a signal handler. Therefore, it
+ * should not do anything that is not signal-safe (such as logging). */
+ int (*update_flags)(struct netdev *netdev, enum netdev_flags off,
+ enum netdev_flags on, enum netdev_flags *old_flags);
+
+ /* Arranges for 'cb' to be called whenever one of the attributes of
+ * 'netdev' changes and sets '*notifierp' to a newly created
+ * netdev_notifier that represents this arrangement. The created notifier
+ * will have its 'netdev', 'cb', and 'aux' members set to the values of the
+ * corresponding parameters. */
+ int (*poll_add)(struct netdev *netdev,
+ void (*cb)(struct netdev_notifier *), void *aux,
+ struct netdev_notifier **notifierp);
+
+ /* Cancels poll notification for 'notifier'. */
+ void (*poll_remove)(struct netdev_notifier *notifier);
+};
+
+extern const struct netdev_class netdev_linux_class;
+extern const struct netdev_class netdev_tap_class;
+
+#endif /* netdev-provider.h */
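The 'prefix' member is what ties a user-supplied device name to a netdev_class. As a hypothetical sketch, assuming names take the form "<prefix>:<suffix>" (for example "tap:vif1") and that a name without a colon goes to the class whose prefix is empty, the lookup could work as follows. struct sketch_class and sketch_find_class() are illustrative stand-ins, not the netdev.c implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for struct netdev_class. */
struct sketch_class {
    const char *prefix;   /* "" matches names that contain no colon. */
    const char *name;
};

static const struct sketch_class sketch_linux = { "", "linux" };
static const struct sketch_class sketch_tap = { "tap", "tap" };
static const struct sketch_class *sketch_classes[] = {
    &sketch_linux, &sketch_tap,
};

/* Returns the class that handles 'name' and sets '*suffix' to the part of
 * 'name' after "<prefix>:" (or all of 'name' for the empty prefix).
 * Returns NULL if no class matches. */
static const struct sketch_class *
sketch_find_class(const char *name, const char **suffix)
{
    const char *colon = strchr(name, ':');
    size_t i;

    for (i = 0; i < sizeof sketch_classes / sizeof *sketch_classes; i++) {
        const struct sketch_class *c = sketch_classes[i];
        size_t len = strlen(c->prefix);

        if (!colon && len == 0) {
            *suffix = name;
            return c;
        }
        if (colon && len > 0 && (size_t) (colon - name) == len
            && !strncmp(name, c->prefix, len)) {
            *suffix = colon + 1;
            return c;
        }
    }
    return NULL;
}
```

Keeping the per-class behavior behind a table of function pointers means netdev.c only needs the class list (netdev_linux_class and netdev_tap_class here) to dispatch every operation.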
#include <assert.h>
#include <errno.h>
-#include <fcntl.h>
-#include <arpa/inet.h>
#include <inttypes.h>
-#include <linux/if_tun.h>
-#include <linux/types.h>
-#include <linux/ethtool.h>
-#include <linux/rtnetlink.h>
-#include <linux/sockios.h>
-#include <linux/version.h>
-#include <sys/types.h>
-#include <sys/ioctl.h>
-#include <sys/socket.h>
-#include <netpacket/packet.h>
-#include <net/ethernet.h>
-#include <net/if.h>
-#include <net/if_arp.h>
-#include <net/if_packet.h>
-#include <net/route.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>
#include "dynamic-string.h"
#include "fatal-signal.h"
#include "list.h"
-#include "netlink.h"
+#include "netdev-provider.h"
#include "ofpbuf.h"
-#include "openflow/openflow.h"
#include "packets.h"
#include "poll-loop.h"
-#include "socket-util.h"
+#include "shash.h"
#include "svec.h"
-/* linux/if.h defines IFF_LOWER_UP, net/if.h doesn't.
- * net/if.h defines if_nameindex(), linux/if.h doesn't.
- * We can't include both headers, so define IFF_LOWER_UP ourselves. */
-#ifndef IFF_LOWER_UP
-#define IFF_LOWER_UP 0x10000
-#endif
-
-/* These were introduced in Linux 2.6.14, so they might be missing if we have
- * old headers. */
-#ifndef ADVERTISED_Pause
-#define ADVERTISED_Pause (1 << 13)
-#endif
-#ifndef ADVERTISED_Asym_Pause
-#define ADVERTISED_Asym_Pause (1 << 14)
-#endif
-
#define THIS_MODULE VLM_netdev
#include "vlog.h"
-struct netdev {
- struct list node;
- char *name;
-
- /* File descriptors. For ordinary network devices, the two fds below are
- * the same; for tap devices, they differ. */
- int netdev_fd; /* Network device. */
- int tap_fd; /* TAP character device, if any, otherwise the
- * network device. */
-
- /* Cached network device information. */
- int ifindex; /* -1 if not known. */
- uint8_t etheraddr[ETH_ADDR_LEN];
- struct in6_addr in6;
- int speed;
- int mtu;
- int txqlen;
- int hwaddr_family;
-
- int save_flags; /* Initial device flags. */
- int changed_flags; /* Flags that we changed. */
-};
-
-/* Policy for RTNLGRP_LINK messages.
- *
- * There are *many* more fields in these messages, but currently we only care
- * about interface names. */
-static const struct nl_policy rtnlgrp_link_policy[] = {
- [IFLA_IFNAME] = { .type = NL_A_STRING, .optional = false },
- [IFLA_STATS] = { .type = NL_A_UNSPEC, .optional = true,
- .min_len = sizeof(struct rtnl_link_stats) },
+static const struct netdev_class *netdev_classes[] = {
+ &netdev_linux_class,
+ &netdev_tap_class,
};
+static int n_netdev_classes = ARRAY_SIZE(netdev_classes);
/* All open network devices. */
static struct list netdev_list = LIST_INITIALIZER(&netdev_list);
-/* An AF_INET socket (used for ioctl operations). */
-static int af_inet_sock = -1;
-
-/* NETLINK_ROUTE socket. */
-static struct nl_sock *rtnl_sock;
-
-/* Can we use RTM_GETLINK to get network device statistics? (In pre-2.6.19
- * kernels, this was only available if wireless extensions were enabled.) */
-static bool use_netlink_stats;
-
/* This is set pretty low because we probably won't learn anything from the
* additional log messages. */
static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
-static void init_netdev(void);
-static int do_open_netdev(const char *name, int ethertype, int tap_fd,
- struct netdev **netdev_);
+static void restore_all_flags(void *aux);
static int restore_flags(struct netdev *netdev);
-static int get_flags(const char *netdev_name, int *flagsp);
-static int set_flags(const char *netdev_name, int flags);
-static int do_get_ifindex(const char *netdev_name);
-static int get_ifindex(const struct netdev *, int *ifindexp);
-static int get_etheraddr(const char *netdev_name, uint8_t ea[ETH_ADDR_LEN],
- int *hwaddr_familyp);
-static int set_etheraddr(const char *netdev_name, int hwaddr_family,
- const uint8_t[ETH_ADDR_LEN]);
-
-/* Obtains the IPv6 address for 'name' into 'in6'. */
-static void
-get_ipv6_address(const char *name, struct in6_addr *in6)
+
+/* Attempts to initialize the netdev module. Returns 0 if successful,
+ * otherwise a positive errno value.
+ *
+ * Calling this function is optional. If not called explicitly, it will
+ * automatically be called upon the first attempt to open a network device. */
+int
+netdev_initialize(void)
{
- FILE *file;
- char line[128];
-
- file = fopen("/proc/net/if_inet6", "r");
- if (file == NULL) {
- /* This most likely indicates that the host doesn't have IPv6 support,
- * so it's not really a failure condition.*/
- *in6 = in6addr_any;
- return;
- }
+ static int status = -1;
+ if (status < 0) {
+ int i, j;
+
+ fatal_signal_add_hook(restore_all_flags, NULL, true);
- while (fgets(line, sizeof line, file)) {
- uint8_t *s6 = in6->s6_addr;
- char ifname[16 + 1];
-
-#define X8 "%2"SCNx8
- if (sscanf(line, " "X8 X8 X8 X8 X8 X8 X8 X8 X8 X8 X8 X8 X8 X8 X8 X8
- "%*x %*x %*x %*x %16s\n",
- &s6[0], &s6[1], &s6[2], &s6[3],
- &s6[4], &s6[5], &s6[6], &s6[7],
- &s6[8], &s6[9], &s6[10], &s6[11],
- &s6[12], &s6[13], &s6[14], &s6[15],
- ifname) == 17
- && !strcmp(name, ifname))
- {
- fclose(file);
- return;
+ status = 0;
+ for (i = j = 0; i < n_netdev_classes; i++) {
+ const struct netdev_class *class = netdev_classes[i];
+ if (class->init) {
+ int retval = class->init();
+ if (!retval) {
+ netdev_classes[j++] = class;
+ } else {
+ VLOG_ERR("failed to initialize %s network device "
+ "class: %s", class->name, strerror(retval));
+ if (!status) {
+ status = retval;
+ }
+ }
+ } else {
+ netdev_classes[j++] = class;
+ }
}
+ n_netdev_classes = j;
}
- *in6 = in6addr_any;
-
- fclose(file);
+ return status;
}
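The new netdev_initialize() does two jobs in one pass. It runs each class's optional `init` hook, and it compacts `netdev_classes` in place so that a class whose `init` fails is never consulted again. It also remembers the first failure as the return value. Below is a minimal standalone sketch of that loop. `struct demo_class` and the three sample classes are hypothetical stand-ins for `struct netdev_class`, and the sketch leaves out the one-time `static int status = -1` guard:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for struct netdev_class. */
struct demo_class {
    const char *name;
    int (*init)(void);          /* Optional; null means nothing to do. */
};

static int init_ok(void) { return 0; }
static int init_fail(void) { return EPERM; }

static const struct demo_class good = { "good", init_ok };
static const struct demo_class bad = { "bad", init_fail };
static const struct demo_class plain = { "plain", NULL };

static const struct demo_class *classes[] = { &good, &bad, &plain };
static int n_classes = sizeof classes / sizeof classes[0];

/* Runs each class's init hook and keeps only the classes that initialized
 * successfully (or have no hook), in their original order.  Returns 0 if
 * every class initialized, otherwise the first positive errno value. */
static int
demo_initialize(void)
{
    int status = 0;
    int i, j;

    for (i = j = 0; i < n_classes; i++) {
        const struct demo_class *class = classes[i];
        int retval = class->init ? class->init() : 0;
        if (!retval) {
            classes[j++] = class;       /* Keep this class. */
        } else {
            fprintf(stderr, "failed to initialize %s: %s\n",
                    class->name, strerror(retval));
            if (!status) {
                status = retval;        /* Report only the first error. */
            }
        }
    }
    n_classes = j;
    return status;
}
```

Because the failed class is dropped, later lookups in netdev_open(), netdev_run(), and netdev_wait() skip it without any extra checks.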
-static int
-do_ethtool(struct netdev *netdev, struct ethtool_cmd *ecmd,
- int cmd, const char *cmd_name)
+/* Performs periodic work needed by all the various kinds of netdevs.
+ *
+ * If your program opens any netdevs, it must call this function within its
+ * main poll loop. */
+void
+netdev_run(void)
{
- struct ifreq ifr;
-
- memset(&ifr, 0, sizeof ifr);
- strncpy(ifr.ifr_name, netdev->name, sizeof ifr.ifr_name);
- ifr.ifr_data = (caddr_t) ecmd;
-
- ecmd->cmd = cmd;
- COVERAGE_INC(netdev_ethtool);
- if (ioctl(netdev->netdev_fd, SIOCETHTOOL, &ifr) == 0) {
- return 0;
- } else {
- if (errno != EOPNOTSUPP) {
- VLOG_WARN_RL(&rl, "ethtool command %s on network device %s "
- "failed: %s", cmd_name, netdev->name,
- strerror(errno));
- } else {
- /* The device doesn't support this operation. That's pretty
- * common, so there's no point in logging anything. */
+ int i;
+ for (i = 0; i < n_netdev_classes; i++) {
+ const struct netdev_class *class = netdev_classes[i];
+ if (class->run) {
+ class->run();
}
- return errno;
}
}
-static int
-do_get_features(struct netdev *netdev,
- uint32_t *current, uint32_t *advertised,
- uint32_t *supported, uint32_t *peer)
+/* Arranges for poll_block() to wake up when netdev_run() needs to be called.
+ *
+ * If your program opens any netdevs, it must call this function within its
+ * main poll loop. */
+void
+netdev_wait(void)
{
- struct ethtool_cmd ecmd;
- int error;
-
- *current = 0;
- *supported = 0;
- *advertised = 0;
- *peer = 0;
-
- memset(&ecmd, 0, sizeof ecmd);
- error = do_ethtool(netdev, &ecmd, ETHTOOL_GSET, "ETHTOOL_GSET");
- if (error) {
- return error;
- }
-
- if (ecmd.supported & SUPPORTED_10baseT_Half) {
- *supported |= OFPPF_10MB_HD;
- }
- if (ecmd.supported & SUPPORTED_10baseT_Full) {
- *supported |= OFPPF_10MB_FD;
- }
- if (ecmd.supported & SUPPORTED_100baseT_Half) {
- *supported |= OFPPF_100MB_HD;
- }
- if (ecmd.supported & SUPPORTED_100baseT_Full) {
- *supported |= OFPPF_100MB_FD;
- }
- if (ecmd.supported & SUPPORTED_1000baseT_Half) {
- *supported |= OFPPF_1GB_HD;
- }
- if (ecmd.supported & SUPPORTED_1000baseT_Full) {
- *supported |= OFPPF_1GB_FD;
- }
- if (ecmd.supported & SUPPORTED_10000baseT_Full) {
- *supported |= OFPPF_10GB_FD;
- }
- if (ecmd.supported & SUPPORTED_TP) {
- *supported |= OFPPF_COPPER;
- }
- if (ecmd.supported & SUPPORTED_FIBRE) {
- *supported |= OFPPF_FIBER;
- }
- if (ecmd.supported & SUPPORTED_Autoneg) {
- *supported |= OFPPF_AUTONEG;
- }
- if (ecmd.supported & SUPPORTED_Pause) {
- *supported |= OFPPF_PAUSE;
- }
- if (ecmd.supported & SUPPORTED_Asym_Pause) {
- *supported |= OFPPF_PAUSE_ASYM;
- }
-
- /* Set the advertised features */
- if (ecmd.advertising & ADVERTISED_10baseT_Half) {
- *advertised |= OFPPF_10MB_HD;
- }
- if (ecmd.advertising & ADVERTISED_10baseT_Full) {
- *advertised |= OFPPF_10MB_FD;
- }
- if (ecmd.advertising & ADVERTISED_100baseT_Half) {
- *advertised |= OFPPF_100MB_HD;
- }
- if (ecmd.advertising & ADVERTISED_100baseT_Full) {
- *advertised |= OFPPF_100MB_FD;
- }
- if (ecmd.advertising & ADVERTISED_1000baseT_Half) {
- *advertised |= OFPPF_1GB_HD;
- }
- if (ecmd.advertising & ADVERTISED_1000baseT_Full) {
- *advertised |= OFPPF_1GB_FD;
- }
- if (ecmd.advertising & ADVERTISED_10000baseT_Full) {
- *advertised |= OFPPF_10GB_FD;
- }
- if (ecmd.advertising & ADVERTISED_TP) {
- *advertised |= OFPPF_COPPER;
- }
- if (ecmd.advertising & ADVERTISED_FIBRE) {
- *advertised |= OFPPF_FIBER;
- }
- if (ecmd.advertising & ADVERTISED_Autoneg) {
- *advertised |= OFPPF_AUTONEG;
- }
- if (ecmd.advertising & ADVERTISED_Pause) {
- *advertised |= OFPPF_PAUSE;
- }
- if (ecmd.advertising & ADVERTISED_Asym_Pause) {
- *advertised |= OFPPF_PAUSE_ASYM;
- }
-
- /* Set the current features */
- if (ecmd.speed == SPEED_10) {
- *current = (ecmd.duplex) ? OFPPF_10MB_FD : OFPPF_10MB_HD;
- }
- else if (ecmd.speed == SPEED_100) {
- *current = (ecmd.duplex) ? OFPPF_100MB_FD : OFPPF_100MB_HD;
- }
- else if (ecmd.speed == SPEED_1000) {
- *current = (ecmd.duplex) ? OFPPF_1GB_FD : OFPPF_1GB_HD;
- }
- else if (ecmd.speed == SPEED_10000) {
- *current = OFPPF_10GB_FD;
- }
-
- if (ecmd.port == PORT_TP) {
- *current |= OFPPF_COPPER;
- }
- else if (ecmd.port == PORT_FIBRE) {
- *current |= OFPPF_FIBER;
- }
-
- if (ecmd.autoneg) {
- *current |= OFPPF_AUTONEG;
+ int i;
+ for (i = 0; i < n_netdev_classes; i++) {
+ const struct netdev_class *class = netdev_classes[i];
+ if (class->wait) {
+ class->wait();
+ }
}
- return 0;
}
/* Opens the network device named 'name' (e.g. "eth0") and returns zero if
* the 'enum netdev_pseudo_ethertype' values to receive frames in one of those
* categories. */
int
-netdev_open(const char *name, int ethertype, struct netdev **netdevp)
-{
- if (!strncmp(name, "tap:", 4)) {
- return netdev_open_tap(name + 4, netdevp);
- } else {
- return do_open_netdev(name, ethertype, -1, netdevp);
- }
-}
-
-/* Opens a TAP virtual network device. If 'name' is a nonnull, non-empty
- * string, attempts to assign that name to the TAP device (failing if the name
- * is already in use); otherwise, a name is automatically assigned. Returns
- * zero if successful, otherwise a positive errno value. On success, sets
- * '*netdevp' to the new network device, otherwise to null. */
-int
-netdev_open_tap(const char *name, struct netdev **netdevp)
-{
- static const char tap_dev[] = "/dev/net/tun";
- struct ifreq ifr;
- int error;
- int tap_fd;
-
- tap_fd = open(tap_dev, O_RDWR);
- if (tap_fd < 0) {
- ovs_error(errno, "opening \"%s\" failed", tap_dev);
- return errno;
- }
-
- memset(&ifr, 0, sizeof ifr);
- ifr.ifr_flags = IFF_TAP | IFF_NO_PI;
- if (name) {
- strncpy(ifr.ifr_name, name, sizeof ifr.ifr_name);
- }
- if (ioctl(tap_fd, TUNSETIFF, &ifr) < 0) {
- int error = errno;
- ovs_error(error, "ioctl(TUNSETIFF) on \"%s\" failed", tap_dev);
- close(tap_fd);
- return error;
- }
-
- error = set_nonblocking(tap_fd);
- if (error) {
- ovs_error(error, "set_nonblocking on \"%s\" failed", tap_dev);
- close(tap_fd);
- return error;
- }
-
- error = do_open_netdev(ifr.ifr_name, NETDEV_ETH_TYPE_NONE, tap_fd,
- netdevp);
- if (error) {
- close(tap_fd);
- }
- return error;
-}
-
-static int
-do_open_netdev(const char *name, int ethertype, int tap_fd,
- struct netdev **netdev_)
+netdev_open(const char *name_, int ethertype, struct netdev **netdevp)
{
- int netdev_fd;
- struct sockaddr_ll sll;
- struct ifreq ifr;
- int ifindex = -1;
- uint8_t etheraddr[ETH_ADDR_LEN];
- struct in6_addr in6;
- int mtu;
- int txqlen;
- int hwaddr_family;
+ char *name = xstrdup(name_);
+ char *prefix, *suffix, *colon;
+ struct netdev *netdev = NULL;
int error;
- struct netdev *netdev;
+ int i;
- init_netdev();
- *netdev_ = NULL;
- COVERAGE_INC(netdev_open);
-
- /* Create raw socket. */
- netdev_fd = socket(PF_PACKET, SOCK_RAW,
- htons(ethertype == NETDEV_ETH_TYPE_NONE ? 0
- : ethertype == NETDEV_ETH_TYPE_ANY ? ETH_P_ALL
- : ethertype == NETDEV_ETH_TYPE_802_2 ? ETH_P_802_2
- : ethertype));
- if (netdev_fd < 0) {
- return errno;
+ netdev_initialize();
+ colon = strchr(name, ':');
+ if (colon) {
+ *colon = '\0';
+ prefix = name;
+ suffix = colon + 1;
+ } else {
+ prefix = "";
+ suffix = name;
}
- if (ethertype != NETDEV_ETH_TYPE_NONE) {
- /* Set non-blocking mode. */
- error = set_nonblocking(netdev_fd);
- if (error) {
- goto error_already_set;
- }
-
- /* Get ethernet device index. */
- ifindex = do_get_ifindex(name);
- if (ifindex < 0) {
- return -ifindex;
- }
-
- /* Bind to specific ethernet device. */
- memset(&sll, 0, sizeof sll);
- sll.sll_family = AF_PACKET;
- sll.sll_ifindex = ifindex;
- if (bind(netdev_fd, (struct sockaddr *) &sll, sizeof sll) < 0) {
- VLOG_ERR("bind to %s failed: %s", name, strerror(errno));
- goto error;
- }
-
- /* Between the socket() and bind() calls above, the socket receives all
- * packets of the requested type on all system interfaces. We do not
- * want to receive that data, but there is no way to avoid it. So we
- * must now drain out the receive queue. */
- error = drain_rcvbuf(netdev_fd);
- if (error) {
- goto error_already_set;
+ for (i = 0; i < n_netdev_classes; i++) {
+ const struct netdev_class *class = netdev_classes[i];
+ if (!strcmp(prefix, class->prefix)) {
+ error = class->open(name_, suffix, ethertype, &netdev);
+ goto exit;
}
}
+ error = EAFNOSUPPORT;
- /* Get MAC address. */
- error = get_etheraddr(name, etheraddr, &hwaddr_family);
- if (error) {
- goto error_already_set;
- }
-
- /* Get MTU. */
- strncpy(ifr.ifr_name, name, sizeof ifr.ifr_name);
- if (ioctl(netdev_fd, SIOCGIFMTU, &ifr) < 0) {
- VLOG_ERR("ioctl(SIOCGIFMTU) on %s device failed: %s",
- name, strerror(errno));
- goto error;
- }
- mtu = ifr.ifr_mtu;
-
- /* Get TX queue length. */
- if (ioctl(netdev_fd, SIOCGIFTXQLEN, &ifr) < 0) {
- VLOG_ERR("ioctl(SIOCGIFTXQLEN) on %s device failed: %s",
- name, strerror(errno));
- goto error;
- }
- txqlen = ifr.ifr_qlen;
-
- get_ipv6_address(name, &in6);
-
- /* Allocate network device. */
- netdev = xmalloc(sizeof *netdev);
- netdev->name = xstrdup(name);
- netdev->ifindex = ifindex;
- netdev->txqlen = txqlen;
- netdev->hwaddr_family = hwaddr_family;
- netdev->netdev_fd = netdev_fd;
- netdev->tap_fd = tap_fd < 0 ? netdev_fd : tap_fd;
- memcpy(netdev->etheraddr, etheraddr, sizeof etheraddr);
- netdev->mtu = mtu;
- netdev->in6 = in6;
-
- /* Save flags to restore at close or exit. */
- error = get_flags(netdev->name, &netdev->save_flags);
- if (error) {
- goto error_already_set;
- }
- netdev->changed_flags = 0;
- fatal_signal_block();
- list_push_back(&netdev_list, &netdev->node);
- fatal_signal_unblock();
-
- /* Success! */
- *netdev_ = netdev;
- return 0;
-
-error:
- error = errno;
-error_already_set:
- close(netdev_fd);
- if (tap_fd >= 0) {
- close(tap_fd);
- }
+exit:
+ *netdevp = error ? NULL : netdev;
+ free(name);
return error;
}
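netdev_open() splits the device name at its first colon. The part before the colon selects a class by comparing it against each class's `prefix`, and the rest is passed to that class's `open` as the suffix. A name with no colon is matched against the class whose prefix is the empty string. The sketch below isolates that split-and-match step. The prefix table ("" and "tap") is an assumption modeled on the old `"tap:"` handling, because the real prefixes of netdev_linux_class and netdev_tap_class are not shown in this diff. The function name `demo_parse_name` is hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical prefix table; the real prefixes live in each class. */
static const char *const prefixes[] = { "", "tap" };
#define N_PREFIXES (sizeof prefixes / sizeof prefixes[0])

/* Splits 'name_' at its first colon and looks up the prefix.  On success,
 * returns 0, stores the matching table index in '*classp', and stores a
 * malloc()'d copy of the suffix in '*suffixp', which the caller must free.
 * Otherwise returns EAFNOSUPPORT (unknown prefix) or ENOMEM, with
 * '*suffixp' set to null. */
static int
demo_parse_name(const char *name_, size_t *classp, char **suffixp)
{
    size_t len = strlen(name_);
    char *name = malloc(len + 1);
    const char *prefix, *suffix;
    char *colon;
    size_t i;

    *suffixp = NULL;
    if (!name) {
        return ENOMEM;
    }
    memcpy(name, name_, len + 1);

    colon = strchr(name, ':');
    if (colon) {
        *colon = '\0';          /* Terminate the prefix in place. */
        prefix = name;
        suffix = colon + 1;
    } else {
        prefix = "";            /* No colon: use the default class. */
        suffix = name;
    }

    for (i = 0; i < N_PREFIXES; i++) {
        if (!strcmp(prefix, prefixes[i])) {
            size_t n = strlen(suffix) + 1;
            *suffixp = malloc(n);
            if (!*suffixp) {
                free(name);
                return ENOMEM;
            }
            memcpy(*suffixp, suffix, n);
            *classp = i;
            free(name);
            return 0;
        }
    }
    free(name);
    return EAFNOSUPPORT;        /* Same error netdev_open() reports. */
}
```

With this table, "eth0" selects index 0 with suffix "eth0", "tap:vif1" selects index 1 with suffix "vif1", and "gre:x" fails with EAFNOSUPPORT.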
netdev_close(struct netdev *netdev)
{
if (netdev) {
- /* Bring down interface and drop promiscuous mode, if we brought up
- * the interface or enabled promiscuous mode. */
+ char *name;
int error;
+
+ /* Restore flags that we changed, if any. */
fatal_signal_block();
error = restore_flags(netdev);
list_remove(&netdev->node);
}
/* Free. */
- free(netdev->name);
- close(netdev->netdev_fd);
- if (netdev->netdev_fd != netdev->tap_fd) {
- close(netdev->tap_fd);
+ name = netdev->name;
+ netdev->class->close(netdev);
+ free(name);
+ }
+}
+
+/* Returns true if a network device named 'name' exists and may be opened,
+ * otherwise false. */
+bool
+netdev_exists(const char *name)
+{
+ struct netdev *netdev;
+ int error;
+
+ error = netdev_open(name, NETDEV_ETH_TYPE_NONE, &netdev);
+ if (!error) {
+ netdev_close(netdev);
+ return true;
+ } else {
+ if (error != ENODEV) {
+ VLOG_WARN("failed to open network device %s: %s",
+ name, strerror(error));
}
- free(netdev);
+ return false;
}
}
-/* Pads 'buffer' out with zero-bytes to the minimum valid length of an
- * Ethernet packet, if necessary. */
-static void
-pad_to_minimum_length(struct ofpbuf *buffer)
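+/* Initializes 'svec' with a list of the names of all known network devices.
+ * Returns 0 if successful, otherwise the first positive errno value reported
+ * by a network device class; 'svec' still holds the names that were found. */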
+int
+netdev_enumerate(struct svec *svec)
{
- if (buffer->size < ETH_TOTAL_MIN) {
- ofpbuf_put_zeros(buffer, ETH_TOTAL_MIN - buffer->size);
+ int error;
+ int i;
+
+ svec_init(svec);
+
+ netdev_initialize();
+
+ error = 0;
+ for (i = 0; i < n_netdev_classes; i++) {
+ const struct netdev_class *class = netdev_classes[i];
+ if (class->enumerate) {
+ int retval = class->enumerate(svec);
+ if (retval) {
+ VLOG_WARN("failed to enumerate %s network devices: %s",
+ class->name, strerror(retval));
+ if (!error) {
+ error = retval;
+ }
+ }
+ }
}
+ return error;
}
/* Attempts to receive a packet from 'netdev' into 'buffer', which the caller
int
netdev_recv(struct netdev *netdev, struct ofpbuf *buffer)
{
- ssize_t n_bytes;
+ int retval;
assert(buffer->size == 0);
assert(ofpbuf_tailroom(buffer) >= ETH_TOTAL_MIN);
- do {
- n_bytes = read(netdev->tap_fd,
- ofpbuf_tail(buffer), ofpbuf_tailroom(buffer));
- } while (n_bytes < 0 && errno == EINTR);
- if (n_bytes < 0) {
- if (errno != EAGAIN) {
- VLOG_WARN_RL(&rl, "error receiving Ethernet packet on %s: %s",
- netdev->name, strerror(errno));
- }
- return errno;
- } else {
+
+ retval = netdev->class->recv(netdev,
+ buffer->data, ofpbuf_tailroom(buffer));
+ if (retval >= 0) {
COVERAGE_INC(netdev_received);
- buffer->size += n_bytes;
-
- /* When the kernel internally sends out an Ethernet frame on an
- * interface, it gives us a copy *before* padding the frame to the
- * minimum length. Thus, when it sends out something like an ARP
- * request, we see a too-short frame. So pad it out to the minimum
- * length. */
- pad_to_minimum_length(buffer);
+ buffer->size += retval;
+ if (buffer->size < ETH_TOTAL_MIN) {
+ ofpbuf_put_zeros(buffer, ETH_TOTAL_MIN - buffer->size);
+ }
return 0;
+ } else {
+ return -retval;
}
}
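netdev_recv() still zero-pads short frames up to ETH_TOTAL_MIN. The comment removed from the old code gives the reason: the kernel passes back copies of locally originated frames, such as a 42-byte ARP request (a 14-byte header plus a 28-byte ARP body), before it pads them. The sketch below shows the padding step on a plain byte array. It assumes ETH_TOTAL_MIN is 60, which is the 14-byte header plus the 46-byte minimum payload without the FCS. `struct demo_buf` is a hypothetical stand-in for `struct ofpbuf`:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_ETH_TOTAL_MIN 60   /* Assumed: 14-byte header + 46-byte payload. */

/* Hypothetical fixed-capacity buffer standing in for struct ofpbuf. */
struct demo_buf {
    unsigned char data[1518];
    size_t size;
};

/* Zero-fills 'b' up to the minimum Ethernet frame length if it is shorter;
 * longer frames are left untouched. */
static void
demo_pad_to_minimum(struct demo_buf *b)
{
    if (b->size < DEMO_ETH_TOTAL_MIN) {
        memset(b->data + b->size, 0, DEMO_ETH_TOTAL_MIN - b->size);
        b->size = DEMO_ETH_TOTAL_MIN;
    }
}
```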
void
netdev_recv_wait(struct netdev *netdev)
{
- poll_fd_wait(netdev->tap_fd, POLLIN);
+ netdev->class->recv_wait(netdev);
}
/* Discards all packets waiting to be received from 'netdev'. */
int
netdev_drain(struct netdev *netdev)
{
- if (netdev->tap_fd != netdev->netdev_fd) {
- drain_fd(netdev->tap_fd, netdev->txqlen);
- return 0;
- } else {
- return drain_rcvbuf(netdev->netdev_fd);
- }
+ return netdev->class->drain(netdev);
}
/* Sends 'buffer' on 'netdev'. Returns 0 if successful, otherwise a positive
int
netdev_send(struct netdev *netdev, const struct ofpbuf *buffer)
{
- ssize_t n_bytes;
-
- do {
- n_bytes = write(netdev->tap_fd, buffer->data, buffer->size);
- } while (n_bytes < 0 && errno == EINTR);
-
- if (n_bytes < 0) {
- /* The Linux AF_PACKET implementation never blocks waiting for room
- * for packets, instead returning ENOBUFS. Translate this into EAGAIN
- * for the caller. */
- if (errno == ENOBUFS) {
- return EAGAIN;
- } else if (errno != EAGAIN) {
- VLOG_WARN_RL(&rl, "error sending Ethernet packet on %s: %s",
- netdev->name, strerror(errno));
- }
- return errno;
- } else if (n_bytes != buffer->size) {
- VLOG_WARN_RL(&rl,
- "send partial Ethernet packet (%d bytes of %zu) on %s",
- (int) n_bytes, buffer->size, netdev->name);
- return EMSGSIZE;
- } else {
+ int error = netdev->class->send(netdev, buffer->data, buffer->size);
+ if (!error) {
COVERAGE_INC(netdev_sent);
- return 0;
}
+ return error;
}
/* Registers with the poll loop to wake up from the next call to poll_block()
void
netdev_send_wait(struct netdev *netdev)
{
- if (netdev->tap_fd == netdev->netdev_fd) {
- poll_fd_wait(netdev->tap_fd, POLLOUT);
- } else {
- /* TAP device always accepts packets.*/
- poll_immediate_wake();
- }
+ netdev->class->send_wait(netdev);
}
/* Attempts to set 'netdev''s MAC address to 'mac'. Returns 0 if successful,
int
netdev_set_etheraddr(struct netdev *netdev, const uint8_t mac[ETH_ADDR_LEN])
{
- int error = set_etheraddr(netdev->name, netdev->hwaddr_family, mac);
- if (!error) {
- memcpy(netdev->etheraddr, mac, ETH_ADDR_LEN);
- }
- return error;
+ return netdev->class->set_etheraddr(netdev, mac);
}
+/* Retrieves 'netdev''s MAC address. If successful, returns 0 and copies
+ * the MAC address into 'mac'. On failure, returns a positive errno value and
+ * clears 'mac' to all-zeros. */
int
-netdev_nodev_set_etheraddr(const char *name, const uint8_t mac[ETH_ADDR_LEN])
+netdev_get_etheraddr(const struct netdev *netdev, uint8_t mac[ETH_ADDR_LEN])
{
- init_netdev();
- return set_etheraddr(name, ARPHRD_ETHER, mac);
-}
-
-/* Returns a pointer to 'netdev''s MAC address. The caller must not modify or
- * free the returned buffer. */
-const uint8_t *
-netdev_get_etheraddr(const struct netdev *netdev)
-{
- return netdev->etheraddr;
+ return netdev->class->get_etheraddr(netdev, mac);
}
/* Returns the name of the network device that 'netdev' represents,
return netdev->name;
}
-/* Returns the maximum size of transmitted (and received) packets on 'netdev',
- * in bytes, not including the hardware header; thus, this is typically 1500
- * bytes for Ethernet devices. */
+/* Retrieves the MTU of 'netdev'. The MTU is the maximum size of transmitted
+ * (and received) packets, in bytes, not including the hardware header; thus,
+ * this is typically 1500 bytes for Ethernet devices.
+ *
+ * If successful, returns 0 and stores the MTU size in '*mtup'. On failure,
+ * returns a positive errno value and stores ETH_PAYLOAD_MAX (1500) in
+ * '*mtup'. */
int
-netdev_get_mtu(const struct netdev *netdev)
+netdev_get_mtu(const struct netdev *netdev, int *mtup)
{
- return netdev->mtu;
+ int error = netdev->class->get_mtu(netdev, mtup);
+ if (error) {
+ VLOG_WARN_RL(&rl, "failed to retrieve MTU for network device %s: %s",
+ netdev_get_name(netdev), strerror(error));
+ *mtup = ETH_PAYLOAD_MAX;
+ }
+ return error;
}
/* Stores the features supported by 'netdev' into each of '*current',
uint32_t *supported, uint32_t *peer)
{
uint32_t dummy[4];
- return do_get_features(netdev,
- current ? current : &dummy[0],
- advertised ? advertised : &dummy[1],
- supported ? supported : &dummy[2],
- peer ? peer : &dummy[3]);
+ return netdev->class->get_features(netdev,
+ current ? current : &dummy[0],
+ advertised ? advertised : &dummy[1],
+ supported ? supported : &dummy[2],
+ peer ? peer : &dummy[3]);
}
+/* Sets the features advertised by 'netdev' to 'advertise'. Returns 0 if
+ * successful, otherwise a positive errno value. */
int
netdev_set_advertisements(struct netdev *netdev, uint32_t advertise)
{
- struct ethtool_cmd ecmd;
- int error;
-
- memset(&ecmd, 0, sizeof ecmd);
- error = do_ethtool(netdev, &ecmd, ETHTOOL_GSET, "ETHTOOL_GSET");
- if (error) {
- return error;
- }
-
- ecmd.advertising = 0;
- if (advertise & OFPPF_10MB_HD) {
- ecmd.advertising |= ADVERTISED_10baseT_Half;
- }
- if (advertise & OFPPF_10MB_FD) {
- ecmd.advertising |= ADVERTISED_10baseT_Full;
- }
- if (advertise & OFPPF_100MB_HD) {
- ecmd.advertising |= ADVERTISED_100baseT_Half;
- }
- if (advertise & OFPPF_100MB_FD) {
- ecmd.advertising |= ADVERTISED_100baseT_Full;
- }
- if (advertise & OFPPF_1GB_HD) {
- ecmd.advertising |= ADVERTISED_1000baseT_Half;
- }
- if (advertise & OFPPF_1GB_FD) {
- ecmd.advertising |= ADVERTISED_1000baseT_Full;
- }
- if (advertise & OFPPF_10GB_FD) {
- ecmd.advertising |= ADVERTISED_10000baseT_Full;
- }
- if (advertise & OFPPF_COPPER) {
- ecmd.advertising |= ADVERTISED_TP;
- }
- if (advertise & OFPPF_FIBER) {
- ecmd.advertising |= ADVERTISED_FIBRE;
- }
- if (advertise & OFPPF_AUTONEG) {
- ecmd.advertising |= ADVERTISED_Autoneg;
- }
- if (advertise & OFPPF_PAUSE) {
- ecmd.advertising |= ADVERTISED_Pause;
- }
- if (advertise & OFPPF_PAUSE_ASYM) {
- ecmd.advertising |= ADVERTISED_Asym_Pause;
- }
- return do_ethtool(netdev, &ecmd, ETHTOOL_SSET, "ETHTOOL_SSET");
-}
-
-/* If 'netdev' has an assigned IPv4 address, sets '*in4' to that address
- * and '*mask' to the netmask (if they are non-null) and returns true.
- * Otherwise, returns false. */
-bool
-netdev_nodev_get_in4(const char *netdev_name, struct in_addr *in4,
- struct in_addr *mask)
-{
- struct ifreq ifr;
- struct in_addr ip = { INADDR_ANY };
-
- init_netdev();
-
- strncpy(ifr.ifr_name, netdev_name, sizeof ifr.ifr_name);
- ifr.ifr_addr.sa_family = AF_INET;
- COVERAGE_INC(netdev_get_in4);
- if (ioctl(af_inet_sock, SIOCGIFADDR, &ifr) == 0) {
- struct sockaddr_in *sin = (struct sockaddr_in *) &ifr.ifr_addr;
- ip = sin->sin_addr;
- } else {
- VLOG_DBG_RL(&rl, "%s: ioctl(SIOCGIFADDR) failed: %s",
- netdev_name, strerror(errno));
- }
- if (in4) {
- *in4 = ip;
- }
-
- if (mask) {
- if (ioctl(af_inet_sock, SIOCGIFNETMASK, &ifr) == 0) {
- struct sockaddr_in *sin = (struct sockaddr_in *) &ifr.ifr_addr;
- *mask = sin->sin_addr;
- } else {
- VLOG_DBG_RL(&rl, "%s: ioctl(SIOCGIFNETMASK) failed: %s",
- netdev_name, strerror(errno));
- }
- }
-
- return ip.s_addr != INADDR_ANY;
+ return (netdev->class->set_advertisements
+ ? netdev->class->set_advertisements(netdev, advertise)
+ : EOPNOTSUPP);
}
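netdev_set_advertisements() and several of the wrappers after it treat a null member of `struct netdev_class` as "not supported". In that case they return EOPNOTSUPP rather than call through a null pointer. The sketch below shows the convention with a hypothetical one-hook class. `struct demo_class`, `struct demo_netdev`, and the `advertised` field are illustrative only and are not taken from netdev-provider.h:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct demo_netdev;

/* Hypothetical provider interface; 'set_advertisements' is optional. */
struct demo_class {
    int (*set_advertisements)(struct demo_netdev *, uint32_t advertise);
};

struct demo_netdev {
    const struct demo_class *class;
    uint32_t advertised;        /* Last value a provider accepted. */
};

static int
demo_set_adv_impl(struct demo_netdev *netdev, uint32_t advertise)
{
    netdev->advertised = advertise;
    return 0;
}

/* Dispatches to the provider, or reports EOPNOTSUPP if it has no hook. */
static int
demo_set_advertisements(struct demo_netdev *netdev, uint32_t advertise)
{
    return (netdev->class->set_advertisements
            ? netdev->class->set_advertisements(netdev, advertise)
            : EOPNOTSUPP);
}

static const struct demo_class full_class = { demo_set_adv_impl };
static const struct demo_class bare_class = { NULL };
```

A provider such as a TAP device can leave hooks it cannot implement set to null, and callers see a single well-defined error in that case.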
-bool
-netdev_get_in4(const struct netdev *netdev, struct in_addr *in4, struct
- in_addr *mask)
-{
- return netdev_nodev_get_in4(netdev->name, in4, mask);
-}
-
-static void
-make_in4_sockaddr(struct sockaddr *sa, struct in_addr addr)
-{
- struct sockaddr_in sin;
- memset(&sin, 0, sizeof sin);
- sin.sin_family = AF_INET;
- sin.sin_addr = addr;
- sin.sin_port = 0;
-
- memset(sa, 0, sizeof *sa);
- memcpy(sa, &sin, sizeof sin);
-}
-
-static int
-do_set_addr(struct netdev *netdev, int sock,
- int ioctl_nr, const char *ioctl_name, struct in_addr addr)
+/* If 'netdev' has an assigned IPv4 address, sets '*address' to that address
+ * and '*netmask' to its netmask and returns 0. Otherwise, returns a positive
+ * errno value and sets '*address' and '*netmask' to 0 (INADDR_ANY).
+ *
+ * The following error values have well-defined meanings:
+ *
+ * - EADDRNOTAVAIL: 'netdev' has no assigned IPv4 address.
+ *
+ * - EOPNOTSUPP: No IPv4 network stack attached to 'netdev'.
+ *
+ * 'address' or 'netmask' or both may be null, in which case the address or
+ * netmask is not reported. */
+int
+netdev_get_in4(const struct netdev *netdev,
+ struct in_addr *address_, struct in_addr *netmask_)
{
- struct ifreq ifr;
+ struct in_addr address;
+ struct in_addr netmask;
int error;
- strncpy(ifr.ifr_name, netdev->name, sizeof ifr.ifr_name);
- make_in4_sockaddr(&ifr.ifr_addr, addr);
- COVERAGE_INC(netdev_set_in4);
- error = ioctl(sock, ioctl_nr, &ifr) < 0 ? errno : 0;
- if (error) {
- VLOG_WARN("ioctl(%s): %s", ioctl_name, strerror(error));
+ error = (netdev->class->get_in4
+ ? netdev->class->get_in4(netdev, &address, &netmask)
+ : EOPNOTSUPP);
+ if (address_) {
+ address_->s_addr = error ? 0 : address.s_addr;
+ }
+ if (netmask_) {
+ netmask_->s_addr = error ? 0 : netmask.s_addr;
}
return error;
}
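netdev_get_in4() always passes locals to the provider hook, so the hook never receives a null pointer. It then copies the results out only to the outputs the caller supplied, and zeroes those outputs whenever the call fails. The sketch below isolates that output handling. The function-pointer parameter and the two sample hooks (`demo_has_addr`, `demo_no_addr`) are hypothetical stand-ins for `netdev->class->get_in4`:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical provider hook type; fills both outputs on success. */
typedef int demo_get_in4_func(struct in_addr *address,
                              struct in_addr *netmask);

static int
demo_has_addr(struct in_addr *a, struct in_addr *m)
{
    a->s_addr = htonl(0xc0a80001);      /* 192.168.0.1 */
    m->s_addr = htonl(0xffffff00);      /* 255.255.255.0 */
    return 0;
}

static int
demo_no_addr(struct in_addr *a, struct in_addr *m)
{
    (void) a;
    (void) m;
    return EADDRNOTAVAIL;
}

/* Either output may be null.  On error, any non-null output is zeroed. */
static int
demo_get_in4(demo_get_in4_func *hook,
             struct in_addr *address_, struct in_addr *netmask_)
{
    struct in_addr address = { 0 }, netmask = { 0 };  /* Hook targets. */
    int error = hook ? hook(&address, &netmask) : EOPNOTSUPP;

    if (address_) {
        address_->s_addr = error ? 0 : address.s_addr;
    }
    if (netmask_) {
        netmask_->s_addr = error ? 0 : netmask.s_addr;
    }
    return error;
}
```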
int
netdev_set_in4(struct netdev *netdev, struct in_addr addr, struct in_addr mask)
{
- int error;
-
- error = do_set_addr(netdev, af_inet_sock,
- SIOCSIFADDR, "SIOCSIFADDR", addr);
- if (!error && addr.s_addr != INADDR_ANY) {
- error = do_set_addr(netdev, af_inet_sock,
- SIOCSIFNETMASK, "SIOCSIFNETMASK", mask);
- }
- return error;
+ return (netdev->class->set_in4
+ ? netdev->class->set_in4(netdev, addr, mask)
+ : EOPNOTSUPP);
}
-/* Adds 'router' as a default IP gateway. */
+/* Adds 'router' as a default IP gateway for the TCP/IP stack that corresponds
+ * to 'netdev'. */
int
-netdev_add_router(struct in_addr router)
+netdev_add_router(struct netdev *netdev, struct in_addr router)
{
- struct in_addr any = { INADDR_ANY };
- struct rtentry rt;
- int error;
-
- memset(&rt, 0, sizeof rt);
- make_in4_sockaddr(&rt.rt_dst, any);
- make_in4_sockaddr(&rt.rt_gateway, router);
- make_in4_sockaddr(&rt.rt_genmask, any);
- rt.rt_flags = RTF_UP | RTF_GATEWAY;
COVERAGE_INC(netdev_add_router);
- error = ioctl(af_inet_sock, SIOCADDRT, &rt) < 0 ? errno : 0;
+ return (netdev->class->add_router
+ ? netdev->class->add_router(netdev, router)
+ : EOPNOTSUPP);
+}
+
+/* Looks up the next hop for 'host' for the TCP/IP stack that corresponds to
+ * 'netdev'. If a route cannot be determined, sets '*next_hop' to 0,
+ * '*netdev_name' to null, and returns a positive errno value. Otherwise, if a
+ * next hop is found, stores the next hop gateway's address (0 if 'host' is on
+ * a directly connected network) in '*next_hop' and a copy of the name of the
+ * device to reach 'host' in '*netdev_name', and returns 0. The caller is
+ * responsible for freeing '*netdev_name' (by calling free()). */
+int
+netdev_get_next_hop(const struct netdev *netdev,
+ const struct in_addr *host, struct in_addr *next_hop,
+ char **netdev_name)
+{
+ int error = (netdev->class->get_next_hop
+ ? netdev->class->get_next_hop(host, next_hop, netdev_name)
+ : EOPNOTSUPP);
if (error) {
- VLOG_WARN("ioctl(SIOCADDRT): %s", strerror(error));
+ next_hop->s_addr = 0;
+ *netdev_name = NULL;
}
return error;
}
-/* If 'netdev' has an assigned IPv6 address, sets '*in6' to that address (if
- * 'in6' is non-null) and returns true. Otherwise, returns false. */
-bool
-netdev_get_in6(const struct netdev *netdev, struct in6_addr *in6)
-{
- if (in6) {
- *in6 = netdev->in6;
- }
- return memcmp(&netdev->in6, &in6addr_any, sizeof netdev->in6) != 0;
-}
-
-/* Obtains the current flags for 'netdev' and stores them into '*flagsp'.
- * Returns 0 if successful, otherwise a positive errno value. On failure,
- * stores 0 into '*flagsp'. */
+/* If 'netdev' has an assigned IPv6 address, sets '*in6' to that address and
+ * returns 0. Otherwise, returns a positive errno value and sets '*in6' to
+ * all-zero-bits (in6addr_any).
+ *
+ * The following error values have well-defined meanings:
+ *
+ * - EADDRNOTAVAIL: 'netdev' has no assigned IPv6 address.
+ *
+ * - EOPNOTSUPP: No IPv6 network stack attached to 'netdev'.
+ *
+ * 'in6' may be null, in which case the address itself is not reported. */
int
-netdev_get_flags(const struct netdev *netdev, enum netdev_flags *flagsp)
+netdev_get_in6(const struct netdev *netdev, struct in6_addr *in6)
{
- return netdev_nodev_get_flags(netdev->name, flagsp);
-}
+ struct in6_addr dummy;
+ int error;
-static int
-nd_to_iff_flags(enum netdev_flags nd)
-{
- int iff = 0;
- if (nd & NETDEV_UP) {
- iff |= IFF_UP;
+ error = (netdev->class->get_in6
+ ? netdev->class->get_in6(netdev, in6 ? in6 : &dummy)
+ : EOPNOTSUPP);
+ if (error && in6) {
+ memset(in6, 0, sizeof *in6);
}
- if (nd & NETDEV_PROMISC) {
- iff |= IFF_PROMISC;
- }
- return iff;
+ return error;
}
/* On 'netdev', turns off the flags in 'off' and then turns on the flags in
* successful, otherwise a positive errno value. */
static int
do_update_flags(struct netdev *netdev, enum netdev_flags off,
- enum netdev_flags on, bool permanent)
+ enum netdev_flags on, enum netdev_flags *old_flagsp,
+ bool permanent)
{
- int old_flags, new_flags;
+ enum netdev_flags old_flags;
int error;
- error = get_flags(netdev->name, &old_flags);
+ error = netdev->class->update_flags(netdev, off & ~on, on, &old_flags);
if (error) {
- return error;
- }
-
- new_flags = (old_flags & ~nd_to_iff_flags(off)) | nd_to_iff_flags(on);
- if (!permanent) {
- netdev->changed_flags |= new_flags ^ old_flags;
+ VLOG_WARN_RL(&rl, "failed to %s flags for network device %s: %s",
+ off || on ? "set" : "get", netdev_get_name(netdev),
+ strerror(error));
+ old_flags = 0;
+ } else if ((off || on) && !permanent) {
+ enum netdev_flags new_flags = (old_flags & ~off) | on;
+ enum netdev_flags changed_flags = old_flags ^ new_flags;
+ if (changed_flags) {
+ if (!netdev->changed_flags) {
+ netdev->save_flags = old_flags;
+ }
+ netdev->changed_flags |= changed_flags;
+ }
}
- if (new_flags != old_flags) {
- error = set_flags(netdev->name, new_flags);
+ if (old_flagsp) {
+ *old_flagsp = old_flags;
}
return error;
}
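For non-permanent changes, do_update_flags() records two things: which flag bits it changed, and the full flag value before the first such change (`save_flags`). This lets the flags be reverted later. The bookkeeping can be checked in isolation. The sketch assumes hypothetical DEMO_UP/DEMO_PROMISC bits, because the real `enum netdev_flags` values are not shown in this diff. `demo_restore_flags` is also a hypothetical restore step: it puts back the saved value of every changed bit, since restore_flags() itself is not shown here:

```c
#include <assert.h>

enum demo_flags {
    DEMO_UP = 1 << 0,           /* Hypothetical stand-in for NETDEV_UP. */
    DEMO_PROMISC = 1 << 1       /* Hypothetical stand-in for NETDEV_PROMISC. */
};

struct demo_dev {
    unsigned int flags;         /* Current device flags. */
    unsigned int save_flags;    /* Flags before the first temporary change. */
    unsigned int changed_flags; /* Bits changed temporarily so far. */
};

/* Turns off 'off', then turns on 'on'; remembers non-permanent changes. */
static void
demo_update_flags(struct demo_dev *dev, unsigned int off, unsigned int on,
                  int permanent)
{
    unsigned int old_flags = dev->flags;
    unsigned int new_flags = (old_flags & ~off) | on;
    unsigned int changed = old_flags ^ new_flags;

    if (changed && !permanent) {
        if (!dev->changed_flags) {
            dev->save_flags = old_flags;        /* First temporary change. */
        }
        dev->changed_flags |= changed;
    }
    dev->flags = new_flags;
}

/* Reverts every temporarily changed bit to its saved value. */
static void
demo_restore_flags(struct demo_dev *dev)
{
    dev->flags = (dev->flags & ~dev->changed_flags)
                 | (dev->save_flags & dev->changed_flags);
    dev->changed_flags = 0;
}
```

In the test, a temporary "promiscuous on" is reverted while a permanent "down" survives the restore.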
+/* Obtains the current flags for 'netdev' and stores them into '*flagsp'.
+ * Returns 0 if successful, otherwise a positive errno value. On failure,
+ * stores 0 into '*flagsp'. */
+int
+netdev_get_flags(const struct netdev *netdev_, enum netdev_flags *flagsp)
+{
+ struct netdev *netdev = (struct netdev *) netdev_;
+ return do_update_flags(netdev, 0, 0, flagsp, false);
+}
+
/* Sets the flags for 'netdev' to 'flags'.
* If 'permanent' is true, the changes will persist; otherwise, they
* will be reverted when 'netdev' is closed or the program exits.
netdev_set_flags(struct netdev *netdev, enum netdev_flags flags,
bool permanent)
{
- return do_update_flags(netdev, -1, flags, permanent);
+ return do_update_flags(netdev, -1, flags, NULL, permanent);
}
/* Turns on the specified 'flags' on 'netdev'.
netdev_turn_flags_on(struct netdev *netdev, enum netdev_flags flags,
bool permanent)
{
- return do_update_flags(netdev, 0, flags, permanent);
+ return do_update_flags(netdev, 0, flags, NULL, permanent);
}
/* Turns off the specified 'flags' on 'netdev'.
netdev_turn_flags_off(struct netdev *netdev, enum netdev_flags flags,
bool permanent)
{
- return do_update_flags(netdev, flags, 0, permanent);
+ return do_update_flags(netdev, flags, 0, NULL, permanent);
}
/* Looks up the ARP table entry for 'ip' on 'netdev'. If one exists and can be
* successfully retrieved, it stores the corresponding MAC address in 'mac' and
* returns 0. Otherwise, it returns a positive errno value; in particular,
- * ENXIO indicates that there is not ARP table entry for 'ip' on 'netdev'. */
+ * ENXIO indicates that there is no ARP table entry for 'ip' on 'netdev'. */
int
-netdev_nodev_arp_lookup(const char *netdev_name, uint32_t ip,
- uint8_t mac[ETH_ADDR_LEN])
+netdev_arp_lookup(const struct netdev *netdev,
+ uint32_t ip, uint8_t mac[ETH_ADDR_LEN])
{
- struct arpreq r;
- struct sockaddr_in *pa;
- int retval;
-
- init_netdev();
-
- memset(&r, 0, sizeof r);
- pa = (struct sockaddr_in *) &r.arp_pa;
- pa->sin_family = AF_INET;
- pa->sin_addr.s_addr = ip;
- pa->sin_port = 0;
- r.arp_ha.sa_family = ARPHRD_ETHER;
- r.arp_flags = 0;
- strncpy(r.arp_dev, netdev_name, sizeof r.arp_dev);
- COVERAGE_INC(netdev_arp_lookup);
- retval = ioctl(af_inet_sock, SIOCGARP, &r) < 0 ? errno : 0;
- if (!retval) {
- memcpy(mac, r.arp_ha.sa_data, ETH_ADDR_LEN);
- } else if (retval != ENXIO) {
- VLOG_WARN_RL(&rl, "%s: could not look up ARP entry for "IP_FMT": %s",
- netdev_name, IP_ARGS(&ip), strerror(retval));
- }
- return retval;
-}
-
-int
-netdev_arp_lookup(const struct netdev *netdev, uint32_t ip,
- uint8_t mac[ETH_ADDR_LEN])
-{
- return netdev_nodev_arp_lookup(netdev->name, ip, mac);
-}
-
-static int
-get_stats_via_netlink(int ifindex, struct netdev_stats *stats)
-{
- struct ofpbuf request;
- struct ofpbuf *reply;
- struct ifinfomsg *ifi;
- const struct rtnl_link_stats *rtnl_stats;
- struct nlattr *attrs[ARRAY_SIZE(rtnlgrp_link_policy)];
- int error;
-
- ofpbuf_init(&request, 0);
- nl_msg_put_nlmsghdr(&request, rtnl_sock, sizeof *ifi,
- RTM_GETLINK, NLM_F_REQUEST);
- ifi = ofpbuf_put_zeros(&request, sizeof *ifi);
- ifi->ifi_family = PF_UNSPEC;
- ifi->ifi_index = ifindex;
- error = nl_sock_transact(rtnl_sock, &request, &reply);
- ofpbuf_uninit(&request);
+ int error = (netdev->class->arp_lookup
+ ? netdev->class->arp_lookup(netdev, ip, mac)
+ : EOPNOTSUPP);
if (error) {
- return error;
- }
-
- if (!nl_policy_parse(reply, NLMSG_HDRLEN + sizeof(struct ifinfomsg),
- rtnlgrp_link_policy,
- attrs, ARRAY_SIZE(rtnlgrp_link_policy))) {
- ofpbuf_delete(reply);
- return EPROTO;
- }
-
- if (!attrs[IFLA_STATS]) {
- VLOG_WARN_RL(&rl, "RTM_GETLINK reply lacks stats");
- ofpbuf_delete(reply);
- return EPROTO;
- }
-
- rtnl_stats = nl_attr_get(attrs[IFLA_STATS]);
- stats->rx_packets = rtnl_stats->rx_packets;
- stats->tx_packets = rtnl_stats->tx_packets;
- stats->rx_bytes = rtnl_stats->rx_bytes;
- stats->tx_bytes = rtnl_stats->tx_bytes;
- stats->rx_errors = rtnl_stats->rx_errors;
- stats->tx_errors = rtnl_stats->tx_errors;
- stats->rx_dropped = rtnl_stats->rx_dropped;
- stats->tx_dropped = rtnl_stats->tx_dropped;
- stats->multicast = rtnl_stats->multicast;
- stats->collisions = rtnl_stats->collisions;
- stats->rx_length_errors = rtnl_stats->rx_length_errors;
- stats->rx_over_errors = rtnl_stats->rx_over_errors;
- stats->rx_crc_errors = rtnl_stats->rx_crc_errors;
- stats->rx_frame_errors = rtnl_stats->rx_frame_errors;
- stats->rx_fifo_errors = rtnl_stats->rx_fifo_errors;
- stats->rx_missed_errors = rtnl_stats->rx_missed_errors;
- stats->tx_aborted_errors = rtnl_stats->tx_aborted_errors;
- stats->tx_carrier_errors = rtnl_stats->tx_carrier_errors;
- stats->tx_fifo_errors = rtnl_stats->tx_fifo_errors;
- stats->tx_heartbeat_errors = rtnl_stats->tx_heartbeat_errors;
- stats->tx_window_errors = rtnl_stats->tx_window_errors;
-
- ofpbuf_delete(reply);
-
- return 0;
-}
-
-static int
-get_stats_via_proc(const char *netdev_name, struct netdev_stats *stats)
-{
- static const char fn[] = "/proc/net/dev";
- char line[1024];
- FILE *stream;
- int ln;
-
- stream = fopen(fn, "r");
- if (!stream) {
- VLOG_WARN_RL(&rl, "%s: open failed: %s", fn, strerror(errno));
- return errno;
- }
-
- ln = 0;
- while (fgets(line, sizeof line, stream)) {
- if (++ln >= 3) {
- char devname[16];
-#define X64 "%"SCNu64
- if (sscanf(line,
- " %15[^:]:"
- X64 X64 X64 X64 X64 X64 X64 "%*u"
- X64 X64 X64 X64 X64 X64 X64 "%*u",
- devname,
- &stats->rx_bytes,
- &stats->rx_packets,
- &stats->rx_errors,
- &stats->rx_dropped,
- &stats->rx_fifo_errors,
- &stats->rx_frame_errors,
- &stats->multicast,
- &stats->tx_bytes,
- &stats->tx_packets,
- &stats->tx_errors,
- &stats->tx_dropped,
- &stats->tx_fifo_errors,
- &stats->collisions,
- &stats->tx_carrier_errors) != 15) {
- VLOG_WARN_RL(&rl, "%s:%d: parse error", fn, ln);
- } else if (!strcmp(devname, netdev_name)) {
- stats->rx_length_errors = UINT64_MAX;
- stats->rx_over_errors = UINT64_MAX;
- stats->rx_crc_errors = UINT64_MAX;
- stats->rx_missed_errors = UINT64_MAX;
- stats->tx_aborted_errors = UINT64_MAX;
- stats->tx_heartbeat_errors = UINT64_MAX;
- stats->tx_window_errors = UINT64_MAX;
- fclose(stream);
- return 0;
- }
- }
+ memset(mac, 0, ETH_ADDR_LEN);
}
- VLOG_WARN_RL(&rl, "%s: no stats for %s", fn, netdev_name);
- fclose(stream);
- return ENODEV;
+ return error;
}
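netdev_arp_lookup(), like most wrappers in this file, calls through the device's class, falls back to EOPNOTSUPP when the class leaves the hook NULL, and then leaves a well-defined value in the output on any error. A minimal self-contained sketch of that dispatch pattern; `struct demo_class`, `struct demo_dev`, and `demo_get_mtu()` are hypothetical types that mirror, but are not, struct netdev_class and struct netdev.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct demo_dev;

/* Hypothetical provider interface: each hook may be NULL if unsupported. */
struct demo_class {
    int (*get_mtu)(const struct demo_dev *, int *mtup);
};

struct demo_dev {
    const struct demo_class *class;
    int mtu;
};

/* Generic wrapper: dispatch if the hook exists, otherwise report
 * EOPNOTSUPP; on any error, store 0 into '*mtup'. */
static int
demo_get_mtu(const struct demo_dev *dev, int *mtup)
{
    int error = (dev->class->get_mtu
                 ? dev->class->get_mtu(dev, mtup)
                 : EOPNOTSUPP);
    if (error) {
        *mtup = 0;
    }
    return error;
}

/* A concrete provider that implements the hook. */
static int
eth_get_mtu(const struct demo_dev *dev, int *mtup)
{
    *mtup = dev->mtu;
    return 0;
}

static const struct demo_class eth_class = { eth_get_mtu };
static const struct demo_class bare_class = { NULL };
```

Keeping the NULL check in the generic layer means a new class only has to implement the hooks it can actually support.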
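+/* Sets '*carrier' to true if carrier is active (link light is on) on
+ * 'netdev', false otherwise. Returns 0 if successful, otherwise a positive
+ * errno value (EOPNOTSUPP if 'netdev' cannot report carrier); on failure,
+ * '*carrier' is set to false. */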
int
netdev_get_carrier(const struct netdev *netdev, bool *carrier)
{
- return netdev_nodev_get_carrier(netdev->name, carrier);
-}
-
-int
-netdev_nodev_get_carrier(const char *netdev_name, bool *carrier)
-{
- char line[8];
- int retval;
- int error;
- char *fn;
- int fd;
-
- *carrier = false;
-
- fn = xasprintf("/sys/class/net/%s/carrier", netdev_name);
- fd = open(fn, O_RDONLY);
- if (fd < 0) {
- error = errno;
- VLOG_WARN_RL(&rl, "%s: open failed: %s", fn, strerror(error));
- goto exit;
- }
-
- retval = read(fd, line, sizeof line);
- if (retval < 0) {
- error = errno;
- if (error == EINVAL) {
- /* This is the normal return value when we try to check carrier if
- * the network device is not up. */
- } else {
- VLOG_WARN_RL(&rl, "%s: read failed: %s", fn, strerror(error));
- }
- goto exit_close;
- } else if (retval == 0) {
- error = EPROTO;
- VLOG_WARN_RL(&rl, "%s: unexpected end of file", fn);
- goto exit_close;
- }
-
- if (line[0] != '0' && line[0] != '1') {
- error = EPROTO;
- VLOG_WARN_RL(&rl, "%s: value is %c (expected 0 or 1)", fn, line[0]);
- goto exit_close;
+ int error = (netdev->class->get_carrier
+ ? netdev->class->get_carrier(netdev, carrier)
+ : EOPNOTSUPP);
+ if (error) {
+ *carrier = false;
}
- *carrier = line[0] != '0';
- error = 0;
-
-exit_close:
- close(fd);
-exit:
- free(fn);
return error;
}
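+/* Retrieves current device stats for 'netdev' into '*stats'. Returns 0 if
+ * successful, otherwise a positive errno value, in which case every member of
+ * '*stats' is set to all-1-bits. */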
int
netdev_get_stats(const struct netdev *netdev, struct netdev_stats *stats)
{
int error;
COVERAGE_INC(netdev_get_stats);
- if (use_netlink_stats) {
- int ifindex;
-
- error = get_ifindex(netdev, &ifindex);
- if (!error) {
- error = get_stats_via_netlink(ifindex, stats);
- }
- } else {
- error = get_stats_via_proc(netdev->name, stats);
- }
-
+ error = (netdev->class->get_stats
+ ? netdev->class->get_stats(netdev, stats)
+ : EOPNOTSUPP);
if (error) {
memset(stats, 0xff, sizeof *stats);
}
return error;
}
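Because netdev_get_stats() fills '*stats' with 0xff bytes on failure, and unsupported counters are documented as UINT64_MAX, a caller can test each counter before using it. A sketch using a hypothetical two-field `struct demo_stats` in place of struct netdev_stats:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical subset of struct netdev_stats. */
struct demo_stats {
    uint64_t rx_packets;
    uint64_t rx_crc_errors;
};

/* A counter is meaningful only if it is not all-1-bits. */
static bool
stat_is_valid(uint64_t value)
{
    return value != UINT64_MAX;
}

/* Mirrors the failure path of netdev_get_stats(): every byte becomes 0xff,
 * so every uint64_t member equals UINT64_MAX. */
static void
mark_all_unsupported(struct demo_stats *stats)
{
    memset(stats, 0xff, sizeof *stats);
}
```

A zero counter is therefore a real reading of zero, never a placeholder for "unknown".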
-#define POLICE_ADD_CMD "/sbin/tc qdisc add dev %s handle ffff: ingress"
-#define POLICE_CONFIG_CMD "/sbin/tc filter add dev %s parent ffff: protocol ip prio 50 u32 match ip src 0.0.0.0/0 police rate %dkbit burst %dk mtu 65535 drop flowid :1"
-/* We redirect stderr to /dev/null because we often want to remove all
- * traffic control configuration on a port so its in a known state. If
- * this done when there is no such configuration, tc complains, so we just
- * always ignore it.
- */
-#define POLICE_DEL_CMD "/sbin/tc qdisc del dev %s handle ffff: ingress 2>/dev/null"
-
-/* Attempts to set input rate limiting (policing) policy. */
-int
-netdev_nodev_set_policing(const char *netdev_name, uint32_t kbits_rate,
- uint32_t kbits_burst)
-{
- char command[1024];
-
- init_netdev();
-
- COVERAGE_INC(netdev_set_policing);
- if (kbits_rate) {
- if (!kbits_burst) {
- /* Default to 10 kilobits if not specified. */
- kbits_burst = 10;
- }
-
- /* xxx This should be more careful about only adding if it
- * xxx actually exists, as opposed to always deleting it. */
- snprintf(command, sizeof(command), POLICE_DEL_CMD, netdev_name);
- if (system(command) == -1) {
- VLOG_WARN_RL(&rl, "%s: problem removing policing", netdev_name);
- }
-
- snprintf(command, sizeof(command), POLICE_ADD_CMD, netdev_name);
- if (system(command) != 0) {
- VLOG_WARN_RL(&rl, "%s: problem adding policing", netdev_name);
- return -1;
- }
-
- snprintf(command, sizeof(command), POLICE_CONFIG_CMD, netdev_name,
- kbits_rate, kbits_burst);
- if (system(command) != 0) {
- VLOG_WARN_RL(&rl, "%s: problem configuring policing",
- netdev_name);
- return -1;
- }
- } else {
- snprintf(command, sizeof(command), POLICE_DEL_CMD, netdev_name);
- if (system(command) == -1) {
- VLOG_WARN_RL(&rl, "%s: problem removing policing", netdev_name);
- }
- }
-
- return 0;
-}
-
+/* Attempts to set input rate limiting (policing) policy, such that up to
+ * 'kbits_rate' kbps of traffic is accepted, with a maximum accumulative burst
+ * size of 'kbits_burst' kb. Returns EOPNOTSUPP if 'netdev' does not support
+ * policing. */
int
netdev_set_policing(struct netdev *netdev, uint32_t kbits_rate,
uint32_t kbits_burst)
{
- return netdev_nodev_set_policing(netdev->name, kbits_rate, kbits_burst);
+ return (netdev->class->set_policing
+ ? netdev->class->set_policing(netdev, kbits_rate, kbits_burst)
+ : EOPNOTSUPP);
}
-/* Initializes 'svec' with a list of the names of all known network devices. */
-void
-netdev_enumerate(struct svec *svec)
+/* If 'netdev' is a VLAN network device (e.g. one created with vconfig(8)),
+ * sets '*vlan_vid' to the VLAN VID associated with that device and returns 0.
+ * Otherwise returns a positive errno value (specifically ENOENT if 'netdev' is
+ * not a VLAN device) and sets '*vlan_vid' to 0. */
+int
+netdev_get_vlan_vid(const struct netdev *netdev, int *vlan_vid)
{
- struct if_nameindex *names;
-
- svec_init(svec);
- names = if_nameindex();
- if (names) {
- size_t i;
-
- for (i = 0; names[i].if_name != NULL; i++) {
- svec_add(svec, names[i].if_name);
- }
- if_freenameindex(names);
- } else {
- VLOG_WARN("could not obtain list of network device names: %s",
- strerror(errno));
+ int error = (netdev->class->get_vlan_vid
+ ? netdev->class->get_vlan_vid(netdev, vlan_vid)
+ : ENOENT);
+ if (error) {
+ *vlan_vid = 0;
}
+ return error;
}
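The removed Linux code obtained the VID by reading the first line of /proc/net/vlan/<device> and parsing it with the sscanf() format "%*s VID: %d"; a class-level get_vlan_vid() presumably does something similar, though that code is not shown here. Note that the removed `!sscanf(...)` test does not catch an empty line, for which sscanf() returns EOF rather than 0; comparing against 1 does. A sketch of just the parsing step; `parse_vlan_vid()` is hypothetical and the sample line in the test is an assumed example of the file format.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Extracts the VID from the first line of a /proc/net/vlan/<dev> file.
 * Returns 0 on success, otherwise EPROTO with '*vlan_vid' set to 0.
 * Checking '!= 1' rejects both a mismatched line (sscanf() returns 0) and
 * an empty one (sscanf() returns EOF). */
static int
parse_vlan_vid(const char *line, int *vlan_vid)
{
    if (sscanf(line, "%*s VID: %d", vlan_vid) != 1) {
        *vlan_vid = 0;
        return EPROTO;
    }
    return 0;
}
```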
-/* Attempts to locate a device based on its IPv4 address. The caller
- * may provide a hint as to the device by setting 'netdev_name' to a
- * likely device name. This string must be malloc'd, since if it is
- * not correct then it will be freed. If there is no hint, then
- * 'netdev_name' must be the NULL pointer.
- *
- * If the device is found, the return value will be true and 'netdev_name'
- * contains the device's name as a string, which the caller is responsible
- * for freeing. If the device is not found, the return value is false. */
-bool
-netdev_find_dev_by_in4(const struct in_addr *in4, char **netdev_name)
+/* Returns a network device that has 'in4' as its IP address, if one exists,
+ * otherwise a null pointer. The caller is responsible for closing the
+ * returned device with netdev_close(). */
+struct netdev *
+netdev_find_dev_by_in4(const struct in_addr *in4)
{
- int i;
- struct in_addr dev_in4;
+ struct netdev *netdev;
struct svec dev_list;
+ size_t i;
- /* Check the hint first. */
- if (*netdev_name && (netdev_nodev_get_in4(*netdev_name, &dev_in4, NULL))
- && (dev_in4.s_addr == in4->s_addr)) {
- return true;
- }
-
- free(*netdev_name);
- *netdev_name = NULL;
netdev_enumerate(&dev_list);
-
- for (i=0; i<dev_list.n; i++) {
- if ((netdev_nodev_get_in4(dev_list.names[i], &dev_in4, NULL))
- && (dev_in4.s_addr == in4->s_addr)) {
- *netdev_name = xstrdup(dev_list.names[i]);
- svec_destroy(&dev_list);
- return true;
+ for (i = 0; i < dev_list.n; i++) {
+ const char *name = dev_list.names[i];
+ struct in_addr dev_in4;
+
+ if (!netdev_open(name, NETDEV_ETH_TYPE_NONE, &netdev)
+ && !netdev_get_in4(netdev, &dev_in4, NULL)
+ && dev_in4.s_addr == in4->s_addr) {
+ goto exit;
}
+ netdev_close(netdev);
}
+ netdev = NULL;
+exit:
svec_destroy(&dev_list);
- return false;
+ return netdev;
}
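netdev_find_dev_by_in4() is a linear scan: enumerate device names, open each one, compare its IPv4 address, and keep the first match. Both s_addr values are in network byte order, so a plain equality test suffices. A sketch of the comparison loop over a hypothetical in-memory table (`struct demo_iface` and `find_by_in4()` are illustrative, not OVS APIs; no devices are actually opened):

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical (name, address) pair standing in for an opened netdev. */
struct demo_iface {
    const char *name;
    struct in_addr addr;
};

/* Returns the first entry whose address equals '*in4', or NULL.  Both
 * s_addr values are in network byte order, so no conversion is needed. */
static const struct demo_iface *
find_by_in4(const struct demo_iface *ifaces, size_t n,
            const struct in_addr *in4)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (ifaces[i].addr.s_addr == in4->s_addr) {
            return &ifaces[i];
        }
    }
    return NULL;
}
```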
-
-/* Looks up the next hop for 'ip'. If the next hop can be found, the
- * address is stored in 'next_hop'. If a gateway is not required to
- * reach 'ip', zero is stored in 'next_hop'. In either case, zero is
- * returned and a copy of the name of the device to reach 'ip' is stored
- * in 'netdev_name', which the caller is responsible for freeing. If a
- * route could not be determined, a positive errno is returned. */
-int
-netdev_get_next_hop(const struct in_addr *host, struct in_addr *next_hop,
- char **netdev_name)
+\f
+/* Initializes 'netdev' as a netdev named 'name' of the specified 'class'.
+ *
+ * This function adds 'netdev' to a netdev-owned linked list, so it is very
+ * important that 'netdev' only be freed after calling netdev_close(). */
+void
+netdev_init(struct netdev *netdev, const char *name,
+ const struct netdev_class *class)
{
- static const char fn[] = "/proc/net/route";
- FILE *stream;
- char line[256];
- int ln;
-
- *netdev_name = NULL;
- stream = fopen(fn, "r");
- if (stream == NULL) {
- VLOG_WARN_RL(&rl, "%s: open failed: %s", fn, strerror(errno));
- return errno;
- }
-
- ln = 0;
- while (fgets(line, sizeof line, stream)) {
- if (++ln >= 2) {
- char iface[17];
- uint32_t dest, gateway, mask;
- int refcnt, metric, mtu;
- unsigned int flags, use, window, irtt;
-
- if (sscanf(line,
- "%16s %"SCNx32" %"SCNx32" %04X %d %u %d %"SCNx32
- " %d %u %u\n",
- iface, &dest, &gateway, &flags, &refcnt,
- &use, &metric, &mask, &mtu, &window, &irtt) != 11) {
-
- VLOG_WARN_RL(&rl, "%s: could not parse line %d: %s",
- fn, ln, line);
- continue;
- }
- if (!(flags & RTF_UP)) {
- /* Skip routes that aren't up. */
- continue;
- }
-
- /* The output of 'dest', 'mask', and 'gateway' were given in
- * network byte order, so we don't need need any endian
- * conversions here. */
- if ((dest & mask) == (host->s_addr & mask)) {
- if (!gateway) {
- /* The host is directly reachable. */
- next_hop->s_addr = 0;
- } else {
- /* To reach the host, we must go through a gateway. */
- next_hop->s_addr = gateway;
- }
- *netdev_name = xstrdup(iface);
- fclose(stream);
- return 0;
- }
- }
- }
-
- fclose(stream);
- return ENXIO;
+ netdev->class = class;
+ netdev->name = xstrdup(name);
+ netdev->save_flags = 0;
+ netdev->changed_flags = 0;
+ list_push_back(&netdev_list, &netdev->node);
}
-/* Obtains the current flags for the network device named 'netdev_name' and
- * stores them into '*flagsp'. Returns 0 if successful, otherwise a positive
- * errno value. On error, stores 0 into '*flagsp'.
- *
- * If only device flags are needed, this is more efficient than calling
- * netdev_open(), netdev_get_flags(), netdev_close(). */
-int
-netdev_nodev_get_flags(const char *netdev_name, enum netdev_flags *flagsp)
+/* Initializes 'notifier' as a netdev notifier for 'netdev', for which
+ * notification will consist of calling 'cb', with auxiliary data 'aux'. */
+void
+netdev_notifier_init(struct netdev_notifier *notifier, struct netdev *netdev,
+ void (*cb)(struct netdev_notifier *), void *aux)
{
- int error, flags;
-
- init_netdev();
-
- *flagsp = 0;
- error = get_flags(netdev_name, &flags);
- if (error) {
- return error;
- }
-
- if (flags & IFF_UP) {
- *flagsp |= NETDEV_UP;
- }
- if (flags & IFF_PROMISC) {
- *flagsp |= NETDEV_PROMISC;
- }
- return 0;
+ notifier->netdev = netdev;
+ notifier->cb = cb;
+ notifier->aux = aux;
}
+\f
+/* Tracks changes in the status of a set of network devices. */
+struct netdev_monitor {
+ struct shash polled_netdevs;
+ struct shash changed_netdevs;
+};
-int
-netdev_nodev_get_etheraddr(const char *netdev_name, uint8_t mac[6])
+/* Creates and returns a new structure for monitoring changes in the status
+ * of network devices. */
+struct netdev_monitor *
+netdev_monitor_create(void)
{
- init_netdev();
-
- return get_etheraddr(netdev_name, mac, NULL);
+ struct netdev_monitor *monitor = xmalloc(sizeof *monitor);
+ shash_init(&monitor->polled_netdevs);
+ shash_init(&monitor->changed_netdevs);
+ return monitor;
}
-/* If 'netdev_name' is the name of a VLAN network device (e.g. one created with
- * vconfig(8)), sets '*vlan_vid' to the VLAN VID associated with that device
- * and returns 0. Otherwise returns a errno value (specifically ENOENT if
- * 'netdev_name' is the name of a network device that is not a VLAN device) and
- * sets '*vlan_vid' to -1. */
-int
-netdev_get_vlan_vid(const char *netdev_name, int *vlan_vid)
+/* Destroys 'monitor'. */
+void
+netdev_monitor_destroy(struct netdev_monitor *monitor)
{
- struct ds line = DS_EMPTY_INITIALIZER;
- FILE *stream = NULL;
- int error;
- char *fn;
-
- COVERAGE_INC(netdev_get_vlan_vid);
- fn = xasprintf("/proc/net/vlan/%s", netdev_name);
- stream = fopen(fn, "r");
- if (!stream) {
- error = errno;
- goto done;
- }
+ if (monitor) {
+ struct shash_node *node;
- if (ds_get_line(&line, stream)) {
- if (ferror(stream)) {
- error = errno;
- VLOG_ERR_RL(&rl, "error reading \"%s\": %s", fn, strerror(errno));
- } else {
- error = EPROTO;
- VLOG_ERR_RL(&rl, "unexpected end of file reading \"%s\"", fn);
+ SHASH_FOR_EACH (node, &monitor->polled_netdevs) {
+ struct netdev_notifier *notifier = node->data;
+ notifier->netdev->class->poll_remove(notifier);
}
- goto done;
- }
- if (!sscanf(ds_cstr(&line), "%*s VID: %d", vlan_vid)) {
- error = EPROTO;
- VLOG_ERR_RL(&rl, "parse error reading \"%s\" line 1: \"%s\"",
- fn, ds_cstr(&line));
- goto done;
+ shash_destroy(&monitor->polled_netdevs);
+ shash_destroy(&monitor->changed_netdevs);
+ free(monitor);
}
-
- error = 0;
-
-done:
- free(fn);
- if (stream) {
- fclose(stream);
- }
- ds_destroy(&line);
- if (error) {
- *vlan_vid = -1;
- }
- return error;
}
-\f
-static void restore_all_flags(void *aux);
-/* Set up a signal hook to restore network device flags on program
- * termination. */
static void
-init_netdev(void)
+netdev_monitor_cb(struct netdev_notifier *notifier)
{
- static bool inited;
- if (!inited) {
- int ifindex;
- int error;
-
- inited = true;
-
- fatal_signal_add_hook(restore_all_flags, NULL, true);
+ struct netdev_monitor *monitor = notifier->aux;
+ const char *name = netdev_get_name(notifier->netdev);
+ if (!shash_find(&monitor->changed_netdevs, name)) {
+ shash_add(&monitor->changed_netdevs, name, NULL);
+ }
+}
- af_inet_sock = socket(AF_INET, SOCK_DGRAM, 0);
- if (af_inet_sock < 0) {
- ovs_fatal(errno, "socket(AF_INET)");
+/* Attempts to add 'netdev' as a netdev monitored by 'monitor'. Returns 0 if
+ * successful, otherwise a positive errno value.
+ *
+ * Adding a given 'netdev' to a monitor multiple times is equivalent to adding
+ * it once. */
+int
+netdev_monitor_add(struct netdev_monitor *monitor, struct netdev *netdev)
+{
+ const char *netdev_name = netdev_get_name(netdev);
+ int error = 0;
+ if (!shash_find(&monitor->polled_netdevs, netdev_name)
+ && netdev->class->poll_add)
+ {
+ struct netdev_notifier *notifier;
+ error = netdev->class->poll_add(netdev, netdev_monitor_cb, monitor,
+ &notifier);
+ if (!error) {
+ assert(notifier->netdev == netdev);
+ shash_add(&monitor->polled_netdevs, netdev_name, notifier);
}
+ }
+ return error;
+}
- error = nl_sock_create(NETLINK_ROUTE, 0, 0, 0, &rtnl_sock);
- if (error) {
- ovs_fatal(error, "socket(AF_NETLINK, NETLINK_ROUTE)");
+/* Removes 'netdev' from the set of netdevs monitored by 'monitor'. (This has
+ * no effect if 'netdev' is not in the set of devices monitored by
+ * 'monitor'.) */
+void
+netdev_monitor_remove(struct netdev_monitor *monitor, struct netdev *netdev)
+{
+ const char *netdev_name = netdev_get_name(netdev);
+ struct shash_node *node;
+
+ node = shash_find(&monitor->polled_netdevs, netdev_name);
+ if (node) {
+ /* Cancel future notifications. */
+ struct netdev_notifier *notifier = node->data;
+ netdev->class->poll_remove(notifier);
+ shash_delete(&monitor->polled_netdevs, node);
+
+ /* Drop any pending notification. */
+ node = shash_find(&monitor->changed_netdevs, netdev_name);
+ if (node) {
+ shash_delete(&monitor->changed_netdevs, node);
}
+ }
+}
- /* Decide on the netdev_get_stats() implementation to use. Netlink is
- * preferable, so if that works, we'll use it. */
- ifindex = do_get_ifindex("lo");
- if (ifindex < 0) {
- VLOG_WARN("failed to get ifindex for lo, "
- "obtaining netdev stats from proc");
- use_netlink_stats = false;
- } else {
- struct netdev_stats stats;
- error = get_stats_via_netlink(ifindex, &stats);
- if (!error) {
- VLOG_DBG("obtaining netdev stats via rtnetlink");
- use_netlink_stats = true;
- } else {
- VLOG_INFO("RTM_GETLINK failed (%s), obtaining netdev stats "
- "via proc (you are probably running a pre-2.6.19 "
- "kernel)", strerror(error));
- use_netlink_stats = false;
- }
- }
+/* Checks for changes to netdevs in the set monitored by 'monitor'. If any of
+ * the attributes (Ethernet address, carrier status, speed or peer-advertised
+ * speed, flags, etc.) of a network device monitored by 'monitor' has changed,
+ * sets '*devnamep' to the name of a device that has changed and returns 0.
+ * The caller is responsible for freeing '*devnamep' (with free()).
+ *
+ * If no devices have changed, sets '*devnamep' to NULL and returns EAGAIN.
+ */
+int
+netdev_monitor_poll(struct netdev_monitor *monitor, char **devnamep)
+{
+ struct shash_node *node = shash_first(&monitor->changed_netdevs);
+ if (!node) {
+ *devnamep = NULL;
+ return EAGAIN;
+ } else {
+ *devnamep = xstrdup(node->name);
+ shash_delete(&monitor->changed_netdevs, node);
+ return 0;
}
}
+/* Registers with the poll loop to wake up from the next call to poll_block()
+ * when netdev_monitor_poll(monitor) would indicate that a device has
+ * changed. */
+void
+netdev_monitor_poll_wait(const struct netdev_monitor *monitor)
+{
+ if (!shash_is_empty(&monitor->changed_netdevs)) {
+ poll_immediate_wake();
+ } else {
+ /* XXX Nothing needed here for netdev_linux, but maybe other netdev
+ * classes need help. */
+ }
+}
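netdev_monitor_cb() records each changed device name at most once, netdev_monitor_poll() hands pending names out one at a time (the caller frees each) and returns EAGAIN once none remain, and netdev_monitor_poll_wait() requests an immediate wakeup whenever names are pending. The sketch below reproduces those semantics with a small fixed-size array in place of struct shash; `struct demo_monitor` and its functions are hypothetical, and the delivery order is not meant to match shash's.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MAX_CHANGED 8

/* Hypothetical stand-in for the 'changed_netdevs' set. */
struct demo_monitor {
    char *changed[DEMO_MAX_CHANGED];
    size_t n_changed;
};

static char *
demo_strdup(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);

    if (copy) {
        memcpy(copy, s, len);
    }
    return copy;
}

/* Like netdev_monitor_cb(): remember 'name' unless it is already pending. */
static void
demo_monitor_note_change(struct demo_monitor *m, const char *name)
{
    size_t i;

    for (i = 0; i < m->n_changed; i++) {
        if (!strcmp(m->changed[i], name)) {
            return;                     /* Already pending: record it once. */
        }
    }
    if (m->n_changed < DEMO_MAX_CHANGED) {
        char *copy = demo_strdup(name);
        if (copy) {
            m->changed[m->n_changed++] = copy;
        }
    }
}

/* Like netdev_monitor_poll(): hand one pending name to the caller, who must
 * free() it, or return EAGAIN with '*devnamep' set to NULL. */
static int
demo_monitor_poll(struct demo_monitor *m, char **devnamep)
{
    if (!m->n_changed) {
        *devnamep = NULL;
        return EAGAIN;
    }
    *devnamep = m->changed[--m->n_changed];
    return 0;
}

/* Like netdev_monitor_poll_wait(): true means "wake up immediately". */
static bool
demo_monitor_should_wake(const struct demo_monitor *m)
{
    return m->n_changed > 0;
}
```

Collapsing repeated notifications into one pending entry means a burst of changes to the same device costs the caller a single poll.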
+\f
/* Restore the network device flags on 'netdev' to those that were active
* before we changed them. Returns 0 if successful, otherwise a positive
* errno value.
static int
restore_flags(struct netdev *netdev)
{
- struct ifreq ifr;
- int restore_flags;
-
- /* Get current flags. */
- strncpy(ifr.ifr_name, netdev->name, sizeof ifr.ifr_name);
- COVERAGE_INC(netdev_get_flags);
- if (ioctl(netdev->netdev_fd, SIOCGIFFLAGS, &ifr) < 0) {
- return errno;
- }
-
- /* Restore flags that we might have changed, if necessary. */
- restore_flags = netdev->changed_flags & (IFF_PROMISC | IFF_UP);
- if ((ifr.ifr_flags ^ netdev->save_flags) & restore_flags) {
- ifr.ifr_flags &= ~restore_flags;
- ifr.ifr_flags |= netdev->save_flags & restore_flags;
- COVERAGE_INC(netdev_set_flags);
- if (ioctl(netdev->netdev_fd, SIOCSIFFLAGS, &ifr) < 0) {
- return errno;
- }
+ if (netdev->changed_flags) {
+ enum netdev_flags restore = netdev->save_flags & netdev->changed_flags;
+ enum netdev_flags old_flags;
+ return netdev->class->update_flags(netdev,
+ netdev->changed_flags & ~restore,
+ restore, &old_flags);
}
-
return 0;
}
restore_flags(netdev);
}
}
-
-static int
-get_flags(const char *netdev_name, int *flags)
-{
- struct ifreq ifr;
- strncpy(ifr.ifr_name, netdev_name, sizeof ifr.ifr_name);
- COVERAGE_INC(netdev_get_flags);
- if (ioctl(af_inet_sock, SIOCGIFFLAGS, &ifr) < 0) {
- VLOG_ERR("ioctl(SIOCGIFFLAGS) on %s device failed: %s",
- netdev_name, strerror(errno));
- return errno;
- }
- *flags = ifr.ifr_flags;
- return 0;
-}
-
-static int
-set_flags(const char *netdev_name, int flags)
-{
- struct ifreq ifr;
- strncpy(ifr.ifr_name, netdev_name, sizeof ifr.ifr_name);
- ifr.ifr_flags = flags;
- COVERAGE_INC(netdev_set_flags);
- if (ioctl(af_inet_sock, SIOCSIFFLAGS, &ifr) < 0) {
- VLOG_ERR("ioctl(SIOCSIFFLAGS) on %s device failed: %s",
- netdev_name, strerror(errno));
- return errno;
- }
- return 0;
-}
-
-static int
-do_get_ifindex(const char *netdev_name)
-{
- struct ifreq ifr;
-
- strncpy(ifr.ifr_name, netdev_name, sizeof ifr.ifr_name);
- COVERAGE_INC(netdev_get_ifindex);
- if (ioctl(af_inet_sock, SIOCGIFINDEX, &ifr) < 0) {
- VLOG_WARN_RL(&rl, "ioctl(SIOCGIFINDEX) on %s device failed: %s",
- netdev_name, strerror(errno));
- return -errno;
- }
- return ifr.ifr_ifindex;
-}
-
-static int
-get_ifindex(const struct netdev *netdev, int *ifindexp)
-{
- *ifindexp = 0;
- if (netdev->ifindex < 0) {
- int ifindex = do_get_ifindex(netdev->name);
- if (ifindex < 0) {
- return -ifindex;
- }
- ((struct netdev *) netdev)->ifindex = ifindex;
- }
- *ifindexp = netdev->ifindex;
- return 0;
-}
-
-static int
-get_etheraddr(const char *netdev_name, uint8_t ea[ETH_ADDR_LEN],
- int *hwaddr_familyp)
-{
- struct ifreq ifr;
-
- memset(&ifr, 0, sizeof ifr);
- strncpy(ifr.ifr_name, netdev_name, sizeof ifr.ifr_name);
- COVERAGE_INC(netdev_get_hwaddr);
- if (ioctl(af_inet_sock, SIOCGIFHWADDR, &ifr) < 0) {
- VLOG_ERR("ioctl(SIOCGIFHWADDR) on %s device failed: %s",
- netdev_name, strerror(errno));
- return errno;
- }
- if (hwaddr_familyp) {
- int hwaddr_family = ifr.ifr_hwaddr.sa_family;
- *hwaddr_familyp = hwaddr_family;
- if (hwaddr_family != AF_UNSPEC && hwaddr_family != ARPHRD_ETHER) {
- VLOG_WARN("%s device has unknown hardware address family %d",
- netdev_name, hwaddr_family);
- }
- }
- memcpy(ea, ifr.ifr_hwaddr.sa_data, ETH_ADDR_LEN);
- return 0;
-}
-
-static int
-set_etheraddr(const char *netdev_name, int hwaddr_family,
- const uint8_t mac[ETH_ADDR_LEN])
-{
- struct ifreq ifr;
-
- memset(&ifr, 0, sizeof ifr);
- strncpy(ifr.ifr_name, netdev_name, sizeof ifr.ifr_name);
- ifr.ifr_hwaddr.sa_family = hwaddr_family;
- memcpy(ifr.ifr_hwaddr.sa_data, mac, ETH_ADDR_LEN);
- COVERAGE_INC(netdev_set_hwaddr);
- if (ioctl(af_inet_sock, SIOCSIFHWADDR, &ifr) < 0) {
- VLOG_ERR("ioctl(SIOCSIFHWADDR) on %s device failed: %s",
- netdev_name, strerror(errno));
- return errno;
- }
- return 0;
-}
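The new restore_flags() touches only flags recorded in 'changed_flags': those that were set in 'save_flags' are turned back on, and the rest are turned off. Assuming, as the call suggests, that a class's update_flags() clears its 'off' mask and then sets its 'on' mask, the arithmetic works out as in this sketch (`restore_demo()` and the `DEMO_*` values are hypothetical stand-ins for the NETDEV_* flags):

```c
#include <assert.h>

/* Hypothetical stand-ins for NETDEV_UP, NETDEV_PROMISC, NETDEV_LOOPBACK. */
enum { DEMO_UP = 0x1, DEMO_PROMISC = 0x2, DEMO_LOOPBACK = 0x4 };

/* Returns the flags that result from restoring 'current', given the flags
 * saved before the device was modified and the set of flags changed. */
static unsigned int
restore_demo(unsigned int current, unsigned int save_flags,
             unsigned int changed_flags)
{
    unsigned int restore = save_flags & changed_flags;  /* Turn back on. */
    unsigned int off = changed_flags & ~restore;        /* Turn back off. */

    /* Assumed update_flags() semantics: clear 'off', then set 'restore'. */
    return (current & ~off) | restore;
}
```

Flags outside 'changed_flags' pass through untouched, which is why restoring a device never clobbers state that some other program set.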
enum netdev_flags {
NETDEV_UP = 0x0001, /* Device enabled? */
- NETDEV_PROMISC = 0x0002 /* Promiscuous mode? */
+ NETDEV_PROMISC = 0x0002, /* Promiscuous mode? */
+ NETDEV_LOOPBACK = 0x0004 /* This is a loopback device. */
};
enum netdev_pseudo_ethertype {
NETDEV_ETH_TYPE_802_2 /* Receive all IEEE 802.2 frames. */
};
+/* Network device statistics.
+ *
+ * Values of unsupported statistics are set to all-1-bits (UINT64_MAX). */
struct netdev_stats {
uint64_t rx_packets; /* Total packets received. */
uint64_t tx_packets; /* Total packets transmitted. */
struct netdev;
+int netdev_initialize(void);
+void netdev_run(void);
+void netdev_wait(void);
+
int netdev_open(const char *name, int ethertype, struct netdev **);
-int netdev_open_tap(const char *name, struct netdev **);
void netdev_close(struct netdev *);
+bool netdev_exists(const char *name);
+
+int netdev_enumerate(struct svec *);
+
+const char *netdev_get_name(const struct netdev *);
+int netdev_get_mtu(const struct netdev *, int *mtup);
+
int netdev_recv(struct netdev *, struct ofpbuf *);
void netdev_recv_wait(struct netdev *);
int netdev_drain(struct netdev *);
+
int netdev_send(struct netdev *, const struct ofpbuf *);
void netdev_send_wait(struct netdev *);
+
int netdev_set_etheraddr(struct netdev *, const uint8_t mac[6]);
-const uint8_t *netdev_get_etheraddr(const struct netdev *);
-const char *netdev_get_name(const struct netdev *);
-int netdev_get_mtu(const struct netdev *);
+int netdev_get_etheraddr(const struct netdev *, uint8_t mac[6]);
+
+int netdev_get_carrier(const struct netdev *, bool *carrier);
int netdev_get_features(struct netdev *,
uint32_t *current, uint32_t *advertised,
uint32_t *supported, uint32_t *peer);
int netdev_set_advertisements(struct netdev *, uint32_t advertise);
-bool netdev_get_in4(const struct netdev *, struct in_addr *addr,
- struct in_addr *mask);
-int netdev_set_in4(struct netdev *, struct in_addr in4, struct in_addr mask);
-int netdev_add_router(struct in_addr router);
-bool netdev_get_in6(const struct netdev *, struct in6_addr *);
+
+int netdev_get_in4(const struct netdev *, struct in_addr *address,
+ struct in_addr *netmask);
+int netdev_set_in4(struct netdev *, struct in_addr addr, struct in_addr mask);
+int netdev_get_in6(const struct netdev *, struct in6_addr *);
+int netdev_add_router(struct netdev *, struct in_addr router);
+int netdev_get_next_hop(const struct netdev *, const struct in_addr *host,
+ struct in_addr *next_hop, char **netdev_name);
+int netdev_arp_lookup(const struct netdev *, uint32_t ip, uint8_t mac[6]);
+
int netdev_get_flags(const struct netdev *, enum netdev_flags *);
int netdev_set_flags(struct netdev *, enum netdev_flags, bool permanent);
int netdev_turn_flags_on(struct netdev *, enum netdev_flags, bool permanent);
int netdev_turn_flags_off(struct netdev *, enum netdev_flags, bool permanent);
-int netdev_arp_lookup(const struct netdev *, uint32_t ip, uint8_t mac[6]);
-int netdev_get_carrier(const struct netdev *, bool *carrier);
+
int netdev_get_stats(const struct netdev *, struct netdev_stats *);
int netdev_set_policing(struct netdev *, uint32_t kbits_rate,
uint32_t kbits_burst);
-void netdev_enumerate(struct svec *);
-bool netdev_find_dev_by_in4(const struct in_addr *in4, char **netdev_name);
-int netdev_get_next_hop(const struct in_addr *host, struct in_addr *next_hop,
- char **netdev_name);
-int netdev_nodev_get_flags(const char *netdev_name, enum netdev_flags *);
-bool netdev_nodev_get_in4(const char *netdev_name, struct in_addr *in4,
- struct in_addr *mask);
-int netdev_nodev_set_etheraddr(const char *name, const uint8_t mac[6]);
-int netdev_nodev_get_etheraddr(const char *netdev_name, uint8_t mac[6]);
-int netdev_nodev_set_policing(const char *netdev_name, uint32_t kbits_rate,
- uint32_t kbits_burst);
-int netdev_nodev_arp_lookup(const char *netdev_name, uint32_t ip,
- uint8_t mac[6]);
-int netdev_nodev_get_carrier(const char *netdev_name, bool *carrier);
-
-int netdev_get_vlan_vid(const char *netdev_name, int *vlan_vid);
+int netdev_get_vlan_vid(const struct netdev *, int *vlan_vid);
+struct netdev *netdev_find_dev_by_in4(const struct in_addr *);
+
+struct netdev_monitor *netdev_monitor_create(void);
+void netdev_monitor_destroy(struct netdev_monitor *);
+int netdev_monitor_add(struct netdev_monitor *, struct netdev *);
+void netdev_monitor_remove(struct netdev_monitor *, struct netdev *);
+int netdev_monitor_poll(struct netdev_monitor *, char **devnamep);
+void netdev_monitor_poll_wait(const struct netdev_monitor *);
#endif /* netdev.h */
time_t last_admitted;
/* These values are simply for statistics reporting, not used directly by
- * anything internal to the rconn (or the secchan for that matter). */
+ * anything internal to the rconn (or ofproto for that matter). */
unsigned int packets_received;
unsigned int n_attempted_connections, n_successful_connections;
time_t creation_time;
--- /dev/null
+/*
+ * Copyright (c) 2009 Nicira Networks.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <config.h>
+
+#include "rtnetlink.h"
+
+#include <errno.h>
+#include <sys/socket.h>
+#include <linux/rtnetlink.h>
+#include <net/if.h>
+#include <poll.h>
+
+#include "coverage.h"
+#include "netlink.h"
+#include "ofpbuf.h"
+
+#define THIS_MODULE VLM_rtnetlink
+#include "vlog.h"
+
+/* rtnetlink socket. */
+static struct nl_sock *notify_sock;
+
+/* All registered notifiers. */
+static struct list all_notifiers = LIST_INITIALIZER(&all_notifiers);
+
+static void rtnetlink_report_change(const struct nlmsghdr *,
+ const struct ifinfomsg *,
+ struct nlattr *attrs[]);
+static void rtnetlink_report_notify_error(void);
+
+/* Registers 'cb' to be called with auxiliary data 'aux' with network device
+ * change notifications. The notifier is stored in 'notifier', which the
+ * caller must not modify or free.
+ *
+ * This is probably not the function that you want. You should probably be
+ * using dpif_port_poll() or netdev_monitor_create(), which unlike this
+ * function are not Linux-specific.
+ *
+ * Returns 0 if successful, otherwise a positive errno value. */
+int
+rtnetlink_notifier_register(struct rtnetlink_notifier *notifier,
+ rtnetlink_notify_func *cb, void *aux)
+{
+ if (!notify_sock) {
+ int error = nl_sock_create(NETLINK_ROUTE, RTNLGRP_LINK, 0, 0,
+ &notify_sock);
+ if (error) {
+ VLOG_WARN("could not create rtnetlink socket: %s",
+ strerror(error));
+ return error;
+ }
+ } else {
+ /* Catch up on notification work so that the new notifier won't
+ * receive any stale notifications. */
+ rtnetlink_notifier_run();
+ }
+
+ list_push_back(&all_notifiers, &notifier->node);
+ notifier->cb = cb;
+ notifier->aux = aux;
+ return 0;
+}
+
+/* Cancels notification on 'notifier', which must have previously been
+ * registered with rtnetlink_notifier_register(). */
+void
+rtnetlink_notifier_unregister(struct rtnetlink_notifier *notifier)
+{
+ list_remove(&notifier->node);
+ if (list_is_empty(&all_notifiers)) {
+ nl_sock_destroy(notify_sock);
+ notify_sock = NULL;
+ }
+}
+
+/* Calls all of the registered notifiers, passing along any as-yet-unreported
+ * netdev change events. */
+void
+rtnetlink_notifier_run(void)
+{
+ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
+
+ if (!notify_sock) {
+ return;
+ }
+
+ for (;;) {
+ /* Policy for RTNLGRP_LINK messages.
+ *
+ * There are *many* more fields in these messages, but currently we
+ * only care about these fields. */
+ static const struct nl_policy rtnetlink_policy[] = {
+ [IFLA_IFNAME] = { .type = NL_A_STRING, .optional = false },
+ [IFLA_MASTER] = { .type = NL_A_U32, .optional = true },
+ };
+
+ struct nlattr *attrs[ARRAY_SIZE(rtnetlink_policy)];
+ struct ofpbuf *buf;
+ int error;
+
+ error = nl_sock_recv(notify_sock, &buf, false);
+ if (!error) {
+ if (nl_policy_parse(buf, NLMSG_HDRLEN + sizeof(struct ifinfomsg),
+ rtnetlink_policy,
+ attrs, ARRAY_SIZE(rtnetlink_policy))) {
+ struct ifinfomsg *ifinfo;
+
+ ifinfo = (void *) ((char *) buf->data + NLMSG_HDRLEN);
+ rtnetlink_report_change(buf->data, ifinfo, attrs);
+ } else {
+ VLOG_WARN_RL(&rl, "received bad rtnl message");
+ rtnetlink_report_notify_error();
+ }
+ ofpbuf_delete(buf);
+ } else if (error == EAGAIN) {
+ return;
+ } else {
+ if (error == ENOBUFS) {
+ VLOG_WARN_RL(&rl, "rtnetlink receive buffer overflowed");
+ } else {
+ VLOG_WARN_RL(&rl, "error reading rtnetlink socket: %s",
+ strerror(error));
+ }
+ rtnetlink_report_notify_error();
+ }
+ }
+}
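rtnetlink_notifier_run() keeps calling nl_sock_recv() until the socket reports EAGAIN, so a single call consumes everything the kernel has queued so far. The sketch below shows the same drain-until-EAGAIN pattern on an ordinary non-blocking pipe instead of a Netlink socket. `drain_nonblocking()` is a hypothetical helper. Unlike the real loop, it stops at the first unexpected error instead of reporting the error to the notifiers and continuing.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: reads one-byte "messages" from the non-blocking 'fd'
 * until read() fails with EAGAIN, that is, until nothing more is queued.
 * Returns the number of messages consumed, or -1 on EOF or another error. */
static int
drain_nonblocking(int fd)
{
    int n = 0;

    for (;;) {
        char msg;
        ssize_t retval = read(fd, &msg, 1);

        if (retval == 1) {
            n++;                /* The real loop dispatches to notifiers here. */
        } else if (retval < 0 && errno == EINTR) {
            continue;           /* Interrupted by a signal: retry. */
        } else if (retval < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            return n;           /* Queue is empty: wait for POLLIN next time. */
        } else {
            return -1;          /* EOF (retval == 0) or a real error. */
        }
    }
}
```

A caller would pair this with poll() on POLLIN. In the code above, rtnetlink_notifier_wait() fills that role.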
+
+/* Causes poll_block() to wake up when network device change notifications are
+ * ready. */
+void
+rtnetlink_notifier_wait(void)
+{
+ if (notify_sock) {
+ nl_sock_wait(notify_sock, POLLIN);
+ }
+}
+
+static void
+rtnetlink_report_change(const struct nlmsghdr *nlmsg,
+ const struct ifinfomsg *ifinfo,
+ struct nlattr *attrs[])
+{
+ struct rtnetlink_notifier *notifier;
+ struct rtnetlink_change change;
+
+ COVERAGE_INC(rtnetlink_changed);
+
+ change.nlmsg_type = nlmsg->nlmsg_type;
+ change.ifi_index = ifinfo->ifi_index;
+ change.ifname = nl_attr_get_string(attrs[IFLA_IFNAME]);
+ change.master_ifindex = (attrs[IFLA_MASTER]
+ ? nl_attr_get_u32(attrs[IFLA_MASTER]) : 0);
+
+ LIST_FOR_EACH (notifier, struct rtnetlink_notifier, node,
+ &all_notifiers) {
+ notifier->cb(&change, notifier->aux);
+ }
+}
+
+static void
+rtnetlink_report_notify_error(void)
+{
+ struct rtnetlink_notifier *notifier;
+
+ LIST_FOR_EACH (notifier, struct rtnetlink_notifier, node,
+ &all_notifiers) {
+ notifier->cb(NULL, notifier->aux);
+ }
+}
--- /dev/null
+/*
+ * Copyright (c) 2009 Nicira Networks.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef RTNETLINK_H
+#define RTNETLINK_H 1
+
+/* These functions are Linux specific, so they should be used directly only by
+ * Linux-specific code. */
+
+#include "list.h"
+
+/* A digested version of an rtnetlink message sent down by the kernel to
+ * indicate that a network device has been created, destroyed, or changed. */
+struct rtnetlink_change {
+ /* Copied from struct nlmsghdr. */
+ int nlmsg_type; /* e.g. RTM_NEWLINK, RTM_DELLINK. */
+
+ /* Copied from struct ifinfomsg. */
+ int ifi_index; /* Index of network device. */
+
+ /* Extracted from Netlink attributes. */
+ const char *ifname; /* Name of network device. */
+ int master_ifindex; /* Ifindex of datapath master (0 if none). */
+};
+
+/* Function called to report that a netdev has changed. 'change' describes the
+ * specific change. It is null if change information may have been lost (for
+ * example, because the kernel's notification buffer overflowed, a malformed
+ * message arrived, or reading the socket failed), in which case the function
+ * must assume that every device may have changed. 'aux' is as specified in
+ * the call to rtnetlink_notifier_register(). */
+typedef void rtnetlink_notify_func(const struct rtnetlink_change *, void *aux);
+
+struct rtnetlink_notifier {
+ struct list node;
+ rtnetlink_notify_func *cb;
+ void *aux;
+};
+
+int rtnetlink_notifier_register(struct rtnetlink_notifier *,
+ rtnetlink_notify_func *, void *aux);
+void rtnetlink_notifier_unregister(struct rtnetlink_notifier *);
+void rtnetlink_notifier_run(void);
+void rtnetlink_notifier_wait(void);
+
+#endif /* rtnetlink.h */
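The callback contract above works as follows: a non-null 'change' describes one device, and a null 'change' means information was lost. That contract can be sketched without Netlink. Everything in this sketch is a hypothetical stand-in. `struct sketch_change` plays the role of struct rtnetlink_change, and a singly linked list replaces OVS's struct list. New notifiers are also prepended rather than appended, so the callback order differs from the real code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rtnetlink_change. */
struct sketch_change {
    int ifi_index;
};

typedef void sketch_notify_func(const struct sketch_change *, void *aux);

struct sketch_notifier {
    struct sketch_notifier *next;
    sketch_notify_func *cb;
    void *aux;
};

static struct sketch_notifier *sketch_notifiers;

static void
sketch_register(struct sketch_notifier *notifier, sketch_notify_func *cb,
                void *aux)
{
    notifier->cb = cb;
    notifier->aux = aux;
    notifier->next = sketch_notifiers;   /* Prepend for brevity. */
    sketch_notifiers = notifier;
}

/* Passes 'change' to every notifier; null means "assume everything changed". */
static void
sketch_report(const struct sketch_change *change)
{
    struct sketch_notifier *n;

    for (n = sketch_notifiers; n; n = n->next) {
        n->cb(change, n->aux);
    }
}

/* Example client: remembers the last ifindex and counts full resyncs. */
struct device_tracker {
    int last_ifindex;
    int n_resyncs;
};

static void
track_change(const struct sketch_change *change, void *aux)
{
    struct device_tracker *tracker = aux;

    if (change) {
        tracker->last_ifindex = change->ifi_index;
    } else {
        tracker->n_resyncs++;   /* Information lost: rescan every device. */
    }
}
```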
{
struct shash_node *node, *next;
- HMAP_FOR_EACH_SAFE (node, next, struct shash_node, node, &sh->map) {
+ SHASH_FOR_EACH_SAFE (node, next, sh) {
hmap_remove(&sh->map, &node->node);
free(node->name);
free(node);
}
}
-/* It is the caller's responsible to avoid duplicate names, if that is
+bool
+shash_is_empty(const struct shash *shash)
+{
+ return hmap_is_empty(&shash->map);
+}
+
+/* It is the caller's responsibility to avoid duplicate names, if that is
* desirable. */
-void
+struct shash_node *
shash_add(struct shash *sh, const char *name, void *data)
{
struct shash_node *node = xmalloc(sizeof *node);
node->name = xstrdup(name);
node->data = data;
hmap_insert(&sh->map, &node->node, hash_name(name));
+ return node;
}
void
struct shash_node *node = shash_find(sh, name);
return node ? node->data : NULL;
}
+
+struct shash_node *
+shash_first(const struct shash *shash)
+{
+ struct hmap_node *node = hmap_first(&shash->map);
+ return node ? CONTAINER_OF(node, struct shash_node, node) : NULL;
+}
+
#define SHASH_INITIALIZER(SHASH) { HMAP_INITIALIZER(&(SHASH)->map) }
+#define SHASH_FOR_EACH(SHASH_NODE, SHASH) \
+ HMAP_FOR_EACH (SHASH_NODE, struct shash_node, node, &(SHASH)->map)
+
+#define SHASH_FOR_EACH_SAFE(SHASH_NODE, NEXT, SHASH) \
+ HMAP_FOR_EACH_SAFE (SHASH_NODE, NEXT, struct shash_node, node, \
+ &(SHASH)->map)
+
void shash_init(struct shash *);
void shash_destroy(struct shash *);
void shash_clear(struct shash *);
-void shash_add(struct shash *, const char *, void *);
+bool shash_is_empty(const struct shash *);
+struct shash_node *shash_add(struct shash *, const char *, void *);
void shash_delete(struct shash *, struct shash_node *);
struct shash_node *shash_find(const struct shash *, const char *);
void *shash_find_data(const struct shash *, const char *);
+struct shash_node *shash_first(const struct shash *);
#endif /* shash.h */
#include <poll.h>
#include <stddef.h>
#include <stdio.h>
+#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/un.h>
: htonl(0)); /* ??? */
}
+/* Opens a non-blocking TCP socket and connects to 'target', which should be a
+ * string in the format "<host>[:<port>]", where <host> is required and <port>
+ * is optional, with 'default_port' assumed if <port> is omitted.
+ *
+ * On success, returns 0 (connection complete) or EAGAIN (connection in
+ * progress); in either case the new file descriptor is stored into '*fdp'.
+ * On failure, returns a positive errno value other than EAGAIN and stores -1
+ * into '*fdp'.
+ *
+ * If 'sinp' is non-null, then on success the target address is stored into
+ * '*sinp'. */
+int
+tcp_open_active(const char *target_, uint16_t default_port,
+ struct sockaddr_in *sinp, int *fdp)
+{
+ char *target = xstrdup(target_);
+ char *save_ptr = NULL;
+ const char *host_name;
+ const char *port_string;
+ struct sockaddr_in sin;
+ int fd = -1;
+ int error;
+
+ /* Defaults. */
+ memset(&sin, 0, sizeof sin);
+ sin.sin_family = AF_INET;
+ sin.sin_port = htons(default_port);
+
+ /* Tokenize. */
+ host_name = strtok_r(target, ":", &save_ptr);
+ port_string = strtok_r(NULL, ":", &save_ptr);
+ if (!host_name) {
+ ovs_error(0, "%s: bad peer name format", target_);
+ error = EAFNOSUPPORT;
+ goto exit;
+ }
+
+ /* Look up IP, port. */
+ error = lookup_ip(host_name, &sin.sin_addr);
+ if (error) {
+ goto exit;
+ }
+ if (port_string && atoi(port_string)) {
+ sin.sin_port = htons(atoi(port_string));
+ }
+
+ /* Create non-blocking socket. */
+ fd = socket(AF_INET, SOCK_STREAM, 0);
+ if (fd < 0) {
+ error = errno;
+ VLOG_ERR("%s: socket: %s", target_, strerror(error));
+ goto exit;
+ }
+ error = set_nonblocking(fd);
+ if (error) {
+ goto exit_close;
+ }
+
+ /* Connect. */
+ error = connect(fd, (struct sockaddr *) &sin, sizeof sin) == 0 ? 0 : errno;
+ if (error == EINPROGRESS) {
+ error = EAGAIN;
+ } else if (error && error != EAGAIN) {
+ goto exit_close;
+ }
+
+ /* Success: error is 0 or EAGAIN. */
+ goto exit;
+
+exit_close:
+ close(fd);
+exit:
+ if (!error || error == EAGAIN) {
+ if (sinp) {
+ *sinp = sin;
+ }
+ *fdp = fd;
+ } else {
+ *fdp = -1;
+ }
+ free(target);
+ return error;
+}
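The tokenizing step of tcp_open_active() can be exercised on its own. The helper below, `parse_active_target()`, is hypothetical. It repeats the same strtok_r() and atoi() logic but stops before lookup_ip() and socket creation, so it only shows how "<host>[:<port>]" maps to a host name and a port.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits "<host>[:<port>]" the way tcp_open_active()
 * does.  Copies the host into 'host' (at most 'host_size' bytes) and stores
 * the port in '*portp', using 'default_port' if <port> is absent or parses
 * as 0.  Returns 0 on success, -1 if 'target' has no host. */
static int
parse_active_target(const char *target, uint16_t default_port,
                    char *host, size_t host_size, uint16_t *portp)
{
    char *copy = strdup(target);   /* strtok_r() modifies its input. */
    char *save_ptr = NULL;
    const char *host_name, *port_string;
    int retval = -1;

    if (!copy) {
        return -1;
    }
    host_name = strtok_r(copy, ":", &save_ptr);
    port_string = strtok_r(NULL, ":", &save_ptr);
    if (host_name) {
        snprintf(host, host_size, "%s", host_name);
        *portp = (port_string && atoi(port_string)
                  ? (uint16_t) atoi(port_string) : default_port);
        retval = 0;
    }
    free(copy);
    return retval;
}
```

Because strtok_r() skips leading delimiters, a target such as ":6634" is not rejected. Instead, it is parsed with "6634" as the host name and the default port.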
+
+/* Opens a non-blocking TCP socket, binds to 'target', and listens for incoming
+ * connections. 'target' should be a string in the format "[<port>][:<ip>]",
+ * where both <port> and <ip> are optional. If <port> is omitted, it defaults
+ * to 'default_port'; if <ip> is omitted it defaults to the wildcard IP
+ * address.
+ *
+ * The socket will have SO_REUSEADDR turned on.
+ *
+ * On success, returns a non-negative file descriptor. On failure, returns a
+ * negative errno value. */
+int
+tcp_open_passive(const char *target_, uint16_t default_port)
+{
+ char *target = xstrdup(target_);
+ char *string_ptr = target;
+ struct sockaddr_in sin;
+ const char *host_name;
+ const char *port_string;
+ int fd, error;
+ unsigned int yes = 1;
+
+ /* Address defaults. */
+ memset(&sin, 0, sizeof sin);
+ sin.sin_family = AF_INET;
+ sin.sin_addr.s_addr = htonl(INADDR_ANY);
+ sin.sin_port = htons(default_port);
+
+ /* Parse optional port number. */
+ port_string = strsep(&string_ptr, ":");
+ if (port_string && atoi(port_string)) {
+ sin.sin_port = htons(atoi(port_string));
+ }
+
+ /* Parse optional bind IP. */
+ host_name = strsep(&string_ptr, ":");
+ if (host_name && host_name[0]) {
+ error = lookup_ip(host_name, &sin.sin_addr);
+ if (error) {
+ goto exit;
+ }
+ }
+
+ /* Create non-blocking socket, set SO_REUSEADDR. */
+ fd = socket(AF_INET, SOCK_STREAM, 0);
+ if (fd < 0) {
+ error = errno;
+ VLOG_ERR("%s: socket: %s", target_, strerror(error));
+ goto exit;
+ }
+ error = set_nonblocking(fd);
+ if (error) {
+ goto exit_close;
+ }
+ if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes) < 0) {
+ error = errno;
+ VLOG_ERR("%s: setsockopt(SO_REUSEADDR): %s", target_, strerror(error));
+ goto exit_close;
+ }
+
+ /* Bind. */
+ if (bind(fd, (struct sockaddr *) &sin, sizeof sin) < 0) {
+ error = errno;
+ VLOG_ERR("%s: bind: %s", target_, strerror(error));
+ goto exit_close;
+ }
+
+ /* Listen. */
+ if (listen(fd, 10) < 0) {
+ error = errno;
+ VLOG_ERR("%s: listen: %s", target_, strerror(error));
+ goto exit_close;
+ }
+ error = 0;
+ goto exit;
+
+exit_close:
+ close(fd);
+exit:
+ free(target);
+ return error ? -error : fd;
+}
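tcp_open_passive() parses its "[<port>][:<ip>]" target with strsep() rather than strtok_r(). This matters because strsep() returns an empty token for an empty field, so ":192.168.0.1" keeps its meaning of "default port, specific IP". The hypothetical `parse_passive_target()` below repeats only that parsing step, and it leaves the IP as a string instead of calling lookup_ip().

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits "[<port>][:<ip>]" the way tcp_open_passive()
 * does.  Stores the port in '*portp' ('default_port' if <port> is empty or
 * parses as 0) and copies <ip> into 'ip', or stores "" there if <ip> is
 * omitted (meaning the wildcard address).  Returns 0, or -1 if out of memory. */
static int
parse_passive_target(const char *target, uint16_t default_port,
                     uint16_t *portp, char *ip, size_t ip_size)
{
    char *copy = strdup(target);   /* strsep() modifies its input. */
    char *string_ptr = copy;
    const char *port_string, *host_name;

    if (!copy) {
        return -1;
    }
    *portp = default_port;
    ip[0] = '\0';

    port_string = strsep(&string_ptr, ":");
    if (port_string && atoi(port_string)) {
        *portp = (uint16_t) atoi(port_string);
    }
    host_name = strsep(&string_ptr, ":");
    if (host_name && host_name[0]) {
        snprintf(ip, ip_size, "%s", host_name);
    }
    free(copy);
    return 0;
}
```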
+
/* Returns a readable and writable fd for /dev/null, if successful, otherwise
* a negative errno value. The caller must not close the returned fd (because
* the same fd will be handed out to subsequent callers). */
uint32_t guess_netmask(uint32_t ip);
int get_null_fd(void);
+int tcp_open_active(const char *target, uint16_t default_port,
+ struct sockaddr_in *sinp, int *fdp);
+int tcp_open_passive(const char *target, uint16_t default_port);
+
int read_fully(int fd, void *, size_t, size_t *bytes_read);
int write_fully(int fd, const void *, size_t, size_t *bytes_written);
struct shash_node *node;
ds_put_cstr(&ds, "The available commands are:\n");
- HMAP_FOR_EACH (node, struct shash_node, node, &commands.map) {
+ SHASH_FOR_EACH (node, &commands) {
ds_put_format(&ds, "\t%s\n", node->name);
}
unixctl_command_reply(conn, 214, ds_cstr(&ds));
* A program that (optionally) daemonizes itself should call this function
* *after* daemonization, so that the socket name contains the pid of the
* daemon instead of the pid of the program that exited. (Otherwise,
- * "ovs-appctl --target <program>.pid" will fail.)
+ * "ovs-appctl --target=<program>" will fail.)
*
* Returns 0 if successful, otherwise a positive errno value. If successful,
* sets '*serverp' to the new unixctl_server, otherwise to NULL. */
}
close(server->fd);
- unlink(server->path);
- fatal_signal_remove_file_to_unlink(server->path);
+ fatal_signal_unlink_file_now(server->path);
free(server->path);
free(server);
}
unixctl_client_destroy(struct unixctl_client *client)
{
if (client) {
- unlink(client->bind_path);
- fatal_signal_remove_file_to_unlink(client->bind_path);
+ fatal_signal_unlink_file_now(client->bind_path);
free(client->bind_path);
free(client->connect_path);
fclose(client->stream);
s = ds_cstr(&line);
if (*reply_code == -1) {
- if (!isdigit(s[0]) || !isdigit(s[1]) || !isdigit(s[2])) {
+ if (!isdigit((unsigned char)s[0])
+ || !isdigit((unsigned char)s[1])
+ || !isdigit((unsigned char)s[2])) {
VLOG_WARN("reply from %s does not start with 3-digit code",
client->connect_path);
error = EPROTO;
extern const char *program_name;
+/* Returns the number of elements in ARRAY. */
#define ARRAY_SIZE(ARRAY) (sizeof ARRAY / sizeof *ARRAY)
-#define ROUND_UP(X, Y) (((X) + ((Y) - 1)) / (Y) * (Y))
+
+/* Returns X / Y, rounding up. X must be nonnegative and Y positive to round
+ * correctly. */
+#define DIV_ROUND_UP(X, Y) (((X) + ((Y) - 1)) / (Y))
+
+/* Returns X rounded up to the nearest multiple of Y. */
+#define ROUND_UP(X, Y) (DIV_ROUND_UP(X, Y) * (Y))
+
+/* Returns X rounded down to the nearest multiple of Y. */
#define ROUND_DOWN(X, Y) ((X) / (Y) * (Y))
+
+/* Returns true if X is a power of 2, otherwise false. */
#define IS_POW2(X) ((X) && !((X) & ((X) - 1)))
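The rounding macros are easy to check with small values. The block below copies the four definitions unchanged. It adds a hypothetical `n_units_of_8()` wrapper to show a typical use, padding a length to a multiple of 8. The last test case shows why DIV_ROUND_UP() needs a nonnegative X. C integer division truncates toward zero, so (-5 + 3) / 4 evaluates to 0, not the true ceiling of -1.

```c
#include <assert.h>

/* Copied from the header above. */
#define DIV_ROUND_UP(X, Y) (((X) + ((Y) - 1)) / (Y))
#define ROUND_UP(X, Y) (DIV_ROUND_UP(X, Y) * (Y))
#define ROUND_DOWN(X, Y) ((X) / (Y) * (Y))
#define IS_POW2(X) ((X) && !((X) & ((X) - 1)))

/* Hypothetical use: number of 8-byte units needed to hold 'len' bytes. */
static unsigned int
n_units_of_8(unsigned int len)
{
    return DIV_ROUND_UP(len, 8u);
}
```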
#ifndef MIN
uint32_t local_ip;
uint16_t local_port;
char *name;
- bool reconnectable;
};
void vconn_init(struct vconn *, struct vconn_class *, int connect_status,
- const char *name, bool reconnectable);
+ const char *name);
void vconn_set_remote_ip(struct vconn *, uint32_t remote_ip);
void vconn_set_remote_port(struct vconn *, uint16_t remote_port);
void vconn_set_local_ip(struct vconn *, uint32_t local_ip);
/* Create and return the ssl_vconn. */
sslv = xmalloc(sizeof *sslv);
- vconn_init(&sslv->vconn, &ssl_vconn_class, EAGAIN, name, true);
+ vconn_init(&sslv->vconn, &ssl_vconn_class, EAGAIN, name);
vconn_set_remote_ip(&sslv->vconn, remote->sin_addr.s_addr);
vconn_set_remote_port(&sslv->vconn, remote->sin_port);
vconn_set_local_ip(&sslv->vconn, local.sin_addr.s_addr);
static int
ssl_open(const char *name, char *suffix, struct vconn **vconnp)
{
- char *save_ptr = NULL;
- char *host_name, *port_string;
struct sockaddr_in sin;
- int retval;
- int fd;
-
- retval = ssl_init();
- if (retval) {
- return retval;
- }
-
- host_name = strtok_r(suffix, ":", &save_ptr);
- port_string = strtok_r(NULL, ":", &save_ptr);
- if (!host_name) {
- ovs_error(0, "%s: bad peer name format", name);
- return EAFNOSUPPORT;
- }
-
- memset(&sin, 0, sizeof sin);
- sin.sin_family = AF_INET;
- if (lookup_ip(host_name, &sin.sin_addr)) {
- return ENOENT;
- }
- sin.sin_port = htons(port_string && *port_string ? atoi(port_string)
- : OFP_SSL_PORT);
+ int error, fd;
- /* Create socket. */
- fd = socket(AF_INET, SOCK_STREAM, 0);
- if (fd < 0) {
- VLOG_ERR("%s: socket: %s", name, strerror(errno));
- return errno;
- }
- retval = set_nonblocking(fd);
- if (retval) {
- close(fd);
- return retval;
+ error = ssl_init();
+ if (error) {
+ return error;
}
- /* Connect socket. */
- retval = connect(fd, (struct sockaddr *) &sin, sizeof sin);
- if (retval < 0) {
- if (errno == EINPROGRESS) {
- return new_ssl_vconn(name, fd, CLIENT, STATE_TCP_CONNECTING,
- &sin, vconnp);
- } else {
- int error = errno;
- VLOG_ERR("%s: connect: %s", name, strerror(error));
- close(fd);
- return error;
- }
+ error = tcp_open_active(suffix, OFP_SSL_PORT, &sin, &fd);
+ if (fd >= 0) {
+ int state = error ? STATE_TCP_CONNECTING : STATE_SSL_CONNECTING;
+ return new_ssl_vconn(name, fd, CLIENT, state, &sin, vconnp);
} else {
- return new_ssl_vconn(name, fd, CLIENT, STATE_SSL_CONNECTING,
- &sin, vconnp);
+ VLOG_ERR("%s: connect: %s", name, strerror(error));
+ return error;
}
}
static int
pssl_open(const char *name, char *suffix, struct pvconn **pvconnp)
{
- struct sockaddr_in sin;
struct pssl_pvconn *pssl;
int retval;
int fd;
- unsigned int yes = 1;
retval = ssl_init();
if (retval) {
return retval;
}
- /* Create socket. */
- fd = socket(AF_INET, SOCK_STREAM, 0);
+ fd = tcp_open_passive(suffix, OFP_SSL_PORT);
if (fd < 0) {
- int error = errno;
- VLOG_ERR("%s: socket: %s", name, strerror(error));
- return error;
- }
-
- if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes) < 0) {
- int error = errno;
- VLOG_ERR("%s: setsockopt(SO_REUSEADDR): %s", name, strerror(errno));
- return error;
- }
-
- memset(&sin, 0, sizeof sin);
- sin.sin_family = AF_INET;
- sin.sin_addr.s_addr = htonl(INADDR_ANY);
- sin.sin_port = htons(atoi(suffix) ? atoi(suffix) : OFP_SSL_PORT);
- retval = bind(fd, (struct sockaddr *) &sin, sizeof sin);
- if (retval < 0) {
- int error = errno;
- VLOG_ERR("%s: bind: %s", name, strerror(error));
- close(fd);
- return error;
- }
-
- retval = listen(fd, 10);
- if (retval < 0) {
- int error = errno;
- VLOG_ERR("%s: listen: %s", name, strerror(error));
- close(fd);
- return error;
- }
-
- retval = set_nonblocking(fd);
- if (retval) {
- close(fd);
- return retval;
+ return -fd;
}
pssl = xmalloc(sizeof *pssl);
#include <string.h>
#include <sys/types.h>
#include <unistd.h>
+#include "fatal-signal.h"
#include "leak-checker.h"
#include "ofpbuf.h"
#include "openflow/openflow.h"
struct ofpbuf *rxbuf;
struct ofpbuf *txbuf;
struct poll_waiter *tx_waiter;
+ char *unlink_path;
};
static struct vconn_class stream_vconn_class;
static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(10, 25);
static void stream_clear_txbuf(struct stream_vconn *);
+static void maybe_unlink_and_free(char *path);
+/* Creates a new vconn named 'name' that will send and receive data on 'fd' and
+ * stores a pointer to the vconn in '*vconnp'. Initial connection status
+ * 'connect_status' is interpreted as described for vconn_init().
+ *
+ * When '*vconnp' is closed, then 'unlink_path' (if nonnull) will be passed to
+ * fatal_signal_unlink_file_now() and then freed with free().
+ *
+ * Returns 0 if successful, otherwise a positive errno value. (The current
+ * implementation never fails.) */
int
new_stream_vconn(const char *name, int fd, int connect_status,
- bool reconnectable, struct vconn **vconnp)
+ char *unlink_path, struct vconn **vconnp)
{
struct stream_vconn *s;
s = xmalloc(sizeof *s);
- vconn_init(&s->vconn, &stream_vconn_class, connect_status,
- name, reconnectable);
+ vconn_init(&s->vconn, &stream_vconn_class, connect_status, name);
s->fd = fd;
s->txbuf = NULL;
s->tx_waiter = NULL;
s->rxbuf = NULL;
+ s->unlink_path = unlink_path;
*vconnp = &s->vconn;
return 0;
}
stream_clear_txbuf(s);
ofpbuf_delete(s->rxbuf);
close(s->fd);
+ maybe_unlink_and_free(s->unlink_path);
free(s);
}
int fd;
int (*accept_cb)(int fd, const struct sockaddr *, size_t sa_len,
struct vconn **);
+ char *unlink_path;
};
static struct pvconn_class pstream_pvconn_class;
return CONTAINER_OF(pvconn, struct pstream_pvconn, pvconn);
}
+/* Creates a new pvconn named 'name' that will accept new socket connections on
+ * 'fd' and stores a pointer to the pvconn in '*pvconnp'.
+ *
+ * When a connection has been accepted, 'accept_cb' will be called with the new
+ * socket fd 'fd' and the connection's remote address 'sa', which is 'sa_len'
+ * bytes long.
+ * accept_cb must return 0 if the connection is successful, in which case it
+ * must initialize '*vconnp' to the new vconn, or a positive errno value on
+ * error. In either case accept_cb takes ownership of the 'fd' passed in.
+ *
+ * When '*pvconnp' is closed, then 'unlink_path' (if nonnull) will be passed to
+ * fatal_signal_unlink_file_now() and freed with free().
+ *
+ * Returns 0 if successful, otherwise a positive errno value. (The current
+ * implementation never fails.) */
int
new_pstream_pvconn(const char *name, int fd,
- int (*accept_cb)(int fd, const struct sockaddr *,
- size_t sa_len, struct vconn **),
- struct pvconn **pvconnp)
+ int (*accept_cb)(int fd, const struct sockaddr *sa,
+ size_t sa_len, struct vconn **vconnp),
+ char *unlink_path, struct pvconn **pvconnp)
{
- struct pstream_pvconn *ps;
- int retval;
-
- retval = set_nonblocking(fd);
- if (retval) {
- close(fd);
- return retval;
- }
-
- if (listen(fd, 10) < 0) {
- int error = errno;
- VLOG_ERR("%s: listen: %s", name, strerror(error));
- close(fd);
- return error;
- }
-
- ps = xmalloc(sizeof *ps);
+ struct pstream_pvconn *ps = xmalloc(sizeof *ps);
pvconn_init(&ps->pvconn, &pstream_pvconn_class, name);
ps->fd = fd;
ps->accept_cb = accept_cb;
+ ps->unlink_path = unlink_path;
*pvconnp = &ps->pvconn;
return 0;
}
{
struct pstream_pvconn *ps = pstream_pvconn_cast(pvconn);
close(ps->fd);
+ maybe_unlink_and_free(ps->unlink_path);
free(ps);
}
pstream_accept,
pstream_wait
};
+\f
+/* Helper functions. */
+static void
+maybe_unlink_and_free(char *path)
+{
+ if (path) {
+ fatal_signal_unlink_file_now(path);
+ free(path);
+ }
+}
struct sockaddr;
int new_stream_vconn(const char *name, int fd, int connect_status,
- bool reconnectable, struct vconn **vconnp);
+ char *unlink_path, struct vconn **vconnp);
int new_pstream_pvconn(const char *name, int fd,
int (*accept_cb)(int fd, const struct sockaddr *,
size_t sa_len, struct vconn **),
+ char *unlink_path,
struct pvconn **pvconnp);
#endif /* vconn-stream.h */
return errno;
}
- retval = new_stream_vconn(name, fd, connect_status, true, vconnp);
+ retval = new_stream_vconn(name, fd, connect_status, NULL, vconnp);
if (!retval) {
struct vconn *vconn = *vconnp;
vconn_set_remote_ip(vconn, remote->sin_addr.s_addr);
static int
tcp_open(const char *name, char *suffix, struct vconn **vconnp)
{
- char *save_ptr = NULL;
- const char *host_name;
- const char *port_string;
struct sockaddr_in sin;
- int retval;
- int fd;
-
- host_name = strtok_r(suffix, ":", &save_ptr);
- port_string = strtok_r(NULL, ":", &save_ptr);
- if (!host_name) {
- ovs_error(0, "%s: bad peer name format", name);
- return EAFNOSUPPORT;
- }
-
- memset(&sin, 0, sizeof sin);
- sin.sin_family = AF_INET;
- if (lookup_ip(host_name, &sin.sin_addr)) {
- return ENOENT;
- }
- sin.sin_port = htons(port_string ? atoi(port_string) : OFP_TCP_PORT);
-
- fd = socket(AF_INET, SOCK_STREAM, 0);
- if (fd < 0) {
- VLOG_ERR("%s: socket: %s", name, strerror(errno));
- return errno;
- }
-
- retval = set_nonblocking(fd);
- if (retval) {
- close(fd);
- return retval;
- }
+ int fd, error;
- retval = connect(fd, (struct sockaddr *) &sin, sizeof sin);
- if (retval < 0) {
- if (errno == EINPROGRESS) {
- return new_tcp_vconn(name, fd, EAGAIN, &sin, vconnp);
- } else {
- int error = errno;
- VLOG_ERR("%s: connect: %s", name, strerror(error));
- close(fd);
- return error;
- }
+ error = tcp_open_active(suffix, OFP_TCP_PORT, &sin, &fd);
+ if (fd >= 0) {
+ return new_tcp_vconn(name, fd, error, &sin, vconnp);
} else {
- return new_tcp_vconn(name, fd, 0, &sin, vconnp);
+ VLOG_ERR("%s: connect: %s", name, strerror(error));
+ return error;
}
}
struct vconn **vconnp);
static int
-ptcp_open(const char *name, char *suffix, struct pvconn **pvconnp)
+ptcp_open(const char *name UNUSED, char *suffix, struct pvconn **pvconnp)
{
- struct sockaddr_in sin;
- int retval;
int fd;
- unsigned int yes = 1;
- fd = socket(AF_INET, SOCK_STREAM, 0);
+ fd = tcp_open_passive(suffix, OFP_TCP_PORT);
if (fd < 0) {
- VLOG_ERR("%s: socket: %s", name, strerror(errno));
- return errno;
- }
-
- if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes) < 0) {
- VLOG_ERR("%s: setsockopt(SO_REUSEADDR): %s", name, strerror(errno));
- return errno;
- }
-
- memset(&sin, 0, sizeof sin);
- sin.sin_family = AF_INET;
- sin.sin_addr.s_addr = htonl(INADDR_ANY);
- sin.sin_port = htons(atoi(suffix) ? atoi(suffix) : OFP_TCP_PORT);
- retval = bind(fd, (struct sockaddr *) &sin, sizeof sin);
- if (retval < 0) {
- int error = errno;
- VLOG_ERR("%s: bind: %s", name, strerror(error));
- close(fd);
- return error;
+ return -fd;
+ } else {
+ return new_pstream_pvconn("ptcp", fd, ptcp_accept, NULL, pvconnp);
}
-
- return new_pstream_pvconn("ptcp", fd, ptcp_accept, pvconnp);
}
static int
unix_open(const char *name, char *suffix, struct vconn **vconnp)
{
const char *connect_path = suffix;
- char bind_path[128];
+ char *bind_path;
int fd;
- sprintf(bind_path, "/tmp/vconn-unix.%ld.%d",
- (long int) getpid(), n_unix_sockets++);
+ bind_path = xasprintf("/tmp/vconn-unix.%ld.%d",
+ (long int) getpid(), n_unix_sockets++);
fd = make_unix_socket(SOCK_STREAM, true, false, bind_path, connect_path);
if (fd < 0) {
VLOG_ERR("%s: connection to %s failed: %s",
bind_path, connect_path, strerror(-fd));
+ free(bind_path);
return -fd;
}
return new_stream_vconn(name, fd, check_connection_completion(fd),
- true, vconnp);
+ bind_path, vconnp);
}
struct vconn_class unix_vconn_class = {
static int
-punix_open(const char *name UNUSED, char *suffix, struct pvconn **pvconnp)
+punix_open(const char *name, char *suffix, struct pvconn **pvconnp)
{
- int fd;
+ int fd, error;
fd = make_unix_socket(SOCK_STREAM, true, true, suffix, NULL);
if (fd < 0) {
- return errno;
+ return -fd;
}
- return new_pstream_pvconn("punix", fd, punix_accept, pvconnp);
+ error = set_nonblocking(fd);
+ if (error) {
+ close(fd);
+ return error;
+ }
+
+ if (listen(fd, 10) < 0) {
+ error = errno;
+ VLOG_ERR("%s: listen: %s", name, strerror(error));
+ close(fd);
+ return error;
+ }
+
+ return new_pstream_pvconn("punix", fd, punix_accept,
+ xstrdup(suffix), pvconnp);
}
static int
} else {
strcpy(name, "unix");
}
- return new_stream_vconn(name, fd, 0, true, vconnp);
+ return new_stream_vconn(name, fd, 0, NULL, vconnp);
}
struct pvconn_class punix_pvconn_class = {
if (passive) {
printf("Passive OpenFlow connection methods:\n");
- printf(" ptcp:[PORT] "
- "listen to TCP PORT (default: %d)\n",
+ printf(" ptcp:[PORT][:IP] "
+ "listen to TCP PORT (default: %d) on IP\n",
OFP_TCP_PORT);
#ifdef HAVE_OPENSSL
- printf(" pssl:[PORT] "
- "listen for SSL on PORT (default: %d)\n",
+ printf(" pssl:[PORT][:IP] "
+ "listen for SSL on PORT (default: %d) on IP\n",
OFP_SSL_PORT);
#endif
printf(" punix:FILE "
if (retval != EAGAIN) {
vconn->state = VCS_DISCONNECTED;
- vconn->error = retval;
+ vconn->error = retval == EOF ? ECONNRESET : retval;
}
}
static int
do_recv(struct vconn *vconn, struct ofpbuf **msgp)
{
- int retval;
-
-again:
- retval = (vconn->class->recv)(vconn, msgp);
+ int retval = (vconn->class->recv)(vconn, msgp);
if (!retval) {
struct ofp_header *oh;
&& oh->type != OFPT_VENDOR)
{
if (vconn->version < 0) {
- if (oh->type == OFPT_PACKET_IN
- || oh->type == OFPT_FLOW_EXPIRED
- || oh->type == OFPT_PORT_STATUS) {
- /* The kernel datapath is stateless and doesn't really
- * support version negotiation, so it can end up sending
- * these asynchronous message before version negotiation
- * is complete. Just ignore them.
- *
- * (After we move OFPT_PORT_STATUS messages from the kernel
- * into secchan, we won't get those here, since secchan
- * does proper version negotiation.) */
- ofpbuf_delete(*msgp);
- goto again;
- }
VLOG_ERR_RL(&bad_ofmsg_rl,
"%s: received OpenFlow message type %"PRIu8" "
"before version negotiation complete",
m->wildcards = htonl(wc);
}
+/* Initializes 'vconn' as a new vconn named 'name', implemented via 'class'.
+ * The initial connection status, supplied as 'connect_status', is interpreted
+ * as follows:
+ *
+ * - 0: 'vconn' is connected. Its 'send' and 'recv' functions may be
+ * called in the normal fashion.
+ *
+ * - EAGAIN: 'vconn' is trying to complete a connection. Its 'connect'
+ * function should be called to complete the connection.
+ *
+ * - Other positive errno values indicate that the connection failed with
+ * the specified error.
+ *
+ * After calling this function, vconn_close() must be used to destroy 'vconn',
+ * otherwise resources will be leaked.
+ *
+ * The caller retains ownership of 'name'. */
void
vconn_init(struct vconn *vconn, struct vconn_class *class, int connect_status,
- const char *name, bool reconnectable)
+ const char *name)
{
vconn->class = class;
vconn->state = (connect_status == EAGAIN ? VCS_CONNECTING
vconn->local_ip = 0;
vconn->local_port = 0;
vconn->name = xstrdup(name);
- vconn->reconnectable = reconnectable;
}
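The vconn_init() comment describes three cases for 'connect_status', which together form a small classification. The sketch below uses a hypothetical `sketch_vconn_state` enum rather than the real VCS_* constants, whose full definitions are not shown here. It illustrates only the documented mapping.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical states mirroring the three documented cases. */
enum sketch_vconn_state {
    SKETCH_CONNECTED,      /* connect_status == 0 */
    SKETCH_CONNECTING,     /* connect_status == EAGAIN */
    SKETCH_DISCONNECTED    /* any other positive errno value */
};

struct sketch_vconn {
    enum sketch_vconn_state state;
    int error;             /* Nonzero only when state is SKETCH_DISCONNECTED. */
};

static void
sketch_vconn_init(struct sketch_vconn *vconn, int connect_status)
{
    if (connect_status == 0) {
        vconn->state = SKETCH_CONNECTED;
        vconn->error = 0;
    } else if (connect_status == EAGAIN) {
        vconn->state = SKETCH_CONNECTING;   /* 'connect' must finish the job. */
        vconn->error = 0;
    } else {
        vconn->state = SKETCH_DISCONNECTED;
        vconn->error = connect_status;
    }
}
```

This mapping explains why tcp_open() can pass the 0-or-EAGAIN result of tcp_open_active() straight through to new_tcp_vconn() as the initial connection status.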
void
VLOG_MODULE(dhcp_client)
VLOG_MODULE(discovery)
VLOG_MODULE(dpif)
+VLOG_MODULE(dpif_linux)
+VLOG_MODULE(dpif_netdev)
VLOG_MODULE(dpctl)
VLOG_MODULE(executer)
VLOG_MODULE(ezio_term)
VLOG_MODULE(fail_open)
+VLOG_MODULE(fatal_signal)
VLOG_MODULE(fault)
VLOG_MODULE(flow)
VLOG_MODULE(in_band)
VLOG_MODULE(mac_learning)
VLOG_MODULE(mgmt)
VLOG_MODULE(netdev)
+VLOG_MODULE(netdev_linux)
VLOG_MODULE(netflow)
VLOG_MODULE(netlink)
VLOG_MODULE(ofctl)
VLOG_MODULE(ovs_discover)
VLOG_MODULE(ofproto)
+VLOG_MODULE(openflowd)
VLOG_MODULE(pktbuf)
VLOG_MODULE(pcap)
VLOG_MODULE(poll_loop)
VLOG_MODULE(port_watcher)
VLOG_MODULE(proc_net_compat)
VLOG_MODULE(process)
-VLOG_MODULE(secchan)
VLOG_MODULE(rconn)
+VLOG_MODULE(rtnetlink)
VLOG_MODULE(stp)
-VLOG_MODULE(stp_secchan)
VLOG_MODULE(stats)
VLOG_MODULE(status)
VLOG_MODULE(svec)
.RS
.IP \(bu
\fImodule\fR may be any valid module name (as displayed by the
-\fB--list\fR action on \fBovs-appctl\fR(8)), or the special name
+\fB--list\fR action on \fBovs\-appctl\fR(8)), or the special name
\fBANY\fR to set the logging levels for all modules.
.
.IP \(bu
.RE
.IP "\fBvlog/set PATTERN:\fIfacility\fB:\fIpattern\fR"
Sets the log pattern for \fIfacility\fR to \fIpattern\fR. Refer to
-\fBovs-appctl\fR(8) for a description of the valid syntax for \fIpattern\fR.
+\fBovs\-appctl\fR(8) for a description of the valid syntax for \fIpattern\fR.
.
.IP "\fBvlog/list\fR"
Lists the supported logging modules and their current levels.
p++;
}
field = 0;
- while (isdigit(*p)) {
+ while (isdigit((unsigned char)*p)) {
field = (field * 10) + (*p - '0');
p++;
}
.RS
.IP \(bu
\fImodule\fR may be any valid module name (as displayed by the
-\fB--list\fR action on \fBovs-appctl\fR(8)), or the special name
+\fB--list\fR action on \fBovs\-appctl\fR(8)), or the special name
\fBANY\fR to set the logging levels for all modules.
.IP \(bu
.TP
\fB-vPATTERN:\fIfacility\fB:\fIpattern\fR, \fB--verbose=PATTERN:\fIfacility\fB:\fIpattern\fR
Sets the log pattern for \fIfacility\fR to \fIpattern\fR. Refer to
-\fBovs-appctl\fR(8) for a description of the valid syntax for \fIpattern\fR.
+\fBovs\-appctl\fR(8) for a description of the valid syntax for \fIpattern\fR.
.TP
\fB--log-file\fR[\fB=\fIfile\fR]
# See the License for the specific language governing permissions and
# limitations under the License.
+dnl Checks for --enable-coverage and updates CFLAGS and LDFLAGS appropriately.
+AC_DEFUN([OVS_CHECK_COVERAGE],
+ [AC_REQUIRE([AC_PROG_CC])
+ AC_ARG_ENABLE(
+ [coverage],
+ [AC_HELP_STRING([--enable-coverage],
+ [Enable gcov coverage tool.])],
+ [case "${enableval}" in
+ (lcov) coverage=true lcov=true ;;
+ (yes) coverage=true lcov=false ;;
+ (no) coverage=false lcov=false ;;
+ (*) AC_MSG_ERROR([bad value ${enableval} for --enable-coverage]) ;;
+ esac],
+ [coverage=false lcov=false])
+ if $coverage; then
+ CFLAGS="$CFLAGS -O0 --coverage"
+ LDFLAGS="$LDFLAGS --coverage"
+ fi
+ if $lcov; then
+ if lcov --version >/dev/null 2>&1; then :; else
+ AC_MSG_ERROR([--enable-coverage=lcov was specified but lcov is not in \$PATH])
+ fi
+ fi
+ AC_SUBST([LCOV], [$lcov])])
+
dnl Checks for --enable-ndebug and defines NDEBUG if it is specified.
AC_DEFUN([OVS_CHECK_NDEBUG],
[AC_ARG_ENABLE(
AC_DEFUN([OVS_CHECK_PCRE],
[dnl Make sure that pkg-config is installed.
m4_pattern_forbid([PKG_CHECK_MODULES])
- PKG_CHECK_MODULES([PCRE], [libpcre], [HAVE_PCRE=yes], [HAVE_PCRE=no])
+ PKG_CHECK_MODULES([PCRE], [libpcre >= 7.2], [HAVE_PCRE=yes], [HAVE_PCRE=no])
AM_CONDITIONAL([HAVE_PCRE], [test "$HAVE_PCRE" = yes])
if test "$HAVE_PCRE" = yes; then
AC_DEFINE([HAVE_PCRE], [1], [Define to 1 if libpcre is installed.])
fi])
+
+dnl Checks for Python 2.x, x >= 4.
+AC_DEFUN([OVS_CHECK_PYTHON],
+ [AC_ARG_VAR([PYTHON], [path to Python 2.x])
+ AC_CACHE_CHECK(
+ [for Python 2.x for x >= 4],
+ [ovs_cv_python],
+ [if test -n "$PYTHON"; then
+ ovs_cv_python=$PYTHON
+ else
+ ovs_cv_python=no
+ for binary in python python2.4 python2.5; do
+ ovs_save_IFS=$IFS; IFS=$PATH_SEPARATOR
+ for dir in $PATH; do
+ IFS=$ovs_save_IFS
+ test -z "$dir" && dir=.
+ if test -x $dir/$binary && $dir/$binary -c 'import sys
+if sys.hexversion >= 0x02040000 and sys.hexversion < 0x03000000:
+ sys.exit(0)
+else:
+ sys.exit(1)'; then
+ ovs_cv_python=$dir/$binary
+ break 2
+ fi
+ done
+ done
+ fi])
+ PYTHON=$ovs_cv_python])
--- /dev/null
+/Makefile
+/Makefile.in
--- /dev/null
+# Copyright (C) 2009 Nicira Networks, Inc.
+#
+# Copying and distribution of this file, with or without modification,
+# are permitted in any medium without royalty provided the copyright
+# notice and this notice are preserved. This file is offered as-is,
+# without warranty of any kind.
+
+noinst_LIBRARIES += ofproto/libofproto.a
+ofproto_libofproto_a_SOURCES = \
+ ofproto/discovery.c \
+ ofproto/discovery.h \
+ ofproto/executer.c \
+ ofproto/executer.h \
+ ofproto/fail-open.c \
+ ofproto/fail-open.h \
+ ofproto/in-band.c \
+ ofproto/in-band.h \
+ ofproto/netflow.c \
+ ofproto/netflow.h \
+ ofproto/ofproto.c \
+ ofproto/ofproto.h \
+ ofproto/pktbuf.c \
+ ofproto/pktbuf.h \
+ ofproto/pinsched.c \
+ ofproto/pinsched.h \
+ ofproto/status.c \
+ ofproto/status.h
+
+include ofproto/commands/automake.mk
--- /dev/null
+commandsdir = ${pkgdatadir}/commands
+dist_commands_SCRIPTS = \
+ ofproto/commands/reboot
--- /dev/null
+#! /bin/sh
+ovs-kill --force --signal=USR1 ovs-switchui.pid
+reboot
--- /dev/null
+/*
+ * Copyright (c) 2008, 2009 Nicira Networks.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <config.h>
+#include "discovery.h"
+#include <errno.h>
+#include <inttypes.h>
+#include <net/if.h>
+#include <regex.h>
+#include <stdlib.h>
+#include <string.h>
+#include "dhcp-client.h"
+#include "dhcp.h"
+#include "dpif.h"
+#include "netdev.h"
+#include "openflow/openflow.h"
+#include "packets.h"
+#include "status.h"
+#include "vconn-ssl.h"
+
+#define THIS_MODULE VLM_discovery
+#include "vlog.h"
+
+struct discovery {
+ char *re;
+ bool update_resolv_conf;
+ regex_t *regex;
+ struct dhclient *dhcp;
+ int n_changes;
+ struct status_category *ss_cat;
+};
+
+static void modify_dhcp_request(struct dhcp_msg *, void *aux);
+static bool validate_dhcp_offer(const struct dhcp_msg *, void *aux);
+
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(60, 60);
+
+static void
+discovery_status_cb(struct status_reply *sr, void *d_)
+{
+ struct discovery *d = d_;
+
+ status_reply_put(sr, "accept-remote=%s", d->re);
+ status_reply_put(sr, "n-changes=%d", d->n_changes);
+ if (d->dhcp) {
+ status_reply_put(sr, "state=%s", dhclient_get_state(d->dhcp));
+ status_reply_put(sr, "state-elapsed=%u",
+ dhclient_get_state_elapsed(d->dhcp));
+ if (dhclient_is_bound(d->dhcp)) {
+ uint32_t ip = dhclient_get_ip(d->dhcp);
+ uint32_t netmask = dhclient_get_netmask(d->dhcp);
+ uint32_t router = dhclient_get_router(d->dhcp);
+
+ const struct dhcp_msg *cfg = dhclient_get_config(d->dhcp);
+ uint32_t dns_server;
+ char *domain_name;
+ int i;
+
+ status_reply_put(sr, "ip="IP_FMT, IP_ARGS(&ip));
+ status_reply_put(sr, "netmask="IP_FMT, IP_ARGS(&netmask));
+ if (router) {
+ status_reply_put(sr, "router="IP_FMT, IP_ARGS(&router));
+ }
+
+ for (i = 0; dhcp_msg_get_ip(cfg, DHCP_CODE_DNS_SERVER, i,
+ &dns_server);
+ i++) {
+ status_reply_put(sr, "dns%d="IP_FMT, i, IP_ARGS(&dns_server));
+ }
+
+ domain_name = dhcp_msg_get_string(cfg, DHCP_CODE_DOMAIN_NAME);
+ if (domain_name) {
+ status_reply_put(sr, "domain=%s", domain_name);
+ free(domain_name);
+ }
+
+ status_reply_put(sr, "lease-remaining=%u",
+ dhclient_get_lease_remaining(d->dhcp));
+ }
+ }
+}
+
+int
+discovery_create(const char *re, bool update_resolv_conf,
+ struct dpif *dpif, struct switch_status *ss,
+ struct discovery **discoveryp)
+{
+ struct discovery *d;
+ char local_name[IF_NAMESIZE];
+ int error;
+
+ d = xcalloc(1, sizeof *d);
+
+ /* Controller regular expression. */
+ error = discovery_set_accept_controller_re(d, re);
+ if (error) {
+ goto error_free;
+ }
+ d->update_resolv_conf = update_resolv_conf;
+
+ /* Initialize DHCP client. */
+ error = dpif_port_get_name(dpif, ODPP_LOCAL,
+ local_name, sizeof local_name);
+ if (error) {
+ VLOG_ERR("failed to query datapath local port: %s", strerror(error));
+ goto error_regfree;
+ }
+ error = dhclient_create(local_name, modify_dhcp_request,
+ validate_dhcp_offer, d, &d->dhcp);
+ if (error) {
+ VLOG_ERR("failed to initialize DHCP client: %s", strerror(error));
+ goto error_regfree;
+ }
+ dhclient_set_max_timeout(d->dhcp, 3);
+ dhclient_init(d->dhcp, 0);
+
+ d->ss_cat = switch_status_register(ss, "discovery",
+ discovery_status_cb, d);
+
+ *discoveryp = d;
+ return 0;
+
+error_regfree:
+ free(d->re);
+ regfree(d->regex);
+ free(d->regex);
+error_free:
+ free(d);
+ *discoveryp = 0;
+ return error;
+}
+
+void
+discovery_destroy(struct discovery *d)
+{
+ if (d) {
+ free(d->re);
+ regfree(d->regex);
+ free(d->regex);
+ dhclient_destroy(d->dhcp);
+ switch_status_unregister(d->ss_cat);
+ free(d);
+ }
+}
+
+void
+discovery_set_update_resolv_conf(struct discovery *d,
+ bool update_resolv_conf)
+{
+ d->update_resolv_conf = update_resolv_conf;
+}
+
+int
+discovery_set_accept_controller_re(struct discovery *d, const char *re_)
+{
+ regex_t *regex;
+ int error;
+ char *re;
+
+ re = (!re_ ? xstrdup(vconn_ssl_is_configured() ? "^ssl:.*" : "^tcp:.*")
+ : re_[0] == '^' ? xstrdup(re_) : xasprintf("^%s", re_));
+ regex = xmalloc(sizeof *regex);
+ error = regcomp(regex, re, REG_NOSUB | REG_EXTENDED);
+ if (error) {
+ size_t length = regerror(error, regex, NULL, 0);
+ char *buffer = xmalloc(length);
+ regerror(error, regex, buffer, length);
+ VLOG_WARN("%s: %s", re, buffer);
+ free(buffer);
+ free(regex);
+ free(re);
+ return EINVAL;
+ } else {
+ if (d->regex) {
+ regfree(d->regex);
+ free(d->regex);
+ }
+ free(d->re);
+
+ d->regex = regex;
+ d->re = re;
+ return 0;
+ }
+}
+
+void
+discovery_question_connectivity(struct discovery *d)
+{
+ if (d->dhcp) {
+ dhclient_force_renew(d->dhcp, 15);
+ }
+}
+
+bool
+discovery_run(struct discovery *d, char **controller_name)
+{
+ if (!d->dhcp) {
+ *controller_name = NULL;
+ return true;
+ }
+
+ dhclient_run(d->dhcp);
+ if (!dhclient_changed(d->dhcp)) {
+ return false;
+ }
+
+ dhclient_configure_netdev(d->dhcp);
+ if (d->update_resolv_conf) {
+ dhclient_update_resolv_conf(d->dhcp);
+ }
+
+ if (dhclient_is_bound(d->dhcp)) {
+ *controller_name = dhcp_msg_get_string(dhclient_get_config(d->dhcp),
+ DHCP_CODE_OFP_CONTROLLER_VCONN);
+ VLOG_INFO("%s: discovered controller", *controller_name);
+ d->n_changes++;
+ } else {
+ *controller_name = NULL;
+ if (d->n_changes) {
+ VLOG_INFO("discovered controller no longer available");
+ d->n_changes++;
+ }
+ }
+ return true;
+}
+
+void
+discovery_wait(struct discovery *d)
+{
+ if (d->dhcp) {
+ dhclient_wait(d->dhcp);
+ }
+}
+
+static void
+modify_dhcp_request(struct dhcp_msg *msg, void *aux UNUSED)
+{
+ dhcp_msg_put_string(msg, DHCP_CODE_VENDOR_CLASS, "OpenFlow");
+}
+
+static bool
+validate_dhcp_offer(const struct dhcp_msg *msg, void *d_)
+{
+ const struct discovery *d = d_;
+ char *vconn_name;
+ bool accept;
+
+ vconn_name = dhcp_msg_get_string(msg, DHCP_CODE_OFP_CONTROLLER_VCONN);
+ if (!vconn_name) {
+ VLOG_WARN_RL(&rl, "rejecting DHCP offer missing controller vconn");
+ return false;
+ }
+ accept = !regexec(d->regex, vconn_name, 0, NULL, 0);
+ if (!accept) {
+ VLOG_WARN_RL(&rl, "rejecting controller vconn that fails to match %s",
+ d->re);
+ }
+ free(vconn_name);
+ return accept;
+}
--- /dev/null
+/*
+ * Copyright (c) 2008, 2009 Nicira Networks.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef DISCOVERY_H
+#define DISCOVERY_H 1
+
+#include <stdbool.h>
+
+struct dpif;
+struct discovery;
+struct settings;
+struct switch_status;
+
+int discovery_create(const char *accept_controller_re, bool update_resolv_conf,
+ struct dpif *, struct switch_status *,
+ struct discovery **);
+void discovery_destroy(struct discovery *);
+void discovery_set_update_resolv_conf(struct discovery *,
+ bool update_resolv_conf);
+int discovery_set_accept_controller_re(struct discovery *, const char *re);
+void discovery_question_connectivity(struct discovery *);
+bool discovery_run(struct discovery *, char **controller_name);
+void discovery_wait(struct discovery *);
+
+#endif /* discovery.h */
--- /dev/null
+/*
+ * Copyright (c) 2008, 2009 Nicira Networks.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <config.h>
+#include "executer.h"
+#include <errno.h>
+#include <fcntl.h>
+#include <fnmatch.h>
+#include <poll.h>
+#include <signal.h>
+#include <stdlib.h>
+#include <sys/stat.h>
+#include <sys/wait.h>
+#include <string.h>
+#include <unistd.h>
+#include "dirs.h"
+#include "dynamic-string.h"
+#include "fatal-signal.h"
+#include "openflow/nicira-ext.h"
+#include "ofpbuf.h"
+#include "openflow/openflow.h"
+#include "poll-loop.h"
+#include "rconn.h"
+#include "socket-util.h"
+#include "util.h"
+#include "vconn.h"
+
+#define THIS_MODULE VLM_executer
+#include "vlog.h"
+
+#define MAX_CHILDREN 8
+
+struct child {
+ /* Information about child process. */
+ char *name; /* argv[0] passed to child. */
+ pid_t pid; /* Child's process ID. */
+
+ /* For sending a reply to the controller when the child dies. */
+ struct rconn *rconn;
+ uint32_t xid; /* Transaction ID used by controller. */
+
+ /* We read up to MAX_OUTPUT bytes of output and send them back to the
+ * controller when the child dies. */
+#define MAX_OUTPUT 4096
+ int output_fd; /* FD from which to read child's output. */
+ uint8_t *output; /* Output data. */
+ size_t output_size; /* Number of bytes of output data so far. */
+};
+
+struct executer {
+ /* Settings. */
+ char *command_acl; /* Command white/blacklist, as shell globs. */
+ char *command_dir; /* Directory that contains commands. */
+
+ /* Children. */
+ struct child children[MAX_CHILDREN];
+ size_t n_children;
+};
+
+/* File descriptors for waking up when a child dies. */
+static int signal_fds[2] = {-1, -1};
+
+static void send_child_status(struct rconn *, uint32_t xid, uint32_t status,
+ const void *data, size_t size);
+static void send_child_message(struct rconn *, uint32_t xid, uint32_t status,
+ const char *message);
+
+/* Returns true if 'cmd' is allowed by 'acl', which is a comma-separated
+ * access control list in the format described for --command-acl in
+ * ovs-openflowd(8). */
+static bool
+executer_is_permitted(const char *acl_, const char *cmd)
+{
+ char *acl, *save_ptr, *pattern;
+ bool allowed, denied;
+
+ /* Verify that 'cmd' consists only of alphanumerics plus _ or -. */
+ if (cmd[strspn(cmd, "abcdefghijklmnopqrstuvwxyz"
+ "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-")] != '\0') {
+ VLOG_WARN("rejecting command name \"%s\" that contains forbidden "
+ "characters", cmd);
+ return false;
+ }
+
+ /* Check 'cmd' against 'acl'. */
+ acl = xstrdup(acl_);
+ save_ptr = acl;
+ allowed = denied = false;
+ while ((pattern = strsep(&save_ptr, ",")) != NULL && !denied) {
+ if (pattern[0] != '!' && !fnmatch(pattern, cmd, 0)) {
+ allowed = true;
+ } else if (pattern[0] == '!' && !fnmatch(pattern + 1, cmd, 0)) {
+ denied = true;
+ }
+ }
+ free(acl);
+
+ /* Check the command white/blacklisted state. */
+ if (allowed && !denied) {
+ VLOG_INFO("permitting command execution: \"%s\" is whitelisted", cmd);
+ } else if (allowed && denied) {
+ VLOG_WARN("denying command execution: \"%s\" is both blacklisted "
+ "and whitelisted", cmd);
+ } else if (denied) {
+ VLOG_WARN("denying command execution: \"%s\" is blacklisted", cmd);
+ } else {
+ VLOG_WARN("denying command execution: \"%s\" is not whitelisted", cmd);
+ }
+ return allowed && !denied;
+}
+
+int
+executer_handle_request(struct executer *e, struct rconn *rconn,
+ struct nicira_header *request)
+{
+ char **argv;
+ char *args;
+ char *exec_file = NULL;
+ int max_fds;
+ struct stat s;
+ size_t args_size;
+ size_t argc;
+ size_t i;
+ pid_t pid;
+ int output_fds[2];
+
+ /* Verify limit on children not exceeded.
+ * XXX should probably kill children when the connection drops? */
+ if (e->n_children >= MAX_CHILDREN) {
+ send_child_message(rconn, request->header.xid, NXT_STATUS_ERROR,
+ "too many child processes");
+ return 0;
+ }
+
+ /* Copy argument buffer, adding a null terminator at the end. Now every
+ * argument is null-terminated, instead of being merely null-delimited. */
+ args_size = ntohs(request->header.length) - sizeof *request;
+ args = xmemdup0((const void *) (request + 1), args_size);
+
+ /* Count arguments. */
+ argc = 0;
+ for (i = 0; i <= args_size; i++) {
+ argc += args[i] == '\0';
+ }
+
+ /* Set argv[*] to point to each argument. */
+ argv = xmalloc((argc + 1) * sizeof *argv);
+ argv[0] = args;
+ for (i = 1; i < argc; i++) {
+ argv[i] = strchr(argv[i - 1], '\0') + 1;
+ }
+ argv[argc] = NULL;
+
+ /* Check permissions. */
+ if (!executer_is_permitted(e->command_acl, argv[0])) {
+ send_child_message(rconn, request->header.xid, NXT_STATUS_ERROR,
+ "command not allowed");
+ goto done;
+ }
+
+ /* Find the executable. */
+ exec_file = xasprintf("%s/%s", e->command_dir, argv[0]);
+ if (stat(exec_file, &s)) {
+ VLOG_WARN("failed to stat \"%s\": %s", exec_file, strerror(errno));
+ send_child_message(rconn, request->header.xid, NXT_STATUS_ERROR,
+ "command not allowed");
+ goto done;
+ }
+ if (!S_ISREG(s.st_mode)) {
+ VLOG_WARN("\"%s\" is not a regular file", exec_file);
+ send_child_message(rconn, request->header.xid, NXT_STATUS_ERROR,
+ "command not allowed");
+ goto done;
+ }
+ argv[0] = exec_file;
+
+ /* Arrange to capture output. */
+ if (pipe(output_fds)) {
+ VLOG_WARN("pipe failed: %s", strerror(errno));
+ send_child_message(rconn, request->header.xid, NXT_STATUS_ERROR,
+ "internal error (pipe)");
+ goto done;
+ }
+
+ pid = fork();
+ if (!pid) {
+ /* Running in child.
+ * XXX should run in new process group so that we can signal all
+ * subprocesses at once? Would also want to catch fatal signals and
+ * kill them at the same time though. */
+ fatal_signal_fork();
+ dup2(get_null_fd(), 0);
+ dup2(output_fds[1], 1);
+ dup2(get_null_fd(), 2);
+ max_fds = get_max_fds();
+ for (i = 3; i < max_fds; i++) {
+ close(i);
+ }
+ if (chdir(e->command_dir)) {
+ printf("could not change directory to \"%s\": %s\n",
+ e->command_dir, strerror(errno));
+ exit(EXIT_FAILURE);
+ }
+ execv(argv[0], argv);
+ printf("failed to start \"%s\": %s\n", argv[0], strerror(errno));
+ exit(EXIT_FAILURE);
+ } else if (pid > 0) {
+ /* Running in parent. */
+ struct child *child;
+
+ VLOG_INFO("started \"%s\" subprocess", argv[0]);
+ send_child_status(rconn, request->header.xid, NXT_STATUS_STARTED,
+ NULL, 0);
+ child = &e->children[e->n_children++];
+ child->name = xstrdup(argv[0]);
+ child->pid = pid;
+ child->rconn = rconn;
+ child->xid = request->header.xid;
+ child->output_fd = output_fds[0];
+ child->output = xmalloc(MAX_OUTPUT);
+ child->output_size = 0;
+ set_nonblocking(output_fds[0]);
+ close(output_fds[1]);
+ } else {
+ VLOG_WARN("fork failed: %s", strerror(errno));
+ send_child_message(rconn, request->header.xid, NXT_STATUS_ERROR,
+ "internal error (fork)");
+ close(output_fds[0]);
+ close(output_fds[1]);
+ }
+
+done:
+ free(exec_file);
+ free(args);
+ free(argv);
+ return 0;
+}
+
+static void
+send_child_status(struct rconn *rconn, uint32_t xid, uint32_t status,
+ const void *data, size_t size)
+{
+ if (rconn) {
+ struct nx_command_reply *r;
+ struct ofpbuf *buffer;
+
+ r = make_openflow_xid(sizeof *r, OFPT_VENDOR, xid, &buffer);
+ r->nxh.vendor = htonl(NX_VENDOR_ID);
+ r->nxh.subtype = htonl(NXT_COMMAND_REPLY);
+ r->status = htonl(status);
+ ofpbuf_put(buffer, data, size);
+ update_openflow_length(buffer);
+ if (rconn_send(rconn, buffer, NULL)) {
+ ofpbuf_delete(buffer);
+ }
+ }
+}
+
+static void
+send_child_message(struct rconn *rconn, uint32_t xid, uint32_t status,
+ const char *message)
+{
+ send_child_status(rconn, xid, status, message, strlen(message));
+}
+
+/* 'child' died with 'status' as its return code. Deal with it. */
+static void
+child_terminated(struct child *child, int status)
+{
+ struct ds ds;
+ uint32_t ofp_status;
+
+ /* Log how it terminated. */
+ ds_init(&ds);
+ if (WIFEXITED(status)) {
+ ds_put_format(&ds, "normally with status %d", WEXITSTATUS(status));
+ } else if (WIFSIGNALED(status)) {
+ const char *name = NULL;
+#ifdef HAVE_STRSIGNAL
+ name = strsignal(WTERMSIG(status));
+#endif
+ ds_put_format(&ds, "by signal %d", WTERMSIG(status));
+ if (name) {
+ ds_put_format(&ds, " (%s)", name);
+ }
+ }
+ if (WCOREDUMP(status)) {
+ ds_put_cstr(&ds, " (core dumped)");
+ }
+ VLOG_INFO("child process \"%s\" with pid %ld terminated %s",
+ child->name, (long int) child->pid, ds_cstr(&ds));
+ ds_destroy(&ds);
+
+ /* Send a status message back to the controller that requested the
+ * command. */
+ if (WIFEXITED(status)) {
+ ofp_status = WEXITSTATUS(status) | NXT_STATUS_EXITED;
+ } else if (WIFSIGNALED(status)) {
+ ofp_status = WTERMSIG(status) | NXT_STATUS_SIGNALED;
+ } else {
+ ofp_status = NXT_STATUS_UNKNOWN;
+ }
+ if (WCOREDUMP(status)) {
+ ofp_status |= NXT_STATUS_COREDUMP;
+ }
+ send_child_status(child->rconn, child->xid, ofp_status,
+ child->output, child->output_size);
+}
+
+/* Read output from 'child' and append it to its output buffer. */
+static void
+poll_child(struct child *child)
+{
+ ssize_t n;
+
+ if (child->output_fd < 0) {
+ return;
+ }
+
+ do {
+ n = read(child->output_fd, child->output + child->output_size,
+ MAX_OUTPUT - child->output_size);
+ } while (n < 0 && errno == EINTR);
+ if (n > 0) {
+ child->output_size += n;
+ if (child->output_size < MAX_OUTPUT) {
+ return;
+ }
+ } else if (n < 0 && errno == EAGAIN) {
+ return;
+ }
+ close(child->output_fd);
+ child->output_fd = -1;
+}
+
+void
+executer_run(struct executer *e)
+{
+ char buffer[MAX_CHILDREN];
+ size_t i;
+
+ if (!e->n_children) {
+ return;
+ }
+
+ /* Read output from children. */
+ for (i = 0; i < e->n_children; i++) {
+ struct child *child = &e->children[i];
+ poll_child(child);
+ }
+
+ /* If SIGCHLD was received, reap dead children. */
+ if (read(signal_fds[0], buffer, sizeof buffer) <= 0) {
+ return;
+ }
+ for (;;) {
+ int status;
+ pid_t pid;
+
+ /* Get dead child in 'pid' and its return code in 'status'. */
+ pid = waitpid(WAIT_ANY, &status, WNOHANG);
+ if (pid < 0 && errno == EINTR) {
+ continue;
+ } else if (pid <= 0) {
+ return;
+ }
+
+ /* Find child with given 'pid' and drop it from the list. */
+ for (i = 0; i < e->n_children; i++) {
+ struct child *child = &e->children[i];
+ if (child->pid == pid) {
+ poll_child(child);
+ child_terminated(child, status);
+ free(child->name);
+ free(child->output);
+ *child = e->children[--e->n_children];
+ goto found;
+ }
+ }
+ VLOG_WARN("child with unknown pid %ld terminated", (long int) pid);
+ found:;
+ }
+
+}
+
+void
+executer_wait(struct executer *e)
+{
+ if (e->n_children) {
+ size_t i;
+
+ /* Wake up on SIGCHLD. */
+ poll_fd_wait(signal_fds[0], POLLIN);
+
+ /* Wake up when we get output from a child. */
+ for (i = 0; i < e->n_children; i++) {
+ struct child *child = &e->children[i];
+ if (child->output_fd >= 0) {
+ poll_fd_wait(child->output_fd, POLLIN);
+ }
+ }
+ }
+}
+
+void
+executer_rconn_closing(struct executer *e, struct rconn *rconn)
+{
+ size_t i;
+
+ /* If any of our children was connected to 'rconn', then disconnect it so
+ * we don't try to reference a dead connection when the process terminates
+ * later.
+ * XXX kill the children started by 'rconn'? */
+ for (i = 0; i < e->n_children; i++) {
+ if (e->children[i].rconn == rconn) {
+ e->children[i].rconn = NULL;
+ }
+ }
+}
+
+static void
+sigchld_handler(int signr UNUSED)
+{
+ write(signal_fds[1], "", 1);
+}
+
+int
+executer_create(const char *command_acl, const char *command_dir,
+ struct executer **executerp)
+{
+ struct executer *e;
+ struct sigaction sa;
+
+ *executerp = NULL;
+ if (signal_fds[0] == -1) {
+ /* Make sure we can get a fd for /dev/null. */
+ int null_fd = get_null_fd();
+ if (null_fd < 0) {
+ return -null_fd;
+ }
+
+ /* Create pipe for notifying us that SIGCHLD was invoked. */
+ if (pipe(signal_fds)) {
+ int error = errno;
+ VLOG_ERR("pipe failed: %s", strerror(error));
+ return error;
+ }
+ set_nonblocking(signal_fds[0]);
+ set_nonblocking(signal_fds[1]);
+ }
+
+ /* Set up signal handler. */
+ memset(&sa, 0, sizeof sa);
+ sa.sa_handler = sigchld_handler;
+ sigemptyset(&sa.sa_mask);
+ sa.sa_flags = SA_NOCLDSTOP | SA_RESTART;
+ if (sigaction(SIGCHLD, &sa, NULL)) {
+ int error = errno;
+ VLOG_ERR("sigaction(SIGCHLD) failed: %s", strerror(error));
+ return error;
+ }
+
+ e = xcalloc(1, sizeof *e);
+ e->command_acl = xstrdup(command_acl);
+ e->command_dir = (command_dir
+ ? xstrdup(command_dir)
+ : xasprintf("%s/commands", ovs_pkgdatadir));
+ e->n_children = 0;
+ *executerp = e;
+ return 0;
+}
+
+void
+executer_destroy(struct executer *e)
+{
+ if (e) {
+ size_t i;
+
+ free(e->command_acl);
+ free(e->command_dir);
+ for (i = 0; i < e->n_children; i++) {
+ struct child *child = &e->children[i];
+
+ free(child->name);
+ kill(child->pid, SIGHUP);
+ /* We don't own child->rconn. */
+ if (child->output_fd >= 0) {
+ close(child->output_fd);
+ }
+ free(child->output);
+ }
+ free(e);
+ }
+}
+
+void
+executer_set_acl(struct executer *e, const char *acl, const char *dir)
+{
+ free(e->command_acl);
+ e->command_acl = xstrdup(acl);
+ free(e->command_dir);
+ e->command_dir = xstrdup(dir);
+}
--- /dev/null
+/*
+ * Copyright (c) 2008, 2009 Nicira Networks.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef EXECUTER_H
+#define EXECUTER_H 1
+
+struct executer;
+struct nicira_header;
+struct rconn;
+
+int executer_create(const char *acl, const char *dir, struct executer **);
+void executer_set_acl(struct executer *, const char *acl, const char *dir);
+void executer_destroy(struct executer *);
+void executer_run(struct executer *);
+void executer_wait(struct executer *);
+void executer_rconn_closing(struct executer *, struct rconn *);
+int executer_handle_request(struct executer *, struct rconn *,
+ struct nicira_header *);
+
+#endif /* executer.h */
--- /dev/null
+/*
+ * Copyright (c) 2008, 2009 Nicira Networks.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <config.h>
+#include "fail-open.h"
+#include <inttypes.h>
+#include <limits.h>
+#include <stdlib.h>
+#include <string.h>
+#include "flow.h"
+#include "mac-learning.h"
+#include "odp-util.h"
+#include "ofpbuf.h"
+#include "ofproto.h"
+#include "pktbuf.h"
+#include "poll-loop.h"
+#include "rconn.h"
+#include "status.h"
+#include "timeval.h"
+#include "vconn.h"
+
+#define THIS_MODULE VLM_fail_open
+#include "vlog.h"
+
+/*
+ * Fail-open mode.
+ *
+ * In fail-open mode, the switch detects when the controller cannot be
+ * contacted or when the controller is dropping switch connections because the
+ * switch does not pass its admission control policy. In those situations the
+ * switch sets up flows itself using the "normal" action.
+ *
+ * There is a little subtlety to the implementation, to properly handle the
+ * case where the controller allows switch connections but drops them a few
+ * seconds later for admission control reasons. Because of this case, we
+ * don't want to just stop setting up flows when we connect to the
+ * controller: if we did, then new flow setup and existing flows would stop
+ * for the duration of the connection to the controller, and thus the whole
+ * network would go down for that period of time.
+ *
+ * So, instead, we add some special cases when we are connected to a
+ * controller but not yet sure that it has admitted us:
+ *
+ * - We set up flows immediately ourselves, but simultaneously send out an
+ * OFPT_PACKET_IN to the controller. We put a special bogus buffer-id in
+ * these OFPT_PACKET_IN messages so that duplicate packets don't get sent
+ * out to the network when the controller replies.
+ *
+ * - We also send out OFPT_PACKET_IN messages for totally bogus packets
+ * every so often, in case no real new flows are arriving in the network.
+ *
+ * - We don't flush the flow table at the time we connect, because this
+ * could cause network stuttering in a switch with lots of flows or very
+ * high-bandwidth flows by suddenly throwing lots of packets down to
+ * userspace.
+ */
+
+struct fail_open {
+ struct ofproto *ofproto;
+ struct rconn *controller;
+ int trigger_duration;
+ int last_disconn_secs;
+ struct status_category *ss_cat;
+ long long int next_bogus_packet_in;
+ struct rconn_packet_counter *bogus_packet_counter;
+};
+
+/* Returns true if 'fo' should be in fail-open mode, otherwise false. */
+static inline bool
+should_fail_open(const struct fail_open *fo)
+{
+ return rconn_failure_duration(fo->controller) >= fo->trigger_duration;
+}
+
+/* Returns true if 'fo' is currently in fail-open mode, otherwise false. */
+bool
+fail_open_is_active(const struct fail_open *fo)
+{
+ return fo->last_disconn_secs != 0;
+}
+
+static void
+send_bogus_packet_in(struct fail_open *fo)
+{
+ uint8_t mac[ETH_ADDR_LEN];
+ struct ofpbuf *opi;
+ struct ofpbuf b;
+
+ /* Compose ofp_packet_in. */
+ ofpbuf_init(&b, 128);
+ eth_addr_random(mac);
+ compose_benign_packet(&b, "Open vSwitch Controller Probe", 0xa033, mac);
+ opi = make_packet_in(pktbuf_get_null(), OFPP_LOCAL, OFPR_NO_MATCH, &b, 64);
+ ofpbuf_uninit(&b);
+
+ /* Send. */
+ rconn_send_with_limit(fo->controller, opi, fo->bogus_packet_counter, 1);
+}
+
+/* Enter fail-open mode if we should be in it. Handle reconnecting to a
+ * controller from fail-open mode. */
+void
+fail_open_run(struct fail_open *fo)
+{
+ /* Enter fail-open mode if 'fo' is not in it but should be. */
+ if (should_fail_open(fo)) {
+ int disconn_secs = rconn_failure_duration(fo->controller);
+ if (!fail_open_is_active(fo)) {
+ VLOG_WARN("Could not connect to controller (or switch failed "
+ "controller's post-connection admission control "
+ "policy) for %d seconds, failing open", disconn_secs);
+ fo->last_disconn_secs = disconn_secs;
+
+ /* Flush all OpenFlow and datapath flows. We will set up our
+ * fail-open rule from fail_open_flushed() when
+ * ofproto_flush_flows() calls back to us. */
+ ofproto_flush_flows(fo->ofproto);
+ } else if (disconn_secs > fo->last_disconn_secs + 60) {
+ VLOG_INFO("Still in fail-open mode after %d seconds disconnected "
+ "from controller", disconn_secs);
+ fo->last_disconn_secs = disconn_secs;
+ }
+ }
+
+ /* Schedule a bogus packet-in if we're connected and in fail-open. */
+ if (fail_open_is_active(fo)) {
+ if (rconn_is_connected(fo->controller)) {
+ bool expired = time_msec() >= fo->next_bogus_packet_in;
+ if (expired) {
+ send_bogus_packet_in(fo);
+ }
+ if (expired || fo->next_bogus_packet_in == LLONG_MAX) {
+ fo->next_bogus_packet_in = time_msec() + 2000;
+ }
+ } else {
+ fo->next_bogus_packet_in = LLONG_MAX;
+ }
+ }
+
+}
+
+/* If 'fo' is currently in fail-open mode and its rconn has connected to the
+ * controller and been admitted by it, exits fail-open mode. */
+void
+fail_open_maybe_recover(struct fail_open *fo)
+{
+ if (fail_open_is_active(fo) && rconn_is_admitted(fo->controller)) {
+ flow_t flow;
+
+ VLOG_WARN("No longer in fail-open mode");
+ fo->last_disconn_secs = 0;
+ fo->next_bogus_packet_in = LLONG_MAX;
+
+ memset(&flow, 0, sizeof flow);
+ ofproto_delete_flow(fo->ofproto, &flow, OFPFW_ALL, FAIL_OPEN_PRIORITY);
+ }
+}
+
+void
+fail_open_wait(struct fail_open *fo)
+{
+ if (fo->next_bogus_packet_in != LLONG_MAX) {
+ poll_timer_wait(fo->next_bogus_packet_in - time_msec());
+ }
+}
+
+void
+fail_open_flushed(struct fail_open *fo)
+{
+ int disconn_secs = rconn_failure_duration(fo->controller);
+ bool open = disconn_secs >= fo->trigger_duration;
+ if (open) {
+ union ofp_action action;
+ flow_t flow;
+
+ /* Set up a flow that matches every packet and directs them to
+ * OFPP_NORMAL. */
+ memset(&action, 0, sizeof action);
+ action.type = htons(OFPAT_OUTPUT);
+ action.output.len = htons(sizeof action);
+ action.output.port = htons(OFPP_NORMAL);
+ memset(&flow, 0, sizeof flow);
+ ofproto_add_flow(fo->ofproto, &flow, OFPFW_ALL, FAIL_OPEN_PRIORITY,
+ &action, 1, 0);
+ }
+}
+
+static void
+fail_open_status_cb(struct status_reply *sr, void *fo_)
+{
+ struct fail_open *fo = fo_;
+ int cur_duration = rconn_failure_duration(fo->controller);
+
+ status_reply_put(sr, "trigger-duration=%d", fo->trigger_duration);
+ status_reply_put(sr, "current-duration=%d", cur_duration);
+ status_reply_put(sr, "triggered=%s",
+ cur_duration >= fo->trigger_duration ? "true" : "false");
+}
+
+struct fail_open *
+fail_open_create(struct ofproto *ofproto,
+ int trigger_duration, struct switch_status *switch_status,
+ struct rconn *controller)
+{
+ struct fail_open *fo = xmalloc(sizeof *fo);
+ fo->ofproto = ofproto;
+ fo->controller = controller;
+ fo->trigger_duration = trigger_duration;
+ fo->last_disconn_secs = 0;
+ fo->ss_cat = switch_status_register(switch_status, "fail-open",
+ fail_open_status_cb, fo);
+ fo->next_bogus_packet_in = LLONG_MAX;
+ fo->bogus_packet_counter = rconn_packet_counter_create();
+ return fo;
+}
+
+void
+fail_open_set_trigger_duration(struct fail_open *fo, int trigger_duration)
+{
+ fo->trigger_duration = trigger_duration;
+}
+
+void
+fail_open_destroy(struct fail_open *fo)
+{
+ if (fo) {
+ /* We don't own fo->controller. */
+ switch_status_unregister(fo->ss_cat);
+ rconn_packet_counter_destroy(fo->bogus_packet_counter);
+ free(fo);
+ }
+}
--- /dev/null
+/*
+ * Copyright (c) 2008, 2009 Nicira Networks.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef FAIL_OPEN_H
+#define FAIL_OPEN_H 1
+
+#include <stdbool.h>
+#include <stdint.h>
+#include "flow.h"
+
+struct fail_open;
+struct ofproto;
+struct rconn;
+struct switch_status;
+
+/* Priority of the rule added by the fail-open subsystem when a switch enters
+ * fail-open mode. This priority value uniquely identifies a fail-open flow
+ * (OpenFlow priorities max out at 65535 and nothing else in Open vSwitch
+ * creates flows with this priority). */
+#define FAIL_OPEN_PRIORITY 70000
+
+struct fail_open *fail_open_create(struct ofproto *, int trigger_duration,
+ struct switch_status *,
+ struct rconn *controller);
+void fail_open_set_trigger_duration(struct fail_open *, int trigger_duration);
+void fail_open_destroy(struct fail_open *);
+void fail_open_wait(struct fail_open *);
+bool fail_open_is_active(const struct fail_open *);
+void fail_open_run(struct fail_open *);
+void fail_open_maybe_recover(struct fail_open *);
+void fail_open_flushed(struct fail_open *);
+
+#endif /* fail-open.h */
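The comment on FAIL_OPEN_PRIORITY relies on OpenFlow carrying flow priorities in a 16-bit field, so that no controller-supplied priority can ever equal 70000. The sketch below shows one way to check that assumption at compile time. It assumes a C11 compiler for _Static_assert, and is_fail_open_priority() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FAIL_OPEN_PRIORITY 70000   /* Copied from fail-open.h. */

/* Fails to compile if the fail-open priority would fit in OpenFlow's
 * 16-bit priority field. */
_Static_assert(FAIL_OPEN_PRIORITY > UINT16_MAX,
               "fail-open priority must be unreachable via OpenFlow");

/* Hypothetical helper: true if 'priority' identifies the fail-open flow.
 * Any priority widened from a 16-bit wire value is at most 65535, so it
 * can never match. */
static bool
is_fail_open_priority(unsigned int priority)
{
    return priority == FAIL_OPEN_PRIORITY;
}
```

The same reasoning requires the type that stores priorities internally to be wider than 16 bits. For instance, struct ib_rule in in-band.c uses unsigned int.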
--- /dev/null
+/*
+ * Copyright (c) 2008, 2009 Nicira Networks.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <config.h>
+#include "in-band.h"
+#include <arpa/inet.h>
+#include <errno.h>
+#include <inttypes.h>
+#include <net/if.h>
+#include <string.h>
+#include <stdlib.h>
+#include "dhcp.h"
+#include "dpif.h"
+#include "flow.h"
+#include "mac-learning.h"
+#include "netdev.h"
+#include "odp-util.h"
+#include "ofp-print.h"
+#include "ofproto.h"
+#include "ofpbuf.h"
+#include "openflow/openflow.h"
+#include "openvswitch/datapath-protocol.h"
+#include "packets.h"
+#include "poll-loop.h"
+#include "rconn.h"
+#include "status.h"
+#include "timeval.h"
+#include "vconn.h"
+
+#define THIS_MODULE VLM_in_band
+#include "vlog.h"
+
+/* In-band control allows a single network to be used for OpenFlow
+ * traffic and other data traffic. Refer to ovs-vswitchd.conf(5) and
+ * secchan(8) for a description of configuring in-band control.
+ *
+ * This comment is an attempt to describe how in-band control works at a
+ * wire- and implementation-level. Correctly implementing in-band
+ * control has proven difficult due to its many subtleties, and has thus
+ * gone through many iterations. Please read through and understand the
+ * reasoning behind the chosen rules before making modifications.
+ *
+ * In Open vSwitch, in-band control is implemented as "hidden" flows (in
+ * that they are not visible through OpenFlow) at a higher priority than
+ * any wildcarded flow that the controller can set up. This is done
+ * so that the controller cannot interfere with them and possibly break
+ * connectivity with its switches. It is possible to see all flows,
+ * including in-band ones, with the ovs-appctl "bridge/dump-flows"
+ * command.
+ *
+ * The following rules are always enabled with the "normal" action by a
+ * switch with in-band control:
+ *
+ * a. DHCP requests sent from the local port.
+ * b. ARP replies to the local port's MAC address.
+ * c. ARP requests from the local port's MAC address.
+ * d. ARP replies to the remote side's MAC address. Note that the
+ * remote side is either the controller or the gateway to reach
+ * the controller.
+ * e. ARP requests from the remote side's MAC address. Note that
+ * like (d), the MAC is either for the controller or gateway.
+ * f. ARP replies containing the controller's IP address as a target.
+ * g. ARP requests containing the controller's IP address as a source.
+ * h. OpenFlow (6633/tcp) traffic to the controller's IP.
+ * i. OpenFlow (6633/tcp) traffic from the controller's IP.
+ *
+ * The goal of these rules is to be as narrow as possible to allow a
+ * switch to join a network and be able to communicate with a
+ * controller. As mentioned earlier, these rules have higher priority
+ * than the controller's rules, so if they are too broad, they may
+ * prevent the controller from implementing its policy. As such,
+ * in-band actively monitors some aspects of flow and packet processing
+ * so that the rules can be made more precise.
+ *
+ * In-band control monitors attempts to add flows into the datapath that
+ * could interfere with its duties. The datapath only allows exact
+ * match entries, so in-band control is able to be very precise about
+ * the flows it prevents. Flows that miss in the datapath are sent to
+ * userspace to be processed, so preventing these flows from being
+ * cached in the "fast path" does not affect correctness. The only type
+ * of flow that is currently prevented is one that would prevent DHCP
+ * replies from being seen by the local port. For example, a rule that
+ * forwarded all DHCP traffic to the controller would not be allowed,
+ * but one that forwarded to all ports (including the local port) would.
+ *
+ * As mentioned earlier, packets that miss in the datapath are sent to
+ * the userspace for processing. The userspace has its own flow table,
+ * the "classifier", so in-band checks whether any special processing
+ * is needed before the classifier is consulted. If a packet is a DHCP
+ * response to a request from the local port, the packet is forwarded to
+ * the local port, regardless of the flow table. Note that this requires
+ * L7 processing of DHCP replies to determine whether the 'chaddr' field
+ * matches the MAC address of the local port.
+ *
+ * It is interesting to note that for an L3-based in-band control
+ * mechanism, the majority of rules are devoted to ARP traffic. At first
+ * glance, some of these rules appear redundant. However, each serves an
+ * important role. First, in order to determine the MAC address of the
+ * remote side (controller or gateway) for other ARP rules, we must allow
+ * ARP traffic for our local port with rules (b) and (c). If we are
+ * between a switch and its connection to the controller, we have to
+ * allow the other switch's ARP traffic to through. This is done with
+ * rules (d) and (e), since we do not know the addresses of the other
+ * switches a priori, but do know the controller's or gateway's. Finally,
+ * if the controller is running in a local guest VM that is not reached
+ * through the local port, the switch that is connected to the VM must
+ * allow ARP traffic based on the controller's IP address, since it will
+ * not know the MAC address of the local port that is sending the traffic
+ * or the MAC address of the controller in the guest VM.
+ *
+ * With a few notable exceptions below, in-band should work in most
+ * network setups. The following are considered "supported" in the
+ * current implementation:
+ *
+ * - Locally Connected. The switch and controller are on the same
+ * subnet. This uses rules (a), (b), (c), (h), and (i).
+ *
+ * - Reached through Gateway. The switch and controller are on
+ * different subnets and must go through a gateway. This uses
+ * rules (a), (b), (c), (h), and (i).
+ *
+ * - Between Switch and Controller. This switch is between another
+ * switch and the controller, and we want to allow the other
+ * switch's traffic through. This uses rules (d), (e), (h), and
+ * (i). It uses (b) and (c) indirectly in order to know the MAC
+ * address for rules (d) and (e). Note that DHCP for the other
+ * switch will not work unless the controller explicitly lets this
+ * switch pass the traffic.
+ *
+ * - Between Switch and Gateway. This switch is between another
+ * switch and the gateway, and we want to allow the other switch's
+ * traffic through. This uses the same rules and logic as the
+ * "Between Switch and Controller" configuration described earlier.
+ *
+ * - Controller on Local VM. The controller is a guest VM on the
+ * system running in-band control. This uses rules (a), (b), (c),
+ * (h), and (i).
+ *
+ * - Controller on Local VM with Different Networks. The controller
+ * is a guest VM on the system running in-band control, but the
+ * local port is not used to connect to the controller. For
+ * example, an IP address is configured on eth0 of the switch. The
+ * controller's VM is connected through eth1 of the switch, but an
+ * IP address has not been configured for that port on the switch.
+ * As such, the switch will use eth0 to connect to the controller,
+ * and eth1's rules about the local port will not work. In the
+ * example, the switch attached to eth0 would use rules (a), (b),
+ * (c), (h), and (i) on eth0. The switch attached to eth1 would use
+ * rules (f), (g), (h), and (i).
+ *
+ * The following are explicitly *not* supported by in-band control:
+ *
+ * - Specify Controller by Name. Currently, the controller must be
+ * identified by IP address. A naive approach would be to permit
+ * all DNS traffic. Unfortunately, this would prevent the
+ * controller from defining any policy over DNS. Since switches
+ * that are located behind us need to connect to the controller,
+ * in-band cannot simply add a rule that allows DNS traffic from
+ * the local port. The "correct" way to support this is to parse
+ * DNS requests to allow all traffic related to a request for the
+ * controller's name through. Due to the potential security
+ * problems and the amount of processing required, we decided to
+ * hold off for the time being.
+ *
+ * - Multiple Controllers. There is nothing intrinsic in the
+ * high-level design that prevents using multiple (known)
+ * controllers; however, the current implementation's data
+ * structures assume only one.
+ *
+ * - Differing Controllers for Switches. All switches must know
+ * the L3 addresses for all the controllers that other switches
+ * may use, since rules need to be set up to allow traffic related
+ * to those controllers through. See rules (f), (g), (h), and (i).
+ *
+ * - Differing Routes for Switches. In order for the switch to
+ * allow other switches to connect to a controller through a
+ * gateway, it allows the gateway's traffic through with rules (d)
+ * and (e). If the routes to the controller differ for the two
+ * switches, we will not know the MAC address of the alternate
+ * gateway.
+ */
+
+#define IB_BASE_PRIORITY 18181800
+
+enum {
+ IBR_FROM_LOCAL_DHCP, /* (a) From local port, DHCP. */
+ IBR_TO_LOCAL_ARP, /* (b) To local port, ARP. */
+ IBR_FROM_LOCAL_ARP, /* (c) From local port, ARP. */
+ IBR_TO_REMOTE_ARP, /* (d) To remote MAC, ARP. */
+ IBR_FROM_REMOTE_ARP, /* (e) From remote MAC, ARP. */
+ IBR_TO_CTL_ARP, /* (f) To controller IP, ARP. */
+ IBR_FROM_CTL_ARP, /* (g) From controller IP, ARP. */
+ IBR_TO_CTL_OFP, /* (h) To controller, OpenFlow port. */
+ IBR_FROM_CTL_OFP, /* (i) From controller, OpenFlow port. */
+#if OFP_TCP_PORT != OFP_SSL_PORT
+#error Need to support separate TCP and SSL flows.
+#endif
+ N_IB_RULES
+};
+
+struct ib_rule {
+ bool installed;
+ flow_t flow;
+ uint32_t wildcards;
+ unsigned int priority;
+};
+
+struct in_band {
+ struct ofproto *ofproto;
+ struct rconn *controller;
+ struct status_category *ss_cat;
+
+ /* Keep track of local port's information. */
+ uint8_t local_mac[ETH_ADDR_LEN]; /* Current MAC. */
+ struct netdev *local_netdev; /* Local port's network device. */
+ time_t next_local_refresh;
+
+ /* Keep track of controller and next hop's information. */
+ uint32_t controller_ip; /* Controller IP, 0 if unknown. */
+ uint8_t remote_mac[ETH_ADDR_LEN]; /* Remote MAC. */
+ struct netdev *remote_netdev;
+ uint8_t last_remote_mac[ETH_ADDR_LEN]; /* Previous remote MAC. */
+ time_t next_remote_refresh;
+
+ /* Rules that we set up. */
+ struct ib_rule rules[N_IB_RULES];
+};
+
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(60, 60);
+
+static const uint8_t *
+get_remote_mac(struct in_band *ib)
+{
+ int retval;
+ bool have_mac;
+ struct in_addr c_in4; /* Controller's IP address. */
+ struct in_addr r_in4; /* Next hop IP address. */
+ char *next_hop_dev;
+ time_t now = time_now();
+
+ if (now >= ib->next_remote_refresh) {
+ /* Find the next-hop IP address. */
+ c_in4.s_addr = ib->controller_ip;
+ memset(ib->remote_mac, 0, sizeof ib->remote_mac);
+ retval = netdev_get_next_hop(ib->local_netdev,
+ &c_in4, &r_in4, &next_hop_dev);
+ if (retval) {
+ VLOG_WARN_RL(&rl, "cannot find route for controller ("IP_FMT"): %s",
+ IP_ARGS(&ib->controller_ip), strerror(retval));
+ ib->next_remote_refresh = now + 1;
+ return NULL;
+ }
+ if (!r_in4.s_addr) {
+ r_in4.s_addr = c_in4.s_addr;
+ }
+
+ /* Open the next hop's network device, unless it is already open. */
+ if (!ib->remote_netdev
+ || strcmp(netdev_get_name(ib->remote_netdev), next_hop_dev))
+ {
+ netdev_close(ib->remote_netdev);
+ retval = netdev_open(next_hop_dev, NETDEV_ETH_TYPE_NONE,
+ &ib->remote_netdev);
+ if (retval) {
+ VLOG_WARN_RL(&rl, "cannot open netdev %s (next hop "
+ "to controller "IP_FMT"): %s",
+ next_hop_dev, IP_ARGS(&ib->controller_ip),
+ strerror(retval));
+ free(next_hop_dev);
+ ib->next_remote_refresh = now + 1;
+ return NULL;
+ }
+ }
+
+ /* Look up the MAC address of the next-hop IP address. */
+ retval = netdev_arp_lookup(ib->remote_netdev, r_in4.s_addr,
+ ib->remote_mac);
+ if (retval) {
+ VLOG_DBG_RL(&rl, "cannot look up remote MAC address ("IP_FMT"): %s",
+ IP_ARGS(&r_in4.s_addr), strerror(retval));
+ }
+ have_mac = !eth_addr_is_zero(ib->remote_mac);
+ free(next_hop_dev);
+ if (have_mac
+ && !eth_addr_equals(ib->last_remote_mac, ib->remote_mac)) {
+ VLOG_DBG("remote MAC address changed from "ETH_ADDR_FMT" to "
+ ETH_ADDR_FMT,
+ ETH_ADDR_ARGS(ib->last_remote_mac),
+ ETH_ADDR_ARGS(ib->remote_mac));
+ memcpy(ib->last_remote_mac, ib->remote_mac, ETH_ADDR_LEN);
+ }
+
+ /* Schedule next refresh.
+ *
+ * If we have an IP address but not a MAC address, then refresh
+ * quickly, since we probably will get a MAC address soon (via ARP).
+ * Otherwise, we can afford to wait a little while. */
+ ib->next_remote_refresh
+ = now + (!ib->controller_ip || have_mac ? 10 : 1);
+ }
+
+ return !eth_addr_is_zero(ib->remote_mac) ? ib->remote_mac : NULL;
+}
+
+static const uint8_t *
+get_local_mac(struct in_band *ib)
+{
+ time_t now = time_now();
+ if (now >= ib->next_local_refresh) {
+ uint8_t ea[ETH_ADDR_LEN];
+ if (ib->local_netdev && !netdev_get_etheraddr(ib->local_netdev, ea)) {
+ memcpy(ib->local_mac, ea, ETH_ADDR_LEN);
+ }
+ ib->next_local_refresh = now + 1;
+ }
+ return !eth_addr_is_zero(ib->local_mac) ? ib->local_mac : NULL;
+}
+
+static void
+in_band_status_cb(struct status_reply *sr, void *in_band_)
+{
+ struct in_band *in_band = in_band_;
+
+ if (!eth_addr_is_zero(in_band->local_mac)) {
+ status_reply_put(sr, "local-mac="ETH_ADDR_FMT,
+ ETH_ADDR_ARGS(in_band->local_mac));
+ }
+
+ if (!eth_addr_is_zero(in_band->remote_mac)) {
+ status_reply_put(sr, "remote-mac="ETH_ADDR_FMT,
+ ETH_ADDR_ARGS(in_band->remote_mac));
+ }
+}
+
+static void
+drop_flow(struct in_band *in_band, int rule_idx)
+{
+ struct ib_rule *rule = &in_band->rules[rule_idx];
+
+ if (rule->installed) {
+ rule->installed = false;
+ ofproto_delete_flow(in_band->ofproto, &rule->flow, rule->wildcards,
+ rule->priority);
+ }
+}
+
+/* out_port and fixed_fields are assumed never to change. */
+static void
+set_up_flow(struct in_band *in_band, int rule_idx, const flow_t *flow,
+ uint32_t fixed_fields, uint16_t out_port)
+{
+ struct ib_rule *rule = &in_band->rules[rule_idx];
+
+ if (!rule->installed || memcmp(flow, &rule->flow, sizeof *flow)) {
+ union ofp_action action;
+
+ drop_flow(in_band, rule_idx);
+
+ rule->installed = true;
+ rule->flow = *flow;
+ rule->wildcards = OFPFW_ALL & ~fixed_fields;
+ rule->priority = IB_BASE_PRIORITY + (N_IB_RULES - rule_idx);
+
+ action.type = htons(OFPAT_OUTPUT);
+ action.output.len = htons(sizeof action);
+ action.output.port = htons(out_port);
+ action.output.max_len = htons(0);
+ ofproto_add_flow(in_band->ofproto, &rule->flow, rule->wildcards,
+ rule->priority, &action, 1, 0);
+ }
+}
+
+/* Returns true if 'packet' should be sent to the local port regardless
+ * of the flow table. */
+bool
+in_band_msg_in_hook(struct in_band *in_band, const flow_t *flow,
+ const struct ofpbuf *packet)
+{
+ if (!in_band) {
+ return false;
+ }
+
+ /* Regardless of how the flow table is configured, we want to be
+ * able to see replies to our DHCP requests. */
+ if (flow->dl_type == htons(ETH_TYPE_IP)
+ && flow->nw_proto == IP_TYPE_UDP
+ && flow->tp_src == htons(DHCP_SERVER_PORT)
+ && flow->tp_dst == htons(DHCP_CLIENT_PORT)
+ && packet->l7) {
+ struct dhcp_header *dhcp;
+ const uint8_t *local_mac;
+
+ dhcp = ofpbuf_at(packet, (char *)packet->l7 - (char *)packet->data,
+ sizeof *dhcp);
+ if (!dhcp) {
+ return false;
+ }
+
+ local_mac = get_local_mac(in_band);
+ if (local_mac && eth_addr_equals(dhcp->chaddr, local_mac)) {
+ return true;
+ }
+ }
+
+ return false;
+}
+
+/* Returns true if the rule that would match 'flow' with 'actions' is
+ * allowed to be set up in the datapath. */
+bool
+in_band_rule_check(struct in_band *in_band, const flow_t *flow,
+ const struct odp_actions *actions)
+{
+ if (!in_band) {
+ return true;
+ }
+
+ /* Don't allow flows that would prevent DHCP replies from being seen
+ * by the local port. */
+ if (flow->dl_type == htons(ETH_TYPE_IP)
+ && flow->nw_proto == IP_TYPE_UDP
+ && flow->tp_src == htons(DHCP_SERVER_PORT)
+ && flow->tp_dst == htons(DHCP_CLIENT_PORT)) {
+ int i;
+
+ for (i = 0; i < actions->n_actions; i++) {
+ if (actions->actions[i].output.type == ODPAT_OUTPUT
+ && actions->actions[i].output.port == ODPP_LOCAL) {
+ return true;
+ }
+ }
+ return false;
+ }
+
+ return true;
+}
+
+void
+in_band_run(struct in_band *in_band)
+{
+ time_t now = time_now();
+ uint32_t controller_ip;
+ const uint8_t *remote_mac;
+ const uint8_t *local_mac;
+ flow_t flow;
+
+ if (now < in_band->next_remote_refresh
+ && now < in_band->next_local_refresh) {
+ return;
+ }
+
+ controller_ip = rconn_get_remote_ip(in_band->controller);
+ if (in_band->controller_ip && controller_ip != in_band->controller_ip) {
+ VLOG_DBG("controller IP address changed from "IP_FMT" to "IP_FMT,
+ IP_ARGS(&in_band->controller_ip),
+ IP_ARGS(&controller_ip));
+ }
+ in_band->controller_ip = controller_ip;
+
+ remote_mac = get_remote_mac(in_band);
+ local_mac = get_local_mac(in_band);
+
+ if (local_mac) {
+ /* Allow DHCP requests to be sent from the local port. */
+ memset(&flow, 0, sizeof flow);
+ flow.in_port = ODPP_LOCAL;
+ flow.dl_type = htons(ETH_TYPE_IP);
+ memcpy(flow.dl_src, local_mac, ETH_ADDR_LEN);
+ flow.nw_proto = IP_TYPE_UDP;
+ flow.tp_src = htons(DHCP_CLIENT_PORT);
+ flow.tp_dst = htons(DHCP_SERVER_PORT);
+ set_up_flow(in_band, IBR_FROM_LOCAL_DHCP, &flow,
+ (OFPFW_IN_PORT | OFPFW_DL_TYPE | OFPFW_DL_SRC
+ | OFPFW_NW_PROTO | OFPFW_TP_SRC | OFPFW_TP_DST),
+ OFPP_NORMAL);
+
+ /* Allow the connection's interface to receive directed ARP traffic. */
+ memset(&flow, 0, sizeof flow);
+ flow.dl_type = htons(ETH_TYPE_ARP);
+ memcpy(flow.dl_dst, local_mac, ETH_ADDR_LEN);
+ flow.nw_proto = ARP_OP_REPLY;
+ set_up_flow(in_band, IBR_TO_LOCAL_ARP, &flow,
+ (OFPFW_DL_TYPE | OFPFW_DL_DST | OFPFW_NW_PROTO),
+ OFPP_NORMAL);
+
+ /* Allow the connection's interface to be the source of ARP traffic. */
+ memset(&flow, 0, sizeof flow);
+ flow.dl_type = htons(ETH_TYPE_ARP);
+ memcpy(flow.dl_src, local_mac, ETH_ADDR_LEN);
+ flow.nw_proto = ARP_OP_REQUEST;
+ set_up_flow(in_band, IBR_FROM_LOCAL_ARP, &flow,
+ (OFPFW_DL_TYPE | OFPFW_DL_SRC | OFPFW_NW_PROTO),
+ OFPP_NORMAL);
+ } else {
+ drop_flow(in_band, IBR_FROM_LOCAL_DHCP);
+ drop_flow(in_band, IBR_TO_LOCAL_ARP);
+ drop_flow(in_band, IBR_FROM_LOCAL_ARP);
+ }
+
+ if (remote_mac) {
+ /* Allow ARP replies to the remote side's MAC. */
+ memset(&flow, 0, sizeof flow);
+ flow.dl_type = htons(ETH_TYPE_ARP);
+ memcpy(flow.dl_dst, remote_mac, ETH_ADDR_LEN);
+ flow.nw_proto = ARP_OP_REPLY;
+ set_up_flow(in_band, IBR_TO_REMOTE_ARP, &flow,
+ (OFPFW_DL_TYPE | OFPFW_DL_DST | OFPFW_NW_PROTO),
+ OFPP_NORMAL);
+
+ /* Allow ARP requests from the remote side's MAC. */
+ memset(&flow, 0, sizeof flow);
+ flow.dl_type = htons(ETH_TYPE_ARP);
+ memcpy(flow.dl_src, remote_mac, ETH_ADDR_LEN);
+ flow.nw_proto = ARP_OP_REQUEST;
+ set_up_flow(in_band, IBR_FROM_REMOTE_ARP, &flow,
+ (OFPFW_DL_TYPE | OFPFW_DL_SRC | OFPFW_NW_PROTO),
+ OFPP_NORMAL);
+ } else {
+ drop_flow(in_band, IBR_TO_REMOTE_ARP);
+ drop_flow(in_band, IBR_FROM_REMOTE_ARP);
+ }
+
+ if (controller_ip) {
+ /* Allow ARP replies to the controller's IP. */
+ memset(&flow, 0, sizeof flow);
+ flow.dl_type = htons(ETH_TYPE_ARP);
+ flow.nw_proto = ARP_OP_REPLY;
+ flow.nw_dst = controller_ip;
+ set_up_flow(in_band, IBR_TO_CTL_ARP, &flow,
+ (OFPFW_DL_TYPE | OFPFW_NW_PROTO | OFPFW_NW_DST_MASK),
+ OFPP_NORMAL);
+
+ /* Allow ARP requests from the controller's IP. */
+ memset(&flow, 0, sizeof flow);
+ flow.dl_type = htons(ETH_TYPE_ARP);
+ flow.nw_proto = ARP_OP_REQUEST;
+ flow.nw_src = controller_ip;
+ set_up_flow(in_band, IBR_FROM_CTL_ARP, &flow,
+ (OFPFW_DL_TYPE | OFPFW_NW_PROTO | OFPFW_NW_SRC_MASK),
+ OFPP_NORMAL);
+
+ /* OpenFlow traffic to or from the controller.
+ *
+ * (A given field's value is completely ignored if it is wildcarded,
+ * which is why we can get away with using a single 'flow' in each
+ * case here.) */
+ memset(&flow, 0, sizeof flow);
+ flow.dl_type = htons(ETH_TYPE_IP);
+ flow.nw_proto = IP_TYPE_TCP;
+ flow.nw_src = controller_ip;
+ flow.nw_dst = controller_ip;
+ flow.tp_src = htons(OFP_TCP_PORT);
+ flow.tp_dst = htons(OFP_TCP_PORT);
+ set_up_flow(in_band, IBR_TO_CTL_OFP, &flow,
+ (OFPFW_DL_TYPE | OFPFW_NW_PROTO | OFPFW_NW_DST_MASK
+ | OFPFW_TP_DST), OFPP_NORMAL);
+ set_up_flow(in_band, IBR_FROM_CTL_OFP, &flow,
+ (OFPFW_DL_TYPE | OFPFW_NW_PROTO | OFPFW_NW_SRC_MASK
+ | OFPFW_TP_SRC), OFPP_NORMAL);
+ } else {
+ drop_flow(in_band, IBR_TO_CTL_ARP);
+ drop_flow(in_band, IBR_FROM_CTL_ARP);
+ drop_flow(in_band, IBR_TO_CTL_OFP);
+ drop_flow(in_band, IBR_FROM_CTL_OFP);
+ }
+}
+
+void
+in_band_wait(struct in_band *in_band)
+{
+ time_t now = time_now();
+ time_t wakeup
+ = MIN(in_band->next_remote_refresh, in_band->next_local_refresh);
+ if (wakeup > now) {
+ poll_timer_wait((wakeup - now) * 1000);
+ } else {
+ poll_immediate_wake();
+ }
+}
+
+void
+in_band_flushed(struct in_band *in_band)
+{
+ int i;
+
+ for (i = 0; i < N_IB_RULES; i++) {
+ in_band->rules[i].installed = false;
+ }
+}
+
+int
+in_band_create(struct ofproto *ofproto, struct dpif *dpif,
+ struct switch_status *ss, struct rconn *controller,
+ struct in_band **in_bandp)
+{
+ struct in_band *in_band;
+ char local_name[IF_NAMESIZE];
+ struct netdev *local_netdev;
+ int error;
+
+ error = dpif_port_get_name(dpif, ODPP_LOCAL,
+ local_name, sizeof local_name);
+ if (error) {
+ VLOG_ERR("failed to initialize in-band control: cannot get name "
+ "of datapath local port (%s)", strerror(error));
+ return error;
+ }
+
+ error = netdev_open(local_name, NETDEV_ETH_TYPE_NONE, &local_netdev);
+ if (error) {
+ VLOG_ERR("failed to initialize in-band control: cannot open "
+ "datapath local port %s (%s)", local_name, strerror(error));
+ return error;
+ }
+
+ in_band = xcalloc(1, sizeof *in_band);
+ in_band->ofproto = ofproto;
+ in_band->controller = controller;
+ in_band->ss_cat = switch_status_register(ss, "in-band",
+ in_band_status_cb, in_band);
+ in_band->local_netdev = local_netdev;
+ in_band->next_local_refresh = TIME_MIN;
+ in_band->remote_netdev = NULL;
+ in_band->next_remote_refresh = TIME_MIN;
+
+ *in_bandp = in_band;
+
+ return 0;
+}
+
+void
+in_band_destroy(struct in_band *in_band)
+{
+ if (in_band) {
+ switch_status_unregister(in_band->ss_cat);
+ netdev_close(in_band->local_netdev);
+ netdev_close(in_band->remote_netdev);
+ /* We don't own the rconn. */
+ free(in_band);
+ }
+}
+
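set_up_flow() above assigns each hidden in-band rule the priority IB_BASE_PRIORITY + (N_IB_RULES - rule_idx). As a result, rule (a) gets the highest priority and rule (i) the lowest. All of them lie far above both OpenFlow's 16-bit priority range and FAIL_OPEN_PRIORITY.

The sketch below reproduces that formula with a copy of the rule enum so that the resulting values can be printed. ib_rule_priority() and ib_print_priorities() are hypothetical names, not part of in-band.c.

```c
#include <assert.h>
#include <stdio.h>

#define IB_BASE_PRIORITY 18181800   /* Copied from in-band.c. */

/* Same order as the in-band rule enum: rules (a) through (i). */
enum {
    IBR_FROM_LOCAL_DHCP, IBR_TO_LOCAL_ARP, IBR_FROM_LOCAL_ARP,
    IBR_TO_REMOTE_ARP, IBR_FROM_REMOTE_ARP, IBR_TO_CTL_ARP,
    IBR_FROM_CTL_ARP, IBR_TO_CTL_OFP, IBR_FROM_CTL_OFP,
    N_IB_RULES
};

/* Same formula as set_up_flow(): lower rule indexes get higher priority. */
static unsigned int
ib_rule_priority(int rule_idx)
{
    assert(rule_idx >= 0 && rule_idx < N_IB_RULES);
    return IB_BASE_PRIORITY + (N_IB_RULES - rule_idx);
}

/* Prints one line per rule, from "(a) ..." through "(i) ...". */
static void
ib_print_priorities(void)
{
    int i;

    for (i = 0; i < N_IB_RULES; i++) {
        printf("(%c) %u\n", 'a' + i, ib_rule_priority(i));
    }
}
```

With nine rules, ib_print_priorities() prints "(a) 18181809" down to "(i) 18181801".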
--- /dev/null
+/*
+ * Copyright (c) 2008, 2009 Nicira Networks.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef IN_BAND_H
+#define IN_BAND_H 1
+
+#include <stdbool.h>
+#include "flow.h"
+
+struct dpif;
+struct in_band;
+struct odp_actions;
+struct ofpbuf;
+struct ofproto;
+struct rconn;
+struct switch_status;
+
+int in_band_create(struct ofproto *, struct dpif *, struct switch_status *,
+ struct rconn *controller, struct in_band **);
+void in_band_destroy(struct in_band *);
+void in_band_run(struct in_band *);
+bool in_band_msg_in_hook(struct in_band *, const flow_t *,
+ const struct ofpbuf *packet);
+bool in_band_rule_check(struct in_band *, const flow_t *,
+ const struct odp_actions *);
+void in_band_wait(struct in_band *);
+void in_band_flushed(struct in_band *);
+
+#endif /* in-band.h */
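netflow_expire() in the netflow.c file that follows packs two composite values into 16-bit NetFlow v5 fields:

- With add_id_to_iface set, the low 7 bits of engine_id go into the top of each interface index, and the low 9 bits of the port number go into the bottom.
- For ICMP flows, the type and code go into the high and low bytes of dst_port.

The sketch below isolates that bit packing in host byte order; the original converts the result with htons() when storing it. nf_encode_iface() and nf_encode_icmp() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* add_id_to_iface encoding: engine_id bits 6..0 become bits 15..9 of the
 * interface index; port bits 8..0 become bits 8..0. */
static uint16_t
nf_encode_iface(uint8_t engine_id, uint16_t port)
{
    return (uint16_t) (((engine_id & 0x7f) << 9) | (port & 0x1ff));
}

/* ICMP encoding: type in the high byte, code in the low byte. */
static uint16_t
nf_encode_icmp(uint8_t type, uint8_t code)
{
    return (uint16_t) ((type << 8) | code);
}
```

Because of the 0x1ff mask, port numbers of 512 or more wrap around; for example, port 512 encodes the same as port 0. The combined index is therefore unambiguous only for ports below 512.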
--- /dev/null
+/*
+ * Copyright (c) 2008, 2009 Nicira Networks.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <config.h>
+#include "netflow.h"
+#include <arpa/inet.h>
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include "cfg.h"
+#include "flow.h"
+#include "ofpbuf.h"
+#include "ofproto.h"
+#include "packets.h"
+#include "socket-util.h"
+#include "svec.h"
+#include "timeval.h"
+#include "util.h"
+#include "xtoxll.h"
+
+#define THIS_MODULE VLM_netflow
+#include "vlog.h"
+
+#define NETFLOW_V5_VERSION 5
+
+static const int ACTIVE_TIMEOUT_DEFAULT = 600;
+
+/* Every NetFlow v5 message contains the header that follows. This is
+ * followed by up to thirty records, each of which describes a terminating
+ * flow. Several records may be batched into a single message, up to that
+ * limit.
+ */
+struct netflow_v5_header {
+ uint16_t version; /* NetFlow version is 5. */
+ uint16_t count; /* Number of records in this message. */
+ uint32_t sysuptime; /* System uptime in milliseconds. */
+ uint32_t unix_secs; /* Number of seconds since Unix epoch. */
+ uint32_t unix_nsecs; /* Number of residual nanoseconds
+ after epoch seconds. */
+ uint32_t flow_seq; /* Number of flows since sending
+ messages began. */
+ uint8_t engine_type; /* Engine type. */
+ uint8_t engine_id; /* Engine id. */
+ uint16_t sampling_interval; /* Set to zero. */
+};
+BUILD_ASSERT_DECL(sizeof(struct netflow_v5_header) == 24);
+
+/* A NetFlow v5 description of a terminating flow. It is preceded by a
+ * NetFlow v5 header.
+ */
+struct netflow_v5_record {
+ uint32_t src_addr; /* Source IP address. */
+ uint32_t dst_addr; /* Destination IP address. */
+ uint32_t nexthop; /* IP address of next hop. Set to 0. */
+ uint16_t input; /* Input interface index. */
+ uint16_t output; /* Output interface index. */
+ uint32_t packet_count; /* Number of packets. */
+ uint32_t byte_count; /* Number of bytes. */
+ uint32_t init_time; /* Value of sysuptime on first packet. */
+ uint32_t used_time; /* Value of sysuptime on last packet. */
+
+ /* The 'src_port' and 'dst_port' identify the source and destination
+ * port, respectively, for TCP and UDP. For ICMP, the high-order
+ * byte identifies the type and low-order byte identifies the code
+ * in the 'dst_port' field. */
+ uint16_t src_port;
+ uint16_t dst_port;
+
+ uint8_t pad1;
+ uint8_t tcp_flags; /* Union of seen TCP flags. */
+ uint8_t ip_proto; /* IP protocol. */
+ uint8_t ip_tos; /* IP TOS value. */
+ uint16_t src_as; /* Source AS ID. Set to 0. */
+ uint16_t dst_as; /* Destination AS ID. Set to 0. */
+ uint8_t src_mask; /* Source mask bits. Set to 0. */
+ uint8_t dst_mask; /* Destination mask bits. Set to 0. */
+ uint8_t pad[2];
+};
+BUILD_ASSERT_DECL(sizeof(struct netflow_v5_record) == 48);
+
+struct netflow {
+ uint8_t engine_type; /* Value of engine_type to use. */
+ uint8_t engine_id; /* Value of engine_id to use. */
+ long long int boot_time; /* Time when netflow_create() was called. */
+ int *fds; /* Sockets for NetFlow collectors. */
+ size_t n_fds; /* Number of NetFlow collectors. */
+ bool add_id_to_iface; /* Put the 7 least significant bits of
+ * 'engine_id' into the most significant
+ * bits of the interface fields. */
+ uint32_t netflow_cnt; /* Flow sequence number for NetFlow. */
+ struct ofpbuf packet; /* NetFlow packet being accumulated. */
+ long long int active_timeout; /* Timeout for flows that are still active. */
+ long long int reconfig_time; /* When we reconfigured the timeouts. */
+};
+
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
+
+static int
+open_collector(char *dst)
+{
+ char *save_ptr = NULL;
+ const char *host_name;
+ const char *port_string;
+ struct sockaddr_in sin;
+ int retval;
+ int fd;
+
+ /* Glibc 2.7 has a bug in strtok_r when compiling with optimization that
+ * can cause segfaults here:
+ * http://sources.redhat.com/bugzilla/show_bug.cgi?id=5614.
+ * Using "::" instead of the obvious ":" works around it. */
+ host_name = strtok_r(dst, "::", &save_ptr);
+ port_string = strtok_r(NULL, "::", &save_ptr);
+ if (!host_name) {
+ ovs_error(0, "%s: bad peer name format", dst);
+ return -EAFNOSUPPORT;
+ }
+ if (!port_string) {
+ ovs_error(0, "%s: bad port format", dst);
+ return -EAFNOSUPPORT;
+ }
+
+ memset(&sin, 0, sizeof sin);
+ sin.sin_family = AF_INET;
+ if (lookup_ip(host_name, &sin.sin_addr)) {
+ return -ENOENT;
+ }
+ sin.sin_port = htons(atoi(port_string));
+
+ fd = socket(AF_INET, SOCK_DGRAM, 0);
+ if (fd < 0) {
+ int error = errno;
+ VLOG_ERR("%s: socket: %s", dst, strerror(error));
+ return -error;
+ }
+
+ retval = set_nonblocking(fd);
+ if (retval) {
+ close(fd);
+ return -retval;
+ }
+
+ retval = connect(fd, (struct sockaddr *) &sin, sizeof sin);
+ if (retval < 0) {
+ int error = errno;
+ VLOG_ERR("%s: connect: %s", dst, strerror(error));
+ close(fd);
+ return -error;
+ }
+
+ return fd;
+}
+
+void
+netflow_expire(struct netflow *nf, struct netflow_flow *nf_flow,
+ struct ofexpired *expired)
+{
+ struct netflow_v5_header *nf_hdr;
+ struct netflow_v5_record *nf_rec;
+ struct timeval now;
+
+ nf_flow->last_expired += nf->active_timeout;
+
+ /* NetFlow only reports on IP packets and we should only report flows
+ * that actually have traffic. */
+ if (expired->flow.dl_type != htons(ETH_TYPE_IP)
+ || expired->packet_count - nf_flow->packet_count_off == 0) {
+ return;
+ }
+
+ time_timeval(&now);
+
+ if (!nf->packet.size) {
+ nf_hdr = ofpbuf_put_zeros(&nf->packet, sizeof *nf_hdr);
+ nf_hdr->version = htons(NETFLOW_V5_VERSION);
+ nf_hdr->count = htons(0);
+ nf_hdr->sysuptime = htonl(time_msec() - nf->boot_time);
+ nf_hdr->unix_secs = htonl(now.tv_sec);
+ nf_hdr->unix_nsecs = htonl(now.tv_usec * 1000);
+ nf_hdr->flow_seq = htonl(nf->netflow_cnt++);
+ nf_hdr->engine_type = nf->engine_type;
+ nf_hdr->engine_id = nf->engine_id;
+ nf_hdr->sampling_interval = htons(0);
+ }
+
+ nf_hdr = nf->packet.data;
+ nf_hdr->count = htons(ntohs(nf_hdr->count) + 1);
+
+ nf_rec = ofpbuf_put_zeros(&nf->packet, sizeof *nf_rec);
+ nf_rec->src_addr = expired->flow.nw_src;
+ nf_rec->dst_addr = expired->flow.nw_dst;
+ nf_rec->nexthop = htonl(0);
+ if (nf->add_id_to_iface) {
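+ /* The 16-bit interface index carries the low 7 bits of 'engine_id'
+ * above the low 9 bits of the OpenFlow port number. For example,
+ * engine_id 3 and port 5 give (3 << 9) | 5 = 0x0605. */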
+ uint16_t iface = (nf->engine_id & 0x7f) << 9;
+ nf_rec->input = htons(iface | (expired->flow.in_port & 0x1ff));
+ nf_rec->output = htons(iface | (nf_flow->output_iface & 0x1ff));
+ } else {
+ nf_rec->input = htons(expired->flow.in_port);
+ nf_rec->output = htons(nf_flow->output_iface);
+ }
+ nf_rec->packet_count = htonl(MIN(expired->packet_count -
+ nf_flow->packet_count_off, UINT32_MAX));
+ nf_rec->byte_count = htonl(MIN(expired->byte_count -
+ nf_flow->byte_count_off, UINT32_MAX));
+ nf_rec->init_time = htonl(nf_flow->created - nf->boot_time);
+ nf_rec->used_time = htonl(MAX(nf_flow->created, expired->used)
+ - nf->boot_time);
+ if (expired->flow.nw_proto == IP_TYPE_ICMP) {
+ /* In NetFlow, the ICMP type and code are concatenated and
+ * placed in the 'dst_port' field. */
+ uint8_t type = ntohs(expired->flow.tp_src);
+ uint8_t code = ntohs(expired->flow.tp_dst);
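+ /* For example, an echo request (type 8, code 0) is reported with a
+ * 'dst_port' of 0x0800. */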
+ nf_rec->src_port = htons(0);
+ nf_rec->dst_port = htons((type << 8) | code);
+ } else {
+ nf_rec->src_port = expired->flow.tp_src;
+ nf_rec->dst_port = expired->flow.tp_dst;
+ }
+ nf_rec->tcp_flags = nf_flow->tcp_flags;
+ nf_rec->ip_proto = expired->flow.nw_proto;
+ nf_rec->ip_tos = nf_flow->ip_tos;
+
+ /* Update flow tracking data. */
+ nf_flow->created = 0;
+ nf_flow->packet_count_off = expired->packet_count;
+ nf_flow->byte_count_off = expired->byte_count;
+ nf_flow->tcp_flags = 0;
+
+ /* NetFlow messages are limited to 30 records. */
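+ /* Assuming the standard NetFlow v5 layout of a 24-byte header and
+ * 48-byte records, a full message is 24 + 30 * 48 = 1464 bytes, which
+ * fits in the 1500-byte buffer allocated by netflow_create(). */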
+ if (ntohs(nf_hdr->count) >= 30) {
+ netflow_run(nf);
+ }
+}
+
+void
+netflow_run(struct netflow *nf)
+{
+ size_t i;
+
+ if (!nf->packet.size) {
+ return;
+ }
+
+ for (i = 0; i < nf->n_fds; i++) {
+ if (send(nf->fds[i], nf->packet.data, nf->packet.size, 0) == -1) {
+ VLOG_WARN_RL(&rl, "netflow message send failed: %s",
+ strerror(errno));
+ }
+ }
+ nf->packet.size = 0;
+}
+
+static void
+clear_collectors(struct netflow *nf)
+{
+ size_t i;
+
+ for (i = 0; i < nf->n_fds; i++) {
+ close(nf->fds[i]);
+ }
+ free(nf->fds);
+ nf->fds = NULL;
+ nf->n_fds = 0;
+}
+
+int
+netflow_set_options(struct netflow *nf,
+ const struct netflow_options *nf_options)
+{
+ struct svec collectors;
+ int error = 0;
+ size_t i;
+ long long int old_timeout;
+
+ nf->engine_type = nf_options->engine_type;
+ nf->engine_id = nf_options->engine_id;
+ nf->add_id_to_iface = nf_options->add_id_to_iface;
+
+ clear_collectors(nf);
+
+ svec_clone(&collectors, &nf_options->collectors);
+ svec_sort_unique(&collectors);
+
+ nf->fds = xmalloc(sizeof *nf->fds * collectors.n);
+ for (i = 0; i < collectors.n; i++) {
+ const char *name = collectors.names[i];
+ char *tmpname = xstrdup(name);
+ int fd = open_collector(tmpname);
+ free(tmpname);
+ if (fd >= 0) {
+ nf->fds[nf->n_fds++] = fd;
+ } else {
+ VLOG_WARN("couldn't open connection to collector %s (%s), "
+ "ignoring", name, strerror(-fd));
+ if (!error) {
+ error = -fd;
+ }
+ }
+ }
+
+ svec_destroy(&collectors);
+
+ old_timeout = nf->active_timeout;
+ if (nf_options->active_timeout != -1) {
+ nf->active_timeout = nf_options->active_timeout;
+ } else {
+ nf->active_timeout = ACTIVE_TIMEOUT_DEFAULT;
+ }
+ nf->active_timeout *= 1000;
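+ /* 'active_timeout' is now in milliseconds, to match time_msec(); for
+ * example, a configured 600 seconds becomes 600000 ms. */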
+ if (old_timeout != nf->active_timeout) {
+ nf->reconfig_time = time_msec();
+ }
+
+ return error;
+}
+
+struct netflow *
+netflow_create(void)
+{
+ struct netflow *nf = xmalloc(sizeof *nf);
+ nf->engine_type = 0;
+ nf->engine_id = 0;
+ nf->boot_time = time_msec();
+ nf->fds = NULL;
+ nf->n_fds = 0;
+ nf->add_id_to_iface = false;
+ nf->netflow_cnt = 0;
+ nf->active_timeout = 0;
+ nf->reconfig_time = 0;
+ ofpbuf_init(&nf->packet, 1500);
+ return nf;
+}
+
+void
+netflow_destroy(struct netflow *nf)
+{
+ if (nf) {
+ ofpbuf_uninit(&nf->packet);
+ clear_collectors(nf);
+ free(nf);
+ }
+}
+
+void
+netflow_flow_clear(struct netflow_flow *nf_flow)
+{
+ uint16_t output_iface = nf_flow->output_iface;
+
+ memset(nf_flow, 0, sizeof *nf_flow);
+ nf_flow->output_iface = output_iface;
+}
+
+void
+netflow_flow_update_time(struct netflow *nf, struct netflow_flow *nf_flow,
+ long long int used)
+{
+ if (!nf_flow->created) {
+ nf_flow->created = used;
+ }
+
+ if (!nf || !nf->active_timeout || !nf_flow->last_expired ||
+ nf->reconfig_time > nf_flow->last_expired) {
+ /* Keep the time updated to prevent a flood of expirations in
+ * the future. */
+ nf_flow->last_expired = time_msec();
+ }
+}
+
+void
+netflow_flow_update_flags(struct netflow_flow *nf_flow, uint8_t ip_tos,
+ uint8_t tcp_flags)
+{
+ nf_flow->ip_tos = ip_tos;
+ nf_flow->tcp_flags |= tcp_flags;
+}
+
+bool
+netflow_active_timeout_expired(struct netflow *nf, struct netflow_flow *nf_flow)
+{
+ if (nf->active_timeout) {
+ return time_msec() > nf_flow->last_expired + nf->active_timeout;
+ }
+
+ return false;
+}
--- /dev/null
+/*
+ * Copyright (c) 2008, 2009 Nicira Networks.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef NETFLOW_H
+#define NETFLOW_H 1
+
+#include "flow.h"
+#include "svec.h"
+
+struct ofexpired;
+
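+struct netflow_options {
+ struct svec collectors; /* "host:port" strings, one per collector. */
+ uint8_t engine_type;
+ uint8_t engine_id;
+ int active_timeout; /* In seconds; -1 for default, 0 disables. */
+ bool add_id_to_iface; /* Fold 'engine_id' into interface indexes? */
+};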
+
+enum netflow_output_ports {
+ NF_OUT_FLOOD = UINT16_MAX,
+ NF_OUT_MULTI = UINT16_MAX - 1,
+ NF_OUT_DROP = UINT16_MAX - 2
+};
+
+struct netflow_flow {
+ long long int last_expired; /* Time this flow last timed out. */
+ long long int created; /* Time of first use since last time out. */
+
+ uint64_t packet_count_off; /* Packet count at last time out. */
+ uint64_t byte_count_off; /* Byte count at last time out. */
+
+ uint16_t output_iface; /* Output interface index. */
+ uint8_t ip_tos; /* Last-seen IP type-of-service. */
+ uint8_t tcp_flags; /* Bitwise-OR of all TCP flags seen. */
+};
+
+struct netflow *netflow_create(void);
+void netflow_destroy(struct netflow *);
+int netflow_set_options(struct netflow *, const struct netflow_options *);
+void netflow_expire(struct netflow *, struct netflow_flow *,
+ struct ofexpired *);
+void netflow_run(struct netflow *);
+
+void netflow_flow_clear(struct netflow_flow *);
+void netflow_flow_update_time(struct netflow *, struct netflow_flow *,
+ long long int used);
+void netflow_flow_update_flags(struct netflow_flow *, uint8_t ip_tos,
+ uint8_t tcp_flags);
+bool netflow_active_timeout_expired(struct netflow *, struct netflow_flow *);
+
+#endif /* netflow.h */
--- /dev/null
+/*
+ * Copyright (c) 2009 Nicira Networks.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <config.h>
+#include "ofproto.h"
+#include <errno.h>
+#include <inttypes.h>
+#include <limits.h>
+#include <net/if.h>
+#include <netinet/in.h>
+#include <stdbool.h>
+#include <stdlib.h>
+#include <string.h>
+#include "classifier.h"
+#include "coverage.h"
+#include "discovery.h"
+#include "dpif.h"
+#include "dynamic-string.h"
+#include "executer.h"
+#include "fail-open.h"
+#include "in-band.h"
+#include "mac-learning.h"
+#include "netdev.h"
+#include "netflow.h"
+#include "odp-util.h"
+#include "ofp-print.h"
+#include "ofpbuf.h"
+#include "openflow/nicira-ext.h"
+#include "openflow/openflow.h"
+#include "openflow/openflow-mgmt.h"
+#include "openvswitch/datapath-protocol.h"
+#include "packets.h"
+#include "pinsched.h"
+#include "pktbuf.h"
+#include "poll-loop.h"
+#include "port-array.h"
+#include "rconn.h"
+#include "shash.h"
+#include "status.h"
+#include "stp.h"
+#include "svec.h"
+#include "tag.h"
+#include "timeval.h"
+#include "unixctl.h"
+#include "vconn.h"
+#include "vconn-ssl.h"
+#include "xtoxll.h"
+
+#define THIS_MODULE VLM_ofproto
+#include "vlog.h"
+
+enum {
+ DP_GROUP_FLOOD = 0,
+ DP_GROUP_ALL = 1
+};
+
+enum {
+ TABLEID_HASH = 0,
+ TABLEID_CLASSIFIER = 1
+};
+
+struct ofport {
+ struct netdev *netdev;
+ struct ofp_phy_port opp; /* In host byte order. */
+};
+
+static void ofport_free(struct ofport *);
+static void hton_ofp_phy_port(struct ofp_phy_port *);
+
+static int xlate_actions(const union ofp_action *in, size_t n_in,
+ const flow_t *flow, struct ofproto *ofproto,
+ const struct ofpbuf *packet,
+ struct odp_actions *out, tag_type *tags,
+ bool *may_set_up_flow, uint16_t *nf_output_iface);
+
+struct rule {
+ struct cls_rule cr;
+
+ uint16_t idle_timeout; /* In seconds from time of last use. */
+ uint16_t hard_timeout; /* In seconds from time of creation. */
+ long long int used; /* Last-used time (0 if never used). */
+ long long int created; /* Creation time. */
+ uint64_t packet_count; /* Number of packets received. */
+ uint64_t byte_count; /* Number of bytes received. */
+ uint64_t accounted_bytes; /* Number of bytes passed to account_cb. */
+ tag_type tags; /* Tags (set only by hooks). */
+ struct netflow_flow nf_flow; /* Per-flow NetFlow tracking data. */
+
+ /* If 'super' is non-NULL, this rule is a subrule, that is, it is an
+ * exact-match rule (having cr.wc.wildcards of 0) generated from the
+ * wildcard rule 'super'. In this case, 'list' is an element of the
+ * super-rule's list.
+ *
+ * If 'super' is NULL, this rule is a super-rule, and 'list' is the head of
+ * a list of subrules. A super-rule with no wildcards (where
+ * cr.wc.wildcards is 0) will never have any subrules. */
+ struct rule *super;
+ struct list list;
+
+ /* OpenFlow actions.
+ *
+ * A subrule has no actions (it uses the super-rule's actions). */
+ int n_actions;
+ union ofp_action *actions;
+
+ /* Datapath actions.
+ *
+ * A super-rule with wildcard fields never has ODP actions (since the
+ * datapath only supports exact-match flows). */
+ bool installed; /* Installed in datapath? */
+ bool may_install; /* True ordinarily; false if actions must
+ * be reassessed for every packet. */
+ int n_odp_actions;
+ union odp_action *odp_actions;
+};
+
+static inline bool
+rule_is_hidden(const struct rule *rule)
+{
+ /* Subrules are merely an implementation detail, so hide them from the
+ * controller. */
+ if (rule->super != NULL) {
+ return true;
+ }
+
+ /* Rules with priority higher than UINT16_MAX are set up by ofproto itself
+ * (e.g. by in-band control) and are intentionally hidden from the
+ * controller. */
+ if (rule->cr.priority > UINT16_MAX) {
+ return true;
+ }
+
+ return false;
+}
+
+static struct rule *rule_create(struct ofproto *, struct rule *super,
+ const union ofp_action *, size_t n_actions,
+ uint16_t idle_timeout, uint16_t hard_timeout);
+static void rule_free(struct rule *);
+static void rule_destroy(struct ofproto *, struct rule *);
+static struct rule *rule_from_cls_rule(const struct cls_rule *);
+static void rule_insert(struct ofproto *, struct rule *,
+ struct ofpbuf *packet, uint16_t in_port);
+static void rule_remove(struct ofproto *, struct rule *);
+static bool rule_make_actions(struct ofproto *, struct rule *,
+ const struct ofpbuf *packet);
+static void rule_install(struct ofproto *, struct rule *,
+ struct rule *displaced_rule);
+static void rule_uninstall(struct ofproto *, struct rule *);
+static void rule_post_uninstall(struct ofproto *, struct rule *);
+
+struct ofconn {
+ struct list node;
+ struct rconn *rconn;
+ struct pktbuf *pktbuf;
+ bool send_flow_exp;
+ int miss_send_len;
+
+ struct rconn_packet_counter *packet_in_counter;
+
+ /* Number of OpenFlow messages queued as replies to OpenFlow requests, and
+ * the maximum number before we stop reading OpenFlow requests. */
+#define OFCONN_REPLY_MAX 100
+ struct rconn_packet_counter *reply_counter;
+};
+
+static struct ofconn *ofconn_create(struct ofproto *, struct rconn *);
+static void ofconn_destroy(struct ofconn *, struct ofproto *);
+static void ofconn_run(struct ofconn *, struct ofproto *);
+static void ofconn_wait(struct ofconn *);
+static void queue_tx(struct ofpbuf *msg, const struct ofconn *ofconn,
+ struct rconn_packet_counter *counter);
+
+struct ofproto {
+ /* Settings. */
+ uint64_t datapath_id; /* Datapath ID. */
+ uint64_t fallback_dpid; /* Datapath ID if no better choice found. */
+ uint64_t mgmt_id; /* Management channel identifier. */
+ char *manufacturer; /* Manufacturer. */
+ char *hardware; /* Hardware. */
+ char *software; /* Software version. */
+ char *serial; /* Serial number. */
+
+ /* Datapath. */
+ struct dpif *dpif;
+ struct netdev_monitor *netdev_monitor;
+ struct port_array ports; /* Index is ODP port nr; ofport->opp.port_no is
+ * OFP port nr. */
+ struct shash port_by_name;
+ uint32_t max_ports;
+
+ /* Configuration. */
+ struct switch_status *switch_status;
+ struct status_category *ss_cat;
+ struct in_band *in_band;
+ struct discovery *discovery;
+ struct fail_open *fail_open;
+ struct pinsched *miss_sched, *action_sched;
+ struct executer *executer;
+ struct netflow *netflow;
+
+ /* Flow table. */
+ struct classifier cls;
+ bool need_revalidate;
+ long long int next_expiration;
+ struct tag_set revalidate_set;
+
+ /* OpenFlow connections. */
+ struct list all_conns;
+ struct ofconn *controller;
+ struct pvconn **listeners;
+ size_t n_listeners;
+ struct pvconn **snoops;
+ size_t n_snoops;
+
+ /* Hooks for ovs-vswitchd. */
+ const struct ofhooks *ofhooks;
+ void *aux;
+
+ /* Used by default ofhooks. */
+ struct mac_learning *ml;
+};
+
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
+
+static const struct ofhooks default_ofhooks;
+
+static uint64_t pick_datapath_id(const struct ofproto *);
+static uint64_t pick_fallback_dpid(void);
+static void send_packet_in_miss(struct ofpbuf *, void *ofproto);
+static void send_packet_in_action(struct ofpbuf *, void *ofproto);
+static void update_used(struct ofproto *);
+static void update_stats(struct ofproto *, struct rule *,
+ const struct odp_flow_stats *);
+static void expire_rule(struct cls_rule *, void *ofproto);
+static void active_timeout(struct ofproto *ofproto, struct rule *rule);
+static bool revalidate_rule(struct ofproto *p, struct rule *rule);
+static void revalidate_cb(struct cls_rule *rule_, void *p_);
+
+static void handle_odp_msg(struct ofproto *, struct ofpbuf *);
+
+static void handle_openflow(struct ofconn *, struct ofproto *,
+ struct ofpbuf *);
+
+static void refresh_port_group(struct ofproto *, unsigned int group);
+static void update_port(struct ofproto *, const char *devname);
+static int init_ports(struct ofproto *);
+static void reinit_ports(struct ofproto *);
+
+int
+ofproto_create(const char *datapath, const struct ofhooks *ofhooks, void *aux,
+ struct ofproto **ofprotop)
+{
+ struct odp_stats stats;
+ struct ofproto *p;
+ struct dpif *dpif;
+ int error;
+
+ *ofprotop = NULL;
+
+ /* Connect to datapath and start listening for messages. */
+ error = dpif_open(datapath, &dpif);
+ if (error) {
+ VLOG_ERR("failed to open datapath %s: %s", datapath, strerror(error));
+ return error;
+ }
+ error = dpif_get_dp_stats(dpif, &stats);
+ if (error) {
+ VLOG_ERR("failed to obtain stats for datapath %s: %s",
+ datapath, strerror(error));
+ dpif_close(dpif);
+ return error;
+ }
+ error = dpif_recv_set_mask(dpif, ODPL_MISS | ODPL_ACTION);
+ if (error) {
+ VLOG_ERR("failed to listen on datapath %s: %s",
+ datapath, strerror(error));
+ dpif_close(dpif);
+ return error;
+ }
+ dpif_flow_flush(dpif);
+ dpif_recv_purge(dpif);
+
+ /* Initialize settings. */
+ p = xcalloc(1, sizeof *p);
+ p->fallback_dpid = pick_fallback_dpid();
+ p->datapath_id = p->fallback_dpid;
+ p->manufacturer = xstrdup("Nicira Networks, Inc.");
+ p->hardware = xstrdup("Reference Implementation");
+ p->software = xstrdup(VERSION BUILDNR);
+ p->serial = xstrdup("None");
+
+ /* Initialize datapath. */
+ p->dpif = dpif;
+ p->netdev_monitor = netdev_monitor_create();
+ port_array_init(&p->ports);
+ shash_init(&p->port_by_name);
+ p->max_ports = stats.max_ports;
+
+ /* Initialize submodules. */
+ p->switch_status = switch_status_create(p);
+ p->in_band = NULL;
+ p->discovery = NULL;
+ p->fail_open = NULL;
+ p->miss_sched = p->action_sched = NULL;
+ p->executer = NULL;
+ p->netflow = NULL;
+
+ /* Initialize flow table. */
+ classifier_init(&p->cls);
+ p->need_revalidate = false;
+ p->next_expiration = time_msec() + 1000;
+ tag_set_init(&p->revalidate_set);
+
+ /* Initialize OpenFlow connections. */
+ list_init(&p->all_conns);
+ p->controller = ofconn_create(p, rconn_create(5, 8));
+ p->controller->pktbuf = pktbuf_create();
+ p->controller->miss_send_len = OFP_DEFAULT_MISS_SEND_LEN;
+ p->listeners = NULL;
+ p->n_listeners = 0;
+ p->snoops = NULL;
+ p->n_snoops = 0;
+
+ /* Initialize hooks. */
+ if (ofhooks) {
+ p->ofhooks = ofhooks;
+ p->aux = aux;
+ p->ml = NULL;
+ } else {
+ p->ofhooks = &default_ofhooks;
+ p->aux = p;
+ p->ml = mac_learning_create();
+ }
+
+ /* Register switch status category. */
+ p->ss_cat = switch_status_register(p->switch_status, "remote",
+ rconn_status_cb, p->controller->rconn);
+
+ /* Almost done... */
+ error = init_ports(p);
+ if (error) {
+ ofproto_destroy(p);
+ return error;
+ }
+
+ /* Pick final datapath ID. */
+ p->datapath_id = pick_datapath_id(p);
+ VLOG_INFO("using datapath ID %012"PRIx64, p->datapath_id);
+
+ *ofprotop = p;
+ return 0;
+}
+
+void
+ofproto_set_datapath_id(struct ofproto *p, uint64_t datapath_id)
+{
+ uint64_t old_dpid = p->datapath_id;
+ p->datapath_id = datapath_id ? datapath_id : pick_datapath_id(p);
+ if (p->datapath_id != old_dpid) {
+ VLOG_INFO("datapath ID changed to %012"PRIx64, p->datapath_id);
+ rconn_reconnect(p->controller->rconn);
+ }
+}
+
+void
+ofproto_set_mgmt_id(struct ofproto *p, uint64_t mgmt_id)
+{
+ p->mgmt_id = mgmt_id;
+}
+
+void
+ofproto_set_probe_interval(struct ofproto *p, int probe_interval)
+{
+ probe_interval = probe_interval ? MAX(probe_interval, 5) : 0;
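+ /* A nonzero interval is raised to at least 5 seconds; e.g. a request for
+ * 2 becomes 5, and fail-open (if enabled) then triggers after 5 * 3 = 15
+ * seconds. */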
+ rconn_set_probe_interval(p->controller->rconn, probe_interval);
+ if (p->fail_open) {
+ int trigger_duration = probe_interval ? probe_interval * 3 : 15;
+ fail_open_set_trigger_duration(p->fail_open, trigger_duration);
+ }
+}
+
+void
+ofproto_set_max_backoff(struct ofproto *p, int max_backoff)
+{
+ rconn_set_max_backoff(p->controller->rconn, max_backoff);
+}
+
+void
+ofproto_set_desc(struct ofproto *p,
+ const char *manufacturer, const char *hardware,
+ const char *software, const char *serial)
+{
+ if (manufacturer) {
+ free(p->manufacturer);
+ p->manufacturer = xstrdup(manufacturer);
+ }
+ if (hardware) {
+ free(p->hardware);
+ p->hardware = xstrdup(hardware);
+ }
+ if (software) {
+ free(p->software);
+ p->software = xstrdup(software);
+ }
+ if (serial) {
+ free(p->serial);
+ p->serial = xstrdup(serial);
+ }
+}
+
+int
+ofproto_set_in_band(struct ofproto *p, bool in_band)
+{
+ if (in_band != (p->in_band != NULL)) {
+ if (in_band) {
+ return in_band_create(p, p->dpif, p->switch_status,
+ p->controller->rconn, &p->in_band);
+ } else {
+ ofproto_set_discovery(p, false, NULL, true);
+ in_band_destroy(p->in_band);
+ p->in_band = NULL;
+ }
+ rconn_reconnect(p->controller->rconn);
+ }
+ return 0;
+}
+
+int
+ofproto_set_discovery(struct ofproto *p, bool discovery,
+ const char *re, bool update_resolv_conf)
+{
+ if (discovery != (p->discovery != NULL)) {
+ if (discovery) {
+ int error = ofproto_set_in_band(p, true);
+ if (error) {
+ return error;
+ }
+ error = discovery_create(re, update_resolv_conf,
+ p->dpif, p->switch_status,
+ &p->discovery);
+ if (error) {
+ return error;
+ }
+ } else {
+ discovery_destroy(p->discovery);
+ p->discovery = NULL;
+ }
+ rconn_disconnect(p->controller->rconn);
+ } else if (discovery) {
+ discovery_set_update_resolv_conf(p->discovery, update_resolv_conf);
+ return discovery_set_accept_controller_re(p->discovery, re);
+ }
+ return 0;
+}
+
+int
+ofproto_set_controller(struct ofproto *ofproto, const char *controller)
+{
+ if (ofproto->discovery) {
+ return EINVAL;
+ } else if (controller) {
+ if (strcmp(rconn_get_name(ofproto->controller->rconn), controller)) {
+ return rconn_connect(ofproto->controller->rconn, controller);
+ } else {
+ return 0;
+ }
+ } else {
+ rconn_disconnect(ofproto->controller->rconn);
+ return 0;
+ }
+}
+
+static int
+set_pvconns(struct pvconn ***pvconnsp, size_t *n_pvconnsp,
+ const struct svec *svec)
+{
+ struct pvconn **pvconns = *pvconnsp;
+ size_t n_pvconns = *n_pvconnsp;
+ int retval = 0;
+ size_t i;
+
+ for (i = 0; i < n_pvconns; i++) {
+ pvconn_close(pvconns[i]);
+ }
+ free(pvconns);
+
+ pvconns = xmalloc(svec->n * sizeof *pvconns);
+ n_pvconns = 0;
+ for (i = 0; i < svec->n; i++) {
+ const char *name = svec->names[i];
+ struct pvconn *pvconn;
+ int error;
+
+ error = pvconn_open(name, &pvconn);
+ if (!error) {
+ pvconns[n_pvconns++] = pvconn;
+ } else {
+ VLOG_ERR("failed to listen on %s: %s", name, strerror(error));
+ if (!retval) {
+ retval = error;
+ }
+ }
+ }
+
+ *pvconnsp = pvconns;
+ *n_pvconnsp = n_pvconns;
+
+ return retval;
+}
+
+int
+ofproto_set_listeners(struct ofproto *ofproto, const struct svec *listeners)
+{
+ return set_pvconns(&ofproto->listeners, &ofproto->n_listeners, listeners);
+}
+
+int
+ofproto_set_snoops(struct ofproto *ofproto, const struct svec *snoops)
+{
+ return set_pvconns(&ofproto->snoops, &ofproto->n_snoops, snoops);
+}
+
+int
+ofproto_set_netflow(struct ofproto *ofproto,
+ const struct netflow_options *nf_options)
+{
+ if (nf_options->collectors.n) {
+ if (!ofproto->netflow) {
+ ofproto->netflow = netflow_create();
+ }
+ return netflow_set_options(ofproto->netflow, nf_options);
+ } else {
+ netflow_destroy(ofproto->netflow);
+ ofproto->netflow = NULL;
+ return 0;
+ }
+}
+
+void
+ofproto_set_failure(struct ofproto *ofproto, bool fail_open)
+{
+ if (fail_open) {
+ struct rconn *rconn = ofproto->controller->rconn;
+ int probe_interval = rconn_get_probe_interval(rconn);
+ int trigger_duration = probe_interval ? probe_interval * 3 : 15;
+ if (!ofproto->fail_open) {
+ ofproto->fail_open = fail_open_create(ofproto, trigger_duration,
+ ofproto->switch_status,
+ rconn);
+ } else {
+ fail_open_set_trigger_duration(ofproto->fail_open,
+ trigger_duration);
+ }
+ } else {
+ fail_open_destroy(ofproto->fail_open);
+ ofproto->fail_open = NULL;
+ }
+}
+
+void
+ofproto_set_rate_limit(struct ofproto *ofproto,
+ int rate_limit, int burst_limit)
+{
+ if (rate_limit > 0) {
+ if (!ofproto->miss_sched) {
+ ofproto->miss_sched = pinsched_create(rate_limit, burst_limit,
+ ofproto->switch_status);
+ ofproto->action_sched = pinsched_create(rate_limit, burst_limit,
+ NULL);
+ } else {
+ pinsched_set_limits(ofproto->miss_sched, rate_limit, burst_limit);
+ pinsched_set_limits(ofproto->action_sched,
+ rate_limit, burst_limit);
+ }
+ } else {
+ pinsched_destroy(ofproto->miss_sched);
+ ofproto->miss_sched = NULL;
+ pinsched_destroy(ofproto->action_sched);
+ ofproto->action_sched = NULL;
+ }
+}
+
+int
+ofproto_set_stp(struct ofproto *ofproto UNUSED, bool enable_stp)
+{
+ /* XXX */
+ if (enable_stp) {
+ VLOG_WARN("STP is not yet implemented");
+ return EINVAL;
+ } else {
+ return 0;
+ }
+}
+
+int
+ofproto_set_remote_execution(struct ofproto *ofproto, const char *command_acl,
+ const char *command_dir)
+{
+ if (command_acl) {
+ if (!ofproto->executer) {
+ return executer_create(command_acl, command_dir,
+ &ofproto->executer);
+ } else {
+ executer_set_acl(ofproto->executer, command_acl, command_dir);
+ }
+ } else {
+ executer_destroy(ofproto->executer);
+ ofproto->executer = NULL;
+ }
+ return 0;
+}
+
+uint64_t
+ofproto_get_datapath_id(const struct ofproto *ofproto)
+{
+ return ofproto->datapath_id;
+}
+
+uint64_t
+ofproto_get_mgmt_id(const struct ofproto *ofproto)
+{
+ return ofproto->mgmt_id;
+}
+
+int
+ofproto_get_probe_interval(const struct ofproto *ofproto)
+{
+ return rconn_get_probe_interval(ofproto->controller->rconn);
+}
+
+int
+ofproto_get_max_backoff(const struct ofproto *ofproto)
+{
+ return rconn_get_max_backoff(ofproto->controller->rconn);
+}
+
+bool
+ofproto_get_in_band(const struct ofproto *ofproto)
+{
+ return ofproto->in_band != NULL;
+}
+
+bool
+ofproto_get_discovery(const struct ofproto *ofproto)
+{
+ return ofproto->discovery != NULL;
+}
+
+const char *
+ofproto_get_controller(const struct ofproto *ofproto)
+{
+ return rconn_get_name(ofproto->controller->rconn);
+}
+
+void
+ofproto_get_listeners(const struct ofproto *ofproto, struct svec *listeners)
+{
+ size_t i;
+
+ for (i = 0; i < ofproto->n_listeners; i++) {
+ svec_add(listeners, pvconn_get_name(ofproto->listeners[i]));
+ }
+}
+
+void
+ofproto_get_snoops(const struct ofproto *ofproto, struct svec *snoops)
+{
+ size_t i;
+
+ for (i = 0; i < ofproto->n_snoops; i++) {
+ svec_add(snoops, pvconn_get_name(ofproto->snoops[i]));
+ }
+}
+
+void
+ofproto_destroy(struct ofproto *p)
+{
+ struct ofconn *ofconn, *next_ofconn;
+ struct ofport *ofport;
+ unsigned int port_no;
+ size_t i;
+
+ if (!p) {
+ return;
+ }
+
+ ofproto_flush_flows(p);
+ classifier_destroy(&p->cls);
+
+ LIST_FOR_EACH_SAFE (ofconn, next_ofconn, struct ofconn, node,
+ &p->all_conns) {
+ ofconn_destroy(ofconn, p);
+ }
+
+ dpif_close(p->dpif);
+ netdev_monitor_destroy(p->netdev_monitor);
+ PORT_ARRAY_FOR_EACH (ofport, &p->ports, port_no) {
+ ofport_free(ofport);
+ }
+ shash_destroy(&p->port_by_name);
+
+ switch_status_destroy(p->switch_status);
+ in_band_destroy(p->in_band);
+ discovery_destroy(p->discovery);
+ fail_open_destroy(p->fail_open);
+ pinsched_destroy(p->miss_sched);
+ pinsched_destroy(p->action_sched);
+ executer_destroy(p->executer);
+ netflow_destroy(p->netflow);
+
+ switch_status_unregister(p->ss_cat);
+
+ for (i = 0; i < p->n_listeners; i++) {
+ pvconn_close(p->listeners[i]);
+ }
+ free(p->listeners);
+
+ for (i = 0; i < p->n_snoops; i++) {
+ pvconn_close(p->snoops[i]);
+ }
+ free(p->snoops);
+
+ mac_learning_destroy(p->ml);
+
+ free(p->manufacturer);
+ free(p->hardware);
+ free(p->software);
+ free(p->serial);
+
+ free(p);
+}
+
+int
+ofproto_run(struct ofproto *p)
+{
+ int error = ofproto_run1(p);
+ if (!error) {
+ error = ofproto_run2(p, false);
+ }
+ return error;
+}
+
+static void
+process_port_change(struct ofproto *ofproto, int error, char *devname)
+{
+ if (error == ENOBUFS) {
+ reinit_ports(ofproto);
+ } else if (!error) {
+ update_port(ofproto, devname);
+ free(devname);
+ }
+}
+
+int
+ofproto_run1(struct ofproto *p)
+{
+ struct ofconn *ofconn, *next_ofconn;
+ char *devname;
+ int error;
+ int i;
+
+ for (i = 0; i < 50; i++) {
+ struct ofpbuf *buf;
+
+ error = dpif_recv(p->dpif, &buf);
+ if (error) {
+ if (error == ENODEV) {
+ /* Someone destroyed the datapath behind our back. The caller
+ * had better destroy us and give up, because we're just going to
+ * spin from here on out. */
+ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
+ VLOG_ERR_RL(&rl, "%s: datapath was destroyed externally",
+ dpif_name(p->dpif));
+ return ENODEV;
+ }
+ break;
+ }
+
+ handle_odp_msg(p, buf);
+ }
+
+ while ((error = dpif_port_poll(p->dpif, &devname)) != EAGAIN) {
+ process_port_change(p, error, devname);
+ }
+ while ((error = netdev_monitor_poll(p->netdev_monitor,
+ &devname)) != EAGAIN) {
+ process_port_change(p, error, devname);
+ }
+
+ if (p->in_band) {
+ in_band_run(p->in_band);
+ }
+ if (p->discovery) {
+ char *controller_name;
+ if (rconn_is_connectivity_questionable(p->controller->rconn)) {
+ discovery_question_connectivity(p->discovery);
+ }
+ if (discovery_run(p->discovery, &controller_name)) {
+ if (controller_name) {
+ rconn_connect(p->controller->rconn, controller_name);
+ } else {
+ rconn_disconnect(p->controller->rconn);
+ }
+ }
+ }
+ pinsched_run(p->miss_sched, send_packet_in_miss, p);
+ pinsched_run(p->action_sched, send_packet_in_action, p);
+ if (p->executer) {
+ executer_run(p->executer);
+ }
+
+ LIST_FOR_EACH_SAFE (ofconn, next_ofconn, struct ofconn, node,
+ &p->all_conns) {
+ ofconn_run(ofconn, p);
+ }
+
+ /* Fail-open maintenance. Do this after processing the ofconns since
+ * fail-open checks the status of the controller rconn. */
+ if (p->fail_open) {
+ fail_open_run(p->fail_open);
+ }
+
+ for (i = 0; i < p->n_listeners; i++) {
+ struct vconn *vconn;
+ int retval;
+
+ retval = pvconn_accept(p->listeners[i], OFP_VERSION, &vconn);
+ if (!retval) {
+ ofconn_create(p, rconn_new_from_vconn("passive", vconn));
+ } else if (retval != EAGAIN) {
+ VLOG_WARN_RL(&rl, "accept failed (%s)", strerror(retval));
+ }
+ }
+
+ for (i = 0; i < p->n_snoops; i++) {
+ struct vconn *vconn;
+ int retval;
+
+ retval = pvconn_accept(p->snoops[i], OFP_VERSION, &vconn);
+ if (!retval) {
+ rconn_add_monitor(p->controller->rconn, vconn);
+ } else if (retval != EAGAIN) {
+ VLOG_WARN_RL(&rl, "accept failed (%s)", strerror(retval));
+ }
+ }
+
+ if (time_msec() >= p->next_expiration) {
+ COVERAGE_INC(ofproto_expiration);
+ p->next_expiration = time_msec() + 1000;
+ update_used(p);
+
+ classifier_for_each(&p->cls, CLS_INC_ALL, expire_rule, p);
+
+ /* Let the hook know that we're at a stable point: all outstanding data
+ * in existing flows has been accounted to the account_cb. Thus, the
+ * hook can now reasonably do operations that depend on having accurate
+ * flow volume accounting (currently, that's just bond rebalancing). */
+ if (p->ofhooks->account_checkpoint_cb) {
+ p->ofhooks->account_checkpoint_cb(p->aux);
+ }
+ }
+
+ if (p->netflow) {
+ netflow_run(p->netflow);
+ }
+
+ return 0;
+}
+
+struct revalidate_cbdata {
+ struct ofproto *ofproto;
+ bool revalidate_all; /* Revalidate all exact-match rules? */
+ bool revalidate_subrules; /* Revalidate all exact-match subrules? */
+ struct tag_set revalidate_set; /* Set of tags to revalidate. */
+};
+
+int
+ofproto_run2(struct ofproto *p, bool revalidate_all)
+{
+ if (p->need_revalidate || revalidate_all
+ || !tag_set_is_empty(&p->revalidate_set)) {
+ struct revalidate_cbdata cbdata;
+ cbdata.ofproto = p;
+ cbdata.revalidate_all = revalidate_all;
+ cbdata.revalidate_subrules = p->need_revalidate;
+ cbdata.revalidate_set = p->revalidate_set;
+ tag_set_init(&p->revalidate_set);
+ COVERAGE_INC(ofproto_revalidate);
+ classifier_for_each(&p->cls, CLS_INC_EXACT, revalidate_cb, &cbdata);
+ p->need_revalidate = false;
+ }
+
+ return 0;
+}
+
+void
+ofproto_wait(struct ofproto *p)
+{
+ struct ofconn *ofconn;
+ size_t i;
+
+ dpif_recv_wait(p->dpif);
+ dpif_port_poll_wait(p->dpif);
+ netdev_monitor_poll_wait(p->netdev_monitor);
+ LIST_FOR_EACH (ofconn, struct ofconn, node, &p->all_conns) {
+ ofconn_wait(ofconn);
+ }
+ if (p->in_band) {
+ in_band_wait(p->in_band);
+ }
+ if (p->discovery) {
+ discovery_wait(p->discovery);
+ }
+ if (p->fail_open) {
+ fail_open_wait(p->fail_open);
+ }
+ pinsched_wait(p->miss_sched);
+ pinsched_wait(p->action_sched);
+ if (p->executer) {
+ executer_wait(p->executer);
+ }
+ if (!tag_set_is_empty(&p->revalidate_set)) {
+ poll_immediate_wake();
+ }
+ if (p->need_revalidate) {
+ /* Shouldn't happen, but if it does just go around again. */
+ VLOG_DBG_RL(&rl, "need revalidate in ofproto_wait()");
+ poll_immediate_wake();
+ } else if (p->next_expiration != LLONG_MAX) {
+ poll_timer_wait(p->next_expiration - time_msec());
+ }
+ for (i = 0; i < p->n_listeners; i++) {
+ pvconn_wait(p->listeners[i]);
+ }
+ for (i = 0; i < p->n_snoops; i++) {
+ pvconn_wait(p->snoops[i]);
+ }
+}
+
+void
+ofproto_revalidate(struct ofproto *ofproto, tag_type tag)
+{
+ tag_set_add(&ofproto->revalidate_set, tag);
+}
+
+struct tag_set *
+ofproto_get_revalidate_set(struct ofproto *ofproto)
+{
+ return &ofproto->revalidate_set;
+}
+
+bool
+ofproto_is_alive(const struct ofproto *p)
+{
+ return p->discovery || rconn_is_alive(p->controller->rconn);
+}
+
+int
+ofproto_send_packet(struct ofproto *p, const flow_t *flow,
+ const union ofp_action *actions, size_t n_actions,
+ const struct ofpbuf *packet)
+{
+ struct odp_actions odp_actions;
+ int error;
+
+ error = xlate_actions(actions, n_actions, flow, p, packet, &odp_actions,
+ NULL, NULL, NULL);
+ if (error) {
+ return error;
+ }
+
+ /* XXX Should we translate the dpif_execute() errno value into an OpenFlow
+ * error code? */
+ dpif_execute(p->dpif, flow->in_port, odp_actions.actions,
+ odp_actions.n_actions, packet);
+ return 0;
+}
+
+void
+ofproto_add_flow(struct ofproto *p,
+ const flow_t *flow, uint32_t wildcards, unsigned int priority,
+ const union ofp_action *actions, size_t n_actions,
+ int idle_timeout)
+{
+ struct rule *rule;
+ rule = rule_create(p, NULL, actions, n_actions,
+ idle_timeout >= 0 ? idle_timeout : 5 /* XXX */, 0);
+ cls_rule_from_flow(&rule->cr, flow, wildcards, priority);
+ rule_insert(p, rule, NULL, 0);
+}
+
+void
+ofproto_delete_flow(struct ofproto *ofproto, const flow_t *flow,
+ uint32_t wildcards, unsigned int priority)
+{
+ struct rule *rule;
+
+ rule = rule_from_cls_rule(classifier_find_rule_exactly(&ofproto->cls,
+ flow, wildcards,
+ priority));
+ if (rule) {
+ rule_remove(ofproto, rule);
+ }
+}
+
+static void
+destroy_rule(struct cls_rule *rule_, void *ofproto_)
+{
+ struct rule *rule = rule_from_cls_rule(rule_);
+ struct ofproto *ofproto = ofproto_;
+
+ /* Mark the flow as not installed, even though it might really be
+ * installed, so that rule_remove() doesn't bother trying to uninstall it.
+ * There is no point in uninstalling it individually since we are about to
+ * blow away all the flows with dpif_flow_flush(). */
+ rule->installed = false;
+
+ rule_remove(ofproto, rule);
+}
+
+void
+ofproto_flush_flows(struct ofproto *ofproto)
+{
+ COVERAGE_INC(ofproto_flush);
+ classifier_for_each(&ofproto->cls, CLS_INC_ALL, destroy_rule, ofproto);
+ dpif_flow_flush(ofproto->dpif);
+ if (ofproto->in_band) {
+ in_band_flushed(ofproto->in_band);
+ }
+ if (ofproto->fail_open) {
+ fail_open_flushed(ofproto->fail_open);
+ }
+}
+\f
+static void
+reinit_ports(struct ofproto *p)
+{
+ struct svec devnames;
+ struct ofport *ofport;
+ unsigned int port_no;
+ struct odp_port *odp_ports;
+ size_t n_odp_ports;
+ size_t i;
+
+ svec_init(&devnames);
+ PORT_ARRAY_FOR_EACH (ofport, &p->ports, port_no) {
+ svec_add(&devnames, (char *) ofport->opp.name);
+ }
+ dpif_port_list(p->dpif, &odp_ports, &n_odp_ports);
+ for (i = 0; i < n_odp_ports; i++) {
+ svec_add(&devnames, odp_ports[i].devname);
+ }
+ free(odp_ports);
+
+ svec_sort_unique(&devnames);
+ for (i = 0; i < devnames.n; i++) {
+ update_port(p, devnames.names[i]);
+ }
+ svec_destroy(&devnames);
+}
+
+static void
+refresh_port_group(struct ofproto *p, unsigned int group)
+{
+ uint16_t *ports;
+ size_t n_ports;
+ struct ofport *port;
+ unsigned int port_no;
+
+ assert(group == DP_GROUP_ALL || group == DP_GROUP_FLOOD);
+
+ ports = xmalloc(port_array_count(&p->ports) * sizeof *ports);
+ n_ports = 0;
+ PORT_ARRAY_FOR_EACH (port, &p->ports, port_no) {
+ if (group == DP_GROUP_ALL || !(port->opp.config & OFPPC_NO_FLOOD)) {
+ ports[n_ports++] = port_no;
+ }
+ }
+ dpif_port_group_set(p->dpif, group, ports, n_ports);
+ free(ports);
+}
+
+static void
+refresh_port_groups(struct ofproto *p)
+{
+ refresh_port_group(p, DP_GROUP_FLOOD);
+ refresh_port_group(p, DP_GROUP_ALL);
+}
+
+static struct ofport *
+make_ofport(const struct odp_port *odp_port)
+{
+ enum netdev_flags flags;
+ struct ofport *ofport;
+ struct netdev *netdev;
+ bool carrier;
+ int error;
+
+ error = netdev_open(odp_port->devname, NETDEV_ETH_TYPE_NONE, &netdev);
+ if (error) {
+ VLOG_WARN_RL(&rl, "ignoring port %s (%"PRIu16") because netdev %s "
+ "cannot be opened (%s)",
+ odp_port->devname, odp_port->port,
+ odp_port->devname, strerror(error));
+ return NULL;
+ }
+
+ ofport = xmalloc(sizeof *ofport);
+ ofport->netdev = netdev;
+ ofport->opp.port_no = odp_port_to_ofp_port(odp_port->port);
+ netdev_get_etheraddr(netdev, ofport->opp.hw_addr);
+ memcpy(ofport->opp.name, odp_port->devname,
+ MIN(sizeof ofport->opp.name, sizeof odp_port->devname));
+ ofport->opp.name[sizeof ofport->opp.name - 1] = '\0';
+
+ netdev_get_flags(netdev, &flags);
+ ofport->opp.config = flags & NETDEV_UP ? 0 : OFPPC_PORT_DOWN;
+
+ netdev_get_carrier(netdev, &carrier);
+ ofport->opp.state = carrier ? 0 : OFPPS_LINK_DOWN;
+
+ netdev_get_features(netdev,
+ &ofport->opp.curr, &ofport->opp.advertised,
+ &ofport->opp.supported, &ofport->opp.peer);
+ return ofport;
+}
+
+static bool
+ofport_conflicts(const struct ofproto *p, const struct odp_port *odp_port)
+{
+ if (port_array_get(&p->ports, odp_port->port)) {
+ VLOG_WARN_RL(&rl, "ignoring duplicate port %"PRIu16" in datapath",
+ odp_port->port);
+ return true;
+ } else if (shash_find(&p->port_by_name, odp_port->devname)) {
+ VLOG_WARN_RL(&rl, "ignoring duplicate device %s in datapath",
+ odp_port->devname);
+ return true;
+ } else {
+ return false;
+ }
+}
+
+static int
+ofport_equal(const struct ofport *a_, const struct ofport *b_)
+{
+ const struct ofp_phy_port *a = &a_->opp;
+ const struct ofp_phy_port *b = &b_->opp;
+
+ BUILD_ASSERT_DECL(sizeof *a == 48); /* Detect ofp_phy_port changes. */
+ return (a->port_no == b->port_no
+ && !memcmp(a->hw_addr, b->hw_addr, sizeof a->hw_addr)
+ && !strcmp((char *) a->name, (char *) b->name)
+ && a->state == b->state
+ && a->config == b->config
+ && a->curr == b->curr
+ && a->advertised == b->advertised
+ && a->supported == b->supported
+ && a->peer == b->peer);
+}
+
+static void
+send_port_status(struct ofproto *p, const struct ofport *ofport,
+ uint8_t reason)
+{
+ /* XXX Should limit the number of queued port status change messages. */
+ struct ofconn *ofconn;
+ LIST_FOR_EACH (ofconn, struct ofconn, node, &p->all_conns) {
+ struct ofp_port_status *ops;
+ struct ofpbuf *b;
+
+ ops = make_openflow_xid(sizeof *ops, OFPT_PORT_STATUS, 0, &b);
+ ops->reason = reason;
+ ops->desc = ofport->opp;
+ hton_ofp_phy_port(&ops->desc);
+ queue_tx(b, ofconn, NULL);
+ }
+ if (p->ofhooks->port_changed_cb) {
+ p->ofhooks->port_changed_cb(reason, &ofport->opp, p->aux);
+ }
+}
+
+static void
+ofport_install(struct ofproto *p, struct ofport *ofport)
+{
+ netdev_monitor_add(p->netdev_monitor, ofport->netdev);
+ port_array_set(&p->ports, ofp_port_to_odp_port(ofport->opp.port_no),
+ ofport);
+ shash_add(&p->port_by_name, (char *) ofport->opp.name, ofport);
+}
+
+static void
+ofport_remove(struct ofproto *p, struct ofport *ofport)
+{
+ netdev_monitor_remove(p->netdev_monitor, ofport->netdev);
+ port_array_set(&p->ports, ofp_port_to_odp_port(ofport->opp.port_no), NULL);
+ shash_delete(&p->port_by_name,
+ shash_find(&p->port_by_name, (char *) ofport->opp.name));
+}
+
+static void
+ofport_free(struct ofport *ofport)
+{
+ if (ofport) {
+ netdev_close(ofport->netdev);
+ free(ofport);
+ }
+}
+
+static void
+update_port(struct ofproto *p, const char *devname)
+{
+ struct odp_port odp_port;
+ struct ofport *old_ofport;
+ struct ofport *new_ofport;
+ int error;
+
+ COVERAGE_INC(ofproto_update_port);
+
+ /* Query the datapath for port information. */
+ error = dpif_port_query_by_name(p->dpif, devname, &odp_port);
+
+ /* Find the old ofport. */
+ old_ofport = shash_find_data(&p->port_by_name, devname);
+ if (!error) {
+ if (!old_ofport) {
+ /* There's no port named 'devname' but there might be a port with
+ * the same port number. This could happen if a port is deleted
+ * and then a new one added in its place very quickly, or if a port
+ * is renamed. In the former case we want to send an OFPPR_DELETE
+ * and an OFPPR_ADD, and in the latter case we want to send a
+ * single OFPPR_MODIFY. We can distinguish the cases by comparing
+ * the old port's ifindex against the new port, or perhaps less
+ * reliably but more portably by comparing the old port's MAC
+ * against the new port's MAC. However, this code isn't that smart
+ * and always sends an OFPPR_MODIFY (XXX). */
+ old_ofport = port_array_get(&p->ports, odp_port.port);
+ }
+ } else if (error != ENOENT && error != ENODEV) {
+ VLOG_WARN_RL(&rl, "dpif_port_query_by_name returned unexpected error "
+ "%s", strerror(error));
+ return;
+ }
+
+ /* Create a new ofport. */
+ new_ofport = !error ? make_ofport(&odp_port) : NULL;
+
+ /* Eliminate a few pathological cases. */
+ if (!old_ofport && !new_ofport) {
+ return;
+ } else if (old_ofport && new_ofport) {
+ /* Most of the 'config' bits are OpenFlow soft state, but
+ * OFPPC_PORT_DOWN is maintained by the kernel. So transfer the OpenFlow
+ * bits from old_ofport. (make_ofport() only sets OFPPC_PORT_DOWN and
+ * leaves the other bits 0.) */
+ new_ofport->opp.config |= old_ofport->opp.config & ~OFPPC_PORT_DOWN;
+
+ if (ofport_equal(old_ofport, new_ofport)) {
+ /* False alarm--no change. */
+ ofport_free(new_ofport);
+ return;
+ }
+ }
+
+ /* Now deal with the normal cases. */
+ if (old_ofport) {
+ ofport_remove(p, old_ofport);
+ }
+ if (new_ofport) {
+ ofport_install(p, new_ofport);
+ }
+ send_port_status(p, new_ofport ? new_ofport : old_ofport,
+ (!old_ofport ? OFPPR_ADD
+ : !new_ofport ? OFPPR_DELETE
+ : OFPPR_MODIFY));
+ ofport_free(old_ofport);
+
+ /* Update port groups. */
+ refresh_port_groups(p);
+}
+
+static int
+init_ports(struct ofproto *p)
+{
+ struct odp_port *ports;
+ size_t n_ports;
+ size_t i;
+ int error;
+
+ error = dpif_port_list(p->dpif, &ports, &n_ports);
+ if (error) {
+ return error;
+ }
+
+ for (i = 0; i < n_ports; i++) {
+ const struct odp_port *odp_port = &ports[i];
+ if (!ofport_conflicts(p, odp_port)) {
+ struct ofport *ofport = make_ofport(odp_port);
+ if (ofport) {
+ ofport_install(p, ofport);
+ }
+ }
+ }
+ free(ports);
+ refresh_port_groups(p);
+ return 0;
+}
+\f
+static struct ofconn *
+ofconn_create(struct ofproto *p, struct rconn *rconn)
+{
+ struct ofconn *ofconn = xmalloc(sizeof *ofconn);
+ list_push_back(&p->all_conns, &ofconn->node);
+ ofconn->rconn = rconn;
+ ofconn->pktbuf = NULL;
+ ofconn->send_flow_exp = false;
+ ofconn->miss_send_len = 0;
+ ofconn->packet_in_counter = rconn_packet_counter_create();
+ ofconn->reply_counter = rconn_packet_counter_create();
+ return ofconn;
+}
+
+static void
+ofconn_destroy(struct ofconn *ofconn, struct ofproto *p)
+{
+ if (p->executer) {
+ executer_rconn_closing(p->executer, ofconn->rconn);
+ }
+
+ list_remove(&ofconn->node);
+ rconn_destroy(ofconn->rconn);
+ rconn_packet_counter_destroy(ofconn->packet_in_counter);
+ rconn_packet_counter_destroy(ofconn->reply_counter);
+ pktbuf_destroy(ofconn->pktbuf);
+ free(ofconn);
+}
+
+static void
+ofconn_run(struct ofconn *ofconn, struct ofproto *p)
+{
+ int iteration;
+
+ rconn_run(ofconn->rconn);
+
+ if (rconn_packet_counter_read(ofconn->reply_counter) < OFCONN_REPLY_MAX) {
+ /* Limit the number of iterations to prevent other tasks from
+ * starving. */
+ for (iteration = 0; iteration < 50; iteration++) {
+ struct ofpbuf *of_msg = rconn_recv(ofconn->rconn);
+ if (!of_msg) {
+ break;
+ }
+ if (p->fail_open) {
+ fail_open_maybe_recover(p->fail_open);
+ }
+ handle_openflow(ofconn, p, of_msg);
+ ofpbuf_delete(of_msg);
+ }
+ }
+
+ if (ofconn != p->controller && !rconn_is_alive(ofconn->rconn)) {
+ ofconn_destroy(ofconn, p);
+ }
+}
+
+static void
+ofconn_wait(struct ofconn *ofconn)
+{
+ rconn_run_wait(ofconn->rconn);
+ if (rconn_packet_counter_read(ofconn->reply_counter) < OFCONN_REPLY_MAX) {
+ rconn_recv_wait(ofconn->rconn);
+ } else {
+ COVERAGE_INC(ofproto_ofconn_stuck);
+ }
+}
+\f
+/* Caller is responsible for initializing the 'cr' member of the returned
+ * rule. */
+static struct rule *
+rule_create(struct ofproto *ofproto, struct rule *super,
+ const union ofp_action *actions, size_t n_actions,
+ uint16_t idle_timeout, uint16_t hard_timeout)
+{
+ struct rule *rule = xcalloc(1, sizeof *rule);
+ rule->idle_timeout = idle_timeout;
+ rule->hard_timeout = hard_timeout;
+ rule->used = rule->created = time_msec();
+ rule->super = super;
+ if (super) {
+ list_push_back(&super->list, &rule->list);
+ } else {
+ list_init(&rule->list);
+ }
+ rule->n_actions = n_actions;
+ rule->actions = xmemdup(actions, n_actions * sizeof *actions);
+ netflow_flow_clear(&rule->nf_flow);
+ netflow_flow_update_time(ofproto->netflow, &rule->nf_flow, rule->created);
+
+ return rule;
+}
+
+static struct rule *
+rule_from_cls_rule(const struct cls_rule *cls_rule)
+{
+ return cls_rule ? CONTAINER_OF(cls_rule, struct rule, cr) : NULL;
+}
+
+static void
+rule_free(struct rule *rule)
+{
+ free(rule->actions);
+ free(rule->odp_actions);
+ free(rule);
+}
+
+/* Destroys 'rule'. If 'rule' is a subrule, also removes it from its
+ * super-rule's list of subrules. If 'rule' is a super-rule, also iterates
+ * through all of its subrules and revalidates them, destroying any that no
+ * longer have a super-rule (which is probably all of them).
+ *
+ * Before calling this function, the caller must have removed 'rule' from
+ * the classifier. If 'rule' is an exact-match rule, the caller is also
+ * responsible for ensuring that it has been uninstalled from the datapath. */
+static void
+rule_destroy(struct ofproto *ofproto, struct rule *rule)
+{
+ if (!rule->super) {
+ struct rule *subrule, *next;
+ LIST_FOR_EACH_SAFE (subrule, next, struct rule, list, &rule->list) {
+ revalidate_rule(ofproto, subrule);
+ }
+ } else {
+ list_remove(&rule->list);
+ }
+ rule_free(rule);
+}
+
+static bool
+rule_has_out_port(const struct rule *rule, uint16_t out_port)
+{
+ const union ofp_action *oa;
+ struct actions_iterator i;
+
+ if (out_port == htons(OFPP_NONE)) {
+ return true;
+ }
+ for (oa = actions_first(&i, rule->actions, rule->n_actions); oa;
+ oa = actions_next(&i)) {
+ if (oa->type == htons(OFPAT_OUTPUT) && oa->output.port == out_port) {
+ return true;
+ }
+ }
+ return false;
+}
+
+/* Executes the actions indicated by 'rule' on 'packet', which is in flow
+ * 'flow' and is considered to have arrived on ODP port 'flow->in_port'.
+ *
+ * The flow that 'packet' contains need not match 'rule'; the actions in
+ * 'rule' are applied to it either way. Likewise, the packet and byte
+ * counters for 'rule' are credited for the packet sent out whether or not the
+ * packet actually matches 'rule'.
+ *
+ * If 'rule' is an exact-match rule and 'flow' actually equals the rule's flow,
+ * the caller must already have accurately composed ODP actions for it given
+ * 'packet' using rule_make_actions(). If 'rule' is a wildcard rule, or if
+ * 'rule' is an exact-match rule but 'flow' is not the rule's flow, then this
+ * function will compose a set of ODP actions based on 'rule''s OpenFlow
+ * actions and apply them to 'packet'. */
+static void
+rule_execute(struct ofproto *ofproto, struct rule *rule,
+ struct ofpbuf *packet, const flow_t *flow)
+{
+ const union odp_action *actions;
+ size_t n_actions;
+ struct odp_actions a;
+
+ /* Grab or compose the ODP actions.
+ *
+ * The special case for an exact-match 'rule' where 'flow' is not the
+ * rule's flow is important to avoid, e.g., sending a packet out its input
+ * port simply because the ODP actions were composed for the wrong
+ * scenario. */
+ if (rule->cr.wc.wildcards || !flow_equal(flow, &rule->cr.flow)) {
+ struct rule *super = rule->super ? rule->super : rule;
+ if (xlate_actions(super->actions, super->n_actions, flow, ofproto,
+ packet, &a, NULL, NULL, NULL)) {
+ return;
+ }
+ actions = a.actions;
+ n_actions = a.n_actions;
+ } else {
+ actions = rule->odp_actions;
+ n_actions = rule->n_odp_actions;
+ }
+
+ /* Execute the ODP actions. */
+ if (!dpif_execute(ofproto->dpif, flow->in_port,
+ actions, n_actions, packet)) {
+ struct odp_flow_stats stats;
+ flow_extract_stats(flow, packet, &stats);
+ update_stats(ofproto, rule, &stats);
+ rule->used = time_msec();
+ netflow_flow_update_time(ofproto->netflow, &rule->nf_flow, rule->used);
+ }
+}
+
+static void
+rule_insert(struct ofproto *p, struct rule *rule, struct ofpbuf *packet,
+ uint16_t in_port)
+{
+ struct rule *displaced_rule;
+
+ /* Insert the rule in the classifier. */
+ displaced_rule = rule_from_cls_rule(classifier_insert(&p->cls, &rule->cr));
+ if (!rule->cr.wc.wildcards) {
+ rule_make_actions(p, rule, packet);
+ }
+
+ /* Send the packet and credit it to the rule. */
+ if (packet) {
+ flow_t flow;
+ flow_extract(packet, in_port, &flow);
+ rule_execute(p, rule, packet, &flow);
+ }
+
+ /* Install the rule in the datapath only after sending the packet, to
+ * avoid packet reordering. */
+ if (rule->cr.wc.wildcards) {
+ COVERAGE_INC(ofproto_add_wc_flow);
+ p->need_revalidate = true;
+ } else {
+ rule_install(p, rule, displaced_rule);
+ }
+
+ /* Free the rule that was displaced, if any. */
+ if (displaced_rule) {
+ rule_destroy(p, displaced_rule);
+ }
+}
+
+static struct rule *
+rule_create_subrule(struct ofproto *ofproto, struct rule *rule,
+ const flow_t *flow)
+{
+ struct rule *subrule = rule_create(ofproto, rule, NULL, 0,
+ rule->idle_timeout, rule->hard_timeout);
+ COVERAGE_INC(ofproto_subrule_create);
+ cls_rule_from_flow(&subrule->cr, flow, 0,
+ (rule->cr.priority <= UINT16_MAX ? UINT16_MAX
+ : rule->cr.priority));
+ classifier_insert_exact(&ofproto->cls, &subrule->cr);
+
+ return subrule;
+}
+
+static void
+rule_remove(struct ofproto *ofproto, struct rule *rule)
+{
+ if (rule->cr.wc.wildcards) {
+ COVERAGE_INC(ofproto_del_wc_flow);
+ ofproto->need_revalidate = true;
+ } else {
+ rule_uninstall(ofproto, rule);
+ }
+ classifier_remove(&ofproto->cls, &rule->cr);
+ rule_destroy(ofproto, rule);
+}
+
+/* Returns true if the actions changed, false otherwise. */
+static bool
+rule_make_actions(struct ofproto *p, struct rule *rule,
+ const struct ofpbuf *packet)
+{
+ const struct rule *super;
+ struct odp_actions a;
+ size_t actions_len;
+
+ assert(!rule->cr.wc.wildcards);
+
+ super = rule->super ? rule->super : rule;
+ rule->tags = 0;
+ xlate_actions(super->actions, super->n_actions, &rule->cr.flow, p,
+ packet, &a, &rule->tags, &rule->may_install,
+ &rule->nf_flow.output_iface);
+
+ actions_len = a.n_actions * sizeof *a.actions;
+ if (rule->n_odp_actions != a.n_actions
+ || memcmp(rule->odp_actions, a.actions, actions_len)) {
+ COVERAGE_INC(ofproto_odp_unchanged);
+ free(rule->odp_actions);
+ rule->n_odp_actions = a.n_actions;
+ rule->odp_actions = xmemdup(a.actions, actions_len);
+ return true;
+ } else {
+ return false;
+ }
+}
+
+static int
+do_put_flow(struct ofproto *ofproto, struct rule *rule, int flags,
+ struct odp_flow_put *put)
+{
+ memset(&put->flow.stats, 0, sizeof put->flow.stats);
+ put->flow.key = rule->cr.flow;
+ put->flow.actions = rule->odp_actions;
+ put->flow.n_actions = rule->n_odp_actions;
+ put->flags = flags;
+ return dpif_flow_put(ofproto->dpif, put);
+}
+
+static void
+rule_install(struct ofproto *p, struct rule *rule, struct rule *displaced_rule)
+{
+ assert(!rule->cr.wc.wildcards);
+
+ if (rule->may_install) {
+ struct odp_flow_put put;
+ if (!do_put_flow(p, rule,
+ ODPPF_CREATE | ODPPF_MODIFY | ODPPF_ZERO_STATS,
+ &put)) {
+ rule->installed = true;
+ if (displaced_rule) {
+ update_stats(p, rule, &put.flow.stats);
+ rule_post_uninstall(p, displaced_rule);
+ }
+ }
+ } else if (displaced_rule) {
+ rule_uninstall(p, displaced_rule);
+ }
+}
+
+static void
+rule_reinstall(struct ofproto *ofproto, struct rule *rule)
+{
+ if (rule->installed) {
+ struct odp_flow_put put;
+ COVERAGE_INC(ofproto_dp_missed);
+ do_put_flow(ofproto, rule, ODPPF_CREATE | ODPPF_MODIFY, &put);
+ } else {
+ rule_install(ofproto, rule, NULL);
+ }
+}
+
+static void
+rule_update_actions(struct ofproto *ofproto, struct rule *rule)
+{
+ bool actions_changed = rule_make_actions(ofproto, rule, NULL);
+ if (rule->may_install) {
+ if (rule->installed) {
+ if (actions_changed) {
+ /* XXX should really do rule_post_uninstall() for the *old* set
+ * of actions, and distinguish the old stats from the new. */
+ struct odp_flow_put put;
+ do_put_flow(ofproto, rule, ODPPF_CREATE | ODPPF_MODIFY, &put);
+ }
+ } else {
+ rule_install(ofproto, rule, NULL);
+ }
+ } else {
+ rule_uninstall(ofproto, rule);
+ }
+}
+
+static void
+rule_account(struct ofproto *ofproto, struct rule *rule, uint64_t extra_bytes)
+{
+ uint64_t total_bytes = rule->byte_count + extra_bytes;
+
+ if (ofproto->ofhooks->account_flow_cb
+ && total_bytes > rule->accounted_bytes) {
+ ofproto->ofhooks->account_flow_cb(
+ &rule->cr.flow, rule->odp_actions, rule->n_odp_actions,
+ total_bytes - rule->accounted_bytes, ofproto->aux);
+ rule->accounted_bytes = total_bytes;
+ }
+}
+
+static void
+rule_uninstall(struct ofproto *p, struct rule *rule)
+{
+ assert(!rule->cr.wc.wildcards);
+ if (rule->installed) {
+ struct odp_flow odp_flow;
+
+ odp_flow.key = rule->cr.flow;
+ odp_flow.actions = NULL;
+ odp_flow.n_actions = 0;
+ if (!dpif_flow_del(p->dpif, &odp_flow)) {
+ update_stats(p, rule, &odp_flow.stats);
+ }
+ rule->installed = false;
+
+ rule_post_uninstall(p, rule);
+ }
+}
+
+static bool
+is_controller_rule(struct rule *rule)
+{
+ /* If the only action is to send the packet to the controller, don't
+ * report NetFlow expiration messages: such a flow is part of the
+ * network's control logic, not real traffic. */
+
+ if (rule && rule->super) {
+ struct rule *super = rule->super;
+
+ return (super->n_actions == 1
+ && super->actions[0].type == htons(OFPAT_OUTPUT)
+ && super->actions[0].output.port == htons(OFPP_CONTROLLER));
+ }
+
+ return false;
+}
+
+static void
+rule_post_uninstall(struct ofproto *ofproto, struct rule *rule)
+{
+ struct rule *super = rule->super;
+
+ rule_account(ofproto, rule, 0);
+
+ if (ofproto->netflow && !is_controller_rule(rule)) {
+ struct ofexpired expired;
+ expired.flow = rule->cr.flow;
+ expired.packet_count = rule->packet_count;
+ expired.byte_count = rule->byte_count;
+ expired.used = rule->used;
+ netflow_expire(ofproto->netflow, &rule->nf_flow, &expired);
+ }
+ if (super) {
+ super->packet_count += rule->packet_count;
+ super->byte_count += rule->byte_count;
+
+ /* Reset counters to prevent double counting if the rule ever gets
+ * reinstalled. */
+ rule->packet_count = 0;
+ rule->byte_count = 0;
+ rule->accounted_bytes = 0;
+
+ netflow_flow_clear(&rule->nf_flow);
+ }
+}
+\f
+static void
+queue_tx(struct ofpbuf *msg, const struct ofconn *ofconn,
+ struct rconn_packet_counter *counter)
+{
+ update_openflow_length(msg);
+ if (rconn_send(ofconn->rconn, msg, counter)) {
+ ofpbuf_delete(msg);
+ }
+}
+
+static void
+send_error(const struct ofconn *ofconn, const struct ofp_header *oh,
+ int error, const void *data, size_t len)
+{
+ struct ofpbuf *buf;
+ struct ofp_error_msg *oem;
+
+ if (!(error >> 16)) {
+ VLOG_WARN_RL(&rl, "not sending bad error code %d to controller",
+ error);
+ return;
+ }
+
+ COVERAGE_INC(ofproto_error);
+ oem = make_openflow_xid(len + sizeof *oem, OFPT_ERROR,
+ oh ? oh->xid : 0, &buf);
+ oem->type = htons((unsigned int) error >> 16);
+ oem->code = htons(error & 0xffff);
+ memcpy(oem->data, data, len);
+ queue_tx(buf, ofconn, ofconn->reply_counter);
+}
+
+static void
+send_error_oh(const struct ofconn *ofconn, const struct ofp_header *oh,
+ int error)
+{
+ size_t oh_length = ntohs(oh->length);
+ send_error(ofconn, oh, error, oh, MIN(oh_length, 64));
+}
+
+static void
+hton_ofp_phy_port(struct ofp_phy_port *opp)
+{
+ opp->port_no = htons(opp->port_no);
+ opp->config = htonl(opp->config);
+ opp->state = htonl(opp->state);
+ opp->curr = htonl(opp->curr);
+ opp->advertised = htonl(opp->advertised);
+ opp->supported = htonl(opp->supported);
+ opp->peer = htonl(opp->peer);
+}
+
+static int
+handle_echo_request(struct ofconn *ofconn, struct ofp_header *oh)
+{
+ struct ofp_header *rq = oh;
+ queue_tx(make_echo_reply(rq), ofconn, ofconn->reply_counter);
+ return 0;
+}
+
+static int
+handle_features_request(struct ofproto *p, struct ofconn *ofconn,
+ struct ofp_header *oh)
+{
+ struct ofp_switch_features *osf;
+ struct ofpbuf *buf;
+ unsigned int port_no;
+ struct ofport *port;
+
+ osf = make_openflow_xid(sizeof *osf, OFPT_FEATURES_REPLY, oh->xid, &buf);
+ osf->datapath_id = htonll(p->datapath_id);
+ osf->n_buffers = htonl(pktbuf_capacity());
+ osf->n_tables = 2;
+ osf->capabilities = htonl(OFPC_FLOW_STATS | OFPC_TABLE_STATS |
+ OFPC_PORT_STATS | OFPC_MULTI_PHY_TX);
+ osf->actions = htonl((1u << OFPAT_OUTPUT) |
+ (1u << OFPAT_SET_VLAN_VID) |
+ (1u << OFPAT_SET_VLAN_PCP) |
+ (1u << OFPAT_STRIP_VLAN) |
+ (1u << OFPAT_SET_DL_SRC) |
+ (1u << OFPAT_SET_DL_DST) |
+ (1u << OFPAT_SET_NW_SRC) |
+ (1u << OFPAT_SET_NW_DST) |
+ (1u << OFPAT_SET_TP_SRC) |
+ (1u << OFPAT_SET_TP_DST));
+
+ PORT_ARRAY_FOR_EACH (port, &p->ports, port_no) {
+ hton_ofp_phy_port(ofpbuf_put(buf, &port->opp, sizeof port->opp));
+ }
+
+ queue_tx(buf, ofconn, ofconn->reply_counter);
+ return 0;
+}
+
+static int
+handle_get_config_request(struct ofproto *p, struct ofconn *ofconn,
+ struct ofp_header *oh)
+{
+ struct ofpbuf *buf;
+ struct ofp_switch_config *osc;
+ uint16_t flags;
+ bool drop_frags;
+
+ /* Figure out flags. */
+ dpif_get_drop_frags(p->dpif, &drop_frags);
+ flags = drop_frags ? OFPC_FRAG_DROP : OFPC_FRAG_NORMAL;
+ if (ofconn->send_flow_exp) {
+ flags |= OFPC_SEND_FLOW_EXP;
+ }
+
+ /* Send reply. */
+ osc = make_openflow_xid(sizeof *osc, OFPT_GET_CONFIG_REPLY, oh->xid, &buf);
+ osc->flags = htons(flags);
+ osc->miss_send_len = htons(ofconn->miss_send_len);
+ queue_tx(buf, ofconn, ofconn->reply_counter);
+
+ return 0;
+}
+
+static int
+handle_set_config(struct ofproto *p, struct ofconn *ofconn,
+ struct ofp_switch_config *osc)
+{
+ uint16_t flags;
+ int error;
+
+ error = check_ofp_message(&osc->header, OFPT_SET_CONFIG, sizeof *osc);
+ if (error) {
+ return error;
+ }
+ flags = ntohs(osc->flags);
+
+ ofconn->send_flow_exp = (flags & OFPC_SEND_FLOW_EXP) != 0;
+
+ if (ofconn == p->controller) {
+ switch (flags & OFPC_FRAG_MASK) {
+ case OFPC_FRAG_NORMAL:
+ dpif_set_drop_frags(p->dpif, false);
+ break;
+ case OFPC_FRAG_DROP:
+ dpif_set_drop_frags(p->dpif, true);
+ break;
+ default:
+ VLOG_WARN_RL(&rl, "requested bad fragment mode (flags=%"PRIx16")",
+ flags);
+ break;
+ }
+ }
+
+ if ((ntohs(osc->miss_send_len) != 0) != (ofconn->miss_send_len != 0)) {
+ if (ntohs(osc->miss_send_len) != 0) {
+ ofconn->pktbuf = pktbuf_create();
+ } else {
+ pktbuf_destroy(ofconn->pktbuf);
+ ofconn->pktbuf = NULL;
+ }
+ }
+
+ ofconn->miss_send_len = ntohs(osc->miss_send_len);
+
+ return 0;
+}
+
+static void
+add_output_group_action(struct odp_actions *actions, uint16_t group,
+ uint16_t *nf_output_iface)
+{
+ odp_actions_add(actions, ODPAT_OUTPUT_GROUP)->output_group.group = group;
+
+ if (group == DP_GROUP_ALL || group == DP_GROUP_FLOOD) {
+ *nf_output_iface = NF_OUT_FLOOD;
+ }
+}
+
+static void
+add_controller_action(struct odp_actions *actions,
+ const struct ofp_action_output *oao)
+{
+ union odp_action *a = odp_actions_add(actions, ODPAT_CONTROLLER);
+ a->controller.arg = oao->max_len ? ntohs(oao->max_len) : UINT32_MAX;
+}
+
+struct action_xlate_ctx {
+ /* Input. */
+ const flow_t *flow; /* Flow to which these actions correspond. */
+ int recurse; /* Recursion level, via xlate_table_action. */
+ struct ofproto *ofproto;
+ const struct ofpbuf *packet; /* The packet corresponding to 'flow', or a
+ * null pointer if we are revalidating
+ * without a packet to refer to. */
+
+ /* Output. */
+ struct odp_actions *out; /* Datapath actions. */
+ tag_type *tags; /* Tags associated with OFPP_NORMAL actions. */
+ bool may_set_up_flow; /* True ordinarily; false if the actions must
+ * be reassessed for every packet. */
+ uint16_t nf_output_iface; /* Output interface index for NetFlow. */
+};
+
+static void do_xlate_actions(const union ofp_action *in, size_t n_in,
+ struct action_xlate_ctx *ctx);
+
+static void
+add_output_action(struct action_xlate_ctx *ctx, uint16_t port)
+{
+ const struct ofport *ofport = port_array_get(&ctx->ofproto->ports, port);
+
+ if (ofport) {
+ if (ofport->opp.config & OFPPC_NO_FWD) {
+ /* Forwarding disabled on port. */
+ return;
+ }
+ } else {
+ /*
+ * We don't have an ofport record for this port, but it doesn't hurt to
+ * allow forwarding to it anyhow. Maybe such a port will appear later
+ * and we're pre-populating the flow table.
+ */
+ }
+
+ odp_actions_add(ctx->out, ODPAT_OUTPUT)->output.port = port;
+ ctx->nf_output_iface = port;
+}
+
+static struct rule *
+lookup_valid_rule(struct ofproto *ofproto, const flow_t *flow)
+{
+ struct rule *rule;
+ rule = rule_from_cls_rule(classifier_lookup(&ofproto->cls, flow));
+
+ /* The rule we found might not be valid, since we could be in need of
+ * revalidation. If it is not valid, don't return it. */
+ if (rule
+ && rule->super
+ && ofproto->need_revalidate
+ && !revalidate_rule(ofproto, rule)) {
+ COVERAGE_INC(ofproto_invalidated);
+ return NULL;
+ }
+
+ return rule;
+}
+
+static void
+xlate_table_action(struct action_xlate_ctx *ctx, uint16_t in_port)
+{
+ if (!ctx->recurse) {
+ struct rule *rule;
+ flow_t flow;
+
+ flow = *ctx->flow;
+ flow.in_port = in_port;
+
+ rule = lookup_valid_rule(ctx->ofproto, &flow);
+ if (rule) {
+ if (rule->super) {
+ rule = rule->super;
+ }
+
+ ctx->recurse++;
+ do_xlate_actions(rule->actions, rule->n_actions, ctx);
+ ctx->recurse--;
+ }
+ }
+}
+
+static void
+xlate_output_action(struct action_xlate_ctx *ctx,
+ const struct ofp_action_output *oao)
+{
+ uint16_t odp_port;
+ uint16_t prev_nf_output_iface = ctx->nf_output_iface;
+
+ ctx->nf_output_iface = NF_OUT_DROP;
+
+ switch (ntohs(oao->port)) {
+ case OFPP_IN_PORT:
+ add_output_action(ctx, ctx->flow->in_port);
+ break;
+ case OFPP_TABLE:
+ xlate_table_action(ctx, ctx->flow->in_port);
+ break;
+ case OFPP_NORMAL:
+ if (!ctx->ofproto->ofhooks->normal_cb(ctx->flow, ctx->packet,
+ ctx->out, ctx->tags,
+ &ctx->nf_output_iface,
+ ctx->ofproto->aux)) {
+ COVERAGE_INC(ofproto_uninstallable);
+ ctx->may_set_up_flow = false;
+ }
+ break;
+ case OFPP_FLOOD:
+ add_output_group_action(ctx->out, DP_GROUP_FLOOD,
+ &ctx->nf_output_iface);
+ break;
+ case OFPP_ALL:
+ add_output_group_action(ctx->out, DP_GROUP_ALL, &ctx->nf_output_iface);
+ break;
+ case OFPP_CONTROLLER:
+ add_controller_action(ctx->out, oao);
+ break;
+ case OFPP_LOCAL:
+ add_output_action(ctx, ODPP_LOCAL);
+ break;
+ default:
+ odp_port = ofp_port_to_odp_port(ntohs(oao->port));
+ if (odp_port != ctx->flow->in_port) {
+ add_output_action(ctx, odp_port);
+ }
+ break;
+ }
+
+ if (prev_nf_output_iface == NF_OUT_FLOOD) {
+ ctx->nf_output_iface = NF_OUT_FLOOD;
+ } else if (ctx->nf_output_iface == NF_OUT_DROP) {
+ ctx->nf_output_iface = prev_nf_output_iface;
+ } else if (prev_nf_output_iface != NF_OUT_DROP
+ && ctx->nf_output_iface != NF_OUT_FLOOD) {
+ ctx->nf_output_iface = NF_OUT_MULTI;
+ }
+}
+
+static void
+xlate_nicira_action(struct action_xlate_ctx *ctx,
+ const struct nx_action_header *nah)
+{
+ const struct nx_action_resubmit *nar;
+ int subtype = ntohs(nah->subtype);
+
+ assert(nah->vendor == htonl(NX_VENDOR_ID));
+ switch (subtype) {
+ case NXAST_RESUBMIT:
+ nar = (const struct nx_action_resubmit *) nah;
+ xlate_table_action(ctx, ofp_port_to_odp_port(ntohs(nar->in_port)));
+ break;
+
+ default:
+ VLOG_DBG_RL(&rl, "unknown Nicira action type %"PRIu16, subtype);
+ break;
+ }
+}
+
+static void
+do_xlate_actions(const union ofp_action *in, size_t n_in,
+ struct action_xlate_ctx *ctx)
+{
+ struct actions_iterator iter;
+ const union ofp_action *ia;
+ const struct ofport *port;
+
+ port = port_array_get(&ctx->ofproto->ports, ctx->flow->in_port);
+ if (port && port->opp.config & (OFPPC_NO_RECV | OFPPC_NO_RECV_STP)
+ && port->opp.config & (eth_addr_equals(ctx->flow->dl_dst, stp_eth_addr)
+ ? OFPPC_NO_RECV_STP : OFPPC_NO_RECV)) {
+ /* Drop this flow. */
+ return;
+ }
+
+ for (ia = actions_first(&iter, in, n_in); ia; ia = actions_next(&iter)) {
+ uint16_t type = ntohs(ia->type);
+ union odp_action *oa;
+
+ switch (type) {
+ case OFPAT_OUTPUT:
+ xlate_output_action(ctx, &ia->output);
+ break;
+
+ case OFPAT_SET_VLAN_VID:
+ oa = odp_actions_add(ctx->out, ODPAT_SET_VLAN_VID);
+ oa->vlan_vid.vlan_vid = ia->vlan_vid.vlan_vid;
+ break;
+
+ case OFPAT_SET_VLAN_PCP:
+ oa = odp_actions_add(ctx->out, ODPAT_SET_VLAN_PCP);
+ oa->vlan_pcp.vlan_pcp = ia->vlan_pcp.vlan_pcp;
+ break;
+
+ case OFPAT_STRIP_VLAN:
+ odp_actions_add(ctx->out, ODPAT_STRIP_VLAN);
+ break;
+
+ case OFPAT_SET_DL_SRC:
+ oa = odp_actions_add(ctx->out, ODPAT_SET_DL_SRC);
+ memcpy(oa->dl_addr.dl_addr,
+ ((struct ofp_action_dl_addr *) ia)->dl_addr, ETH_ADDR_LEN);
+ break;
+
+ case OFPAT_SET_DL_DST:
+ oa = odp_actions_add(ctx->out, ODPAT_SET_DL_DST);
+ memcpy(oa->dl_addr.dl_addr,
+ ((struct ofp_action_dl_addr *) ia)->dl_addr, ETH_ADDR_LEN);
+ break;
+
+ case OFPAT_SET_NW_SRC:
+ oa = odp_actions_add(ctx->out, ODPAT_SET_NW_SRC);
+ oa->nw_addr.nw_addr = ia->nw_addr.nw_addr;
+ break;
+
+ case OFPAT_SET_TP_SRC:
+ oa = odp_actions_add(ctx->out, ODPAT_SET_TP_SRC);
+ oa->tp_port.tp_port = ia->tp_port.tp_port;
+ break;
+
+ case OFPAT_VENDOR:
+ xlate_nicira_action(ctx, (const struct nx_action_header *) ia);
+ break;
+
+ default:
+ VLOG_DBG_RL(&rl, "unknown action type %"PRIu16, type);
+ break;
+ }
+ }
+}
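Before translating any actions, `do_xlate_actions()` drops the flow if the ingress port refuses it. Which port-config bit applies depends on whether the frame is addressed to the spanning-tree multicast address. The minimal sketch below isolates that choice. The bit values are assumed to match OpenFlow 1.0's `enum ofp_port_config`, and `drop_on_receive()` is a hypothetical helper, not a function in this file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed to match OpenFlow 1.0's enum ofp_port_config. */
#define OFPPC_NO_RECV     (1u << 2)
#define OFPPC_NO_RECV_STP (1u << 3)

/* Destination address of IEEE 802.1D spanning-tree BPDUs. */
static const uint8_t stp_dst[6] = { 0x01, 0x80, 0xc2, 0x00, 0x00, 0x00 };

/* Returns true if a frame arriving on a port with 'port_config' and
 * destined to 'dl_dst' must be dropped: BPDUs are governed by
 * OFPPC_NO_RECV_STP, all other frames by OFPPC_NO_RECV. */
static bool
drop_on_receive(uint32_t port_config, const uint8_t dl_dst[6])
{
    uint32_t bit = (memcmp(dl_dst, stp_dst, sizeof stp_dst) == 0
                    ? OFPPC_NO_RECV_STP : OFPPC_NO_RECV);
    return (port_config & bit) != 0;
}
```

The extra `config & (OFPPC_NO_RECV | OFPPC_NO_RECV_STP)` test in the original appears to be a fast path. It avoids the address comparison on ports that have neither bit set.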
+
+static int
+xlate_actions(const union ofp_action *in, size_t n_in,
+ const flow_t *flow, struct ofproto *ofproto,
+ const struct ofpbuf *packet,
+ struct odp_actions *out, tag_type *tags, bool *may_set_up_flow,
+ uint16_t *nf_output_iface)
+{
+ tag_type no_tags = 0;
+ struct action_xlate_ctx ctx;
+ COVERAGE_INC(ofproto_ofp2odp);
+ odp_actions_init(out);
+ ctx.flow = flow;
+ ctx.recurse = 0;
+ ctx.ofproto = ofproto;
+ ctx.packet = packet;
+ ctx.out = out;
+ ctx.tags = tags ? tags : &no_tags;
+ ctx.may_set_up_flow = true;
+ ctx.nf_output_iface = NF_OUT_DROP;
+ do_xlate_actions(in, n_in, &ctx);
+
+ /* Check with in-band control to see if we're allowed to set up this
+ * flow. */
+ if (!in_band_rule_check(ofproto->in_band, flow, out)) {
+ ctx.may_set_up_flow = false;
+ }
+
+ if (may_set_up_flow) {
+ *may_set_up_flow = ctx.may_set_up_flow;
+ }
+ if (nf_output_iface) {
+ *nf_output_iface = ctx.nf_output_iface;
+ }
+ if (odp_actions_overflow(out)) {
+ odp_actions_init(out);
+ return ofp_mkerr(OFPET_BAD_ACTION, OFPBAC_TOO_MANY);
+ }
+ return 0;
+}
+
+static int
+handle_packet_out(struct ofproto *p, struct ofconn *ofconn,
+ struct ofp_header *oh)
+{
+ struct ofp_packet_out *opo;
+ struct ofpbuf payload, *buffer;
+ struct odp_actions actions;
+ int n_actions;
+ uint16_t in_port;
+ flow_t flow;
+ int error;
+
+ error = check_ofp_packet_out(oh, &payload, &n_actions, p->max_ports);
+ if (error) {
+ return error;
+ }
+ opo = (struct ofp_packet_out *) oh;
+
+ COVERAGE_INC(ofproto_packet_out);
+ if (opo->buffer_id != htonl(UINT32_MAX)) {
+ error = pktbuf_retrieve(ofconn->pktbuf, ntohl(opo->buffer_id),
+ &buffer, &in_port);
+ if (error || !buffer) {
+ return error;
+ }
+ payload = *buffer;
+ } else {
+ buffer = NULL;
+ }
+
+ flow_extract(&payload, ofp_port_to_odp_port(ntohs(opo->in_port)), &flow);
+ error = xlate_actions((const union ofp_action *) opo->actions, n_actions,
+ &flow, p, &payload, &actions, NULL, NULL, NULL);
+ if (error) {
+ ofpbuf_delete(buffer);
+ return error;
+ }
+
+ dpif_execute(p->dpif, flow.in_port, actions.actions, actions.n_actions,
+ &payload);
+ ofpbuf_delete(buffer);
+
+ return 0;
+}
+
+static void
+update_port_config(struct ofproto *p, struct ofport *port,
+ uint32_t config, uint32_t mask)
+{
+ mask &= config ^ port->opp.config;
+ if (mask & OFPPC_PORT_DOWN) {
+ if (config & OFPPC_PORT_DOWN) {
+ netdev_turn_flags_off(port->netdev, NETDEV_UP, true);
+ } else {
+ netdev_turn_flags_on(port->netdev, NETDEV_UP, true);
+ }
+ }
+#define REVALIDATE_BITS (OFPPC_NO_RECV | OFPPC_NO_RECV_STP | OFPPC_NO_FWD)
+ if (mask & REVALIDATE_BITS) {
+ COVERAGE_INC(ofproto_costly_flags);
+ port->opp.config ^= mask & REVALIDATE_BITS;
+ p->need_revalidate = true;
+ }
+#undef REVALIDATE_BITS
+ if (mask & OFPPC_NO_FLOOD) {
+ port->opp.config ^= OFPPC_NO_FLOOD;
+ refresh_port_group(p, DP_GROUP_FLOOD);
+ }
+ if (mask & OFPPC_NO_PACKET_IN) {
+ port->opp.config ^= OFPPC_NO_PACKET_IN;
+ }
+}
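`update_port_config()` first narrows `mask` with `mask &= config ^ port->opp.config`, so that only bits that the request selects *and* that actually differ remain. It then flips each remaining bit with XOR. The sketch below shows that idiom in isolation. It ignores the per-bit side effects (bringing the netdev up or down, revalidation, refreshing the flood group) that the real function performs, and `apply_masked_config()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Copies the bits of 'config' selected by 'mask' into '*current'. Only
 * bits that really change are touched, and the set of changed bits is
 * returned so that a caller can react to each one individually. */
static uint32_t
apply_masked_config(uint32_t *current, uint32_t config, uint32_t mask)
{
    uint32_t changed = mask & (config ^ *current);
    *current ^= changed;
    return changed;
}
```

For example, with `*current == 0x5`, `config == 0x3` and `mask == 0x6`, bits 1 and 2 are taken from `config` and bit 0 is left alone. The result is `0x3`, and the returned change set is `0x6`.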
+
+static int
+handle_port_mod(struct ofproto *p, struct ofp_header *oh)
+{
+ const struct ofp_port_mod *opm;
+ struct ofport *port;
+ int error;
+
+ error = check_ofp_message(oh, OFPT_PORT_MOD, sizeof *opm);
+ if (error) {
+ return error;
+ }
+ opm = (struct ofp_port_mod *) oh;
+
+ port = port_array_get(&p->ports,
+ ofp_port_to_odp_port(ntohs(opm->port_no)));
+ if (!port) {
+ return ofp_mkerr(OFPET_PORT_MOD_FAILED, OFPPMFC_BAD_PORT);
+ } else if (memcmp(port->opp.hw_addr, opm->hw_addr, OFP_ETH_ALEN)) {
+ return ofp_mkerr(OFPET_PORT_MOD_FAILED, OFPPMFC_BAD_HW_ADDR);
+ } else {
+ update_port_config(p, port, ntohl(opm->config), ntohl(opm->mask));
+ if (opm->advertise) {
+ netdev_set_advertisements(port->netdev, ntohl(opm->advertise));
+ }
+ }
+ return 0;
+}
+
+static struct ofpbuf *
+make_stats_reply(uint32_t xid, uint16_t type, size_t body_len)
+{
+ struct ofp_stats_reply *osr;
+ struct ofpbuf *msg;
+
+ msg = ofpbuf_new(MIN(sizeof *osr + body_len, UINT16_MAX));
+ osr = put_openflow_xid(sizeof *osr, OFPT_STATS_REPLY, xid, msg);
+ osr->type = type;
+ osr->flags = htons(0);
+ return msg;
+}
+
+static struct ofpbuf *
+start_stats_reply(const struct ofp_stats_request *request, size_t body_len)
+{
+ return make_stats_reply(request->header.xid, request->type, body_len);
+}
+
+static void *
+append_stats_reply(size_t nbytes, struct ofconn *ofconn, struct ofpbuf **msgp)
+{
+ struct ofpbuf *msg = *msgp;
+ assert(nbytes <= UINT16_MAX - sizeof(struct ofp_stats_reply));
+ if (nbytes + msg->size > UINT16_MAX) {
+ struct ofp_stats_reply *reply = msg->data;
+ reply->flags = htons(OFPSF_REPLY_MORE);
+ *msgp = make_stats_reply(reply->header.xid, reply->type, nbytes);
+ queue_tx(msg, ofconn, ofconn->reply_counter);
+ }
+ return ofpbuf_put_uninit(*msgp, nbytes);
+}
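`append_stats_reply()` keeps each stats reply within OpenFlow's 16-bit length field. When the next record would not fit, it marks the current message with `OFPSF_REPLY_MORE`, queues it, and continues in a fresh message. The sketch below models only the sizes and flags. The 12-byte reply header and the `reply_parts` bookkeeping are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MSG_LEN   UINT16_MAX  /* OpenFlow length field is 16 bits. */
#define REPLY_HDR_LEN 12          /* Assumed size of a stats reply header. */
#define MAX_PARTS     16          /* Arbitrary bound for this sketch. */

/* Hypothetical model of a multipart reply: per-message length and flag. */
struct reply_parts {
    size_t len[MAX_PARTS];
    int more[MAX_PARTS];          /* 1 if "more replies follow" is set. */
    size_t n;
};

static void
start_reply(struct reply_parts *r)
{
    r->len[0] = REPLY_HDR_LEN;
    r->more[0] = 0;
    r->n = 1;
}

/* Reserves 'nbytes' of body. If that would overflow the current message,
 * flags it as "more" and opens a new one first, as append_stats_reply()
 * does. */
static void
append_reply(struct reply_parts *r, size_t nbytes)
{
    assert(nbytes <= MAX_MSG_LEN - REPLY_HDR_LEN);
    if (r->len[r->n - 1] + nbytes > MAX_MSG_LEN) {
        assert(r->n < MAX_PARTS);
        r->more[r->n - 1] = 1;
        r->len[r->n] = REPLY_HDR_LEN;
        r->more[r->n] = 0;
        r->n++;
    }
    r->len[r->n - 1] += nbytes;
}
```

With 1000-byte records, 65 of them fit in the first message (12 + 65000 bytes). The 66th record starts a second message, and only the last message in the series has the "more" flag clear.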
+
+static int
+handle_desc_stats_request(struct ofproto *p, struct ofconn *ofconn,
+ struct ofp_stats_request *request)
+{
+ struct ofp_desc_stats *ods;
+ struct ofpbuf *msg;
+
+ msg = start_stats_reply(request, sizeof *ods);
+ ods = append_stats_reply(sizeof *ods, ofconn, &msg);
+ strncpy(ods->mfr_desc, p->manufacturer, sizeof ods->mfr_desc);
+ strncpy(ods->hw_desc, p->hardware, sizeof ods->hw_desc);
+ strncpy(ods->sw_desc, p->software, sizeof ods->sw_desc);
+ strncpy(ods->serial_num, p->serial, sizeof ods->serial_num);
+ queue_tx(msg, ofconn, ofconn->reply_counter);
+
+ return 0;
+}
+
+static void
+count_subrules(struct cls_rule *cls_rule, void *n_subrules_)
+{
+ struct rule *rule = rule_from_cls_rule(cls_rule);
+ int *n_subrules = n_subrules_;
+
+ if (rule->super) {
+ (*n_subrules)++;
+ }
+}
+
+static int
+handle_table_stats_request(struct ofproto *p, struct ofconn *ofconn,
+ struct ofp_stats_request *request)
+{
+ struct ofp_table_stats *ots;
+ struct ofpbuf *msg;
+ struct odp_stats dpstats;
+ int n_exact, n_subrules, n_wild;
+
+ msg = start_stats_reply(request, sizeof *ots * 2);
+
+ /* Count rules of various kinds. */
+ n_subrules = 0;
+ classifier_for_each(&p->cls, CLS_INC_EXACT, count_subrules, &n_subrules);
+ n_exact = classifier_count_exact(&p->cls) - n_subrules;
+ n_wild = classifier_count(&p->cls) - classifier_count_exact(&p->cls);
+
+ /* Hash table. */
+ dpif_get_dp_stats(p->dpif, &dpstats);
+ ots = append_stats_reply(sizeof *ots, ofconn, &msg);
+ memset(ots, 0, sizeof *ots);
+ ots->table_id = TABLEID_HASH;
+ strcpy(ots->name, "hash");
+ ots->wildcards = htonl(0);
+ ots->max_entries = htonl(dpstats.max_capacity);
+ ots->active_count = htonl(n_exact);
+ ots->lookup_count = htonll(dpstats.n_frags + dpstats.n_hit +
+ dpstats.n_missed);
+ ots->matched_count = htonll(dpstats.n_hit); /* XXX */
+
+ /* Classifier table. */
+ ots = append_stats_reply(sizeof *ots, ofconn, &msg);
+ memset(ots, 0, sizeof *ots);
+ ots->table_id = TABLEID_CLASSIFIER;
+ strcpy(ots->name, "classifier");
+ ots->wildcards = htonl(OFPFW_ALL);
+ ots->max_entries = htonl(65536);
+ ots->active_count = htonl(n_wild);
+ ots->lookup_count = htonll(0); /* XXX */
+ ots->matched_count = htonll(0); /* XXX */
+
+ queue_tx(msg, ofconn, ofconn->reply_counter);
+ return 0;
+}
+
+static int
+handle_port_stats_request(struct ofproto *p, struct ofconn *ofconn,
+ struct ofp_stats_request *request)
+{
+ struct ofp_port_stats *ops;
+ struct ofpbuf *msg;
+ struct ofport *port;
+ unsigned int port_no;
+
+ msg = start_stats_reply(request, sizeof *ops * 16);
+ PORT_ARRAY_FOR_EACH (port, &p->ports, port_no) {
+ struct netdev_stats stats;
+
+ /* Intentionally ignore return value, since errors will set 'stats' to
+ * all-1s, which is correct for OpenFlow, and netdev_get_stats() will
+ * log errors. */
+ netdev_get_stats(port->netdev, &stats);
+
+ ops = append_stats_reply(sizeof *ops, ofconn, &msg);
+ ops->port_no = htons(odp_port_to_ofp_port(port_no));
+ memset(ops->pad, 0, sizeof ops->pad);
+ ops->rx_packets = htonll(stats.rx_packets);
+ ops->tx_packets = htonll(stats.tx_packets);
+ ops->rx_bytes = htonll(stats.rx_bytes);
+ ops->tx_bytes = htonll(stats.tx_bytes);
+ ops->rx_dropped = htonll(stats.rx_dropped);
+ ops->tx_dropped = htonll(stats.tx_dropped);
+ ops->rx_errors = htonll(stats.rx_errors);
+ ops->tx_errors = htonll(stats.tx_errors);
+ ops->rx_frame_err = htonll(stats.rx_frame_errors);
+ ops->rx_over_err = htonll(stats.rx_over_errors);
+ ops->rx_crc_err = htonll(stats.rx_crc_errors);
+ ops->collisions = htonll(stats.collisions);
+ }
+
+ queue_tx(msg, ofconn, ofconn->reply_counter);
+ return 0;
+}
+
+struct flow_stats_cbdata {
+ struct ofproto *ofproto;
+ struct ofconn *ofconn;
+ uint16_t out_port;
+ struct ofpbuf *msg;
+};
+
+static void
+query_stats(struct ofproto *p, struct rule *rule,
+ uint64_t *packet_countp, uint64_t *byte_countp)
+{
+ uint64_t packet_count, byte_count;
+ struct rule *subrule;
+ struct odp_flow *odp_flows;
+ size_t n_odp_flows;
+
+ packet_count = rule->packet_count;
+ byte_count = rule->byte_count;
+
+ n_odp_flows = rule->cr.wc.wildcards ? list_size(&rule->list) : 1;
+ odp_flows = xcalloc(1, n_odp_flows * sizeof *odp_flows);
+ if (rule->cr.wc.wildcards) {
+ size_t i = 0;
+ LIST_FOR_EACH (subrule, struct rule, list, &rule->list) {
+ odp_flows[i++].key = subrule->cr.flow;
+ packet_count += subrule->packet_count;
+ byte_count += subrule->byte_count;
+ }
+ } else {
+ odp_flows[0].key = rule->cr.flow;
+ }
+
+ /* Add the datapath's current counters for each flow on top of the
+ * historical totals accumulated above. */
+ if (!dpif_flow_get_multiple(p->dpif, odp_flows, n_odp_flows)) {
+ size_t i;
+ for (i = 0; i < n_odp_flows; i++) {
+ struct odp_flow *odp_flow = &odp_flows[i];
+ packet_count += odp_flow->stats.n_packets;
+ byte_count += odp_flow->stats.n_bytes;
+ }
+ }
+ free(odp_flows);
+
+ *packet_countp = packet_count;
+ *byte_countp = byte_count;
+}
+
+static void
+flow_stats_cb(struct cls_rule *rule_, void *cbdata_)
+{
+ struct rule *rule = rule_from_cls_rule(rule_);
+ struct flow_stats_cbdata *cbdata = cbdata_;
+ struct ofp_flow_stats *ofs;
+ uint64_t packet_count, byte_count;
+ size_t act_len, len;
+
+ if (rule_is_hidden(rule) || !rule_has_out_port(rule, cbdata->out_port)) {
+ return;
+ }
+
+ act_len = sizeof *rule->actions * rule->n_actions;
+ len = offsetof(struct ofp_flow_stats, actions) + act_len;
+
+ query_stats(cbdata->ofproto, rule, &packet_count, &byte_count);
+
+ ofs = append_stats_reply(len, cbdata->ofconn, &cbdata->msg);
+ ofs->length = htons(len);
+ ofs->table_id = rule->cr.wc.wildcards ? TABLEID_CLASSIFIER : TABLEID_HASH;
+ ofs->pad = 0;
+ flow_to_match(&rule->cr.flow, rule->cr.wc.wildcards, &ofs->match);
+ ofs->duration = htonl((time_msec() - rule->created) / 1000);
+ ofs->priority = htons(rule->cr.priority);
+ ofs->idle_timeout = htons(rule->idle_timeout);
+ ofs->hard_timeout = htons(rule->hard_timeout);
+ memset(ofs->pad2, 0, sizeof ofs->pad2);
+ ofs->packet_count = htonll(packet_count);
+ ofs->byte_count = htonll(byte_count);
+ memcpy(ofs->actions, rule->actions, act_len);
+}
+
+static int
+table_id_to_include(uint8_t table_id)
+{
+ return (table_id == TABLEID_HASH ? CLS_INC_EXACT
+ : table_id == TABLEID_CLASSIFIER ? CLS_INC_WILD
+ : table_id == 0xff ? CLS_INC_ALL
+ : 0);
+}
+
+static int
+handle_flow_stats_request(struct ofproto *p, struct ofconn *ofconn,
+ const struct ofp_stats_request *osr,
+ size_t arg_size)
+{
+ struct ofp_flow_stats_request *fsr;
+ struct flow_stats_cbdata cbdata;
+ struct cls_rule target;
+
+ if (arg_size != sizeof *fsr) {
+ return ofp_mkerr(OFPET_BAD_REQUEST, OFPBRC_BAD_LENGTH);
+ }
+ fsr = (struct ofp_flow_stats_request *) osr->body;
+
+ COVERAGE_INC(ofproto_flows_req);
+ cbdata.ofproto = p;
+ cbdata.ofconn = ofconn;
+ cbdata.out_port = fsr->out_port;
+ cbdata.msg = start_stats_reply(osr, 1024);
+ cls_rule_from_match(&target, &fsr->match, 0);
+ classifier_for_each_match(&p->cls, &target,
+ table_id_to_include(fsr->table_id),
+ flow_stats_cb, &cbdata);
+ queue_tx(cbdata.msg, ofconn, ofconn->reply_counter);
+ return 0;
+}
+
+struct flow_stats_ds_cbdata {
+ struct ofproto *ofproto;
+ struct ds *results;
+};
+
+static void
+flow_stats_ds_cb(struct cls_rule *rule_, void *cbdata_)
+{
+ struct rule *rule = rule_from_cls_rule(rule_);
+ struct flow_stats_ds_cbdata *cbdata = cbdata_;
+ struct ds *results = cbdata->results;
+ struct ofp_match match;
+ uint64_t packet_count, byte_count;
+ size_t act_len = sizeof *rule->actions * rule->n_actions;
+
+ /* Don't report on subrules. */
+ if (rule->super != NULL) {
+ return;
+ }
+
+ query_stats(cbdata->ofproto, rule, &packet_count, &byte_count);
+ flow_to_ovs_match(&rule->cr.flow, rule->cr.wc.wildcards, &match);
+
+ ds_put_format(results, "duration=%llds, ",
+ (time_msec() - rule->created) / 1000);
+ ds_put_format(results, "priority=%u, ", rule->cr.priority);
+ ds_put_format(results, "n_packets=%"PRIu64", ", packet_count);
+ ds_put_format(results, "n_bytes=%"PRIu64", ", byte_count);
+ ofp_print_match(results, &match, true);
+ ofp_print_actions(results, &rule->actions->header, act_len);
+ ds_put_cstr(results, "\n");
+}
+
+/* Adds a pretty-printed description of all flows to 'results', including
+ * those marked hidden by secchan (e.g., by in-band control). */
+void
+ofproto_get_all_flows(struct ofproto *p, struct ds *results)
+{
+ struct ofp_match match;
+ struct cls_rule target;
+ struct flow_stats_ds_cbdata cbdata;
+
+ memset(&match, 0, sizeof match);
+ match.wildcards = htonl(OFPFW_ALL);
+
+ cbdata.ofproto = p;
+ cbdata.results = results;
+
+ cls_rule_from_match(&target, &match, 0);
+ classifier_for_each_match(&p->cls, &target, CLS_INC_ALL,
+ flow_stats_ds_cb, &cbdata);
+}
+
+struct aggregate_stats_cbdata {
+ struct ofproto *ofproto;
+ uint16_t out_port;
+ uint64_t packet_count;
+ uint64_t byte_count;
+ uint32_t n_flows;
+};
+
+static void
+aggregate_stats_cb(struct cls_rule *rule_, void *cbdata_)
+{
+ struct rule *rule = rule_from_cls_rule(rule_);
+ struct aggregate_stats_cbdata *cbdata = cbdata_;
+ uint64_t packet_count, byte_count;
+
+ if (rule_is_hidden(rule) || !rule_has_out_port(rule, cbdata->out_port)) {
+ return;
+ }
+
+ query_stats(cbdata->ofproto, rule, &packet_count, &byte_count);
+
+ cbdata->packet_count += packet_count;
+ cbdata->byte_count += byte_count;
+ cbdata->n_flows++;
+}
+
+static int
+handle_aggregate_stats_request(struct ofproto *p, struct ofconn *ofconn,
+ const struct ofp_stats_request *osr,
+ size_t arg_size)
+{
+ struct ofp_aggregate_stats_request *asr;
+ struct ofp_aggregate_stats_reply *reply;
+ struct aggregate_stats_cbdata cbdata;
+ struct cls_rule target;
+ struct ofpbuf *msg;
+
+ if (arg_size != sizeof *asr) {
+ return ofp_mkerr(OFPET_BAD_REQUEST, OFPBRC_BAD_LENGTH);
+ }
+ asr = (struct ofp_aggregate_stats_request *) osr->body;
+
+ COVERAGE_INC(ofproto_agg_request);
+ cbdata.ofproto = p;
+ cbdata.out_port = asr->out_port;
+ cbdata.packet_count = 0;
+ cbdata.byte_count = 0;
+ cbdata.n_flows = 0;
+ cls_rule_from_match(&target, &asr->match, 0);
+ classifier_for_each_match(&p->cls, &target,
+ table_id_to_include(asr->table_id),
+ aggregate_stats_cb, &cbdata);
+
+ msg = start_stats_reply(osr, sizeof *reply);
+ reply = append_stats_reply(sizeof *reply, ofconn, &msg);
+ reply->flow_count = htonl(cbdata.n_flows);
+ reply->packet_count = htonll(cbdata.packet_count);
+ reply->byte_count = htonll(cbdata.byte_count);
+ queue_tx(msg, ofconn, ofconn->reply_counter);
+ return 0;
+}
+
+static int
+handle_stats_request(struct ofproto *p, struct ofconn *ofconn,
+ struct ofp_header *oh)
+{
+ struct ofp_stats_request *osr;
+ size_t arg_size;
+ int error;
+
+ error = check_ofp_message_array(oh, OFPT_STATS_REQUEST, sizeof *osr,
+ 1, &arg_size);
+ if (error) {
+ return error;
+ }
+ osr = (struct ofp_stats_request *) oh;
+
+ switch (ntohs(osr->type)) {
+ case OFPST_DESC:
+ return handle_desc_stats_request(p, ofconn, osr);
+
+ case OFPST_FLOW:
+ return handle_flow_stats_request(p, ofconn, osr, arg_size);
+
+ case OFPST_AGGREGATE:
+ return handle_aggregate_stats_request(p, ofconn, osr, arg_size);
+
+ case OFPST_TABLE:
+ return handle_table_stats_request(p, ofconn, osr);
+
+ case OFPST_PORT:
+ return handle_port_stats_request(p, ofconn, osr);
+
+ case OFPST_VENDOR:
+ return ofp_mkerr(OFPET_BAD_REQUEST, OFPBRC_BAD_VENDOR);
+
+ default:
+ return ofp_mkerr(OFPET_BAD_REQUEST, OFPBRC_BAD_STAT);
+ }
+}
+
+static long long int
+msec_from_nsec(uint64_t sec, uint32_t nsec)
+{
+ return !sec ? 0 : sec * 1000 + nsec / 1000000;
+}
+
+static void
+update_time(struct ofproto *ofproto, struct rule *rule,
+ const struct odp_flow_stats *stats)
+{
+ long long int used = msec_from_nsec(stats->used_sec, stats->used_nsec);
+ if (used > rule->used) {
+ rule->used = used;
+ netflow_flow_update_time(ofproto->netflow, &rule->nf_flow, used);
+ }
+}
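`msec_from_nsec()` treats a zero `sec` as "never used" rather than as a time near the epoch. `update_time()` only moves `rule->used` forward, so a stale datapath sample cannot make a rule look idle for longer than it really is. The sketch below restates both behaviors. `advance_used()` is a hypothetical stand-in for the comparison inside `update_time()`, without the NetFlow call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Converts a datapath "last used" timestamp to milliseconds; 0 means the
 * flow was never used. Sub-millisecond nanoseconds are truncated. */
static long long int
msec_from_nsec(uint64_t sec, uint32_t nsec)
{
    return !sec ? 0 : (long long int) (sec * 1000 + nsec / 1000000);
}

/* Advances '*used' to 'candidate' only if it is newer, returning true if
 * the timestamp changed. */
static bool
advance_used(long long int *used, long long int candidate)
{
    if (candidate > *used) {
        *used = candidate;
        return true;
    }
    return false;
}
```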
+
+static void
+update_stats(struct ofproto *ofproto, struct rule *rule,
+ const struct odp_flow_stats *stats)
+{
+ if (stats->n_packets) {
+ update_time(ofproto, rule, stats);
+ rule->packet_count += stats->n_packets;
+ rule->byte_count += stats->n_bytes;
+ netflow_flow_update_flags(&rule->nf_flow, stats->ip_tos,
+ stats->tcp_flags);
+ }
+}
+
+static int
+add_flow(struct ofproto *p, struct ofconn *ofconn,
+ struct ofp_flow_mod *ofm, size_t n_actions)
+{
+ struct ofpbuf *packet;
+ struct rule *rule;
+ uint16_t in_port = 0; /* Only meaningful if 'packet' is nonnull. */
+ int error;
+
+ rule = rule_create(p, NULL, (const union ofp_action *) ofm->actions,
+ n_actions, ntohs(ofm->idle_timeout),
+ ntohs(ofm->hard_timeout));
+ cls_rule_from_match(&rule->cr, &ofm->match, ntohs(ofm->priority));
+
+ packet = NULL;
+ error = 0;
+ if (ofm->buffer_id != htonl(UINT32_MAX)) {
+ error = pktbuf_retrieve(ofconn->pktbuf, ntohl(ofm->buffer_id),
+ &packet, &in_port);
+ }
+
+ rule_insert(p, rule, packet, in_port);
+ ofpbuf_delete(packet);
+ return error;
+}
+
+static int
+modify_flow(struct ofproto *p, const struct ofp_flow_mod *ofm,
+ size_t n_actions, uint16_t command, struct rule *rule)
+{
+ if (rule_is_hidden(rule)) {
+ return 0;
+ }
+
+ if (command == OFPFC_DELETE) {
+ rule_remove(p, rule);
+ } else {
+ size_t actions_len = n_actions * sizeof *rule->actions;
+
+ if (n_actions == rule->n_actions
+ && !memcmp(ofm->actions, rule->actions, actions_len))
+ {
+ return 0;
+ }
+
+ free(rule->actions);
+ rule->actions = xmemdup(ofm->actions, actions_len);
+ rule->n_actions = n_actions;
+
+ if (rule->cr.wc.wildcards) {
+ COVERAGE_INC(ofproto_mod_wc_flow);
+ p->need_revalidate = true;
+ } else {
+ rule_update_actions(p, rule);
+ }
+ }
+
+ return 0;
+}
+
+static int
+modify_flows_strict(struct ofproto *p, const struct ofp_flow_mod *ofm,
+ size_t n_actions, uint16_t command)
+{
+ struct rule *rule;
+ uint32_t wildcards;
+ flow_t flow;
+
+ flow_from_match(&flow, &wildcards, &ofm->match);
+ rule = rule_from_cls_rule(classifier_find_rule_exactly(
+ &p->cls, &flow, wildcards,
+ ntohs(ofm->priority)));
+
+ if (rule) {
+ if (command == OFPFC_DELETE
+ && ofm->out_port != htons(OFPP_NONE)
+ && !rule_has_out_port(rule, ofm->out_port)) {
+ return 0;
+ }
+
+ modify_flow(p, ofm, n_actions, command, rule);
+ }
+ return 0;
+}
+
+struct modify_flows_cbdata {
+ struct ofproto *ofproto;
+ const struct ofp_flow_mod *ofm;
+ uint16_t out_port;
+ size_t n_actions;
+ uint16_t command;
+};
+
+static void
+modify_flows_cb(struct cls_rule *rule_, void *cbdata_)
+{
+ struct rule *rule = rule_from_cls_rule(rule_);
+ struct modify_flows_cbdata *cbdata = cbdata_;
+
+ if (cbdata->out_port != htons(OFPP_NONE)
+ && !rule_has_out_port(rule, cbdata->out_port)) {
+ return;
+ }
+
+ modify_flow(cbdata->ofproto, cbdata->ofm, cbdata->n_actions,
+ cbdata->command, rule);
+}
+
+static int
+modify_flows_loose(struct ofproto *p, const struct ofp_flow_mod *ofm,
+ size_t n_actions, uint16_t command)
+{
+ struct modify_flows_cbdata cbdata;
+ struct cls_rule target;
+
+ cbdata.ofproto = p;
+ cbdata.ofm = ofm;
+ cbdata.out_port = (command == OFPFC_DELETE ? ofm->out_port
+ : htons(OFPP_NONE));
+ cbdata.n_actions = n_actions;
+ cbdata.command = command;
+
+ cls_rule_from_match(&target, &ofm->match, 0);
+
+ classifier_for_each_match(&p->cls, &target, CLS_INC_ALL,
+ modify_flows_cb, &cbdata);
+ return 0;
+}
+
+static int
+handle_flow_mod(struct ofproto *p, struct ofconn *ofconn,
+ struct ofp_flow_mod *ofm)
+{
+ size_t n_actions;
+ int error;
+
+ error = check_ofp_message_array(&ofm->header, OFPT_FLOW_MOD, sizeof *ofm,
+ sizeof *ofm->actions, &n_actions);
+ if (error) {
+ return error;
+ }
+
+ normalize_match(&ofm->match);
+ if (!ofm->match.wildcards) {
+ ofm->priority = htons(UINT16_MAX);
+ }
+
+ error = validate_actions((const union ofp_action *) ofm->actions,
+ n_actions, p->max_ports);
+ if (error) {
+ return error;
+ }
+
+ switch (ntohs(ofm->command)) {
+ case OFPFC_ADD:
+ return add_flow(p, ofconn, ofm, n_actions);
+
+ case OFPFC_MODIFY:
+ return modify_flows_loose(p, ofm, n_actions, OFPFC_MODIFY);
+
+ case OFPFC_MODIFY_STRICT:
+ return modify_flows_strict(p, ofm, n_actions, OFPFC_MODIFY);
+
+ case OFPFC_DELETE:
+ return modify_flows_loose(p, ofm, n_actions, OFPFC_DELETE);
+
+ case OFPFC_DELETE_STRICT:
+ return modify_flows_strict(p, ofm, n_actions, OFPFC_DELETE);
+
+ default:
+ return ofp_mkerr(OFPET_FLOW_MOD_FAILED, OFPFMFC_BAD_COMMAND);
+ }
+}
+
+static void
+send_capability_reply(struct ofproto *p, struct ofconn *ofconn, uint32_t xid)
+{
+ struct ofmp_capability_reply *ocr;
+ struct ofpbuf *b;
+ char capabilities[] = "com.nicira.mgmt.manager=false\n";
+
+ ocr = make_openflow_xid(sizeof(*ocr), OFPT_VENDOR, xid, &b);
+ ocr->header.header.vendor = htonl(NX_VENDOR_ID);
+ ocr->header.header.subtype = htonl(NXT_MGMT);
+ ocr->header.type = htons(OFMPT_CAPABILITY_REPLY);
+
+ ocr->format = htonl(OFMPCOF_SIMPLE);
+ ocr->mgmt_id = htonll(p->mgmt_id);
+
+ ofpbuf_put(b, capabilities, strlen(capabilities));
+
+ queue_tx(b, ofconn, ofconn->reply_counter);
+}
+
+static int
+handle_ofmp(struct ofproto *p, struct ofconn *ofconn,
+ struct ofmp_header *ofmph)
+{
+ size_t msg_len = ntohs(ofmph->header.header.length);
+ if (msg_len < sizeof(*ofmph)) {
+ VLOG_WARN_RL(&rl, "dropping short management message: %zu", msg_len);
+ return ofp_mkerr(OFPET_BAD_REQUEST, OFPBRC_BAD_LENGTH);
+ }
+
+ if (ofmph->type == htons(OFMPT_CAPABILITY_REQUEST)) {
+ struct ofmp_capability_request *ofmpcr;
+
+ if (msg_len < sizeof(struct ofmp_capability_request)) {
+ VLOG_WARN_RL(&rl, "dropping short capability request: %zu",
+ msg_len);
+ return ofp_mkerr(OFPET_BAD_REQUEST, OFPBRC_BAD_LENGTH);
+ }
+
+ ofmpcr = (struct ofmp_capability_request *)ofmph;
+ if (ofmpcr->format != htonl(OFMPCAF_SIMPLE)) {
+ /* xxx Find a better type than bad subtype */
+ return ofp_mkerr(OFPET_BAD_REQUEST, OFPBRC_BAD_SUBTYPE);
+ }
+
+ send_capability_reply(p, ofconn, ofmph->header.header.xid);
+ return 0;
+ } else {
+ return ofp_mkerr(OFPET_BAD_REQUEST, OFPBRC_BAD_SUBTYPE);
+ }
+}
+
+static int
+handle_vendor(struct ofproto *p, struct ofconn *ofconn, void *msg)
+{
+ struct ofp_vendor_header *ovh = msg;
+ struct nicira_header *nh;
+
+ if (ntohs(ovh->header.length) < sizeof(struct ofp_vendor_header)) {
+ return ofp_mkerr(OFPET_BAD_REQUEST, OFPBRC_BAD_LENGTH);
+ }
+ if (ovh->vendor != htonl(NX_VENDOR_ID)) {
+ return ofp_mkerr(OFPET_BAD_REQUEST, OFPBRC_BAD_VENDOR);
+ }
+ if (ntohs(ovh->header.length) < sizeof(struct nicira_header)) {
+ return ofp_mkerr(OFPET_BAD_REQUEST, OFPBRC_BAD_LENGTH);
+ }
+
+ nh = msg;
+ switch (ntohl(nh->subtype)) {
+ case NXT_STATUS_REQUEST:
+ return switch_status_handle_request(p->switch_status, ofconn->rconn,
+ msg);
+
+ case NXT_ACT_SET_CONFIG:
+ return ofp_mkerr(OFPET_BAD_REQUEST, OFPBRC_BAD_SUBTYPE); /* XXX */
+
+ case NXT_ACT_GET_CONFIG:
+ return ofp_mkerr(OFPET_BAD_REQUEST, OFPBRC_BAD_SUBTYPE); /* XXX */
+
+ case NXT_COMMAND_REQUEST:
+ if (p->executer) {
+ return executer_handle_request(p->executer, ofconn->rconn, msg);
+ }
+ break;
+
+ case NXT_MGMT:
+ return handle_ofmp(p, ofconn, msg);
+ }
+
+ return ofp_mkerr(OFPET_BAD_REQUEST, OFPBRC_BAD_SUBTYPE);
+}
+
+static void
+handle_openflow(struct ofconn *ofconn, struct ofproto *p,
+ struct ofpbuf *ofp_msg)
+{
+ struct ofp_header *oh = ofp_msg->data;
+ int error;
+
+ COVERAGE_INC(ofproto_recv_openflow);
+ switch (oh->type) {
+ case OFPT_ECHO_REQUEST:
+ error = handle_echo_request(ofconn, oh);
+ break;
+
+ case OFPT_ECHO_REPLY:
+ error = 0;
+ break;
+
+ case OFPT_FEATURES_REQUEST:
+ error = handle_features_request(p, ofconn, oh);
+ break;
+
+ case OFPT_GET_CONFIG_REQUEST:
+ error = handle_get_config_request(p, ofconn, oh);
+ break;
+
+ case OFPT_SET_CONFIG:
+ error = handle_set_config(p, ofconn, ofp_msg->data);
+ break;
+
+ case OFPT_PACKET_OUT:
+ error = handle_packet_out(p, ofconn, ofp_msg->data);
+ break;
+
+ case OFPT_PORT_MOD:
+ error = handle_port_mod(p, oh);
+ break;
+
+ case OFPT_FLOW_MOD:
+ error = handle_flow_mod(p, ofconn, ofp_msg->data);
+ break;
+
+ case OFPT_STATS_REQUEST:
+ error = handle_stats_request(p, ofconn, oh);
+ break;
+
+ case OFPT_VENDOR:
+ error = handle_vendor(p, ofconn, ofp_msg->data);
+ break;
+
+ default:
+ if (VLOG_IS_DBG_ENABLED()) {
+ char *s = ofp_to_string(oh, ntohs(oh->length), 2);
+ VLOG_DBG_RL(&rl, "OpenFlow message ignored: %s", s);
+ free(s);
+ }
+ error = ofp_mkerr(OFPET_BAD_REQUEST, OFPBRC_BAD_TYPE);
+ break;
+ }
+
+ if (error) {
+ send_error_oh(ofconn, ofp_msg->data, error);
+ }
+}
+\f
+static void
+handle_odp_msg(struct ofproto *p, struct ofpbuf *packet)
+{
+ struct odp_msg *msg = packet->data;
+ uint16_t in_port = odp_port_to_ofp_port(msg->port);
+ struct rule *rule;
+ struct ofpbuf payload;
+ flow_t flow;
+
+ /* Handle controller actions. */
+ if (msg->type == _ODPL_ACTION_NR) {
+ COVERAGE_INC(ofproto_ctlr_action);
+ pinsched_send(p->action_sched, in_port, packet,
+ send_packet_in_action, p);
+ return;
+ }
+
+ payload.data = msg + 1;
+ payload.size = msg->length - sizeof *msg;
+ flow_extract(&payload, msg->port, &flow);
+
+ /* Check with in-band control to see if this packet should be sent
+ * to the local port regardless of the flow table. */
+ if (in_band_msg_in_hook(p->in_band, &flow, &payload)) {
+ union odp_action action;
+
+ memset(&action, 0, sizeof(action));
+ action.output.type = ODPAT_OUTPUT;
+ action.output.port = ODPP_LOCAL;
+ dpif_execute(p->dpif, flow.in_port, &action, 1, &payload);
+ }
+
+ rule = lookup_valid_rule(p, &flow);
+ if (!rule) {
+ /* Don't send a packet-in if OFPPC_NO_PACKET_IN asserted. */
+ struct ofport *port = port_array_get(&p->ports, msg->port);
+ if (port) {
+ if (port->opp.config & OFPPC_NO_PACKET_IN) {
+ COVERAGE_INC(ofproto_no_packet_in);
+ /* XXX install 'drop' flow entry */
+ ofpbuf_delete(packet);
+ return;
+ }
+ } else {
+ VLOG_WARN_RL(&rl, "packet-in on unknown port %"PRIu16, msg->port);
+ }
+
+ COVERAGE_INC(ofproto_packet_in);
+ pinsched_send(p->miss_sched, in_port, packet, send_packet_in_miss, p);
+ return;
+ }
+
+ if (rule->cr.wc.wildcards) {
+ rule = rule_create_subrule(p, rule, &flow);
+ rule_make_actions(p, rule, packet);
+ } else {
+ if (!rule->may_install) {
+ /* The rule is not installable, that is, we need to process every
+ * packet, so process the current packet and set its actions into
+ * 'rule'. */
+ rule_make_actions(p, rule, packet);
+ } else {
+ /* XXX revalidate rule if it needs it */
+ }
+ }
+
+ rule_execute(p, rule, &payload, &flow);
+ rule_reinstall(p, rule);
+
+ if (rule->super && rule->super->cr.priority == FAIL_OPEN_PRIORITY
+ && rconn_is_connected(p->controller->rconn)) {
+ /*
+ * Extra-special case for fail-open mode.
+ *
+ * We are in fail-open mode and the packet matched the fail-open rule,
+ * but we are connected to a controller too. We should send the packet
+ * up to the controller in the hope that it will try to set up a flow
+ * and thereby allow us to exit fail-open.
+ *
+ * See the top-level comment in fail-open.c for more information.
+ */
+ pinsched_send(p->miss_sched, in_port, packet, send_packet_in_miss, p);
+ } else {
+ ofpbuf_delete(packet);
+ }
+}
+\f
+static void
+revalidate_cb(struct cls_rule *sub_, void *cbdata_)
+{
+ struct rule *sub = rule_from_cls_rule(sub_);
+ struct revalidate_cbdata *cbdata = cbdata_;
+
+ if (cbdata->revalidate_all
+ || (cbdata->revalidate_subrules && sub->super)
+ || (tag_set_intersects(&cbdata->revalidate_set, sub->tags))) {
+ revalidate_rule(cbdata->ofproto, sub);
+ }
+}
+
+static bool
+revalidate_rule(struct ofproto *p, struct rule *rule)
+{
+ const flow_t *flow = &rule->cr.flow;
+
+ COVERAGE_INC(ofproto_revalidate_rule);
+ if (rule->super) {
+ struct rule *super;
+ super = rule_from_cls_rule(classifier_lookup_wild(&p->cls, flow));
+ if (!super) {
+ rule_remove(p, rule);
+ return false;
+ } else if (super != rule->super) {
+ COVERAGE_INC(ofproto_revalidate_moved);
+ list_remove(&rule->list);
+ list_push_back(&super->list, &rule->list);
+ rule->super = super;
+ rule->hard_timeout = super->hard_timeout;
+ rule->idle_timeout = super->idle_timeout;
+ rule->created = super->created;
+ rule->used = 0;
+ }
+ }
+
+ rule_update_actions(p, rule);
+ return true;
+}
+
+static struct ofpbuf *
+compose_flow_exp(const struct rule *rule, long long int now, uint8_t reason)
+{
+ struct ofp_flow_expired *ofe;
+ struct ofpbuf *buf;
+
+ ofe = make_openflow(sizeof *ofe, OFPT_FLOW_EXPIRED, &buf);
+ flow_to_match(&rule->cr.flow, rule->cr.wc.wildcards, &ofe->match);
+ ofe->priority = htons(rule->cr.priority);
+ ofe->reason = reason;
+ ofe->duration = htonl((now - rule->created) / 1000);
+ ofe->packet_count = htonll(rule->packet_count);
+ ofe->byte_count = htonll(rule->byte_count);
+
+ return buf;
+}
+
+static void
+send_flow_exp(struct ofproto *p, struct rule *rule,
+ long long int now, uint8_t reason)
+{
+ struct ofconn *ofconn;
+ struct ofconn *prev;
+ struct ofpbuf *buf = NULL;
+
+ /* We limit the maximum number of queued flow expirations by accounting
+ * them under the counter for replies. That works because preventing
+ * OpenFlow requests from being processed also prevents new flows from
+ * being added (and expiring). (It also prevents processing OpenFlow
+ * requests that would not add new flows, so it is imperfect.) */
+
+ prev = NULL;
+ LIST_FOR_EACH (ofconn, struct ofconn, node, &p->all_conns) {
+ if (ofconn->send_flow_exp && rconn_is_connected(ofconn->rconn)) {
+ if (prev) {
+ queue_tx(ofpbuf_clone(buf), prev, prev->reply_counter);
+ } else {
+ buf = compose_flow_exp(rule, now, reason);
+ }
+ prev = ofconn;
+ }
+ }
+ if (prev) {
+ queue_tx(buf, prev, prev->reply_counter);
+ }
+}
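`send_flow_exp()` delivers one copy of an expiration message to every interested connection. It composes the message lazily (only if at least one connection wants it), clones it for every recipient except the last, and hands the original to the last one. That way no copy is made or freed needlessly. The sketch below shows the same pattern with heap-allocated strings. The `conn` type, the `wants` flag (standing in for `send_flow_exp && rconn_is_connected()`), and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical connection that can receive at most one message. */
struct conn {
    int wants;
    char *inbox;
};

static int n_composed;   /* Counts how often the message was built. */

static char *
compose_msg(void)
{
    static const char text[] = "expired";
    char *s = malloc(sizeof text);
    n_composed++;
    if (s) {
        memcpy(s, text, sizeof text);
    }
    return s;
}

static char *
clone_msg(const char *msg)
{
    size_t len = strlen(msg) + 1;
    char *s = malloc(len);
    if (s) {
        memcpy(s, msg, len);
    }
    return s;
}

/* Gives each interested connection its own copy; the last one receives
 * the original buffer instead of a clone. */
static void
broadcast(struct conn *conns, size_t n)
{
    struct conn *prev = NULL;
    char *msg = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        if (conns[i].wants) {
            if (prev) {
                prev->inbox = clone_msg(msg);
            } else {
                msg = compose_msg();
            }
            prev = &conns[i];
        }
    }
    if (prev) {
        prev->inbox = msg;
    }
}
```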
+
+static void
+uninstall_idle_flow(struct ofproto *ofproto, struct rule *rule)
+{
+ assert(rule->installed);
+ assert(!rule->cr.wc.wildcards);
+
+ if (rule->super) {
+ rule_remove(ofproto, rule);
+ } else {
+ rule_uninstall(ofproto, rule);
+ }
+}
+
+static void
+expire_rule(struct cls_rule *cls_rule, void *p_)
+{
+ struct ofproto *p = p_;
+ struct rule *rule = rule_from_cls_rule(cls_rule);
+ long long int hard_expire, idle_expire, expire, now;
+
+ hard_expire = (rule->hard_timeout
+ ? rule->created + rule->hard_timeout * 1000
+ : LLONG_MAX);
+ idle_expire = (rule->idle_timeout
+ && (rule->super || list_is_empty(&rule->list))
+ ? rule->used + rule->idle_timeout * 1000
+ : LLONG_MAX);
+ expire = MIN(hard_expire, idle_expire);
+
+ now = time_msec();
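+ if (now < expire) {
+ /* Not expired yet. Take a datapath flow that has been idle for 5
+ * seconds out of the datapath; otherwise, give NetFlow a chance to
+ * emit an active-timeout record for an exact-match rule. */
+ if (rule->installed && now >= rule->used + 5000) {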
+ uninstall_idle_flow(p, rule);
+ } else if (!rule->cr.wc.wildcards) {
+ active_timeout(p, rule);
+ }
+
+ return;
+ }
+
+ COVERAGE_INC(ofproto_expired);
+ if (rule->cr.wc.wildcards) {
+ /* Update stats. (This code will be a no-op if the rule expired
+ * due to an idle timeout, because in that case the rule has no
+ * subrules left.) */
+ struct rule *subrule, *next;
+ LIST_FOR_EACH_SAFE (subrule, next, struct rule, list, &rule->list) {
+ rule_remove(p, subrule);
+ }
+ }
+
+ send_flow_exp(p, rule, now,
+ (now >= hard_expire
+ ? OFPER_HARD_TIMEOUT : OFPER_IDLE_TIMEOUT));
+ rule_remove(p, rule);
+}
+
+static void
+active_timeout(struct ofproto *ofproto, struct rule *rule)
+{
+ if (ofproto->netflow && !is_controller_rule(rule) &&
+ netflow_active_timeout_expired(ofproto->netflow, &rule->nf_flow)) {
+ struct ofexpired expired;
+ struct odp_flow odp_flow;
+
+ /* Get updated flow stats. */
+ memset(&odp_flow, 0, sizeof odp_flow);
+ if (rule->installed) {
+ odp_flow.key = rule->cr.flow;
+ odp_flow.flags = ODPFF_ZERO_TCP_FLAGS;
+ dpif_flow_get(ofproto->dpif, &odp_flow);
+
+ if (odp_flow.stats.n_packets) {
+ update_time(ofproto, rule, &odp_flow.stats);
+ netflow_flow_update_flags(&rule->nf_flow, odp_flow.stats.ip_tos,
+ odp_flow.stats.tcp_flags);
+ }
+ }
+
+ expired.flow = rule->cr.flow;
+ expired.packet_count = rule->packet_count +
+ odp_flow.stats.n_packets;
+ expired.byte_count = rule->byte_count + odp_flow.stats.n_bytes;
+ expired.used = rule->used;
+
+ netflow_expire(ofproto->netflow, &rule->nf_flow, &expired);
+
+ /* Schedule us to send the accumulated records once we have
+ * collected all of them. */
+ poll_immediate_wake();
+ }
+}
+
+static void
+update_used(struct ofproto *p)
+{
+ struct odp_flow *flows;
+ size_t n_flows;
+ size_t i;
+ int error;
+
+ error = dpif_flow_list_all(p->dpif, &flows, &n_flows);
+ if (error) {
+ return;
+ }
+
+ for (i = 0; i < n_flows; i++) {
+ struct odp_flow *f = &flows[i];
+ struct rule *rule;
+
+ rule = rule_from_cls_rule(
+ classifier_find_rule_exactly(&p->cls, &f->key, 0, UINT16_MAX));
+ if (!rule || !rule->installed) {
+ COVERAGE_INC(ofproto_unexpected_rule);
+ dpif_flow_del(p->dpif, f);
+ continue;
+ }
+
+ update_time(p, rule, &f->stats);
+ rule_account(p, rule, f->stats.n_bytes);
+ }
+ free(flows);
+}
+
+static void
+do_send_packet_in(struct ofconn *ofconn, uint32_t buffer_id,
+ const struct ofpbuf *packet, int send_len)
+{
+ struct odp_msg *msg = packet->data;
+ struct ofpbuf payload;
+ struct ofpbuf *opi;
+ uint8_t reason;
+
+ /* Extract packet payload from 'msg'. */
+ payload.data = msg + 1;
+ payload.size = msg->length - sizeof *msg;
+
+ /* Construct ofp_packet_in message. */
+ reason = msg->type == _ODPL_ACTION_NR ? OFPR_ACTION : OFPR_NO_MATCH;
+ opi = make_packet_in(buffer_id, odp_port_to_ofp_port(msg->port), reason,
+ &payload, send_len);
+
+ /* Send. */
+ rconn_send_with_limit(ofconn->rconn, opi, ofconn->packet_in_counter, 100);
+}
+
+static void
+send_packet_in_action(struct ofpbuf *packet, void *p_)
+{
+ struct ofproto *p = p_;
+ struct ofconn *ofconn;
+ struct odp_msg *msg;
+
+ msg = packet->data;
+ LIST_FOR_EACH (ofconn, struct ofconn, node, &p->all_conns) {
+ if (ofconn == p->controller || ofconn->miss_send_len) {
+ do_send_packet_in(ofconn, UINT32_MAX, packet, msg->arg);
+ }
+ }
+ ofpbuf_delete(packet);
+}
+
+static void
+send_packet_in_miss(struct ofpbuf *packet, void *p_)
+{
+ struct ofproto *p = p_;
+ bool in_fail_open = p->fail_open && fail_open_is_active(p->fail_open);
+ struct ofconn *ofconn;
+ struct ofpbuf payload;
+ struct odp_msg *msg;
+
+ msg = packet->data;
+ payload.data = msg + 1;
+ payload.size = msg->length - sizeof *msg;
+ LIST_FOR_EACH (ofconn, struct ofconn, node, &p->all_conns) {
+ if (ofconn->miss_send_len) {
+ struct pktbuf *pb = ofconn->pktbuf;
+ uint32_t buffer_id = (in_fail_open
+ ? pktbuf_get_null()
+ : pktbuf_save(pb, &payload, msg->port));
+ int send_len = (buffer_id != UINT32_MAX ? ofconn->miss_send_len
+ : UINT32_MAX);
+ do_send_packet_in(ofconn, buffer_id, packet, send_len);
+ }
+ }
+ ofpbuf_delete(packet);
+}
+
+static uint64_t
+pick_datapath_id(const struct ofproto *ofproto)
+{
+ const struct ofport *port;
+
+ port = port_array_get(&ofproto->ports, ODPP_LOCAL);
+ if (port) {
+ uint8_t ea[ETH_ADDR_LEN];
+ int error;
+
+ error = netdev_get_etheraddr(port->netdev, ea);
+ if (!error) {
+ return eth_addr_to_uint64(ea);
+ }
+ VLOG_WARN("could not get MAC address for %s (%s)",
+ netdev_get_name(port->netdev), strerror(error));
+ }
+ return ofproto->fallback_dpid;
+}
+
+static uint64_t
+pick_fallback_dpid(void)
+{
+ uint8_t ea[ETH_ADDR_LEN];
+ eth_addr_random(ea);
+ ea[0] = 0x00; /* Set Nicira OUI. */
+ ea[1] = 0x23;
+ ea[2] = 0x20;
+ return eth_addr_to_uint64(ea);
+}
+\f
+static bool
+default_normal_ofhook_cb(const flow_t *flow, const struct ofpbuf *packet,
+ struct odp_actions *actions, tag_type *tags,
+ uint16_t *nf_output_iface, void *ofproto_)
+{
+ struct ofproto *ofproto = ofproto_;
+ int out_port;
+
+ /* Drop frames for reserved multicast addresses. */
+ if (eth_addr_is_reserved(flow->dl_dst)) {
+ return true;
+ }
+
+ /* Learn source MAC (but don't try to learn from revalidation). */
+ if (packet != NULL) {
+ tag_type rev_tag = mac_learning_learn(ofproto->ml, flow->dl_src,
+ 0, flow->in_port);
+ if (rev_tag) {
+ /* The log messages here could actually be useful in debugging,
+ * so keep the rate limit relatively high. */
+ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(30, 300);
+ VLOG_DBG_RL(&rl, "learned that "ETH_ADDR_FMT" is on port %"PRIu16,
+ ETH_ADDR_ARGS(flow->dl_src), flow->in_port);
+ ofproto_revalidate(ofproto, rev_tag);
+ }
+ }
+
+ /* Determine output port. */
+ out_port = mac_learning_lookup_tag(ofproto->ml, flow->dl_dst, 0, tags);
+ if (out_port < 0) {
+ add_output_group_action(actions, DP_GROUP_FLOOD, nf_output_iface);
+ } else if (out_port != flow->in_port) {
+ odp_actions_add(actions, ODPAT_OUTPUT)->output.port = out_port;
+ *nf_output_iface = out_port;
+ } else {
+ /* Drop. */
+ }
+
+ return true;
+}
+
+static const struct ofhooks default_ofhooks = {
+ NULL,
+ default_normal_ofhook_cb,
+ NULL,
+ NULL
+};
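The deadline arithmetic in expire_rule() above can be isolated as the following sketch. It assumes a hypothetical `struct rule_times` that holds only the fields that function reads (creation and last-use times in milliseconds, timeouts in seconds, 0 meaning "no timeout"). It also omits the check that keeps a wildcarded rule with live subrules from idle-expiring. It is an illustration, not a drop-in replacement.

```c
#include <limits.h>

/* Hypothetical stand-in for the parts of 'struct rule' that expire_rule()
 * consults.  Times are in milliseconds; timeouts are in seconds, 0 = none. */
struct rule_times {
    long long int created;      /* When the rule was added. */
    long long int used;         /* When the rule last matched a packet. */
    unsigned int hard_timeout;  /* Seconds after 'created'; 0 = never. */
    unsigned int idle_timeout;  /* Seconds after 'used'; 0 = never. */
};

enum expire_reason { NOT_EXPIRED, EXPIRED_HARD, EXPIRED_IDLE };

/* Each enabled timeout yields a deadline (a disabled one becomes
 * LLONG_MAX).  The earlier deadline decides whether the rule has expired,
 * and a passed hard deadline takes precedence when reporting why. */
static enum expire_reason
check_expiry(const struct rule_times *r, long long int now)
{
    long long int hard_expire = (r->hard_timeout
                                 ? r->created + r->hard_timeout * 1000LL
                                 : LLONG_MAX);
    long long int idle_expire = (r->idle_timeout
                                 ? r->used + r->idle_timeout * 1000LL
                                 : LLONG_MAX);
    long long int expire = (hard_expire < idle_expire
                            ? hard_expire : idle_expire);

    if (now < expire) {
        return NOT_EXPIRED;
    }
    return now >= hard_expire ? EXPIRED_HARD : EXPIRED_IDLE;
}
```

Because a disabled timeout maps to LLONG_MAX, it can never be the earlier deadline. When both deadlines have passed, the hard timeout is reported, which matches the OFPER_HARD_TIMEOUT/OFPER_IDLE_TIMEOUT choice made in expire_rule().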
--- /dev/null
+/*
+ * Copyright (c) 2009 Nicira Networks.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef OFPROTO_H
+#define OFPROTO_H 1
+
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdint.h>
+#include "flow.h"
+#include "netflow.h"
+#include "tag.h"
+
+struct ds;
+struct odp_actions;
+struct ofhooks;
+struct ofproto;
+struct svec;
+
+struct ofexpired {
+ flow_t flow;
+ uint64_t packet_count; /* Packets from subrules. */
+ uint64_t byte_count; /* Bytes from subrules. */
+ long long int used; /* Last-used time (0 if never used). */
+};
+
+int ofproto_create(const char *datapath, const struct ofhooks *, void *aux,
+ struct ofproto **ofprotop);
+void ofproto_destroy(struct ofproto *);
+int ofproto_run(struct ofproto *);
+int ofproto_run1(struct ofproto *);
+int ofproto_run2(struct ofproto *, bool revalidate_all);
+void ofproto_wait(struct ofproto *);
+bool ofproto_is_alive(const struct ofproto *);
+
+/* Configuration. */
+void ofproto_set_datapath_id(struct ofproto *, uint64_t datapath_id);
+void ofproto_set_mgmt_id(struct ofproto *, uint64_t mgmt_id);
+void ofproto_set_probe_interval(struct ofproto *, int probe_interval);
+void ofproto_set_max_backoff(struct ofproto *, int max_backoff);
+void ofproto_set_desc(struct ofproto *,
+ const char *manufacturer, const char *hardware,
+ const char *software, const char *serial);
+int ofproto_set_in_band(struct ofproto *, bool in_band);
+int ofproto_set_discovery(struct ofproto *, bool discovery,
+ const char *accept_controller_re,
+ bool update_resolv_conf);
+int ofproto_set_controller(struct ofproto *, const char *controller);
+int ofproto_set_listeners(struct ofproto *, const struct svec *listeners);
+int ofproto_set_snoops(struct ofproto *, const struct svec *snoops);
+int ofproto_set_netflow(struct ofproto *,
+ const struct netflow_options *nf_options);
+void ofproto_set_failure(struct ofproto *, bool fail_open);
+void ofproto_set_rate_limit(struct ofproto *, int rate_limit, int burst_limit);
+int ofproto_set_stp(struct ofproto *, bool enable_stp);
+int ofproto_set_remote_execution(struct ofproto *, const char *command_acl,
+ const char *command_dir);
+
+/* Configuration querying. */
+uint64_t ofproto_get_datapath_id(const struct ofproto *);
+uint64_t ofproto_get_mgmt_id(const struct ofproto *);
+int ofproto_get_probe_interval(const struct ofproto *);
+int ofproto_get_max_backoff(const struct ofproto *);
+bool ofproto_get_in_band(const struct ofproto *);
+bool ofproto_get_discovery(const struct ofproto *);
+const char *ofproto_get_controller(const struct ofproto *);
+void ofproto_get_listeners(const struct ofproto *, struct svec *);
+void ofproto_get_snoops(const struct ofproto *, struct svec *);
+void ofproto_get_all_flows(struct ofproto *, struct ds *);
+
+/* Functions for use by ofproto implementation modules, not by clients. */
+int ofproto_send_packet(struct ofproto *, const flow_t *,
+ const union ofp_action *, size_t n_actions,
+ const struct ofpbuf *);
+void ofproto_add_flow(struct ofproto *, const flow_t *, uint32_t wildcards,
+ unsigned int priority,
+ const union ofp_action *, size_t n_actions,
+ int idle_timeout);
+void ofproto_delete_flow(struct ofproto *, const flow_t *, uint32_t wildcards,
+ unsigned int priority);
+void ofproto_flush_flows(struct ofproto *);
+
+/* Hooks for ovs-vswitchd. */
+struct ofhooks {
+ void (*port_changed_cb)(enum ofp_port_reason, const struct ofp_phy_port *,
+ void *aux);
+ bool (*normal_cb)(const flow_t *, const struct ofpbuf *packet,
+ struct odp_actions *, tag_type *,
+ uint16_t *nf_output_iface, void *aux);
+ void (*account_flow_cb)(const flow_t *, const union odp_action *,
+ size_t n_actions, unsigned long long int n_bytes,
+ void *aux);
+ void (*account_checkpoint_cb)(void *aux);
+};
+void ofproto_revalidate(struct ofproto *, tag_type);
+struct tag_set *ofproto_get_revalidate_set(struct ofproto *);
+
+#endif /* ofproto.h */
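send_flow_exp() in ofproto.c composes the flow-expiration message only after it finds the first interested connection. It then queues a clone to each interested connection except the last and hands the original buffer to the last one. The toy sketch below reproduces that pattern with hypothetical `struct conn` and `struct msg` types standing in for struct ofconn and struct ofpbuf. It also counts allocations so the cost is visible.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct ofpbuf. */
struct msg {
    char text[32];
};

/* Hypothetical stand-in for struct ofconn. */
struct conn {
    int wants_msg;       /* Like send_flow_exp && rconn_is_connected(). */
    struct msg *queued;  /* The single message this toy connection holds. */
};

static int n_allocs;     /* Number of messages allocated, for illustration. */

static struct msg *
msg_create(const char *text)
{
    struct msg *m = malloc(sizeof *m);
    if (!m) {
        abort();
    }
    strncpy(m->text, text, sizeof m->text - 1);
    m->text[sizeof m->text - 1] = '\0';
    n_allocs++;
    return m;
}

static struct msg *
msg_clone(const struct msg *m)
{
    return msg_create(m->text);
}

/* Takes ownership of 'm', replacing anything previously queued. */
static void
conn_queue(struct conn *c, struct msg *m)
{
    free(c->queued);
    c->queued = m;
}

/* Same shape as send_flow_exp(): build the message lazily, give clones to
 * every interested connection but the last, and give the original to the
 * last one, so no copy is ever made and then thrown away. */
static void
fan_out(struct conn *conns, size_t n, const char *text)
{
    struct conn *prev = NULL;
    struct msg *buf = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        if (conns[i].wants_msg) {
            if (prev) {
                conn_queue(prev, msg_clone(buf));
            } else {
                buf = msg_create(text);
            }
            prev = &conns[i];
        }
    }
    if (prev) {
        conn_queue(prev, buf);
    }
}
```

With N interested connections this performs one allocation for the original plus N - 1 clones; with none, it allocates nothing.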
--- /dev/null
+/*
+ * Copyright (c) 2008, 2009 Nicira Networks.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <config.h>
+#include "pinsched.h"
+#include <arpa/inet.h>
+#include <limits.h>
+#include <stdlib.h>
+#include "ofpbuf.h"
+#include "openflow/openflow.h"
+#include "poll-loop.h"
+#include "port-array.h"
+#include "queue.h"
+#include "random.h"
+#include "rconn.h"
+#include "status.h"
+#include "timeval.h"
+#include "vconn.h"
+
+struct pinsched {
+ /* Client-supplied parameters. */
+ int rate_limit; /* Packets added to bucket per second. */
+ int burst_limit; /* Maximum token bucket size, in packets. */
+
+ /* One queue per physical port. */
+ struct port_array queues; /* Array of "struct ovs_queue *". */
+ int n_queued; /* Sum over queues[*].n. */
+ unsigned int last_tx_port; /* Last port checked in round-robin. */
+
+ /* Token bucket.
+ *
+ * It costs 1000 tokens to send a single packet_in message. A single token
+ * per message would be more straightforward, but this choice lets us avoid
+ * round-off error in refill_bucket()'s calculation of how many tokens to
+ * add to the bucket, since no division step is needed. */
+ long long int last_fill; /* Time at which we last added tokens. */
+ int tokens; /* Current number of tokens. */
+
+ /* Transmission queue. */
+ int n_txq; /* No. of packets waiting in rconn for tx. */
+
+ /* Statistics reporting. */
+ unsigned long long n_normal; /* # txed w/o rate limit queuing. */
+ unsigned long long n_limited; /* # queued for rate limiting. */
+ unsigned long long n_queue_dropped; /* # dropped due to queue overflow. */
+
+ /* Switch status. */
+ struct status_category *ss_cat;
+};
+
+static struct ofpbuf *
+dequeue_packet(struct pinsched *ps, struct ovs_queue *q,
+ unsigned int port_no)
+{
+ struct ofpbuf *packet = queue_pop_head(q);
+ if (!q->n) {
+ free(q);
+ port_array_set(&ps->queues, port_no, NULL);
+ }
+ ps->n_queued--;
+ return packet;
+}
+
+/* Drop a packet from the longest queue in 'ps'. */
+static void
+drop_packet(struct pinsched *ps)
+{
+ struct ovs_queue *longest; /* Queue currently selected as longest. */
+ int n_longest; /* # of queues of same length as 'longest'. */
+ unsigned int longest_port_no;
+ unsigned int port_no;
+ struct ovs_queue *q;
+
+ ps->n_queue_dropped++;
+
+ longest = port_array_first(&ps->queues, &port_no);
+ longest_port_no = port_no;
+ n_longest = 1;
+ while ((q = port_array_next(&ps->queues, &port_no)) != NULL) {
+ if (longest->n < q->n) {
+ longest = q;
+ longest_port_no = port_no;
+ n_longest = 1;
+ } else if (longest->n == q->n) {
+ n_longest++;
+
+ /* Randomly select one of the longest queues, with a uniform
+ * distribution (Knuth algorithm 3.4.2R). */
+ if (!random_range(n_longest)) {
+ longest = q;
+ longest_port_no = port_no;
+ }
+ }
+ }
+
+ /* FIXME: do we want to pop the tail instead? */
+ ofpbuf_delete(dequeue_packet(ps, longest, longest_port_no));
+}
+
+/* Remove and return the next packet to transmit (in round-robin order). */
+static struct ofpbuf *
+get_tx_packet(struct pinsched *ps)
+{
+ struct ovs_queue *q = port_array_next(&ps->queues, &ps->last_tx_port);
+ if (!q) {
+ q = port_array_first(&ps->queues, &ps->last_tx_port);
+ }
+ return dequeue_packet(ps, q, ps->last_tx_port);
+}
+
+/* Add tokens to the bucket based on elapsed time. */
+static void
+refill_bucket(struct pinsched *ps)
+{
+ long long int now = time_msec();
+ long long int tokens = (now - ps->last_fill) * ps->rate_limit + ps->tokens;
+ if (tokens >= 1000) {
+ ps->last_fill = now;
+ ps->tokens = MIN(tokens, ps->burst_limit * 1000);
+ }
+}
+
+/* Attempts to remove enough tokens from 'ps' to transmit a packet. Returns
+ * true if successful, false otherwise. (In the latter case no tokens are
+ * removed.) */
+static bool
+get_token(struct pinsched *ps)
+{
+ if (ps->tokens >= 1000) {
+ ps->tokens -= 1000;
+ return true;
+ } else {
+ return false;
+ }
+}
+
+void
+pinsched_send(struct pinsched *ps, uint16_t port_no,
+ struct ofpbuf *packet, pinsched_tx_cb *cb, void *aux)
+{
+ if (!ps) {
+ cb(packet, aux);
+ } else if (!ps->n_queued && get_token(ps)) {
+ /* In the common case where we are not constrained by the rate limit,
+ * let the packet take the normal path. */
+ ps->n_normal++;
+ cb(packet, aux);
+ } else {
+ /* Otherwise queue it up for the periodic callback to drain out. */
+ struct ovs_queue *q;
+
+ /* We are called with a buffer obtained from dpif_recv() that has much
+ * more allocated space than actual content most of the time. Since
+ * we're going to store the packet for some time, free up that
+ * otherwise wasted space. */
+ ofpbuf_trim(packet);
+
+ if (ps->n_queued >= ps->burst_limit) {
+ drop_packet(ps);
+ }
+ q = port_array_get(&ps->queues, port_no);
+ if (!q) {
+ q = xmalloc(sizeof *q);
+ queue_init(q);
+ port_array_set(&ps->queues, port_no, q);
+ }
+ queue_push_tail(q, packet);
+ ps->n_queued++;
+ ps->n_limited++;
+ }
+}
+
+static void
+pinsched_status_cb(struct status_reply *sr, void *ps_)
+{
+ struct pinsched *ps = ps_;
+
+ status_reply_put(sr, "normal=%llu", ps->n_normal);
+ status_reply_put(sr, "limited=%llu", ps->n_limited);
+ status_reply_put(sr, "queue-dropped=%llu", ps->n_queue_dropped);
+}
+
+void
+pinsched_run(struct pinsched *ps, pinsched_tx_cb *cb, void *aux)
+{
+ if (ps) {
+ int i;
+
+ /* Drain some packets out of the bucket if possible, but limit the
+ * number of iterations to allow other code to get work done too. */
+ refill_bucket(ps);
+ for (i = 0; ps->n_queued && get_token(ps) && i < 50; i++) {
+ cb(get_tx_packet(ps), aux);
+ }
+ }
+}
+
+void
+pinsched_wait(struct pinsched *ps)
+{
+ if (ps && ps->n_queued) {
+ if (ps->tokens >= 1000) {
+ /* We can transmit more packets as soon as we're called again. */
+ poll_immediate_wake();
+ } else {
+ /* We have to wait for the bucket to re-fill. We could calculate
+ * the exact amount of time here for increased smoothness. */
+ poll_timer_wait(TIME_UPDATE_INTERVAL / 2);
+ }
+ }
+}
+
+/* Creates and returns a scheduler for sending packet-in messages. */
+struct pinsched *
+pinsched_create(int rate_limit, int burst_limit, struct switch_status *ss)
+{
+ struct pinsched *ps;
+
+ ps = xcalloc(1, sizeof *ps);
+ port_array_init(&ps->queues);
+ ps->n_queued = 0;
+ ps->last_tx_port = PORT_ARRAY_SIZE;
+ ps->last_fill = time_msec();
+ ps->tokens = rate_limit * 100;
+ ps->n_txq = 0;
+ ps->n_normal = 0;
+ ps->n_limited = 0;
+ ps->n_queue_dropped = 0;
+ pinsched_set_limits(ps, rate_limit, burst_limit);
+
+ if (ss) {
+ ps->ss_cat = switch_status_register(ss, "rate-limit",
+ pinsched_status_cb, ps);
+ }
+
+ return ps;
+}
+
+void
+pinsched_destroy(struct pinsched *ps)
+{
+ if (ps) {
+ struct ovs_queue *queue;
+ unsigned int port_no;
+
+ PORT_ARRAY_FOR_EACH (queue, &ps->queues, port_no) {
+ queue_destroy(queue);
+ free(queue);
+ }
+ port_array_destroy(&ps->queues);
+ switch_status_unregister(ps->ss_cat);
+ free(ps);
+ }
+}
+
+void
+pinsched_set_limits(struct pinsched *ps, int rate_limit, int burst_limit)
+{
+ if (rate_limit <= 0) {
+ rate_limit = 1000;
+ }
+ if (burst_limit <= 0) {
+ burst_limit = rate_limit / 4;
+ }
+ burst_limit = MAX(burst_limit, 1);
+ burst_limit = MIN(burst_limit, INT_MAX / 1000);
+
+ ps->rate_limit = rate_limit;
+ ps->burst_limit = burst_limit;
+ while (ps->n_queued > burst_limit) {
+ drop_packet(ps);
+ }
+}
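A stand-alone version of the token bucket in refill_bucket() and get_token() may make the 1000-tokens-per-packet scaling easier to follow. This sketch assumes a hypothetical `struct bucket` with just the four fields those functions touch. It also passes the current time in explicitly instead of calling time_msec(), so it can be exercised deterministically.

```c
/* Same scale as pinsched.c: one packet costs 1000 tokens, so
 * (elapsed ms) * (packets per second) is already in tokens. */
#define TOKENS_PER_PACKET 1000

/* Hypothetical reduction of struct pinsched to its token-bucket fields. */
struct bucket {
    int rate_limit;            /* Packets added per second. */
    int burst_limit;           /* Bucket capacity, in packets. */
    long long int last_fill;   /* Time of the last refill, in ms. */
    int tokens;                /* Current number of tokens. */
};

/* Like refill_bucket(): credit tokens for the time since 'last_fill', but
 * only commit (and advance 'last_fill') once at least one packet's worth
 * has accrued, then cap the total at the burst size. */
static void
bucket_refill(struct bucket *b, long long int now)
{
    long long int tokens = (now - b->last_fill) * b->rate_limit + b->tokens;
    if (tokens >= TOKENS_PER_PACKET) {
        long long int cap = (long long int) b->burst_limit * TOKENS_PER_PACKET;
        b->last_fill = now;
        b->tokens = (int) (tokens < cap ? tokens : cap);
    }
}

/* Like get_token(): spend one packet's worth of tokens if available.
 * Returns 1 on success, 0 (with no tokens removed) otherwise. */
static int
bucket_take(struct bucket *b)
{
    if (b->tokens >= TOKENS_PER_PACKET) {
        b->tokens -= TOKENS_PER_PACKET;
        return 1;
    }
    return 0;
}
```

Because 'last_fill' only moves once a whole packet's worth has accrued, short intervals are not rounded away. For example, at 10 packets per second and starting from an empty bucket, a refill 50 ms after 'last_fill' adds nothing and leaves 'last_fill' alone, so a refill at 100 ms credits the full 1000 tokens.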
--- /dev/null
+/*
+ * Copyright (c) 2008, 2009 Nicira Networks.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef PINSCHED_H
+#define PINSCHED_H 1
+
+#include <stdint.h>
+
+struct ofpbuf;
+struct switch_status;
+
+typedef void pinsched_tx_cb(struct ofpbuf *, void *aux);
+struct pinsched *pinsched_create(int rate_limit, int burst_limit,
+ struct switch_status *);
+void pinsched_set_limits(struct pinsched *, int rate_limit, int burst_limit);
+void pinsched_destroy(struct pinsched *);
+void pinsched_send(struct pinsched *, uint16_t port_no, struct ofpbuf *,
+ pinsched_tx_cb *, void *aux);
+void pinsched_run(struct pinsched *, pinsched_tx_cb *, void *aux);
+void pinsched_wait(struct pinsched *);
+
+#endif /* pinsched.h */
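drop_packet() in pinsched.c picks one of the longest queues uniformly at random in a single pass, using the reservoir-style selection it cites as Knuth's Algorithm 3.4.2R. The sketch below isolates that selection over a plain array of queue lengths. The array representation and the `rand_below` callback, which stands in for random_range(), are assumptions made for illustration. Because it returns a single index, the chosen queue and its position can never get out of step.

```c
#include <stdlib.h>

/* Returns the index of one of the longest entries in 'lens[0..n-1]',
 * chosen uniformly among ties in one pass.  'rand_below(k)' must return a
 * uniform value in [0, k).  Requires n > 0. */
static size_t
pick_longest(const int *lens, size_t n,
             unsigned int (*rand_below)(unsigned int))
{
    size_t longest = 0;          /* Current choice. */
    unsigned int n_longest = 1;  /* Queues seen with the current max length. */
    size_t i;

    for (i = 1; i < n; i++) {
        if (lens[longest] < lens[i]) {
            /* Strictly longer: restart the reservoir with this queue. */
            longest = i;
            n_longest = 1;
        } else if (lens[longest] == lens[i]) {
            /* The k-th tie replaces the current choice with probability
             * 1/k, which leaves every tied queue equally likely. */
            n_longest++;
            if (!rand_below(n_longest)) {
                longest = i;
            }
        }
    }
    return longest;
}

/* Example randomness source.  rand() % k is slightly biased, which is
 * tolerable for a sketch but not where exact uniformity matters. */
static unsigned int
rand_below_libc(unsigned int k)
{
    return (unsigned int) rand() % k;
}
```

When the scan finds a strictly longer queue it discards all earlier ties, so only queues of the final maximum length can be returned.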
--- /dev/null
+/*
+ * Copyright (c) 2008, 2009 Nicira Networks.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <config.h>
+#include "pktbuf.h"
+#include <inttypes.h>
+#include <stdlib.h>
+#include "coverage.h"
+#include "ofpbuf.h"
+#include "timeval.h"
+#include "util.h"
+#include "vconn.h"
+
+#define THIS_MODULE VLM_pktbuf
+#include "vlog.h"
+
+/* Buffers are identified by a 32-bit opaque ID. We divide the ID
+ * into a buffer number (low bits) and a cookie (high bits). The buffer number
+ * is an index into an array of buffers. The cookie distinguishes between
+ * different packets that have occupied a single buffer. Thus, the more
+ * buffers we have, the lower-quality the cookie... */
+#define PKTBUF_BITS 8
+#define PKTBUF_CNT (1u << PKTBUF_BITS)
+#define PKTBUF_MASK (PKTBUF_CNT - 1)
+
+#define COOKIE_BITS (32 - PKTBUF_BITS)
+#define COOKIE_MAX ((1u << COOKIE_BITS) - 1)
+
+#define OVERWRITE_MSECS 5000
+
+struct packet {
+ struct ofpbuf *buffer;
+ uint32_t cookie;
+ long long int timeout;
+ uint16_t in_port;
+};
+
+struct pktbuf {
+ struct packet packets[PKTBUF_CNT];
+ unsigned int buffer_idx;
+ unsigned int null_idx;
+};
+
+int
+pktbuf_capacity(void)
+{
+ return PKTBUF_CNT;
+}
+
+struct pktbuf *
+pktbuf_create(void)
+{
+ struct pktbuf *pb = xcalloc(1, sizeof *pb);
+ return pb;
+}
+
+void
+pktbuf_destroy(struct pktbuf *pb)
+{
+ if (pb) {
+ size_t i;
+
+ for (i = 0; i < PKTBUF_CNT; i++) {
+ ofpbuf_delete(pb->packets[i].buffer);
+ }
+ free(pb);
+ }
+}
+
+static unsigned int
+make_id(unsigned int buffer_idx, unsigned int cookie)
+{
+ return buffer_idx | (cookie << PKTBUF_BITS);
+}
+
+/* Attempts to allocate an OpenFlow packet buffer id within 'pb'. The packet
+ * buffer will store a copy of 'buffer' and the port number 'in_port', which
+ * should be the datapath port number on which 'buffer' was received.
+ *
+ * If successful, returns the packet buffer id (a number other than
+ * UINT32_MAX). pktbuf_retrieve() can later be used to retrieve the buffer and
+ * its input port number (buffers do expire after a time, so this is not
+ * guaranteed to be true forever). On failure, returns UINT32_MAX.
+ *
+ * The caller retains ownership of 'buffer'. */
+uint32_t
+pktbuf_save(struct pktbuf *pb, struct ofpbuf *buffer, uint16_t in_port)
+{
+ struct packet *p = &pb->packets[pb->buffer_idx];
+ pb->buffer_idx = (pb->buffer_idx + 1) & PKTBUF_MASK;
+ if (p->buffer) {
+ if (time_msec() < p->timeout) {
+ return UINT32_MAX;
+ }
+ ofpbuf_delete(p->buffer);
+ }
+
+ /* Don't use maximum cookie value since all-1-bits ID is special. */
+ if (++p->cookie >= COOKIE_MAX) {
+ p->cookie = 0;
+ }
+ p->buffer = ofpbuf_clone(buffer);
+ p->timeout = time_msec() + OVERWRITE_MSECS;
+ p->in_port = in_port;
+ return make_id(p - pb->packets, p->cookie);
+}
+
+/*
+ * Allocates and returns a "null" packet buffer id. The returned packet buffer
+ * id is considered valid by pktbuf_retrieve(), but it is not associated with
+ * actual buffered data.
+ *
+ * This function is always successful.
+ *
+ * This is useful in one special case: with the current OpenFlow design, the
+ * "fail-open" code cannot always know whether a connection to a controller is
+ * actually valid until it receives an OFPT_PACKET_OUT or OFPT_FLOW_MOD request,
+ * but at that point the packet in question has already been forwarded (since
+ * we are still in "fail-open" mode). If the packet was buffered in the usual
+ * way, then the OFPT_PACKET_OUT or OFPT_FLOW_MOD would cause a duplicate
+ * packet in the network. Null packet buffer ids identify such a packet that
+ * has already been forwarded, so that Open vSwitch can quietly ignore the
+ * request to re-send it. (After that happens, the switch exits fail-open
+ * mode.)
+ *
+ * See the top-level comment in fail-open.c for an overview.
+ */
+uint32_t
+pktbuf_get_null(void)
+{
+ return make_id(0, COOKIE_MAX);
+}
+
+/* Attempts to retrieve a saved packet with the given 'id' from 'pb'. Returns
+ * 0 if successful, otherwise an OpenFlow error code constructed with
+ * ofp_mkerr().
+ *
+ * On success, ordinarily stores the buffered packet in '*bufferp' and the
+ * datapath port number on which the packet was received in '*in_port'. The
+ * caller becomes responsible for freeing the buffer. However, if 'id'
+ * identifies a "null" packet buffer (created with pktbuf_get_null()), stores
+ * NULL in '*bufferp' and -1 in '*in_port'.
+ *
+ * On failure, stores NULL in '*bufferp' and -1 in '*in_port'. */
+int
+pktbuf_retrieve(struct pktbuf *pb, uint32_t id, struct ofpbuf **bufferp,
+ uint16_t *in_port)
+{
+ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 20);
+ struct packet *p;
+ int error;
+
+ if (!pb) {
+ VLOG_WARN_RL(&rl, "attempt to send buffered packet via connection "
+ "without buffers");
+ *bufferp = NULL;
+ *in_port = -1;
+ return ofp_mkerr(OFPET_BAD_REQUEST, OFPBRC_BAD_COOKIE);
+ }
+
+ p = &pb->packets[id & PKTBUF_MASK];
+ if (p->cookie == id >> PKTBUF_BITS) {
+ struct ofpbuf *buffer = p->buffer;
+ if (buffer) {
+ *bufferp = buffer;
+ *in_port = p->in_port;
+ p->buffer = NULL;
+ COVERAGE_INC(pktbuf_retrieved);
+ return 0;
+ } else {
+ COVERAGE_INC(pktbuf_reuse_error);
+ VLOG_WARN_RL(&rl, "attempt to reuse buffer %08"PRIx32, id);
+ error = ofp_mkerr(OFPET_BAD_REQUEST, OFPBRC_BUFFER_EMPTY);
+ }
+ } else if (id >> PKTBUF_BITS != COOKIE_MAX) {
+ COVERAGE_INC(pktbuf_bad_cookie);
+ VLOG_WARN_RL(&rl, "cookie mismatch: %08"PRIx32" != %08"PRIx32,
+ id, (id & PKTBUF_MASK) | (p->cookie << PKTBUF_BITS));
+ error = ofp_mkerr(OFPET_BAD_REQUEST, OFPBRC_BAD_COOKIE);
+ } else {
+ COVERAGE_INC(pktbuf_null_cookie);
+ VLOG_INFO_RL(&rl, "Received null cookie %08"PRIx32" (this is normal "
+ "if the switch was recently in fail-open mode)", id);
+ error = 0;
+ }
+ *bufferp = NULL;
+ *in_port = -1;
+ return error;
+}
+
+void
+pktbuf_discard(struct pktbuf *pb, uint32_t id)
+{
+ struct packet *p = &pb->packets[id & PKTBUF_MASK];
+ if (p->cookie == id >> PKTBUF_BITS) {
+ ofpbuf_delete(p->buffer);
+ p->buffer = NULL;
+ }
+}
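The buffer-ID layout described at the top of pktbuf.c can be checked with the following sketch. It copies the PKTBUF_BITS/COOKIE_MAX macros and make_id(), and adds hypothetical id_to_idx(), id_to_cookie(), and next_cookie() helpers that mirror the expressions used in pktbuf_retrieve() and pktbuf_save().

```c
#include <stdint.h>

/* Same layout as pktbuf.c: the low PKTBUF_BITS select a slot and the
 * remaining high bits hold that slot's cookie. */
#define PKTBUF_BITS 8
#define PKTBUF_CNT (1u << PKTBUF_BITS)
#define PKTBUF_MASK (PKTBUF_CNT - 1)
#define COOKIE_BITS (32 - PKTBUF_BITS)
#define COOKIE_MAX ((1u << COOKIE_BITS) - 1)

static uint32_t
make_id(unsigned int buffer_idx, unsigned int cookie)
{
    return buffer_idx | (cookie << PKTBUF_BITS);
}

/* Slot index, as in pb->packets[id & PKTBUF_MASK]. */
static unsigned int
id_to_idx(uint32_t id)
{
    return id & PKTBUF_MASK;
}

/* Cookie, as compared against p->cookie in pktbuf_retrieve(). */
static unsigned int
id_to_cookie(uint32_t id)
{
    return id >> PKTBUF_BITS;
}

/* Advances a slot's cookie the way pktbuf_save() does, wrapping to 0
 * before reaching COOKIE_MAX so that value stays reserved. */
static unsigned int
next_cookie(unsigned int cookie)
{
    cookie++;
    return cookie >= COOKIE_MAX ? 0 : cookie;
}
```

With 8 index bits, slot 5 holding cookie 0x123 yields ID 0x00012305. Since next_cookie() never produces COOKIE_MAX, make_id(0, COOKIE_MAX) stays free for pktbuf_get_null(), and the all-ones ID (UINT32_MAX, which this code uses to mean "not buffered") can never name a real buffer.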
--- /dev/null
+/*
+ * Copyright (c) 2008, 2009 Nicira Networks.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef PKTBUF_H
+#define PKTBUF_H 1
+
+#include <stdint.h>
+
+struct pktbuf;
+struct ofpbuf;
+
+int pktbuf_capacity(void);
+
+struct pktbuf *pktbuf_create(void);
+void pktbuf_destroy(struct pktbuf *);
+uint32_t pktbuf_save(struct pktbuf *, struct ofpbuf *buffer, uint16_t in_port);
+uint32_t pktbuf_get_null(void);
+int pktbuf_retrieve(struct pktbuf *, uint32_t id, struct ofpbuf **bufferp,
+ uint16_t *in_port);
+void pktbuf_discard(struct pktbuf *, uint32_t id);
+
+#endif /* pktbuf.h */
--- /dev/null
+/*
+ * Copyright (c) 2008, 2009 Nicira Networks.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <config.h>
+#include "status.h"
+#include <arpa/inet.h>
+#include <assert.h>
+#include <errno.h>
+#include <inttypes.h>
+#include <stdarg.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include "dynamic-string.h"
+#include "list.h"
+#include "ofpbuf.h"
+#include "ofproto.h"
+#include "openflow/nicira-ext.h"
+#include "packets.h"
+#include "rconn.h"
+#include "svec.h"
+#include "timeval.h"
+#include "vconn.h"
+
+#define THIS_MODULE VLM_status
+#include "vlog.h"
+
+struct status_category {
+ struct list node;
+ char *name;
+ void (*cb)(struct status_reply *, void *aux);
+ void *aux;
+};
+
+struct switch_status {
+ time_t booted;
+ struct status_category *config_cat;
+ struct status_category *switch_cat;
+ struct list categories;
+};
+
+struct status_reply {
+ struct status_category *category;
+ struct ds request;
+ struct ds output;
+};
+
+int
+switch_status_handle_request(struct switch_status *ss, struct rconn *rconn,
+ struct nicira_header *request)
+{
+ struct status_category *c;
+ struct nicira_header *reply;
+ struct status_reply sr;
+ struct ofpbuf *b;
+ int retval;
+
+ sr.request.string = (void *) (request + 1);
+ sr.request.length = ntohs(request->header.length) - sizeof *request;
+ ds_init(&sr.output);
+ LIST_FOR_EACH (c, struct status_category, node, &ss->categories) {
+ if (!memcmp(c->name, sr.request.string,
+ MIN(strlen(c->name), sr.request.length))) {
+ sr.category = c;
+ c->cb(&sr, c->aux);
+ }
+ }
+ reply = make_openflow_xid(sizeof *reply + sr.output.length,
+ OFPT_VENDOR, request->header.xid, &b);
+ reply->vendor = htonl(NX_VENDOR_ID);
+ reply->subtype = htonl(NXT_STATUS_REPLY);
+ memcpy(reply + 1, sr.output.string, sr.output.length);
+ retval = rconn_send(rconn, b, NULL);
+ if (retval && retval != EAGAIN) {
+ VLOG_WARN("send failed (%s)", strerror(retval));
+ }
+ ds_destroy(&sr.output);
+ return 0;
+}
+
+void
+rconn_status_cb(struct status_reply *sr, void *rconn_)
+{
+ struct rconn *rconn = rconn_;
+ time_t now = time_now();
+ uint32_t remote_ip = rconn_get_remote_ip(rconn);
+ uint32_t local_ip = rconn_get_local_ip(rconn);
+
+ status_reply_put(sr, "name=%s", rconn_get_name(rconn));
+ if (remote_ip) {
+ status_reply_put(sr, "remote-ip="IP_FMT, IP_ARGS(&remote_ip));
+ status_reply_put(sr, "remote-port=%d",
+ ntohs(rconn_get_remote_port(rconn)));
+ status_reply_put(sr, "local-ip="IP_FMT, IP_ARGS(&local_ip));
+ status_reply_put(sr, "local-port=%d",
+ ntohs(rconn_get_local_port(rconn)));
+ }
+ status_reply_put(sr, "state=%s", rconn_get_state(rconn));
+ status_reply_put(sr, "backoff=%d", rconn_get_backoff(rconn));
+ status_reply_put(sr, "probe-interval=%d", rconn_get_probe_interval(rconn));
+ status_reply_put(sr, "is-connected=%s",
+ rconn_is_connected(rconn) ? "true" : "false");
+ status_reply_put(sr, "sent-msgs=%u", rconn_packets_sent(rconn));
+ status_reply_put(sr, "received-msgs=%u", rconn_packets_received(rconn));
+ status_reply_put(sr, "attempted-connections=%u",
+ rconn_get_attempted_connections(rconn));
+ status_reply_put(sr, "successful-connections=%u",
+ rconn_get_successful_connections(rconn));
+ status_reply_put(sr, "last-connection=%ld",
+ (long int) (now - rconn_get_last_connection(rconn)));
+ status_reply_put(sr, "last-received=%ld",
+ (long int) (now - rconn_get_last_received(rconn)));
+ status_reply_put(sr, "time-connected=%lu",
+ rconn_get_total_time_connected(rconn));
+ status_reply_put(sr, "state-elapsed=%u", rconn_get_state_elapsed(rconn));
+}
+
+static void
+config_status_cb(struct status_reply *sr, void *ofproto_)
+{
+ const struct ofproto *ofproto = ofproto_;
+ uint64_t datapath_id, mgmt_id;
+ struct svec listeners;
+ int probe_interval, max_backoff;
+ size_t i;
+
+ datapath_id = ofproto_get_datapath_id(ofproto);
+ if (datapath_id) {
+ status_reply_put(sr, "datapath-id=%"PRIx64, datapath_id);
+ }
+
+ mgmt_id = ofproto_get_mgmt_id(ofproto);
+ if (mgmt_id) {
+ status_reply_put(sr, "mgmt-id=%"PRIx64, mgmt_id);
+ }
+
+ svec_init(&listeners);
+ ofproto_get_listeners(ofproto, &listeners);
+ for (i = 0; i < listeners.n; i++) {
+ status_reply_put(sr, "management%zu=%s", i, listeners.names[i]);
+ }
+ svec_destroy(&listeners);
+
+ probe_interval = ofproto_get_probe_interval(ofproto);
+ if (probe_interval) {
+ status_reply_put(sr, "probe-interval=%d", probe_interval);
+ }
+
+ max_backoff = ofproto_get_max_backoff(ofproto);
+ if (max_backoff) {
+ status_reply_put(sr, "max-backoff=%d", max_backoff);
+ }
+}
+
+static void
+switch_status_cb(struct status_reply *sr, void *ss_)
+{
+ struct switch_status *ss = ss_;
+ time_t now = time_now();
+
+ status_reply_put(sr, "now=%ld", (long int) now);
+ status_reply_put(sr, "uptime=%ld", (long int) (now - ss->booted));
+ status_reply_put(sr, "pid=%ld", (long int) getpid());
+}
+
+struct switch_status *
+switch_status_create(const struct ofproto *ofproto)
+{
+ struct switch_status *ss = xcalloc(1, sizeof *ss);
+ ss->booted = time_now();
+ list_init(&ss->categories);
+ ss->config_cat = switch_status_register(ss, "config", config_status_cb,
+ (void *) ofproto);
+ ss->switch_cat = switch_status_register(ss, "switch", switch_status_cb,
+ ss);
+ return ss;
+}
+
+void
+switch_status_destroy(struct switch_status *ss)
+{
+ if (ss) {
+ /* Orphan any remaining categories, so that unregistering them later
+ * won't write to bad memory. */
+ struct status_category *c, *next;
+ LIST_FOR_EACH_SAFE (c, next,
+ struct status_category, node, &ss->categories) {
+ list_init(&c->node);
+ }
+ switch_status_unregister(ss->config_cat);
+ switch_status_unregister(ss->switch_cat);
+ free(ss);
+ }
+}
+
+struct status_category *
+switch_status_register(struct switch_status *ss,
+ const char *category,
+ status_cb_func *cb, void *aux)
+{
+ struct status_category *c = xmalloc(sizeof *c);
+ c->cb = cb;
+ c->aux = aux;
+ c->name = xstrdup(category);
+ list_push_back(&ss->categories, &c->node);
+ return c;
+}
+
+void
+switch_status_unregister(struct status_category *c)
+{
+ if (c) {
+ if (!list_is_empty(&c->node)) {
+ list_remove(&c->node);
+ }
+ free(c->name);
+ free(c);
+ }
+}
+
+void
+status_reply_put(struct status_reply *sr, const char *content, ...)
+{
+ size_t old_length = sr->output.length;
+ size_t added;
+ va_list args;
+
+ /* Append the status reply to the output. */
+ ds_put_format(&sr->output, "%s.", sr->category->name);
+ va_start(args, content);
+ ds_put_format_valist(&sr->output, content, args);
+ va_end(args);
+ if (ds_last(&sr->output) != '\n') {
+ ds_put_char(&sr->output, '\n');
+ }
+
+ /* Drop what we just added if it doesn't match the request. */
+ added = sr->output.length - old_length;
+ if (added < sr->request.length
+ || memcmp(&sr->output.string[old_length],
+ sr->request.string, sr->request.length)) {
+ ds_truncate(&sr->output, old_length);
+ }
+}
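status_reply_put() first formats each line as "category.key=value" and adds a newline if one is missing. It then drops the line again unless it begins with the request string. As a result, an empty request returns every line, and a request such as "rconn." returns a single category. The following is a minimal standalone sketch of that prefix filter. It uses a fixed-size character buffer in place of the project's struct ds, and struct toy_reply and toy_reply_put() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct status_reply: the requested prefix, the
 * current category name, and a fixed-size output buffer. */
struct toy_reply {
    const char *request;        /* Lines must start with this prefix. */
    const char *category;       /* Prepended to every line, plus ".". */
    char output[1024];          /* Matching lines, NUL-terminated. */
};

/* Formats "<category>.<content>", adds a trailing newline if the content
 * lacks one, and appends the line to 'r->output' only if it starts with
 * 'r->request'.  Lines that do not fit in the buffers are truncated or
 * dropped; status_reply_put() grows its buffer instead. */
static void
toy_reply_put(struct toy_reply *r, const char *content, ...)
{
    char line[256];
    size_t len;
    va_list args;

    snprintf(line, sizeof line, "%s.", r->category);
    len = strlen(line);
    va_start(args, content);
    vsnprintf(line + len, sizeof line - len, content, args);
    va_end(args);

    len = strlen(line);
    if (line[len - 1] != '\n' && len + 1 < sizeof line) {
        line[len++] = '\n';
        line[len] = '\0';
    }

    /* Keep the line only if the request is a prefix of it. */
    if (!strncmp(line, r->request, strlen(r->request))
        && strlen(r->output) + len < sizeof r->output) {
        strcat(r->output, line);
    }
}
```

Because filtering happens after formatting, each callback can emit all of its lines unconditionally, and the filter alone decides what the requester sees.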
--- /dev/null
+/*
+ * Copyright (c) 2008, 2009 Nicira Networks.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef STATUS_H
+#define STATUS_H 1
+
+#include "compiler.h"
+
+struct nicira_header;
+struct rconn;
+struct ofproto;
+struct status_reply;
+
+struct switch_status *switch_status_create(const struct ofproto *);
+void switch_status_destroy(struct switch_status *);
+
+int switch_status_handle_request(struct switch_status *, struct rconn *,
+ struct nicira_header *);
+
+typedef void status_cb_func(struct status_reply *, void *aux);
+struct status_category *switch_status_register(struct switch_status *,
+ const char *category,
+ status_cb_func *, void *aux);
+void switch_status_unregister(struct status_category *);
+
+void status_reply_put(struct status_reply *, const char *, ...)
+ PRINTF_FORMAT(2, 3);
+
+void rconn_status_cb(struct status_reply *, void *rconn_);
+
+#endif /* status.h */
+++ /dev/null
-/Makefile
-/Makefile.in
-/secchan
-/secchan.8
+++ /dev/null
-# Copyright (C) 2009 Nicira Networks, Inc.
-#
-# Copying and distribution of this file, with or without modification,
-# are permitted in any medium without royalty provided the copyright
-# notice and this notice are preserved. This file is offered as-is,
-# without warranty of any kind.
-
-bin_PROGRAMS += secchan/secchan
-man_MANS += secchan/secchan.8
-
-secchan_secchan_SOURCES = secchan/main.c
-secchan_secchan_LDADD = \
- secchan/libsecchan.a \
- lib/libopenvswitch.a \
- $(FAULT_LIBS) \
- $(SSL_LIBS)
-
-noinst_LIBRARIES += secchan/libsecchan.a
-secchan_libsecchan_a_SOURCES = \
- secchan/discovery.c \
- secchan/discovery.h \
- secchan/executer.c \
- secchan/executer.h \
- secchan/fail-open.c \
- secchan/fail-open.h \
- secchan/in-band.c \
- secchan/in-band.h \
- secchan/netflow.c \
- secchan/netflow.h \
- secchan/ofproto.c \
- secchan/ofproto.h \
- secchan/pktbuf.c \
- secchan/pktbuf.h \
- secchan/pinsched.c \
- secchan/pinsched.h \
- secchan/status.c \
- secchan/status.h
-
-EXTRA_DIST += secchan/secchan.8.in
-DISTCLEANFILES += secchan/secchan.8
-
-include secchan/commands/automake.mk
+++ /dev/null
-commandsdir = ${pkgdatadir}/commands
-dist_commands_SCRIPTS = \
- secchan/commands/reboot
+++ /dev/null
-#! /bin/sh
-ovs-kill --force --signal=USR1 ovs-switchui.pid
-reboot
+++ /dev/null
-/*
- * Copyright (c) 2008, 2009 Nicira Networks.
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at:
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-#include <config.h>
-#include "discovery.h"
-#include <errno.h>
-#include <inttypes.h>
-#include <net/if.h>
-#include <regex.h>
-#include <stdlib.h>
-#include <string.h>
-#include "dhcp-client.h"
-#include "dhcp.h"
-#include "dpif.h"
-#include "netdev.h"
-#include "openflow/openflow.h"
-#include "packets.h"
-#include "status.h"
-#include "vconn-ssl.h"
-
-#define THIS_MODULE VLM_discovery
-#include "vlog.h"
-
-struct discovery {
- char *re;
- bool update_resolv_conf;
- regex_t *regex;
- struct dhclient *dhcp;
- int n_changes;
- struct status_category *ss_cat;
-};
-
-static void modify_dhcp_request(struct dhcp_msg *, void *aux);
-static bool validate_dhcp_offer(const struct dhcp_msg *, void *aux);
-
-static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(60, 60);
-
-static void
-discovery_status_cb(struct status_reply *sr, void *d_)
-{
- struct discovery *d = d_;
-
- status_reply_put(sr, "accept-remote=%s", d->re);
- status_reply_put(sr, "n-changes=%d", d->n_changes);
- if (d->dhcp) {
- status_reply_put(sr, "state=%s", dhclient_get_state(d->dhcp));
- status_reply_put(sr, "state-elapsed=%u",
- dhclient_get_state_elapsed(d->dhcp));
- if (dhclient_is_bound(d->dhcp)) {
- uint32_t ip = dhclient_get_ip(d->dhcp);
- uint32_t netmask = dhclient_get_netmask(d->dhcp);
- uint32_t router = dhclient_get_router(d->dhcp);
-
- const struct dhcp_msg *cfg = dhclient_get_config(d->dhcp);
- uint32_t dns_server;
- char *domain_name;
- int i;
-
- status_reply_put(sr, "ip="IP_FMT, IP_ARGS(&ip));
- status_reply_put(sr, "netmask="IP_FMT, IP_ARGS(&netmask));
- if (router) {
- status_reply_put(sr, "router="IP_FMT, IP_ARGS(&router));
- }
-
- for (i = 0; dhcp_msg_get_ip(cfg, DHCP_CODE_DNS_SERVER, i,
- &dns_server);
- i++) {
- status_reply_put(sr, "dns%d="IP_FMT, i, IP_ARGS(&dns_server));
- }
-
- domain_name = dhcp_msg_get_string(cfg, DHCP_CODE_DOMAIN_NAME);
- if (domain_name) {
- status_reply_put(sr, "domain=%s", domain_name);
- free(domain_name);
- }
-
- status_reply_put(sr, "lease-remaining=%u",
- dhclient_get_lease_remaining(d->dhcp));
- }
- }
-}
-
-int
-discovery_create(const char *re, bool update_resolv_conf,
- struct dpif *dpif, struct switch_status *ss,
- struct discovery **discoveryp)
-{
- struct discovery *d;
- char local_name[IF_NAMESIZE];
- int error;
-
- d = xcalloc(1, sizeof *d);
-
- /* Controller regular expression. */
- error = discovery_set_accept_controller_re(d, re);
- if (error) {
- goto error_free;
- }
- d->update_resolv_conf = update_resolv_conf;
-
- /* Initialize DHCP client. */
- error = dpif_get_name(dpif, local_name, sizeof local_name);
- if (error) {
- VLOG_ERR("failed to query datapath local port: %s", strerror(error));
- goto error_regfree;
- }
- error = dhclient_create(local_name, modify_dhcp_request,
- validate_dhcp_offer, d, &d->dhcp);
- if (error) {
- VLOG_ERR("failed to initialize DHCP client: %s", strerror(error));
- goto error_regfree;
- }
- dhclient_set_max_timeout(d->dhcp, 3);
- dhclient_init(d->dhcp, 0);
-
- d->ss_cat = switch_status_register(ss, "discovery",
- discovery_status_cb, d);
-
- *discoveryp = d;
- return 0;
-
-error_regfree:
- regfree(d->regex);
- free(d->regex);
- free(d->re);
-error_free:
- free(d);
- *discoveryp = 0;
- return error;
-}
-
-void
-discovery_destroy(struct discovery *d)
-{
- if (d) {
- free(d->re);
- regfree(d->regex);
- free(d->regex);
- dhclient_destroy(d->dhcp);
- switch_status_unregister(d->ss_cat);
- free(d);
- }
-}
-
-void
-discovery_set_update_resolv_conf(struct discovery *d,
- bool update_resolv_conf)
-{
- d->update_resolv_conf = update_resolv_conf;
-}
-
-int
-discovery_set_accept_controller_re(struct discovery *d, const char *re_)
-{
- regex_t *regex;
- int error;
- char *re;
-
- re = (!re_ ? xstrdup(vconn_ssl_is_configured() ? "^ssl:.*" : ".*")
- : re_[0] == '^' ? xstrdup(re_) : xasprintf("^%s", re_));
- regex = xmalloc(sizeof *regex);
- error = regcomp(regex, re, REG_NOSUB | REG_EXTENDED);
- if (error) {
- size_t length = regerror(error, regex, NULL, 0);
- char *buffer = xmalloc(length);
- regerror(error, regex, buffer, length);
- VLOG_WARN("%s: %s", re, buffer);
- free(buffer);
- free(regex);
- free(re);
- return EINVAL;
- } else {
- if (d->regex) {
- regfree(d->regex);
- free(d->regex);
- }
- free(d->re);
-
- d->regex = regex;
- d->re = re;
- return 0;
- }
-}
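discovery_set_accept_controller_re() anchors the accepted-controller expression at the start of the string by prepending '^' when it is missing. It compiles the result as a POSIX extended regular expression with REG_NOSUB, and validate_dhcp_offer() then tests each offered vconn name with regexec(). The sketch below shows that check, assuming a fixed-size buffer for the anchored pattern. accepts_controller() is a hypothetical helper and is not part of the original code.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>

/* Returns true if 'vconn_name' matches 're' after anchoring 're' at the
 * start of the string, as discovery_set_accept_controller_re() does.
 * Returns false if the expression fails to compile. */
static bool
accepts_controller(const char *re, const char *vconn_name)
{
    char anchored[256];
    regex_t regex;
    bool match;

    /* Prepend '^' unless the expression is already anchored. */
    if (re[0] == '^') {
        snprintf(anchored, sizeof anchored, "%s", re);
    } else {
        snprintf(anchored, sizeof anchored, "^%s", re);
    }

    if (regcomp(&regex, anchored, REG_NOSUB | REG_EXTENDED)) {
        return false;           /* Invalid expression: reject the offer. */
    }
    match = !regexec(&regex, vconn_name, 0, NULL, 0);
    regfree(&regex);
    return match;
}
```

Only a leading '^' is added, so the match is a prefix match: "tcp:" still accepts "tcp:1.2.3.4:6633" unless the expression itself ends in '$'.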
-
-void
-discovery_question_connectivity(struct discovery *d)
-{
- if (d->dhcp) {
- dhclient_force_renew(d->dhcp, 15);
- }
-}
-
-bool
-discovery_run(struct discovery *d, char **controller_name)
-{
- if (!d->dhcp) {
- *controller_name = NULL;
- return true;
- }
-
- dhclient_run(d->dhcp);
- if (!dhclient_changed(d->dhcp)) {
- return false;
- }
-
- dhclient_configure_netdev(d->dhcp);
- if (d->update_resolv_conf) {
- dhclient_update_resolv_conf(d->dhcp);
- }
-
- if (dhclient_is_bound(d->dhcp)) {
- *controller_name = dhcp_msg_get_string(dhclient_get_config(d->dhcp),
- DHCP_CODE_OFP_CONTROLLER_VCONN);
- VLOG_INFO("%s: discovered controller", *controller_name);
- d->n_changes++;
- } else {
- *controller_name = NULL;
- if (d->n_changes) {
- VLOG_INFO("discovered controller no longer available");
- d->n_changes++;
- }
- }
- return true;
-}
-
-void
-discovery_wait(struct discovery *d)
-{
- if (d->dhcp) {
- dhclient_wait(d->dhcp);
- }
-}
-
-static void
-modify_dhcp_request(struct dhcp_msg *msg, void *aux UNUSED)
-{
- dhcp_msg_put_string(msg, DHCP_CODE_VENDOR_CLASS, "OpenFlow");
-}
-
-static bool
-validate_dhcp_offer(const struct dhcp_msg *msg, void *d_)
-{
- const struct discovery *d = d_;
- char *vconn_name;
- bool accept;
-
- vconn_name = dhcp_msg_get_string(msg, DHCP_CODE_OFP_CONTROLLER_VCONN);
- if (!vconn_name) {
- VLOG_WARN_RL(&rl, "rejecting DHCP offer missing controller vconn");
- return false;
- }
- accept = !regexec(d->regex, vconn_name, 0, NULL, 0);
- if (!accept) {
- VLOG_WARN_RL(&rl, "rejecting controller vconn that fails to match %s",
- d->re);
- }
- free(vconn_name);
- return accept;
-}
+++ /dev/null
-/*
- * Copyright (c) 2008, 2009 Nicira Networks.
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at:
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-#ifndef DISCOVERY_H
-#define DISCOVERY_H 1
-
-#include <stdbool.h>
-
-struct dpif;
-struct discovery;
-struct settings;
-struct switch_status;
-
-int discovery_create(const char *accept_controller_re, bool update_resolv_conf,
- struct dpif *, struct switch_status *,
- struct discovery **);
-void discovery_destroy(struct discovery *);
-void discovery_set_update_resolv_conf(struct discovery *,
- bool update_resolv_conf);
-int discovery_set_accept_controller_re(struct discovery *, const char *re);
-void discovery_question_connectivity(struct discovery *);
-bool discovery_run(struct discovery *, char **controller_name);
-void discovery_wait(struct discovery *);
-
-#endif /* discovery.h */
+++ /dev/null
-/*
- * Copyright (c) 2008, 2009 Nicira Networks.
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at:
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-#include <config.h>
-#include "executer.h"
-#include <errno.h>
-#include <fcntl.h>
-#include <fnmatch.h>
-#include <poll.h>
-#include <signal.h>
-#include <stdlib.h>
-#include <sys/stat.h>
-#include <sys/wait.h>
-#include <string.h>
-#include <unistd.h>
-#include "dirs.h"
-#include "dynamic-string.h"
-#include "fatal-signal.h"
-#include "openflow/nicira-ext.h"
-#include "ofpbuf.h"
-#include "openflow/openflow.h"
-#include "poll-loop.h"
-#include "rconn.h"
-#include "socket-util.h"
-#include "util.h"
-#include "vconn.h"
-
-#define THIS_MODULE VLM_executer
-#include "vlog.h"
-
-#define MAX_CHILDREN 8
-
-struct child {
- /* Information about child process. */
- char *name; /* argv[0] passed to child. */
- pid_t pid; /* Child's process ID. */
-
- /* For sending a reply to the controller when the child dies. */
- struct rconn *rconn;
- uint32_t xid; /* Transaction ID used by controller. */
-
- /* We read up to MAX_OUTPUT bytes of output and send them back to the
- * controller when the child dies. */
-#define MAX_OUTPUT 4096
- int output_fd; /* FD from which to read child's output. */
- uint8_t *output; /* Output data. */
- size_t output_size; /* Number of bytes of output data so far. */
-};
-
-struct executer {
- /* Settings. */
- char *command_acl; /* Command white/blacklist, as shell globs. */
- char *command_dir; /* Directory that contains commands. */
-
- /* Children. */
- struct child children[MAX_CHILDREN];
- size_t n_children;
-};
-
-/* File descriptors for waking up when a child dies. */
-static int signal_fds[2] = {-1, -1};
-
-static void send_child_status(struct rconn *, uint32_t xid, uint32_t status,
- const void *data, size_t size);
-static void send_child_message(struct rconn *, uint32_t xid, uint32_t status,
- const char *message);
-
-/* Returns true if 'cmd' is allowed by 'acl', which is a comma-separated
- * access control list in the format described for --command-acl in
- * secchan(8). */
-static bool
-executer_is_permitted(const char *acl_, const char *cmd)
-{
- char *acl, *save_ptr, *pattern;
- bool allowed, denied;
-
- /* Verify that 'cmd' consists only of alphanumerics plus _ or -. */
- if (cmd[strspn(cmd, "abcdefghijklmnopqrstuvwxyz"
- "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-")] != '\0') {
- VLOG_WARN("rejecting command name \"%s\" that contains forbidden "
- "characters", cmd);
- return false;
- }
-
- /* Check 'cmd' against 'acl'. */
- acl = xstrdup(acl_);
- save_ptr = acl;
- allowed = denied = false;
- while ((pattern = strsep(&save_ptr, ",")) != NULL && !denied) {
- if (pattern[0] != '!' && !fnmatch(pattern, cmd, 0)) {
- allowed = true;
- } else if (pattern[0] == '!' && !fnmatch(pattern + 1, cmd, 0)) {
- denied = true;
- }
- }
- free(acl);
-
- /* Check the command white/blacklisted state. */
- if (allowed && !denied) {
- VLOG_INFO("permitting command execution: \"%s\" is whitelisted", cmd);
- } else if (allowed && denied) {
- VLOG_WARN("denying command execution: \"%s\" is both blacklisted "
- "and whitelisted", cmd);
- } else if (denied) {
- VLOG_WARN("denying command execution: \"%s\" is blacklisted", cmd);
- } else {
- VLOG_WARN("denying command execution: \"%s\" is not whitelisted", cmd);
- }
- return allowed && !denied;
-}
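executer_is_permitted() treats the ACL as a comma-separated list of shell globs, where a leading '!' marks a blacklist entry. A command runs only if some whitelist glob matches it and no blacklist glob does. Here is a standalone sketch of that rule using POSIX fnmatch(). acl_permits() is a hypothetical name. The sketch walks the list with strchr() instead of the original's strsep(), and it skips entries longer than an assumed 127-character limit.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if 'cmd' matches at least one whitelist glob in 'acl' and no
 * blacklist glob (an entry that starts with '!').  Like the original, the
 * scan stops as soon as a blacklist entry matches. */
static bool
acl_permits(const char *acl, const char *cmd)
{
    bool allowed = false, denied = false;
    const char *p = acl;

    while (!denied) {
        const char *comma = strchr(p, ',');
        size_t len = comma ? (size_t) (comma - p) : strlen(p);
        char pattern[128];

        if (len < sizeof pattern) {
            /* Copy the current entry so that it is NUL-terminated. */
            memcpy(pattern, p, len);
            pattern[len] = '\0';
            if (pattern[0] == '!') {
                denied = !fnmatch(pattern + 1, cmd, 0);
            } else if (!fnmatch(pattern, cmd, 0)) {
                allowed = true;
            }
        }
        if (!comma) {
            break;
        }
        p = comma + 1;
    }
    return allowed && !denied;
}
```

With "*,!reboot", every command except reboot is permitted, because a matching blacklist entry overrides the catch-all whitelist glob.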
-
-int
-executer_handle_request(struct executer *e, struct rconn *rconn,
- struct nicira_header *request)
-{
- char **argv;
- char *args;
- char *exec_file = NULL;
- int max_fds;
- struct stat s;
- size_t args_size;
- size_t argc;
- size_t i;
- pid_t pid;
- int output_fds[2];
-
- /* Verify limit on children not exceeded.
- * XXX should probably kill children when the connection drops? */
- if (e->n_children >= MAX_CHILDREN) {
- send_child_message(rconn, request->header.xid, NXT_STATUS_ERROR,
- "too many child processes");
- return 0;
- }
-
- /* Copy argument buffer, adding a null terminator at the end. Now every
- * argument is null-terminated, instead of being merely null-delimited. */
- args_size = ntohs(request->header.length) - sizeof *request;
- args = xmemdup0((const void *) (request + 1), args_size);
-
- /* Count arguments. */
- argc = 0;
- for (i = 0; i <= args_size; i++) {
- argc += args[i] == '\0';
- }
-
- /* Set argv[*] to point to each argument. */
- argv = xmalloc((argc + 1) * sizeof *argv);
- argv[0] = args;
- for (i = 1; i < argc; i++) {
- argv[i] = strchr(argv[i - 1], '\0') + 1;
- }
- argv[argc] = NULL;
-
- /* Check permissions. */
- if (!executer_is_permitted(e->command_acl, argv[0])) {
- send_child_message(rconn, request->header.xid, NXT_STATUS_ERROR,
- "command not allowed");
- goto done;
- }
-
- /* Find the executable. */
- exec_file = xasprintf("%s/%s", e->command_dir, argv[0]);
- if (stat(exec_file, &s)) {
- VLOG_WARN("failed to stat \"%s\": %s", exec_file, strerror(errno));
- send_child_message(rconn, request->header.xid, NXT_STATUS_ERROR,
- "command not allowed");
- goto done;
- }
- if (!S_ISREG(s.st_mode)) {
- VLOG_WARN("\"%s\" is not a regular file", exec_file);
- send_child_message(rconn, request->header.xid, NXT_STATUS_ERROR,
- "command not allowed");
- goto done;
- }
- argv[0] = exec_file;
-
- /* Arrange to capture output. */
- if (pipe(output_fds)) {
- VLOG_WARN("pipe failed: %s", strerror(errno));
- send_child_message(rconn, request->header.xid, NXT_STATUS_ERROR,
- "internal error (pipe)");
- goto done;
- }
-
- pid = fork();
- if (!pid) {
- /* Running in child.
- * XXX should run in new process group so that we can signal all
- * subprocesses at once? Would also want to catch fatal signals and
- * kill them at the same time though. */
- fatal_signal_fork();
- dup2(get_null_fd(), 0);
- dup2(output_fds[1], 1);
- dup2(get_null_fd(), 2);
- max_fds = get_max_fds();
- for (i = 3; i < max_fds; i++) {
- close(i);
- }
- if (chdir(e->command_dir)) {
- printf("could not change directory to \"%s\": %s\n",
- e->command_dir, strerror(errno));
- exit(EXIT_FAILURE);
- }
- execv(argv[0], argv);
- printf("failed to start \"%s\": %s\n", argv[0], strerror(errno));
- exit(EXIT_FAILURE);
- } else if (pid > 0) {
- /* Running in parent. */
- struct child *child;
-
- VLOG_INFO("started \"%s\" subprocess", argv[0]);
- send_child_status(rconn, request->header.xid, NXT_STATUS_STARTED,
- NULL, 0);
- child = &e->children[e->n_children++];
- child->name = xstrdup(argv[0]);
- child->pid = pid;
- child->rconn = rconn;
- child->xid = request->header.xid;
- child->output_fd = output_fds[0];
- child->output = xmalloc(MAX_OUTPUT);
- child->output_size = 0;
- set_nonblocking(output_fds[0]);
- close(output_fds[1]);
- } else {
- VLOG_WARN("fork failed: %s", strerror(errno));
- send_child_message(rconn, request->header.xid, NXT_STATUS_ERROR,
- "internal error (fork)");
- close(output_fds[0]);
- close(output_fds[1]);
- }
-
-done:
- free(exec_file);
- free(args);
- free(argv);
- return 0;
-}
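executer_handle_request() receives its arguments as NUL-delimited strings. It copies them with one extra NUL appended so that the last argument is terminated too, counts the NULs to get argc, and walks the buffer with strchr() to fill argv. The sketch below shows that splitting step in isolation, using malloc() in place of the project's xmemdup0() and xmalloc(). split_args() is a hypothetical helper.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Splits the 'size'-byte buffer 'data', whose arguments are separated by NUL
 * bytes, into a NULL-terminated argv array and stores the argument count in
 * '*argcp'.  The caller frees argv[0] (the copied buffer) and then argv.
 * Returns NULL if memory allocation fails. */
static char **
split_args(const void *data, size_t size, size_t *argcp)
{
    char *args = malloc(size + 1);
    char **argv;
    size_t argc = 0;
    size_t i;

    if (!args) {
        return NULL;
    }
    memcpy(args, data, size);
    args[size] = '\0';          /* Terminate the last argument too. */

    /* Every NUL, including the one just appended, ends one argument. */
    for (i = 0; i <= size; i++) {
        argc += args[i] == '\0';
    }

    argv = malloc((argc + 1) * sizeof *argv);
    if (!argv) {
        free(args);
        return NULL;
    }
    argv[0] = args;
    for (i = 1; i < argc; i++) {
        argv[i] = strchr(argv[i - 1], '\0') + 1;
    }
    argv[argc] = NULL;
    *argcp = argc;
    return argv;
}
```

Because every NUL is counted, a sender that terminates its final argument with a NUL produces an extra empty argument at the end. The sketch shares this behavior with the original.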
-
-static void
-send_child_status(struct rconn *rconn, uint32_t xid, uint32_t status,
- const void *data, size_t size)
-{
- if (rconn) {
- struct nx_command_reply *r;
- struct ofpbuf *buffer;
-
- r = make_openflow_xid(sizeof *r, OFPT_VENDOR, xid, &buffer);
- r->nxh.vendor = htonl(NX_VENDOR_ID);
- r->nxh.subtype = htonl(NXT_COMMAND_REPLY);
- r->status = htonl(status);
- ofpbuf_put(buffer, data, size);
- update_openflow_length(buffer);
- if (rconn_send(rconn, buffer, NULL)) {
- ofpbuf_delete(buffer);
- }
- }
-}
-
-static void
-send_child_message(struct rconn *rconn, uint32_t xid, uint32_t status,
- const char *message)
-{
- send_child_status(rconn, xid, status, message, strlen(message));
-}
-
-/* 'child' died with 'status' as its return code. Deal with it. */
-static void
-child_terminated(struct child *child, int status)
-{
- struct ds ds;
- uint32_t ofp_status;
-
- /* Log how it terminated. */
- ds_init(&ds);
- if (WIFEXITED(status)) {
- ds_put_format(&ds, "normally with status %d", WEXITSTATUS(status));
- } else if (WIFSIGNALED(status)) {
- const char *name = NULL;
-#ifdef HAVE_STRSIGNAL
- name = strsignal(WTERMSIG(status));
-#endif
- ds_put_format(&ds, "by signal %d", WTERMSIG(status));
- if (name) {
- ds_put_format(&ds, " (%s)", name);
- }
- }
- if (WIFSIGNALED(status) && WCOREDUMP(status)) {
- ds_put_cstr(&ds, " (core dumped)");
- }
- VLOG_INFO("child process \"%s\" with pid %ld terminated %s",
- child->name, (long int) child->pid, ds_cstr(&ds));
- ds_destroy(&ds);
-
- /* Send a status message back to the controller that requested the
- * command. */
- if (WIFEXITED(status)) {
- ofp_status = WEXITSTATUS(status) | NXT_STATUS_EXITED;
- } else if (WIFSIGNALED(status)) {
- ofp_status = WTERMSIG(status) | NXT_STATUS_SIGNALED;
- } else {
- ofp_status = NXT_STATUS_UNKNOWN;
- }
- if (WIFSIGNALED(status) && WCOREDUMP(status)) {
- ofp_status |= NXT_STATUS_COREDUMP;
- }
- send_child_status(child->rconn, child->xid, ofp_status,
- child->output, child->output_size);
-}
-
-/* Read output from 'child' and append it to its output buffer. */
-static void
-poll_child(struct child *child)
-{
- ssize_t n;
-
- if (child->output_fd < 0) {
- return;
- }
-
- do {
- n = read(child->output_fd, child->output + child->output_size,
- MAX_OUTPUT - child->output_size);
- } while (n < 0 && errno == EINTR);
- if (n > 0) {
- child->output_size += n;
- if (child->output_size < MAX_OUTPUT) {
- return;
- }
- } else if (n < 0 && errno == EAGAIN) {
- return;
- }
- close(child->output_fd);
- child->output_fd = -1;
-}
-
-void
-executer_run(struct executer *e)
-{
- char buffer[MAX_CHILDREN];
- size_t i;
-
- if (!e->n_children) {
- return;
- }
-
- /* Read output from children. */
- for (i = 0; i < e->n_children; i++) {
- struct child *child = &e->children[i];
- poll_child(child);
- }
-
- /* If SIGCHLD was received, reap dead children. */
- if (read(signal_fds[0], buffer, sizeof buffer) <= 0) {
- return;
- }
- for (;;) {
- int status;
- pid_t pid;
-
- /* Get dead child in 'pid' and its return code in 'status'. */
- pid = waitpid(WAIT_ANY, &status, WNOHANG);
- if (pid < 0 && errno == EINTR) {
- continue;
- } else if (pid <= 0) {
- return;
- }
-
- /* Find child with given 'pid' and drop it from the list. */
- for (i = 0; i < e->n_children; i++) {
- struct child *child = &e->children[i];
- if (child->pid == pid) {
- poll_child(child);
- child_terminated(child, status);
- free(child->name);
- free(child->output);
- *child = e->children[--e->n_children];
- goto found;
- }
- }
- VLOG_WARN("child with unknown pid %ld terminated", (long int) pid);
- found:;
- }
-
-}
-
-void
-executer_wait(struct executer *e)
-{
- if (e->n_children) {
- size_t i;
-
- /* Wake up on SIGCHLD. */
- poll_fd_wait(signal_fds[0], POLLIN);
-
- /* Wake up when we get output from a child. */
- for (i = 0; i < e->n_children; i++) {
- struct child *child = &e->children[i];
- if (child->output_fd >= 0) {
- poll_fd_wait(child->output_fd, POLLIN);
- }
- }
- }
-}
-
-void
-executer_rconn_closing(struct executer *e, struct rconn *rconn)
-{
- size_t i;
-
- /* If any of our children was connected to 'rconn', then disconnect it so
- * we don't try to reference a dead connection when the process terminates
- * later.
- * XXX kill the children started by 'rconn'? */
- for (i = 0; i < e->n_children; i++) {
- if (e->children[i].rconn == rconn) {
- e->children[i].rconn = NULL;
- }
- }
-}
-
-static void
-sigchld_handler(int signr UNUSED)
-{
- write(signal_fds[1], "", 1);
-}
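The executer uses the self-pipe technique to learn about SIGCHLD from its poll loop. The signal handler only writes one byte to a nonblocking pipe, executer_wait() polls the read end, and executer_run() drains the pipe and reaps children with waitpid(WNOHANG). The following is a minimal standalone sketch of the same pattern, assuming plain poll() in place of the project's poll_fd_wait(). sigchld_pipe_init() and sigchld_pipe_reap() are hypothetical names.

```c
#define _XOPEN_SOURCE 700       /* For sigaction() and SA_RESTART. */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Read end [0] and write end [1] of the self-pipe. */
static int sigchld_fds[2] = {-1, -1};

/* SIGCHLD handler.  write() is async-signal-safe. */
static void
on_sigchld(int signr)
{
    int saved_errno = errno;

    (void) signr;
    if (write(sigchld_fds[1], "", 1) < 0) {
        /* Ignored: a full pipe already guarantees a pending wakeup. */
    }
    errno = saved_errno;
}

/* Creates the nonblocking self-pipe and installs the SIGCHLD handler.
 * Returns 0 on success, otherwise an errno value. */
static int
sigchld_pipe_init(void)
{
    struct sigaction sa;
    int i;

    if (pipe(sigchld_fds)) {
        return errno;
    }
    for (i = 0; i < 2; i++) {
        int flags = fcntl(sigchld_fds[i], F_GETFL);
        if (flags < 0
            || fcntl(sigchld_fds[i], F_SETFL, flags | O_NONBLOCK) < 0) {
            return errno;
        }
    }

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_NOCLDSTOP | SA_RESTART;
    return sigaction(SIGCHLD, &sa, NULL) ? errno : 0;
}

/* Waits up to 'timeout_ms' milliseconds for SIGCHLD, drains the pipe, and
 * reaps every child that has exited.  Returns the number of children reaped. */
static int
sigchld_pipe_reap(int timeout_ms)
{
    struct pollfd pfd;
    char buffer[16];
    int n_reaped = 0;
    int retval;

    pfd.fd = sigchld_fds[0];
    pfd.events = POLLIN;
    pfd.revents = 0;
    do {
        retval = poll(&pfd, 1, timeout_ms);
    } while (retval < 0 && errno == EINTR);
    if (retval <= 0) {
        return 0;
    }

    while (read(sigchld_fds[0], buffer, sizeof buffer) > 0) {
        continue;               /* Discard queued wakeup bytes. */
    }
    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, WNOHANG);

        if (pid < 0 && errno == EINTR) {
            continue;
        } else if (pid <= 0) {
            break;              /* No more dead children (or none at all). */
        }
        n_reaped++;
    }
    return n_reaped;
}
```

poll() is retried on EINTR because Linux does not restart it even with SA_RESTART. If the interruption came from SIGCHLD, the handler has already queued a byte, so the retried call returns immediately.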
-
-int
-executer_create(const char *command_acl, const char *command_dir,
- struct executer **executerp)
-{
- struct executer *e;
- struct sigaction sa;
-
- *executerp = NULL;
- if (signal_fds[0] == -1) {
- /* Make sure we can get a fd for /dev/null. */
- int null_fd = get_null_fd();
- if (null_fd < 0) {
- return -null_fd;
- }
-
- /* Create pipe for notifying us that SIGCHLD was invoked. */
- if (pipe(signal_fds)) {
- VLOG_ERR("pipe failed: %s", strerror(errno));
- return errno;
- }
- set_nonblocking(signal_fds[0]);
- set_nonblocking(signal_fds[1]);
- }
-
- /* Set up signal handler. */
- memset(&sa, 0, sizeof sa);
- sa.sa_handler = sigchld_handler;
- sigemptyset(&sa.sa_mask);
- sa.sa_flags = SA_NOCLDSTOP | SA_RESTART;
- if (sigaction(SIGCHLD, &sa, NULL)) {
- VLOG_ERR("sigaction(SIGCHLD) failed: %s", strerror(errno));
- return errno;
- }
-
- e = xcalloc(1, sizeof *e);
- e->command_acl = xstrdup(command_acl);
- e->command_dir = (command_dir
- ? xstrdup(command_dir)
- : xasprintf("%s/commands", ovs_pkgdatadir));
- e->n_children = 0;
- *executerp = e;
- return 0;
-}
-
-void
-executer_destroy(struct executer *e)
-{
- if (e) {
- size_t i;
-
- free(e->command_acl);
- free(e->command_dir);
- for (i = 0; i < e->n_children; i++) {
- struct child *child = &e->children[i];
-
- free(child->name);
- kill(child->pid, SIGHUP);
- /* We don't own child->rconn. */
- free(child->output);
- }
- free(e);
- }
-}
-
-void
-executer_set_acl(struct executer *e, const char *acl, const char *dir)
-{
- free(e->command_acl);
- e->command_acl = xstrdup(acl);
- free(e->command_dir);
- e->command_dir = xstrdup(dir);
-}
+++ /dev/null
-/*
- * Copyright (c) 2008, 2009 Nicira Networks.
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at:
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-#ifndef EXECUTER_H
-#define EXECUTER_H 1
-
-struct executer;
-struct nicira_header;
-struct rconn;
-
-int executer_create(const char *acl, const char *dir, struct executer **);
-void executer_set_acl(struct executer *, const char *acl, const char *dir);
-void executer_destroy(struct executer *);
-void executer_run(struct executer *);
-void executer_wait(struct executer *);
-void executer_rconn_closing(struct executer *, struct rconn *);
-int executer_handle_request(struct executer *, struct rconn *,
- struct nicira_header *);
-
-#endif /* executer.h */
+++ /dev/null
-/*
- * Copyright (c) 2008, 2009 Nicira Networks.
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at:
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-#include <config.h>
-#include "fail-open.h"
-#include <inttypes.h>
-#include <stdlib.h>
-#include "flow.h"
-#include "mac-learning.h"
-#include "odp-util.h"
-#include "ofpbuf.h"
-#include "ofproto.h"
-#include "pktbuf.h"
-#include "poll-loop.h"
-#include "rconn.h"
-#include "status.h"
-#include "timeval.h"
-#include "vconn.h"
-
-#define THIS_MODULE VLM_fail_open
-#include "vlog.h"
-
-/*
- * Fail-open mode.
- *
- * In fail-open mode, the switch detects when the controller cannot be
- * contacted or when the controller is dropping switch connections because the
- * switch does not pass its admission control policy. In those situations the
- * switch sets up flows itself using the "normal" action.
- *
- * There is a little subtlety to the implementation, to properly handle the case
- * where the controller allows switch connections but drops them a few seconds
- * later for admission control reasons. Because of this case, we don't want to
- * just stop setting up flows when we connect to the controller: if we did,
- * then new flow setup and existing flows would stop for the duration of
- * connection to the controller, and thus the whole network would go down for
- * that period of time.
- *
- * So, instead, we add some special cases when we are connected to a controller,
- * but not yet sure that it has admitted us:
- *
- * - We set up flows immediately ourselves, but simultaneously send out an
- * OFPT_PACKET_IN to the controller. We put a special bogus buffer-id in
- * these OFPT_PACKET_IN messages so that duplicate packets don't get sent
- * out to the network when the controller replies.
- *
- * - We also send out OFPT_PACKET_IN messages for totally bogus packets
- * every so often, in case no real new flows are arriving in the network.
- *
- * - We don't flush the flow table at the time we connect, because this
- * could cause network stuttering in a switch with lots of flows or very
- * high-bandwidth flows by suddenly throwing lots of packets down to
- * userspace.
- */
-
-struct fail_open {
- struct ofproto *ofproto;
- struct rconn *controller;
- int trigger_duration;
- int last_disconn_secs;
- struct status_category *ss_cat;
- long long int next_bogus_packet_in;
- struct rconn_packet_counter *bogus_packet_counter;
-};
-
-/* Returns true if 'fo' should be in fail-open mode, otherwise false. */
-static inline bool
-should_fail_open(const struct fail_open *fo)
-{
- return rconn_failure_duration(fo->controller) >= fo->trigger_duration;
-}
-
-/* Returns true if 'fo' is currently in fail-open mode, otherwise false. */
-bool
-fail_open_is_active(const struct fail_open *fo)
-{
- return fo->last_disconn_secs != 0;
-}
-
-static void
-send_bogus_packet_in(struct fail_open *fo)
-{
- uint8_t mac[ETH_ADDR_LEN];
- struct ofpbuf *opi;
- struct ofpbuf b;
-
- /* Compose ofp_packet_in. */
- ofpbuf_init(&b, 128);
- eth_addr_random(mac);
- compose_benign_packet(&b, "Open vSwitch Controller Probe", 0xa033, mac);
- opi = make_packet_in(pktbuf_get_null(), OFPP_LOCAL, OFPR_NO_MATCH, &b, 64);
- ofpbuf_uninit(&b);
-
- /* Send. */
- rconn_send_with_limit(fo->controller, opi, fo->bogus_packet_counter, 1);
-}
-
-/* Enter fail-open mode if we should be in it. Handle reconnecting to a
- * controller from fail-open mode. */
-void
-fail_open_run(struct fail_open *fo)
-{
- /* Enter fail-open mode if 'fo' is not in it but should be. */
- if (should_fail_open(fo)) {
- int disconn_secs = rconn_failure_duration(fo->controller);
- if (!fail_open_is_active(fo)) {
- VLOG_WARN("Could not connect to controller (or switch failed "
- "controller's post-connection admission control "
- "policy) for %d seconds, failing open", disconn_secs);
- fo->last_disconn_secs = disconn_secs;
-
- /* Flush all OpenFlow and datapath flows. We will set up our
- * fail-open rule from fail_open_flushed() when
- * ofproto_flush_flows() calls back to us. */
- ofproto_flush_flows(fo->ofproto);
- } else if (disconn_secs > fo->last_disconn_secs + 60) {
- VLOG_INFO("Still in fail-open mode after %d seconds disconnected "
- "from controller", disconn_secs);
- fo->last_disconn_secs = disconn_secs;
- }
- }
-
- /* Schedule a bogus packet-in if we're connected and in fail-open. */
- if (fail_open_is_active(fo)) {
- if (rconn_is_connected(fo->controller)) {
- bool expired = time_msec() >= fo->next_bogus_packet_in;
- if (expired) {
- send_bogus_packet_in(fo);
- }
- if (expired || fo->next_bogus_packet_in == LLONG_MAX) {
- fo->next_bogus_packet_in = time_msec() + 2000;
- }
- } else {
- fo->next_bogus_packet_in = LLONG_MAX;
- }
- }
-
-}
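The bogus packet-in schedule in fail_open_run() relies on LLONG_MAX as a "timer not armed" sentinel, which fail_open_wait() also checks. The sketch below isolates that logic so the state transitions are easy to follow. It is a hypothetical model, not code from the tree. It assumes plain millisecond integers in place of time_msec(). It also leaves out the fail_open_is_active() guard that wraps this logic in the original.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical model of the bogus packet-in timer in fail_open_run().
 * LLONG_MAX means "no timer armed", as in the original. */
#define BOGUS_INTERVAL_MS 2000

struct bogus_timer {
    long long int next;         /* Deadline in ms, or LLONG_MAX if unarmed. */
};

/* Returns true if a bogus packet-in should be sent at 'now_ms'.  While
 * connected, the first call only arms the timer; later calls fire once the
 * deadline passes and re-arm it 2 s ahead.  While disconnected, the timer
 * is disarmed. */
static bool
bogus_timer_run(struct bogus_timer *t, long long int now_ms, bool connected)
{
    bool expired;

    assert(t);
    if (!connected) {
        t->next = LLONG_MAX;
        return false;
    }
    expired = now_ms >= t->next;
    if (expired || t->next == LLONG_MAX) {
        t->next = now_ms + BOGUS_INTERVAL_MS;
    }
    return expired;
}
```

Because the timer is only armed on the first connected pass, a freshly reconnected switch waits one full interval before probing the controller.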
-
-/* If 'fo' is currently in fail-open mode and its rconn has connected to the
- * controller and been admitted, exits fail-open mode. */
-void
-fail_open_maybe_recover(struct fail_open *fo)
-{
- if (fail_open_is_active(fo) && rconn_is_admitted(fo->controller)) {
- flow_t flow;
-
- VLOG_WARN("No longer in fail-open mode");
- fo->last_disconn_secs = 0;
- fo->next_bogus_packet_in = LLONG_MAX;
-
- memset(&flow, 0, sizeof flow);
- ofproto_delete_flow(fo->ofproto, &flow, OFPFW_ALL, FAIL_OPEN_PRIORITY);
- }
-}
-
-void
-fail_open_wait(struct fail_open *fo)
-{
- if (fo->next_bogus_packet_in != LLONG_MAX) {
- poll_timer_wait(fo->next_bogus_packet_in - time_msec());
- }
-}
-
-void
-fail_open_flushed(struct fail_open *fo)
-{
- int disconn_secs = rconn_failure_duration(fo->controller);
- bool open = disconn_secs >= fo->trigger_duration;
- if (open) {
- union ofp_action action;
- flow_t flow;
-
- /* Set up a flow that matches every packet and directs them to
- * OFPP_NORMAL. */
- memset(&action, 0, sizeof action);
- action.type = htons(OFPAT_OUTPUT);
- action.output.len = htons(sizeof action);
- action.output.port = htons(OFPP_NORMAL);
- memset(&flow, 0, sizeof flow);
- ofproto_add_flow(fo->ofproto, &flow, OFPFW_ALL, FAIL_OPEN_PRIORITY,
- &action, 1, 0);
- }
-}
-
-static void
-fail_open_status_cb(struct status_reply *sr, void *fo_)
-{
- struct fail_open *fo = fo_;
- int cur_duration = rconn_failure_duration(fo->controller);
-
- status_reply_put(sr, "trigger-duration=%d", fo->trigger_duration);
- status_reply_put(sr, "current-duration=%d", cur_duration);
- status_reply_put(sr, "triggered=%s",
- cur_duration >= fo->trigger_duration ? "true" : "false");
-}
-
-struct fail_open *
-fail_open_create(struct ofproto *ofproto,
- int trigger_duration, struct switch_status *switch_status,
- struct rconn *controller)
-{
- struct fail_open *fo = xmalloc(sizeof *fo);
- fo->ofproto = ofproto;
- fo->controller = controller;
- fo->trigger_duration = trigger_duration;
- fo->last_disconn_secs = 0;
- fo->ss_cat = switch_status_register(switch_status, "fail-open",
- fail_open_status_cb, fo);
- fo->next_bogus_packet_in = LLONG_MAX;
- fo->bogus_packet_counter = rconn_packet_counter_create();
- return fo;
-}
-
-void
-fail_open_set_trigger_duration(struct fail_open *fo, int trigger_duration)
-{
- fo->trigger_duration = trigger_duration;
-}
-
-void
-fail_open_destroy(struct fail_open *fo)
-{
- if (fo) {
- /* We don't own fo->controller. */
- switch_status_unregister(fo->ss_cat);
- rconn_packet_counter_destroy(fo->bogus_packet_counter);
- free(fo);
- }
-}
+++ /dev/null
-/*
- * Copyright (c) 2008, 2009 Nicira Networks.
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at:
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-#ifndef FAIL_OPEN_H
-#define FAIL_OPEN_H 1
-
-#include <stdbool.h>
-#include <stdint.h>
-#include "flow.h"
-
-struct fail_open;
-struct ofproto;
-struct rconn;
-struct switch_status;
-
-/* Priority of the rule added by the fail-open subsystem when a switch enters
- * fail-open mode. This priority value uniquely identifies a fail-open flow
- * (OpenFlow priorities max out at 65535 and nothing else in Open vSwitch
- * creates flows with this priority). */
-#define FAIL_OPEN_PRIORITY 70000
-
-struct fail_open *fail_open_create(struct ofproto *, int trigger_duration,
- struct switch_status *,
- struct rconn *controller);
-void fail_open_set_trigger_duration(struct fail_open *, int trigger_duration);
-void fail_open_destroy(struct fail_open *);
-void fail_open_wait(struct fail_open *);
-bool fail_open_is_active(const struct fail_open *);
-void fail_open_run(struct fail_open *);
-void fail_open_maybe_recover(struct fail_open *);
-void fail_open_flushed(struct fail_open *);
-
-#endif /* fail-open.h */
+++ /dev/null
-/*
- * Copyright (c) 2008, 2009 Nicira Networks.
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at:
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-#include <config.h>
-#include "in-band.h"
-#include <arpa/inet.h>
-#include <errno.h>
-#include <inttypes.h>
-#include <net/if.h>
-#include <string.h>
-#include <stdlib.h>
-#include "dhcp.h"
-#include "dpif.h"
-#include "flow.h"
-#include "mac-learning.h"
-#include "netdev.h"
-#include "odp-util.h"
-#include "ofp-print.h"
-#include "ofproto.h"
-#include "ofpbuf.h"
-#include "openflow/openflow.h"
-#include "openvswitch/datapath-protocol.h"
-#include "packets.h"
-#include "poll-loop.h"
-#include "rconn.h"
-#include "status.h"
-#include "timeval.h"
-#include "vconn.h"
-
-#define THIS_MODULE VLM_in_band
-#include "vlog.h"
-
-/* In-band control allows a single network to be used for OpenFlow
- * traffic and other data traffic. Refer to ovs-vswitchd.conf(5) and
- * secchan(8) for a description of configuring in-band control.
- *
- * This comment is an attempt to describe how in-band control works at a
- * wire- and implementation-level. Correctly implementing in-band
- * control has proven difficult due to its many subtleties, and has thus
- * gone through many iterations. Please read through and understand the
- * reasoning behind the chosen rules before making modifications.
- *
- * In Open vSwitch, in-band control is implemented as "hidden" flows (in
- * that they are not visible through OpenFlow) and at a higher priority
- * than any wildcarded flow that the controller can set up. This is done
- * so that the controller cannot interfere with them and possibly break
- * connectivity with its switches. It is possible to see all flows,
- * including in-band ones, with the ovs-appctl "bridge/dump-flows"
- * command.
- *
- * The following rules are always enabled with the "normal" action by a
- * switch with in-band control:
- *
- * a. DHCP requests sent from the local port.
- * b. ARP replies to the local port's MAC address.
- * c. ARP requests from the local port's MAC address.
- * d. ARP replies to the remote side's MAC address. Note that the
- * remote side is either the controller or the gateway to reach
- * the controller.
- * e. ARP requests from the remote side's MAC address. Note that
- * like (d), the MAC is either for the controller or gateway.
- * f. ARP replies containing the controller's IP address as a target.
- * g. ARP requests containing the controller's IP address as a source.
- * h. OpenFlow (6633/tcp) traffic to the controller's IP.
- * i. OpenFlow (6633/tcp) traffic from the controller's IP.
- *
- * The goal of these rules is to be as narrow as possible to allow a
- * switch to join a network and be able to communicate with a
- * controller. As mentioned earlier, these rules have higher priority
- * than the controller's rules, so if they are too broad, they may
- * prevent the controller from implementing its policy. As such,
- * in-band actively monitors some aspects of flow and packet processing
- * so that the rules can be made more precise.
- *
- * In-band control monitors attempts to add flows into the datapath that
- * could interfere with its duties. The datapath only allows exact
- * match entries, so in-band control is able to be very precise about
- * the flows it prevents. Flows that miss in the datapath are sent to
- * userspace to be processed, so preventing these flows from being
- * cached in the "fast path" does not affect correctness. The only type
- * of flow that is currently prevented is one that would prevent DHCP
- * replies from being seen by the local port. For example, a rule that
- * forwarded all DHCP traffic to the controller would not be allowed,
- * but one that forwarded to all ports (including the local port) would.
- *
- * As mentioned earlier, packets that miss in the datapath are sent to
- * the userspace for processing. The userspace has its own flow table,
- * the "classifier", so in-band checks whether any special processing
- * is needed before the classifier is consulted. If a packet is a DHCP
- * response to a request from the local port, the packet is forwarded to
- * the local port, regardless of the flow table. Note that this requires
- * L7 processing of DHCP replies to determine whether the 'chaddr' field
- * matches the MAC address of the local port.
- *
- * It is interesting to note that for an L3-based in-band control
- * mechanism, the majority of rules are devoted to ARP traffic. At first
- * glance, some of these rules appear redundant. However, each serves an
- * important role. First, in order to determine the MAC address of the
- * remote side (controller or gateway) for other ARP rules, we must allow
- * ARP traffic for our local port with rules (b) and (c). If we are
- * between a switch and its connection to the controller, we have to
- * allow the other switch's ARP traffic through. This is done with
- * rules (d) and (e), since we do not know the addresses of the other
- * switches a priori, but do know the controller's or gateway's. Finally,
- * if the controller is running in a local guest VM that is not reached
- * through the local port, the switch that is connected to the VM must
- * allow ARP traffic based on the controller's IP address, since it will
- * not know the MAC address of the local port that is sending the traffic
- * or the MAC address of the controller in the guest VM.
- *
- * With the few notable exceptions listed below, in-band should work in
- * most network setups. The following are considered "supported" in the
- * current implementation:
- *
- * - Locally Connected. The switch and controller are on the same
- * subnet. This uses rules (a), (b), (c), (h), and (i).
- *
- * - Reached through Gateway. The switch and controller are on
- * different subnets and must go through a gateway. This uses
- * rules (a), (b), (c), (h), and (i).
- *
- * - Between Switch and Controller. This switch is between another
- * switch and the controller, and we want to allow the other
- * switch's traffic through. This uses rules (d), (e), (h), and
- * (i). It uses (b) and (c) indirectly in order to know the MAC
- * address for rules (d) and (e). Note that DHCP for the other
- * switch will not work unless the controller explicitly lets this
- * switch pass the traffic.
- *
- * - Between Switch and Gateway. This switch is between another
- * switch and the gateway, and we want to allow the other switch's
- * traffic through. This uses the same rules and logic as the
- * "Between Switch and Controller" configuration described earlier.
- *
- * - Controller on Local VM. The controller is a guest VM on the
- * system running in-band control. This uses rules (a), (b), (c),
- * (h), and (i).
- *
- * - Controller on Local VM with Different Networks. The controller
- * is a guest VM on the system running in-band control, but the
- * local port is not used to connect to the controller. For
- * example, an IP address is configured on eth0 of the switch. The
- * controller's VM is connected through eth1 of the switch, but an
- * IP address has not been configured for that port on the switch.
- * As such, the switch will use eth0 to connect to the controller,
- * and eth1's rules about the local port will not work. In the
- * example, the switch attached to eth0 would use rules (a), (b),
- * (c), (h), and (i) on eth0. The switch attached to eth1 would use
- * rules (f), (g), (h), and (i).
- *
- * The following are explicitly *not* supported by in-band control:
- *
- * - Specify Controller by Name. Currently, the controller must be
- * identified by IP address. A naive approach would be to permit
- * all DNS traffic. Unfortunately, this would prevent the
- * controller from defining any policy over DNS. Since switches
- * that are located behind us need to connect to the controller,
- * in-band cannot simply add a rule that allows DNS traffic from
- * the local port. The "correct" way to support this is to parse
- * DNS requests to allow all traffic related to a request for the
- * controller's name through. Due to the potential security
- * problems and amount of processing, we decided to hold off for
- * the time being.
- *
- * - Multiple Controllers. There is nothing intrinsic in the high-
- * level design that prevents using multiple (known) controllers;
- * however, the current implementation's data structures assume
- * only one.
- *
- * - Differing Controllers for Switches. All switches must know
- * the L3 addresses for all the controllers that other switches
- * may use, since rules need to be set up to allow traffic related
- * to those controllers through. See rules (f), (g), (h), and (i).
- *
- * - Differing Routes for Switches. In order for the switch to
- * allow other switches to connect to a controller through a
- * gateway, it allows the gateway's traffic through with rules (d)
- * and (e). If the routes to the controller differ for the two
- * switches, we will not know the MAC address of the alternate
- * gateway.
- */
-
-#define IB_BASE_PRIORITY 18181800
-
-enum {
- IBR_FROM_LOCAL_DHCP, /* (a) From local port, DHCP. */
- IBR_TO_LOCAL_ARP, /* (b) To local port, ARP. */
- IBR_FROM_LOCAL_ARP, /* (c) From local port, ARP. */
- IBR_TO_REMOTE_ARP, /* (d) To remote MAC, ARP. */
- IBR_FROM_REMOTE_ARP, /* (e) From remote MAC, ARP. */
- IBR_TO_CTL_ARP, /* (f) To controller IP, ARP. */
- IBR_FROM_CTL_ARP, /* (g) From controller IP, ARP. */
- IBR_TO_CTL_OFP, /* (h) To controller, OpenFlow port. */
- IBR_FROM_CTL_OFP, /* (i) From controller, OpenFlow port. */
-#if OFP_TCP_PORT != OFP_SSL_PORT
-#error Need to support separate TCP and SSL flows.
-#endif
- N_IB_RULES
-};
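set_up_flow() assigns each rule the priority IB_BASE_PRIORITY + (N_IB_RULES - rule_idx). Earlier entries in the enum therefore win over later ones. The sketch below reproduces that arithmetic, using values copied from the definitions above. Every resulting priority is above 65535, the largest OpenFlow priority, and also above FAIL_OPEN_PRIORITY (70000). The helper name ib_priority() is hypothetical.

```c
#include <assert.h>

/* Values copied from IB_BASE_PRIORITY and the rule enum above. */
#define IB_BASE_PRIORITY 18181800

enum {
    IBR_FROM_LOCAL_DHCP, IBR_TO_LOCAL_ARP, IBR_FROM_LOCAL_ARP,
    IBR_TO_REMOTE_ARP, IBR_FROM_REMOTE_ARP, IBR_TO_CTL_ARP,
    IBR_FROM_CTL_ARP, IBR_TO_CTL_OFP, IBR_FROM_CTL_OFP,
    N_IB_RULES
};

/* Same formula as set_up_flow(): lower rule index, higher priority. */
static unsigned int
ib_priority(int rule_idx)
{
    assert(rule_idx >= 0 && rule_idx < N_IB_RULES);
    return IB_BASE_PRIORITY + (N_IB_RULES - rule_idx);
}
```

For example, rule (a) gets 18181809 and rule (i) gets 18181801.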
-
-struct ib_rule {
- bool installed;
- flow_t flow;
- uint32_t wildcards;
- unsigned int priority;
-};
-
-struct in_band {
- struct ofproto *ofproto;
- struct rconn *controller;
- struct status_category *ss_cat;
-
- /* Keep track of local port's information. */
- uint8_t local_mac[ETH_ADDR_LEN]; /* Current MAC. */
- char local_name[IF_NAMESIZE]; /* Local device name. */
- time_t next_local_refresh;
-
- /* Keep track of controller and next hop's information. */
- uint32_t controller_ip; /* Controller IP, 0 if unknown. */
- uint8_t remote_mac[ETH_ADDR_LEN]; /* Remote MAC. */
- uint8_t last_remote_mac[ETH_ADDR_LEN]; /* Previous remote MAC. */
- time_t next_remote_refresh;
-
- /* Rules that we set up. */
- struct ib_rule rules[N_IB_RULES];
-};
-
-static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(60, 60);
-
-static const uint8_t *
-get_remote_mac(struct in_band *ib)
-{
- int retval;
- bool have_mac;
- struct in_addr c_in4, r_in4;
- char *dev_name;
- time_t now = time_now();
-
- if (now >= ib->next_remote_refresh) {
- c_in4.s_addr = ib->controller_ip;
- memset(ib->remote_mac, 0, sizeof ib->remote_mac);
- retval = netdev_get_next_hop(&c_in4, &r_in4, &dev_name);
- if (retval) {
- VLOG_WARN("cannot find route for controller ("IP_FMT"): %s",
- IP_ARGS(&ib->controller_ip), strerror(retval));
- ib->next_remote_refresh = now + 1;
- return NULL;
- }
- if (!r_in4.s_addr) {
- r_in4.s_addr = c_in4.s_addr;
- }
-
- retval = netdev_nodev_arp_lookup(dev_name, r_in4.s_addr,
- ib->remote_mac);
- if (retval) {
- VLOG_DBG_RL(&rl, "cannot look up remote MAC address ("IP_FMT"): %s",
- IP_ARGS(&r_in4.s_addr), strerror(retval));
- }
- have_mac = !eth_addr_is_zero(ib->remote_mac);
- free(dev_name);
-
- if (have_mac
- && !eth_addr_equals(ib->last_remote_mac, ib->remote_mac)) {
- VLOG_DBG("remote MAC address changed from "ETH_ADDR_FMT" to "
- ETH_ADDR_FMT,
- ETH_ADDR_ARGS(ib->last_remote_mac),
- ETH_ADDR_ARGS(ib->remote_mac));
- memcpy(ib->last_remote_mac, ib->remote_mac, ETH_ADDR_LEN);
- }
-
- /* Schedule next refresh.
- *
- * If we have an IP address but not a MAC address, then refresh
- * quickly, since we probably will get a MAC address soon (via ARP).
- * Otherwise, we can afford to wait a little while. */
- ib->next_remote_refresh
- = now + (!ib->controller_ip || have_mac ? 10 : 1);
- }
-
- return !eth_addr_is_zero(ib->remote_mac) ? ib->remote_mac : NULL;
-}
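get_remote_mac() retries quickly while resolution is still in progress and backs off once it has an answer. The sketch below models only that scheduling decision:

- a route-lookup failure schedules a retry after 1 s;
- a known controller IP without a MAC also retries after 1 s, on the expectation that ARP will fill it in soon;
- otherwise the next refresh comes after 10 s.

The function remote_refresh_delay() and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mirroring the refresh scheduling in get_remote_mac():
 * returns the number of seconds until the next remote-MAC refresh. */
static int
remote_refresh_delay(bool route_lookup_failed, uint32_t controller_ip,
                     bool have_mac)
{
    if (route_lookup_failed) {
        return 1;               /* No route yet: retry every second. */
    }
    /* IP known but MAC unknown: ARP should answer soon, so poll fast.
     * Parsed as (!controller_ip || have_mac) ? 10 : 1. */
    return !controller_ip || have_mac ? 10 : 1;
}
```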
-
-static const uint8_t *
-get_local_mac(struct in_band *ib)
-{
- time_t now = time_now();
- if (now >= ib->next_local_refresh) {
- uint8_t ea[ETH_ADDR_LEN];
- if (!netdev_nodev_get_etheraddr(ib->local_name, ea)) {
- memcpy(ib->local_mac, ea, ETH_ADDR_LEN);
- }
- ib->next_local_refresh = now + 1;
- }
- return !eth_addr_is_zero(ib->local_mac) ? ib->local_mac : NULL;
-}
-
-static void
-in_band_status_cb(struct status_reply *sr, void *in_band_)
-{
- struct in_band *in_band = in_band_;
-
- if (!eth_addr_is_zero(in_band->local_mac)) {
- status_reply_put(sr, "local-mac="ETH_ADDR_FMT,
- ETH_ADDR_ARGS(in_band->local_mac));
- }
-
- if (!eth_addr_is_zero(in_band->remote_mac)) {
- status_reply_put(sr, "remote-mac="ETH_ADDR_FMT,
- ETH_ADDR_ARGS(in_band->remote_mac));
- }
-}
-
-static void
-drop_flow(struct in_band *in_band, int rule_idx)
-{
- struct ib_rule *rule = &in_band->rules[rule_idx];
-
- if (rule->installed) {
- rule->installed = false;
- ofproto_delete_flow(in_band->ofproto, &rule->flow, rule->wildcards,
- rule->priority);
- }
-}
-
-/* out_port and fixed_fields are assumed never to change. */
-static void
-set_up_flow(struct in_band *in_band, int rule_idx, const flow_t *flow,
- uint32_t fixed_fields, uint16_t out_port)
-{
- struct ib_rule *rule = &in_band->rules[rule_idx];
-
- if (!rule->installed || memcmp(flow, &rule->flow, sizeof *flow)) {
- union ofp_action action;
-
- drop_flow(in_band, rule_idx);
-
- rule->installed = true;
- rule->flow = *flow;
- rule->wildcards = OFPFW_ALL & ~fixed_fields;
- rule->priority = IB_BASE_PRIORITY + (N_IB_RULES - rule_idx);
-
- action.type = htons(OFPAT_OUTPUT);
- action.output.len = htons(sizeof action);
- action.output.port = htons(out_port);
- action.output.max_len = htons(0);
- ofproto_add_flow(in_band->ofproto, &rule->flow, rule->wildcards,
- rule->priority, &action, 1, 0);
- }
-}
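set_up_flow() wildcards every field except the ones the caller pins, computing OFPFW_ALL & ~fixed_fields. The sketch below uses wildcard bit values taken from the OpenFlow 1.0 specification. That is an assumption: the OpenFlow version this code targets may lay the bits out slightly differently. Under OpenFlow 1.0, the six bits under OFPFW_NW_DST_MASK count how many low-order nw_dst bits to ignore. Passing OFPFW_NW_DST_MASK in fixed_fields clears them, which requests an exact match on the controller's IP. The helpers ib_wildcards() and nw_dst_wild_bits() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Wildcard bits as in the OpenFlow 1.0 specification (assumed here). */
enum {
    OFPFW_IN_PORT      = 1 << 0,
    OFPFW_DL_SRC       = 1 << 2,
    OFPFW_DL_DST       = 1 << 3,
    OFPFW_DL_TYPE      = 1 << 4,
    OFPFW_NW_PROTO     = 1 << 5,
    OFPFW_TP_SRC       = 1 << 6,
    OFPFW_TP_DST       = 1 << 7,
    OFPFW_NW_SRC_MASK  = 0x3f << 8,
    OFPFW_NW_DST_SHIFT = 14,
    OFPFW_NW_DST_MASK  = 0x3f << 14,
    OFPFW_ALL          = (1 << 22) - 1
};

/* Same computation as set_up_flow(): wildcard all but the pinned fields. */
static uint32_t
ib_wildcards(uint32_t fixed_fields)
{
    return OFPFW_ALL & ~fixed_fields;
}

/* Number of low-order nw_dst bits ignored (0 means exact match). */
static unsigned int
nw_dst_wild_bits(uint32_t wildcards)
{
    return (wildcards & OFPFW_NW_DST_MASK) >> OFPFW_NW_DST_SHIFT;
}
```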
-
-/* Returns true if 'packet' should be sent to the local port regardless
- * of the flow table. */
-bool
-in_band_msg_in_hook(struct in_band *in_band, const flow_t *flow,
- const struct ofpbuf *packet)
-{
- if (!in_band) {
- return false;
- }
-
- /* Regardless of how the flow table is configured, we want to be
- * able to see replies to our DHCP requests. */
- if (flow->dl_type == htons(ETH_TYPE_IP)
- && flow->nw_proto == IP_TYPE_UDP
- && flow->tp_src == htons(DHCP_SERVER_PORT)
- && flow->tp_dst == htons(DHCP_CLIENT_PORT)
- && packet->l7) {
- struct dhcp_header *dhcp;
- const uint8_t *local_mac;
-
- dhcp = ofpbuf_at(packet, (char *)packet->l7 - (char *)packet->data,
- sizeof *dhcp);
- if (!dhcp) {
- return false;
- }
-
- local_mac = get_local_mac(in_band);
- if (local_mac && eth_addr_equals(dhcp->chaddr, local_mac)) {
- return true;
- }
- }
-
- return false;
-}
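in_band_msg_in_hook() decides whether a DHCP reply belongs to the local port by comparing the reply's 'chaddr' field with the local MAC. The sketch below does the same check on a raw UDP payload and, like the guarded version above, treats an unknown (NULL) local MAC as "not ours". It relies on the RFC 2131 layout: op, htype, hlen, hops, xid, secs, flags, ciaddr, yiaddr, siaddr and giaddr together take 28 bytes, so chaddr starts at offset 28. The function name and constants are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ADDR_LEN 6
#define DHCP_CHADDR_OFS 28      /* Fixed fields before chaddr (RFC 2131). */

/* Hypothetical standalone version of the DHCP test in in_band_msg_in_hook().
 * 'payload' is the UDP payload of a server-to-client (67 -> 68) packet.
 * Returns true if the reply's client hardware address is 'local_mac'. */
static bool
dhcp_reply_is_for_local(const uint8_t *payload, size_t len,
                        const uint8_t *local_mac)
{
    if (!local_mac || len < DHCP_CHADDR_OFS + ETH_ADDR_LEN) {
        return false;           /* Unknown local MAC or truncated header. */
    }
    return !memcmp(payload + DHCP_CHADDR_OFS, local_mac, ETH_ADDR_LEN);
}
```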
-
-/* Returns true if the rule that would match 'flow' with 'actions' is
- * allowed to be set up in the datapath. */
-bool
-in_band_rule_check(struct in_band *in_band, const flow_t *flow,
- const struct odp_actions *actions)
-{
- if (!in_band) {
- return true;
- }
-
- /* Don't allow flows that would prevent DHCP replies from being seen
- * by the local port. */
- if (flow->dl_type == htons(ETH_TYPE_IP)
- && flow->nw_proto == IP_TYPE_UDP
- && flow->tp_src == htons(DHCP_SERVER_PORT)
- && flow->tp_dst == htons(DHCP_CLIENT_PORT)) {
- int i;
-
- for (i = 0; i < actions->n_actions; i++) {
- if (actions->actions[i].output.type == ODPAT_OUTPUT
- && actions->actions[i].output.port == ODPP_LOCAL) {
- return true;
- }
- }
- return false;
- }
-
- return true;
-}
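in_band_rule_check() lets a DHCP-reply flow be cached in the datapath only if one of its actions outputs to the local port. Any other flow passes unchecked. The sketch below keeps that rule but simplifies the types. It assumes the packet is already known to be UDP over IP, uses host-order port numbers, and assumes the local port is numbered 0. The types struct action and ACT_*, and the names PORT_LOCAL and rule_allowed(), are all hypothetical stand-ins for the datapath definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the datapath action types. */
enum { ACT_OUTPUT, ACT_OTHER };
#define PORT_LOCAL 0            /* Assumed number of the local port. */

struct action {
    int type;                   /* ACT_OUTPUT or ACT_OTHER. */
    uint16_t port;              /* Output port, if type == ACT_OUTPUT. */
};

/* Returns true if a UDP flow with these ports and actions may be cached.
 * DHCP replies (67 -> 68) must reach the local port, so such flows are
 * allowed only if some action outputs there. */
static bool
rule_allowed(uint16_t tp_src, uint16_t tp_dst,
             const struct action *acts, size_t n)
{
    size_t i;

    if (tp_src != 67 || tp_dst != 68) {
        return true;            /* Not a DHCP reply: no restriction. */
    }
    for (i = 0; i < n; i++) {
        if (acts[i].type == ACT_OUTPUT && acts[i].port == PORT_LOCAL) {
            return true;
        }
    }
    return false;
}
```

A rejected flow is not dropped. It simply stays out of the datapath, so its packets keep missing there and reach userspace, where in_band_msg_in_hook() can still deliver them.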
-
-void
-in_band_run(struct in_band *in_band)
-{
- time_t now = time_now();
- uint32_t controller_ip;
- const uint8_t *remote_mac;
- const uint8_t *local_mac;
- flow_t flow;
-
- if (now < in_band->next_remote_refresh
- && now < in_band->next_local_refresh) {
- return;
- }
-
- controller_ip = rconn_get_remote_ip(in_band->controller);
- if (in_band->controller_ip && controller_ip != in_band->controller_ip) {
- VLOG_DBG("controller IP address changed from "IP_FMT" to "IP_FMT,
- IP_ARGS(&in_band->controller_ip),
- IP_ARGS(&controller_ip));
- }
- in_band->controller_ip = controller_ip;
-
- remote_mac = get_remote_mac(in_band);
- local_mac = get_local_mac(in_band);
-
- if (local_mac) {
- /* Allow DHCP requests to be sent from the local port. */
- memset(&flow, 0, sizeof flow);
- flow.in_port = ODPP_LOCAL;
- flow.dl_type = htons(ETH_TYPE_IP);
- memcpy(flow.dl_src, local_mac, ETH_ADDR_LEN);
- flow.nw_proto = IP_TYPE_UDP;
- flow.tp_src = htons(DHCP_CLIENT_PORT);
- flow.tp_dst = htons(DHCP_SERVER_PORT);
- set_up_flow(in_band, IBR_FROM_LOCAL_DHCP, &flow,
- (OFPFW_IN_PORT | OFPFW_DL_TYPE | OFPFW_DL_SRC
- | OFPFW_NW_PROTO | OFPFW_TP_SRC | OFPFW_TP_DST),
- OFPP_NORMAL);
-
- /* Allow the connection's interface to receive directed ARP traffic. */
- memset(&flow, 0, sizeof flow);
- flow.dl_type = htons(ETH_TYPE_ARP);
- memcpy(flow.dl_dst, local_mac, ETH_ADDR_LEN);
- flow.nw_proto = ARP_OP_REPLY;
- set_up_flow(in_band, IBR_TO_LOCAL_ARP, &flow,
- (OFPFW_DL_TYPE | OFPFW_DL_DST | OFPFW_NW_PROTO),
- OFPP_NORMAL);
-
- /* Allow the connection's interface to be the source of ARP traffic. */
- memset(&flow, 0, sizeof flow);
- flow.dl_type = htons(ETH_TYPE_ARP);
- memcpy(flow.dl_src, local_mac, ETH_ADDR_LEN);
- flow.nw_proto = ARP_OP_REQUEST;
- set_up_flow(in_band, IBR_FROM_LOCAL_ARP, &flow,
- (OFPFW_DL_TYPE | OFPFW_DL_SRC | OFPFW_NW_PROTO),
- OFPP_NORMAL);
- } else {
- drop_flow(in_band, IBR_FROM_LOCAL_DHCP);
- drop_flow(in_band, IBR_TO_LOCAL_ARP);
- drop_flow(in_band, IBR_FROM_LOCAL_ARP);
- }
-
- if (remote_mac) {
- /* Allow ARP replies to the remote side's MAC. */
- memset(&flow, 0, sizeof flow);
- flow.dl_type = htons(ETH_TYPE_ARP);
- memcpy(flow.dl_dst, remote_mac, ETH_ADDR_LEN);
- flow.nw_proto = ARP_OP_REPLY;
- set_up_flow(in_band, IBR_TO_REMOTE_ARP, &flow,
- (OFPFW_DL_TYPE | OFPFW_DL_DST | OFPFW_NW_PROTO),
- OFPP_NORMAL);
-
- /* Allow ARP requests from the remote side's MAC. */
- memset(&flow, 0, sizeof flow);
- flow.dl_type = htons(ETH_TYPE_ARP);
- memcpy(flow.dl_src, remote_mac, ETH_ADDR_LEN);
- flow.nw_proto = ARP_OP_REQUEST;
- set_up_flow(in_band, IBR_FROM_REMOTE_ARP, &flow,
- (OFPFW_DL_TYPE | OFPFW_DL_SRC | OFPFW_NW_PROTO),
- OFPP_NORMAL);
- } else {
- drop_flow(in_band, IBR_TO_REMOTE_ARP);
- drop_flow(in_band, IBR_FROM_REMOTE_ARP);
- }
-
- if (controller_ip) {
- /* Allow ARP replies to the controller's IP. */
- memset(&flow, 0, sizeof flow);
- flow.dl_type = htons(ETH_TYPE_ARP);
- flow.nw_proto = ARP_OP_REPLY;
- flow.nw_dst = controller_ip;
- set_up_flow(in_band, IBR_TO_CTL_ARP, &flow,
- (OFPFW_DL_TYPE | OFPFW_NW_PROTO | OFPFW_NW_DST_MASK),
- OFPP_NORMAL);
-
- /* Allow ARP requests from the controller's IP. */
- memset(&flow, 0, sizeof flow);
- flow.dl_type = htons(ETH_TYPE_ARP);
- flow.nw_proto = ARP_OP_REQUEST;
- flow.nw_src = controller_ip;
- set_up_flow(in_band, IBR_FROM_CTL_ARP, &flow,
- (OFPFW_DL_TYPE | OFPFW_NW_PROTO | OFPFW_NW_SRC_MASK),
- OFPP_NORMAL);
-
- /* OpenFlow traffic to or from the controller.
- *
- * (A given field's value is completely ignored if it is wildcarded,
- * which is why we can get away with using a single 'flow' in each
- * case here.) */
- memset(&flow, 0, sizeof flow);
- flow.dl_type = htons(ETH_TYPE_IP);
- flow.nw_proto = IP_TYPE_TCP;
- flow.nw_src = controller_ip;
- flow.nw_dst = controller_ip;
- flow.tp_src = htons(OFP_TCP_PORT);
- flow.tp_dst = htons(OFP_TCP_PORT);
- set_up_flow(in_band, IBR_TO_CTL_OFP, &flow,
- (OFPFW_DL_TYPE | OFPFW_NW_PROTO | OFPFW_NW_DST_MASK
- | OFPFW_TP_DST), OFPP_NORMAL);
- set_up_flow(in_band, IBR_FROM_CTL_OFP, &flow,
- (OFPFW_DL_TYPE | OFPFW_NW_PROTO | OFPFW_NW_SRC_MASK
- | OFPFW_TP_SRC), OFPP_NORMAL);
- } else {
- drop_flow(in_band, IBR_TO_CTL_ARP);
- drop_flow(in_band, IBR_FROM_CTL_ARP);
- drop_flow(in_band, IBR_TO_CTL_OFP);
- drop_flow(in_band, IBR_FROM_CTL_OFP);
- }
-}
-
-void
-in_band_wait(struct in_band *in_band)
-{
- time_t now = time_now();
- time_t wakeup
- = MIN(in_band->next_remote_refresh, in_band->next_local_refresh);
- if (wakeup > now) {
- poll_timer_wait((wakeup - now) * 1000);
- } else {
- poll_immediate_wake();
- }
-}
-
-void
-in_band_flushed(struct in_band *in_band)
-{
- int i;
-
- for (i = 0; i < N_IB_RULES; i++) {
- in_band->rules[i].installed = false;
- }
-}
-
-void
-in_band_create(struct ofproto *ofproto, struct dpif *dpif,
- struct switch_status *ss, struct rconn *controller,
- struct in_band **in_bandp)
-{
- struct in_band *in_band;
- int error;
-
- in_band = xcalloc(1, sizeof *in_band);
- error = dpif_port_get_name(dpif, ODPP_LOCAL, in_band->local_name,
- sizeof in_band->local_name);
- if (error) {
- free(in_band);
- return;
- }
-
- in_band->ofproto = ofproto;
- in_band->controller = controller;
- in_band->ss_cat = switch_status_register(ss, "in-band",
- in_band_status_cb, in_band);
- in_band->next_remote_refresh = TIME_MIN;
- in_band->next_local_refresh = TIME_MIN;
-
- *in_bandp = in_band;
-}
-
-void
-in_band_destroy(struct in_band *in_band)
-{
- if (in_band) {
- switch_status_unregister(in_band->ss_cat);
- /* We don't own the rconn. */
- free(in_band);
- }
-}
-
+++ /dev/null
-/*
- * Copyright (c) 2008, 2009 Nicira Networks.
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at:
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-#ifndef IN_BAND_H
-#define IN_BAND_H 1
-
-#include "flow.h"
-
-struct dpif;
-struct in_band;
-struct odp_actions;
-struct ofpbuf;
-struct ofproto;
-struct rconn;
-struct secchan;
-struct settings;
-struct switch_status;
-
-void in_band_create(struct ofproto *, struct dpif *, struct switch_status *,
- struct rconn *controller, struct in_band **);
-void in_band_destroy(struct in_band *);
-void in_band_run(struct in_band *);
-bool in_band_msg_in_hook(struct in_band *, const flow_t *,
- const struct ofpbuf *packet);
-bool in_band_rule_check(struct in_band *, const flow_t *,
- const struct odp_actions *);
-void in_band_wait(struct in_band *);
-void in_band_flushed(struct in_band *);
-
-#endif /* in-band.h */
+++ /dev/null
-/*
- * Copyright (c) 2008, 2009 Nicira Networks.
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at:
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-#include <config.h>
-#include <assert.h>
-#include <errno.h>
-#include <getopt.h>
-#include <inttypes.h>
-#include <netinet/in.h>
-#include <stdlib.h>
-#include <signal.h>
-#include <string.h>
-
-#include "command-line.h"
-#include "compiler.h"
-#include "daemon.h"
-#include "dirs.h"
-#include "discovery.h"
-#include "dpif.h"
-#include "fail-open.h"
-#include "fault.h"
-#include "in-band.h"
-#include "leak-checker.h"
-#include "list.h"
-#include "netdev.h"
-#include "ofpbuf.h"
-#include "ofproto.h"
-#include "openflow/openflow.h"
-#include "packets.h"
-#include "poll-loop.h"
-#include "rconn.h"
-#include "status.h"
-#include "svec.h"
-#include "timeval.h"
-#include "unixctl.h"
-#include "util.h"
-#include "vconn-ssl.h"
-#include "vconn.h"
-
-#include "vlog.h"
-#define THIS_MODULE VLM_secchan
-
-/* Behavior when the connection to the controller fails. */
-enum fail_mode {
- FAIL_OPEN, /* Act as learning switch. */
- FAIL_CLOSED /* Drop all packets. */
-};
-
-/* Settings that may be configured by the user. */
-struct ofsettings {
- /* Overall mode of operation. */
- bool discovery; /* Discover the controller automatically? */
- bool in_band; /* Connect to controller in-band? */
-
- /* Datapath. */
- uint64_t datapath_id; /* Datapath ID. */
- const char *dp_name; /* Name of local datapath. */
-
- /* Description strings. */
- const char *mfr_desc; /* Manufacturer. */
- const char *hw_desc; /* Hardware. */
- const char *sw_desc; /* Software version. */
- const char *serial_desc; /* Serial number. */
-
- /* Related vconns and network devices. */
- const char *controller_name; /* Controller (if not discovery mode). */
- struct svec listeners; /* Listen for management connections. */
- struct svec snoops; /* Listen for controller snooping conns. */
-
- /* Failure behavior. */
- enum fail_mode fail_mode; /* Act as learning switch if no controller? */
- int max_idle; /* Idle time for flows in fail-open mode. */
- int probe_interval; /* # seconds idle before sending echo request. */
- int max_backoff; /* Max # seconds between connection attempts. */
-
- /* Packet-in rate-limiting. */
- int rate_limit; /* Tokens added to bucket per second. */
- int burst_limit; /* Maximum token bucket size. */
-
- /* Discovery behavior. */
- const char *accept_controller_re; /* Controller vconns to accept. */
- bool update_resolv_conf; /* Update /etc/resolv.conf? */
-
- /* Spanning tree protocol. */
- bool enable_stp;
-
- /* Remote command execution. */
- char *command_acl; /* Command white/blacklist, as shell globs. */
- char *command_dir; /* Directory that contains commands. */
-
- /* Management. */
- uint64_t mgmt_id; /* Management ID. */
-
- /* NetFlow. */
- struct svec netflow; /* NetFlow targets. */
-};
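The rate_limit and burst_limit settings describe a token bucket for packet-in messages: rate_limit tokens are added per second, and the bucket holds at most burst_limit of them. Below is a generic token-bucket sketch for those two parameters. It is not necessarily the scheduler behind ofproto_set_rate_limit(). All names are hypothetical. Tokens are kept in thousandths so that millisecond timestamps refill the bucket without floating point.

```c
#include <assert.h>
#include <stdbool.h>

/* Generic token bucket: 'rate' tokens per second, capped at 'burst'. */
struct token_bucket {
    int rate;                   /* Tokens added per second (rate_limit). */
    int burst;                  /* Maximum bucket size (burst_limit). */
    long long int tokens_milli; /* Current tokens, in thousandths. */
    long long int last_ms;      /* Time of the last refill, in ms. */
};

static void
tb_init(struct token_bucket *tb, int rate, int burst, long long int now_ms)
{
    assert(rate > 0 && burst > 0);
    tb->rate = rate;
    tb->burst = burst;
    tb->tokens_milli = (long long int) burst * 1000;  /* Start full. */
    tb->last_ms = now_ms;
}

/* Refills for the time elapsed since the last call, then tries to take one
 * token.  Returns true if a packet-in may be sent now. */
static bool
tb_withdraw(struct token_bucket *tb, long long int now_ms)
{
    long long int max = (long long int) tb->burst * 1000;

    if (now_ms > tb->last_ms) {
        /* 'rate' tokens/s equals 'rate' thousandths of a token per ms. */
        tb->tokens_milli += (now_ms - tb->last_ms) * tb->rate;
        if (tb->tokens_milli > max) {
            tb->tokens_milli = max;
        }
        tb->last_ms = now_ms;
    }
    if (tb->tokens_milli >= 1000) {
        tb->tokens_milli -= 1000;
        return true;
    }
    return false;
}
```

With rate 10 and burst 2, two packet-ins pass immediately. After that, one more is allowed every 100 ms.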
-
-static void parse_options(int argc, char *argv[], struct ofsettings *);
-static void usage(void) NO_RETURN;
-
-int
-main(int argc, char *argv[])
-{
- struct unixctl_server *unixctl;
- struct ofproto *ofproto;
- struct ofsettings s;
- int error;
- struct netflow_options nf_options;
-
- set_program_name(argv[0]);
- register_fault_handlers();
- time_init();
- vlog_init();
- parse_options(argc, argv, &s);
- signal(SIGPIPE, SIG_IGN);
-
- die_if_already_running();
- daemonize();
-
- /* Start listening for ovs-appctl requests. */
- error = unixctl_server_create(NULL, &unixctl);
- if (error) {
- ovs_fatal(error, "Could not listen for unixctl connections");
- }
-
- VLOG_INFO("Open vSwitch version %s", VERSION BUILDNR);
- VLOG_INFO("OpenFlow protocol version 0x%02x", OFP_VERSION);
-
- /* Start OpenFlow processing. */
- error = ofproto_create(s.dp_name, NULL, NULL, &ofproto);
- if (error) {
- ovs_fatal(error, "could not initialize openflow switch");
- }
- error = ofproto_set_in_band(ofproto, s.in_band);
- if (error) {
- ovs_fatal(error, "failed to configure in-band control");
- }
- error = ofproto_set_discovery(ofproto, s.discovery, s.accept_controller_re,
- s.update_resolv_conf);
- if (error) {
- ovs_fatal(error, "failed to configure controller discovery");
- }
- if (s.datapath_id) {
- ofproto_set_datapath_id(ofproto, s.datapath_id);
- }
- if (s.mgmt_id) {
- ofproto_set_mgmt_id(ofproto, s.mgmt_id);
- }
- ofproto_set_desc(ofproto, s.mfr_desc, s.hw_desc, s.sw_desc, s.serial_desc);
- error = ofproto_set_listeners(ofproto, &s.listeners);
- if (error) {
- ovs_fatal(error, "failed to configure management connections");
- }
- error = ofproto_set_snoops(ofproto, &s.snoops);
- if (error) {
- ovs_fatal(error,
- "failed to configure controller snooping connections");
- }
- memset(&nf_options, 0, sizeof nf_options);
- nf_options.collectors = s.netflow;
- error = ofproto_set_netflow(ofproto, &nf_options);
- if (error) {
- ovs_fatal(error, "failed to configure NetFlow collectors");
- }
- ofproto_set_failure(ofproto, s.fail_mode == FAIL_OPEN);
- ofproto_set_probe_interval(ofproto, s.probe_interval);
- ofproto_set_max_backoff(ofproto, s.max_backoff);
- ofproto_set_rate_limit(ofproto, s.rate_limit, s.burst_limit);
- error = ofproto_set_stp(ofproto, s.enable_stp);
- if (error) {
- ovs_fatal(error, "failed to configure STP");
- }
- error = ofproto_set_remote_execution(ofproto, s.command_acl,
- s.command_dir);
- if (error) {
- ovs_fatal(error, "failed to configure remote command execution");
- }
- if (!s.discovery) {
- error = ofproto_set_controller(ofproto, s.controller_name);
- if (error) {
- ovs_fatal(error, "failed to configure controller");
- }
- }
-
- while (ofproto_is_alive(ofproto)) {
- error = ofproto_run(ofproto);
- if (error) {
- ovs_fatal(error, "unrecoverable datapath error");
- }
- unixctl_server_run(unixctl);
-
- ofproto_wait(ofproto);
- unixctl_server_wait(unixctl);
- poll_block();
- }
-
- return 0;
-}
-\f
-/* User interface. */
-
-static void
-parse_options(int argc, char *argv[], struct ofsettings *s)
-{
- enum {
- OPT_DATAPATH_ID = UCHAR_MAX + 1,
- OPT_MANUFACTURER,
- OPT_HARDWARE,
- OPT_SOFTWARE,
- OPT_SERIAL,
- OPT_ACCEPT_VCONN,
- OPT_NO_RESOLV_CONF,
- OPT_BR_NAME,
- OPT_FAIL_MODE,
- OPT_INACTIVITY_PROBE,
- OPT_MAX_IDLE,
- OPT_MAX_BACKOFF,
- OPT_SNOOP,
- OPT_RATE_LIMIT,
- OPT_BURST_LIMIT,
- OPT_BOOTSTRAP_CA_CERT,
- OPT_STP,
- OPT_NO_STP,
- OPT_OUT_OF_BAND,
- OPT_IN_BAND,
- OPT_COMMAND_ACL,
- OPT_COMMAND_DIR,
- OPT_NETFLOW,
- OPT_MGMT_ID,
- VLOG_OPTION_ENUMS,
- LEAK_CHECKER_OPTION_ENUMS
- };
- static struct option long_options[] = {
- {"datapath-id", required_argument, 0, OPT_DATAPATH_ID},
- {"manufacturer", required_argument, 0, OPT_MANUFACTURER},
- {"hardware", required_argument, 0, OPT_HARDWARE},
- {"software", required_argument, 0, OPT_SOFTWARE},
- {"serial", required_argument, 0, OPT_SERIAL},
- {"accept-vconn", required_argument, 0, OPT_ACCEPT_VCONN},
- {"no-resolv-conf", no_argument, 0, OPT_NO_RESOLV_CONF},
- {"config", required_argument, 0, 'F'},
- {"br-name", required_argument, 0, OPT_BR_NAME},
- {"fail", required_argument, 0, OPT_FAIL_MODE},
- {"inactivity-probe", required_argument, 0, OPT_INACTIVITY_PROBE},
- {"max-idle", required_argument, 0, OPT_MAX_IDLE},
- {"max-backoff", required_argument, 0, OPT_MAX_BACKOFF},
- {"listen", required_argument, 0, 'l'},
- {"snoop", required_argument, 0, OPT_SNOOP},
- {"rate-limit", optional_argument, 0, OPT_RATE_LIMIT},
- {"burst-limit", required_argument, 0, OPT_BURST_LIMIT},
- {"stp", no_argument, 0, OPT_STP},
- {"no-stp", no_argument, 0, OPT_NO_STP},
- {"out-of-band", no_argument, 0, OPT_OUT_OF_BAND},
- {"in-band", no_argument, 0, OPT_IN_BAND},
- {"command-acl", required_argument, 0, OPT_COMMAND_ACL},
- {"command-dir", required_argument, 0, OPT_COMMAND_DIR},
- {"netflow", required_argument, 0, OPT_NETFLOW},
- {"mgmt-id", required_argument, 0, OPT_MGMT_ID},
- {"verbose", optional_argument, 0, 'v'},
- {"help", no_argument, 0, 'h'},
- {"version", no_argument, 0, 'V'},
- DAEMON_LONG_OPTIONS,
- VLOG_LONG_OPTIONS,
- LEAK_CHECKER_LONG_OPTIONS,
-#ifdef HAVE_OPENSSL
- VCONN_SSL_LONG_OPTIONS
- {"bootstrap-ca-cert", required_argument, 0, OPT_BOOTSTRAP_CA_CERT},
-#endif
- {0, 0, 0, 0},
- };
- char *short_options = long_options_to_short_options(long_options);
-
- /* Set defaults that we can figure out before parsing options. */
- s->datapath_id = 0;
- s->mfr_desc = NULL;
- s->hw_desc = NULL;
- s->sw_desc = NULL;
- s->serial_desc = NULL;
- svec_init(&s->listeners);
- svec_init(&s->snoops);
- s->fail_mode = FAIL_OPEN;
- s->max_idle = 0;
- s->probe_interval = 0;
- s->max_backoff = 8;
- s->update_resolv_conf = true;
- s->rate_limit = 0;
- s->burst_limit = 0;
- s->accept_controller_re = NULL;
- s->enable_stp = false;
- s->in_band = true;
- s->command_acl = "";
- s->command_dir = NULL;
- svec_init(&s->netflow);
- s->mgmt_id = 0;
- for (;;) {
- int c;
-
- c = getopt_long(argc, argv, short_options, long_options, NULL);
- if (c == -1) {
- break;
- }
-
- switch (c) {
- case OPT_DATAPATH_ID:
- if (strlen(optarg) != 12
- || strspn(optarg, "0123456789abcdefABCDEF") != 12) {
- ovs_fatal(0, "argument to --datapath-id must be "
- "exactly 12 hex digits");
- }
- s->datapath_id = strtoll(optarg, NULL, 16);
- if (!s->datapath_id) {
- ovs_fatal(0, "argument to --datapath-id must be nonzero");
- }
- break;
-
- case OPT_MANUFACTURER:
- s->mfr_desc = optarg;
- break;
-
- case OPT_HARDWARE:
- s->hw_desc = optarg;
- break;
-
- case OPT_SOFTWARE:
- s->sw_desc = optarg;
- break;
-
- case OPT_SERIAL:
- s->serial_desc = optarg;
- break;
-
- case OPT_ACCEPT_VCONN:
- s->accept_controller_re = optarg;
- break;
-
- case OPT_NO_RESOLV_CONF:
- s->update_resolv_conf = false;
- break;
-
- case OPT_FAIL_MODE:
- if (!strcmp(optarg, "open")) {
- s->fail_mode = FAIL_OPEN;
- } else if (!strcmp(optarg, "closed")) {
- s->fail_mode = FAIL_CLOSED;
- } else {
- ovs_fatal(0, "--fail argument must be \"open\" or \"closed\"");
- }
- break;
-
- case OPT_INACTIVITY_PROBE:
- s->probe_interval = atoi(optarg);
- if (s->probe_interval < 5) {
- ovs_fatal(0, "--inactivity-probe argument must be at least 5");
- }
- break;
-
- case OPT_MAX_IDLE:
- if (!strcmp(optarg, "permanent")) {
- s->max_idle = OFP_FLOW_PERMANENT;
- } else {
- s->max_idle = atoi(optarg);
- if (s->max_idle < 1 || s->max_idle > 65535) {
- ovs_fatal(0, "--max-idle argument must be between 1 and "
- "65535 or the word 'permanent'");
- }
- }
- break;
-
- case OPT_MAX_BACKOFF:
- s->max_backoff = atoi(optarg);
- if (s->max_backoff < 1) {
- ovs_fatal(0, "--max-backoff argument must be at least 1");
- } else if (s->max_backoff > 3600) {
- s->max_backoff = 3600;
- }
- break;
-
- case OPT_RATE_LIMIT:
- if (optarg) {
- s->rate_limit = atoi(optarg);
- if (s->rate_limit < 1) {
- ovs_fatal(0, "--rate-limit argument must be at least 1");
- }
- } else {
- s->rate_limit = 1000;
- }
- break;
-
- case OPT_BURST_LIMIT:
- s->burst_limit = atoi(optarg);
- if (s->burst_limit < 1) {
- ovs_fatal(0, "--burst-limit argument must be at least 1");
- }
- break;
-
- case OPT_STP:
- s->enable_stp = true;
- break;
-
- case OPT_NO_STP:
- s->enable_stp = false;
- break;
-
- case OPT_OUT_OF_BAND:
- s->in_band = false;
- break;
-
- case OPT_IN_BAND:
- s->in_band = true;
- break;
-
- case OPT_COMMAND_ACL:
- s->command_acl = (s->command_acl[0]
- ? xasprintf("%s,%s", s->command_acl, optarg)
- : optarg);
- break;
-
- case OPT_COMMAND_DIR:
- s->command_dir = optarg;
- break;
-
- case OPT_NETFLOW:
- svec_add(&s->netflow, optarg);
- break;
-
- case OPT_MGMT_ID:
- if (strlen(optarg) != 12
- || strspn(optarg, "0123456789abcdefABCDEF") != 12) {
- ovs_fatal(0, "argument to --mgmt-id must be "
- "exactly 12 hex digits");
- }
- s->mgmt_id = strtoll(optarg, NULL, 16);
- if (!s->mgmt_id) {
- ovs_fatal(0, "argument to --mgmt-id must be nonzero");
- }
- break;
-
- case 'l':
- svec_add(&s->listeners, optarg);
- break;
-
- case OPT_SNOOP:
- svec_add(&s->snoops, optarg);
- break;
-
- case 'h':
- usage();
-
- case 'V':
- OVS_PRINT_VERSION(OFP_VERSION, OFP_VERSION);
- exit(EXIT_SUCCESS);
-
- DAEMON_OPTION_HANDLERS
-
- VLOG_OPTION_HANDLERS
-
- LEAK_CHECKER_OPTION_HANDLERS
-
-#ifdef HAVE_OPENSSL
- VCONN_SSL_OPTION_HANDLERS
-
- case OPT_BOOTSTRAP_CA_CERT:
- vconn_ssl_set_ca_cert_file(optarg, true);
- break;
-#endif
-
- case '?':
- exit(EXIT_FAILURE);
-
- default:
- abort();
- }
- }
- free(short_options);
-
- argc -= optind;
- argv += optind;
- if (argc < 1 || argc > 2) {
- ovs_fatal(0, "need one or two non-option arguments; "
- "use --help for usage");
- }
-
- /* Local and remote vconns. */
- s->dp_name = argv[0];
- s->controller_name = argc > 1 ? xstrdup(argv[1]) : NULL;
-
- /* Set accept_controller_re. */
- if (!s->accept_controller_re) {
- s->accept_controller_re = vconn_ssl_is_configured() ? "^ssl:.*" : ".*";
- }
-
- /* Mode of operation. */
- s->discovery = s->controller_name == NULL;
- if (s->discovery && !s->in_band) {
- ovs_fatal(0, "Cannot perform discovery with out-of-band control");
- }
-
- /* Rate limiting. */
- if (s->rate_limit && s->rate_limit < 100) {
- VLOG_WARN("Rate limit set to unusually low value %d", s->rate_limit);
- }
-}
-
-static void
-usage(void)
-{
- printf("%s: an OpenFlow switch implementation.\n"
- "usage: %s [OPTIONS] DATAPATH [CONTROLLER]\n"
- "DATAPATH is a local datapath (e.g. \"dp0\").\n"
- "CONTROLLER is an active OpenFlow connection method; if it is\n"
- "omitted, then secchan performs controller discovery.\n",
- program_name, program_name);
- vconn_usage(true, true, true);
- printf("\nOpenFlow options:\n"
- " --datapath-id=ID Use ID as the OpenFlow switch ID\n"
- " (ID must consist of 12 hex digits)\n"
- " --mgmt-id=ID Use ID as the management ID\n"
- " (ID must consist of 12 hex digits)\n"
- " --manufacturer=MFR Identify manufacturer as MFR\n"
- " --hardware=HW Identify hardware as HW\n"
- " --software=SW Identify software as SW\n"
- " --serial=SERIAL Identify serial number as SERIAL\n"
- "\nController discovery options:\n"
- " --accept-vconn=REGEX accept matching discovered controllers\n"
- " --no-resolv-conf do not update /etc/resolv.conf\n"
- "\nNetworking options:\n"
- " --fail=open|closed when controller connection fails:\n"
- " closed: drop all packets\n"
- " open (default): act as learning switch\n"
- " --inactivity-probe=SECS time between inactivity probes\n"
- " --max-idle=SECS max idle for flows set up by secchan\n"
- " --max-backoff=SECS max time between controller connection\n"
- " attempts (default: 8 seconds)\n"
- " -l, --listen=METHOD allow management connections on METHOD\n"
- " (a passive OpenFlow connection method)\n"
- " --snoop=METHOD allow controller snooping on METHOD\n"
- " (a passive OpenFlow connection method)\n"
- " --out-of-band controller connection is out-of-band\n"
- " --netflow=HOST:PORT configure NetFlow output target\n"
- "\nRate-limiting of \"packet-in\" messages to the controller:\n"
- " --rate-limit[=PACKETS] max rate, in packets/s (default: 1000)\n"
- " --burst-limit=BURST limit on packet credit for idle time\n"
- "\nRemote command execution options:\n"
- " --command-acl=[!]GLOB[,[!]GLOB...] set allowed/denied commands\n"
- " --command-dir=DIR set command dir (default: %s/commands)\n",
- ovs_pkgdatadir);
- daemon_usage();
- vlog_usage();
- printf("\nOther options:\n"
- " -h, --help display this help message\n"
- " -V, --version display version information\n");
- leak_checker_usage();
- exit(EXIT_SUCCESS);
-}
+++ /dev/null
-/*
- * Copyright (c) 2008, 2009 Nicira Networks.
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at:
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-#include <config.h>
-#include "netflow.h"
-#include <arpa/inet.h>
-#include <errno.h>
-#include <stdlib.h>
-#include <unistd.h>
-#include "cfg.h"
-#include "flow.h"
-#include "ofpbuf.h"
-#include "ofproto.h"
-#include "packets.h"
-#include "socket-util.h"
-#include "svec.h"
-#include "timeval.h"
-#include "util.h"
-#include "xtoxll.h"
-
-#define THIS_MODULE VLM_netflow
-#include "vlog.h"
-
-#define NETFLOW_V5_VERSION 5
-
-static const int ACTIVE_TIMEOUT_DEFAULT = 600;
-
-/* Every NetFlow v5 message contains the header that follows. This is
- * followed by up to thirty records, each describing a terminating flow.
- * Records are accumulated and the message is sent once it holds thirty
- * records or when netflow_run() is called.
- */
-struct netflow_v5_header {
- uint16_t version; /* NetFlow version is 5. */
- uint16_t count; /* Number of records in this message. */
- uint32_t sysuptime; /* System uptime in milliseconds. */
- uint32_t unix_secs; /* Number of seconds since Unix epoch. */
- uint32_t unix_nsecs; /* Number of residual nanoseconds
- after epoch seconds. */
- uint32_t flow_seq; /* Number of flows since sending
- messages began. */
- uint8_t engine_type; /* Engine type. */
- uint8_t engine_id; /* Engine id. */
- uint16_t sampling_interval; /* Set to zero. */
-};
-BUILD_ASSERT_DECL(sizeof(struct netflow_v5_header) == 24);
-
-/* A NetFlow v5 description of a terminating flow. It is preceded by a
- * NetFlow v5 header.
- */
-struct netflow_v5_record {
- uint32_t src_addr; /* Source IP address. */
- uint32_t dst_addr; /* Destination IP address. */
- uint32_t nexthop; /* IP address of next hop. Set to 0. */
- uint16_t input; /* Input interface index. */
- uint16_t output; /* Output interface index. */
- uint32_t packet_count; /* Number of packets. */
- uint32_t byte_count; /* Number of bytes. */
- uint32_t init_time; /* Value of sysuptime on first packet. */
- uint32_t used_time; /* Value of sysuptime on last packet. */
-
- /* The 'src_port' and 'dst_port' identify the source and destination
- * port, respectively, for TCP and UDP. For ICMP, the high-order
- * byte identifies the type and low-order byte identifies the code
- * in the 'dst_port' field. */
- uint16_t src_port;
- uint16_t dst_port;
-
- uint8_t pad1;
- uint8_t tcp_flags; /* Union of seen TCP flags. */
- uint8_t ip_proto; /* IP protocol. */
- uint8_t ip_tos; /* IP TOS value. */
- uint16_t src_as; /* Source AS ID. Set to 0. */
- uint16_t dst_as; /* Destination AS ID. Set to 0. */
- uint8_t src_mask; /* Source mask bits. Set to 0. */
- uint8_t dst_mask; /* Destination mask bits. Set to 0. */
- uint8_t pad[2];
-};
-BUILD_ASSERT_DECL(sizeof(struct netflow_v5_record) == 48);
-
-struct netflow {
- uint8_t engine_type; /* Value of engine_type to use. */
- uint8_t engine_id; /* Value of engine_id to use. */
- long long int boot_time; /* Time when netflow_create() was called. */
- int *fds; /* Sockets for NetFlow collectors. */
- size_t n_fds; /* Number of NetFlow collectors. */
- bool add_id_to_iface; /* Put the 7 least significant bits of
- * 'engine_id' into the most significant
- * bits of the interface fields. */
- uint32_t netflow_cnt; /* Flow sequence number for NetFlow. */
- struct ofpbuf packet; /* NetFlow packet being accumulated. */
- long long int active_timeout; /* Timeout for flows that are still active. */
- long long int reconfig_time; /* When we reconfigured the timeouts. */
-};
-
-static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
-
-static int
-open_collector(char *dst)
-{
- char *save_ptr = NULL;
- const char *host_name;
- const char *port_string;
- struct sockaddr_in sin;
- int retval;
- int fd;
-
- /* Glibc 2.7 has a bug in strtok_r when compiling with optimization that
- * can cause segfaults here:
- * http://sources.redhat.com/bugzilla/show_bug.cgi?id=5614.
- * Using "::" instead of the obvious ":" works around it. */
- host_name = strtok_r(dst, "::", &save_ptr);
- port_string = strtok_r(NULL, "::", &save_ptr);
- if (!host_name) {
- ovs_error(0, "%s: bad peer name format", dst);
- return -EAFNOSUPPORT;
- }
- if (!port_string) {
- ovs_error(0, "%s: bad port format", dst);
- return -EAFNOSUPPORT;
- }
-
- memset(&sin, 0, sizeof sin);
- sin.sin_family = AF_INET;
- if (lookup_ip(host_name, &sin.sin_addr)) {
- return -ENOENT;
- }
- sin.sin_port = htons(atoi(port_string));
-
- fd = socket(AF_INET, SOCK_DGRAM, 0);
- if (fd < 0) {
- int error = errno;
- VLOG_ERR("%s: socket: %s", dst, strerror(error));
- return -error;
- }
-
- retval = set_nonblocking(fd);
- if (retval) {
- close(fd);
- return -retval;
- }
-
- retval = connect(fd, (struct sockaddr *) &sin, sizeof sin);
- if (retval < 0) {
- int error = errno;
- VLOG_ERR("%s: connect: %s", dst, strerror(error));
- close(fd);
- return -error;
- }
-
- return fd;
-}
-
-void
-netflow_expire(struct netflow *nf, struct netflow_flow *nf_flow,
- struct ofexpired *expired)
-{
- struct netflow_v5_header *nf_hdr;
- struct netflow_v5_record *nf_rec;
- struct timeval now;
-
- nf_flow->last_expired += nf->active_timeout;
-
- /* NetFlow only reports on IP packets and we should only report flows
- * that actually have traffic. */
- if (expired->flow.dl_type != htons(ETH_TYPE_IP) ||
- expired->packet_count - nf_flow->packet_count_off == 0) {
- return;
- }
-
- time_timeval(&now);
-
- if (!nf->packet.size) {
- nf_hdr = ofpbuf_put_zeros(&nf->packet, sizeof *nf_hdr);
- nf_hdr->version = htons(NETFLOW_V5_VERSION);
- nf_hdr->count = htons(0);
- nf_hdr->sysuptime = htonl(time_msec() - nf->boot_time);
- nf_hdr->unix_secs = htonl(now.tv_sec);
- nf_hdr->unix_nsecs = htonl(now.tv_usec * 1000);
- nf_hdr->flow_seq = htonl(nf->netflow_cnt++);
- nf_hdr->engine_type = nf->engine_type;
- nf_hdr->engine_id = nf->engine_id;
- nf_hdr->sampling_interval = htons(0);
- }
-
- nf_hdr = nf->packet.data;
- nf_hdr->count = htons(ntohs(nf_hdr->count) + 1);
-
- nf_rec = ofpbuf_put_zeros(&nf->packet, sizeof *nf_rec);
- nf_rec->src_addr = expired->flow.nw_src;
- nf_rec->dst_addr = expired->flow.nw_dst;
- nf_rec->nexthop = htonl(0);
- if (nf->add_id_to_iface) {
- uint16_t iface = (nf->engine_id & 0x7f) << 9;
- nf_rec->input = htons(iface | (expired->flow.in_port & 0x1ff));
- nf_rec->output = htons(iface | (nf_flow->output_iface & 0x1ff));
- } else {
- nf_rec->input = htons(expired->flow.in_port);
- nf_rec->output = htons(nf_flow->output_iface);
- }
- nf_rec->packet_count = htonl(MIN(expired->packet_count -
- nf_flow->packet_count_off, UINT32_MAX));
- nf_rec->byte_count = htonl(MIN(expired->byte_count -
- nf_flow->byte_count_off, UINT32_MAX));
- nf_rec->init_time = htonl(nf_flow->created - nf->boot_time);
- nf_rec->used_time = htonl(MAX(nf_flow->created, expired->used)
- - nf->boot_time);
- if (expired->flow.nw_proto == IP_TYPE_ICMP) {
- /* In NetFlow, the ICMP type and code are concatenated and
- * placed in the 'dst_port' field. */
- uint8_t type = ntohs(expired->flow.tp_src);
- uint8_t code = ntohs(expired->flow.tp_dst);
- nf_rec->src_port = htons(0);
- nf_rec->dst_port = htons((type << 8) | code);
- } else {
- nf_rec->src_port = expired->flow.tp_src;
- nf_rec->dst_port = expired->flow.tp_dst;
- }
- nf_rec->tcp_flags = nf_flow->tcp_flags;
- nf_rec->ip_proto = expired->flow.nw_proto;
- nf_rec->ip_tos = nf_flow->ip_tos;
-
- /* Update flow tracking data. */
- nf_flow->created = 0;
- nf_flow->packet_count_off = expired->packet_count;
- nf_flow->byte_count_off = expired->byte_count;
- nf_flow->tcp_flags = 0;
-
- /* NetFlow messages are limited to 30 records. */
- if (ntohs(nf_hdr->count) >= 30) {
- netflow_run(nf);
- }
-}
-
-void
-netflow_run(struct netflow *nf)
-{
- size_t i;
-
- if (!nf->packet.size) {
- return;
- }
-
- for (i = 0; i < nf->n_fds; i++) {
- if (send(nf->fds[i], nf->packet.data, nf->packet.size, 0) == -1) {
- VLOG_WARN_RL(&rl, "netflow message send failed: %s",
- strerror(errno));
- }
- }
- nf->packet.size = 0;
-}
-
-static void
-clear_collectors(struct netflow *nf)
-{
- size_t i;
-
- for (i = 0; i < nf->n_fds; i++) {
- close(nf->fds[i]);
- }
- free(nf->fds);
- nf->fds = NULL;
- nf->n_fds = 0;
-}
-
-int
-netflow_set_options(struct netflow *nf,
- const struct netflow_options *nf_options)
-{
- struct svec collectors;
- int error = 0;
- size_t i;
- long long int old_timeout;
-
- nf->engine_type = nf_options->engine_type;
- nf->engine_id = nf_options->engine_id;
- nf->add_id_to_iface = nf_options->add_id_to_iface;
-
- clear_collectors(nf);
-
- svec_clone(&collectors, &nf_options->collectors);
- svec_sort_unique(&collectors);
-
- nf->fds = xmalloc(sizeof *nf->fds * collectors.n);
- for (i = 0; i < collectors.n; i++) {
- const char *name = collectors.names[i];
- char *tmpname = xstrdup(name);
- int fd = open_collector(tmpname);
- free(tmpname);
- if (fd >= 0) {
- nf->fds[nf->n_fds++] = fd;
- } else {
- VLOG_WARN("couldn't open connection to collector (%s), "
- "ignoring %s", strerror(-fd), name);
- if (!error) {
- error = -fd;
- }
- }
- }
-
- svec_destroy(&collectors);
-
- old_timeout = nf->active_timeout;
- if (nf_options->active_timeout != -1) {
- nf->active_timeout = nf_options->active_timeout;
- } else {
- nf->active_timeout = ACTIVE_TIMEOUT_DEFAULT;
- }
- nf->active_timeout *= 1000;
- if (old_timeout != nf->active_timeout) {
- nf->reconfig_time = time_msec();
- }
-
- return error;
-}
-
-struct netflow *
-netflow_create(void)
-{
- struct netflow *nf = xmalloc(sizeof *nf);
- nf->engine_type = 0;
- nf->engine_id = 0;
- nf->boot_time = time_msec();
- nf->fds = NULL;
- nf->n_fds = 0;
- nf->add_id_to_iface = false;
- nf->netflow_cnt = 0;
- nf->active_timeout = 0;
- nf->reconfig_time = 0;
- ofpbuf_init(&nf->packet, 1500);
- return nf;
-}
-
-void
-netflow_destroy(struct netflow *nf)
-{
- if (nf) {
- ofpbuf_uninit(&nf->packet);
- clear_collectors(nf);
- free(nf);
- }
-}
-
-void
-netflow_flow_clear(struct netflow_flow *nf_flow)
-{
- uint16_t output_iface = nf_flow->output_iface;
-
- memset(nf_flow, 0, sizeof *nf_flow);
- nf_flow->output_iface = output_iface;
-}
-
-void
-netflow_flow_update_time(struct netflow *nf, struct netflow_flow *nf_flow,
- long long int used)
-{
- if (!nf_flow->created) {
- nf_flow->created = used;
- }
-
- if (!nf || !nf->active_timeout || !nf_flow->last_expired ||
- nf->reconfig_time > nf_flow->last_expired) {
- /* Keep the time updated to prevent a flood of expiration in
- * the future. */
- nf_flow->last_expired = time_msec();
- }
-}
-
-void
-netflow_flow_update_flags(struct netflow_flow *nf_flow, uint8_t ip_tos,
- uint8_t tcp_flags)
-{
- nf_flow->ip_tos = ip_tos;
- nf_flow->tcp_flags |= tcp_flags;
-}
-
-bool
-netflow_active_timeout_expired(struct netflow *nf, struct netflow_flow *nf_flow)
-{
- if (nf->active_timeout) {
- return time_msec() > nf_flow->last_expired + nf->active_timeout;
- }
-
- return false;
-}
+++ /dev/null
-/*
- * Copyright (c) 2008, 2009 Nicira Networks.
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at:
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-#ifndef NETFLOW_H
-#define NETFLOW_H 1
-
-#include "flow.h"
-#include "svec.h"
-
-struct ofexpired;
-
-struct netflow_options {
- struct svec collectors;
- uint8_t engine_type;
- uint8_t engine_id;
- int active_timeout;
- bool add_id_to_iface;
-};
-
-enum netflow_output_ports {
- NF_OUT_FLOOD = UINT16_MAX,
- NF_OUT_MULTI = UINT16_MAX - 1,
- NF_OUT_DROP = UINT16_MAX - 2
-};
-
-struct netflow_flow {
- long long int last_expired; /* Time this flow last timed out. */
- long long int created; /* Time of first use since last time out. */
-
- uint64_t packet_count_off; /* Packet count at last time out. */
- uint64_t byte_count_off; /* Byte count at last time out. */
-
- uint16_t output_iface; /* Output interface index. */
- uint8_t ip_tos; /* Last-seen IP type-of-service. */
- uint8_t tcp_flags; /* Bitwise-OR of all TCP flags seen. */
-};
-
-struct netflow *netflow_create(void);
-void netflow_destroy(struct netflow *);
-int netflow_set_options(struct netflow *, const struct netflow_options *);
-void netflow_expire(struct netflow *, struct netflow_flow *,
- struct ofexpired *);
-void netflow_run(struct netflow *);
-
-void netflow_flow_clear(struct netflow_flow *);
-void netflow_flow_update_time(struct netflow *, struct netflow_flow *,
- long long int used);
-void netflow_flow_update_flags(struct netflow_flow *, uint8_t ip_tos,
- uint8_t tcp_flags);
-bool netflow_active_timeout_expired(struct netflow *, struct netflow_flow *);
-
-#endif /* netflow.h */
+++ /dev/null
-/*
- * Copyright (c) 2009 Nicira Networks.
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at:
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-#include <config.h>
-#include "ofproto.h"
-#include <errno.h>
-#include <inttypes.h>
-#include <net/if.h>
-#include <netinet/in.h>
-#include <stdbool.h>
-#include <stdlib.h>
-#include "classifier.h"
-#include "coverage.h"
-#include "discovery.h"
-#include "dpif.h"
-#include "dynamic-string.h"
-#include "executer.h"
-#include "fail-open.h"
-#include "in-band.h"
-#include "mac-learning.h"
-#include "netdev.h"
-#include "netflow.h"
-#include "odp-util.h"
-#include "ofp-print.h"
-#include "ofpbuf.h"
-#include "openflow/nicira-ext.h"
-#include "openflow/openflow.h"
-#include "openflow/openflow-mgmt.h"
-#include "openvswitch/datapath-protocol.h"
-#include "packets.h"
-#include "pinsched.h"
-#include "pktbuf.h"
-#include "poll-loop.h"
-#include "port-array.h"
-#include "rconn.h"
-#include "shash.h"
-#include "status.h"
-#include "stp.h"
-#include "svec.h"
-#include "tag.h"
-#include "timeval.h"
-#include "unixctl.h"
-#include "vconn.h"
-#include "vconn-ssl.h"
-#include "xtoxll.h"
-
-#define THIS_MODULE VLM_ofproto
-#include "vlog.h"
-
-enum {
- DP_GROUP_FLOOD = 0,
- DP_GROUP_ALL = 1
-};
-
-enum {
- TABLEID_HASH = 0,
- TABLEID_CLASSIFIER = 1
-};
-
-struct ofport {
- struct netdev *netdev;
- struct ofp_phy_port opp; /* In host byte order. */
-};
-
-static void ofport_free(struct ofport *);
-static void hton_ofp_phy_port(struct ofp_phy_port *);
-
-static int xlate_actions(const union ofp_action *in, size_t n_in,
- const flow_t *flow, struct ofproto *ofproto,
- const struct ofpbuf *packet,
- struct odp_actions *out, tag_type *tags,
- bool *may_set_up_flow, uint16_t *nf_output_iface);
-
-struct rule {
- struct cls_rule cr;
-
- uint16_t idle_timeout; /* In seconds from time of last use. */
- uint16_t hard_timeout; /* In seconds from time of creation. */
- long long int used; /* Last-used time (0 if never used). */
- long long int created; /* Creation time. */
- uint64_t packet_count; /* Number of packets received. */
- uint64_t byte_count; /* Number of bytes received. */
- uint64_t accounted_bytes; /* Number of bytes passed to account_cb. */
- tag_type tags; /* Tags (set only by hooks). */
- struct netflow_flow nf_flow; /* Per-flow NetFlow tracking data. */
-
- /* If 'super' is non-NULL, this rule is a subrule, that is, it is an
- * exact-match rule (having cr.wc.wildcards of 0) generated from the
- * wildcard rule 'super'. In this case, 'list' is an element of the
- * super-rule's list.
- *
- * If 'super' is NULL, this rule is a super-rule, and 'list' is the head of
- * a list of subrules. A super-rule with no wildcards (where
- * cr.wc.wildcards is 0) will never have any subrules. */
- struct rule *super;
- struct list list;
-
- /* OpenFlow actions.
- *
- * A subrule has no actions (it uses the super-rule's actions). */
- int n_actions;
- union ofp_action *actions;
-
- /* Datapath actions.
- *
- * A super-rule with wildcard fields never has ODP actions (since the
- * datapath only supports exact-match flows). */
- bool installed; /* Installed in datapath? */
- bool may_install; /* True ordinarily; false if actions must
- * be reassessed for every packet. */
- int n_odp_actions;
- union odp_action *odp_actions;
-};
-
-static inline bool
-rule_is_hidden(const struct rule *rule)
-{
- /* Subrules are merely an implementation detail, so hide them from the
- * controller. */
- if (rule->super != NULL) {
- return true;
- }
-
- /* Rules with priority higher than UINT16_MAX are set up by secchan itself
- * (e.g. by in-band control) and are intentionally hidden from the
- * controller. */
- if (rule->cr.priority > UINT16_MAX) {
- return true;
- }
-
- return false;
-}
-
-static struct rule *rule_create(struct ofproto *, struct rule *super,
- const union ofp_action *, size_t n_actions,
- uint16_t idle_timeout, uint16_t hard_timeout);
-static void rule_free(struct rule *);
-static void rule_destroy(struct ofproto *, struct rule *);
-static struct rule *rule_from_cls_rule(const struct cls_rule *);
-static void rule_insert(struct ofproto *, struct rule *,
- struct ofpbuf *packet, uint16_t in_port);
-static void rule_remove(struct ofproto *, struct rule *);
-static bool rule_make_actions(struct ofproto *, struct rule *,
- const struct ofpbuf *packet);
-static void rule_install(struct ofproto *, struct rule *,
- struct rule *displaced_rule);
-static void rule_uninstall(struct ofproto *, struct rule *);
-static void rule_post_uninstall(struct ofproto *, struct rule *);
-
-struct ofconn {
- struct list node;
- struct rconn *rconn;
- struct pktbuf *pktbuf;
- bool send_flow_exp;
- int miss_send_len;
-
- struct rconn_packet_counter *packet_in_counter;
-
- /* Number of OpenFlow messages queued as replies to OpenFlow requests, and
- * the maximum number before we stop reading OpenFlow requests. */
-#define OFCONN_REPLY_MAX 100
- struct rconn_packet_counter *reply_counter;
-};
-
-static struct ofconn *ofconn_create(struct ofproto *, struct rconn *);
-static void ofconn_destroy(struct ofconn *, struct ofproto *);
-static void ofconn_run(struct ofconn *, struct ofproto *);
-static void ofconn_wait(struct ofconn *);
-static void queue_tx(struct ofpbuf *msg, const struct ofconn *ofconn,
- struct rconn_packet_counter *counter);
-
-struct ofproto {
- /* Settings. */
- uint64_t datapath_id; /* Datapath ID. */
- uint64_t fallback_dpid; /* Datapath ID if no better choice found. */
- uint64_t mgmt_id; /* Management channel identifier. */
- char *manufacturer; /* Manufacturer. */
- char *hardware; /* Hardware. */
- char *software; /* Software version. */
- char *serial; /* Serial number. */
-
- /* Datapath. */
- struct dpif dpif;
- struct dpifmon *dpifmon;
- struct port_array ports; /* Index is ODP port nr; ofport->opp.port_no is
- * OFP port nr. */
- struct shash port_by_name;
- uint32_t max_ports;
-
- /* Configuration. */
- struct switch_status *switch_status;
- struct status_category *ss_cat;
- struct in_band *in_band;
- struct discovery *discovery;
- struct fail_open *fail_open;
- struct pinsched *miss_sched, *action_sched;
- struct executer *executer;
- struct netflow *netflow;
-
- /* Flow table. */
- struct classifier cls;
- bool need_revalidate;
- long long int next_expiration;
- struct tag_set revalidate_set;
-
- /* OpenFlow connections. */
- struct list all_conns;
- struct ofconn *controller;
- struct pvconn **listeners;
- size_t n_listeners;
- struct pvconn **snoops;
- size_t n_snoops;
-
- /* Hooks for ovs-vswitchd. */
- const struct ofhooks *ofhooks;
- void *aux;
-
- /* Used by default ofhooks. */
- struct mac_learning *ml;
-};
-
-static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
-
-static const struct ofhooks default_ofhooks;
-
-static uint64_t pick_datapath_id(struct dpif *, uint64_t fallback_dpid);
-static uint64_t pick_fallback_dpid(void);
-static void send_packet_in_miss(struct ofpbuf *, void *ofproto);
-static void send_packet_in_action(struct ofpbuf *, void *ofproto);
-static void update_used(struct ofproto *);
-static void update_stats(struct ofproto *, struct rule *,
- const struct odp_flow_stats *);
-static void expire_rule(struct cls_rule *, void *ofproto);
-static void active_timeout(struct ofproto *ofproto, struct rule *rule);
-static bool revalidate_rule(struct ofproto *p, struct rule *rule);
-static void revalidate_cb(struct cls_rule *rule_, void *p_);
-
-static void handle_odp_msg(struct ofproto *, struct ofpbuf *);
-
-static void handle_openflow(struct ofconn *, struct ofproto *,
- struct ofpbuf *);
-
-static void refresh_port_group(struct ofproto *, unsigned int group);
-static void update_port(struct ofproto *, const char *devname);
-static int init_ports(struct ofproto *);
-static void reinit_ports(struct ofproto *);
-
-int
-ofproto_create(const char *datapath, const struct ofhooks *ofhooks, void *aux,
- struct ofproto **ofprotop)
-{
- struct dpifmon *dpifmon;
- struct odp_stats stats;
- struct ofproto *p;
- struct dpif dpif;
- int error;
-
- *ofprotop = NULL;
-
- /* Connect to datapath and start listening for messages. */
- error = dpif_open(datapath, &dpif);
- if (error) {
- VLOG_ERR("failed to open datapath %s: %s", datapath, strerror(error));
- return error;
- }
- error = dpif_get_dp_stats(&dpif, &stats);
- if (error) {
- VLOG_ERR("failed to obtain stats for datapath %s: %s",
- datapath, strerror(error));
- dpif_close(&dpif);
- return error;
- }
- error = dpif_set_listen_mask(&dpif, ODPL_MISS | ODPL_ACTION);
- if (error) {
- VLOG_ERR("failed to listen on datapath %s: %s",
- datapath, strerror(error));
- dpif_close(&dpif);
- return error;
- }
- dpif_flow_flush(&dpif);
- dpif_purge(&dpif);
-
- /* Start monitoring datapath ports for status changes. */
- error = dpifmon_create(datapath, &dpifmon);
- if (error) {
- VLOG_ERR("failed to start monitoring datapath %s: %s",
- datapath, strerror(error));
- dpif_close(&dpif);
- return error;
- }
-
- /* Initialize settings. */
- p = xcalloc(1, sizeof *p);
- p->fallback_dpid = pick_fallback_dpid();
- p->datapath_id = pick_datapath_id(&dpif, p->fallback_dpid);
- VLOG_INFO("using datapath ID %012"PRIx64, p->datapath_id);
- p->manufacturer = xstrdup("Nicira Networks, Inc.");
- p->hardware = xstrdup("Reference Implementation");
- p->software = xstrdup(VERSION BUILDNR);
- p->serial = xstrdup("None");
-
- /* Initialize datapath. */
- p->dpif = dpif;
- p->dpifmon = dpifmon;
- port_array_init(&p->ports);
- shash_init(&p->port_by_name);
- p->max_ports = stats.max_ports;
-
- /* Initialize submodules. */
- p->switch_status = switch_status_create(p);
- p->in_band = NULL;
- p->discovery = NULL;
- p->fail_open = NULL;
- p->miss_sched = p->action_sched = NULL;
- p->executer = NULL;
- p->netflow = NULL;
-
- /* Initialize flow table. */
- classifier_init(&p->cls);
- p->need_revalidate = false;
- p->next_expiration = time_msec() + 1000;
- tag_set_init(&p->revalidate_set);
-
- /* Initialize OpenFlow connections. */
- list_init(&p->all_conns);
- p->controller = ofconn_create(p, rconn_create(5, 8));
- p->controller->pktbuf = pktbuf_create();
- p->controller->miss_send_len = OFP_DEFAULT_MISS_SEND_LEN;
- p->listeners = NULL;
- p->n_listeners = 0;
- p->snoops = NULL;
- p->n_snoops = 0;
-
- /* Initialize hooks. */
- if (ofhooks) {
- p->ofhooks = ofhooks;
- p->aux = aux;
- p->ml = NULL;
- } else {
- p->ofhooks = &default_ofhooks;
- p->aux = p;
- p->ml = mac_learning_create();
- }
-
- /* Register switch status category. */
- p->ss_cat = switch_status_register(p->switch_status, "remote",
- rconn_status_cb, p->controller->rconn);
-
- /* Almost done... */
- error = init_ports(p);
- if (error) {
- ofproto_destroy(p);
- return error;
- }
-
- *ofprotop = p;
- return 0;
-}
-
-void
-ofproto_set_datapath_id(struct ofproto *p, uint64_t datapath_id)
-{
- uint64_t old_dpid = p->datapath_id;
- p->datapath_id = (datapath_id
- ? datapath_id
- : pick_datapath_id(&p->dpif, p->fallback_dpid));
- if (p->datapath_id != old_dpid) {
- VLOG_INFO("datapath ID changed to %012"PRIx64, p->datapath_id);
- rconn_reconnect(p->controller->rconn);
- }
-}
-
-void
-ofproto_set_mgmt_id(struct ofproto *p, uint64_t mgmt_id)
-{
- p->mgmt_id = mgmt_id;
-}
-
-void
-ofproto_set_probe_interval(struct ofproto *p, int probe_interval)
-{
- probe_interval = probe_interval ? MAX(probe_interval, 5) : 0;
- rconn_set_probe_interval(p->controller->rconn, probe_interval);
- if (p->fail_open) {
- int trigger_duration = probe_interval ? probe_interval * 3 : 15;
- fail_open_set_trigger_duration(p->fail_open, trigger_duration);
- }
-}
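The probe-interval setter above clamps any nonzero interval to at least 5 seconds and passes 0 ("probing disabled") through unchanged. It then derives the fail-open trigger as three probe intervals, or 15 seconds when probing is disabled. The following is a minimal sketch of just that arithmetic. Both helper names are hypothetical and do not exist in the source.

```c
#include <assert.h>

/* Hypothetical helper: mirrors "probe_interval ? MAX(probe_interval, 5) : 0".
 * Zero means "probing disabled" and is passed through unchanged. */
static int
sketch_clamp_probe_interval(int probe_interval)
{
    if (!probe_interval) {
        return 0;
    }
    return probe_interval > 5 ? probe_interval : 5;
}

/* Hypothetical helper: mirrors the fail-open trigger computation, which is
 * three probe intervals, or 15 seconds if probing is disabled. */
static int
sketch_fail_open_trigger(int probe_interval)
{
    return probe_interval ? probe_interval * 3 : 15;
}
```

The trigger is computed from the already-clamped interval. A requested interval of 2 therefore becomes 5, and the trigger becomes 15.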
-
-void
-ofproto_set_max_backoff(struct ofproto *p, int max_backoff)
-{
- rconn_set_max_backoff(p->controller->rconn, max_backoff);
-}
-
-void
-ofproto_set_desc(struct ofproto *p,
- const char *manufacturer, const char *hardware,
- const char *software, const char *serial)
-{
- if (manufacturer) {
- free(p->manufacturer);
- p->manufacturer = xstrdup(manufacturer);
- }
- if (hardware) {
- free(p->hardware);
- p->hardware = xstrdup(hardware);
- }
- if (software) {
- free(p->software);
- p->software = xstrdup(software);
- }
- if (serial) {
- free(p->serial);
- p->serial = xstrdup(serial);
- }
-}
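ofproto_set_desc() treats a NULL argument as "leave this field alone" and replaces each non-NULL field with a private copy. The following is a minimal sketch of that per-field pattern. The helper name is hypothetical, and it assumes (as the code above appears to) that allocation failure should stop the program rather than return NULL.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: replaces '*fieldp' with a private copy of 'value',
 * but leaves it untouched when 'value' is NULL. */
static void
sketch_replace_if_set(char **fieldp, const char *value)
{
    assert(fieldp != NULL);
    if (value) {
        size_t len = strlen(value) + 1;
        char *copy = malloc(len);
        if (!copy) {
            abort();            /* assumed xstrdup()-like: never return NULL */
        }
        memcpy(copy, value, len);
        free(*fieldp);          /* free only after copying */
        *fieldp = copy;
    }
}
```

Because it copies before freeing, the helper stays correct even if 'value' points at the string it is replacing. The free-then-xstrdup() order in ofproto_set_desc() does not handle that case.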
-
-int
-ofproto_set_in_band(struct ofproto *p, bool in_band)
-{
- if (in_band != (p->in_band != NULL)) {
- if (in_band) {
- in_band_create(p, &p->dpif, p->switch_status,
- p->controller->rconn, &p->in_band);
- return 0;
- } else {
- ofproto_set_discovery(p, false, NULL, true);
- in_band_destroy(p->in_band);
- p->in_band = NULL;
- }
- rconn_reconnect(p->controller->rconn);
- }
- return 0;
-}
-
-int
-ofproto_set_discovery(struct ofproto *p, bool discovery,
- const char *re, bool update_resolv_conf)
-{
- if (discovery != (p->discovery != NULL)) {
- if (discovery) {
- int error = ofproto_set_in_band(p, true);
- if (error) {
- return error;
- }
- error = discovery_create(re, update_resolv_conf,
- &p->dpif, p->switch_status,
- &p->discovery);
- if (error) {
- return error;
- }
- } else {
- discovery_destroy(p->discovery);
- p->discovery = NULL;
- }
- rconn_disconnect(p->controller->rconn);
- } else if (discovery) {
- discovery_set_update_resolv_conf(p->discovery, update_resolv_conf);
- return discovery_set_accept_controller_re(p->discovery, re);
- }
- return 0;
-}
-
-int
-ofproto_set_controller(struct ofproto *ofproto, const char *controller)
-{
- if (ofproto->discovery) {
- return EINVAL;
- } else if (controller) {
- if (strcmp(rconn_get_name(ofproto->controller->rconn), controller)) {
- return rconn_connect(ofproto->controller->rconn, controller);
- } else {
- return 0;
- }
- } else {
- rconn_disconnect(ofproto->controller->rconn);
- return 0;
- }
-}
-
-static int
-set_pvconns(struct pvconn ***pvconnsp, size_t *n_pvconnsp,
- const struct svec *svec)
-{
- struct pvconn **pvconns = *pvconnsp;
- size_t n_pvconns = *n_pvconnsp;
- int retval = 0;
- size_t i;
-
- for (i = 0; i < n_pvconns; i++) {
- pvconn_close(pvconns[i]);
- }
- free(pvconns);
-
- pvconns = xmalloc(svec->n * sizeof *pvconns);
- n_pvconns = 0;
- for (i = 0; i < svec->n; i++) {
- const char *name = svec->names[i];
- struct pvconn *pvconn;
- int error;
-
- error = pvconn_open(name, &pvconn);
- if (!error) {
- pvconns[n_pvconns++] = pvconn;
- } else {
- VLOG_ERR("failed to listen on %s: %s", name, strerror(error));
- if (!retval) {
- retval = error;
- }
- }
- }
-
- *pvconnsp = pvconns;
- *n_pvconnsp = n_pvconns;
-
- return retval;
-}
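set_pvconns() follows a "close everything, reopen from the list" strategy. Names that fail to open are logged and dropped, the survivors are kept, and only the first error is reported to the caller. The following is a minimal sketch of that shape. It uses integers in place of pvconns and a hypothetical sketch_open() in place of pvconn_open(), whose failure rules are invented purely for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for pvconn_open(): names starting with 'x' fail with
 * EADDRINUSE, names starting with 'y' fail with EACCES, and anything else
 * "opens" and yields its length as a fake handle. */
static int
sketch_open(const char *name, int *handlep)
{
    if (name[0] == 'x') {
        return EADDRINUSE;
    } else if (name[0] == 'y') {
        return EACCES;
    }
    *handlep = (int) strlen(name);
    return 0;
}

/* Mirrors the shape of set_pvconns(): discard the old array, try to open every
 * name, keep only the successes, and report the first error seen (or 0). */
static int
sketch_reopen_all(const char *const names[], size_t n_names,
                  int **handlesp, size_t *n_handlesp)
{
    int *handles = malloc((n_names ? n_names : 1) * sizeof *handles);
    size_t n_handles = 0;
    int retval = 0;
    size_t i;

    assert(handles != NULL);
    free(*handlesp);            /* stands in for closing the old pvconns */
    for (i = 0; i < n_names; i++) {
        int handle;
        int error = sketch_open(names[i], &handle);
        if (!error) {
            handles[n_handles++] = handle;
        } else if (!retval) {
            retval = error;     /* remember only the first failure */
        }
    }
    *handlesp = handles;
    *n_handlesp = n_handles;
    return retval;
}
```

A name that fails is not retained anywhere, so it is retried only when the caller passes the list again.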
-
-int
-ofproto_set_listeners(struct ofproto *ofproto, const struct svec *listeners)
-{
- return set_pvconns(&ofproto->listeners, &ofproto->n_listeners, listeners);
-}
-
-int
-ofproto_set_snoops(struct ofproto *ofproto, const struct svec *snoops)
-{
- return set_pvconns(&ofproto->snoops, &ofproto->n_snoops, snoops);
-}
-
-int
-ofproto_set_netflow(struct ofproto *ofproto,
- const struct netflow_options *nf_options)
-{
- if (nf_options->collectors.n) {
- if (!ofproto->netflow) {
- ofproto->netflow = netflow_create();
- }
- return netflow_set_options(ofproto->netflow, nf_options);
- } else {
- netflow_destroy(ofproto->netflow);
- ofproto->netflow = NULL;
- return 0;
- }
-}
-
-void
-ofproto_set_failure(struct ofproto *ofproto, bool fail_open)
-{
- if (fail_open) {
- struct rconn *rconn = ofproto->controller->rconn;
- int trigger_duration = rconn_get_probe_interval(rconn) * 3;
- if (!ofproto->fail_open) {
- ofproto->fail_open = fail_open_create(ofproto, trigger_duration,
- ofproto->switch_status,
- rconn);
- } else {
- fail_open_set_trigger_duration(ofproto->fail_open,
- trigger_duration);
- }
- } else {
- fail_open_destroy(ofproto->fail_open);
- ofproto->fail_open = NULL;
- }
-}
-
-void
-ofproto_set_rate_limit(struct ofproto *ofproto,
- int rate_limit, int burst_limit)
-{
- if (rate_limit > 0) {
- if (!ofproto->miss_sched) {
- ofproto->miss_sched = pinsched_create(rate_limit, burst_limit,
- ofproto->switch_status);
- ofproto->action_sched = pinsched_create(rate_limit, burst_limit,
- NULL);
- } else {
- pinsched_set_limits(ofproto->miss_sched, rate_limit, burst_limit);
- pinsched_set_limits(ofproto->action_sched,
- rate_limit, burst_limit);
- }
- } else {
- pinsched_destroy(ofproto->miss_sched);
- ofproto->miss_sched = NULL;
- pinsched_destroy(ofproto->action_sched);
- ofproto->action_sched = NULL;
- }
-}
-
-int
-ofproto_set_stp(struct ofproto *ofproto UNUSED, bool enable_stp)
-{
- /* XXX */
- if (enable_stp) {
- VLOG_WARN("STP is not yet implemented");
- return EINVAL;
- } else {
- return 0;
- }
-}
-
-int
-ofproto_set_remote_execution(struct ofproto *ofproto, const char *command_acl,
- const char *command_dir)
-{
- if (command_acl) {
- if (!ofproto->executer) {
- return executer_create(command_acl, command_dir,
- &ofproto->executer);
- } else {
- executer_set_acl(ofproto->executer, command_acl, command_dir);
- }
- } else {
- executer_destroy(ofproto->executer);
- ofproto->executer = NULL;
- }
- return 0;
-}
-
-uint64_t
-ofproto_get_datapath_id(const struct ofproto *ofproto)
-{
- return ofproto->datapath_id;
-}
-
-uint64_t
-ofproto_get_mgmt_id(const struct ofproto *ofproto)
-{
- return ofproto->mgmt_id;
-}
-
-int
-ofproto_get_probe_interval(const struct ofproto *ofproto)
-{
- return rconn_get_probe_interval(ofproto->controller->rconn);
-}
-
-int
-ofproto_get_max_backoff(const struct ofproto *ofproto)
-{
- return rconn_get_max_backoff(ofproto->controller->rconn);
-}
-
-bool
-ofproto_get_in_band(const struct ofproto *ofproto)
-{
- return ofproto->in_band != NULL;
-}
-
-bool
-ofproto_get_discovery(const struct ofproto *ofproto)
-{
- return ofproto->discovery != NULL;
-}
-
-const char *
-ofproto_get_controller(const struct ofproto *ofproto)
-{
- return rconn_get_name(ofproto->controller->rconn);
-}
-
-void
-ofproto_get_listeners(const struct ofproto *ofproto, struct svec *listeners)
-{
- size_t i;
-
- for (i = 0; i < ofproto->n_listeners; i++) {
- svec_add(listeners, pvconn_get_name(ofproto->listeners[i]));
- }
-}
-
-void
-ofproto_get_snoops(const struct ofproto *ofproto, struct svec *snoops)
-{
- size_t i;
-
- for (i = 0; i < ofproto->n_snoops; i++) {
- svec_add(snoops, pvconn_get_name(ofproto->snoops[i]));
- }
-}
-
-void
-ofproto_destroy(struct ofproto *p)
-{
- struct ofconn *ofconn, *next_ofconn;
- struct ofport *ofport;
- unsigned int port_no;
- size_t i;
-
- if (!p) {
- return;
- }
-
- ofproto_flush_flows(p);
- classifier_destroy(&p->cls);
-
- LIST_FOR_EACH_SAFE (ofconn, next_ofconn, struct ofconn, node,
- &p->all_conns) {
- ofconn_destroy(ofconn, p);
- }
-
- dpif_close(&p->dpif);
- dpifmon_destroy(p->dpifmon);
- PORT_ARRAY_FOR_EACH (ofport, &p->ports, port_no) {
- ofport_free(ofport);
- }
- shash_destroy(&p->port_by_name);
-
- switch_status_destroy(p->switch_status);
- in_band_destroy(p->in_band);
- discovery_destroy(p->discovery);
- fail_open_destroy(p->fail_open);
- pinsched_destroy(p->miss_sched);
- pinsched_destroy(p->action_sched);
- executer_destroy(p->executer);
- netflow_destroy(p->netflow);
-
- switch_status_unregister(p->ss_cat);
-
- for (i = 0; i < p->n_listeners; i++) {
- pvconn_close(p->listeners[i]);
- }
- free(p->listeners);
-
- for (i = 0; i < p->n_snoops; i++) {
- pvconn_close(p->snoops[i]);
- }
- free(p->snoops);
-
- mac_learning_destroy(p->ml);
-
- free(p);
-}
-
-int
-ofproto_run(struct ofproto *p)
-{
- int error = ofproto_run1(p);
- if (!error) {
- error = ofproto_run2(p, false);
- }
- return error;
-}
-
-int
-ofproto_run1(struct ofproto *p)
-{
- struct ofconn *ofconn, *next_ofconn;
- char *devname;
- int error;
- int i;
-
- for (i = 0; i < 50; i++) {
- struct ofpbuf *buf;
-
- error = dpif_recv(&p->dpif, &buf);
- if (error) {
- if (error == ENODEV) {
- /* Someone destroyed the datapath behind our back. The caller
- * had better destroy us and give up, because we're just going
- * to spin from here on out. */
- static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
- VLOG_ERR_RL(&rl, "dp%u: datapath was destroyed externally",
- dpif_id(&p->dpif));
- return ENODEV;
- }
- break;
- }
-
- handle_odp_msg(p, buf);
- }
-
- while ((error = dpifmon_poll(p->dpifmon, &devname)) != EAGAIN) {
- if (error == ENOBUFS) {
- reinit_ports(p);
- } else if (!error) {
- update_port(p, devname);
- free(devname);
- }
- }
-
- if (p->in_band) {
- in_band_run(p->in_band);
- }
- if (p->discovery) {
- char *controller_name;
- if (rconn_is_connectivity_questionable(p->controller->rconn)) {
- discovery_question_connectivity(p->discovery);
- }
- if (discovery_run(p->discovery, &controller_name)) {
- if (controller_name) {
- rconn_connect(p->controller->rconn, controller_name);
- } else {
- rconn_disconnect(p->controller->rconn);
- }
- }
- }
- pinsched_run(p->miss_sched, send_packet_in_miss, p);
- pinsched_run(p->action_sched, send_packet_in_action, p);
- if (p->executer) {
- executer_run(p->executer);
- }
-
- LIST_FOR_EACH_SAFE (ofconn, next_ofconn, struct ofconn, node,
- &p->all_conns) {
- ofconn_run(ofconn, p);
- }
-
- /* Fail-open maintenance. Do this after processing the ofconns since
- * fail-open checks the status of the controller rconn. */
- if (p->fail_open) {
- fail_open_run(p->fail_open);
- }
-
- for (i = 0; i < p->n_listeners; i++) {
- struct vconn *vconn;
- int retval;
-
- retval = pvconn_accept(p->listeners[i], OFP_VERSION, &vconn);
- if (!retval) {
- ofconn_create(p, rconn_new_from_vconn("passive", vconn));
- } else if (retval != EAGAIN) {
- VLOG_WARN_RL(&rl, "accept failed (%s)", strerror(retval));
- }
- }
-
- for (i = 0; i < p->n_snoops; i++) {
- struct vconn *vconn;
- int retval;
-
- retval = pvconn_accept(p->snoops[i], OFP_VERSION, &vconn);
- if (!retval) {
- rconn_add_monitor(p->controller->rconn, vconn);
- } else if (retval != EAGAIN) {
- VLOG_WARN_RL(&rl, "accept failed (%s)", strerror(retval));
- }
- }
-
- if (time_msec() >= p->next_expiration) {
- COVERAGE_INC(ofproto_expiration);
- p->next_expiration = time_msec() + 1000;
- update_used(p);
-
- classifier_for_each(&p->cls, CLS_INC_ALL, expire_rule, p);
-
- /* Let the hook know that we're at a stable point: all outstanding data
- * in existing flows has been accounted to the account_cb. Thus, the
- * hook can now reasonably do operations that depend on having accurate
- * flow volume accounting (currently, that's just bond rebalancing). */
- if (p->ofhooks->account_checkpoint_cb) {
- p->ofhooks->account_checkpoint_cb(p->aux);
- }
- }
-
- if (p->netflow) {
- netflow_run(p->netflow);
- }
-
- return 0;
-}
-
-struct revalidate_cbdata {
- struct ofproto *ofproto;
- bool revalidate_all; /* Revalidate all exact-match rules? */
- bool revalidate_subrules; /* Revalidate all exact-match subrules? */
- struct tag_set revalidate_set; /* Set of tags to revalidate. */
-};
-
-int
-ofproto_run2(struct ofproto *p, bool revalidate_all)
-{
- if (p->need_revalidate || revalidate_all
- || !tag_set_is_empty(&p->revalidate_set)) {
- struct revalidate_cbdata cbdata;
- cbdata.ofproto = p;
- cbdata.revalidate_all = revalidate_all;
- cbdata.revalidate_subrules = p->need_revalidate;
- cbdata.revalidate_set = p->revalidate_set;
- tag_set_init(&p->revalidate_set);
- COVERAGE_INC(ofproto_revalidate);
- classifier_for_each(&p->cls, CLS_INC_EXACT, revalidate_cb, &cbdata);
- p->need_revalidate = false;
- }
-
- return 0;
-}
-
-void
-ofproto_wait(struct ofproto *p)
-{
- struct ofconn *ofconn;
- size_t i;
-
- dpif_recv_wait(&p->dpif);
- dpifmon_wait(p->dpifmon);
- LIST_FOR_EACH (ofconn, struct ofconn, node, &p->all_conns) {
- ofconn_wait(ofconn);
- }
- if (p->in_band) {
- in_band_wait(p->in_band);
- }
- if (p->discovery) {
- discovery_wait(p->discovery);
- }
- if (p->fail_open) {
- fail_open_wait(p->fail_open);
- }
- pinsched_wait(p->miss_sched);
- pinsched_wait(p->action_sched);
- if (p->executer) {
- executer_wait(p->executer);
- }
- if (!tag_set_is_empty(&p->revalidate_set)) {
- poll_immediate_wake();
- }
- if (p->need_revalidate) {
- /* Shouldn't happen, but if it does just go around again. */
- VLOG_DBG_RL(&rl, "need revalidate in ofproto_wait()");
- poll_immediate_wake();
- } else if (p->next_expiration != LLONG_MAX) {
- poll_timer_wait(p->next_expiration - time_msec());
- }
- for (i = 0; i < p->n_listeners; i++) {
- pvconn_wait(p->listeners[i]);
- }
- for (i = 0; i < p->n_snoops; i++) {
- pvconn_wait(p->snoops[i]);
- }
-}
-
-void
-ofproto_revalidate(struct ofproto *ofproto, tag_type tag)
-{
- tag_set_add(&ofproto->revalidate_set, tag);
-}
-
-struct tag_set *
-ofproto_get_revalidate_set(struct ofproto *ofproto)
-{
- return &ofproto->revalidate_set;
-}
-
-bool
-ofproto_is_alive(const struct ofproto *p)
-{
- return p->discovery || rconn_is_alive(p->controller->rconn);
-}
-
-int
-ofproto_send_packet(struct ofproto *p, const flow_t *flow,
- const union ofp_action *actions, size_t n_actions,
- const struct ofpbuf *packet)
-{
- struct odp_actions odp_actions;
- int error;
-
- error = xlate_actions(actions, n_actions, flow, p, packet, &odp_actions,
- NULL, NULL, NULL);
- if (error) {
- return error;
- }
-
- /* XXX Should we translate the dpif_execute() errno value into an OpenFlow
- * error code? */
- dpif_execute(&p->dpif, flow->in_port, odp_actions.actions,
- odp_actions.n_actions, packet);
- return 0;
-}
-
-void
-ofproto_add_flow(struct ofproto *p,
- const flow_t *flow, uint32_t wildcards, unsigned int priority,
- const union ofp_action *actions, size_t n_actions,
- int idle_timeout)
-{
- struct rule *rule;
- rule = rule_create(p, NULL, actions, n_actions,
- idle_timeout >= 0 ? idle_timeout : 5 /* XXX */, 0);
- cls_rule_from_flow(&rule->cr, flow, wildcards, priority);
- rule_insert(p, rule, NULL, 0);
-}
-
-void
-ofproto_delete_flow(struct ofproto *ofproto, const flow_t *flow,
- uint32_t wildcards, unsigned int priority)
-{
- struct rule *rule;
-
- rule = rule_from_cls_rule(classifier_find_rule_exactly(&ofproto->cls,
- flow, wildcards,
- priority));
- if (rule) {
- rule_remove(ofproto, rule);
- }
-}
-
-static void
-destroy_rule(struct cls_rule *rule_, void *ofproto_)
-{
- struct rule *rule = rule_from_cls_rule(rule_);
- struct ofproto *ofproto = ofproto_;
-
- /* Mark the flow as not installed, even though it might really be
- * installed, so that rule_remove() doesn't bother trying to uninstall it.
- * There is no point in uninstalling it individually since we are about to
- * blow away all the flows with dpif_flow_flush(). */
- rule->installed = false;
-
- rule_remove(ofproto, rule);
-}
-
-void
-ofproto_flush_flows(struct ofproto *ofproto)
-{
- COVERAGE_INC(ofproto_flush);
- classifier_for_each(&ofproto->cls, CLS_INC_ALL, destroy_rule, ofproto);
- dpif_flow_flush(&ofproto->dpif);
- if (ofproto->in_band) {
- in_band_flushed(ofproto->in_band);
- }
- if (ofproto->fail_open) {
- fail_open_flushed(ofproto->fail_open);
- }
-}
-\f
-static void
-reinit_ports(struct ofproto *p)
-{
- struct svec devnames;
- struct ofport *ofport;
- unsigned int port_no;
- struct odp_port *odp_ports;
- size_t n_odp_ports;
- size_t i;
-
- svec_init(&devnames);
- PORT_ARRAY_FOR_EACH (ofport, &p->ports, port_no) {
- svec_add(&devnames, (char *) ofport->opp.name);
- }
- dpif_port_list(&p->dpif, &odp_ports, &n_odp_ports);
- for (i = 0; i < n_odp_ports; i++) {
- svec_add(&devnames, odp_ports[i].devname);
- }
- free(odp_ports);
-
- svec_sort_unique(&devnames);
- for (i = 0; i < devnames.n; i++) {
- update_port(p, devnames.names[i]);
- }
- svec_destroy(&devnames);
-}
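ofproto_run1() calls reinit_ports() when dpifmon_poll() reports ENOBUFS, presumably because individual change notifications may have been lost. It rescans the union of the ports ofproto already knows about and the ports the datapath reports, and svec_sort_unique() ensures each device name reaches update_port() exactly once. The following is a minimal sketch of that sort-and-deduplicate step over a plain array of strings. The function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator for an array of C strings. */
static int
sketch_compare_strings(const void *a_, const void *b_)
{
    const char *const *a = a_;
    const char *const *b = b_;
    return strcmp(*a, *b);
}

/* Sorts 'names' in place and squeezes out adjacent duplicates, returning the
 * number of distinct names left at the front of the array. */
static size_t
sketch_sort_unique(const char **names, size_t n)
{
    size_t n_unique = 0;
    size_t i;

    qsort(names, n, sizeof *names, sketch_compare_strings);
    for (i = 0; i < n; i++) {
        if (!n_unique || strcmp(names[n_unique - 1], names[i])) {
            names[n_unique++] = names[i];
        }
    }
    return n_unique;
}
```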
-
-static void
-refresh_port_group(struct ofproto *p, unsigned int group)
-{
- uint16_t *ports;
- size_t n_ports;
- struct ofport *port;
- unsigned int port_no;
-
- assert(group == DP_GROUP_ALL || group == DP_GROUP_FLOOD);
-
- ports = xmalloc(port_array_count(&p->ports) * sizeof *ports);
- n_ports = 0;
- PORT_ARRAY_FOR_EACH (port, &p->ports, port_no) {
- if (group == DP_GROUP_ALL || !(port->opp.config & OFPPC_NO_FLOOD)) {
- ports[n_ports++] = port_no;
- }
- }
- dpif_port_group_set(&p->dpif, group, ports, n_ports);
- free(ports);
-}
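refresh_port_group() rebuilds the datapath's port groups. DP_GROUP_ALL contains every port, and DP_GROUP_FLOOD omits ports whose OpenFlow config has OFPPC_NO_FLOOD set. The following is a minimal sketch of that membership rule. The group names and the no-flood bit value are hypothetical stand-ins chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for DP_GROUP_FLOOD, DP_GROUP_ALL and OFPPC_NO_FLOOD;
 * the bit value below is chosen for illustration only. */
enum sketch_group { SKETCH_GROUP_FLOOD, SKETCH_GROUP_ALL };
#define SKETCH_NO_FLOOD (1u << 4)

struct sketch_port {
    uint16_t port_no;
    uint32_t config;            /* OpenFlow-style port config bits */
};

/* Writes the members of 'group' into 'out' (which must have room for 'n'
 * entries) and returns how many were written. */
static size_t
sketch_collect_group(enum sketch_group group,
                     const struct sketch_port *ports, size_t n,
                     uint16_t *out)
{
    size_t n_out = 0;
    size_t i;

    assert(group == SKETCH_GROUP_ALL || group == SKETCH_GROUP_FLOOD);
    for (i = 0; i < n; i++) {
        if (group == SKETCH_GROUP_ALL
            || !(ports[i].config & SKETCH_NO_FLOOD)) {
            out[n_out++] = ports[i].port_no;
        }
    }
    return n_out;
}
```

As in refresh_port_group(), the output buffer is sized for the worst case (every port is a member), so the loop needs no bounds check.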
-
-static void
-refresh_port_groups(struct ofproto *p)
-{
- refresh_port_group(p, DP_GROUP_FLOOD);
- refresh_port_group(p, DP_GROUP_ALL);
-}
-
-static struct ofport *
-make_ofport(const struct odp_port *odp_port)
-{
- enum netdev_flags flags;
- struct ofport *ofport;
- struct netdev *netdev;
- bool carrier;
- int error;
-
- error = netdev_open(odp_port->devname, NETDEV_ETH_TYPE_NONE, &netdev);
- if (error) {
- VLOG_WARN_RL(&rl, "ignoring port %s (%"PRIu16") because netdev %s "
- "cannot be opened (%s)",
- odp_port->devname, odp_port->port,
- odp_port->devname, strerror(error));
- return NULL;
- }
-
- ofport = xmalloc(sizeof *ofport);
- ofport->netdev = netdev;
- ofport->opp.port_no = odp_port_to_ofp_port(odp_port->port);
- memcpy(ofport->opp.hw_addr, netdev_get_etheraddr(netdev), ETH_ALEN);
- memcpy(ofport->opp.name, odp_port->devname,
- MIN(sizeof ofport->opp.name, sizeof odp_port->devname));
- ofport->opp.name[sizeof ofport->opp.name - 1] = '\0';
-
- netdev_get_flags(netdev, &flags);
- ofport->opp.config = flags & NETDEV_UP ? 0 : OFPPC_PORT_DOWN;
-
- netdev_get_carrier(netdev, &carrier);
- ofport->opp.state = carrier ? 0 : OFPPS_LINK_DOWN;
-
- netdev_get_features(netdev,
- &ofport->opp.curr, &ofport->opp.advertised,
- &ofport->opp.supported, &ofport->opp.peer);
- return ofport;
-}
-
-static bool
-ofport_conflicts(const struct ofproto *p, const struct odp_port *odp_port)
-{
- if (port_array_get(&p->ports, odp_port->port)) {
- VLOG_WARN_RL(&rl, "ignoring duplicate port %"PRIu16" in datapath",
- odp_port->port);
- return true;
- } else if (shash_find(&p->port_by_name, odp_port->devname)) {
- VLOG_WARN_RL(&rl, "ignoring duplicate device %s in datapath",
- odp_port->devname);
- return true;
- } else {
- return false;
- }
-}
-
-static int
-ofport_equal(const struct ofport *a_, const struct ofport *b_)
-{
- const struct ofp_phy_port *a = &a_->opp;
- const struct ofp_phy_port *b = &b_->opp;
-
- BUILD_ASSERT_DECL(sizeof *a == 48); /* Detect ofp_phy_port changes. */
- return (a->port_no == b->port_no
- && !memcmp(a->hw_addr, b->hw_addr, sizeof a->hw_addr)
- && !strcmp((char *) a->name, (char *) b->name)
- && a->state == b->state
- && a->config == b->config
- && a->curr == b->curr
- && a->advertised == b->advertised
- && a->supported == b->supported
- && a->peer == b->peer);
-}
-
-static void
-send_port_status(struct ofproto *p, const struct ofport *ofport,
- uint8_t reason)
-{
- /* XXX Should limit the number of queued port status change messages. */
- struct ofconn *ofconn;
- LIST_FOR_EACH (ofconn, struct ofconn, node, &p->all_conns) {
- struct ofp_port_status *ops;
- struct ofpbuf *b;
-
- ops = make_openflow_xid(sizeof *ops, OFPT_PORT_STATUS, 0, &b);
- ops->reason = reason;
- ops->desc = ofport->opp;
- hton_ofp_phy_port(&ops->desc);
- queue_tx(b, ofconn, NULL);
- }
- if (p->ofhooks->port_changed_cb) {
- p->ofhooks->port_changed_cb(reason, &ofport->opp, p->aux);
- }
-}
-
-static void
-ofport_install(struct ofproto *p, struct ofport *ofport)
-{
- port_array_set(&p->ports, ofp_port_to_odp_port(ofport->opp.port_no),
- ofport);
- shash_add(&p->port_by_name, (char *) ofport->opp.name, ofport);
-}
-
-static void
-ofport_remove(struct ofproto *p, struct ofport *ofport)
-{
- port_array_set(&p->ports, ofp_port_to_odp_port(ofport->opp.port_no), NULL);
- shash_delete(&p->port_by_name,
- shash_find(&p->port_by_name, (char *) ofport->opp.name));
-}
-
-static void
-ofport_free(struct ofport *ofport)
-{
- if (ofport) {
- netdev_close(ofport->netdev);
- free(ofport);
- }
-}
-
-static void
-update_port(struct ofproto *p, const char *devname)
-{
- struct odp_port odp_port;
- struct ofport *old_ofport;
- struct ofport *new_ofport;
- int error;
-
- COVERAGE_INC(ofproto_update_port);
-
- /* Query the datapath for port information. */
- error = dpif_port_query_by_name(&p->dpif, devname, &odp_port);
-
- /* Find the old ofport. */
- old_ofport = shash_find_data(&p->port_by_name, devname);
- if (!error) {
- if (!old_ofport) {
- /* There's no port named 'devname' but there might be a port with
- * the same port number. This could happen if a port is deleted
- * and then a new one added in its place very quickly, or if a port
- * is renamed. In the former case we want to send an OFPPR_DELETE
- * and an OFPPR_ADD, and in the latter case we want to send a
- * single OFPPR_MODIFY. We can distinguish the cases by comparing
- * the old port's ifindex against the new port, or perhaps less
- * reliably but more portably by comparing the old port's MAC
- * against the new port's MAC. However, this code isn't that smart
- * and always sends an OFPPR_MODIFY (XXX). */
- old_ofport = port_array_get(&p->ports, odp_port.port);
- }
- } else if (error != ENOENT && error != ENODEV) {
- VLOG_WARN_RL(&rl, "dpif_port_query_by_name returned unexpected error "
- "%s", strerror(error));
- return;
- }
-
- /* Create a new ofport. */
- new_ofport = !error ? make_ofport(&odp_port) : NULL;
-
- /* Eliminate a few pathological cases. */
- if (!old_ofport && !new_ofport) {
- return;
- } else if (old_ofport && new_ofport) {
- /* Most of the 'config' bits are OpenFlow soft state, but
- * OFPPC_PORT_DOWN is maintained by the kernel. So transfer the OpenFlow
- * bits from old_ofport. (make_ofport() only sets OFPPC_PORT_DOWN and
- * leaves the other bits 0.) */
- new_ofport->opp.config |= old_ofport->opp.config & ~OFPPC_PORT_DOWN;
-
- if (ofport_equal(old_ofport, new_ofport)) {
- /* False alarm--no change. */
- ofport_free(new_ofport);
- return;
- }
- }
-
- /* Now deal with the normal cases. */
- if (old_ofport) {
- ofport_remove(p, old_ofport);
- }
- if (new_ofport) {
- ofport_install(p, new_ofport);
- }
- send_port_status(p, new_ofport ? new_ofport : old_ofport,
- (!old_ofport ? OFPPR_ADD
- : !new_ofport ? OFPPR_DELETE
- : OFPPR_MODIFY));
- ofport_free(old_ofport);
-
- /* Update port groups. */
- refresh_port_groups(p);
-}
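update_port() makes two small decisions. It picks the port-status reason from which of the old and new ports exist: ADD if only the new one exists, DELETE if only the old one exists, and MODIFY otherwise. It also merges config bits so that the kernel-maintained "port down" bit comes from the freshly probed port, while the OpenFlow soft-state bits carry over from the old one. The following is a minimal sketch of both decisions. The enum and the bit values are hypothetical stand-ins for OFPPR_* and OFPPC_PORT_DOWN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for OFPPR_ADD/DELETE/MODIFY and OFPPC_PORT_DOWN. */
enum sketch_reason { SKETCH_ADD, SKETCH_DELETE, SKETCH_MODIFY };
#define SKETCH_PORT_DOWN (1u << 0)

/* Chooses the port-status reason the same way update_port() does.  The
 * caller must already have returned early when neither port exists. */
static enum sketch_reason
sketch_port_reason(bool have_old, bool have_new)
{
    assert(have_old || have_new);
    return !have_old ? SKETCH_ADD : !have_new ? SKETCH_DELETE : SKETCH_MODIFY;
}

/* Takes the port-down bit only from 'new_config' and carries every other
 * (OpenFlow soft-state) bit over from 'old_config'. */
static uint32_t
sketch_merge_config(uint32_t new_config, uint32_t old_config)
{
    return new_config | (old_config & ~SKETCH_PORT_DOWN);
}
```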
-
-static int
-init_ports(struct ofproto *p)
-{
- struct odp_port *ports;
- size_t n_ports;
- size_t i;
- int error;
-
- error = dpif_port_list(&p->dpif, &ports, &n_ports);
- if (error) {
- return error;
- }
-
- for (i = 0; i < n_ports; i++) {
- const struct odp_port *odp_port = &ports[i];
- if (!ofport_conflicts(p, odp_port)) {
- struct ofport *ofport = make_ofport(odp_port);
- if (ofport) {
- ofport_install(p, ofport);
- }
- }
- }
- free(ports);
- refresh_port_groups(p);
- return 0;
-}
-\f
-static struct ofconn *
-ofconn_create(struct ofproto *p, struct rconn *rconn)
-{
- struct ofconn *ofconn = xmalloc(sizeof *ofconn);
- list_push_back(&p->all_conns, &ofconn->node);
- ofconn->rconn = rconn;
- ofconn->pktbuf = NULL;
- ofconn->send_flow_exp = false;
- ofconn->miss_send_len = 0;
- ofconn->packet_in_counter = rconn_packet_counter_create();
- ofconn->reply_counter = rconn_packet_counter_create();
- return ofconn;
-}
-
-static void
-ofconn_destroy(struct ofconn *ofconn, struct ofproto *p)
-{
- if (p->executer) {
- executer_rconn_closing(p->executer, ofconn->rconn);
- }
-
- list_remove(&ofconn->node);
- rconn_destroy(ofconn->rconn);
- rconn_packet_counter_destroy(ofconn->packet_in_counter);
- rconn_packet_counter_destroy(ofconn->reply_counter);
- pktbuf_destroy(ofconn->pktbuf);
- free(ofconn);
-}
-
-static void
-ofconn_run(struct ofconn *ofconn, struct ofproto *p)
-{
- int iteration;
-
- rconn_run(ofconn->rconn);
-
- if (rconn_packet_counter_read(ofconn->reply_counter) < OFCONN_REPLY_MAX) {
- /* Limit the number of iterations to prevent other tasks from
- * starving. */
- for (iteration = 0; iteration < 50; iteration++) {
- struct ofpbuf *of_msg = rconn_recv(ofconn->rconn);
- if (!of_msg) {
- break;
- }
- if (p->fail_open) {
- fail_open_maybe_recover(p->fail_open);
- }
- handle_openflow(ofconn, p, of_msg);
- ofpbuf_delete(of_msg);
- }
- }
-
- if (ofconn != p->controller && !rconn_is_alive(ofconn->rconn)) {
- ofconn_destroy(ofconn, p);
- }
-}
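ofconn_run() applies two limits. Once OFCONN_REPLY_MAX replies are queued, it stops reading requests altogether, and ofconn_wait() correspondingly skips rconn_recv_wait() and bumps the ofproto_ofconn_stuck coverage counter. Otherwise, it handles at most 50 messages per call so that other work is not starved. The following is a minimal sketch of that backpressure rule against a hypothetical connection model. The struct, its fields and the function name are illustrative only.

```c
#include <assert.h>

#define SKETCH_REPLY_MAX 100    /* plays the role of OFCONN_REPLY_MAX */
#define SKETCH_BATCH 50         /* per-call message limit used above */

/* Hypothetical connection model: 'queued_replies' stands in for
 * rconn_packet_counter_read(reply_counter) and 'pending' for the number of
 * requests waiting to be returned by rconn_recv(). */
struct sketch_conn {
    int queued_replies;
    int pending;
};

/* Handles up to SKETCH_BATCH requests, but only while fewer than
 * SKETCH_REPLY_MAX replies are still queued; returns how many it handled. */
static int
sketch_conn_run(struct sketch_conn *conn)
{
    int handled = 0;

    if (conn->queued_replies < SKETCH_REPLY_MAX) {
        while (handled < SKETCH_BATCH && conn->pending > 0) {
            conn->pending--;
            handled++;
        }
    }
    return handled;
}
```

As in ofconn_run(), the reply limit is checked once before the batch begins, not after each message.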
-
-static void
-ofconn_wait(struct ofconn *ofconn)
-{
- rconn_run_wait(ofconn->rconn);
- if (rconn_packet_counter_read(ofconn->reply_counter) < OFCONN_REPLY_MAX) {
- rconn_recv_wait(ofconn->rconn);
- } else {
- COVERAGE_INC(ofproto_ofconn_stuck);
- }
-}
-\f
-/* Caller is responsible for initializing the 'cr' member of the returned
- * rule. */
-static struct rule *
-rule_create(struct ofproto *ofproto, struct rule *super,
- const union ofp_action *actions, size_t n_actions,
- uint16_t idle_timeout, uint16_t hard_timeout)
-{
- struct rule *rule = xcalloc(1, sizeof *rule);
- rule->idle_timeout = idle_timeout;
- rule->hard_timeout = hard_timeout;
- rule->used = rule->created = time_msec();
- rule->super = super;
- if (super) {
- list_push_back(&super->list, &rule->list);
- } else {
- list_init(&rule->list);
- }
- rule->n_actions = n_actions;
- rule->actions = xmemdup(actions, n_actions * sizeof *actions);
- netflow_flow_clear(&rule->nf_flow);
- netflow_flow_update_time(ofproto->netflow, &rule->nf_flow, rule->created);
-
- return rule;
-}
-
-static struct rule *
-rule_from_cls_rule(const struct cls_rule *cls_rule)
-{
- return cls_rule ? CONTAINER_OF(cls_rule, struct rule, cr) : NULL;
-}
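rule_from_cls_rule() recovers the enclosing struct rule from a pointer to its embedded cls_rule. This is how classifier callbacks such as expire_rule() and destroy_rule() get from the classifier's view back to the full rule. The following is a minimal sketch with hypothetical struct names. It assumes CONTAINER_OF is the usual offsetof()-based macro, since its definition is not part of this section.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed CONTAINER_OF-style macro: step back from a pointer to a member by
 * that member's offset to reach the enclosing struct. */
#define SKETCH_CONTAINER_OF(POINTER, STRUCT, MEMBER) \
    ((STRUCT *) ((char *) (POINTER) - offsetof(STRUCT, MEMBER)))

struct sketch_cls_rule {
    unsigned int priority;
};

struct sketch_rule {
    int n_actions;
    struct sketch_cls_rule cr;  /* embedded, deliberately not the first member */
};

/* NULL-tolerant, like rule_from_cls_rule(), so classifier lookups that find
 * nothing can be passed straight through. */
static struct sketch_rule *
sketch_rule_from_cls_rule(const struct sketch_cls_rule *cls_rule)
{
    return cls_rule ? SKETCH_CONTAINER_OF(cls_rule, struct sketch_rule, cr)
                    : NULL;
}
```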
-
-static void
-rule_free(struct rule *rule)
-{
- free(rule->actions);
- free(rule->odp_actions);
- free(rule);
-}
-
-/* Destroys 'rule'. If 'rule' is a subrule, also removes it from its
- * super-rule's list of subrules. If 'rule' is a super-rule, also iterates
- * through all of its subrules and revalidates them, destroying any that no
- * longer have a super-rule (which is probably all of them).
- *
- * Before calling this function, the caller must have removed 'rule' from
- * the classifier. If 'rule' is an exact-match rule, the caller is also
- * responsible for ensuring that it has been uninstalled from the datapath. */
-static void
-rule_destroy(struct ofproto *ofproto, struct rule *rule)
-{
- if (!rule->super) {
- struct rule *subrule, *next;
- LIST_FOR_EACH_SAFE (subrule, next, struct rule, list, &rule->list) {
- revalidate_rule(ofproto, subrule);
- }
- } else {
- list_remove(&rule->list);
- }
- rule_free(rule);
-}
-
-static bool
-rule_has_out_port(const struct rule *rule, uint16_t out_port)
-{
- const union ofp_action *oa;
- struct actions_iterator i;
-
- if (out_port == htons(OFPP_NONE)) {
- return true;
- }
- for (oa = actions_first(&i, rule->actions, rule->n_actions); oa;
- oa = actions_next(&i)) {
- if (oa->type == htons(OFPAT_OUTPUT) && oa->output.port == out_port) {
- return true;
- }
- }
- return false;
-}
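rule_has_out_port() treats OFPP_NONE as a wildcard that matches every rule. For any other port, it returns true only if one of the rule's actions is an output action to that port. In the code above, both 'out_port' and the action's port are in network byte order. The following sketch works in host order and uses hypothetical action types and a hypothetical "no port" value throughout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PORT_NONE 0xffff /* hypothetical "no port" wildcard value */

enum sketch_action_type { SKETCH_OUTPUT, SKETCH_SET_VLAN };

struct sketch_action {
    enum sketch_action_type type;
    uint16_t port;              /* meaningful only for SKETCH_OUTPUT */
};

/* Returns true if 'out_port' is the wildcard or if some output action in
 * 'actions' sends to 'out_port'. */
static bool
sketch_has_out_port(const struct sketch_action *actions, size_t n,
                    uint16_t out_port)
{
    size_t i;

    if (out_port == SKETCH_PORT_NONE) {
        return true;            /* wildcard: every rule matches */
    }
    for (i = 0; i < n; i++) {
        if (actions[i].type == SKETCH_OUTPUT && actions[i].port == out_port) {
            return true;
        }
    }
    return false;
}
```

The type check matters because a non-output action's port field carries no meaning, so it must never count as a match.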
-
-/* Executes the actions indicated by 'rule' on 'packet', which is in flow
- * 'flow' and is considered to have arrived on ODP port 'in_port'.
- *
- * The flow that 'packet' actually contains does not need to match 'rule';
- * the actions in 'rule' will be applied to it either way. Likewise, the
- * packet and byte counters for 'rule' will be credited for the packet sent
- * out whether or not the packet actually matches 'rule'.
- *
- * If 'rule' is an exact-match rule and 'flow' actually equals the rule's flow,
- * the caller must already have accurately composed ODP actions for it given
- * 'packet' using rule_make_actions(). If 'rule' is a wildcard rule, or if
- * 'rule' is an exact-match rule but 'flow' is not the rule's flow, then this
- * function will compose a set of ODP actions based on 'rule''s OpenFlow
- * actions and apply them to 'packet'. */
-static void
-rule_execute(struct ofproto *ofproto, struct rule *rule,
- struct ofpbuf *packet, const flow_t *flow)
-{
- const union odp_action *actions;
- size_t n_actions;
- struct odp_actions a;
-
- /* Grab or compose the ODP actions.
- *
- * The special case for an exact-match 'rule' where 'flow' is not the
- * rule's flow is important to avoid, e.g., sending a packet out its input
- * port simply because the ODP actions were composed for the wrong
- * scenario. */
- if (rule->cr.wc.wildcards || !flow_equal(flow, &rule->cr.flow)) {
- struct rule *super = rule->super ? rule->super : rule;
- if (xlate_actions(super->actions, super->n_actions, flow, ofproto,
- packet, &a, NULL, 0, NULL)) {
- return;
- }
- actions = a.actions;
- n_actions = a.n_actions;
- } else {
- actions = rule->odp_actions;
- n_actions = rule->n_odp_actions;
- }
-
- /* Execute the ODP actions. */
- if (!dpif_execute(&ofproto->dpif, flow->in_port,
- actions, n_actions, packet)) {
- struct odp_flow_stats stats;
- flow_extract_stats(flow, packet, &stats);
- update_stats(ofproto, rule, &stats);
- rule->used = time_msec();
- netflow_flow_update_time(ofproto->netflow, &rule->nf_flow, rule->used);
- }
-}
-
-static void
-rule_insert(struct ofproto *p, struct rule *rule, struct ofpbuf *packet,
- uint16_t in_port)
-{
- struct rule *displaced_rule;
-
- /* Insert the rule in the classifier. */
- displaced_rule = rule_from_cls_rule(classifier_insert(&p->cls, &rule->cr));
- if (!rule->cr.wc.wildcards) {
- rule_make_actions(p, rule, packet);
- }
-
- /* Send the packet and credit it to the rule. */
- if (packet) {
- flow_t flow;
- flow_extract(packet, in_port, &flow);
- rule_execute(p, rule, packet, &flow);
- }
-
- /* Install the rule in the datapath only after sending the packet, to
- * avoid packet reordering. */
- if (rule->cr.wc.wildcards) {
- COVERAGE_INC(ofproto_add_wc_flow);
- p->need_revalidate = true;
- } else {
- rule_install(p, rule, displaced_rule);
- }
-
- /* Free the rule that was displaced, if any. */
- if (displaced_rule) {
- rule_destroy(p, displaced_rule);
- }
-}
-
-static struct rule *
-rule_create_subrule(struct ofproto *ofproto, struct rule *rule,
- const flow_t *flow)
-{
- struct rule *subrule = rule_create(ofproto, rule, NULL, 0,
- rule->idle_timeout, rule->hard_timeout);
- COVERAGE_INC(ofproto_subrule_create);
- cls_rule_from_flow(&subrule->cr, flow, 0,
- (rule->cr.priority <= UINT16_MAX ? UINT16_MAX
- : rule->cr.priority));
- classifier_insert_exact(&ofproto->cls, &subrule->cr);
-
- return subrule;
-}
-
-static void
-rule_remove(struct ofproto *ofproto, struct rule *rule)
-{
- if (rule->cr.wc.wildcards) {
- COVERAGE_INC(ofproto_del_wc_flow);
- ofproto->need_revalidate = true;
- } else {
- rule_uninstall(ofproto, rule);
- }
- classifier_remove(&ofproto->cls, &rule->cr);
- rule_destroy(ofproto, rule);
-}
-
-/* Returns true if the actions changed, false otherwise. */
-static bool
-rule_make_actions(struct ofproto *p, struct rule *rule,
- const struct ofpbuf *packet)
-{
- const struct rule *super;
- struct odp_actions a;
- size_t actions_len;
-
- assert(!rule->cr.wc.wildcards);
-
- super = rule->super ? rule->super : rule;
- rule->tags = 0;
- xlate_actions(super->actions, super->n_actions, &rule->cr.flow, p,
- packet, &a, &rule->tags, &rule->may_install,
- &rule->nf_flow.output_iface);
-
- actions_len = a.n_actions * sizeof *a.actions;
- if (rule->n_odp_actions != a.n_actions
- || memcmp(rule->odp_actions, a.actions, actions_len)) {
- COVERAGE_INC(ofproto_odp_unchanged);
- free(rule->odp_actions);
- rule->n_odp_actions = a.n_actions;
- rule->odp_actions = xmemdup(a.actions, actions_len);
- return true;
- } else {
- return false;
- }
-}
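rule_make_actions() recomposes the ODP actions and replaces the cached copy only when the action count or bytes differ, returning whether anything changed. A minimal sketch of that compare-and-replace step follows. It assumes plain ints in place of union odp_action and uses malloc()/memcpy() instead of xmemdup(); the `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cache of composed actions; ints stand in for
 * union odp_action. */
struct sketch_action_cache {
    int *actions;
    size_t n_actions;
};

/* Replaces the cached actions with 'fresh' only if they differ in count or
 * content.  Returns true if the cache changed, false otherwise. */
static bool
sketch_update_actions(struct sketch_action_cache *c,
                      const int *fresh, size_t n_fresh)
{
    size_t len = n_fresh * sizeof *fresh;
    int *copy = NULL;

    /* Skip memcmp() when len is 0, since either pointer may be null. */
    if (c->n_actions == n_fresh
        && (len == 0 || !memcmp(c->actions, fresh, len))) {
        return false;
    }

    if (len) {
        copy = malloc(len);
        assert(copy != NULL);
        memcpy(copy, fresh, len);
    }
    free(c->actions);
    c->actions = copy;
    c->n_actions = n_fresh;
    return true;
}
```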
-
-static int
-do_put_flow(struct ofproto *ofproto, struct rule *rule, int flags,
- struct odp_flow_put *put)
-{
- memset(&put->flow.stats, 0, sizeof put->flow.stats);
- put->flow.key = rule->cr.flow;
- put->flow.actions = rule->odp_actions;
- put->flow.n_actions = rule->n_odp_actions;
- put->flags = flags;
- return dpif_flow_put(&ofproto->dpif, put);
-}
-
-static void
-rule_install(struct ofproto *p, struct rule *rule, struct rule *displaced_rule)
-{
- assert(!rule->cr.wc.wildcards);
-
- if (rule->may_install) {
- struct odp_flow_put put;
- if (!do_put_flow(p, rule,
- ODPPF_CREATE | ODPPF_MODIFY | ODPPF_ZERO_STATS,
- &put)) {
- rule->installed = true;
- if (displaced_rule) {
- update_stats(p, rule, &put.flow.stats);
- rule_post_uninstall(p, displaced_rule);
- }
- }
- } else if (displaced_rule) {
- rule_uninstall(p, displaced_rule);
- }
-}
-
-static void
-rule_reinstall(struct ofproto *ofproto, struct rule *rule)
-{
- if (rule->installed) {
- struct odp_flow_put put;
- COVERAGE_INC(ofproto_dp_missed);
- do_put_flow(ofproto, rule, ODPPF_CREATE | ODPPF_MODIFY, &put);
- } else {
- rule_install(ofproto, rule, NULL);
- }
-}
-
-static void
-rule_update_actions(struct ofproto *ofproto, struct rule *rule)
-{
- bool actions_changed = rule_make_actions(ofproto, rule, NULL);
- if (rule->may_install) {
- if (rule->installed) {
- if (actions_changed) {
- /* XXX should really do rule_post_uninstall() for the *old* set
- * of actions, and distinguish the old stats from the new. */
- struct odp_flow_put put;
- do_put_flow(ofproto, rule, ODPPF_CREATE | ODPPF_MODIFY, &put);
- }
- } else {
- rule_install(ofproto, rule, NULL);
- }
- } else {
- rule_uninstall(ofproto, rule);
- }
-}
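rule_update_actions() picks one of four datapath outcomes from three flags: may_install, installed, and whether the recomposed actions changed. The sketch below captures that decision as a pure function. The enum and function names are hypothetical labels for the calls made in the real code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for what rule_update_actions() ends up doing. */
enum sketch_dp_op {
    SKETCH_DP_NOTHING,    /* Installed and unchanged: leave it alone. */
    SKETCH_DP_MODIFY,     /* Installed, actions changed: do_put_flow(). */
    SKETCH_DP_INSTALL,    /* Installable, not installed: rule_install(). */
    SKETCH_DP_UNINSTALL   /* Not installable: rule_uninstall(). */
};

static enum sketch_dp_op
sketch_update_decision(bool may_install, bool installed, bool actions_changed)
{
    if (!may_install) {
        /* rule_uninstall() is itself a no-op for an uninstalled rule. */
        return SKETCH_DP_UNINSTALL;
    } else if (!installed) {
        return SKETCH_DP_INSTALL;
    } else {
        return actions_changed ? SKETCH_DP_MODIFY : SKETCH_DP_NOTHING;
    }
}
```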
-
-static void
-rule_account(struct ofproto *ofproto, struct rule *rule, uint64_t extra_bytes)
-{
- uint64_t total_bytes = rule->byte_count + extra_bytes;
-
- if (ofproto->ofhooks->account_flow_cb
- && total_bytes > rule->accounted_bytes)
- {
- ofproto->ofhooks->account_flow_cb(
- &rule->cr.flow, rule->odp_actions, rule->n_odp_actions,
- total_bytes - rule->accounted_bytes, ofproto->aux);
- rule->accounted_bytes = total_bytes;
- }
-}
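rule_account() reports only the bytes that have not already been reported, using accounted_bytes as a high-water mark. The sketch below shows that incremental scheme. Instead of invoking account_flow_cb, it returns the delta, and the `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-rule counters mirroring byte_count and accounted_bytes. */
struct sketch_account {
    uint64_t byte_count;        /* Bytes already credited to the rule. */
    uint64_t accounted_bytes;   /* Bytes already reported to the callback. */
};

/* Returns the number of bytes not yet reported (0 if none) and advances the
 * high-water mark, as rule_account() does before calling its callback. */
static uint64_t
sketch_account(struct sketch_account *a, uint64_t extra_bytes)
{
    uint64_t total = a->byte_count + extra_bytes;
    uint64_t delta = 0;

    if (total > a->accounted_bytes) {
        delta = total - a->accounted_bytes;
        a->accounted_bytes = total;
    }
    return delta;
}
```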
-
-static void
-rule_uninstall(struct ofproto *p, struct rule *rule)
-{
- assert(!rule->cr.wc.wildcards);
- if (rule->installed) {
- struct odp_flow odp_flow;
-
- odp_flow.key = rule->cr.flow;
- odp_flow.actions = NULL;
- odp_flow.n_actions = 0;
- if (!dpif_flow_del(&p->dpif, &odp_flow)) {
- update_stats(p, rule, &odp_flow.stats);
- }
- rule->installed = false;
-
- rule_post_uninstall(p, rule);
- }
-}
-
-static bool
-is_controller_rule(struct rule *rule)
-{
- /* If the only action is send to the controller then don't report
- * NetFlow expiration messages since it is just part of the control
- * logic for the network and not real traffic. */
-
- if (rule && rule->super) {
- struct rule *super = rule->super;
-
- return super->n_actions == 1 &&
- super->actions[0].type == htons(OFPAT_OUTPUT) &&
- super->actions[0].output.port == htons(OFPP_CONTROLLER);
- }
-
- return false;
-}
-
-static void
-rule_post_uninstall(struct ofproto *ofproto, struct rule *rule)
-{
- struct rule *super = rule->super;
-
- rule_account(ofproto, rule, 0);
-
- if (ofproto->netflow && !is_controller_rule(rule)) {
- struct ofexpired expired;
- expired.flow = rule->cr.flow;
- expired.packet_count = rule->packet_count;
- expired.byte_count = rule->byte_count;
- expired.used = rule->used;
- netflow_expire(ofproto->netflow, &rule->nf_flow, &expired);
- }
- if (super) {
- super->packet_count += rule->packet_count;
- super->byte_count += rule->byte_count;
-
- /* Reset counters to prevent double counting if the rule ever gets
- * reinstalled. */
- rule->packet_count = 0;
- rule->byte_count = 0;
- rule->accounted_bytes = 0;
-
- netflow_flow_clear(&rule->nf_flow);
- }
-}
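When a subrule is uninstalled, rule_post_uninstall() folds its counters into the super-rule and then zeroes them, so a later reinstall cannot count the same traffic twice. The following sketch isolates that step. The struct and function names are hypothetical, and the NetFlow expiry done by the real function is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical miniature of the counters in struct rule. */
struct sketch_counters {
    uint64_t packet_count;
    uint64_t byte_count;
    uint64_t accounted_bytes;
};

/* Moves a subrule's counters into its super-rule and resets the subrule. */
static void
sketch_fold_into_super(struct sketch_counters *super,
                       struct sketch_counters *sub)
{
    super->packet_count += sub->packet_count;
    super->byte_count += sub->byte_count;

    /* Reset so that folding again (e.g. after a reinstall) adds nothing. */
    sub->packet_count = 0;
    sub->byte_count = 0;
    sub->accounted_bytes = 0;
}
```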
-\f
-static void
-queue_tx(struct ofpbuf *msg, const struct ofconn *ofconn,
- struct rconn_packet_counter *counter)
-{
- update_openflow_length(msg);
- if (rconn_send(ofconn->rconn, msg, counter)) {
- ofpbuf_delete(msg);
- }
-}
-
-static void
-send_error(const struct ofconn *ofconn, const struct ofp_header *oh,
- int error, const void *data, size_t len)
-{
- struct ofpbuf *buf;
- struct ofp_error_msg *oem;
-
- if (!(error >> 16)) {
- VLOG_WARN_RL(&rl, "not sending bad error code %d to controller",
- error);
- return;
- }
-
- COVERAGE_INC(ofproto_error);
- oem = make_openflow_xid(len + sizeof *oem, OFPT_ERROR,
- oh ? oh->xid : 0, &buf);
- oem->type = htons((unsigned int) error >> 16);
- oem->code = htons(error & 0xffff);
- memcpy(oem->data, data, len);
- queue_tx(buf, ofconn, ofconn->reply_counter);
-}
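send_error() treats 'error' as an OpenFlow error type in the upper 16 bits and an error code in the lower 16 bits. It refuses any value whose type half is zero, such as a plain errno value. The sketch below shows the packing and unpacking. sketch_mkerr() is a hypothetical helper written for illustration, not the real ofp_mkerr().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical packer: type in the high 16 bits, code in the low 16 bits. */
static int
sketch_mkerr(uint16_t type, uint16_t code)
{
    assert(type > 0 && type <= 0x7fff);   /* Keep the result positive. */
    return (int) (((uint32_t) type << 16) | code);
}

/* Splits 'error' the way send_error() does.  Returns false for values with
 * a zero type half, which send_error() refuses to send. */
static bool
sketch_split_error(int error, uint16_t *type, uint16_t *code)
{
    if (!(error >> 16)) {
        return false;
    }
    *type = (uint16_t) ((unsigned int) error >> 16);
    *code = (uint16_t) (error & 0xffff);
    return true;
}
```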
-
-static void
-send_error_oh(const struct ofconn *ofconn, const struct ofp_header *oh,
- int error)
-{
- size_t oh_length = ntohs(oh->length);
- send_error(ofconn, oh, error, oh, MIN(oh_length, 64));
-}
-
-static void
-hton_ofp_phy_port(struct ofp_phy_port *opp)
-{
- opp->port_no = htons(opp->port_no);
- opp->config = htonl(opp->config);
- opp->state = htonl(opp->state);
- opp->curr = htonl(opp->curr);
- opp->advertised = htonl(opp->advertised);
- opp->supported = htonl(opp->supported);
- opp->peer = htonl(opp->peer);
-}
-
-static int
-handle_echo_request(struct ofconn *ofconn, struct ofp_header *oh)
-{
- struct ofp_header *rq = oh;
- queue_tx(make_echo_reply(rq), ofconn, ofconn->reply_counter);
- return 0;
-}
-
-static int
-handle_features_request(struct ofproto *p, struct ofconn *ofconn,
- struct ofp_header *oh)
-{
- struct ofp_switch_features *osf;
- struct ofpbuf *buf;
- unsigned int port_no;
- struct ofport *port;
-
- osf = make_openflow_xid(sizeof *osf, OFPT_FEATURES_REPLY, oh->xid, &buf);
- osf->datapath_id = htonll(p->datapath_id);
- osf->n_buffers = htonl(pktbuf_capacity());
- osf->n_tables = 2;
- osf->capabilities = htonl(OFPC_FLOW_STATS | OFPC_TABLE_STATS |
- OFPC_PORT_STATS | OFPC_MULTI_PHY_TX);
- osf->actions = htonl((1u << OFPAT_OUTPUT) |
- (1u << OFPAT_SET_VLAN_VID) |
- (1u << OFPAT_SET_VLAN_PCP) |
- (1u << OFPAT_STRIP_VLAN) |
- (1u << OFPAT_SET_DL_SRC) |
- (1u << OFPAT_SET_DL_DST) |
- (1u << OFPAT_SET_NW_SRC) |
- (1u << OFPAT_SET_NW_DST) |
- (1u << OFPAT_SET_TP_SRC) |
- (1u << OFPAT_SET_TP_DST));
-
- PORT_ARRAY_FOR_EACH (port, &p->ports, port_no) {
- hton_ofp_phy_port(ofpbuf_put(buf, &port->opp, sizeof port->opp));
- }
-
- queue_tx(buf, ofconn, ofconn->reply_counter);
- return 0;
-}
-
-static int
-handle_get_config_request(struct ofproto *p, struct ofconn *ofconn,
- struct ofp_header *oh)
-{
- struct ofpbuf *buf;
- struct ofp_switch_config *osc;
- uint16_t flags;
- bool drop_frags;
-
- /* Figure out flags. */
- dpif_get_drop_frags(&p->dpif, &drop_frags);
- flags = drop_frags ? OFPC_FRAG_DROP : OFPC_FRAG_NORMAL;
- if (ofconn->send_flow_exp) {
- flags |= OFPC_SEND_FLOW_EXP;
- }
-
- /* Send reply. */
- osc = make_openflow_xid(sizeof *osc, OFPT_GET_CONFIG_REPLY, oh->xid, &buf);
- osc->flags = htons(flags);
- osc->miss_send_len = htons(ofconn->miss_send_len);
- queue_tx(buf, ofconn, ofconn->reply_counter);
-
- return 0;
-}
-
-static int
-handle_set_config(struct ofproto *p, struct ofconn *ofconn,
- struct ofp_switch_config *osc)
-{
- uint16_t flags;
- int error;
-
- error = check_ofp_message(&osc->header, OFPT_SET_CONFIG, sizeof *osc);
- if (error) {
- return error;
- }
- flags = ntohs(osc->flags);
-
- ofconn->send_flow_exp = (flags & OFPC_SEND_FLOW_EXP) != 0;
-
- if (ofconn == p->controller) {
- switch (flags & OFPC_FRAG_MASK) {
- case OFPC_FRAG_NORMAL:
- dpif_set_drop_frags(&p->dpif, false);
- break;
- case OFPC_FRAG_DROP:
- dpif_set_drop_frags(&p->dpif, true);
- break;
- default:
- VLOG_WARN_RL(&rl, "requested bad fragment mode (flags=%"PRIx16")",
- flags);
- break;
- }
- }
-
- if ((ntohs(osc->miss_send_len) != 0) != (ofconn->miss_send_len != 0)) {
- if (ntohs(osc->miss_send_len) != 0) {
- ofconn->pktbuf = pktbuf_create();
- } else {
- pktbuf_destroy(ofconn->pktbuf);
- ofconn->pktbuf = NULL;
- }
- }
-
- ofconn->miss_send_len = ntohs(osc->miss_send_len);
-
- return 0;
-}
-
-static void
-add_output_group_action(struct odp_actions *actions, uint16_t group,
- uint16_t *nf_output_iface)
-{
- odp_actions_add(actions, ODPAT_OUTPUT_GROUP)->output_group.group = group;
-
- if (group == DP_GROUP_ALL || group == DP_GROUP_FLOOD) {
- *nf_output_iface = NF_OUT_FLOOD;
- }
-}
-
-static void
-add_controller_action(struct odp_actions *actions,
- const struct ofp_action_output *oao)
-{
- union odp_action *a = odp_actions_add(actions, ODPAT_CONTROLLER);
- a->controller.arg = oao->max_len ? ntohs(oao->max_len) : UINT32_MAX;
-}
-
-struct action_xlate_ctx {
- /* Input. */
- const flow_t *flow; /* Flow to which these actions correspond. */
- int recurse; /* Recursion level, via xlate_table_action. */
- struct ofproto *ofproto;
- const struct ofpbuf *packet; /* The packet corresponding to 'flow', or a
- * null pointer if we are revalidating
- * without a packet to refer to. */
-
- /* Output. */
- struct odp_actions *out; /* Datapath actions. */
- tag_type *tags; /* Tags associated with OFPP_NORMAL actions. */
- bool may_set_up_flow; /* True ordinarily; false if the actions must
- * be reassessed for every packet. */
- uint16_t nf_output_iface; /* Output interface index for NetFlow. */
-};
-
-static void do_xlate_actions(const union ofp_action *in, size_t n_in,
- struct action_xlate_ctx *ctx);
-
-static void
-add_output_action(struct action_xlate_ctx *ctx, uint16_t port)
-{
- const struct ofport *ofport = port_array_get(&ctx->ofproto->ports, port);
-
- if (ofport) {
- if (ofport->opp.config & OFPPC_NO_FWD) {
- /* Forwarding disabled on port. */
- return;
- }
- } else {
- /*
- * We don't have an ofport record for this port, but it doesn't hurt to
- * allow forwarding to it anyhow. Maybe such a port will appear later
- * and we're pre-populating the flow table.
- */
- }
-
- odp_actions_add(ctx->out, ODPAT_OUTPUT)->output.port = port;
- ctx->nf_output_iface = port;
-}
-
-static struct rule *
-lookup_valid_rule(struct ofproto *ofproto, const flow_t *flow)
-{
- struct rule *rule;
- rule = rule_from_cls_rule(classifier_lookup(&ofproto->cls, flow));
-
- /* The rule we found might not be valid, since we could be in need of
- * revalidation. If it is not valid, don't return it. */
- if (rule
- && rule->super
- && ofproto->need_revalidate
- && !revalidate_rule(ofproto, rule)) {
- COVERAGE_INC(ofproto_invalidated);
- return NULL;
- }
-
- return rule;
-}
-
-static void
-xlate_table_action(struct action_xlate_ctx *ctx, uint16_t in_port)
-{
- if (!ctx->recurse) {
- struct rule *rule;
- flow_t flow;
-
- flow = *ctx->flow;
- flow.in_port = in_port;
-
- rule = lookup_valid_rule(ctx->ofproto, &flow);
- if (rule) {
- if (rule->super) {
- rule = rule->super;
- }
-
- ctx->recurse++;
- do_xlate_actions(rule->actions, rule->n_actions, ctx);
- ctx->recurse--;
- }
- }
-}
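xlate_table_action() only performs a lookup when ctx->recurse is zero. As a result, a resubmit from inside a resubmitted rule is silently ignored, which rules out unbounded recursion. The sketch below models that guard. The struct and function are hypothetical, and 'chain' stands in for a rule that keeps resubmitting.

```c
#include <assert.h>

/* Hypothetical context: 'recurse' plays the role of
 * action_xlate_ctx.recurse; 'n_lookups' counts table lookups. */
struct sketch_xlate_ctx {
    int recurse;
    int n_lookups;
};

/* Models a resubmit whose target rule resubmits 'chain' more times.  Only
 * the outermost call (recursion level 0) performs a lookup. */
static void
sketch_resubmit(struct sketch_xlate_ctx *ctx, int chain)
{
    if (!ctx->recurse) {
        ctx->n_lookups++;
        ctx->recurse++;
        if (chain > 0) {
            sketch_resubmit(ctx, chain - 1);   /* Ignored: recurse != 0. */
        }
        ctx->recurse--;
    }
}
```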
-
-static void
-xlate_output_action(struct action_xlate_ctx *ctx,
- const struct ofp_action_output *oao)
-{
- uint16_t odp_port;
- uint16_t prev_nf_output_iface = ctx->nf_output_iface;
-
- ctx->nf_output_iface = NF_OUT_DROP;
-
- switch (ntohs(oao->port)) {
- case OFPP_IN_PORT:
- add_output_action(ctx, ctx->flow->in_port);
- break;
- case OFPP_TABLE:
- xlate_table_action(ctx, ctx->flow->in_port);
- break;
- case OFPP_NORMAL:
- if (!ctx->ofproto->ofhooks->normal_cb(ctx->flow, ctx->packet,
- ctx->out, ctx->tags,
- &ctx->nf_output_iface,
- ctx->ofproto->aux)) {
- COVERAGE_INC(ofproto_uninstallable);
- ctx->may_set_up_flow = false;
- }
- break;
- case OFPP_FLOOD:
- add_output_group_action(ctx->out, DP_GROUP_FLOOD,
- &ctx->nf_output_iface);
- break;
- case OFPP_ALL:
- add_output_group_action(ctx->out, DP_GROUP_ALL, &ctx->nf_output_iface);
- break;
- case OFPP_CONTROLLER:
- add_controller_action(ctx->out, oao);
- break;
- case OFPP_LOCAL:
- add_output_action(ctx, ODPP_LOCAL);
- break;
- default:
- odp_port = ofp_port_to_odp_port(ntohs(oao->port));
- if (odp_port != ctx->flow->in_port) {
- add_output_action(ctx, odp_port);
- }
- break;
- }
-
- if (prev_nf_output_iface == NF_OUT_FLOOD) {
- ctx->nf_output_iface = NF_OUT_FLOOD;
- } else if (ctx->nf_output_iface == NF_OUT_DROP) {
- ctx->nf_output_iface = prev_nf_output_iface;
- } else if (prev_nf_output_iface != NF_OUT_DROP &&
- ctx->nf_output_iface != NF_OUT_FLOOD) {
- ctx->nf_output_iface = NF_OUT_MULTI;
- }
-}
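The tail of xlate_output_action() merges the NetFlow output interface recorded before the action with the one the action produced. The rules are: flooding dominates, an action that outputs nothing leaves the previous value, and a second real port yields "multiple". Below is a minimal sketch of that merge as a pure function. The sentinel values are assumptions labelled as hypothetical, since the real NF_OUT_* constants are defined elsewhere.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sentinel values; the real NF_OUT_* constants may differ. */
#define SK_NF_OUT_FLOOD UINT16_MAX
#define SK_NF_OUT_MULTI (UINT16_MAX - 1)
#define SK_NF_OUT_DROP  (UINT16_MAX - 2)

/* Combines 'prev' (before the action) with 'cur' (set by the action),
 * following the same branch order as xlate_output_action(). */
static uint16_t
sketch_nf_merge(uint16_t prev, uint16_t cur)
{
    if (prev == SK_NF_OUT_FLOOD) {
        return SK_NF_OUT_FLOOD;         /* Flooding dominates. */
    } else if (cur == SK_NF_OUT_DROP) {
        return prev;                    /* Action produced no output. */
    } else if (prev != SK_NF_OUT_DROP && cur != SK_NF_OUT_FLOOD) {
        return SK_NF_OUT_MULTI;         /* A second real output port. */
    } else {
        return cur;                     /* First output, or a flood. */
    }
}
```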
-
-static void
-xlate_nicira_action(struct action_xlate_ctx *ctx,
- const struct nx_action_header *nah)
-{
- const struct nx_action_resubmit *nar;
- int subtype = ntohs(nah->subtype);
-
- assert(nah->vendor == htonl(NX_VENDOR_ID));
- switch (subtype) {
- case NXAST_RESUBMIT:
- nar = (const struct nx_action_resubmit *) nah;
- xlate_table_action(ctx, ofp_port_to_odp_port(ntohs(nar->in_port)));
- break;
-
- default:
- VLOG_DBG_RL(&rl, "unknown Nicira action type %"PRIu16, subtype);
- break;
- }
-}
-
-static void
-do_xlate_actions(const union ofp_action *in, size_t n_in,
- struct action_xlate_ctx *ctx)
-{
- struct actions_iterator iter;
- const union ofp_action *ia;
- const struct ofport *port;
-
- port = port_array_get(&ctx->ofproto->ports, ctx->flow->in_port);
- if (port && port->opp.config & (OFPPC_NO_RECV | OFPPC_NO_RECV_STP) &&
- port->opp.config & (eth_addr_equals(ctx->flow->dl_dst, stp_eth_addr)
- ? OFPPC_NO_RECV_STP : OFPPC_NO_RECV)) {
- /* Drop this flow. */
- return;
- }
-
- for (ia = actions_first(&iter, in, n_in); ia; ia = actions_next(&iter)) {
- uint16_t type = ntohs(ia->type);
- union odp_action *oa;
-
- switch (type) {
- case OFPAT_OUTPUT:
- xlate_output_action(ctx, &ia->output);
- break;
-
- case OFPAT_SET_VLAN_VID:
- oa = odp_actions_add(ctx->out, ODPAT_SET_VLAN_VID);
- oa->vlan_vid.vlan_vid = ia->vlan_vid.vlan_vid;
- break;
-
- case OFPAT_SET_VLAN_PCP:
- oa = odp_actions_add(ctx->out, ODPAT_SET_VLAN_PCP);
- oa->vlan_pcp.vlan_pcp = ia->vlan_pcp.vlan_pcp;
- break;
-
- case OFPAT_STRIP_VLAN:
- odp_actions_add(ctx->out, ODPAT_STRIP_VLAN);
- break;
-
- case OFPAT_SET_DL_SRC:
- oa = odp_actions_add(ctx->out, ODPAT_SET_DL_SRC);
- memcpy(oa->dl_addr.dl_addr,
- ((struct ofp_action_dl_addr *) ia)->dl_addr, ETH_ADDR_LEN);
- break;
-
- case OFPAT_SET_DL_DST:
- oa = odp_actions_add(ctx->out, ODPAT_SET_DL_DST);
- memcpy(oa->dl_addr.dl_addr,
- ((struct ofp_action_dl_addr *) ia)->dl_addr, ETH_ADDR_LEN);
- break;
-
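- case OFPAT_SET_NW_SRC:
- oa = odp_actions_add(ctx->out, ODPAT_SET_NW_SRC);
- oa->nw_addr.nw_addr = ia->nw_addr.nw_addr;
- break;
-
- case OFPAT_SET_NW_DST:
- oa = odp_actions_add(ctx->out, ODPAT_SET_NW_DST);
- oa->nw_addr.nw_addr = ia->nw_addr.nw_addr;
- break;
-
- case OFPAT_SET_TP_SRC:
- oa = odp_actions_add(ctx->out, ODPAT_SET_TP_SRC);
- oa->tp_port.tp_port = ia->tp_port.tp_port;
- break;
-
- case OFPAT_SET_TP_DST:
- oa = odp_actions_add(ctx->out, ODPAT_SET_TP_DST);
- oa->tp_port.tp_port = ia->tp_port.tp_port;
- break;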
-
- case OFPAT_VENDOR:
- xlate_nicira_action(ctx, (const struct nx_action_header *) ia);
- break;
-
- default:
- VLOG_DBG_RL(&rl, "unknown action type %"PRIu16, type);
- break;
- }
- }
-}
-
-static int
-xlate_actions(const union ofp_action *in, size_t n_in,
- const flow_t *flow, struct ofproto *ofproto,
- const struct ofpbuf *packet,
- struct odp_actions *out, tag_type *tags, bool *may_set_up_flow,
- uint16_t *nf_output_iface)
-{
- tag_type no_tags = 0;
- struct action_xlate_ctx ctx;
- COVERAGE_INC(ofproto_ofp2odp);
- odp_actions_init(out);
- ctx.flow = flow;
- ctx.recurse = 0;
- ctx.ofproto = ofproto;
- ctx.packet = packet;
- ctx.out = out;
- ctx.tags = tags ? tags : &no_tags;
- ctx.may_set_up_flow = true;
- ctx.nf_output_iface = NF_OUT_DROP;
- do_xlate_actions(in, n_in, &ctx);
-
- /* Check with in-band control to see if we're allowed to set up this
- * flow. */
- if (!in_band_rule_check(ofproto->in_band, flow, out)) {
- ctx.may_set_up_flow = false;
- }
-
- if (may_set_up_flow) {
- *may_set_up_flow = ctx.may_set_up_flow;
- }
- if (nf_output_iface) {
- *nf_output_iface = ctx.nf_output_iface;
- }
- if (odp_actions_overflow(out)) {
- odp_actions_init(out);
- return ofp_mkerr(OFPET_BAD_ACTION, OFPBAC_TOO_MANY);
- }
- return 0;
-}
-
-static int
-handle_packet_out(struct ofproto *p, struct ofconn *ofconn,
- struct ofp_header *oh)
-{
- struct ofp_packet_out *opo;
- struct ofpbuf payload, *buffer;
- struct odp_actions actions;
- int n_actions;
- uint16_t in_port;
- flow_t flow;
- int error;
-
- error = check_ofp_packet_out(oh, &payload, &n_actions, p->max_ports);
- if (error) {
- return error;
- }
- opo = (struct ofp_packet_out *) oh;
-
- COVERAGE_INC(ofproto_packet_out);
- if (opo->buffer_id != htonl(UINT32_MAX)) {
- error = pktbuf_retrieve(ofconn->pktbuf, ntohl(opo->buffer_id),
- &buffer, &in_port);
- if (error || !buffer) {
- return error;
- }
- payload = *buffer;
- } else {
- buffer = NULL;
- }
-
- flow_extract(&payload, ofp_port_to_odp_port(ntohs(opo->in_port)), &flow);
- error = xlate_actions((const union ofp_action *) opo->actions, n_actions,
- &flow, p, &payload, &actions, NULL, NULL, NULL);
- if (error) {
- ofpbuf_delete(buffer);
- return error;
- }
-
- dpif_execute(&p->dpif, flow.in_port, actions.actions, actions.n_actions,
- &payload);
- ofpbuf_delete(buffer);
-
- return 0;
-}
-
-static void
-update_port_config(struct ofproto *p, struct ofport *port,
- uint32_t config, uint32_t mask)
-{
- mask &= config ^ port->opp.config;
- if (mask & OFPPC_PORT_DOWN) {
- if (config & OFPPC_PORT_DOWN) {
- netdev_turn_flags_off(port->netdev, NETDEV_UP, true);
- } else {
- netdev_turn_flags_on(port->netdev, NETDEV_UP, true);
- }
- }
-#define REVALIDATE_BITS (OFPPC_NO_RECV | OFPPC_NO_RECV_STP | OFPPC_NO_FWD)
- if (mask & REVALIDATE_BITS) {
- COVERAGE_INC(ofproto_costly_flags);
- port->opp.config ^= mask & REVALIDATE_BITS;
- p->need_revalidate = true;
- }
-#undef REVALIDATE_BITS
- if (mask & OFPPC_NO_FLOOD) {
- port->opp.config ^= OFPPC_NO_FLOOD;
- refresh_port_group(p, DP_GROUP_FLOOD);
- }
- if (mask & OFPPC_NO_PACKET_IN) {
- port->opp.config ^= OFPPC_NO_PACKET_IN;
- }
-}
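update_port_config() starts with `mask &= config ^ port->opp.config`. After that step, 'mask' holds exactly the requested bits whose value differs from the current configuration, so XORing those bits in applies the change. The sketch below shows that bit arithmetic on its own. In the real function, some bits also trigger side effects (netdev flags, revalidation, flood-group refresh) rather than a single XOR, and the function name here is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Applies 'config' to 'current' only for the bits selected by 'mask'.
 * Equivalent to (current & ~mask) | (config & mask). */
static uint32_t
sketch_apply_port_config(uint32_t current, uint32_t config, uint32_t mask)
{
    mask &= config ^ current;   /* Keep only selected bits that differ. */
    return current ^ mask;      /* Flip exactly those bits. */
}
```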
-
-static int
-handle_port_mod(struct ofproto *p, struct ofp_header *oh)
-{
- const struct ofp_port_mod *opm;
- struct ofport *port;
- int error;
-
- error = check_ofp_message(oh, OFPT_PORT_MOD, sizeof *opm);
- if (error) {
- return error;
- }
- opm = (struct ofp_port_mod *) oh;
-
- port = port_array_get(&p->ports,
- ofp_port_to_odp_port(ntohs(opm->port_no)));
- if (!port) {
- return ofp_mkerr(OFPET_PORT_MOD_FAILED, OFPPMFC_BAD_PORT);
- } else if (memcmp(port->opp.hw_addr, opm->hw_addr, OFP_ETH_ALEN)) {
- return ofp_mkerr(OFPET_PORT_MOD_FAILED, OFPPMFC_BAD_HW_ADDR);
- } else {
- update_port_config(p, port, ntohl(opm->config), ntohl(opm->mask));
- if (opm->advertise) {
- netdev_set_advertisements(port->netdev, ntohl(opm->advertise));
- }
- }
- return 0;
-}
-
-static struct ofpbuf *
-make_stats_reply(uint32_t xid, uint16_t type, size_t body_len)
-{
- struct ofp_stats_reply *osr;
- struct ofpbuf *msg;
-
- msg = ofpbuf_new(MIN(sizeof *osr + body_len, UINT16_MAX));
- osr = put_openflow_xid(sizeof *osr, OFPT_STATS_REPLY, xid, msg);
- osr->type = type;
- osr->flags = htons(0);
- return msg;
-}
-
-static struct ofpbuf *
-start_stats_reply(const struct ofp_stats_request *request, size_t body_len)
-{
- return make_stats_reply(request->header.xid, request->type, body_len);
-}
-
-static void *
-append_stats_reply(size_t nbytes, struct ofconn *ofconn, struct ofpbuf **msgp)
-{
- struct ofpbuf *msg = *msgp;
- assert(nbytes <= UINT16_MAX - sizeof(struct ofp_stats_reply));
- if (nbytes + msg->size > UINT16_MAX) {
- struct ofp_stats_reply *reply = msg->data;
- reply->flags = htons(OFPSF_REPLY_MORE);
- *msgp = make_stats_reply(reply->header.xid, reply->type, nbytes);
- queue_tx(msg, ofconn, ofconn->reply_counter);
- }
- return ofpbuf_put_uninit(*msgp, nbytes);
-}
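append_stats_reply() starts a new reply whenever the next record would push the current message past UINT16_MAX bytes. Before the full message is queued, it is flagged OFPSF_REPLY_MORE. The sketch below counts how many messages a sequence of records would need under that rule. It assumes a fixed per-message header size passed as 'hdr' (the size of struct ofp_stats_reply in the real code), and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the stats-reply messages needed for records of the given sizes,
 * when each message begins with a 'hdr'-byte header and may not exceed
 * 'max' bytes.  Every message but the last would carry OFPSF_REPLY_MORE. */
static size_t
sketch_count_stats_msgs(const size_t *sizes, size_t n, size_t hdr, size_t max)
{
    size_t msgs = 1;
    size_t cur = hdr;
    size_t i;

    for (i = 0; i < n; i++) {
        assert(sizes[i] <= max - hdr);   /* Each record fits in a message. */
        if (cur + sizes[i] > max) {
            msgs++;                      /* Start a fresh message. */
            cur = hdr;
        }
        cur += sizes[i];
    }
    return msgs;
}
```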
-
-static int
-handle_desc_stats_request(struct ofproto *p, struct ofconn *ofconn,
- struct ofp_stats_request *request)
-{
- struct ofp_desc_stats *ods;
- struct ofpbuf *msg;
-
- msg = start_stats_reply(request, sizeof *ods);
- ods = append_stats_reply(sizeof *ods, ofconn, &msg);
- strncpy(ods->mfr_desc, p->manufacturer, sizeof ods->mfr_desc);
- strncpy(ods->hw_desc, p->hardware, sizeof ods->hw_desc);
- strncpy(ods->sw_desc, p->software, sizeof ods->sw_desc);
- strncpy(ods->serial_num, p->serial, sizeof ods->serial_num);
- queue_tx(msg, ofconn, ofconn->reply_counter);
-
- return 0;
-}
-
-static void
-count_subrules(struct cls_rule *cls_rule, void *n_subrules_)
-{
- struct rule *rule = rule_from_cls_rule(cls_rule);
- int *n_subrules = n_subrules_;
-
- if (rule->super) {
- (*n_subrules)++;
- }
-}
-
-static int
-handle_table_stats_request(struct ofproto *p, struct ofconn *ofconn,
- struct ofp_stats_request *request)
-{
- struct ofp_table_stats *ots;
- struct ofpbuf *msg;
- struct odp_stats dpstats;
- int n_exact, n_subrules, n_wild;
-
- msg = start_stats_reply(request, sizeof *ots * 2);
-
- /* Count rules of various kinds. */
- n_subrules = 0;
- classifier_for_each(&p->cls, CLS_INC_EXACT, count_subrules, &n_subrules);
- n_exact = classifier_count_exact(&p->cls) - n_subrules;
- n_wild = classifier_count(&p->cls) - classifier_count_exact(&p->cls);
-
- /* Hash table. */
- dpif_get_dp_stats(&p->dpif, &dpstats);
- ots = append_stats_reply(sizeof *ots, ofconn, &msg);
- memset(ots, 0, sizeof *ots);
- ots->table_id = TABLEID_HASH;
- strcpy(ots->name, "hash");
- ots->wildcards = htonl(0);
- ots->max_entries = htonl(dpstats.max_capacity);
- ots->active_count = htonl(n_exact);
- ots->lookup_count = htonll(dpstats.n_frags + dpstats.n_hit +
- dpstats.n_missed);
- ots->matched_count = htonll(dpstats.n_hit); /* XXX */
-
- /* Classifier table. */
- ots = append_stats_reply(sizeof *ots, ofconn, &msg);
- memset(ots, 0, sizeof *ots);
- ots->table_id = TABLEID_CLASSIFIER;
- strcpy(ots->name, "classifier");
- ots->wildcards = htonl(OFPFW_ALL);
- ots->max_entries = htonl(65536);
- ots->active_count = htonl(n_wild);
- ots->lookup_count = htonll(0); /* XXX */
- ots->matched_count = htonll(0); /* XXX */
-
- queue_tx(msg, ofconn, ofconn->reply_counter);
- return 0;
-}
-
-static int
-handle_port_stats_request(struct ofproto *p, struct ofconn *ofconn,
- struct ofp_stats_request *request)
-{
- struct ofp_port_stats *ops;
- struct ofpbuf *msg;
- struct ofport *port;
- unsigned int port_no;
-
- msg = start_stats_reply(request, sizeof *ops * 16);
- PORT_ARRAY_FOR_EACH (port, &p->ports, port_no) {
- struct netdev_stats stats;
-
- /* Intentionally ignore return value, since errors will set 'stats' to
- * all-1s, which is correct for OpenFlow, and netdev_get_stats() will
- * log errors. */
- netdev_get_stats(port->netdev, &stats);
-
- ops = append_stats_reply(sizeof *ops, ofconn, &msg);
- ops->port_no = htons(odp_port_to_ofp_port(port_no));
- memset(ops->pad, 0, sizeof ops->pad);
- ops->rx_packets = htonll(stats.rx_packets);
- ops->tx_packets = htonll(stats.tx_packets);
- ops->rx_bytes = htonll(stats.rx_bytes);
- ops->tx_bytes = htonll(stats.tx_bytes);
- ops->rx_dropped = htonll(stats.rx_dropped);
- ops->tx_dropped = htonll(stats.tx_dropped);
- ops->rx_errors = htonll(stats.rx_errors);
- ops->tx_errors = htonll(stats.tx_errors);
- ops->rx_frame_err = htonll(stats.rx_frame_errors);
- ops->rx_over_err = htonll(stats.rx_over_errors);
- ops->rx_crc_err = htonll(stats.rx_crc_errors);
- ops->collisions = htonll(stats.collisions);
- }
-
- queue_tx(msg, ofconn, ofconn->reply_counter);
- return 0;
-}
-
-struct flow_stats_cbdata {
- struct ofproto *ofproto;
- struct ofconn *ofconn;
- uint16_t out_port;
- struct ofpbuf *msg;
-};
-
-static void
-query_stats(struct ofproto *p, struct rule *rule,
- uint64_t *packet_countp, uint64_t *byte_countp)
-{
- uint64_t packet_count, byte_count;
- struct rule *subrule;
- struct odp_flow *odp_flows;
- size_t n_odp_flows;
-
- packet_count = rule->packet_count;
- byte_count = rule->byte_count;
-
- n_odp_flows = rule->cr.wc.wildcards ? list_size(&rule->list) : 1;
- odp_flows = xcalloc(1, n_odp_flows * sizeof *odp_flows);
- if (rule->cr.wc.wildcards) {
- size_t i = 0;
- LIST_FOR_EACH (subrule, struct rule, list, &rule->list) {
- odp_flows[i++].key = subrule->cr.flow;
- packet_count += subrule->packet_count;
- byte_count += subrule->byte_count;
- }
- } else {
- odp_flows[0].key = rule->cr.flow;
- }
-
- if (!dpif_flow_get_multiple(&p->dpif, odp_flows, n_odp_flows)) {
- size_t i;
- for (i = 0; i < n_odp_flows; i++) {
- struct odp_flow *odp_flow = &odp_flows[i];
- packet_count += odp_flow->stats.n_packets;
- byte_count += odp_flow->stats.n_bytes;
- }
- }
- free(odp_flows);
-
- *packet_countp = packet_count;
- *byte_countp = byte_count;
-}
-
-static void
-flow_stats_cb(struct cls_rule *rule_, void *cbdata_)
-{
- struct rule *rule = rule_from_cls_rule(rule_);
- struct flow_stats_cbdata *cbdata = cbdata_;
- struct ofp_flow_stats *ofs;
- uint64_t packet_count, byte_count;
- size_t act_len, len;
-
- if (rule_is_hidden(rule) || !rule_has_out_port(rule, cbdata->out_port)) {
- return;
- }
-
- act_len = sizeof *rule->actions * rule->n_actions;
- len = offsetof(struct ofp_flow_stats, actions) + act_len;
-
- query_stats(cbdata->ofproto, rule, &packet_count, &byte_count);
-
- ofs = append_stats_reply(len, cbdata->ofconn, &cbdata->msg);
- ofs->length = htons(len);
- ofs->table_id = rule->cr.wc.wildcards ? TABLEID_CLASSIFIER : TABLEID_HASH;
- ofs->pad = 0;
- flow_to_match(&rule->cr.flow, rule->cr.wc.wildcards, &ofs->match);
- ofs->duration = htonl((time_msec() - rule->created) / 1000);
- ofs->priority = htons(rule->cr.priority);
- ofs->idle_timeout = htons(rule->idle_timeout);
- ofs->hard_timeout = htons(rule->hard_timeout);
- memset(ofs->pad2, 0, sizeof ofs->pad2);
- ofs->packet_count = htonll(packet_count);
- ofs->byte_count = htonll(byte_count);
- memcpy(ofs->actions, rule->actions, act_len);
-}
-
-static int
-table_id_to_include(uint8_t table_id)
-{
- return (table_id == TABLEID_HASH ? CLS_INC_EXACT
- : table_id == TABLEID_CLASSIFIER ? CLS_INC_WILD
- : table_id == 0xff ? CLS_INC_ALL
- : 0);
-}
-
-static int
-handle_flow_stats_request(struct ofproto *p, struct ofconn *ofconn,
- const struct ofp_stats_request *osr,
- size_t arg_size)
-{
- struct ofp_flow_stats_request *fsr;
- struct flow_stats_cbdata cbdata;
- struct cls_rule target;
-
- if (arg_size != sizeof *fsr) {
- return ofp_mkerr(OFPET_BAD_REQUEST, OFPBRC_BAD_LENGTH);
- }
- fsr = (struct ofp_flow_stats_request *) osr->body;
-
- COVERAGE_INC(ofproto_flows_req);
- cbdata.ofproto = p;
- cbdata.ofconn = ofconn;
- cbdata.out_port = fsr->out_port;
- cbdata.msg = start_stats_reply(osr, 1024);
- cls_rule_from_match(&target, &fsr->match, 0);
- classifier_for_each_match(&p->cls, &target,
- table_id_to_include(fsr->table_id),
- flow_stats_cb, &cbdata);
- queue_tx(cbdata.msg, ofconn, ofconn->reply_counter);
- return 0;
-}
-
-struct flow_stats_ds_cbdata {
- struct ofproto *ofproto;
- struct ds *results;
-};
-
-static void
-flow_stats_ds_cb(struct cls_rule *rule_, void *cbdata_)
-{
- struct rule *rule = rule_from_cls_rule(rule_);
- struct flow_stats_ds_cbdata *cbdata = cbdata_;
- struct ds *results = cbdata->results;
- struct ofp_match match;
- uint64_t packet_count, byte_count;
- size_t act_len = sizeof *rule->actions * rule->n_actions;
-
- /* Don't report on subrules. */
- if (rule->super != NULL) {
- return;
- }
-
- query_stats(cbdata->ofproto, rule, &packet_count, &byte_count);
- flow_to_ovs_match(&rule->cr.flow, rule->cr.wc.wildcards, &match);
-
- ds_put_format(results, "duration=%llds, ",
- (time_msec() - rule->created) / 1000);
- ds_put_format(results, "priority=%u, ", rule->cr.priority);
- ds_put_format(results, "n_packets=%"PRIu64", ", packet_count);
- ds_put_format(results, "n_bytes=%"PRIu64", ", byte_count);
- ofp_print_match(results, &match, true);
- ofp_print_actions(results, &rule->actions->header, act_len);
- ds_put_cstr(results, "\n");
-}
-
-/* Adds a pretty-printed description of all flows to 'results', including
- * those marked hidden by secchan (e.g., by in-band control). */
-void
-ofproto_get_all_flows(struct ofproto *p, struct ds *results)
-{
- struct ofp_match match;
- struct cls_rule target;
- struct flow_stats_ds_cbdata cbdata;
-
- memset(&match, 0, sizeof match);
- match.wildcards = htonl(OFPFW_ALL);
-
- cbdata.ofproto = p;
- cbdata.results = results;
-
- cls_rule_from_match(&target, &match, 0);
- classifier_for_each_match(&p->cls, &target, CLS_INC_ALL,
- flow_stats_ds_cb, &cbdata);
-}
-
-struct aggregate_stats_cbdata {
- struct ofproto *ofproto;
- uint16_t out_port;
- uint64_t packet_count;
- uint64_t byte_count;
- uint32_t n_flows;
-};
-
-static void
-aggregate_stats_cb(struct cls_rule *rule_, void *cbdata_)
-{
- struct rule *rule = rule_from_cls_rule(rule_);
- struct aggregate_stats_cbdata *cbdata = cbdata_;
- uint64_t packet_count, byte_count;
-
- if (rule_is_hidden(rule) || !rule_has_out_port(rule, cbdata->out_port)) {
- return;
- }
-
- query_stats(cbdata->ofproto, rule, &packet_count, &byte_count);
-
- cbdata->packet_count += packet_count;
- cbdata->byte_count += byte_count;
- cbdata->n_flows++;
-}
-
-static int
-handle_aggregate_stats_request(struct ofproto *p, struct ofconn *ofconn,
- const struct ofp_stats_request *osr,
- size_t arg_size)
-{
- struct ofp_aggregate_stats_request *asr;
- struct ofp_aggregate_stats_reply *reply;
- struct aggregate_stats_cbdata cbdata;
- struct cls_rule target;
- struct ofpbuf *msg;
-
- if (arg_size != sizeof *asr) {
- return ofp_mkerr(OFPET_BAD_REQUEST, OFPBRC_BAD_LENGTH);
- }
- asr = (struct ofp_aggregate_stats_request *) osr->body;
-
- COVERAGE_INC(ofproto_agg_request);
- cbdata.ofproto = p;
- cbdata.out_port = asr->out_port;
- cbdata.packet_count = 0;
- cbdata.byte_count = 0;
- cbdata.n_flows = 0;
- cls_rule_from_match(&target, &asr->match, 0);
- classifier_for_each_match(&p->cls, &target,
- table_id_to_include(asr->table_id),
- aggregate_stats_cb, &cbdata);
-
- msg = start_stats_reply(osr, sizeof *reply);
- reply = append_stats_reply(sizeof *reply, ofconn, &msg);
- reply->flow_count = htonl(cbdata.n_flows);
- reply->packet_count = htonll(cbdata.packet_count);
- reply->byte_count = htonll(cbdata.byte_count);
- queue_tx(msg, ofconn, ofconn->reply_counter);
- return 0;
-}
-
-static int
-handle_stats_request(struct ofproto *p, struct ofconn *ofconn,
- struct ofp_header *oh)
-{
- struct ofp_stats_request *osr;
- size_t arg_size;
- int error;
-
- error = check_ofp_message_array(oh, OFPT_STATS_REQUEST, sizeof *osr,
- 1, &arg_size);
- if (error) {
- return error;
- }
- osr = (struct ofp_stats_request *) oh;
-
- switch (ntohs(osr->type)) {
- case OFPST_DESC:
- return handle_desc_stats_request(p, ofconn, osr);
-
- case OFPST_FLOW:
- return handle_flow_stats_request(p, ofconn, osr, arg_size);
-
- case OFPST_AGGREGATE:
- return handle_aggregate_stats_request(p, ofconn, osr, arg_size);
-
- case OFPST_TABLE:
- return handle_table_stats_request(p, ofconn, osr);
-
- case OFPST_PORT:
- return handle_port_stats_request(p, ofconn, osr);
-
- case OFPST_VENDOR:
- return ofp_mkerr(OFPET_BAD_REQUEST, OFPBRC_BAD_VENDOR);
-
- default:
- return ofp_mkerr(OFPET_BAD_REQUEST, OFPBRC_BAD_STAT);
- }
-}
-
-static long long int
-msec_from_nsec(uint64_t sec, uint32_t nsec)
-{
- return !sec ? 0 : sec * 1000 + nsec / 1000000;
-}
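msec_from_nsec() treats a zero seconds field as "never used" and otherwise truncates the nanosecond part to whole milliseconds. The sketch below isolates that arithmetic so its behavior can be checked on its own. The name `sketch_msec_from_nsec` is hypothetical, and the "sec == 0 means unused" sentinel is assumed to match the function above.

```c
#include <stdint.h>

/* Converts a (seconds, nanoseconds) timestamp to milliseconds.
 * A zero seconds field is treated as "never used" and maps to 0,
 * mirroring msec_from_nsec(); the sub-millisecond part of 'nsec'
 * is truncated, not rounded. */
static long long int
sketch_msec_from_nsec(uint64_t sec, uint32_t nsec)
{
    return !sec ? 0 : (long long int) (sec * 1000 + nsec / 1000000);
}
```

For example, 2 s plus 1999999 ns becomes 2001 ms, because the trailing 0.999999 ms is dropped.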
-
-static void
-update_time(struct ofproto *ofproto, struct rule *rule,
- const struct odp_flow_stats *stats)
-{
- long long int used = msec_from_nsec(stats->used_sec, stats->used_nsec);
- if (used > rule->used) {
- rule->used = used;
- netflow_flow_update_time(ofproto->netflow, &rule->nf_flow, used);
- }
-}
-
-static void
-update_stats(struct ofproto *ofproto, struct rule *rule,
- const struct odp_flow_stats *stats)
-{
- if (stats->n_packets) {
- update_time(ofproto, rule, stats);
- rule->packet_count += stats->n_packets;
- rule->byte_count += stats->n_bytes;
- netflow_flow_update_flags(&rule->nf_flow, stats->ip_tos,
- stats->tcp_flags);
- }
-}
-
-static int
-add_flow(struct ofproto *p, struct ofconn *ofconn,
- struct ofp_flow_mod *ofm, size_t n_actions)
-{
- struct ofpbuf *packet;
- struct rule *rule;
- uint16_t in_port;
- int error;
-
- rule = rule_create(p, NULL, (const union ofp_action *) ofm->actions,
- n_actions, ntohs(ofm->idle_timeout),
- ntohs(ofm->hard_timeout));
- cls_rule_from_match(&rule->cr, &ofm->match, ntohs(ofm->priority));
-
- packet = NULL;
- error = 0;
- if (ofm->buffer_id != htonl(UINT32_MAX)) {
- error = pktbuf_retrieve(ofconn->pktbuf, ntohl(ofm->buffer_id),
- &packet, &in_port);
- }
-
- rule_insert(p, rule, packet, in_port);
- ofpbuf_delete(packet);
- return error;
-}
-
-static int
-modify_flow(struct ofproto *p, const struct ofp_flow_mod *ofm,
- size_t n_actions, uint16_t command, struct rule *rule)
-{
- if (rule_is_hidden(rule)) {
- return 0;
- }
-
- if (command == OFPFC_DELETE) {
- rule_remove(p, rule);
- } else {
- size_t actions_len = n_actions * sizeof *rule->actions;
-
- if (n_actions == rule->n_actions
- && !memcmp(ofm->actions, rule->actions, actions_len))
- {
- return 0;
- }
-
- free(rule->actions);
- rule->actions = xmemdup(ofm->actions, actions_len);
- rule->n_actions = n_actions;
-
- if (rule->cr.wc.wildcards) {
- COVERAGE_INC(ofproto_mod_wc_flow);
- p->need_revalidate = true;
- } else {
- rule_update_actions(p, rule);
- }
- }
-
- return 0;
-}
-
-static int
-modify_flows_strict(struct ofproto *p, const struct ofp_flow_mod *ofm,
- size_t n_actions, uint16_t command)
-{
- struct rule *rule;
- uint32_t wildcards;
- flow_t flow;
-
- flow_from_match(&flow, &wildcards, &ofm->match);
- rule = rule_from_cls_rule(classifier_find_rule_exactly(
- &p->cls, &flow, wildcards,
- ntohs(ofm->priority)));
-
- if (rule) {
- if (command == OFPFC_DELETE
- && ofm->out_port != htons(OFPP_NONE)
- && !rule_has_out_port(rule, ofm->out_port)) {
- return 0;
- }
-
- modify_flow(p, ofm, n_actions, command, rule);
- }
- return 0;
-}
-
-struct modify_flows_cbdata {
- struct ofproto *ofproto;
- const struct ofp_flow_mod *ofm;
- uint16_t out_port;
- size_t n_actions;
- uint16_t command;
-};
-
-static void
-modify_flows_cb(struct cls_rule *rule_, void *cbdata_)
-{
- struct rule *rule = rule_from_cls_rule(rule_);
- struct modify_flows_cbdata *cbdata = cbdata_;
-
- if (cbdata->out_port != htons(OFPP_NONE)
- && !rule_has_out_port(rule, cbdata->out_port)) {
- return;
- }
-
- modify_flow(cbdata->ofproto, cbdata->ofm, cbdata->n_actions,
- cbdata->command, rule);
-}
-
-static int
-modify_flows_loose(struct ofproto *p, const struct ofp_flow_mod *ofm,
- size_t n_actions, uint16_t command)
-{
- struct modify_flows_cbdata cbdata;
- struct cls_rule target;
-
- cbdata.ofproto = p;
- cbdata.ofm = ofm;
- cbdata.out_port = (command == OFPFC_DELETE ? ofm->out_port
- : htons(OFPP_NONE));
- cbdata.n_actions = n_actions;
- cbdata.command = command;
-
- cls_rule_from_match(&target, &ofm->match, 0);
-
- classifier_for_each_match(&p->cls, &target, CLS_INC_ALL,
- modify_flows_cb, &cbdata);
- return 0;
-}
-
-static int
-handle_flow_mod(struct ofproto *p, struct ofconn *ofconn,
- struct ofp_flow_mod *ofm)
-{
- size_t n_actions;
- int error;
-
- error = check_ofp_message_array(&ofm->header, OFPT_FLOW_MOD, sizeof *ofm,
- sizeof *ofm->actions, &n_actions);
- if (error) {
- return error;
- }
-
- normalize_match(&ofm->match);
- if (!ofm->match.wildcards) {
- ofm->priority = htons(UINT16_MAX);
- }
-
- error = validate_actions((const union ofp_action *) ofm->actions,
- n_actions, p->max_ports);
- if (error) {
- return error;
- }
-
- switch (ntohs(ofm->command)) {
- case OFPFC_ADD:
- return add_flow(p, ofconn, ofm, n_actions);
-
- case OFPFC_MODIFY:
- return modify_flows_loose(p, ofm, n_actions, OFPFC_MODIFY);
-
- case OFPFC_MODIFY_STRICT:
- return modify_flows_strict(p, ofm, n_actions, OFPFC_MODIFY);
-
- case OFPFC_DELETE:
- return modify_flows_loose(p, ofm, n_actions, OFPFC_DELETE);
-
- case OFPFC_DELETE_STRICT:
- return modify_flows_strict(p, ofm, n_actions, OFPFC_DELETE);
-
- default:
- return ofp_mkerr(OFPET_FLOW_MOD_FAILED, OFPFMFC_BAD_COMMAND);
- }
-}
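handle_flow_mod() routes OFPFC_MODIFY and OFPFC_DELETE through modify_flows_loose() and the _STRICT variants through modify_flows_strict(). The sketch below illustrates the difference with a toy two-field match. The struct, the W_PORT/W_VLAN bits and both helper names are hypothetical. The loose rule follows the OpenFlow 1.0 description of non-strict commands, which affect every flow that is equal to or more specific than the request. It is not a transcription of the classifier's own code, and it omits the out_port filter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical wildcard bits for a toy two-field flow. */
#define W_PORT 0x1u   /* Ignore 'port'. */
#define W_VLAN 0x2u   /* Ignore 'vlan'. */

struct sketch_rule {
    uint16_t port;
    uint16_t vlan;
    uint32_t wildcards;   /* Set bits mean "don't care". */
    uint16_t priority;
};

/* Loose match: 'rule' is affected if every field that 'target' cares about
 * is also exact in 'rule' and has the same value. */
static bool
loose_match(const struct sketch_rule *rule, const struct sketch_rule *target)
{
    if (!(target->wildcards & W_PORT)
        && ((rule->wildcards & W_PORT) || rule->port != target->port)) {
        return false;
    }
    if (!(target->wildcards & W_VLAN)
        && ((rule->wildcards & W_VLAN) || rule->vlan != target->vlan)) {
        return false;
    }
    return true;
}

/* Strict match: identical wildcards and priority, and equal values in every
 * field that is not wildcarded. */
static bool
strict_match(const struct sketch_rule *rule, const struct sketch_rule *target)
{
    if (rule->wildcards != target->wildcards
        || rule->priority != target->priority) {
        return false;
    }
    if (!(rule->wildcards & W_PORT) && rule->port != target->port) {
        return false;
    }
    if (!(rule->wildcards & W_VLAN) && rule->vlan != target->vlan) {
        return false;
    }
    return true;
}
```

An exact-match rule is therefore caught by a loose request that names only its port. It is never caught by a strict request with different wildcards.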
-
-static void
-send_capability_reply(struct ofproto *p, struct ofconn *ofconn, uint32_t xid)
-{
- struct ofmp_capability_reply *ocr;
- struct ofpbuf *b;
- char capabilities[] = "com.nicira.mgmt.manager=false\n";
-
- ocr = make_openflow_xid(sizeof(*ocr), OFPT_VENDOR, xid, &b);
- ocr->header.header.vendor = htonl(NX_VENDOR_ID);
- ocr->header.header.subtype = htonl(NXT_MGMT);
- ocr->header.type = htons(OFMPT_CAPABILITY_REPLY);
-
- ocr->format = htonl(OFMPCOF_SIMPLE);
- ocr->mgmt_id = htonll(p->mgmt_id);
-
- ofpbuf_put(b, capabilities, strlen(capabilities));
-
- queue_tx(b, ofconn, ofconn->reply_counter);
-}
-
-static int
-handle_ofmp(struct ofproto *p, struct ofconn *ofconn,
- struct ofmp_header *ofmph)
-{
- size_t msg_len = ntohs(ofmph->header.header.length);
- if (msg_len < sizeof(*ofmph)) {
- VLOG_WARN_RL(&rl, "dropping short management message: %zu", msg_len);
- return ofp_mkerr(OFPET_BAD_REQUEST, OFPBRC_BAD_LENGTH);
- }
-
- if (ofmph->type == htons(OFMPT_CAPABILITY_REQUEST)) {
- struct ofmp_capability_request *ofmpcr;
-
- if (msg_len < sizeof(struct ofmp_capability_request)) {
- VLOG_WARN_RL(&rl, "dropping short capability request: %zu",
- msg_len);
- return ofp_mkerr(OFPET_BAD_REQUEST, OFPBRC_BAD_LENGTH);
- }
-
- ofmpcr = (struct ofmp_capability_request *)ofmph;
- if (ofmpcr->format != htonl(OFMPCAF_SIMPLE)) {
- /* xxx Find a better type than bad subtype */
- return ofp_mkerr(OFPET_BAD_REQUEST, OFPBRC_BAD_SUBTYPE);
- }
-
- send_capability_reply(p, ofconn, ofmph->header.header.xid);
- return 0;
- } else {
- return ofp_mkerr(OFPET_BAD_REQUEST, OFPBRC_BAD_SUBTYPE);
- }
-}
-
-static int
-handle_vendor(struct ofproto *p, struct ofconn *ofconn, void *msg)
-{
- struct ofp_vendor_header *ovh = msg;
- struct nicira_header *nh;
-
- if (ntohs(ovh->header.length) < sizeof(struct ofp_vendor_header)) {
- return ofp_mkerr(OFPET_BAD_REQUEST, OFPBRC_BAD_LENGTH);
- }
- if (ovh->vendor != htonl(NX_VENDOR_ID)) {
- return ofp_mkerr(OFPET_BAD_REQUEST, OFPBRC_BAD_VENDOR);
- }
- if (ntohs(ovh->header.length) < sizeof(struct nicira_header)) {
- return ofp_mkerr(OFPET_BAD_REQUEST, OFPBRC_BAD_LENGTH);
- }
-
- nh = msg;
- switch (ntohl(nh->subtype)) {
- case NXT_STATUS_REQUEST:
- return switch_status_handle_request(p->switch_status, ofconn->rconn,
- msg);
-
- case NXT_ACT_SET_CONFIG:
- return ofp_mkerr(OFPET_BAD_REQUEST, OFPBRC_BAD_SUBTYPE); /* XXX */
-
- case NXT_ACT_GET_CONFIG:
- return ofp_mkerr(OFPET_BAD_REQUEST, OFPBRC_BAD_SUBTYPE); /* XXX */
-
- case NXT_COMMAND_REQUEST:
- if (p->executer) {
- return executer_handle_request(p->executer, ofconn->rconn, msg);
- }
- break;
-
- case NXT_MGMT:
- return handle_ofmp(p, ofconn, msg);
- }
-
- return ofp_mkerr(OFPET_BAD_REQUEST, OFPBRC_BAD_SUBTYPE);
-}
-
-static void
-handle_openflow(struct ofconn *ofconn, struct ofproto *p,
- struct ofpbuf *ofp_msg)
-{
- struct ofp_header *oh = ofp_msg->data;
- int error;
-
- COVERAGE_INC(ofproto_recv_openflow);
- switch (oh->type) {
- case OFPT_ECHO_REQUEST:
- error = handle_echo_request(ofconn, oh);
- break;
-
- case OFPT_ECHO_REPLY:
- error = 0;
- break;
-
- case OFPT_FEATURES_REQUEST:
- error = handle_features_request(p, ofconn, oh);
- break;
-
- case OFPT_GET_CONFIG_REQUEST:
- error = handle_get_config_request(p, ofconn, oh);
- break;
-
- case OFPT_SET_CONFIG:
- error = handle_set_config(p, ofconn, ofp_msg->data);
- break;
-
- case OFPT_PACKET_OUT:
- error = handle_packet_out(p, ofconn, ofp_msg->data);
- break;
-
- case OFPT_PORT_MOD:
- error = handle_port_mod(p, oh);
- break;
-
- case OFPT_FLOW_MOD:
- error = handle_flow_mod(p, ofconn, ofp_msg->data);
- break;
-
- case OFPT_STATS_REQUEST:
- error = handle_stats_request(p, ofconn, oh);
- break;
-
- case OFPT_VENDOR:
- error = handle_vendor(p, ofconn, ofp_msg->data);
- break;
-
- default:
- if (VLOG_IS_DBG_ENABLED()) {
- char *s = ofp_to_string(oh, ntohs(oh->length), 2);
- VLOG_DBG_RL(&rl, "OpenFlow message ignored: %s", s);
- free(s);
- }
- error = ofp_mkerr(OFPET_BAD_REQUEST, OFPBRC_BAD_TYPE);
- break;
- }
-
- if (error) {
- send_error_oh(ofconn, ofp_msg->data, error);
- }
-}
-\f
-static void
-handle_odp_msg(struct ofproto *p, struct ofpbuf *packet)
-{
- struct odp_msg *msg = packet->data;
- uint16_t in_port = odp_port_to_ofp_port(msg->port);
- struct rule *rule;
- struct ofpbuf payload;
- flow_t flow;
-
- /* Handle controller actions. */
- if (msg->type == _ODPL_ACTION_NR) {
- COVERAGE_INC(ofproto_ctlr_action);
- pinsched_send(p->action_sched, in_port, packet,
- send_packet_in_action, p);
- return;
- }
-
- payload.data = msg + 1;
- payload.size = msg->length - sizeof *msg;
- flow_extract(&payload, msg->port, &flow);
-
- /* Check with in-band control to see if this packet should be sent
- * to the local port regardless of the flow table. */
- if (in_band_msg_in_hook(p->in_band, &flow, &payload)) {
- union odp_action action;
-
- memset(&action, 0, sizeof(action));
- action.output.type = ODPAT_OUTPUT;
- action.output.port = ODPP_LOCAL;
- dpif_execute(&p->dpif, flow.in_port, &action, 1, &payload);
- }
-
- rule = lookup_valid_rule(p, &flow);
- if (!rule) {
- /* Don't send a packet-in if OFPPC_NO_PACKET_IN asserted. */
- struct ofport *port = port_array_get(&p->ports, msg->port);
- if (port) {
- if (port->opp.config & OFPPC_NO_PACKET_IN) {
- COVERAGE_INC(ofproto_no_packet_in);
- /* XXX install 'drop' flow entry */
- ofpbuf_delete(packet);
- return;
- }
- } else {
- VLOG_WARN_RL(&rl, "packet-in on unknown port %"PRIu16, msg->port);
- }
-
- COVERAGE_INC(ofproto_packet_in);
- pinsched_send(p->miss_sched, in_port, packet, send_packet_in_miss, p);
- return;
- }
-
- if (rule->cr.wc.wildcards) {
- rule = rule_create_subrule(p, rule, &flow);
- rule_make_actions(p, rule, packet);
- } else {
- if (!rule->may_install) {
- /* The rule is not installable, that is, we need to process every
- * packet, so process the current packet and set its actions into
- * 'rule'. */
- rule_make_actions(p, rule, packet);
- } else {
- /* XXX revalidate rule if it needs it */
- }
- }
-
- rule_execute(p, rule, &payload, &flow);
- rule_reinstall(p, rule);
-
- if (rule->super && rule->super->cr.priority == FAIL_OPEN_PRIORITY
- && rconn_is_connected(p->controller->rconn)) {
- /*
- * Extra-special case for fail-open mode.
- *
- * We are in fail-open mode and the packet matched the fail-open rule,
- * but we are connected to a controller too. We should send the packet
- * up to the controller in the hope that it will try to set up a flow
- * and thereby allow us to exit fail-open.
- *
- * See the top-level comment in fail-open.c for more information.
- */
- pinsched_send(p->miss_sched, in_port, packet, send_packet_in_miss, p);
- } else {
- ofpbuf_delete(packet);
- }
-}
-\f
-static void
-revalidate_cb(struct cls_rule *sub_, void *cbdata_)
-{
- struct rule *sub = rule_from_cls_rule(sub_);
- struct revalidate_cbdata *cbdata = cbdata_;
-
- if (cbdata->revalidate_all
- || (cbdata->revalidate_subrules && sub->super)
- || (tag_set_intersects(&cbdata->revalidate_set, sub->tags))) {
- revalidate_rule(cbdata->ofproto, sub);
- }
-}
-
-static bool
-revalidate_rule(struct ofproto *p, struct rule *rule)
-{
- const flow_t *flow = &rule->cr.flow;
-
- COVERAGE_INC(ofproto_revalidate_rule);
- if (rule->super) {
- struct rule *super;
- super = rule_from_cls_rule(classifier_lookup_wild(&p->cls, flow));
- if (!super) {
- rule_remove(p, rule);
- return false;
- } else if (super != rule->super) {
- COVERAGE_INC(ofproto_revalidate_moved);
- list_remove(&rule->list);
- list_push_back(&super->list, &rule->list);
- rule->super = super;
- rule->hard_timeout = super->hard_timeout;
- rule->idle_timeout = super->idle_timeout;
- rule->created = super->created;
- rule->used = 0;
- }
- }
-
- rule_update_actions(p, rule);
- return true;
-}
-
-static struct ofpbuf *
-compose_flow_exp(const struct rule *rule, long long int now, uint8_t reason)
-{
- struct ofp_flow_expired *ofe;
- struct ofpbuf *buf;
-
- ofe = make_openflow(sizeof *ofe, OFPT_FLOW_EXPIRED, &buf);
- flow_to_match(&rule->cr.flow, rule->cr.wc.wildcards, &ofe->match);
- ofe->priority = htons(rule->cr.priority);
- ofe->reason = reason;
- ofe->duration = htonl((now - rule->created) / 1000);
- ofe->packet_count = htonll(rule->packet_count);
- ofe->byte_count = htonll(rule->byte_count);
-
- return buf;
-}
-
-static void
-send_flow_exp(struct ofproto *p, struct rule *rule,
- long long int now, uint8_t reason)
-{
- struct ofconn *ofconn;
- struct ofconn *prev;
- struct ofpbuf *buf;
-
- /* We limit the maximum number of queued flow expirations by accounting
- * them under the counter for replies. That works because preventing
- * OpenFlow requests from being processed also prevents new flows from
- * being added (and expiring). (It also prevents processing OpenFlow
- * requests that would not add new flows, so it is imperfect.) */
-
- prev = NULL;
- LIST_FOR_EACH (ofconn, struct ofconn, node, &p->all_conns) {
- if (ofconn->send_flow_exp && rconn_is_connected(ofconn->rconn)) {
- if (prev) {
- queue_tx(ofpbuf_clone(buf), prev, prev->reply_counter);
- } else {
- buf = compose_flow_exp(rule, now, reason);
- }
- prev = ofconn;
- }
- }
- if (prev) {
- queue_tx(buf, prev, prev->reply_counter);
- }
-}
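send_flow_exp() composes the expiration message lazily. It builds the message only once an interested connection exists, queues a clone for each earlier recipient, and hands the original buffer to the last one. That way it never makes a clone that would only be freed. The sketch below reproduces that ownership pattern with a hypothetical `struct sketch_msg` in place of `struct ofpbuf` and a plain array of eligibility flags in place of the connection list. It also counts allocations to show the effect.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical message buffer standing in for struct ofpbuf. */
struct sketch_msg {
    char text[32];
};

static int n_allocs;   /* Counts buffers created, for illustration. */

static struct sketch_msg *
sketch_compose(void)
{
    struct sketch_msg *m = malloc(sizeof *m);
    strcpy(m->text, "flow expired");
    n_allocs++;
    return m;
}

static struct sketch_msg *
sketch_clone(const struct sketch_msg *m)
{
    struct sketch_msg *c = malloc(sizeof *c);
    memcpy(c, m, sizeof *c);
    n_allocs++;
    return c;
}

/* Stores one buffer in 'out' per nonzero entry of 'eligible' and returns how
 * many were delivered.  Nothing is composed if no entry is eligible; every
 * recipient except the last gets a clone, and the last takes the original. */
static int
sketch_broadcast(const int *eligible, int n, struct sketch_msg **out)
{
    struct sketch_msg *buf = NULL;
    int prev = -1;
    int n_out = 0;

    for (int i = 0; i < n; i++) {
        if (!eligible[i]) {
            continue;
        }
        if (prev >= 0) {
            out[n_out++] = sketch_clone(buf);
        } else {
            buf = sketch_compose();
        }
        prev = i;
    }
    if (prev >= 0) {
        out[n_out++] = buf;
    }
    return n_out;
}
```

With three eligible connections, exactly three buffers are allocated: one composed message and two clones.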
-
-static void
-uninstall_idle_flow(struct ofproto *ofproto, struct rule *rule)
-{
- assert(rule->installed);
- assert(!rule->cr.wc.wildcards);
-
- if (rule->super) {
- rule_remove(ofproto, rule);
- } else {
- rule_uninstall(ofproto, rule);
- }
-}
-
-static void
-expire_rule(struct cls_rule *cls_rule, void *p_)
-{
- struct ofproto *p = p_;
- struct rule *rule = rule_from_cls_rule(cls_rule);
- long long int hard_expire, idle_expire, expire, now;
-
- hard_expire = (rule->hard_timeout
- ? rule->created + rule->hard_timeout * 1000
- : LLONG_MAX);
- idle_expire = (rule->idle_timeout
- && (rule->super || list_is_empty(&rule->list))
- ? rule->used + rule->idle_timeout * 1000
- : LLONG_MAX);
- expire = MIN(hard_expire, idle_expire);
-
- now = time_msec();
- if (now < expire) {
- if (rule->installed && now >= rule->used + 5000) {
- uninstall_idle_flow(p, rule);
- } else if (!rule->cr.wc.wildcards) {
- active_timeout(p, rule);
- }
-
- return;
- }
-
- COVERAGE_INC(ofproto_expired);
- if (rule->cr.wc.wildcards) {
- /* Update stats. (This code will be a no-op if the rule expired
- * due to an idle timeout, because in that case the rule has no
- * subrules left.) */
- struct rule *subrule, *next;
- LIST_FOR_EACH_SAFE (subrule, next, struct rule, list, &rule->list) {
- rule_remove(p, subrule);
- }
- }
-
- send_flow_exp(p, rule, now,
- (now >= hard_expire
- ? OFPER_HARD_TIMEOUT : OFPER_IDLE_TIMEOUT));
- rule_remove(p, rule);
-}
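The deadline computed in expire_rule() is the earlier of two independent timers. A timeout of 0 disables that timer (LLONG_MAX). The hard timeout counts from creation. The idle timeout counts from last use, but only for a subrule or for a wildcarded rule with no subrules left. The sketch below pulls that computation out on its own. The helper name is hypothetical, and the `idle_applies` flag stands in for the `rule->super || list_is_empty(&rule->list)` test.

```c
#include <limits.h>
#include <stdbool.h>

/* Returns the expiration time in milliseconds, or LLONG_MAX if the rule
 * never expires.  Timeouts are in seconds; times are in milliseconds. */
static long long int
sketch_expire_time(long long int created, long long int used,
                   int hard_timeout, int idle_timeout, bool idle_applies)
{
    long long int hard_expire = (hard_timeout
                                 ? created + hard_timeout * 1000LL
                                 : LLONG_MAX);
    long long int idle_expire = (idle_timeout && idle_applies
                                 ? used + idle_timeout * 1000LL
                                 : LLONG_MAX);
    return hard_expire < idle_expire ? hard_expire : idle_expire;
}
```

As in expire_rule(), the reason reported in the expiration message is OFPER_HARD_TIMEOUT when `now >= hard_expire`, and OFPER_IDLE_TIMEOUT otherwise.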
-
-static void
-active_timeout(struct ofproto *ofproto, struct rule *rule)
-{
- if (ofproto->netflow && !is_controller_rule(rule) &&
- netflow_active_timeout_expired(ofproto->netflow, &rule->nf_flow)) {
- struct ofexpired expired;
- struct odp_flow odp_flow;
-
- /* Get updated flow stats. */
- memset(&odp_flow, 0, sizeof odp_flow);
- if (rule->installed) {
- odp_flow.key = rule->cr.flow;
- odp_flow.flags = ODPFF_ZERO_TCP_FLAGS;
- dpif_flow_get(&ofproto->dpif, &odp_flow);
-
- if (odp_flow.stats.n_packets) {
- update_time(ofproto, rule, &odp_flow.stats);
- netflow_flow_update_flags(&rule->nf_flow, odp_flow.stats.ip_tos,
- odp_flow.stats.tcp_flags);
- }
- }
-
- expired.flow = rule->cr.flow;
- expired.packet_count = rule->packet_count +
- odp_flow.stats.n_packets;
- expired.byte_count = rule->byte_count + odp_flow.stats.n_bytes;
- expired.used = rule->used;
-
- netflow_expire(ofproto->netflow, &rule->nf_flow, &expired);
-
- /* Schedule us to send the accumulated records once we have
- * collected all of them. */
- poll_immediate_wake();
- }
-}
-
-static void
-update_used(struct ofproto *p)
-{
- struct odp_flow *flows;
- size_t n_flows;
- size_t i;
- int error;
-
- error = dpif_flow_list_all(&p->dpif, &flows, &n_flows);
- if (error) {
- return;
- }
-
- for (i = 0; i < n_flows; i++) {
- struct odp_flow *f = &flows[i];
- struct rule *rule;
-
- rule = rule_from_cls_rule(
- classifier_find_rule_exactly(&p->cls, &f->key, 0, UINT16_MAX));
- if (!rule || !rule->installed) {
- COVERAGE_INC(ofproto_unexpected_rule);
- dpif_flow_del(&p->dpif, f);
- continue;
- }
-
- update_time(p, rule, &f->stats);
- rule_account(p, rule, f->stats.n_bytes);
- }
- free(flows);
-}
-
-static void
-do_send_packet_in(struct ofconn *ofconn, uint32_t buffer_id,
- const struct ofpbuf *packet, int send_len)
-{
- struct odp_msg *msg = packet->data;
- struct ofpbuf payload;
- struct ofpbuf *opi;
- uint8_t reason;
-
- /* Extract packet payload from 'msg'. */
- payload.data = msg + 1;
- payload.size = msg->length - sizeof *msg;
-
- /* Construct ofp_packet_in message. */
- reason = msg->type == _ODPL_ACTION_NR ? OFPR_ACTION : OFPR_NO_MATCH;
- opi = make_packet_in(buffer_id, odp_port_to_ofp_port(msg->port), reason,
- &payload, send_len);
-
- /* Send. */
- rconn_send_with_limit(ofconn->rconn, opi, ofconn->packet_in_counter, 100);
-}
-
-static void
-send_packet_in_action(struct ofpbuf *packet, void *p_)
-{
- struct ofproto *p = p_;
- struct ofconn *ofconn;
- struct odp_msg *msg;
-
- msg = packet->data;
- LIST_FOR_EACH (ofconn, struct ofconn, node, &p->all_conns) {
- if (ofconn == p->controller || ofconn->miss_send_len) {
- do_send_packet_in(ofconn, UINT32_MAX, packet, msg->arg);
- }
- }
- ofpbuf_delete(packet);
-}
-
-static void
-send_packet_in_miss(struct ofpbuf *packet, void *p_)
-{
- struct ofproto *p = p_;
- bool in_fail_open = p->fail_open && fail_open_is_active(p->fail_open);
- struct ofconn *ofconn;
- struct ofpbuf payload;
- struct odp_msg *msg;
-
- msg = packet->data;
- payload.data = msg + 1;
- payload.size = msg->length - sizeof *msg;
- LIST_FOR_EACH (ofconn, struct ofconn, node, &p->all_conns) {
- if (ofconn->miss_send_len) {
- struct pktbuf *pb = ofconn->pktbuf;
- uint32_t buffer_id = (in_fail_open
- ? pktbuf_get_null()
- : pktbuf_save(pb, &payload, msg->port));
- int send_len = (buffer_id != UINT32_MAX ? ofconn->miss_send_len
- : UINT32_MAX);
- do_send_packet_in(ofconn, buffer_id, packet, send_len);
- }
- }
- ofpbuf_delete(packet);
-}
-
-static uint64_t
-pick_datapath_id(struct dpif *dpif, uint64_t fallback_dpid)
-{
- char local_name[IF_NAMESIZE];
- uint8_t ea[ETH_ADDR_LEN];
- int error;
-
- error = dpif_get_name(dpif, local_name, sizeof local_name);
- if (!error) {
- error = netdev_nodev_get_etheraddr(local_name, ea);
- if (!error) {
- return eth_addr_to_uint64(ea);
- }
- VLOG_WARN("could not get MAC address for %s (%s)",
- local_name, strerror(error));
- }
-
- return fallback_dpid;
-}
-
-static uint64_t
-pick_fallback_dpid(void)
-{
- uint8_t ea[ETH_ADDR_LEN];
- eth_addr_random(ea);
- ea[0] = 0x00; /* Set Nicira OUI. */
- ea[1] = 0x23;
- ea[2] = 0x20;
- return eth_addr_to_uint64(ea);
-}
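pick_fallback_dpid() starts from a random Ethernet address and overwrites its first three bytes with the Nicira OUI 00:23:20. It then packs the six bytes into a 64-bit datapath ID. The sketch below assumes that eth_addr_to_uint64() packs the bytes most-significant first into the low 48 bits. The helper names are hypothetical, and rand() stands in for eth_addr_random().

```c
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_ETH_ADDR_LEN 6

/* Packs a 6-byte Ethernet address into the low 48 bits, first byte most
 * significant (assumed to match eth_addr_to_uint64()). */
static uint64_t
sketch_eth_addr_to_uint64(const uint8_t ea[SKETCH_ETH_ADDR_LEN])
{
    uint64_t x = 0;
    for (int i = 0; i < SKETCH_ETH_ADDR_LEN; i++) {
        x = (x << 8) | ea[i];
    }
    return x;
}

/* Random low 24 bits, Nicira OUI 00:23:20 in the high 24 bits. */
static uint64_t
sketch_fallback_dpid(void)
{
    uint8_t ea[SKETCH_ETH_ADDR_LEN];
    for (int i = 0; i < SKETCH_ETH_ADDR_LEN; i++) {
        ea[i] = (uint8_t) (rand() & 0xff);
    }
    ea[0] = 0x00;
    ea[1] = 0x23;
    ea[2] = 0x20;
    return sketch_eth_addr_to_uint64(ea);
}
```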
-\f
-static bool
-default_normal_ofhook_cb(const flow_t *flow, const struct ofpbuf *packet,
- struct odp_actions *actions, tag_type *tags,
- uint16_t *nf_output_iface, void *ofproto_)
-{
- struct ofproto *ofproto = ofproto_;
- int out_port;
-
- /* Drop frames for reserved multicast addresses. */
- if (eth_addr_is_reserved(flow->dl_dst)) {
- return true;
- }
-
- /* Learn source MAC (but don't try to learn from revalidation). */
- if (packet != NULL) {
- tag_type rev_tag = mac_learning_learn(ofproto->ml, flow->dl_src,
- 0, flow->in_port);
- if (rev_tag) {
- /* The log messages here could actually be useful in debugging,
- * so keep the rate limit relatively high. */
- static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(30, 300);
- VLOG_DBG_RL(&rl, "learned that "ETH_ADDR_FMT" is on port %"PRIu16,
- ETH_ADDR_ARGS(flow->dl_src), flow->in_port);
- ofproto_revalidate(ofproto, rev_tag);
- }
- }
-
- /* Determine output port. */
- out_port = mac_learning_lookup_tag(ofproto->ml, flow->dl_dst, 0, tags);
- if (out_port < 0) {
- add_output_group_action(actions, DP_GROUP_FLOOD, nf_output_iface);
- } else if (out_port != flow->in_port) {
- odp_actions_add(actions, ODPAT_OUTPUT)->output.port = out_port;
- *nf_output_iface = out_port;
- } else {
- /* Drop. */
- }
-
- return true;
-}
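default_normal_ofhook_cb() implements a plain learning switch. It records the source MAC's port, then floods to an unknown destination, outputs to a known port, or drops the frame if the destination sits on the ingress port. The sketch below shows that decision with a toy, hypothetical MAC table in place of the mac_learning module. It leaves out VLANs, aging, revalidation tags and the reserved-multicast drop.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_MACS 16
#define SKETCH_FLOOD (-1)
#define SKETCH_DROP  (-2)

/* Toy MAC -> port table (hypothetical). */
struct sketch_mac_table {
    uint8_t macs[SKETCH_MAX_MACS][6];
    int ports[SKETCH_MAX_MACS];
    int n;
};

static int
sketch_lookup(const struct sketch_mac_table *t, const uint8_t mac[6])
{
    for (int i = 0; i < t->n; i++) {
        if (!memcmp(t->macs[i], mac, 6)) {
            return t->ports[i];
        }
    }
    return -1;   /* Unknown. */
}

static void
sketch_learn(struct sketch_mac_table *t, const uint8_t mac[6], int port)
{
    for (int i = 0; i < t->n; i++) {
        if (!memcmp(t->macs[i], mac, 6)) {
            t->ports[i] = port;   /* Host moved: update its port. */
            return;
        }
    }
    if (t->n < SKETCH_MAX_MACS) {
        memcpy(t->macs[t->n], mac, 6);
        t->ports[t->n++] = port;
    }
}

/* Learn the source, then flood, forward, or drop. */
static int
sketch_normal(struct sketch_mac_table *t, const uint8_t src[6],
              const uint8_t dst[6], int in_port)
{
    int out_port;

    sketch_learn(t, src, in_port);
    out_port = sketch_lookup(t, dst);
    if (out_port < 0) {
        return SKETCH_FLOOD;
    }
    return out_port != in_port ? out_port : SKETCH_DROP;
}
```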
-
-static const struct ofhooks default_ofhooks = {
- NULL,
- default_normal_ofhook_cb,
- NULL,
- NULL
-};
+++ /dev/null
-/*
- * Copyright (c) 2009 Nicira Networks.
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at:
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-#ifndef OFPROTO_H
-#define OFPROTO_H 1
-
-#include <stdbool.h>
-#include <stddef.h>
-#include <stdint.h>
-#include "flow.h"
-#include "netflow.h"
-#include "tag.h"
-
-struct ds;
-struct odp_actions;
-struct ofhooks;
-struct ofproto;
-struct svec;
-
-struct ofexpired {
- flow_t flow;
- uint64_t packet_count; /* Packets from subrules. */
- uint64_t byte_count; /* Bytes from subrules. */
- long long int used; /* Last-used time (0 if never used). */
-};
-
-int ofproto_create(const char *datapath, const struct ofhooks *, void *aux,
- struct ofproto **ofprotop);
-void ofproto_destroy(struct ofproto *);
-int ofproto_run(struct ofproto *);
-int ofproto_run1(struct ofproto *);
-int ofproto_run2(struct ofproto *, bool revalidate_all);
-void ofproto_wait(struct ofproto *);
-bool ofproto_is_alive(const struct ofproto *);
-
-/* Configuration. */
-void ofproto_set_datapath_id(struct ofproto *, uint64_t datapath_id);
-void ofproto_set_mgmt_id(struct ofproto *, uint64_t mgmt_id);
-void ofproto_set_probe_interval(struct ofproto *, int probe_interval);
-void ofproto_set_max_backoff(struct ofproto *, int max_backoff);
-void ofproto_set_desc(struct ofproto *,
- const char *manufacturer, const char *hardware,
- const char *software, const char *serial);
-int ofproto_set_in_band(struct ofproto *, bool in_band);
-int ofproto_set_discovery(struct ofproto *, bool discovery,
- const char *accept_controller_re,
- bool update_resolv_conf);
-int ofproto_set_controller(struct ofproto *, const char *controller);
-int ofproto_set_listeners(struct ofproto *, const struct svec *listeners);
-int ofproto_set_snoops(struct ofproto *, const struct svec *snoops);
-int ofproto_set_netflow(struct ofproto *,
- const struct netflow_options *nf_options);
-void ofproto_set_failure(struct ofproto *, bool fail_open);
-void ofproto_set_rate_limit(struct ofproto *, int rate_limit, int burst_limit);
-int ofproto_set_stp(struct ofproto *, bool enable_stp);
-int ofproto_set_remote_execution(struct ofproto *, const char *command_acl,
- const char *command_dir);
-
-/* Configuration querying. */
-uint64_t ofproto_get_datapath_id(const struct ofproto *);
-uint64_t ofproto_get_mgmt_id(const struct ofproto *);
-int ofproto_get_probe_interval(const struct ofproto *);
-int ofproto_get_max_backoff(const struct ofproto *);
-bool ofproto_get_in_band(const struct ofproto *);
-bool ofproto_get_discovery(const struct ofproto *);
-const char *ofproto_get_controller(const struct ofproto *);
-void ofproto_get_listeners(const struct ofproto *, struct svec *);
-void ofproto_get_snoops(const struct ofproto *, struct svec *);
-void ofproto_get_all_flows(struct ofproto *, struct ds *);
-
-/* Functions for use by ofproto implementation modules, not by clients. */
-int ofproto_send_packet(struct ofproto *, const flow_t *,
- const union ofp_action *, size_t n_actions,
- const struct ofpbuf *);
-void ofproto_add_flow(struct ofproto *, const flow_t *, uint32_t wildcards,
- unsigned int priority,
- const union ofp_action *, size_t n_actions,
- int idle_timeout);
-void ofproto_delete_flow(struct ofproto *, const flow_t *, uint32_t wildcards,
- unsigned int priority);
-void ofproto_flush_flows(struct ofproto *);
-
-/* Hooks for ovs-vswitchd. */
-struct ofhooks {
- void (*port_changed_cb)(enum ofp_port_reason, const struct ofp_phy_port *,
- void *aux);
- bool (*normal_cb)(const flow_t *, const struct ofpbuf *packet,
- struct odp_actions *, tag_type *,
- uint16_t *nf_output_iface, void *aux);
- void (*account_flow_cb)(const flow_t *, const union odp_action *,
- size_t n_actions, unsigned long long int n_bytes,
- void *aux);
- void (*account_checkpoint_cb)(void *aux);
-};
-void ofproto_revalidate(struct ofproto *, tag_type);
-struct tag_set *ofproto_get_revalidate_set(struct ofproto *);
-
-#endif /* ofproto.h */
+++ /dev/null
-/*
- * Copyright (c) 2008, 2009 Nicira Networks.
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at:
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-#include <config.h>
-#include "pinsched.h"
-#include <arpa/inet.h>
-#include <stdlib.h>
-#include "ofpbuf.h"
-#include "openflow/openflow.h"
-#include "poll-loop.h"
-#include "port-array.h"
-#include "queue.h"
-#include "random.h"
-#include "rconn.h"
-#include "status.h"
-#include "timeval.h"
-#include "vconn.h"
-
-struct pinsched {
- /* Client-supplied parameters. */
- int rate_limit; /* Packets added to bucket per second. */
- int burst_limit; /* Maximum token bucket size, in packets. */
-
- /* One queue per physical port. */
- struct port_array queues; /* Array of "struct ovs_queue *". */
- int n_queued; /* Sum over queues[*].n. */
- unsigned int last_tx_port; /* Last port checked in round-robin. */
-
- /* Token bucket.
- *
- * It costs 1000 tokens to send a single packet_in message. A single token
- * per message would be more straightforward, but this choice lets us avoid
- * round-off error in refill_bucket()'s calculation of how many tokens to
- * add to the bucket, since no division step is needed. */
- long long int last_fill; /* Time at which we last added tokens. */
- int tokens; /* Current number of tokens. */
-
- /* Transmission queue. */
- int n_txq; /* No. of packets waiting in rconn for tx. */
-
- /* Statistics reporting. */
- unsigned long long n_normal; /* # txed w/o rate limit queuing. */
- unsigned long long n_limited; /* # queued for rate limiting. */
- unsigned long long n_queue_dropped; /* # dropped due to queue overflow. */
-
- /* Switch status. */
- struct status_category *ss_cat;
-};
-
-static struct ofpbuf *
-dequeue_packet(struct pinsched *ps, struct ovs_queue *q,
- unsigned int port_no)
-{
- struct ofpbuf *packet = queue_pop_head(q);
- if (!q->n) {
- free(q);
- port_array_set(&ps->queues, port_no, NULL);
- }
- ps->n_queued--;
- return packet;
-}
-
-/* Drop a packet from the longest queue in 'ps'. */
-static void
-drop_packet(struct pinsched *ps)
-{
- struct ovs_queue *longest; /* Queue currently selected as longest. */
- int n_longest; /* # of queues of same length as 'longest'. */
- unsigned int longest_port_no;
- unsigned int port_no;
- struct ovs_queue *q;
-
- ps->n_queue_dropped++;
-
- longest = port_array_first(&ps->queues, &port_no);
- longest_port_no = port_no;
- n_longest = 1;
- while ((q = port_array_next(&ps->queues, &port_no)) != NULL) {
- if (longest->n < q->n) {
- longest = q;
- longest_port_no = port_no;
- n_longest = 1;
- } else if (longest->n == q->n) {
- n_longest++;
-
- /* Randomly select one of the longest queues, with a uniform
- * distribution (Knuth algorithm 3.4.2R). */
- if (!random_range(n_longest)) {
- longest = q;
- longest_port_no = port_no;
- }
- }
- }
-
- /* FIXME: do we want to pop the tail instead? */
- ofpbuf_delete(dequeue_packet(ps, longest, longest_port_no));
-}
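drop_packet() selects the longest queue in a single pass and breaks ties uniformly at random with reservoir sampling. When the k-th queue of the current maximum length is seen, it replaces the current choice with probability 1/k. The sketch below applies the same technique to an array of queue lengths. The helper name is hypothetical. `rand() % k` stands in for random_range(k) and carries a slight modulo bias. Because the sketch keeps a single index, the chosen queue and its position cannot get out of sync.

```c
#include <stdlib.h>

/* Returns the index of a longest entry in 'lens' (n >= 1), choosing
 * uniformly among ties. */
static int
sketch_pick_longest(const int *lens, int n)
{
    int longest = 0;     /* Index currently selected. */
    int n_longest = 1;   /* Entries seen with that length so far. */

    for (int i = 1; i < n; i++) {
        if (lens[longest] < lens[i]) {
            longest = i;             /* Strictly longer: restart the count. */
            n_longest = 1;
        } else if (lens[longest] == lens[i]) {
            n_longest++;             /* k-th tie wins with probability 1/k. */
            if (rand() % n_longest == 0) {
                longest = i;
            }
        }
    }
    return longest;
}
```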
-
-/* Remove and return the next packet to transmit (in round-robin order). */
-static struct ofpbuf *
-get_tx_packet(struct pinsched *ps)
-{
- struct ovs_queue *q = port_array_next(&ps->queues, &ps->last_tx_port);
- if (!q) {
- q = port_array_first(&ps->queues, &ps->last_tx_port);
- }
- return dequeue_packet(ps, q, ps->last_tx_port);
-}
-
-/* Add tokens to the bucket based on elapsed time. */
-static void
-refill_bucket(struct pinsched *ps)
-{
- long long int now = time_msec();
- long long int tokens = (now - ps->last_fill) * ps->rate_limit + ps->tokens;
- if (tokens >= 1000) {
- ps->last_fill = now;
- ps->tokens = MIN(tokens, ps->burst_limit * 1000);
- }
-}
-
-/* Attempts to remove enough tokens from 'ps' to transmit a packet. Returns
- * true if successful, false otherwise. (In the latter case no tokens are
- * removed.) */
-static bool
-get_token(struct pinsched *ps)
-{
- if (ps->tokens >= 1000) {
- ps->tokens -= 1000;
- return true;
- } else {
- return false;
- }
-}
-
-void
-pinsched_send(struct pinsched *ps, uint16_t port_no,
- struct ofpbuf *packet, pinsched_tx_cb *cb, void *aux)
-{
- if (!ps) {
- cb(packet, aux);
- } else if (!ps->n_queued && get_token(ps)) {
- /* In the common case where we are not constrained by the rate limit,
- * let the packet take the normal path. */
- ps->n_normal++;
- cb(packet, aux);
- } else {
- /* Otherwise queue it up for the periodic callback to drain out. */
- struct ovs_queue *q;
-
- /* We are called with a buffer obtained from dpif_recv() that has much
- * more allocated space than actual content most of the time. Since
- * we're going to store the packet for some time, free up that
- * otherwise wasted space. */
- ofpbuf_trim(packet);
-
- if (ps->n_queued >= ps->burst_limit) {
- drop_packet(ps);
- }
- q = port_array_get(&ps->queues, port_no);
- if (!q) {
- q = xmalloc(sizeof *q);
- queue_init(q);
- port_array_set(&ps->queues, port_no, q);
- }
- queue_push_tail(q, packet);
- ps->n_queued++;
- ps->n_limited++;
- }
-}
-
-static void
-pinsched_status_cb(struct status_reply *sr, void *ps_)
-{
- struct pinsched *ps = ps_;
-
- status_reply_put(sr, "normal=%llu", ps->n_normal);
- status_reply_put(sr, "limited=%llu", ps->n_limited);
- status_reply_put(sr, "queue-dropped=%llu", ps->n_queue_dropped);
-}
-
-void
-pinsched_run(struct pinsched *ps, pinsched_tx_cb *cb, void *aux)
-{
- if (ps) {
- int i;
-
- /* Drain some packets out of the bucket if possible, but limit the
- * number of iterations to allow other code to get work done too. */
- refill_bucket(ps);
- for (i = 0; ps->n_queued && get_token(ps) && i < 50; i++) {
- cb(get_tx_packet(ps), aux);
- }
- }
-}
-
-void
-pinsched_wait(struct pinsched *ps)
-{
- if (ps && ps->n_queued) {
- if (ps->tokens >= 1000) {
- /* We can transmit more packets as soon as we're called again. */
- poll_immediate_wake();
- } else {
- /* We have to wait for the bucket to re-fill. We could calculate
- * the exact amount of time here for increased smoothness. */
- poll_timer_wait(TIME_UPDATE_INTERVAL / 2);
- }
- }
-}
-
-/* Creates and returns a scheduler for sending packet-in messages. */
-struct pinsched *
-pinsched_create(int rate_limit, int burst_limit, struct switch_status *ss)
-{
- struct pinsched *ps;
-
- ps = xcalloc(1, sizeof *ps);
- port_array_init(&ps->queues);
- ps->n_queued = 0;
- ps->last_tx_port = PORT_ARRAY_SIZE;
- ps->last_fill = time_msec();
- ps->tokens = rate_limit * 100;
- ps->n_txq = 0;
- ps->n_normal = 0;
- ps->n_limited = 0;
- ps->n_queue_dropped = 0;
- pinsched_set_limits(ps, rate_limit, burst_limit);
-
- if (ss) {
- ps->ss_cat = switch_status_register(ss, "rate-limit",
- pinsched_status_cb, ps);
- }
-
- return ps;
-}
-
-void
-pinsched_destroy(struct pinsched *ps)
-{
- if (ps) {
- struct ovs_queue *queue;
- unsigned int port_no;
-
- PORT_ARRAY_FOR_EACH (queue, &ps->queues, port_no) {
- queue_destroy(queue);
- free(queue);
- }
- port_array_destroy(&ps->queues);
- switch_status_unregister(ps->ss_cat);
- free(ps);
- }
-}
-
-void
-pinsched_set_limits(struct pinsched *ps, int rate_limit, int burst_limit)
-{
- if (rate_limit <= 0) {
- rate_limit = 1000;
- }
- if (burst_limit <= 0) {
- burst_limit = rate_limit / 4;
- }
- burst_limit = MAX(burst_limit, 1);
- burst_limit = MIN(burst_limit, INT_MAX / 1000);
-
- ps->rate_limit = rate_limit;
- ps->burst_limit = burst_limit;
- while (ps->n_queued > burst_limit) {
- drop_packet(ps);
- }
-}
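The token bucket in refill_bucket() and get_token() above counts credit in thousandths of a packet. The sketch below reproduces only that arithmetic so it can be exercised on its own. It is not the pinsched API: struct token_bucket, tb_init(), tb_refill() and tb_take() are hypothetical names, and the caller is assumed to pass the current time in milliseconds (the original reads it from time_msec()).

```c
#include <stdbool.h>

/* Hypothetical stand-alone token bucket mirroring the arithmetic of
 * refill_bucket() and get_token().  Credit is kept in milli-tokens:
 * sending one packet costs 1000, and each elapsed millisecond adds 'rate'
 * milli-tokens, i.e. 'rate' packets per second. */
struct token_bucket {
    long long rate;       /* Packets per second. */
    long long burst;      /* Maximum stored credit, in packets. */
    long long tokens;     /* Current credit, in milli-tokens. */
    long long last_fill;  /* Time of the last refill, in milliseconds. */
};

static void
tb_init(struct token_bucket *tb, long long rate, long long burst,
        long long now_ms)
{
    tb->rate = rate;
    tb->burst = burst;
    tb->tokens = 0;
    tb->last_fill = now_ms;
}

/* Credits the time elapsed since the last refill, capped at 'burst'
 * packets.  As in refill_bucket(), 'last_fill' advances only once at least
 * one whole packet of credit is available, so a series of short intervals
 * is still credited in full once it adds up to a packet. */
static void
tb_refill(struct token_bucket *tb, long long now_ms)
{
    long long tokens = (now_ms - tb->last_fill) * tb->rate + tb->tokens;

    if (tokens >= 1000) {
        long long max = tb->burst * 1000;

        tb->last_fill = now_ms;
        tb->tokens = tokens < max ? tokens : max;
    }
}

/* Spends one packet of credit and returns true, or returns false and
 * leaves the bucket unchanged, as get_token() does. */
static bool
tb_take(struct token_bucket *tb)
{
    if (tb->tokens >= 1000) {
        tb->tokens -= 1000;
        return true;
    }
    return false;
}
```

With a rate of 1000 and a burst of 250 (the rate/4 default that pinsched_set_limits() picks), a bucket left idle for ten seconds permits a burst of 250 packets, after which each further millisecond earns one more packet.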
+++ /dev/null
-/*
- * Copyright (c) 2008, 2009 Nicira Networks.
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at:
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-#ifndef PINSCHED_H
-#define PINSCHED_H 1
-
-#include <stdint.h>
-
-struct ofpbuf;
-struct switch_status;
-
-typedef void pinsched_tx_cb(struct ofpbuf *, void *aux);
-struct pinsched *pinsched_create(int rate_limit, int burst_limit,
- struct switch_status *);
-void pinsched_set_limits(struct pinsched *, int rate_limit, int burst_limit);
-void pinsched_destroy(struct pinsched *);
-void pinsched_send(struct pinsched *, uint16_t port_no, struct ofpbuf *,
- pinsched_tx_cb *, void *aux);
-void pinsched_run(struct pinsched *, pinsched_tx_cb *, void *aux);
-void pinsched_wait(struct pinsched *);
-
-#endif /* pinsched.h */
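drop_packet() in pinsched.c breaks ties between equally long queues with Knuth's Algorithm R (reservoir sampling with a reservoir of one): when the k-th candidate tied for the maximum is seen, it replaces the current choice with probability 1/k. The sketch below applies the same idea to an int array. argmax_uniform_ties(), rand_range(), pick_first_tie() and pick_last_tie() are hypothetical names; the random source is passed in so the selection can be made deterministic for testing. Note that the index is updated together with the maximum in both branches.

```c
#include <stddef.h>
#include <stdlib.h>

/* Returns the index of a maximal element of v[0..n-1] (n >= 1), chosen
 * uniformly among ties.  rnd(k, aux) must return an integer in [0, k). */
static size_t
argmax_uniform_ties(const int *v, size_t n,
                    unsigned int (*rnd)(unsigned int k, void *aux), void *aux)
{
    size_t best = 0;
    unsigned int n_best = 1;   /* Elements seen so far equal to v[best]. */
    size_t i;

    for (i = 1; i < n; i++) {
        if (v[i] > v[best]) {
            best = i;          /* New maximum: restart the tie count. */
            n_best = 1;
        } else if (v[i] == v[best]) {
            n_best++;
            if (rnd(n_best, aux) == 0) {   /* Probability 1/n_best. */
                best = i;
            }
        }
    }
    return best;
}

/* One possible 'rnd' built on rand(); the modulo introduces a slight bias
 * unless RAND_MAX + 1 is a multiple of 'k'. */
static unsigned int
rand_range(unsigned int k, void *aux)
{
    (void) aux;
    return (unsigned int) rand() % k;
}

/* Deterministic stand-ins for testing: pick_first_tie() never replaces the
 * current choice, pick_last_tie() always does. */
static unsigned int
pick_first_tie(unsigned int k, void *aux)
{
    (void) aux;
    return k - 1;
}

static unsigned int
pick_last_tie(unsigned int k, void *aux)
{
    (void) k;
    (void) aux;
    return 0;
}
```

Each candidate survives later replacements with probability (1/k) * (k/(k+1)) * ... = 1/n, which is why a single pass suffices and no second scan over the queues is needed.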
+++ /dev/null
-/*
- * Copyright (c) 2008, 2009 Nicira Networks.
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at:
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-#include <config.h>
-#include "pktbuf.h"
-#include <inttypes.h>
-#include <stdlib.h>
-#include "coverage.h"
-#include "ofpbuf.h"
-#include "timeval.h"
-#include "util.h"
-#include "vconn.h"
-
-#define THIS_MODULE VLM_pktbuf
-#include "vlog.h"
-
-/* Buffers are identified by a 32-bit opaque ID. We divide the ID
- * into a buffer number (low bits) and a cookie (high bits). The buffer number
- * is an index into an array of buffers. The cookie distinguishes between
- * different packets that have occupied a single buffer. Thus, the more
- * buffers we have, the lower-quality the cookie... */
-#define PKTBUF_BITS 8
-#define PKTBUF_MASK (PKTBUF_CNT - 1)
-#define PKTBUF_CNT (1u << PKTBUF_BITS)
-
-#define COOKIE_BITS (32 - PKTBUF_BITS)
-#define COOKIE_MAX ((1u << COOKIE_BITS) - 1)
-
-#define OVERWRITE_MSECS 5000
-
-struct packet {
- struct ofpbuf *buffer;
- uint32_t cookie;
- long long int timeout;
- uint16_t in_port;
-};
-
-struct pktbuf {
- struct packet packets[PKTBUF_CNT];
- unsigned int buffer_idx;
- unsigned int null_idx;
-};
-
-int
-pktbuf_capacity(void)
-{
- return PKTBUF_CNT;
-}
-
-struct pktbuf *
-pktbuf_create(void)
-{
- return xcalloc(1, sizeof(struct pktbuf));
-}
-
-void
-pktbuf_destroy(struct pktbuf *pb)
-{
- if (pb) {
- size_t i;
-
- for (i = 0; i < PKTBUF_CNT; i++) {
- ofpbuf_delete(pb->packets[i].buffer);
- }
- free(pb);
- }
-}
-
-static unsigned int
-make_id(unsigned int buffer_idx, unsigned int cookie)
-{
- return buffer_idx | (cookie << PKTBUF_BITS);
-}
-
-/* Attempts to allocate an OpenFlow packet buffer id within 'pb'. The packet
- * buffer will store a copy of 'buffer' and the port number 'in_port', which
- * should be the datapath port number on which 'buffer' was received.
- *
- * If successful, returns the packet buffer id (a number other than
- * UINT32_MAX). pktbuf_retrieve() can later be used to retrieve the buffer and
- * its input port number (buffers do expire after a time, so this is not
- * guaranteed to be true forever). On failure, returns UINT32_MAX.
- *
- * The caller retains ownership of 'buffer'. */
-uint32_t
-pktbuf_save(struct pktbuf *pb, struct ofpbuf *buffer, uint16_t in_port)
-{
- struct packet *p = &pb->packets[pb->buffer_idx];
- pb->buffer_idx = (pb->buffer_idx + 1) & PKTBUF_MASK;
- if (p->buffer) {
- if (time_msec() < p->timeout) {
- return UINT32_MAX;
- }
- ofpbuf_delete(p->buffer);
- }
-
- /* Don't use maximum cookie value since all-1-bits ID is special. */
- if (++p->cookie >= COOKIE_MAX) {
- p->cookie = 0;
- }
- p->buffer = ofpbuf_clone(buffer);
- p->timeout = time_msec() + OVERWRITE_MSECS;
- p->in_port = in_port;
- return make_id(p - pb->packets, p->cookie);
-}
-
-/*
- * Allocates and returns a "null" packet buffer id. The returned packet buffer
- * id is considered valid by pktbuf_retrieve(), but it is not associated with
- * actual buffered data.
- *
- * This function is always successful.
- *
- * This is useful in one special case: with the current OpenFlow design, the
- * "fail-open" code cannot always know whether a connection to a controller is
- * actually valid until it receives an OFPT_PACKET_OUT or OFPT_FLOW_MOD request,
- * but at that point the packet in question has already been forwarded (since
- * we are still in "fail-open" mode). If the packet was buffered in the usual
- * way, then the OFPT_PACKET_OUT or OFPT_FLOW_MOD would cause a duplicate
- * packet in the network. Null packet buffer ids identify such a packet that
- * has already been forwarded, so that Open vSwitch can quietly ignore the
- * request to re-send it. (After that happens, the switch exits fail-open
- * mode.)
- *
- * See the top-level comment in fail-open.c for an overview.
- */
-uint32_t
-pktbuf_get_null(void)
-{
- return make_id(0, COOKIE_MAX);
-}
-
-/* Attempts to retrieve a saved packet with the given 'id' from 'pb'. Returns
- * 0 if successful, otherwise an OpenFlow error code constructed with
- * ofp_mkerr().
- *
- * On success, ordinarily stores the buffered packet in '*bufferp' and the
- * datapath port number on which the packet was received in '*in_port'. The
- * caller becomes responsible for freeing the buffer. However, if 'id'
- * identifies a "null" packet buffer (created with pktbuf_get_null()), stores
- * NULL in '*bufferp' and -1 in '*in_port'.
- *
- * On failure, stores NULL in '*bufferp' and -1 in '*in_port'. */
-int
-pktbuf_retrieve(struct pktbuf *pb, uint32_t id, struct ofpbuf **bufferp,
- uint16_t *in_port)
-{
- static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 20);
- struct packet *p;
- int error;
-
- if (!pb) {
- VLOG_WARN_RL(&rl, "attempt to send buffered packet via connection "
- "without buffers");
- *bufferp = NULL;
- *in_port = -1;
- return ofp_mkerr(OFPET_BAD_REQUEST, OFPBRC_BAD_COOKIE);
- }
-
- p = &pb->packets[id & PKTBUF_MASK];
- if (p->cookie == id >> PKTBUF_BITS) {
- struct ofpbuf *buffer = p->buffer;
- if (buffer) {
- *bufferp = buffer;
- *in_port = p->in_port;
- p->buffer = NULL;
- COVERAGE_INC(pktbuf_retrieved);
- return 0;
- } else {
- COVERAGE_INC(pktbuf_reuse_error);
- VLOG_WARN_RL(&rl, "attempt to reuse buffer %08"PRIx32, id);
- error = ofp_mkerr(OFPET_BAD_REQUEST, OFPBRC_BUFFER_EMPTY);
- }
- } else if (id >> PKTBUF_BITS != COOKIE_MAX) {
- COVERAGE_INC(pktbuf_bad_cookie);
- VLOG_WARN_RL(&rl, "cookie mismatch: %08"PRIx32" != %08"PRIx32,
- id, (id & PKTBUF_MASK) | (p->cookie << PKTBUF_BITS));
- error = ofp_mkerr(OFPET_BAD_REQUEST, OFPBRC_BAD_COOKIE);
- } else {
- COVERAGE_INC(pktbuf_null_cookie);
- VLOG_INFO_RL(&rl, "Received null cookie %08"PRIx32" (this is normal "
- "if the switch was recently in fail-open mode)", id);
- error = 0;
- }
- *bufferp = NULL;
- *in_port = -1;
- return error;
-}
-
-void
-pktbuf_discard(struct pktbuf *pb, uint32_t id)
-{
- struct packet *p = &pb->packets[id & PKTBUF_MASK];
- if (p->cookie == id >> PKTBUF_BITS) {
- ofpbuf_delete(p->buffer);
- p->buffer = NULL;
- }
-}
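The buffer-id layout described in the comment above PKTBUF_BITS can be checked in isolation. The sketch below reuses the macro definitions and make_id() from pktbuf.c; id_index(), id_cookie(), next_cookie() and is_null_id() are hypothetical helpers that name expressions pktbuf_save(), pktbuf_get_null() and pktbuf_retrieve() compute inline.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same layout as pktbuf.c: low 8 bits select one of 256 slots, the high
 * 24 bits hold a per-slot cookie. */
#define PKTBUF_BITS 8
#define PKTBUF_CNT  (1u << PKTBUF_BITS)
#define PKTBUF_MASK (PKTBUF_CNT - 1)
#define COOKIE_BITS (32 - PKTBUF_BITS)
#define COOKIE_MAX  ((1u << COOKIE_BITS) - 1)

static uint32_t
make_id(uint32_t buffer_idx, uint32_t cookie)
{
    return buffer_idx | (cookie << PKTBUF_BITS);
}

/* Hypothetical helpers: split an id back into slot and cookie. */
static uint32_t id_index(uint32_t id)  { return id & PKTBUF_MASK; }
static uint32_t id_cookie(uint32_t id) { return id >> PKTBUF_BITS; }

/* Hypothetical helper: advances a slot's cookie the way pktbuf_save() does.
 * COOKIE_MAX is never handed out, so a saved packet's id can equal neither
 * the "null" id make_id(0, COOKIE_MAX) nor the all-1-bits UINT32_MAX that
 * pktbuf_save() returns on failure. */
static uint32_t
next_cookie(uint32_t cookie)
{
    cookie++;
    return cookie >= COOKIE_MAX ? 0 : cookie;
}

/* Hypothetical helper: the test pktbuf_retrieve() uses to recognize an id
 * from pktbuf_get_null() once the slot's cookie does not match. */
static bool
is_null_id(uint32_t id)
{
    return id_cookie(id) == COOKIE_MAX;
}
```

Because pktbuf_create() zeroes every slot and pktbuf_save() increments the cookie before using it, the first id issued for a slot carries cookie 1.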
+++ /dev/null
-/*
- * Copyright (c) 2008, 2009 Nicira Networks.
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at:
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-#ifndef PKTBUF_H
-#define PKTBUF_H 1
-
-#include <stdint.h>
-
-struct pktbuf;
-struct ofpbuf;
-
-int pktbuf_capacity(void);
-
-struct pktbuf *pktbuf_create(void);
-void pktbuf_destroy(struct pktbuf *);
-uint32_t pktbuf_save(struct pktbuf *, struct ofpbuf *buffer, uint16_t in_port);
-uint32_t pktbuf_get_null(void);
-int pktbuf_retrieve(struct pktbuf *, uint32_t id, struct ofpbuf **bufferp,
- uint16_t *in_port);
-void pktbuf_discard(struct pktbuf *, uint32_t id);
-
-#endif /* pktbuf.h */
+++ /dev/null
-.TH secchan 8 "March 2009" "Open vSwitch" "Open vSwitch Manual"
-.ds PN secchan
-
-.SH NAME
-secchan \- OpenFlow switch implementation
-
-.SH SYNOPSIS
-.B secchan
-[\fIoptions\fR] \fIdatapath\fR [\fIcontroller\fR]
-
-.SH DESCRIPTION
-The \fBsecchan\fR program implements an OpenFlow switch using a
-flow-based datapath. \fBsecchan\fR connects to an OpenFlow controller
-over TCP or SSL.
-
-The mandatory \fIdatapath\fR argument specifies the local datapath
-to relay. It takes one of the following forms:
-
-.so lib/dpif.man
-
-.PP
-The optional \fIcontroller\fR argument specifies how to connect to
-the OpenFlow controller. It takes one of the following forms:
-
-.RS
-.IP "\fBssl:\fIip\fR[\fB:\fIport\fR]"
-The specified SSL \fIport\fR (default: 6633) on the host at the given
-\fIip\fR, which must be expressed as an IP address (not a DNS name).
-The \fB--private-key\fR, \fB--certificate\fR, and \fB--ca-cert\fR
-options are mandatory when this form is used.
-
-.IP "\fBtcp:\fIip\fR[\fB:\fIport\fR]"
-The specified TCP \fIport\fR (default: 6633) on the host at the given
-\fIip\fR, which must be expressed as an IP address (not a DNS name).
-
-.TP
-\fBunix:\fIfile\fR
-The Unix domain server socket named \fIfile\fR.
-.RE
-
-.PP
-If \fIcontroller\fR is omitted, \fBsecchan\fR attempts to discover the
-location of the controller automatically (see below).
-
-.SS "Contacting the Controller"
-The OpenFlow switch must be able to contact the OpenFlow controller
-over the network. It can do so in one of two ways:
-
-.IP out-of-band
-In this configuration, OpenFlow traffic uses a network separate from
-the data traffic that it controls, that is, the switch does not use
-any of the network devices added to the datapath with \fBovs\-dpctl
-add\-if\fR in its communication with the controller.
-
-To use \fBsecchan\fR in a network with out-of-band control, specify
-\fB--out-of-band\fR on the \fBsecchan\fR command line. The control
-network must be configured separately, before or after \fBsecchan\fR
-is started.
-
-.IP in-band
-In this configuration, a single network is used for OpenFlow traffic
-and other data traffic, that is, the switch contacts the controller
-over one of the network devices added to the datapath with \fBovs\-dpctl
-add\-if\fR. This configuration is often more convenient than
-out-of-band control, because it is not necessary to maintain two
-independent networks.
-
-In-band control is the default for \fBsecchan\fR, so no special
-command-line option is required.
-
-With in-band control, the location of the controller can be configured
-manually or discovered automatically:
-
-.RS
-.IP "controller discovery"
-To make \fBsecchan\fR discover the location of the controller
-automatically, do not specify the location of the controller on the
-\fBsecchan\fR command line.
-
-In this mode, \fBsecchan\fR will broadcast a DHCP request with vendor
-class identifier \fBOpenFlow\fR across the network devices added to
-the datapath with \fBovs\-dpctl add\-if\fR. It will accept any valid DHCP
-reply that has the same vendor class identifier and includes a
-vendor-specific option with code 1 whose contents are a string
-specifying the location of the controller in the same format used on
-the \fBsecchan\fR command line (e.g. \fBssl:192.168.0.1\fR).
-
-The DHCP reply may also, optionally, include a vendor-specific option
-with code 2 whose contents are a string specifying the URI to the base
-of the OpenFlow PKI (e.g. \fBhttp://192.168.0.1/openflow/pki\fR).
-This URI is used only for bootstrapping the OpenFlow PKI at initial
-switch setup; \fBsecchan\fR does not use it at all.
-
-The following ISC DHCP server configuration file assigns the IP
-address range 192.168.0.20 through 192.168.0.30 to OpenFlow switches
-that follow the switch protocol and addresses 192.168.0.1 through
-192.168.0.10 to all other DHCP clients:
-
-default-lease-time 600;
-.br
-max-lease-time 7200;
-.br
-option space openflow;
-.br
-option openflow.controller-vconn code 1 = text;
-.br
-option openflow.pki-uri code 2 = text;
-.br
-class "OpenFlow" {
-.br
- match if option vendor-class-identifier = "OpenFlow";
-.br
- vendor-option-space openflow;
-.br
- option openflow.controller-vconn "tcp:192.168.0.10";
-.br
- option openflow.pki-uri "http://192.168.0.10/openflow/pki";
-.br
- option vendor-class-identifier "OpenFlow";
-.br
-}
-.br
-subnet 192.168.0.0 netmask 255.255.255.0 {
-.br
- pool {
-.br
- allow members of "OpenFlow";
-.br
- range 192.168.0.20 192.168.0.30;
-.br
- }
-.br
- pool {
-.br
- deny members of "OpenFlow";
-.br
- range 192.168.0.1 192.168.0.10;
-.br
- }
-.br
-}
-.br
-
-.IP "manual configuration"
-To configure in-band control manually, specify the location of the
-controller on the \fBsecchan\fR command line as the \fIcontroller\fR
-argument. You must also configure the network device for the OpenFlow
-``local port'' to allow \fBsecchan\fR to connect to that controller.
-The OpenFlow local port is a virtual network port that \fBsecchan\fR
-bridges to the physical switch ports. The name of the local port for
-a given \fIdatapath\fR may be seen by running \fBovs\-dpctl show
-\fIdatapath\fR; the local port is listed as port 0 in \fBshow\fR's
-output.
-
-.IP
-Before \fBsecchan\fR starts, the local port network device is not
-bridged to any physical network, so the next step depends on whether
-connectivity is required to configure the device's IP address. If the
-switch has a static IP address, you may configure its IP address now
-with a command such as
-.B ifconfig of0 192.168.1.1
-and then invoke \fBsecchan\fR.
-
-On the other hand, if the switch does not have a static IP address,
-e.g. it obtains its IP address dynamically via DHCP, the DHCP client
-will not be able to contact the DHCP server until the secure channel
-has started up. Thus, start \fBsecchan\fR without configuring
-the local port network device, and start the DHCP client afterward.
-.RE
-
-.SH OPTIONS
-.SS "Controller Discovery Options"
-.TP
-\fB--accept-vconn=\fIregex\fR
-When \fBsecchan\fR performs controller discovery (see \fBContacting
-the Controller\fR, above, for more information about controller
-discovery), it validates the controller location obtained via DHCP
-with a POSIX extended regular expression. Only controllers whose
-names match the regular expression will be accepted.
-
-The default regular expression is \fBssl:.*\fR (meaning that only SSL
-controller connections will be accepted) when any of the SSL
-configuration options \fB--private-key\fR, \fB--certificate\fR, or
-\fB--ca-cert\fR is specified. The default is \fB.*\fR otherwise
-(meaning that any controller will be accepted).
-
-The \fIregex\fR is implicitly anchored at the beginning of the
-controller location string, as if it begins with \fB^\fR.
-
-When controller discovery is not performed, this option has no effect.
-
-.TP
-\fB--no-resolv-conf\fR
-When \fBsecchan\fR performs controller discovery (see \fBContacting
-the Controller\fR, above, for more information about controller
-discovery), by default it overwrites the system's
-\fB/etc/resolv.conf\fR with domain information and DNS servers
-obtained via DHCP. If the location of the controller is specified
-using a hostname, rather than an IP address, and the network's DNS
-servers ever change, this behavior is essential. Because it also
-interferes with any administrator or process that manages
-\fB/etc/resolv.conf\fR, this option disables it: when it is specified,
-\fBsecchan\fR does not modify \fB/etc/resolv.conf\fR.
-
-\fBsecchan\fR will only modify \fBresolv.conf\fR if the DHCP response
-that it receives specifies one or more DNS servers.
-
-When controller discovery is not performed, this option has no effect.
-
-.SS "Networking Options"
-.TP
-\fB--datapath-id=\fIdpid\fR
-Sets \fIdpid\fR, which must consist of exactly 12 hexadecimal digits,
-as the datapath ID that the switch will use to identify itself to the
-OpenFlow controller.
-
-If this option is omitted, the default datapath ID is taken from the
-Ethernet address of the datapath's local port (which is typically
-randomly generated).
-
-.TP
-\fB--mgmt-id=\fImgmtid\fR
-Sets \fImgmtid\fR, which must consist of exactly 12 hexadecimal
-digits, as the switch's management ID.
-
-If this option is omitted, the management ID defaults to 0, signaling
-to the controller that management is supported but not configured.
-
-.TP
-\fB--fail=\fR[\fBopen\fR|\fBclosed\fR]
-The controller is, ordinarily, responsible for setting up all flows on
-the OpenFlow switch. Thus, if the connection to the controller fails,
-no new network connections can be set up. If the connection to the
-controller stays down long enough, no packets can pass through the
-switch at all.
-
-If this option is set to \fBopen\fR (the default), \fBsecchan\fR will
-take over responsibility for setting up flows in the local datapath
-when no message has been received from the controller for three times
-the inactivity probe interval (see below), or 15 seconds by default.
-In this ``fail open'' mode, \fBsecchan\fR causes the datapath to act
-like an ordinary MAC-learning switch. \fBsecchan\fR will continue to
-retry connection to the controller in the background and, when the
-connection succeeds, it discontinues its fail-open behavior.
-
-If this option is set to \fBclosed\fR, then \fBsecchan\fR will not
-set up flows on its own when the controller connection fails.
-
-.TP
-\fB--inactivity-probe=\fIsecs\fR
-When the secure channel is connected to the controller, the secure
-channel waits for a message to be received from the controller for
-\fIsecs\fR seconds before it sends an inactivity probe to the
-controller. After sending the inactivity probe, if no response is
-received for an additional \fIsecs\fR seconds, the secure channel
-assumes that the connection has been broken and attempts to reconnect.
-The default and the minimum value are both 5 seconds.
-
-When fail-open mode is configured, changing the inactivity probe
-interval also changes the interval before entering fail-open mode (see
-above).
-
-.TP
-\fB--max-idle=\fIsecs\fR|\fBpermanent\fR
-Sets \fIsecs\fR as the number of seconds that a flow set up by the
-secure channel will remain in the switch's flow table without any
-matching packets being seen. If \fBpermanent\fR is specified, which
-is not recommended, flows set up by the secure channel will never
-expire. The default is 15 seconds.
-
-Most flows are set up by the OpenFlow controller, not by the secure
-channel. This option affects only the following flows, which the
-secure channel sets up itself:
-
-.RS
-.IP \(bu
-When \fB--fail=open\fR is specified, flows set up when the secure
-channel has not been able to contact the controller for the configured
-fail-open delay.
-
-.IP \(bu
-When in-band control is in use, flows set up to bootstrap contacting
-the controller (see \fBContacting the Controller\fR, above, for
-more information about in-band control).
-.RE
-
-.IP
-As a result, when both \fB--fail=closed\fR and \fB--out-of-band\fR are
-specified, this option has no effect.
-
-.TP
-\fB--max-backoff=\fIsecs\fR
-Sets the maximum time between attempts to connect to the controller to
-\fIsecs\fR, which must be at least 1. The actual interval between
-connection attempts starts at 1 second and doubles on each failing
-attempt until it reaches the maximum. The default maximum backoff
-time is 8 seconds.
-
-.TP
-\fB-l\fR, \fB--listen=\fImethod\fR
-Configures the switch to additionally listen for incoming OpenFlow
-connections for switch management with \fBovs\-ofctl\fR. The \fImethod\fR
-must be given as one of the passive OpenFlow connection methods listed
-below. This option may be specified multiple times to listen to
-multiple connection methods.
-
-.RS
-.TP
-\fBpssl:\fR[\fIport\fR]
-Listens for SSL connections on \fIport\fR (default: 6633). The
-\fB--private-key\fR, \fB--certificate\fR, and \fB--ca-cert\fR options
-are mandatory when this form is used.
-
-.TP
-\fBptcp:\fR[\fIport\fR]
-Listens for TCP connections on \fIport\fR (default: 6633).
-
-.TP
-\fBpunix:\fIfile\fR
-Listens for connections on Unix domain server socket named \fIfile\fR.
-.RE
-
-.TP
-\fB--snoop=\fImethod\fR
-Configures the switch to additionally listen for incoming OpenFlow
-connections for controller connection snooping. The \fImethod\fR must
-be given as one of the passive OpenFlow connection methods listed
-under the \fB--listen\fR option above. This option may be specified
-multiple times to listen to multiple connection methods.
-
-If \fBovs\-ofctl monitor\fR is used to connect to a \fImethod\fR specified on
-\fB--snoop\fR, it will display all the OpenFlow messages traveling
-between the switch and its controller on the primary OpenFlow
-connection. This can be useful for debugging switch and controller
-problems.
-
-.TP
-\fB--in-band\fR, \fB--out-of-band\fR
-Configures \fBsecchan\fR to operate in in-band or out-of-band control
-mode (see \fBContacting the Controller\fR above). When neither option
-is given, the default is in-band control.
-
-.TP
-\fB--netflow=\fIip\fB:\fIport\fR
-Configures the given UDP \fIport\fR on the specified IP \fIip\fR as
-a recipient of NetFlow messages for expired flows. The \fIip\fR must
-be specified numerically, not as a DNS name.
-
-This option may be specified multiple times to configure additional
-NetFlow collectors.
-
-.SS "Rate-Limiting Options"
-
-These options configure how the switch applies a ``token bucket'' to
-limit the rate at which packets in unknown flows are forwarded to an
-OpenFlow controller for flow-setup processing. This feature prevents
-a single OpenFlow switch from overwhelming a controller.
-
-.TP
-\fB--rate-limit\fR[\fB=\fIrate\fR]
-.
-Limits the maximum rate at which packets will be forwarded to the
-OpenFlow controller to \fIrate\fR packets per second. If \fIrate\fR
-is not specified then the default of 1,000 packets per second is used.
-
-If \fB--rate-limit\fR is not used, then the switch does not limit the
-rate at which packets are forwarded to the controller.
-
-.TP
-\fB--burst-limit=\fIburst\fR
-.
-Sets the maximum number of unused packet credits that the switch will
-allow to accumulate during time in which no packets are being
-forwarded to the OpenFlow controller to \fIburst\fR (measured in
-packets). The default \fIburst\fR is one-quarter of the \fIrate\fR
-specified on \fB--rate-limit\fR.
-
-This option takes effect only when \fB--rate-limit\fR is also specified.
-
-.SS "Remote Command Execution Options"
-
-.TP
-\fB--command-acl=\fR[\fB!\fR]\fIglob\fR[\fB,\fR[\fB!\fR]\fIglob\fR...]
-Configures the commands that remote OpenFlow connections are allowed
-to invoke using (e.g.) \fBovs\-ofctl execute\fR. The argument is a
-comma-separated sequence of shell glob patterns. A glob pattern
-specified without a leading \fB!\fR is a ``whitelist'' that specifies
-a set of commands that may be invoked, whereas a pattern that
-does begin with \fB!\fR is a ``blacklist'' that specifies commands
-that may not be invoked. To be permitted, a command name must be
-whitelisted and must not be blacklisted;
-e.g. \fB--command-acl=up*,!upgrade\fR would allow any command whose name
-begins with \fBup\fR except for the command named \fBupgrade\fR.
-Command names that include characters other than upper- and lower-case
-English letters, digits, and the underscore and hyphen characters are
-unconditionally disallowed.
-
-When the whitelist and blacklist permit a command name, \fBsecchan\fR
-looks for a program with the same name as the command in the commands
-directory (see below). Other directories are not searched.
-
-.TP
-\fB--command-dir=\fIdirectory\fR
-Sets the directory searched for remote command execution to
-\fIdirectory\fR. The default directory is
-\fB@pkgdatadir@/commands\fR.
-
-.SS "Daemon Options"
-.so lib/daemon.man
-
-.SS "Public Key Infrastructure Options"
-
-.TP
-\fB-p\fR, \fB--private-key=\fIprivkey.pem\fR
-Specifies a PEM file containing the private key used as the switch's
-identity for SSL connections to the controller.
-
-.TP
-\fB-c\fR, \fB--certificate=\fIcert.pem\fR
-Specifies a PEM file containing a certificate, signed by the
-controller's certificate authority (CA), that certifies the switch's
-private key to identify a trustworthy switch.
-
-.TP
-\fB-C\fR, \fB--ca-cert=\fIcacert.pem\fR
-Specifies a PEM file containing the CA certificate used to verify that
-the switch is connected to a trustworthy controller.
-
-.TP
-\fB--bootstrap-ca-cert=\fIcacert.pem\fR
-When \fIcacert.pem\fR exists, this option has the same effect as
-\fB-C\fR or \fB--ca-cert\fR. If it does not exist, then \fBsecchan\fR
-will attempt to obtain the CA certificate from the controller on its
-first SSL connection and save it to the named PEM file. If it is
-successful, it will immediately drop the connection and reconnect, and
-from then on all SSL connections must be authenticated by a
-certificate signed by the CA certificate thus obtained.
-
-\fBThis option exposes the SSL connection to a man-in-the-middle
-attack obtaining the initial CA certificate\fR, but it may be useful
-for bootstrapping.
-
-This option is only useful if the controller sends its CA certificate
-as part of the SSL certificate chain. The SSL protocol does not
-require the controller to send the CA certificate, but
-\fBovs\-controller\fR(8) can be configured to do so with the
-\fB--peer-ca-cert\fR option.
-
-.SS "Logging Options"
-.so lib/vlog.man
-.SS "Other Options"
-.so lib/common.man
-.so lib/leak-checker.man
-
-.SH "SEE ALSO"
-
-.BR ovs\-appctl (8),
-.BR ovs\-controller (8),
-.BR ovs\-discover (8),
-.BR ovs\-dpctl (8),
-.BR ovs\-ofctl (8),
-.BR ovs\-pki (8),
-.BR ovs\-vswitchd.conf (5)
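The \fB--command-acl\fR rules described in the man page above (a name must match some whitelist glob, must match no blacklist glob, and may contain only ASCII letters, digits, underscores and hyphens) can be illustrated with POSIX fnmatch(3). This is a minimal sketch under those stated rules, not secchan's actual code: command_name_is_valid() and command_is_allowed() are hypothetical names, and rejecting an empty command name is an assumption the man page does not address.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if 'name' consists only of ASCII letters, digits, '_' and
 * '-'.  Rejecting the empty name is an assumption. */
static bool
command_name_is_valid(const char *name)
{
    const char *p;

    if (!*name) {
        return false;
    }
    for (p = name; *p; p++) {
        char c = *p;
        bool ok = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                  || (c >= '0' && c <= '9') || c == '_' || c == '-';
        if (!ok) {
            return false;
        }
    }
    return true;
}

/* Hypothetical check of 'name' against an ACL in --command-acl syntax:
 * comma-separated shell globs, where a leading '!' marks a blacklist entry.
 * Patterns of 256 bytes or more are ignored here for simplicity. */
static bool
command_is_allowed(const char *acl, const char *name)
{
    bool whitelisted = false;
    bool blacklisted = false;
    const char *p = acl;

    if (!command_name_is_valid(name)) {
        return false;
    }
    while (*p) {
        size_t len = strcspn(p, ",");       /* Length of this entry. */
        bool negate = p[0] == '!';
        const char *glob = negate ? p + 1 : p;
        size_t glob_len = negate ? len - 1 : len;
        char pattern[256];

        if (glob_len < sizeof pattern) {
            memcpy(pattern, glob, glob_len);
            pattern[glob_len] = '\0';
            if (fnmatch(pattern, name, 0) == 0) {
                if (negate) {
                    blacklisted = true;
                } else {
                    whitelisted = true;
                }
            }
        }
        p += len;
        if (*p == ',') {
            p++;                            /* Skip the separator. */
        }
    }
    return whitelisted && !blacklisted;
}
```

With the man page's own example, command_is_allowed("up*,!upgrade", "update") is true and command_is_allowed("up*,!upgrade", "upgrade") is false; an ACL containing only blacklist entries permits nothing, since no name is ever whitelisted.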
+++ /dev/null
-/*
- * Copyright (c) 2008, 2009 Nicira Networks.
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at:
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-#include <config.h>
-#include "status.h"
-#include <arpa/inet.h>
-#include <assert.h>
-#include <errno.h>
-#include <inttypes.h>
-#include <stdlib.h>
-#include <string.h>
-#include <unistd.h>
-#include "dynamic-string.h"
-#include "list.h"
-#include "ofpbuf.h"
-#include "ofproto.h"
-#include "openflow/nicira-ext.h"
-#include "packets.h"
-#include "rconn.h"
-#include "svec.h"
-#include "timeval.h"
-#include "vconn.h"
-
-#define THIS_MODULE VLM_status
-#include "vlog.h"
-
-struct status_category {
- struct list node;
- char *name;
- void (*cb)(struct status_reply *, void *aux);
- void *aux;
-};
-
-struct switch_status {
- time_t booted;
- struct status_category *config_cat;
- struct status_category *switch_cat;
- struct list categories;
-};
-
-struct status_reply {
- struct status_category *category;
- struct ds request;
- struct ds output;
-};
-
-int
-switch_status_handle_request(struct switch_status *ss, struct rconn *rconn,
- struct nicira_header *request)
-{
- struct status_category *c;
- struct nicira_header *reply;
- struct status_reply sr;
- struct ofpbuf *b;
- int retval;
-
- sr.request.string = (void *) (request + 1);
- sr.request.length = ntohs(request->header.length) - sizeof *request;
- ds_init(&sr.output);
- LIST_FOR_EACH (c, struct status_category, node, &ss->categories) {
- if (!memcmp(c->name, sr.request.string,
- MIN(strlen(c->name), sr.request.length))) {
- sr.category = c;
- c->cb(&sr, c->aux);
- }
- }
- reply = make_openflow_xid(sizeof *reply + sr.output.length,
- OFPT_VENDOR, request->header.xid, &b);
- reply->vendor = htonl(NX_VENDOR_ID);
- reply->subtype = htonl(NXT_STATUS_REPLY);
- memcpy(reply + 1, sr.output.string, sr.output.length);
- retval = rconn_send(rconn, b, NULL);
- if (retval && retval != EAGAIN) {
- VLOG_WARN("send failed (%s)", strerror(retval));
- }
- ds_destroy(&sr.output);
- return 0;
-}
-
-void
-rconn_status_cb(struct status_reply *sr, void *rconn_)
-{
- struct rconn *rconn = rconn_;
- time_t now = time_now();
- uint32_t remote_ip = rconn_get_remote_ip(rconn);
- uint32_t local_ip = rconn_get_local_ip(rconn);
-
- status_reply_put(sr, "name=%s", rconn_get_name(rconn));
- if (remote_ip) {
- status_reply_put(sr, "remote-ip="IP_FMT, IP_ARGS(&remote_ip));
- status_reply_put(sr, "remote-port=%d",
- ntohs(rconn_get_remote_port(rconn)));
- status_reply_put(sr, "local-ip="IP_FMT, IP_ARGS(&local_ip));
- status_reply_put(sr, "local-port=%d",
- ntohs(rconn_get_local_port(rconn)));
- }
- status_reply_put(sr, "state=%s", rconn_get_state(rconn));
- status_reply_put(sr, "backoff=%d", rconn_get_backoff(rconn));
- status_reply_put(sr, "probe-interval=%d", rconn_get_probe_interval(rconn));
- status_reply_put(sr, "is-connected=%s",
- rconn_is_connected(rconn) ? "true" : "false");
- status_reply_put(sr, "sent-msgs=%u", rconn_packets_sent(rconn));
- status_reply_put(sr, "received-msgs=%u", rconn_packets_received(rconn));
- status_reply_put(sr, "attempted-connections=%u",
- rconn_get_attempted_connections(rconn));
- status_reply_put(sr, "successful-connections=%u",
- rconn_get_successful_connections(rconn));
- status_reply_put(sr, "last-connection=%ld",
- (long int) (now - rconn_get_last_connection(rconn)));
- status_reply_put(sr, "last-received=%ld",
- (long int) (now - rconn_get_last_received(rconn)));
- status_reply_put(sr, "time-connected=%lu",
- rconn_get_total_time_connected(rconn));
- status_reply_put(sr, "state-elapsed=%u", rconn_get_state_elapsed(rconn));
-}
-
-static void
-config_status_cb(struct status_reply *sr, void *ofproto_)
-{
- const struct ofproto *ofproto = ofproto_;
- uint64_t datapath_id, mgmt_id;
- struct svec listeners;
- int probe_interval, max_backoff;
- size_t i;
-
- datapath_id = ofproto_get_datapath_id(ofproto);
- if (datapath_id) {
- status_reply_put(sr, "datapath-id=%"PRIx64, datapath_id);
- }
-
- mgmt_id = ofproto_get_mgmt_id(ofproto);
- if (mgmt_id) {
- status_reply_put(sr, "mgmt-id=%"PRIx64, mgmt_id);
- }
-
- svec_init(&listeners);
- ofproto_get_listeners(ofproto, &listeners);
- for (i = 0; i < listeners.n; i++) {
- status_reply_put(sr, "management%zu=%s", i, listeners.names[i]);
- }
- svec_destroy(&listeners);
-
- probe_interval = ofproto_get_probe_interval(ofproto);
- if (probe_interval) {
- status_reply_put(sr, "probe-interval=%d", probe_interval);
- }
-
- max_backoff = ofproto_get_max_backoff(ofproto);
- if (max_backoff) {
- status_reply_put(sr, "max-backoff=%d", max_backoff);
- }
-}
-
-static void
-switch_status_cb(struct status_reply *sr, void *ss_)
-{
- struct switch_status *ss = ss_;
- time_t now = time_now();
-
- status_reply_put(sr, "now=%ld", (long int) now);
- status_reply_put(sr, "uptime=%ld", (long int) (now - ss->booted));
- status_reply_put(sr, "pid=%ld", (long int) getpid());
-}
-
-struct switch_status *
-switch_status_create(const struct ofproto *ofproto)
-{
- struct switch_status *ss = xcalloc(1, sizeof *ss);
- ss->booted = time_now();
- list_init(&ss->categories);
- ss->config_cat = switch_status_register(ss, "config", config_status_cb,
- (void *) ofproto);
- ss->switch_cat = switch_status_register(ss, "switch", switch_status_cb,
- ss);
- return ss;
-}
-
-void
-switch_status_destroy(struct switch_status *ss)
-{
- if (ss) {
- /* Orphan any remaining categories, so that unregistering them later
- * won't write to bad memory. */
- struct status_category *c, *next;
- LIST_FOR_EACH_SAFE (c, next,
- struct status_category, node, &ss->categories) {
- list_init(&c->node);
- }
- switch_status_unregister(ss->config_cat);
- switch_status_unregister(ss->switch_cat);
- free(ss);
- }
-}
-
-struct status_category *
-switch_status_register(struct switch_status *ss,
- const char *category,
- status_cb_func *cb, void *aux)
-{
- struct status_category *c = xmalloc(sizeof *c);
- c->cb = cb;
- c->aux = aux;
- c->name = xstrdup(category);
- list_push_back(&ss->categories, &c->node);
- return c;
-}
-
-void
-switch_status_unregister(struct status_category *c)
-{
- if (c) {
- if (!list_is_empty(&c->node)) {
- list_remove(&c->node);
- }
- free(c->name);
- free(c);
- }
-}
-
-void
-status_reply_put(struct status_reply *sr, const char *content, ...)
-{
- size_t old_length = sr->output.length;
- size_t added;
- va_list args;
-
- /* Append the status reply to the output. */
- ds_put_format(&sr->output, "%s.", sr->category->name);
- va_start(args, content);
- ds_put_format_valist(&sr->output, content, args);
- va_end(args);
- if (ds_last(&sr->output) != '\n') {
- ds_put_char(&sr->output, '\n');
- }
-
- /* Drop what we just added if it doesn't match the request. */
- added = sr->output.length - old_length;
- if (added < sr->request.length
- || memcmp(&sr->output.string[old_length],
- sr->request.string, sr->request.length)) {
- ds_truncate(&sr->output, old_length);
- }
-}
+++ /dev/null
-/*
- * Copyright (c) 2008, 2009 Nicira Networks.
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at:
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-#ifndef STATUS_H
-#define STATUS_H 1
-
-#include "compiler.h"
-
-struct nicira_header;
-struct rconn;
-struct secchan;
-struct ofproto;
-struct status_reply;
-
-struct switch_status *switch_status_create(const struct ofproto *);
-void switch_status_destroy(struct switch_status *);
-
-int switch_status_handle_request(struct switch_status *, struct rconn *,
- struct nicira_header *);
-
-typedef void status_cb_func(struct status_reply *, void *aux);
-struct status_category *switch_status_register(struct switch_status *,
- const char *category,
- status_cb_func *, void *aux);
-void switch_status_unregister(struct status_category *);
-
-void status_reply_put(struct status_reply *, const char *, ...)
- PRINTF_FORMAT(2, 3);
-
-void rconn_status_cb(struct status_reply *, void *rconn_);
-
-#endif /* status.h */
/test-list
/test-stp
/test-type-props
+/testsuite
--- /dev/null
+# -*- shell-script -*-
+PERL='@PERL@'
+LCOV='@LCOV@'
-TESTS += tests/test-classifier
+EXTRA_DIST += \
+ $(TESTSUITE_AT) \
+ $(TESTSUITE) \
+ tests/atlocal.in \
+ $(srcdir)/package.m4 \
+ $(srcdir)/tests/testsuite
+TESTSUITE_AT = \
+ tests/testsuite.at \
+ tests/lcov-pre.at \
+ tests/library.at \
+ tests/stp.at \
+ tests/ovs-vsctl.at \
+ tests/lcov-post.at
+TESTSUITE = $(srcdir)/tests/testsuite
+DISTCLEANFILES += tests/atconfig tests/atlocal $(TESTSUITE)
+
+check-local: tests/atconfig tests/atlocal $(TESTSUITE)
+ $(SHELL) '$(TESTSUITE)' -C tests AUTOTEST_PATH='utilities:vswitchd:tests' $(TESTSUITEFLAGS)
+
+clean-local:
+ test ! -f '$(TESTSUITE)' || $(SHELL) '$(TESTSUITE)' -C tests --clean
+
+AUTOM4TE = autom4te
+AUTOTEST = $(AUTOM4TE) --language=autotest
+$(TESTSUITE): package.m4 $(TESTSUITE_AT)
+ $(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at
+ mv $@.tmp $@
+
+# The `:;' works around a Bash 3.2 bug when the output is not writeable.
+$(srcdir)/package.m4: $(top_srcdir)/configure.ac
+ :;{ \
+ echo '# Signature of the current package.' && \
+ echo 'm4_define([AT_PACKAGE_NAME], [@PACKAGE_NAME@])' && \
+ echo 'm4_define([AT_PACKAGE_TARNAME], [@PACKAGE_TARNAME@])' && \
+ echo 'm4_define([AT_PACKAGE_VERSION], [@PACKAGE_VERSION@])' && \
+ echo 'm4_define([AT_PACKAGE_STRING], [@PACKAGE_STRING@])' && \
+ echo 'm4_define([AT_PACKAGE_BUGREPORT], [@PACKAGE_BUGREPORT@])'; \
+ } >'$(srcdir)/package.m4'
+
noinst_PROGRAMS += tests/test-classifier
tests_test_classifier_SOURCES = tests/test-classifier.c
tests_test_classifier_LDADD = lib/libopenvswitch.a
-TESTS += tests/test-csum
noinst_PROGRAMS += tests/test-csum
tests_test_csum_SOURCES = tests/test-csum.c
tests_test_csum_LDADD = lib/libopenvswitch.a
-TESTS += tests/test-flows.sh
noinst_PROGRAMS += tests/test-flows
tests_test_flows_SOURCES = tests/test-flows.c
tests_test_flows_LDADD = lib/libopenvswitch.a
-dist_check_SCRIPTS = tests/test-flows.sh tests/flowgen.pl
+dist_check_SCRIPTS = tests/flowgen.pl
-TESTS += tests/test-hash
noinst_PROGRAMS += tests/test-hash
tests_test_hash_SOURCES = tests/test-hash.c
tests_test_hash_LDADD = lib/libopenvswitch.a
-TESTS += tests/test-hmap
noinst_PROGRAMS += tests/test-hmap
tests_test_hmap_SOURCES = tests/test-hmap.c
tests_test_hmap_LDADD = lib/libopenvswitch.a
-TESTS += tests/test-list
noinst_PROGRAMS += tests/test-list
tests_test_list_SOURCES = tests/test-list.c
tests_test_list_LDADD = lib/libopenvswitch.a
-TESTS += tests/test-sha1
noinst_PROGRAMS += tests/test-sha1
tests_test_sha1_SOURCES = tests/test-sha1.c
tests_test_sha1_LDADD = lib/libopenvswitch.a
-TESTS += tests/test-type-props
noinst_PROGRAMS += tests/test-type-props
tests_test_type_props_SOURCES = tests/test-type-props.c
tests_test_dhcp_client_SOURCES = tests/test-dhcp-client.c
tests_test_dhcp_client_LDADD = lib/libopenvswitch.a $(FAULT_LIBS)
-TESTS += tests/test-stp.sh
-EXTRA_DIST += tests/test-stp.sh
noinst_PROGRAMS += tests/test-stp
-
tests_test_stp_SOURCES = tests/test-stp.c
tests_test_stp_LDADD = lib/libopenvswitch.a
-stp_files = \
- tests/test-stp-ieee802.1d-1998 \
- tests/test-stp-ieee802.1d-2004-fig17.4 \
- tests/test-stp-ieee802.1d-2004-fig17.6 \
- tests/test-stp-ieee802.1d-2004-fig17.7 \
- tests/test-stp-iol-op-1.1 \
- tests/test-stp-iol-op-1.4 \
- tests/test-stp-iol-op-3.1 \
- tests/test-stp-iol-op-3.3 \
- tests/test-stp-iol-io-1.1 \
- tests/test-stp-iol-io-1.2 \
- tests/test-stp-iol-io-1.4 \
- tests/test-stp-iol-io-1.5
-TESTS_ENVIRONMENT += stp_files='$(stp_files)'
-
-EXTRA_DIST += $(stp_files)
+
+noinst_PROGRAMS += tests/test-vconn
+tests_test_vconn_SOURCES = tests/test-vconn.c
+tests_test_vconn_LDADD = lib/libopenvswitch.a $(SSL_LIBS)
+
--- /dev/null
+AT_BANNER([code coverage])
+
+AT_SETUP([generate coverage.html with lcov])
+AT_CHECK([$LCOV || exit 77])
+AT_CHECK([cd $abs_builddir && genhtml -o coverage.html coverage.info], [0], [ignore], [ignore])
+AT_CLEANUP
--- /dev/null
+AT_BANNER([code coverage])
+
+m4_define([_OVS_RUN_LCOV], [test $LCOV = false || lcov -b $abs_top_builddir -d $abs_top_builddir $1])
+
+AT_SETUP([initialize lcov])
+AT_CHECK([rm -fr $abs_builddir/coverage.html])
+AT_CHECK([rm -f $abs_builddir/coverage.info])
+AT_CHECK([$LCOV || exit 77])
+AT_CHECK([_OVS_RUN_LCOV([-c -i -o - > $abs_builddir/coverage.info])], [0], [ignore], [ignore])
+AT_CLEANUP
+
+# OVS_CHECK_LCOV(COMMAND, [STATUS = `0'], [STDOUT = `'], [STDERR = `'],
+# [RUN-IF-FAIL], [RUN-IF-PASS])
+#
+# This macro is equivalent to AT_CHECK, except that COMMAND should be a single
+# shell command that invokes a program whose code coverage is to be measured
+# (if configure was invoked with --coverage).
+m4_define([OVS_CHECK_LCOV],
+ [AT_CHECK([_OVS_RUN_LCOV([-z])], [0], [ignore], [ignore])
+ AT_CHECK($@)
+ AT_CHECK([_OVS_RUN_LCOV([-c -o - >> $abs_builddir/coverage.info])], [0], [ignore], [ignore])])
--- /dev/null
+AT_BANNER([library unit tests])
+
+AT_SETUP([test flow extractor])
+AT_CHECK([$PERL `which flowgen.pl` >/dev/null 3>flows 4>pcap])
+OVS_CHECK_LCOV([test-flows <flows 3<pcap], [0], [checked 247 packets, 0 errors
+])
+AT_CLEANUP
+
+AT_SETUP([test TCP/IP checksumming])
+OVS_CHECK_LCOV([test-csum], [0], [ignore])
+AT_CLEANUP
+
+AT_SETUP([test flow classifier])
+OVS_CHECK_LCOV([test-classifier], [0], [ignore])
+AT_CLEANUP
+
+AT_SETUP([test hash functions])
+OVS_CHECK_LCOV([test-hash], [0], [ignore])
+AT_CLEANUP
+
+AT_SETUP([test hash map])
+OVS_CHECK_LCOV([test-hmap], [0], [ignore])
+AT_CLEANUP
+
+AT_SETUP([test linked lists])
+OVS_CHECK_LCOV([test-list], [0], [ignore])
+AT_CLEANUP
+
+AT_SETUP([test SHA-1])
+OVS_CHECK_LCOV([test-sha1], [0], [ignore])
+AT_CLEANUP
+
+AT_SETUP([test type properties])
+OVS_CHECK_LCOV([test-type-props], [0], [ignore])
+AT_CLEANUP
+
+AT_SETUP([test vconn library])
+OVS_CHECK_LCOV([test-vconn], [0], [ignore])
+AT_CLEANUP
--- /dev/null
+dnl RUN_OVS_VSCTL(COMMAND, ...)
+dnl
+dnl Executes each ovs-vsctl COMMAND on a file named "conf" in the
+dnl current directory. Creates "conf" if it does not already exist.
+m4_define([RUN_OVS_VSCTL],
+ [: >> conf
+m4_foreach([command], [$@], [ovs-vsctl --no-reload --config=conf command
+])])
+
+dnl RUN_OVS_VSCTL_TOGETHER(COMMAND, ...)
+dnl
+dnl Executes each ovs-vsctl COMMAND on a file named "conf" in the
+dnl current directory, in a single run of ovs-vsctl. Creates "conf" if it
+dnl does not already exist.
+m4_define([RUN_OVS_VSCTL_TOGETHER],
+ [: >> conf
+ ovs-vsctl --no-reload --config=conf m4_join([ -- ], $@)])
+
+dnl CHECK_BRIDGES([BRIDGE, PARENT, VLAN], ...)
+dnl
+dnl Verifies that "ovs-vsctl list-br" prints the specified list of bridges,
+dnl which must be in alphabetical order. Also checks that each BRIDGE has the
+dnl specified PARENT and is on the given VLAN.
+m4_define([_CHECK_BRIDGE],
+ [AT_CHECK([RUN_OVS_VSCTL([br-to-parent $1])], [0], [$2
+])
+
+ # Check br-to-vlan, without --oneline.
+ AT_CHECK([RUN_OVS_VSCTL([br-to-vlan $1])], [0], [$3
+])
+ # Check br-to-vlan, with --oneline.
+ # (This particular test is interesting with --oneline because it returns
+ # an integer instead of a string and that can cause type mismatches inside
+ # Python if not done carefully.)
+ AT_CHECK([RUN_OVS_VSCTL([--oneline br-to-vlan $1])], [0], [$3
+])
+
+ # Check multiple queries in a single run.
+ AT_CHECK([RUN_OVS_VSCTL_TOGETHER([br-to-parent $1], [br-to-vlan $1])], [0],
+[$2
+$3
+])])
+m4_define([CHECK_BRIDGES],
+ [dnl Check that the bridges appear on list-br, without --oneline.
+ AT_CHECK(
+ [RUN_OVS_VSCTL([list-br])],
+ [0],
+ [m4_foreach([brinfo], [$@], [m4_car(brinfo)
+])])
+
+ dnl Check that the bridges appear on list-br, with --oneline.
+ AT_CHECK(
+ [RUN_OVS_VSCTL([--oneline list-br])],
+ [0],
+ [m4_join([\n], m4_foreach([brinfo], [$@], [m4_car(brinfo),]))
+])
+
+ dnl Check that each bridge exists according to br-exists and that
+ dnl a bridge that should not exist does not.
+ m4_foreach([brinfo], [$@],
+ [AT_CHECK([RUN_OVS_VSCTL([br-exists m4_car(brinfo)])])])
+ AT_CHECK([RUN_OVS_VSCTL([br-exists nonexistent])], [2])
+
+ dnl Check that each bridge has the expected parent and VLAN.
+ m4_map([_CHECK_BRIDGE], [$@])])
+
+dnl CHECK_PORTS(BRIDGE, PORT[, PORT...])
+dnl
+dnl Verifies that "ovs-vsctl list-ports BRIDGE" prints the specified
+dnl list of ports, which must be in alphabetical order. Also checks
+dnl that "ovs-vsctl port-to-br" reports that each port is
+dnl in BRIDGE.
+m4_define([CHECK_PORTS],
+ [dnl Check ports without --oneline.
+ AT_CHECK(
+ [RUN_OVS_VSCTL([list-ports $1])],
+ [0],
+ [m4_foreach([port], m4_cdr($@), [port
+])])
+
+ dnl Check ports with --oneline.
+ AT_CHECK(
+ [RUN_OVS_VSCTL([--oneline list-ports $1])],
+ [0],
+ [m4_join([\n], m4_shift($@))
+])
+ AT_CHECK([RUN_OVS_VSCTL([port-to-br $1])], [1], [], [ovs-vsctl: no port named $1
+])
+ m4_foreach(
+ [port], m4_cdr($@),
+ [AT_CHECK([RUN_OVS_VSCTL([[port-to-br] port])], [0], [$1
+])])])
+
+dnl CHECK_IFACES(BRIDGE, IFACE[, IFACE...])
+dnl
+dnl Verifies that "ovs-vsctl list-ifaces BRIDGE" prints the specified
+dnl list of ifaces, which must be in alphabetical order. Also checks
+dnl that "ovs-vsctl iface-to-br" reports that each interface is
+dnl in BRIDGE.
+m4_define([CHECK_IFACES],
+ [AT_CHECK(
+ [RUN_OVS_VSCTL([list-ifaces $1])],
+ [0],
+ [m4_foreach([iface], m4_cdr($@), [iface
+])])
+ AT_CHECK([RUN_OVS_VSCTL([iface-to-br $1])], [1], [], [ovs-vsctl: no interface named $1
+])
+ m4_foreach(
+ [iface], m4_cdr($@),
+ [AT_CHECK([RUN_OVS_VSCTL([[iface-to-br] iface])], [0], [$1
+])])])
+
+dnl ----------------------------------------------------------------------
+AT_BANNER([ovs-vsctl unit tests -- real bridges])
+
+AT_SETUP([add-br a])
+AT_KEYWORDS([ovs-vsctl])
+AT_CHECK([RUN_OVS_VSCTL([add-br a])])
+AT_CHECK([cat conf], [0], [dnl
+bridge.a.port=a
+])
+CHECK_BRIDGES([a, a, 0])
+CHECK_PORTS([a])
+CHECK_IFACES([a])
+AT_CLEANUP
+
+AT_SETUP([add-br a, add-br a])
+AT_KEYWORDS([ovs-vsctl])
+AT_CHECK([RUN_OVS_VSCTL([add-br a])])
+AT_CHECK([RUN_OVS_VSCTL([add-br a])], [1], [],
+ [ovs-vsctl: cannot create a bridge named a because a bridge named a already exists
+])
+AT_CLEANUP
+
+AT_SETUP([add-br a, add-br b])
+AT_KEYWORDS([ovs-vsctl])
+AT_CHECK([RUN_OVS_VSCTL([add-br a], [add-br b])])
+AT_CHECK([cat conf], [0], [dnl
+bridge.a.port=a
+bridge.b.port=b
+])
+CHECK_BRIDGES([a, a, 0], [b, b, 0])
+CHECK_PORTS([a])
+CHECK_IFACES([a])
+CHECK_PORTS([b])
+CHECK_IFACES([b])
+AT_CLEANUP
+
+AT_SETUP([add-br a, add-br b, del-br a])
+AT_KEYWORDS([ovs-vsctl])
+AT_CHECK([RUN_OVS_VSCTL([add-br a], [add-br b], [del-br a])])
+AT_CHECK([cat conf], [0], [dnl
+bridge.b.port=b
+])
+CHECK_BRIDGES([b, b, 0])
+CHECK_PORTS([b])
+CHECK_IFACES([b])
+AT_CLEANUP
+
+AT_SETUP([add-br a, add-port a a1, add-port a a2])
+AT_KEYWORDS([ovs-vsctl])
+AT_CHECK([RUN_OVS_VSCTL(
+ [add-br a],
+ [add-port a a1],
+ [add-port a a2])])
+AT_CHECK([cat conf], [0],
+ [bridge.a.port=a
+bridge.a.port=a1
+bridge.a.port=a2
+])
+CHECK_BRIDGES([a, a, 0])
+CHECK_PORTS([a], [a1], [a2])
+CHECK_IFACES([a], [a1], [a2])
+AT_CLEANUP
+
+AT_SETUP([add-br a, add-port a a1, add-port a a1])
+AT_KEYWORDS([ovs-vsctl])
+AT_CHECK([RUN_OVS_VSCTL(
+ [add-br a],
+ [add-port a a1])])
+AT_CHECK([cat conf], [0],
+ [bridge.a.port=a
+bridge.a.port=a1
+])
+AT_CHECK([RUN_OVS_VSCTL([add-port a a1])], [1], [],
+ [ovs-vsctl: cannot create a port named a1 because a port named a1 already exists on bridge a
+])
+AT_CLEANUP
+
+AT_SETUP([add-br a b, add-port a a1, add-port b b1, del-br a])
+AT_KEYWORDS([ovs-vsctl])
+AT_CHECK([RUN_OVS_VSCTL_TOGETHER(
+ [add-br a],
+ [add-br b],
+ [add-port a a1],
+ [add-port b b1],
+ [del-br a])])
+AT_CHECK([cat conf], [0],
+ [bridge.b.port=b
+bridge.b.port=b1
+])
+CHECK_BRIDGES([b, b, 0])
+CHECK_PORTS([b], [b1])
+CHECK_IFACES([b], [b1])
+AT_CLEANUP
+
+AT_SETUP([add-br a, add-bond a bond0 a1 a2 a3])
+AT_KEYWORDS([ovs-vsctl])
+AT_CHECK([RUN_OVS_VSCTL(
+ [add-br a],
+ [add-bond a bond0 a1 a2 a3])])
+AT_CHECK([cat conf], [0], [dnl
+bonding.bond0.slave=a1
+bonding.bond0.slave=a2
+bonding.bond0.slave=a3
+bridge.a.port=a
+bridge.a.port=bond0
+])
+CHECK_BRIDGES([a, a, 0])
+CHECK_PORTS([a], [bond0])
+CHECK_IFACES([a], [a1], [a2], [a3])
+AT_CLEANUP
+
+AT_SETUP([add-br a b, add-port a a1, add-port b b1, del-port a a1])
+AT_KEYWORDS([ovs-vsctl])
+AT_CHECK([RUN_OVS_VSCTL(
+ [add-br a],
+ [add-br b],
+ [add-port a a1],
+ [add-port b b1],
+ [del-port a a1])])
+AT_CHECK([cat conf], [0], [dnl
+bridge.a.port=a
+bridge.b.port=b
+bridge.b.port=b1
+])
+CHECK_BRIDGES([a, a, 0], [b, b, 0])
+CHECK_PORTS([a])
+CHECK_IFACES([a])
+CHECK_PORTS([b], [b1])
+CHECK_IFACES([b], [b1])
+AT_CLEANUP
+
+AT_SETUP([add-br a, add-bond a bond0 a1 a2 a3, del-port bond0])
+AT_KEYWORDS([ovs-vsctl])
+AT_CHECK([RUN_OVS_VSCTL_TOGETHER(
+ [add-br a],
+ [add-bond a bond0 a1 a2 a3],
+ [del-port bond0])])
+AT_CHECK([cat conf], [0], [dnl
+bridge.a.port=a
+])
+CHECK_BRIDGES([a, a, 0])
+CHECK_PORTS([a])
+AT_CLEANUP
+
+dnl ----------------------------------------------------------------------
+AT_BANNER([ovs-vsctl unit tests -- fake bridges])
+
+m4_define([SIMPLE_FAKE_CONF], [dnl
+bridge.xenbr0.port=eth0
+bridge.xenbr0.port=eth0.9
+bridge.xenbr0.port=xapi1
+bridge.xenbr0.port=xenbr0
+iface.xapi1.fake-bridge=true
+iface.xapi1.internal=true
+vlan.eth0.9.tag=9
+vlan.xapi1.tag=9
+])
+
+AT_SETUP([simple fake bridge])
+AT_KEYWORDS([ovs-vsctl fake-bridge])
+AT_CHECK([RUN_OVS_VSCTL(
+ [add-br xenbr0],
+ [add-port xenbr0 eth0],
+ [add-br xapi1 xenbr0 9],
+ [add-port xapi1 eth0.9])])
+AT_CHECK([cat conf], [0], [SIMPLE_FAKE_CONF])
+CHECK_BRIDGES([xenbr0, xenbr0, 0], [xapi1, xenbr0, 9])
+CHECK_PORTS([xenbr0], [eth0])
+CHECK_IFACES([xenbr0], [eth0])
+CHECK_PORTS([xapi1], [eth0.9])
+CHECK_IFACES([xapi1], [eth0.9])
+AT_CLEANUP
+
+AT_SETUP([simple fake bridge + del-br fake bridge])
+AT_KEYWORDS([ovs-vsctl fake-bridge])
+AT_DATA([conf], [SIMPLE_FAKE_CONF])
+AT_CHECK([RUN_OVS_VSCTL([del-br xapi1])])
+AT_CHECK([cat conf], [0], [dnl
+bridge.xenbr0.port=eth0
+bridge.xenbr0.port=xenbr0
+])
+CHECK_BRIDGES([xenbr0, xenbr0, 0])
+CHECK_PORTS([xenbr0], [eth0])
+CHECK_IFACES([xenbr0], [eth0])
+AT_CLEANUP
+
+AT_SETUP([simple fake bridge + del-br real bridge])
+AT_KEYWORDS([ovs-vsctl fake-bridge])
+AT_DATA([conf], [SIMPLE_FAKE_CONF])
+AT_CHECK([RUN_OVS_VSCTL([del-br xenbr0])])
+AT_CHECK([cat conf], [0], [])
+CHECK_BRIDGES
+AT_CLEANUP
+
+m4_define([BOND_FAKE_CONF], [dnl
+bonding.bond0.slave=eth0
+bonding.bond0.slave=eth1
+bridge.xapi1.port=bond0
+bridge.xapi1.port=bond0.11
+bridge.xapi1.port=xapi1
+bridge.xapi1.port=xapi2
+iface.xapi2.fake-bridge=true
+iface.xapi2.internal=true
+vlan.bond0.11.tag=11
+vlan.xapi2.tag=11
+])
+
+AT_SETUP([fake bridge on bond])
+AT_KEYWORDS([ovs-vsctl fake-bridge])
+AT_CHECK([RUN_OVS_VSCTL(
+ [add-br xapi1],
+ [add-bond xapi1 bond0 eth0 eth1],
+ [add-br xapi2 xapi1 11],
+ [add-port xapi2 bond0.11])])
+AT_CHECK([cat conf], [0], [BOND_FAKE_CONF])
+CHECK_BRIDGES([xapi1, xapi1, 0], [xapi2, xapi1, 11])
+CHECK_PORTS([xapi1], [bond0])
+CHECK_IFACES([xapi1], [eth0], [eth1])
+CHECK_PORTS([xapi2], [bond0.11])
+CHECK_IFACES([xapi2], [bond0.11])
+AT_CLEANUP
+
+AT_SETUP([fake bridge on bond + del-br fake bridge])
+AT_KEYWORDS([ovs-vsctl fake-bridge])
+AT_DATA([conf], [BOND_FAKE_CONF])
+AT_CHECK([RUN_OVS_VSCTL([--oneline del-br xapi2])], [0], [
+])
+CHECK_BRIDGES([xapi1, xapi1, 0])
+CHECK_PORTS([xapi1], [bond0])
+CHECK_IFACES([xapi1], [eth0], [eth1])
+AT_CLEANUP
+
+AT_SETUP([fake bridge on bond + del-br real bridge])
+AT_KEYWORDS([ovs-vsctl fake-bridge])
+AT_DATA([conf], [BOND_FAKE_CONF])
+AT_CHECK([RUN_OVS_VSCTL([del-br xapi1])])
+CHECK_BRIDGES
+AT_CLEANUP
--- /dev/null
+AT_BANNER([Spanning Tree Protocol unit tests])
+
+AT_SETUP([STP example from IEEE 802.1D-1998])
+AT_KEYWORDS([STP])
+AT_DATA([test-stp-ieee802.1d-1998],
+[bridge 0 0x42 = a b
+bridge 1 0x97 = c:5 a d:5
+bridge 2 0x45 = b e
+bridge 3 0x57 = b:5 e:5
+bridge 4 0x83 = a:5 e:5
+run 1000
+check 0 = root
+check 1 = F F:10 F
+check 2 = F:10 B
+check 3 = F:5 F
+check 4 = F:5 B
+])
+OVS_CHECK_LCOV([test-stp test-stp-ieee802.1d-1998])
+AT_CLEANUP
+
+AT_SETUP([STP example from IEEE 802.1D-2004 figures 17.4 and 17.5])
+AT_KEYWORDS([STP])
+AT_DATA([test-stp-ieee802.1d-2004-fig17.4],
+[bridge 0 0x111 = a b e c
+bridge 1 0x222 = a b d f
+bridge 2 0x333 = c d l j h g
+bridge 3 0x444 = e f n m k i
+bridge 4 0x555 = g i 0 0
+bridge 5 0x666 = h k 0 0
+bridge 6 0x777 = j m 0 0
+bridge 7 0x888 = l n 0 0
+run 1000
+check 0 = root
+check 1 = F:10 B F F
+check 2 = F:10 B F F F F
+check 3 = F:10 B F F F F
+check 4 = F:20 B F F
+check 5 = F:20 B F F
+check 6 = F:20 B F F
+check 7 = F:20 B F F
+
+# Now connect two ports of bridge 7 to the same LAN.
+bridge 7 = l n o o
+# Same results except for bridge 7:
+run 1000
+check 0 = root
+check 1 = F:10 B F F
+check 2 = F:10 B F F F F
+check 3 = F:10 B F F F F
+check 4 = F:20 B F F
+check 5 = F:20 B F F
+check 6 = F:20 B F F
+check 7 = F:20 B F B
+])
+OVS_CHECK_LCOV([test-stp test-stp-ieee802.1d-2004-fig17.4])
+AT_CLEANUP
+
+AT_SETUP([STP example from IEEE 802.1D-2004 figure 17.6])
+AT_KEYWORDS([STP])
+AT_DATA([test-stp-ieee802.1d-2004-fig17.6],
+[bridge 0 0x111 = a b l
+bridge 1 0x222 = b c d
+bridge 2 0x333 = d e f
+bridge 3 0x444 = f g h
+bridge 4 0x555 = j h i
+bridge 5 0x666 = l j k
+run 1000
+check 0 = root
+check 1 = F:10 F F
+check 2 = F:20 F F
+check 3 = F:30 F B
+check 4 = F:20 F F
+check 5 = F:10 F F
+])
+OVS_CHECK_LCOV([test-stp test-stp-ieee802.1d-2004-fig17.6])
+AT_CLEANUP
+
+AT_SETUP([STP example from IEEE 802.1D-2004 figure 17.7])
+AT_KEYWORDS([STP])
+AT_DATA([test-stp-ieee802.1d-2004-fig17.7],
+[bridge 0 0xaa = b
+bridge 1 0x111 = a b d f h g e c
+bridge 2 0x222 = g h j l n m k i
+run 1000
+check 0 = root
+check 1 = F F:10 F F F F F F
+check 2 = B F:20 F F F F F F
+
+# This is not the port priority change described in that figure,
+# but I don't understand what port priority change would cause
+# that change.
+bridge 2 = g X j l n m k i
+run 1000
+check 0 = root
+check 1 = F F:10 F F F F F F
+check 2 = F:20 D F F F F F F
+])
+OVS_CHECK_LCOV([test-stp test-stp-ieee802.1d-2004-fig17.7])
+AT_CLEANUP
+
+AT_SETUP([STP.io.1.1: Link Failure])
+AT_KEYWORDS([STP])
+AT_DATA([test-stp-iol-io-1.1],
+[# This test file approximates the following test from "Bridge
+# Functions Consortium Spanning Tree Interoperability Test Suite
+# Version 1.5":
+#
+# STP.io.1.1: Link Failure
+bridge 0 0x111 = a b c
+bridge 1 0x222 = a b c
+run 1000
+check 0 = root
+check 1 = F:10 B B
+bridge 1 = 0 _ _
+run 1000
+check 0 = root
+check 1 = F F:10 B
+bridge 1 = X _ _
+run 1000
+check 0 = root
+check 1 = D F:10 B
+bridge 1 = _ 0 _
+run 1000
+check 0 = root
+check 1 = D F F:10
+bridge 1 = _ X _
+run 1000
+check 0 = root
+check 1 = D D F:10
+])
+OVS_CHECK_LCOV([test-stp test-stp-iol-io-1.1])
+AT_CLEANUP
+
+AT_SETUP([STP.io.1.2: Repeated Network])
+AT_KEYWORDS([STP])
+AT_DATA([test-stp-iol-io-1.2],
+[# This test file approximates the following test from "Bridge
+# Functions Consortium Spanning Tree Interoperability Test Suite
+# Version 1.5":
+# STP.io.1.2: Repeated Network
+bridge 0 0x111 = a a
+bridge 1 0x222 = a a
+run 1000
+check 0 = rootid:0x111 F B
+check 1 = rootid:0x111 F:10 B
+bridge 1 = a^0x90 _
+run 1000
+check 0 = rootid:0x111 F B
+check 1 = rootid:0x111 B F:10
+])
+OVS_CHECK_LCOV([test-stp test-stp-iol-io-1.2])
+AT_CLEANUP
+
+AT_SETUP([STP.io.1.4: Network Initialization])
+AT_KEYWORDS([STP])
+AT_DATA([test-stp-iol-io-1.4],
+[# This test file approximates the following test from "Bridge
+# Functions Consortium Spanning Tree Interoperability Test Suite
+# Version 1.5":
+# STP.io.1.4: Network Initialization
+bridge 0 0x111 = a b c
+bridge 1 0x222 = b d e
+bridge 2 0x333 = a d f
+bridge 3 0x444 = c e f
+run 1000
+check 0 = root
+check 1 = F:10 F F
+check 2 = F:10 B F
+check 3 = F:10 B B
+])
+OVS_CHECK_LCOV([test-stp test-stp-iol-io-1.4])
+AT_CLEANUP
+
+AT_SETUP([STP.io.1.5: Topology Change])
+AT_KEYWORDS([STP])
+AT_DATA([test-stp-iol-io-1.5],
+[# This test file approximates the following test from "Bridge
+# Functions Consortium Spanning Tree Interoperability Test Suite
+# Version 1.5":
+# STP.io.1.5: Topology Change
+bridge 0 0x111 = a b d c
+bridge 1 0x222 = a b f e
+bridge 2 0x333 = c d g h
+bridge 3 0x444 = e f g h
+run 1000
+check 0 = root
+check 1 = F:10 B F F
+check 2 = B F:10 F F
+check 3 = B F:20 B B
+bridge 1^0x7000
+run 1000
+check 0 = F:10 B F F
+check 1 = root
+check 2 = B F:20 B B
+check 3 = B F:10 F F
+bridge 2^0x6000
+run 1000
+check 0 = F F B F:10
+check 1 = F:20 B B B
+check 2 = root
+check 3 = F F F:10 B
+bridge 3^0x5000
+run 1000
+check 0 = B B B F:20
+check 1 = F F B F:10
+check 2 = F F F:10 B
+check 3 = root
+bridge 0^0x4000
+bridge 1^0x4001
+bridge 2^0x4002
+bridge 3^0x4003
+run 1000
+check 0 = root
+check 1 = F:10 B F F
+check 2 = B F:10 F F
+check 3 = B F:20 B B
+])
+OVS_CHECK_LCOV([test-stp test-stp-iol-io-1.5])
+AT_CLEANUP
+
+AT_SETUP([STP.op.1.1 and STP.op.1.2])
+AT_KEYWORDS([STP])
+AT_DATA([test-stp-iol-op-1.1],
+[# This test file approximates the following tests from "Bridge
+# Functions Consortium Spanning Tree Protocol Operations Test Suite
+# Version 2.3":
+# Test STP.op.1.1: Root ID Initialized to Bridge ID
+# Test STP.op.1.2: Root Path Cost Initialized to Zero
+bridge 0 0x123 =
+check 0 = root
+])
+OVS_CHECK_LCOV([test-stp test-stp-iol-op-1.1])
+AT_CLEANUP
+
+AT_SETUP([STP.op.1.4: All Ports Initialized to Designated Ports])
+AT_KEYWORDS([STP])
+AT_DATA([test-stp-iol-op-1.4],
+[# This test file approximates the following test from "Bridge
+# Functions Consortium Spanning Tree Protocol Operations Test Suite
+# Version 2.3":
+# Test STP.op.1.4: All Ports Initialized to Designated Ports
+bridge 0 0x123 = a b c d e f
+check 0 = Li Li Li Li Li Li
+run 1000
+check 0 = F F F F F F
+])
+OVS_CHECK_LCOV([test-stp test-stp-iol-op-1.4])
+AT_CLEANUP
+
+AT_SETUP([STP.op.3.1: Root Bridge Selection: Root ID Values])
+AT_KEYWORDS([STP])
+AT_DATA([test-stp-iol-op-3.1],
+[# This test file approximates the following test from "Bridge
+# Functions Consortium Spanning Tree Protocol Operations Test Suite
+# Version 2.3":
+# Test STP.op.3.1: Root Bridge Selection: Root ID Values
+bridge 0 0x111 = a
+bridge 1 0x222 = a
+check 0 = rootid:0x111 Li
+check 1 = rootid:0x222 Li
+run 1000
+check 0 = rootid:0x111 root
+check 1 = rootid:0x111 F:10
+])
+OVS_CHECK_LCOV([test-stp test-stp-iol-op-3.1])
+AT_CLEANUP
+
+AT_SETUP([STP.op.3.3: Root Bridge Selection: Bridge ID Values])
+AT_KEYWORDS([STP])
+AT_DATA([test-stp-iol-op-3.3],
+[# This test file approximates the following test from "Bridge
+# Functions Consortium Spanning Tree Protocol Operations Test Suite
+# Version 2.3":
+# Test STP.op.3.3: Root Bridge Selection: Bridge ID Values
+bridge 0 0x333^0x6000 = a
+bridge 1 0x222^0x7000 = b
+bridge 2 0x111 = a b
+run 1000
+check 0 = rootid:0x333^0x6000 root
+check 1 = rootid:0x333^0x6000 F:20
+check 2 = rootid:0x333^0x6000 F:10 F
+])
+OVS_CHECK_LCOV([test-stp test-stp-iol-op-3.3])
+AT_CLEANUP
+
+AT_SETUP([STP.op.3.3: Root Bridge Selection: Bridge ID Values])
+AT_KEYWORDS([STP])
+AT_DATA([test-stp-iol-op-3.4],
+[# This test file approximates the following test from "Bridge
+# Functions Consortium Spanning Tree Protocol Operations Test Suite
+# Version 2.3":
+# Test STP.op.3.3: Root Bridge Selection: Bridge ID Values
+bridge 0 0x333^0x6000 = a
+bridge 1 0x222^0x7000 = b
+bridge 2 0x111 = a b
+run 1000
+check 0 = rootid:0x333^0x6000 root
+check 1 = rootid:0x333^0x6000 F:20
+check 2 = rootid:0x333^0x6000 F:10 F
+])
+OVS_CHECK_LCOV([test-stp test-stp-iol-op-3.4])
+AT_CLEANUP
+
}
\f
#ifdef WORDS_BIGENDIAN
-#define HTONL(VALUE) ((uint32_t) (VALUE))
-#define HTONS(VALUE) ((uint32_t) (VALUE))
+#define T_HTONL(VALUE) ((uint32_t) (VALUE))
+#define T_HTONS(VALUE) ((uint16_t) (VALUE))
#else
-#define HTONL(VALUE) (((((uint32_t) (VALUE)) & 0x000000ff) << 24) | \
+#define T_HTONL(VALUE) (((((uint32_t) (VALUE)) & 0x000000ff) << 24) | \
((((uint32_t) (VALUE)) & 0x0000ff00) << 8) | \
((((uint32_t) (VALUE)) & 0x00ff0000) >> 8) | \
((((uint32_t) (VALUE)) & 0xff000000) >> 24))
-#define HTONS(VALUE) (((((uint16_t) (VALUE)) & 0xff00) >> 8) | \
+#define T_HTONS(VALUE) (((((uint16_t) (VALUE)) & 0xff00) >> 8) | \
((((uint16_t) (VALUE)) & 0x00ff) << 8))
#endif
-static uint32_t nw_src_values[] = { HTONL(0xc0a80001),
- HTONL(0xc0a04455) };
-static uint32_t nw_dst_values[] = { HTONL(0xc0a80002),
- HTONL(0xc0a04455) };
-static uint16_t in_port_values[] = { HTONS(1), HTONS(OFPP_LOCAL) };
-static uint16_t dl_vlan_values[] = { HTONS(101), HTONS(0) };
-static uint16_t dl_type_values[] = { HTONS(ETH_TYPE_IP), HTONS(ETH_TYPE_ARP) };
-static uint16_t tp_src_values[] = { HTONS(49362), HTONS(80) };
-static uint16_t tp_dst_values[] = { HTONS(6667), HTONS(22) };
+static uint32_t nw_src_values[] = { T_HTONL(0xc0a80001),
+ T_HTONL(0xc0a04455) };
+static uint32_t nw_dst_values[] = { T_HTONL(0xc0a80002),
+ T_HTONL(0xc0a04455) };
+static uint16_t in_port_values[] = { T_HTONS(1), T_HTONS(OFPP_LOCAL) };
+static uint16_t dl_vlan_values[] = { T_HTONS(101), T_HTONS(0) };
+static uint16_t dl_type_values[]
+ = { T_HTONS(ETH_TYPE_IP), T_HTONS(ETH_TYPE_ARP) };
+static uint16_t tp_src_values[] = { T_HTONS(49362), T_HTONS(80) };
+static uint16_t tp_dst_values[] = { T_HTONS(6667), T_HTONS(22) };
static uint8_t dl_src_values[][6] = { { 0x00, 0x02, 0xe3, 0x0f, 0x80, 0xa4 },
{ 0x5e, 0x33, 0x7f, 0x5f, 0x1e, 0x99 } };
static uint8_t dl_dst_values[][6] = { { 0x4a, 0x27, 0x71, 0xae, 0x64, 0xc1 },
"\nDHCP options:\n"
" --request-ip=IP request specified IP address (default:\n"
" do not request a specific IP)\n"
- " --vendor-class=STRING use STRING as vendor class (default:\n"
- " none); use OpenFlow to imitate secchan\n"
+ " --vendor-class=STRING use STRING as vendor class; use\n"
+ " OpenFlow to imitate ovs-openflowd\n"
" --no-resolv-conf do not update /etc/resolv.conf\n",
program_name, program_name);
vlog_usage();
+++ /dev/null
-#! /bin/sh -e
-
-# Copyright (c) 2009 Nicira Networks.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at:
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-srcdir=`cd $srcdir && pwd`
-trap 'rm -f flows$$ pcap$$ out$$' 0 1 2 13 15
-cd tests
-"$srcdir"/tests/flowgen.pl >/dev/null 3>flows$$ 4>pcap$$
-./test-flows <flows$$ 3<pcap$$ >out$$ || true
-diff -u - out$$ <<EOF
-checked 247 packets, 0 errors
-EOF
+++ /dev/null
-# This is the STP example from IEEE 802.1D-1998.
-bridge 0 0x42 = a b
-bridge 1 0x97 = c:5 a d:5
-bridge 2 0x45 = b e
-bridge 3 0x57 = b:5 e:5
-bridge 4 0x83 = a:5 e:5
-run 1000
-check 0 = root
-check 1 = F F:10 F
-check 2 = F:10 B
-check 3 = F:5 F
-check 4 = F:5 B
+++ /dev/null
-# This is the STP example from IEEE 802.1D-2004 figures 17.4 and 17.5.
-bridge 0 0x111 = a b e c
-bridge 1 0x222 = a b d f
-bridge 2 0x333 = c d l j h g
-bridge 3 0x444 = e f n m k i
-bridge 4 0x555 = g i 0 0
-bridge 5 0x666 = h k 0 0
-bridge 6 0x777 = j m 0 0
-bridge 7 0x888 = l n 0 0
-run 1000
-check 0 = root
-check 1 = F:10 B F F
-check 2 = F:10 B F F F F
-check 3 = F:10 B F F F F
-check 4 = F:20 B F F
-check 5 = F:20 B F F
-check 6 = F:20 B F F
-check 7 = F:20 B F F
-
-# Now connect two ports of bridge 7 to the same LAN.
-bridge 7 = l n o o
-# Same results except for bridge 7:
-run 1000
-check 0 = root
-check 1 = F:10 B F F
-check 2 = F:10 B F F F F
-check 3 = F:10 B F F F F
-check 4 = F:20 B F F
-check 5 = F:20 B F F
-check 6 = F:20 B F F
-check 7 = F:20 B F B
+++ /dev/null
-# This is the STP example from IEEE 802.1D-2004 figure 17.6.
-bridge 0 0x111 = a b l
-bridge 1 0x222 = b c d
-bridge 2 0x333 = d e f
-bridge 3 0x444 = f g h
-bridge 4 0x555 = j h i
-bridge 5 0x666 = l j k
-run 1000
-check 0 = root
-check 1 = F:10 F F
-check 2 = F:20 F F
-check 3 = F:30 F B
-check 4 = F:20 F F
-check 5 = F:10 F F
+++ /dev/null
-# This is the STP example from IEEE 802.1D-2004 figure 17.7.
-bridge 0 0xaa = b
-bridge 1 0x111 = a b d f h g e c
-bridge 2 0x222 = g h j l n m k i
-run 1000
-check 0 = root
-check 1 = F F:10 F F F F F F
-check 2 = B F:20 F F F F F F
-
-# This is not the port priority change described in that figure,
-# but I don't understand what port priority change would cause
-# that change.
-bridge 2 = g X j l n m k i
-run 1000
-check 0 = root
-check 1 = F F:10 F F F F F F
-check 2 = F:20 D F F F F F F
+++ /dev/null
-# This test file approximates the following test from "Bridge
-# Functions Consortium Spanning Tree Interoperability Test Suite
-# Version 1.5":
-# STP.io.1.1: Link Failure
-bridge 0 0x111 = a b c
-bridge 1 0x222 = a b c
-run 1000
-check 0 = root
-check 1 = F:10 B B
-bridge 1 = 0 _ _
-run 1000
-check 0 = root
-check 1 = F F:10 B
-bridge 1 = X _ _
-run 1000
-check 0 = root
-check 1 = D F:10 B
-bridge 1 = _ 0 _
-run 1000
-check 0 = root
-check 1 = D F F:10
-bridge 1 = _ X _
-run 1000
-check 0 = root
-check 1 = D D F:10
+++ /dev/null
-# This test file approximates the following test from "Bridge
-# Functions Consortium Spanning Tree Interoperability Test Suite
-# Version 1.5":
-# STP.io.1.2: Repeated Network
-bridge 0 0x111 = a a
-bridge 1 0x222 = a a
-run 1000
-check 0 = rootid:0x111 F B
-check 1 = rootid:0x111 F:10 B
-bridge 1 = a^0x90 _
-run 1000
-check 0 = rootid:0x111 F B
-check 1 = rootid:0x111 B F:10
-
+++ /dev/null
-# This test file approximates the following test from "Bridge
-# Functions Consortium Spanning Tree Interoperability Test Suite
-# Version 1.5":
-# STP.io.1.4: Network Initialization
-bridge 0 0x111 = a b c
-bridge 1 0x222 = b d e
-bridge 2 0x333 = a d f
-bridge 3 0x444 = c e f
-run 1000
-check 0 = root
-check 1 = F:10 F F
-check 2 = F:10 B F
-check 3 = F:10 B B
+++ /dev/null
-# This test file approximates the following test from "Bridge
-# Functions Consortium Spanning Tree Interoperability Test Suite
-# Version 1.5":
-# STP.io.1.5: Topology Change
-bridge 0 0x111 = a b d c
-bridge 1 0x222 = a b f e
-bridge 2 0x333 = c d g h
-bridge 3 0x444 = e f g h
-run 1000
-check 0 = root
-check 1 = F:10 B F F
-check 2 = B F:10 F F
-check 3 = B F:20 B B
-bridge 1^0x7000
-run 1000
-check 0 = F:10 B F F
-check 1 = root
-check 2 = B F:20 B B
-check 3 = B F:10 F F
-bridge 2^0x6000
-run 1000
-check 0 = F F B F:10
-check 1 = F:20 B B B
-check 2 = root
-check 3 = F F F:10 B
-bridge 3^0x5000
-run 1000
-check 0 = B B B F:20
-check 1 = F F B F:10
-check 2 = F F F:10 B
-check 3 = root
-bridge 0^0x4000
-bridge 1^0x4001
-bridge 2^0x4002
-bridge 3^0x4003
-run 1000
-check 0 = root
-check 1 = F:10 B F F
-check 2 = B F:10 F F
-check 3 = B F:20 B B
+++ /dev/null
-# This test file approximates the following tests from "Bridge
-# Functions Consortium Spanning Tree Protocol Operations Test Suite
-# Version 2.3":
-# Test STP.op.1.1: Root ID Initialized to Bridge ID
-# Test STP.op.1.2: Root Path Cost Initialized to Zero
-bridge 0 0x123 =
-check 0 = root
+++ /dev/null
-# This test file approximates the following test from "Bridge
-# Functions Consortium Spanning Tree Protocol Operations Test Suite
-# Version 2.3":
-# Test STP.op.1.4: All Ports Initialized to Designated Ports
-bridge 0 0x123 = a b c d e f
-check 0 = Li Li Li Li Li Li
-run 1000
-check 0 = F F F F F F
+++ /dev/null
-# This test file approximates the following test from "Bridge
-# Functions Consortium Spanning Tree Protocol Operations Test Suite
-# Version 2.3":
-# Test STP.op.3.1: Root Bridge Selection: Root ID Values
-bridge 0 0x111 = a
-bridge 1 0x222 = a
-check 0 = rootid:0x111 Li
-check 1 = rootid:0x222 Li
-run 1000
-check 0 = rootid:0x111 root
-check 1 = rootid:0x111 F:10
+++ /dev/null
-# This test file approximates the following test from "Bridge
-# Functions Consortium Spanning Tree Protocol Operations Test Suite
-# Version 2.3":
-# Test STP.op.3.3: Root Bridge Selection: Bridge ID Values
-bridge 0 0x333^0x6000 = a
-bridge 1 0x222^0x7000 = b
-bridge 2 0x111 = a b
-run 1000
-check 0 = rootid:0x333^0x6000 root
-check 1 = rootid:0x333^0x6000 F:20
-check 2 = rootid:0x333^0x6000 F:10 F
+++ /dev/null
-# This test file approximates the following test from "Bridge
-# Functions Consortium Spanning Tree Protocol Operations Test Suite
-# Version 2.3":
-# Test STP.op.3.3: Root Bridge Selection: Bridge ID Values
-bridge 0 0x333^0x6000 = a
-bridge 1 0x222^0x7000 = b
-bridge 2 0x111 = a b
-run 1000
-check 0 = rootid:0x333^0x6000 root
-check 1 = rootid:0x333^0x6000 F:20
-check 2 = rootid:0x333^0x6000 F:10 F
if (!strcmp(token, "0")) {
lan = NULL;
- } else if (strlen(token) == 1 && islower(*token)) {
+ } else if (strlen(token) == 1
+ && islower((unsigned char)*token)) {
lan = tc->lans[*token - 'a'];
} else {
err("%s is not a valid LAN name "
+++ /dev/null
-#! /bin/sh
-
-# Copyright (c) 2008 Nicira Networks.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at:
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-set -e
-progress=
-for d in ${stp_files}; do
- echo "Testing $d..."
- $SUPERVISOR ./tests/test-stp ${srcdir}/$d
-done
--- /dev/null
+/*
+ * Copyright (c) 2009 Nicira Networks.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <config.h>
+#include "vconn.h"
+#include <errno.h>
+#include <inttypes.h>
+#include <signal.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include "poll-loop.h"
+#include "socket-util.h"
+#include "timeval.h"
+#include "util.h"
+#include "vlog.h"
+
+#undef NDEBUG
+#include <assert.h>
+
+struct fake_pvconn {
+ const char *type;
+ char *pvconn_name;
+ char *vconn_name;
+ int fd;
+};
+
+static void
+fpv_create(const char *type, struct fake_pvconn *fpv)
+{
+ fpv->type = type;
+ if (!strcmp(type, "unix")) {
+ static int unix_count = 0;
+ char *bind_path;
+ int fd;
+
+ bind_path = xasprintf("fake-pvconn.%d", unix_count++);
+ fd = make_unix_socket(SOCK_STREAM, false, false, bind_path, NULL);
+ if (fd < 0) {
+ ovs_fatal(-fd, "%s: could not bind to Unix domain socket",
+ bind_path);
+ }
+
+ fpv->pvconn_name = xasprintf("punix:%s", bind_path);
+ fpv->vconn_name = xasprintf("unix:%s", bind_path);
+ fpv->fd = fd;
+ free(bind_path);
+ } else if (!strcmp(type, "tcp")) {
+ struct sockaddr_in sin;
+ socklen_t sin_len;
+ int fd;
+
+ /* Create TCP socket. */
+ fd = socket(PF_INET, SOCK_STREAM, 0);
+ if (fd < 0) {
+ ovs_fatal(errno, "failed to create TCP socket");
+ }
+
+ /* Bind TCP socket to localhost on any available port. */
+ sin.sin_family = AF_INET;
+ sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+ sin.sin_port = htons(0);
+ if (bind(fd, (struct sockaddr *) &sin, sizeof sin) < 0) {
+ ovs_fatal(errno, "failed to bind TCP socket");
+ }
+
+ /* Retrieve socket's port number. */
+ sin_len = sizeof sin;
+ if (getsockname(fd, (struct sockaddr *)&sin, &sin_len) < 0) {
+ ovs_fatal(errno, "failed to read TCP socket name");
+ }
+ if (sin_len != sizeof sin || sin.sin_family != AF_INET) {
+ ovs_fatal(0, "bad TCP socket name");
+ }
+
+ /* Save info. */
+ fpv->pvconn_name = xasprintf("ptcp:%"PRIu16":127.0.0.1",
+ ntohs(sin.sin_port));
+ fpv->vconn_name = xasprintf("tcp:127.0.0.1:%"PRIu16,
+ ntohs(sin.sin_port));
+ fpv->fd = fd;
+ } else {
+ abort();
+ }
+
+ /* Listen. */
+ if (listen(fpv->fd, 0) < 0) {
+ ovs_fatal(errno, "%s: listen failed", fpv->vconn_name);
+ }
+}
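The TCP branch of fpv_create() relies on a standard technique: binding to port 0 lets the kernel choose a free port, and getsockname() then reports which one it picked. A condensed sketch of just that step, assuming a POSIX sockets environment (`sketch_listen_ephemeral()` is a hypothetical name; unlike the code above, it zeroes the address structure first, a common precaution):

```c
#include <arpa/inet.h>   /* htonl(), htons(), ntohs() */
#include <assert.h>
#include <netinet/in.h>  /* struct sockaddr_in, INADDR_LOOPBACK */
#include <string.h>      /* memset() */
#include <sys/socket.h>
#include <unistd.h>      /* close() */

/* Hypothetical helper: binds a TCP socket to 127.0.0.1 on a kernel-chosen
 * port, starts listening, and stores the port (host byte order) in
 * '*portp'.  Returns the listening fd, or -1 on failure. */
static int
sketch_listen_ephemeral(unsigned short *portp)
{
    struct sockaddr_in sin;
    socklen_t sin_len = sizeof sin;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0) {
        return -1;
    }
    memset(&sin, 0, sizeof sin);            /* also clears sin_zero */
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sin.sin_port = htons(0);                /* port 0: kernel picks one */
    if (bind(fd, (struct sockaddr *) &sin, sizeof sin) < 0
        || getsockname(fd, (struct sockaddr *) &sin, &sin_len) < 0
        || sin_len != sizeof sin
        || listen(fd, 1) < 0) {
        close(fd);
        return -1;
    }
    *portp = ntohs(sin.sin_port);
    assert(*portp != 0);                    /* kernel must have assigned one */
    return fd;
}
```

The returned port can then be formatted into a "tcp:127.0.0.1:PORT" name, much as fpv_create() does with xasprintf().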
+
+static int
+fpv_accept(struct fake_pvconn *fpv)
+{
+ int fd;
+
+ fd = accept(fpv->fd, NULL, NULL);
+ if (fd < 0) {
+ ovs_fatal(errno, "%s: accept failed", fpv->pvconn_name);
+ }
+ return fd;
+}
+
+static void
+fpv_close(struct fake_pvconn *fpv)
+{
+ if (fpv->fd >= 0) {
+ if (close(fpv->fd) < 0) {
+ ovs_fatal(errno, "failed to close %s fake pvconn", fpv->type);
+ }
+ fpv->fd = -1;
+ }
+}
+
+static void
+fpv_destroy(struct fake_pvconn *fpv)
+{
+ fpv_close(fpv);
+ free(fpv->pvconn_name);
+ free(fpv->vconn_name);
+}
+
+/* Connects to a fake_pvconn with vconn_open(), then closes the listener and
+ * verifies that vconn_connect() reports 'expected_error'. */
+static void
+test_refuse_connection(const char *type, int expected_error)
+{
+ struct fake_pvconn fpv;
+ struct vconn *vconn;
+
+ fpv_create(type, &fpv);
+ assert(!vconn_open(fpv.vconn_name, OFP_VERSION, &vconn));
+ fpv_close(&fpv);
+ assert(vconn_connect(vconn) == expected_error);
+ vconn_close(vconn);
+ fpv_destroy(&fpv);
+}
+
+/* Connects to a fake_pvconn with vconn_open(), accepts that connection and
+ * closes it immediately, and verifies that vconn_connect() reports
+ * 'expected_error'. */
+static void
+test_accept_then_close(const char *type, int expected_error)
+{
+ struct fake_pvconn fpv;
+ struct vconn *vconn;
+
+ fpv_create(type, &fpv);
+ assert(!vconn_open(fpv.vconn_name, OFP_VERSION, &vconn));
+ close(fpv_accept(&fpv));
+ fpv_close(&fpv);
+ assert(vconn_connect(vconn) == expected_error);
+ vconn_close(vconn);
+ fpv_destroy(&fpv);
+}
+
+/* Connects to a fake_pvconn with vconn_open(), accepts that connection and
+ * reads the hello message from it, then closes the connection and verifies
+ * that vconn_connect() reports 'expected_error'. */
+static void
+test_read_hello(const char *type, int expected_error)
+{
+ struct fake_pvconn fpv;
+ struct vconn *vconn;
+ int fd;
+
+ fpv_create(type, &fpv);
+ assert(!vconn_open(fpv.vconn_name, OFP_VERSION, &vconn));
+ fd = fpv_accept(&fpv);
+ fpv_destroy(&fpv);
+ assert(!set_nonblocking(fd));
+ for (;;) {
+ struct ofp_header hello;
+ int retval;
+
+ retval = read(fd, &hello, sizeof hello);
+ if (retval == sizeof hello) {
+ assert(hello.version == OFP_VERSION);
+ assert(hello.type == OFPT_HELLO);
+ assert(hello.length == htons(sizeof hello));
+ break;
+ } else {
+ assert(errno == EAGAIN);
+ }
+
+ assert(vconn_connect(vconn) == EAGAIN);
+ vconn_connect_wait(vconn);
+ poll_fd_wait(fd, POLLIN);
+ poll_block();
+ }
+ close(fd);
+ assert(vconn_connect(vconn) == expected_error);
+ vconn_close(vconn);
+}
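test_read_hello() puts the accepted socket into nonblocking mode so that a read() with no data pending fails with EAGAIN instead of blocking, which is what its `assert(errno == EAGAIN)` branch expects. A small sketch of that behavior on a pipe, assuming set_nonblocking() presumably amounts to setting O_NONBLOCK with fcntl() (`sketch_set_nonblocking()` and `sketch_empty_read_errno()` are hypothetical stand-ins, not the socket-util API):

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical stand-in for set_nonblocking(): turns on O_NONBLOCK for
 * 'fd'.  Returns 0 on success, otherwise an errno value. */
static int
sketch_set_nonblocking(int fd)
{
    int flags;

    assert(fd >= 0);
    flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) {
        return errno;
    }
    return 0;
}

/* Makes the read end of a fresh, empty pipe nonblocking and reads from it.
 * Returns the errno from the failed read() (expected: EAGAIN, which equals
 * EWOULDBLOCK on Linux), 0 if read() unexpectedly did not fail, or -1 if
 * the pipe could not be set up. */
static int
sketch_empty_read_errno(void)
{
    int fds[2];
    char c;
    ssize_t n;
    int saved;

    if (pipe(fds) != 0) {
        return -1;
    }
    if (sketch_set_nonblocking(fds[0]) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    n = read(fds[0], &c, 1);        /* nothing has been written yet */
    saved = n < 0 ? errno : 0;      /* capture before close() can clobber */
    close(fds[0]);
    close(fds[1]);
    return saved;
}
```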
+
+/* Connects to a fake_pvconn with vconn_open(), accepts that connection and
+ * sends the 'out_size' bytes in 'out' to it (presumably an OFPT_HELLO
+ * message), then verifies that vconn_connect() reports
+ * 'expect_connect_error'. */
+static void
+test_send_hello(const char *type, const void *out, size_t out_size,
+ int expect_connect_error)
+{
+ struct fake_pvconn fpv;
+ struct vconn *vconn;
+ bool read_hello, connected;
+ struct ofpbuf *msg;
+ int fd;
+
+ fpv_create(type, &fpv);
+ assert(!vconn_open(fpv.vconn_name, OFP_VERSION, &vconn));
+ fd = fpv_accept(&fpv);
+ fpv_destroy(&fpv);
+
+ assert(write(fd, out, out_size) == (ssize_t) out_size);
+
+ assert(!set_nonblocking(fd));
+
+ read_hello = connected = false;
+ for (;;) {
+ if (!read_hello) {
+ struct ofp_header hello;
+ int retval = read(fd, &hello, sizeof hello);
+ if (retval == sizeof hello) {
+ assert(hello.version == OFP_VERSION);
+ assert(hello.type == OFPT_HELLO);
+ assert(hello.length == htons(sizeof hello));
+ read_hello = true;
+ } else {
+ assert(errno == EAGAIN);
+ }
+ }
+
+ if (!connected) {
+ int error = vconn_connect(vconn);
+ if (error == expect_connect_error) {
+ if (!error) {
+ connected = true;
+ } else {
+ close(fd);
+ vconn_close(vconn);
+ return;
+ }
+ } else {
+ assert(error == EAGAIN);
+ }
+ }
+
+ if (read_hello && connected) {
+ break;
+ }
+
+ if (!connected) {
+ vconn_connect_wait(vconn);
+ }
+ if (!read_hello) {
+ poll_fd_wait(fd, POLLIN);
+ }
+ poll_block();
+ }
+ close(fd);
+ assert(vconn_recv(vconn, &msg) == EOF);
+ vconn_close(vconn);
+}
+
+/* Try connecting and sending a normal hello, which should succeed. */
+static void
+test_send_plain_hello(const char *type)
+{
+ struct ofp_header hello;
+
+ hello.version = OFP_VERSION;
+ hello.type = OFPT_HELLO;
+ hello.length = htons(sizeof hello);
+ hello.xid = htonl(0x12345678);
+ test_send_hello(type, &hello, sizeof hello, 0);
+}
+
+/* Try connecting and sending an extra-long hello, which should succeed (since
+ * the specification says that implementations must accept and ignore extra
+ * data). */
+static void
+test_send_long_hello(const char *type)
+{
+ struct ofp_header hello;
+ char buffer[sizeof hello * 2];
+
+ hello.version = OFP_VERSION;
+ hello.type = OFPT_HELLO;
+ hello.length = htons(sizeof buffer);
+ hello.xid = htonl(0x12345678);
+ memset(buffer, 0, sizeof buffer);
+ memcpy(buffer, &hello, sizeof hello);
+ test_send_hello(type, buffer, sizeof buffer, 0);
+}
+
+/* Try connecting and sending an echo request instead of a hello, which should
+ * fail with EPROTO. */
+static void
+test_send_echo_hello(const char *type)
+{
+ struct ofp_header echo;
+
+ echo.version = OFP_VERSION;
+ echo.type = OFPT_ECHO_REQUEST;
+ echo.length = htons(sizeof echo);
+ echo.xid = htonl(0x89abcdef);
+ test_send_hello(type, &echo, sizeof echo, EPROTO);
+}
+
+/* Try connecting and sending an all-zero hello header, whose length field is 0,
+ * which should fail with EPROTO. */
+static void
+test_send_short_hello(const char *type)
+{
+ struct ofp_header hello;
+
+ memset(&hello, 0, sizeof hello);
+ test_send_hello(type, &hello, sizeof hello, EPROTO);
+}
+
+/* Try connecting and sending a hello packet that has a bad version, which
+ * should fail with EPROTO. */
+static void
+test_send_invalid_version_hello(const char *type)
+{
+ struct ofp_header hello;
+
+ hello.version = OFP_VERSION - 1;
+ hello.type = OFPT_HELLO;
+ hello.length = htons(sizeof hello);
+ hello.xid = htonl(0x12345678);
+ test_send_hello(type, &hello, sizeof hello, EPROTO);
+}
+
+int
+main(int argc UNUSED, char *argv[])
+{
+ set_program_name(argv[0]);
+ time_init();
+ vlog_init();
+ signal(SIGPIPE, SIG_IGN);
+ vlog_set_levels(VLM_ANY_MODULE, VLF_ANY_FACILITY, VLL_EMER);
+
+ time_alarm(10);
+
+ test_refuse_connection("unix", EPIPE);
+ test_refuse_connection("tcp", ECONNRESET);
+
+ test_accept_then_close("unix", EPIPE);
+ test_accept_then_close("tcp", ECONNRESET);
+
+ test_read_hello("unix", ECONNRESET);
+ test_read_hello("tcp", ECONNRESET);
+
+ test_send_plain_hello("unix");
+ test_send_plain_hello("tcp");
+
+ test_send_long_hello("unix");
+ test_send_long_hello("tcp");
+
+ test_send_echo_hello("unix");
+ test_send_echo_hello("tcp");
+
+ test_send_short_hello("unix");
+ test_send_short_hello("tcp");
+
+ test_send_invalid_version_hello("unix");
+ test_send_invalid_version_hello("tcp");
+
+ return 0;
+}
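The hello tests above reduce to a few header checks: a non-hello type, a length field smaller than the 8-byte header, or a version below the one the switch speaks is rejected with EPROTO, while trailing bytes after the header are accepted and ignored. The following is a hedged sketch of such a check, not the vconn implementation. `SKETCH_OFP_VERSION` is a placeholder (0x01 is OpenFlow 1.0's wire value; this tree's OFP_VERSION may differ), and accepting a higher peer version assumes OpenFlow's rule of negotiating down to the smaller of the two versions:

```c
#include <arpa/inet.h>  /* htonl(), htons(), ntohs() */
#include <assert.h>
#include <errno.h>      /* EAGAIN, EPROTO */
#include <stddef.h>
#include <stdint.h>
#include <string.h>     /* memcpy(), memset() */

/* Assumed constants; see the lead-in. */
#define SKETCH_OFP_VERSION 0x01
#define SKETCH_OFPT_HELLO 0
#define SKETCH_OFPT_ECHO_REQUEST 2

/* Same 8-byte layout as struct ofp_header. */
struct sketch_ofp_header {
    uint8_t version;
    uint8_t type;
    uint16_t length;    /* network byte order, includes this header */
    uint32_t xid;       /* network byte order */
};

/* Hypothetical check mirroring what the tests expect from vconn_connect():
 * 0 for an acceptable hello, EAGAIN if more bytes are needed, EPROTO for a
 * malformed or unacceptable message. */
static int
sketch_check_hello(const void *buf, size_t n)
{
    struct sketch_ofp_header oh;
    size_t length;

    assert(sizeof oh == 8);
    if (n < sizeof oh) {
        return EAGAIN;
    }
    memcpy(&oh, buf, sizeof oh);    /* avoids unaligned access */
    length = ntohs(oh.length);
    if (oh.type != SKETCH_OFPT_HELLO
        || length < sizeof oh
        || oh.version < SKETCH_OFP_VERSION) {
        return EPROTO;
    }
    /* Bytes past the header are allowed and ignored once all have arrived. */
    return n < length ? EAGAIN : 0;
}
```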
--- /dev/null
+AT_INIT
+
+AT_COPYRIGHT([Copyright (c) 2009 Nicira Networks.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at:
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.])
+
+AT_TESTED([ovs-vswitchd])
+AT_TESTED([ovs-vsctl])
+
+m4_include([tests/lcov-pre.at])
+m4_include([tests/library.at])
+m4_include([tests/stp.at])
+m4_include([tests/ovs-vsctl.at])
+m4_include([tests/lcov-post.at])
/ovs-kill.8
/ovs-ofctl
/ovs-ofctl.8
+/ovs-openflowd
+/ovs-openflowd.8
/ovs-parse-leaks
/ovs-pki
/ovs-pki-cgi
utilities/ovs-dpctl \
utilities/ovs-kill \
utilities/ovs-ofctl \
+ utilities/ovs-openflowd \
utilities/ovs-wdt
noinst_PROGRAMS += utilities/nlmon
-bin_SCRIPTS += utilities/ovs-pki
+bin_SCRIPTS += utilities/ovs-pki utilities/ovs-vsctl
noinst_SCRIPTS += utilities/ovs-pki-cgi utilities/ovs-parse-leaks
dist_sbin_SCRIPTS += utilities/ovs-monitor
utilities/ovs-dpctl.8.in \
utilities/ovs-kill.8.in \
utilities/ovs-ofctl.8.in \
+ utilities/ovs-openflowd.8.in \
utilities/ovs-parse-leaks.in \
utilities/ovs-pki-cgi.in \
utilities/ovs-pki.8.in \
- utilities/ovs-pki.in
+ utilities/ovs-pki.in \
+ utilities/ovs-vsctl.8.in \
+ utilities/ovs-vsctl.in
DISTCLEANFILES += \
utilities/ovs-appctl.8 \
utilities/ovs-cfg-mod.8 \
utilities/ovs-dpctl.8 \
utilities/ovs-kill.8 \
utilities/ovs-ofctl.8 \
+ utilities/ovs-openflowd.8 \
utilities/ovs-parse-leaks \
utilities/ovs-pki \
+ utilities/ovs-pki-cgi \
utilities/ovs-pki.8 \
- utilities/ovs-pki-cgi
+ utilities/ovs-vsctl \
+ utilities/ovs-vsctl.8
man_MANS += \
utilities/ovs-appctl.8 \
utilities/ovs-dpctl.8 \
utilities/ovs-kill.8 \
utilities/ovs-ofctl.8 \
- utilities/ovs-pki.8
+ utilities/ovs-openflowd.8 \
+ utilities/ovs-pki.8 \
+ utilities/ovs-vsctl.8
utilities_ovs_appctl_SOURCES = utilities/ovs-appctl.c
utilities_ovs_appctl_LDADD = lib/libopenvswitch.a
utilities_ovs_ofctl_SOURCES = utilities/ovs-ofctl.c
utilities_ovs_ofctl_LDADD = lib/libopenvswitch.a $(FAULT_LIBS) $(SSL_LIBS)
+utilities_ovs_openflowd_SOURCES = utilities/ovs-openflowd.c
+utilities_ovs_openflowd_LDADD = \
+ ofproto/libofproto.a \
+ lib/libopenvswitch.a \
+ $(FAULT_LIBS) \
+ $(SSL_LIBS)
+
utilities_ovs_wdt_SOURCES = utilities/ovs-wdt.c
utilities_nlmon_SOURCES = utilities/nlmon.c
. ns
. IP "\\$1"
..
-.TH ovs\-appctl 8 "April 2009" "Open vSwitch" "Open vSwitch Manual"
+.TH ovs\-appctl 8 "November 2009" "Open vSwitch" "Open vSwitch Manual"
.ds PN ovs\-appctl
.SH NAME
ovs\-appctl \- utility for configuring running Open vSwitch daemons
.SH SYNOPSIS
-\fBovs\-appctl\fR [\fB-h\fR | \fB--help\fR] [\fItarget\fR...] [\fIaction\fR...]
-.sp 1
-The available \fItarget\fR options are:
+\fBovs\-appctl\fR [\fB--target=\fItarget\fR | \fB-t\fR \fItarget\fR]
+\fIcommand \fR[\fIarg\fR...]
.br
-[\fB-t\fR \fIsocket\fR | \fB--target=\fIsocket\fR]
-.sp 1
-The available \fIaction\fR options are:
+\fBovs\-appctl\fR --help
.br
-[\fB-l\fR | \fB--list\fR] [\fB-s\fR
-\fImodule\fR[\fB:\fIfacility\fR[\fB:\fIlevel\fR]] |
-\fB--set=\fImodule\fR[\fB:\fIfacility\fR[\fB:\fIlevel\fR]]]
-[\fB-r\fR | \fB--reopen\fR]
-[\fB-e\fR | \fB--execute=\fIcommand\fR]
-
+\fBovs\-appctl\fR --version
.SH DESCRIPTION
-The \fBovs\-appctl\fR program connects to one or more running
-Open vSwitch daemons (such as \fBovs\-vswitchd\fR(8)), as specified by the
-user, and sends them commands to query or modify their behavior.
-Its primary purpose is currently to adjust daemons' logging levels.
-
-\fBovs\-appctl\fR applies one or more actions to each of one or more
-target processes. Targets may be specified using:
-
-.IP "\fB-t \fIsocket\fR"
-.IQ "\fB--target=\fIsocket\fR"
-The specified \fIsocket\fR must be the name of a Unix domain socket
-for a \fBovs\-appctl\fR-controllable process. If \fIsocket\fR does not
-begin with \fB/\fR, it is treated as relative to \fB@RUNDIR@\fR.
-
-Each Open vSwitch daemon by default creates a socket named
-\fB@RUNDIR@/\fIprogram\fB.\fIpid\fB.ctl\fR, where \fIprogram\fR is
-the program's name (such as \fBovs\-vswitchd\fR) and \fIpid\fR is the
-daemon's PID.
-
-.PP
-The available actions are:
-
-.IP "\fB-l\fR"
-.IQ "\fB--list\fR"
-Print the list of known logging modules and their current levels to
-stdout.
-
-.IP "\fB-s\fR \fImodule\fR[\fB:\fIfacility\fR[\fB:\fIlevel\fR]]"
-.IQ "\fB--set=\fImodule\fR[\fB:\fIfacility\fR[\fB:\fIlevel\fR]]"
-
+Open vSwitch daemons accept certain commands at runtime to control
+their behavior and query their settings. Every daemon accepts the
+commands for querying and adjusting its logging settings documented
+under \fBLOGGING COMMANDS\fR below, and \fBovs\-vswitchd\fR in
+particular accepts a number of additional commands documented in
+\fBovs\-vswitchd\fR(8).
+
+The \fBovs\-appctl\fR program provides a simple way to invoke these
+commands. The command to be sent is specified on \fBovs\-appctl\fR's
+command line as non-option arguments. \fBovs\-appctl\fR sends the
+command and prints the daemon's response on standard output.
+
+In normal use only a single option is accepted:
+
+.IP "\fB\-t \fItarget\fR"
+.IQ "\fB\-\-target=\fItarget\fR"
+Tells \fBovs\-appctl\fR which daemon to contact.
+.IP
+If \fItarget\fR begins with \fB/\fR it must name a Unix domain socket
+on which an Open vSwitch daemon is listening for control channel
+connections. By default, each daemon listens on a Unix domain socket
+named \fB@RUNDIR@/\fIprogram\fB.\fIpid\fB.ctl\fR, where \fIprogram\fR
+is the program's name and \fIpid\fR is its process ID. For example,
+if \fBovs\-vswitchd\fR has PID 123, it would listen on
+\fB@RUNDIR@/ovs-vswitchd.123.ctl\fR.
+.IP
+Otherwise, \fBovs\-appctl\fR looks for a pidfile, that is, a file
+whose contents are the process ID of a running process as a decimal
+number, named \fB@RUNDIR@/\fItarget\fB.pid\fR. (The \fB\-\-pidfile\fR
+option makes an Open vSwitch daemon create a pidfile.)
+\fBovs\-appctl\fR reads the pidfile, then looks for a Unix socket
+named \fB@RUNDIR@/\fItarget\fB.\fIpid\fB.ctl\fR, where \fIpid\fR is
+replaced by the process ID read from the pidfile, and uses that file
+as if it had been specified directly as the target.
+.IP
+The default target is \fBovs\-vswitchd\fR.
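The target lookup described above can be sketched in C as follows. This is an illustration under assumptions, not ovs\-appctl's actual connect_to_target(): `rundir` stands in for @RUNDIR@, `sketch_resolve_target()` is a hypothetical name, and the sketch does not check that the process named in the pidfile is still running.

```c
#include <assert.h>
#include <stdio.h>    /* snprintf(), fopen(), fscanf(), fclose() */
#include <string.h>

/* Hypothetical sketch: writes the control socket path for 'target' into
 * 'out' and returns 0, or returns -1 if the pidfile cannot be read or a
 * path does not fit. */
static int
sketch_resolve_target(const char *rundir, const char *target,
                      char *out, size_t out_size)
{
    char pidfile[4096];
    FILE *stream;
    long pid;
    int n;

    assert(target != NULL && out_size > 0);
    if (target[0] == '/') {
        /* Absolute name: use it directly as the Unix domain socket. */
        n = snprintf(out, out_size, "%s", target);
        return n >= 0 && (size_t) n < out_size ? 0 : -1;
    }

    /* Otherwise read the decimal PID from RUNDIR/TARGET.pid... */
    n = snprintf(pidfile, sizeof pidfile, "%s/%s.pid", rundir, target);
    if (n < 0 || (size_t) n >= sizeof pidfile) {
        return -1;
    }
    stream = fopen(pidfile, "r");
    if (!stream) {
        return -1;
    }
    n = fscanf(stream, "%ld", &pid);
    fclose(stream);
    if (n != 1 || pid <= 0) {
        return -1;
    }

    /* ...and form RUNDIR/TARGET.PID.ctl from it. */
    n = snprintf(out, out_size, "%s/%s.%ld.ctl", rundir, target, pid);
    return n >= 0 && (size_t) n < out_size ? 0 : -1;
}
```

For example, with a pidfile `ovs\-vswitchd.pid` containing 123, the target `ovs\-vswitchd` resolves to `RUNDIR/ovs\-vswitchd.123.ctl`, matching the example given above.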
+.
+.SH LOGGING COMMANDS
+Every Open vSwitch daemon supports the following commands for
+examining and adjusting log levels.
+.
+.IP "\fBvlog/list\fR"
+Lists the known logging modules and their current levels.
+.
+.IP "\fBvlog/set\fR \fImodule\fR[\fB:\fIfacility\fR[\fB:\fIlevel\fR]]"
Sets the logging level for \fImodule\fR in \fIfacility\fR to
\fIlevel\fR. The \fImodule\fR may be any valid module name (as
 displayed by the \fBvlog/list\fR command) or the special name \fBANY\fR to
\fBemer\fR, \fBerr\fR, \fBwarn\fR, \fBinfo\fR, or \fBdbg\fR, designating the
minimum severity of a message for it to be logged. If it is omitted,
\fIlevel\fR defaults to \fBdbg\fR.
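A hypothetical sketch of splitting the \fBvlog/set\fR argument and applying the defaults described above (facility ANY, level dbg). It does not validate names against the known modules, facilities, or levels, and it deliberately does not handle the PATTERN form, whose pattern may itself contain colons.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical parse of "module[:facility[:level]]".  Each output buffer
 * must hold 'size' bytes.  Returns 0 on success, or -1 if there are more
 * than three fields or a field is empty or too long. */
static int
sketch_parse_vlog_spec(const char *spec, char *module, char *facility,
                       char *level, size_t size)
{
    const char *fields[3] = { NULL, "ANY", "dbg" };   /* defaults */
    size_t lens[3] = { 0, 3, 3 };
    char *outs[3] = { module, facility, level };
    const char *p = spec;
    int i;

    assert(spec != NULL && size > 0);
    for (i = 0; i < 3 && p; i++) {
        const char *colon = strchr(p, ':');
        fields[i] = p;
        lens[i] = colon ? (size_t) (colon - p) : strlen(p);
        p = colon ? colon + 1 : NULL;
    }
    if (p) {
        return -1;               /* a fourth field is left over */
    }
    for (i = 0; i < 3; i++) {
        if (lens[i] == 0 || lens[i] >= size) {
            return -1;
        }
        memcpy(outs[i], fields[i], lens[i]);
        outs[i][lens[i]] = '\0';
    }
    return 0;
}
```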
-
-.IP "\fB-s PATTERN:\fIfacility\fB:\fIpattern\fR"
-.IQ "\fB--set=PATTERN:\fIfacility\fB:\fIpattern\fR"
-
+.
+.IP "\fBvlog/set PATTERN:\fIfacility\fB:\fIpattern\fR"
Sets the log pattern for \fIfacility\fR to \fIpattern\fR. Each time a
message is logged to \fIfacility\fR, \fIpattern\fR determines the
message's formatting. Most characters in \fIpattern\fR are copied
.RS
.IP \fB%A\fR
-The name of the application logging the message, e.g. \fBsecchan\fR.
+The name of the application logging the message, e.g. \fBovs\-vswitchd\fR.
.IP \fB%c\fR
 The name of the module (as shown by \fBovs\-appctl vlog/list\fR) logging
The default pattern for console output is \fB%d{%b %d
%H:%M:%S}|%05N|%c|%p|%m\fR; for syslog output, \fB%05N|%c|%p|%m\fR.
-.IP \fB-r\fR
-.IQ \fB--reopen\fR
-Causes the target application to close and reopen its log file. (This
+.IP "\fBvlog/reopen\fR"
+Causes the daemon to close and reopen its log file. (This
is useful after rotating log files, to cause a new log file to be
used.)
This has no effect if the target application was not invoked with the
\fB--log-file\fR option.
-.IP "\fB-e \fIcommand\fR"
-.IQ "\fB--execute=\fIcommand\fR"
-Passes the specified \fIcommand\fR literally to the target application
-and prints its response to stdout, if successful, or to stderr if an
-error occurs. Use \fB-e help\fR to print a list of available commands.
-
.SH OPTIONS
.so lib/common.man
+.SH BUGS
+
+The protocol used to speak to Open vSwitch daemons does not contain a
+quoting mechanism, so command arguments should not generally contain
+white space.
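To see why the lack of quoting matters, consider how the request is built: the non-option arguments are joined with single spaces, as ovs\-appctl's main() does with ds_put_char() and ds_put_cstr(). The sketch below does the same with plain malloc() and strcat() (`sketch_join_args()` is a hypothetical name); two different argument vectors produce the identical request string, so the daemon cannot tell them apart.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Joins argv[first..argc-1] with single spaces.  Returns a malloc()'d
 * string that the caller must free, or NULL on allocation failure. */
static char *
sketch_join_args(int argc, char *argv[], int first)
{
    size_t len = 1;              /* terminating null byte */
    char *s;
    int i;

    assert(first >= 0 && first <= argc);
    for (i = first; i < argc; i++) {
        len += strlen(argv[i]) + 1;   /* argument plus separator */
    }
    s = malloc(len);
    if (!s) {
        return NULL;
    }
    s[0] = '\0';
    for (i = first; i < argc; i++) {
        if (i != first) {
            strcat(s, " ");
        }
        strcat(s, argv[i]);
    }
    return s;
}
```

For instance, `vlog/set "PATTERN:console:%p %m"` and `vlog/set PATTERN:console:%p %m` both become the request `vlog/set PATTERN:console:%p %m`.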
+
.SH "SEE ALSO"
+\fBovs\-appctl\fR can control the following daemons:
+.BR ovs\-vswitchd (8),
+.BR ovs\-openflowd (8),
.BR ovs\-controller (8),
-.BR ovs\-dpctl (8),
-.BR secchan (8)
+.BR ovs\-brcompatd (8),
+.BR ovs\-discover (8).
* See the License for the specific language governing permissions and
* limitations under the License.
*/
+
#include <config.h>
-#include "vlog.h"
-#include <dirent.h>
#include <errno.h>
#include <getopt.h>
-#include <stdarg.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "command-line.h"
-#include "compiler.h"
+#include "daemon.h"
+#include "dirs.h"
+#include "dynamic-string.h"
#include "timeval.h"
#include "unixctl.h"
#include "util.h"
-static void
-usage(char *prog_name, int exit_code)
-{
- printf("Usage: %s [TARGET] [ACTION...]\n"
- "Targets:\n"
- " -t, --target=TARGET Path to Unix domain socket\n"
- "Actions:\n"
- " -l, --list List current settings\n"
- " -s, --set=MODULE[:FACILITY[:LEVEL]]\n"
- " Set MODULE and FACILITY log level to LEVEL\n"
- " MODULE may be any valid module name or 'ANY'\n"
- " FACILITY may be 'syslog', 'console', 'file', or 'ANY' (default)\n"
- " LEVEL may be 'emer', 'err', 'warn', 'info', or 'dbg' (default)\n"
- " -r, --reopen Make the program reopen its log file\n"
- " -e, --execute=COMMAND Execute control COMMAND and print its output\n"
- "Other options:\n"
- " -h, --help Print this helpful information\n"
- " -V, --version Display version information\n",
- prog_name);
- exit(exit_code);
-}
+static void usage(void);
+static const char *parse_command_line(int argc, char *argv[]);
+static struct unixctl_client *connect_to_target(const char *target);
-static char *
-transact(struct unixctl_client *client, const char *request, bool *ok)
+int
+main(int argc, char *argv[])
{
- int code;
+ struct unixctl_client *client;
+ const char *target;
+ struct ds request;
+ int code, error;
char *reply;
- int error = unixctl_client_transact(client, request, &code, &reply);
- if (error) {
- fprintf(stderr, "%s: transaction error: %s\n",
- unixctl_client_target(client), strerror(error));
- *ok = false;
- return xstrdup("");
- } else {
- if (code / 100 != 2) {
- fprintf(stderr, "%s: server returned reply code %03d\n",
- unixctl_client_target(client), code);
+ int i;
+
+ set_program_name(argv[0]);
+ time_init();
+
+ /* Parse command line and connect to target. */
+ target = parse_command_line(argc, argv);
+ client = connect_to_target(target);
+
+ /* Compose request. */
+ ds_init(&request);
+ for (i = optind; i < argc; i++) {
+ if (i != optind) {
+ ds_put_char(&request, ' ');
}
- return reply;
+ ds_put_cstr(&request, argv[i]);
}
-}
-static void
-transact_ack(struct unixctl_client *client, const char *request, bool *ok)
-{
- free(transact(client, request, ok));
-}
-
-static void
-execute_command(struct unixctl_client *client, const char *request, bool *ok)
-{
- int code;
- char *reply;
- int error = unixctl_client_transact(client, request, &code, &reply);
+ /* Transact request and process reply. */
+ error = unixctl_client_transact(client, ds_cstr(&request), &code, &reply);
if (error) {
- fprintf(stderr, "%s: transaction error: %s\n",
- unixctl_client_target(client), strerror(error));
- *ok = false;
- } else {
- if (code / 100 != 2) {
- fprintf(stderr, "%s: server returned reply code %03d\n",
- unixctl_client_target(client), code);
- fputs(reply, stderr);
- *ok = false;
- } else {
- fputs(reply, stdout);
- }
+ ovs_fatal(error, "%s: transaction error", target);
+ }
+ if (code / 100 != 2) {
+ ovs_error(0, "%s: server returned reply code %03d", target, code);
+ fputs(reply, stderr);
+ exit(2);
}
+ fputs(reply, stdout);
+
+ return 0;
}
static void
-add_target(struct unixctl_client ***clients, size_t *n_clients,
- const char *path, bool *ok)
+usage(void)
{
- struct unixctl_client *client;
- int error = unixctl_client_create(path, &client);
- if (error) {
- fprintf(stderr, "Error connecting to \"%s\": %s\n",
- path, strerror(error));
- *ok = false;
- } else {
- *clients = xrealloc(*clients, sizeof *clients * (*n_clients + 1));
- (*clients)[*n_clients] = client;
- ++*n_clients;
- }
+ printf("%s, for querying and controlling Open vSwitch daemons\n"
+ "usage: %s [TARGET] COMMAND [ARG...]\n"
+ "Targets:\n"
+ " -t, --target=TARGET pidfile or socket to contact\n"
+ "Common commands:\n"
+ " help List commands supported by the target\n"
+ " vlog/list List current logging levels\n"
+ " vlog/set MODULE[:FACILITY[:LEVEL]]\n"
+ " Set MODULE and FACILITY log level to LEVEL\n"
+ " MODULE may be any valid module name or 'ANY'\n"
+ " FACILITY may be 'syslog', 'console', 'file', or 'ANY' (default)\n"
+ " LEVEL may be 'emer', 'err', 'warn', 'info', or 'dbg' (default)\n"
+ " vlog/reopen Make the program reopen its log file\n"
+ "Other options:\n"
+ " -h, --help Print this helpful information\n"
+ " -V, --version Display version information\n",
+ program_name, program_name);
+ exit(EXIT_SUCCESS);
}
-int main(int argc, char *argv[])
+static const char *
+parse_command_line(int argc, char *argv[])
{
static const struct option long_options[] = {
- /* Target options must come first. */
{"target", required_argument, NULL, 't'},
+ {"execute", no_argument, NULL, 'e'},
{"help", no_argument, NULL, 'h'},
{"version", no_argument, NULL, 'V'},
-
- /* Action options come afterward. */
- {"list", no_argument, NULL, 'l'},
- {"set", required_argument, NULL, 's'},
- {"reopen", no_argument, NULL, 'r'},
- {"execute", required_argument, NULL, 'e'},
{0, 0, 0, 0},
};
- char *short_options;
-
- /* Determine targets. */
- bool ok = true;
- int n_actions = 0;
- struct unixctl_client **clients = NULL;
- size_t n_clients = 0;
+ const char *target;
+ int e_options;
- set_program_name(argv[0]);
- time_init();
-
- short_options = long_options_to_short_options(long_options);
+ target = NULL;
+ e_options = 0;
for (;;) {
int option;
- size_t i;
- option = getopt_long(argc, argv, short_options, long_options, NULL);
+ option = getopt_long(argc, argv, "+t:hVe", long_options, NULL);
if (option == -1) {
break;
}
- if (!strchr("thV", option) && n_clients == 0) {
- ovs_fatal(0, "no targets specified (use --help for help)");
- } else {
- ++n_actions;
- }
switch (option) {
case 't':
- add_target(&clients, &n_clients, optarg, &ok);
- break;
-
- case 'l':
- for (i = 0; i < n_clients; i++) {
- struct unixctl_client *client = clients[i];
- char *reply;
-
- printf("%s:\n", unixctl_client_target(client));
- reply = transact(client, "vlog/list", &ok);
- fputs(reply, stdout);
- free(reply);
- }
- break;
-
- case 's':
- for (i = 0; i < n_clients; i++) {
- struct unixctl_client *client = clients[i];
- char *request = xasprintf("vlog/set %s", optarg);
- transact_ack(client, request, &ok);
- free(request);
- }
- break;
-
- case 'r':
- for (i = 0; i < n_clients; i++) {
- struct unixctl_client *client = clients[i];
- char *request = xstrdup("vlog/reopen");
- transact_ack(client, request, &ok);
- free(request);
+ if (target) {
+ ovs_fatal(0, "-t or --target may be specified only once");
}
+ target = optarg;
break;
case 'e':
- for (i = 0; i < n_clients; i++) {
- execute_command(clients[i], optarg, &ok);
+ /* We ignore -e for compatibility. Older versions specified the
+ * command as the argument to -e. Since the current version takes
+ * the command as non-option arguments and we say that -e has no
+ * arguments, this just works in the common case. */
+ if (e_options++) {
+ ovs_fatal(0, "-e or --execute may be specified only once");
}
break;
case 'h':
- usage(argv[0], EXIT_SUCCESS);
+ usage();
break;
case 'V':
NOT_REACHED();
}
}
- if (!n_actions) {
- fprintf(stderr,
- "warning: no actions specified (use --help for help)\n");
+
+ if (optind >= argc) {
+ ovs_fatal(0, "at least one non-option argument is required "
+ "(use --help for help)");
+ }
+
+ return target ? target : "ovs-vswitchd";
+}
+
+static struct unixctl_client *
+connect_to_target(const char *target)
+{
+ struct unixctl_client *client;
+ char *socket_name;
+ int error;
+
+ if (target[0] != '/') {
+ char *pidfile_name;
+ pid_t pid;
+
+ pidfile_name = xasprintf("%s/%s.pid", ovs_rundir, target);
+ pid = read_pidfile(pidfile_name);
+ if (pid < 0) {
+ ovs_fatal(-pid, "cannot read pidfile \"%s\"", pidfile_name);
+ }
+ free(pidfile_name);
+ socket_name = xasprintf("%s/%s.%ld.ctl",
+ ovs_rundir, target, (long int) pid);
+ } else {
+ socket_name = xstrdup(target);
+ }
+
+ error = unixctl_client_create(socket_name, &client);
+ if (error) {
+ ovs_fatal(error, "cannot connect to \"%s\"", socket_name);
}
- exit(ok ? 0 : 1);
+ free(socket_name);
+
+ return client;
}
+
one or more of the following OpenFlow connection methods:
.TP
-\fBpssl:\fR[\fIport\fR]
+\fBpssl:\fR[\fIport\fR][\fB:\fIip\fR]
Listens for SSL connections from remote OpenFlow switches on
\fIport\fR (default: 6633). The \fB--private-key\fR,
\fB--certificate\fR, and \fB--ca-cert\fR options are mandatory when
this form is used.
+By default, \fB\*(PN\fR listens for connections to any local IP
+address, but \fIip\fR may be specified to listen only for connections
+to the given \fIip\fR.
.TP
-\fBptcp:\fR[\fIport\fR]
+\fBptcp:\fR[\fIport\fR][\fB:\fIip\fR]
Listens for TCP connections from remote OpenFlow switches on
\fIport\fR (default: 6633).
+By default, \fB\*(PN\fR listens for connections to any local IP
+address, but \fIip\fR may be specified to listen only for connections
+to the given \fIip\fR.
.TP
\fBpunix:\fIfile\fR
confidence in the controller's identity. However, this option allows
a newly installed switch to obtain the controller CA certificate on
first boot using, e.g., the \fB--bootstrap-ca-cert\fR option to
-\fBsecchan\fR(8).
+\fBovs\-openflowd\fR(8).
.IP "\fB-n\fR, \fB--noflow\fR"
By default, \fBovs\-controller\fR sets up a flow in each OpenFlow switch
This option affects only flows set up by the OpenFlow controller. In
some configurations, the switch can set up some flows
on its own. To set the idle time for those flows, pass
-\fB--max-idle\fR to \fBsecchan\fR (on the switch).
+\fB--max-idle\fR to \fBovs\-openflowd\fR (on the switch).
This option has no effect when \fB-n\fR (or \fB--noflow\fR) is in use
(because the controller does not set up flows in that case).
.SH "SEE ALSO"
-.BR secchan (8),
+.BR ovs\-openflowd (8),
.BR ovs\-appctl (8),
.BR ovs\-dpctl (8)
reply that has the same vendor class identifier and includes a
vendor-specific option with code 1 whose contents are a string
specifying the location of the controller in the same format used on
-the \fBsecchan\fR command line (e.g. \fBssl:192.168.0.1\fR).
+the \fBovs\-openflowd\fR command line (e.g. \fBssl:192.168.0.1\fR).
When \fBovs\-discover\fR receives an acceptable response, it prints
the details of the response on \fBstdout\fR. Then, by default, it
.SH OPTIONS
.TP
\fB--accept-vconn=\fIregex\fR
-By default, \fBovs\-discover\fR accepts any controller location
-advertised over DHCP. With this option, only controllers whose names
-match POSIX extended regular expression \fIregex\fR will be accepted.
-Specifying \fBssl:.*\fR for \fIregex\fR, for example, would cause only
-SSL controller connections to be accepted.
+With this option, only controllers whose names match POSIX extended
+regular expression \fIregex\fR will be accepted. Specifying
+\fBssl:.*\fR for \fIregex\fR, for example, would cause only SSL
+controller connections to be accepted.
The \fIregex\fR is implicitly anchored at the beginning of the
controller location string, as if it begins with \fB^\fR.
+When this option is not given, the default \fIregex\fR is
+\fBtcp:.*\fR.
.TP
\fB--exit-without-bind\fR
By default, \fBovs\-discover\fR binds the network device that receives
.SH "SEE ALSO"
-.BR secchan (8),
-.BR ovs-pki (8)
+.BR ovs\-openflowd (8),
+.BR ovs\-pki (8)
/* --accept-vconn: Regular expression specifying the class of controller vconns
* that we will accept during autodiscovery. */
-static const char *accept_controller_re = ".*";
+static const char *accept_controller_re = "tcp:.*";
static regex_t accept_controller_regex;
/* --exit-without-bind: Exit after discovering the controller, without binding
Creates datapath \fIdp\fR. The name of the new datapath's local port
depends on how \fIdp\fR is specified: if it takes the form
-\fBdp\fIN\fR, the local port will be named \fBdp\fIN\fR; if \fIdp\fR
-is \fBnl:\fI, the local port will be named \fBof\fIN\fR; otherwise,
+\fBdp\fIN\fR, the local port will be named \fBdp\fIN\fR; otherwise,
the local port's name will be \fIdp\fR.
This will fail if the host already has 256 datapaths, if a network
device with the same name as the new datapath's local port already
-exists, or if \fIdp\fR is given in the form \fBdp\fIN\fR or
-\fBnl:\fIN\fR and a datapath numbered \fIN\fR already exists.
+exists, or if \fIdp\fR is given in the form \fBdp\fIN\fR
+and a datapath numbered \fIN\fR already exists.
If \fInetdev\fRs are specified, \fBovs\-dpctl\fR adds them to the datapath.
The following options are currently supported:
.RS
-.IP "\fBport=\fIportno\fR"
-Specifies \fIportno\fR (a number between 1 and 255) as the port number
-at which \fInetdev\fR will be attached. By default, \fBadd\-if\fR
-automatically selects the lowest available port number.
-
.IP "\fBinternal\fR"
Instead of attaching an existing \fInetdev\fR, creates an internal
port (analogous to the local port) with that name.
.IP "\fBdump-groups \fIdp\fR"
Prints to the console the sets of port groups maintained by datapath
\fIdp\fR. Ordinarily there are at least 2 port groups in a datapath
-that \fBsecchan\fR or \fBvswitch\fR is controlling: group 0 contains
+that \fBovs\-openflowd\fR or \fBovs\-vswitchd\fR is controlling: group
+0 contains
all ports except those disabled by STP, and group 1 contains all
-ports. Additional groups might be used in the future.
+ports. Additional or different groups might be used in the future.
This command is primarily useful for debugging Open vSwitch. OpenFlow
does not have a concept of port groups.
Adds two network devices to the new datapath.
.PP
-At this point one would ordinarily start \fBsecchan\fR(8) on
+At this point one would ordinarily start \fBovs\-openflowd\fR(8) on
\fBdp0\fR, transforming \fBdp0\fR into an OpenFlow switch. Then, when
the switch and the datapath is no longer needed:
.SH "SEE ALSO"
-.BR secchan (8),
.BR ovs\-appctl (8),
+.BR ovs\-openflowd (8),
.BR ovs\-vswitchd (8)
static void
do_add_dp(int argc UNUSED, char *argv[])
{
- struct dpif dpif;
+ struct dpif *dpif;
run(dpif_create(argv[1], &dpif), "add_dp");
- dpif_close(&dpif);
+ dpif_close(dpif);
if (argc > 2) {
do_add_if(argc, argv);
}
static void
do_del_dp(int argc UNUSED, char *argv[])
{
- struct dpif dpif;
+ struct dpif *dpif;
run(dpif_open(argv[1], &dpif), "opening datapath");
- run(dpif_delete(&dpif), "del_dp");
- dpif_close(&dpif);
+ run(dpif_delete(dpif), "del_dp");
+ dpif_close(dpif);
}
static int
qsort(*ports, *n_ports, sizeof **ports, compare_ports);
}
-static uint16_t
-get_free_port(struct dpif *dpif)
-{
- struct odp_port *ports;
- size_t n_ports;
- int port_no;
-
- query_ports(dpif, &ports, &n_ports);
- for (port_no = 0; port_no <= UINT16_MAX; port_no++) {
- size_t i;
- for (i = 0; i < n_ports; i++) {
- if (ports[i].port == port_no) {
- goto next_portno;
- }
- }
- free(ports);
- return port_no;
-
- next_portno: ;
- }
- ovs_fatal(0, "no free datapath ports");
-}
-
static void
do_add_if(int argc UNUSED, char *argv[])
{
bool failure = false;
- struct dpif dpif;
+ struct dpif *dpif;
int i;
run(dpif_open(argv[1], &dpif), "opening datapath");
for (i = 2; i < argc; i++) {
char *save_ptr = NULL;
char *devname, *suboptions;
- int port = -1;
int flags = 0;
int error;
suboptions = strtok_r(NULL, "", &save_ptr);
if (suboptions) {
enum {
- AP_PORT,
AP_INTERNAL
};
static char *options[] = {
- "port",
"internal"
};
char *value;
switch (getsubopt(&suboptions, options, &value)) {
- case AP_PORT:
- if (!value) {
- ovs_error(0, "'port' suboption requires a value");
- }
- port = atoi(value);
- break;
-
case AP_INTERNAL:
flags |= ODP_PORT_INTERNAL;
break;
}
}
}
- if (port < 0) {
- port = get_free_port(&dpif);
- }
- error = dpif_port_add(&dpif, devname, port, flags);
+ error = dpif_port_add(dpif, devname, flags, NULL);
if (error) {
- ovs_error(error, "adding %s as port %"PRIu16" of %s failed",
- devname, port, argv[1]);
+ ovs_error(error, "adding %s to %s failed", devname, argv[1]);
failure = true;
} else if (if_up(devname)) {
failure = true;
}
}
- dpif_close(&dpif);
+ dpif_close(dpif);
if (failure) {
exit(EXIT_FAILURE);
}
do_del_if(int argc UNUSED, char *argv[])
{
bool failure = false;
- struct dpif dpif;
+ struct dpif *dpif;
int i;
run(dpif_open(argv[1], &dpif), "opening datapath");
if (!name[strspn(name, "0123456789")]) {
port = atoi(name);
- } else if (!get_port_number(&dpif, name, &port)) {
+ } else if (!get_port_number(dpif, name, &port)) {
failure = true;
continue;
}
- error = dpif_port_del(&dpif, port);
+ error = dpif_port_del(dpif, port);
if (error) {
ovs_error(error, "deleting port %s from %s failed", name, argv[1]);
failure = true;
}
}
- dpif_close(&dpif);
+ dpif_close(dpif);
if (failure) {
exit(EXIT_FAILURE);
}
size_t n_ports;
size_t i;
- printf("dp%u:\n", dpif_id(dpif));
+ printf("%s:\n", dpif_name(dpif));
if (!dpif_get_dp_stats(dpif, &stats)) {
printf("\tflows: cur:%"PRIu32", soft-max:%"PRIu32", "
"hard-max:%"PRIu32"\n",
int i;
for (i = 1; i < argc; i++) {
const char *name = argv[i];
- struct dpif dpif;
+ struct dpif *dpif;
int error;
error = dpif_open(name, &dpif);
if (!error) {
- show_dpif(&dpif);
+ show_dpif(dpif);
} else {
ovs_error(error, "opening datapath %s failed", name);
failure = true;
unsigned int i;
for (i = 0; i < ODP_MAX; i++) {
char name[128];
- struct dpif dpif;
+ struct dpif *dpif;
int error;
sprintf(name, "dp%u", i);
error = dpif_open(name, &dpif);
if (!error) {
- show_dpif(&dpif);
+ show_dpif(dpif);
} else if (error != ENODEV) {
ovs_error(error, "opening datapath %s failed", name);
failure = true;
error = dp_enumerate(&all_dps);
for (i = 0; i < all_dps.n; i++) {
- struct dpif dpif;
- char dpif_name[IF_NAMESIZE];
-
- if (dpif_open(all_dps.names[i], &dpif)) {
- continue;
- }
- if (!dpif_get_name(&dpif, dpif_name, sizeof dpif_name)) {
- printf("%s\n", dpif_name);
+ struct dpif *dpif;
+ if (!dpif_open(all_dps.names[i], &dpif)) {
+ printf("%s\n", dpif_name(dpif));
+ dpif_close(dpif);
}
- dpif_close(&dpif);
}
svec_destroy(&all_dps);
do_dump_flows(int argc UNUSED, char *argv[])
{
struct odp_flow *flows;
- struct dpif dpif;
+ struct dpif *dpif;
size_t n_flows;
struct ds ds;
size_t i;
run(dpif_open(argv[1], &dpif), "opening datapath");
- run(dpif_flow_list_all(&dpif, &flows, &n_flows), "listing all flows");
+ run(dpif_flow_list_all(dpif, &flows, &n_flows), "listing all flows");
ds_init(&ds);
for (i = 0; i < n_flows; i++) {
f->actions = actions;
f->n_actions = MAX_ACTIONS;
- dpif_flow_get(&dpif, f);
+ dpif_flow_get(dpif, f);
ds_clear(&ds);
format_odp_flow(&ds, f);
printf("%s\n", ds_cstr(&ds));
}
ds_destroy(&ds);
- dpif_close(&dpif);
+ dpif_close(dpif);
}
static void
do_del_flows(int argc UNUSED, char *argv[])
{
- struct dpif dpif;
+ struct dpif *dpif;
run(dpif_open(argv[1], &dpif), "opening datapath");
- run(dpif_flow_flush(&dpif), "deleting all flows");
- dpif_close(&dpif);
+ run(dpif_flow_flush(dpif), "deleting all flows");
+ dpif_close(dpif);
}
static void
do_dump_groups(int argc UNUSED, char *argv[])
{
struct odp_stats stats;
- struct dpif dpif;
+ struct dpif *dpif;
unsigned int i;
run(dpif_open(argv[1], &dpif), "opening datapath");
- run(dpif_get_dp_stats(&dpif, &stats), "get datapath stats");
+ run(dpif_get_dp_stats(dpif, &stats), "get datapath stats");
for (i = 0; i < stats.max_groups; i++) {
- uint16_t ports[UINT16_MAX];
+ uint16_t *ports;
size_t n_ports;
- if (!dpif_port_group_get(&dpif, i, ports,
- ARRAY_SIZE(ports), &n_ports) && n_ports) {
+ if (!dpif_port_group_get(dpif, i, &ports, &n_ports) && n_ports) {
size_t j;
printf("group %u:", i);
}
printf("\n");
}
+ free(ports);
}
- dpif_close(&dpif);
+ dpif_close(dpif);
}
static void
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
-SECCHAN_PID=/var/run/secchan.pid
-SECCHAN_SOCK=/var/run/secchan.mgmt
+OPENFLOWD_PID=/var/run/ovs-openflowd.pid
+OPENFLOWD_SOCK=/var/run/ovs-openflowd.mgmt
LOG_FILE=/var/log/openflow/monitor
INTERVAL=1
FAIL_THRESH=3
echo
echo "OPTIONS:"
echo " -h Show this message"
- echo " -p PID file for secchan (default: $SECCHAN_PID)"
- echo " -s Unix socket for secchan (default: $SECCHAN_SOCK)"
+ echo " -p PID file for ovs-openflowd (default: $OPENFLOWD_PID)"
+ echo " -s Unix socket for ovs-openflowd (default: $OPENFLOWD_SOCK)"
echo " -l File to log messages (default: $LOG_FILE)"
echo " -i Interval to send probes in seconds (default: $INTERVAL)"
echo " -c Number of failed probes before reboot (default: $FAIL_THRESH)"
;;
p)
- SECCHAN_PID=$OPTARG
+ OPENFLOWD_PID=$OPTARG
;;
s)
- SECCHAN_SOCK=$OPTARG
+ OPENFLOWD_SOCK=$OPTARG
;;
l)
done
-if [ ! -f $SECCHAN_PID ]; then
- log "No secchan pid file: ${SECCHAN_PID}"
- echo "No secchan pid file: ${SECCHAN_PID}"
+if [ ! -f $OPENFLOWD_PID ]; then
+ log "No ovs-openflowd pid file: ${OPENFLOWD_PID}"
+ echo "No ovs-openflowd pid file: ${OPENFLOWD_PID}"
fi
-if [ ! -S $SECCHAN_SOCK ]; then
- log "No secchan sock file: ${SECCHAN_SOCK}"
- echo "No secchan sock file: ${SECCHAN_SOCK}"
+if [ ! -S $OPENFLOWD_SOCK ]; then
+ log "No ovs-openflowd sock file: ${OPENFLOWD_SOCK}"
+ echo "No ovs-openflowd sock file: ${OPENFLOWD_SOCK}"
fi
if [ ! -d `dirname $LOG_FILE` ]; then
fi
let DP_DOWN=0
-let SECCHAN_DOWN=0
+let OPENFLOWD_DOWN=0
log "===== Starting Monitor ===="
while `/bin/true`; do
- # Only check for liveness if the secchan's PID file exists. The PID
- # file is removed when secchan is brought down gracefully.
- if [ -f $SECCHAN_PID ]; then
- pid=`cat $SECCHAN_PID`
+ # Only check for liveness if ovs-openflowd's PID file exists. The PID
+ # file is removed when ovs-openflowd is brought down gracefully.
+ if [ -f $OPENFLOWD_PID ]; then
+ pid=`cat $OPENFLOWD_PID`
if [ -d /proc/$pid ]; then
- # Check if the secchan and datapath still can communicate
- if [ -S $SECCHAN_SOCK ]; then
- ovs-ofctl probe -t 2 unix:$SECCHAN_SOCK
+ # Check whether ovs-openflowd and the datapath can still communicate
+ if [ -S $OPENFLOWD_SOCK ]; then
+ ovs-ofctl probe -t 2 unix:$OPENFLOWD_SOCK
if [ $? -ne 0 ]; then
log "datapath probe failed"
let DP_DOWN++
let DP_DOWN=0
fi
fi
- let SECCHAN_DOWN=0
+ let OPENFLOWD_DOWN=0
else
- log "secchan probe failed"
- let SECCHAN_DOWN++
+ log "ovs-openflowd probe failed"
+ let OPENFLOWD_DOWN++
fi
fi
- if [ $SECCHAN_DOWN -ge $FAIL_THRESH ]; then
- log "Failed to probe secchan after ${SECCHAN_DOWN} tries...rebooting!"
+ if [ $OPENFLOWD_DOWN -ge $FAIL_THRESH ]; then
+ log "Failed to probe ovs-openflowd after ${OPENFLOWD_DOWN} tries...rebooting!"
reboot
fi
-.TH ovs\-ofctl 8 "March 2009" "Open vSwitch" "Open vSwitch Manual"
+.TH ovs\-ofctl 8 "June 2009" "Open vSwitch" "Open vSwitch Manual"
.ds PN ovs\-ofctl
.SH NAME
\fBmonitor \fIswitch\fR [\fImiss-len\fR [\fIsend-exp\fR]]
Connects to \fIswitch\fR and prints to the console all OpenFlow
messages received. Usually, \fIswitch\fR should specify a connection
-named on \fBsecchan\fR(8)'s \fB-l\fR or \fB--listen\fR command line
+named on \fBovs\-openflowd\fR(8)'s \fB-l\fR or \fB--listen\fR command line
option.
If \fImiss-len\fR is provided, \fBovs\-ofctl\fR sends an OpenFlow ``set
displayed by \fBovs\-ofctl show\fR.
.IP \fBdl_vlan=\fIvlan\fR
-Matches IEEE 802.1q virtual LAN tag \fIvlan\fR. Specify \fB0xffff\fR
-as \fIvlan\fR to match packets that are not tagged with a virtual LAN;
+Matches IEEE 802.1q Virtual LAN tag \fIvlan\fR. Specify \fB0xffff\fR
+as \fIvlan\fR to match packets that are not tagged with a Virtual LAN;
otherwise, specify a number between 0 and 4095, inclusive, as the
12-bit VLAN ID to match.
.IP \fBlocal\fR
Outputs the packet on the ``local port,'' which corresponds to the
\fBof\fIn\fR network device (see \fBCONTACTING THE CONTROLLER\fR in
-\fBsecchan\fR(8) for information on the \fBof\fIn\fR network device).
+\fBovs\-openflowd\fR(8) for information on the \fBof\fIn\fR network device).
.IP \fBdrop\fR
Discards the packet, so no further processing or forwarding takes place.
host has been configured to listen for management connections on a
Unix domain socket named \fB@RUNDIR@/openflow.sock\fR, e.g. by
specifying \fB--listen=punix:@RUNDIR@/openflow.sock\fR on the
-\fBsecchan\fR(8) command line.
+\fBovs\-openflowd\fR(8) command line.
.TP
\fBovs\-ofctl dump-tables unix:@RUNDIR@/openflow.sock\fR
static void
open_vconn(const char *name, struct vconn **vconnp)
{
- struct dpif dpif;
+ struct dpif *dpif;
struct stat s;
if (strstr(name, ":")) {
char *socket_name;
char *vconn_name;
- run(dpif_get_name(&dpif, dpif_name, sizeof dpif_name),
+ run(dpif_port_get_name(dpif, ODPP_LOCAL, dpif_name, sizeof dpif_name),
"obtaining name of %s", dpif_name);
- dpif_close(&dpif);
+ dpif_close(dpif);
if (strcmp(dpif_name, name)) {
VLOG_INFO("datapath %s is named %s", name, dpif_name);
}
* packet to the controller. */
if (arg && (strspn(arg, "0123456789") == strlen(arg))) {
oao->max_len = htons(str_to_u32(arg));
+ } else {
+ oao->max_len = htons(UINT16_MAX);
}
} else if (parse_port_name(act, &port)) {
put_output_action(b, port);
--- /dev/null
+.TH ovs\-openflowd 8 "March 2009" "Open vSwitch" "Open vSwitch Manual"
+.ds PN ovs\-openflowd
+
+.SH NAME
+ovs\-openflowd \- OpenFlow switch implementation
+
+.SH SYNOPSIS
+.B ovs\-openflowd
+[\fIoptions\fR] \fIdatapath\fR [\fIcontroller\fR]
+
+.SH DESCRIPTION
+The \fBovs\-openflowd\fR program implements an OpenFlow switch using a
+flow-based datapath. \fBovs\-openflowd\fR connects to an OpenFlow controller
+over TCP or SSL.
+
+The mandatory \fIdatapath\fR argument specifies the local datapath
+to relay. It takes one of the following forms:
+
+.so lib/dpif.man
+
+.PP
+The optional \fIcontroller\fR argument specifies how to connect to
+the OpenFlow controller. It takes one of the following forms:
+
+.RS
+.IP "\fBssl:\fIip\fR[\fB:\fIport\fR]"
+The specified SSL \fIport\fR (default: 6633) on the host at the given
+\fIip\fR, which must be expressed as an IP address (not a DNS name).
+The \fB--private-key\fR, \fB--certificate\fR, and \fB--ca-cert\fR
+options are mandatory when this form is used.
+
+.IP "\fBtcp:\fIip\fR[\fB:\fIport\fR]"
+The specified TCP \fIport\fR (default: 6633) on the host at the given
+\fIip\fR, which must be expressed as an IP address (not a DNS name).
+
+.IP "\fBunix:\fIfile\fR"
+The Unix domain server socket named \fIfile\fR.
+.RE
+
+.PP
+If \fIcontroller\fR is omitted, \fBovs\-openflowd\fR attempts to discover the
+location of the controller automatically (see below).
+
+.SS "Contacting the Controller"
+The OpenFlow switch must be able to contact the OpenFlow controller
+over the network. It can do so in one of two ways:
+
+.IP out-of-band
+In this configuration, OpenFlow traffic uses a network separate from
+the data traffic that it controls, that is, the switch does not use
+any of the network devices added to the datapath with \fBovs\-dpctl
+add\-if\fR in its communication with the controller.
+
+To use \fBovs\-openflowd\fR in a network with out-of-band control, specify
+\fB--out-of-band\fR on the \fBovs\-openflowd\fR command line. The control
+network must be configured separately, before or after \fBovs\-openflowd\fR
+is started.
+
+.IP in-band
+In this configuration, a single network is used for OpenFlow traffic
+and other data traffic, that is, the switch contacts the controller
+over one of the network devices added to the datapath with \fBovs\-dpctl
+add\-if\fR. This configuration is often more convenient than
+out-of-band control, because it is not necessary to maintain two
+independent networks.
+
+In-band control is the default for \fBovs\-openflowd\fR, so no special
+command-line option is required.
+
+With in-band control, the location of the controller can be configured
+manually or discovered automatically:
+
+.RS
+.IP "controller discovery"
+To make \fBovs\-openflowd\fR discover the location of the controller
+automatically, do not specify the location of the controller on the
+\fBovs\-openflowd\fR command line.
+
+In this mode, \fBovs\-openflowd\fR will broadcast a DHCP request with vendor
+class identifier \fBOpenFlow\fR across the network devices added to
+the datapath with \fBovs\-dpctl add\-if\fR. It will accept any valid DHCP
+reply that has the same vendor class identifier and includes a
+vendor-specific option with code 1 whose contents are a string
+specifying the location of the controller in the same format used on
+the \fBovs\-openflowd\fR command line (e.g. \fBssl:192.168.0.1\fR).
+
+The DHCP reply may also, optionally, include a vendor-specific option
+with code 2 whose contents are a string specifying the URI to the base
+of the OpenFlow PKI (e.g. \fBhttp://192.168.0.1/openflow/pki\fR).
+This URI is used only for bootstrapping the OpenFlow PKI at initial
+switch setup; \fBovs\-openflowd\fR does not use it at all.
+
+The following ISC DHCP server configuration file assigns the IP
+address range 192.168.0.20 through 192.168.0.30 to OpenFlow switches
+that follow the switch protocol and addresses 192.168.0.1 through
+192.168.0.10 to all other DHCP clients:
+
+default-lease-time 600;
+.br
+max-lease-time 7200;
+.br
+option space openflow;
+.br
+option openflow.controller-vconn code 1 = text;
+.br
+option openflow.pki-uri code 2 = text;
+.br
+class "OpenFlow" {
+.br
+ match if option vendor-class-identifier = "OpenFlow";
+.br
+ vendor-option-space openflow;
+.br
+ option openflow.controller-vconn "tcp:192.168.0.10";
+.br
+ option openflow.pki-uri "http://192.168.0.10/openflow/pki";
+.br
+ option vendor-class-identifier "OpenFlow";
+.br
+}
+.br
+subnet 192.168.0.0 netmask 255.255.255.0 {
+.br
+ pool {
+.br
+ allow members of "OpenFlow";
+.br
+ range 192.168.0.20 192.168.0.30;
+.br
+ }
+.br
+ pool {
+.br
+ deny members of "OpenFlow";
+.br
+ range 192.168.0.1 192.168.0.10;
+.br
+ }
+.br
+}
+.br
+
+.IP "manual configuration"
+To configure in-band control manually, specify the location of the
+controller on the \fBovs\-openflowd\fR command line as the \fIcontroller\fR
+argument. You must also configure the network device for the OpenFlow
+``local port'' to allow \fBovs\-openflowd\fR to connect to that controller.
+The OpenFlow local port is a virtual network port that \fBovs\-openflowd\fR
+bridges to the physical switch ports. The name of the local port for
+a given \fIdatapath\fR may be seen by running \fBovs\-dpctl show
+\fIdatapath\fR; the local port is listed as port 0 in \fBshow\fR's
+output.
+
+.IP
+Before \fBovs\-openflowd\fR starts, the local port network device is not
+bridged to any physical network, so the next step depends on whether
+connectivity is required to configure the device's IP address. If the
+switch has a static IP address, you may configure its IP address now
+with a command such as
+.B ifconfig of0 192.168.1.1
+and then invoke \fBovs\-openflowd\fR.
+
+On the other hand, if the switch does not have a static IP address,
+e.g. it obtains its IP address dynamically via DHCP, the DHCP client
+will not be able to contact the DHCP server until the OpenFlow switch
+has started up. Thus, start \fBovs\-openflowd\fR without configuring
+the local port network device, and start the DHCP client afterward.
+.RE
+
+.SH OPTIONS
+.SS "Controller Discovery Options"
+.TP
+\fB--accept-vconn=\fIregex\fR
+When \fBovs\-openflowd\fR performs controller discovery (see \fBContacting
+the Controller\fR, above, for more information about controller
+discovery), it validates the controller location obtained via DHCP
+with a POSIX extended regular expression. Only controllers whose
+names match the regular expression will be accepted.
+
+The default regular expression is \fBssl:.*\fR (meaning that only SSL
+controller connections will be accepted) when any of the SSL
+configuration options \fB--private-key\fR, \fB--certificate\fR, or
+\fB--ca-cert\fR is specified. The default is \fBtcp:.*\fR otherwise
+(meaning that only TCP controller connections will be accepted).
+
+The \fIregex\fR is implicitly anchored at the beginning of the
+controller location string, as if it begins with \fB^\fR.
+
+When controller discovery is not performed, this option has no effect.
+
+.TP
+\fB--no-resolv-conf\fR
+When \fBovs\-openflowd\fR performs controller discovery (see \fBContacting
+the Controller\fR, above, for more information about controller
+discovery), by default it overwrites the system's
+\fB/etc/resolv.conf\fR with domain information and DNS servers
+obtained via DHCP. If the location of the controller is specified
+using a hostname, rather than an IP address, and the network's DNS
+servers ever change, this behavior is essential. But because it also
+interferes with any administrator or process that manages
+\fB/etc/resolv.conf\fR, when this option is specified, \fBovs\-openflowd\fR
+will not modify \fB/etc/resolv.conf\fR.
+
+\fBovs\-openflowd\fR will only modify \fBresolv.conf\fR if the DHCP response
+that it receives specifies one or more DNS servers.
+
+When controller discovery is not performed, this option has no effect.
+
+.SS "Networking Options"
+.TP
+\fB--datapath-id=\fIdpid\fR
+Sets \fIdpid\fR, which must consist of exactly 12 hexadecimal digits,
+as the datapath ID that the switch will use to identify itself to the
+OpenFlow controller.
+
+If this option is omitted, the default datapath ID is taken from the
+Ethernet address of the datapath's local port (which is typically
+randomly generated).
+
+.TP
+\fB--mgmt-id=\fImgmtid\fR
+Sets \fImgmtid\fR, which must consist of exactly 12 hexadecimal
+digits, as the switch's management ID.
+
+If this option is omitted, the management ID defaults to 0, signaling
+to the controller that management is supported but not configured.
+
+.TP
+\fB--fail=\fR[\fBopen\fR|\fBclosed\fR]
+The controller is, ordinarily, responsible for setting up all flows on
+the OpenFlow switch. Thus, if the connection to the controller fails,
+no new network connections can be set up. If the connection to the
+controller stays down long enough, no packets can pass through the
+switch at all.
+
+If this option is set to \fBopen\fR (the default), \fBovs\-openflowd\fR will
+take over responsibility for setting up flows in the local datapath
+when no message has been received from the controller for three times
+the inactivity probe interval (see below), or 45 seconds by default.
+In this ``fail open'' mode, \fBovs\-openflowd\fR causes the datapath to act
+like an ordinary MAC-learning switch. \fBovs\-openflowd\fR will continue to
+retry connection to the controller in the background and, when the
+connection succeeds, it discontinues its fail-open behavior.
+
+If this option is set to \fBclosed\fR, then \fBovs\-openflowd\fR will not
+set up flows on its own when the controller connection fails.
+
+.TP
+\fB--inactivity-probe=\fIsecs\fR
+When the OpenFlow switch is connected to the controller, the
+switch waits for a message to be received from the controller for
+\fIsecs\fR seconds before it sends an inactivity probe to the
+controller. After sending the inactivity probe, if no response is
+received for an additional \fIsecs\fR seconds, the switch
+assumes that the connection has been broken and attempts to reconnect.
+The default and the minimum value are both 5 seconds.
+
+When fail-open mode is configured, changing the inactivity probe
+interval also changes the interval before entering fail-open mode (see
+above).
+
+.TP
+\fB--max-idle=\fIsecs\fR|\fBpermanent\fR
+Sets \fIsecs\fR as the number of seconds that a flow set up by the
+OpenFlow switch will remain in the switch's flow table without any
+matching packets being seen. If \fBpermanent\fR is specified, which
+is not recommended, flows set up by the switch will never
+expire. The default is 15 seconds.
+
+Most flows are set up by the OpenFlow controller, not by the
+switch. This option affects only the following flows, which the
+OpenFlow switch sets up itself:
+
+.RS
+.IP \(bu
+When \fB--fail=open\fR is specified, flows set up when the
+switch has not been able to contact the controller for the configured
+fail-open delay.
+
+.IP \(bu
+When in-band control is in use, flows set up to bootstrap contacting
+the controller (see \fBContacting the Controller\fR, above, for
+more information about in-band control).
+.RE
+
+.IP
+As a result, when both \fB--fail=closed\fR and \fB--out-of-band\fR are
+specified, this option has no effect.
+
+.TP
+\fB--max-backoff=\fIsecs\fR
+Sets the maximum time between attempts to connect to the controller to
+\fIsecs\fR, which must be at least 1. The actual interval between
+connection attempts starts at 1 second and doubles on each failing
+attempt until it reaches the maximum. The default maximum backoff
+time is 8 seconds.
+
+.TP
+\fB-l\fR, \fB--listen=\fImethod\fR
+Configures the switch to additionally listen for incoming OpenFlow
+connections for switch management with \fBovs\-ofctl\fR. The \fImethod\fR
+must be given as one of the passive OpenFlow connection methods listed
+below. This option may be specified multiple times to listen to
+multiple connection methods.
+
+.RS
+.TP
+\fBpssl:\fR[\fIport\fR][\fB:\fIip\fR]
+Listens for SSL connections on \fIport\fR (default: 6633). The
+\fB--private-key\fR, \fB--certificate\fR, and \fB--ca-cert\fR options
+are mandatory when this form is used.
+By default, \fB\*(PN\fR listens for connections to any local IP
+address, but \fIip\fR may be specified to listen only for connections
+to the given \fIip\fR.
+
+.TP
+\fBptcp:\fR[\fIport\fR][\fB:\fIip\fR]
+Listens for TCP connections on \fIport\fR (default: 6633).
+By default, \fB\*(PN\fR listens for connections to any local IP
+address, but \fIip\fR may be specified to listen only for connections
+to the given \fIip\fR.
+
+.TP
+\fBpunix:\fIfile\fR
+Listens for connections on the Unix domain server socket named \fIfile\fR.
+.RE
+
+.TP
+\fB--snoop=\fImethod\fR
+Configures the switch to additionally listen for incoming OpenFlow
+connections for controller connection snooping. The \fImethod\fR must
+be given as one of the passive OpenFlow connection methods listed
+under the \fB--listen\fR option above. This option may be specified
+multiple times to listen to multiple connection methods.
+
+If \fBovs\-ofctl monitor\fR is used to connect to a \fImethod\fR specified on
+\fB--snoop\fR, it will display all the OpenFlow messages traveling
+between the switch and its controller on the primary OpenFlow
+connection. This can be useful for debugging switch and controller
+problems.
+
+.TP
+\fB--in-band\fR, \fB--out-of-band\fR
+Configures \fBovs\-openflowd\fR to operate in in-band or out-of-band control
+mode (see \fBContacting the Controller\fR above). When neither option
+is given, the default is in-band control.
+
+.TP
+\fB--netflow=\fIip\fB:\fIport\fR
+Configures UDP \fIport\fR on the host with IP address \fIip\fR as
+a recipient of NetFlow messages for expired flows. The \fIip\fR must
+be specified numerically, not as a DNS name.
+
+This option may be specified multiple times to configure additional
+NetFlow collectors.
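Because the collector address must be numeric, a `--netflow` argument can be validated without any name resolution. This is a minimal sketch, not the parser Open vSwitch uses. `parse_netflow_target` is a hypothetical name, and it assumes an IPv4 dotted-quad address and a decimal port from 1 to 65535.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical parser for one --netflow=IP:PORT argument.  Only a numeric
 * IPv4 address is accepted (no DNS lookup is attempted). */
static bool
parse_netflow_target(const char *arg, struct in_addr *ip, uint16_t *port)
{
    char host[INET_ADDRSTRLEN];
    const char *colon;
    char *end;
    long value;

    assert(arg != NULL && ip != NULL && port != NULL);
    colon = strchr(arg, ':');
    if (!colon || (size_t) (colon - arg) >= sizeof host) {
        return false;
    }
    memcpy(host, arg, colon - arg);
    host[colon - arg] = '\0';
    if (inet_pton(AF_INET, host, ip) != 1) {
        return false;           /* Not a dotted-quad address. */
    }

    errno = 0;
    value = strtol(colon + 1, &end, 10);
    if (errno || end == colon + 1 || *end != '\0'
        || value < 1 || value > 65535) {
        return false;
    }
    *port = (uint16_t) value;
    return true;
}
```

`inet_pton` is used instead of `getaddrinfo` because it never consults DNS, which matches the documented restriction.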
+
+.SS "Rate-Limiting Options"
+
+These options configure how the switch applies a ``token bucket'' to
+limit the rate at which packets in unknown flows are forwarded to an
+OpenFlow controller for flow-setup processing. This feature prevents
+a single OpenFlow switch from overwhelming a controller.
+
+.TP
+\fB--rate-limit\fR[\fB=\fIrate\fR]
+.
+Limits the maximum rate at which packets will be forwarded to the
+OpenFlow controller to \fIrate\fR packets per second. If \fIrate\fR
+is not specified then the default of 1,000 packets per second is used.
+
+If \fB--rate-limit\fR is not used, then the switch does not limit the
+rate at which packets are forwarded to the controller.
+
+.TP
+\fB--burst-limit=\fIburst\fR
+.
+Sets to \fIburst\fR (measured in packets) the maximum number of unused
+packet credits that the switch will allow to accumulate while no
+packets are being forwarded to the OpenFlow controller. The default
+\fIburst\fR is one-quarter of the \fIrate\fR specified on
+\fB--rate-limit\fR.
+
+This option takes effect only when \fB--rate-limit\fR is also specified.
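The token bucket described in this section can be sketched as follows. This is a hypothetical illustration, not the switch's own code. Tokens are stored in thousandths so that refills at millisecond granularity do not lose fractions. The bucket is assumed to start full, and a computed default burst of 0 (rate below 4) is rounded up to 1. The sketch only reports whether a token was available. What the switch then does with a packet that finds the bucket empty is outside its scope.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical token bucket illustrating --rate-limit and --burst-limit. */
struct token_bucket {
    long long rate;         /* Packets (tokens) added per second. */
    long long burst;        /* Maximum number of accumulated tokens. */
    long long mtokens;      /* Available tokens, in thousandths. */
    long long last_ms;      /* Time of the last refill, in milliseconds. */
};

static void
tb_init(struct token_bucket *tb, int rate, int burst, long long now_ms)
{
    assert(rate >= 1);
    tb->rate = rate;
    /* Documented default burst: one-quarter of the rate (at least 1 here). */
    tb->burst = burst > 0 ? burst : (rate >= 4 ? rate / 4 : 1);
    tb->mtokens = tb->burst * 1000;     /* Assumption: start full. */
    tb->last_ms = now_ms;
}

/* Returns true if a packet may be sent to the controller at 'now_ms',
 * consuming one token; returns false if no token is available. */
static bool
tb_withdraw(struct token_bucket *tb, long long now_ms)
{
    if (now_ms > tb->last_ms) {
        /* 'rate' tokens per second is 'rate' thousandths per millisecond. */
        tb->mtokens += (now_ms - tb->last_ms) * tb->rate;
        if (tb->mtokens > tb->burst * 1000) {
            tb->mtokens = tb->burst * 1000;
        }
        tb->last_ms = now_ms;
    }
    if (tb->mtokens >= 1000) {
        tb->mtokens -= 1000;
        return true;
    }
    return false;
}
```

With `--rate-limit` alone, the rate is 1,000 packets per second and the burst is 250. An idle switch can therefore forward up to 250 packets at once, then one more per millisecond.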
+
+.SS "Remote Command Execution Options"
+
+.TP
+\fB--command-acl=\fR[\fB!\fR]\fIglob\fR[\fB,\fR[\fB!\fR]\fIglob\fR...]
+Configures the commands that remote OpenFlow connections are allowed
+to invoke using (e.g.) \fBovs\-ofctl execute\fR. The argument is a
+comma-separated sequence of shell glob patterns. A glob pattern
+specified without a leading \fB!\fR is a ``whitelist'' that specifies
+a set of commands that may be invoked, whereas a pattern that
+does begin with \fB!\fR is a ``blacklist'' that specifies commands
+that may not be invoked. To be permitted, a command name must be
+whitelisted and must not be blacklisted;
+e.g. \fB--command-acl=up*,!upgrade\fR would allow any command whose name
+begins with \fBup\fR except for the command named \fBupgrade\fR.
+Command names that include characters other than upper- and lower-case
+English letters, digits, and the underscore and hyphen characters are
+unconditionally disallowed.
+
+When the whitelist and blacklist permit a command name, \fBovs\-openflowd\fR
+looks for a program with the same name as the command in the commands
+directory (see below). Other directories are not searched.
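The whitelist/blacklist rule above can be expressed with POSIX `fnmatch`. This is a minimal sketch. `command_allowed` is a hypothetical name, not the function `ovs-openflowd` uses. It follows the documented rules: a command needs a matching whitelist glob, must match no `!` glob, and may contain only letters, digits, `_` and `-`. An empty name is also rejected, which is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical check of 'command' against a --command-acl value 'acl',
 * e.g. "up*,!upgrade". */
static bool
command_allowed(const char *acl, const char *command)
{
    static const char ok_chars[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                                   "abcdefghijklmnopqrstuvwxyz"
                                   "0123456789_-";
    bool whitelisted = false, blacklisted = false;
    char *copy, *pattern, *save_ptr = NULL;

    assert(acl != NULL && command != NULL);

    /* Names with any other characters are unconditionally disallowed. */
    if (!command[0] || command[strspn(command, ok_chars)] != '\0') {
        return false;
    }

    copy = strdup(acl);
    if (!copy) {
        return false;
    }
    for (pattern = strtok_r(copy, ",", &save_ptr); pattern != NULL;
         pattern = strtok_r(NULL, ",", &save_ptr)) {
        if (pattern[0] == '!') {
            if (fnmatch(pattern + 1, command, 0) == 0) {
                blacklisted = true;
            }
        } else if (fnmatch(pattern, command, 0) == 0) {
            whitelisted = true;
        }
    }
    free(copy);
    return whitelisted && !blacklisted;
}
```

With the example ACL `up*,!upgrade`, `update` is allowed and `upgrade` is refused. An empty ACL allows nothing, because no command can be whitelisted.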
+
+.TP
+\fB--command-dir=\fIdirectory\fR
+Sets the directory searched for remote command execution to
+\fIdirectory\fR. The default directory is
+\fB@pkgdatadir@/commands\fR.
+
+.SS "Daemon Options"
+.so lib/daemon.man
+
+.SS "Public Key Infrastructure Options"
+
+.TP
+\fB-p\fR, \fB--private-key=\fIprivkey.pem\fR
+Specifies a PEM file containing the private key used as the switch's
+identity for SSL connections to the controller.
+
+.TP
+\fB-c\fR, \fB--certificate=\fIcert.pem\fR
+Specifies a PEM file containing a certificate, signed by the
+controller's certificate authority (CA), that certifies the switch's
+private key to identify a trustworthy switch.
+
+.TP
+\fB-C\fR, \fB--ca-cert=\fIcacert.pem\fR
+Specifies a PEM file containing the CA certificate used to verify that
+the switch is connected to a trustworthy controller.
+
+.TP
+\fB--bootstrap-ca-cert=\fIcacert.pem\fR
+When \fIcacert.pem\fR exists, this option has the same effect as
+\fB-C\fR or \fB--ca-cert\fR. If it does not exist, then \fBovs\-openflowd\fR
+will attempt to obtain the CA certificate from the controller on its
+first SSL connection and save it to the named PEM file. If it is
+successful, it will immediately drop the connection and reconnect, and
+from then on all SSL connections must be authenticated by a
+certificate signed by the CA certificate thus obtained.
+
+\fBThis option exposes the SSL connection to a man-in-the-middle
+attack obtaining the initial CA certificate\fR, but it may be useful
+for bootstrapping.
+
+This option is only useful if the controller sends its CA certificate
+as part of the SSL certificate chain. The SSL protocol does not
+require the controller to send the CA certificate, but
+\fBovs\-controller\fR(8) can be configured to do so with the
+\fB--peer-ca-cert\fR option.
+
+.SS "Logging Options"
+.so lib/vlog.man
+.SS "Other Options"
+.so lib/common.man
+.so lib/leak-checker.man
+
+.SH "SEE ALSO"
+
+.BR ovs\-appctl (8),
+.BR ovs\-controller (8),
+.BR ovs\-discover (8),
+.BR ovs\-dpctl (8),
+.BR ovs\-ofctl (8),
+.BR ovs\-pki (8),
+.BR ovs\-vswitchd.conf (5)
--- /dev/null
+/*
+ * Copyright (c) 2008, 2009 Nicira Networks.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <config.h>
+#include <assert.h>
+#include <errno.h>
+#include <getopt.h>
+#include <inttypes.h>
+#include <netinet/in.h>
+#include <stdlib.h>
+#include <signal.h>
+#include <string.h>
+
+#include "command-line.h"
+#include "compiler.h"
+#include "daemon.h"
+#include "dirs.h"
+#include "dpif.h"
+#include "fault.h"
+#include "leak-checker.h"
+#include "list.h"
+#include "netdev.h"
+#include "ofpbuf.h"
+#include "ofproto/ofproto.h"
+#include "openflow/openflow.h"
+#include "packets.h"
+#include "poll-loop.h"
+#include "rconn.h"
+#include "svec.h"
+#include "timeval.h"
+#include "unixctl.h"
+#include "util.h"
+#include "vconn-ssl.h"
+#include "vconn.h"
+
+#include "vlog.h"
+#define THIS_MODULE VLM_openflowd
+
+/* Behavior when the connection to the controller fails. */
+enum fail_mode {
+ FAIL_OPEN, /* Act as learning switch. */
+ FAIL_CLOSED /* Drop all packets. */
+};
+
+/* Settings that may be configured by the user. */
+struct ofsettings {
+ /* Overall mode of operation. */
+ bool discovery; /* Discover the controller automatically? */
+ bool in_band; /* Connect to controller in-band? */
+
+ /* Datapath. */
+ uint64_t datapath_id; /* Datapath ID. */
+ const char *dp_name; /* Name of local datapath. */
+
+ /* Description strings. */
+ const char *mfr_desc; /* Manufacturer. */
+ const char *hw_desc; /* Hardware. */
+ const char *sw_desc; /* Software version. */
+ const char *serial_desc; /* Serial number. */
+
+ /* Related vconns and network devices. */
+ const char *controller_name; /* Controller (if not discovery mode). */
+ struct svec listeners; /* Listen for management connections. */
+ struct svec snoops; /* Listen for controller snooping conns. */
+
+ /* Failure behavior. */
+ enum fail_mode fail_mode; /* Act as learning switch if no controller? */
+ int max_idle; /* Idle time for flows in fail-open mode. */
+ int probe_interval; /* # seconds idle before sending echo request. */
+ int max_backoff; /* Max # seconds between connection attempts. */
+
+ /* Packet-in rate-limiting. */
+ int rate_limit; /* Tokens added to bucket per second. */
+ int burst_limit; /* Maximum number token bucket size. */
+
+ /* Discovery behavior. */
+ const char *accept_controller_re; /* Controller vconns to accept. */
+ bool update_resolv_conf; /* Update /etc/resolv.conf? */
+
+ /* Spanning tree protocol. */
+ bool enable_stp;
+
+ /* Remote command execution. */
+ char *command_acl; /* Command white/blacklist, as shell globs. */
+ char *command_dir; /* Directory that contains commands. */
+
+ /* Management. */
+ uint64_t mgmt_id; /* Management ID. */
+
+ /* NetFlow. */
+ struct svec netflow; /* NetFlow targets. */
+};
+
+static void parse_options(int argc, char *argv[], struct ofsettings *);
+static void usage(void) NO_RETURN;
+
+int
+main(int argc, char *argv[])
+{
+ struct unixctl_server *unixctl;
+ struct ofproto *ofproto;
+ struct ofsettings s;
+ int error;
+ struct netflow_options nf_options;
+
+ set_program_name(argv[0]);
+ register_fault_handlers();
+ time_init();
+ vlog_init();
+ parse_options(argc, argv, &s);
+ signal(SIGPIPE, SIG_IGN);
+
+ die_if_already_running();
+ daemonize();
+
+ /* Start listening for ovs-appctl requests. */
+ error = unixctl_server_create(NULL, &unixctl);
+ if (error) {
+ ovs_fatal(error, "Could not listen for unixctl connections");
+ }
+
+ VLOG_INFO("Open vSwitch version %s", VERSION BUILDNR);
+ VLOG_INFO("OpenFlow protocol version 0x%02x", OFP_VERSION);
+
+ /* Start OpenFlow processing. */
+ error = ofproto_create(s.dp_name, NULL, NULL, &ofproto);
+ if (error) {
+ ovs_fatal(error, "could not initialize openflow switch");
+ }
+ error = ofproto_set_in_band(ofproto, s.in_band);
+ if (error) {
+ ovs_fatal(error, "failed to configure in-band control");
+ }
+ error = ofproto_set_discovery(ofproto, s.discovery, s.accept_controller_re,
+ s.update_resolv_conf);
+ if (error) {
+ ovs_fatal(error, "failed to configure controller discovery");
+ }
+ if (s.datapath_id) {
+ ofproto_set_datapath_id(ofproto, s.datapath_id);
+ }
+ if (s.mgmt_id) {
+ ofproto_set_mgmt_id(ofproto, s.mgmt_id);
+ }
+ ofproto_set_desc(ofproto, s.mfr_desc, s.hw_desc, s.sw_desc, s.serial_desc);
+ error = ofproto_set_listeners(ofproto, &s.listeners);
+ if (error) {
+ ovs_fatal(error, "failed to configure management connections");
+ }
+ error = ofproto_set_snoops(ofproto, &s.snoops);
+ if (error) {
+ ovs_fatal(error,
+ "failed to configure controller snooping connections");
+ }
+ memset(&nf_options, 0, sizeof nf_options);
+ nf_options.collectors = s.netflow;
+ error = ofproto_set_netflow(ofproto, &nf_options);
+ if (error) {
+ ovs_fatal(error, "failed to configure NetFlow collectors");
+ }
+ ofproto_set_failure(ofproto, s.fail_mode == FAIL_OPEN);
+ ofproto_set_probe_interval(ofproto, s.probe_interval);
+ ofproto_set_max_backoff(ofproto, s.max_backoff);
+ ofproto_set_rate_limit(ofproto, s.rate_limit, s.burst_limit);
+ error = ofproto_set_stp(ofproto, s.enable_stp);
+ if (error) {
+ ovs_fatal(error, "failed to configure STP");
+ }
+ error = ofproto_set_remote_execution(ofproto, s.command_acl,
+ s.command_dir);
+ if (error) {
+ ovs_fatal(error, "failed to configure remote command execution");
+ }
+ if (!s.discovery) {
+ error = ofproto_set_controller(ofproto, s.controller_name);
+ if (error) {
+ ovs_fatal(error, "failed to configure controller");
+ }
+ }
+
+ while (ofproto_is_alive(ofproto)) {
+ error = ofproto_run(ofproto);
+ if (error) {
+ ovs_fatal(error, "unrecoverable datapath error");
+ }
+ unixctl_server_run(unixctl);
+ dp_run();
+ netdev_run();
+
+ ofproto_wait(ofproto);
+ unixctl_server_wait(unixctl);
+ dp_wait();
+ netdev_wait();
+ poll_block();
+ }
+
+ return 0;
+}
+\f
+/* User interface. */
+
+static void
+parse_options(int argc, char *argv[], struct ofsettings *s)
+{
+ enum {
+ OPT_DATAPATH_ID = UCHAR_MAX + 1,
+ OPT_MANUFACTURER,
+ OPT_HARDWARE,
+ OPT_SOFTWARE,
+ OPT_SERIAL,
+ OPT_ACCEPT_VCONN,
+ OPT_NO_RESOLV_CONF,
+ OPT_BR_NAME,
+ OPT_FAIL_MODE,
+ OPT_INACTIVITY_PROBE,
+ OPT_MAX_IDLE,
+ OPT_MAX_BACKOFF,
+ OPT_SNOOP,
+ OPT_RATE_LIMIT,
+ OPT_BURST_LIMIT,
+ OPT_BOOTSTRAP_CA_CERT,
+ OPT_STP,
+ OPT_NO_STP,
+ OPT_OUT_OF_BAND,
+ OPT_IN_BAND,
+ OPT_COMMAND_ACL,
+ OPT_COMMAND_DIR,
+ OPT_NETFLOW,
+ OPT_MGMT_ID,
+ VLOG_OPTION_ENUMS,
+ LEAK_CHECKER_OPTION_ENUMS
+ };
+ static struct option long_options[] = {
+ {"datapath-id", required_argument, 0, OPT_DATAPATH_ID},
+ {"manufacturer", required_argument, 0, OPT_MANUFACTURER},
+ {"hardware", required_argument, 0, OPT_HARDWARE},
+ {"software", required_argument, 0, OPT_SOFTWARE},
+ {"serial", required_argument, 0, OPT_SERIAL},
+ {"accept-vconn", required_argument, 0, OPT_ACCEPT_VCONN},
+ {"no-resolv-conf", no_argument, 0, OPT_NO_RESOLV_CONF},
+ {"fail", required_argument, 0, OPT_FAIL_MODE},
+ {"inactivity-probe", required_argument, 0, OPT_INACTIVITY_PROBE},
+ {"max-idle", required_argument, 0, OPT_MAX_IDLE},
+ {"max-backoff", required_argument, 0, OPT_MAX_BACKOFF},
+ {"listen", required_argument, 0, 'l'},
+ {"snoop", required_argument, 0, OPT_SNOOP},
+ {"rate-limit", optional_argument, 0, OPT_RATE_LIMIT},
+ {"burst-limit", required_argument, 0, OPT_BURST_LIMIT},
+ {"stp", no_argument, 0, OPT_STP},
+ {"no-stp", no_argument, 0, OPT_NO_STP},
+ {"out-of-band", no_argument, 0, OPT_OUT_OF_BAND},
+ {"in-band", no_argument, 0, OPT_IN_BAND},
+ {"command-acl", required_argument, 0, OPT_COMMAND_ACL},
+ {"command-dir", required_argument, 0, OPT_COMMAND_DIR},
+ {"netflow", required_argument, 0, OPT_NETFLOW},
+ {"mgmt-id", required_argument, 0, OPT_MGMT_ID},
+ {"verbose", optional_argument, 0, 'v'},
+ {"help", no_argument, 0, 'h'},
+ {"version", no_argument, 0, 'V'},
+ DAEMON_LONG_OPTIONS,
+ VLOG_LONG_OPTIONS,
+ LEAK_CHECKER_LONG_OPTIONS,
+#ifdef HAVE_OPENSSL
+ VCONN_SSL_LONG_OPTIONS
+ {"bootstrap-ca-cert", required_argument, 0, OPT_BOOTSTRAP_CA_CERT},
+#endif
+ {0, 0, 0, 0},
+ };
+ char *short_options = long_options_to_short_options(long_options);
+
+ /* Set defaults that we can figure out before parsing options. */
+ s->datapath_id = 0;
+ s->mfr_desc = NULL;
+ s->hw_desc = NULL;
+ s->sw_desc = NULL;
+ s->serial_desc = NULL;
+ svec_init(&s->listeners);
+ svec_init(&s->snoops);
+ s->fail_mode = FAIL_OPEN;
+ s->max_idle = 0;
+ s->probe_interval = 0;
+ s->max_backoff = 8;
+ s->update_resolv_conf = true;
+ s->rate_limit = 0;
+ s->burst_limit = 0;
+ s->accept_controller_re = NULL;
+ s->enable_stp = false;
+ s->in_band = true;
+ s->command_acl = "";
+ s->command_dir = NULL;
+ svec_init(&s->netflow);
+ s->mgmt_id = 0;
+ for (;;) {
+ int c;
+
+ c = getopt_long(argc, argv, short_options, long_options, NULL);
+ if (c == -1) {
+ break;
+ }
+
+ switch (c) {
+ case OPT_DATAPATH_ID:
+ if (strlen(optarg) != 12
+ || strspn(optarg, "0123456789abcdefABCDEF") != 12) {
+ ovs_fatal(0, "argument to --datapath-id must be "
+ "exactly 12 hex digits");
+ }
+ s->datapath_id = strtoll(optarg, NULL, 16);
+ if (!s->datapath_id) {
+ ovs_fatal(0, "argument to --datapath-id must be nonzero");
+ }
+ break;
+
+ case OPT_MANUFACTURER:
+ s->mfr_desc = optarg;
+ break;
+
+ case OPT_HARDWARE:
+ s->hw_desc = optarg;
+ break;
+
+ case OPT_SOFTWARE:
+ s->sw_desc = optarg;
+ break;
+
+ case OPT_SERIAL:
+ s->serial_desc = optarg;
+ break;
+
+ case OPT_ACCEPT_VCONN:
+ s->accept_controller_re = optarg;
+ break;
+
+ case OPT_NO_RESOLV_CONF:
+ s->update_resolv_conf = false;
+ break;
+
+ case OPT_FAIL_MODE:
+ if (!strcmp(optarg, "open")) {
+ s->fail_mode = FAIL_OPEN;
+ } else if (!strcmp(optarg, "closed")) {
+ s->fail_mode = FAIL_CLOSED;
+ } else {
+ ovs_fatal(0, "--fail argument must be \"open\" or \"closed\"");
+ }
+ break;
+
+ case OPT_INACTIVITY_PROBE:
+ s->probe_interval = atoi(optarg);
+ if (s->probe_interval < 5) {
+ ovs_fatal(0, "--inactivity-probe argument must be at least 5");
+ }
+ break;
+
+ case OPT_MAX_IDLE:
+ if (!strcmp(optarg, "permanent")) {
+ s->max_idle = OFP_FLOW_PERMANENT;
+ } else {
+ s->max_idle = atoi(optarg);
+ if (s->max_idle < 1 || s->max_idle > 65535) {
+ ovs_fatal(0, "--max-idle argument must be between 1 and "
+ "65535 or the word 'permanent'");
+ }
+ }
+ break;
+
+ case OPT_MAX_BACKOFF:
+ s->max_backoff = atoi(optarg);
+ if (s->max_backoff < 1) {
+ ovs_fatal(0, "--max-backoff argument must be at least 1");
+ } else if (s->max_backoff > 3600) {
+ s->max_backoff = 3600;
+ }
+ break;
+
+ case OPT_RATE_LIMIT:
+ if (optarg) {
+ s->rate_limit = atoi(optarg);
+ if (s->rate_limit < 1) {
+ ovs_fatal(0, "--rate-limit argument must be at least 1");
+ }
+ } else {
+ s->rate_limit = 1000;
+ }
+ break;
+
+ case OPT_BURST_LIMIT:
+ s->burst_limit = atoi(optarg);
+ if (s->burst_limit < 1) {
+ ovs_fatal(0, "--burst-limit argument must be at least 1");
+ }
+ break;
+
+ case OPT_STP:
+ s->enable_stp = true;
+ break;
+
+ case OPT_NO_STP:
+ s->enable_stp = false;
+ break;
+
+ case OPT_OUT_OF_BAND:
+ s->in_band = false;
+ break;
+
+ case OPT_IN_BAND:
+ s->in_band = true;
+ break;
+
+ case OPT_COMMAND_ACL:
+ s->command_acl = (s->command_acl[0]
+ ? xasprintf("%s,%s", s->command_acl, optarg)
+ : optarg);
+ break;
+
+ case OPT_COMMAND_DIR:
+ s->command_dir = optarg;
+ break;
+
+ case OPT_NETFLOW:
+ svec_add(&s->netflow, optarg);
+ break;
+
+ case OPT_MGMT_ID:
+ if (strlen(optarg) != 12
+ || strspn(optarg, "0123456789abcdefABCDEF") != 12) {
+ ovs_fatal(0, "argument to --mgmt-id must be "
+ "exactly 12 hex digits");
+ }
+ s->mgmt_id = strtoll(optarg, NULL, 16);
+ if (!s->mgmt_id) {
+ ovs_fatal(0, "argument to --mgmt-id must be nonzero");
+ }
+ break;
+
+ case 'l':
+ svec_add(&s->listeners, optarg);
+ break;
+
+ case OPT_SNOOP:
+ svec_add(&s->snoops, optarg);
+ break;
+
+ case 'h':
+ usage();
+
+ case 'V':
+ OVS_PRINT_VERSION(OFP_VERSION, OFP_VERSION);
+ exit(EXIT_SUCCESS);
+
+ DAEMON_OPTION_HANDLERS
+
+ VLOG_OPTION_HANDLERS
+
+ LEAK_CHECKER_OPTION_HANDLERS
+
+#ifdef HAVE_OPENSSL
+ VCONN_SSL_OPTION_HANDLERS
+
+ case OPT_BOOTSTRAP_CA_CERT:
+ vconn_ssl_set_ca_cert_file(optarg, true);
+ break;
+#endif
+
+ case '?':
+ exit(EXIT_FAILURE);
+
+ default:
+ abort();
+ }
+ }
+ free(short_options);
+
+ argc -= optind;
+ argv += optind;
+ if (argc < 1 || argc > 2) {
+ ovs_fatal(0, "need one or two non-option arguments; "
+ "use --help for usage");
+ }
+
+ /* Local and remote vconns. */
+ s->dp_name = argv[0];
+ s->controller_name = argc > 1 ? xstrdup(argv[1]) : NULL;
+
+ /* Set accept_controller_regex. */
+ if (!s->accept_controller_re) {
+ s->accept_controller_re
+ = vconn_ssl_is_configured() ? "^ssl:.*" : "^tcp:.*";
+ }
+
+ /* Mode of operation. */
+ s->discovery = s->controller_name == NULL;
+ if (s->discovery && !s->in_band) {
+ ovs_fatal(0, "Cannot perform discovery with out-of-band control");
+ }
+
+ /* Rate limiting. */
+ if (s->rate_limit && s->rate_limit < 100) {
+ VLOG_WARN("Rate limit set to unusually low value %d", s->rate_limit);
+ }
+}
+
+static void
+usage(void)
+{
+ printf("%s: an OpenFlow switch implementation.\n"
+ "usage: %s [OPTIONS] DATAPATH [CONTROLLER]\n"
+ "DATAPATH is a local datapath (e.g. \"dp0\").\n"
+ "CONTROLLER is an active OpenFlow connection method; if it is\n"
+ "omitted, then ovs-openflowd performs controller discovery.\n",
+ program_name, program_name);
+ vconn_usage(true, true, true);
+ printf("\nOpenFlow options:\n"
+ " --datapath-id=ID Use ID as the OpenFlow switch ID\n"
+ " (ID must consist of 12 hex digits)\n"
+ " --mgmt-id=ID Use ID as the management ID\n"
+ " (ID must consist of 12 hex digits)\n"
+ " --manufacturer=MFR Identify manufacturer as MFR\n"
+ " --hardware=HW Identify hardware as HW\n"
+ " --software=SW Identify software as SW\n"
+ " --serial=SERIAL Identify serial number as SERIAL\n"
+ "\nController discovery options:\n"
+ " --accept-vconn=REGEX accept matching discovered controllers\n"
+ " --no-resolv-conf do not update /etc/resolv.conf\n"
+ "\nNetworking options:\n"
+ " --fail=open|closed when controller connection fails:\n"
+ " closed: drop all packets\n"
+ " open (default): act as learning switch\n"
+ " --inactivity-probe=SECS time between inactivity probes\n"
+ " --max-idle=SECS max idle for flows set up by switch\n"
+ " --max-backoff=SECS max time between controller connection\n"
+ " attempts (default: 8 seconds)\n"
+ " -l, --listen=METHOD allow management connections on METHOD\n"
+ " (a passive OpenFlow connection method)\n"
+ " --snoop=METHOD allow controller snooping on METHOD\n"
+ " (a passive OpenFlow connection method)\n"
+ " --out-of-band controller connection is out-of-band\n"
+ " --netflow=HOST:PORT configure NetFlow output target\n"
+ "\nRate-limiting of \"packet-in\" messages to the controller:\n"
+ " --rate-limit[=PACKETS] max rate, in packets/s (default: 1000)\n"
+ " --burst-limit=BURST limit on packet credit for idle time\n"
+ "\nRemote command execution options:\n"
+ " --command-acl=[!]GLOB[,[!]GLOB...] set allowed/denied commands\n"
+ " --command-dir=DIR set command dir (default: %s/commands)\n",
+ ovs_pkgdatadir);
+ daemon_usage();
+ vlog_usage();
+ printf("\nOther options:\n"
+ " -h, --help display this help message\n"
+ " -V, --version display version information\n");
+ leak_checker_usage();
+ exit(EXIT_SUCCESS);
+}
.SH "SEE ALSO"
-.BR controller (8),
-.BR ovs\-pki\-cgi (8),
-.BR secchan (8)
+.BR ovs\-controller (8),
+.BR ovs\-openflowd (8),
+.BR ovs\-pki\-cgi (8)
--- /dev/null
+.\" -*- nroff -*-
+.de IQ
+. br
+. ns
+. IP "\\$1"
+..
+.TH ovs\-vsctl 8 "November 2009" "Open vSwitch" "Open vSwitch Manual"
+.ds PN ovs\-vsctl
+.
+.SH NAME
+ovs\-vsctl \- utility for querying and configuring \fBovs\-vswitchd\fR
+.
+.SH SYNOPSIS
+\fBovs\-vsctl\fR [\fIoptions\fR] \fIcommand \fR[\fIargs\fR\&...]
+[\fB\-\-\fR \fIcommand \fR[\fIargs\fR\&...]]
+.
+.SH DESCRIPTION
+The \fBovs\-vsctl\fR program configures \fBovs\-vswitchd\fR(8), mainly
+by providing a high\-level interface to editing its configuration file
+\fBovs\-vswitchd.conf\fR(5). This program is primarily intended for use
+when \fBovs\-vswitchd\fR is running, but it can also be used when
+\fBovs\-vswitchd\fR is not running. In the latter case, configuration
+changes take effect only when \fBovs\-vswitchd\fR is next started.
+.PP
+By default, each time \fBovs\-vsctl\fR runs, it examines and,
+depending on the requested command or commands, possibly applies
+changes to an
+\fBovs\-vswitchd.conf\fR file. Then, if it applied any changes and if
+\fBovs\-vswitchd\fR is running, it tells \fBovs\-vswitchd\fR to reload
+the modified configuration file and waits for the reload to complete
+before exiting.
+.
+.SS "Linux VLAN Bridging Compatibility"
+The \fBovs\-vsctl\fR program supports the model of a bridge
+implemented by Open vSwitch, in which a single bridge supports ports
+on multiple VLANs. In this model, each port on a bridge is either a
+trunk port, which may pass packets tagged with 802.1Q headers that
+designate VLANs, or a port assigned a single implicit VLAN, whose
+packets are never tagged with an 802.1Q header.
+.PP
+For compatibility with software designed for the Linux bridge,
+\fBovs\-vsctl\fR also supports a model in which traffic associated
+with a given 802.1Q VLAN is segregated into a separate bridge. A
+special form of the \fBadd\-br\fR command (see below) creates a ``fake
+bridge'' within an Open vSwitch bridge to simulate this behavior.
+When such a ``fake bridge'' is active, \fBovs\-vsctl\fR will treat it
+much like a bridge separate from its ``parent bridge,'' but the actual
+implementation in Open vSwitch uses only a single bridge, with ports on
+the fake bridge assigned the implicit VLAN of the fake bridge of which
+they are members.
+.
+.SH OPTIONS
+.
+The following options affect the general outline of \fBovs\-vsctl\fR's
+activities:
+.
+.IP "\fB\-c \fIfile\fR"
+.IQ "\fB\-\-config=\fIfile\fR"
+Sets the configuration file that \fBovs\-vsctl\fR reads and possibly
+modifies. The default is \fB@sysconfdir@/ovs\-vswitchd.conf\fR.
+.IP
+If \fIfile\fR is specified as \fB\-\fR, then \fBovs\-vsctl\fR reads
+the configuration file from standard input and, for commands that
+modify the configuration, writes the new one to standard output. This
+is useful for testing but it should not be used in production because
+it bypasses the Open vSwitch configuration file locking protocol.
+.
+.IP "\fB\-t \fItarget\fR"
+.IQ "\fB\-\-target=\fItarget\fR"
+Configures how \fBovs\-vsctl\fR contacts \fBovs\-vswitchd\fR to
+instruct it to reload its configuration file.
+.IP
+If \fItarget\fR begins with \fB/\fR it must name a Unix domain socket
+on which \fBovs\-vswitchd\fR is listening for control channel
+connections. By default, \fBovs\-vswitchd\fR listens on a Unix domain
+socket named \fB@RUNDIR@/ovs\-vswitchd.\fIpid\fB.ctl\fR, where
+\fIpid\fR is \fBovs\-vswitchd\fR's process ID.
+.IP
+Otherwise, \fBovs\-vsctl\fR looks for a pidfile, that is, a file
+whose contents are the process ID of a running process as a decimal
+number, named \fB@RUNDIR@/\fItarget\fB.pid\fR. (The \fB\-\-pidfile\fR
+option makes an Open vSwitch daemon create a pidfile.)
+\fBovs\-vsctl\fR reads the pidfile, then looks for a Unix socket
+named \fB@RUNDIR@/\fItarget\fB.\fIpid\fB.ctl\fR, where \fIpid\fR is
+replaced by the process ID read from the pidfile, and uses that file
+as if it had been specified directly as the target.
+.IP
+The default target is \fBovs\-vswitchd\fR.
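The target lookup above can be sketched as two small helpers. Both are hypothetical: `read_pidfile` and `target_socket_name` are not functions from Open vSwitch, and the `rundir` parameter stands in for `@RUNDIR@`, whose real value depends on how the package was configured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Reads a decimal process ID from 'pidfile'; returns -1 on failure. */
static long
read_pidfile(const char *pidfile)
{
    FILE *f = fopen(pidfile, "r");
    long pid = -1;

    if (f) {
        if (fscanf(f, "%ld", &pid) != 1) {
            pid = -1;
        }
        fclose(f);
    }
    return pid;
}

/* Stores in 'out' the control socket name for 'target': 'target' itself
 * if it begins with "/", otherwise RUNDIR/TARGET.PID.ctl, where 'pid' was
 * read from RUNDIR/TARGET.pid.  Returns false if 'out' is too small. */
static bool
target_socket_name(const char *rundir, const char *target, long pid,
                   char *out, size_t size)
{
    int n;

    assert(rundir != NULL && target != NULL && out != NULL);
    if (target[0] == '/') {
        n = snprintf(out, size, "%s", target);
    } else {
        n = snprintf(out, size, "%s/%s.%ld.ctl", rundir, target, pid);
    }
    return n >= 0 && (size_t) n < size;
}
```

For example, with a run directory of `/var/run` and a pidfile containing 1234, the default target resolves to `/var/run/ovs-vswitchd.1234.ctl`.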
+.IP "\fB\-\-no\-reload\fR"
+Prevents \fBovs\-vsctl\fR from telling \fBovs\-vswitchd\fR to reload
+its configuration file.
+.
+.IP "\fB\-\-no\-syslog\fR"
+By default, \fBovs\-vsctl\fR logs its arguments and the details of any
+changes that it makes to the system log. This option disables this
+logging.
+.IP "\fB\-\-oneline\fR"
+Modifies the output format so that the output for each command is printed
+on a single line. New-line characters that would otherwise separate
+lines are printed as \fB\\n\fR, and any instances of \fB\\\fR that
+would otherwise appear in the output are doubled.
+A blank line is printed for each command that has no output.
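The `--oneline` transformation amounts to a simple escaping pass. This is a minimal sketch with a hypothetical `oneline_escape` function. It assumes that a single trailing new-line ends a command's output rather than separating two lines, so that character is dropped before escaping.

```c
#include <assert.h>
#include <string.h>

/* Converts one command's output 'in' into the --oneline form in 'out',
 * which must have room for 2 * strlen(in) + 2 bytes. */
static void
oneline_escape(const char *in, char *out)
{
    size_t len;

    assert(in != NULL && out != NULL);
    len = strlen(in);
    /* Assumption: a trailing new-line terminates rather than separates. */
    if (len > 0 && in[len - 1] == '\n') {
        len--;
    }
    for (size_t i = 0; i < len; i++) {
        if (in[i] == '\n') {            /* New-line becomes "\n". */
            *out++ = '\\';
            *out++ = 'n';
        } else if (in[i] == '\\') {     /* Backslash is doubled. */
            *out++ = '\\';
            *out++ = '\\';
        } else {
            *out++ = in[i];
        }
    }
    *out++ = '\n';      /* Each command's output occupies exactly one line. */
    *out = '\0';
}
```

Empty output becomes a lone new-line, which gives the blank line described above.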
+.
+.SH COMMANDS
+The commands implemented by \fBovs\-vsctl\fR are described in the
+sections below.
+.
+.SS "Bridge Commands"
+These commands examine and manipulate Open vSwitch bridges.
+.
+.IP "\fBadd\-br \fIbridge\fR"
+Creates a new bridge named \fIbridge\fR. Initially the bridge will
+have no ports (other than \fIbridge\fR itself).
+.
+.IP "\fBadd\-br \fIbridge parent vlan\fR"
+Creates a ``fake bridge'' named \fIbridge\fR within the existing Open
+vSwitch bridge \fIparent\fR, which must already exist and must not
+itself be a fake bridge. The new fake bridge will be on 802.1Q VLAN
+\fIvlan\fR, which must be an integer between 1 and 4095. Initially
+\fIbridge\fR will have no ports (other than \fIbridge\fR itself).
+.
+.IP "\fBdel\-br \fIbridge\fR"
+Deletes \fIbridge\fR and all of its ports. If \fIbridge\fR is a real
+bridge, this command also deletes any fake bridges that were created
+with \fIbridge\fR as parent, including all of their ports.
+.
+.IP "\fBlist\-br\fR"
+Lists all existing real and fake bridges on standard output, one per
+line.
+.
+.IP "\fBbr\-exists \fIbridge\fR"
+Tests whether \fIbridge\fR exists as a real or fake bridge. If so,
+\fBovs\-vsctl\fR exits successfully with exit code 0. If not,
+\fBovs\-vsctl\fR exits unsuccessfully with exit code 2.
+.
+.IP "\fBbr\-to\-vlan \fIbridge\fR"
+If \fIbridge\fR is a fake bridge, prints the bridge's 802.1Q VLAN as a
+decimal integer. If \fIbridge\fR is a real bridge, prints 0.
+.
+.IP "\fBbr\-to\-parent \fIbridge\fR"
+If \fIbridge\fR is a fake bridge, prints the name of its parent
+bridge. If \fIbridge\fR is a real bridge, prints \fIbridge\fR.
+.
+.SS "Port Commands"
+.
+These commands examine and manipulate Open vSwitch ports. These
+commands treat a bonded port as a single entity.
+.
+.IP "\fBlist\-ports \fIbridge\fR"
+Lists all of the ports within \fIbridge\fR on standard output, one per
+line. The local port \fIbridge\fR is not included in the list.
+.
+.IP "\fBadd\-port \fIbridge port\fR"
+Creates on \fIbridge\fR a new port named \fIport\fR from the network
+device of the same name.
+.
+.IP "\fBadd\-bond \fIbridge port iface\fR\&..."
+Creates on \fIbridge\fR a new port named \fIport\fR that bonds
+together the network devices given as each \fIiface\fR. At least two
+interfaces must be named.
+.
+.IP "\fBdel\-port \fR[\fIbridge\fR] \fIport\fR"
+Deletes \fIport\fR. If \fIbridge\fR is omitted, \fIport\fR is removed
+from whatever bridge contains it; if \fIbridge\fR is specified, it
+must be the real or fake bridge that contains \fIport\fR.
+.
+.IP "\fBport\-to\-br \fIport\fR"
+Prints the name of the bridge that contains \fIport\fR on standard
+output.
+.
+.SS "Interface Commands"
+.
+These commands examine the interfaces attached to an Open vSwitch
+bridge. These commands treat a bonded port as a collection of two or
+more interfaces, rather than as a single port.
+.
+.IP "\fBlist\-ifaces \fIbridge\fR"
+Lists all of the interfaces within \fIbridge\fR on standard output,
+one per line. The local port \fIbridge\fR is not included in the
+list.
+.
+.IP "\fBiface\-to\-br \fIiface\fR"
+Prints the name of the bridge that contains \fIiface\fR on standard
+output.
+.SH "EXAMPLES"
+Create a new bridge named br0 and add port eth0 to it:
+.IP
+.B "ovs\-vsctl add\-br br0"
+.br
+.B "ovs\-vsctl add\-port br0 eth0"
+.PP
+Alternatively, perform both operations in a single atomic transaction:
+.IP
+.B "ovs\-vsctl add\-br br0 \-\- add\-port br0 eth0"
+.
+.SH "EXIT STATUS"
+.IP "0"
+Successful program execution.
+.IP "1"
+Usage, syntax, or configuration file error.
+.IP "2"
+The \fIbridge\fR argument to \fBbr\-exists\fR specified the name of a
+bridge that does not exist.
+.SH "SEE ALSO"
+.
+.BR ovs\-vswitchd.conf (5),
+.BR ovs\-vswitchd (8).
--- /dev/null
+#! @PYTHON@
+# Copyright (c) 2009 Nicira Networks. -*- python -*-
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at:
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import errno
+import fcntl
+import fnmatch
+import getopt
+import os
+import re
+import socket
+import stat
+import sys
+import syslog
+
+argv0 = sys.argv[0]
+if argv0.find('/') >= 0:
+    argv0 = argv0[argv0.rfind('/') + 1:]
+
+DEFAULT_VSWITCHD_CONF = "@sysconfdir@/ovs-vswitchd.conf"
+vswitchd_conf = DEFAULT_VSWITCHD_CONF
+
+DEFAULT_VSWITCHD_TARGET = "ovs-vswitchd"
+vswitchd_target = DEFAULT_VSWITCHD_TARGET
+
+reload_vswitchd = True
+
+enable_syslog = True
+
+class Error(Exception):
+ def __init__(self, msg):
+ Exception.__init__(self)
+ self.msg = msg
+
+def log(message):
+ if enable_syslog:
+ syslog.syslog(message)
+
+# XXX Most of the functions below should be integrated into a
+# VSwitchConfiguration object with logically named fields and methods
+# instead of this mishmash of functionality.
+
+# Locks 'filename' for writing.
+def cfg_lock(filename):
+ if filename == '-':
+ return
+
+ if '/' in filename:
+ lastSlash = filename.rfind('/')
+ prefix = filename[:lastSlash]
+ suffix = filename[lastSlash + 1:]
+ lock_name = "%s/.%s.~lock~" % (prefix, suffix)
+ else:
+ lock_name = ".%s.~lock~" % filename
+
+ while True:
+ # Try to open an existing lock file.
+ try:
+ f = open(lock_name, 'r')
+ except IOError, e:
+ if e.errno != errno.ENOENT:
+ raise
+
+ # Try to create a new lock file.
+ try:
+ fd = os.open(lock_name, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0600)
+ except OSError, e:
+ if e.errno != errno.EEXIST:
+ raise
+ # Someone else created the lock file, try again.
+ else:
+ # We created the lock file; loop around to open and lock it.
+ os.close(fd)
+ continue
+
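+ # Keep a reference to the lock file so that it stays open, and the lock
+ # stays held, until the program exits. Otherwise CPython would close 'f',
+ # releasing the lock, as soon as this function returns.
+ global cfg_lock_file
+ cfg_lock_file = f
+ fcntl.flock(f, fcntl.LOCK_EX)
+ return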
+
+# Read the ovs-vswitchd.conf file named 'filename' and return its contents as a
+# dictionary that maps from string keys to lists of string values. (Even
+# singleton values are represented as lists.)
+def cfg_read(filename, lock=False):
+ if lock:
+ cfg_lock(filename)
+
+ try:
+ if filename == '-':
+ f = open('/dev/stdin')
+ else:
+ f = open(filename)
+ except IOError, e:
+ sys.stderr.write("%s: could not open %s (%s)\n"
+ % (argv0, filename, e.strerror))
+ sys.exit(1)
+
+ cfg = {}
+ rx = re.compile('([-._@$:+a-zA-Z0-9]+)(?:[ \t\r\n\v]*)=(?:[ \t\r\n\v]*)(.*)$')
+ for line in f:
+ line = line.strip()
+ if len(line) == 0 or line[0] == '#':
+ continue
+
+ match = rx.match(line)
+ if match == None:
+ continue
+
+ key, value = match.groups()
+ if key not in cfg:
+ cfg[key] = []
+ cfg[key].append(value)
+
+ global orig_cfg
+ orig_cfg = cfg_clone(cfg)
+
+ return cfg
+
+# Returns a deep copy of 'cfg', which must be in the format returned
+# by cfg_read().
+def cfg_clone(cfg):
+ new = {}
+ for key in cfg:
+ new[key] = list(cfg[key])
+ return new
+
+# Returns a list of all the configuration lines that are in 'a' but
+# not in 'b'.
+def cfg_subtract(a, b):
+ difference = []
+ for key in a:
+ for value in a[key]:
+ if key not in b or value not in b[key]:
+ difference.append("%s=%s" % (key, value))
+ return difference
+
+def do_cfg_save(cfg, file):
+ # Log changes.
+ added = cfg_subtract(cfg, orig_cfg)
+ removed = cfg_subtract(orig_cfg, cfg)
+ if added or removed:
+ log("configuration changes:")
+ for line in removed:
+ log("-%s\n" % line)
+ for line in added:
+ log("+%s\n" % line)
+
+ # Write changes.
+ for key in sorted(cfg.keys()):
+ for value in sorted(cfg[key]):
+ file.write("%s=%s\n" % (key, value))
+
+def cfg_reload():
+ target = vswitchd_target
+ if not target.startswith('/'):
+ pid = read_first_line_of_file('%s/%s.pid' % ('@RUNDIR@', target))
+ target = '%s/%s.%s.ctl' % ('@RUNDIR@', target, pid)
+ s = os.stat(target)
+ if not stat.S_ISSOCK(s.st_mode):
+ raise Error("%s is not a Unix domain socket, cannot reload" % target)
+ skt = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
+ skt.connect(target)
+ f = os.fdopen(skt.fileno(), "r+")
+ f.write("vswitchd/reload\n")
+ f.flush()
+ f.readline()
+ f.close()
+
+def cfg_save(cfg, filename):
+ if filename == '-':
+ do_cfg_save(cfg, sys.stdout)
+ else:
+ tmp_name = filename + ".~tmp~"
+ f = open(tmp_name, 'w')
+ do_cfg_save(cfg, f)
+ f.close()
+ os.rename(tmp_name, filename)
+ if reload_vswitchd:
+ cfg_reload()
+
+# Returns a set of the immediate subsections of 'section' within 'cfg'. For
+# example, if 'section' is "bridge" and keys bridge.a, bridge.b, bridge.b.c,
+# and bridge.c.x.y.z exist, returns set(['a', 'b', 'c']).
+def cfg_get_subsections(cfg, section):
+ subsections = set()
+ for key in cfg:
+ if key.startswith(section + "."):
+ dot = key.find(".", len(section) + 1)
+ if dot == -1:
+ dot = len(key)
+ subsections.add(key[len(section) + 1:dot])
+ return subsections
+
+# Returns True if 'cfg' contains the key 'name' and its single value is
+# 'true'. Otherwise returns False.
+def cfg_get_bool(cfg, name):
+ return name in cfg and cfg[name] == ['true']
+
+# If 'cfg' has a port named 'port' configured with an implicit VLAN, returns
+# that VLAN number. Otherwise, returns 0.
+def get_port_vlan(cfg, port):
+ try:
+ return int(cfg["vlan.%s.tag" % port][0])
+ except (ValueError, KeyError):
+ return 0
+
+# Returns all the ports within 'bridge' in 'cfg'. If 'vlan' is nonnegative,
+# the ports returned are only those configured with implicit VLAN 'vlan'.
+def get_bridge_ports(cfg, bridge, vlan):
+ ports = []
+ for port in cfg.get("bridge.%s.port" % bridge, []):
+ if vlan < 0 or get_port_vlan(cfg, port) == vlan:
+ ports.append(port)
+ return ports
+
+# Returns all the interfaces within 'bridge' in 'cfg'. If 'vlan' is
+# nonnegative, the interfaces returned are only those whose ports are
+# configured with implicit VLAN 'vlan'.
+def get_bridge_ifaces(cfg, bridge, vlan):
+ ifaces = []
+ for port in get_bridge_ports(cfg, bridge, vlan):
+ ifaces.extend(cfg.get("bonding.%s.slave" % port, [port]))
+ return ifaces
+
+# Returns the first line of the file named 'name', with the trailing new-line
+# (if any) stripped off.
+def read_first_line_of_file(name):
+ file = None
+ try:
+ file = open(name, 'r')
+ return file.readline().rstrip('\n')
+ finally:
+ if file != None:
+ file.close()
+
+# Returns a bridge ID constructed from the MAC address of network device
+# 'netdev', in the format "8000.000102030405".
+def get_bridge_id(netdev):
+ try:
+ hwaddr = read_first_line_of_file("/sys/class/net/%s/address" % netdev)
+ return "8000.%s" % (hwaddr.replace(":", ""))
+ except:
+ return "8000.002320ffffff"
+
+# Returns a list of 3-tuples based on 'cfg'. Each 3-tuple represents
+# one real bridge or one fake bridge and has the form (bridge, parent,
+# vlan), where 'bridge' is the real or fake bridge name, 'parent' is
+# the same as 'bridge' for a real bridge or the name of the containing
+# bridge for a fake bridge, and 'vlan' is 0 for a real bridge or a
+# VLAN number for a fake bridge.
+def get_bridge_info(cfg):
+ real_bridges = [(br, br, 0) for br in get_real_bridges(cfg)]
+ fake_bridges = []
+ for linux_bridge, ovs_bridge, vlan in real_bridges:
+ for iface in get_bridge_ifaces(cfg, ovs_bridge, -1):
+ if cfg_get_bool(cfg, "iface.%s.fake-bridge" % iface):
+ fake_bridges.append((iface, ovs_bridge,
+ get_port_vlan(cfg, iface)))
+ return real_bridges + fake_bridges
+
+# Returns the real bridges configured in 'cfg'.
+def get_real_bridges(cfg):
+ return cfg_get_subsections(cfg, "bridge")
+
+# Returns the fake bridges configured in 'cfg'.
+def get_fake_bridges(cfg):
+ return [bridge for bridge, parent, vlan in get_bridge_info(cfg)
+ if bridge != parent]
+
+# Returns all the real and fake bridges configured in 'cfg'.
+def get_all_bridges(cfg):
+ return [bridge for bridge, parent, vlan in get_bridge_info(cfg)]
+
+# Returns the parent bridge and VLAN of real or fake 'bridge' in
+# 'cfg', where the parent bridge and VLAN are as defined in the
+# description of get_bridge_info(). Raises an error if no bridge
+# named 'bridge' exists in 'cfg'.
+def find_bridge(cfg, bridge):
+ for br, parent, vlan in get_bridge_info(cfg):
+ if br == bridge:
+ return parent, vlan
+ raise Error("no bridge named %s" % bridge)
+
+def del_matching_keys(cfg, pattern):
+ for key in [key for key in cfg.keys() if fnmatch.fnmatch(key, pattern)]:
+ del cfg[key]
+
+# Deletes anything related to a port named 'port' from 'cfg'. No port
+# named 'port' need actually exist; this function will clean up
+# regardless.
+def del_port(cfg, port):
+ # The use of [!0-9] keeps an interface of 'eth0' from matching
+ # VLANs attached to eth0 (such as 'eth0.123'), which are distinct
+ # interfaces.
+ for iface in cfg.get('bonding.%s.slave' % port, [port]):
+ del_matching_keys(cfg, 'iface.%s.[!0-9]*' % iface)
+ # Yes, this "port" setting applies to interfaces, not ports, *sigh*.
+ del_matching_keys(cfg, 'port.%s.ingress.policing*' % iface)
+ del_matching_keys(cfg, 'bonding.%s.[!0-9]*' % port)
+ del_matching_keys(cfg, 'vlan.%s.[!0-9]*' % port)
+ for key in cfg.keys():
+ if fnmatch.fnmatch(key, 'bridge.*.port'):
+ cfg[key] = [s for s in cfg[key] if s != port]
+
+# Returns the name of the (real or fake) bridge in 'cfg' that contains
+# port 'port', or None if there is no such port.
+def port_to_bridge(cfg, port):
+ for bridge, parent, vlan in get_bridge_info(cfg):
+ if port != bridge and port in get_bridge_ports(cfg, parent, vlan):
+ return bridge
+ return None
+
+def usage():
+ print """%(argv0)s: ovs-vswitchd management utility
+usage: %(argv0)s [OPTIONS] COMMAND [ARG...]
+
+Bridge commands:
+ add-br BRIDGE create a new bridge named BRIDGE
+ add-br BRIDGE PARENT VLAN create new fake bridge BRIDGE in PARENT on VLAN
+ del-br BRIDGE delete BRIDGE and all of its ports
+ list-br print the names of all the bridges
+ br-exists BRIDGE test whether BRIDGE exists
+ br-to-vlan BRIDGE print the VLAN which BRIDGE is on
+ br-to-parent BRIDGE print the parent of BRIDGE
+
+Port commands:
+ list-ports BRIDGE print the names of all the ports on BRIDGE
+ add-port BRIDGE PORT add network device PORT to BRIDGE
+ add-bond BRIDGE PORT IFACE... add new bonded port PORT in BRIDGE from IFACES
+ del-port [BRIDGE] PORT delete PORT (which may be bonded) from BRIDGE
+ port-to-br PORT print name of bridge that contains PORT
+A bond is considered to be a single port.
+
+Interface commands (a bond consists of multiple interfaces):
+ list-ifaces BRIDGE print the names of all the interfaces on BRIDGE
+ iface-to-br IFACE print name of bridge that contains IFACE
+A bond is considered to consist of interfaces.
+
+General options:
+ --no-syslog do not write messages to syslog
+ -c, --config=FILE set configuration file
+ (default: %(config)s)
+ -t, --target=PROGRAM|SOCKET set ovs-vswitchd target
+ (default: %(target)s)
+ --no-reload do not make ovs-vswitchd reload its configuration
+ --oneline print exactly one line of output per command
+ -h, --help display this help message and exit
+ -V, --version display version information and exit
+Report bugs to bugs@openvswitch.org.""" % {'argv0': argv0,
+ 'config': DEFAULT_VSWITCHD_CONF,
+ 'target': DEFAULT_VSWITCHD_TARGET}
+ sys.exit(0)
+
+def version():
+ print "ovs-vsctl (Open vSwitch) @VERSION@"
+ sys.exit(0)
+
+def check_conflicts(cfg, name, op):
+ bridges = get_bridge_info(cfg)
+ if name in [bridge for bridge, parent, vlan in bridges]:
+ raise Error("%s because a bridge named %s already exists" % (op, name))
+
+ for bridge, parent, vlan in bridges:
+ if name in get_bridge_ports(cfg, parent, vlan):
+ raise Error("%s because a port named %s already exists on bridge %s" % (op, name, bridge))
+ if name in get_bridge_ifaces(cfg, parent, vlan):
+ raise Error("%s because an interface named %s already exists on bridge %s" % (op, name, bridge))
+
+def cmd_add_br(cfg, bridge, parent=None, vlan=None):
+ check_conflicts(cfg, bridge, "cannot create a bridge named %s" % bridge)
+
+ if parent and vlan:
+ if parent in get_fake_bridges(cfg):
+ raise Error("cannot create bridge with fake bridge as parent")
+ if parent not in get_real_bridges(cfg):
+ raise Error("parent bridge %s does not exist" % parent)
+ try:
+ if int(vlan) < 0 or int(vlan) > 4095:
+ raise ValueError
+ except ValueError:
+ raise Error("invalid VLAN number %s" % vlan)
+
+ # Create fake bridge internal port.
+ cfg['iface.%s.internal' % bridge] = ['true']
+ cfg['iface.%s.fake-bridge' % bridge] = ['true']
+ cfg['vlan.%s.tag' % bridge] = [vlan]
+
+ # Add fake bridge port to parent.
+ cfg['bridge.%s.port' % parent].append(bridge)
+ else:
+ cfg['bridge.%s.port' % bridge] = [bridge]
+
+def cmd_del_br(cfg, bridge):
+ parent, vlan = find_bridge(cfg, bridge)
+ if vlan == 0:
+ vlan = -1
+ for port in set(get_bridge_ports(cfg, parent, vlan) + [bridge]):
+ del_port(cfg, port)
+ if vlan < 0:
+ del_matching_keys(cfg, 'bridge.%s.[!0-9]*' % bridge)
+
+def cmd_list_br(cfg):
+ return get_all_bridges(cfg)
+
+def cmd_br_exists(cfg, bridge):
+ if bridge not in get_all_bridges(cfg):
+ sys.exit(2)
+
+def cmd_list_ports(cfg, bridge):
+ ports = []
+ parent, vlan = find_bridge(cfg, bridge)
+ for port in get_bridge_ports(cfg, parent, vlan):
+ if port != bridge:
+ ports.append(port)
+ return ports
+
+def do_add_port(cfg, bridge, parent, port, vlan):
+ check_conflicts(cfg, port, "cannot create a port named %s" % port)
+ cfg['bridge.%s.port' % parent].append(port)
+ if vlan > 0:
+ cfg['vlan.%s.tag' % port] = [str(vlan)]
+
+def cmd_add_port(cfg, bridge, port):
+ parent, vlan = find_bridge(cfg, bridge)
+ do_add_port(cfg, bridge, parent, port, vlan)
+
+def cmd_add_bond(cfg, bridge, port, *slaves):
+ parent, vlan = find_bridge(cfg, bridge)
+ do_add_port(cfg, bridge, parent, port, vlan)
+ cfg['bonding.%s.slave' % port] = list(slaves)
+
+def cmd_del_port(cfg, *args):
+ if len(args) == 2:
+ bridge, port = args
+ parent, vlan = find_bridge(cfg, bridge)
+ if port not in get_bridge_ports(cfg, parent, vlan):
+ if port in get_bridge_ports(cfg, parent, -1):
+ raise Error("bridge %s does not have a port %s (although its parent bridge %s does)" % (bridge, port, parent))
+ else:
+ raise Error("bridge %s does not have a port %s" % (bridge, port))
+ else:
+ port, = args
+ if not port_to_bridge(cfg, port):
+ raise Error("no port %s on any bridge" % port)
+ del_port(cfg, port)
+
+def cmd_port_to_br(cfg, port):
+ bridge = port_to_bridge(cfg, port)
+ if bridge:
+ return (bridge,)
+ else:
+ raise Error("no port named %s" % port)
+
+def cmd_list_ifaces(cfg, bridge):
+ ifaces = []
+ parent, vlan = find_bridge(cfg, bridge)
+ for iface in get_bridge_ifaces(cfg, parent, vlan):
+ if iface != bridge:
+ ifaces.append(iface)
+ return ifaces
+
+def cmd_iface_to_br(cfg, iface):
+ for bridge, parent, vlan in get_bridge_info(cfg):
+ if iface != bridge and iface in get_bridge_ifaces(cfg, parent, vlan):
+ return (bridge,)
+ raise Error("no interface named %s" % iface)
+
+def cmd_br_to_vlan(cfg, bridge):
+ parent, vlan = find_bridge(cfg, bridge)
+ return (vlan,)
+
+def cmd_br_to_parent(cfg, bridge):
+ parent, vlan = find_bridge(cfg, bridge)
+ return (parent,)
+
+cmdTable = {'add-br': (cmd_add_br, True, lambda n: n == 1 or n == 3),
+ 'del-br': (cmd_del_br, True, 1),
+ 'list-br': (cmd_list_br, False, 0),
+ 'br-exists': (cmd_br_exists, False, 1),
+ 'list-ports': (cmd_list_ports, False, 1),
+ 'add-port': (cmd_add_port, True, 2),
+ 'add-bond': (cmd_add_bond, True, lambda n: n >= 4),
+ 'del-port': (cmd_del_port, True, lambda n: n == 1 or n == 2),
+ 'port-to-br': (cmd_port_to_br, False, 1),
+ 'br-to-vlan': (cmd_br_to_vlan, False, 1),
+ 'br-to-parent': (cmd_br_to_parent, False, 1),
+ 'list-ifaces': (cmd_list_ifaces, False, 1),
+ 'iface-to-br': (cmd_iface_to_br, False, 1)}
+
+# Break up commands at -- boundaries.
+def split_commands(args):
+ commands = []
+ command = []
+ for arg in args:
+ if arg == '--':
+ if command:
+ commands.append(command)
+ command = []
+ else:
+ command.append(arg)
+ if command:
+ commands.append(command)
+ return commands
+
+def check_command(args):
+ command, args = args[0], args[1:]
+ if command not in cmdTable:
+ sys.stderr.write("%s: unknown command '%s' (use --help for help)\n"
+ % (argv0, command))
+ sys.exit(1)
+
+ function, is_mutator, nargs = cmdTable[command]
+ if callable(nargs) and not nargs(len(args)):
+ sys.stderr.write("%s: '%s' command does not accept %d arguments (use --help for help)\n" % (argv0, command, len(args)))
+ sys.exit(1)
+ elif not callable(nargs) and len(args) != nargs:
+ sys.stderr.write("%s: '%s' command takes %d arguments but %d were supplied (use --help for help)\n" % (argv0, command, nargs, len(args)))
+ sys.exit(1)
+
+def run_command(cfg, args):
+ command, args = args[0], args[1:]
+ function, need_lock, nargs = cmdTable[command]
+ return function(cfg, *args)
+
+def main():
+ # Parse command line.
+ try:
+ options, args = getopt.getopt(sys.argv[1:], "c:t:hV",
+ ["config=",
+ "target=",
+ "no-reload",
+ "no-syslog",
+ "oneline",
+ "help",
+ "version"])
+ except getopt.GetoptError, msg:
+ sys.stderr.write("%s: %s (use --help for help)\n" % (argv0, msg))
+ sys.exit(1)
+
+ # Handle options.
+ oneline = False
+ for opt, optarg in options:
+ if opt == "-c" or opt == "--config":
+ global vswitchd_conf
+ vswitchd_conf = optarg
+ elif opt == "-t" or opt == "--target":
+ global vswitchd_target
+ vswitchd_target = optarg
+ elif opt == "--no-reload":
+ global reload_vswitchd
+ reload_vswitchd = False
+ elif opt == "-h" or opt == "--help":
+ usage()
+ elif opt == "-V" or opt == "--version":
+ version()
+ elif opt == "--no-syslog":
+ global enable_syslog
+ enable_syslog = False
+ elif opt == "--oneline":
+ oneline = True
+ else:
+ raise RuntimeError("unhandled option %s" % opt)
+
+ if enable_syslog:
+ syslog.openlog("ovs-vsctl")
+ log("Called as %s" % ' '.join(sys.argv[1:]))
+
+ # Break arguments into a series of commands.
+ commands = split_commands(args)
+ if not commands:
+ sys.stderr.write("%s: missing command name (use --help for help)\n"
+ % argv0)
+ sys.exit(1)
+
+ # Check command syntax.
+ need_lock = False
+ for command in commands:
+ check_command(command)
+ if cmdTable[command[0]][1]:
+ need_lock = True
+
+ # Execute commands.
+ cfg = cfg_read(vswitchd_conf, need_lock)
+ for command in commands:
+ output = run_command(cfg, command)
+ if oneline:
+ if output == None:
+ output = ()
+ print '\\n'.join([str(s).replace('\\', '\\\\')
+ for s in output])
+ elif output != None:
+ for line in output:
+ print line
+ if need_lock:
+ cfg_save(cfg, vswitchd_conf)
+ sys.exit(0)
+
+if __name__ == "__main__":
+ try:
+ main()
+ except Error, msg:
+ sys.stderr.write("%s: %s\n" % (argv0, msg.msg))
+ sys.exit(1)
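Together, cfg_read(), del_port() and cfg_subtract() implement a small edit-and-log cycle over the `key=value` format of ovs-vswitchd.conf. Below is a self-contained Python 3 sketch of that cycle. It reuses cfg_read()'s regular expression and the `[!0-9]` pattern trick from del_port(). `parse_cfg()`, `subtract()` and the sample text are illustrative names and data, not part of the patch.

The sketch uses `fnmatch.fnmatchcase()` so that matching is case-sensitive on every platform. `fnmatch.fnmatch()`, which the script uses, behaves the same way on Linux.

```python
import fnmatch
import re

# The same key/value syntax that cfg_read() accepts.
LINE_RE = re.compile(r'([-._@$:+a-zA-Z0-9]+)(?:[ \t\r\n\v]*)=(?:[ \t\r\n\v]*)(.*)$')


def parse_cfg(text):
    """Parse configuration text into a dict mapping keys to value lists."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # blank line or comment
        match = LINE_RE.match(line)
        if match is None:
            continue                      # malformed lines are ignored
        key, value = match.groups()
        cfg.setdefault(key, []).append(value)
    return cfg


def subtract(a, b):
    """Return the "key=value" lines that are in 'a' but not in 'b'."""
    return ["%s=%s" % (key, value)
            for key in sorted(a)
            for value in a[key]
            if value not in b.get(key, [])]


def del_matching_keys(cfg, pattern):
    """Delete every key of 'cfg' that matches the shell-style 'pattern'."""
    for key in [k for k in cfg if fnmatch.fnmatchcase(k, pattern)]:
        del cfg[key]


SAMPLE = """\
# Hypothetical sample configuration.
bridge.br0.port=br0
bridge.br0.port=eth0
iface.eth0.internal=false
iface.eth0.123.internal=true
"""

if __name__ == "__main__":
    orig = parse_cfg(SAMPLE)
    new = {key: list(values) for key, values in orig.items()}
    # "[!0-9]" stops "iface.eth0." from also matching keys of the distinct
    # VLAN interface "eth0.123".
    del_matching_keys(new, "iface.eth0.[!0-9]*")
    print("removed:", subtract(orig, new))  # removed: ['iface.eth0.internal=false']
    print("added:", subtract(new, orig))    # added: []
```

Sorting the keys in `subtract()` only makes the output order deterministic. cfg_subtract() iterates in dictionary order, and do_cfg_save() sorts keys and values when it writes the file.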
vswitchd/bridge.h \
vswitchd/mgmt.c \
vswitchd/mgmt.h \
- vswitchd/port.c \
- vswitchd/port.h \
vswitchd/proc-net-compat.c \
vswitchd/proc-net-compat.h \
vswitchd/ovs-vswitchd.c \
vswitchd/xenserver.c \
vswitchd/xenserver.h
vswitchd_ovs_vswitchd_LDADD = \
- secchan/libsecchan.a \
+ ofproto/libofproto.a \
lib/libopenvswitch.a \
$(FAULT_LIBS) \
$(SSL_LIBS)
#include "odp-util.h"
#include "ofp-print.h"
#include "ofpbuf.h"
+#include "ofproto/netflow.h"
+#include "ofproto/ofproto.h"
#include "packets.h"
#include "poll-loop.h"
#include "port-array.h"
#include "proc-net-compat.h"
#include "process.h"
-#include "secchan/netflow.h"
-#include "secchan/ofproto.h"
#include "socket-util.h"
#include "stp.h"
#include "svec.h"
extern uint64_t mgmt_id;
struct iface {
+ /* These members are always valid. */
struct port *port; /* Containing port. */
size_t port_ifidx; /* Index within containing port. */
-
char *name; /* Host network device name. */
- int dp_ifidx; /* Index within kernel datapath. */
-
- uint8_t mac[ETH_ADDR_LEN]; /* Ethernet address (all zeros if unknowns). */
-
tag_type tag; /* Tag associated with this interface. */
- bool enabled; /* May be chosen for flows? */
long long delay_expires; /* Time after which 'enabled' may change. */
+
+ /* These members are valid only after bridge_reconfigure() causes them to
+ * be initialized. */
+ int dp_ifidx; /* Index within kernel datapath. */
+ struct netdev *netdev; /* Network device. */
+ bool enabled; /* May be chosen for flows? */
};
#define BOND_MASK 0xff
struct ofproto *ofproto; /* OpenFlow switch. */
/* Kernel datapath information. */
- struct dpif dpif; /* Kernel datapath. */
+ struct dpif *dpif; /* Datapath. */
struct port_array ifaces; /* Indexed by kernel datapath port number. */
/* Bridge ports. */
static void bridge_flush(struct bridge *);
static void bridge_pick_local_hw_addr(struct bridge *,
uint8_t ea[ETH_ADDR_LEN],
- const char **devname);
+ struct iface **hw_addr_iface);
static uint64_t bridge_pick_datapath_id(struct bridge *,
const uint8_t bridge_ea[ETH_ADDR_LEN],
- const char *devname);
+ struct iface *hw_addr_iface);
+static struct iface *bridge_get_local_iface(struct bridge *);
static uint64_t dpid_from_hash(const void *, size_t nbytes);
static void bridge_unixctl_fdb_show(struct unixctl_conn *, const char *args);
uint16_t dp_ifidx);
static void port_update_bond_compat(struct port *);
static void port_update_vlan_compat(struct port *);
+static void port_update_bonding(struct port *);
static void mirror_create(struct bridge *, const char *name);
static void mirror_destroy(struct mirror *);
for (j = 0; j < port->n_ifaces; j++) {
struct iface *iface = port->ifaces[j];
if (iface->dp_ifidx < 0) {
- VLOG_ERR("%s interface not in dp%u, ignoring",
- iface->name, dpif_id(&br->dpif));
+ VLOG_ERR("%s interface not in datapath %s, ignoring",
+ iface->name, dpif_name(br->dpif));
} else {
if (iface->dp_ifidx != ODPP_LOCAL) {
svec_add(svec, iface->name);
void
bridge_init(void)
{
- int retval;
- int i;
-
- bond_init();
+ struct svec dpif_names;
+ size_t i;
unixctl_command_register("fdb/show", bridge_unixctl_fdb_show);
- for (i = 0; i < DP_MAX; i++) {
- struct dpif dpif;
- char devname[16];
+ svec_init(&dpif_names);
+ dp_enumerate(&dpif_names);
+ for (i = 0; i < dpif_names.n; i++) {
+ const char *dpif_name = dpif_names.names[i];
+ struct dpif *dpif;
+ int retval;
- sprintf(devname, "dp%d", i);
- retval = dpif_open(devname, &dpif);
+ retval = dpif_open(dpif_name, &dpif);
if (!retval) {
- char dpif_name[IF_NAMESIZE];
- if (dpif_get_name(&dpif, dpif_name, sizeof dpif_name)
- || !cfg_has("bridge.%s.port", dpif_name)) {
- dpif_delete(&dpif);
+ struct svec all_names;
+ size_t j;
+
+ svec_init(&all_names);
+ dpif_get_all_names(dpif, &all_names);
+ for (j = 0; j < all_names.n; j++) {
+ if (cfg_has("bridge.%s.port", all_names.names[j])) {
+ goto found;
+ }
}
- dpif_close(&dpif);
- } else if (retval != ENODEV) {
- VLOG_ERR("failed to delete datapath dp%d: %s",
- i, strerror(retval));
+ dpif_delete(dpif);
+ found:
+ svec_destroy(&all_names);
+ dpif_close(dpif);
}
}
+ svec_destroy(&dpif_names);
unixctl_command_register("bridge/dump-flows", bridge_unixctl_dump_flows);
+ bond_init();
bridge_reconfigure();
}
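bridge_init() now enumerates the existing datapaths with dp_enumerate(). It deletes every datapath that has no name, among those reported by dpif_get_all_names(), configured as a bridge (a `bridge.NAME.port` key). The decision logic, separated from the datapath API, can be sketched in Python 3 as follows. The dictionary of datapath names is a hypothetical stand-in for the dpif calls. Python's `for`...`else` plays the role of the C code's `goto found`.

```python
def datapaths_to_delete(datapaths, configured_bridges):
    """Return the names of the datapaths that bridge_init() would delete.

    'datapaths' maps each datapath name (as dp_enumerate() would list it) to
    every name that datapath is known by (as dpif_get_all_names() would
    report).  'configured_bridges' is the set of names that have a
    "bridge.NAME.port" key in the configuration.
    """
    doomed = []
    for dp_name, all_names in datapaths.items():
        for name in all_names:
            if name in configured_bridges:
                break                   # like "goto found": keep it
        else:
            doomed.append(dp_name)      # no name is configured: delete it
    return doomed


if __name__ == "__main__":
    # Hypothetical datapaths and names.
    dps = {"dp0": ["dp0", "br0"], "dp1": ["dp1", "xenbr9"]}
    print(datapaths_to_delete(dps, {"br0"}))  # ['dp1']
```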
* the old certificate will still be trusted until vSwitch is
* restarted. We may want to address this in vconn's SSL library. */
if (config_string_change("ssl.ca-cert", &cacert_file)
- || (stat(cacert_file, &s) && errno == ENOENT)) {
+ || (cacert_file && stat(cacert_file, &s) && errno == ENOENT)) {
vconn_ssl_set_ca_cert_file(cacert_file,
cfg_get_bool(0, "ssl.bootstrap-ca-cert"));
}
}
#endif
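+/* iterate_and_prune_ifaces() callback function that opens the network device
+ * for 'iface', if it is not already open, and on first opening initializes
+ * 'iface->enabled' from the device's carrier status. Returns false, causing
+ * 'iface' to be pruned, if the device cannot be opened. */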
+static bool
+init_iface_netdev(struct bridge *br UNUSED, struct iface *iface,
+ void *aux UNUSED)
+{
+ if (iface->netdev) {
+ return true;
+ } else if (!netdev_open(iface->name, NETDEV_ETH_TYPE_NONE,
+ &iface->netdev)) {
+ netdev_get_carrier(iface->netdev, &iface->enabled);
+ return true;
+ } else {
+ /* If the network device can't be opened, then we're not going to try
+ * to do anything with this interface. */
+ return false;
+ }
+}
+
+static bool
+check_iface_dp_ifidx(struct bridge *br, struct iface *iface, void *aux UNUSED)
+{
+ if (iface->dp_ifidx >= 0) {
+ VLOG_DBG("%s has interface %s on port %d",
+ dpif_name(br->dpif),
+ iface->name, iface->dp_ifidx);
+ return true;
+ } else {
+ VLOG_ERR("%s interface not in %s, dropping",
+ iface->name, dpif_name(br->dpif));
+ return false;
+ }
+}
+
+static bool
+set_iface_properties(struct bridge *br UNUSED, struct iface *iface,
+ void *aux UNUSED)
+{
+ int rate, burst;
+
+ /* Set policing attributes. */
+ rate = cfg_get_int(0, "port.%s.ingress.policing-rate", iface->name);
+ burst = cfg_get_int(0, "port.%s.ingress.policing-burst", iface->name);
+ netdev_set_policing(iface->netdev, rate, burst);
+
+ /* Set MAC address of internal interfaces other than the local
+ * interface. */
+ if (iface->dp_ifidx != ODPP_LOCAL
+ && iface_is_internal(br, iface->name)) {
+ iface_set_mac(iface);
+ }
+
+ return true;
+}
+
+/* Calls 'cb' for each interface in 'br', passing along the 'aux' argument.
+ * Deletes from 'br' all the interfaces for which 'cb' returns false, and then
+ * deletes from 'br' any ports that no longer have any interfaces. */
+static void
+iterate_and_prune_ifaces(struct bridge *br,
+ bool (*cb)(struct bridge *, struct iface *,
+ void *aux),
+ void *aux)
+{
+ size_t i, j;
+
+ for (i = 0; i < br->n_ports; ) {
+ struct port *port = br->ports[i];
+ for (j = 0; j < port->n_ifaces; ) {
+ struct iface *iface = port->ifaces[j];
+ if (cb(br, iface, aux)) {
+ j++;
+ } else {
+ iface_destroy(iface);
+ }
+ }
+
+ if (port->n_ifaces) {
+ i++;
+ } else {
+ VLOG_ERR("%s port has no interfaces, dropping", port->name);
+ port_destroy(port);
+ }
+ }
+}
+
void
bridge_reconfigure(void)
{
- struct svec old_br, new_br, raw_new_br;
+ struct svec old_br, new_br;
struct bridge *br, *next;
- size_t i, j;
+ size_t i;
COVERAGE_INC(bridge_reconfigure);
- /* Collect old bridges. */
+ /* Collect old and new bridges. */
svec_init(&old_br);
+ svec_init(&new_br);
LIST_FOR_EACH (br, struct bridge, node, &all_bridges) {
svec_add(&old_br, br->name);
}
-
- /* Collect new bridges. */
- svec_init(&raw_new_br);
- cfg_get_subsections(&raw_new_br, "bridge");
- svec_init(&new_br);
- for (i = 0; i < raw_new_br.n; i++) {
- const char *name = raw_new_br.names[i];
- if ((!strncmp(name, "dp", 2) && isdigit(name[2])) ||
- (!strncmp(name, "nl:", 3) && isdigit(name[3]))) {
- VLOG_ERR("%s is not a valid bridge name (bridges may not be "
- "named \"dp\" or \"nl:\" followed by a digit)", name);
- } else {
- svec_add(&new_br, name);
- }
- }
- svec_destroy(&raw_new_br);
+ cfg_get_subsections(&new_br, "bridge");
/* Get rid of deleted bridges and add new bridges. */
svec_sort(&old_br);
size_t n_dpif_ports;
struct svec want_ifaces;
- dpif_port_list(&br->dpif, &dpif_ports, &n_dpif_ports);
+ dpif_port_list(br->dpif, &dpif_ports, &n_dpif_ports);
bridge_get_all_ifaces(br, &want_ifaces);
for (i = 0; i < n_dpif_ports; i++) {
const struct odp_port *p = &dpif_ports[i];
if (!svec_contains(&want_ifaces, p->devname)
&& strcmp(p->devname, br->name)) {
- int retval = dpif_port_del(&br->dpif, p->port);
+ int retval = dpif_port_del(br->dpif, p->port);
if (retval) {
- VLOG_ERR("failed to remove %s interface from dp%u: %s",
- p->devname, dpif_id(&br->dpif), strerror(retval));
+ VLOG_ERR("failed to remove %s interface from %s: %s",
+ p->devname, dpif_name(br->dpif),
+ strerror(retval));
}
}
}
struct odp_port *dpif_ports;
size_t n_dpif_ports;
struct svec cur_ifaces, want_ifaces, add_ifaces;
- int next_port_no;
- dpif_port_list(&br->dpif, &dpif_ports, &n_dpif_ports);
+ dpif_port_list(br->dpif, &dpif_ports, &n_dpif_ports);
svec_init(&cur_ifaces);
for (i = 0; i < n_dpif_ports; i++) {
svec_add(&cur_ifaces, dpif_ports[i].devname);
bridge_get_all_ifaces(br, &want_ifaces);
svec_diff(&want_ifaces, &cur_ifaces, &add_ifaces, NULL, NULL);
- next_port_no = 1;
for (i = 0; i < add_ifaces.n; i++) {
const char *if_name = add_ifaces.names[i];
- for (;;) {
- bool internal;
- int error;
-
- /* Add to datapath. */
- internal = iface_is_internal(br, if_name);
- error = dpif_port_add(&br->dpif, if_name, next_port_no++,
- internal ? ODP_PORT_INTERNAL : 0);
- if (error != EEXIST) {
- if (next_port_no >= 256) {
- VLOG_ERR("ran out of valid port numbers on dp%u",
- dpif_id(&br->dpif));
- goto out;
- }
- if (error) {
- VLOG_ERR("failed to add %s interface to dp%u: %s",
- if_name, dpif_id(&br->dpif), strerror(error));
- }
- break;
- }
+ bool internal;
+ int error;
+
+ /* Add to datapath. */
+ internal = iface_is_internal(br, if_name);
+ error = dpif_port_add(br->dpif, if_name,
+ internal ? ODP_PORT_INTERNAL : 0, NULL);
+ if (error == EFBIG) {
+ VLOG_ERR("ran out of valid port numbers on %s",
+ dpif_name(br->dpif));
+ break;
+ } else if (error) {
+ VLOG_ERR("failed to add %s interface to %s: %s",
+ if_name, dpif_name(br->dpif), strerror(error));
}
}
- out:
svec_destroy(&cur_ifaces);
svec_destroy(&want_ifaces);
svec_destroy(&add_ifaces);
LIST_FOR_EACH (br, struct bridge, node, &all_bridges) {
uint8_t ea[8];
uint64_t dpid;
- struct iface *local_iface = NULL;
- const char *devname;
+ struct iface *local_iface;
+ struct iface *hw_addr_iface;
struct netflow_options nf_options;
bridge_fetch_dp_ifaces(br);
- for (i = 0; i < br->n_ports; ) {
- struct port *port = br->ports[i];
+ iterate_and_prune_ifaces(br, init_iface_netdev, NULL);
- for (j = 0; j < port->n_ifaces; ) {
- struct iface *iface = port->ifaces[j];
- if (iface->dp_ifidx < 0) {
- VLOG_ERR("%s interface not in dp%u, dropping",
- iface->name, dpif_id(&br->dpif));
- iface_destroy(iface);
- } else {
- if (iface->dp_ifidx == ODPP_LOCAL) {
- local_iface = iface;
- }
- VLOG_DBG("dp%u has interface %s on port %d",
- dpif_id(&br->dpif), iface->name, iface->dp_ifidx);
- j++;
- }
- }
- if (!port->n_ifaces) {
- VLOG_ERR("%s port has no interfaces, dropping", port->name);
- port_destroy(port);
- continue;
- }
- i++;
- }
+ iterate_and_prune_ifaces(br, check_iface_dp_ifidx, NULL);
/* Pick local port hardware address, datapath ID. */
- bridge_pick_local_hw_addr(br, ea, &devname);
+ bridge_pick_local_hw_addr(br, ea, &hw_addr_iface);
+ local_iface = bridge_get_local_iface(br);
if (local_iface) {
- int error = netdev_nodev_set_etheraddr(local_iface->name, ea);
+ int error = netdev_set_etheraddr(local_iface->netdev, ea);
if (error) {
static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
VLOG_ERR_RL(&rl, "bridge %s: failed to set bridge "
}
}
- dpid = bridge_pick_datapath_id(br, ea, devname);
+ dpid = bridge_pick_datapath_id(br, ea, hw_addr_iface);
ofproto_set_datapath_id(br->ofproto, dpid);
/* Set NetFlow configuration on this bridge. */
memset(&nf_options, 0, sizeof nf_options);
- nf_options.engine_type = br->dpif.minor;
- nf_options.engine_id = br->dpif.minor;
+ dpif_get_netflow_ids(br->dpif, &nf_options.engine_type,
+ &nf_options.engine_id);
nf_options.active_timeout = -1;
if (cfg_has("netflow.%s.engine-type", br->name)) {
struct port *port = br->ports[i];
port_update_vlan_compat(port);
-
- for (j = 0; j < port->n_ifaces; j++) {
- struct iface *iface = port->ifaces[j];
- if (iface->dp_ifidx != ODPP_LOCAL
- && iface_is_internal(br, iface->name)) {
- iface_set_mac(iface);
- }
- }
+ port_update_bonding(port);
}
}
LIST_FOR_EACH (br, struct bridge, node, &all_bridges) {
brstp_reconfigure(br);
+ iterate_and_prune_ifaces(br, set_iface_properties, NULL);
}
}
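bridge_reconfigure() above now delegates each pruning pass to iterate_and_prune_ifaces(). That function advances a loop index only when it keeps an element. Destroying an interface or port removes it from its array, so the same index then refers to a different element, or to none.

A Python 3 sketch of the same two-level pattern follows. Plain lists of `(port_name, [iface, ...])` pairs are hypothetical stand-ins for `struct bridge` and `struct port`. The predicate takes only the interface, whereas the C callback also receives the bridge and an `aux` pointer.

```python
def iterate_and_prune(ports, keep):
    """Drop interfaces for which keep(iface) is false, then empty ports.

    'ports' is a list of (port_name, [iface, ...]) pairs, modified in place.
    Returns the names of the ports dropped because they lost every interface.
    """
    dropped = []
    i = 0
    while i < len(ports):
        name, ifaces = ports[i]
        j = 0
        while j < len(ifaces):
            if keep(ifaces[j]):
                j += 1              # kept: move on to the next interface
            else:
                del ifaces[j]       # removed: index j now names the next one
        if ifaces:
            i += 1
        else:
            dropped.append(name)
            del ports[i]            # removed: index i now names the next port
    return dropped


if __name__ == "__main__":
    ports = [("eth0", ["eth0"]), ("bond0", ["eth1", "eth2"]), ("vif1", ["vif1"])]
    # Hypothetical predicate: pretend eth2 and vif1 are missing from the datapath.
    missing = {"eth2", "vif1"}
    print(iterate_and_prune(ports, lambda iface: iface not in missing))  # ['vif1']
    print(ports)  # [('eth0', ['eth0']), ('bond0', ['eth1'])]
```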
static void
bridge_pick_local_hw_addr(struct bridge *br, uint8_t ea[ETH_ADDR_LEN],
- const char **devname)
+ struct iface **hw_addr_iface)
{
uint64_t requested_ea;
size_t i, j;
int error;
- *devname = NULL;
+ *hw_addr_iface = NULL;
/* Did the user request a particular MAC? */
requested_ea = cfg_get_mac(0, "bridge.%s.mac", br->name);
for (j = 0; j < port->n_ifaces; j++) {
struct iface *candidate = port->ifaces[j];
uint8_t candidate_ea[ETH_ADDR_LEN];
- if (!netdev_nodev_get_etheraddr(candidate->name, candidate_ea)
+ if (!netdev_get_etheraddr(candidate->netdev, candidate_ea)
&& eth_addr_equals(iface_ea, candidate_ea)) {
iface = candidate;
}
}
/* Grab MAC. */
- error = netdev_nodev_get_etheraddr(iface->name, iface_ea);
+ error = netdev_get_etheraddr(iface->netdev, iface_ea);
if (error) {
static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
VLOG_ERR_RL(&rl, "failed to obtain Ethernet address of %s: %s",
memcmp(iface_ea, ea, ETH_ADDR_LEN) < 0)
{
memcpy(ea, iface_ea, ETH_ADDR_LEN);
- *devname = iface ? iface->name : NULL;
+ *hw_addr_iface = iface;
}
}
if (eth_addr_is_multicast(ea) || eth_addr_is_vif(ea)) {
memcpy(ea, br->default_ea, ETH_ADDR_LEN);
- *devname = NULL;
+ *hw_addr_iface = NULL;
VLOG_WARN("bridge %s: using default bridge Ethernet "
"address "ETH_ADDR_FMT, br->name, ETH_ADDR_ARGS(ea));
} else {
/* Choose and returns the datapath ID for bridge 'br' given that the bridge
* Ethernet address is 'bridge_ea'. If 'bridge_ea' is the Ethernet address of
- * a network device, then that network device's name must be passed in as
- * 'devname'; if 'bridge_ea' was derived some other way, then 'devname' must be
- * passed in as a null pointer. */
+ * an interface on 'br', then that interface must be passed in as
+ * 'hw_addr_iface'; if 'bridge_ea' was derived some other way, then
+ * 'hw_addr_iface' must be passed in as a null pointer. */
static uint64_t
bridge_pick_datapath_id(struct bridge *br,
const uint8_t bridge_ea[ETH_ADDR_LEN],
- const char *devname)
+ struct iface *hw_addr_iface)
{
/*
* The procedure for choosing a bridge MAC address will, in the most
return dpid;
}
- if (devname) {
+ if (hw_addr_iface) {
int vlan;
- if (!netdev_get_vlan_vid(devname, &vlan)) {
+ if (!netdev_get_vlan_vid(hw_addr_iface->netdev, &vlan)) {
/*
* A bridge whose MAC address is taken from a VLAN network device
* (that is, a network device created with vconfig(8) or similar
br->flush = true;
mac_learning_flush(br->ml);
}
+
+/* Returns the 'br' interface for the ODPP_LOCAL port, or null if 'br' has no
+ * such interface. */
+static struct iface *
+bridge_get_local_iface(struct bridge *br)
+{
+ size_t i, j;
+
+ for (i = 0; i < br->n_ports; i++) {
+ struct port *port = br->ports[i];
+ for (j = 0; j < port->n_ifaces; j++) {
+ struct iface *iface = port->ifaces[j];
+ if (iface->dp_ifidx == ODPP_LOCAL) {
+ return iface;
+ }
+ }
+ }
+
+ return NULL;
+}
\f
/* Bridge unixctl user interface functions. */
static void
br = xcalloc(1, sizeof *br);
error = dpif_create(name, &br->dpif);
- if (error == EEXIST) {
+ if (error == EEXIST || error == EBUSY) {
error = dpif_open(name, &br->dpif);
if (error) {
VLOG_ERR("datapath %s already exists but cannot be opened: %s",
free(br);
return NULL;
}
- dpif_flow_flush(&br->dpif);
+ dpif_flow_flush(br->dpif);
} else if (error) {
VLOG_ERR("failed to create datapath %s: %s", name, strerror(error));
free(br);
error = ofproto_create(name, &bridge_ofhooks, br, &br->ofproto);
if (error) {
VLOG_ERR("failed to create switch %s: %s", name, strerror(error));
- dpif_delete(&br->dpif);
- dpif_close(&br->dpif);
+ dpif_delete(br->dpif);
+ dpif_close(br->dpif);
free(br);
return NULL;
}
list_push_back(&all_bridges, &br->node);
- VLOG_INFO("created bridge %s on dp%u", br->name, dpif_id(&br->dpif));
+ VLOG_INFO("created bridge %s on %s", br->name, dpif_name(br->dpif));
return br;
}
port_destroy(br->ports[br->n_ports - 1]);
}
list_remove(&br->node);
- error = dpif_delete(&br->dpif);
+ error = dpif_delete(br->dpif);
if (error && error != ENOENT) {
- VLOG_ERR("failed to delete dp%u: %s",
- dpif_id(&br->dpif), strerror(error));
+ VLOG_ERR("failed to delete %s: %s",
+ dpif_name(br->dpif), strerror(error));
}
- dpif_close(&br->dpif);
+ dpif_close(br->dpif);
ofproto_destroy(br->ofproto);
free(br->controller);
mac_learning_destroy(br->ml);
return controller && controller[0] ? controller : NULL;
}
+static bool
+check_duplicate_ifaces(struct bridge *br, struct iface *iface, void *ifaces_)
+{
+ struct svec *ifaces = ifaces_;
+ if (!svec_contains(ifaces, iface->name)) {
+ svec_add(ifaces, iface->name);
+ svec_sort(ifaces);
+ return true;
+ } else {
+ VLOG_ERR("bridge %s: %s interface is on multiple ports, "
+ "removing from %s",
+ br->name, iface->name, iface->port->name);
+ return false;
+ }
+}
+
static void
bridge_reconfigure_one(struct bridge *br)
{
struct svec old_ports, new_ports, ifaces;
struct svec listeners, old_listeners;
struct svec snoops, old_snoops;
- size_t i, j;
+ size_t i;
/* Collect old ports. */
svec_init(&old_ports);
svec_init(&new_ports);
cfg_get_all_keys(&new_ports, "bridge.%s.port", br->name);
svec_sort(&new_ports);
- if (bridge_get_controller(br) && !svec_contains(&new_ports, br->name)) {
- svec_add(&new_ports, br->name);
- svec_sort(&new_ports);
+ if (bridge_get_controller(br)) {
+ char local_name[IF_NAMESIZE];
+ int error;
+
+ error = dpif_port_get_name(br->dpif, ODPP_LOCAL,
+ local_name, sizeof local_name);
+ if (!error && !svec_contains(&new_ports, local_name)) {
+ svec_add(&new_ports, local_name);
+ svec_sort(&new_ports);
+ }
}
if (!svec_is_unique(&new_ports)) {
VLOG_WARN("bridge %s: %s specified twice as bridge port",
/* Check and delete duplicate interfaces. */
svec_init(&ifaces);
- for (i = 0; i < br->n_ports; ) {
- struct port *port = br->ports[i];
- for (j = 0; j < port->n_ifaces; ) {
- struct iface *iface = port->ifaces[j];
- if (svec_contains(&ifaces, iface->name)) {
- VLOG_ERR("bridge %s: %s interface is on multiple ports, "
- "removing from %s",
- br->name, iface->name, port->name);
- iface_destroy(iface);
- } else {
- svec_add(&ifaces, iface->name);
- svec_sort(&ifaces);
- j++;
- }
- }
- if (!port->n_ifaces) {
- VLOG_ERR("%s port has no interfaces, dropping", port->name);
- port_destroy(port);
- } else {
- i++;
- }
- }
+ iterate_and_prune_ifaces(br, check_duplicate_ifaces, &ifaces);
svec_destroy(&ifaces);
/* Delete all flows if we're switching from connected to standalone or vice
cfg_get_string(0, "%s.accept-regex", pfx),
update_resolv_conf);
} else {
- struct netdev *netdev;
+ struct iface *local_iface;
bool in_band;
- int error;
in_band = (!cfg_is_valid(CFG_BOOL | CFG_REQUIRED,
"%s.in-band", pfx)
ofproto_set_discovery(br->ofproto, false, NULL, NULL);
ofproto_set_in_band(br->ofproto, in_band);
- error = netdev_open(br->name, NETDEV_ETH_TYPE_NONE, &netdev);
- if (!error) {
- if (cfg_is_valid(CFG_IP | CFG_REQUIRED, "%s.ip", pfx)) {
- struct in_addr ip, mask, gateway;
- ip.s_addr = cfg_get_ip(0, "%s.ip", pfx);
- mask.s_addr = cfg_get_ip(0, "%s.netmask", pfx);
- gateway.s_addr = cfg_get_ip(0, "%s.gateway", pfx);
-
- netdev_turn_flags_on(netdev, NETDEV_UP, true);
- if (!mask.s_addr) {
- mask.s_addr = guess_netmask(ip.s_addr);
- }
- if (!netdev_set_in4(netdev, ip, mask)) {
- VLOG_INFO("bridge %s: configured IP address "IP_FMT", "
- "netmask "IP_FMT,
- br->name, IP_ARGS(&ip.s_addr),
- IP_ARGS(&mask.s_addr));
- }
+ local_iface = bridge_get_local_iface(br);
+ if (local_iface
+ && cfg_is_valid(CFG_IP | CFG_REQUIRED, "%s.ip", pfx)) {
+ struct netdev *netdev = local_iface->netdev;
+ struct in_addr ip, mask, gateway;
+ ip.s_addr = cfg_get_ip(0, "%s.ip", pfx);
+ mask.s_addr = cfg_get_ip(0, "%s.netmask", pfx);
+ gateway.s_addr = cfg_get_ip(0, "%s.gateway", pfx);
+
+ netdev_turn_flags_on(netdev, NETDEV_UP, true);
+ if (!mask.s_addr) {
+ mask.s_addr = guess_netmask(ip.s_addr);
+ }
+ if (!netdev_set_in4(netdev, ip, mask)) {
+ VLOG_INFO("bridge %s: configured IP address "IP_FMT", "
+ "netmask "IP_FMT,
+ br->name, IP_ARGS(&ip.s_addr),
+ IP_ARGS(&mask.s_addr));
+ }
- if (gateway.s_addr) {
- if (!netdev_add_router(gateway)) {
- VLOG_INFO("bridge %s: configured gateway "IP_FMT,
- br->name, IP_ARGS(&gateway.s_addr));
- }
+ if (gateway.s_addr) {
+ if (!netdev_add_router(netdev, gateway)) {
+ VLOG_INFO("bridge %s: configured gateway "IP_FMT,
+ br->name, IP_ARGS(&gateway.s_addr));
}
}
- netdev_close(netdev);
}
}
}
port_array_clear(&br->ifaces);
- dpif_port_list(&br->dpif, &dpif_ports, &n_dpif_ports);
+ dpif_port_list(br->dpif, &dpif_ports, &n_dpif_ports);
for (i = 0; i < n_dpif_ports; i++) {
struct odp_port *p = &dpif_ports[i];
struct iface *iface = iface_lookup(br, p->devname);
if (iface) {
if (iface->dp_ifidx >= 0) {
- VLOG_WARN("dp%u reported interface %s twice",
- dpif_id(&br->dpif), p->devname);
+ VLOG_WARN("%s reported interface %s twice",
+ dpif_name(br->dpif), p->devname);
} else if (iface_from_dp_ifidx(br, p->port)) {
- VLOG_WARN("dp%u reported interface %"PRIu16" twice",
- dpif_id(&br->dpif), p->port);
+ VLOG_WARN("%s reported interface %"PRIu16" twice",
+ dpif_name(br->dpif), p->port);
} else {
port_array_set(&br->ifaces, p->port, iface);
iface->dp_ifidx = p->port;
bridge_flush(br);
} else {
- memcpy(iface->mac, opp->hw_addr, ETH_ADDR_LEN);
if (port->n_ifaces > 1) {
bool up = !(opp->state & OFPPS_LINK_DOWN);
bond_link_status_update(iface, up);
enable_slave(conn, args, false);
}
+static void
+bond_unixctl_hash(struct unixctl_conn *conn, const char *args)
+{
+ uint8_t mac[ETH_ADDR_LEN];
+ uint8_t hash;
+ char *hash_cstr;
+
+ if (sscanf(args, ETH_ADDR_SCAN_FMT, ETH_ADDR_SCAN_ARGS(mac))
+ == ETH_ADDR_SCAN_COUNT) {
+ hash = bond_hash(mac);
+
+ hash_cstr = xasprintf("%u", hash);
+ unixctl_command_reply(conn, 200, hash_cstr);
+ free(hash_cstr);
+ } else {
+ unixctl_command_reply(conn, 501, "invalid mac");
+ }
+}
+
static void
bond_init(void)
{
bond_unixctl_set_active_slave);
unixctl_command_register("bond/enable-slave", bond_unixctl_enable_slave);
unixctl_command_register("bond/disable-slave", bond_unixctl_disable_slave);
+ unixctl_command_register("bond/hash", bond_unixctl_hash);
}
\f
/* Port functions. */
if (slave->up) {
bond.up = true;
}
- memcpy(slave->mac, iface->mac, ETH_ADDR_LEN);
+ netdev_get_etheraddr(iface->netdev, slave->mac);
}
if (cfg_get_bool(0, "bonding.%s.fake-iface", port->name)) {
&& p->n_ifaces
&& (!vlandev_name || strcmp(p->name, vlandev_name) <= 0))
{
- const uint8_t *ea = p->ifaces[0]->mac;
+ uint8_t ea[ETH_ADDR_LEN];
+ netdev_get_etheraddr(p->ifaces[0]->netdev, ea);
if (!eth_addr_is_multicast(ea) &&
!eth_addr_is_reserved(ea) &&
!eth_addr_is_zero(ea)) {
iface->dp_ifidx = -1;
iface->tag = tag_create_random();
iface->delay_expires = LLONG_MAX;
-
- if (!cfg_get_bool(0, "iface.%s.internal", iface->name)) {
- netdev_nodev_get_etheraddr(name, iface->mac);
- netdev_nodev_get_carrier(name, &iface->enabled);
- } else {
- /* Internal interfaces are created later by the call to dpif_port_add()
- * in bridge_reconfigure(). Until then, we can't obtain any
- * information about them. (There's no real value in doing so, anyway,
- * because the 'mac' and 'enabled' values are only used for interfaces
- * that are bond slaves, and it doesn't normally make sense to bond an
- * internal interface.) */
- }
+ iface->netdev = NULL;
if (port->n_ifaces >= port->allocated_ifaces) {
port->ifaces = x2nrealloc(port->ifaces, &port->allocated_ifaces,
VLOG_DBG("attached network device %s to port %s", iface->name, port->name);
- port_update_bonding(port);
bridge_flush(port->bridge);
}
del = port->ifaces[iface->port_ifidx] = port->ifaces[--port->n_ifaces];
del->port_ifidx = iface->port_ifidx;
+ netdev_close(iface->netdev);
free(iface->name);
free(iface);
bond_send_learning_packets(port);
}
- port_update_bonding(port);
bridge_flush(port->bridge);
}
}
VLOG_ERR("ignoring iface.%s.mac; use bridge.%s.mac instead",
iface->name, iface->name);
} else {
- int error = netdev_nodev_set_etheraddr(iface->name, ea);
+ int error = netdev_set_etheraddr(iface->netdev, ea);
if (error) {
VLOG_ERR("interface %s: setting MAC failed (%s)",
iface->name, strerror(error));
if (!iface) {
VLOG_WARN_RL(&rl, "%s: cannot send BPDU on unknown port %d",
br->name, port_no);
- } else if (eth_addr_is_zero(iface->mac)) {
- VLOG_WARN_RL(&rl, "%s: cannot send BPDU on port %d with unknown MAC",
- br->name, port_no);
} else {
- union ofp_action action;
struct eth_header *eth = pkt->l2;
- flow_t flow;
- memcpy(eth->eth_src, iface->mac, ETH_ADDR_LEN);
+ netdev_get_etheraddr(iface->netdev, eth->eth_src);
+ if (eth_addr_is_zero(eth->eth_src)) {
+ VLOG_WARN_RL(&rl, "%s: cannot send BPDU on port %d "
+ "with unknown MAC", br->name, port_no);
+ } else {
+ union ofp_action action;
+ flow_t flow;
- memset(&action, 0, sizeof action);
- action.type = htons(OFPAT_OUTPUT);
- action.output.len = htons(sizeof action);
- action.output.port = htons(port_no);
+ memset(&action, 0, sizeof action);
+ action.type = htons(OFPAT_OUTPUT);
+ action.output.len = htons(sizeof action);
+ action.output.port = htons(port_no);
- flow_extract(pkt, ODPP_NONE, &flow);
- ofproto_send_packet(br->ofproto, &flow, &action, 1, pkt);
+ flow_extract(pkt, ODPP_NONE, &flow);
+ ofproto_send_packet(br->ofproto, &flow, &action, 1, pkt);
+ }
}
ofpbuf_delete(pkt);
}
appear in \fIcommand\fR.
.IP
The commands that are substituted into \fIcommand\fR are those that
-can be listed by passing \fB-e help\fR to \fBovs\-appctl\fR with
-\fBovs\-vswitchd\fR as target. The command that is substituted may
-include white space-separated arguments, so \fIcommand\fR should include
-shell quotes around \fB%s\fR.
+can be listed by passing \fBhelp\fR to \fBovs\-appctl\fR with
+\fBovs\-vswitchd\fR as target.
.IP
\fIcommand\fR must not redirect \fBovs\-appctl\fR's standard output or
standard error streams, because \fBovs\-brcompatd\fR expects to read
.BR ovs\-appctl (8),
.BR ovs\-vswitchd (8),
.BR ovs\-vswitchd.conf (5),
-\fBINSTALL\fR in the Open vSwitch distribution.
+\fBINSTALL.bridge\fR in the Open vSwitch distribution.
#include "coverage.h"
#include "daemon.h"
#include "dirs.h"
-#include "dpif.h"
#include "dynamic-string.h"
#include "fatal-signal.h"
#include "fault.h"
prune_ports(void)
{
int i, j;
- int error;
struct svec bridges, delete;
if (cfg_lock(NULL, 0)) {
get_bridge_ifaces(br_name, &ifaces, -1);
for (j = 0; j < ifaces.n; j++) {
const char *iface_name = ifaces.names[j];
- enum netdev_flags flags;
/* The local port and internal ports are created and destroyed by
* ovs-vswitchd itself, so don't bother checking for them at all.
continue;
}
- error = netdev_nodev_get_flags(iface_name, &flags);
- if (error == ENODEV) {
+ if (!netdev_exists(iface_name)) {
VLOG_INFO_RL(&rl, "removing dead interface %s from %s",
iface_name, br_name);
svec_add(&delete, iface_name);
- } else if (error) {
- VLOG_INFO_RL(&rl, "unknown error %d on interface %s from %s",
- error, iface_name, br_name);
}
}
svec_destroy(&ifaces);
svec_destroy(&delete);
}
-/* Checks whether a network device named 'name' exists and returns true if so,
- * false otherwise.
- *
- * XXX it is possible that this doesn't entirely accomplish what we want in
- * context, since ovs-vswitchd.conf may cause vswitchd to create or destroy
- * network devices based on iface.*.internal settings.
- *
- * XXX may want to move this to lib/netdev.
- *
- * XXX why not just use netdev_nodev_get_flags() or similar function? */
-static bool
-netdev_exists(const char *name)
-{
- struct stat s;
- char *filename;
- int error;
-
- filename = xasprintf("/sys/class/net/%s", name);
- error = stat(filename, &s);
- free(filename);
- return !error;
-}
-
static int
add_bridge(const char *br_name)
{
for (i = 0; i < ifaces.n; i++) {
const char *iface_name = ifaces.names[i];
struct mac *mac = &local_macs[n_local_macs];
- if (!netdev_nodev_get_etheraddr(iface_name, mac->addr)) {
- n_local_macs++;
+ struct netdev *netdev;
+
+ error = netdev_open(iface_name, NETDEV_ETH_TYPE_NONE, &netdev);
+ if (netdev) {
+ if (!netdev_get_etheraddr(netdev, mac->addr)) {
+ n_local_macs++;
+ }
+ netdev_close(netdev);
}
}
svec_destroy(&ifaces);
const char *port_name = nl_attr_get_string(attrs[IFLA_IFNAME]);
char br_name[IFNAMSIZ];
uint32_t br_idx = nl_attr_get_u32(attrs[IFLA_MASTER]);
- enum netdev_flags flags;
if (!if_indextoname(br_idx, br_name)) {
ofpbuf_delete(buf);
return;
}
- if (netdev_nodev_get_flags(port_name, &flags) == ENODEV) {
+ if (!netdev_exists(port_name)) {
/* Network device is really gone. */
struct svec ports;
for (;;) {
unixctl_server_run(unixctl);
brc_recv_update();
+ netdev_run();
/* If 'prune_timeout' is non-zero, we actively prune from the
* config file any 'bridge.<br_name>.port' entries that are no
nl_sock_wait(brc_sock, POLLIN);
unixctl_server_wait(unixctl);
+ netdev_wait();
poll_block();
}
char *short_options = long_options_to_short_options(long_options);
int error;
- appctl_command = xasprintf("%s/ovs-appctl -t "
- "%s/ovs-vswitchd.`cat %s/ovs-vswitchd.pid`.ctl "
- "-e '%%s'",
- ovs_bindir, ovs_rundir, ovs_rundir);
+ appctl_command = xasprintf("%s/ovs-appctl %%s", ovs_bindir);
for (;;) {
int c;
. ns
. IP "\\$1"
..
-.TH ovs\-vswitchd 8 "March 2009" "Open vSwitch" "Open vSwitch Manual"
+.TH ovs\-vswitchd 8 "June 2009" "Open vSwitch" "Open vSwitch Manual"
.ds PN ovs\-vswitchd
.
.SH NAME
-ovs\-vswitchd \- virtual switch daemon
+ovs\-vswitchd \- Open vSwitch daemon
.
.SH SYNOPSIS
.B ovs\-vswitchd
\fIconfig\fR
.
.SH DESCRIPTION
-A daemon that manages and controls any number of virtual switches on
-the local machine.
+A daemon that manages and controls any number of Open vSwitch switches
+on the local machine.
.PP
The mandatory \fIconfig\fR argument specifies a configuration file.
For a description of \fBovs\-vswitchd\fR configuration syntax, see
files. If a logfile was specified on the command line it will also
be opened or reopened.
.PP
-\fBovs\-vswitchd\fR virtual switches may be configured with any of the
-following features:
+\fBovs\-vswitchd\fR switches may be configured with any of the following
+features:
.
.IP \(bu
L2 switching with MAC learning.
.
.PP
Only a single instance of \fBovs\-vswitchd\fR is intended to run at a time.
-A single \fBovs\-vswitchd\fR can manage any number of virtual switches, up
+A single \fBovs\-vswitchd\fR can manage any number of switch instances, up
to the maximum number of supported Open vSwitch datapaths.
.PP
\fBovs\-vswitchd\fR does all the necessary management of Open vSwitch datapaths
its operation. (\fBovs\-dpctl\fR may still be useful for diagnostics.)
.PP
An Open vSwitch datapath kernel module must be loaded for \fBovs\-vswitchd\fR
-to be useful. Please refer to the \fBINSTALL\fR file included in the
+to be useful. Please refer to the \fBINSTALL.Linux\fR file included in the
Open vSwitch distribution for instructions on how to build and load
the Open vSwitch kernel module.
.PP
.IP
This setting is not permanent: it persists only until the carrier
status of \fIslave\fR changes.
+.IP "\fBbond/hash\fR \fImac\fR"
+Returns the hash value that would be used for \fImac\fR when assigning
+it to a bond slave.
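The \fBbond/hash\fR command parses a MAC argument and reports the 8-bit value that \fBbond_hash()\fR returns for it. A minimal sketch of that parse-then-hash flow is shown below. The hash here is a stand-in (32-bit FNV-1a folded to 8 bits) and is *not* the function used by ovs-vswitchd. The names `parse_mac` and `mac_hash8` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define ETH_ADDR_LEN 6

/* Parses a MAC written as "xx:xx:xx:xx:xx:xx" into 'mac'.
 * Returns 1 on success, 0 if 's' is not a well-formed MAC. */
static int
parse_mac(const char *s, uint8_t mac[ETH_ADDR_LEN])
{
    unsigned int b[ETH_ADDR_LEN];
    char extra;
    int i;

    assert(s != NULL);
    /* The trailing %c only matches if junk follows the sixth octet. */
    if (sscanf(s, "%x:%x:%x:%x:%x:%x%c", &b[0], &b[1], &b[2], &b[3],
               &b[4], &b[5], &extra) != ETH_ADDR_LEN) {
        return 0;
    }
    for (i = 0; i < ETH_ADDR_LEN; i++) {
        if (b[i] > 0xff) {
            return 0;
        }
        mac[i] = (uint8_t) b[i];
    }
    return 1;
}

/* Stand-in hash (32-bit FNV-1a folded to 8 bits), NOT the function that
 * ovs-vswitchd's bond_hash() actually uses. */
static uint8_t
mac_hash8(const uint8_t mac[ETH_ADDR_LEN])
{
    uint32_t h = 2166136261u;   /* FNV offset basis. */
    int i;

    for (i = 0; i < ETH_ADDR_LEN; i++) {
        h ^= mac[i];
        h *= 16777619u;         /* FNV prime. */
    }
    return (uint8_t) (h ^ (h >> 8) ^ (h >> 16) ^ (h >> 24));
}
```

Because the hash depends only on the MAC bytes, the same address always maps to the same bucket. The command is therefore useful for predicting which slave a given source MAC will use.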
.
.so lib/vlog-unixctl.man
.SH "SEE ALSO"
.BR ovs\-appctl (8),
.BR ovs\-vswitchd.conf (5),
.BR ovs\-brcompatd (8),
-\fBINSTALL\fR in the Open vSwitch distribution.
+\fBINSTALL.Linux\fR in the Open vSwitch distribution.
#include "command-line.h"
#include "compiler.h"
#include "daemon.h"
+#include "dpif.h"
#include "fault.h"
#include "leak-checker.h"
#include "mgmt.h"
+#include "netdev.h"
#include "ovs-vswitchd.h"
#include "poll-loop.h"
-#include "port.h"
#include "proc-net-compat.h"
#include "process.h"
#include "signals.h"
}
mgmt_init();
bridge_init();
- port_init();
mgmt_reconfigure();
need_reconfigure = false;
need_reconfigure = true;
}
unixctl_server_run(unixctl);
+ dp_run();
+ netdev_run();
if (need_reconfigure) {
poll_immediate_wake();
mgmt_wait();
bridge_wait();
unixctl_server_wait(unixctl);
+ dp_wait();
+ netdev_wait();
poll_block();
}
cfg_read();
bridge_reconfigure();
mgmt_reconfigure();
- port_reconfigure();
for (i = 0; i < n_conns; i++) {
unixctl_command_reply(conns[i], 202, NULL);
static void
usage(void)
{
- printf("%s: virtual switch daemon\n"
+ printf("%s: Open vSwitch daemon\n"
"usage: %s [OPTIONS] CONFIG\n"
"CONFIG is a configuration file in ovs-vswitchd.conf(5) format.\n",
program_name, program_name);
. RE
. PP
..
-.TH ovs\-vswitchd.conf 5 "April 2009" "Open vSwitch" "Open vSwitch Manual"
+.TH ovs\-vswitchd.conf 5 "June 2009" "Open vSwitch" "Open vSwitch Manual"
.
.SH NAME
ovs\-vswitchd.conf \- configuration file for \fBovs\-vswitchd\fR
.SS "Bridge Configuration"
A bridge (switch) with a given \fIname\fR is configured by specifying
the names of its network devices as values for key
-\fBbridge.\fIname\fB.port\fR. (The specified \fIname\fR may not begin
-with \fBdp\fR or \fBnl:\fR followed by a digit.)
+\fBbridge.\fIname\fB.port\fR.
.PP
To designate network device \fInetdev\fR as an internal port, add
\fBiface.\fInetdev\fB.internal=true\fR to the configuration file,
\fBnetflow.\fIbridge\fB.engine-id\fR, respectively. Each takes a value
between 0 and 255, inclusive.
-Many NetFlow collectors do not expect multiple virtual switches to be
+Many NetFlow collectors do not expect multiple switches to be
sending messages from the same host, and they do not store the engine
information which could be used to disambiguate the traffic. To prevent
flows from multiple switches appearing as if they came on the interface,
.TP
\fBdiscover\fR
Use controller discovery to find the local OpenFlow controller.
-Refer to \fBsecchan\fR(8) for information on how to configure a DHCP
+Refer to \fBovs\-openflowd\fR(8) for information on how to configure a DHCP
server to support controller discovery. The following additional
options control the discovery process:
.
.IP
The default regular expression is \fBssl:.*\fR, meaning that only SSL
controller connections will be accepted, when SSL is configured (see
-\fBSSL Configuration\fR), and \fB.*\fR otherwise, meaning that any
-controller will be accepted.
+\fBSSL Configuration\fR), and \fBtcp:.*\fR otherwise, meaning that only
+TCP controller connections will be accepted.
.IP
The regular expression is implicitly anchored at the beginning of the
controller location string, as if it begins with \fB^\fR.
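The implicit anchoring described above can be sketched as follows. The sketch assumes POSIX extended regular expressions, because the text does not name a regex flavor, and `controller_accepted` is a hypothetical name. The expression is wrapped as `^(...)` rather than just prefixed with `^`, so that every alternative in something like `ssl:.*|tcp:.*` is anchored, not only the first one.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns true if 'location' matches 'regex' anchored at the start of the
 * string, as if 'regex' began with '^'.  An invalid regex accepts nothing. */
static bool
controller_accepted(const char *regex, const char *location)
{
    size_t len;
    char *anchored;
    regex_t re;
    bool match;

    assert(regex != NULL && location != NULL);

    /* "^(" + regex + ")" + NUL. */
    len = strlen(regex) + 4;
    anchored = malloc(len);
    if (!anchored) {
        return false;
    }
    snprintf(anchored, len, "^(%s)", regex);

    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0) {
        free(anchored);
        return false;
    }
    match = regexec(&re, location, 0, NULL, 0) == 0;
    regfree(&re);
    free(anchored);
    return match;
}
```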
By default, or if this is set to \fBtrue\fR, \fBovs\-vswitchd\fR connects
to the controller in-band. If this is set to \fBfalse\fR,
\fBovs\-vswitchd\fR connects to the controller out-of-band. Refer to
-\fBsecchan\fR(8) for a description of in-band and out-of-band control.
+\fBovs\-openflowd\fR(8) for a description of in-band and out-of-band control.
.IP "\fBbridge.\fIname\fB.controller.ip=\fIip\fR"
If specified, the IP address to configure on the bridge's local port.
.IP "\fBbridge.\fIname\fB.controller.netmask=\fInetmask\fR"
The minimum value of \fIsecs\fR is 5 seconds. The default is taken
from \fBmgmt.inactivity-probe\fR (see above).
.IP
-When the virtual switch is connected to the controller, it waits for a
+When the switch is connected to the controller, it waits for a
message to be received from the controller for \fIsecs\fR seconds
before it sends a inactivity probe to the controller. After sending
the inactivity probe, if no response is received for an additional
-\fIsecs\fR seconds, the secure channel assumes that the connection has
+\fIsecs\fR seconds, \fBovs\-vswitchd\fR assumes that the connection has
been broken and attempts to reconnect.
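The probe timing above amounts to a small state machine. In this sketch, the connection stays idle for \fIsecs\fR, then a probe is sent, and if nothing arrives within another \fIsecs\fR the connection is treated as broken. This is a hypothetical model (all names are illustrative and times are whole seconds), not ovs-vswitchd's rconn code.

```c
#include <assert.h>

enum probe_action { PROBE_NONE, PROBE_SEND, PROBE_DISCONNECT };

/* Hypothetical per-connection probe state; times are in seconds. */
struct probe_state {
    int interval;            /* The configured 'secs' value. */
    long long int last_rx;   /* When a message was last received. */
    long long int probe_at;  /* When the probe was sent, or -1 if none. */
};

static void
probe_init(struct probe_state *ps, int interval, long long int now)
{
    assert(interval >= 5);   /* The documented minimum. */
    ps->interval = interval;
    ps->last_rx = now;
    ps->probe_at = -1;
}

/* Call whenever any message arrives from the controller. */
static void
probe_received(struct probe_state *ps, long long int now)
{
    ps->last_rx = now;
    ps->probe_at = -1;
}

/* Decides what to do at time 'now'. */
static enum probe_action
probe_run(struct probe_state *ps, long long int now)
{
    if (ps->probe_at < 0) {
        /* Idle for 'interval' seconds: send an inactivity probe. */
        if (now - ps->last_rx >= ps->interval) {
            ps->probe_at = now;
            return PROBE_SEND;
        }
    } else if (now - ps->probe_at >= ps->interval) {
        /* No reply within another 'interval' seconds: reconnect. */
        return PROBE_DISCONNECT;
    }
    return PROBE_NONE;
}
```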
.IP
Changing the inactivity probe interval also changes the interval
.IP "\fBbridge.\fIname\fB.controller.fail-mode=\fBstandalone\fR|\fBsecure\fR"
.IQ "\fBmgmt.fail-mode=standalone\fR|\fBsecure\fR"
When a controller is configured, it is, ordinarily, responsible for
-setting up all flows on the virtual switch. Thus, if the connection to
+setting up all flows on the switch. Thus, if the connection to
the controller fails, no new network connections can be set up. If
the connection to the controller stays down long enough, no packets
can pass through the switch at all.
attempt until it reaches the maximum. The default maximum backoff
time is taken from \fBmgmt.max-backoff\fR.
.ST "Controller Rate-Limiting"
-These settings configure how the virtual switch applies a ``token
+These settings configure how the switch applies a ``token
bucket'' to limit the rate at which packets in unknown flows are
forwarded to the OpenFlow controller for flow-setup processing. This
feature prevents a single bridge from overwhelming a controller.
+.PP
+In addition, when a high rate triggers rate-limiting,
+\fBovs\-vswitchd\fR queues controller packets for each port and
+transmits them to the controller at the configured rate. The number
+of queued packets is limited by a ``burst size'' parameter. The
+packet queue is shared fairly among the ports on a bridge.
+.PP
+\fBovs\-vswitchd\fR maintains two such packet rate-limiters per
+bridge. One of these applies to packets sent up to the controller
+because they do not correspond to any flow. The other applies to
+packets sent up to the controller by request through flow actions.
+When both rate-limiters are filled with packets, the actual rate at which
+packets are sent to the controller can be up to twice the specified rate.
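A single ``token bucket'' limiter of the kind described above can be sketched as follows. This is a simplified, hypothetical model: the `token_bucket_*` names are illustrative and are not ovs-vswitchd's internal API. It assumes a rate in packets per second and a burst parameter that caps how many unused tokens can accumulate. It does not model the per-port packet queues or the two limiters per bridge.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical token bucket; times are in milliseconds. */
struct token_bucket {
    unsigned int rate;       /* Tokens (packets) added per second. */
    unsigned int burst;      /* Maximum tokens that can accumulate. */
    uint64_t tokens_milli;   /* Current tokens, scaled by 1000. */
    long long int last_ms;   /* Time of the last refill. */
};

static void
token_bucket_init(struct token_bucket *tb, unsigned int rate,
                  unsigned int burst, long long int now_ms)
{
    assert(rate > 0 && burst > 0);
    tb->rate = rate;
    tb->burst = burst;
    tb->tokens_milli = (uint64_t) burst * 1000;   /* Start full. */
    tb->last_ms = now_ms;
}

/* Returns true if one packet may go to the controller at 'now_ms',
 * consuming a token; false if it must wait (be queued) instead. */
static bool
token_bucket_withdraw(struct token_bucket *tb, long long int now_ms)
{
    uint64_t cap = (uint64_t) tb->burst * 1000;

    if (now_ms > tb->last_ms) {
        /* 'rate' tokens per second == 'rate' milli-tokens per ms. */
        uint64_t added = (uint64_t) (now_ms - tb->last_ms) * tb->rate;
        tb->tokens_milli = tb->tokens_milli + added > cap
                           ? cap : tb->tokens_milli + added;
        tb->last_ms = now_ms;
    }
    if (tb->tokens_milli >= 1000) {
        tb->tokens_milli -= 1000;
        return true;
    }
    return false;
}
```

With a rate of 10 and a burst of 5, five packets pass immediately, and then roughly one packet passes every 100 ms. Running two such buckets side by side, one for table misses and one for explicit controller actions, is why the combined rate can reach twice the configured value.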
.IP "\fBbridge.\fIname\fB.controller.rate-limit=\fIrate\fR"
.IQ "\fBmgmt.rate-limit=\fIrate\fR"
Limits the maximum rate at which packets will be forwarded to the
for controller connectivity, the following settings are required:
.TP
\fBssl.private-key=\fIprivkey.pem\fR
-Specifies a PEM file containing the private key used as the virtual
+Specifies a PEM file containing the private key used as the
switch's identity for SSL connections to the controller.
.TP
\fBssl.certificate=\fIcert.pem\fR
Specifies a PEM file containing a certificate, signed by the
certificate authority (CA) used by the controller and manager, that
-certifies the virtual switch's private key, identifying a trustworthy
+certifies the switch's private key, identifying a trustworthy
switch.
.TP
\fBssl.ca-cert=\fIcacert.pem\fR
Specifies a PEM file containing the CA certificate used to verify that
-the virtual switch is connected to a trustworthy controller.
+the switch is connected to a trustworthy controller.
.PP
These files are read only once, at \fBovs\-vswitchd\fR startup time. If
their contents change, \fBovs\-vswitchd\fR must be killed and restarted.
.PP
-These SSL settings apply to all SSL connections made by the virtual
-switch.
+These SSL settings apply to all SSL connections made by the switch.
.ST "CA Certificate Bootstrap"
Ordinarily, all of the files named in the SSL configuration must exist
when \fBovs\-vswitchd\fR starts. However, if \fBssl.bootstrap-ca-cert\fR
Listens for SSL connections on \fIport\fR (default: 6633). SSL must
be configured when this form is used (see \fBSSL Configuration\fR,
above).
-.IP "\fBptcp:\fR[\fIport\fR]"
+.IP "\fBptcp:\fR[\fIport\fR][\fB:\fIip\fR]"
Listens for TCP connections on \fIport\fR (default: 6633).
+By default, \fBovs\-vswitchd\fR listens for connections to any local
+IP address, but \fIip\fR may be specified to limit connections to the
+specified local \fIip\fR.
.RE
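The \fBptcp:\fR[\fIport\fR][\fB:\fIip\fR] form can be parsed roughly as follows. This is a hypothetical sketch: `parse_ptcp` is not ovs-vswitchd's parser, and it handles only IPv4 dotted-quad addresses. An omitted port falls back to 6633, and an omitted address means "any local address" (`INADDR_ANY`).

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

#define DEFAULT_OFP_PORT 6633

/* Parses "ptcp:[port][:ip]".  On success stores the listening port and the
 * local address to bind (INADDR_ANY if 'ip' is omitted) and returns true. */
static bool
parse_ptcp(const char *name, unsigned short *port, struct in_addr *ip)
{
    const char *p, *rest;

    assert(name != NULL && port != NULL && ip != NULL);
    if (strncmp(name, "ptcp:", 5) != 0) {
        return false;
    }
    p = name + 5;

    /* Port: empty means the default. */
    if (*p == '\0' || *p == ':') {
        *port = DEFAULT_OFP_PORT;
        rest = p;
    } else {
        char *end;
        long int n = strtol(p, &end, 10);
        if (end == p || n <= 0 || n > 65535) {
            return false;
        }
        *port = (unsigned short) n;
        rest = end;
    }

    /* Optional local IP restricting which address accepts connections. */
    if (*rest == '\0') {
        ip->s_addr = htonl(INADDR_ANY);
        return true;
    } else if (*rest == ':') {
        return inet_pton(AF_INET, rest + 1, ip) == 1;
    }
    return false;
}
```

For example, `ptcp:6634:127.0.0.1` would accept connections only on the loopback address. Plain `ptcp:` listens on port 6633 on every local address.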
To entirely disable listening for management connections, set
\fBbridge.\fIname\fB.openflow.listeners\fR to the single value
+++ /dev/null
-/* Copyright (c) 2009 Nicira Networks
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at:
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-#include <config.h>
-
-#include "bridge.h"
-#include "cfg.h"
-#include "netdev.h"
-#include "ovs-vswitchd.h"
-#include "port.h"
-#include "svec.h"
-
-#define THIS_MODULE VLM_port
-#include "vlog.h"
-
-static int
-set_ingress_policing(const char *port_name)
-{
- int kbits_rate = cfg_get_int(0, "port.%s.ingress.policing-rate",
- port_name);
- int kbits_burst = cfg_get_int(0, "port.%s.ingress.policing-burst",
- port_name);
-
- return netdev_nodev_set_policing(port_name, kbits_rate, kbits_burst);
-}
-
-void
-port_init(void)
-{
- port_reconfigure();
-}
-
-void
-port_reconfigure(void)
-{
- struct svec ports;
- int i;
-
- svec_init(&ports);
- bridge_get_ifaces(&ports);
- for (i=0; i<ports.n; i++) {
- set_ingress_policing(ports.names[i]);
- }
-}
+++ /dev/null
-/* Copyright (c) 2009 Nicira Networks
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at:
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-#ifndef VSWITCHD_PORT_H
-#define VSWITCHD_PORT_H 1
-
-void port_init(void);
-void port_reconfigure(void);
-
-#endif /* port.h */
needed by the controller. This is called by the "vif" script,
which is run when virtual interfaces are added and removed.
- root_vswitch_scripts_refresh-xs-network-uuids
+ usr_share_vswitch_scripts_refresh-xs-network-uuids
Script to refresh bridge.<bridge>.xs-network-uuids keys, which
can get out-of-sync following a pool join. Running this script
xenserver/etc_xapi.d_plugins_vswitch-cfg-update \
xenserver/etc_xensource_scripts_vif \
xenserver/opt_xensource_libexec_interface-reconfigure \
- xenserver/root_vswitch_scripts_sysconfig.template \
- xenserver/root_vswitch_scripts_dump-vif-details \
- xenserver/root_vswitch_scripts_refresh-xs-network-uuids \
xenserver/usr_lib_xsconsole_plugins-base_XSFeatureVSwitch.py \
xenserver/usr_sbin_brctl \
xenserver/usr_sbin_xen-bugtool \
+ xenserver/usr_share_vswitch_scripts_sysconfig.template \
+ xenserver/usr_share_vswitch_scripts_dump-vif-details \
+ xenserver/usr_share_vswitch_scripts_refresh-xs-network-uuids \
xenserver/vswitch-xen.spec
test -e /etc/sysconfig/vswitch && . /etc/sysconfig/vswitch
# General config variables in /etc/sysconfig/vswitch
-VSWITCH_BASE="${VSWITCH_BASE:-/root/vswitch}"
-ENABLE_BRCOMPAT="${ENABLE_BRCOMPAT:-y}"
-ENABLE_FAKE_PROC_NET="${ENABLE_FAKE_PROC_NET:-y}"
-FORCE_COREFILES="${FORCE_COREFILES:-y}"
+: ${ENABLE_BRCOMPAT:=y}
+: ${ENABLE_FAKE_PROC_NET:=y}
+: ${FORCE_COREFILES:=y}
# Config variables specific to ovs-vswitchd
-VSWITCHD_CONF="${VSWITCHD_CONF:-/etc/ovs-vswitchd.conf}"
-VSWITCHD_PIDFILE="${VSWITCHD_PIDFILE:-/var/run/ovs-vswitchd.pid}"
-VSWITCHD_RUN_DIR="${VSWITCHD_RUN_DIR:-/var/xen/vswitch}"
-VSWITCHD_PRIORITY="${VSWITCHD_PRIORITY:--10}"
-VSWITCHD_LOGFILE="${VSWITCHD_LOGFILE:-/var/log/ovs-vswitchd.log}"
-VSWITCHD_FILE_LOGLEVEL="${VSWITCHD_FILE_LOGLEVEL:-INFO}"
-VSWITCHD_SYSLOG_LOGLEVEL="${VSWITCHD_SYSLOG_LOGLEVEL:-ERR}"
-VSWITCHD_MEMLEAK_LOGFILE="${VSWITCHD_MEMLEAK_LOGFILE:-}"
-VSWITCHD_STRACE_LOG="${VSWITCHD_STRACE_LOG:-}"
-VSWITCHD_STRACE_OPT="${VSWITCHD_STRACE_OPT:-}"
-VSWITCHD_VALGRIND_LOG="${VSWITCHD_VALGRIND_LOG:-}"
-VSWITCHD_VALGRIND_OPT="${VSWITCHD_VALGRIND_OPT:-}"
+: ${VSWITCHD_CONF:=/etc/ovs-vswitchd.conf}
+: ${VSWITCHD_PIDFILE:=/var/run/ovs-vswitchd.pid}
+: ${VSWITCHD_RUN_DIR:=/var/xen/vswitch}
+: ${VSWITCHD_PRIORITY:=-10}
+: ${VSWITCHD_LOGFILE:=/var/log/ovs-vswitchd.log}
+: ${VSWITCHD_FILE_LOGLEVEL:=INFO}
+: ${VSWITCHD_SYSLOG_LOGLEVEL:=ERR}
+: ${VSWITCHD_MEMLEAK_LOGFILE:=}
+: ${VSWITCHD_STRACE_LOG:=}
+: ${VSWITCHD_STRACE_OPT:=}
+: ${VSWITCHD_VALGRIND_LOG:=}
+: ${VSWITCHD_VALGRIND_OPT:=}
# Config variables specific to ovs-brcompatd
-BRCOMPATD_PIDFILE="${BRCOMPATD_PIDFILE:-/var/run/ovs-brcompatd.pid}"
-BRCOMPATD_RUN_DIR="${BRCOMPATD_RUN_DIR:-/var/xen/vswitch}"
-BRCOMPATD_PRIORITY="${BRCOMPATD_PRIORITY:--10}"
-BRCOMPATD_LOGFILE="${BRCOMPATD_LOGFILE:-/var/log/ovs-brcompatd.log}"
-BRCOMPATD_FILE_LOGLEVEL="${BRCOMPATD_FILE_LOGLEVEL:-INFO}"
-BRCOMPATD_SYSLOG_LOGLEVEL="${BRCOMPATD_SYSLOG_LOGLEVEL:-ERR}"
-BRCOMPATD_MEMLEAK_LOGFILE="${BRCOMPATD_MEMLEAK_LOGFILE:-}"
-BRCOMPATD_STRACE_LOG="${BRCOMPATD_STRACE_LOG:-}"
-BRCOMPATD_STRACE_OPT="${BRCOMPATD_STRACE_OPT:-}"
-BRCOMPATD_VALGRIND_LOG="${BRCOMPATD_VALGRIND_LOG:-}"
-BRCOMPATD_VALGRIND_OPT="${BRCOMPATD_VALGRIND_OPT:-}"
-
-
-
+: ${BRCOMPATD_PIDFILE:=/var/run/ovs-brcompatd.pid}
+: ${BRCOMPATD_RUN_DIR:=/var/xen/vswitch}
+: ${BRCOMPATD_PRIORITY:=-10}
+: ${BRCOMPATD_LOGFILE:=/var/log/ovs-brcompatd.log}
+: ${BRCOMPATD_FILE_LOGLEVEL:=INFO}
+: ${BRCOMPATD_SYSLOG_LOGLEVEL:=ERR}
+: ${BRCOMPATD_MEMLEAK_LOGFILE:=}
+: ${BRCOMPATD_STRACE_LOG:=}
+: ${BRCOMPATD_STRACE_OPT:=}
+: ${BRCOMPATD_VALGRIND_LOG:=}
+: ${BRCOMPATD_VALGRIND_OPT:=}
# Full paths to executables & modules
-vswitchd="$VSWITCH_BASE/sbin/ovs-vswitchd"
-brcompatd="$VSWITCH_BASE/sbin/ovs-brcompatd"
-dpctl="$VSWITCH_BASE/bin/ovs-dpctl"
-appctl="$VSWITCH_BASE/bin/ovs-appctl"
-ofctl="$VSWITCH_BASE/bin/ovs-ofctl"
+vswitchd="/usr/sbin/ovs-vswitchd"
+brcompatd="/usr/sbin/ovs-brcompatd"
+dpctl="/usr/bin/ovs-dpctl"
+appctl="/usr/bin/ovs-appctl"
+ofctl="/usr/bin/ovs-ofctl"
if [ "$ENABLE_FAKE_PROC_NET" = "y" ]; then
function insert_modules_if_required {
if ! lsmod | grep -q "openvswitch_mod"; then
action "Inserting llc module" modprobe llc
- action "Inserting openvswitch module" insmod $VSWITCH_BASE/kernel_modules/openvswitch_mod.ko
+ action "Inserting openvswitch module" modprobe openvswitch_mod
fi
if [ -n "$BRCOMPATD_PIDFILE" ] && ! lsmod | grep -q "brcompat_mod"; then
- action "Inserting brcompat module" insmod $VSWITCH_BASE/kernel_modules/brcompat_mod.ko
+ action "Inserting brcompat module" modprobe brcompat_mod
fi
}
function reload_vswitchd {
if [ -f "$VSWITCHD_PIDFILE" ]; then
- "$appctl" \
- --target=ovs-vswitchd.$(cat "$VSWITCHD_PIDFILE").ctl \
- --execute=vswitchd/reload
+ "$appctl" --target=/var/run/ovs-vswitchd.$(cat "$VSWITCHD_PIDFILE").ctl vswitchd/reload
fi
}
function reload_brcompatd {
if [ -f "$BRCOMPATD_PIDFILE" ]; then
- "$appctl" \
- --target=ovs-brcompatd.$(cat "$BRCOMPATD_PIDFILE").ctl --reopen
+ "$appctl" --target=/var/run/ovs-brcompatd.$(cat "$BRCOMPATD_PIDFILE").ctl vlog/reopen
fi
}
valgrind_opt="valgrind --log-file=$BRCOMPATD_VALGRIND_LOG $BRCOMPATD_VALGRIND_OPT"
daemonize="n"
fi
- appctl_cmd="$appctl -t /var/run/ovs-vswitchd.\`cat $VSWITCHD_PIDFILE\`.ctl -e '%s'"
+ appctl_cmd="$appctl --target=/var/run/ovs-vswitchd.\`cat $VSWITCHD_PIDFILE\`.ctl %s"
if [ "$daemonize" != "y" ]; then
# Start in background and force a "success" message
action "Starting ovs-brcompatd ($strace_opt$valgrind_opt)" true
start_vswitchd
start_brcompatd
reload_vswitchd # ensures ovs-vswitchd has fully read config file.
+ touch /var/lock/subsys/vswitch
}
function stop {
stop_brcompatd
stop_vswitchd
+ rm -f /var/lock/subsys/vswitch
}
function restart {
status -p ovs-brcompatd.pid ovs-brcompatd
;;
version)
- "$VSWITCH_BASE"/sbin/ovs-vswitchd -V
- "$VSWITCH_BASE"/sbin/ovs-brcompatd -V
+ /usr/sbin/ovs-vswitchd -V
+ /usr/sbin/ovs-brcompatd -V
;;
help)
printf "vswitch [start|stop|restart|reload|unload|status|version]\n"
. /etc/init.d/functions
test -e /etc/sysconfig/vswitch && . /etc/sysconfig/vswitch
-VSWITCH_BASE="${VSWITCH_BASE:-/root/vswitch}"
-VSWITCHD_CONF="${VSWITCHD_CONF:-/etc/ovs-vswitchd.conf}"
-VSWITCHD_PIDFILE="${VSWITCHD_PIDFILE:-/var/run/ovs-vswitchd.pid}"
-VSWITCHD_PRIORITY="${VSWITCHD_PRIORITY:--5}"
-VSWITCHD_LOGFILE="${VSWITCHD_LOGFILE:-/var/log/ovs-vswitchd.log}"
-VSWITCHD_FILE_LOGLEVEL="${VSWITCHD_FILE_LOGLEVEL:-}"
-VSWITCHD_SYSLOG_LOGLEVEL="${VSWITCHD_SYSLOG_LOGLEVEL:-WARN}"
-VSWITCHD_MEMLEAK_LOGFILE="${VSWITCHD_MEMLEAK_LOGFILE:-}"
-BRCOMPATD_PIDFILE="${BRCOMPATD_PIDFILE:-/var/run/ovs-brcompatd.pid}"
-BRCOMPATD_PRIORITY="${BRCOMPATD_PRIORITY:--5}"
-BRCOMPATD_LOGFILE="${BRCOMPATD_LOGFILE:-/var/log/ovs-brcompatd.log}"
-BRCOMPATD_FILE_LOGLEVEL="${BRCOMPATD_FILE_LOGLEVEL:-}"
-BRCOMPATD_SYSLOG_LOGLEVEL="${BRCOMPATD_SYSLOG_LOGLEVEL:-WARN}"
-BRCOMPATD_MEMLEAK_LOGFILE="${BRCOMPATD_MEMLEAK_LOGFILE:-}"
+: ${VSWITCHD_CONF:=/etc/ovs-vswitchd.conf}
+: ${VSWITCHD_PIDFILE:=/var/run/ovs-vswitchd.pid}
+: ${VSWITCHD_PRIORITY:=-5}
+: ${VSWITCHD_LOGFILE:=/var/log/ovs-vswitchd.log}
+: ${VSWITCHD_FILE_LOGLEVEL:=}
+: ${VSWITCHD_SYSLOG_LOGLEVEL:=WARN}
+: ${VSWITCHD_MEMLEAK_LOGFILE:=}
+: ${BRCOMPATD_PIDFILE:=/var/run/ovs-brcompatd.pid}
+: ${BRCOMPATD_PRIORITY:=-5}
+: ${BRCOMPATD_LOGFILE:=/var/log/ovs-brcompatd.log}
+: ${BRCOMPATD_FILE_LOGLEVEL:=}
+: ${BRCOMPATD_SYSLOG_LOGLEVEL:=WARN}
+: ${BRCOMPATD_MEMLEAK_LOGFILE:=}
function do_host_call {
xe host-call-plugin host-uuid="$INSTALLATION_UUID" plugin="vswitch-cfg-update" fn="update" >/dev/null
# notice and this notice are preserved. This file is offered as-is,
# without warranty of any kind.
-PATH=/root/vswitch/bin:$PATH
-export PATH
-MANPATH=/root/vswitch/share/man:$MANPATH
-export MANPATH
-
alias vswitch='service vswitch'
function watchconf {
# TBD: - error handling needs to be improved. Currently this can leave
# TBD: the system in a bad state if anything goes wrong.
-import logging
-log = logging.getLogger("vswitch-cfg-update")
-logging.basicConfig(filename="/var/log/vswitch-cfg-update.log", level=logging.DEBUG)
-
import XenAPIPlugin
import XenAPI
import os
import subprocess
-cfg_mod="/root/vswitch/bin/ovs-cfg-mod"
+cfg_mod="/usr/bin/ovs-cfg-mod"
vswitchd_cfg_filename="/etc/ovs-vswitchd.conf"
cacert_filename="/etc/ovs-vswitchd.cacert"
pools = session.xenapi.pool.get_all()
# We assume there is only ever one pool...
if len(pools) == 0:
- log.error("No pool for host.")
raise XenAPIPlugin.Failure("NO_POOL_FOR_HOST", [])
if len(pools) > 1:
- log.error("More than one pool for host.")
raise XenAPIPlugin.Failure("MORE_THAN_ONE_POOL_FOR_HOST", [])
pool = session.xenapi.pool.get_record(pools[0])
try:
controller = ""
currentController = vswitchCurrentController()
if controller == "" and currentController != "":
- log.debug("Removing controller configuration.")
delete_cacert()
removeControllerCfg()
return "Successfully removed controller config"
elif controller != currentController:
- if len(controller) == 0:
- log.debug("Setting controller to: %s" % (controller))
- else:
- log.debug("Changing controller from %s to %s" % (currentController, controller))
delete_cacert()
setControllerCfg(controller)
return "Successfully set controller to " + controller
else:
- log.debug("No change to controller configuration required.")
- return "No change to configuration"
+ return "No change to configuration"
def vswitchCurrentController():
controller = vswitchCfgQuery("mgmt.controller")
if controller == "":
return controller
if len(controller) < 4 or controller[0:4] != "ssl:":
- log.warning("Controller does not specify ssl connection type, returning entire string.")
return controller
else:
return controller[4:]
"--config-file=" + vswitchd_cfg_filename] + action_args
exitcode = subprocess.call(cmd)
if exitcode != 0:
- log.error("ovs-cfg-mod failed with exit code "
- + str(exitcode) + " for " + repr(action_args))
raise XenAPIPlugin.Failure("VSWITCH_CONFIG_MOD_FAILURE",
[ str(exitcode) , str(action_args) ])
vswitchReload()
def vswitchReload():
exitcode = subprocess.call(["/sbin/service", "vswitch", "reload"])
if exitcode != 0:
- log.error("vswitch reload failed with exit code " + str(exitcode))
raise XenAPIPlugin.Failure("VSWITCH_CFG_RELOAD_FAILURE", [ str(exitcode) ])
# Keep other-config/ keys in sync with device.ml:vif_udev_keys
-cfg_mod="/root/vswitch/bin/ovs-cfg-mod"
-dump_vif_details="/root/vswitch/scripts/dump-vif-details"
+cfg_mod="/usr/bin/ovs-cfg-mod"
+vsctl="/usr/bin/ovs-vsctl"
+dump_vif_details="/usr/share/vswitch/scripts/dump-vif-details"
service="/sbin/service"
TYPE=`echo ${XENBUS_PATH} | cut -f 2 -d '/'`
fi
logger -t scripts-vif "Adding ${vif} to ${bridge} with address ${address}"
- vid=
- if [ -e "/var/lib/openvswitch/br-$bridge" ]; then
- . "/var/lib/openvswitch/br-$bridge"
- if [ -n "$VLAN_SLAVE" -a -n "$VLAN_VID" ]; then
- bridge=$VLAN_SLAVE
- vid="--add=vlan.$vif.tag=$VLAN_VID"
- fi
+ local VLAN_ID=$($vsctl br-to-vlan $bridge)
+ local vid=
+ if [ "$VLAN_ID" -ne 0 ] ; then
+ bridge=$($vsctl br-to-parent $bridge)
+ vid="--add=vlan.${vif}.tag=${VLAN_ID}"
fi
${IP} link set "${vif}" down || logger -t scripts-vif "Failed to ip link set ${vif} down"
for c in commands:
log(" %s" % c)
- rc = run_command(['/root/vswitch/bin/ovs-cfg-mod', '-vANY:console:emer',
+ rc = run_command(['/usr/bin/ovs-cfg-mod', '-vANY:console:emer',
'-F', '/etc/ovs-vswitchd.conf']
+ [c for c in commands if c[0] != '#'] + ['-c'])
if not rc:
cfgmod_argv += ['--add=vlan.%s.tag=%s' % (ipdev, pifrec['VLAN'])]
cfgmod_argv += ['--add=iface.%s.internal=true' % (ipdev)]
cfgmod_argv += ['--add=iface.%s.fake-bridge=true' % (ipdev)]
- if not os.path.exists(vswitch_state_dir):
- os.mkdir(vswitch_state_dir)
- br = ConfigurationFile("br-%s" % ipdev, vswitch_state_dir)
- br.write("VLAN_SLAVE=%s\n" % bridge)
- br.write("VLAN_VID=%s\n" % pifrec['VLAN'])
- br.close()
- f.attach_child(br)
- else:
- br = ConfigurationFile("br-%s" % ipdev, vswitch_state_dir)
- br.unlink()
- f.attach_child(br)
-
+
# Apply updated configuration.
try:
f.apply()
+++ /dev/null
-#!/usr/bin/python
-#
-# Script to retrieve extended information about VIFs that are
-# needed by the controller. This is called by the "vif" script,
-# which is run when virtual interfaces are added and removed.
-
-# Copyright (C) 2009 Nicira Networks, Inc.
-#
-# Copying and distribution of this file, with or without modification,
-# are permitted in any medium without royalty provided the copyright
-# notice and this notice are preserved. This file is offered as-is,
-# without warranty of any kind.
-
-import sys
-import XenAPI
-import xen.lowlevel.xs
-
-# Query XenStore for the opaque reference of this vif
-def get_vif_ref(domid, devid):
- xenstore = xen.lowlevel.xs.xs()
- t = xenstore.transaction_start()
- vif_ref = xenstore.read(t, '/xapi/%s/private/vif/%s/ref' % (domid, devid))
- xenstore.transaction_end(t)
- return vif_ref
-
-# Query XAPI for the information we need using the vif's opaque reference
-def dump_vif_info(domid, devid, vif_ref):
- vif_info = []
- session = XenAPI.xapi_local()
- session.xenapi.login_with_password("root", "")
- try:
- vif_rec = session.xenapi.VIF.get_record(vif_ref)
- net_rec = session.xenapi.network.get_record(vif_rec["network"])
- vm_uuid = session.xenapi.VM.get_uuid(vif_rec["VM"])
-
- # Data to allow vNetManager to associate VIFs with xapi data
- add_port = '--add=port.vif%s.%s' % (domid, devid)
- vif_info.append('%s.net-uuid=%s' % (add_port, net_rec["uuid"]))
- vif_info.append('%s.vif-mac=%s' % (add_port, vif_rec["MAC"]))
- vif_info.append('%s.vif-uuid=%s' % (add_port, vif_rec["uuid"]))
- vif_info.append('%s.vm-uuid=%s' % (add_port, vm_uuid))
-
- # vNetManager needs to know the network UUID(s) associated with
- # each datapath. Normally interface-reconfigure adds them, but
- # interface-reconfigure never gets called for internal networks
- # (xapi does the addbr ioctl internally), so we have to do it
- # here instead for internal networks. This is only acceptable
- # because xapi is lazy about creating internal networks: it
- # only creates one just before it adds the first vif to it.
- # There may still be a brief delay between the initial
- # ovs-vswitchd connection to vNetManager and setting this
- # configuration variable, but vNetManager can tolerate that.
- if not net_rec['PIFs']:
- key = 'bridge.%s.xs-network-uuids' % net_rec['bridge']
- value = net_rec['uuid']
- vif_info.append('--del-match=%s=*' % key)
- vif_info.append('--add=%s=%s' % (key, value))
- finally:
- session.xenapi.session.logout()
- print ' '.join(vif_info)
-
-if __name__ == '__main__':
- if len(sys.argv) != 3:
- sys.stderr.write("ERROR: %s <domid> <devid>\n" % sys.argv[0])
- sys.exit(1)
-
- domid = sys.argv[1]
- devid = sys.argv[2]
-
- vif_ref = get_vif_ref(domid, devid)
- if not vif_ref:
- sys.stderr.write("ERROR: Could not find interface vif%s.%s\n"
- % (domid, devid))
- sys.exit(1)
-
- dump_vif_info(domid, devid, vif_ref)
- sys.exit(0)
+++ /dev/null
-#! /bin/sh
-
-. /etc/xensource-inventory
-
-for pif in $(xe pif-list --minimal host-uuid=${INSTALLATION_UUID} currently-attached=true VLAN=-1 | sed 's/,/ /g'); do
- printf "Refreshing PIF %s... " $pif
- if /opt/xensource/libexec/interface-reconfigure --pif-uuid=$pif up; then
- printf "done\n"
- else
- printf "error!\n"
- fi
-done
+++ /dev/null
-### Configuration options for vswitch
-
-# Copyright (C) 2009 Nicira Networks, Inc.
-#
-# Copying and distribution of this file, with or without modification,
-# are permitted in any medium without royalty provided the copyright
-# notice and this notice are preserved. This file is offered as-is,
-# without warranty of any kind.
-
-# VSWITCH_BASE: Root directory where vswitch binaries are installed
-# VSWITCH_BASE=/root/vswitch/openvswitch/build
-
-# ENABLE_BRCOMPAT: If 'y' than emulate linux bridging interfaces
-# using the brcompat kernel module and ovs-brcompatd daemon
-# ENABLE_BRCOMPAT=y
-
-# ENABLE_FAKE_PROC_NET: If 'y' then emulate linux bonding and vlan
-# files in /proc as if the bonding and vlan demultiplexing done in
-# ovs-vswitchd were being implemented using existing Linux mechanisms.
-# This is useful in some cases when replacing existing solutions.
-# ENABLE_FAKE_PROC_NET=y
-
-# FORCE_COREFILES: If 'y' then core files will be enabled.
-# FORCE_COREFILES=y
-
-# COREFILE_PATTERN: Pattern used to determine path and filename for
-# core files when FORCE_COREFILES is 'y'. This is Linux specific.
-# See the manpage for "core".
-# COREFILE_PATTERN="/var/log/%e-%t"
-
-# VSWITCHD_CONF: File in which ovs-vswitchd stores its configuration.
-# VSWITCHD_CONF=/etc/ovs-vswitchd.conf
-
-# VSWITCHD_PIDFILE: File in which to store the pid of the running
-# ovs-vswitchd.
-# VSWITCHD_PIDFILE=/var/run/ovs-vswitchd.pid
-
-# VSWITCHD_RUN_DIR: Set the directory in which ovs-vswitchd should be
-# run. This mainly affects where core files will be placed.
-# VSWITCHD_RUN_DIR=/var/xen/vswitch
-
-# VSWITCHD_PRIORITY: "nice" priority at which to run ovs-vswitchd and related
-# processes.
-# VSWITCHD_PRIORITY=-10
-
-# VSWITCHD_LOGFILE: File to send the FILE_LOGLEVEL log messages to.
-# VSWITCHD_LOGFILE=/var/log/ovs-vswitchd.log
-
-# VSWITCHD_MEMLEAK_LOGFILE: File for logging memory leak data.
-# Enabling this option will slow ovs-vswitchd significantly. Do not
-# enable it except to debug a suspected memory leak. Use the
-# ovs-parse-leaks utility included with Open vSwitch to parse the
-# log file. For best results, you also need debug symbols.
-# VSWITCHD_MEMLEAK_LOGFILE=""
-
-# VSWITCHD_FILE_LOGLEVEL: Log level at which to log into the
-# VSWITCHD_LOG file. If this is null or not set the logfile will
-# not be created and nothing will be sent to it. This is the
-# default. The available options are: EMER, WARN, INFO and DBG.
-# VSWITCHD_FILE_LOGLEVEL=""
-
-# VSWITCHD_SYSLOG_LOGLEVEL: Log level at which to log into syslog. If
-# this is null or not set the default is to log to syslog
-# emergency and warning level messages only.
-# VSWITCHD_SYSLOG_LOGLEVEL="WARN"
-
-# VSWITCHD_STRACE_LOG: File for logging strace output.
-# If this is set to a nonempty string, then ovs-vswitchd will run
-# under strace, whose output will be logged to the specified file.
-# Enabling this option will slow ovs-vswitchd significantly.
-# VSWITCHD_STRACE_LOG and VSWITCHD_VALGRIND_LOG are mutually exclusive.
-# VSWITCHD_STRACE_LOG=""
-
-# VSWITCHD_STRACE_OPT: Options to pass to strace.
-# This option's value is honored only when VSWITCHD_STRACE_LOG is
-# set to a nonempty string.
-# VSWITCHD_STRACE_OPT=""
-
-# VSWITCHD_VALGRIND_LOG: File for logging valgrind output.
-# If this is set to a nonempty string, then ovs-vswitchd will run
-# under valgrind, whose output will be logged to the specified file.
-# Enabling this option will slow ovs-vswitchd by 100X or more.
-# valgrind is not installed by default on XenServer systems; you must
-# install it by hand to usefully enable this option.
-# VSWITCHD_STRACE_LOG and VSWITCHD_VALGRIND_LOG are mutually exclusive.
-# VSWITCHD_VALGRIND_LOG=""
-
-# VSWITCHD_VALGRIND_OPT: Options to pass to valgrind.
-# This option's value is honored only when VSWITCHD_VALGRIND_LOG is
-# set to a nonempty string.
-# VSWITCHD_VALGRIND_OPT=""
-
-# BRCOMPATD_PIDFILE: File in which to store the pid of the running
-# ovs-brcompatd (the Linux bridge compatibility daemon for ovs-vswitchd).
-# If this is the empty string, ovs-brcompatd will not be started and
-# the brcompat_mod kernel module will not be inserted. Note that
-# the default is to use brcompat!
-# BRCOMPATD_PIDFILE=/var/run/ovs-brcompatd.pid
-
-# BRCOMPATD_RUN_DIR: Set the directory in which ovs-brcompatd should be
-# run. This mainly affects where core files will be placed.
-# BRCOMPATD_RUN_DIR=/var/xen/vswitch
-
-# BRCOMPATD_PRIORITY: "nice" priority at which to run ovs-vswitchd and related
-# processes.
-# BRCOMPATD_PRIORITY=-10
-
-# BRCOMPATD_LOGFILE: File to send the FILE_LOGLEVEL log messages to.
-# BRCOMPATD_LOGFILE=/var/log/ovs-brcompatd.log
-
-# BRCOMPATD_FILE_LOGLEVEL: Log level at which to log into the
-# BRCOMPATD_LOG file. If this is null or not set the logfile will
-# not be created and nothing will be sent to it. This is the
-# default. The available options are: EMER, WARN, INFO and DBG.
-# BRCOMPATD_FILE_LOGLEVEL=""
-
-# BRCOMPATD_SYSLOG_LOGLEVEL: Log level at which to log into syslog. If
-# this is null or not set the default is to log to syslog
-# emergency and warning level messages only.
-# BRCOMPATD_SYSLOG_LOGLEVEL="WARN"
-
-# BRCOMPATD_MEMLEAK_LOGFILE: File for logging memory leak data.
-# Enabling this option will slow ovs-brcompatd significantly. Do not
-# enable it except to debug a suspected memory leak. Use the
-# ovs-parse-leaks utility included with Open vSwitch to parse the
-# log file. For best results, you also need debug symbols.
-# BRCOMPATD_MEMLEAK_LOGFILE=""
-
-# BRCOMPATD_STRACE_LOG: File for logging strace output.
-# If this is set to a nonempty string, then ovs-brcompatd will run
-# under strace, whose output will be logged to the specified file.
-# Enabling this option will slow brcompatd significantly.
-# BRCOMPATD_STRACE_LOG and BRCOMPATD_VALGRIND_LOG are mutually exclusive.
-# BRCOMPATD_STRACE_LOG=""
-
-# BRCOMPATD_STRACE_OPT: Options to pass to strace.
-# This option's value is honored only when BRCOMPATD_STRACE_LOG is
-# set to a nonempty string.
-# BRCOMPATD_STRACE_OPT=""
-
-# BRCOMPATD_VALGRIND_LOG: File for logging valgrind output.
-# If this is set to a nonempty string, then ovs-brcompatd will run
-# under valgrind, whose output will be logged to the specified file.
-# Enabling this option will slow brcompatd by 100X or more.
-# valgrind is not installed by default on XenServer systems; you must
-# install it by hand to usefully enable this option.
-# BRCOMPATD_STRACE_LOG and BRCOMPATD_VALGRIND_LOG are mutually exclusive.
-# BRCOMPATD_VALGRIND_LOG=""
-
-# BRCOMPATD_VALGRIND_OPT: Options to pass to valgrind.
-# This option's value is honored only when BRCOMPATD_VALGRIND_LOG is
-# set to a nonempty string.
-# BRCOMPATD_VALGRIND_OPT=""
# Copyright (c) 2009 Nicira Networks.
-import logging
-log = logging.getLogger("vswitch-cfg-update")
-logging.basicConfig(filename="/var/log/vswitch-xsplugin.log", level=logging.DEBUG)
+from XSConsoleLog import *
import os
import socket
import subprocess
-cfg_mod="/root/vswitch/bin/ovs-cfg-mod"
+cfg_mod="/usr/bin/ovs-cfg-mod"
vswitchd_cfg_filename="/etc/ovs-vswitchd.conf"
if __name__ == "__main__":
try:
output = ShellPipe(["service", self.name, "version"]).Stdout()
except StandardError, e:
- log.error("version retrieval error: " + str(e))
+ XSLogError("vswitch version retrieval error: " + str(e))
return "<unknown>"
for line in output:
if self.processname in line:
try:
output = ShellPipe(["service", self.name, "status"]).Stdout()
except StandardError, e:
- log.error("status retrieval error: " + str(e))
+ XSLogError("vswitch status retrieval error: " + str(e))
return "<unknown>"
if len(output) == 0:
return "<unknown>"
try:
ShellPipe(["service", self.name, "restart"]).Call()
except StandardError, e:
- log.error("restart error: " + str(e))
+ XSLogError("vswitch restart error: " + str(e))
@classmethod
def Inst(cls, name, processname=None):
output = ShellPipe([cfg_mod, "-vANY:console:emer", "-F",
vswitchd_cfg_filename, "-q", key]).Stdout()
except StandardError, e:
- log.error("config retrieval error: " + str(e))
+ XSLogError("config retrieval error: " + str(e))
return "<unknown>"
if len(output) == 0:
argv0 = sys.argv[0]
-BRCTL = "/root/vswitch/xs-original/brctl"
+BRCTL = "/usr/lib/vswitch/xs-original/brctl"
VSWITCHD_CONF = "/etc/ovs-vswitchd.conf"
# Execute the real brctl program, passing the same arguments that were passed
VNCTERM_CORE_DIR = '/var/xen/vncterm'
VSWITCH_CORE_DIR = '/var/xen/vswitch'
OVS_VSWITCH_CONF = '/etc/ovs-vswitchd.conf'
-OVS_VSWITCH_DBCACHE = '/etc/ovs-vswitch.dbcache'
+OVS_VSWITCH_DBCACHE = '/var/lib/openvswitch/dbcache'
XENSOURCE_INVENTORY = '/etc/xensource-inventory'
OEM_CONFIG_DIR = '/var/xsconfig'
OEM_CONFIG_FILES_RE = re.compile(r'^.*xensource-inventory$')
MULTIPATHD = '/sbin/multipathd'
NETSTAT = '/bin/netstat'
OMREPORT = '/opt/dell/srvadmin/oma/bin/omreport'
-OVS_DPCTL = '/root/vswitch/bin/ovs-dpctl'
-OVS_OFCTL = '/root/vswitch/bin/ovs-ofctl'
+OVS_DPCTL = '/usr/bin/ovs-dpctl'
+OVS_OFCTL = '/usr/bin/ovs-ofctl'
PS = '/bin/ps'
PVS = '/usr/sbin/pvs'
ROUTE = '/sbin/route'
--- /dev/null
+#!/usr/bin/python
+#
+# Script to retrieve extended information about VIFs that the
+# controller needs. This is called by the "vif" script,
+# which is run when virtual interfaces are added and removed.
+
+# Copyright (C) 2009 Nicira Networks, Inc.
+#
+# Copying and distribution of this file, with or without modification,
+# are permitted in any medium without royalty provided the copyright
+# notice and this notice are preserved. This file is offered as-is,
+# without warranty of any kind.
+
+import sys
+import XenAPI
+import xen.lowlevel.xs
+
+# Query XenStore for the opaque reference of this vif
+def get_vif_ref(domid, devid):
+ xenstore = xen.lowlevel.xs.xs()
+ t = xenstore.transaction_start()
+ vif_ref = xenstore.read(t, '/xapi/%s/private/vif/%s/ref' % (domid, devid))
+ xenstore.transaction_end(t)
+ return vif_ref
+
+# Query XAPI for the information we need using the vif's opaque reference
+def dump_vif_info(domid, devid, vif_ref):
+ vif_info = []
+ session = XenAPI.xapi_local()
+ session.xenapi.login_with_password("root", "")
+ try:
+ vif_rec = session.xenapi.VIF.get_record(vif_ref)
+ net_rec = session.xenapi.network.get_record(vif_rec["network"])
+ vm_uuid = session.xenapi.VM.get_uuid(vif_rec["VM"])
+
+ # Data to allow vNetManager to associate VIFs with xapi data
+ add_port = '--add=port.vif%s.%s' % (domid, devid)
+ vif_info.append('%s.net-uuid=%s' % (add_port, net_rec["uuid"]))
+ vif_info.append('%s.vif-mac=%s' % (add_port, vif_rec["MAC"]))
+ vif_info.append('%s.vif-uuid=%s' % (add_port, vif_rec["uuid"]))
+ vif_info.append('%s.vm-uuid=%s' % (add_port, vm_uuid))
+
+ # vNetManager needs to know the network UUID(s) associated with
+ # each datapath. Normally interface-reconfigure adds them, but
+ # interface-reconfigure never gets called for internal networks
+ # (xapi does the addbr ioctl internally), so we have to do it
+ # here instead for internal networks. This is only acceptable
+ # because xapi is lazy about creating internal networks: it
+ # only creates one just before it adds the first vif to it.
+ # There may still be a brief delay between the initial
+ # ovs-vswitchd connection to vNetManager and setting this
+ # configuration variable, but vNetManager can tolerate that.
+ if not net_rec['PIFs']:
+ key = 'bridge.%s.xs-network-uuids' % net_rec['bridge']
+ value = net_rec['uuid']
+ vif_info.append('--del-match=%s=*' % key)
+ vif_info.append('--add=%s=%s' % (key, value))
+ finally:
+ session.xenapi.session.logout()
+ print ' '.join(vif_info)
+
+if __name__ == '__main__':
+ if len(sys.argv) != 3:
+ sys.stderr.write("ERROR: %s <domid> <devid>\n" % sys.argv[0])
+ sys.exit(1)
+
+ domid = sys.argv[1]
+ devid = sys.argv[2]
+
+ vif_ref = get_vif_ref(domid, devid)
+ if not vif_ref:
+ sys.stderr.write("ERROR: Could not find interface vif%s.%s\n"
+ % (domid, devid))
+ sys.exit(1)
+
+ dump_vif_info(domid, devid, vif_ref)
+ sys.exit(0)
--- /dev/null
+#! /bin/sh
+
+. /etc/xensource-inventory
+
+for pif in $(xe pif-list --minimal host-uuid=${INSTALLATION_UUID} currently-attached=true VLAN=-1 | sed 's/,/ /g'); do
+ printf "Refreshing PIF %s... " $pif
+ if /opt/xensource/libexec/interface-reconfigure --pif-uuid=$pif up; then
+ printf "done\n"
+ else
+ printf "error!\n"
+ fi
+done
--- /dev/null
+### Configuration options for vswitch
+
+# Copyright (C) 2009 Nicira Networks, Inc.
+#
+# Copying and distribution of this file, with or without modification,
+# are permitted in any medium without royalty provided the copyright
+# notice and this notice are preserved. This file is offered as-is,
+# without warranty of any kind.
+
+# ENABLE_BRCOMPAT: If 'y' then emulate Linux bridging interfaces
+# using the brcompat kernel module and the ovs-brcompatd daemon.
+# ENABLE_BRCOMPAT=y
+
+# ENABLE_FAKE_PROC_NET: If 'y' then emulate the Linux bonding and VLAN
+# files in /proc as if the bonding and VLAN demultiplexing done in
+# ovs-vswitchd were implemented using the existing Linux mechanisms.
+# This is useful in some cases when replacing existing solutions.
+# ENABLE_FAKE_PROC_NET=y
+
+# FORCE_COREFILES: If 'y' then core files will be enabled.
+# FORCE_COREFILES=y
+
+# COREFILE_PATTERN: Pattern used to determine path and filename for
+# core files when FORCE_COREFILES is 'y'. This is Linux-specific; see
+# the core(5) manpage.
+# COREFILE_PATTERN="/var/log/%e-%t"
+
+# VSWITCHD_CONF: File in which ovs-vswitchd stores its configuration.
+# VSWITCHD_CONF=/etc/ovs-vswitchd.conf
+
+# VSWITCHD_PIDFILE: File in which to store the pid of the running
+# ovs-vswitchd.
+# VSWITCHD_PIDFILE=/var/run/ovs-vswitchd.pid
+
+# VSWITCHD_RUN_DIR: Set the directory in which ovs-vswitchd should be
+# run. This mainly affects where core files will be placed.
+# VSWITCHD_RUN_DIR=/var/xen/vswitch
+
+# VSWITCHD_PRIORITY: "nice" priority at which to run ovs-vswitchd and related
+# processes.
+# VSWITCHD_PRIORITY=-10
+
+# VSWITCHD_LOGFILE: File to which VSWITCHD_FILE_LOGLEVEL messages are logged.
+# VSWITCHD_LOGFILE=/var/log/ovs-vswitchd.log
+
+# VSWITCHD_MEMLEAK_LOGFILE: File for logging memory leak data.
+# Enabling this option will slow ovs-vswitchd significantly. Do not
+# enable it except to debug a suspected memory leak. Use the
+# ovs-parse-leaks utility included with Open vSwitch to parse the
+# log file. For best results, you also need debug symbols.
+# VSWITCHD_MEMLEAK_LOGFILE=""
+
+# VSWITCHD_FILE_LOGLEVEL: Log level at which to log into the
+# VSWITCHD_LOGFILE file. If this is empty or not set, the log file is
+# not created and nothing is sent to it; this is the default. The
+# available levels are EMER, ERR, WARN, INFO, and DBG.
+# VSWITCHD_FILE_LOGLEVEL=""
+
+# VSWITCHD_SYSLOG_LOGLEVEL: Log level at which to log into syslog. If
+# this is empty or not set, only emergency, error, and warning
+# messages are logged to syslog.
+# VSWITCHD_SYSLOG_LOGLEVEL="WARN"
+
+# VSWITCHD_STRACE_LOG: File for logging strace output.
+# If this is set to a nonempty string, then ovs-vswitchd will run
+# under strace, whose output will be logged to the specified file.
+# Enabling this option will slow ovs-vswitchd significantly.
+# VSWITCHD_STRACE_LOG and VSWITCHD_VALGRIND_LOG are mutually exclusive.
+# VSWITCHD_STRACE_LOG=""
+
+# VSWITCHD_STRACE_OPT: Options to pass to strace.
+# This option's value is honored only when VSWITCHD_STRACE_LOG is
+# set to a nonempty string.
+# VSWITCHD_STRACE_OPT=""
+
+# VSWITCHD_VALGRIND_LOG: File for logging valgrind output.
+# If this is set to a nonempty string, then ovs-vswitchd will run
+# under valgrind, whose output will be logged to the specified file.
+# Enabling this option will slow ovs-vswitchd by 100X or more.
+# valgrind is not installed by default on XenServer systems; you must
+# install it by hand to usefully enable this option.
+# VSWITCHD_STRACE_LOG and VSWITCHD_VALGRIND_LOG are mutually exclusive.
+# VSWITCHD_VALGRIND_LOG=""
+
+# VSWITCHD_VALGRIND_OPT: Options to pass to valgrind.
+# This option's value is honored only when VSWITCHD_VALGRIND_LOG is
+# set to a nonempty string.
+# VSWITCHD_VALGRIND_OPT=""
+
+# BRCOMPATD_PIDFILE: File in which to store the pid of the running
+# ovs-brcompatd (the Linux bridge compatibility daemon for ovs-vswitchd).
+# If this is the empty string, ovs-brcompatd will not be started and
+# the brcompat_mod kernel module will not be inserted. Note that
+# the default is to use brcompat!
+# BRCOMPATD_PIDFILE=/var/run/ovs-brcompatd.pid
+
+# BRCOMPATD_RUN_DIR: Set the directory in which ovs-brcompatd should be
+# run. This mainly affects where core files will be placed.
+# BRCOMPATD_RUN_DIR=/var/xen/vswitch
+
+# BRCOMPATD_PRIORITY: "nice" priority at which to run ovs-brcompatd and
+# related processes.
+# BRCOMPATD_PRIORITY=-10
+
+# BRCOMPATD_LOGFILE: File to which BRCOMPATD_FILE_LOGLEVEL messages are logged.
+# BRCOMPATD_LOGFILE=/var/log/ovs-brcompatd.log
+
+# BRCOMPATD_FILE_LOGLEVEL: Log level at which to log into the
+# BRCOMPATD_LOGFILE file. If this is empty or not set, the log file is
+# not created and nothing is sent to it; this is the default. The
+# available levels are EMER, ERR, WARN, INFO, and DBG.
+# BRCOMPATD_FILE_LOGLEVEL=""
+
+# BRCOMPATD_SYSLOG_LOGLEVEL: Log level at which to log into syslog. If
+# this is empty or not set, only emergency, error, and warning
+# messages are logged to syslog.
+# BRCOMPATD_SYSLOG_LOGLEVEL="WARN"
+
+# BRCOMPATD_MEMLEAK_LOGFILE: File for logging memory leak data.
+# Enabling this option will slow ovs-brcompatd significantly. Do not
+# enable it except to debug a suspected memory leak. Use the
+# ovs-parse-leaks utility included with Open vSwitch to parse the
+# log file. For best results, you also need debug symbols.
+# BRCOMPATD_MEMLEAK_LOGFILE=""
+
+# BRCOMPATD_STRACE_LOG: File for logging strace output.
+# If this is set to a nonempty string, then ovs-brcompatd will run
+# under strace, whose output will be logged to the specified file.
+# Enabling this option will slow ovs-brcompatd significantly.
+# BRCOMPATD_STRACE_LOG and BRCOMPATD_VALGRIND_LOG are mutually exclusive.
+# BRCOMPATD_STRACE_LOG=""
+
+# BRCOMPATD_STRACE_OPT: Options to pass to strace.
+# This option's value is honored only when BRCOMPATD_STRACE_LOG is
+# set to a nonempty string.
+# BRCOMPATD_STRACE_OPT=""
+
+# BRCOMPATD_VALGRIND_LOG: File for logging valgrind output.
+# If this is set to a nonempty string, then ovs-brcompatd will run
+# under valgrind, whose output will be logged to the specified file.
+# Enabling this option will slow ovs-brcompatd by 100X or more.
+# valgrind is not installed by default on XenServer systems; you must
+# install it by hand to usefully enable this option.
+# BRCOMPATD_STRACE_LOG and BRCOMPATD_VALGRIND_LOG are mutually exclusive.
+# BRCOMPATD_VALGRIND_LOG=""
+
+# BRCOMPATD_VALGRIND_OPT: Options to pass to valgrind.
+# This option's value is honored only when BRCOMPATD_VALGRIND_LOG is
+# set to a nonempty string.
+# BRCOMPATD_VALGRIND_OPT=""
# rpmbuild -D "vswitch_version 0.8.9~1+build123" -D "xen_version 2.6.18-128.1.1.el5.xs5.1.0.483.1000xen" -D "build_number --with-build-number=123" -bb /usr/src/redhat/SPECS/vswitch-xen.spec
#
%define version %{vswitch_version}-%{xen_version}
-%define _prefix /root/vswitch
Name: vswitch
Summary: Virtual switch
Source: openvswitch-%{vswitch_version}.tar.gz
Buildroot: /tmp/vswitch-xen-rpm
Requires: kernel-xen = %(echo '%{xen_version}' | sed 's/xen$//')
+# The following Conflicts tag prevents the "vswitch" package generated
+# by this spec file from being installed at the same time as the
+# "openvswitch" package shipped with XenServer 5.5.900. The two
+# packages already contain some files with identical names, so they
+# could not coexist anyway, but an explicit Conflicts tag makes the
+# conflict more obvious.
+Conflicts: openvswitch
%description
The vswitch provides standard network bridging functions augmented with
%setup -q -n openvswitch-%{vswitch_version}
%build
-./configure --prefix=%{_prefix} --localstatedir=%{_localstatedir} --with-l26=/lib/modules/%{xen_version}/build --enable-ssl %{build_number}
+./configure --prefix=/usr --sysconfdir=/etc --localstatedir=%{_localstatedir} --with-l26=/lib/modules/%{xen_version}/build --enable-ssl %{build_number}
make %{_smp_mflags}
%install
rm -rf $RPM_BUILD_ROOT
-make install DESTDIR=$RPM_BUILD_ROOT prefix=%{_prefix}
+make install DESTDIR=$RPM_BUILD_ROOT
install -d -m 755 $RPM_BUILD_ROOT/etc
install -d -m 755 $RPM_BUILD_ROOT/etc/init.d
install -m 755 xenserver/etc_init.d_vswitch \
install -d -m 755 $RPM_BUILD_ROOT/etc/xapi.d/plugins
install -m 755 xenserver/etc_xapi.d_plugins_vswitch-cfg-update \
$RPM_BUILD_ROOT/etc/xapi.d/plugins/vswitch-cfg-update
-install -d -m 755 $RPM_BUILD_ROOT%{_prefix}/scripts
+install -d -m 755 $RPM_BUILD_ROOT/usr/share/vswitch/scripts
install -m 755 xenserver/opt_xensource_libexec_interface-reconfigure \
- $RPM_BUILD_ROOT%{_prefix}/scripts/interface-reconfigure
+ $RPM_BUILD_ROOT/usr/share/vswitch/scripts/interface-reconfigure
install -m 755 xenserver/etc_xensource_scripts_vif \
- $RPM_BUILD_ROOT%{_prefix}/scripts/vif
-install -m 755 xenserver/root_vswitch_scripts_dump-vif-details \
- $RPM_BUILD_ROOT%{_prefix}/scripts/dump-vif-details
-install -m 755 xenserver/root_vswitch_scripts_refresh-xs-network-uuids \
- $RPM_BUILD_ROOT%{_prefix}/scripts/refresh-xs-network-uuids
+ $RPM_BUILD_ROOT/usr/share/vswitch/scripts/vif
+install -m 755 xenserver/usr_share_vswitch_scripts_dump-vif-details \
+ $RPM_BUILD_ROOT/usr/share/vswitch/scripts/dump-vif-details
+install -m 755 xenserver/usr_share_vswitch_scripts_refresh-xs-network-uuids \
+ $RPM_BUILD_ROOT/usr/share/vswitch/scripts/refresh-xs-network-uuids
install -m 755 xenserver/usr_sbin_xen-bugtool \
- $RPM_BUILD_ROOT%{_prefix}/scripts/xen-bugtool
+ $RPM_BUILD_ROOT/usr/share/vswitch/scripts/xen-bugtool
install -m 755 xenserver/usr_sbin_brctl \
- $RPM_BUILD_ROOT%{_prefix}/scripts/brctl
-install -m 755 xenserver/root_vswitch_scripts_sysconfig.template \
- $RPM_BUILD_ROOT/root/vswitch/scripts/sysconfig.template
+ $RPM_BUILD_ROOT/usr/share/vswitch/scripts/brctl
+install -m 755 xenserver/usr_share_vswitch_scripts_sysconfig.template \
+ $RPM_BUILD_ROOT/usr/share/vswitch/scripts/sysconfig.template
install -m 644 \
xenserver/usr_lib_xsconsole_plugins-base_XSFeatureVSwitch.py \
- $RPM_BUILD_ROOT%{_prefix}/scripts/XSFeatureVSwitch.py
+ $RPM_BUILD_ROOT/usr/share/vswitch/scripts/XSFeatureVSwitch.py
-install -d -m 755 $RPM_BUILD_ROOT%{_prefix}/kernel_modules
-find datapath/linux-2.6 -name *.ko -exec install -m 755 \{\} $RPM_BUILD_ROOT%{_prefix}/kernel_modules/ \;
+install -d -m 755 $RPM_BUILD_ROOT/lib/modules/%{xen_version}/kernel/net/vswitch
+find datapath/linux-2.6 -name '*.ko' -exec install -m 755 \{\} $RPM_BUILD_ROOT/lib/modules/%{xen_version}/kernel/net/vswitch \;
# Get rid of stuff we don't want to make RPM happy.
-rm -rf \
- $RPM_BUILD_ROOT/root/vswitch/bin/ezio-term \
- $RPM_BUILD_ROOT/root/vswitch/bin/ovs-controller \
- $RPM_BUILD_ROOT/root/vswitch/bin/ovs-discover \
- $RPM_BUILD_ROOT/root/vswitch/bin/ovs-kill \
- $RPM_BUILD_ROOT/root/vswitch/bin/ovs-pki \
- $RPM_BUILD_ROOT/root/vswitch/bin/ovs-switchui \
- $RPM_BUILD_ROOT/root/vswitch/bin/ovs-wdt \
- $RPM_BUILD_ROOT/root/vswitch/bin/secchan \
- $RPM_BUILD_ROOT/root/vswitch/sbin/ovs-monitor \
- $RPM_BUILD_ROOT/root/vswitch/share/man/man8/ovs-controller.8 \
- $RPM_BUILD_ROOT/root/vswitch/share/man/man8/ovs-discover.8 \
- $RPM_BUILD_ROOT/root/vswitch/share/man/man8/ovs-kill.8 \
- $RPM_BUILD_ROOT/root/vswitch/share/man/man8/ovs-pki.8 \
- $RPM_BUILD_ROOT/root/vswitch/share/man/man8/secchan.8 \
- $RPM_BUILD_ROOT/root/vswitch/share/openvswitch
+rm \
+ $RPM_BUILD_ROOT/usr/bin/ovs-controller \
+ $RPM_BUILD_ROOT/usr/bin/ovs-discover \
+ $RPM_BUILD_ROOT/usr/bin/ovs-kill \
+ $RPM_BUILD_ROOT/usr/bin/ovs-openflowd \
+ $RPM_BUILD_ROOT/usr/bin/ovs-pki \
+ $RPM_BUILD_ROOT/usr/bin/ovs-wdt \
+ $RPM_BUILD_ROOT/usr/sbin/ovs-monitor \
+ $RPM_BUILD_ROOT/usr/share/man/man8/ovs-controller.8 \
+ $RPM_BUILD_ROOT/usr/share/man/man8/ovs-discover.8 \
+ $RPM_BUILD_ROOT/usr/share/man/man8/ovs-kill.8 \
+ $RPM_BUILD_ROOT/usr/share/man/man8/ovs-openflowd.8 \
+ $RPM_BUILD_ROOT/usr/share/man/man8/ovs-pki.8
+rm -f $RPM_BUILD_ROOT/lib/modules/%{xen_version}/kernel/net/vswitch/veth_mod.ko
+rm -r \
+ $RPM_BUILD_ROOT/usr/share/openvswitch/commands
+
+install -d -m 755 $RPM_BUILD_ROOT/var/lib/openvswitch
%clean
rm -rf $RPM_BUILD_ROOT
fi
if [ "$1" = "1" ]; then
- if ! md5sum -c --status <<EOF
+ if md5sum -c --status <<EOF
+ca141d60061dcfdade73e75abc6529b5 /usr/sbin/brctl
b8e9835862ef1a9cec2a3f477d26c989 /etc/xensource/scripts/vif
51970ad613a3996d5997e18e44db47da /opt/xensource/libexec/interface-reconfigure
5654c8c36699fcc8744ca9cd5b855414 /usr/sbin/xen-bugtool
EOF
then
- printf "\nThe original XenServer scripts replaced by this package\n"
- printf "are different than expected. This could lead to unexpected\n"
- printf "behavior of your server. Unless you are sure you know what\n"
- printf "you are doing, it is highly recommended that you remove this\n"
- printf "package immediately after the install completes, which\n"
- printf "will restore the XenServer scripts that you were previously\n"
- printf "using.\n\n"
- fi
- if test "`/usr/sbin/brctl --version`" != "bridge-utils, 1.1"; then
+ printf "\nVerified host scripts from XenServer 5.5.0.\n\n"
+ elif md5sum -c --status <<EOF
+ca141d60061dcfdade73e75abc6529b5 /usr/sbin/brctl
+b8e9835862ef1a9cec2a3f477d26c989 /etc/xensource/scripts/vif
+ce451d3c985fd1db6497a363f0d9dedb /opt/xensource/libexec/interface-reconfigure
+2b53f500431fcba5276c896e9e4281b9 /usr/sbin/xen-bugtool
+EOF
+ then
+ printf "\nVerified host scripts from XenServer 5.5.900.\n\n"
+ else
cat <<EOF
-/usr/sbin/brctl replaced by this package reports the following version:
-
-`/usr/sbin/brctl --version`
-
-The expected version was:
-
-bridge-utils, 1.1
-
-Unless you are sure you know what you are doing, it is highly recommended that
-you remove this package immediately after the install completes, which will
-restore the original /usr/sbin/brctl.
+The original XenServer scripts replaced by this package are not those
+of any supported version of XenServer. This could lead to unexpected
+behavior of your server. Unless you are sure you know what you are
+doing, it is highly recommended that you remove this package
+immediately after the install completes, which will restore the
+XenServer scripts that you were previously using.
EOF
fi
printf "Re-creating xapi database cache... "
fi
- mkdir -p /var/lib/openvswitch
- if /root/vswitch/scripts/interface-reconfigure rewrite; then
+ if /usr/share/vswitch/scripts/interface-reconfigure rewrite; then
printf "done.\n"
else
printf "FAILED\n"
fi
fi
+# Ensure that modprobe will find our modules.
+depmod %{xen_version}
+
if grep -F net.ipv4.conf.all.arp_filter /etc/sysctl.conf >/dev/null 2>&1; then :; else
cat >>/etc/sysctl.conf <<EOF
# This works around an issue in xhad, which binds to a particular
# Create default or update existing /etc/sysconfig/vswitch.
SYSCONFIG=/etc/sysconfig/vswitch
-TEMPLATE=/root/vswitch/scripts/sysconfig.template
+TEMPLATE=/usr/share/vswitch/scripts/sysconfig.template
if [ ! -e $SYSCONFIG ]; then
cp $TEMPLATE $SYSCONFIG
else
fi
# Replace XenServer files by our versions.
-mkdir -p %{_prefix}/xs-original \
+mkdir -p /usr/lib/vswitch/xs-original \
|| printf "Could not create script backup directory.\n"
for f in \
/opt/xensource/libexec/interface-reconfigure \
do
s=$(basename "$f")
t=$(readlink "$f")
- if [ "$t" != "%{_prefix}/scripts/$s" ]; then
- mv "$f" %{_prefix}/xs-original/ \
+ if [ "$t" != "/usr/share/vswitch/scripts/$s" ]; then
+ mv "$f" /usr/lib/vswitch/xs-original/ \
|| printf "Could not save original XenServer $s script\n"
- ln -s "%{_prefix}/scripts/$s" "$f" \
+ ln -s "/usr/share/vswitch/scripts/$s" "$f" \
|| printf "Could not link to vSwitch $s script\n"
fi
done
# Install xsconsole plugin
plugin=$(readlink /usr/lib/xsconsole/plugins-base/XSFeatureVSwitch.py)
-if [ "$plugin" != "/root/vswitch/scripts/XSFeatureVSwitch.py" ]; then
+if [ "$plugin" != "/usr/share/vswitch/scripts/XSFeatureVSwitch.py" ]; then
rm -f /usr/lib/xsconsole/plugins-base/XSFeatureVSwitch.py
- ln -s /root/vswitch/scripts/XSFeatureVSwitch.py /usr/lib/xsconsole/plugins-base/ || printf "Could not link to vSswitch xsconsole plugin.\n"
+ ln -s /usr/share/vswitch/scripts/XSFeatureVSwitch.py /usr/lib/xsconsole/plugins-base/ || printf "Could not link to vSwitch xsconsole plugin.\n"
fi
# Ensure all required services are set to run
/usr/sbin/brctl
do
s=$(basename "$f")
- if [ ! -f "%{_prefix}/xs-original/$s" ]; then
- printf "Original XenServer $s script not present in %{_prefix}/xs-original\n"
+ if [ ! -f "/usr/lib/vswitch/xs-original/$s" ]; then
+ printf "Original XenServer $s script not present in /usr/lib/vswitch/xs-original\n"
printf "Could not restore original XenServer script.\n"
else
(rm -f "$f" \
- && mv "%{_prefix}/xs-original/$s" "$f") \
+ && mv "/usr/lib/vswitch/xs-original/$s" "$f") \
|| printf "Could not restore original XenServer $s script.\n"
fi
done
- find %{_prefix} -type d -depth -exec rmdir \{\} \; \
- || printf "Could not remove vSwitch install directory.\n"
-
- # Remove all configuration and log files
+ # Remove all configuration files
rm -f /etc/ovs-vswitchd.conf
rm -f /etc/sysconfig/vswitch
- rm -f /var/log/vswitch*
rm -f /etc/ovs-vswitchd.cacert
rm -f /var/lib/openvswitch/dbcache
/etc/xapi.d/plugins/vswitch-cfg-update
/etc/logrotate.d/vswitch
/etc/profile.d/vswitch.sh
-/root/vswitch/kernel_modules/brcompat_mod.ko
-/root/vswitch/kernel_modules/openvswitch_mod.ko
-/root/vswitch/kernel_modules/veth_mod.ko
-/root/vswitch/scripts/dump-vif-details
-/root/vswitch/scripts/refresh-xs-network-uuids
-/root/vswitch/scripts/interface-reconfigure
-/root/vswitch/scripts/vif
-/root/vswitch/scripts/xen-bugtool
-/root/vswitch/scripts/XSFeatureVSwitch.py
-/root/vswitch/scripts/brctl
-/root/vswitch/scripts/sysconfig.template
+/lib/modules/%{xen_version}/kernel/net/vswitch/openvswitch_mod.ko
+/lib/modules/%{xen_version}/kernel/net/vswitch/brcompat_mod.ko
+/usr/share/vswitch/scripts/dump-vif-details
+/usr/share/vswitch/scripts/refresh-xs-network-uuids
+/usr/share/vswitch/scripts/interface-reconfigure
+/usr/share/vswitch/scripts/vif
+/usr/share/vswitch/scripts/xen-bugtool
+/usr/share/vswitch/scripts/XSFeatureVSwitch.py
+/usr/share/vswitch/scripts/brctl
+/usr/share/vswitch/scripts/sysconfig.template
# Following two files are generated automatically by rpm. We don't
# really need them and they won't be used on the XenServer, but there
# isn't an obvious place to get rid of them since they are generated
# after the install script runs. Since they are small, we just
# include them.
-/root/vswitch/scripts/XSFeatureVSwitch.pyc
-/root/vswitch/scripts/XSFeatureVSwitch.pyo
-/root/vswitch/sbin/ovs-brcompatd
-/root/vswitch/sbin/ovs-vswitchd
-/root/vswitch/bin/ovs-appctl
-/root/vswitch/bin/ovs-cfg-mod
-/root/vswitch/bin/ovs-dpctl
-/root/vswitch/bin/ovs-ofctl
-/root/vswitch/share/man/man5/ovs-vswitchd.conf.5
-/root/vswitch/share/man/man8/ovs-appctl.8
-/root/vswitch/share/man/man8/ovs-brcompatd.8
-/root/vswitch/share/man/man8/ovs-cfg-mod.8
-/root/vswitch/share/man/man8/ovs-dpctl.8
-/root/vswitch/share/man/man8/ovs-ofctl.8
-/root/vswitch/share/man/man8/ovs-vswitchd.8
+/usr/share/vswitch/scripts/XSFeatureVSwitch.pyc
+/usr/share/vswitch/scripts/XSFeatureVSwitch.pyo
+/usr/sbin/ovs-brcompatd
+/usr/sbin/ovs-vswitchd
+/usr/bin/ovs-appctl
+/usr/bin/ovs-cfg-mod
+/usr/bin/ovs-dpctl
+/usr/bin/ovs-ofctl
+/usr/bin/ovs-vsctl
+/usr/share/man/man5/ovs-vswitchd.conf.5.gz
+/usr/share/man/man8/ovs-appctl.8.gz
+/usr/share/man/man8/ovs-brcompatd.8.gz
+/usr/share/man/man8/ovs-cfg-mod.8.gz
+/usr/share/man/man8/ovs-dpctl.8.gz
+/usr/share/man/man8/ovs-ofctl.8.gz
+/usr/share/man/man8/ovs-vsctl.8.gz
+/usr/share/man/man8/ovs-vswitchd.8.gz
+/var/lib/openvswitch