From: Ben Pfaff Date: Thu, 27 Mar 2008 22:19:17 +0000 (-0700) Subject: Implement userspace switch. X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=98d149406a08e19ffe191e1289af4ad6412e33ef;p=openvswitch Implement userspace switch. --- diff --git a/INSTALL b/INSTALL index 5a02fbde..c0f48286 100644 --- a/INSTALL +++ b/INSTALL @@ -22,34 +22,6 @@ distribution, you will need the following software: and authenticity in the connections among OpenFlow switches and controllers. -To compile the datapath kernel module, you will additionally need: - - - A supported Linux kernel version. Please refer to README for a - list of supported versions. - - The OpenFlow datapath requires bridging support (CONFIG_BRIDGE) - to be built as a kernel module. (This is common in kernels - provided by Linux distributions.) The bridge module must not be - loaded or in use. If the bridge module is running (check with - "lsmod | grep bridge"), you must remove it ("rmmod bridge") - before starting the datapath. - - - The correct version of GCC for the kernel that you are building - the module against: - - * To build a kernel module for a Linux 2.6 kernel, you need - the same version of GCC that was used to build that kernel - (usually version 4.0 or later). - - * To build a kernel module for a Linux 2.4 kernel, you need an - earlier version of GCC, typically GCC 2.95, 3.3, or 3.4. - - - A kernel build directory corresponding to the Linux kernel image - the module is to run on. Under Debian and Ubuntu, for example, - each linux-image package containing a kernel binary has a - corresponding linux-headers package with the required build - infrastructure. - If you are working from a Git tree or snapshot (instead of from a distribution tarball), or if you modify the OpenFlow build system, you will also need the following software: @@ -61,25 +33,27 @@ will also need the following software: - pkg-config (http://pkg-config.freedesktop.org/wiki/). We test with version 0.22. 
-Building the Code ------------------ +The optional Linux module has additional prerequisites, described +later in the section "Building and Testing the Linux Kernel-Based +Switch". -1. In the top source directory, configure the package by running the - configure script. To compile without building a kernel module, you - can usually invoke configure without any arguments: - % ./configure +Building Userspace Programs +--------------------------- - To build a kernel module as well as the rest of the distribution, - pass the location of the kernel build directory as an argument. - Use --with-l26 for Linux 2.6, --with-l24 for Linux 2.4. For - example, to build for a running instance of Linux 2.6: - % ./configure --with-l26=/lib/modules/`uname -r`/build +These instructions describe how to build the userspace components of +the OpenFlow distribution. Refer to "Building and Testing the Linux +Kernel-Based Switch", below, for additional instructions on how to +build the optional Linux kernel module. - To build for a running instance of Linux 2.4: - % ./configure --with-l24=/lib/modules/`uname -r`/build +1. In the top source directory, configure the package by running the + configure script. You can usually invoke configure without any + arguments: + + % ./configure To use a specific C compiler for compiling OpenFlow user programs, also specify it on the configure command line, like so: + % ./configure CC=gcc-4.2 The configure script accepts a number of other options and honors @@ -87,110 +61,67 @@ Building the Code configure with the --help option. 2. Run make in the top source directory: + % make The following binaries will be built: - Datapath kernel module: - datapath/linux-2.6/openflow_mod.ko (if --with-l26 was specified) - datapath/linux-2.4/openflow_mod.o (if --with-l24 was specified) + - Switch executable: switch/switch. This executable is built + only if the configure script detects a supported interface to + network devices. 
Refer to README for a list of OSes whose + network device interfaces are supported. - Secure channel executable: - secchan/secchan + - Secure channel executable: secchan/secchan. - Controller executable: - controller/controller + - Controller executable: controller/controller. - Datapath administration utility: - utilities/dpctl + - Datapath administration utility: utilities/dpctl. - Runtime logging configuration utility: - utilities/vlogconf + - Runtime logging configuration utility: utilities/vlogconf. 3. (Optional) Run "make install" to install the executables and manpages into the running system, by default under /usr/local. -Installing the datapath ------------------------ - -To run the module, simply insmod it: - - (Linux 2.6) - % insmod datapath/linux-2.6/openflow_mod.ko - - (Linux 2.4) - % insmod datapath/linux-2.4/compat24_mod.o - % insmod datapath/linux-2.4/openflow_mod.o - +Testing Userspace Programs +-------------------------- -Testing the datapath --------------------- +1. Start the OpenFlow controller running in the background, by running + the "controller" program with a command like the following: -Once the OpenFlow datapath has been installed (you can verify that it is -running if it appears in lsmod's listing), you can configure it using -the dpctl command line utility. + % controller ptcp: & -1. Create a datapath instance. The command below creates a datapath with - ID 0 (see dpctl(8) for more detailed usage information). - % dpctl adddp 0 + This command causes the controller to bind to port 975 (the + default) awaiting connections from OpenFlow switches. See + controller(8) for details. - (note, while in principle openflow_mod supports multiple datapaths - within the same host, this is rarely useful in practice) +2. On the same machine, use the "switch" program to start an OpenFlow + switch, specifying network devices to use as switch ports on the -i + option as a comma-separated list, like so: -2. 
Use dpctl to attach the datapath to physical interfaces on the - machine. Say, for example, you want to create a trivial 2-port - switch using interfaces eth1 and eth2, you would issue the following - commands: - % dpctl addif 0 eth1 - % dpctl addif 0 eth2 - - You can verify that the interfaces were successfully added by asking - dpctl to print the current status of datapath 0: - % dpctl show 0 - -3. (Optional) You can manually add flows to the datapath to test using - dpctl add-flows and view them using dpctl dump-flows. See dpctl(8) - for more details. - -4. The simplest way to test the datapath is to run the provided sample - controller on the host machine to manage the datapath directly using - netlink: - % controller -v nl:0 - - Once the controller is running, the datapath should operate like a - learning Ethernet switch. You may monitor the flows in the datapath - flow table using "dpctl dump-flows" command. - -Running the datapath with a remote controller ---------------------------------------------- - -1. Start the datapath and attach it to two or more physical ports as - described in the previous section. - - Note: The current version of the secure channel and controller - require at least one interface not be connected to the datapath - to be functional. This interface will be used for communication - between the secure channel and the controller. Future releases will - support in-band control communication. - -2. Run the controller in passive tcp mode on the host which will act as - the controller. In the example below, the controller will bind to - port 975 (the default) awaiting connections from secure channels. - % controller -v ptcp: - - (See controller(8) for more details) + % switch tcp:127.0.0.1 -i eth1,eth2 - Make sure the machine hosting the controller is reachable by the switch. - -3. Run secchan on the datapath host to start the secure channel - connecting the datapath to a remote controller. (See secchan(8) - for usage details). 
The channel should be configured to connect to - the controller's IP address on the port configured in step 2. - - If the controller is running on host 192.168.1.2 port 975 (the - default port) and the datapath ID is 0, the secchan invocation - would look like: - % secchan -v nl:0 tcp:192.168.1.2 + The network devices that you specify should not have configured IP + addresses. The switch program must run as root. + +3. The controller causes each switch that connects to it to act like a + learning Ethernet switch. Thus, devices plugged into the specified + network ports should now be able to send packets to each other, as + if they were plugged into ports on a conventional Ethernet switch. + +Troubleshooting: if the commands above do not work, try using the -v +or --verbose option on the controller or switch commands, which will +cause a large amount of debug output from each program. + +Remote switches: These instructions assume that the controller and the +switch are running on the same machine. This is an easy configuration +for testing, but a more conventional setup would run a controller on +one machine and one or more switches on different machines. To do so, +simply specify the IP address of the controller as the first argument +to the switch program (in place of 127.0.0.1). (Note: The current +version of the switch and controller requires that they be connected +through a "control network" that is physically separate from the one +that they are controlling. Future releases will support in-band +control communication.) Secure operation over SSL ------------------------- @@ -208,26 +139,33 @@ Public Key Infrastructure" below. 
To configure the controller to listen for SSL connections on the default port, invoke it as follows: + % controller -v pssl: --private-key=PRIVKEY --certificate=CERT \ --ca-cert=CACERT + where PRIVKEY is a file containing the controller's private key, CERT is a file containing the controller CA's certificate for the controller's public key, and CACERT is a file containing the root certificate for the switch CA. If, for example, your PKI was created with the instructions below, then the invocation would look like: + % controller -v pssl: --private-key=ctl-privkey.pem \ --certificate=ctl-cert.pem --ca-cert=pki/switchca/cacert.pem To configure a switch to connect to a controller running on the default port on host 192.168.1.2 over SSL, invoke it as follows: - % secchan -v nl:0 ssl:192.168.1.2 --private-key=PRIVKEY \ + + % switch -v ssl:192.168.1.2 -i INTERFACES --private-key=PRIVKEY \ --certificate=CERT --ca-cert=CACERT -where PRIVKEY is a file containing the switch's private key, CERT is a -file containing the switch CA's certificate for the switch's public -key, and CACERT is a file containing the root certificate for the -controller CA. If, for example, your PKI was created with the + +where INTERFACES is the comma-separated list of network device +interfaces, PRIVKEY is a file containing the switch's private key, +CERT is a file containing the switch CA's certificate for the switch's +public key, and CACERT is a file containing the root certificate for +the controller CA. 
If, for example, your PKI was created with the instructions below, then the invocation would look like: - % secchan -v nl:0 ssl:192.168.1.2 --private-key=sc-privkey.pem \ + + % switch -v -i INTERFACES ssl:192.168.1.2 --private-key=sc-privkey.pem \ --certificate=sc-cert.pem --ca-cert=pki/controllerca/cacert.pem [*] To be specific, OpenFlow uses TLS version 1.0 or later (TLSv1), as @@ -249,9 +187,9 @@ controllerca subdirectory contains controller certificate authority related files, including the following: - cacert.pem: Root certificate for the controller certificate - authority. This file must be provided to the secchan - program with the --ca-cert option to enable it to - authenticate valid controllers. + authority. This file must be provided to the switch or secchan + program with the --ca-cert option to enable it to authenticate + valid controllers. - private/cakey.pem: Private signing key for the controller certificate authority. This file must be kept secret. There is @@ -280,12 +218,157 @@ their original locations). The --private-key and --certificate options of controller, respectively, would point to these files. Analogously, to create a switch private key and certificate in files -named sc-privkey.pem and sc-cert.pem, for example, you could run: +named sc-privkey.pem and sc-cert.pem, for example, you could run: % ofp-pki req+sign sc switch sc-privkey.pem and sc-cert.pem would need to be copied to the switch for its use at runtime (they could then be deleted from their original -locations). The --private-key and --certificate options of secchan, -respectively, would point to these files. +locations). The --private-key and --certificate options, +respectively, of switch and secchan would point to these files. 
+ +Building and Testing the Linux Kernel-Based Switch +-------------------------------------------------- + +The OpenFlow distribution also includes a Linux kernel module that can +be used to achieve higher switching performance at a cost in +portability and ease of installation. Compiling the kernel module has +the following prerequisites in addition to those listed in the +"Prerequisites" section above: + + - A supported Linux kernel version. Please refer to README for a + list of supported versions. + + The OpenFlow datapath requires bridging support (CONFIG_BRIDGE) + to be built as a kernel module. (This is common in kernels + provided by Linux distributions.) The bridge module must not be + loaded or in use. If the bridge module is running (check with + "lsmod | grep bridge"), you must remove it ("rmmod bridge") + before starting the datapath. + + - The correct version of GCC for the kernel that you are building + the module against: + + * To build a kernel module for a Linux 2.6 kernel, you need + the same version of GCC that was used to build that kernel + (usually version 4.0 or later). + + * To build a kernel module for a Linux 2.4 kernel, you need an + earlier version of GCC, typically GCC 2.95, 3.3, or 3.4. + + - A kernel build directory corresponding to the Linux kernel image + the module is to run on. Under Debian and Ubuntu, for example, + each linux-image package containing a kernel binary has a + corresponding linux-headers package with the required build + infrastructure. + +To build the kernel module, follow the build process described under +"Building Userspace Programs" above, but pass the location of the +kernel build directory as an additional argument to the configure +script, as described under step 1 in that section. Specify the +location on --with-l26 for Linux 2.6, --with-l24 for Linux 2.4. 
For +example, to build for a running instance of Linux 2.6: + + % ./configure --with-l26=/lib/modules/`uname -r`/build + +To build for a running instance of Linux 2.4: + + % ./configure --with-l24=/lib/modules/`uname -r`/build + +In addition to the binaries listed under step 2 in "Building Userspace +Programs" above, "make" will build the following kernel modules: + + datapath/linux-2.6/openflow_mod.ko (if --with-l26 was specified) + datapath/linux-2.4/openflow_mod.o (if --with-l24 was specified) + +Once you have built the kernel modules, activating them requires only +running "insmod", e.g.: + + (Linux 2.6) + % insmod datapath/linux-2.6/openflow_mod.ko + + (Linux 2.4) + % insmod datapath/linux-2.4/compat24_mod.o + % insmod datapath/linux-2.4/openflow_mod.o + +The insmod program must be run as root. You may need to specify a +full path to insmod, which is usually in the /sbin directory. To +verify that the modules have been loaded, run "lsmod" (also in /sbin) +and check that openflow_mod appears in the result. + +Testing the Kernel-Based Implementation +--------------------------------------- + +The OpenFlow kernel module must be loaded, as described in the +previous section, before it may be tested. + +1. Create a datapath instance. The command below creates a datapath with + ID 0 (see dpctl(8) for more detailed usage information). + + % dpctl adddp 0 + + (In principle, openflow_mod supports multiple datapaths within the + same host, but this is rarely useful in practice.) + +2. Use dpctl to attach the datapath to physical interfaces on the + machine. Say, for example, you want to create a trivial 2-port + switch using interfaces eth1 and eth2, you would issue the following + commands: + + % dpctl addif 0 eth1 + % dpctl addif 0 eth2 + + You can verify that the interfaces were successfully added by asking + dpctl to print the current status of datapath 0: + + % dpctl show 0 + +3. 
(Optional) You can manually add flows to the datapath to test using + dpctl add-flows and view them using dpctl dump-flows. See dpctl(8) + for more details. + +4. The simplest way to test the datapath is to run the provided sample + controller on the host machine to manage the datapath directly using + netlink: + + % controller -v nl:0 + + Once the controller is running, the datapath should operate like a + learning Ethernet switch. You may monitor the flows in the datapath + flow table using "dpctl dump-flows" command. + +The preceding instructions assume that the controller and the switch +are running on the same machine. This is an easy configuration for +testing, but a more conventional setup would run a controller on one +machine and one or more switches on different machines. Use the +following instructions to set up remote switches: + +1. Start the datapath and attach it to two or more physical ports as + described in the previous section. + + Note: The current version of the switch and controller requires + that they be connected through a "control network" that is + physically separate from the one that they are controlling. Future + releases will support in-band control communication. + +2. Run the controller in passive tcp mode on the host which will act as + the controller. In the example below, the controller will bind to + port 975 (the default) awaiting connections from secure channels. + + % controller -v ptcp: + + (See controller(8) for more details) + + Make sure the machine hosting the controller is reachable by the switch. + +3. Run secchan on the datapath host to start the secure channel + connecting the datapath to a remote controller. (See secchan(8) + for usage details). The channel should be configured to connect to + the controller's IP address on the port configured in step 2. 
+ + If the controller is running on host 192.168.1.2 port 975 (the + default port) and the datapath ID is 0, the secchan invocation + would look like: + + % secchan -v nl:0 tcp:192.168.1.2 Bug Reporting ------------- diff --git a/Makefile.am b/Makefile.am index aee1a9dc..6306ab4c 100644 --- a/Makefile.am +++ b/Makefile.am @@ -1,2 +1,6 @@ AUTOMAKE_OPTIONS=foreign -SUBDIRS = lib datapath secchan controller utilities man include third-party +SUBDIRS = lib datapath secchan controller +if HAVE_IF_PACKET +SUBDIRS += switch +endif +SUBDIRS += utilities man include third-party diff --git a/README b/README index 36f91878..cb0a9fc1 100644 --- a/README +++ b/README @@ -30,17 +30,22 @@ Specification [2]. What's here? ------------ -This distribution includes a Linux-specific reference implementation -of an OpenFlow switch, comprising: +This distribution includes two different reference implementations of +an OpenFlow switch. The first implementation, which is closely tied +to Linux because it is partially implemented in the Linux kernel, has +the following components: - A Linux kernel module that implements the flow table and - OpenFlow protocol. + OpenFlow protocol, in the datapath directory. - secchan, a program that implements the secure channel component of the reference switch. - dpctl, a tool for configuring the kernel module. +The second implementation is a single user-space program, named +"switch", that integrates all three parts of an OpenFlow switch. + This distribution includes some additional software as well: - controller, a simple program that connects to any number of @@ -63,11 +68,11 @@ directory. Platform support ---------------- -Other than the Linux kernel module, the software in the OpenFlow -distribution should compile under Unix-like environments such as -Linux, FreeBSD, Mac OS X, and Solaris. Our primary test environment -is Debian GNU/Linux. Please contact us with portability-related bug -reports or patches. 
+Other than the Linux kernel module and userspace switch +implementation, the software in the OpenFlow distribution should +compile under Unix-like environments such as Linux, FreeBSD, Mac OS X, +and Solaris. Our primary test environment is Debian GNU/Linux. +Please contact us with portability-related bug reports or patches. The Linux kernel module is, of course, Linux-specific, and the secchan and dpctl utilities will not be as useful without the kernel module. @@ -75,6 +80,11 @@ The testing of the kernel module has focused on Linux 2.6.23. Linux 2.6 releases from 2.6.15 onward and Linux 2.4 releases from 2.4.20 onward should also work. +The userspace switch implementation should be easy to port to +Unix-like systems. The interface to network devices, in netdev.c, is +the only code that should need to change. So far, only Linux is +supported. We welcome ports to other platforms. + GCC is the expected compiler. Bugs/Shortcomings diff --git a/configure.ac b/configure.ac index cd648f34..b03d669f 100644 --- a/configure.ac +++ b/configure.ac @@ -34,6 +34,15 @@ if test "$HAVE_NETLINK" = yes; then [Define to 1 if Netlink protocol is available.]) fi +AC_CHECK_HEADER([net/if_packet.h], + [HAVE_IF_PACKET=yes], + [HAVE_IF_PACKET=no]) +AM_CONDITIONAL([HAVE_IF_PACKET], [test "$HAVE_IF_PACKET" = yes]) +if test "$HAVE_IF_PACKET" = yes; then + AC_DEFINE([HAVE_IF_PACKET], [1], + [Define to 1 if net/if_packet.h is available.]) +fi + PKG_CHECK_MODULES([SSL], [libssl], [HAVE_OPENSSL=yes], [HAVE_OPENSSL=no @@ -62,6 +71,7 @@ include/Makefile controller/Makefile utilities/Makefile secchan/Makefile +switch/Makefile datapath/tests/Makefile third-party/Makefile datapath/linux-2.6/Makefile diff --git a/man/man8/switch.8 b/man/man8/switch.8 new file mode 100644 index 00000000..78f9e571 --- /dev/null +++ b/man/man8/switch.8 @@ -0,0 +1,99 @@ +.TH switch 8 "March 2008" "OpenFlow" "OpenFlow Manual" + +.SH NAME +switch \- userspace implementation of OpenFlow switch + +.SH SYNOPSIS +.B switch 
+[\fIoptions\fR] +\fB-i\fR \fInetdev\fR[\fB,\fInetdev\fR]... +\fIcontroller\fR + +.SH DESCRIPTION +The \fBswitch\fR is a userspace implementation of an OpenFlow switch. +It implements all three parts of the OpenFlow switch specification: a +``flow table'' in which each flow entry is associated with an action +telling the switch how to process the flow; a ``secure channel'' +connecting the switch to a remote process (a controller), allowing +commands and packets to be sent between the controller and the switch; +and an OpenFlow protocol implementation. + +\fBswitch\fR monitors one or more network device interfaces, +forwarding packets between them according to the entries in the flow +table. It also maintains a connection to an OpenFlow controller over +a TCP or SSL connection, relaying packets that do not match a flow +table entry to the controller and executing commands sent by the +controller. + +For access to network devices, the switch program must normally run as +root. + +The mandatory \fIcontroller\fR argument specifies how to connect to +the OpenFlow controller. It takes one of the following forms: + +.TP +\fBtcp:\fIhost\fR[\fB:\fIport\fR] +The specified TCP \fIport\fR (default: 975) on the given remote +\fIhost\fR. + +.TP +\fBssl:\fIhost\fR[\fB:\fIport\fR] +The specified SSL \fIport\fR (default: 976) on the given remote +\fIhost\fR. The \fB--private-key\fR, \fB--certificate\fR, and +\fB--ca-cert\fR options are mandatory when this form is used. + +.SH OPTIONS +.TP +\fB-i\fR, \fB--interfaces=\fR\fInetdev\fR[\fB,\fInetdev\fR]... +Specifies each \fInetdev\fR (e.g., \fBeth0\fR) as a switch port. The +specified network devices should not have any configured IP addresses. +This option may be given any number of times to specify additional +network devices. + +.TP +\fB-d\fR, \fB--datapath-id=\fIdpid\fR +Specifies the OpenFlow switch ID (a 48-bit number that uniquely +identifies a switch) as \fIdpid\fR, which consists of exactly 12 +hex digits. 
Without this option, \fBswitch\fR picks an ID randomly. + +.TP +\fB-p\fR, \fB--private-key=\fIprivkey.pem\fR +Specifies a PEM file containing the private key used as the switch's +identity for SSL connections to the controller. + +.TP +\fB-c\fR, \fB--certificate=\fIcert.pem\fR +Specifies a PEM file containing a certificate, signed by the +controller's certificate authority (CA), that certifies the switch's +private key to identify a trustworthy switch. + +.TP +\fB-C\fR, \fB--ca-cert=\fIcacert.pem\fR +Specifies a PEM file containing the CA certificate used to verify that +the switch is connected to a trustworthy controller. + +.TP +.BR \-h ", " \-\^\-help +Prints a brief help message to the console. + +.TP +.BR \-u ", " \-\^\-unreliable +Do not attempt to reconnect the channel if a connection drops. By +default, \fBswitch\fR attempts to reconnect. + +.TP +.BR \-v ", " \-\^\-verbose +Prints debug messages to the console. + +.TP +.BR \-V ", " \-\^\-version +Prints version information to the console. 
+ +.SH "SEE ALSO" + +.BR dpctl (8), +.BR controller (8) +.BR vlogconf (8) + +.SH BUGS +Currently \fBsecchan\fR does not support SSL diff --git a/switch/.gitignore b/switch/.gitignore new file mode 100644 index 00000000..2a986817 --- /dev/null +++ b/switch/.gitignore @@ -0,0 +1,3 @@ +/Makefile +/Makefile.in +/switch diff --git a/switch/Makefile.am b/switch/Makefile.am new file mode 100644 index 00000000..5e84a996 --- /dev/null +++ b/switch/Makefile.am @@ -0,0 +1,26 @@ +include ../Make.vars + +bin_PROGRAMS = switch + +switch_SOURCES = \ + chain.c \ + chain.h \ + controller.c \ + controller.h \ + crc32.c \ + crc32.h \ + datapath.c \ + datapath.h \ + forward.c \ + forward.h \ + netdev.c \ + netdev.h \ + switch.c \ + switch-flow.c \ + switch-flow.h \ + table.h \ + table-hash.c \ + table-linear.c \ + table-mac.c + +switch_LDADD = ../lib/libopenflow.la diff --git a/switch/chain.c b/switch/chain.c new file mode 100644 index 00000000..516c9304 --- /dev/null +++ b/switch/chain.c @@ -0,0 +1,174 @@ +/* Copyright (C) 2008 Board of Trustees, Leland Stanford Jr. University. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +#include "chain.h" +#include +#include +#include +#include "switch-flow.h" +#include "table.h" + +#define THIS_MODULE VLM_chain +#include "vlog.h" + +/* Attempts to append 'table' to the set of tables in 'chain'. Returns 0 or + * negative error. If 'table' is null it is assumed that table creation failed + * due to out-of-memory. */ +static int add_table(struct sw_chain *chain, struct sw_table *table) +{ + if (table == NULL) + return -ENOMEM; + if (chain->n_tables >= CHAIN_MAX_TABLES) { + VLOG_ERR("too many tables in chain\n"); + table->destroy(table); + return -ENOBUFS; + } + chain->tables[chain->n_tables++] = table; + return 0; +} + +/* Creates and returns a new chain. Returns NULL if the chain cannot be + * created. */ +struct sw_chain *chain_create(void) +{ + struct sw_chain *chain = calloc(1, sizeof *chain); + if (chain == NULL) + return NULL; + + if (add_table(chain, table_mac_create(TABLE_MAC_NUM_BUCKETS, + TABLE_MAC_MAX_FLOWS)) + || add_table(chain, table_hash2_create(0x1EDC6F41, TABLE_HASH_MAX_FLOWS, + 0x741B8CD7, TABLE_HASH_MAX_FLOWS)) + || add_table(chain, table_linear_create(TABLE_LINEAR_MAX_FLOWS))) { + chain_destroy(chain); + return NULL; + } + + return chain; +} + +/* Searches 'chain' for a flow matching 'key', which must not have any wildcard + * fields. Returns the flow if successful, otherwise a null pointer. 
*/ +struct sw_flow * +chain_lookup(struct sw_chain *chain, const struct sw_flow_key *key) +{ + int i; + + assert(!key->wildcards); + for (i = 0; i < chain->n_tables; i++) { + struct sw_table *t = chain->tables[i]; + struct sw_flow *flow = t->lookup(t, key); + if (flow) + return flow; + } + return NULL; +} + +/* Inserts 'flow' into 'chain', replacing any duplicate flow. Returns 0 if + * successful or a negative error. + * + * If successful, 'flow' becomes owned by the chain, otherwise it is retained + * by the caller. */ +int +chain_insert(struct sw_chain *chain, struct sw_flow *flow) +{ + int i; + + for (i = 0; i < chain->n_tables; i++) { + struct sw_table *t = chain->tables[i]; + if (t->insert(t, flow)) + return 0; + } + + return -ENOBUFS; +} + +/* Deletes from 'chain' any and all flows that match 'key'. Returns the number + * of flows that were deleted. + * + * Expensive in the general case as currently implemented, since it requires + * iterating through the entire contents of each table for keys that contain + * wildcards. Relatively cheap for fully specified keys. + * + * The caller need not hold any locks. */ +int +chain_delete(struct sw_chain *chain, const struct sw_flow_key *key, int strict) +{ + int count = 0; + int i; + + for (i = 0; i < chain->n_tables; i++) { + struct sw_table *t = chain->tables[i]; + count += t->delete(t, key, strict); + } + + return count; + +} + +/* Performs timeout processing on all the tables in 'chain'. Returns the + * number of flow entries deleted through expiration. + * + * Expensive as currently implemented, since it iterates through the entire + * contents of each table. + * + * The caller need not hold any locks. */ +int +chain_timeout(struct sw_chain *chain, struct datapath *dp) +{ + int count = 0; + int i; + + for (i = 0; i < chain->n_tables; i++) { + struct sw_table *t = chain->tables[i]; + count += t->timeout(dp, t); + } + return count; +} + +/* Destroys 'chain', which must not have any users. 
*/ +void +chain_destroy(struct sw_chain *chain) +{ + int i; + + for (i = 0; i < chain->n_tables; i++) { + struct sw_table *t = chain->tables[i]; + t->destroy(t); + } + free(chain); +} + +/* Prints statistics for each of the tables in 'chain'. */ +void +chain_print_stats(struct sw_chain *chain) +{ + int i; + + printf("\n"); + for (i = 0; i < chain->n_tables; i++) { + struct sw_table *t = chain->tables[i]; + struct sw_table_stats stats; + t->stats(t, &stats); + printf("%s: %lu/%lu flows\n", + stats.name, stats.n_flows, stats.max_flows); + } +} diff --git a/switch/chain.h b/switch/chain.h new file mode 100644 index 00000000..b1c70cf4 --- /dev/null +++ b/switch/chain.h @@ -0,0 +1,49 @@ +/* Copyright (C) 2008 Board of Trustees, Leland Stanford Jr. University. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. 
+ */ + +#ifndef CHAIN_H +#define CHAIN_H 1 + +struct sw_flow; +struct sw_flow_key; +struct datapath; + +#define TABLE_LINEAR_MAX_FLOWS 100 +#define TABLE_HASH_MAX_FLOWS 65536 +#define TABLE_MAC_MAX_FLOWS 1024 +#define TABLE_MAC_NUM_BUCKETS 1024 + +/* Set of tables chained together in sequence from cheap to expensive. */ +#define CHAIN_MAX_TABLES 4 +struct sw_chain { + int n_tables; + struct sw_table *tables[CHAIN_MAX_TABLES]; +}; + +struct sw_chain *chain_create(void); +struct sw_flow *chain_lookup(struct sw_chain *, const struct sw_flow_key *); +int chain_insert(struct sw_chain *, struct sw_flow *); +int chain_delete(struct sw_chain *, const struct sw_flow_key *, int); +int chain_timeout(struct sw_chain *, struct datapath *); +void chain_destroy(struct sw_chain *); +void chain_print_stats(struct sw_chain *); + +#endif /* chain.h */ diff --git a/switch/controller.c b/switch/controller.c new file mode 100644 index 00000000..91493ade --- /dev/null +++ b/switch/controller.c @@ -0,0 +1,180 @@ +/* Copyright (C) 2008 Board of Trustees, Leland Stanford Jr. University. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +#include "controller.h" +#include +#include +#include "buffer.h" +#include "forward.h" +#include "poll-loop.h" +#include "ofp-print.h" +#include "util.h" +#include "vconn.h" + +#define THIS_MODULE VLM_controller_connection +#include "vlog.h" + +void +controller_init(struct controller_connection *cc, + const char *name, bool reliable) +{ + cc->reliable = reliable; + cc->name = name; + cc->vconn = NULL; + queue_init(&cc->txq); + cc->backoff_deadline = 0; + cc->backoff = 0; +} + +static int +try_send(struct controller_connection *cc) +{ + int retval = 0; + struct buffer *next = cc->txq.head->next; + retval = vconn_send(cc->vconn, cc->txq.head); + if (retval) { + return retval; + } + queue_advance_head(&cc->txq, next); + return 0; +} + +void +controller_run(struct controller_connection *cc, struct datapath *dp) +{ + if (!cc->vconn) { + if (time(0) >= cc->backoff_deadline) { + int retval; + + retval = vconn_open(cc->name, &cc->vconn); + if (!retval) { + cc->backoff_deadline = time(0) + cc->backoff; + cc->connected = false; + } else { + VLOG_WARN("%s: connection failed (%s)", + cc->name, strerror(retval)); + controller_disconnect(cc, 0); + } + } + } else if (!cc->connected) { + int error = vconn_connect(cc->vconn); + if (!error) { + VLOG_WARN("%s: connected", cc->name); + if (vconn_is_passive(cc->vconn)) { + fatal(0, "%s: passive vconn not supported in switch", + cc->name); + } + cc->connected = true; + } else if (error != EAGAIN) { + VLOG_WARN("%s: connection failed (%s)", + cc->name, strerror(error)); + controller_disconnect(cc, 0); + } + } else { + int iterations; + + for (iterations = 0; iterations < 50; iterations++) { + struct buffer *buffer; + int error = vconn_recv(cc->vconn, &buffer); 
+ if (!error) { + fwd_control_input(dp, buffer->data, buffer->size); + buffer_delete(buffer); + } else if (error == EAGAIN) { + break; + } else { + controller_disconnect(cc, error); + return; + } + } + + while (cc->txq.n > 0) { + int error = try_send(cc); + if (error == EAGAIN) { + break; + } else if (error) { + controller_disconnect(cc, error); + return; + } + } + } +} + +void +controller_disconnect(struct controller_connection *cc, int error) +{ + time_t now = time(0); + + if (cc->vconn) { + if (!cc->reliable) { + fatal(0, "%s: connection dropped", cc->name); + } + + if (error > 0) { + VLOG_WARN("%s: connection dropped (%s)", + cc->name, strerror(error)); + } else if (error == EOF) { + VLOG_WARN("%s: connection closed", cc->name); + } else { + VLOG_WARN("%s: connection dropped", cc->name); + } + vconn_close(cc->vconn); + cc->vconn = NULL; + queue_clear(&cc->txq); + } + + if (now >= cc->backoff_deadline) { + cc->backoff = 1; + } else { + cc->backoff = MIN(60, MAX(1, 2 * cc->backoff)); + VLOG_WARN("%s: waiting %d seconds before reconnect\n", + cc->name, cc->backoff); + } + cc->backoff_deadline = now + cc->backoff; +} + +void +controller_wait(struct controller_connection *cc) +{ + if (cc->vconn) { + vconn_wait(cc->vconn, WAIT_RECV); + if (cc->txq.n) { + vconn_wait(cc->vconn, WAIT_SEND); + } + } else { + poll_timer_wait((cc->backoff_deadline - time(0)) * 1000); + } +} + +void +controller_send(struct controller_connection *cc, struct buffer *b) +{ + if (cc->vconn) { + if (cc->txq.n < 128) { + queue_push_tail(&cc->txq, b); + if (cc->txq.n == 1) { + try_send(cc); + } + } else { + VLOG_WARN("%s: controller queue overflow", cc->name); + buffer_delete(b); + } + } +} diff --git a/switch/controller.h b/switch/controller.h new file mode 100644 index 00000000..18672de8 --- /dev/null +++ b/switch/controller.h @@ -0,0 +1,49 @@ +/* Copyright (C) 2008 Board of Trustees, Leland Stanford Jr. University. 
+ * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. 
+ */ + +#ifndef CONTROLLER_H +#define CONTROLLER_H 1 + +#include "queue.h" +#include +#include + +struct datapath; + +struct controller_connection { + bool reliable; + const char *name; + struct vconn *vconn; + bool connected; + struct queue txq; + time_t backoff_deadline; + int backoff; +}; + +void controller_init(struct controller_connection *, + const char *name, bool reliable); +void controller_run(struct controller_connection *, struct datapath *); +void controller_connect(struct controller_connection *); +void controller_disconnect(struct controller_connection *, int error); +void controller_wait(struct controller_connection *); +void controller_send(struct controller_connection *, struct buffer *); + +#endif /* controller.h */ diff --git a/switch/crc32.c b/switch/crc32.c new file mode 100644 index 00000000..f927dc00 --- /dev/null +++ b/switch/crc32.c @@ -0,0 +1,55 @@ +/* Copyright (C) 2008 Board of Trustees, Leland Stanford Jr. University. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +#include "crc32.h" + +void +crc32_init(struct crc32 *crc, unsigned int polynomial) +{ + int i; + + for (i = 0; i < CRC32_TABLE_SIZE; ++i) { + unsigned int reg = i << 24; + int j; + for (j = 0; j < CRC32_TABLE_BITS; j++) { + int topBit = (reg & 0x80000000) != 0; + reg <<= 1; + if (topBit) + reg ^= polynomial; + } + crc->table[i] = reg; + } +} + +unsigned int +crc32_calculate(const struct crc32 *crc, const void *data_, size_t n_bytes) +{ + const uint8_t *data = data_; + unsigned int result = 0; + size_t i; + + for (i = 0; i < n_bytes; i++) { + unsigned int top = result >> 24; + top ^= data[i]; + result = (result << 8) ^ crc->table[top]; + } + return result; +} diff --git a/switch/crc32.h b/switch/crc32.h new file mode 100644 index 00000000..2cbf2aa3 --- /dev/null +++ b/switch/crc32.h @@ -0,0 +1,38 @@ +/* Copyright (C) 2008 Board of Trustees, Leland Stanford Jr. University. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +#ifndef CRC32_H +#define CRC32_H 1 + +#include +#include + +#define CRC32_TABLE_BITS 8 +#define CRC32_TABLE_SIZE (1u << CRC32_TABLE_BITS) + +struct crc32 { + unsigned int table[CRC32_TABLE_SIZE]; +}; + +void crc32_init(struct crc32 *, unsigned int polynomial); +unsigned int crc32_calculate(const struct crc32 *, const void *, size_t); + +#endif /* crc32.h */ diff --git a/switch/datapath.c b/switch/datapath.c new file mode 100644 index 00000000..20e33cc7 --- /dev/null +++ b/switch/datapath.c @@ -0,0 +1,388 @@ +/* Copyright (C) 2008 Board of Trustees, Leland Stanford Jr. University. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. 
+ */ + +#include "datapath.h" +#include +#include +#include +#include +#include +#include "buffer.h" +#include "chain.h" +#include "controller.h" +#include "flow.h" +#include "forward.h" +#include "netdev.h" +#include "packets.h" +#include "poll-loop.h" +#include "table.h" +#include "xtoxll.h" + +#define THIS_MODULE VLM_datapath +#include "vlog.h" + +#define BRIDGE_PORT_NO_FLOOD 0x00000001 + +static void send_port_status(struct sw_port *p, uint8_t status); +static void del_switch_port(struct sw_port *p); +static int port_no(struct datapath *dp, struct sw_port *p) +{ + assert(p >= dp->ports && p < &dp->ports[ARRAY_SIZE(dp->ports)]); + return p - dp->ports; +} + +/* Generates a unique datapath id. It incorporates the datapath index + * and a hardware address, if available. If not, it generates a random + * one. + */ +static uint64_t +gen_datapath_id(void) +{ + /* Choose a random datapath id. */ + uint64_t id = 0; + int i; + + srand(time(0)); + + for (i = 0; i < ETH_ADDR_LEN; i++) { + id |= (uint64_t)(rand() & 0xff) << (8*(ETH_ADDR_LEN-1 - i)); + } + + return id; +} + +int +dp_new(struct datapath **dp_, uint64_t dpid, struct controller_connection *cc) +{ + struct datapath *dp; + + dp = calloc(1, sizeof *dp); + if (!dp) { + return ENOMEM; + } + + dp->last_timeout = time(0); + dp->cc = cc; + dp->id = dpid <= UINT64_C(0xffffffffffff) ? 
dpid : gen_datapath_id(); + dp->chain = chain_create(); + if (!dp->chain) { + VLOG_ERR("could not create chain"); + free(dp); + return ENOMEM; + } + + list_init(&dp->port_list); + dp->miss_send_len = OFP_DEFAULT_MISS_SEND_LEN; + *dp_ = dp; + return 0; +} + +int +dp_add_port(struct datapath *dp, const char *name) +{ + struct netdev *netdev; + struct sw_port *p; + int error; + + error = netdev_open(name, &netdev); + if (error) { + return error; + } + + for (p = dp->ports; ; p++) { + if (p >= &dp->ports[ARRAY_SIZE(dp->ports)]) { + return EXFULL; + } else if (!p->netdev) { + break; + } + } + + p->dp = dp; + p->netdev = netdev; + list_push_back(&dp->port_list, &p->node); + + /* Notify the ctlpath that this port has been added */ + send_port_status(p, OFPPR_ADD); + + return 0; +} + +void +dp_run(struct datapath *dp) +{ + time_t now = time(0); + struct sw_port *p, *n; + struct buffer *buffer = NULL; + + if (now != dp->last_timeout) { + chain_timeout(dp->chain, dp); + dp->last_timeout = now; + } + poll_timer_wait(1000); + + LIST_FOR_EACH_SAFE (p, n, struct sw_port, node, &dp->port_list) { + int error; + + if (!buffer) { + /* Allocate buffer with some headroom to add headers in forwarding + * to the controller or adding a vlan tag, plus an extra 2 bytes to + * allow IP headers to be aligned on a 4-byte boundary. */ + const int headroom = 128 + 2; + buffer = buffer_new(ETH_TOTAL_MAX + headroom); + buffer->data += headroom; + } + error = netdev_recv(p->netdev, buffer, false); + if (!error) { + fwd_port_input(dp, buffer, port_no(dp, p)); + buffer = NULL; + } else if (error != EAGAIN) { + VLOG_ERR("Error receiving data from %s: %s", + netdev_get_name(p->netdev), strerror(error)); + del_switch_port(p); + } + } + buffer_delete(buffer); +} + +void +dp_wait(struct datapath *dp) +{ + struct sw_port *p; + + LIST_FOR_EACH (p, struct sw_port, node, &dp->port_list) { + poll_fd_wait(netdev_get_fd(p->netdev), POLLIN, NULL); + } +} + +/* Delete 'p' from switch. 
*/ +static void +del_switch_port(struct sw_port *p) +{ + send_port_status(p, OFPPR_DELETE); + netdev_close(p->netdev); + p->netdev = NULL; + list_remove(&p->node); +} + +void +dp_destroy(struct datapath *dp) +{ + struct sw_port *p, *n; + + if (!dp) { + return; + } + + LIST_FOR_EACH_SAFE (p, n, struct sw_port, node, &dp->port_list) { + del_switch_port(p); + } + chain_destroy(dp->chain); + free(dp); +} + +static int +flood(struct datapath *dp, struct buffer *buffer, int in_port) +{ + struct sw_port *p; + struct sw_port *prev_port; + + prev_port = NULL; + LIST_FOR_EACH (p, struct sw_port, node, &dp->port_list) { + if (port_no(dp, p) == in_port || p->flags & BRIDGE_PORT_NO_FLOOD) { + continue; + } + if (prev_port) { + struct buffer *clone = buffer_clone(buffer); + if (!clone) { + buffer_delete(buffer); + return -ENOMEM; + } + dp_output_port(dp, clone, in_port, port_no(dp, prev_port)); + } + prev_port = p; + } + if (prev_port) + dp_output_port(dp, buffer, in_port, port_no(dp, prev_port)); + else + buffer_delete(buffer); + + return 0; +} + +void +output_packet(struct datapath *dp, struct buffer *buffer, int out_port) +{ + if (out_port >= 0 && out_port < OFPP_MAX) { + struct sw_port *p = &dp->ports[out_port]; + if (p->netdev != NULL) { + /* FIXME: queue packets. */ + netdev_send(p->netdev, buffer, false); + return; + } + } + + buffer_delete(buffer); + /* FIXME: ratelimit */ + VLOG_DBG("can't forward to bad port %d\n", out_port); +} + +/* Takes ownership of 'buffer' and transmits it to 'out_port' on 'dp'. + */ +void +dp_output_port(struct datapath *dp, struct buffer *buffer, + int in_port, int out_port) +{ + + assert(buffer); + if (out_port == OFPP_FLOOD) { + flood(dp, buffer, in_port); + } else if (out_port == OFPP_CONTROLLER) { + dp_output_control(dp, buffer, in_port, fwd_save_buffer(buffer), 0, + OFPR_ACTION); + } else { + output_packet(dp, buffer, out_port); + } +} + +/* Takes ownership of 'buffer' and transmits it to 'dp''s controller. 
If + * 'buffer_id' != -1, then only the first 64 bytes of 'buffer' are sent; + * otherwise, all of 'buffer' is sent. 'reason' indicates why 'buffer' is + * being sent. 'max_len' sets the maximum number of bytes that the caller wants + * to be sent; a value of 0 indicates the entire packet should be sent. */ +void +dp_output_control(struct datapath *dp, struct buffer *buffer, int in_port, + uint32_t buffer_id, size_t max_len, int reason) +{ + struct ofp_packet_in *opi; + size_t total_len; + + total_len = buffer->size; + if (buffer_id != UINT32_MAX && max_len > buffer->size) { + buffer->size = max_len; + } + + opi = buffer_push_uninit(buffer, offsetof(struct ofp_packet_in, data)); + opi->header.version = OFP_VERSION; + opi->header.type = OFPT_PACKET_IN; + opi->header.length = htons(buffer->size); + opi->header.xid = htonl(0); + opi->buffer_id = htonl(buffer_id); + opi->total_len = htons(total_len); + opi->in_port = htons(in_port); + opi->reason = reason; + opi->pad = 0; + controller_send(dp->cc, buffer); +} + +static void fill_port_desc(struct datapath *dp, struct sw_port *p, + struct ofp_phy_port *desc) +{ + desc->port_no = htons(port_no(dp, p)); + strncpy((char *) desc->name, netdev_get_name(p->netdev), + sizeof desc->name); + desc->name[sizeof desc->name - 1] = '\0'; + memcpy(desc->hw_addr, netdev_get_etheraddr(p->netdev), ETH_ADDR_LEN); + desc->flags = htonl(p->flags); + desc->features = htonl(netdev_get_features(p->netdev)); + desc->speed = htonl(netdev_get_speed(p->netdev)); +} + +void +dp_send_hello(struct datapath *dp) +{ + struct buffer *buffer; + struct ofp_data_hello *odh; + struct sw_port *p; + + buffer = buffer_new(sizeof *odh); + odh = buffer_put_uninit(buffer, sizeof *odh); + memset(odh, 0, sizeof *odh); + odh->header.version = OFP_VERSION; + odh->header.type = OFPT_DATA_HELLO; + odh->header.xid = htonl(0); + odh->datapath_id = htonll(dp->id); + odh->n_exact = htonl(2 * TABLE_HASH_MAX_FLOWS); + odh->n_mac_only = htonl(TABLE_MAC_MAX_FLOWS); + 
odh->n_compression = 0; /* Not supported */ + odh->n_general = htonl(TABLE_LINEAR_MAX_FLOWS); + odh->buffer_mb = htonl(UINT32_MAX); + odh->n_buffers = htonl(N_PKT_BUFFERS); + odh->capabilities = htonl(OFP_SUPPORTED_CAPABILITIES); + odh->actions = htonl(OFP_SUPPORTED_ACTIONS); + odh->miss_send_len = htons(dp->miss_send_len); + LIST_FOR_EACH (p, struct sw_port, node, &dp->port_list) { + struct ofp_phy_port *opp = buffer_put_uninit(buffer, sizeof *opp); + memset(opp, 0, sizeof *opp); + fill_port_desc(dp, p, opp); + } + odh = buffer_at_assert(buffer, 0, sizeof *odh); + odh->header.length = htons(buffer->size); + controller_send(dp->cc, buffer); +} + +void +dp_update_port_flags(struct datapath *dp, const struct ofp_phy_port *opp) +{ + struct sw_port *p; + + p = &dp->ports[htons(opp->port_no)]; + + /* Make sure the port id hasn't changed since this was sent */ + if (!p || memcmp(opp->hw_addr, netdev_get_etheraddr(p->netdev), + ETH_ADDR_LEN) != 0) + return; + + p->flags = htonl(opp->flags); +} + +static void +send_port_status(struct sw_port *p, uint8_t status) +{ + struct buffer *buffer; + struct ofp_port_status *ops; + buffer = buffer_new(sizeof *ops); + ops = buffer_put_uninit(buffer, sizeof *ops); + ops->header.version = OFP_VERSION; + ops->header.type = OFPT_PORT_STATUS; + ops->header.length = htons(sizeof(*ops)); + ops->header.xid = htonl(0); + ops->reason = status; + fill_port_desc(p->dp, p, &ops->desc); + controller_send(p->dp->cc, buffer); +} + +void +dp_send_flow_expired(struct datapath *dp, struct sw_flow *flow) +{ + struct buffer *buffer; + struct ofp_flow_expired *ofe; + buffer = buffer_new(sizeof *ofe); + ofe = buffer_put_uninit(buffer, sizeof *ofe); + ofe->header.version = OFP_VERSION; + ofe->header.type = OFPT_FLOW_EXPIRED; + ofe->header.length = htons(sizeof(*ofe)); + ofe->header.xid = htonl(0); + flow_fill_match(&ofe->match, &flow->key); + ofe->duration = htonl(flow->timeout - flow->max_idle - flow->created); + ofe->packet_count = 
htonll(flow->packet_count); + ofe->byte_count = htonll(flow->byte_count); + controller_send(dp->cc, buffer); +} diff --git a/switch/datapath.h b/switch/datapath.h new file mode 100644 index 00000000..5c21a6b2 --- /dev/null +++ b/switch/datapath.h @@ -0,0 +1,89 @@ +/* Copyright (C) 2008 Board of Trustees, Leland Stanford Jr. University. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +/* Interface exported by OpenFlow module. */ + +#ifndef DATAPATH_H +#define DATAPATH_H 1 + +#include +#include "openflow.h" +#include "switch-flow.h" +#include "buffer.h" +#include "list.h" + +#define NL_FLOWS_PER_MESSAGE 100 + +/* Capabilities supported by this implementation. */ +#define OFP_SUPPORTED_CAPABILITIES (OFPC_MULTI_PHY_TX) + +/* Actions supported by this implementation. 
*/ +#define OFP_SUPPORTED_ACTIONS ( (1 << OFPAT_OUTPUT) \ + | (1 << OFPAT_SET_DL_VLAN) \ + | (1 << OFPAT_SET_DL_SRC) \ + | (1 << OFPAT_SET_DL_DST) \ + | (1 << OFPAT_SET_NW_SRC) \ + | (1 << OFPAT_SET_NW_DST) \ + | (1 << OFPAT_SET_TP_SRC) \ + | (1 << OFPAT_SET_TP_DST) ) + +struct sw_port { + uint32_t flags; + struct datapath *dp; + struct netdev *netdev; + struct list node; /* Element in datapath.ports. */ +}; + +struct datapath { + struct controller_connection *cc; + + time_t last_timeout; + + /* Unique identifier for this datapath */ + uint64_t id; + + struct sw_chain *chain; /* Forwarding rules. */ + + /* Flags from the control hello message */ + uint16_t hello_flags; + + /* Maximum number of bytes that should be sent for flow misses */ + uint16_t miss_send_len; + + /* Switch ports. */ + struct sw_port ports[OFPP_MAX]; + struct list port_list; /* List of ports, for flooding. */ +}; + +int dp_new(struct datapath **, uint64_t dpid, struct controller_connection *); +int dp_add_port(struct datapath *, const char *netdev); +void dp_run(struct datapath *); +void dp_wait(struct datapath *); + +void dp_output_port(struct datapath *, struct buffer *, + int in_port, int out_port); +void dp_output_control(struct datapath *, struct buffer *, int in_port, + uint32_t buffer_id, size_t max_len, int reason); +void dp_send_hello(struct datapath *); +void dp_send_flow_expired(struct datapath *, struct sw_flow *); +void dp_update_port_flags(struct datapath *dp, const struct ofp_phy_port *opp); + +#endif /* datapath.h */ diff --git a/switch/forward.c b/switch/forward.c new file mode 100644 index 00000000..694a65e8 --- /dev/null +++ b/switch/forward.c @@ -0,0 +1,532 @@ +/* Copyright (C) 2008 Board of Trustees, Leland Stanford Jr. University. 
+ * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +#include "forward.h" +#include +#include +#include +#include +#include +#include "datapath.h" +#include "chain.h" +#include "flow.h" +#include "packets.h" + +static void execute_actions(struct datapath *, struct buffer *, + int in_port, const struct sw_flow_key *, + const struct ofp_action *, int n_actions); + +static struct buffer *retrieve_buffer(uint32_t id); +static void discard_buffer(uint32_t id); + +/* 'buffer' was received on 'in_port', a physical switch port between 0 and + * OFPP_MAX. Process it according to 'chain'. 
*/ +void fwd_port_input(struct datapath *dp, struct buffer *buffer, int in_port) +{ + struct sw_flow_key key; + struct sw_flow *flow; + + key.wildcards = 0; + flow_extract(buffer, in_port, &key.flow); + flow = chain_lookup(dp->chain, &key); + if (flow != NULL) { + flow_used(flow, buffer); + execute_actions(dp, buffer, in_port, &key, + flow->actions, flow->n_actions); + } else { + dp_output_control(dp, buffer, in_port, fwd_save_buffer(buffer), + dp->miss_send_len, OFPR_NO_MATCH); + } +} + +static void +do_output(struct datapath *dp, struct buffer *buffer, int in_port, + size_t max_len, int out_port) +{ + if (out_port != OFPP_CONTROLLER) { + dp_output_port(dp, buffer, in_port, out_port); + } else { + dp_output_control(dp, buffer, in_port, fwd_save_buffer(buffer), + max_len, OFPR_ACTION); + } +} + +static void execute_actions(struct datapath *dp, struct buffer *buffer, + int in_port, const struct sw_flow_key *key, + const struct ofp_action *actions, int n_actions) +{ + /* Every output action needs a separate clone of 'buffer', but the common + * case is just a single output action, so that doing a clone and then + * freeing the original buffer is wasteful. So the following code is + * slightly obscure just to avoid that. 
*/ + int prev_port; + size_t max_len=0; /* Initialize to make compiler happy */ + uint16_t eth_proto; + int i; + + prev_port = -1; + eth_proto = ntohs(key->flow.dl_type); + + for (i = 0; i < n_actions; i++) { + const struct ofp_action *a = &actions[i]; + + if (prev_port != -1) { + do_output(dp, buffer_clone(buffer), in_port, max_len, prev_port); + prev_port = -1; + } + + if (a->type == ntohs(OFPAT_OUTPUT)) { + prev_port = ntohs(a->arg.output.port); + max_len = ntohs(a->arg.output.max_len); + } else { + buffer = execute_setter(buffer, eth_proto, key, a); + } + } + if (prev_port != -1) + do_output(dp, buffer, in_port, max_len, prev_port); + else + buffer_delete(buffer); +} + +/* Returns the new checksum for a packet in which the checksum field previously + * contained 'old_csum' and in which a field that contained 'old_u16' was + * changed to contain 'new_u16'. */ +static uint16_t +recalc_csum16(uint16_t old_csum, uint16_t old_u16, uint16_t new_u16) +{ + /* Ones-complement arithmetic is endian-independent, so this code does not + * use htons() or ntohs(). + * + * See RFC 1624 for formula and explanation. */ + uint16_t hc_complement = ~old_csum; + uint16_t m_complement = ~old_u16; + uint16_t m_prime = new_u16; + uint32_t sum = hc_complement + m_complement + m_prime; + uint16_t hc_prime_complement = sum + (sum >> 16); + return ~hc_prime_complement; +} + +/* Returns the new checksum for a packet in which the checksum field previously + * contained 'old_csum' and in which a field that contained 'old_u32' was + * changed to contain 'new_u32'.
*/ +static uint16_t +recalc_csum32(uint16_t old_csum, uint32_t old_u32, uint32_t new_u32) +{ + return recalc_csum16(recalc_csum16(old_csum, old_u32, new_u32), + old_u32 >> 16, new_u32 >> 16); +} + +static void modify_nh(struct buffer *buffer, uint16_t eth_proto, + uint8_t nw_proto, const struct ofp_action *a) +{ + if (eth_proto == ETH_TYPE_IP) { + struct ip_header *nh = buffer->l3; + uint32_t new, *field; + + new = a->arg.nw_addr; + field = a->type == OFPAT_SET_NW_SRC ? &nh->ip_src : &nh->ip_dst; + if (nw_proto == IP_TYPE_TCP) { + struct tcp_header *th = buffer->l4; + th->tcp_csum = recalc_csum32(th->tcp_csum, *field, new); + } else if (nw_proto == IP_TYPE_UDP) { + struct udp_header *th = buffer->l4; + if (th->udp_csum) { + th->udp_csum = recalc_csum32(th->udp_csum, *field, new); + if (!th->udp_csum) { + th->udp_csum = 0xffff; + } + } + } + nh->ip_csum = recalc_csum32(nh->ip_csum, *field, new); + *field = new; + } +} + +static void modify_th(struct buffer *buffer, uint16_t eth_proto, + uint8_t nw_proto, const struct ofp_action *a) +{ + if (eth_proto == ETH_TYPE_IP) { + uint16_t new, *field; + + new = a->arg.tp; + + if (nw_proto == IP_TYPE_TCP) { + struct tcp_header *th = buffer->l4; + field = a->type == OFPAT_SET_TP_SRC ? &th->tcp_src : &th->tcp_dst; + th->tcp_csum = recalc_csum16(th->tcp_csum, *field, new); + *field = new; + } else if (nw_proto == IP_TYPE_UDP) { + struct udp_header *th = buffer->l4; + field = a->type == OFPAT_SET_TP_SRC ? 
&th->udp_src : &th->udp_dst; + th->udp_csum = recalc_csum16(th->udp_csum, *field, new); + *field = new; + } + } +} + +static struct buffer * +modify_vlan(struct buffer *buffer, + const struct sw_flow_key *key, const struct ofp_action *a) +{ + uint16_t new_id = a->arg.vlan_id; + struct vlan_eth_header *veh; + + if (new_id != OFP_VLAN_NONE) { + if (key->flow.dl_vlan != htons(OFP_VLAN_NONE)) { + /* Modify vlan id, but maintain other TCI values */ + veh = buffer->l2; + veh->veth_tci &= ~htons(VLAN_VID); + veh->veth_tci |= htons(new_id); + } else { + /* Insert new vlan id. */ + struct eth_header *eh = buffer->l2; + struct vlan_eth_header tmp; + memcpy(tmp.veth_dst, eh->eth_dst, ETH_ADDR_LEN); + memcpy(tmp.veth_src, eh->eth_src, ETH_ADDR_LEN); + tmp.veth_type = htons(ETH_TYPE_VLAN); + tmp.veth_tci = new_id; + tmp.veth_next_type = eh->eth_type; + + veh = buffer_push_uninit(buffer, VLAN_HEADER_LEN); + memcpy(veh, &tmp, sizeof tmp); + buffer->l2 -= VLAN_HEADER_LEN; + } + } else { + /* Remove an existing vlan header if it exists */ + veh = buffer->l2; + if (veh->veth_type == htons(ETH_TYPE_VLAN)) { + struct eth_header tmp; + + memcpy(tmp.eth_dst, veh->veth_dst, ETH_ADDR_LEN); + memcpy(tmp.eth_src, veh->veth_src, ETH_ADDR_LEN); + tmp.eth_type = veh->veth_next_type; + + buffer->size -= VLAN_HEADER_LEN; + buffer->data += VLAN_HEADER_LEN; + buffer->l2 += VLAN_HEADER_LEN; + memcpy(buffer->data, &tmp, sizeof tmp); + } + } + + return buffer; +} + +struct buffer *execute_setter(struct buffer *buffer, uint16_t eth_proto, + const struct sw_flow_key *key, const struct ofp_action *a) +{ + switch (a->type) { + case OFPAT_SET_DL_VLAN: + buffer = modify_vlan(buffer, key, a); + break; + + case OFPAT_SET_DL_SRC: { + struct eth_header *eh = buffer->l2; + memcpy(eh->eth_src, a->arg.dl_addr, sizeof eh->eth_src); + break; + } + case OFPAT_SET_DL_DST: { + struct eth_header *eh = buffer->l2; + memcpy(eh->eth_dst, a->arg.dl_addr, sizeof eh->eth_dst); + break; + } + + case OFPAT_SET_NW_SRC: + case 
OFPAT_SET_NW_DST: + modify_nh(buffer, eth_proto, key->flow.nw_proto, a); + break; + + case OFPAT_SET_TP_SRC: + case OFPAT_SET_TP_DST: + modify_th(buffer, eth_proto, key->flow.nw_proto, a); + break; + + default: + NOT_REACHED(); + } + + return buffer; +} + +static int +recv_control_hello(struct datapath *dp, const void *msg) +{ + const struct ofp_control_hello *och = msg; + + printf("control_hello(version=%d)\n", ntohl(och->version)); + + if (ntohs(och->miss_send_len) != OFP_MISS_SEND_LEN_UNCHANGED) { + dp->miss_send_len = ntohs(och->miss_send_len); + } + + dp->hello_flags = ntohs(och->flags); + + dp_send_hello(dp); + + return 0; +} + +static int +recv_packet_out(struct datapath *dp, const void *msg) +{ + const struct ofp_packet_out *opo = msg; + + if (ntohl(opo->buffer_id) == (uint32_t) -1) { + /* FIXME: can we avoid copying data here? */ + int data_len = ntohs(opo->header.length) - sizeof *opo; + struct buffer *buffer = buffer_new(data_len); + buffer_put(buffer, opo->u.data, data_len); + dp_output_port(dp, buffer, + ntohs(opo->in_port), ntohs(opo->out_port)); + } else { + struct sw_flow_key key; + struct buffer *buffer; + int n_acts; + + buffer = retrieve_buffer(ntohl(opo->buffer_id)); + if (!buffer) { + return -ESRCH; + } + + n_acts = (ntohs(opo->header.length) - sizeof *opo) + / sizeof *opo->u.actions; + flow_extract(buffer, ntohs(opo->in_port), &key.flow); + execute_actions(dp, buffer, ntohs(opo->in_port), + &key, opo->u.actions, n_acts); + } + return 0; +} + +static int +recv_port_mod(struct datapath *dp, const void *msg) +{ + const struct ofp_port_mod *opm = msg; + + dp_update_port_flags(dp, &opm->desc); + + return 0; +} + +static int +add_flow(struct datapath *dp, const struct ofp_flow_mod *ofm) +{ + int error = -ENOMEM; + int n_acts; + struct sw_flow *flow; + + + /* Check number of actions. */ + n_acts = (ntohs(ofm->header.length) - sizeof *ofm) / sizeof *ofm->actions; + if (n_acts > MAX_ACTIONS) { + error = -E2BIG; + goto error; + } + + /* Allocate memory. 
*/ + flow = flow_alloc(n_acts); + if (flow == NULL) + goto error; + + /* Fill out flow. */ + flow_extract_match(&flow->key, &ofm->match); + flow->group_id = ntohl(ofm->group_id); + flow->max_idle = ntohs(ofm->max_idle); + flow->timeout = time(0) + flow->max_idle; /* FIXME */ + flow->n_actions = n_acts; + flow->created = time(0); /* FIXME */ + flow->byte_count = 0; + flow->packet_count = 0; + memcpy(flow->actions, ofm->actions, n_acts * sizeof *flow->actions); + + /* Act. */ + error = chain_insert(dp->chain, flow); + if (error) { + goto error_free_flow; + } + error = 0; + if (ntohl(ofm->buffer_id) != UINT32_MAX) { + struct buffer *buffer = retrieve_buffer(ntohl(ofm->buffer_id)); + if (buffer) { + struct sw_flow_key key; + uint16_t in_port = ntohs(ofm->match.in_port); + flow_used(flow, buffer); + flow_extract(buffer, in_port, &key.flow); + execute_actions(dp, buffer, in_port, + &key, ofm->actions, n_acts); + } else { + error = -ESRCH; + } + } + return error; + +error_free_flow: + flow_free(flow); +error: + if (ntohl(ofm->buffer_id) != (uint32_t) -1) + discard_buffer(ntohl(ofm->buffer_id)); + return error; +} + +static int +recv_flow(struct datapath *dp, const void *msg) +{ + const struct ofp_flow_mod *ofm = msg; + uint16_t command = ntohs(ofm->command); + + if (command == OFPFC_ADD) { + return add_flow(dp, ofm); + } else if (command == OFPFC_DELETE) { + struct sw_flow_key key; + flow_extract_match(&key, &ofm->match); + return chain_delete(dp->chain, &key, 0) ? 0 : -ESRCH; + } else if (command == OFPFC_DELETE_STRICT) { + struct sw_flow_key key; + flow_extract_match(&key, &ofm->match); + return chain_delete(dp->chain, &key, 1) ? 0 : -ESRCH; + } else { + return -ENODEV; + } +} + +/* 'msg', which is 'length' bytes long, was received from the control path. + * Apply it to 'chain'. 
*/ +int +fwd_control_input(struct datapath *dp, const void *msg, size_t length) +{ + + struct openflow_packet { + size_t min_size; + int (*handler)(struct datapath *, const void *); + }; + + static const struct openflow_packet packets[] = { + [OFPT_CONTROL_HELLO] = { + sizeof (struct ofp_control_hello), + recv_control_hello, + }, + [OFPT_PACKET_OUT] = { + sizeof (struct ofp_packet_out), + recv_packet_out, + }, + [OFPT_FLOW_MOD] = { + sizeof (struct ofp_flow_mod), + recv_flow, + }, + [OFPT_PORT_MOD] = { + sizeof (struct ofp_port_mod), + recv_port_mod, + }, + }; + + const struct openflow_packet *pkt; + struct ofp_header *oh; + + if (length < sizeof(struct ofp_header)) + return -EINVAL; + + oh = (struct ofp_header *) msg; + if (oh->version != 1 || oh->type >= ARRAY_SIZE(packets) + || ntohs(oh->length) > length) + return -EINVAL; + + pkt = &packets[oh->type]; + if (!pkt->handler) + return -ENOSYS; + if (length < pkt->min_size) + return -EFAULT; + + return pkt->handler(dp, msg); +} + +/* Packet buffering. */ + +#define OVERWRITE_SECS 1 + +struct packet_buffer { + struct buffer *buffer; + uint32_t cookie; + time_t timeout; +}; + +static struct packet_buffer buffers[N_PKT_BUFFERS]; +static unsigned int buffer_idx; + +uint32_t fwd_save_buffer(struct buffer *buffer) +{ + struct packet_buffer *p; + uint32_t id; + + buffer_idx = (buffer_idx + 1) & PKT_BUFFER_MASK; + p = &buffers[buffer_idx]; + if (p->buffer) { + /* Don't buffer packet if existing entry is less than + * OVERWRITE_SECS old. */ + if (time(0) < p->timeout) { /* FIXME */ + return -1; + } else { + buffer_delete(p->buffer); + } + } + /* Don't use maximum cookie value since the all-bits-1 id is + * special. 
*/ + if (++p->cookie >= (1u << PKT_COOKIE_BITS) - 1) + p->cookie = 0; + p->buffer = buffer_clone(buffer); /* FIXME */ + p->timeout = time(0) + OVERWRITE_SECS; /* FIXME */ + id = buffer_idx | (p->cookie << PKT_BUFFER_BITS); + + return id; +} + +static struct buffer *retrieve_buffer(uint32_t id) +{ + struct buffer *buffer = NULL; + struct packet_buffer *p; + + p = &buffers[id & PKT_BUFFER_MASK]; + if (p->cookie == id >> PKT_BUFFER_BITS) { + buffer = p->buffer; + p->buffer = NULL; + } else { + printf("cookie mismatch: %x != %x\n", + id >> PKT_BUFFER_BITS, p->cookie); + } + + return buffer; +} + +static void discard_buffer(uint32_t id) +{ + struct packet_buffer *p; + + p = &buffers[id & PKT_BUFFER_MASK]; + if (p->cookie == id >> PKT_BUFFER_BITS) { + buffer_delete(p->buffer); + p->buffer = NULL; + } +} + +void fwd_exit(void) +{ + int i; + + for (i = 0; i < N_PKT_BUFFERS; i++) + buffer_delete(buffers[i].buffer); +} diff --git a/switch/forward.h b/switch/forward.h new file mode 100644 index 00000000..6afad4fe --- /dev/null +++ b/switch/forward.h @@ -0,0 +1,56 @@ +/* Copyright (C) 2008 Board of Trustees, Leland Stanford Jr. University. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +#ifndef FORWARD_H +#define FORWARD_H 1 + +#include +#include + +struct buffer; +struct datapath; +struct ofp_action; +struct sw_flow_key; + +/* Buffers are identified to userspace by a 31-bit opaque ID. We divide the ID + * into a buffer number (low bits) and a cookie (high bits). The buffer number + * is an index into an array of buffers. The cookie distinguishes between + * different packets that have occupied a single buffer. Thus, the more + * buffers we have, the lower-quality the cookie... */ +#define PKT_BUFFER_BITS 8 +#define N_PKT_BUFFERS (1 << PKT_BUFFER_BITS) +#define PKT_BUFFER_MASK (N_PKT_BUFFERS - 1) + +#define PKT_COOKIE_BITS (32 - PKT_BUFFER_BITS) + + +void fwd_port_input(struct datapath *, struct buffer *, int in_port); +int fwd_control_input(struct datapath *, const void *, size_t); + +uint32_t fwd_save_buffer(struct buffer *); + +void fwd_exit(void); + +struct buffer *execute_setter(struct buffer *, uint16_t, + const struct sw_flow_key *, + const struct ofp_action *); + +#endif /* forward.h */ diff --git a/switch/netdev.c b/switch/netdev.c new file mode 100644 index 00000000..b606c04d --- /dev/null +++ b/switch/netdev.c @@ -0,0 +1,487 @@ +/* Copyright (C) 2008 Board of Trustees, Leland Stanford Jr. University. 
+ * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +#include "netdev.h" + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "list.h" +#include "fatal-signal.h" +#include "buffer.h" +#include "openflow.h" +#include "packets.h" + +#define THIS_MODULE VLM_netdev +#include "vlog.h" + +struct netdev { + struct list node; + char *name; + int fd; + uint8_t etheraddr[ETH_ADDR_LEN]; + int speed; + uint32_t features; + int save_flags; +}; + +static struct list netdev_list = LIST_INITIALIZER(&netdev_list); + +static void init_netdev(void); +static int restore_flags(struct netdev *netdev); + +/* Check whether device NAME has an IPv4 address assigned to it and, if so, log + * an error. 
*/ +static void +check_ipv4_address(const char *name) +{ + int sock; + struct ifreq ifr; + + sock = socket(AF_INET, SOCK_DGRAM, 0); + if (sock < 0) { + VLOG_WARN("socket(AF_INET): %s", strerror(errno)); + return; + } + + strncpy(ifr.ifr_name, name, sizeof ifr.ifr_name); + ifr.ifr_addr.sa_family = AF_INET; + if (ioctl(sock, SIOCGIFADDR, &ifr) == 0) { + VLOG_ERR("%s device has assigned IP address %s", name, + inet_ntoa(((struct sockaddr_in*) &ifr.ifr_addr)->sin_addr)); + } + + close(sock); +} + +static void +check_ipv6_address(const char *name) +{ + FILE *file; + char line[128]; + + file = fopen("/proc/net/if_inet6", "r"); + if (file == NULL) { + return; + } + + while (fgets(line, sizeof line, file)) { + struct in6_addr in6; + uint8_t *s6 = in6.s6_addr; + char ifname[16 + 1]; + +#define X8 "%2"SCNx8 + if (sscanf(line, " "X8 X8 X8 X8 X8 X8 X8 X8 X8 X8 X8 X8 X8 X8 X8 X8 + "%*x %*x %*x %*x %16s\n", + &s6[0], &s6[1], &s6[2], &s6[3], + &s6[4], &s6[5], &s6[6], &s6[7], + &s6[8], &s6[9], &s6[10], &s6[11], + &s6[12], &s6[13], &s6[14], &s6[15], + ifname) == 17 + && !strcmp(name, ifname)) + { + char in6_name[INET6_ADDRSTRLEN + 1]; + inet_ntop(AF_INET6, &in6, in6_name, sizeof in6_name); + VLOG_ERR("%s device has assigned IPv6 address %s", + name, in6_name); + } + } + + fclose(file); +} + +static void +do_ethtool(struct netdev *netdev) +{ + struct ifreq ifr; + struct ethtool_cmd ecmd; + + netdev->speed = 0; + netdev->features = 0; + + memset(&ifr, 0, sizeof ifr); + strncpy(ifr.ifr_name, netdev->name, sizeof ifr.ifr_name); + ifr.ifr_data = (caddr_t) &ecmd; + + memset(&ecmd, 0, sizeof ecmd); + ecmd.cmd = ETHTOOL_GSET; + if (ioctl(netdev->fd, SIOCETHTOOL, &ifr) == 0) { + if (ecmd.supported & SUPPORTED_10baseT_Half) { + netdev->features |= OFPPF_10MB_HD; + } + if (ecmd.supported & SUPPORTED_10baseT_Full) { + netdev->features |= OFPPF_10MB_FD; + } + if (ecmd.supported & SUPPORTED_100baseT_Half) { + netdev->features |= OFPPF_100MB_HD; + } + if (ecmd.supported & SUPPORTED_100baseT_Full) 
{ + netdev->features |= OFPPF_100MB_FD; + } + if (ecmd.supported & SUPPORTED_1000baseT_Half) { + netdev->features |= OFPPF_1GB_HD; + } + if (ecmd.supported & SUPPORTED_1000baseT_Full) { + netdev->features |= OFPPF_1GB_FD; + } + /* 10Gbps half-duplex doesn't exist... */ + if (ecmd.supported & SUPPORTED_10000baseT_Full) { + netdev->features |= OFPPF_10GB_FD; + } + + switch (ecmd.speed) { + case SPEED_10: + netdev->speed = 10; + break; + + case SPEED_100: + netdev->speed = 100; + break; + + case SPEED_1000: + netdev->speed = 1000; + break; + + case SPEED_2500: + netdev->speed = 2500; + break; + + case SPEED_10000: + netdev->speed = 10000; + break; + } + } else { + VLOG_DBG("ioctl(SIOCETHTOOL) failed: %s", strerror(errno)); + } +} + +int +netdev_open(const char *name, struct netdev **netdev_) +{ + int fd; + struct sockaddr sa; + struct ifreq ifr; + unsigned int ifindex; + socklen_t rcvbuf_len; + size_t rcvbuf; + uint8_t etheraddr[ETH_ADDR_LEN]; + int error; + struct netdev *netdev; + + *netdev_ = NULL; + init_netdev(); + + /* Create raw socket. + * + * We have to use SOCK_PACKET, despite its deprecation, because only + * SOCK_PACKET lets us set the hardware source address of outgoing + * packets. */ + fd = socket(PF_PACKET, SOCK_PACKET, htons(ETH_P_ALL)); + if (fd < 0) { + return errno; + } + + /* Bind to specific ethernet device. */ + memset(&sa, 0, sizeof sa); + sa.sa_family = AF_UNSPEC; + strncpy((char *) sa.sa_data, name, sizeof sa.sa_data); + if (bind(fd, &sa, sizeof sa) < 0) { + VLOG_ERR("bind to %s failed: %s", name, strerror(errno)); + goto error; + } + + /* Between the socket() and bind() calls above, the socket receives all + * packets on all system interfaces. We do not want to receive that + * data, but there is no way to avoid it. So we must now drain out the + * receive queue. 
There is no way to know how long the receive queue is, + * but we know that the total number of bytes queued does not exceed the + * receive buffer size, so we pull packets until none are left or we've + * read that many bytes. */ + rcvbuf_len = sizeof rcvbuf; + if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, &rcvbuf_len) < 0) { + VLOG_ERR("getsockopt(SO_RCVBUF) on %s device failed: %s", + name, strerror(errno)); + goto error; + } + while (rcvbuf > 0) { + char buffer; + ssize_t n_bytes = recv(fd, &buffer, 1, MSG_TRUNC | MSG_DONTWAIT); + if (n_bytes <= 0) { + break; + } + rcvbuf -= n_bytes; + } + + /* Get ethernet device index and hardware address. */ + strncpy(ifr.ifr_name, name, sizeof ifr.ifr_name); + if (ioctl(fd, SIOCGIFINDEX, &ifr) < 0) { + VLOG_ERR("ioctl(SIOCGIFINDEX) on %s device failed: %s", + name, strerror(errno)); + goto error; + } + ifindex = ifr.ifr_ifindex; + if (ioctl(fd, SIOCGIFHWADDR, &ifr) < 0) { + VLOG_ERR("ioctl(SIOCGIFHWADDR) on %s device failed: %s", + name, strerror(errno)); + goto error; + } + if (ifr.ifr_hwaddr.sa_family != AF_UNSPEC + && ifr.ifr_hwaddr.sa_family != ARPHRD_ETHER) { + VLOG_WARN("%s device has unknown hardware address family %d", + name, (int) ifr.ifr_hwaddr.sa_family); + } + memcpy(etheraddr, ifr.ifr_hwaddr.sa_data, sizeof etheraddr); + + /* Allocate network device. */ + netdev = xmalloc(sizeof *netdev); + netdev->name = xstrdup(name); + netdev->fd = fd; + memcpy(netdev->etheraddr, etheraddr, sizeof etheraddr); + + /* Get speed, features. */ + do_ethtool(netdev); + + /* Save flags to restore at close or exit. */ + if (ioctl(fd, SIOCGIFFLAGS, &ifr) < 0) { + VLOG_ERR("ioctl(SIOCGIFFLAGS) on %s device failed: %s", + name, strerror(errno)); + goto error; + } + netdev->save_flags = ifr.ifr_flags; + fatal_signal_block(); + list_push_back(&netdev_list, &netdev->node); + fatal_signal_unblock(); + + /* Bring up interface and set promiscuous mode. 
*/ + ifr.ifr_flags |= IFF_PROMISC | IFF_UP; + if (ioctl(fd, SIOCSIFFLAGS, &ifr) < 0) { + error = errno; + VLOG_ERR("failed to set promiscuous mode on %s device: %s", + name, strerror(errno)); + netdev_close(netdev); + return error; + } + + /* Report IP addresses to administrator. */ + check_ipv4_address(name); + check_ipv6_address(name); + + /* Success! */ + *netdev_ = netdev; + return 0; + +error: + error = errno; + close(fd); + return error; +} + +void +netdev_close(struct netdev *netdev) +{ + if (netdev) { + /* Bring down interface and drop promiscuous mode, if we brought up + * the interface or enabled promiscuous mode. */ + int error; + fatal_signal_block(); + error = restore_flags(netdev); + list_remove(&netdev->node); + fatal_signal_unblock(); + if (error) { + VLOG_WARN("failed to restore network device flags on %s: %s", + netdev->name, strerror(error)); + } + + /* Free. */ + free(netdev->name); + close(netdev->fd); + free(netdev); + } +} + +static void +pad_to_minimum_length(struct buffer *buffer) +{ + if (buffer->size < ETH_TOTAL_MIN) { + size_t shortage = ETH_TOTAL_MIN - buffer->size; + memset(buffer_put_uninit(buffer, shortage), 0, shortage); + } +} + +int +netdev_recv(struct netdev *netdev, struct buffer *buffer, bool block) +{ + ssize_t n_bytes; + + assert(buffer->size == 0); + assert(buffer_tailroom(buffer) >= ETH_TOTAL_MIN); + do { + n_bytes = recv(netdev->fd, + buffer_tail(buffer), buffer_tailroom(buffer), + block ? 0 : MSG_DONTWAIT); + } while (n_bytes < 0 && errno == EINTR); + if (n_bytes < 0) { + if (errno != EAGAIN) { + VLOG_WARN("error receiving Ethernet packet on %s: %s", + netdev->name, strerror(errno)); + } + return errno; + } else { + buffer->size += n_bytes; + + /* When the kernel internally sends out an Ethernet frame on an + * interface, it gives us a copy *before* padding the frame to the + * minimum length. Thus, when it sends out something like an ARP + * request, we see a too-short frame. So pad it out to the minimum + * length. 
*/ + pad_to_minimum_length(buffer); + return 0; + } +} + +int +netdev_send(struct netdev *netdev, struct buffer *buffer, bool block) +{ + ssize_t n_bytes; + const struct eth_header *eh; + struct sockaddr_pkt spkt; + + /* Ensure packet is long enough. (Although all incoming packets are at + * least ETH_TOTAL_MIN bytes long, we could have trimmed some data off a + * minimum-size packet, e.g. by dropping a vlan header.) */ + pad_to_minimum_length(buffer); + + /* Construct packet sockaddr, which SOCK_PACKET requires. */ + spkt.spkt_family = AF_PACKET; + strncpy((char *) spkt.spkt_device, netdev->name, sizeof spkt.spkt_device); + eh = buffer_at_assert(buffer, 0, sizeof *eh); + spkt.spkt_protocol = eh->eth_type; + + do { + n_bytes = sendto(netdev->fd, buffer->data, buffer->size, + block ? 0 : MSG_DONTWAIT, + (const struct sockaddr *) &spkt, sizeof spkt); + } while (n_bytes < 0 && errno == EINTR); + if (n_bytes < 0) { + if (errno != EAGAIN) { + VLOG_WARN("error sending Ethernet packet on %s: %s", + netdev->name, strerror(errno)); + } + return errno; + } else if (n_bytes != buffer->size) { + VLOG_WARN("send partial Ethernet packet (%d bytes of %d) on %s", + (int) n_bytes, buffer->size, netdev->name); + return EMSGSIZE; + } else { + return 0; + } +} + +const uint8_t * +netdev_get_etheraddr(const struct netdev *netdev) +{ + return netdev->etheraddr; +} + +int +netdev_get_fd(const struct netdev *netdev) +{ + return netdev->fd; +} + +const char * +netdev_get_name(const struct netdev *netdev) +{ + return netdev->name; +} + +int +netdev_get_speed(const struct netdev *netdev) +{ + return netdev->speed; +} + +uint32_t +netdev_get_features(const struct netdev *netdev) +{ + return netdev->features; +} + +static void restore_all_flags(void *aux); + +static void +init_netdev(void) +{ + static bool inited; + if (!inited) { + inited = true; + fatal_signal_add_hook(restore_all_flags, NULL); + } +} + +static int +restore_flags(struct netdev *netdev) +{ + struct ifreq ifr; + + /* Get 
current flags. */ + strncpy(ifr.ifr_name, netdev->name, sizeof ifr.ifr_name); + if (ioctl(netdev->fd, SIOCGIFFLAGS, &ifr) < 0) { + return errno; + } + + /* Restore flags that we might have changed, if necessary. */ + if ((ifr.ifr_flags ^ netdev->save_flags) & (IFF_PROMISC | IFF_UP)) { + ifr.ifr_flags &= ~(IFF_PROMISC | IFF_UP); + ifr.ifr_flags |= netdev->save_flags & (IFF_PROMISC | IFF_UP); + if (ioctl(netdev->fd, SIOCSIFFLAGS, &ifr) < 0) { + return errno; + } + } + + return 0; +} + +static void +restore_all_flags(void *aux UNUSED) +{ + struct netdev *netdev; + LIST_FOR_EACH (netdev, struct netdev, node, &netdev_list) { + restore_flags(netdev); + } +} diff --git a/switch/netdev.h b/switch/netdev.h new file mode 100644 index 00000000..aa1a5269 --- /dev/null +++ b/switch/netdev.h @@ -0,0 +1,41 @@ +/* Copyright (C) 2008 Board of Trustees, Leland Stanford Jr. University. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. 
+ */ + +#ifndef NETDEV_H +#define NETDEV_H 1 + +#include +#include + +struct buffer; + +struct netdev; +int netdev_open(const char *name, struct netdev **); +void netdev_close(struct netdev *); +int netdev_recv(struct netdev *, struct buffer *, bool block); +int netdev_send(struct netdev *, struct buffer *, bool block); +const uint8_t *netdev_get_etheraddr(const struct netdev *); +int netdev_get_fd(const struct netdev *); +const char *netdev_get_name(const struct netdev *); +int netdev_get_speed(const struct netdev *); +uint32_t netdev_get_features(const struct netdev *); + +#endif /* netdev.h */ diff --git a/switch/switch-flow.c b/switch/switch-flow.c new file mode 100644 index 00000000..cacf690a --- /dev/null +++ b/switch/switch-flow.c @@ -0,0 +1,168 @@ +/* Copyright (C) 2008 Board of Trustees, Leland Stanford Jr. University. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. 
+ */ + +#include "switch-flow.h" +#include +#include +#include +#include +#include "buffer.h" +#include "openflow.h" +#include "packets.h" + +/* Internal function used to compare fields in flow. */ +static inline +int flow_fields_match(const struct flow *a, const struct flow *b, uint16_t w) +{ + return ((w & OFPFW_IN_PORT || a->in_port == b->in_port) + && (w & OFPFW_DL_VLAN || a->dl_vlan == b->dl_vlan) + && (w & OFPFW_DL_SRC || !memcmp(a->dl_src, b->dl_src, ETH_ADDR_LEN)) + && (w & OFPFW_DL_DST || !memcmp(a->dl_dst, b->dl_dst, ETH_ADDR_LEN)) + && (w & OFPFW_DL_TYPE || a->dl_type == b->dl_type) + && (w & OFPFW_NW_SRC || a->nw_src == b->nw_src) + && (w & OFPFW_NW_DST || a->nw_dst == b->nw_dst) + && (w & OFPFW_NW_PROTO || a->nw_proto == b->nw_proto) + && (w & OFPFW_TP_SRC || a->tp_src == b->tp_src) + && (w & OFPFW_TP_DST || a->tp_dst == b->tp_dst)); +} + +/* Returns nonzero if 'a' and 'b' match, that is, if their fields are equal + * modulo wildcards, zero otherwise. */ +inline +int flow_matches(const struct sw_flow_key *a, const struct sw_flow_key *b) +{ + return flow_fields_match(&a->flow, &b->flow, a->wildcards | b->wildcards); +} + +/* Returns nonzero if 't' (the table entry's key) and 'd' (the key + * describing the deletion) match, that is, if their fields are + * equal modulo wildcards, zero otherwise. If 'strict' is nonzero, the + * wildcards must match in both 't_key' and 'd_key'. Note that the + * table's wildcards are ignored unless 'strict' is set. 
*/ +inline +int flow_del_matches(const struct sw_flow_key *t, const struct sw_flow_key *d, int strict) +{ + if (strict && t->wildcards != d->wildcards) + return 0; + + return flow_fields_match(&t->flow, &d->flow, d->wildcards); +} + +void flow_extract_match(struct sw_flow_key* to, const struct ofp_match* from) +{ + to->wildcards = ntohs(from->wildcards) & OFPFW_ALL; + to->flow.in_port = from->in_port; + to->flow.dl_vlan = from->dl_vlan; + memcpy(to->flow.dl_src, from->dl_src, ETH_ADDR_LEN); + memcpy(to->flow.dl_dst, from->dl_dst, ETH_ADDR_LEN); + to->flow.dl_type = from->dl_type; + to->flow.nw_src = from->nw_src; + to->flow.nw_dst = from->nw_dst; + to->flow.nw_proto = from->nw_proto; + to->flow.tp_src = from->tp_src; + to->flow.tp_dst = from->tp_dst; + to->flow.reserved = 0; +} + +void flow_fill_match(struct ofp_match* to, const struct sw_flow_key* from) +{ + to->wildcards = htons(from->wildcards); + to->in_port = from->flow.in_port; + to->dl_vlan = from->flow.dl_vlan; + memcpy(to->dl_src, from->flow.dl_src, ETH_ADDR_LEN); + memcpy(to->dl_dst, from->flow.dl_dst, ETH_ADDR_LEN); + to->dl_type = from->flow.dl_type; + to->nw_src = from->flow.nw_src; + to->nw_dst = from->flow.nw_dst; + to->nw_proto = from->flow.nw_proto; + to->tp_src = from->flow.tp_src; + to->tp_dst = from->flow.tp_dst; + memset(to->pad, '\0', sizeof(to->pad)); +} + +/* Allocates and returns a new flow with room for 'n_actions' actions. + * Returns the new flow or a null pointer on failure. */ +struct sw_flow *flow_alloc(int n_actions) +{ + struct sw_flow *flow = malloc(sizeof *flow); + if (!flow) + return NULL; + + flow->n_actions = n_actions; + flow->actions = malloc(n_actions * sizeof *flow->actions); + if (!flow->actions && n_actions > 0) { + free(flow); + return NULL; + } + return flow; +} + +/* Frees 'flow' immediately. 
*/ +void flow_free(struct sw_flow *flow) +{ + if (!flow) { + return; + } + free(flow->actions); + free(flow); +} + +/* Prints a representation of 'key' to standard output. */ +void print_flow(const struct sw_flow_key *key) +{ + const struct flow *f = &key->flow; + printf("wild%04x port%04x:vlan%04x mac%02x:%02x:%02x:%02x:%02x:%02x" + "->%02x:%02x:%02x:%02x:%02x:%02x " + "proto%04x ip%u.%u.%u.%u->%u.%u.%u.%u port%d->%d\n", + key->wildcards, ntohs(f->in_port), ntohs(f->dl_vlan), + f->dl_src[0], f->dl_src[1], f->dl_src[2], + f->dl_src[3], f->dl_src[4], f->dl_src[5], + f->dl_dst[0], f->dl_dst[1], f->dl_dst[2], + f->dl_dst[3], f->dl_dst[4], f->dl_dst[5], + ntohs(f->dl_type), + ((unsigned char *)&f->nw_src)[0], + ((unsigned char *)&f->nw_src)[1], + ((unsigned char *)&f->nw_src)[2], + ((unsigned char *)&f->nw_src)[3], + ((unsigned char *)&f->nw_dst)[0], + ((unsigned char *)&f->nw_dst)[1], + ((unsigned char *)&f->nw_dst)[2], + ((unsigned char *)&f->nw_dst)[3], + ntohs(f->tp_src), ntohs(f->tp_dst)); +} + +int flow_timeout(struct sw_flow *flow) +{ + if (flow->max_idle == OFP_FLOW_PERMANENT) + return 0; + + /* FIXME */ + return time(0) > flow->timeout; +} + +void flow_used(struct sw_flow *flow, struct buffer *buffer) +{ + if (flow->max_idle != OFP_FLOW_PERMANENT) + flow->timeout = time(0) + flow->max_idle; + + flow->packet_count++; + flow->byte_count += buffer->size; +} diff --git a/switch/switch-flow.h b/switch/switch-flow.h new file mode 100644 index 00000000..69417583 --- /dev/null +++ b/switch/switch-flow.h @@ -0,0 +1,69 @@ +/* Copyright (C) 2008 Board of Trustees, Leland Stanford Jr. University. 
+ * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +#ifndef SWITCH_FLOW_H +#define SWITCH_FLOW_H 1 + +#include +#include "flow.h" +#include "list.h" + +struct ofp_match; + +/* Identification data for a flow. */ +struct sw_flow_key { + struct flow flow; /* Flow data (in network byte order). */ + uint32_t wildcards; /* Wildcard fields (in host byte order). */ +}; + +/* Maximum number of actions in a single flow entry. */ +#define MAX_ACTIONS 16 + +struct sw_flow { + struct sw_flow_key key; + + uint32_t group_id; /* Flow group ID (for QoS). */ + uint16_t max_idle; /* Idle time before discarding (seconds). */ + time_t created; /* When the flow was created. */ + time_t timeout; /* When the flow expires (if idle). */ + uint64_t packet_count; /* Number of packets seen. */ + uint64_t byte_count; /* Number of bytes seen. */ + struct list node; + + /* Actions (XXX probably most flows have only a single action). 
*/ + unsigned int n_actions; + struct ofp_action *actions; +}; + +int flow_matches(const struct sw_flow_key *, const struct sw_flow_key *); +int flow_del_matches(const struct sw_flow_key *, const struct sw_flow_key *, + int); +struct sw_flow *flow_alloc(int n_actions); +void flow_free(struct sw_flow *); +void flow_deferred_free(struct sw_flow *); +void flow_extract_match(struct sw_flow_key* to, const struct ofp_match* from); +void flow_fill_match(struct ofp_match* to, const struct sw_flow_key* from); + +void print_flow(const struct sw_flow_key *); +int flow_timeout(struct sw_flow *flow); +void flow_used(struct sw_flow *flow, struct buffer *buffer); + +#endif /* switch-flow.h */ diff --git a/switch/switch.c b/switch/switch.c new file mode 100644 index 00000000..65f8f575 --- /dev/null +++ b/switch/switch.c @@ -0,0 +1,225 @@ +/* Copyright (C) 2008 Board of Trustees, Leland Stanford Jr. University. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +#include +#include +#include +#include +#include + +#include "command-line.h" +#include "controller.h" +#include "datapath.h" +#include "fault.h" +#include "openflow.h" +#include "poll-loop.h" +#include "queue.h" +#include "util.h" +#include "vconn.h" +#include "vconn-ssl.h" +#include "vlog-socket.h" + +#define THIS_MODULE VLM_switch +#include "vlog.h" + +static void parse_options(int argc, char *argv[]); +static void usage(void) NO_RETURN; + +static bool reliable = true; +static struct datapath *dp; +static uint64_t dpid = UINT64_MAX; +static char *port_list; + +static void add_ports(struct datapath *dp, char *port_list); + +int +main(int argc, char *argv[]) +{ + struct controller_connection cc; + int error; + + set_program_name(argv[0]); + register_fault_handlers(); + vlog_init(); + parse_options(argc, argv); + + if (argc - optind != 1) { + fatal(0, "missing controller argument; use --help for usage"); + } + + controller_init(&cc, argv[optind], reliable); + error = dp_new(&dp, dpid, &cc); + if (error) { + fatal(error, "could not create datapath"); + } + if (port_list) { + add_ports(dp, port_list); + } + + error = vlog_server_listen(NULL, NULL); + if (error) { + fatal(error, "could not listen for vlog connections"); + } + + for (;;) { + controller_run(&cc, dp); + dp_run(dp); + dp_wait(dp); + controller_wait(&cc); + poll_block(); + } + + return 0; +} + +static void +add_ports(struct datapath *dp, char *port_list) +{ + char *port, *save_ptr; + + /* Glibc 2.7 has a bug in strtok_r when compiling with optimization that + * can cause segfaults here: + * http://sources.redhat.com/bugzilla/show_bug.cgi?id=5614. + * Using ",," instead of the obvious "," works around it. 
*/ + for (port = strtok_r(port_list, ",,", &save_ptr); port; + port = strtok_r(NULL, ",,", &save_ptr)) { + int error = dp_add_port(dp, port); + if (error) { + fatal(error, "failed to add port %s", port); + } + } +} + +static void +parse_options(int argc, char *argv[]) +{ + static struct option long_options[] = { + {"interfaces", required_argument, 0, 'i'}, + {"unreliable", no_argument, 0, 'u'}, + {"datapath-id", required_argument, 0, 'd'}, + {"verbose", optional_argument, 0, 'v'}, + {"help", no_argument, 0, 'h'}, + {"version", no_argument, 0, 'V'}, +#ifdef HAVE_OPENSSL + {"private-key", required_argument, 0, 'p'}, + {"certificate", required_argument, 0, 'c'}, + {"ca-cert", required_argument, 0, 'C'}, +#endif + {0, 0, 0, 0}, + }; + char *short_options = long_options_to_short_options(long_options); + + for (;;) { + int indexptr; + int c; + + c = getopt_long(argc, argv, short_options, long_options, &indexptr); + if (c == -1) { + break; + } + + switch (c) { + case 'u': + reliable = false; + break; + + case 'd': + if (strlen(optarg) != 12 + || strspn(optarg, "0123456789abcdefABCDEF") != 12) { + fatal(0, "argument to -d or --datapath-id must be " + "exactly 12 hex digits"); + } + dpid = strtoll(optarg, NULL, 16); + if (!dpid) { + fatal(0, "argument to -d or --datapath-id must be nonzero"); + } + break; + + case 'h': + usage(); + + case 'V': + printf("%s "VERSION" compiled "__DATE__" "__TIME__"\n", argv[0]); + exit(EXIT_SUCCESS); + + case 'v': + vlog_set_verbosity(optarg); + break; + + case 'i': + if (!port_list) { + port_list = optarg; + } else { + port_list = xasprintf("%s,%s", port_list, optarg); + } + break; + +#ifdef HAVE_OPENSSL + case 'p': + vconn_ssl_set_private_key_file(optarg); + break; + + case 'c': + vconn_ssl_set_certificate_file(optarg); + break; + + case 'C': + vconn_ssl_set_ca_cert_file(optarg); + break; +#endif + + case '?': + exit(EXIT_FAILURE); + + default: + abort(); + } + } + free(short_options); +} + +static void +usage(void) +{ + printf("%s: 
userspace OpenFlow switch\n" + "usage: %s [OPTIONS] CONTROLLER\n" + "CONTROLLER must be one of the following:\n" + " tcp:HOST[:PORT] PORT (default: %d) on remote TCP HOST\n", + program_name, program_name, OFP_TCP_PORT); +#ifdef HAVE_OPENSSL + printf(" ssl:HOST[:PORT] SSL PORT (default: %d) on remote HOST\n" + "\nPKI configuration (required to use SSL):\n" + " -p, --private-key=FILE file with private key\n" + " -c, --certificate=FILE file with certificate for private key\n" + " -C, --ca-cert=FILE file with peer CA certificate\n", + OFP_SSL_PORT); +#endif + printf("Options:\n" + " -i, --interfaces=NETDEV[,NETDEV]...\n" + " add specified initial switch ports\n" + " -d, --datapath-id=ID Use ID as the OpenFlow switch ID\n" + " (ID must consist of 12 hex digits)\n" + " -u, --unreliable do not reconnect to controller\n" + " -v, --verbose set maximum verbosity level\n" + " -h, --help display this help message\n" + " -V, --version display version information\n"); + exit(EXIT_SUCCESS); +} diff --git a/switch/table-hash.c b/switch/table-hash.c new file mode 100644 index 00000000..ad5e5379 --- /dev/null +++ b/switch/table-hash.c @@ -0,0 +1,426 @@ +/* Copyright (C) 2008 Board of Trustees, Leland Stanford Jr. University. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +#include "table.h" +#include +#include +#include +#include "crc32.h" +#include "flow.h" +#include "datapath.h" + +struct sw_table_hash { + struct sw_table swt; + struct crc32 crc32; + unsigned int n_flows; + unsigned int bucket_mask; /* Number of buckets minus 1. */ + struct sw_flow **buckets; +}; + +static struct sw_flow **find_bucket(struct sw_table *swt, + const struct sw_flow_key *key) +{ + struct sw_table_hash *th = (struct sw_table_hash *) swt; + unsigned int crc = crc32_calculate(&th->crc32, key, sizeof *key); + return &th->buckets[crc & th->bucket_mask]; +} + +static struct sw_flow *table_hash_lookup(struct sw_table *swt, + const struct sw_flow_key *key) +{ + struct sw_flow *flow = *find_bucket(swt, key); + return flow && !memcmp(&flow->key, key, sizeof *key) ? flow : NULL; +} + +static int table_hash_insert(struct sw_table *swt, struct sw_flow *flow) +{ + struct sw_table_hash *th = (struct sw_table_hash *) swt; + struct sw_flow **bucket; + int retval; + + if (flow->key.wildcards != 0) + return 0; + + bucket = find_bucket(swt, &flow->key); + if (*bucket == NULL) { + th->n_flows++; + *bucket = flow; + retval = 1; + } else { + struct sw_flow *old_flow = *bucket; + if (!memcmp(&old_flow->key, &flow->key, sizeof flow->key)) { + *bucket = flow; + flow_free(old_flow); + retval = 1; + } else { + retval = 0; + } + } + return retval; +} + +/* Caller must update n_flows. 
*/ +static void +do_delete(struct sw_flow **bucket) +{ + flow_free(*bucket); + *bucket = NULL; +} + +/* Returns number of deleted flows. */ +static int table_hash_delete(struct sw_table *swt, + const struct sw_flow_key *key, int strict) +{ + struct sw_table_hash *th = (struct sw_table_hash *) swt; + unsigned int count = 0; + + if (key->wildcards == 0) { + struct sw_flow **bucket = find_bucket(swt, key); + struct sw_flow *flow = *bucket; + if (flow && !memcmp(&flow->key, key, sizeof *key)) { + do_delete(bucket); + count = 1; + } + } else { + unsigned int i; + + for (i = 0; i <= th->bucket_mask; i++) { + struct sw_flow **bucket = &th->buckets[i]; + struct sw_flow *flow = *bucket; + if (flow && flow_del_matches(&flow->key, key, strict)) { + do_delete(bucket); + count++; + } + } + } + th->n_flows -= count; + return count; +} + +static int table_hash_timeout(struct datapath *dp, struct sw_table *swt) +{ + struct sw_table_hash *th = (struct sw_table_hash *) swt; + unsigned int i; + int count = 0; + + for (i = 0; i <= th->bucket_mask; i++) { + struct sw_flow **bucket = &th->buckets[i]; + struct sw_flow *flow = *bucket; + if (flow && flow_timeout(flow)) { + dp_send_flow_expired(dp, flow); + do_delete(bucket); + count++; + } + } + th->n_flows -= count; + return count; +} + +static void table_hash_destroy(struct sw_table *swt) +{ + struct sw_table_hash *th = (struct sw_table_hash *) swt; + unsigned int i; + for (i = 0; i <= th->bucket_mask; i++) { + if (th->buckets[i]) { + flow_free(th->buckets[i]); + } + } + free(th->buckets); + free(th); +} + +struct swt_iterator_hash { + struct sw_table_hash *th; + unsigned int bucket_i; +}; + +static struct sw_flow *next_flow(struct swt_iterator_hash *ih) +{ + for (;ih->bucket_i <= ih->th->bucket_mask; ih->bucket_i++) { + struct sw_flow *f = ih->th->buckets[ih->bucket_i]; + if (f != NULL) + return f; + } + + return NULL; +} + +static int table_hash_iterator(struct sw_table *swt, + struct swt_iterator *swt_iter) +{ + struct 
swt_iterator_hash *ih; + + swt_iter->private = ih = malloc(sizeof *ih); + + if (ih == NULL) + return 0; + + ih->th = (struct sw_table_hash *) swt; + + ih->bucket_i = 0; + swt_iter->flow = next_flow(ih); + + return 1; +} + +static void table_hash_next(struct swt_iterator *swt_iter) +{ + struct swt_iterator_hash *ih; + + if (swt_iter->flow == NULL) + return; + + ih = (struct swt_iterator_hash *) swt_iter->private; + + ih->bucket_i++; + swt_iter->flow = next_flow(ih); +} + +static void table_hash_iterator_destroy(struct swt_iterator *swt_iter) +{ + free(swt_iter->private); +} + +static void table_hash_stats(struct sw_table *swt, + struct sw_table_stats *stats) +{ + struct sw_table_hash *th = (struct sw_table_hash *) swt; + stats->name = "hash"; + stats->n_flows = th->n_flows; + stats->max_flows = th->bucket_mask + 1; +} + +struct sw_table *table_hash_create(unsigned int polynomial, + unsigned int n_buckets) +{ + struct sw_table_hash *th; + struct sw_table *swt; + + th = calloc(1, sizeof *th); /* Zero-fill so that n_flows starts at 0. */ + if (th == NULL) + return NULL; + + assert(!(n_buckets & (n_buckets - 1))); + th->buckets = calloc(n_buckets, sizeof *th->buckets); + if (th->buckets == NULL) { + printf("failed to allocate %u buckets\n", n_buckets); + free(th); + return NULL; + } + th->bucket_mask = n_buckets - 1; + + swt = &th->swt; + swt->lookup = table_hash_lookup; + swt->insert = table_hash_insert; + swt->delete = table_hash_delete; + swt->timeout = table_hash_timeout; + swt->destroy = table_hash_destroy; + swt->iterator = table_hash_iterator; + swt->iterator_next = table_hash_next; + swt->iterator_destroy = table_hash_iterator_destroy; + swt->stats = table_hash_stats; + + crc32_init(&th->crc32, polynomial); + + return swt; +} + +/* Double-hashing table.
*/ + +struct sw_table_hash2 { + struct sw_table swt; + struct sw_table *subtable[2]; +}; + +static struct sw_flow *table_hash2_lookup(struct sw_table *swt, + const struct sw_flow_key *key) +{ + struct sw_table_hash2 *t2 = (struct sw_table_hash2 *) swt; + int i; + + for (i = 0; i < 2; i++) { + struct sw_flow *flow = *find_bucket(t2->subtable[i], key); + if (flow && !memcmp(&flow->key, key, sizeof *key)) + return flow; + } + return NULL; +} + +static int table_hash2_insert(struct sw_table *swt, struct sw_flow *flow) +{ + struct sw_table_hash2 *t2 = (struct sw_table_hash2 *) swt; + + if (table_hash_insert(t2->subtable[0], flow)) + return 1; + return table_hash_insert(t2->subtable[1], flow); +} + +static int table_hash2_delete(struct sw_table *swt, + const struct sw_flow_key *key, int strict) +{ + struct sw_table_hash2 *t2 = (struct sw_table_hash2 *) swt; + return (table_hash_delete(t2->subtable[0], key, strict) + + table_hash_delete(t2->subtable[1], key, strict)); +} + +static int table_hash2_timeout(struct datapath *dp, struct sw_table *swt) +{ + struct sw_table_hash2 *t2 = (struct sw_table_hash2 *) swt; + return (table_hash_timeout(dp, t2->subtable[0]) + + table_hash_timeout(dp, t2->subtable[1])); +} + +static void table_hash2_destroy(struct sw_table *swt) +{ + struct sw_table_hash2 *t2 = (struct sw_table_hash2 *) swt; + table_hash_destroy(t2->subtable[0]); + table_hash_destroy(t2->subtable[1]); + free(t2); +} + +struct swt_iterator_hash2 { + struct sw_table_hash2 *th2; + struct swt_iterator ih; + uint8_t table_i; +}; + +static int table_hash2_iterator(struct sw_table *swt, + struct swt_iterator *swt_iter) +{ + struct swt_iterator_hash2 *ih2; + + swt_iter->private = ih2 = malloc(sizeof *ih2); + if (ih2 == NULL) + return 0; + + ih2->th2 = (struct sw_table_hash2 *) swt; + if (!table_hash_iterator(ih2->th2->subtable[0], &ih2->ih)) { + free(ih2); + return 0; + } + + if (ih2->ih.flow != NULL) { + swt_iter->flow = ih2->ih.flow; + ih2->table_i = 0; + } else { + 
table_hash_iterator_destroy(&ih2->ih); + ih2->table_i = 1; + if (!table_hash_iterator(ih2->th2->subtable[1], &ih2->ih)) { + free(ih2); + return 0; + } + swt_iter->flow = ih2->ih.flow; + } + + return 1; +} + +static void table_hash2_next(struct swt_iterator *swt_iter) +{ + struct swt_iterator_hash2 *ih2; + + if (swt_iter->flow == NULL) + return; + + ih2 = (struct swt_iterator_hash2 *) swt_iter->private; + table_hash_next(&ih2->ih); + + if (ih2->ih.flow != NULL) { + swt_iter->flow = ih2->ih.flow; + } else { + if (ih2->table_i == 0) { + table_hash_iterator_destroy(&ih2->ih); + ih2->table_i = 1; + if (!table_hash_iterator(ih2->th2->subtable[1], &ih2->ih)) { + ih2->ih.private = NULL; + swt_iter->flow = NULL; + } else { + swt_iter->flow = ih2->ih.flow; + } + } else { + swt_iter->flow = NULL; + } + } +} + +static void table_hash2_iterator_destroy(struct swt_iterator *swt_iter) +{ + struct swt_iterator_hash2 *ih2; + + ih2 = (struct swt_iterator_hash2 *) swt_iter->private; + if (ih2->ih.private != NULL) + table_hash_iterator_destroy(&ih2->ih); + free(ih2); +} + +static void table_hash2_stats(struct sw_table *swt, + struct sw_table_stats *stats) +{ + struct sw_table_hash2 *t2 = (struct sw_table_hash2 *) swt; + struct sw_table_stats substats[2]; + int i; + + for (i = 0; i < 2; i++) + table_hash_stats(t2->subtable[i], &substats[i]); + stats->name = "hash2"; + stats->n_flows = substats[0].n_flows + substats[1].n_flows; + stats->max_flows = substats[0].max_flows + substats[1].max_flows; +} + +struct sw_table *table_hash2_create(unsigned int poly0, unsigned int buckets0, + unsigned int poly1, unsigned int buckets1) + +{ + struct sw_table_hash2 *t2; + struct sw_table *swt; + + t2 = malloc(sizeof *t2); + if (t2 == NULL) + return NULL; + + t2->subtable[0] = table_hash_create(poly0, buckets0); + if (t2->subtable[0] == NULL) + goto out_free_t2; + + t2->subtable[1] = table_hash_create(poly1, buckets1); + if (t2->subtable[1] == NULL) + goto out_free_subtable0; + + swt = &t2->swt; + 
swt->lookup = table_hash2_lookup; + swt->insert = table_hash2_insert; + swt->delete = table_hash2_delete; + swt->timeout = table_hash2_timeout; + swt->destroy = table_hash2_destroy; + swt->stats = table_hash2_stats; + + swt->iterator = table_hash2_iterator; + swt->iterator_next = table_hash2_next; + swt->iterator_destroy = table_hash2_iterator_destroy; + + return swt; + +out_free_subtable0: + table_hash_destroy(t2->subtable[0]); +out_free_t2: + free(t2); + return NULL; +} diff --git a/switch/table-linear.c b/switch/table-linear.c new file mode 100644 index 00000000..816c813b --- /dev/null +++ b/switch/table-linear.c @@ -0,0 +1,202 @@ +/* Copyright (C) 2008 Board of Trustees, Leland Stanford Jr. University. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. 
+ */ + +#include "table.h" +#include +#include "flow.h" +#include "list.h" +#include "switch-flow.h" +#include "datapath.h" + +struct sw_table_linear { + struct sw_table swt; + + unsigned int max_flows; + unsigned int n_flows; + struct list flows; +}; + +static struct sw_flow *table_linear_lookup(struct sw_table *swt, + const struct sw_flow_key *key) +{ + struct sw_table_linear *tl = (struct sw_table_linear *) swt; + struct sw_flow *flow; + LIST_FOR_EACH (flow, struct sw_flow, node, &tl->flows) { + if (flow_matches(&flow->key, key)) + return flow; + } + return NULL; +} + +static int table_linear_insert(struct sw_table *swt, struct sw_flow *flow) +{ + struct sw_table_linear *tl = (struct sw_table_linear *) swt; + struct sw_flow *f; + + /* Replace flows that match exactly. */ + LIST_FOR_EACH (f, struct sw_flow, node, &tl->flows) { + if (f->key.wildcards == flow->key.wildcards + && flow_matches(&f->key, &flow->key)) { + list_replace(&flow->node, &f->node); + flow_free(f); + return 1; + } + } + + /* Table overflow? */ + if (tl->n_flows >= tl->max_flows) { + return 0; + } + tl->n_flows++; + + /* FIXME: need to order rules from most to least specific. 
*/ + list_push_back(&tl->flows, &flow->node); + return 1; +} + +static void +do_delete(struct sw_flow *flow) +{ + list_remove(&flow->node); + flow_free(flow); +} + +static int table_linear_delete(struct sw_table *swt, + const struct sw_flow_key *key, int strict) +{ + struct sw_table_linear *tl = (struct sw_table_linear *) swt; + struct sw_flow *flow, *n; + unsigned int count = 0; + + LIST_FOR_EACH_SAFE (flow, n, struct sw_flow, node, &tl->flows) { + if (flow_del_matches(&flow->key, key, strict)) { + do_delete(flow); + count++; + } + } + tl->n_flows -= count; + return count; +} + +static int table_linear_timeout(struct datapath *dp, struct sw_table *swt) +{ + struct sw_table_linear *tl = (struct sw_table_linear *) swt; + struct sw_flow *flow, *n; + int count = 0; + + LIST_FOR_EACH_SAFE (flow, n, struct sw_flow, node, &tl->flows) { + if (flow_timeout(flow)) { + dp_send_flow_expired(dp, flow); + do_delete(flow); + count++; + } + } + tl->n_flows -= count; + return count; +} + +static void table_linear_destroy(struct sw_table *swt) +{ + struct sw_table_linear *tl = (struct sw_table_linear *) swt; + + while (!list_is_empty(&tl->flows)) { + struct sw_flow *flow = CONTAINER_OF(list_front(&tl->flows), + struct sw_flow, node); + list_remove(&flow->node); + flow_free(flow); + } + free(tl); +} + +/* Linear table's private data is just a pointer to the table */ + +static int table_linear_iterator(struct sw_table *swt, + struct swt_iterator *swt_iter) +{ + struct sw_table_linear *tl = (struct sw_table_linear *) swt; + + swt_iter->private = tl; + + if (!tl->n_flows) + swt_iter->flow = NULL; + else + swt_iter->flow = CONTAINER_OF(list_front(&tl->flows), struct sw_flow, node); + + return 1; +} + +static void table_linear_next(struct swt_iterator *swt_iter) +{ + struct sw_table_linear *tl; + struct list *next; + + if (swt_iter->flow == NULL) + return; + + tl = (struct sw_table_linear *) swt_iter->private; + + next = swt_iter->flow->node.next; + if (next == &tl->flows) + 
swt_iter->flow = NULL; + else + swt_iter->flow = CONTAINER_OF(next, struct sw_flow, node); +} + +static void table_linear_iterator_destroy(struct swt_iterator *swt_iter) +{} + +static void table_linear_stats(struct sw_table *swt, + struct sw_table_stats *stats) +{ + struct sw_table_linear *tl = (struct sw_table_linear *) swt; + stats->name = "linear"; + stats->n_flows = tl->n_flows; + stats->max_flows = tl->max_flows; +} + + +struct sw_table *table_linear_create(unsigned int max_flows) +{ + struct sw_table_linear *tl; + struct sw_table *swt; + + tl = calloc(1, sizeof *tl); + if (tl == NULL) + return NULL; + + swt = &tl->swt; + swt->lookup = table_linear_lookup; + swt->insert = table_linear_insert; + swt->delete = table_linear_delete; + swt->timeout = table_linear_timeout; + swt->destroy = table_linear_destroy; + swt->stats = table_linear_stats; + + swt->iterator = table_linear_iterator; + swt->iterator_next = table_linear_next; + swt->iterator_destroy = table_linear_iterator_destroy; + + tl->max_flows = max_flows; + tl->n_flows = 0; + list_init(&tl->flows); + + return swt; +} diff --git a/switch/table-mac.c b/switch/table-mac.c new file mode 100644 index 00000000..716dca44 --- /dev/null +++ b/switch/table-mac.c @@ -0,0 +1,278 @@ +/* Copyright (C) 2008 Board of Trustees, Leland Stanford Jr. University. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +#include "table.h" +#include +#include +#include +#include "crc32.h" +#include "switch-flow.h" +#include "openflow.h" +#include "datapath.h" + +struct sw_table_mac { + struct sw_table swt; + struct crc32 crc32; + unsigned int n_flows; + unsigned int max_flows; + unsigned int bucket_mask; /* Number of buckets minus 1. */ + struct list *buckets; +}; + +static struct list *find_bucket(struct sw_table *swt, + const struct sw_flow_key *key) +{ + struct sw_table_mac *tm = (struct sw_table_mac *) swt; + unsigned int crc = crc32_calculate(&tm->crc32, key, sizeof *key); + return &tm->buckets[crc & tm->bucket_mask]; +} + +static struct sw_flow *table_mac_lookup(struct sw_table *swt, + const struct sw_flow_key *key) +{ + struct list *bucket = find_bucket(swt, key); + struct sw_flow *flow; + LIST_FOR_EACH (flow, struct sw_flow, node, bucket) { + if (!memcmp(key->flow.dl_src, flow->key.flow.dl_src, 6)) { + return flow; + } + } + return NULL; +} + +static int table_mac_insert(struct sw_table *swt, struct sw_flow *flow) +{ + struct sw_table_mac *tm = (struct sw_table_mac *) swt; + struct list *bucket; + struct sw_flow *f; + + /* MAC table only handles flows that match on Ethernet + source address and wildcard everything else. 
*/ + if (flow->key.wildcards != (OFPFW_ALL & ~OFPFW_DL_SRC)) + return 0; + bucket = find_bucket(swt, &flow->key); + + LIST_FOR_EACH (f, struct sw_flow, node, bucket) { + if (!memcmp(f->key.flow.dl_src, flow->key.flow.dl_src, 6)) { + list_replace(&flow->node, &f->node); + flow_free(f); + return 1; + } + } + + /* Table overflow? */ + if (tm->n_flows >= tm->max_flows) { + return 0; + } + tm->n_flows++; + + list_push_front(bucket, &flow->node); + return 1; +} + +static void +do_delete(struct sw_flow *flow) +{ + list_remove(&flow->node); + flow_free(flow); +} + +/* Returns number of deleted flows. */ +static int table_mac_delete(struct sw_table *swt, + const struct sw_flow_key *key, int strict) +{ + struct sw_table_mac *tm = (struct sw_table_mac *) swt; + + if (key->wildcards == (OFPFW_ALL & ~OFPFW_DL_SRC)) { + struct sw_flow *flow = table_mac_lookup(swt, key); + if (flow) { + do_delete(flow); + tm->n_flows--; + return 1; + } + return 0; + } else { + unsigned int i; + int count = 0; + for (i = 0; i <= tm->bucket_mask; i++) { + struct list *bucket = &tm->buckets[i]; + struct sw_flow *flow, *n; + LIST_FOR_EACH_SAFE (flow, n, struct sw_flow, node, bucket) { + if (flow_del_matches(&flow->key, key, strict)) { + do_delete(flow); + count++; + } + } + } + tm->n_flows -= count; + return count; + } +} + +static int table_mac_timeout(struct datapath *dp, struct sw_table *swt) +{ + struct sw_table_mac *tm = (struct sw_table_mac *) swt; + unsigned int i; + int count = 0; + + for (i = 0; i <= tm->bucket_mask; i++) { + struct list *bucket = &tm->buckets[i]; + struct sw_flow *flow, *n; + LIST_FOR_EACH_SAFE (flow, n, struct sw_flow, node, bucket) { + if (flow_timeout(flow)) { + dp_send_flow_expired(dp, flow); + do_delete(flow); + count++; + } + } + } + tm->n_flows -= count; + return count; +} + +static void table_mac_destroy(struct sw_table *swt) +{ + struct sw_table_mac *tm = (struct sw_table_mac *) swt; + unsigned int i; + for (i = 0; i <= tm->bucket_mask; i++) { + struct list *list = &tm->buckets[i]; +
while (!list_is_empty(list)) { + struct sw_flow *flow = CONTAINER_OF(list_front(list), + struct sw_flow, node); + list_remove(&flow->node); + flow_free(flow); + } + } + free(tm->buckets); + free(tm); +} + +struct swt_iterator_mac { + struct sw_table_mac *tm; + unsigned int bucket_i; +}; + +static struct sw_flow *next_head_flow(struct swt_iterator_mac *im) +{ + for (; im->bucket_i <= im->tm->bucket_mask; im->bucket_i++) { + struct list *bucket = &im->tm->buckets[im->bucket_i]; + if (!list_is_empty(bucket)) { + return CONTAINER_OF(list_front(bucket), struct sw_flow, node); + } + } + return NULL; +} + +static int table_mac_iterator(struct sw_table *swt, + struct swt_iterator *swt_iter) +{ + struct swt_iterator_mac *im; + + swt_iter->private = im = malloc(sizeof *im); + if (im == NULL) + return 0; + + im->tm = (struct sw_table_mac *) swt; + + if (!im->tm->n_flows) + swt_iter->flow = NULL; + else { + im->bucket_i = 0; + swt_iter->flow = next_head_flow(im); + } + + return 1; +} + +static void table_mac_next(struct swt_iterator *swt_iter) +{ + struct swt_iterator_mac *im; + struct list *next; + + if (swt_iter->flow == NULL) + return; + + im = (struct swt_iterator_mac *) swt_iter->private; + + next = swt_iter->flow->node.next; + if (next != &im->tm->buckets[im->bucket_i]) { + swt_iter->flow = CONTAINER_OF(next, struct sw_flow, node); + } else { + im->bucket_i++; + swt_iter->flow = next_head_flow(im); + } +} + +static void table_mac_iterator_destroy(struct swt_iterator *swt_iter) +{ + free(swt_iter->private); +} + +static void table_mac_stats(struct sw_table *swt, struct sw_table_stats *stats) +{ + struct sw_table_mac *tm = (struct sw_table_mac *) swt; + stats->name = "mac"; + stats->n_flows = tm->n_flows; + stats->max_flows = tm->max_flows; +} + +struct sw_table *table_mac_create(unsigned int n_buckets, + unsigned int max_flows) +{ + struct sw_table_mac *tm; + struct sw_table *swt; + unsigned int i; + + tm = calloc(1, sizeof *tm); + if (tm == NULL) + return NULL; + + assert(!(n_buckets & (n_buckets - 1))); + +
tm->buckets = malloc(n_buckets * sizeof *tm->buckets); + if (tm->buckets == NULL) { + printf("failed to allocate %u buckets\n", n_buckets); + free(tm); + return NULL; + } + for (i = 0; i < n_buckets; i++) { + list_init(&tm->buckets[i]); + } + tm->bucket_mask = n_buckets - 1; + + swt = &tm->swt; + swt->lookup = table_mac_lookup; + swt->insert = table_mac_insert; + swt->delete = table_mac_delete; + swt->timeout = table_mac_timeout; + swt->destroy = table_mac_destroy; + swt->stats = table_mac_stats; + + swt->iterator = table_mac_iterator; + swt->iterator_next = table_mac_next; + swt->iterator_destroy = table_mac_iterator_destroy; + + crc32_init(&tm->crc32, 0x04C11DB7); /* Ethernet CRC. */ + tm->n_flows = 0; + tm->max_flows = max_flows; + + return swt; +} diff --git a/switch/table.h b/switch/table.h new file mode 100644 index 00000000..5342d5ad --- /dev/null +++ b/switch/table.h @@ -0,0 +1,91 @@ +/* Copyright (C) 2008 Board of Trustees, Leland Stanford Jr. University. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +/* Individual switching tables. Generally grouped together in a chain (see + * chain.h). */ + +#ifndef TABLE_H +#define TABLE_H 1 + +struct sw_flow; +struct sw_flow_key; +struct datapath; + +/* Iterator through the flows stored in a table. */ +struct swt_iterator { + struct sw_flow *flow; /* Current flow, for use by client. */ + void *private; +}; + +/* Table statistics. */ +struct sw_table_stats { + const char *name; /* Human-readable name. */ + unsigned long int n_flows; /* Number of active flows. */ + unsigned long int max_flows; /* Flow capacity. */ +}; + +/* A single table of flows. */ +struct sw_table { + /* Searches 'table' for a flow matching 'key', which must not have any + * wildcard fields. Returns the flow if successful, a null pointer + * otherwise. */ + struct sw_flow *(*lookup)(struct sw_table *table, + const struct sw_flow_key *key); + + /* Inserts 'flow' into 'table', replacing any duplicate flow. Returns + * 1 if successful, 0 otherwise. Failure can be due to an + * over-capacity table or because the flow is not one of the kind that + * the table accepts. + * + * If successful, 'flow' becomes owned by 'table', otherwise it is + * retained by the caller. */ + int (*insert)(struct sw_table *table, struct sw_flow *flow); + + /* Deletes from 'table' any and all flows that match 'key'. + * If 'strict' is set, wildcards must match. Returns the + * number of flows that were deleted. */ + int (*delete)(struct sw_table *table, const struct sw_flow_key *key, + int strict); + + /* Performs timeout processing on all the flow entries in 'table'. + * Returns the number of flow entries deleted through expiration.
*/ + int (*timeout)(struct datapath *dp, struct sw_table *table); + + /* Destroys 'table', which must not have any users. */ + void (*destroy)(struct sw_table *table); + + int (*iterator)(struct sw_table *, struct swt_iterator *); + void (*iterator_next)(struct swt_iterator *); + void (*iterator_destroy)(struct swt_iterator *); + + /* Dumps statistics for 'table' into 'stats'. */ + void (*stats)(struct sw_table *table, struct sw_table_stats *stats); +}; + +struct sw_table *table_mac_create(unsigned int n_buckets, + unsigned int max_flows); +struct sw_table *table_hash_create(unsigned int polynomial, + unsigned int n_buckets); +struct sw_table *table_hash2_create(unsigned int poly0, unsigned int buckets0, + unsigned int poly1, unsigned int buckets1); +struct sw_table *table_linear_create(unsigned int max_flows); + +#endif /* table.h */