Ben Pfaff [Thu, 3 Sep 2009 19:51:51 +0000 (12:51 -0700)]
xenserver: Create vswitchd configuration file if it does not exist.
/etc/ovs-vswitchd.conf should always be there. Nevertheless, it is not
nice to entirely break vswitch if it is accidentally deleted. This commit
makes /etc/init.d/vswitch create an empty configuration file if it is
missing.
Bug #1821.
Ben Pfaff [Thu, 3 Sep 2009 19:58:25 +0000 (12:58 -0700)]
vswitchd: Explain why mirroring to a VLAN can cause network problems.
Bug #1963.
Ben Pfaff [Thu, 3 Sep 2009 18:53:31 +0000 (11:53 -0700)]
xenserver: Document all the /etc/sysconfig/vswitch settings.
Bug #1853.
Ben Pfaff [Wed, 2 Sep 2009 19:52:50 +0000 (12:52 -0700)]
rconn: Speed up in-band control connections, by caching the remote address.
In-band control needs to know the IP and port of the controller, so that
it can set up the correct flows to talk to that controller. Until now,
the rconn code has only made this available when a connection was actually
in progress. This means that, say, ARP packets will not be allowed through
when the rconn backs off. The same is true of packets sent by switches
that access the controller through this one.
This commit makes the rconn cache the remote IP and port and local IP
across connection attempts, improving the situation. In particular, it
reduces the overall amount of time that it takes to connect in my own
simple test case from over 10 seconds to about 2 seconds.
Justin Pettit [Mon, 31 Aug 2009 22:18:43 +0000 (15:18 -0700)]
in-band: Implement L3-based in-band control
Previously, in-band control was L2-based. This worked well when the
controller was on the same network segment as the switch. However, many
configurations are not set up this way. These changes allow a switch and
controller to be on different subnets.
This set of changes also fixes some problems related to passing DHCP
traffic as described in Bug #1618.
A full description of the reasoning and supported configurations of
in-band will be forthcoming.
Justin Pettit [Sat, 29 Aug 2009 23:02:56 +0000 (16:02 -0700)]
dpif: Add dpif_port_get_name call
Add ability to lookup a device name by its dpif port number.
Justin Pettit [Sat, 29 Aug 2009 22:58:40 +0000 (15:58 -0700)]
netdev: Add netdev_get_next_hop call
Add ability to determine the next hop IP address and device used to
reach a given host.
Justin Pettit [Thu, 16 Jul 2009 19:58:28 +0000 (12:58 -0700)]
Add ability for the datapath to match IP address in ARPs
The ability to match the IP addresses in ARP packets allows for fine-grained
control of ARP processing. Some forthcoming changes to allow in-band
control to operate over L3 requires this support if we don't want to
allow overly broad rules regarding ARPs to always be white-listed.
Unfortunately, OpenFlow does not support this sort of processing yet, so
we must treat OpenFlow ARP rules as having wildcarded those L3 fields.
Justin Pettit [Sat, 11 Jul 2009 00:21:43 +0000 (17:21 -0700)]
Return netmask along with IP address when querying through netdev
The call netdev_get_in4() now allows the caller to also retrieve the
associated netmask.
Justin Pettit [Wed, 8 Jul 2009 22:41:42 +0000 (15:41 -0700)]
secchan: Tighten in-band traffic always allowed into switch
In-band control sets up a bunch of invisible flows that allow the switch
and controller to communicate over OpenFlow. The rules may have been a
bit too permissive, since it allowed any traffic to reach the
connection's interface. This set of changes tries to tighten that to
only OpenFlow traffic and ARPs.
Justin Pettit [Thu, 27 Aug 2009 22:13:59 +0000 (15:13 -0700)]
in-band: Fix status checks that could prevent in-band updates
The method the status callback was using to retrieve the local and
remote MAC addresses pushed back the refresh timer. If this were done
frequently, it could prevent in-band control from updating its rules.
Justin Pettit [Sat, 29 Aug 2009 22:54:28 +0000 (15:54 -0700)]
ofproto: Cleanup bridge/dump-flows output
Add separator that was missing from the output of the "bridge/dump-flows"
command from ovs-appctl.
Justin Pettit [Mon, 31 Aug 2009 06:51:41 +0000 (23:51 -0700)]
netdev: Fix reversed arguments in netdev_recv warning.
Ben Pfaff [Tue, 1 Sep 2009 17:31:32 +0000 (10:31 -0700)]
datapath: Use hash table more tolerant of collisions for flow table.
The hash table used until now in the kernel datapath for storing the flow
table provides only two slots that a given flow can occupy. If both of
those slots are already full, for a given flow, then that flow cannot be
added at all and its packets must be handled entirely in userspace, taking
a performance hit. The code does attempt to compensate for this by making
the flow table rather large: 8 slots per flow actually in the flow table.
In practice, this is usually good enough, but some of the tests that we
have run show bad enough performance degradation or even timeouts of
various kinds that we want to implement something better.
This commit replaces the existing hash table by one with a completely
different design in which buckets are flexibly sized and can accept any
number of collisions. By use of suitable levels of indirection, this
design is both simple and RCU-compatible. I did consider other schemes,
but none of the ones that I came up with shared both of those two
properties.
This commit also adds kerneldoc comments for all of the flow table
non-static functions and data structures.
This has been lightly tested for correctness. It has not been tested for
performance.
Bug #1656. Bug #1851.
Ben Pfaff [Fri, 28 Aug 2009 20:05:48 +0000 (13:05 -0700)]
corekeeper: Always include PID in core dump names.
Some distributions automatically set /proc/sys/kernel/core_uses_pid to 1
and others leave it at its default setting of 0. That means that, with the
core_pattern that corekeeper was setting, on the former distributions the
PID would be included in core names and on the latter the PID would be
omitted. For consistency, this commit forces the PID to be in the core
file name in either case (note that putting %p in core_pattern causes
the core_uses_pid setting to be disregarded).
CC: Martin Casado <casado@nicira.com>
Ben Pfaff [Fri, 28 Aug 2009 21:59:42 +0000 (14:59 -0700)]
secchan: Avoid sending NetFlow packets for empty flows.
There is no value in sending out NetFlow messages when the byte counter
(hence, packet counter) is 0. This does not often happen, but it can in
corner cases where a flow gets installed but never sees any traffic before
it is uninstalled.
CC: Peter Balland <peter@nicira.com>
Ben Pfaff [Mon, 24 Aug 2009 18:06:34 +0000 (11:06 -0700)]
vswitchd: Mirror nothing, not everything, if mirror ports don't exist.
If all of the ports specified as mirror selection criteria actually do not
exist, then until now the bridge would mirror all incoming packets (on
specified VLAN(s), if any). This matches the behavior that occurs if no
mirror selection ports were specified at all, and so it makes a certain
amount of logical sense.
But it is far more likely that the user simply misspelled a port name, or
specified the name of a port that does not always exist. In fact we have
seen this behavior in practice when the controller has not caught up to
the switch's current configuration. So this commit changes the bridge to
instead disable a mirror if ports are specified and none of those ports
exist.
Bug #1904.
Ben Pfaff [Mon, 24 Aug 2009 17:42:44 +0000 (10:42 -0700)]
vswitchd: Avoid output port explosion with mirrors that output to VLANs.
compose_dsts() was updating the VLAN of packets sent to VLAN mirrors
before it changed the VLAN value, but of course it's the final VLAN value
that actually matters.
Thanks to Reid for his good work tracking this one down.
Bug #1898.
Justin Pettit [Tue, 25 Aug 2009 19:34:45 +0000 (12:34 -0700)]
mgmt: Cleanup handling of extended messages
OpenFlow has a maximum messages size of 65536 bytes, but management
messages can be greater than that. The management protocol's Extended
Data message is used to get around that limitation. This commit cleans
up some problems with our implementation and adds some additional
sanity-checking to received messages.
Related to vNetManager Bug #1843.
Justin Pettit [Tue, 25 Aug 2009 20:17:26 +0000 (13:17 -0700)]
datapath: Return EFBIG instead of EXFULL when no room in flow table
The EXFULL errno is only defined in Linux. While this datapath is
Linux-specific, the userspace that interacts with it is not.
Ben Pfaff [Fri, 21 Aug 2009 20:46:47 +0000 (13:46 -0700)]
netflow: Remove stray debug printf().
Ben Pfaff [Thu, 20 Aug 2009 22:39:01 +0000 (15:39 -0700)]
xenserver: Compute correct physical PIFs for VLANs on bonds.
Otherwise the bond device is considered the physical PIF of a VLAN-on-bond
PIF, and various bad stuff happens.
Ben Pfaff [Wed, 19 Aug 2009 22:59:18 +0000 (15:59 -0700)]
xenserver: Renice netback process to priority 0 by default.
Under heavy VM network load, we have observed that ovs-vswitchd can be
starved for CPU time, which prevents flows from being set up. This can
in turn cause connections to XAPI in Dom0 to time out (among other issues).
It is probably not necessary to renice netback all the way to priority 0
as done in this commit. That is simply the value that we have tested. QA
has not reported any ill side-effects of this choice of value (yet). One
reasonable alternative, should any problems be noticed, would be to leave
netback at its default -5 priority and simply boost ovs-vswitchd's priority
to say -6 or -7.
Bug #1656.
Ben Pfaff [Wed, 19 Aug 2009 21:49:18 +0000 (14:49 -0700)]
xenserver: Use = instead of == as operator for "test" in shell scripts.
The "test" program uses =, not ==, as the test for equality. Fortunately
most implementations are tolerant but it's better to follow the spec.
Ben Pfaff [Wed, 19 Aug 2009 21:14:40 +0000 (14:14 -0700)]
xenserver: Fix "brctl show" compatibility by introducing "brctl" wrapper.
Bug NIC-19, which reported that "brctl show" did not format its output in
the way expected by Citrix QA scripts, was believed fixed by commit
35c979bff4 "vswitchd: Support creating fake bond device interfaces."
Unfortunately, this commit was not tested on a XenServer before it was
committed. Due to differences in the actual test environment and the
XenServer environment, which have different versions of the bridge-utils
package that contains brctl, that commit did not fix the problem observed
by Citrix QA. In particular, the XenServer brctl uses sysfs to obtain
the information displayed by "brctl show", but the previous commit only
fixed up the information output by the bridge ioctls.
The natural way to fix this problem would be to fix up the sysfs support
as well. I started out along that path, but became bogged down in all
the details of the kernel sysfs.
This commit takes an alternate approach, by introducing a wrapper around
the system brctl binary that implements "brctl show" itself and delegates
all other functionality to the original binary (in a different location).
This will not fix tools that do not call into brctl, but to the best of
my knowledge there are no such tools used in the Citrix QA process.
Thanks to Justin and Reid for much feedback.
Bug NIC-19.
Ben Pfaff [Wed, 19 Aug 2009 19:59:56 +0000 (12:59 -0700)]
xenserver: Completely ignore datapath devices for renaming purposes.
Commit
2bb451b69 "xenserver: Rename network devices to match MAC addresses
of physical PIFs" started renaming network devices so that they match
the MAC address that we expect them to have. This worked OK at the time.
Commit
35c979bff "vswitchd: Support creating fake bond device interfaces"
later started creating fake bond devices to make the Citrix QA scripts
happier.
Unfortunately these commits interact badly: the bond devices created by
the latter commit are sometimes chosen as the physical devices to be
renamed over the physical PIF device names. This is because we do allow
datapath internal ports to be chosen as "physical devices" as a last
resort. This commit reverses this decision, eliminating that possibility.
This probably won't become a problem unless somehow we encounter a physical
Ethernet card driver that lacks a queue, but that is unlikely since the
performance would be awful.
Ben Pfaff [Wed, 19 Aug 2009 19:51:27 +0000 (12:51 -0700)]
datapath: Additional fixes for datapath device renaming.
Commit
c874dc6d6b "secchan: Fix behavior when a network device is renamed."
fixed a crash in the datapath when network devices within a datapath were
renamed. However, this missed the case where the device that was renamed
was a datapath's internal port: these devices have their br_port members
set to NULL, so we have to determine that they belong to a datapath another
way. This commit does so.
This commit also changes the initialization order in dp_dev_create().
Otherwise, dp_device_event() will dereference null when it is called via
register_netdevice(), because the newly created device is a datapath device
but its members are not yet initialized.
Ben Pfaff [Mon, 17 Aug 2009 16:30:29 +0000 (09:30 -0700)]
ovs-brcompatd: Don't include the local port in BRCTL_GET_PORT_LIST output.
The BRCTL_GET_PORT_LIST ioctl is not supposed to include the bridge port
itself in the list of ports, but ovs-brcompatd was doing that.
Ben Pfaff [Fri, 14 Aug 2009 20:44:27 +0000 (13:44 -0700)]
ovs-brcompatd: Fix memory leak.
Ben Pfaff [Fri, 14 Aug 2009 20:44:06 +0000 (13:44 -0700)]
ovs-brcompatd: Fix use of uninitialized svec.
Ben Pfaff [Fri, 14 Aug 2009 20:41:44 +0000 (13:41 -0700)]
datapath: Improve comments.
Justin Pettit [Mon, 17 Aug 2009 19:43:59 +0000 (12:43 -0700)]
xenserver: Add missing argument for fake-iface config
The interface-reconfigure script may add the 'fake-iface' configuration
option to the ovs-vswitchd.conf, but neglects to mention the interface.
This resulted in a "bonding.%s.fake-iface=true" line, which is clearly
wrong. This commit corrects that behavior.
Justin Pettit [Thu, 13 Aug 2009 22:14:39 +0000 (15:14 -0700)]
ovs-pki: Add uniqueness to CA certs
When ovs-pki is used for CA cert generation, it generates certificates
that are identical except for the public key. If multiple controllers are
their own certificate authorities, the switch will receive multiple CA
certs that are identical other than their key. Unfortunately, OpenSSL
cannot distinguish between them. This is an excerpt of the
SSL_CTX_load_verify_locations function used by vconn-ssl:
Certificate matching is done based on the subject name, the key
identifier (if present), and the serial number as taken from the
certificate to be verified. If these data do not match, the next
certificate will be tried. If a first certificate matching the
parameters is found, the verification process will be performed; no
other certificates for the same parameters will be searched in case of
failure.
To work around this, we add a bit of uniqueness to each certificate. In
this commit, we add the generation time to the subject name. Please note
that the CN field is limited to 64 bytes, so a bit of name compression
needed to take place in order to fit the time.
Bug #1782
Justin Pettit [Tue, 11 Aug 2009 01:48:15 +0000 (18:48 -0700)]
mgmt: Local config changes can cause update failures from controller.
If ovs-vswitchd.conf is locally modified, but ovs-vswitchd is not told
to reload it, updates from the controller will be refused. This is
because we attempt to lock the file with SHA-1 snapshot of the config
file. Since the hashes will not match, we will never be able to lock
the file, and all remote updates will fail. There is not much that can
be done about this, since we don't want to presume the current state of
the file is correct, since it could be in the process of being updated.
With this commit, we attempt to detect this problem and log a message
describing how to rectify it.
Bug #1516
Justin Pettit [Tue, 11 Aug 2009 01:09:33 +0000 (18:09 -0700)]
secchan: Clarify log message about failing open.
In commit
e10dfcf357 "rconn: Be pickier about what constitutes a
successful connection", the criteria for determining a successful
OpenFlow connection was tightened. When rconn connects at a socket
level, it prints messages stating that it "connected" and the switch is
taken out of fail-open mode. If it is determined that it is not a
"successful" OpenFlow connection, then the connection is closed and
fail-open is re-enabled. When this occurs, fail-open logs the following
potentially confusing message:
Could not connect to controller for XXX seconds, failing open
Where XXX is the number of seconds since the last "succesful" connection
rather than simple socket-level connection. This commit changes the
message to:
Could not establish OpenFlow channel to controller for XXX
seconds, failing open
Bug #1163
Justin Pettit [Mon, 10 Aug 2009 22:34:13 +0000 (15:34 -0700)]
XenServer: Increase ulimit on open file descriptors in init script
Open vSwitch allows up to 256 datapaths to be created. Each one
requires a few file descriptors. By default, XenServer limits each
process to 1024 file descriptors, which causes us to run out with roughly
140 datapaths. This change raises the limit to 4096, so we've got
additional wiggle room.
Feature #1820
Ben Pfaff [Mon, 10 Aug 2009 21:13:45 +0000 (14:13 -0700)]
xenserver: Configure MTU, Ethtool on PIFs in interface-reconfigure.
MTU and Ethtool settings on physical devices are supposed to come from
the PIF records, but we weren't configuring them at all.
Ben Pfaff [Fri, 7 Aug 2009 21:00:59 +0000 (14:00 -0700)]
xenserver: Obtain Ethtool, MTU, static routes from network instead of PIF.
Our version of interface-reconfigure was pulling the Ethtool, MTU,
and static route configuration for the local port from the PIF for the
local port, but these settings need to come from the network record
instead.
Bug #1798.
Ben Pfaff [Fri, 7 Aug 2009 21:02:50 +0000 (14:02 -0700)]
xenserver: Factor MTU, Ethtool into functions in interface-reconfigure.
interface-reconfigure needs to configure MTU and Ethtool settings not just
on the local port, as it currently does, but on the physical devices as
well. This commit factors out the code for this so that it can be called
from multiple places.
Ben Pfaff [Fri, 7 Aug 2009 20:51:33 +0000 (13:51 -0700)]
xenserver: Rename functions, variables in interface-reconfigure.
This commit adds "get_" prefixes to the physdev_names and physdev_pifs
functions' names, because I want to use those names as variable names,
and then renames variables appropriately.
Ben Pfaff [Fri, 7 Aug 2009 22:45:41 +0000 (15:45 -0700)]
xenserver: Fix typo in adding static routes in interface-reconfigure.
Ben Pfaff [Sat, 8 Aug 2009 00:02:34 +0000 (17:02 -0700)]
xenserver: Bring up physical devices before configuring local port.
If DHCP is in use, then the physical devices have to be up before we
configure the local port, otherwise the DHCP request will never hit the
wire and we have no hope of getting an IP address.
This problem has been here for a long time, but it was masked until
commit
c170afc1 "xenserver: Really take devices down in
interface-reconfigure." actually caused devices to go down and stay down.
Fixes bug #1809 "vswitch upgrade broke the xenserver".
Ben Pfaff [Fri, 7 Aug 2009 17:31:34 +0000 (10:31 -0700)]
brcompat: Remove no-longer-needed #includes.
Ben Pfaff [Fri, 7 Aug 2009 17:30:48 +0000 (10:30 -0700)]
brcompat: Remove requirement that that no datapaths exist at load time.
We previously required that brcompat_mod be loaded before any datapaths
were created, because creation and destruction of datapaths and ports
differed when brcompat_mod was loaded, but the latter is no longer the
case so there is no reason for the former anymore.
Ben Pfaff [Fri, 7 Aug 2009 17:33:41 +0000 (10:33 -0700)]
vswitchd: Support creating fake bond device interfaces.
Citrix QA scripts expect that "brctl show" shows a bond interface for each
bond that is added to a bridge. The only way to do that without modifying
brctl itself is to create an actual network device by that name, so this
commit adds a new bonding configuration key that causes an internal
device by the name of the bond to be created.
This feature is also necessary, but not sufficient, to allow XenCenter to
accurately show the link status and statistics of bridges (bug #1363).
This new configuration key is intentionally undocumented, because I don't
want anyone to use it.
Bug NIC-19.
Ben Pfaff [Fri, 7 Aug 2009 22:11:40 +0000 (15:11 -0700)]
brcompat: Move BRCTL_GET_BRIDGES, BRCTL_GET_PORT_LIST into userspace.
The Citrix QA scripts assume that "brctl show" will show a topology as if
the Linux bridge were still in use; that is, as if there were one bridge
per VLAN and as if bonds were network devices of their own instead of
separate devices. However, we were showing the datapath topology, i.e.
all VLANs and bond devices lumped together into a single datapath. This
commit fixes the VLAN end of the problem, by moving the implementation of
the ioctls that brctl uses into userspace in ovs-brcompatd and putting the
necessary translation logic into ovs-brcompatd.
By itself, this commit does not fix the problem for bonds: the port name
for a bond does not (normally) under Open vSwitch exist as an actual
Linux network device, and thus it has no ifindex, and so ovs-brcompatd
can't pass it back to the kernel to report to brctl. This fix will have
to wait for another commit.
Bug NIC-19.
Ben Pfaff [Thu, 30 Jul 2009 20:56:56 +0000 (13:56 -0700)]
brcompatd: Factor code out of handle_fdb_query_cmd().
An upcoming commit wants to do the same thing in another place, so break
the logic into a function.
Ben Pfaff [Thu, 30 Jul 2009 20:55:24 +0000 (13:55 -0700)]
brcompatd: Fix handle_fdb_query_cmd() return value on error.
handle_fdb_query_cmd() should return an error when an error occurs, but
in this case it was always returning 0, because error is known to have
value 0 at this point.
Nothing really uses the return value here, and so eventually it would make
sense to just change the return type here and in the rest of the
handle_*() functions to void.
Ben Pfaff [Thu, 30 Jul 2009 20:52:27 +0000 (13:52 -0700)]
brcompatd: Break send_reply() up into more functions for flexibility.
Upcoming commits will require sending Netlink messages to the kernel that
have additional attributes. Instead of adding more arguments to
send_reply() to handle these, it's cleaner to break up send_reply() into
two functions and let the caller add those attributes itself.
Ben Pfaff [Thu, 30 Jul 2009 20:45:52 +0000 (13:45 -0700)]
brcompatd: Make parse_command() parse commands without dp arguments.
The BRCTL_GET_BRIDGES ioctl is going to move to userspace, but that ioctl
doesn't provide a bridge name as argument, so we need to support that.
Ben Pfaff [Thu, 30 Jul 2009 20:41:21 +0000 (13:41 -0700)]
svec: New convenience macro SVEC_FOR_EACH.
Ben Pfaff [Fri, 7 Aug 2009 17:11:01 +0000 (10:11 -0700)]
datapath: Unexport functions only used in openvswitch_mod.
Ben Pfaff [Fri, 7 Aug 2009 21:46:58 +0000 (14:46 -0700)]
xenserver: Fix infinite recursion in get_netdev_by_mac.
Oops.
Justin Pettit [Wed, 5 Aug 2009 23:02:19 +0000 (16:02 -0700)]
XenServer: Add knowledge of vswitch to xen-bugtool
The xen-bugtool tool gathers information that may be useful in debugging
problems. This commit modifies that script to add support for vswitch.
Currently, the information we collect is:
- /etc/ovs-vswitchd.conf
- ovs-dpctl show
- Cores in /var/xen/vswitch
- ovs-ofctl show <bridges>
- ovs-ofctl status <bridges>
- ovs-ofctl dump-flows <bridges>
- ovs-dpctl dump-flows <bridges>
This commit also modifies the way cores are handled. They are now
enabled by default and placed in "/var/xen/vswitch".
Feature #1570
Justin Pettit [Wed, 5 Aug 2009 21:20:24 +0000 (14:20 -0700)]
daemon: Remove short options from daemon library
The daemon library provides a few short options, but these then take
away their availability from programs that wish to use the library.
Since the daemon options are generally going to be called from a script
(which doesn't care how much typing is involved), we'll only provide
long options.
Justin Pettit [Wed, 5 Aug 2009 21:16:38 +0000 (14:16 -0700)]
secchan: Remove mention of "-f" flag as being equivalent "--fail"
In an error message, it mentions a required argument to "-f or --fail",
but "-f" is not a short form of "--fail".
Justin Pettit [Wed, 5 Aug 2009 05:41:46 +0000 (22:41 -0700)]
daemon: Provide option to not chdir to root
By default, Open vSwitch daemons change their working directories to the
root directory. This commit provides a --no-chdir option to prevent this
behavior.
Justin Pettit [Tue, 4 Aug 2009 22:15:48 +0000 (15:15 -0700)]
ovs-dpctl: Remove UNUSED attributed in do_show
The do_show() function declared the argc argument as UNUSED, but it
always is.
Justin Pettit [Tue, 4 Aug 2009 22:13:40 +0000 (15:13 -0700)]
ovs-dpctl: Add dump-dps command
The "dump-dps" command prints the name of each datapath on a separate
line.
Ian Campbell [Thu, 6 Aug 2009 20:39:24 +0000 (13:39 -0700)]
xenserver: Cope gracefully with non-integer MTU in interface-reconfigure.
CP-1148.
Ian Campbell [Thu, 6 Aug 2009 21:11:10 +0000 (14:11 -0700)]
xenserver: Log attempts to enable promiscuous mode for bridge ports on vif plug.
This functionality is replaced by vSwitch support for SPAN/RSPAN.
CP-1148.
Ian Campbell [Thu, 6 Aug 2009 20:34:56 +0000 (13:34 -0700)]
xenserver: Update copyright on interface-configure and vif hotplug script.
CP-1148.
Ben Pfaff [Fri, 7 Aug 2009 00:01:53 +0000 (17:01 -0700)]
xenserver: Rename network devices to match MAC addresses of physical PIFs.
XenServer does not rely on Linux to keep the naming of network devices
stable from one boot to the next. Instead, it requires
interface-reconfigure to ensure that network devices are named such that
they have the MAC address specified for the corresponding physical PIF
in the xapi database.
At one point, we fulfilled this requirement by calling out to the Centos
ifup/ifdown scripts, which rename netdevs as necessary to match the
"HWADDR=" lines in /etc/sysconfig/network-scripts/ifcfg-<devname>. When
we rewrote interface-reconfigure not to use those scripts, however, we
accidentally dropped that support. This commit adds back in that renaming.
Bug NIC-20.
Ben Pfaff [Wed, 5 Aug 2009 21:53:48 +0000 (14:53 -0700)]
xenserver: Add new helper function to interface-reconfigure.
This will be useful in the followin commit.
Ben Pfaff [Wed, 5 Aug 2009 21:50:54 +0000 (14:50 -0700)]
vswitchd: Fix use of uninitialized variable in bridge_pick_local_hw_addr().
When a port's MAC is explicitly specified in the config file, we did not
initialize 'iface' and therefore later we could dereference a wild pointer.
This commit fixes the problem.
Ben Pfaff [Wed, 5 Aug 2009 21:47:16 +0000 (14:47 -0700)]
secchan: Fix behavior when a network device is renamed.
update_port() deals with the case where we have been notified that a
network device with a given name, that is part of the datapath, has changed
in some way. In particular it breaks the problem space up into ports that
have been added, deleted, or modified.
But the code here deals badly with the case where the only change is that
the network device associated with a port has been renamed (which is
reported to it with 'devname' as the network device's new named): it
looks up devname in the ofproto's index by name and doesn't find it, then
it looks up the port number assigned to the netdev in the ofproto's index
by datapath index and sees that there already is one. This makes it
think that it's a new port, but with a port number that conflicts with an
existing port (under the old name for the port), which makes it discard
the notification and keep the old netdev name, and so afterward nothing
on the netdev will work since it still has the old netdev name.
This rewrite fixes the problem and simplifies the code.
Ben Pfaff [Wed, 5 Aug 2009 21:36:21 +0000 (14:36 -0700)]
datapath: Update sysfs links when network devices are renamed.
We create symlinks from /sys/class/net/<bridgename>/brif/<devname> to
/sys/class/net/<devname>/brport, but until now we have never updated the
links when network devices are renamed. This commit fixes this problem.
(Only the <devname> in /sys/class/net/<bridgename>/brif/<devname> needs to
be updated. Symlinks within sysfs have stable targets; that is, no matter
how the object that a sysfs symlink points to moves around, the link is
still maintained correctly.)
Ben Pfaff [Wed, 5 Aug 2009 22:22:25 +0000 (15:22 -0700)]
datapath: Fix OOPS when dp_sysfs_add_if() fails.
Until now, when dp_sysfs_add_if() failed, the caller ignored the failure.
This is a minor problem, because everything else should continue working,
without sysfs entries for the interface, in theory anyhow. In actual
practice, the error exit path of dp_sysfs_add_if() does a kobject_put(),
and that kobject_put() calls release_nbp(), so that the new port gets
freed. The next reference to the new port (usually in an ovs-vswitchd call
to the ODP_PORT_LIST ioctl) will then use the freed data and probably OOPS.
The fix is to make the datapath code, as opposed to the sysfs code,
responsible for creating and destroying the net_bridge_port kobject. The
dp_sysfs_{add,del}_if() functions then just attach and detach the kobject
to sysfs and their cleanup routines no longer need to destroy the kobject
and indeed we don't care whether dp_sysfs_add_if() really succeeds.
This commit also makes the same transformation to the datapath's ifobj,
for consistency.
It is easy to trigger the OOPS fixed by this commit by adding a network
device A to a datapath, then renaming network device A to B, then renaming
network device C to A, then adding A to the datapath. The last attempt to
add A will fail because a file named /sys/class/net/<datapath>/brif/A
already exists from the time that C was added to the datapath under the
name A.
This commit also adds some compatibility infrastructure, because it moves
code out of #ifdef SUPPORT_SYSFS and it otherwise wouldn't build.
Ben Pfaff [Wed, 5 Aug 2009 20:45:13 +0000 (13:45 -0700)]
datapath: Prepare to extend lifetime of kobjects.
The following commit will move the initialization of the datapath and
net_bridge_port kobjects earlier and the destruction later, without
changing when those kobjects are attached to sysfs. To do so, the
initialization of kobjects and attaching to sysfs has to be done as
separate steps. That's already the case for net_bridge_port kobjects, and
this commit makes it so for datapath kobjects too.
This commit also simplifies some code, since the split API exists both
before and after 2.6.25, but the combined functions changed names.
Also, in dp_sysfs_add_if() call kobject_init() after initializing the
kset member, since kobject_init() expects that. This makes no actual
difference in this case since the kobj is obtained from kzalloc(), but
it still seems better.
Ben Pfaff [Wed, 5 Aug 2009 19:56:23 +0000 (12:56 -0700)]
datapath: Rename brc_sysfs_* to dp_sysfs_*.
These files and names are now part of the datapath, not brcompat, so name
them appropriately so as not to confuse anyone.
Ben Pfaff [Wed, 5 Aug 2009 19:51:30 +0000 (12:51 -0700)]
datapath: Move sysfs support from brcompat_mod into openvswitch_mod.
In the past problems have arisen due to the different ways that datapaths
are created and destroyed in the three different cases:
1. sysfs supported, brcompat_mod loaded.
2. sysfs supported, brcompat_mod not loaded.
3. sysfs not supported.
The brcompat_mod loaded versus not loaded distinction is the stickiest
because we have to do all the calls into brcompat_mod through hook
functions, which in turn causes pressure to keep the number of hook
functions small and well-defined, which makes it really difficult to put
the hook call points at exactly the right place. Witness, for example,
this piece of code in datapath.c:
int dp_del_port(struct net_bridge_port *p)
{
ASSERT_RTNL();
#ifdef SUPPORT_SYSFS
if (p->port_no != ODPP_LOCAL && dp_del_if_hook)
sysfs_remove_link(&p->dp->ifobj, p->dev->name);
#endif
The code inside the #ifdef is logically part of the brcompat_mod sysfs
support, but the author of this code (quite reasonably) didn't want to
add a hook function call there. After all, what would you call the
hook function? There's no obvious name from the dp_del_port() caller's
perspective.
All this argues that sysfs support should be in openvswitch_mod itself,
since it has to be tightly integrated, not bolted on. So this commit
moves it there.
Now, this is not to say that openvswitch_mod should actually be
implementing bridge-compatible sysfs. In the future, it probably should
not be; rather, it should implement something appropriate for Open vSwitch
datapaths instead. But right now we have bridge-compatible sysfs, and so
that's what this commit moves.
Ben Pfaff [Wed, 5 Aug 2009 18:39:03 +0000 (11:39 -0700)]
xenserver: Really take devices down in interface-reconfigure.
When down_netdev was called with deconfigure=True (which is the default),
it would invoke, e.g. "/sbin/ifconfig eth0 down 0.0.0.0", with the
intention of taking the interface down and removing any IP address from it
at the same time.
In fact, this removed any IP address from it and brought the device *up*,
because ifconfig executes its commands in the order that they are
specified, and setting or unsetting an IP address brings a device up.
We could specify the commands in the opposite order ("0.0.0.0 down") but
it seems to me more obviously correct to just run ifconfig twice, so that
is what this commit does.
This commit could break things, because it could be that there is a bug
elsewhere that depends on down_netdev not actually bringing a device down,
but it is needed for the upcoming device renaming commit (to fix bug
NIC-20), because a network device has to be down to be renamed.
Ben Pfaff [Wed, 5 Aug 2009 20:09:15 +0000 (13:09 -0700)]
brcompat: Add comments to sysfs code.
I got tired of figuring out over and over what these function calls do.
Ben Pfaff [Thu, 6 Aug 2009 17:33:26 +0000 (10:33 -0700)]
vswitchd: Fix logged warnings for new internal ports.
Justin reported that adding an internal port to a bridge caused
ovs-vswitchd to log a pair of warnings. This commit suppresses those
warnings, which were harmless.
Justin Pettit [Sat, 1 Aug 2009 07:09:56 +0000 (00:09 -0700)]
datapath: Support jumbo frames in the datapath device
The datapath has no problems switching jumbo frames (frames with a payload
greater than 1500 bytes), but it has not supported sending and receiving
them to the device itself. With this commit, the MTU can be set as large
as the minimum MTU size of the devices that are directly attached, or 1500
bytes if there are none. This mimics the behavior of the Linux bridge.
Feature #1736
Justin Pettit [Sat, 1 Aug 2009 08:03:23 +0000 (01:03 -0700)]
vswitchd: Initialize cfg properly and check return values
A previous checkin added the cfg_init() function, so we now call it. We
also check the return value of the initial call to cfg_read(), since if
it fails, there's not much point in continuing.
Justin Pettit [Sat, 1 Aug 2009 07:58:31 +0000 (00:58 -0700)]
cfg: Terminate cfg to prevent crashes
If cfg_* accessor calls were made before cfg_read() was called (or it
returned error), they could cause segfault. This checkin terminates the
cfg structure in such a way that will prevent these run-time problems.
Bug #1693
Ben Pfaff [Wed, 29 Jul 2009 16:56:46 +0000 (09:56 -0700)]
vswitchd: Update /proc/net/bonding when bonded port properties change.
Until now ovs-vswitchd has created the files in /proc/net/bonding, but not
updated them, because there was little need. But the Citrix QA tests check
that the list of bond hashes in that file is kept up-to-date, so we need
to update them whenever the bond hashes (or other data in the file) change.
This commit does that.
Bug NIC-16.
Ben Pfaff [Wed, 29 Jul 2009 00:10:10 +0000 (17:10 -0700)]
vswitchd: Add bond hashes to /proc/net/bonding compatibility layer.
The Citrix QA scripts require the bond hashes and their assigned devices
to be noted in /proc/net/bonding. We weren't doing that, so this commit
adds them.
Bug NIC-16.
Justin Pettit [Thu, 30 Jul 2009 06:37:28 +0000 (23:37 -0700)]
XenServer: Correct license in spec file
The license file indicated that the software is licensed under the GPLv3
license. This commit corrects that to state that it's licensed under
the Apache 2.0 license with the exception of the "datapath" directory,
which is GPLv2. The style and abbreviations were from the following
page:
http://fedoraproject.org/wiki/Packaging/LicensingGuidelines
Justin Pettit [Tue, 14 Jul 2009 20:03:57 +0000 (13:03 -0700)]
vswitchd: Add unixctl command to dump all flows, including hidden ones
Previously, the only way to query the flow table was to run "ovs-ofctl
dump-flows". This returned most flows, but not those marked hidden by
secchan. Hidden flows are setup by mechanisms such as in-band control,
since they must not be modified by users of the controller. However,
when debugging problems on the switch, it is often useful to see what
the flow table is actually doing. The new "bridge/dump-flows" command
added to ovs-appctl shows all flows being used by the OpenFlow stack.
Justin Pettit [Thu, 16 Jul 2009 22:02:53 +0000 (15:02 -0700)]
Fix DHCP request source port and add port #define's
DHCP requests were sent with a source port of 66, when it should be 68.
This code has been tested, so apparently many DHCP servers don't pay
attention to the source port. This commit also adds #define's for the
DHCP ports, so that magic numbers don't need to be used.
Justin Pettit [Thu, 16 Jul 2009 22:21:47 +0000 (15:21 -0700)]
Fix tab/space issue in datapath-protocol.h
Justin Pettit [Wed, 8 Jul 2009 21:51:47 +0000 (14:51 -0700)]
vswitchd: Remove non-used function to prevent compiler warnings
The function send_ofmp_error_msg() is not currently used, so it was
producing compiler warnings that it was defined. Remove the function
until it's needed.
Justin Pettit [Wed, 8 Jul 2009 21:43:29 +0000 (14:43 -0700)]
Cleanup warnings about "save_ptr" in strtok_r argument
The compiler warns about the "save_ptr" argument to strtok_r being
unitialized. This cleans that up.
Ben Pfaff [Wed, 29 Jul 2009 18:31:07 +0000 (11:31 -0700)]
Reduce default probe interval to 5 seconds (and fail-open timeout to 15).
Before now, the default probe interval (the idle time after which an echo
request is sent on an OpenFlow connection) was set to 15 seconds. The
fail-open timeout is 3 times the probe interval, so this meant that it
took 45 seconds for a switch to fail open.
Users at Nicira have commented that this is too long. They don't like the
idea that the network will be down for most of a minute before it begins to
recover. So this commit changes the default probe interval to 5 seconds,
hence the fail-open timeout to 15 seconds.
Ben Pfaff [Tue, 28 Jul 2009 17:21:03 +0000 (10:21 -0700)]
Reduce default maximum connection timeout from 15 seconds to 8 seconds.
Users at Nicira have commented that a maximum reconnection time of 15
seconds, which was the default, is too long. This commit cuts it to 8
seconds, on the theory that an administrator is willing to wait that long
before deciding that a change that should restore connectivity did not
work.
Ben Pfaff [Tue, 28 Jul 2009 22:35:57 +0000 (15:35 -0700)]
xenserver: Do not set or remove vSwitchVersion xapi parameter anymore.
The other-config:vSwitchVersion parameter was used to announce the Open
vSwitch version installed on a XenServer host, but this had the problem
that it could not be read or updated if the connection to the pool master
was down. Because of this problem, the only user of this parameter in
Open vSwitch was removed (in commit
3cdc31a4c3a "xenserver: Retrieve
vSwitch version from binary in xsconsole"). So this commit finished the
process and removes the parameter entirely.
This should fix hangs on Open vSwitch installation and removal due to
waiting on the connection to the pool master.
Ben Pfaff [Wed, 22 Jul 2009 19:21:25 +0000 (12:21 -0700)]
xenserver: Honor the MAC address specified in xapi database for bonds.
The xapi database for PIFs specifies the MAC address that should be used
for bonds, but interface-reconfigure didn't honor it and ovs-vswitchd
didn't have a way to configure it anyhow. This commit fixes both problems.
Bug #1645.
Ben Pfaff [Tue, 28 Jul 2009 22:44:58 +0000 (15:44 -0700)]
xenserver: Enable ARP filtering to work around xhad bug.
This works around a bug in xhad, which binds to a particular Ethernet
device, which in turn causes ICMP port unreachable messages if packets are
received are on the wrong interface, which in turn can happen if we send
out ARP replies on every interface (as Linux does by default) instead of
just on the interface that has the IP address being ARPed for, which this
sysctl setting in turn works around.
Justin Pettit did most of the work tracking down the origin of this bug.
Bug #1378.
Justin Pettit [Tue, 28 Jul 2009 22:24:32 +0000 (15:24 -0700)]
xenserver: Retrieve vSwitch version from binary in xsconsole
The xsconsole plugin shows status information about Open vSwitch. The
version information was retrieved from XAPI, but this could cause
problems. The most easily reproduced is to make a XenServer part of a
pool, then remove it. The version string is no longer in the
XenServer's local XAPI view, so it reports "<unknown>". A more direct
way to get the information is to directly query the binary, which is
what this commit does.
Bug #1626
Justin Pettit [Tue, 28 Jul 2009 18:05:28 +0000 (11:05 -0700)]
Label the current "citrix" release 0.90.4.
Ben Pfaff [Wed, 15 Jul 2009 22:29:49 +0000 (15:29 -0700)]
Do not try to resolve DNS for OpenFlow controllers or netflow collectors.
Until now, setting a netflow collector to a DNS name would cause
secchan to attempt to resolve that DNS name each time that the set of
netflow collectors is re-set. For the vswitch, this is every time that
the vswitch reconfigures itself.
Unfortunately, DNS lookup within secchan cannot work as currently
implemented, because it needs both an asynchronous DNS resolver library
and in-band control updates. Currently we have neither. Attempting to
look up DNS anyway just hangs.
This commit disables DNS lookup entirely, and updates the documentation to
change user expectations. DNS still won't work, but at least it won't
hang.
Bug #1609.
Ben Pfaff [Fri, 17 Jul 2009 19:37:01 +0000 (12:37 -0700)]
xenserver: Fix creating, destroying bonds with the management connection.
Creating a bond from the network device that holds the Xen management
connection automatically transfers the management connection to that bond.
However, we weren't properly removing the IP address from the network
devices that constituted the bond. This commit fixes that problem.
Bug #1566.
Ben Pfaff [Thu, 16 Jul 2009 22:16:06 +0000 (15:16 -0700)]
brcompat: Make "brctl showmacs" honor Linux notion of bridge composition.
The kernel only handles a single VLAN per bridge, but vswitchd can deal
with all the VLANs on a single bridge. This commit makes "brctl showmacs"
pretend that the former is the case even though the latter is the
implementation.
Bug #1567.
Ben Pfaff [Thu, 16 Jul 2009 22:15:02 +0000 (15:15 -0700)]
vswitchd: Make "fdb/show" output more meaningful port numbers.
The "fdb/show" unixctl command was showing vswitch-internal port indexes,
which cannot be meaningfully interpreted by software outside vswitchd.
Also, they potentially change every time the vswitchd configuration file
changes. This commit changes it to use a datapath port index instead,
which are both more meaningful and more stable.
Ben Pfaff [Thu, 16 Jul 2009 22:07:05 +0000 (15:07 -0700)]
cfg: New function cfg_get_matches().
Ben Pfaff [Wed, 15 Jul 2009 19:46:06 +0000 (12:46 -0700)]
Implement "brctl showmacs" support in brcompat and ovs-brcompatd.
This is needed by the Citrix test infrastructure.
Ben Pfaff [Wed, 15 Jul 2009 19:40:16 +0000 (12:40 -0700)]
brcompat: Refactor infrastructure for communication with ovs-brcompatd.
In an upcoming change, brcompat will need to receive bulk data from
ovs-brcompatd. The existing brcompat infrastructure for receiving data
only supports returning an error code, so this commit adds the ability to
receive an arbitrary sk_buff and updates the existing users to handle it.
Ben Pfaff [Wed, 15 Jul 2009 19:09:43 +0000 (12:09 -0700)]
brcompatd: Factor code out of prune_ports().
An upcoming commit will also need to gather all of the interfaces on a
specified bridge, so break this code into a helper function.