Ben Pfaff [Thu, 14 May 2009 18:20:10 +0000 (11:20 -0700)]
datapath: Fix VLAN-related kernel OOPS on XenServer.
In deleting internal ports (other than the local port) we were failing to
call dp_del_if_hook even though we had called dp_add_if_hook when we added
it. This prevented the sysfs kobject from being released and caused the
wrong address to be passed to kfree. The former could cause random
memory corruption; the latter may be benign since the address was still in
the same slab object.
Ben Pfaff [Thu, 14 May 2009 16:08:20 +0000 (09:08 -0700)]
Apply temporary band-aid to VLAN-related OOPS on XenServer.
Now, VLAN devices will be disabled by default. To enable them, create a
file named /etc/vswitchd.enable-vlans.
This commit will be reverted when a real fix is available.
Ben Pfaff [Wed, 13 May 2009 22:11:37 +0000 (15:11 -0700)]
Move EZIO utilities from vswitchext into openvswitch.
Keith Amidon [Wed, 13 May 2009 21:48:42 +0000 (14:48 -0700)]
Fix for typo in warning message.
Ben Pfaff [Wed, 13 May 2009 21:17:09 +0000 (14:17 -0700)]
xenserver: Add comments describing open issues for interface-reconfigure.
Ben Pfaff [Wed, 13 May 2009 21:15:58 +0000 (14:15 -0700)]
xenserver: Fix --force up/down behavior in a resource pool.
The PIFs key of a network lists one PIF for each member of the pool, not
one PIF per bond or whatever I had in mind. So we need to iterate over
all the PIFs in the network and find the one for our current host.
Ben Pfaff [Wed, 13 May 2009 21:01:32 +0000 (14:01 -0700)]
Add support for Citrix XenServer.
This was previously in openflowext. Now we are adding it to openvswitch.
Ben Pfaff [Wed, 13 May 2009 19:14:46 +0000 (12:14 -0700)]
datapath: Fix build warnings and errors on Linux 2.6.15, 2.6.16, 2.6.17.
Ben Pfaff [Mon, 11 May 2009 21:26:44 +0000 (14:26 -0700)]
datapath: Add support for "internal" ports similar to the local port.
The datapath has supported a simulated "local port" for a long time, but it
has never been possible to create additional ports with the same
characteristics. One way to do this is using the veth driver, but this is
somewhat awkward, since there is no desire to create a pair of devices;
one suffices.
The immediate purpose for this feature is to allow an IP address to be put
on both a physical interface and a tagged VLAN attached to that interface
on Xen.
Justin Pettit [Wed, 13 May 2009 06:35:45 +0000 (23:35 -0700)]
Don't print warning about removing policy on startup.
The policing code attempts to delete any traffic control configuration on
startup, so that interfaces come up in a known state. If the interface
didn't have any traffic control configuration, this would cause it to
print a couple of scary sounding warning messages. This commit makes it
so those no longer print.
Justin Pettit [Wed, 13 May 2009 05:58:25 +0000 (22:58 -0700)]
Fix return value check on send() when sending NetFlow messages.
When sending NetFlow messages, we use the send() call, but were checking
the wrong return value. It would report an error when any non-zero
value was returned. The send() call returns the number of bytes sent or
-1 on error. Thus, whenever a NetFlow message was sent, it would
generate an error message. Now, we only log a message when a value of
-1 is returned. (Bug #1166)
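The check described above can be sketched as follows. This is only an illustration, not the NetFlow code itself: send_checked() and send_checked_demo() are hypothetical names, and a local socket pair stands in for the collector connection.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: sends 'len' bytes on 'fd'.  Returns 0 on success or
 * a positive errno value on failure. */
int
send_checked(int fd, const void *msg, size_t len)
{
    ssize_t n = send(fd, msg, len, 0);

    /* The buggy form, "if (n) { report error }", fires on every successful
     * send, because success returns the (nonzero) number of bytes sent.
     * Only -1 indicates an error. */
    if (n == -1) {
        return errno;
    }
    return 0;
}

/* Exercises send_checked() on a valid and an invalid descriptor.
 * Returns 1 if both results are as expected, otherwise 0. */
int
send_checked_demo(void)
{
    int fds[2];
    int ok, bad;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, fds) == -1) {
        return 0;
    }
    ok = send_checked(fds[0], "netflow", 7);    /* Valid socket: 0. */
    close(fds[0]);
    close(fds[1]);
    bad = send_checked(-1, "netflow", 7);       /* Invalid fd: EBADF. */
    return ok == 0 && bad == EBADF;
}
```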
Ben Pfaff [Thu, 7 May 2009 00:35:40 +0000 (17:35 -0700)]
datapath: Call rcu_barrier() before unloading module.
According to the article "RCU and Unloadable Modules", available at lwn.net,
a module that uses RCU callbacks should call rcu_barrier() before
unloading, because synchronize_rcu() does not ensure that all RCU callbacks
have actually completed, only that a grace period has elapsed.
Ben Pfaff [Thu, 7 May 2009 00:14:37 +0000 (17:14 -0700)]
datapath: Always call dp_process_received_packet() with BHs disabled.
dp_process_received_packet() was assuming that bottom-halves were disabled,
but this was not true where it was called from dp_dev_do_xmit().
Also, add comments documenting synchronization.
Ben Pfaff [Fri, 8 May 2009 18:24:58 +0000 (11:24 -0700)]
datapath: Omit sysfs-specific data when sysfs is not enabled or not supported.
This saves a few bytes of memory but it also makes it clear to the reader
what data is used for what.
Ben Pfaff [Tue, 5 May 2009 21:23:32 +0000 (14:23 -0700)]
datapath: Omit SNAT-specific data when SNAT is not enabled.
This saves a few bytes of memory but it also makes it clear to the reader
what data is used for what.
Ben Pfaff [Fri, 8 May 2009 17:46:19 +0000 (10:46 -0700)]
brcompatd: Log high-level actions and their results.
brcompatd did not log the addbr, delbr, addif, and delif actions that it
was taking. This commit adds that logging.
Ben Pfaff [Tue, 12 May 2009 20:48:26 +0000 (13:48 -0700)]
cfg-mod: Add --changes option for logging configuration changes.
This makes it a lot easier to see what actually changed.
Ben Pfaff [Tue, 12 May 2009 20:41:02 +0000 (13:41 -0700)]
cfg-mod: Make --query print all values, not just those that are valid keys.
A "key" has a strict syntax, so calling cfg_get_all_keys() will discard
all the values that don't have that syntax.
Ben Pfaff [Fri, 8 May 2009 17:39:17 +0000 (10:39 -0700)]
cfg: Log changes to config, not whole config, in cfg_read().
The configuration file is re-read on a regular basis by brcompatd and
vswitchd in practice. When debug-level logging is enabled on the cfg
module, it was logging the entire config file each time. Not only is this
a waste of log-file space, it's difficult for humans to see what actually
changed, if anything.
So this commit changes cfg_read() to log a diff instead of the whole config
file.
Ben Pfaff [Thu, 7 May 2009 17:35:17 +0000 (10:35 -0700)]
cfg: Improve comment.
Justin Pettit [Tue, 12 May 2009 17:44:36 +0000 (10:44 -0700)]
Only send NetFlow notifications for IP traffic.
NetFlow only supports exporting information about IP. We were sending a
notification for any flow that expired, which included non-IP packets.
This would generate NetFlow messages with nearly all fields set to zero.
Now, we only send NetFlow for packets that are IP. (Bug #1256)
Ben Pfaff [Tue, 12 May 2009 17:25:37 +0000 (10:25 -0700)]
Remove the ChangeLog since it is no longer relevant for OpenVSwitch.
It might make perfect sense to start a new file here for OpenVSwitch, but
the first item should be something like "1 June 2009: Initial public
release".
Ben Pfaff [Tue, 12 May 2009 17:21:43 +0000 (10:21 -0700)]
Remove spanning tree documentation, since STP doesn't work right now.
Justin Pettit [Mon, 11 May 2009 23:33:08 +0000 (16:33 -0700)]
Update OpenFlow tcpdump patch to work with latest code.
With some of the recent (and not so recent) changes to the source code,
the OpenFlow patch to tcpdump came out of sync. This brings it back so
it works again.
Justin Pettit [Mon, 11 May 2009 23:01:14 +0000 (16:01 -0700)]
Rename strlcpy to ovs_strlcpy.
If strlcpy is not defined on the build system, we build our own and add
it to the OpenVSwitch library. Unfortunately, programs that link
against the library may do the same thing, and there will be a name
conflict. This renames our implementation to prevent these linking errors.
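For illustration, a prefixed replacement with the usual strlcpy semantics might look like the sketch below. The name sketch_ovs_strlcpy is hypothetical and the body is a generic implementation written for this note, not the library's own code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies 'src' into 'dst', writing at most 'size' bytes including the
 * terminating null byte.  Returns strlen(src), so a result >= size means
 * the copy was truncated.  The prefix keeps the symbol from clashing with
 * a strlcpy() supplied by the C library or by the application. */
size_t
sketch_ovs_strlcpy(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);

    if (size > 0) {
        size_t n = len < size - 1 ? len : size - 1;

        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}

/* Copies a 6-character name into a 4-byte buffer.  Returns 1 if the
 * result is truncated and terminated as expected, otherwise 0. */
int
sketch_strlcpy_demo(void)
{
    char buf[4];
    size_t r = sketch_ovs_strlcpy(buf, "bridge", sizeof buf);

    return r == 6 && strcmp(buf, "bri") == 0;
}
```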
Ben Pfaff [Mon, 11 May 2009 17:36:32 +0000 (10:36 -0700)]
Rename the project to OpenVSwitch and change version number to 0.90.0.
The Debian packages have not been renamed yet, since they need plenty of
other work at the moment too.
Ben Pfaff [Tue, 5 May 2009 20:59:17 +0000 (13:59 -0700)]
datapath: Remove hardware table support.
This support was broken anyhow. We have no immediate plans to fix it, so
it's better not to claim to support it.
Ben Pfaff [Wed, 6 May 2009 22:40:21 +0000 (15:40 -0700)]
datapath: Compare entire flow during lookup, not just first 4 or 8 bytes.
The size of a pointer is not the size of the referent.
Only God knows how much havoc this was wreaking.
(The change in dp_table_lookup is for conformance with kernel style only.)
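The class of bug fixed here can be shown with a small sketch. struct sketch_key is a hypothetical, much-reduced flow key, not the datapath's real structure; the point is only the difference between the size of a pointer and the size of what it points to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flow key: 16 bytes, no padding. */
struct sketch_key {
    uint32_t nw_src;
    uint32_t nw_dst;
    uint16_t tp_src;
    uint16_t tp_dst;
    uint16_t in_port;
    uint16_t dl_vlan;
};

/* Buggy: 'sizeof a' is the size of the pointer (4 or 8 bytes), so only
 * the first 4 or 8 bytes of the key are compared. */
int
sketch_key_equal_buggy(const struct sketch_key *a, const struct sketch_key *b)
{
    size_t n = sizeof a;        /* Size of the pointer, not of *a. */

    return !memcmp(a, b, n);
}

/* Fixed: 'sizeof *a' is the size of the whole key. */
int
sketch_key_equal(const struct sketch_key *a, const struct sketch_key *b)
{
    return !memcmp(a, b, sizeof *a);
}

/* Two keys that differ only at byte offset 10.  Returns 1 if the buggy
 * comparison reports them equal and the fixed one reports them different. */
int
sketch_key_demo(void)
{
    struct sketch_key a, b;

    memset(&a, 0, sizeof a);
    memset(&b, 0, sizeof b);
    a.tp_dst = 80;
    b.tp_dst = 443;
    return sketch_key_equal_buggy(&a, &b) == 1 && sketch_key_equal(&a, &b) == 0;
}
```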
Ben Pfaff [Wed, 6 May 2009 22:35:25 +0000 (15:35 -0700)]
datapath: Make sure that the "reserved" byte in user-provided flow is zero.
Otherwise we could return a "false negative" lookup result to the user.
(This is not known to fix any real bug; for it to do so, there would have
to be userspace code that doesn't initialize the "reserved" byte, but I
don't know of any.)
Ben Pfaff [Tue, 5 May 2009 20:26:08 +0000 (13:26 -0700)]
Fix complaint from "make distcheck" about failing to clean cfg-mod.8.
Ben Pfaff [Tue, 5 May 2009 20:22:49 +0000 (13:22 -0700)]
datapath: Remove support for Linux 2.4.
Ben Pfaff [Tue, 5 May 2009 18:47:36 +0000 (11:47 -0700)]
vswitch: Restore MAC learning for broadcast ARP replies on bonds.
Bonding has a special exception for MAC learning: don't learn from packets
on bonded ports if we already have learned it on another port. This is
because packets sent out one port can be received on the other, which would
cause us to learn incorrect locations.
But we need to make an exception for broadcast ARP replies, which indicate
that the MAC in question has moved to another switch. Before commit
76fdb7e57 "Implement OFPP_NORMAL action in secchan and hook into vswitchd"
we did so, and this commit restores that behavior.
Ben Pfaff [Tue, 5 May 2009 17:45:16 +0000 (10:45 -0700)]
vswitch: Eliminate dead code.
The bridge had a flow_idle_time member that was set to a constant value
and never modified. This commit removes it.
(secchan is now responsible for configuring the flow idle time, so it is
not desirable to revive this member.)
Ben Pfaff [Tue, 5 May 2009 17:23:02 +0000 (10:23 -0700)]
Add support for coverage counters.
This commit implements a simple form of coverage instrumentation. Points
in source code that are of interest must be explicitly annotated with
COVERAGE_INC. The coverage counters may be logged at any time with
coverage_log().
This form of coverage instrumentation is intended to be so lightweight that
it can be enabled in production builds. It is obviously not a substitute
for traditional coverage instrumentation with e.g. "gcov", but it is still
a useful debugging tool.
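A minimal sketch of the idea follows. It is not the implementation added by this commit: the SKETCH_ names, the counter table layout, and the logging format are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* One counter per annotated point in the source (hypothetical layout). */
struct sketch_counter {
    const char *name;
    unsigned long long count;
};

enum { SKETCH_FLOW_MISS, SKETCH_FLOW_HIT, SKETCH_N_COUNTERS };

struct sketch_counter sketch_counters[SKETCH_N_COUNTERS] = {
    [SKETCH_FLOW_MISS] = { "flow_miss", 0 },
    [SKETCH_FLOW_HIT] = { "flow_hit", 0 },
};

/* Annotation macro: a single increment, cheap enough to leave enabled. */
#define SKETCH_COVERAGE_INC(ID) ((void) sketch_counters[ID].count++)

/* Writes every nonzero counter to 'out'.  Returns how many were written. */
int
sketch_coverage_log(FILE *out)
{
    int n = 0;

    for (int i = 0; i < SKETCH_N_COUNTERS; i++) {
        if (sketch_counters[i].count) {
            fprintf(out, "%s: %llu\n", sketch_counters[i].name,
                    sketch_counters[i].count);
            n++;
        }
    }
    return n;
}

/* Hits one annotated point twice, then logs.  Returns 1 if the counter
 * reads 2 and exactly one counter was logged, otherwise 0. */
int
sketch_coverage_demo(void)
{
    SKETCH_COVERAGE_INC(SKETCH_FLOW_MISS);
    SKETCH_COVERAGE_INC(SKETCH_FLOW_MISS);
    return sketch_counters[SKETCH_FLOW_MISS].count == 2
           && sketch_coverage_log(stderr) == 1;
}
```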
Ben Pfaff [Mon, 4 May 2009 23:18:19 +0000 (16:18 -0700)]
secchan: When listing flows, uninstall rules that shouldn't be installed.
To implement flow expiration, secchan periodically queries all the flows
in the datapath flow table. Until now, it has then uninstalled flows that
do not have corresponding rules at all. It has not uninstalled flows that
do have rules that are not supposed to be installed. This commit makes it
also uninstall the latter.
(This is not known to fix any real problem. It is only for completeness.)
Ben Pfaff [Mon, 4 May 2009 23:14:35 +0000 (16:14 -0700)]
secchan: Reinstall flows deleted externally.
If something external to secchan deletes flows from the datapath (e.g.
the administrator runs "dpctl dp-del-flows") then until now secchan would
switch all of those packets manually, using dpif_execute(). Better
behavior is to reinstall the flow. This commit implements that.
Ben Pfaff [Mon, 4 May 2009 23:10:28 +0000 (16:10 -0700)]
datapath: Generalize flow creation and modification.
The ODP_FLOW_ADD and ODP_FLOW_SET_ACTS datapath commands can be usefully
generalized based on whether they should be allowed to create or modify
flows, or both, and whether they reset flow statistics when they modify
an existing flow. This commit does so by merging them into a single
ODP_FLOW_PUT command and adding a set of flags.
In particular this is needed to allow flows to be reinstalled if it is
uncertain whether they have been externally deleted (e.g. with "dpctl
dp-del-flows") without requiring first reading the flow table (as
handle_odp_msg() wants to do), and to replace a flow's actions and reset
its statistics without first deleting it (as rule_update_actions() wants
to do).
Also renames some other datapath commands, for naming consistency.
Also adapts userspace to these changes.
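The flag semantics described above can be sketched roughly as below. The flag names, the single-slot "table", and the error codes are assumptions chosen for illustration; the commit message does not spell out the real names.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flags for a combined "put" operation. */
enum {
    SKETCH_PUT_CREATE = 1 << 0,     /* May create a new flow. */
    SKETCH_PUT_MODIFY = 1 << 1,     /* May modify an existing flow. */
    SKETCH_PUT_ZERO_STATS = 1 << 2  /* Reset statistics when modifying. */
};

/* A one-entry stand-in for the flow table. */
struct sketch_slot {
    bool present;
    int actions;
    unsigned long long n_packets;
};

/* Returns 0 on success, ENOENT if the flow is absent and creation is not
 * allowed, or EEXIST if it is present and modification is not allowed. */
int
sketch_flow_put(struct sketch_slot *slot, int actions, int flags)
{
    if (!slot->present) {
        if (!(flags & SKETCH_PUT_CREATE)) {
            return ENOENT;
        }
        slot->present = true;
        slot->actions = actions;
        slot->n_packets = 0;
        return 0;
    }
    if (!(flags & SKETCH_PUT_MODIFY)) {
        return EEXIST;
    }
    slot->actions = actions;
    if (flags & SKETCH_PUT_ZERO_STATS) {
        slot->n_packets = 0;
    }
    return 0;
}

/* Walks through the cases named in the commit message.  Returns 1 if all
 * behave as described, otherwise 0. */
int
sketch_flow_put_demo(void)
{
    struct sketch_slot slot = { false, 0, 0 };
    int both = SKETCH_PUT_CREATE | SKETCH_PUT_MODIFY;

    if (sketch_flow_put(&slot, 1, SKETCH_PUT_MODIFY) != ENOENT) {
        return 0;               /* Modify-only must not create. */
    }
    if (sketch_flow_put(&slot, 1, both) != 0) {
        return 0;               /* Reinstall: flow was missing, created. */
    }
    slot.n_packets = 10;
    if (sketch_flow_put(&slot, 1, both) != 0 || slot.n_packets != 10) {
        return 0;               /* Reinstall: flow existed, stats kept. */
    }
    if (sketch_flow_put(&slot, 2, SKETCH_PUT_CREATE) != EEXIST) {
        return 0;               /* Create-only must not modify. */
    }
    if (sketch_flow_put(&slot, 2,
                        SKETCH_PUT_MODIFY | SKETCH_PUT_ZERO_STATS) != 0) {
        return 0;
    }
    return slot.actions == 2 && slot.n_packets == 0;
}
```

With both create and modify allowed, a caller can reinstall a flow without first checking whether it is still in the table, which is the case the second paragraph of the message describes.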
Ben Pfaff [Fri, 1 May 2009 20:35:36 +0000 (13:35 -0700)]
secchan: Honor OFPPC_NO_RECV, OFPPC_NO_RECV_STP, OFPPC_NO_FWD.
The refactoring of secchan and the kernel module dropped support for these
(required) OpenFlow port flags. This commit reimplements them.
Ben Pfaff [Fri, 1 May 2009 20:20:07 +0000 (13:20 -0700)]
secchan: Don't let queued packets exhaust memory.
The ofproto code was queuing OpenFlow messages to connections without
limiting the maximum number that could be queued at a time. Thus, the
backlog could grow without bound and exhaust all system memory.
This commit introduces a cap on the maximum number of queued messages
in two different categories: packet-in messages and replies to OpenFlow
requests.
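A capped queue of this kind can be sketched as follows. The cap value, the structure, and the function names are hypothetical; one such counter per category (packet-in messages, replies) would give the two separate limits the message mentions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_QUEUED 3     /* Hypothetical cap, small for the demo. */

/* Bookkeeping for one category of queued messages. */
struct sketch_txq {
    size_t n_queued;            /* Messages waiting to be sent. */
    size_t n_dropped;           /* Messages refused because of the cap. */
};

/* Accounts for one new message.  Returns true if it may be queued, false
 * if the cap has been reached and the message must be dropped. */
bool
sketch_txq_offer(struct sketch_txq *q)
{
    if (q->n_queued >= SKETCH_MAX_QUEUED) {
        q->n_dropped++;
        return false;
    }
    q->n_queued++;
    return true;
}

/* Called when a queued message has actually been transmitted. */
void
sketch_txq_sent(struct sketch_txq *q)
{
    if (q->n_queued) {
        q->n_queued--;
    }
}

/* Offers five messages against a cap of three, then frees one slot.
 * Returns 1 if the counts are as expected, otherwise 0. */
int
sketch_txq_demo(void)
{
    struct sketch_txq q = { 0, 0 };

    for (int i = 0; i < 5; i++) {
        sketch_txq_offer(&q);
    }
    if (q.n_queued != 3 || q.n_dropped != 2) {
        return 0;
    }
    sketch_txq_sent(&q);
    return sketch_txq_offer(&q) && q.n_queued == 3;
}
```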
Ben Pfaff [Thu, 30 Apr 2009 00:19:08 +0000 (17:19 -0700)]
secchan: Fix TCP flags and IP TOS tracking for packets sent from userspace.
Ben Pfaff [Wed, 29 Apr 2009 22:45:20 +0000 (15:45 -0700)]
secchan: Fix flow statistics tracking.
Updates of flow statistics have been ad hoc and somewhat broken for some
time now. This commit makes them much more systematic and more likely
to be correct.
Ben Pfaff [Wed, 29 Apr 2009 22:43:54 +0000 (15:43 -0700)]
secchan: Clean up and simplify handle_odp_msg().
Ben Pfaff [Wed, 29 Apr 2009 22:43:24 +0000 (15:43 -0700)]
secchan: Factor common code into new function rule_update_actions().
rule_update() was only called in two places, and each time it was done
in the same way, so factor this out into a single new function
rule_update_actions().
Ben Pfaff [Wed, 29 Apr 2009 22:42:59 +0000 (15:42 -0700)]
secchan: Optimize no-change case in modify_flow().
Ben Pfaff [Wed, 29 Apr 2009 22:42:16 +0000 (15:42 -0700)]
secchan: Factor common code into rule_remove().
Several pieces of code were calling rule_uninstall(), classifier_remove(),
then rule_destroy() in sequence. Factor this out into a helper function.
Ben Pfaff [Tue, 28 Apr 2009 00:07:16 +0000 (17:07 -0700)]
secchan: Factor common code into new function rule_insert().
This is primarily a code cleanup. It also fixes a corner case for
statistics that formerly was properly handled in add_flow() but not in
ofproto_add_flow().
Ben Pfaff [Mon, 27 Apr 2009 23:56:44 +0000 (16:56 -0700)]
secchan: Remove unused parameter from ofproto_add_flow().
The 'packet != NULL' case was effectively dead, since every caller passed
a constant NULL here, so delete the parameter and the code to handle the
non-NULL case.
Ben Pfaff [Wed, 29 Apr 2009 22:36:44 +0000 (15:36 -0700)]
secchan: Eliminate UNKNOWN_SUPER.
When a super-rule is destroyed, secchan must reassess each of its subrules.
Each subrule might now have no super-rule (which we suspect is the common
case) or it might have a new super-rule.
Until now, secchan has "optimized" this reassessment by initially assigning
each of the deleted super-rule's subrules a super-rule of UNKNOWN_SUPER,
which is not a valid rule at all. It did this in the hope that the
subrule would get deleted before we need to know what its super-rule is.
However, this has repeatedly led to bugs, since it's not always obvious
what code will need to find a rule's super-rule.
This commit fixes the problem by removing the "optimization" (in quotes
because there is no evidence that it was a useful optimization in
practice).
Ben Pfaff [Tue, 28 Apr 2009 17:28:03 +0000 (10:28 -0700)]
classifier: Make classifier_for_each() easier to use.
classifier_for_each() and classifier_for_each_match() previously had the
restriction that the callback could not delete any rule that would be
visited in the same call, even if it was in a different table (except for
the rule actually passed to the callback). But a number of callers do
want to delete rules in other tables, and it is easy to eliminate that
restriction, so this commit does so.
Ben Pfaff [Wed, 29 Apr 2009 22:28:43 +0000 (15:28 -0700)]
secchan: Reduce redundancy in handle_odp_msg().
Code cleanup.
Ben Pfaff [Mon, 27 Apr 2009 21:28:54 +0000 (14:28 -0700)]
secchan: Fix OpenFlow matching on output port with OFPFC_DELETE.
Matching on out_port was only half-implemented for
OFPFC_DELETE. It was probably just overlooked. This commit fixes it, by
supplying the other half.
Ben Pfaff [Wed, 29 Apr 2009 22:27:46 +0000 (15:27 -0700)]
secchan: Update byte, packet counts for packets switched by hand.
Sometimes packets can get passed down to userspace, in which case secchan
has to send them using dpif_execute(). When this happened we weren't
updating the packet or byte counters. Fix this.
Ben Pfaff [Fri, 1 May 2009 17:20:21 +0000 (10:20 -0700)]
datapath: Eliminate synchronize_rcu() in table swap.
We found out some time ago that synchronize_rcu() can block for multiple
seconds in some cases, so it's a good idea to eliminate as many of them
as we can.
This commit eliminates a call to synchronize_rcu() from functions that
expand or flush flow tables. To avoid adding a member to dp_table that
specifies the "free_flows" argument to dp_table_destroy(), the commit
uses two different callback functions and manually inlines dp_table_swap()
into its callers.
Bug #1233.
Ben Pfaff [Fri, 1 May 2009 17:20:35 +0000 (10:20 -0700)]
datapath: Eliminate synchronize_rcu() in port group update.
We found out some time ago that synchronize_rcu() can block for multiple
seconds in some cases, so it's a good idea to eliminate as many of them
as we can.
This removes such a call in set_port_group(). This requires adding an
rcu_head to the data structure for port groups; since until now we've just
used the same "struct odp_port_group" exported to userspace, this means
that we need to introduce a new "struct dp_port_group" for internal use,
which in turn causes a fair bit of code motion.
Bug #1233.
Ben Pfaff [Thu, 30 Apr 2009 19:57:20 +0000 (12:57 -0700)]
datapath: Fix memory leak in port group.
When we destroy a datapath, we need to free its port groups also.
This is a fairly small memory leak: a few hundred bytes, at most, and
it only occurred each time a datapath was destroyed.
Ben Pfaff [Thu, 30 Apr 2009 21:42:38 +0000 (14:42 -0700)]
leak-checker: Stop logging after fstat() fails.
When fstat() fails, we should either stop logging or re-set the leak
checker hooks. The previous code didn't do either.
Ben Pfaff [Thu, 30 Apr 2009 21:45:30 +0000 (14:45 -0700)]
leak-checker: Stop logging after an output error.
In particular it's not polite to continue trying to write to the output
file after ENOSPC, since it will immediately fill up the disk again should
anyone free up any space, and keeping the log file open prevents usefully
deleting it.
Ben Pfaff [Thu, 30 Apr 2009 21:37:19 +0000 (14:37 -0700)]
leak-checker: Make output line-buffered.
Unbuffered output was ideal from the viewpoint of getting the maximum
amount of output when the process was killed, but it causes a dozen or
more system calls per log entry. Line buffering should be a reasonable
compromise.
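Line buffering is requested with the standard setvbuf() call. The sketch below is not the leak checker's code; the helper names, the temporary file path, and the log line are made up. It writes one line and then checks, without calling fflush(), how many bytes have reached the file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Switches 'stream' to line buffering.  Must be called before any other
 * I/O on the stream.  Returns 0 on success. */
int
sketch_set_line_buffered(FILE *stream)
{
    return setvbuf(stream, NULL, _IOLBF, 0);
}

/* Writes one 17-byte log line to a temporary file and returns the file
 * size observed before any explicit flush, or -1 on failure. */
long
sketch_line_buffer_demo(void)
{
    char path[] = "/tmp/leak-sketch-XXXXXX";
    struct stat st;
    long size = -1;
    int fd = mkstemp(path);
    FILE *f;

    if (fd == -1) {
        return -1;
    }
    f = fdopen(fd, "w");
    if (!f) {
        close(fd);
        unlink(path);
        return -1;
    }
    if (!sketch_set_line_buffered(f)) {
        fputs("malloc 0x1234 16\n", f);     /* Newline triggers the write. */
        if (!stat(path, &st)) {
            size = (long) st.st_size;
        }
    }
    fclose(f);
    unlink(path);
    return size;
}
```

Each complete line costs one write, instead of the many small writes that unbuffered output produces for a single log entry.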
Ben Pfaff [Thu, 30 Apr 2009 21:00:06 +0000 (14:00 -0700)]
brcompatd: Fix formatting of /proc/net/vlan files.
The C source file that I copped this from originally didn't spell out the
tab as \t, thus it was very difficult to see.
Thanks to Justin for pointing this out.
Ben Pfaff [Thu, 30 Apr 2009 20:59:08 +0000 (13:59 -0700)]
brcompat: Use named macro in place of literal constants.
Thanks to Justin for the suggestion.
Ben Pfaff [Thu, 30 Apr 2009 20:46:44 +0000 (13:46 -0700)]
datapath: Break up GSO packets before sending to userspace.
On a Xen host, over-MTU GSO packets from virtual machines can end up sent
down by the virtual switch to userspace. This happens, for example, if a
TCP flow has a long enough "pause" that the datapath flow times out. When
this happens, the packet is not marked as GSO when secchan sends it back up
via dpif_execute(), and the packet is then discarded in dp_xmit_skb() as
too large.
This commit solves the problem by breaking GSO packets into MTU-size pieces
before passing them along to userspace.
Tested by running "netperf" between two VMs on different boxes and running
"dpctl dp-del-flows" on the appropriate datapath a few times in the middle
and seeing that the total bandwidth didn't change much. Verified that
packets were actually being broken up by adding a printk call inside the
"if (skb_is_gso())" block.
Thanks to Justin and Keith for review.
Bug #1133.
Justin Pettit [Wed, 29 Apr 2009 22:43:57 +0000 (15:43 -0700)]
Fix policing performance issues with VIFs.
Policing is configured with the "tc" command. By default, it picks up
the MTU from the interface having policy applied. When a guest operating
system is configured for segmentation offloading, the packets handed to
DOM0 may be substantially larger than the MTU. The policing code was
dropping these packets, which caused performance to dive. We now
configure policing with an MTU of 64K, which solves the problem.
Thanks to Ben for diagnosing the problem.
Keith Amidon [Wed, 29 Apr 2009 19:05:28 +0000 (12:05 -0700)]
Fix vswitch init script "restart" and "update-modules" options
These options are not guaranteed to work reliably but are useful in
some situations, especially during development. These changes fix
problems introduced by the combination of the vswitch and brcompatd
init files.
To make it clear that behavior may not be sane after a restart, a big
warning has been added and explicit user confirmation is requested
before the action is implemented.
Keith Amidon [Wed, 29 Apr 2009 16:50:53 +0000 (09:50 -0700)]
Fix copy-paste errors when combining vswitchd & brcompatd init scripts
There were a couple of places where copied functionality from one
script was not properly updated to work in the combined script.
Keith Amidon [Wed, 29 Apr 2009 16:43:32 +0000 (09:43 -0700)]
Properly refer to brcompatd pidfile in init script stop option
When the init script attempted to stop brcompatd it was using the
wrong variable name to find the PID file and thus not obtaining a PID
and not stopping brcompatd.
Ben Pfaff [Tue, 28 Apr 2009 17:37:37 +0000 (10:37 -0700)]
Fix "make dist" by adding forgotten headers to the makefiles.
Ben Pfaff [Fri, 24 Apr 2009 18:12:10 +0000 (11:12 -0700)]
brcompat: Add /proc/net/vlan, /proc/net/bonding compatibility support.
This adds a kernel interface, controlled by userspace through Generic
Netlink, to create, modify, and delete files in /proc/net/vlan and
/proc/net/bonding, plus vswitch support for updating files in those
directories to fit better into legacy environments that expect specific
files to appear in those directories.
We hope that the need for this support will be temporary.
Keith Amidon [Mon, 27 Apr 2009 22:48:24 +0000 (15:48 -0700)]
Stop attempting to add mgmt interface flows at boot time.
With the change to the bridging code so that it does not change the
state of interfaces when they are added to the bridge, this no longer
appears to help in any case I've tested.
Keith Amidon [Mon, 27 Apr 2009 18:09:52 +0000 (11:09 -0700)]
Initial cut at merging vswitch and vswitch-brcompatd init files.
Ben Pfaff [Mon, 27 Apr 2009 22:05:24 +0000 (15:05 -0700)]
vswitch: Don't bring up interfaces when adding them to datapaths.
Xen expects that the bridge doesn't bring up interfaces when they get
added to a datapath, so bringing them up in our emulation was causing
problems on XenServer. Keith in particular reports that this commit fixed
boot on XenServer when the management interface was configured to get an
IP address via DHCP.
This change might cause trouble or surprise for other users who do expect
interfaces to be brought up when added to a vswitch, but Keith didn't see
any new problems on XenServer.
Bug #1259.
Ben Pfaff [Fri, 24 Apr 2009 23:47:23 +0000 (16:47 -0700)]
vswitch: Fix typo in manpage.
Ben Pfaff [Fri, 24 Apr 2009 23:46:14 +0000 (16:46 -0700)]
vswitch: By default, operate standalone when controller connection fails.
The default was previously to "fail secure" (by not passing any traffic)
when the controller could not be reached, but we came to consensus that it
is better to switch traffic in a standalone mode in this case.
Ben Pfaff [Thu, 23 Apr 2009 18:33:35 +0000 (11:33 -0700)]
brcompat: Add comments to netlink header.
Ben Pfaff [Thu, 23 Apr 2009 20:39:19 +0000 (13:39 -0700)]
brcompatd: Make vars used only in brcompatd.c "static".
Ben Pfaff [Thu, 23 Apr 2009 18:23:43 +0000 (11:23 -0700)]
brcompat: Check for null pointer before reading netlink attribute.
Netlink policy parsing only checks that, if an attribute is present, then
its format is correct. It doesn't ensure that attributes are present, so
an explicit check is needed, which this commit adds.
Ben Pfaff [Thu, 23 Apr 2009 21:42:27 +0000 (14:42 -0700)]
vswitch: Fetch interface MAC when creating an interface in iface_create().
iface_create() creates an interface but failed to initialize its MAC
address. (Fortunately, this was only used by STP, which has not been
known to be tested recently anyhow.)
This is also needed for /proc/net/bonding and /proc/net/vlan, hence fixing
the bug now.
Justin Pettit [Wed, 22 Apr 2009 00:23:37 +0000 (17:23 -0700)]
Have NetFlow account for first buffered packet of new flow (Bug #1162).
The record-keeping for NetFlow didn't track the buffered packet
associated with a flow add command. We now pull the relevant data from
the buffered packet and add it to the flow rule stats.
Ben Pfaff [Mon, 20 Apr 2009 16:56:27 +0000 (09:56 -0700)]
brcompat: Delete VLANs and bonding entries for deleted ports.
Otherwise these entries can hang around past reboot and cause trouble
later as vif numbers are reused.
Probably another part of the full solution is to delete the whole
configuration file on reboot.
Bug #1216.
Ben Pfaff [Fri, 17 Apr 2009 22:00:55 +0000 (15:00 -0700)]
datapath: Fix Xen performance when adding a VLAN tag in presence of GSO.
When the datapath needed to add a VLAN tag to a GSO packet, it would stick
it on the front of the packet and pass it along. Then GSO would later fail
because it didn't know how to segment an 802.1Q packet. So we need to
segment GSO packets by ourselves before we put the 802.1Q header on them.
We could avoid this problem if we could patch the main kernel, either to
add GSO support to the VLAN protocol or to support hardware-accelerated
VLAN tagging without VLAN groups configured.
Bug #1231.
Ben Pfaff [Fri, 17 Apr 2009 21:59:03 +0000 (14:59 -0700)]
datapath: Define ERR_CAST on kernels that don't already have it.
ERR_CAST was introduced in Linux 2.6.25 but it is also backported into the
Red Hat 2.6.18 kernel, so we need a configure-time check.
Ben Pfaff [Thu, 16 Apr 2009 22:16:28 +0000 (15:16 -0700)]
datapath: Attempt to checksum packets sent to controller on non-Xen kernel.
Commit 96660ad113 "datapath: Fix up checksum on Xen before forwarding to
controller" made sure that packets sent to the controller were properly
checksummed on Xen, where it happens pretty commonly that they are not.
It seems possible that on non-Xen machines this could also happen, even
though it does not seem to be common, so this commit tries to fix up
packets in that case also.
This commit adds a WARN_ON_ONCE() call to make it clear that it has
triggered. If we ever see this warning, then we should figure out what
triggered it and make sure that this code actually works properly.
Ben Pfaff [Thu, 16 Apr 2009 22:10:26 +0000 (15:10 -0700)]
datapath: Fix VLAN tag insertion actions on Xen.
On Xen, a VM can pass a packet that needs to be checksummed up to Dom0 for
transmission on the wire. In this case, Dom0 is supposed to pick apart
the packet, figure out the protocol, and checksum it before sending it out.
However, this fails if we insert an 802.1Q header, because the Xen routine
that picks apart packets (skb_checksum_setup() in net/core/dev.c) does not
understand 802.1Q. Hence, we must call this function ourselves, before we
add the 802.1Q header.
Fixes bug #1215.
Ben Pfaff [Thu, 16 Apr 2009 21:43:47 +0000 (14:43 -0700)]
datapath: Fix build on 2.6.18, which doesn't have "bool" or "false".
The "bool" type is a relative newcomer to the Linux kernel, and it is
still frowned upon by some developers, so instead of adding a definition
to our compatibility headers (which is what we usually do), this commit
changes "bool" to "int".
Ben Pfaff [Thu, 16 Apr 2009 00:21:26 +0000 (17:21 -0700)]
vswitch: Fix indefinite wait on reload.
If a "reload" command was sent to vswitchd via the unixctl library, then
the reconfiguration would not actually happen until another round trip
through the vswitchd poll loop. Ordinarily this happens quickly but if
nothing is going on it can become an indefinite wait.
Fix by forcing another round trip if we need to reconfigure.
Ben Pfaff [Wed, 15 Apr 2009 22:04:10 +0000 (15:04 -0700)]
svec: Avoid calling svec_sort() when it is not necessary.
When we delete an item from a sorted array we just have to move the rest
of the items down to keep it sorted; we don't have to actually re-sort
the whole thing.
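As a sketch of the idea (hypothetical names, using a plain array of strings rather than the svec type itself):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Removes names[idx] from a sorted array of '*n' names by shifting the
 * tail down one position.  The remaining elements keep their relative
 * order, so the array stays sorted.  'idx' must be less than '*n'. */
void
sketch_sorted_del(const char **names, size_t *n, size_t idx)
{
    memmove(&names[idx], &names[idx + 1],
            (*n - idx - 1) * sizeof names[0]);
    (*n)--;
}

/* Deletes the second of four sorted names.  Returns 1 if the remaining
 * three are still in order, otherwise 0. */
int
sketch_sorted_del_demo(void)
{
    const char *names[] = { "eth0", "eth1", "eth2", "eth3" };
    size_t n = 4;

    sketch_sorted_del(names, &n, 1);
    return n == 3
           && !strcmp(names[0], "eth0")
           && !strcmp(names[1], "eth2")
           && !strcmp(names[2], "eth3");
}
```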
Ben Pfaff [Wed, 15 Apr 2009 22:03:14 +0000 (15:03 -0700)]
vswitch: Keep list of old interfaces sorted when reconfiguring ports.
The old_ifaces array must be sorted because we apply svec_contains() to
it, but there was nothing guaranteeing that it was sorted. So add a
call to svec_sort().
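The dependency can be illustrated with the standard library, assuming that the membership test is a binary search. The names below are hypothetical and the array of strings only stands in for an svec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Comparison function for arrays of C strings. */
static int
compare_names(const void *a_, const void *b_)
{
    const char *const *a = a_;
    const char *const *b = b_;

    return strcmp(*a, *b);
}

/* Sorts 'names' in place. */
void
sketch_names_sort(const char **names, size_t n)
{
    qsort(names, n, sizeof names[0], compare_names);
}

/* Binary search for 'name'.  The result is only meaningful if 'names'
 * is sorted with the same comparison function. */
bool
sketch_names_contains(const char **names, size_t n, const char *name)
{
    return bsearch(&name, names, n, sizeof names[0], compare_names) != NULL;
}

/* Sorts an unordered list of interface names, then searches it.
 * Returns 1 if lookups behave as expected, otherwise 0. */
int
sketch_names_demo(void)
{
    const char *old_ifaces[] = { "vif2.0", "eth1", "vif1.0", "eth0" };
    size_t n = sizeof old_ifaces / sizeof old_ifaces[0];

    sketch_names_sort(old_ifaces, n);
    return sketch_names_contains(old_ifaces, n, "eth1")
           && sketch_names_contains(old_ifaces, n, "vif1.0")
           && !sketch_names_contains(old_ifaces, n, "eth9");
}
```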
Ben Pfaff [Wed, 15 Apr 2009 22:02:02 +0000 (15:02 -0700)]
cfg: Keep configuration sorted when adding entries.
Otherwise later cfg accesses are likely to assert-fail since the
configuration is no longer sorted.
Ben Pfaff [Wed, 15 Apr 2009 17:50:08 +0000 (10:50 -0700)]
datapath: Set only VID when adding a new header with ODPAT_SET_VLAN_VID.
ODPAT_SET_VLAN_VID is supposed to set only the VID field of the VLAN
header. When it was only modifying an existing VLAN header, it was doing
this correctly. However, when it added a new header, it added all of the
bits passed in as the argument, not just the VID field. Fix this, setting
the other bits to 0 implicitly.
Also fixes the analogous problem with ODPAT_SET_VLAN_PCP.
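In terms of the 16-bit 802.1Q tag control field, where the VID occupies the low 12 bits, the two cases for the VID action look roughly like this. The function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_VLAN_VID_MASK 0x0fff     /* Low 12 bits of the TCI. */

/* Existing header: replace only the VID bits, keep the rest. */
uint16_t
sketch_modify_vid(uint16_t old_tci, uint16_t arg)
{
    return (old_tci & ~SKETCH_VLAN_VID_MASK) | (arg & SKETCH_VLAN_VID_MASK);
}

/* New header: take only the VID bits from the argument, so that all other
 * bits of the new TCI are zero.  The bug was using 'arg' unmasked. */
uint16_t
sketch_new_tci_from_vid(uint16_t arg)
{
    return arg & SKETCH_VLAN_VID_MASK;
}
```

For example, an argument of 0xf00a yields a new TCI of 0x000a rather than 0xf00a.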
Ben Pfaff [Wed, 15 Apr 2009 17:47:52 +0000 (10:47 -0700)]
vswitch: Strip VLAN headers correctly.
When a packet comes in on a trunk port and goes out on an implicit VLAN
port, we are supposed to strip the VLAN header (because it is implicit).
We were not doing so; instead, we were setting the VLAN field to all-1s.
Strip it properly, instead.
Ben Pfaff [Wed, 15 Apr 2009 17:01:30 +0000 (10:01 -0700)]
dpctl: When parsing actions, don't let "drop" be preceded by other actions.
The dpctl manpage says that "drop" must not be used with other actions,
but we were only checking for other actions *after* it, not for ones
before it.
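A position-independent form of the check might look like the following sketch. It works on an already tokenized list of action strings; the function names and the example action strings are illustrative only and do not reflect dpctl's actual parser.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if the action list is acceptable: "drop" may appear only
 * as the sole action, regardless of where it occurs in the list. */
bool
sketch_actions_valid(const char *const *actions, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!strcmp(actions[i], "drop") && n != 1) {
            return false;       /* Something precedes or follows "drop". */
        }
    }
    return true;
}

/* Returns 1 if "drop" alone is accepted, "drop" combined with another
 * action is rejected in either order, and a list without "drop" is
 * accepted; otherwise 0. */
int
sketch_actions_demo(void)
{
    const char *only[] = { "drop" };
    const char *after[] = { "drop", "output:1" };
    const char *before[] = { "output:1", "drop" };
    const char *none[] = { "output:1", "output:2" };

    return sketch_actions_valid(only, 1)
           && !sketch_actions_valid(after, 2)
           && !sketch_actions_valid(before, 2)
           && sketch_actions_valid(none, 2);
}
```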
Ben Pfaff [Wed, 15 Apr 2009 16:57:39 +0000 (09:57 -0700)]
dpif: Remove duplicated functionality.
dpif_flow_flush() does the same thing as dpif_flow_del_all().
Keith Amidon [Tue, 14 Apr 2009 23:41:17 +0000 (16:41 -0700)]
Be more conservative about bringing up interfaces when vswitchd started
Previously we were starting all bridge interfaces when vswitchd
started to help resolve issues in Xen environments. However, only
starting the management interface seems to be required. The
information about which interface is the management interface is
available in /etc/xensource-inventory on Xen machines, so use that to
limit the interfaces we bring up.
Justin Pettit [Tue, 14 Apr 2009 06:52:57 +0000 (23:52 -0700)]
Add "dp-del-flows" command to dpctl.
Add ability to delete flows from the datapath. Currently, there is no
way to delete specific flows--it's all or nothing. By deleting flows
from underneath the process that set them up, some confusion may arise.
For vswitchd, this only amounts to a few warning messages, though.
Justin Pettit [Tue, 14 Apr 2009 00:41:33 +0000 (17:41 -0700)]
Add support for explicitly specifying a "drop" action when adding a flow.
When adding a flow entry with dpctl, the user specifies a set of actions
to execute when a match occurs. If none are specified, then the packet
is implicitly dropped by OpenFlow. When dumping these flows, the output
of dpctl shows an action of "drop". However, dpctl did not support
adding a flow with an explicit "drop" action. This fixes that lack of
symmetry.
Justin Pettit [Tue, 14 Apr 2009 00:21:49 +0000 (17:21 -0700)]
Add description of table output to dpctl man page.
Adds a description of the output from the "dump-flows" and
"dump-aggregate" commands of dpctl. Requested by NEC.
Ben Pfaff [Thu, 9 Apr 2009 22:20:06 +0000 (15:20 -0700)]
vswitch: Add unixctl command to reload configuration file synchronously.
The Xen interface-reconfigure script wants to tell vswitchd to reload
its configuration and wait until it is complete. Until now there has
been no way to do this: sending SIGHUP causes a reload, but there is no
way to tell when it is complete. Now "vlogconf -t <socket> -e
vswitchd/reload" does the job.
Ben Pfaff [Thu, 9 Apr 2009 21:32:12 +0000 (14:32 -0700)]
Improve infrastructure for Unix socket-based local management.
"vlog_socket" was essentially a framework for management of a process
over a simple Unix domain socket interface. Unfortunately it was a little
too simple:
* It was not extensible for use by clients other than vlog.
* It was not reliable, since it was based on datagram sockets.
* It tried to hide itself using poll_fd_callback(), instead of exposing
itself through the poll loop as does almost every other entity in
the build tree.
This commit replaces vlog_socket by unixctl, which fixes these problems:
* Arbitrary commands may now be registered.
* Use of stream sockets makes it reliable.
* The interface is exposed to clients.
Ben Pfaff [Wed, 8 Apr 2009 20:46:20 +0000 (13:46 -0700)]
vlog: Make vlog_reopen_log_file() a no-op if no log file is open.
The --log-file option is supposed to be used to create a log file, but
until now, even if this isn't done, --reopen to the vlogconf program will
make it open one. This behavior was unexpected, hence this commit that
prevents it from happening.
Fixes bug #905.
Ben Pfaff [Wed, 8 Apr 2009 20:06:02 +0000 (13:06 -0700)]
vswitch: Fix handling of ARPs received on bonded interfaces.
The vswitch must handle ARPs directed to broadcast that arrived on bonded
interfaces differently based on whether they are ARP requests or replies.
This cannot be done in a flow-based manner using OpenFlow, because
OpenFlow does not distinguish between ARP requests and replies. Thus,
every such packet must be handled separately by the bonding code, and a
flow must not be set up.
Before secchan was integrated into vswitch, this was handled correctly.
This commit restores that correct behavior, by making it possible for
a normal-action callback to signal that the actions must not be used to
set up a flow.
Ben Pfaff [Wed, 8 Apr 2009 17:36:59 +0000 (10:36 -0700)]
secchan: Fix OFPPS_LINK_DOWN detection.
netdev_get_flags() was supposed to return NETDEV_CARRIER if carrier was
detected by the network device PHY, by checking for the IFF_LOWER_UP bit
in the device flags returned by the SIOCGIFFLAGS ioctl. Unfortunately,
IFF_LOWER_UP has value 0x10000 and that ioctl returns a short int, so this
bit was always read as 0, indicating that carrier was off.
There are at least two other ways to get the carrier status. One is via
rtnetlink with RTM_GETLINK. Unfortunately that is only supported on Linux
2.6.19 and up. So we fall back to the other possibility, which is
/sys/class/net/<device>/carrier. I hope that our users mount sysfs.
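Both halves of this can be sketched in a few lines. The first function models what happens when a flags word with bit 16 set passes through a 16-bit field; the others read a sysfs carrier file, assuming it holds "1" or "0". All names are hypothetical and the constant is simply the value quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_IFF_LOWER_UP 0x10000u    /* Value quoted in the message. */

/* Models flags passing through a 16-bit field: bit 16 cannot survive,
 * so this always returns false. */
bool
sketch_lower_up_via_short(uint32_t kernel_flags)
{
    uint16_t truncated = (uint16_t) kernel_flags;   /* Bit 16 is lost. */

    return (truncated & SKETCH_IFF_LOWER_UP) != 0;
}

/* Parses the contents of a carrier file.  Returns 1 for carrier, 0 for
 * no carrier, or -1 if the text is not "0" or "1". */
int
sketch_parse_carrier(const char *text)
{
    int value;

    if (sscanf(text, "%d", &value) != 1 || (value != 0 && value != 1)) {
        return -1;
    }
    return value;
}

/* Reads carrier status from the file at 'path'.  Returns 1, 0, or -1 if
 * the file cannot be opened, read, or parsed. */
int
sketch_read_carrier(const char *path)
{
    char buf[16];
    FILE *f = fopen(path, "r");
    int result = -1;

    if (f) {
        if (fgets(buf, sizeof buf, f)) {
            result = sketch_parse_carrier(buf);
        }
        fclose(f);
    }
    return result;
}
```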