Ben Pfaff [Tue, 14 Dec 2010 00:20:24 +0000 (16:20 -0800)]
ofp-util: Improve error log messages.
Ben Pfaff [Tue, 14 Dec 2010 00:20:06 +0000 (16:20 -0800)]
ofp-util: Use proper format specifier for uint32_t in ofputil_lookup_openflow_message().
Ben Pfaff [Tue, 14 Dec 2010 00:25:53 +0000 (16:25 -0800)]
ofproto: Always use xid 0 for *_FLOW_REMOVED messages.
Asynchronous messages are never part of a request/reply pair so it doesn't
make sense to allocate an xid, which could confuse the controller anyhow.
Ben Pfaff [Tue, 14 Dec 2010 00:21:43 +0000 (16:21 -0800)]
ofproto: Fix encoding of NXST_* replies.
This only matters for NXST_AGGREGATE currently since NXST_FLOW has value 0.
Ben Pfaff [Tue, 14 Dec 2010 00:20:54 +0000 (16:20 -0800)]
ofp-util: Fix encoding of NXST_AGGREGATE requests.
They were being sent out as NXST_FLOW requests.
Ben Pfaff [Wed, 15 Dec 2010 17:48:16 +0000 (09:48 -0800)]
ofproto: Fix write-after-free error in compose_nx_flow_removed().
Jesse Gross [Mon, 13 Dec 2010 23:21:28 +0000 (15:21 -0800)]
datapath: Correctly return error if percpu allocation fails.
If the allocation of percpu stats fails when creating a new
datapath, we currently don't return the correct error code. Since
we don't explicitly set it when the allocation fails it will keep
the value from the previous call. This means we will return success
when the creation actually failed.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
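The failure mode described above can be sketched in userspace C. This is only an illustration under stated assumptions: ex_create_dp(), struct ex_dp, and the fail_stats switch are hypothetical names, not the datapath's actual code. The point is that the error variable has to be assigned right before each jump to the error path, otherwise it still holds the 0 left over from the previous successful step.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct ex_dp {
    long *stats;            /* stands in for the percpu stats block */
};

/* Hypothetical creation routine.  'fail_stats' simulates a failed stats
 * allocation so that the error path can be exercised. */
int ex_create_dp(struct ex_dp **dpp, int fail_stats)
{
    struct ex_dp *dp;
    int err;

    assert(dpp != NULL);
    *dpp = NULL;

    dp = calloc(1, sizeof *dp);
    err = dp ? 0 : -ENOMEM;
    if (err)
        goto error;

    dp->stats = fail_stats ? NULL : calloc(4, sizeof *dp->stats);
    if (!dp->stats) {
        /* Without this assignment 'err' would still be 0 from the step
         * above, and the caller would see success. */
        err = -ENOMEM;
        goto error_free_dp;
    }

    *dpp = dp;
    return 0;

error_free_dp:
    free(dp);
error:
    return err;
}
```

A caller that passes a nonzero fail_stats should get -ENOMEM back and a NULL pointer, never 0.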
Ben Pfaff [Mon, 13 Dec 2010 22:32:55 +0000 (14:32 -0800)]
Makefile: Check for undistributed files on every make, not just "make dist".
It's really easy to add files to the Git repository but forget to add them
to the distributions created by "make dist". I do this regularly, for
example. For some time, we've had a check that runs on "make dist" to
make sure that the distribution is complete, but I still screw up because
I don't run "make dist" all that often.
This commit improves the situation, by doing the check on every "make",
instead of just on "make dist".
Ben Pfaff [Mon, 13 Dec 2010 21:08:31 +0000 (13:08 -0800)]
ovs-vswitchd: Release most memory on normal exit.
This makes "valgrind --leak-check=full --show-reachable=yes" output much
easier to read.
Ben Pfaff [Mon, 13 Dec 2010 21:07:48 +0000 (13:07 -0800)]
netdev-linux: Fix pairing of rtnetlink register and unregister calls.
netdev_linux_create() called rtnetlink_notifier_register() for both system
and internal devices, but netdev_linux_destroy() only did the reverse
accounting for system devices. This fixes the pairing.
This isn't really much of a bug, since it would only cause the notifier to
be active unnecessarily (not to be removed even though it was no longer needed). At
most it was a missed opportunity for optimization, but I don't think that
optimization would ever happen anyway.
Found with valgrind --leak-check=full --show-reachable=yes.
Ben Pfaff [Mon, 13 Dec 2010 20:20:12 +0000 (12:20 -0800)]
vswitchd: Delete DP_MAX_PORTS.
This is no longer used.
Ben Pfaff [Mon, 13 Dec 2010 22:28:53 +0000 (14:28 -0800)]
vswitchd: Fix dependency on DP_MAX_PORTS for allocating "struct dst"s.
Until now, compose_actions() has allocated enough "struct dst"s on the
stack for a worst-case flow, one that floods packets with the maximum
number of ports and mirrors. When the code was written this was correct.
However, now the number of ports is no longer known at compile time. The
maximum number, 65535, would require (65536 * (32 + 1) * 4) == 8 MB of
stack space, which is a lot. So this commit fixes the problem a different
way, by allocating the "struct dst"s dynamically when necessary.
This is a bug fix, but not a very serious one, because it could only
become a buffer overflow with a large number of mirrors.
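For reference, the worst case quoted above works out to 65536 * 33 * 4 = 8,650,752 bytes, a little over 8 MB. One common way to avoid such a stack array is to keep a small built-in buffer and move to the heap only when it fills up. The sketch below shows that pattern; struct ex_dst, struct ex_dst_set, and the ex_* functions are hypothetical and are not claimed to match what the commit actually does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct ex_dst {
    uint16_t vlan;
    uint16_t port;
};                              /* 4 bytes, as in the estimate above */

struct ex_dst_set {
    struct ex_dst builtin[32];  /* enough for common cases, no malloc */
    struct ex_dst *dsts;        /* points at 'builtin' or at heap memory */
    size_t n, allocated;
};

void ex_dst_set_init(struct ex_dst_set *set)
{
    assert(set != NULL);
    set->dsts = set->builtin;
    set->n = 0;
    set->allocated = sizeof set->builtin / sizeof set->builtin[0];
}

/* Appends one destination, moving to the heap once 'builtin' is full.
 * Returns 0 on success, -1 if memory runs out. */
int ex_dst_set_add(struct ex_dst_set *set, struct ex_dst dst)
{
    if (set->n >= set->allocated) {
        size_t new_cap = set->allocated * 2;
        struct ex_dst *p;

        if (set->dsts == set->builtin) {
            p = malloc(new_cap * sizeof *p);
            if (p)
                memcpy(p, set->builtin, set->n * sizeof *p);
        } else {
            p = realloc(set->dsts, new_cap * sizeof *p);
        }
        if (!p)
            return -1;
        set->dsts = p;
        set->allocated = new_cap;
    }
    set->dsts[set->n++] = dst;
    return 0;
}

void ex_dst_set_free(struct ex_dst_set *set)
{
    if (set->dsts != set->builtin)
        free(set->dsts);
}

/* Worst-case stack requirement quoted in the commit message. */
size_t ex_worst_case_bytes(void)
{
    return (size_t)65536 * (32 + 1) * sizeof(struct ex_dst);
}
```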
Ben Pfaff [Mon, 13 Dec 2010 19:12:37 +0000 (11:12 -0800)]
bridge: Eliminate bond_rebalance_port() dependency on DP_MAX_PORTS.
There's no reason to allocate the bals[] array on the stack here, since
this is not on any fast-path.
As an alternative, we could limit the number of interfaces on a single
bond to some reasonable maximum, such as 8 or 32, but this commit's change
is simpler.
Ben Pfaff [Mon, 13 Dec 2010 20:25:01 +0000 (12:25 -0800)]
ofproto: Fix use-after-free error in facet_revalidate().
Found by valgrind.
Jesse Gross [Wed, 8 Dec 2010 19:36:57 +0000 (11:36 -0800)]
datapath: Validate lock when handling flow actions.
When reading actions without rcu_read_lock we need to hold the
datapath lock. This checks that using lockdep.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Wed, 8 Dec 2010 19:32:05 +0000 (11:32 -0800)]
datapath: Check locks on access to flow table.
When accessing the flow table without holding rcu_read_lock
we need to hold the lock on the datapath. This enables lockdep
to validate that that is the case.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Wed, 8 Dec 2010 19:07:56 +0000 (11:07 -0800)]
datapath: Validate access to DP array.
When accessing the array of DPs, we need to hold either rcu_read_lock
or dp_mutex. This enables lockdep to validate those conditions.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Sun, 5 Dec 2010 20:36:36 +0000 (12:36 -0800)]
tunneling: Add checks for header cache lock.
When updating the tunnel header cache, we need to hold a lock to
protect against concurrent access. This adds annotations to
make sparse happy when we access the data without rcu_read_lock
and enables lockdep to verify that we have the correct lock.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Mon, 6 Dec 2010 23:15:47 +0000 (15:15 -0800)]
datapath: Convert rcu_dereference() to correct variant.
Using rcu_dereference() makes lockdep complain if rcu_read_lock
is not held, even though access is also safe when the update-side
lock is held. Where RTNL is also a correct form of protection,
this adds checks that accept RTNL being held. Elsewhere, it
enforces that RTNL must be held.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Sun, 5 Dec 2010 19:22:04 +0000 (11:22 -0800)]
datapath: Don't directly access RCU protected pointers.
If RTNL lock is used to protect updates to RCU data structures
then it isn't necessary to use rcu_dereference() to access them if
RTNL is held. This adds rtnl_dereference() to access these pointers
which has several benefits: documents the locking expectations;
checks that RTNL actually is held when run with lockdep; makes
sparse not complain about directly accessing RCU pointers.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Sat, 4 Dec 2010 20:04:39 +0000 (12:04 -0800)]
datapath: Correct byte order annotations.
We have generally been using the byte order specific data types
(i.e. __be32 instead of u32) in most places. This corrects a
declaration and adds a few needed casts.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Sat, 4 Dec 2010 19:50:53 +0000 (11:50 -0800)]
datapath: Add usage of __rcu annotation.
Sparse can warn about incorrect usage of RCU via direct access to
pointers when used in conjunction with __rcu and CONFIG_SPARSE_RCU.
This adds the necessary annotations.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Sat, 4 Dec 2010 19:39:53 +0000 (11:39 -0800)]
datapath: Add usage of __percpu annotation.
Sparse can warn if percpu pointers are incorrectly dereferenced
directly. This adds the annotation where we declare percpu
pointers.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Sat, 4 Dec 2010 19:17:26 +0000 (11:17 -0800)]
datapath: Correct usage of __user annotation.
We generally have been using the __user annotation but there were
a few places where it was missing or needed a cast.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Mon, 6 Dec 2010 23:39:19 +0000 (15:39 -0800)]
datapath: Compatibility code for RCU check functions.
The rcu_dereference_rtnl() and rtnl_dereference() functions will
be introduced in 2.6.37. They provide nice documentation of
locking expectations as well as checking on recent kernels.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Sat, 20 Nov 2010 01:48:04 +0000 (17:48 -0800)]
datapath: Add compatibility code for sparse annotations.
The __percpu and __rcu annotations for sparse are relatively
recent additions, so provide no-op definitions on older kernels.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Sun, 12 Dec 2010 07:29:22 +0000 (23:29 -0800)]
datapath: Use __packed macro.
The __packed macro is preferred instead of an explicit GCC attribute,
so use it instead to deal with structure packing.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Sun, 12 Dec 2010 07:28:33 +0000 (23:28 -0800)]
datapath: Compatibility code for __packed macro.
The __packed macro for structure packing wasn't introduced until 2.6.24,
so define it ourselves.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
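A compatibility definition of this kind is typically a guarded macro. The following is a minimal sketch that assumes a GCC-compatible compiler; struct ex_hdr and ex_hdr_size() are made up for illustration and do not come from the datapath sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Supply __packed only where the headers in use have not already done so. */
#ifndef __packed
#define __packed __attribute__((packed))
#endif

/* Without packing, 'value' would normally be aligned to 4 bytes and the
 * struct would occupy 8 bytes; packed, it occupies 1 + 4 = 5. */
struct ex_hdr {
    uint8_t  type;
    uint32_t value;
} __packed;

size_t ex_hdr_size(void)
{
    return sizeof(struct ex_hdr);
}
```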
Ben Pfaff [Tue, 23 Nov 2010 18:15:43 +0000 (10:15 -0800)]
nicira-ext: Correct and extend examples for NXM_OF_VLAN_TCI field.
The final example for this field was wrong. This corrects it and adds
two more examples.
Reported-by: Natasha Gude <natasha@nicira.com>
Jesse Gross [Sun, 12 Dec 2010 18:01:19 +0000 (10:01 -0800)]
datapath-protocol: Include netlink.h.
On older kernels that don't have if_link.h, we use our own, limited
version. This version doesn't include the netlink header, causing
problems where we were relying on it to define the types in
datapath-protocol.h. Therefore, directly include it, since it is
better to be explicit about it anyways.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Sun, 12 Dec 2010 17:54:46 +0000 (09:54 -0800)]
pinsched: Avoid uninitialized variable warning.
Some compilers warn about the variable 'n_longest' in drop_packet()
being used uninitialized. This isn't actually possible but explicitly
set it to zero to avoid spurious warnings.
Jesse Gross [Sun, 12 Dec 2010 06:53:34 +0000 (22:53 -0800)]
nx-match: Use correct printf format specifiers.
A few of the printf format specifiers didn't match the type that
they were printing. On 32-bit platforms there is some overlap
but on 64-bit they cause a mismatch.
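One portable way to avoid this class of mismatch for fixed-width integers is to use the conversion macros from <inttypes.h>. The sketch below is generic C, not the nx-match code itself; ex_format() and its parameters are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Formats a 32-bit and a 64-bit value.  PRIu32 and PRIx64 expand to the
 * conversion that matches uint32_t and uint64_t on the platform in use,
 * so the same source is correct on 32-bit and 64-bit builds. */
int ex_format(char *buf, size_t size, uint32_t xid, uint64_t cookie)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "xid=%" PRIu32 " cookie=0x%" PRIx64,
                    xid, cookie);
}
```

By contrast, passing a uint64_t to a plain "%lx" is only correct where long is 64 bits wide.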
Jesse Gross [Sun, 12 Dec 2010 06:51:31 +0000 (22:51 -0800)]
vswitchd: Consistently use size_t for action lengths.
Currently the type of the datapath action length is a mixture of
size_t and unsigned int. However, size_t is really defined as an
unsigned long, which causes the build to fail on 64-bit platforms.
This consistently uses size_t.
Jesse Gross [Sun, 12 Dec 2010 01:31:36 +0000 (17:31 -0800)]
flow: Make size of flow struct a multiple of 8.
The compiler wants to pad structures to a multiple of the native
datatype for the architecture, so a multiple of 4 on 32-bit platforms
and a multiple of 8 on 64-bit. Currently the size of struct flow is
a multiple of 4, so the total size with padding varies depending on
the architecture, causing build asserts to fail. This explicitly pads
it out to a multiple of 8 for consistency.
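The effect can be reproduced with a small struct. This sketch assumes the struct has a 64-bit member, whose alignment is 4 bytes on i386 and 8 bytes on x86-64. The field names are illustrative, not the real struct flow layout. Without the reserved bytes the struct would be 20 bytes on i386 and 24 on x86-64. With them it is 24 on both.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ex_flow {
    uint64_t tun_id;        /* aligned to 4 bytes on i386, 8 on x86-64 */
    uint32_t nw_src;
    uint32_t nw_dst;
    uint16_t dl_type;
    uint8_t  nw_proto;
    uint8_t  reserved[5];   /* explicit padding: 19 + 5 = 24 bytes */
};

/* Build-time check: the array size is negative, and compilation fails,
 * unless the struct size is a multiple of 8. */
typedef char ex_flow_size_check[sizeof(struct ex_flow) % 8 == 0 ? 1 : -1];

size_t ex_flow_size(void)
{
    return sizeof(struct ex_flow);
}
```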
Ben Pfaff [Sat, 11 Dec 2010 00:41:33 +0000 (16:41 -0800)]
datapath: Remove explicit 'unlikely' from IS_ERR calls.
As David Miller pointed out on netdev today, IS_ERR has a built-in
'unlikely', so there's no point in adding one of our own.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Ben Pfaff [Mon, 13 Dec 2010 18:19:46 +0000 (10:19 -0800)]
datapath: Introduce more compat support for <net/netlink.h>.
With this commit, I have successfully built the datapath, without warnings,
on 2.6.{18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,36} on i386,
2.6.31 on x86-64, and the kernels included with XenServer 5.5.0 and (some
prerelease kernel for) XenServer 5.6.0.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Ben Pfaff [Fri, 10 Dec 2010 22:42:17 +0000 (14:42 -0800)]
datapath: Add compat support for nla_type().
The datapath code uses nla_type() but it was only introduced in 2.6.24.
The NLA_TYPE_MASK definition has to go above the #include <net/netlink.h>
because <net/netlink.h> recursively #includes <linux/netlink.h>.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Ben Pfaff [Fri, 10 Dec 2010 22:39:25 +0000 (14:39 -0800)]
datapath: Include <linux/skbuff.h> directly into linux/ip.h compat.
While doing test builds on numerous kernel versions I found that one build
failed because skb_network_header() wasn't visible from flow.h. I guess
that we accidentally depend on <linux/netlink.h> being included indirectly,
but this didn't always happen.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Ben Pfaff [Fri, 10 Dec 2010 22:38:25 +0000 (14:38 -0800)]
datapath: Include <linux/netlink.h> directly into flow.h.
While doing test builds on numerous kernel versions I found that one build
failed because "struct nlattr" wasn't visible from flow.h. I guess that
we accidentally depend on <linux/netlink.h> being included indirectly, but
this didn't always happen.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Ben Pfaff [Fri, 10 Dec 2010 22:45:38 +0000 (14:45 -0800)]
datapath: Fix off-by-one error in dev_get_stats() compat code.
dev_get_stats() was introduced in 2.6.29, not 2.6.28.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Ben Pfaff [Fri, 10 Dec 2010 22:13:17 +0000 (14:13 -0800)]
datapath: Fix csum_replace4() compatibility implementation.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Ethan Jackson [Sat, 11 Dec 2010 23:24:40 +0000 (15:24 -0800)]
utilities: ovs-tcpdump references non-existent exception
ovs-tcpdump would not behave properly when users attempted to pass
invalid arguments.
Jesse Gross [Fri, 10 Dec 2010 00:40:15 +0000 (16:40 -0800)]
datapath: Fix memory leak when a loop is detected.
If we detect a packet that is looping we kill the flow but then
don't do anything with the packet that caused the problem in the
first place, so this frees the packet. This isn't a very serious
leak because we try to shut off the flow that led to the loop
as early as possible. Once this happens, packets will no longer
hit the loop detector and will be freed just as any other packet
that should be dropped.
It also fixes an issue where the offset to the stats counter is
uninitialized after a loop is detected.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Thu, 9 Dec 2010 07:55:20 +0000 (23:55 -0800)]
datapath: Drop check for impossible condition after skb_gso_segment().
It's possible for skb_gso_segment to return NULL but only if the
hardware supports the correct form of segmentation offload but just
wants software to verify the offload parameters. However, since we're
not hardware and don't support any kind of segmentation offload natively,
we can never get in this situation. Therefore drop the check and
comment.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Thu, 9 Dec 2010 07:29:10 +0000 (23:29 -0800)]
datapath: Drop synchronize_rcu() in internal dev destroy.
unregister_netdevice() contains a call to synchronize_rcu(), so there
is no need to directly call it ourselves immediately beforehand.
We were relying on the call during unregistration anyways to stop
packets from being transmitted on the device, so our version was
both misleading and had a performance penalty.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Thu, 9 Dec 2010 03:28:32 +0000 (19:28 -0800)]
datapath: Take advantage of IFF_OVS_DATAPATH.
Starting in 2.6.37 we have our own unique identifier to be able
to find ports attached to OVS. Take advantage of it to avoid
ugly workarounds.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Thu, 9 Dec 2010 03:21:40 +0000 (19:21 -0800)]
datapath: Don't use RCU for internal dev vport.
The vports are now attached and ready to go when they are allocated,
so we don't have to worry about future changes. As a result, we can
directly store the pointer in the internal dev's netdevice private
space before it is registered. The registration process will handle
the necessary write memory barriers and anyone who has a reference
to the netdev will have done the read side barriers, so we don't need
to use RCU at all.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Justin Pettit [Sat, 11 Dec 2010 04:50:58 +0000 (20:50 -0800)]
ofproto: Fix problem that caused facets not to be installed into datapath.
Commit cdee00f (datapath: Replace "struct odp_action" by Netlink attributes.)
stopped initializing some elements in facet structures in certain cases.
This caused flows to not be installed into the datapath. This commit sets
that again based on the action context.
Ben Pfaff [Fri, 10 Dec 2010 18:42:42 +0000 (10:42 -0800)]
Expand tunnel IDs from 32 to 64 bits.
We have a need to identify tunnels with keys longer than 32 bits. This
commit adds basic datapath and OpenFlow support for such keys. It doesn't
actually add any tunnel protocols that support 64-bit keys, so this is not
very useful yet.
The 'arg' member of struct odp_msg had to be expanded to 64-bits also,
because it sometimes contains a tunnel ID. This member also contains the
argument passed to ODPAT_CONTROLLER, so I expanded that action's argument
to 64 bits also so that it can use the full width of the expanded 'arg'.
Userspace doesn't take advantage of the new space though (it was only
using 16 bits anyhow).
This commit has been tested only to the extent that it doesn't disrupt
basic Open vSwitch operation. I have not tested it with tunnel traffic.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Feature #3976.
Ben Pfaff [Thu, 9 Dec 2010 22:19:51 +0000 (14:19 -0800)]
ofp-util: Make ofputil_cls_rule_to_match() help with flow cookies too.
This fixes OpenFlow 1.0 flow stats reporting of flows added via NXM.
I noticed this problem while implementing 64-bit tunnel IDs, hence the
positioning. The following commit adds a test.
Acked-by: Jesse Gross <jesse@nicira.com>
Ben Pfaff [Thu, 9 Dec 2010 22:16:56 +0000 (14:16 -0800)]
ofproto: Format entire rule when dumping all flows.
cls_rule_format() formats the entire classifier rule, whereas
ofp_print_match() just shows the parts that are visible in OpenFlow 1.0.
Acked-by: Jesse Gross <jesse@nicira.com>
Ben Pfaff [Fri, 10 Dec 2010 18:40:58 +0000 (10:40 -0800)]
datapath: Replace "struct odp_action" by Netlink attributes.
In the medium term, we plan to migrate the datapath to use Netlink as its
communication channel. In the short term, we need to be able to have
actions with 64-bit arguments but "struct odp_action" only has room for
48 bits. So this patch shifts to variable-length arguments using Netlink
attributes, which starts in on the Netlink transition and makes 64-bit
arguments possible at the same time.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Ben Pfaff [Tue, 7 Dec 2010 18:49:47 +0000 (10:49 -0800)]
netlink: Add macros for iterating through attributes.
Acked-by: Jesse Gross <jesse@nicira.com>
Ben Pfaff [Tue, 7 Dec 2010 17:37:59 +0000 (09:37 -0800)]
netlink: New function nl_attr_type().
Linux since v2.6.24 has a couple of bits at the top of
nla_type that one is apparently supposed to ignore. This commit
starts doing that in Open vSwitch userspace.
Acked-by: Jesse Gross <jesse@nicira.com>
Ben Pfaff [Tue, 7 Dec 2010 17:37:29 +0000 (09:37 -0800)]
netlink: Add functions for working with big-endian attribute values.
These _be<N> functions are completely equivalent to the corresponding
_u<N> functions, but the names help to make their purpose clear.
Acked-by: Jesse Gross <jesse@nicira.com>
Ben Pfaff [Fri, 10 Dec 2010 17:51:03 +0000 (09:51 -0800)]
netlink: Split into generic and Linux-specific parts.
The parts of the netlink module that are related to sockets are
Linux-specific, since only Linux has AF_NETLINK sockets. The rest can be
built anywhere. This commit breaks them into two modules, and builds the
generic one on all platforms.
Acked-by: Jesse Gross <jesse@nicira.com>
Ben Pfaff [Tue, 7 Dec 2010 17:33:27 +0000 (09:33 -0800)]
netlink: Make netlink-protocol.h compatible with <linux/netlink.h>.
Until now, netlink-protocol.h and <linux/netlink.h> could not both be
included by a single source file, because they contained conflicting
definitions. This commit fixes the problem, by having netlink-protocol.h
delegate to <linux/netlink.h> where it is available.
Here's an example of the problem: odp-util.c includes both
datapath-protocol.h and will need netlink-protocol.h also so that it can
look through actions defined as struct nlattr. datapath-protocol.h
includes <linux/if_link.h> for the definition of rtnl_link_stats64, and
<linux/if_link.h> includes <linux/netlink.h>.
Acked-by: Jesse Gross <jesse@nicira.com>
Ben Pfaff [Fri, 10 Dec 2010 19:05:48 +0000 (11:05 -0800)]
ofproto: Fix documentation of ofproto/trace command.
Ben Pfaff [Fri, 10 Dec 2010 17:14:26 +0000 (09:14 -0800)]
xenserver: Make rpmbuild happy by updating %files with new utilities.
Jesse Gross [Fri, 10 Dec 2010 01:52:39 +0000 (17:52 -0800)]
tunneling: Fix updated port pools commit.
If readding a tunnel to the table fails during move_port(), we
should decrement the port pool counter that it is in. However,
when I attempted to do this, I accidentally put it in add_port().
Signed-off-by: Jesse Gross <jesse@nicira.com>
Jesse Gross [Wed, 8 Dec 2010 21:38:22 +0000 (13:38 -0800)]
datapath: Drop unused file ops.
There have been two ops to support async access to the datapath
character device for a long time but they have never been implemented.
Drop the commented out code.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Wed, 8 Dec 2010 21:19:05 +0000 (13:19 -0800)]
datapath: Hold mutex for DP while userspace is blocking on read.
Currently we get a pointer to the DP in openvswitch_read() and
openvswitch_poll() and use it without any synchronization. This means
that the DP could disappear from underneath us while we are using it.
Currently, this isn't a problem because userspace is single threaded but
it's better for the locking to be correct.
With this change we hold the mutex while doing a blocking wait, which
means that no changes can be made, including adding/removing flows. It's
possible to make this finer grained but for the time being that isn't done,
since current userspace doesn't care.
Found with lockdep.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Wed, 8 Dec 2010 20:02:42 +0000 (12:02 -0800)]
datapath: dp_sysfs_add_dp() needs RTNL lock.
We currently drop RTNL before adding a new datapath to sysfs but
then access the dp data structures. This moves the call to
dp_sysfs_add_dp() before we drop the locks to prevent a potential
race. All other calls to sysfs functions already hold RTNL.
Found with lockdep.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Mon, 6 Dec 2010 19:27:07 +0000 (11:27 -0800)]
datapath: RCU dereference correct pointer in table.
Our hash table implementation consists of two levels of buckets
and then arrays of pointers. The bucket arrays are fixed by the
size of the table, which is therefore protected by the RCU
dereference of the table pointer. The arrays change when items
are inserted or deleted. However, in tbl_insert/remove we need
to look at the old values and we do an rcu_dereference() on the
second level array instead of the bucket itself. Other places
that access the table for lookup do the pointer dereference in
the correct order.
Found by sparse.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Mon, 6 Dec 2010 19:26:16 +0000 (11:26 -0800)]
datapath: Don't rcu_dereference() objects in table.
Each time that we modify the flow/port table, we reallocate the
array of pointers to objects in a particular bucket. We then use
RCU to update the link to that bucket. This means that we don't
need to use RCU to access the individual object pointers, since
they are constant for a given instance of the bucket data structure.
This doesn't cause a problem per se (though it does restrict the
optimizations that the compiler can perform and adds a memory barrier
on Alpha). However, it is confusing and inconsistent since the
pointers are not protected by RCU and we don't use rcu_assign_pointer().
Found by sparse.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Sun, 5 Dec 2010 20:03:49 +0000 (12:03 -0800)]
tunneling: Add missing rcu_dereference() to cache cleaner.
The cleaner for the header caching accesses the tunnel port table
without holding any locks. However, it doesn't have a read memory
barrier, so there is no guarantee that the contents of the table
have made it to the right CPU.
Found by sparse.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Sat, 4 Dec 2010 23:17:56 +0000 (15:17 -0800)]
brcompat: Simplify generation of bridge ID.
Currently we use a fairly complicated method of generating the
bridge ID, since the actual struct is only available in a header
file private to the Linux bridge. The current method appears to
be correct but is difficult to reason about. This replaces it
with a simple memcpy, which is more analogous to what the Linux
bridge does.
Flagged by sparse.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Sat, 4 Dec 2010 21:52:25 +0000 (13:52 -0800)]
datapath: Use static where possible.
Mark functions and global variables used only in a single file as
static.
Found with sparse.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Sat, 4 Dec 2010 21:50:24 +0000 (13:50 -0800)]
datapath: Use NULL instead of 0 in alloc_buckets().
0 and NULL are the same but NULL has clearer semantics. This has
no functional change.
Found with sparse.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Sat, 4 Dec 2010 21:49:50 +0000 (13:49 -0800)]
capwap: Bind address should be big endian.
CAPWAP creates a UDP socket that accepts packets from any address using
INADDR_ANY. IP addresses should be in network byte order but that
constant is in host byte order, so use htonl. However, this is not a
real bug since the value of INADDR_ANY is 0.
Found with sparse.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
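The same convention applies to userspace sockets, which makes for an easy illustration. The sketch below is POSIX userspace code, not the kernel CAPWAP code, and ex_any_addr() and its port argument are hypothetical.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Builds a wildcard IPv4 address for bind().  Both fields are stored in
 * network byte order. */
struct sockaddr_in ex_any_addr(uint16_t port)
{
    struct sockaddr_in sin;

    memset(&sin, 0, sizeof sin);
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_ANY);
    sin.sin_port = htons(port);

    /* INADDR_ANY is 0, so the conversion does not change the value.  The
     * htonl() is for type correctness rather than behavior. */
    assert(sin.sin_addr.s_addr == 0);
    return sin;
}
```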
Jesse Gross [Tue, 7 Dec 2010 02:03:37 +0000 (18:03 -0800)]
datapath: Try to avoid custom checksum update function.
Our update_csum() function was exactly the same as
inet_proto_csum_replace4() with the one exception that it uses our
checksum status fields on older kernels that need it. Unfortunately,
we can't completely move the code to the compat directory because it
relies on fields in OVS CB but we can at least exile it to checksum.h.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Tue, 7 Dec 2010 03:02:14 +0000 (19:02 -0800)]
datapath: Compatibility code for inet_proto_csum_replace2.
Kernels earlier than 2.6.25 did not define inet_proto_csum_replace2,
so implement it ourselves.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Jesse Gross [Tue, 7 Dec 2010 01:51:33 +0000 (17:51 -0800)]
datapath: Correctly update IP checksum with actions.
The update_csum() function that we currently use to update
checksums on actions is really intended for L4 checksums. In
particular, if the packet has a partial checksum and the field
is not in the pseudo header, it doesn't do anything at all.
This doesn't make sense for the IP header because Linux doesn't
use hardware offload for it, so we always need to recompute the
checksum. Instead, we can use the kernel function csum_replace4(),
which will always do the right thing.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Tue, 7 Dec 2010 02:58:30 +0000 (18:58 -0800)]
datapath: Compatibility code for csum_replace4.
Kernels earlier than 2.6.25 did not define csum_replace4, so
implement it ourselves.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
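For readers unfamiliar with what such a helper computes, here is a userspace sketch of an incremental checksum update using the RFC 1624 formula HC' = ~(~HC + ~m + m'). It is not the kernel implementation or the compat code from this commit. The ex_* names are hypothetical, and the 16-bit words are treated as plain integers rather than __be16/__sum16 values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Folds a 32-bit one's-complement accumulator down to 16 bits. */
static uint16_t ex_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Full Internet checksum over 'n' 16-bit words. */
uint16_t ex_checksum(const uint16_t *words, size_t n)
{
    uint32_t sum = 0;
    size_t i;

    assert(words != NULL || n == 0);
    for (i = 0; i < n; i++)
        sum += words[i];
    return (uint16_t)~ex_fold(sum);
}

/* Updates 'csum' for one 16-bit word changing from 'from' to 'to'. */
uint16_t ex_csum_replace2(uint16_t csum, uint16_t from, uint16_t to)
{
    uint32_t sum = (uint16_t)~csum;

    sum += (uint16_t)~from;
    sum += to;
    return (uint16_t)~ex_fold(sum);
}

/* Updates 'csum' for a 32-bit field by handling its two 16-bit halves. */
uint16_t ex_csum_replace4(uint16_t csum, uint32_t from, uint32_t to)
{
    csum = ex_csum_replace2(csum, (uint16_t)(from >> 16),
                            (uint16_t)(to >> 16));
    return ex_csum_replace2(csum, (uint16_t)(from & 0xffff),
                            (uint16_t)(to & 0xffff));
}
```

Updating a checksum this way after rewriting, say, an IPv4 source address should give the same value as recomputing it over the modified header.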
Jesse Gross [Sun, 5 Dec 2010 20:16:40 +0000 (12:16 -0800)]
tunneling: Update port pools on config change.
We keep track of the number of tunnels using the different types of
matching in order to avoid doing the lookup when there are no ports
of that type. However, when updating the configuration we weren't
changing the port pool counts, which could lead to incorrectly not
finding a tunnel on receive.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Jesse Gross [Mon, 6 Dec 2010 23:50:16 +0000 (15:50 -0800)]
datapath: Fix indentation in patch-vport.c.
Convert spaces to tabs in indents.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Sat, 4 Dec 2010 17:43:35 +0000 (09:43 -0800)]
datapath: Convert patch vport to use call_rcu() on destruction.
Since patch ports are virtual devices, we can potentially have many
of them in a datapath. Currently we have a call to synchronize_rcu()
each time we destroy one, which can be expensive if we are deleting a
datapath with many ports. This converts it to use call_rcu() instead,
which allows us to wait for only a single RCU grace period independent
of the number of ports.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Jesse Gross [Sat, 4 Dec 2010 03:17:20 +0000 (19:17 -0800)]
tunneling: Access correct IP header when processing ECN.
We attempt to copy the ECN bits from the outside of the tunnel to
the inside on receive if we are encapsulating IP traffic. However,
we were previously looking at the inner IP header as the source of
the ECN bits, when it should have been the outer header. This
corrects that and cleans up the function a little bit.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
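The direction of the copy is the important part: the congestion mark is read from the outer header and applied to the inner one. The sketch below models that on bare TOS bytes. The ECN codepoints are the standard ones (low two bits of the TOS byte, 0 for Not-ECT and 3 for CE). The policy of marking only ECN-capable inner packets is an assumption made for this example, and ex_ecn_decapsulate() is hypothetical, not the function touched by the commit.

```c
#include <assert.h>
#include <stdint.h>

#define EX_ECN_MASK    0x03u
#define EX_ECN_NOT_ECT 0x00u
#define EX_ECN_CE      0x03u

/* Returns the inner TOS byte to use after decapsulation.  'outer_tos' is
 * the source of the ECN information, 'inner_tos' is what gets updated. */
uint8_t ex_ecn_decapsulate(uint8_t outer_tos, uint8_t inner_tos)
{
    uint8_t result = inner_tos;
    int outer_ce = (outer_tos & EX_ECN_MASK) == EX_ECN_CE;
    int inner_ect = (inner_tos & EX_ECN_MASK) != EX_ECN_NOT_ECT;

    if (outer_ce && inner_ect)
        result |= EX_ECN_CE;

    /* Only the two ECN bits may change; the DSCP bits are preserved. */
    assert((result & ~EX_ECN_MASK) == (inner_tos & ~EX_ECN_MASK));
    return result;
}
```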
Jesse Gross [Sat, 4 Dec 2010 02:06:23 +0000 (18:06 -0800)]
tunneling: Remove call to eth_type_trans() on receive.
On receive we call eth_type_trans() to set skb->protocol. However,
that function also sets skb->pkt_type, which requires several
comparisons to MAC addresses. Nothing in OVS cares about pkt_type,
so this is wasteful. If we actually do egress to the IP stack
through an internal device then we'll call eth_type_trans() to get
everything correctly setup. It's possible for device drivers to
see an incorrect pkt_type or not correctly parse legacy IPX (which
eth_type_trans() also handles) but it's highly unlikely that they
will care.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Ben Pfaff [Thu, 9 Dec 2010 23:00:36 +0000 (15:00 -0800)]
ofproto: Add "ofproto/trace" command to help debugging flow tables.
With an appropriate flow table, output from a command like this:
ovs-appctl ofproto/trace system@dp0 0 0
ffffffffffff000c29f49d5c080600010
80006040001000c29f49d5cac10008a000000000000ac1004df00000000000000000000000000000
0000000
resembles the following:
Packet: -8:00:00.000000 00:0c:29:f4:9d:5c > ff:ff:ff:ff:ff:ff, ethertype ARP (0x
0806), length 60: arp who-has 172.16.4.223 tell 172.16.0.138
Flow: tunnel0:in_port0000:tci(0) mac00:0c:29:f4:9d:5c->ff:ff:ff:ff:ff:ff type080
6 proto1 tos0 ip172.16.0.138->172.16.4.223 port0->0
Rule: cookie=0 in_port=65534
OpenFlow actions=resubmit:1,mod_vlan_vid:5,resubmit:2,mod_vlan_pcp:6,strip_vlan
Resubmitted flow: unchanged
Rule: cookie=0 in_port=1
OpenFlow actions=resubmit:3,resubmit:4
Resubmitted flow: unchanged
No match
Resubmitted flow: unchanged
No match
Resubmitted flow: tunnel0:in_port0000:tci(vlan5,pcp0) mac00:0c:29:f4:9d:5c->ff:ff:ff:ff:ff:ff type0806 proto1 tos0 ip172.16.0.138->172.16.4.223 port0->0
No match
Final flow: tunnel0:in_port0000:tci(0) mac00:0c:29:f4:9d:5c->ff:ff:ff:ff:ff:ff type0806 proto1 tos0 ip172.16.0.138->172.16.4.223 port0->0
Datapath actions: set_tci(vid=5,pcp=0),set_tci(vid=5,pcp=6),strip_vlan
Ben Pfaff [Thu, 9 Dec 2010 22:52:44 +0000 (14:52 -0800)]
ofproto: Change xlate_actions() to take a structure.
An upcoming commit has a need to give xlate_actions() another parameter,
but it already has too many. This commit improves the situation by making
xlate_actions()'s caller fill in a structure instead.
The action_xlate_ctx structure is kind of big and unwieldy because it
includes a struct odp_actions, which is about 16 kB. But work underway will
change that to a "struct ofpbuf", which is much more reasonable.
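The general pattern (caller fills in a structure, callee reads inputs from it and writes outputs back) might look like the sketch below. The struct xlate_sketch_ctx, its fields, and xlate_sketch() are all hypothetical and far simpler than action_xlate_ctx; they only illustrate why adding one more input later does not change the function signature.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical context structure; not the fields of action_xlate_ctx. */
struct xlate_sketch_ctx {
    /* Inputs, filled in by the caller. */
    uint16_t in_port;            /* Port the packet arrived on. */
    const uint16_t *out_ports;   /* Candidate output ports. */
    size_t n_out_ports;

    /* Outputs, filled in by xlate_sketch(). */
    size_t n_outputs;            /* Number of ports actually output to. */
};

/* Counts the output ports other than the input port.  A new input
 * becomes a new struct member, so existing callers keep compiling. */
void xlate_sketch(struct xlate_sketch_ctx *ctx)
{
    size_t i;

    ctx->n_outputs = 0;
    for (i = 0; i < ctx->n_out_ports; i++) {
        if (ctx->out_ports[i] != ctx->in_port) {
            ctx->n_outputs++;
        }
    }
}
```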
Ben Pfaff [Wed, 8 Dec 2010 21:09:59 +0000 (13:09 -0800)]
ofpbuf: New function ofpbuf_put_hex().
This commit converts nx_match_from_string() to use this new function. The
new function will also have another user in an upcoming commit.
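For readers unfamiliar with this kind of helper, here is a minimal sketch of hex-string parsing into a byte array. It is not ofpbuf_put_hex() itself: the names hex_digit_value() and hex_to_bytes(), the fixed-size output array, and the choice to stop at the first incomplete or non-hex pair are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the value of hex digit 'c', or -1 if 'c' is not a hex digit. */
int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') {
        return c - '0';
    } else if (c >= 'a' && c <= 'f') {
        return c - 'a' + 10;
    } else if (c >= 'A' && c <= 'F') {
        return c - 'A' + 10;
    }
    return -1;
}

/* Hypothetical helper: converts pairs of hex digits at the start of 's'
 * into bytes in 'out' (at most 'out_size' of them), stores the number of
 * bytes written in '*n', and returns the first character not consumed. */
const char *hex_to_bytes(const char *s, uint8_t *out, size_t out_size,
                         size_t *n)
{
    size_t count = 0;

    while (count < out_size) {
        int hi = hex_digit_value(s[0]);
        int lo;

        if (hi < 0) {
            break;              /* Covers the terminating null too. */
        }
        lo = hex_digit_value(s[1]);
        if (lo < 0) {
            break;              /* Odd digit left over; leave it alone. */
        }
        out[count++] = (uint8_t) (hi * 16 + lo);
        s += 2;
    }
    *n = count;
    return s;
}
```

Returning the first unconsumed character lets a caller such as a match parser continue from where the hex data ended.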
Ben Pfaff [Thu, 9 Dec 2010 20:31:31 +0000 (12:31 -0800)]
ofp-print: Print every flow on a new line for NXST_FLOW replies too.
This makes NXST_FLOW formatting consistent with OFPST_FLOW.
Suggested-by: Justin Pettit <jpettit@nicira.com>
Ben Pfaff [Thu, 9 Dec 2010 19:03:35 +0000 (11:03 -0800)]
ofp-print, ofp-parse: Add support for NXAST_REG_MOVE and NXAST_REG_LOAD.
Ben Pfaff [Wed, 8 Dec 2010 00:57:12 +0000 (16:57 -0800)]
ofp-util: Group everything related to actions together in header file.
Cleanup.
Ben Pfaff [Thu, 9 Dec 2010 18:41:32 +0000 (10:41 -0800)]
Make compiler complain about unhandled OpenFlow and Nicira action types.
This should help avoid forgetting about them in the future, because the
compiler will complain about unhandled values in switch statements.
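The mechanism this relies on can be sketched as follows. With GCC or Clang and -Wall (which enables -Wswitch), a switch on an enum-typed value that has no "default" case draws a warning for every enumerator that lacks a "case". The enum and function names here are hypothetical, not the ones used in the tree.

```c
#include <stddef.h>

/* Hypothetical action types, for illustration only. */
enum sketch_action_type {
    SKETCH_OUTPUT,
    SKETCH_SET_VLAN_VID,
    SKETCH_STRIP_VLAN
};

/* Deliberately has no "default" case: adding a new enumerator without a
 * matching "case" then produces a -Wswitch warning at compile time. */
const char *sketch_action_name(enum sketch_action_type type)
{
    switch (type) {
    case SKETCH_OUTPUT:
        return "output";
    case SKETCH_SET_VLAN_VID:
        return "mod_vlan_vid";
    case SKETCH_STRIP_VLAN:
        return "strip_vlan";
    }
    return NULL;    /* Only reached for values outside the enum. */
}
```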
Ben Pfaff [Wed, 8 Dec 2010 00:17:14 +0000 (16:17 -0800)]
nicira-ext: Remove unused macro NICIRA_OUI_STR.
I don't know what this is good for.
Ben Pfaff [Thu, 9 Dec 2010 18:31:49 +0000 (10:31 -0800)]
ofp-print: Print OFPUTIL_NXT_FLOW_REMOVED.
Ben Pfaff [Tue, 7 Dec 2010 23:34:35 +0000 (15:34 -0800)]
ofp-util: Fix byte order of ofputil_cls_rule_from_match() parameter.
This doesn't change any generated code so it is not a bug fix, but it
makes the byte order of the 'cookie' argument clear.
Ben Pfaff [Tue, 7 Dec 2010 23:07:54 +0000 (15:07 -0800)]
ofp-print: Print OFPUTIL_NXST_AGGREGATE_REPLY.
Ben Pfaff [Tue, 7 Dec 2010 22:52:26 +0000 (14:52 -0800)]
ofp-print: Print NXST_FLOW_REQUEST and NXST_AGGREGATE_REQUEST.
This takes advantage of ofputil_decode_flow_stats_request() to get rid of
some code.
Ben Pfaff [Tue, 7 Dec 2010 22:41:19 +0000 (14:41 -0800)]
ofp-print: Print OFPUTIL_NXT_STATUS_REQUEST and OFPUTIL_NXT_STATUS_REPLY.
Ben Pfaff [Tue, 7 Dec 2010 22:36:18 +0000 (14:36 -0800)]
ofp-print: Print OFPUTIL_NXT_ROLE_REQUEST and OFPUTIL_NXT_ROLE_REPLY.
Ben Pfaff [Tue, 7 Dec 2010 22:30:07 +0000 (14:30 -0800)]
ovs-ofctl: Fix del-flows command parsing bugs.
"ovs-ofctl del-flows br0" segfaulted because do_flow_mod__() assumed that
it always had a "flow" argument, which is not true for the del-flows
command.
Beyond that, parse_ofp_flow_mod_str() rejected "ovs-ofctl del-flows
br0" because no actions were supplied, even though supplying actions
doesn't make sense for deleting flows.
This commit fixes both problems and adds a simple test that would have
caught both problems.
Bug #4112.
Ben Pfaff [Tue, 7 Dec 2010 23:45:10 +0000 (15:45 -0800)]
ofp-print: Print durations more readably.
It's easier to read "duration=1.75s" than "duration_sec=1s
duration_nsec=750000000ns".
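One way to produce the compact form is to print the nanoseconds as a nine-digit fraction and strip trailing zeros. The sketch below does that; format_duration() is a hypothetical name, and the code assumes 'nsec' is less than one billion.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: writes e.g. "duration=1.75s" into 'buf'.
 * Assumes nsec < 1000000000. */
void format_duration(char *buf, size_t size, uint32_t sec, uint32_t nsec)
{
    char tmp[32];   /* 10 digits + '.' + up to 10 digits + null fits. */
    int len;

    if (nsec > 0) {
        len = snprintf(tmp, sizeof tmp, "%"PRIu32".%09"PRIu32, sec, nsec);
        /* A nonzero fraction always keeps at least one digit, so this
         * loop stops before it reaches the decimal point. */
        while (tmp[len - 1] == '0') {
            tmp[--len] = '\0';
        }
    } else {
        snprintf(tmp, sizeof tmp, "%"PRIu32, sec);
    }
    snprintf(buf, size, "duration=%ss", tmp);
}
```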
Ben Pfaff [Tue, 7 Dec 2010 22:21:38 +0000 (14:21 -0800)]
ofp-print: Print NXST_FLOW replies.
Jean Tourrilhes [Thu, 9 Dec 2010 06:05:07 +0000 (22:05 -0800)]
dpif-netdev: Handle ECN bits when updating IP checksum.
When recalculating the checksum after a set ToS action, we were
not taking into account the ECN bits copied from the original header.
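The issue can be illustrated with a self-contained sketch. The function names, the byte-array view of the IPv4 header, and the use of the RFC 1624 incremental update are assumptions for illustration, not the dpif-netdev code. The point is that the checksum must be updated with the TOS byte actually written (new DSCP bits plus the preserved ECN bits), not with the raw value from the action.

```c
#include <stddef.h>
#include <stdint.h>

#define IP_ECN_MASK  0x03u
#define IP_DSCP_MASK 0xfcu

/* Internet checksum over 'len' bytes ('len' must be even). */
uint16_t ip_csum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        sum += (uint32_t) (hdr[i] << 8 | hdr[i + 1]);
    }
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) ~sum;
}

/* Incremental update per RFC 1624: HC' = ~(~HC + ~m + m'). */
uint16_t csum_update16(uint16_t csum, uint16_t old_word, uint16_t new_word)
{
    uint32_t sum = (uint16_t) ~csum;

    sum += (uint16_t) ~old_word;
    sum += new_word;
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) ~sum;
}

/* Hypothetical "set ToS" on a raw IPv4 header: replaces the DSCP bits,
 * keeps the ECN bits already in the header, and fixes the checksum
 * using the byte that is really stored. */
void set_ip_tos(uint8_t *hdr, uint8_t action_tos)
{
    uint16_t old_word = (uint16_t) (hdr[0] << 8 | hdr[1]);
    uint8_t new_tos = (uint8_t) ((action_tos & IP_DSCP_MASK)
                                 | (hdr[1] & IP_ECN_MASK));
    uint16_t new_word = (uint16_t) (hdr[0] << 8 | new_tos);
    uint16_t csum = (uint16_t) (hdr[10] << 8 | hdr[11]);

    csum = csum_update16(csum, old_word, new_word);
    hdr[1] = new_tos;
    hdr[10] = (uint8_t) (csum >> 8);
    hdr[11] = (uint8_t) (csum & 0xff);
}
```

Passing 'action_tos' instead of 'new_tos' into the checksum update would leave the header with a bad checksum whenever the original ECN bits were nonzero.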
Ben Pfaff [Thu, 25 Nov 2010 00:10:11 +0000 (16:10 -0800)]
ofp-print: Improve OFPST_FLOW stats reply printing.
This fixes the spacing and prints the priority included in the message
instead of the implicit priority.
Ben Pfaff [Tue, 7 Dec 2010 23:49:36 +0000 (15:49 -0800)]
Use ofpbuf_pull() instead of ofpbuf_try_pull() where it is valid.
In each of these cases, we know that the buffer is long enough to pull
the header because ofputil_decode_msg_type() already checked for us.
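The difference between the two styles can be sketched with a stand-in buffer type. The struct buf_sketch and both functions are hypothetical simplifications of the ofpbuf interface: the "try" variant reports a short buffer by returning NULL, while the plain variant treats a short buffer as a programming error, which is appropriate only when the length was validated earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical read-only buffer, standing in for struct ofpbuf. */
struct buf_sketch {
    const uint8_t *data;    /* First unread byte. */
    size_t size;            /* Number of unread bytes. */
};

/* Removes 'n' bytes from the front of 'b' and returns a pointer to
 * them, or returns NULL and leaves 'b' unchanged if 'b' is too short. */
const void *buf_try_pull(struct buf_sketch *b, size_t n)
{
    const void *p;

    if (b->size < n) {
        return NULL;
    }
    p = b->data;
    b->data += n;
    b->size -= n;
    return p;
}

/* Like buf_try_pull(), but the caller guarantees that 'b' holds at
 * least 'n' bytes, so no error path is needed at the call site. */
const void *buf_pull(struct buf_sketch *b, size_t n)
{
    const void *p = buf_try_pull(b, n);

    assert(p != NULL);
    return p;
}
```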
Ben Pfaff [Tue, 7 Dec 2010 23:47:19 +0000 (15:47 -0800)]
ofp-util: Use ofpbuf_use_const() in a few more places.