Make dumping large numbers of flows possible.

This changes the kernel switch implementation to use the kernel Netlink
"dump" interface to allow flow stats that don't fit in the socket buffer
to be dumped gradually as the caller drains the socket buffer.

One of the changes here is a bug fix for nla_unreserve. Because Netlink
attributes' lengths are rounded up to a multiple of 4 bytes, reducing
the length of the payload by N bytes doesn't necessarily reduce the
length of the skb by N bytes. Instead, we need to know the original
length and final length of the attribute. This means that using 'len'
as a difference in bytes doesn't really make sense, so this changes
'len' to be the new length of the attribute payload and renames the
function to nla_shrink to (IMO) better reflect what it is now doing.
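
As a rough illustration of the rounding problem (a userspace sketch, not
the code in this patch): the SKETCH_* macros and both helper names below
are hypothetical, and the sketch assumes the usual 4-byte Netlink
attribute header and 4-byte alignment. The number of bytes to trim from
the tail of the message has to be computed from the old and new payload
lengths together.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values: 4-byte alignment and a 4-byte attribute header. */
#define SKETCH_NLA_ALIGNTO 4
#define SKETCH_NLA_ALIGN(len) \
    (((len) + SKETCH_NLA_ALIGNTO - 1) & ~(size_t)(SKETCH_NLA_ALIGNTO - 1))
#define SKETCH_NLA_HDRLEN 4

/* Space one attribute occupies in the message: header plus payload,
 * rounded up to the alignment. */
static size_t sketch_nla_total_size(size_t payload_len)
{
    return SKETCH_NLA_ALIGN(SKETCH_NLA_HDRLEN + payload_len);
}

/* Bytes to trim from the end of the message when the payload of the
 * last attribute shrinks from old_len to new_len. */
static size_t sketch_nla_shrink_delta(size_t old_len, size_t new_len)
{
    assert(new_len <= old_len);
    return sketch_nla_total_size(old_len) - sketch_nla_total_size(new_len);
}
```

With these assumptions, shrinking a payload from 8 to 5 bytes trims
nothing from the message, while shrinking it from 9 to 8 bytes trims 4.
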

Since we have to release the RCU read lock between calls to the dump
function, we need table iterators that persist across RCU epochs. One
way to do this would be to add new "iterator_save" and "iterator_restore"
functions, but this seemed like overkill since there would then be a
total of 5 iterator functions that have only one user (flow stats dumping).
Instead, this patch refactors table iteration into a single "iterate"
function that takes a callback. This simplifies the table iteration code
significantly.
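
The general shape of a callback-based iterator that can be resumed
across calls might look like the sketch below. It is not the code in
this patch: every sketch_* name is hypothetical, and the table is
simplified to a flat array with an integer position, whereas a real
flow table would need a position that stays meaningful while the table
is modified between calls.

```c
#include <stddef.h>

struct sketch_flow {
    unsigned int id;
};

struct sketch_table {
    struct sketch_flow *flows;
    size_t n_flows;
};

/* Callback type: returns 0 to continue, nonzero to stop iterating. */
typedef int (*sketch_flow_cb)(struct sketch_flow *flow, void *aux);

/* Calls 'cb' for each flow starting at '*pos'.  If 'cb' returns nonzero,
 * iteration stops with '*pos' still pointing at the flow that was not
 * consumed, so a later call resumes there.  Returns the value that
 * stopped iteration, or 0 if the end of the table was reached. */
static int sketch_table_iterate(struct sketch_table *table, size_t *pos,
                                sketch_flow_cb cb, void *aux)
{
    while (*pos < table->n_flows) {
        int error = cb(&table->flows[*pos], aux);
        if (error)
            return error;
        (*pos)++;
    }
    return 0;
}

/* A deliberately tiny output buffer, standing in for a socket buffer. */
struct sketch_dump {
    unsigned int ids[2];
    size_t n;
};

/* Copies one flow into the buffer, or reports that the buffer is full. */
static int sketch_dump_cb(struct sketch_flow *flow, void *aux)
{
    struct sketch_dump *dump = aux;

    if (dump->n == sizeof dump->ids / sizeof dump->ids[0])
        return 1;
    dump->ids[dump->n++] = flow->id;
    return 0;
}
```

A caller in this sketch keeps 'pos' between calls, drains the buffer,
and calls sketch_table_iterate() again until it returns 0.
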

This change also modifies dpctl to understand the new format of flow
stats.