datapath: Fix deadlock in network device notifier.
authorBen Pfaff <blp@nicira.com>
Wed, 7 Jan 2009 23:19:09 +0000 (15:19 -0800)
committerBen Pfaff <blp@nicira.com>
Thu, 8 Jan 2009 21:22:56 +0000 (13:22 -0800)
commit95ceee731ab76d8eb84669a9a285897c96217b8e
tree0a172cd7d131a712f7e87471c3bb4a620593d4ca
parent9d7fcda67b89ab31d73395924e650933cb97c7e3
datapath: Fix deadlock in network device notifier.

When a network device that is part of a datapath is destroyed, the
notifier handler in dp_device_event() calls dp_del_switch_port() to
delete the switch port.  dp_del_switch_port() then calls rtnl_lock()
to drop the device's promiscuous count.  But the RTNL lock has already
been taken at this point, so taking it recursively deadlocks.

The obvious fix is for dp_del_switch_port()'s caller to take the RTNL
lock for it, if it is not already taken.  This, however, causes a
different deadlock, as dp_del_switch_port() needs dp_mutex and that
means that we are nesting dp_mutex and the RTNL both ways.

The fix used here is to always nest dp_mutex inside the RTNL lock and
never vice versa.

    =============================================
    [ INFO: possible recursive locking detected ]
    ---------------------------------------------
    tunctl/2934 is trying to acquire lock:
     (rtnl_mutex){--..}, at: [<c029667d>] mutex_lock+0x1c/0x1f

    but task is already holding lock:
     (rtnl_mutex){--..}, at: [<c029667d>] mutex_lock+0x1c/0x1f

    other info that might help us debug this:
    1 lock held by tunctl/2934:
     #0:  (rtnl_mutex){--..}, at: [<c029667d>] mutex_lock+0x1c/0x1f

    stack backtrace:
     [<c01040cf>] show_trace+0xd/0x10
     [<c0104655>] dump_stack+0x19/0x1b
     [<c012f897>] __lock_acquire+0x747/0x9a6
     [<c0130058>] lock_acquire+0x4b/0x6b
     [<c029651b>] __mutex_lock_slowpath+0xb0/0x1f6
     [<c029667d>] mutex_lock+0x1c/0x1f
     [<c023f045>] rtnl_lock+0xd/0xf
     [<c89703eb>] dp_del_switch_port+0x2d/0xb9 [openflow_mod]
     [<c8972554>] dp_device_event+0x84/0xb0 [openflow_mod]
     [<c0125033>] notifier_call_chain+0x20/0x31
     [<c012506d>] raw_notifier_call_chain+0x8/0xa
     [<c0237e75>] unregister_netdevice+0x129/0x1ce
     [<c020ed97>] tun_chr_close+0x5e/0x69
     [<c01573a5>] __fput+0xb3/0x15e
     [<c0157467>] fput+0x17/0x19
     [<c0154de1>] filp_close+0x51/0x5b
     [<c011aeca>] put_files_struct+0x6d/0xa9
     [<c011becf>] do_exit+0x206/0x790
     [<c011c4d2>] sys_exit_group+0x0/0x11
     [<c011c4e1>] sys_exit_group+0xf/0x11
     [<c0102ab7>] syscall_call+0x7/0xb

Problem identified with this script (in conjunction with the brcompat
module):

    #! /bin/sh
    brctl delbr a
    brctl delbr b
    brctl delbr c
    set -x
    stress () {
        bridge=$1
        shift
        while true; do
            brctl addbr $bridge; brctl show
            for iface
            do
                case $iface in
                    tap*) tunctl -t $iface ;;
                esac
                brctl addif $bridge $iface
            done
            for iface
            do
                brctl delif $bridge $iface
                case $iface in
                    tap*) tunctl -d $iface ;;
                esac
            done
            brctl delbr $bridge; brctl show
        done
    }
    stress a tap0 tap1 tap2 &
    stress b tap3 tap4 tap5 &
    trap 'killall stress-brctl' 0 SIGINT
    wait
datapath/datapath.c
datapath/dp_dev.c
datapath/dp_notify.c