Bug 839384

Summary: kernel hangs up in netpoll
Product: Red Hat Enterprise Linux 6
Reporter: Andrew Vagin <avagin>
Component: kernel
Assignee: Rashid Khan <rkhan>
Status: CLOSED DUPLICATE
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 6.3
CC: amwang, kdube, khorenko, manuel, rkhan, tgraf
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-08-08 07:53:27 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
  Description    Flags
  console.log    none

Description Andrew Vagin 2012-07-11 19:20:10 UTC
Created attachment 597656
console.log

Description of problem:
The non-debug kernel hangs; the debug kernel reports the following:
<3>BUG: sleeping function called from invalid context at kernel/mutex.c:287
<0>BUG: spinlock recursion on CPU#0, brctl/4715 (Not tainted)
<0> lock: ffffffffa02d3000, .magic: dead4ead, .owner: brctl/4715, .owner_cpu: 

Version-Release number of selected component (if applicable):
2.6.32-279.el6.x86_64

How reproducible:
100%


Steps to Reproduce:
1. Set up netconsole on eth0
2. Create veth devices
   # ip link add type veth
   # ip link set up dev veth1
   # ip link set up dev veth0
3. Create bridge and add veth1 to it
   # brctl addbr br0
   # brctl addif br0 veth1
4. Add eth0 to br0
   # brctl addif br0 eth0
  
Actual results:
kernel hangs up

Expected results:
kernel should not hang

Comment 2 Andrew Vagin 2012-07-11 19:36:50 UTC
The mainstream kernel doesn't hang in this case and reports the following messages in the log:
netconsole: network logging stopped on interface eth0 as it is joining a master device
device eth0 entered promiscuous mode
br0: port 2(eth0) entered forwarding state
br0: port 2(eth0) entered forwarding state

Comment 3 Andrew Vagin 2012-07-12 07:59:32 UTC
Looks like the following commit should be back-ported.

commit 13f172ff26563995049abe73f6eeba828de3c09d
Author: Neil Horman <nhorman>
Date:   Fri Apr 22 08:10:59 2011 +0000

    netconsole: fix deadlock when removing net driver that netconsole is using (v2)
    
    A deadlock was reported to me recently that occurred when netconsole was being
    used in a virtual guest.  If the virtio_net driver was removed while netconsole
    was setup to use an interface that was driven by that driver, the guest
    deadlocked.  No backtrace was provided because netconsole was the only console
    configured, but it became clear pretty quickly what the problem was.  In
    netconsole_netdev_event, if we get an unregister event, we call
    __netpoll_cleanup with the target_list_lock held and irqs disabled.
    __netpoll_cleanup can, if pending netpoll packets are waiting, call
    cancel_delayed_work_sync, which is a sleeping path.  The might_sleep call in
    that path gets triggered, causing a console warning to be issued.  The
    netconsole write handler of course tries to take the target_list_lock again,
    which we already hold, causing deadlock.
    
    The fix is pretty straightforward.  Simply drop the target_list_lock and
    re-enable irqs prior to calling __netpoll_cleanup, then re-acquire the lock, and
    restart the loop.  Confirmed by myself to fix the problem reported.
    
    Signed-off-by: Neil Horman <nhorman>
    CC: "David S. Miller" <davem>
    Signed-off-by: David S. Miller <davem>
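
The pattern described in the commit message above (drop the lock, call the cleanup routine, re-acquire the lock, restart the loop) can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel patch: toy_lock, toy_unlock, struct target, console_write, cleanup_target and handle_unregister are all hypothetical stand-ins, the lock is a single-threaded flag that counts recursion instead of spinning, and interrupt handling is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy single-threaded lock (hypothetical): counts recursion instead of hanging. */
bool lock_held;
int recursions;

void toy_lock(void)
{
    if (lock_held)
        recursions++;      /* a real spinlock would spin forever here */
    lock_held = true;
}

void toy_unlock(void)
{
    assert(lock_held);     /* unlocking a lock we do not hold is a bug */
    lock_held = false;
}

/* Hypothetical stand-in for a netconsole target bound to a device. */
struct target {
    const char *dev;
    bool enabled;
};

struct target targets[] = {
    { "eth0", true }, { "eth1", true }, { "eth0", true },
};
#define NTARGETS (sizeof(targets) / sizeof(targets[0]))

/* Stand-in for the console write handler, which takes the list lock. */
int console_writes;

void console_write(void)
{
    toy_lock();
    console_writes++;
    toy_unlock();
}

/* Stand-in for the cleanup path: it emits a warning through the console. */
void cleanup_target(struct target *t)
{
    (void)t;
    console_write();
}

/* Unregister handling with the lock dropped around the cleanup call. */
int handle_unregister(const char *dev)
{
    int cleaned = 0;
    size_t i;

    toy_lock();
restart:
    for (i = 0; i < NTARGETS; i++) {
        struct target *t = &targets[i];

        if (!t->enabled || strcmp(t->dev, dev) != 0)
            continue;

        toy_unlock();          /* drop the lock before the cleanup call */
        cleanup_target(t);     /* may log, so it may take the lock itself */
        toy_lock();            /* re-acquire before touching the list */

        t->enabled = false;
        cleaned++;
        goto restart;          /* the list may have changed while unlocked */
    }
    toy_unlock();
    return cleaned;
}
```

With the sample table, handle_unregister("eth0") disables both eth0 targets and the recursions counter stays at 0, because the console write issued during cleanup never finds the lock already held. Restarting the scan after each cleanup is the conservative choice here: once the lock has been dropped, the loop cannot assume its position in the list is still valid.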

Comment 4 Rashid Khan 2012-07-26 20:26:54 UTC
*** Bug 839381 has been marked as a duplicate of this bug. ***

Comment 5 Cong Wang 2012-08-08 07:53:27 UTC

*** This bug has been marked as a duplicate of bug 769734 ***