Bug 480937 - RHEL-4: Deadlock in Xen netfront driver.
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel-xen
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Assigned To: Andrew Jones
QA Contact: Virtualization Bugs
Blocks: 458302
Reported: 2009-01-21 08:49 EST by Ian Campbell
Modified: 2011-02-16 11:03 EST
Doc Type: Bug Fix
Clones: 480939
Last Closed: 2011-02-16 11:03:40 EST

Attachments
xen-unstable 14844:abea8d171503 backported to 2.6.9-78.0.13.EL (3.80 KB, patch)
2009-01-21 08:49 EST, Ian Campbell
xen-unstable.hg 14851:22460cfaca71 backported to 2.6.9-78.0.13.EL (1.54 KB, patch)
2009-01-21 08:50 EST, Ian Campbell

Description Ian Campbell 2009-01-21 08:49:14 EST
Created attachment 329602
xen-unstable 14844:abea8d171503 backported to 2.6.9-78.0.13.EL

Description of problem:

Some time ago Jeremy Fitzhardinge discovered a couple of potential deadlocks in the Xen netfront code using lockdep on pvops [0]. These were fixed in xen-unstable with 14844:abea8d171503 [1], with an update based on comments by Herbert Xu in 14851:22460cfaca71 [2].

[0] http://lists.xensource.com/archives/html/xen-devel/2007-04/msg00339.html
[1] http://xenbits.xensource.com/xen-unstable.hg?cs=abea8d171503
[2] http://xenbits.xensource.com/xen-unstable.hg?cs=22460cfaca71

For some time now we have been seeing very occasional hangs during boot of RHEL 4 and RHEL 5 at the 'Bringing up interface eth0' stage during automated testing, and recently we were able to obtain a backtrace of one:

Bringing up interface eth0:  SysRq : HELP : loglevel0-8 reBoot Crash tErm kIll saK showMem showPc unRaw Sync showTasks Unmount shoWcpus 

SysRq : Show Regs

Pid: 695, comm:                   ip
EIP: 0061:[<c026f5b7>] CPU: 0
EIP is at _spin_lock+0x29/0x34
 EFLAGS: 00000286    Not tainted  (2.6.9-78.0.8.EL.xs5.1.0.39xenU)
EAX: cf1102d0 EBX: cf1102d0 ECX: f5392000 EDX: cf110100
ESI: cf110000 EDI: c0323fa0 EBP: c0323000 DS: 007b ES: 007b
 [<d0881b2b>] netif_poll+0x41/0x64c [xennet]
 [<c01187f0>] __wake_up_common+0x2f/0x4b
 [<c01188d2>] complete+0x24/0x37
 [<c012d9aa>] __rcu_process_callbacks+0xf7/0x110
 [<c021c74b>] net_rx_action+0xde/0x1e1
 [<c01212ac>] __do_softirq+0x64/0xdd
 [<c010a35a>] do_softirq+0x61/0x89
 [<c0109ba6>] do_IRQ+0x1a8/0x1b5
 [<c01fc940>] evtchn_do_upcall+0x84/0xb8
 [<c01075c8>] hypervisor_callback+0x2c/0x34
 [<c014007b>] free_pages_bulk+0x12b/0x1d2
 [<c01fc8ba>] force_evtchn_callback+0xa/0xc
 [<d088074c>] network_open+0x10a/0x121 [xennet]
 [<c021b66c>] dev_open+0x2f/0x6c
 [<c021ce31>] dev_change_flags+0x4d/0xf0
 [<c025497b>] devinet_ioctl+0x2ac/0x61e
 [<c025663b>] inet_ioctl+0x77/0xa1
 [<c0213881>] sock_ioctl+0x283/0x2ae
 [<c016c976>] sys_ioctl+0x22c/0x272
 [<c01504c9>] sys_munmap+0x48/0x63
 [<c010740f>] syscall_call+0x7/0xb

Ignoring the spurious entries due to stack pollution (__wake_up_common, complete, __rcu_process_callbacks), this stack trace precisely matches the second issue described by Jeremy:

"rx_lock can also be used in softirq context, so it should be taken/released
   with spin_(un)lock_bh."

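Here network_open() has taken rx_lock with plain spin_lock(). An event channel upcall then arrives on the same CPU while the lock is still held, and netif_poll(), called in softirq context on top of network_open(), tries to take rx_lock again and spins forever.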
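
As a rough illustration only, the interaction can be sketched with a user-space toy model of a single CPU. This is not the attached patches and not kernel code: every toy_* name and the struct below are made up for this sketch, and the model assumes (as the trace suggests) that the upcall is delivered while network_open() holds rx_lock. The use_bh flag stands in for taking the lock with spin_lock_bh()/spin_unlock_bh() instead of plain spin_lock()/spin_unlock().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-CPU model. These names are illustrative, not kernel APIs. */
struct toy_cpu {
    bool rx_lock_held;    /* stands in for netfront's rx_lock */
    int  bh_disabled;     /* softirq-disable nesting count */
    bool softirq_pending; /* a raised receive softirq */
    bool deadlocked;      /* set where a real CPU would spin forever */
    int  polls_run;       /* completed toy_netif_poll() calls */
};

/* Softirq handler: needs rx_lock, like netif_poll() in the trace. */
static void toy_netif_poll(struct toy_cpu *c)
{
    if (c->rx_lock_held) {   /* the lock holder is below us on this stack */
        c->deadlocked = true;
        return;
    }
    c->rx_lock_held = true;
    c->polls_run++;
    c->rx_lock_held = false;
}

/* Run a pending softirq unless bottom halves are disabled. */
static void toy_do_softirq(struct toy_cpu *c)
{
    if (c->softirq_pending && c->bh_disabled == 0) {
        c->softirq_pending = false;
        toy_netif_poll(c);
    }
}

/* An event-channel upcall: raises the softirq, then runs it on IRQ exit. */
static void toy_upcall(struct toy_cpu *c)
{
    c->softirq_pending = true;
    toy_do_softirq(c);
}

/* Open path: take rx_lock, get interrupted while holding it, release.
 * use_bh selects spin_lock_bh()-like behaviour over plain spin_lock(). */
static struct toy_cpu toy_network_open(bool use_bh)
{
    struct toy_cpu c = {0};

    if (use_bh)
        c.bh_disabled++;
    c.rx_lock_held = true;

    toy_upcall(&c);           /* upcall arrives while the lock is held */

    c.rx_lock_held = false;
    if (use_bh) {
        assert(c.bh_disabled > 0);
        c.bh_disabled--;
        toy_do_softirq(&c);   /* deferred softirq runs after the unlock */
    }
    return c;
}
```

Calling toy_network_open(false) leaves deadlocked set, because the poll handler runs on top of the lock holder. Calling toy_network_open(true) defers the poll until after the lock is dropped, so it completes once and deadlocked stays clear.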

Version-Release number of selected component (if applicable):


Confirmed by inspection to still be present in 2.6.9-78.0.13.EL and also in RHEL 5 2.6.18-92.1.22.el5 and 2.6.18-128.el5.

How reproducible:

Our automated testing probably does several dozen RHEL 4 and RHEL 5 installs/boots each week, and we have seen this only a very small number of times in total. It therefore seems to be extremely rare and very hard to trigger deliberately; I have certainly been unable to do so.
Comment 1 Ian Campbell 2009-01-21 08:50:05 EST
Created attachment 329603
xen-unstable.hg 14851:22460cfaca71 backported to 2.6.9-78.0.13.EL
Comment 6 Andrew Jones 2009-07-01 14:27:20 EDT
This is a difficult bug to recreate, but the proposed patches have been integrated into a test build at http://people.redhat.com/drjones/virttest/1-2/. The build is available for anyone who has seen the bug and would like to test the patches to see if it goes away.
Comment 8 RHEL Product and Program Management 2010-10-12 13:51:15 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update release.
Comment 9 Vivek Goyal 2010-10-13 12:11:38 EDT
Committed in 89.42.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/
Comment 12 errata-xmlrpc 2011-02-16 11:03:40 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

