Red Hat Bugzilla – Bug 480937
RHEL-4: Deadlock in Xen netfront driver.
Last modified: 2011-02-16 11:03:40 EST
Created attachment 329602
xen-unstable 14844:abea8d171503 backported to 2.6.9-78.0.13.EL
Description of problem:
Some time ago, Jeremy Fitzhardinge discovered a couple of potential deadlocks in the Xen netfront code by running lockdep on pvops. These were fixed in xen-unstable by changeset 14844:abea8d171503, with an update based on comments by Herbert Xu in 14851:22460cfaca71.
For some time now, our automated testing has seen very occasional hangs while booting RHEL 4 and RHEL 5, at the 'Bringing up interface eth0' stage. We were recently able to obtain a backtrace of one:
Bringing up interface eth0: SysRq : HELP : loglevel0-8 reBoot Crash tErm kIll saK showMem showPc unRaw Sync showTasks Unmount shoWcpus
SysRq : Show Regs
Pid: 695, comm: ip
EIP: 0061:[<c026f5b7>] CPU: 0
EIP is at _spin_lock+0x29/0x34
EFLAGS: 00000286 Not tainted (2.6.9-78.0.8.EL.xs220.127.116.11xenU)
EAX: cf1102d0 EBX: cf1102d0 ECX: f5392000 EDX: cf110100
ESI: cf110000 EDI: c0323fa0 EBP: c0323000 DS: 007b ES: 007b
[<d0881b2b>] netif_poll+0x41/0x64c [xennet]
[<d088074c>] network_open+0x10a/0x121 [xennet]
Ignoring the spurious entries due to stack pollution (__wake_up_common, complete, __rcu_process_callbacks), this stack trace precisely matches the second issue described by Jeremy:
"rx_lock can also be used in softirq context, so it should be taken/released [...]"
Here network_open() has taken rx_lock with plain spin_lock(), which leaves softirqs enabled. netif_poll() then runs in softirq context on the same CPU and tries to take rx_lock again. The holder is the interrupted network_open() itself, so the lock is never released and the CPU spins forever.
Version-Release number of selected component (if applicable):
Confirmed by inspection to still be present in 2.6.9-78.0.13.EL and also in RHEL 5 2.6.18-92.1.22.el5 and 2.6.18-128.el5.
Our automated testing probably performs several dozen RHEL 4 and RHEL 5 installs/boots each week, and we have seen this only a handful of times ever. It appears to be extremely rare and very hard to trigger deliberately; I have certainly been unable to.
Created attachment 329603
xen-unstable.hg 14851:22460cfaca71 backported to 2.6.9-78.0.13.EL
This is a difficult bug to recreate, but the proposed patches have been integrated into a test build at http://people.redhat.com/drjones/virttest/1-2/. The build is available for anyone who has seen the bug and would like to test the patches to see if it goes away.
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update release.
Committed in 89.42.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.