Bug 426031 - rapid block device plug / unplug leads to kernel crash and/or soft lockup
rapid block device plug / unplug leads to kernel crash and/or soft lockup
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel-xen (Show other bugs)
All Linux
low Severity low
: rc
: ---
Assigned To: Don Dutile
Martin Jenner
Depends On: 308971
  Show dependency treegraph
Reported: 2007-12-17 16:25 EST by Don Dutile
Modified: 2008-07-24 15:23 EDT (History)
1 user (show)

See Also:
Fixed In Version: RHSA-2008-0665
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2008-07-24 15:23:30 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments (Terms of Use)

  None (edit)
Description Don Dutile 2007-12-17 16:25:42 EST
+++ This bug was initially created as a clone of Bug #308971 +++

Description of problem:
Rapidly pluging and unpluggin a block device eventually leads to a kernel crash
when the device is unplugged before fully established due to a double free.

This effects RHEL5 and RHEL4u5. The fix is up at
http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/c1c57fea77e9 and/or

The fix depends on 
http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/11483a00c017 and/or
http://xenbits.xensource.com/kernels/rhel4x.hg?rev/156e3eaca552 which are
attached to #247265 but only applied to RHEL5, I think.

Version-Release number of selected component (if applicable):
2.6.18-8.el5 and 2.6.9-55.EL

How reproducible:
A simple loop which attaches and detaches a block device as quick as possible is
Actual results:
The backtrace on 2.6.9-55.EL:

 [<c043ea67>] softlockup_tick+0x98/0xa6

 [<c0408b7d>] timer_interrupt+0x504/0x557

 [<c043ec9b>] handle_IRQ_event+0x27/0x51

 [<c043ed58>] __do_IRQ+0x93/0xe8

 [<c040672b>] do_IRQ+0x93/0xae

 [<c053a045>] evtchn_do_upcall+0x64/0x9b

 [<c0404ec5>] hypervisor_callback+0x3d/0x48

 [<c045007b>] __pte_alloc+0x1e1/0x21a

 [<c05f4e18>] _spin_lock+0x7/0xf

 [<c05f4179>] __mutex_lock_slowpath+0x19/0x74

 [<c04d36d0>] __next_cpu+0x12/0x21

 [<c045e2ee>] kfree+0xe/0x77

 [<c05f41e3>] .text.lock.mutex+0xf/0x14

 [<c04d3e3d>] kobject_cleanup+0x4a/0x5e

 [<c04c6467>] elevator_exit+0xe/0x32

 [<c04c931c>] blk_cleanup_queue+0x2d/0x36

 [<d1029e6d>] xlvbd_del+0x4d/0x56 [xenblk]

 [<d10293f4>] blkfront_closing+0x45/0x4f [xenblk]

 [<d1029e1b>] blkif_release+0x34/0x39 [xenblk]

 [<c0468cb4>] __blkdev_put+0x50/0x123

 [<c04950d8>] register_disk+0x117/0x155

 [<c04cc39c>] add_disk+0x2b/0x36

 [<c04cbd9c>] exact_match+0x0/0x4

 [<c04cc248>] exact_lock+0x0/0xd

 [<d1029d7f>] backend_changed+0xf3/0x153 [xenblk]

 [<c053fc58>] otherend_changed+0x74/0x79

 [<c053e442>] xenwatch_handle_callback+0x12/0x44

 [<c053ee3a>] xenwatch_thread+0xf1/0x107

 [<c042cbe1>] autoremove_wake_function+0x0/0x2d

 [<c053ed49>] xenwatch_thread+0x0/0x107

 [<c042cb15>] kthread+0xc0/0xeb

 [<c042ca55>] kthread+0x0/0xeb

 [<c04029b5>] kernel_thread_helper+0x5/0xb

-- Additional comment from ijc@hellion.org.uk on 2007-12-14 04:17 EST --
Created an attachment (id=288811)
linux-2.6.180-xen 217:c1c57fea77e9 ported to 2.6.18-53.el5

-- Additional comment from ijc@hellion.org.uk on 2007-12-14 04:18 EST --
Created an attachment (id=288821)
linux-2.6.18-xen 217:c1c57fea77e9 ported to 2.6.9-67.EL

-- Additional comment from ddutile@redhat.com on 2007-12-14 16:07 EST --
Simple fix for corner case; working on rhel5 patch & test right now.
Comment 1 Don Dutile 2007-12-17 16:35:47 EST
Posted a patch to 4.7.
Comment 2 Bill Burns 2008-01-04 14:16:44 EST
Seting dev ack for Don Dutile.
Comment 3 Vivek Goyal 2008-02-28 18:29:33 EST
Committed in 68.15.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/
Comment 6 errata-xmlrpc 2008-07-24 15:23:30 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.


Note You need to log in before you can comment on or make changes to this bug.