Bug 426031

Summary: rapid block device plug / unplug leads to kernel crash and/or soft lockup
Product: Red Hat Enterprise Linux 4 Reporter: Don Dutile (Red Hat) <ddutile>
Component: kernel-xenAssignee: Don Dutile (Red Hat) <ddutile>
Status: CLOSED ERRATA QA Contact: Martin Jenner <mjenner>
Severity: low Docs Contact:
Priority: low    
Version: 4.6CC: xen-maint
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: RHSA-2008-0665 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2008-07-24 19:23:30 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 308971    
Bug Blocks:    

Description Don Dutile (Red Hat) 2007-12-17 21:25:42 UTC
+++ This bug was initially created as a clone of Bug #308971 +++

Description of problem:
Rapidly pluging and unpluggin a block device eventually leads to a kernel crash
when the device is unplugged before fully established due to a double free.

This effects RHEL5 and RHEL4u5. The fix is up at
http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/c1c57fea77e9 and/or
http://xenbits.xensource.com/kernels/rhel4x.hg?rev/5bccd45a081d

The fix depends on 
http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/11483a00c017 and/or
http://xenbits.xensource.com/kernels/rhel4x.hg?rev/156e3eaca552 which are
attached to #247265 but only applied to RHEL5, I think.

Version-Release number of selected component (if applicable):
2.6.18-8.el5 and 2.6.9-55.EL

How reproducible:
A simple loop which attaches and detaches a block device as quick as possible is
sufficient.
  
Actual results:
The backtrace on 2.6.9-55.EL:

 [<c043ea67>] softlockup_tick+0x98/0xa6

 [<c0408b7d>] timer_interrupt+0x504/0x557

 [<c043ec9b>] handle_IRQ_event+0x27/0x51

 [<c043ed58>] __do_IRQ+0x93/0xe8

 [<c040672b>] do_IRQ+0x93/0xae

 [<c053a045>] evtchn_do_upcall+0x64/0x9b

 [<c0404ec5>] hypervisor_callback+0x3d/0x48

 [<c045007b>] __pte_alloc+0x1e1/0x21a

 [<c05f4e18>] _spin_lock+0x7/0xf

 [<c05f4179>] __mutex_lock_slowpath+0x19/0x74

 [<c04d36d0>] __next_cpu+0x12/0x21

 [<c045e2ee>] kfree+0xe/0x77

 [<c05f41e3>] .text.lock.mutex+0xf/0x14

 [<c04d3e3d>] kobject_cleanup+0x4a/0x5e

 [<c04c6467>] elevator_exit+0xe/0x32

 [<c04c931c>] blk_cleanup_queue+0x2d/0x36

 [<d1029e6d>] xlvbd_del+0x4d/0x56 [xenblk]

 [<d10293f4>] blkfront_closing+0x45/0x4f [xenblk]

 [<d1029e1b>] blkif_release+0x34/0x39 [xenblk]

 [<c0468cb4>] __blkdev_put+0x50/0x123

 [<c04950d8>] register_disk+0x117/0x155

 [<c04cc39c>] add_disk+0x2b/0x36

 [<c04cbd9c>] exact_match+0x0/0x4

 [<c04cc248>] exact_lock+0x0/0xd

 [<d1029d7f>] backend_changed+0xf3/0x153 [xenblk]

 [<c053fc58>] otherend_changed+0x74/0x79

 [<c053e442>] xenwatch_handle_callback+0x12/0x44

 [<c053ee3a>] xenwatch_thread+0xf1/0x107

 [<c042cbe1>] autoremove_wake_function+0x0/0x2d

 [<c053ed49>] xenwatch_thread+0x0/0x107

 [<c042cb15>] kthread+0xc0/0xeb

 [<c042ca55>] kthread+0x0/0xeb

 [<c04029b5>] kernel_thread_helper+0x5/0xb

-- Additional comment from ijc.uk on 2007-12-14 04:17 EST --
Created an attachment (id=288811)
linux-2.6.180-xen 217:c1c57fea77e9 ported to 2.6.18-53.el5


-- Additional comment from ijc.uk on 2007-12-14 04:18 EST --
Created an attachment (id=288821)
linux-2.6.18-xen 217:c1c57fea77e9 ported to 2.6.9-67.EL


-- Additional comment from ddutile on 2007-12-14 16:07 EST --
Simple fix for corner case; working on rhel5 patch & test right now.

Comment 1 Don Dutile (Red Hat) 2007-12-17 21:35:47 UTC
Posted a patch to 4.7.


Comment 2 Bill Burns 2008-01-04 19:16:44 UTC
Seting dev ack for Don Dutile.

Comment 3 Vivek Goyal 2008-02-28 23:29:33 UTC
Committed in 68.15.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/

Comment 6 errata-xmlrpc 2008-07-24 19:23:30 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0665.html