Bug 190729

Summary: sleeping function called from invalid context at kernel/workqueue.c
Product: Red Hat Enterprise Linux 4
Reporter: Alan D. Brunelle <alan.brunelle>
Component: kernel
Assignee: Jeff Moyer <jmoyer>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.0
CC: jbaron
Target Milestone: ---
Target Release: ---
Hardware: ia64
OS: Linux
Whiteboard:
Fixed In Version: RHBA-2007-0791
Doc Type: Bug Fix
Last Closed: 2007-11-15 16:14:08 UTC
Attachments:
  aio_complete should not drop the last reference to an ioctx (flags: none)

Description Alan D. Brunelle 2006-05-04 18:47:34 UTC
Description of problem:
While under heavy load (significant numbers of asynchronous direct I/Os to
multiple storage devices), I've seen this oops a handful of times (5?) over the
past year (on various versions of RHEL4).

Version-Release number of selected component (if applicable): 2.6.9-34.EL


How reproducible:
Not very - as noted above, only seen it a handful of times.


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info: Oops information:

Debug: sleeping function called from invalid context at kernel/workqueue.c:264
in_atomic():1[expected: 0], irqs_disabled():0

Call Trace:
 [<a000000100016ba0>] show_stack+0x80/0xa0
                                sp=e0000040fe80f9d0 bsp=e0000040fe809480
 [<a000000100016bf0>] dump_stack+0x30/0x60
                                sp=e0000040fe80fba0 bsp=e0000040fe809468
 [<a000000100068050>] __might_sleep+0x190/0x260
                                sp=e0000040fe80fba0 bsp=e0000040fe809440
 [<a0000001000a1770>] flush_workqueue+0x30/0x140
                                sp=e0000040fe80fbb0 bsp=e0000040fe809418
 [<a00000010017b800>] __put_ioctx+0xa0/0x1a0
                                sp=e0000040fe80fbb0 bsp=e0000040fe8093e0
 [<a00000010017cd40>] aio_complete+0x480/0x4a0
                                sp=e0000040fe80fbb0 bsp=e0000040fe809380
 [<a0000001001784f0>] finished_one_bio+0x190/0x220
                                sp=e0000040fe80fbb0 bsp=e0000040fe809348
 [<a000000100178a20>] dio_bio_complete+0x1c0/0x200
                                sp=e0000040fe80fbb0 bsp=e0000040fe8092f0
 [<a000000100178ac0>] dio_bio_end_aio+0x60/0x80
                                sp=e0000040fe80fbb0 bsp=e0000040fe8092d0
 [<a00000010012edf0>] bio_endio+0x110/0x1c0
                                sp=e0000040fe80fbb0 bsp=e0000040fe809298
 [<a000000100362560>] __end_that_request_first+0x2c0/0x4a0
                                sp=e0000040fe80fbb0 bsp=e0000040fe809230
 [<a0000001003627d0>] end_that_request_chunk+0x30/0x60
                                sp=e0000040fe80fbb0 bsp=e0000040fe809200
 [<a000000200078f10>] scsi_end_request+0x50/0x2e0 [scsi_mod]
                                sp=e0000040fe80fbb0 bsp=e0000040fe8091b0
 [<a000000200079610>] scsi_io_completion+0x2b0/0xa00 [scsi_mod]
                                sp=e0000040fe80fbb0 bsp=e0000040fe809130
 [<a000000200026710>] sd_rw_intr+0x110/0x700 [sd_mod]
                                sp=e0000040fe80fbb0 bsp=e0000040fe8090e0
 [<a00000020006c230>] scsi_finish_command+0x2d0/0x300 [scsi_mod]
                                sp=e0000040fe80fbb0 bsp=e0000040fe8090b0
 [<a00000020006c520>] scsi_softirq+0x2c0/0x300 [scsi_mod]
                                sp=e0000040fe80fbb0 bsp=e0000040fe809070
 [<a000000100082510>] __do_softirq+0x1f0/0x240
                                sp=e0000040fe80fbc0 bsp=e0000040fe808fd8
 [<a0000001000825d0>] do_softirq+0x70/0xc0
                                sp=e0000040fe80fbc0 bsp=e0000040fe808f78
 [<a000000100015bd0>] ia64_handle_irq+0x1b0/0x1e0
                                sp=e0000040fe80fbc0 bsp=e0000040fe808f30
 [<a00000010000f5c0>] ia64_leave_kernel+0x0/0x260
                                sp=e0000040fe80fbc0 bsp=e0000040fe808f30
 [<a0000001000160c0>] ia64_pal_call_static+0xa0/0xc0
                                sp=e0000040fe80fd90 bsp=e0000040fe808ee0
 [<a000000100017740>] default_idle+0x140/0x1e0
                                sp=e0000040fe80fd90 bsp=e0000040fe808e90
 [<a000000100017900>] cpu_idle+0x120/0x2c0
                                sp=e0000040fe80fe30 bsp=e0000040fe808e48
 [<a00000010005b150>] start_secondary+0x2b0/0x2e0
                                sp=e0000040fe80fe30 bsp=e0000040fe808e10
 [<a000000100008180>] __end_ivt_text+0x260/0x290
                                sp=e0000040fe80fe30 bsp=e0000040fe808e10

Comment 1 Jason Baron 2006-05-05 11:23:35 UTC
Do you actually get a system crash or anything else going wrong, or just the messages?

Comment 2 Alan D. Brunelle 2006-05-05 12:41:02 UTC
Nope - messages just logged, and the system continues onwards. I _believe_
might_sleep is just a warning mechanism: meaning that one should _not_ be
sleeping in this context, and the fact that we _might_ sleep means things aren't
quite right. [Meaning: somebody above me in the call stack is doing something
inherently wrong.]
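
As a rough illustration of the warning-only behavior described above, here is a
minimal user-space sketch in C. It is not kernel code: atomic_depth,
enter_atomic, leave_atomic, might_sleep_model and flush_model are hypothetical
names invented for this sketch, standing in for the kernel's context tracking
and for a function that may block.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's notion of "atomic context". */
static int atomic_depth;     /* > 0 means we are in atomic context */
static int warnings_logged;  /* how many times the check has fired */

static void enter_atomic(void) { atomic_depth++; }

static void leave_atomic(void)
{
    assert(atomic_depth > 0);
    atomic_depth--;
}

/* Models the check: it logs a warning but does not stop execution. */
static void might_sleep_model(const char *file, int line)
{
    if (atomic_depth > 0) {
        warnings_logged++;
        fprintf(stderr,
                "Debug: sleeping function called from invalid context at %s:%d\n",
                file, line);
    }
}

/* A function that could block, so it announces that up front. */
static void flush_model(void)
{
    might_sleep_model(__FILE__, __LINE__);
    /* work that may sleep would go here */
}
```

Calling flush_model() between enter_atomic() and leave_atomic() logs the
message and carries on, which matches the "messages just logged" behavior
reported here.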

Comment 3 Jeff Moyer 2006-09-12 13:11:19 UTC
This is indeed a corner case.  The last user of the ioctx is the I/O path
(meaning that the calling process either closed the context or went away before
the I/O completed).

I'll give this some thought.
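
The corner case can be modeled in user space with a plain reference count. This
is only a sketch under stated assumptions: struct ioctx_model, submit_io,
complete_io, put_ctx and teardown are hypothetical names, and atomic_depth is a
stand-in for "running in the I/O completion path". It shows that the order of
the two final puts decides which context runs the teardown.

```c
#include <assert.h>
#include <stdbool.h>

struct ioctx_model {
    int users;        /* reference count */
    bool torn_down;
};

static int atomic_depth;          /* > 0: completion (softirq-like) context */
static int sleep_in_atomic_hits;  /* teardowns attempted from atomic context */

/* Teardown may block (in the trace above it flushes a workqueue). */
static void teardown(struct ioctx_model *ctx)
{
    if (atomic_depth > 0)
        sleep_in_atomic_hits++;
    ctx->torn_down = true;
}

static void get_ctx(struct ioctx_model *ctx) { ctx->users++; }

static void put_ctx(struct ioctx_model *ctx)
{
    assert(ctx->users > 0);
    if (--ctx->users == 0)
        teardown(ctx);
}

/* Process context: submitting a request pins the context. */
static void submit_io(struct ioctx_model *ctx) { get_ctx(ctx); }

/* Atomic context: the request finishes and drops its pin. */
static void complete_io(struct ioctx_model *ctx)
{
    atomic_depth++;
    put_ctx(ctx);
    atomic_depth--;
}
```

If the owner calls put_ctx() after complete_io(), teardown runs in process
context. If the owner goes away first, complete_io() drops the last reference
and teardown runs in atomic context, which is the situation in the trace.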

Comment 4 Jeff Moyer 2006-12-15 16:22:05 UTC
It looks like someone ran into this on a 2.6.18.4 kernel.  See the thread at:

  http://marc.theaimsgroup.com/?l=linux-ia64&m=116594483721437&w=2


Comment 5 Jeff Moyer 2007-01-04 15:53:41 UTC
Created attachment 144812 [details]
aio_complete should not drop the last reference to an ioctx

This is the fix Kenneth Chen posted for this problem.  Please try it out if you
get the chance.
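
For readers who want the idea in the attachment title spelled out, below is a
user-space sketch of one way to keep the completion path from ever performing
the final teardown. It is an assumption for illustration only and does not
reproduce attachment 144812: struct ioctx_model, submit_io, complete_io,
wait_for_completion_model and close_ctx are hypothetical names. The completion
path only does bookkeeping, and the owner drains outstanding requests in
process context before tearing the context down.

```c
#include <assert.h>
#include <stdbool.h>

struct ioctx_model {
    int in_flight;    /* requests submitted but not yet completed */
    bool torn_down;
};

static int atomic_depth;          /* > 0: completion (softirq-like) context */
static int sleep_in_atomic_hits;  /* teardowns attempted from atomic context */

/* Teardown may block, so it must only run in process context. */
static void teardown(struct ioctx_model *ctx)
{
    if (atomic_depth > 0)
        sleep_in_atomic_hits++;
    ctx->torn_down = true;
}

static void submit_io(struct ioctx_model *ctx) { ctx->in_flight++; }

/* Atomic context: bookkeeping only, never the final teardown. */
static void complete_io(struct ioctx_model *ctx)
{
    atomic_depth++;
    assert(ctx->in_flight > 0);
    ctx->in_flight--;
    atomic_depth--;
}

/* Stand-in for sleeping until a completion arrives: one request completes. */
static void wait_for_completion_model(struct ioctx_model *ctx)
{
    complete_io(ctx);
}

/* Process context: the owner drains outstanding I/O, then tears down. */
static void close_ctx(struct ioctx_model *ctx)
{
    while (ctx->in_flight > 0)
        wait_for_completion_model(ctx);
    teardown(ctx);
}
```

With this shape, closing a context that still has requests in flight leaves
sleep_in_atomic_hits at zero, because teardown is only ever reached from
close_ctx().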

Comment 6 RHEL Program Management 2007-06-20 15:46:11 UTC
This request was evaluated by Red Hat Kernel Team for inclusion in a Red
Hat Enterprise Linux maintenance release, and has moved to bugzilla 
status POST.

Comment 7 Jason Baron 2007-06-20 19:46:04 UTC
Committed in stream U6, build 55.10. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/


Comment 10 errata-xmlrpc 2007-11-15 16:14:08 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0791.html