Bug 190729 - sleeping function called from invalid context at kernel/workqueue.c
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Hardware: ia64 Linux
Priority: medium
Severity: medium
Assigned To: Jeff Moyer
QA Contact: Brian Brock
Reported: 2006-05-04 14:47 EDT by Alan D. Brunelle
Modified: 2007-11-30 17:07 EST (History)
Fixed In Version: RHBA-2007-0791
Doc Type: Bug Fix
Last Closed: 2007-11-15 11:14:08 EST

Attachments
aio_complete should not drop the last reference to an ioctx (6.90 KB, patch)
2007-01-04 10:53 EST, Jeff Moyer

External Trackers
Tracker: Red Hat Product Errata
ID: RHBA-2007:0791
Priority: normal
Status: SHIPPED_LIVE
Summary: Updated kernel packages available for Red Hat Enterprise Linux 4 Update 6
Last Updated: 2007-11-14 13:25:55 EST

Description Alan D. Brunelle 2006-05-04 14:47:34 EDT
Description of problem:
While under heavy load (significant numbers of asynchronous direct I/Os to
multiple storage devices), I've seen this oops a handful of times (5?) over the
past year (on various versions of RHEL4).

Version-Release number of selected component (if applicable): 2.6.9-34.EL

How reproducible:
Not very - as noted above, only seen it a handful of times.

Steps to Reproduce:
Actual results:

Expected results:

Additional info: Oops information:

Debug: sleeping function called from invalid context at kernel/workqueue.c:264
in_atomic():1[expected: 0], irqs_disabled():0

Call Trace:
 [<a000000100016ba0>] show_stack+0x80/0xa0
                                sp=e0000040fe80f9d0 bsp=e0000040fe809480
 [<a000000100016bf0>] dump_stack+0x30/0x60
                                sp=e0000040fe80fba0 bsp=e0000040fe809468
 [<a000000100068050>] __might_sleep+0x190/0x260
                                sp=e0000040fe80fba0 bsp=e0000040fe809440
 [<a0000001000a1770>] flush_workqueue+0x30/0x140
                                sp=e0000040fe80fbb0 bsp=e0000040fe809418
 [<a00000010017b800>] __put_ioctx+0xa0/0x1a0
                                sp=e0000040fe80fbb0 bsp=e0000040fe8093e0
 [<a00000010017cd40>] aio_complete+0x480/0x4a0
                                sp=e0000040fe80fbb0 bsp=e0000040fe809380
 [<a0000001001784f0>] finished_one_bio+0x190/0x220
                                sp=e0000040fe80fbb0 bsp=e0000040fe809348
 [<a000000100178a20>] dio_bio_complete+0x1c0/0x200
                                sp=e0000040fe80fbb0 bsp=e0000040fe8092f0
 [<a000000100178ac0>] dio_bio_end_aio+0x60/0x80
                                sp=e0000040fe80fbb0 bsp=e0000040fe8092d0
 [<a00000010012edf0>] bio_endio+0x110/0x1c0
                                sp=e0000040fe80fbb0 bsp=e0000040fe809298
 [<a000000100362560>] __end_that_request_first+0x2c0/0x4a0
                                sp=e0000040fe80fbb0 bsp=e0000040fe809230
 [<a0000001003627d0>] end_that_request_chunk+0x30/0x60
                                sp=e0000040fe80fbb0 bsp=e0000040fe809200
 [<a000000200078f10>] scsi_end_request+0x50/0x2e0 [scsi_mod]
                                sp=e0000040fe80fbb0 bsp=e0000040fe8091b0
 [<a000000200079610>] scsi_io_completion+0x2b0/0xa00 [scsi_mod]
                                sp=e0000040fe80fbb0 bsp=e0000040fe809130
 [<a000000200026710>] sd_rw_intr+0x110/0x700 [sd_mod]
                                sp=e0000040fe80fbb0 bsp=e0000040fe8090e0
 [<a00000020006c230>] scsi_finish_command+0x2d0/0x300 [scsi_mod]
                                sp=e0000040fe80fbb0 bsp=e0000040fe8090b0
 [<a00000020006c520>] scsi_softirq+0x2c0/0x300 [scsi_mod]
                                sp=e0000040fe80fbb0 bsp=e0000040fe809070
 [<a000000100082510>] __do_softirq+0x1f0/0x240
                                sp=e0000040fe80fbc0 bsp=e0000040fe808fd8
 [<a0000001000825d0>] do_softirq+0x70/0xc0
                                sp=e0000040fe80fbc0 bsp=e0000040fe808f78
 [<a000000100015bd0>] ia64_handle_irq+0x1b0/0x1e0
                                sp=e0000040fe80fbc0 bsp=e0000040fe808f30
 [<a00000010000f5c0>] ia64_leave_kernel+0x0/0x260
                                sp=e0000040fe80fbc0 bsp=e0000040fe808f30
 [<a0000001000160c0>] ia64_pal_call_static+0xa0/0xc0
                                sp=e0000040fe80fd90 bsp=e0000040fe808ee0
 [<a000000100017740>] default_idle+0x140/0x1e0
                                sp=e0000040fe80fd90 bsp=e0000040fe808e90
 [<a000000100017900>] cpu_idle+0x120/0x2c0
                                sp=e0000040fe80fe30 bsp=e0000040fe808e48
 [<a00000010005b150>] start_secondary+0x2b0/0x2e0
                                sp=e0000040fe80fe30 bsp=e0000040fe808e10
 [<a000000100008180>] __end_ivt_text+0x260/0x290
                                sp=e0000040fe80fe30 bsp=e0000040fe808e10
Comment 1 Jason Baron 2006-05-05 07:23:35 EDT
Do you actually get a system crash or anything going wrong, or just the messages?
Comment 2 Alan D. Brunelle 2006-05-05 08:41:02 EDT
Nope - messages just logged, and the system continues onwards. I _believe_
might_sleep is just a warning mechanism: meaning that one should _not_ be
sleeping in this context, and the fact that we _might_ sleep means things aren't
quite right. [Meaning: somebody above me in the call stack is doing something
inherently wrong.]
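
To make the "warning mechanism" point concrete, here is a minimal user-space sketch in Python of the kind of check that produces the message above. It is a toy model, not kernel code: the Cpu class, its preempt_count field, and the softirq() helper are hypothetical names chosen for illustration, and the real check looks at more state than this (the log line above also reports irqs_disabled()).

```python
from contextlib import contextmanager


class Cpu:
    """Toy model of one CPU's preemption state (hypothetical, not kernel code)."""

    def __init__(self):
        self.preempt_count = 0   # nonzero means "atomic": sleeping is not allowed
        self.log = []            # collected warning messages

    def in_atomic(self):
        return self.preempt_count != 0

    @contextmanager
    def softirq(self):
        # Entering softirq handling makes the context atomic until it exits.
        self.preempt_count += 1
        try:
            yield
        finally:
            self.preempt_count -= 1

    def might_sleep(self, where):
        # Only reports the problem; execution carries on afterwards.
        if self.in_atomic():
            self.log.append(
                f"sleeping function called from invalid context at {where}"
            )

    def flush_workqueue(self):
        # Stand-in for a function that may block, so it announces that first.
        self.might_sleep("kernel/workqueue.c")
        return "flushed"
```

In this model, calling flush_workqueue() inside "with cpu.softirq():" still returns normally but leaves one entry in cpu.log, which matches the observation that the system keeps running and only the message is logged.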
Comment 3 Jeff Moyer 2006-09-12 09:11:19 EDT
This is indeed a corner case.  The last user of the ioctx is the I/O path
(meaning that the calling process either closed the context or went away before
the I/O completed).

I'll give this some thought.
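
The corner case can be sketched as a reference-counting toy model in Python. This is an illustration only: IoCtx, users, and run() are hypothetical names, and the model assumes that the owning process and each in-flight request hold one reference to the context. It shows that whichever side drops its reference last is the one that ends up doing the teardown.

```python
class IoCtx:
    """Toy stand-in for an AIO context with a reference count (hypothetical)."""

    def __init__(self):
        self.users = 1          # reference held by the owning process
        self.freed_in = None    # context in which the final put happened

    def get(self):
        self.users += 1

    def put(self, context):
        self.users -= 1
        if self.users == 0:
            # In the trace, this is the point where __put_ioctx
            # reaches flush_workqueue.
            self.freed_in = context


def run(process_exits_first):
    ctx = IoCtx()
    ctx.get()                   # an in-flight request pins the context
    if process_exits_first:
        ctx.put("process")      # owner closes the context or goes away
        ctx.put("softirq")      # I/O completes later and does the final put
    else:
        ctx.put("softirq")      # I/O completes while the owner is still around
        ctx.put("process")      # owner drops the final reference
    return ctx.freed_in
```

Here run(True) models the ordering described in this comment, where the final put lands in the I/O completion path, and run(False) models the common ordering where the process does the teardown.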
Comment 4 Jeff Moyer 2006-12-15 11:22:05 EST
It looks like someone ran into this on a kernel.  See the thread at:

Comment 5 Jeff Moyer 2007-01-04 10:53:41 EST
Created attachment 144812
aio_complete should not drop the last reference to an ioctx

This is the fix Kenneth Chen posted for this problem.  Please try it out if you
get the chance.
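
As a rough illustration of the invariant named in the attachment title, the Python sketch below shows one possible way to guarantee that the completion path never drops the last reference: teardown waits for outstanding requests before releasing the context. This is an assumption made for illustration and is not taken from the attached patch; IoCtx, reqs_active, and pending_completions are hypothetical names.

```python
class IoCtx:
    """Toy AIO context (hypothetical names; not the RHEL4 data structure)."""

    def __init__(self):
        self.users = 1          # the owner's reference, the only one in this model
        self.reqs_active = 0    # requests submitted but not yet completed
        self.freed_in = None

    def submit(self):
        self.reqs_active += 1

    def complete(self):
        # Completion (softirq in the trace) only retires the request.
        # It never decrements self.users, so it can never trigger the free.
        self.reqs_active -= 1

    def destroy(self, pending_completions):
        # Teardown runs in process context, where waiting is allowed.
        # Draining the list stands in for sleeping until the I/O finishes.
        while self.reqs_active > 0:
            pending_completions.pop()()
        self.users -= 1
        if self.users == 0:
            self.freed_in = "process"
        return self.freed_in
```

With two requests submitted and their completions still pending, destroy() drains both before dropping the reference, so the free is always attributed to process context in this model.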
Comment 6 RHEL Product and Program Management 2007-06-20 11:46:11 EDT
This request was evaluated by Red Hat Kernel Team for inclusion in a Red
Hat Enterprise Linux maintenance release, and has moved to bugzilla 
status POST.
Comment 7 Jason Baron 2007-06-20 15:46:04 EDT
Committed in stream U6 build 55.10. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/
Comment 10 errata-xmlrpc 2007-11-15 11:14:08 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

