Bug 190729 - sleeping function called from invalid context at kernel/workqueue.c
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: ia64 Linux
Priority: medium
Severity: medium
Assigned To: Jeffrey Moyer
QA Contact: Brian Brock
Reported: 2006-05-04 14:47 EDT by Alan D. Brunelle
Modified: 2007-11-30 17:07 EST
Fixed In Version: RHBA-2007-0791
Doc Type: Bug Fix
Last Closed: 2007-11-15 11:14:08 EST

Attachments
aio_complete should not drop the last reference to an ioctx (6.90 KB, patch)
2007-01-04 10:53 EST, Jeffrey Moyer
Description Alan D. Brunelle 2006-05-04 14:47:34 EDT
Description of problem:
While under heavy load (large numbers of asynchronous direct I/Os to
multiple storage devices), I've seen this oops a handful of times (5?) over the
past year (on various versions of RHEL4).

Version-Release number of selected component (if applicable): 2.6.9-34.EL


How reproducible:
Not very - as noted above, only seen it a handful of times.


Additional info: Oops information:

Debug: sleeping function called from invalid context at kernel/workqueue.c:264
in_atomic():1[expected: 0], irqs_disabled():0

Call Trace:
 [<a000000100016ba0>] show_stack+0x80/0xa0
                                sp=e0000040fe80f9d0 bsp=e0000040fe809480
 [<a000000100016bf0>] dump_stack+0x30/0x60
                                sp=e0000040fe80fba0 bsp=e0000040fe809468
 [<a000000100068050>] __might_sleep+0x190/0x260
                                sp=e0000040fe80fba0 bsp=e0000040fe809440
 [<a0000001000a1770>] flush_workqueue+0x30/0x140
                                sp=e0000040fe80fbb0 bsp=e0000040fe809418
 [<a00000010017b800>] __put_ioctx+0xa0/0x1a0
                                sp=e0000040fe80fbb0 bsp=e0000040fe8093e0
 [<a00000010017cd40>] aio_complete+0x480/0x4a0
                                sp=e0000040fe80fbb0 bsp=e0000040fe809380
 [<a0000001001784f0>] finished_one_bio+0x190/0x220
                                sp=e0000040fe80fbb0 bsp=e0000040fe809348
 [<a000000100178a20>] dio_bio_complete+0x1c0/0x200
                                sp=e0000040fe80fbb0 bsp=e0000040fe8092f0
 [<a000000100178ac0>] dio_bio_end_aio+0x60/0x80
                                sp=e0000040fe80fbb0 bsp=e0000040fe8092d0
 [<a00000010012edf0>] bio_endio+0x110/0x1c0
                                sp=e0000040fe80fbb0 bsp=e0000040fe809298
 [<a000000100362560>] __end_that_request_first+0x2c0/0x4a0
                                sp=e0000040fe80fbb0 bsp=e0000040fe809230
 [<a0000001003627d0>] end_that_request_chunk+0x30/0x60
                                sp=e0000040fe80fbb0 bsp=e0000040fe809200
 [<a000000200078f10>] scsi_end_request+0x50/0x2e0 [scsi_mod]
                                sp=e0000040fe80fbb0 bsp=e0000040fe8091b0
 [<a000000200079610>] scsi_io_completion+0x2b0/0xa00 [scsi_mod]
                                sp=e0000040fe80fbb0 bsp=e0000040fe809130
 [<a000000200026710>] sd_rw_intr+0x110/0x700 [sd_mod]
                                sp=e0000040fe80fbb0 bsp=e0000040fe8090e0
 [<a00000020006c230>] scsi_finish_command+0x2d0/0x300 [scsi_mod]
                                sp=e0000040fe80fbb0 bsp=e0000040fe8090b0
 [<a00000020006c520>] scsi_softirq+0x2c0/0x300 [scsi_mod]
                                sp=e0000040fe80fbb0 bsp=e0000040fe809070
 [<a000000100082510>] __do_softirq+0x1f0/0x240
                                sp=e0000040fe80fbc0 bsp=e0000040fe808fd8
 [<a0000001000825d0>] do_softirq+0x70/0xc0
                                sp=e0000040fe80fbc0 bsp=e0000040fe808f78
 [<a000000100015bd0>] ia64_handle_irq+0x1b0/0x1e0
                                sp=e0000040fe80fbc0 bsp=e0000040fe808f30
 [<a00000010000f5c0>] ia64_leave_kernel+0x0/0x260
                                sp=e0000040fe80fbc0 bsp=e0000040fe808f30
 [<a0000001000160c0>] ia64_pal_call_static+0xa0/0xc0
                                sp=e0000040fe80fd90 bsp=e0000040fe808ee0
 [<a000000100017740>] default_idle+0x140/0x1e0
                                sp=e0000040fe80fd90 bsp=e0000040fe808e90
 [<a000000100017900>] cpu_idle+0x120/0x2c0
                                sp=e0000040fe80fe30 bsp=e0000040fe808e48
 [<a00000010005b150>] start_secondary+0x2b0/0x2e0
                                sp=e0000040fe80fe30 bsp=e0000040fe808e10
 [<a000000100008180>] __end_ivt_text+0x260/0x290
                                sp=e0000040fe80fe30 bsp=e0000040fe808e10
Comment 1 Jason Baron 2006-05-05 07:23:35 EDT
Do you actually get a system crash or any other misbehavior, or just the messages?
Comment 2 Alan D. Brunelle 2006-05-05 08:41:02 EDT
Nope - the messages are just logged, and the system carries on. I _believe_
might_sleep is just a warning mechanism: it flags that one should _not_ be
sleeping in this context, and the fact that we _might_ sleep means things aren't
quite right. [Meaning: somebody above me in the call stack is doing something
inherently wrong.]
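
To make the "warning only" behaviour concrete, here is a minimal user-space sketch of that kind of check. It is not the kernel's implementation: every model_* name is hypothetical, and the single counter is only a stand-in for the state the log line reports as in_atomic():1. The point it illustrates is that the check logs a message and the call still goes ahead.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical user-space stand-ins; none of these are kernel symbols. */
static int model_atomic_depth;  /* > 0 while "in atomic context" */
static int model_warnings;      /* number of warnings logged so far */

static void model_enter_atomic(void) { model_atomic_depth++; }
static void model_leave_atomic(void) { model_atomic_depth--; }

/* Warn, but do not abort, if a possibly-sleeping call is made while atomic. */
static void model_might_sleep(const char *file, int line)
{
    if (model_atomic_depth > 0) {
        model_warnings++;
        fprintf(stderr,
                "model: sleeping function called from invalid context at %s:%d\n",
                file, line);
    }
}

/* A function that may block, so it announces that on entry. */
static int model_flush(void)
{
    model_might_sleep(__FILE__, __LINE__);
    return 0;  /* the call still proceeds; only a message was logged */
}
```

Calling model_flush() between model_enter_atomic() and model_leave_atomic() bumps model_warnings and still returns 0, which matches what was observed here: messages in the log, no crash.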
Comment 3 Jeffrey Moyer 2006-09-12 09:11:19 EDT
This is indeed a corner case.  The last user of the ioctx is the I/O path
(meaning that the calling process either closed the context or went away before
the I/O completed).

I'll give this some thought.
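
The ordering that triggers the corner case can be sketched in user space as follows. This is an illustrative model under stated assumptions, not kernel code: the model_* names and the two-reference setup (one held by the owning process, one by the in-flight I/O) are hypothetical, and the kernel function names appear only in comments, taken from the call trace above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a reference-counted I/O context. */
struct model_ctx {
    int  refs;                 /* reference count */
    bool torn_down;            /* teardown has run */
    bool torn_down_in_atomic;  /* context in which teardown ran */
};

static bool model_in_atomic;   /* true while simulating softirq completion */

/* Teardown may block (in the trace, __put_ioctx reaches flush_workqueue). */
static void model_teardown(struct model_ctx *ctx)
{
    ctx->torn_down = true;
    ctx->torn_down_in_atomic = model_in_atomic;
}

/* Whoever drops the last reference runs the teardown. */
static void model_put(struct model_ctx *ctx)
{
    assert(ctx->refs > 0);
    if (--ctx->refs == 0)
        model_teardown(ctx);
}

/* Process context: the owner closes the context or goes away. */
static void model_owner_exit(struct model_ctx *ctx)
{
    model_put(ctx);
}

/* Softirq context: the I/O finishes and releases its reference. */
static void model_io_complete(struct model_ctx *ctx)
{
    model_in_atomic = true;
    model_put(ctx);
    model_in_atomic = false;
}
```

If model_io_complete() runs first and model_owner_exit() second, the teardown happens in process context. Only the reverse order, where the owner is gone before the I/O completes, leaves torn_down_in_atomic set, which would explain why the message is so rarely seen.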
Comment 4 Jeffrey Moyer 2006-12-15 11:22:05 EST
It looks like someone ran into this on a 2.6.18.4 kernel.  See the thread at:

  http://marc.theaimsgroup.com/?l=linux-ia64&m=116594483721437&w=2
Comment 5 Jeffrey Moyer 2007-01-04 10:53:41 EST
Created attachment 144812 [details]
aio_complete should not drop the last reference to an ioctx

This is the fix Kenneth Chen posted for this problem.  Please try it out if you
get the chance.
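
For readers unfamiliar with this class of fix, the sketch below shows one generic way to keep a blocking teardown out of atomic context: park the object on a pending list and let a later caller that may sleep finish the job. It is NOT the attached patch and makes no claim about how that patch works; all model_* names are hypothetical and the code is a single-threaded user-space model with no locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a reference-counted I/O context. */
struct model_ctx {
    int  refs;
    bool torn_down;
    bool torn_down_in_atomic;
    struct model_ctx *next_pending;  /* link for the deferred-teardown list */
};

static bool model_in_atomic;            /* true while simulating softirq */
static struct model_ctx *model_pending; /* contexts awaiting teardown */

/* Stands in for a teardown that may block. */
static void model_teardown(struct model_ctx *ctx)
{
    ctx->torn_down = true;
    ctx->torn_down_in_atomic = model_in_atomic;
}

/* Drop a reference; never run the teardown directly from atomic context. */
static void model_put(struct model_ctx *ctx)
{
    assert(ctx->refs > 0);
    if (--ctx->refs != 0)
        return;
    if (model_in_atomic) {
        ctx->next_pending = model_pending;  /* defer: just queue it */
        model_pending = ctx;
    } else {
        model_teardown(ctx);
    }
}

/* Called later from a context that may sleep; returns how many were torn down. */
static int model_drain_pending(void)
{
    int n = 0;
    while (model_pending != NULL) {
        struct model_ctx *ctx = model_pending;
        model_pending = ctx->next_pending;
        ctx->next_pending = NULL;
        model_teardown(ctx);
        n++;
    }
    return n;
}

/* Softirq context: the I/O finishes and releases its reference. */
static void model_io_complete(struct model_ctx *ctx)
{
    model_in_atomic = true;
    model_put(ctx);
    model_in_atomic = false;
}
```

With this arrangement the completion path can still drop the last reference, but the blocking work only runs when model_drain_pending() is called outside the simulated softirq.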
Comment 6 RHEL Product and Program Management 2007-06-20 11:46:11 EDT
This request was evaluated by Red Hat Kernel Team for inclusion in a Red
Hat Enterprise Linux maintenance release, and has moved to bugzilla 
status POST.
Comment 7 Jason Baron 2007-06-20 15:46:04 EDT
Committed in stream U6 build 55.10. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/
Comment 10 errata-xmlrpc 2007-11-15 11:14:08 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0791.html
