For bugs related to Red Hat Enterprise Linux 4 product line. The current stable release is 4.9. For Red Hat Enterprise Linux 6 and above, please visit Red Hat JIRA https://issues.redhat.com/secure/CreateIssue!default.jspa?pid=12332745 to report new issues.

Bug 190729

Summary: sleeping function called from invalid context at kernel/workqueue.c
Product: Red Hat Enterprise Linux 4
Reporter: Alan D. Brunelle <alan.brunelle>
Component: kernel
Assignee: Jeff Moyer <jmoyer>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 4.0
CC: jbaron
Hardware: ia64
OS: Linux
Fixed In Version: RHBA-2007-0791
Doc Type: Bug Fix
Last Closed: 2007-11-15 16:14:08 UTC

Attachments:

Description	Flags
aio_complete should not drop the last reference to an ioctx	none

Description Alan D. Brunelle 2006-05-04 18:47:34 UTC

Description of problem:
While under heavy load (significant numbers of asynchronous direct I/Os to
multiple storage devices), I've seen this oops a handful of times (5?) over the
past year (on various versions of RHEL4).

Version-Release number of selected component (if applicable): 2.6.9-34.EL


How reproducible:
Not very - as noted above, only seen it a handful of times.


Additional info: Oops information:

Debug: sleeping function called from invalid context at kernel/workqueue.c:264
in_atomic():1[expected: 0], irqs_disabled():0

Call Trace:
 [<a000000100016ba0>] show_stack+0x80/0xa0
                                sp=e0000040fe80f9d0 bsp=e0000040fe809480
 [<a000000100016bf0>] dump_stack+0x30/0x60
                                sp=e0000040fe80fba0 bsp=e0000040fe809468
 [<a000000100068050>] __might_sleep+0x190/0x260
                                sp=e0000040fe80fba0 bsp=e0000040fe809440
 [<a0000001000a1770>] flush_workqueue+0x30/0x140
                                sp=e0000040fe80fbb0 bsp=e0000040fe809418
 [<a00000010017b800>] __put_ioctx+0xa0/0x1a0
                                sp=e0000040fe80fbb0 bsp=e0000040fe8093e0
 [<a00000010017cd40>] aio_complete+0x480/0x4a0
                                sp=e0000040fe80fbb0 bsp=e0000040fe809380
 [<a0000001001784f0>] finished_one_bio+0x190/0x220
                                sp=e0000040fe80fbb0 bsp=e0000040fe809348
 [<a000000100178a20>] dio_bio_complete+0x1c0/0x200
                                sp=e0000040fe80fbb0 bsp=e0000040fe8092f0
 [<a000000100178ac0>] dio_bio_end_aio+0x60/0x80
                                sp=e0000040fe80fbb0 bsp=e0000040fe8092d0
 [<a00000010012edf0>] bio_endio+0x110/0x1c0
                                sp=e0000040fe80fbb0 bsp=e0000040fe809298
 [<a000000100362560>] __end_that_request_first+0x2c0/0x4a0
                                sp=e0000040fe80fbb0 bsp=e0000040fe809230
 [<a0000001003627d0>] end_that_request_chunk+0x30/0x60
                                sp=e0000040fe80fbb0 bsp=e0000040fe809200
 [<a000000200078f10>] scsi_end_request+0x50/0x2e0 [scsi_mod]
                                sp=e0000040fe80fbb0 bsp=e0000040fe8091b0
 [<a000000200079610>] scsi_io_completion+0x2b0/0xa00 [scsi_mod]
                                sp=e0000040fe80fbb0 bsp=e0000040fe809130
 [<a000000200026710>] sd_rw_intr+0x110/0x700 [sd_mod]
                                sp=e0000040fe80fbb0 bsp=e0000040fe8090e0
 [<a00000020006c230>] scsi_finish_command+0x2d0/0x300 [scsi_mod]
                                sp=e0000040fe80fbb0 bsp=e0000040fe8090b0
 [<a00000020006c520>] scsi_softirq+0x2c0/0x300 [scsi_mod]
                                sp=e0000040fe80fbb0 bsp=e0000040fe809070
 [<a000000100082510>] __do_softirq+0x1f0/0x240
                                sp=e0000040fe80fbc0 bsp=e0000040fe808fd8
 [<a0000001000825d0>] do_softirq+0x70/0xc0
                                sp=e0000040fe80fbc0 bsp=e0000040fe808f78
 [<a000000100015bd0>] ia64_handle_irq+0x1b0/0x1e0
                                sp=e0000040fe80fbc0 bsp=e0000040fe808f30
 [<a00000010000f5c0>] ia64_leave_kernel+0x0/0x260
                                sp=e0000040fe80fbc0 bsp=e0000040fe808f30
 [<a0000001000160c0>] ia64_pal_call_static+0xa0/0xc0
                                sp=e0000040fe80fd90 bsp=e0000040fe808ee0
 [<a000000100017740>] default_idle+0x140/0x1e0
                                sp=e0000040fe80fd90 bsp=e0000040fe808e90
 [<a000000100017900>] cpu_idle+0x120/0x2c0
                                sp=e0000040fe80fe30 bsp=e0000040fe808e48
 [<a00000010005b150>] start_secondary+0x2b0/0x2e0
                                sp=e0000040fe80fe30 bsp=e0000040fe808e10
 [<a000000100008180>] __end_ivt_text+0x260/0x290
                                sp=e0000040fe80fe30 bsp=e0000040fe808e10

Comment 1 Jason Baron 2006-05-05 11:23:35 UTC

Do you actually get a system crash, or anything going wrong? or just the messages?

Comment 2 Alan D. Brunelle 2006-05-05 12:41:02 UTC

Nope - messages just logged, and the system continues onwards. I _believe_
might_sleep is just a warning mechanism: meaning that one should _not_ be
sleeping in this context, and the fact that we _might_ sleep means things aren't
quite right. [Meaning: somebody above me in the call stack is doing something
inherently wrong.]
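The check described in Comment 2 can be illustrated with a toy userspace model (a sketch, not kernel code). It assumes a single counter standing in for preempt_count(): the counter is nonzero in atomic context, which includes softirq handlers such as scsi_softirq in the trace above, and a might_sleep()-style check only logs a warning when a function that may block is entered while that counter is nonzero or interrupts are off. All *_model names are hypothetical; the real kernel keeps preemption, softirq and hardirq state in separate bit fields of preempt_count rather than one plain counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model, not kernel code: one counter standing in for preempt_count(). */
static int preempt_count_model;   /* > 0 means "atomic context" */
static bool irqs_disabled_model;  /* true means local interrupts are off */
static int might_sleep_warnings;  /* how many times the check fired */

static bool in_atomic_model(void)
{
    return preempt_count_model != 0;
}

/* Modeled loosely on __might_sleep(): it logs a warning, it does not crash. */
static void might_sleep_model(const char *file, int line)
{
    if (in_atomic_model() || irqs_disabled_model) {
        might_sleep_warnings++;
        fprintf(stderr,
                "Debug: sleeping function called from invalid context at %s:%d\n"
                "in_atomic():%d[expected: 0], irqs_disabled():%d\n",
                file, line, in_atomic_model(), irqs_disabled_model);
    }
}

/* A function that may block, playing the role of flush_workqueue(). */
static void blocking_function_model(void)
{
    might_sleep_model(__FILE__, __LINE__);
    /* ... would wait here for queued work to finish ... */
}

/* Softirq handlers (e.g. SCSI completion) run with the counter raised. */
static void enter_softirq_model(void)
{
    preempt_count_model++;
}

static void leave_softirq_model(void)
{
    assert(preempt_count_model > 0);
    preempt_count_model--;
}
```

Calling blocking_function_model() from ordinary process context logs nothing; calling it between enter_softirq_model() and leave_softirq_model() prints the same two lines seen in the report and increments might_sleep_warnings, but execution continues, which matches the behavior described in Comment 2.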

Comment 3 Jeff Moyer 2006-09-12 13:11:19 UTC

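This is indeed a corner case. It happens only when the I/O completion path
holds the last reference to the ioctx (meaning that the calling process either
closed the context or went away before the I/O completed). Dropping that last
reference from aio_complete() runs __put_ioctx(), which calls flush_workqueue()
from softirq context.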

I'll give this some thought.
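The ordering problem in Comment 3 can be sketched as a toy reference-counting model. It assumes, as the trace suggests, that each in-flight request pins the ioctx and that whoever drops the last reference runs the teardown (in the real kernel, __put_ioctx() calling flush_workqueue()). The struct and function names are hypothetical, and freed_in_atomic simply records which context performed the final put.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the reference counting described above; names are hypothetical. */
struct ioctx_model {
    int users;            /* references: the owning process + each in-flight request */
    bool freed;
    bool freed_in_atomic; /* true if the final put happened in completion context */
};

static bool in_completion_context; /* stands in for softirq context */

static void get_ioctx_model(struct ioctx_model *ctx)
{
    ctx->users++;
}

/* Dropping the last reference runs the teardown, which (like __put_ioctx)
 * would call something that may sleep. */
static void put_ioctx_model(struct ioctx_model *ctx)
{
    assert(ctx->users > 0);
    if (--ctx->users == 0) {
        ctx->freed = true;
        ctx->freed_in_atomic = in_completion_context;
    }
}

/* Submitting a request pins the context until the request completes. */
static void submit_model(struct ioctx_model *ctx)
{
    get_ioctx_model(ctx);
}

/* I/O completion (aio_complete in the trace) runs from the SCSI softirq. */
static void complete_model(struct ioctx_model *ctx)
{
    in_completion_context = true;
    put_ioctx_model(ctx);
    in_completion_context = false;
}

/* The process destroys the context (or exits) and drops its own reference. */
static void destroy_model(struct ioctx_model *ctx)
{
    put_ioctx_model(ctx);
}
```

If the request completes before the process drops its reference, the final put happens in process context. If the process drops its reference first, the completion path performs the final put, which is the corner case behind the warning.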

Comment 4 Jeff Moyer 2006-12-15 16:22:05 UTC

It looks like someone ran into this on a 2.6.18.4 kernel.  See the thread at:

  http://marc.theaimsgroup.com/?l=linux-ia64&m=116594483721437&w=2

Comment 5 Jeff Moyer 2007-01-04 15:53:41 UTC

Created attachment 144812
aio_complete should not drop the last reference to an ioctx

This is the fix Kenneth Chen posted for this problem.  Please try it out if you
get the chance.
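The attachment's title states the goal: the completion path must never drop the last reference to the ioctx. One way to get there, sketched here as a toy model rather than a copy of the attached patch, is to stop pinning the ioctx per request, track in-flight requests in a separate reqs_active counter, and have teardown wait until that counter reaches zero before dropping the process's (now final) reference. Because the model is single-threaded, the wait is represented by try_destroy_model() returning false; real code would sleep on a wait queue and re-check the counter under the context's lock. All names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the fix's idea; names are hypothetical, not the patch's code. */
struct ioctx_model {
    int users;            /* only the owning process holds a reference now */
    int reqs_active;      /* in-flight requests, not counted in users */
    bool freed;
    bool freed_in_atomic; /* true if the final put happened in completion context */
};

static bool in_completion_context; /* stands in for softirq context */

static void put_ioctx_model(struct ioctx_model *ctx)
{
    assert(ctx->users > 0);
    if (--ctx->users == 0) {
        ctx->freed = true;
        ctx->freed_in_atomic = in_completion_context;
    }
}

/* Submission counts the request as active but takes no extra ioctx reference. */
static void submit_model(struct ioctx_model *ctx)
{
    ctx->reqs_active++;
}

/* Completion only retires the request; it never calls put_ioctx_model(). */
static void complete_model(struct ioctx_model *ctx)
{
    in_completion_context = true;
    assert(ctx->reqs_active > 0);
    ctx->reqs_active--; /* real code: under the context lock, then wake waiters */
    in_completion_context = false;
}

/* Teardown may drop the last reference only once nothing is in flight.
 * Returns false where real code would sleep until woken by a completion. */
static bool try_destroy_model(struct ioctx_model *ctx)
{
    if (ctx->reqs_active > 0)
        return false;
    put_ioctx_model(ctx);
    return true;
}
```

Even when the process tries to tear down first, the context is freed only by the later process-context call, so freed_in_atomic stays false and nothing that may sleep runs from the completion path.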

Comment 6 RHEL Program Management 2007-06-20 15:46:11 UTC

This request was evaluated by Red Hat Kernel Team for inclusion in a Red
Hat Enterprise Linux maintenance release, and has moved to bugzilla 
status POST.

Comment 7 Jason Baron 2007-06-20 19:46:04 UTC

Committed in stream U6 build 55.10. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/

Comment 10 errata-xmlrpc 2007-11-15 16:14:08 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0791.html