Bug 456686

Summary: race in aio_complete() leads to process hang
Product: Red Hat Enterprise Linux 4
Reporter: Bryn M. Reeves <bmr>
Component: kernel
Assignee: Jeff Moyer <jmoyer>
Status: CLOSED ERRATA
QA Contact: Martin Jenner <mjenner>
Severity: medium
Priority: urgent
Version: 4.7
CC: dhoward, jplans, qcai, tao, vmayatsk
Target Milestone: rc
Keywords: ZStream
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-05-18 19:08:21 UTC
Bug Blocks: 461304, 475814, 489935

Description Bryn M. Reeves 2008-07-25 15:33:37 UTC
Description of problem:
A memory barrier is missing from aio_complete() in the current RHEL4 kernels.
This allows a race between read_events() and aio_complete() in which the thread
in read_events() sleeps indefinitely, hanging the application that is waiting
on I/O completion.
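
For illustration only, here is a user-space sketch of the ordering problem,
written with C11 atomics rather than kernel primitives. It assumes the race has
the usual "publish, then check for waiters" shape: the reader queues itself and
then re-checks the ring tail, while the completion side updates the tail and
then checks whether anyone is waiting. The names ctx_model, has_waiter,
reader_prepare_to_sleep and completer_publish are hypothetical stand-ins, not
the kernel's structures or functions, and this is not the actual patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the shared state; NOT the kernel's kioctx. */
struct ctx_model {
    atomic_uint tail;       /* count of events published by the completer */
    atomic_bool has_waiter; /* true while a reader is queued to sleep */
};

static void ctx_model_init(struct ctx_model *c)
{
    assert(c != NULL);
    atomic_init(&c->tail, 0u);
    atomic_init(&c->has_waiter, false);
}

/* Reader side (modelled on read_events): queue first, then re-check the tail.
 * Returns true if the reader would go to sleep. */
static bool reader_prepare_to_sleep(struct ctx_model *c, unsigned head)
{
    atomic_store_explicit(&c->has_waiter, true, memory_order_relaxed);
    /* Full fence: the store above is ordered before the tail load below. */
    atomic_thread_fence(memory_order_seq_cst);
    if (atomic_load_explicit(&c->tail, memory_order_relaxed) != head) {
        /* An event is already available, so do not sleep. */
        atomic_store_explicit(&c->has_waiter, false, memory_order_relaxed);
        return false;
    }
    return true;
}

/* Completion side (modelled on aio_complete): publish, then check for waiters.
 * Returns true if a wakeup has to be issued. */
static bool completer_publish(struct ctx_model *c)
{
    atomic_fetch_add_explicit(&c->tail, 1u, memory_order_relaxed);
    /* The fence this sketch is about. Without it, the waiter check below
     * may be satisfied before the tail update is visible to the reader,
     * so the reader sees no event and the completer sees no waiter. */
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&c->has_waiter, memory_order_relaxed);
}
```

With a full fence on both sides, at least one of the two checks observes the
other side's store: either the reader sees the new tail and does not sleep, or
the completer sees the waiter and issues the wakeup. Dropping the fence in
completer_publish() reopens the window in which neither happens.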

This was reported upstream by Quentin Barnes of Yahoo:

http://lkml.org/lkml/2008/3/12/207

The fix has been merged in 2.6.26:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=6cb2a21049b8990df4576c5fce4d48d0206c22d5

It was also accepted for 2.6.24.y:

http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.24.y.git;a=commit;h=0db49fc729eee503836ea12745b55f7f802d2abb

Version-Release number of selected component (if applicable):


How reproducible:
Unclear. Depends on the AIO application; Quentin reports seeing hangs virtually
100% of the time. Looking for a straightforward reproducer for this now and will
update with details when they are available.

Steps to Reproduce:
< to be filled >
  
Actual results:
Application hangs in read_events

Expected results:
No hang. AIO completes as normal.

Additional info:

Comment 1 RHEL Program Management 2008-09-03 13:12:05 UTC
Updating PM score.

Comment 2 RHEL Program Management 2008-09-22 17:53:39 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 3 Jeff Moyer 2009-01-05 19:50:24 UTC
Patch posted for review:
http://post-office.corp.redhat.com/archives/rhkernel-list/2009-January/msg00042.html

Comment 4 Vivek Goyal 2009-01-15 14:04:05 UTC
Committed in 78.29.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/

Comment 12 errata-xmlrpc 2009-05-18 19:08:21 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1024.html