Bug 198859

Summary: race in aio io_submit write/read
Product: Red Hat Enterprise Linux 4    Reporter: Rafal Wijata <wijata>
Component: kernel    Assignee: Jeff Moyer <jmoyer>
Status: CLOSED ERRATA    QA Contact: Brian Brock <bbrock>
Severity: high    Docs Contact:
Priority: medium
Version: 4.0    CC: i-kitayama, jbaron
Target Milestone: ---    Keywords: OtherQA
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: RHBA-2007-0791 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2007-11-15 16:14:39 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 245197    
Bug Blocks: 234251, 236328, 245198    
Attachments:
testcase for shortread (Flags: none)

Description Rafal Wijata 2006-07-14 09:13:25 UTC
Description of problem:
See http://bugzilla.kernel.org/show_bug.cgi?id=6831

Version-Release number of selected component (if applicable):
See http://bugzilla.kernel.org/show_bug.cgi?id=6831

How reproducible:
See http://bugzilla.kernel.org/show_bug.cgi?id=6831

Steps to Reproduce:
  
Actual results:


Expected results:


Additional info:

Comment 1 Jeff Moyer 2006-08-09 17:52:58 UTC
I'll work with Zach on this.

Comment 2 Jeff Moyer 2007-01-12 23:04:48 UTC
I put together a kernel that includes Zach's patch set.  Could you please test
with this kernel to ensure that the bug is fixed for you?  I will test this on
Monday, but for now my RHEL 4 test box is toast.

Comment 3 Rafal Wijata 2007-01-15 10:27:27 UTC
Since there is a test case, I'll pass on testing this. Sorry, I have other things to do.

Comment 4 Jeff Moyer 2007-01-15 15:14:34 UTC
No problem, I'll take care of it.

Thanks!

Comment 5 Jeff Moyer 2007-01-16 18:01:20 UTC
The kernel which includes Zach's patch set (found at
http://people.redhat.com/jmoyer/dio/) fixes this problem in my test environment.

Comment 6 Rafal Wijata 2007-02-26 13:24:49 UTC
I have some bad news. It looks like the race was fixed, but another bug was
introduced.
First I encountered the following scenario:
1) an aio read issued to the file: OK
2) exactly the same aio read issued to the same file: returned less data than the first.
Of course, there were some appends to the file between those two reads.

I decided to create a test case, and succeeded. Moreover, when the test case
failed, bug210281 was hit as well.

The test is ugly and needs to be run in a loop, but after 5 hours it fired on my
machine. I hope the test is valid (it needs verification), but hitting bug210281
suggests it is.

Compile (without DO_SYNCH) and run as: while ./aio_backread; do true; done
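A minimal sketch of the invariant behind this report (re-issuing the same aio read after the file has only grown must never return fewer bytes), assuming Linux native AIO called through raw syscalls, since glibc has no io_setup() wrapper. The original test linked libaio and is not reproduced here; aio_pread_once() and reread_not_shorter() are hypothetical helper names, and the sketch does not set O_DIRECT, which is not necessarily how aio_backread.c opened its file:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/aio_abi.h>   /* aio_context_t, struct iocb, struct io_event, IOCB_CMD_PREAD */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* glibc has no wrappers for the native AIO syscalls, so call them directly. */
static long sys_io_setup(unsigned nr_events, aio_context_t *ctx)
{
    return syscall(SYS_io_setup, nr_events, ctx);
}

static long sys_io_destroy(aio_context_t ctx)
{
    return syscall(SYS_io_destroy, ctx);
}

static long sys_io_submit(aio_context_t ctx, long nr, struct iocb **iocbs)
{
    return syscall(SYS_io_submit, ctx, nr, iocbs);
}

static long sys_io_getevents(aio_context_t ctx, long min_nr, long nr,
                             struct io_event *events, struct timespec *timeout)
{
    return syscall(SYS_io_getevents, ctx, min_nr, nr, events, timeout);
}

/* Submit a single IOCB_CMD_PREAD and wait for its completion event.
 * Returns the byte count from io_event.res, or a negative errno. */
static long aio_pread_once(aio_context_t ctx, int fd, void *buf, size_t len, off_t off)
{
    struct iocb cb;
    struct iocb *list[1] = { &cb };
    struct io_event ev;
    long r;

    memset(&cb, 0, sizeof cb);
    cb.aio_lio_opcode = IOCB_CMD_PREAD;
    cb.aio_fildes = (uint32_t)fd;
    cb.aio_buf = (uint64_t)(uintptr_t)buf;
    cb.aio_nbytes = len;
    cb.aio_offset = off;

    r = sys_io_submit(ctx, 1, list);
    if (r != 1)
        return r < 0 ? -errno : -EAGAIN;
    r = sys_io_getevents(ctx, 1, 1, &ev, NULL);
    if (r != 1)
        return r < 0 ? -errno : -EIO;
    return (long)ev.res;
}

/* Re-issue the same read after the file has only grown; a smaller result
 * is the "short read" symptom. Returns 1 if the invariant holds. */
static int reread_not_shorter(aio_context_t ctx, int fd, void *buf, size_t len,
                              off_t off, long first)
{
    long second = aio_pread_once(ctx, fd, buf, len, off);

    if (second < first) {
        fprintf(stderr, "short read: expected at least %ld bytes, received %ld\n",
                first, second);
        return 0;
    }
    return 1;
}
```

A caller reads once, appends to the file, and then passes the first result to reread_not_shorter(); a return of 0 corresponds to the failure described in this comment.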

Comment 7 Rafal Wijata 2007-02-26 13:28:06 UTC
Created attachment 148791 [details]
testcase for shortread

Tested on a Tyan board (model not noted) with 2x Opteron 275 and a WD3200JS-00P
SATA disk attached to the onboard sata_sil controller.

Comment 8 Rafal Wijata 2007-02-26 13:42:35 UTC
The kernel version: Linux server 2.6.9-42.27.EL.dio.2smp #1 SMP Mon Jan 15
15:05:46 EST 2007 i686 athlon i386 GNU/Linux

Comment 9 Rafal Wijata 2007-02-26 13:56:27 UTC
> compile(without DO_SYNCH)
In fact, compiling with DO_SYNCH (gcc aio_backread.c -DDO_SYNCH -lpthread -laio)
made the bug reproduce faster (but without hitting bug210281). No idea why...
Still trying on the official RH kernel...

Comment 10 Jeff Moyer 2007-02-26 19:33:35 UTC
Thanks for your continued help on this problem;  it is greatly appreciated.  I
have some comments on the code for you, and some test results.

I'm not sure you should use the same buffer for concurrent reads and writes.  It
looks like you meant to initialize the write iocbs with buf.  Is that right?

The DO_SYNCH code looks incorrect to me.  You should guard access to
submittedSize with a mutex/conditional pair.
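The suggested mutex/condition-variable guard can be sketched as follows, assuming one writer thread that updates submittedSize after each write completes and reader threads that need a consistent lower bound before issuing a read. Apart from submittedSize, the names (struct progress, progress_add(), progress_wait_for()) are hypothetical:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shared state; only the name submittedSize comes from the
 * discussion, the struct and function names are illustrative. */
struct progress {
    pthread_mutex_t lock;
    pthread_cond_t  grown;
    size_t          submittedSize;   /* bytes whose writes have completed */
};

static void progress_init(struct progress *p)
{
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->grown, NULL);
    p->submittedSize = 0;
}

static void progress_destroy(struct progress *p)
{
    pthread_cond_destroy(&p->grown);
    pthread_mutex_destroy(&p->lock);
}

/* Writer thread: call once a write's completion event has been reaped. */
static void progress_add(struct progress *p, size_t nbytes)
{
    pthread_mutex_lock(&p->lock);
    p->submittedSize += nbytes;
    pthread_cond_broadcast(&p->grown);   /* wake every reader waiting for growth */
    pthread_mutex_unlock(&p->lock);
}

/* Reader thread: block until at least `want` bytes have been written, then
 * return the value the next read's length should be checked against. */
static size_t progress_wait_for(struct progress *p, size_t want)
{
    size_t seen;

    pthread_mutex_lock(&p->lock);
    while (p->submittedSize < want)      /* re-check: wakeups can be spurious */
        pthread_cond_wait(&p->grown, &p->lock);
    seen = p->submittedSize;
    pthread_mutex_unlock(&p->lock);
    return seen;
}
```

Taking the snapshot under the lock, before the read is submitted, is what makes "expected at least N bytes" a sound claim; sampling the counter after the read completes could count a write that had not finished when the read was issued.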

I was able to reproduce the short read twice so far.  In both cases, I got the
following output:
  short read: expected at least 1024 bytes, receiced 0

I think the expected size of 1024 can only happen after the very first set of
writes completes, right?  Is this consistent with the failures you've witnessed?

Comment 11 Jeff Moyer 2007-02-26 19:58:01 UTC
Sure enough, I modified the source code to only issue a dozen reads and then to
cancel the write thread and exit the program.  I can now reproduce the short
read in a matter of seconds.

Comment 12 Rafal Wijata 2007-02-27 08:09:31 UTC
> looks like you meant to initialize the write iocbs with buf.  Is that right?
Right, my mistake.

> submittedSize with a mutex/conditional pair
Possibly (or an atomic asm op), but int writes are atomic on x86 anyway, and
the bug appears without synchronization as well.
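An aligned int store is indeed a single instruction on x86, but in C an unsynchronized shared int is still a data race: the compiler may keep the value in a register or move the access, so a reader can miss updates. For the "atomic op" route, a sketch using C11 <stdatomic.h> (which postdates this thread, so a modern GCC is assumed) might look like this; submitted_size and both function names are hypothetical stand-ins for the test's submittedSize:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for submittedSize, shared by writer and reader threads. */
static _Atomic size_t submitted_size;

/* Writer: publish only after the write's completion event has been reaped.
 * Release pairs with the reader's acquire, so memory effects made before
 * publishing are visible to a reader that observes the new value. */
static void publish_completed(size_t nbytes)
{
    atomic_fetch_add_explicit(&submitted_size, nbytes, memory_order_release);
}

/* Reader: take the snapshot before submitting the read it will be checked against. */
static size_t snapshot_completed(void)
{
    return atomic_load_explicit(&submitted_size, memory_order_acquire);
}
```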

> I can now reproduce the short read in a matter of seconds.
Glad you could improve the test case.

BTW: the official RH kernel 2.6.9-42.0.3.ELsmp ran the DO_SYNCH build without
problems for 20 hours, so I guess the patch introduced something harmful.

Comment 13 RHEL Program Management 2007-04-27 19:24:01 UTC
This request was evaluated by Red Hat Kernel Team for inclusion in a Red
Hat Enterprise Linux maintenance release, and has moved to bugzilla 
status POST.

Comment 14 Jason Baron 2007-06-20 19:45:25 UTC
committed in stream U6 build 55.10. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/


Comment 17 John Poelstra 2007-08-29 16:35:30 UTC
A fix for this issue should have been included in the packages contained in the
RHEL4.6 Beta released on RHN (also available at partners.redhat.com).  

Requested action: Please verify that your issue is fixed to ensure that it is
included in this update release.

After you (Red Hat Partner) have verified that this issue has been addressed,
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent
symptoms of the problem you are having and change the status of the bug to FAILS_QA.

If you cannot access bugzilla, please reply with a message to Issue Tracker and
I will change the status for you.  If you need assistance accessing
ftp://partners.redhat.com, please contact your Partner Manager.

Comment 18 John Poelstra 2007-09-05 22:26:43 UTC
A fix for this issue should have been included in the packages contained in 
the RHEL4.6-Snapshot1 on partners.redhat.com.  

Requested action: Please verify that your issue is fixed to ensure that it is 
included in this update release.

After you (Red Hat Partner) have verified that this issue has been addressed, 
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent 
symptoms of the problem you are having and change the status of the bug to 
FAILS_QA.

If you cannot access bugzilla, please reply with a message about your test 
results to Issue Tracker.  If you need assistance accessing 
ftp://partners.redhat.com, please contact your Partner Manager.

Comment 19 John Poelstra 2007-09-12 00:42:26 UTC
A fix for this issue should be included in RHEL4.6-Snapshot2--available soon on
partners.redhat.com.  

Please verify that your issue is fixed to ensure that it is included in this
update release.

Comment 20 John Poelstra 2007-09-20 04:30:49 UTC
A fix for this issue should have been included in the packages contained in the
RHEL4.6-Snapshot3 on partners.redhat.com.  

Please verify that your issue is fixed to ensure that it is included in this
update release.

Comment 21 John Poelstra 2007-09-26 23:36:16 UTC
A fix for this issue should be included in the packages contained in
RHEL4.6-Snapshot4--available now on partners.redhat.com.  

Please verify that your issue is fixed ASAP to ensure that it is included in
this update release.

Comment 22 John Poelstra 2007-10-05 02:58:11 UTC
A fix for this issue should be included in the packages contained in
RHEL4.6-Snapshot5--available now on partners.redhat.com.  

Please verify that your issue is fixed ASAP to ensure that it is included in
this update release.

Comment 23 John Poelstra 2007-10-11 03:09:48 UTC
A fix for this issue should be included in the packages contained in
RHEL4.6-Snapshot6--available now on partners.redhat.com.  

Please verify that your issue is fixed ASAP to ensure that it is included in
this update release.

Comment 24 John Poelstra 2007-10-18 18:53:58 UTC
A fix for this issue should be included in the packages contained in 
RHEL4.6-Snapshot7--available now on partners.redhat.com.  

IMPORTANT: This is the last opportunity to confirm that your issue is fixed in 
the RHEL4.6 update release.

Comment 27 errata-xmlrpc 2007-11-15 16:14:39 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0791.html