Bug 73846

Summary: file system read call returns corrupted data
Product: [Retired] Red Hat Linux
Reporter: Armin Haken <armin>
Component: libaio
Assignee: Ben LaHaise <bcrl>
Status: CLOSED NOTABUG
Severity: high
Priority: medium
Version: 6.2
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2002-09-12 04:27:54 UTC
Attachments:
Description                          Flags
Test app that demonstrates problem   none
sample makefile                      none

Description Armin Haken 2002-09-12 04:13:28 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 4.0; MSNDE; T312461)

Description of problem:
Under heavy disk load, "read" calls sometimes return data containing a run of
16 incorrect bytes. High memory usage and heap allocation also seem to be
involved in reproducing the problem. I'm attaching an example app that shows
the failure on each of the 3 Red Hat Linux 6.2 machines I have available; it
does not fail on other Linux machines, including Red Hat Linux 7.3.
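
A minimal sketch of the kind of check the description implies: write a known
40 KB pattern to a file, read it back with read(2), and compare. It is not the
attached drt.cpp; the file name, the pattern, and the function names
(`fill_pattern`, `write_then_read`) are hypothetical. On its own, a single write
followed by an immediate read will usually be served from the page cache and is
not expected to reproduce the failure; the report says heavy disk load and high
memory usage were also needed.

```cpp
// Hypothetical write/read-back check, loosely modelled on the behaviour the
// report describes; this is NOT the attached drt.cpp.
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>
#include <vector>

// Buffer size chosen to match the 40 KB buffers mentioned in the report.
constexpr std::size_t kBufSize = 40 * 1024;

// Fill the buffer with a pattern that depends on the iteration number, so a
// stale or misplaced block of data is easy to spot.
void fill_pattern(std::vector<unsigned char>& buf, unsigned iteration) {
    for (std::size_t i = 0; i < buf.size(); ++i)
        buf[i] = static_cast<unsigned char>((i * 31 + iteration) & 0xFF);
}

// Write `expected` to `path`, seek back to the start, and read the data back
// with read(2) into `actual`. Returns false on any I/O error or early EOF.
bool write_then_read(const char* path,
                     const std::vector<unsigned char>& expected,
                     std::vector<unsigned char>& actual) {
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;
    bool ok = true;

    // write(2) may write fewer bytes than requested, so loop until done.
    std::size_t done = 0;
    while (ok && done < expected.size()) {
        ssize_t n = write(fd, expected.data() + done, expected.size() - done);
        if (n <= 0) ok = false; else done += static_cast<std::size_t>(n);
    }
    if (ok) ok = lseek(fd, 0, SEEK_SET) == 0;

    // read(2) may also return short counts; loop until the buffer is full.
    actual.assign(expected.size(), 0);
    done = 0;
    while (ok && done < actual.size()) {
        ssize_t n = read(fd, actual.data() + done, actual.size() - done);
        if (n <= 0) ok = false; else done += static_cast<std::size_t>(n);
    }
    close(fd);
    assert(actual.size() == expected.size());  // sanity check on the output
    return ok;
}
```

A caller would compare `actual == expected` after each iteration and dump both
buffers on a mismatch, as the report describes for errorfile_should and
errorfile_reality.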

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Build the attached drt.cpp (a sample Makefile is attached).
2. Run the "drt" app. On 2-processor systems it seems to fail more reliably
when 2 copies of the app are running (in separate directories).

Actual Results:  The app prints text including "read buffer does not match"
along with the number of the running iteration (out of 10). I typically see
about 3 failure messages per run. When a failure is detected, the correct
buffer and the misread buffer (40 KB each) are written to the files
errorfile_should and errorfile_reality, respectively. Comparing these files
shows a run of 16 bytes on which they differ.
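
To locate the differing region between the two dump files, a small helper can
report the first and last offsets where they disagree. This is a hypothetical
sketch (`read_file`, `diff_range`, and `DiffRange` are illustrative names, not
part of the attached test app); the standard `cmp -l errorfile_should
errorfile_reality` command lists the same differing bytes one per line.

```cpp
// Hypothetical helper for locating the differing region between two dumps
// such as errorfile_should and errorfile_reality. Names are illustrative.
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

struct DiffRange {
    bool differs = false;   // true if at least one byte differs
    std::size_t first = 0;  // offset of the first differing byte
    std::size_t last = 0;   // offset of the last differing byte
    // Number of bytes from the first to the last difference, inclusive.
    std::size_t length() const { return differs ? last - first + 1 : 0; }
};

// Read a whole file into memory as raw bytes (empty if it cannot be opened).
std::vector<unsigned char> read_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}

// Compare two equally sized buffers and report the span between the first
// and last differing bytes.
DiffRange diff_range(const std::vector<unsigned char>& a,
                     const std::vector<unsigned char>& b) {
    assert(a.size() == b.size());
    DiffRange r;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] != b[i]) {
            if (!r.differs) { r.differs = true; r.first = i; }
            r.last = i;
        }
    }
    return r;
}
```

For example, `diff_range(read_file("errorfile_should"),
read_file("errorfile_reality"))` returns the span, assuming both files are the
same 40 KB size. The 16-byte run described above would show up as
`length() == 16` if the differing bytes are contiguous; `length()` counts every
byte between the first and last difference, including any that happen to match.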

Expected Results:  Without errors, the only output is an iteration count from
0 to 9.

Additional info:

On every platform I tried, the test app behaves the same whether I build it on
Red Hat Linux 6.2 or on Debian (I link statically).

Comment 1 Ben LaHaise 2002-09-12 04:16:25 UTC
libaio does not exist in Red Hat 6.2.

Comment 2 Armin Haken 2002-09-12 04:16:37 UTC
Created attachment 75844
Test app that demonstrates problem

Comment 3 Ben LaHaise 2002-09-12 04:22:17 UTC
Which kernel version is this with? At least one of the 2.2 errata
affected page cache I/O on SMP.

Comment 4 Armin Haken 2002-09-12 04:23:22 UTC
Created attachment 75845
sample makefile

Comment 5 Armin Haken 2002-09-12 04:27:47 UTC
The kernel is 2.2.14-5.0, and the error shows up on single- as well as
dual-processor machines.

Comment 6 Ben LaHaise 2002-09-12 04:54:43 UTC
That kernel is obsolete and vulnerable to the aforementioned bug.  Upgrade and
reopen if the problem persists.  This is not a bug in libaio, but in older
kernels for which errata were already released.

Comment 7 Armin Haken 2002-09-16 21:24:20 UTC
The problem did indeed disappear after upgrading
to kernel 2.2.17-14.