Bug 73846 - file system read call returns corrupted data
Status: CLOSED NOTABUG
Product: Red Hat Linux
Classification: Retired
Component: libaio (Show other bugs)
Version: 6.2
Hardware: i686 Linux
Priority: medium, Severity: high
Assigned To: Ben LaHaise
Reported: 2002-09-12 00:13 EDT by Armin Haken
Modified: 2008-05-01 11:38 EDT (History)
Last Closed: 2002-09-12 00:27:54 EDT

Attachments
Test app that demonstrates problem (2.78 KB, text/plain)
2002-09-12 00:16 EDT, Armin Haken
sample makefile (47 bytes, text/plain)
2002-09-12 00:23 EDT, Armin Haken

Description Armin Haken 2002-09-12 00:13:28 EDT
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 4.0; MSNDE; T312461)

Description of problem:
Under heavy disk load, "read" calls sometimes return data that contains a 
sequence of 16 incorrect bytes. High memory usage and heap allocation also seem 
to be involved in reproducing the problem. I'm attaching an example app that 
shows the failure on each of the 3 Red Hat 6.2 machines I have available, but 
does not fail on other Linux machines, including Red Hat 7.3.
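
For readers without access to the attachment, the kind of check described above can be sketched roughly as follows. This is not the attached drt.cpp: the function name write_and_verify, the pseudo-random fill pattern, and the file path are assumptions made for illustration, and the sketch does not generate the heavy disk and memory load that the report says is needed to trigger the failure.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <vector>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper, not taken from the attached drt.cpp.
// Writes `size` pseudo-random bytes to `path`, reads them back with read(),
// and returns true only if the bytes read match the bytes written.
bool write_and_verify(const char* path, std::size_t size, unsigned seed) {
    std::vector<unsigned char> expected(size);
    std::srand(seed);
    for (std::size_t i = 0; i < size; ++i)
        expected[i] = static_cast<unsigned char>(std::rand() & 0xff);

    // Write the whole buffer, handling short writes.
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return false;
    std::size_t done = 0;
    while (done < size) {
        ssize_t n = write(fd, expected.data() + done, size - done);
        if (n <= 0) { close(fd); return false; }
        done += static_cast<std::size_t>(n);
    }
    close(fd);

    // Read the file back, handling short reads.
    std::vector<unsigned char> actual(size);
    fd = open(path, O_RDONLY);
    if (fd < 0) return false;
    done = 0;
    while (done < size) {
        ssize_t n = read(fd, actual.data() + done, size - done);
        if (n <= 0) { close(fd); return false; }
        done += static_cast<std::size_t>(n);
    }
    close(fd);

    return std::memcmp(expected.data(), actual.data(), size) == 0;
}
```

A single call on an idle machine is expected to succeed. Reproducing the reported behaviour would presumably require calling something like this in a loop over many files, with several copies running at once.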

How reproducible:
Always

Steps to Reproduce:
1. Build the attached drt.cpp (sample Makefile attached).
2. Run the "drt" app. On 2-processor systems it seems to fail more reliably 
when 2 copies of the app are running (in separate directories).

Actual Results:  The app prints text including "read buffer does not match" 
along with the number of the iteration (out of 10) that is running. Typically I 
see around 3 failure messages per run. When a failure is detected, the correct 
and the misread buffers (40 KB each) are written to the files errorfile_should 
and errorfile_reality. Comparing these files shows a sequence of 16 bytes in 
which they differ.
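
The comparison of errorfile_should against errorfile_reality can be sketched as a scan for contiguous runs of differing bytes. The function name mismatch_runs is hypothetical, and the buffers are assumed to have already been loaded into memory; loading the two files is left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns (offset, length) for each contiguous run of
// bytes that differ between the two buffers. If the buffers have different
// sizes, only their common prefix is compared.
std::vector<std::pair<std::size_t, std::size_t>>
mismatch_runs(const std::vector<unsigned char>& expected,
              const std::vector<unsigned char>& actual) {
    std::vector<std::pair<std::size_t, std::size_t>> runs;
    const std::size_t n = std::min(expected.size(), actual.size());
    std::size_t i = 0;
    while (i < n) {
        if (expected[i] == actual[i]) { ++i; continue; }
        const std::size_t start = i;                 // first differing byte
        while (i < n && expected[i] != actual[i]) ++i;
        runs.emplace_back(start, i - start);         // run covers [start, i)
    }
    return runs;
}
```

For the failure described here, one would expect a single run of length 16. If some bytes inside the damaged region happen to match the correct data, the scan reports several shorter runs instead.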

Expected Results:  Without errors, the only output is an iteration count from 
0 to 9.

Additional info:

On every platform I tried, the test app behaves the same whether I build it on 
Red Hat 6.2 or on Debian. (I link statically.)
Comment 1 Ben LaHaise 2002-09-12 00:16:25 EDT
libaio does not exist in Red Hat 6.2.
Comment 2 Armin Haken 2002-09-12 00:16:37 EDT
Created attachment 75844 [details]
Test app that demonstrates problem
Comment 3 Ben LaHaise 2002-09-12 00:22:17 EDT
Which kernel version is this?  At least one of the 2.2 errata
affected page cache I/O on SMP.
Comment 4 Armin Haken 2002-09-12 00:23:22 EDT
Created attachment 75845 [details]
sample makefile
Comment 5 Armin Haken 2002-09-12 00:27:47 EDT
The kernel is 2.2.14-5.0, and the error shows up on single- as well as 
dual-processor machines.
Comment 6 Ben LaHaise 2002-09-12 00:54:43 EDT
That kernel is obsolete and vulnerable to the aforementioned bug.  Upgrade and
reopen if the problem persists.  This is not a bug in libaio, but in older
kernels for which errata were already released.
Comment 7 Armin Haken 2002-09-16 17:24:20 EDT
The problem did indeed disappear after upgrading to kernel 2.2.17-14.
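
To check whether a machine is still on a kernel older than the one reported working above, the release string can be compared numerically. This is only a sketch: the function names are hypothetical, and 2.2.17 is used as the threshold solely because it is the version this thread reports as working. The thread does not say which erratum first contained the fix.

```cpp
#include <cstdio>
#include <string>
#include <tuple>

// Hypothetical helper: parses the leading "major.minor.patch" of a kernel
// release string such as "2.2.14-5.0". The distribution suffix after '-' is
// ignored. Returns (0, 0, 0) if the string does not start with three
// dot-separated integers.
std::tuple<int, int, int> parse_kernel_release(const std::string& release) {
    int major = 0, minor = 0, patch = 0;
    if (std::sscanf(release.c_str(), "%d.%d.%d", &major, &minor, &patch) != 3)
        return std::make_tuple(0, 0, 0);
    return std::make_tuple(major, minor, patch);
}

// True if `release` is older than 2.2.17. Assumption: 2.2.17 is taken from
// this thread as a known-good version, not as the first fixed version.
// Unparseable strings compare as (0, 0, 0) and are therefore reported as older.
bool older_than_2_2_17(const std::string& release) {
    return parse_kernel_release(release) < std::make_tuple(2, 2, 17);
}
```

The release string would typically come from `uname -r`, or from the `release` field filled in by uname(2).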
