Red Hat Bugzilla – Bug 73846
file system read call returns corrupted data
Last modified: 2008-05-01 11:38:03 EDT
Description of problem:
Under heavy disk load, "read" calls sometimes return data that contains a
sequence of 16 incorrect bytes. High memory usage and heap allocation also seem
to be involved in reproducing the problem. I'm attaching an example app that
shows the failure on each of the three Red Hat 6.2 machines I have available,
but does not fail on other Linux machines, including Red Hat 7.3.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Build the attached drt.cpp (sample Makefile attached).
2. Run the "drt" app. On two-processor systems it seems to fail more reliably
when two copies of the app are running (in separate directories).
Actual Results: The app prints text including "read buffer does not match"
along with the number of the iteration (out of 10) that is running. Typically I
see around 3 failure messages per run. When a failure is detected, the correct
and the misread buffers (40 KB each) are written to the files errorfile_should
and errorfile_reality. Comparing these files shows a sequence of 16 bytes in
which they differ.
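The comparison of the two dump files described above could be automated along the following lines. This is only a sketch and not part of the attached drt.cpp: the helper names first_mismatch_run and read_file are hypothetical, and it assumes both files fit in memory (they are 40 KB each in this report).

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns {offset, length} of the first run of
// differing bytes, or {0, 0} if the buffers agree over their common length.
std::pair<std::size_t, std::size_t> first_mismatch_run(
    const std::vector<char>& expected, const std::vector<char>& actual) {
    const std::size_t n = std::min(expected.size(), actual.size());
    std::size_t start = 0;
    while (start < n && expected[start] == actual[start]) ++start;
    if (start == n) return {0, 0};
    std::size_t end = start;
    while (end < n && expected[end] != actual[end]) ++end;
    return {start, end - start};
}

// Hypothetical helper: reads a whole file into memory.
// Returns an empty vector if the file cannot be opened.
std::vector<char> read_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}
```

Calling first_mismatch_run(read_file("errorfile_should"), read_file("errorfile_reality")) would be expected to report a length of about 16 for the failures described here. The run can come out shorter if a corrupted byte happens to equal the correct one.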
Expected Results: Non-error operation results in output that is only an
iteration count from 0 to 9.
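For readers without the attachment, the kind of check described above (write a buffer, read it back with read(2), compare) can be sketched as follows. This is an assumption-laden illustration, not the attached drt.cpp: the names make_pattern and write_and_verify and the byte pattern are invented here, and the sketch does not generate the heavy disk load or memory pressure that the report says is needed to trigger the failure.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: fills a buffer with a pattern that depends on
// the iteration number, so stale data from another pass is detectable.
std::vector<char> make_pattern(std::size_t size, unsigned iteration) {
    std::vector<char> buf(size);
    for (std::size_t i = 0; i < size; ++i)
        buf[i] = static_cast<char>((i * 31 + iteration) & 0xff);
    return buf;
}

// Hypothetical helper: writes `data` to `path`, reads it back with read(2),
// and returns true only if every byte matches. I/O errors also return false.
bool write_and_verify(const std::string& path, const std::vector<char>& data) {
    int fd = open(path.c_str(), O_CREAT | O_TRUNC | O_WRONLY, 0600);
    if (fd < 0) return false;
    std::size_t done = 0;
    while (done < data.size()) {  // write(2) may write fewer bytes than asked
        ssize_t n = write(fd, data.data() + done, data.size() - done);
        if (n <= 0) { close(fd); return false; }
        done += static_cast<std::size_t>(n);
    }
    close(fd);

    fd = open(path.c_str(), O_RDONLY);
    if (fd < 0) return false;
    std::vector<char> back(data.size());
    done = 0;
    while (done < back.size()) {  // read(2) may also return short counts
        ssize_t n = read(fd, back.data() + done, back.size() - done);
        if (n <= 0) break;
        done += static_cast<std::size_t>(n);
    }
    close(fd);
    return done == data.size() && back == data;
}
```

A driver would call write_and_verify(path, make_pattern(40 * 1024, i)) for i from 0 to 9 and print "read buffer does not match" whenever it returns false. On an unaffected kernel every iteration should pass.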
The behavior of the test app on all the various platforms I tried is the same
when I build it on RedHat 6.2 as when I build it on Debian. (I link statically.)
libaio does not exist in Red Hat 6.2.
Created attachment 75844 [details]
Test app that demonstrates problem
What version of the kernel is this with? At least one of the 2.2 errata
affected page cache I/O on SMP.
Created attachment 75845 [details]
The kernel is 2.2.14-5.0, and the error shows up on single- as well as
dual-processor systems.
That kernel is obsolete and vulnerable to the aforementioned bug. Upgrade and
reopen if the problem persists. This is not a bug in libaio, but in older
kernels for which errata were already released.
The problem did indeed disappear after upgrading to kernel 2.2.17-14.