From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 4.0; MSNDE; T312461)

Description of problem:
Under heavy disk system load, "read" calls sometimes return data containing a sequence of 16 wrong bytes. High memory usage and heap allocation also seem to be involved in reproducing the problem. I'm attaching an example app that shows the failure on each of the 3 RedHat 6.2 machines I have available, but does not fail on other Linux machines, including RedHat 7.3.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Build the attached drt.cpp (sample Makefile attached).
2. Run the "drt" app. On 2-processor systems it seems to fail more reliably when 2 copies of the app are running (in separate directories).

Actual Results:
The app prints text including "read buffer does not match", along with the number of the iteration (out of 10) that is running. Typically I see around 3 failure messages per run. When a failure is detected, the correct and the misread buffers (length 40KB) are written to the files errorfile_should and errorfile_reality. Comparing these files shows a sequence of 16 bytes in which they differ.

Expected Results:
A run without errors prints only the iteration count, from 0 to 9.

Additional info:
The behavior of the test app on all the platforms I tried is the same whether I build it on RedHat 6.2 or on Debian. (I link statically.)
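For readers without the attachment, the kind of check described above can be sketched roughly as follows. This is not the attached drt.cpp. The names `Mismatch`, `find_mismatch` and `write_then_read` are made up for illustration, and the sketch does not generate the disk and memory load that the report says is needed to trigger the failure. It only shows the write, read back, and compare step, and how the differing byte range could be located.

```cpp
#include <cstddef>
#include <vector>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper type: where the expected and the read-back buffer differ.
struct Mismatch {
    bool found;          // true if at least one byte differs
    std::size_t offset;  // index of the first differing byte
    std::size_t length;  // span from first to last differing byte, inclusive
};

// Compare the two buffers over their common length and report the
// range that covers every differing byte.
Mismatch find_mismatch(const std::vector<unsigned char>& should,
                       const std::vector<unsigned char>& reality) {
    Mismatch m = {false, 0, 0};
    std::size_t n = should.size() < reality.size() ? should.size() : reality.size();
    std::size_t last = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (should[i] != reality[i]) {
            if (!m.found) { m.found = true; m.offset = i; }
            last = i;
        }
    }
    if (m.found) m.length = last - m.offset + 1;
    return m;
}

// Write `data` to `path`, then read it back through read().
// Returns an empty vector on any I/O error.
std::vector<unsigned char> write_then_read(const char* path,
                                           const std::vector<unsigned char>& data) {
    std::vector<unsigned char> out;
    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);
    if (fd < 0) return out;

    std::size_t done = 0;
    while (done < data.size()) {  // write() may be partial, so loop
        ssize_t w = write(fd, &data[done], data.size() - done);
        if (w <= 0) { close(fd); return out; }
        done += static_cast<std::size_t>(w);
    }
    if (lseek(fd, 0, SEEK_SET) != 0) { close(fd); return out; }

    out.resize(data.size());
    done = 0;
    while (done < out.size()) {   // read() may be partial as well
        ssize_t r = read(fd, &out[done], out.size() - done);
        if (r <= 0) { out.clear(); break; }
        done += static_cast<std::size_t>(r);
    }
    close(fd);
    return out;
}
```

A caller would fill a 40KB buffer with a known pattern, pass it to `write_then_read`, and hand both buffers to `find_mismatch`. On an affected system the report suggests a result with `length` of 16. On a healthy system `found` should stay false.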
libaio does not exist in Red Hat 6.2.
Created attachment 75844 [details] Test app that demonstrates problem
What version of the kernel is this with? At least one of the 2.2 errata affected page cache I/O on SMP.
Created attachment 75845 [details] sample makefile
The kernel is 2.2.14-5.0, and the error shows up on single-processor as well as dual-processor machines.
That kernel is obsolete and vulnerable to the aforementioned bug. Upgrade and reopen if the problem persists. This is not a bug in libaio, but in older kernels for which errata were already released.
The problem did indeed disappear after upgrading to kernel 2.2.17-14.