Red Hat Bugzilla – Bug 5459
Data Corruption on raw transfers..
Last modified: 2008-05-01 11:37:51 EDT
I've written a simple, multi-process disk exerciser test
for our Compatibility lab. Both the ATA and SCSI sides use
this program and both are seeing data corruption on
sequential raw transfers.
Apparently, on some level the OS concatenates raw
sequential writes into more efficient 64K transfers. In
doing so, however, it appears to be corrupting data.
As an example, my test will make the following writes:
write 4 blocks starting at LBA 0
write 4 blocks starting at LBA 6
write 4 blocks starting at LBA 12
The logic analyzer will show one SCSI nexus of 128 blocks,
covering LBAs 0-127. The data in that transfer is
frequently corrupted, with sectors I did not write (e.g.,
LBA 5) being duplicated and overwriting an adjacent sector
(e.g. LBA 6).
I do not see this problem on Linux 5.x. Any help
appreciated. I can provide more information as needed.
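
For anyone trying to reproduce the write pattern described above, here is a minimal sketch, not the reporter's actual test program. It assumes 512-byte sectors and a pre-zeroed scratch target; the names `pattern`, `write_run`, `find_corruption` and `run_example` are made up for illustration. Each written block is stamped with its own LBA, so a sector that was duplicated over a neighbour shows up as a mismatch. Note that it reads back through the OS, so it can only catch corruption that is visible on read, not everything a logic analyzer would see on the bus.

```python
import os

BLOCK = 512  # assumed sector size in bytes


def pattern(lba: int) -> bytes:
    """Return a block whose every 4-byte word encodes its own LBA.

    The 0xA5 top byte keeps LBA 0 distinguishable from an all-zero sector
    (valid for LBAs below 2**24).
    """
    return (0xA5000000 | lba).to_bytes(4, "big") * (BLOCK // 4)


def write_run(fd: int, start_lba: int, count: int) -> None:
    """Write `count` self-identifying blocks starting at `start_lba`."""
    data = b"".join(pattern(start_lba + i) for i in range(count))
    os.pwrite(fd, data, start_lba * BLOCK)


def find_corruption(fd: int, written: set, total: int) -> list:
    """Return the LBAs whose contents differ from what the test expects.

    Written LBAs must hold their own pattern; all others are assumed to
    still hold zeros (i.e. the target was zero-filled beforehand).
    """
    bad = []
    for lba in range(total):
        got = os.pread(fd, BLOCK, lba * BLOCK)
        want = pattern(lba) if lba in written else bytes(BLOCK)
        if got != want:
            bad.append(lba)
    return bad


def run_example(path: str) -> list:
    """Write 4 blocks at LBA 0, 6 and 12, then check LBAs 0-15."""
    fd = os.open(path, os.O_RDWR)
    try:
        written = set()
        for start in (0, 6, 12):
            write_run(fd, start, 4)
            written.update(range(start, start + 4))
        os.fsync(fd)
        return find_corruption(fd, written, 16)
    finally:
        os.close(fd)
```

`run_example(path)` returns an empty list when every sector holds what was written, and the list of bad LBAs otherwise. Point `path` at a zero-filled scratch file or a disposable disk only, since the target is overwritten.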
assigned to dledford
We need to know the exact type of "raw" transfers you are referring
to. How does this application interface with the kernel to create
these raw transfers? Can you send the source to this program for
review? The 6.1 kernel has actual Raw I/O interfaces for this type of
thing, but they didn't exist in the 6.0 and earlier releases, so I hope
you understand my query about how you are doing "raw" transfers.
This should be fixed in 2.2.14. There was a race in scsi.h: the IO completion
function was modifying the size of the request _after_ completing it, and this
left open the potential for the request size of a subsequent IO request to be
Can the problem be reproduced on a 2.2.14 kernel?
The test simply open()s the device file for the disk, and forks a bunch of
processes that do read()s/write()s to that device. It was written for UNIX,
where the file open()ed would be the "raw" device filename. I incorrectly
assumed "sda" was a raw device interface, when apparently it is not (which
would explain why there was caching and concatenation of writes).
Nonetheless, the corruption, as verified by a SCSI logic analyzer, was occurring
using this method of data transfer, specifically sequential writes.
The problem seems to have been fixed with 2.2.14, however. If it was scsi.h,
why would we see the issue on our ATA drives as well?
Anyway, glad it's fixed.
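
For reference, the shape of test described above (open() the device, fork several processes, have each do sequential write()s and read()s) can be sketched roughly as follows. This is an illustrative reconstruction, not the actual lab program: the names `stamp`, `worker` and `exercise`, the 512-byte block size, and the choice to give each process its own disjoint region are all assumptions. As noted in the comment above, opening a block device node such as "sda" this way goes through the OS cache rather than a true raw interface.

```python
import os

BLOCK = 512  # assumed sector size in bytes


def stamp(lba: int, worker_id: int) -> bytes:
    """Block contents identifying both the LBA and the writing process."""
    word = bytes([0xA5, worker_id]) + lba.to_bytes(6, "big")
    return word * (BLOCK // 8)


def worker(path: str, worker_id: int, first_lba: int, count: int) -> int:
    """Sequentially write, then re-read, one region; return mismatch count."""
    fd = os.open(path, os.O_RDWR)
    try:
        for lba in range(first_lba, first_lba + count):
            os.pwrite(fd, stamp(lba, worker_id), lba * BLOCK)
        os.fsync(fd)
        return sum(
            os.pread(fd, BLOCK, lba * BLOCK) != stamp(lba, worker_id)
            for lba in range(first_lba, first_lba + count)
        )
    finally:
        os.close(fd)


def exercise(path: str, workers: int = 4, blocks_each: int = 32) -> int:
    """Fork `workers` children over disjoint regions; return how many failed."""
    pids = []
    for w in range(workers):
        pid = os.fork()
        if pid == 0:  # child process
            status = 1  # stays 1 if worker() raises
            try:
                bad = worker(path, w, w * blocks_each, blocks_each)
                status = 0 if bad == 0 else 1
            finally:
                os._exit(status)  # never return into the parent's code
        pids.append(pid)

    failed = 0
    for pid in pids:
        _, raw = os.waitpid(pid, 0)
        if not (os.WIFEXITED(raw) and os.WEXITSTATUS(raw) == 0):
            failed += 1
    return failed
```

`exercise(path)` returns 0 when every child read back exactly what it wrote. Run it against a scratch file or a disposable disk only, since the target is overwritten.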
It is possible that the other cleanups in the 2.2.14 ide code have made a