I've written a simple, multi-process disk exerciser test for our Compatibility lab. Both the AT and SCSI sides use this program, and both are seeing data corruption on sequential raw transfers. Apparently, at some level the OS concatenates raw sequential writes into more efficient 64K transfers. In doing so, however, it appears to be corrupting data. As an example, my test will make the following write requests:

write 4 blocks starting at LBA 0
write 4 blocks starting at LBA 6
write 4 blocks starting at LBA 12
....

The logic analyzer will show 1 SCSI nexus of 128 blocks, covering LBAs 0-127. The data in that transfer is frequently corrupted, with sectors I did not write (e.g., LBA 5) being duplicated and overwriting an adjacent sector (e.g., LBA 6). I do not see this problem on Linux 5.x. Any help appreciated. I can provide more information as needed.
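For anyone trying to reproduce this, the write pattern described above can be sketched roughly as follows. This is not the reporter's program: the 512-byte sector size, the self-describing sector layout, and all function names are assumptions made for illustration. The sketch tags every sector with its own LBA, issues 4-block writes at a stride of 6, and then checks that the gap sectors (LBA 4, 5, 10, 11, ...) still hold their background contents. It is shown against a scratch file so that it cannot damage a disk.

```python
import os
import tempfile

SECTOR = 512          # bytes per block (assumed; not stated in the report)
BLOCKS_PER_WRITE = 4  # from the report: "write 4 blocks"
STRIDE = 6            # from the report: LBA 0, 6, 12, ...


def sector_payload(lba, tag):
    """Build one sector whose contents encode a 2-byte tag and its own LBA."""
    return (tag + lba.to_bytes(8, "big")).ljust(SECTOR, b"\0")


def prefill(fd, n_sectors):
    """Write a known background pattern (tag b"BG") to every sector."""
    for lba in range(n_sectors):
        os.pwrite(fd, sector_payload(lba, b"BG"), lba * SECTOR)


def write_pattern(fd, n_writes):
    """Issue n_writes writes of 4 blocks each, starting at LBA 0, 6, 12, ..."""
    for i in range(n_writes):
        start = i * STRIDE
        buf = b"".join(
            sector_payload(start + k, b"WR") for k in range(BLOCKS_PER_WRITE)
        )
        if os.pwrite(fd, buf, start * SECTOR) != len(buf):
            raise OSError("short write at LBA %d" % start)


def find_corruption(fd, n_writes):
    """Return the LBAs whose contents differ from what should be there."""
    bad = []
    for lba in range(n_writes * STRIDE):
        # The first 4 sectors of each stride were written; the rest are gaps.
        tag = b"WR" if lba % STRIDE < BLOCKS_PER_WRITE else b"BG"
        if os.pread(fd, SECTOR, lba * SECTOR) != sector_payload(lba, tag):
            bad.append(lba)
    return bad


if __name__ == "__main__":
    with tempfile.TemporaryFile() as scratch:
        fd = scratch.fileno()
        prefill(fd, 8 * STRIDE)
        write_pattern(fd, 8)
        print("corrupted LBAs:", find_corruption(fd, 8))
```

Because each sector carries its own LBA, a duplicated sector such as the LBA 5 contents landing on LBA 6 shows up as a mismatch at the overwritten address.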
assigned to dledford
We need to know the exact type of "raw" transfers you are referring to. How does this application interface with the kernel to create these raw transfers? Can you send the source to this program for review? The 6.1 kernel has actual Raw I/O interfaces for this type of thing, but they didn't exist in the 6.0 and earlier releases, so I hope you understand my query about how you are doing "raw" transfers.
This should be fixed in 2.2.14. There was a race in scsi.h: the IO completion function was modifying the size of the request _after_ completing it, and this left open the potential for the request size of a subsequent IO request to be corrupted. Can the problem be reproduced on a 2.2.14 kernel?
The test simply open()s the device file for the disk, and forks a bunch of processes that do read()s/write()s to that device. It was written for UNIX, where the file open()ed would be the "raw" device filename. I incorrectly assumed "sda" was a raw device interface, when apparently it is not (which would explain why there was caching and concatenation of writes). Nonetheless, the corruption, as verified by a SCSI logic analyzer, was occurring using this method of data transfer, specifically sequential writes. The problem seems to have been fixed with 2.2.14, however. If it was scsi.h, why would we see the issue on our ATA drives as well? Anyway, glad it's fixed. Thanks
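The open()/fork()/read()/write() structure described above might look something like the sketch below. The actual test source was never posted, so the function names, the 512-byte sector size, and the per-process region layout are all hypothetical. Each forked child opens the target itself, writes a fill pattern to its own region, and reads it back. Note that on a buffered block device node a read-back like this may be served from the cache rather than the disk, which is presumably why a logic analyzer was needed to see the corruption on the wire.

```python
import os
import tempfile

SECTOR = 512  # bytes per block (assumed)


def worker(path, first_lba, n_sectors, seed):
    """Write then read back one region; return 0 on success, 1 on mismatch."""
    fd = os.open(path, os.O_RDWR)
    try:
        for lba in range(first_lba, first_lba + n_sectors):
            buf = bytes([seed]) * SECTOR
            os.pwrite(fd, buf, lba * SECTOR)
            if os.pread(fd, SECTOR, lba * SECTOR) != buf:
                return 1
        return 0
    finally:
        os.close(fd)


def run_exerciser(path, n_procs, sectors_per_proc):
    """Fork n_procs children on disjoint regions; return how many failed."""
    pids = []
    for i in range(n_procs):
        pid = os.fork()
        if pid == 0:
            # Child: never return into the parent's code path.
            status = 1
            try:
                status = worker(path, i * sectors_per_proc,
                                sectors_per_proc, i + 1)
            finally:
                os._exit(status)
        pids.append(pid)

    failures = 0
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        if not (os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0):
            failures += 1
    return failures


if __name__ == "__main__":
    # A scratch file stands in for the device node so nothing is destroyed.
    with tempfile.NamedTemporaryFile() as scratch:
        print("failed workers:", run_exerciser(scratch.name, 4, 16))
```

Pointing `path` at a real device node would overwrite its contents, so this should only be run against a scratch file or a disk set aside for testing.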
It is possible that the other cleanups in the 2.2.14 ide code have made a difference there.