Bug 5459 - Data Corruption on raw transfers..
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 6.0
Hardware: i386
OS: Linux
Target Milestone: ---
Assignee: Michael K. Johnson
QA Contact:
Depends On:
Reported: 1999-09-30 18:54 UTC by mdodge
Modified: 2008-05-01 15:37 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2000-02-09 16:36:47 UTC

Description mdodge 1999-09-30 18:54:49 UTC
I've written a simple, multi-process disk exerciser test
for our Compatibility lab.   Both the AT and SCSI side use
this program and both are seeing data corruption on
sequential raw xfers.

Apparently, at some level the OS concatenates raw
sequential writes into more efficient 64K transfers.  In
doing so, however, it appears to be corrupting data.

As an example, my test will make the following writes:
    write 4 blocks  starting at LBA 0
    write 4 blocks  starting at LBA 6
    write 4 blocks  starting at LBA 12

The logic analyzer will show 1 SCSI nexus of 128 blocks,
covering LBAs 0-128.   The data in that transfer is
frequently corrupted, with sectors I did not write (e.g.,
LBA 5) being duplicated and overwriting an adjacent sector
(e.g., LBA 6).

I do not see this problem on Red Hat Linux 5.x.   Any help
is appreciated.  I can provide more information as needed.
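
For reference, the write pattern described above can be sketched as follows. This is only an illustration, not the reporter's program: it assumes 512-byte blocks, uses a temporary file as a stand-in for the disk, and the names stamp, write_run, verify and run_pattern are hypothetical. Because it reads back through the page cache, it shows how unwritten gap sectors (LBAs 4-5 and 10-11) can be checked, but it would not by itself reveal corruption that is only visible on the bus.

```python
import os
import tempfile

BLOCK = 512  # assumed sector size in bytes


def stamp(lba):
    """Return one block whose contents encode its own LBA."""
    tag = b"LBA" + lba.to_bytes(5, "little")  # 8-byte tag, never all zeros
    return tag * (BLOCK // len(tag))


def write_run(fd, start_lba, count):
    """Write `count` self-describing blocks starting at `start_lba`."""
    data = b"".join(stamp(lba) for lba in range(start_lba, start_lba + count))
    if os.pwrite(fd, data, start_lba * BLOCK) != len(data):
        raise OSError("short write")


def verify(fd, written, total_blocks):
    """Return the LBAs whose contents differ from what is expected."""
    bad = []
    for lba in range(total_blocks):
        expected = stamp(lba) if lba in written else bytes(BLOCK)
        if os.pread(fd, BLOCK, lba * BLOCK) != expected:
            bad.append(lba)
    return bad


def run_pattern(path, total_blocks=16):
    """Zero the target, write 4 blocks at LBA 0, 6 and 12, then verify."""
    fd = os.open(path, os.O_RDWR)
    try:
        os.pwrite(fd, bytes(total_blocks * BLOCK), 0)  # known background
        written = set()
        for start in (0, 6, 12):
            write_run(fd, start, 4)
            written.update(range(start, start + 4))
        os.fsync(fd)  # push the writes down to the device
        return verify(fd, written, total_blocks)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as tmp:
        print("mismatched LBAs:", run_pattern(tmp.name))
```

Stamping each block with its own LBA makes a duplicated sector easy to spot: a block found at LBA 6 that carries the tag for LBA 5 points at exactly the kind of overwrite described above.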

Comment 1 Cristian Gafton 1999-10-06 22:47:59 UTC
assigned to dledford

Comment 2 Doug Ledford 1999-10-07 02:14:59 UTC
We need to know the exact type of "raw" transfers you are referring
to.  How does this application interface with the kernel to create
these raw transfers?  Can you send the source to this program for
review?  The 6.1 kernel has actual Raw I/O interfaces for this type of
thing, but they didn't exist in the 6.0 and earlier releases, so I hope
you understand my query about how you are doing "raw" transfers.

Comment 3 Stephen Tweedie 2000-02-08 13:34:59 UTC
This should be fixed in 2.2.14.  There was a race in scsi.h: the IO completion
function was modifying the size of the request _after_ completing it, and this
left open the potential for the request size of a subsequent IO request to be

Can the problem be reproduced on a 2.2.14 kernel?

Comment 4 mdodge 2000-02-09 03:40:59 UTC
The test simply open()s the device file for the disk, and forks a bunch of
processes that do read()s/write()s to that device.  It was written for UNIX,
where the file open()ed would be the "raw" device filename.  I incorrectly
assumed "sda" was a raw device interface, when apparently it is not (which
would explain why there was caching and concatenation of writes).
Nonetheless, the corruption, as verified by a SCSI logic analyzer, was occurring
using this method of data transfer, specifically sequential writes.

The problem seems to have been fixed with 2.2.14, however.  If it was scsi.h,
why would we see the issue on our ATA drives as well?

Anyway, glad it's fixed.
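
For anyone trying to reproduce this, the open()/fork()/read()/write() structure described above looks roughly like the sketch below. It is a minimal illustration under stated assumptions, not the original test: the block size, the per-worker region size, and the names pattern, worker and exercise are all hypothetical, and a temporary file stands in for the disk device. Pointing it at a block device such as "sda" would go through the buffer cache, as noted above, and would overwrite the disk contents.

```python
import os
import tempfile

BLOCK = 512            # assumed sector size in bytes
BLOCKS_PER_WORKER = 8  # hypothetical region size for each process


def pattern(index, lba):
    """Return one block tagged with the worker index and the LBA."""
    tag = b"W" + bytes([index]) + lba.to_bytes(6, "little")  # 8 bytes
    return tag * (BLOCK // len(tag))


def worker(path, index):
    """Child body: write this worker's region sequentially, then read it back.

    Returns 0 if every block reads back intact, 1 otherwise.
    """
    first = index * BLOCKS_PER_WORKER
    fd = os.open(path, os.O_RDWR)
    try:
        for lba in range(first, first + BLOCKS_PER_WORKER):
            os.pwrite(fd, pattern(index, lba), lba * BLOCK)
        os.fsync(fd)
        for lba in range(first, first + BLOCKS_PER_WORKER):
            if os.pread(fd, BLOCK, lba * BLOCK) != pattern(index, lba):
                return 1
        return 0
    finally:
        os.close(fd)


def exercise(path, workers=4):
    """Fork `workers` children against `path`; return how many failed."""
    pids = []
    for index in range(workers):
        pid = os.fork()
        if pid == 0:
            code = 1  # reported if the worker raises
            try:
                code = worker(path, index)
            finally:
                os._exit(code)  # never fall back into the parent's code
        pids.append(pid)
    failures = 0
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        if not (os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0):
            failures += 1
    return failures


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as tmp:
        print("failed workers:", exercise(tmp.name))
```

Each worker owns a disjoint region, so any mismatch on read-back points at the I/O path rather than at two processes racing on the same blocks.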


Comment 5 Stephen Tweedie 2000-02-09 16:36:59 UTC
It is possible that the other cleanups in the 2.2.14 ide code have made a
difference there.
