Bug 5459 - Data Corruption on raw transfers..
Status: CLOSED CURRENTRELEASE
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 6.0
Hardware: i386 Linux
Priority: high Severity: high
Assigned To: Michael K. Johnson
Reported: 1999-09-30 14:54 EDT by mdodge
Modified: 2008-05-01 11:37 EDT (History)
Doc Type: Bug Fix
Last Closed: 2000-02-09 11:36:47 EST
Description mdodge 1999-09-30 14:54:49 EDT
I've written a simple, multi-process disk exerciser test
for our Compatibility lab.   Both the AT and SCSI side use
this program and both are seeing data corruption on
sequential raw xfers.

Apparently, on some level the OS concatenates raw
sequential writes into more efficient 64K transfers.  In
doing so, however, it appears to be corrupting data.

As an example, my test will make the following write
requests:
    write 4 blocks  starting at LBA 0
    write 4 blocks  starting at LBA 6
    write 4 blocks  starting at LBA 12
    ....

The logic analyzer will show one SCSI nexus of 128 blocks,
covering LBAs 0-127.   The data in that transfer is
frequently corrupted, with sectors I did not write (e.g.,
LBA 5) being duplicated and overwriting an adjacent sector
(e.g. LBA 6).
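
A minimal sketch of the write pattern and the check described above, run against an ordinary file rather than a disk. The reporter's test program does not appear in this report, so everything here is an assumption made for illustration: the 512-byte sector size, the LBA-stamped payload, the zero-filled target, and all function names.

```python
import os
import tempfile

SECTOR = 512   # bytes per block; assumed, the report does not state it
RUN = 4        # blocks per write request (from the report)
STRIDE = 6     # distance between request start LBAs (from the report)


def stamp(lba: int) -> bytes:
    """Return one sector filled with the sector's own LBA as 16 hex digits."""
    return (b"%016x" % lba) * (SECTOR // 16)


def requests(total_blocks: int):
    """Yield (start_lba, count): 4 blocks at LBA 0, 6, 12, ..."""
    for start in range(0, total_blocks - RUN + 1, STRIDE):
        yield start, RUN


def write_pattern(fd: int, total_blocks: int) -> set:
    """Issue the writes and return the set of LBAs that were written."""
    written = set()
    for start, count in requests(total_blocks):
        data = b"".join(stamp(lba) for lba in range(start, start + count))
        os.pwrite(fd, data, start * SECTOR)
        written.update(range(start, start + count))
    return written


def verify(fd: int, total_blocks: int, written: set) -> list:
    """Return LBAs whose contents differ from what the test expects.

    Written sectors must carry their own stamp; skipped sectors
    (e.g. LBA 4 and 5) must still be zero, assuming the target was
    zero-filled before the run.
    """
    bad = []
    for lba in range(total_blocks):
        got = os.pread(fd, SECTOR, lba * SECTOR)
        want = stamp(lba) if lba in written else bytes(SECTOR)
        if got != want:
            bad.append(lba)
    return bad
```

With this payload, a skipped sector duplicated over its neighbour (LBA 5 copied onto LBA 6) shows up as LBA 6 in the list returned by verify().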

I do not see this problem on Red Hat Linux 5.x.   Any help
appreciated.  I can provide more information as needed.
Comment 1 Cristian Gafton 1999-10-06 18:47:59 EDT
assigned to dledford
Comment 2 Doug Ledford 1999-10-06 22:14:59 EDT
We need to know the exact type of "raw" transfers you are referring
to.  How does this application interface with the kernel to create
these raw transfers?  Can you send the source to this program for
review?  The 6.1 kernel has actual Raw I/O interfaces for this type of
thing, but they didn't exist in the 6.0 and earlier releases, so I hope
you understand my query about how you are doing "raw" transfers.
Comment 3 Stephen Tweedie 2000-02-08 08:34:59 EST
This should be fixed in 2.2.14.  There was a race in scsi.h: the IO completion
function was modifying the size of the request _after_ completing it, and this
left open the potential for the request size of a subsequent IO request to be
corrupted.
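
The kind of bug described above (updating a request after it has been completed and handed back for reuse) can be illustrated with a small model. This is not the kernel code: the Request and Pool classes, the function names, and the single-slot free list are all hypothetical, and the "other context" is forced to run through a callback rather than by real concurrency, so the outcome is deterministic.

```python
class Request:
    """Hypothetical stand-in for an I/O request with a size field."""

    def __init__(self):
        self.nr_sectors = 0


class Pool:
    """Single-slot free list, so a completed request is reused at once."""

    def __init__(self):
        self.free = [Request()]

    def get(self, nr_sectors):
        req = self.free.pop()
        req.nr_sectors = nr_sectors
        return req

    def put(self, req):
        self.free.append(req)


def complete_buggy(pool, req, done, window):
    pool.put(req)            # request completed; it may now be reused
    window()                 # another context runs inside the race window
    req.nr_sectors -= done   # late update lands on the *new* owner's size


def complete_fixed(pool, req, done, window):
    req.nr_sectors -= done   # finish all bookkeeping first
    pool.put(req)            # only then release the request
    window()


def run(complete):
    """Complete an 8-sector request while a 128-sector one is submitted."""
    pool = Pool()
    first = pool.get(8)
    holder = {}

    def window():
        holder["next"] = pool.get(128)

    complete(pool, first, 8, window)
    return holder["next"].nr_sectors


if __name__ == "__main__":
    print(run(complete_buggy))   # 120: the second request's size was altered
    print(run(complete_fixed))   # 128: the second request is intact
```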

Can the problem be reproduced on a 2.2.14 kernel?
Comment 4 mdodge 2000-02-08 22:40:59 EST
The test simply open()s the device file for the disk, and forks a bunch of
processes that do read()s/write()s to that device.  It was written for UNIX,
where the file open()ed would be the "raw" device filename.  I incorrectly
assumed "sda" was a raw device interface, when apparently it is not.  (Which
would explain why there was caching and concatenation of writes.)
Nonetheless, the corruption, as verified by a SCSI logic analyzer, was occurring
using this method of data transfer, specifically sequential writes.
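
A rough sketch of the open()/fork()/write() structure described above. It is not the reporter's program: the function names, the 512-byte sector size, the per-process regions, and the stamp payload are assumptions for illustration. It targets a path supplied by the caller; pointing it at a real disk device would overwrite that disk.

```python
import os
import tempfile

SECTOR = 512  # bytes per block; assumed


def worker(path, first_lba, count):
    """Write and read back `count` stamped sectors; return 0 on success."""
    fd = os.open(path, os.O_RDWR)
    try:
        for lba in range(first_lba, first_lba + count):
            block = (b"%016x" % lba) * (SECTOR // 16)
            os.pwrite(fd, block, lba * SECTOR)
            if os.pread(fd, SECTOR, lba * SECTOR) != block:
                return 1
        return 0
    finally:
        os.close(fd)


def exercise(path, nproc, blocks_per_proc):
    """Fork `nproc` workers on disjoint regions; return the failure count."""
    pids = []
    for i in range(nproc):
        pid = os.fork()
        if pid == 0:
            # Child: always leave through os._exit so that none of the
            # parent's cleanup code runs a second time.
            status = 1
            try:
                status = worker(path, i * blocks_per_proc, blocks_per_proc)
            finally:
                os._exit(status)
        pids.append(pid)

    failures = 0
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        if not (os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0):
            failures += 1
    return failures


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        print("failed workers:", exercise(path, 4, 16))
    finally:
        os.unlink(path)
```

On a buffered device node the read-back in worker() may be satisfied from cache rather than from the disk, so a check of this kind would not necessarily catch corruption that is visible only on the bus.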

The problem seems to have been fixed with 2.2.14, however.  If it was scsi.h,
why would we see the issue on our ATA drives as well?

Anyway, glad it's fixed.


Thanks
Comment 5 Stephen Tweedie 2000-02-09 11:36:59 EST
It is possible that the other cleanups in the 2.2.14 ide code have made a
difference there.
