Bug 5459 – Data Corruption on raw transfers.

Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	kernel
Version:	6.0
Hardware:	i386
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Assignee:	Michael K. Johnson

Reported:	1999-09-30 18:54 UTC by mdodge
Modified:	2008-05-01 15:37 UTC
CC List:	1 user
Last Closed:	2000-02-09 16:36:47 UTC

Description mdodge 1999-09-30 18:54:49 UTC

I've written a simple, multi-process disk exerciser test
for our Compatibility lab.  Both the ATA and SCSI sides use
this program, and both are seeing data corruption on
sequential raw transfers.

Apparently, at some level the OS concatenates sequential
raw writes into more efficient 64K transfers.  In
doing so, however, it appears to be corrupting data.

As an example, my test will make the following write
requests:
    write 4 blocks  starting at LBA 0
    write 4 blocks  starting at LBA 6
    write 4 blocks  starting at LBA 12
    ....

The logic analyzer will show 1 SCSI nexus of 128 blocks,
covering LBAs 0-127.  The data in that transfer is
frequently corrupted, with sectors I did not write (e.g.,
LBA 5) being duplicated and overwriting an adjacent sector
(e.g., LBA 6).
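
For illustration, the write pattern and the symptom described above can be sketched roughly as follows. This is not the reporter's test program (its source does not appear in this report). The 512-byte block size, the function names, and the use of a scratch file in place of a device node are all assumptions made for the sketch. Each sector is filled with a tag derived from its own LBA, so a sector that lands at the wrong address can be identified.

```python
import os
import tempfile

SECTOR = 512          # assumed block size in bytes
BLOCKS_PER_WRITE = 4  # "write 4 blocks" from the description
STRIDE = 6            # writes start at LBA 0, 6, 12, ...


def make_sector(lba):
    """Return a sector filled with a tag derived from its LBA (hypothetical helper)."""
    return (lba + 1).to_bytes(8, "little") * (SECTOR // 8)


def write_pattern(fd, total_blocks):
    """Write 4 blocks at every 6th LBA; return the set of LBAs written."""
    written = set()
    for start in range(0, total_blocks - BLOCKS_PER_WRITE + 1, STRIDE):
        data = b"".join(make_sector(start + i) for i in range(BLOCKS_PER_WRITE))
        os.pwrite(fd, data, start * SECTOR)
        written.update(range(start, start + BLOCKS_PER_WRITE))
    return written


def find_bad_sectors(fd, total_blocks, written):
    """Return LBAs whose contents differ from what was (or was not) written."""
    bad = []
    for lba in range(total_blocks):
        data = os.pread(fd, SECTOR, lba * SECTOR)
        # Unwritten sectors (e.g. LBA 4 and 5) are expected to stay zero-filled.
        expected = make_sector(lba) if lba in written else bytes(SECTOR)
        if data != expected:
            bad.append(lba)
    return bad


def run_check(total_blocks=128):
    """Run the pattern against a zero-filled scratch file and verify it."""
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        os.ftruncate(fd, total_blocks * SECTOR)
        written = write_pattern(fd, total_blocks)
        return find_bad_sectors(fd, total_blocks, written)
```

On a healthy system run_check() should report no bad sectors. The corruption described in this report would show up as LBA 6 holding the contents expected at LBA 5.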

I do not see this problem on Red Hat Linux 5.x.  Any help
appreciated.  I can provide more information as needed.

Comment 1 Cristian Gafton 1999-10-06 22:47:59 UTC

assigned to dledford

Comment 2 Doug Ledford 1999-10-07 02:14:59 UTC

We need to know the exact type of "raw" transfers you are referring
to.  How does this application interface with the kernel to create
these raw transfers?  Can you send the source to this program for
review?  The 6.1 kernel has actual Raw I/O interfaces for this type of
thing, but they didn't exist in the 6.0 and earlier releases, so I hope
you understand my query about how you are doing "raw" transfers.

Comment 3 Stephen Tweedie 2000-02-08 13:34:59 UTC

This should be fixed in 2.2.14.  There was a race in scsi.h: the IO completion
function was modifying the size of the request _after_ completing it, and this
left open the potential for the request size of a subsequent IO request to be
corrupted.

Can the problem be reproduced on a 2.2.14 kernel?
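
The class of bug described in this comment can be modelled in a few lines. The sketch below is a deterministic simulation in Python, not the kernel code: Request, RequestPool, and both completion functions are hypothetical names, and the submit_next callback stands in for another context reusing the request at the worst possible moment. It only shows why updating the size after completion can corrupt the next request.

```python
from dataclasses import dataclass


@dataclass
class Request:
    nr_sectors: int = 0
    in_use: bool = False


class RequestPool:
    """A one-slot pool, so a released request is reused immediately."""

    def __init__(self):
        self.slot = Request()

    def allocate(self, nr_sectors):
        assert not self.slot.in_use
        self.slot.nr_sectors = nr_sectors
        self.slot.in_use = True
        return self.slot

    def release(self, req):
        req.in_use = False


def complete_buggy(pool, req, done, submit_next):
    pool.release(req)       # the request is completed and handed back first
    submit_next()           # another context reuses the same request here
    req.nr_sectors -= done  # the late update now hits the new request


def complete_fixed(pool, req, done, submit_next):
    req.nr_sectors -= done  # finish all bookkeeping while we still own it
    pool.release(req)
    submit_next()


def run(complete):
    """Complete an 8-sector request, then return the size of the next one."""
    pool = RequestPool()
    first = pool.allocate(8)
    state = {}

    def submit_next():
        state["next"] = pool.allocate(128)

    complete(pool, first, 8, submit_next)
    return state["next"].nr_sectors
```

In this model the second request is submitted with 128 sectors. With complete_buggy it ends up with 120, while complete_fixed leaves it at 128.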

Comment 4 mdodge 2000-02-09 03:40:59 UTC

The test simply open()s the device file for the disk and forks a bunch of
processes that do read()s/write()s to that device.  It was written for UNIX,
where the file open()ed would be the "raw" device filename.  I incorrectly
assumed "sda" was a raw device interface, when apparently it is not (which
would explain why there was caching and concatenation of writes).
Nonetheless, the corruption, as verified by a SCSI logic analyzer, was occurring
using this method of data transfer, specifically sequential writes.
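
As a rough illustration of the test structure described above (open the target, fork several processes, have each do writes and reads), here is a minimal sketch in Python. It is not the original program. The 512-byte block size, the worker layout (one disjoint region per process), and all function names are assumptions. Opening a block device node such as "sda" this way goes through the buffer cache, as noted above, so the example is meant to be pointed at a scratch file.

```python
import os
import tempfile

SECTOR = 512  # assumed block size in bytes


def pattern(lba):
    """Return a sector filled with a tag derived from its LBA (hypothetical helper)."""
    return (lba + 1).to_bytes(8, "little") * (SECTOR // 8)


def worker(path, index, blocks_per_worker):
    """Write this worker's region, read it back, return 0 if it verifies."""
    fd = os.open(path, os.O_RDWR)
    try:
        base = index * blocks_per_worker
        for lba in range(base, base + blocks_per_worker):
            os.pwrite(fd, pattern(lba), lba * SECTOR)
        for lba in range(base, base + blocks_per_worker):
            if os.pread(fd, SECTOR, lba * SECTOR) != pattern(lba):
                return 1
        return 0
    finally:
        os.close(fd)


def exercise(path, workers=4, blocks_per_worker=32):
    """Fork the workers and return how many of them reported a failure."""
    pids = []
    for i in range(workers):
        pid = os.fork()
        if pid == 0:
            code = 2  # reported if the worker raises before returning
            try:
                code = worker(path, i, blocks_per_worker)
            finally:
                os._exit(code)  # never fall back into the parent's code path
        pids.append(pid)
    failures = 0
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        if not (os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0):
            failures += 1
    return failures
```

A call such as exercise("/tmp/scratch.img") on a pre-sized scratch file should return 0 when every worker reads back exactly what it wrote.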

The problem seems to have been fixed with 2.2.14, however.  If it was scsi.h,
why would we see the issue on our ATA drives as well?

Anyway, glad it's fixed.


Thanks

Comment 5 Stephen Tweedie 2000-02-09 16:36:59 UTC

It is possible that the other cleanups in the 2.2.14 ide code have made a
difference there.
