
Bug 114941

Summary: Data corruption on SCSI disks
Product: Red Hat Enterprise Linux 3
Component: kernel
Version: 3.0
Hardware: All
OS: Linux
Status: CLOSED DUPLICATE
Severity: medium
Priority: medium
Reporter: Martin Peschke <mpeschke>
Assignee: Doug Ledford <dledford>
QA Contact: Brian Brock <bbrock>
CC: mf, petrides, riel, salm, zaitcev
Doc Type: Bug Fix
Last Closed: 2006-02-21 19:01:05 UTC

Description Martin Peschke 2004-02-04 18:32:51 UTC
Description of problem:
We have been running a disk exerciser (blast, an IBM-internal tool
that can verify written data) on 40 LUNs attached through one adapter
driven by zfcp. Within a few hours, verification of sectors written to
some disks failed: the data read back did not match the data
previously written.
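
For readers without access to blast, the write-then-verify idea can be sketched roughly as follows. This is not blast and makes no claim about how blast works; the block size, the pattern scheme, and the function names are all assumptions chosen for illustration. On a real block device the read-back may also be served from the page cache rather than from the disk, which this sketch does not address.

```python
import os
import tempfile

BLOCK_SIZE = 512  # assumed sector size, for illustration only


def pattern_for(block_index, block_size=BLOCK_SIZE):
    """Return a deterministic block whose content encodes its own index."""
    stamp = block_index.to_bytes(8, "big")
    return (stamp * (block_size // len(stamp) + 1))[:block_size]


def write_blocks(path, block_count):
    """Write one pattern block per index to an existing file or device."""
    with open(path, "r+b") as dev:
        for i in range(block_count):
            dev.seek(i * BLOCK_SIZE)
            dev.write(pattern_for(i))
        dev.flush()
        os.fsync(dev.fileno())


def verify_blocks(path, block_count):
    """Return the indices of blocks that differ from what was written."""
    failed = []
    with open(path, "rb") as dev:
        for i in range(block_count):
            dev.seek(i * BLOCK_SIZE)
            if dev.read(BLOCK_SIZE) != pattern_for(i):
                failed.append(i)
    return failed
```

Because each block carries its own index, a block that lands in the wrong place or comes back with stale content shows up in the list returned by verify_blocks.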

Version-Release number of selected component (if applicable):
2.4.21-EL.9

How reproducible:
Almost 100% (tried 3-4 times)

Steps to Reproduce:
1. Load scsi_mod, zfcp, sd_mod
2. Configure 40 disks by means of add-single-device
3. Start the disk exerciser
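
Step 2 can be scripted. The sketch below only builds the command lines that the 2.4 kernel accepts on /proc/scsi/scsi ("scsi add-single-device <host> <channel> <id> <lun>"); the host, channel, and target numbers are assumptions and do not reflect the actual zfcp mapping used in this setup. The attach helper is hypothetical and needs root.

```python
def add_single_device_commands(host, channel, target, luns):
    """Build one 'scsi add-single-device' line per LUN."""
    return [
        f"scsi add-single-device {host} {channel} {target} {lun}"
        for lun in luns
    ]


def attach(commands, proc_path="/proc/scsi/scsi"):
    """Write each command to the SCSI proc interface (requires root)."""
    for command in commands:
        with open(proc_path, "w") as proc:
            proc.write(command + "\n")


if __name__ == "__main__":
    # Assumed addressing: host 0, channel 0, target 1, LUNs 0..39.
    for line in add_single_device_commands(0, 0, 1, range(40)):
        print(line)
```

Running the script only prints the 40 lines; calling attach() on the list would actually submit them to the kernel.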

Additional info:
I can attach blast reports about failed sectors, if requested.

Comment 1 Martin Peschke 2004-02-06 16:22:43 UTC
The problem was most likely caused by a missing scsi_eh thread. This
kernel thread is essential for SCSI I/O: without proper error
recovery, the outcome of SCSI commands that need recovery seems to be
unpredictable, and the system seems to be silently (at the default
logging level) railroaded into data corruption. The eh thread was not
created because not a single SCSI device/host was available when the
modules were loaded. I have just realized that our setup only included
devices/hosts that were added on the fly via procfs. A first re-test
with at least one device per host available when the SCSI modules were
loaded has not shown any data corruption so far. I will close this
Bugzilla entry as a duplicate of either 112426 or 106214 if the
problem does not occur again.
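
Before starting an exerciser run, the presence of the error-handler thread could be checked with something like the sketch below. It assumes the thread's command name starts with "scsi_eh" (for example scsi_eh_0) and that /proc/<pid>/stat carries the command name in parentheses; both helper names are made up for illustration.

```python
import os


def thread_names(proc_root="/proc"):
    """Collect command names of all processes listed under /proc."""
    names = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as stat:
                content = stat.read()
        except OSError:
            continue  # process exited while we were scanning
        # The command name is the parenthesised second field.
        start, end = content.find("("), content.rfind(")")
        if start != -1 and end > start:
            names.append(content[start + 1:end])
    return names


def has_scsi_eh_thread(names):
    """True if any name looks like a SCSI error-handler thread (assumed prefix)."""
    return any(name.startswith("scsi_eh") for name in names)
```

A wrapper script could refuse to start the exerciser when has_scsi_eh_thread(thread_names()) is false.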

Comment 2 Martin Peschke 2004-02-09 22:02:57 UTC

*** This bug has been marked as a duplicate of 112426 ***

Comment 3 Red Hat Bugzilla 2006-02-21 19:01:05 UTC
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.