Red Hat Bugzilla – Bug 114941
Data corruption on SCSI disks
Last modified: 2007-11-30 17:07:00 EST
Description of problem:
We have been running disk exercisers (called blast, which is an IBM
internal tool that is capable of verifying written data) on 40 LUNs
attached via 1 adapter driven by zfcp. Within a few hours verification
of sectors written to some disks failed. The read data did
unexpectedly not equal to the previously written data.
Version-Release number of selected component (if applicable):
almost 100 %, did it 3-4 times
Steps to Reproduce:
1. Load scsi_mod, zfcp, sd_mod
2. configure 40 disks by means of add-single-device
3. start up disk exerciser
I can attach blast reports about failed sectors, if requested.
Problem was most-likely caused by a missing scsi_eh thread. This
kernel thread is essential for SCSI I/O. Without proper recovery, the
result of to be recovered SCSI commands seems to be unpredictable. The
system seems to be silently (with default logging level) railroaded
into data corruption. The eh-thread was not created because there was
not a single scsi device/host available when loading modules. I have
just realized that our setup only included devices/hosts which were
added on-the-fly via proc-fs. A first re-test with at least one device
per host being available when loading SCSI modules has not shown any
data corruption so far. I will close this bugzilla entry as duplicate
to either 112426 or 106214 if the problem does not occur again.
*** This bug has been marked as a duplicate of 112426 ***
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.