Description of problem:
We have been running a disk exerciser (blast, an IBM internal tool that is capable of verifying written data) on 40 LUNs attached via one adapter driven by zfcp. Within a few hours, verification of sectors written to some disks failed: the data read back did not match the data previously written.

Version-Release number of selected component (if applicable):
2.4.21-EL.9

How reproducible:
Almost 100%; tried 3-4 times.

Steps to Reproduce:
1. Load scsi_mod, zfcp, sd_mod
2. Configure 40 disks by means of add-single-device
3. Start up the disk exerciser

Additional info:
I can attach blast reports about failed sectors, if requested.
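For readers without access to blast, the kind of check that failed can be illustrated with a minimal write-then-verify sketch. This is not blast and makes no claim about how blast works; the names `pattern_for`, `write_sectors`, `verify_sectors` and the 512-byte `SECTOR_SIZE` are assumptions made for illustration. It also operates on an ordinary file, so reads may be served from the page cache; a real exerciser would presumably go to the device itself.

```python
import hashlib
import os

SECTOR_SIZE = 512  # assumed sector size for this sketch


def pattern_for(sector):
    """Return a deterministic pattern that is specific to one sector number.

    Making the pattern depend on the sector number means data that lands
    in the wrong sector is detected as well as data that is simply wrong.
    """
    seed = hashlib.sha256(sector.to_bytes(8, "big")).digest()  # 32 bytes
    return (seed * (SECTOR_SIZE // len(seed)))[:SECTOR_SIZE]


def write_sectors(path, count):
    """Write `count` patterned sectors to `path` and flush them to storage."""
    with open(path, "wb") as f:
        for sector in range(count):
            f.write(pattern_for(sector))
        f.flush()
        os.fsync(f.fileno())


def verify_sectors(path, count):
    """Read the sectors back and return the numbers of those that mismatch."""
    bad = []
    with open(path, "rb") as f:
        for sector in range(count):
            if f.read(SECTOR_SIZE) != pattern_for(sector):
                bad.append(sector)
    return bad
```

An empty list from `verify_sectors` means every sector read back as written; in the failure described above, the equivalent check in blast reported mismatching sectors.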
The problem was most likely caused by a missing scsi_eh thread. This kernel thread is essential for SCSI I/O. Without proper recovery, the outcome of SCSI commands that need recovery seems to be unpredictable, and the system seems to be silently (at the default logging level) railroaded into data corruption. The eh thread was not created because not a single SCSI device/host was available when the modules were loaded. I have just realized that our setup only included devices/hosts that were added on the fly via procfs. A first re-test with at least one device per host available when loading the SCSI modules has not shown any data corruption so far. I will close this bugzilla entry as a duplicate of either 112426 or 106214 if the problem does not occur again.
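A quick way to check for this condition before starting I/O is to look for the error handler thread in the process list. The sketch below is an assumption-laden illustration, not part of the original analysis: it assumes the thread's command name starts with `scsi_eh` (as in `scsi_eh_0`) and that `/proc/<pid>/stat` carries the command name in parentheses as its second field. The helper names `comm_from_stat`, `has_scsi_eh` and `running_names` are hypothetical.

```python
import os


def comm_from_stat(stat_line):
    """Extract the command name, the parenthesised second field of a stat line."""
    start = stat_line.index("(") + 1
    end = stat_line.rindex(")")  # rindex: the name itself may contain ")"
    return stat_line[start:end]


def has_scsi_eh(names):
    """Return True if any name looks like a SCSI error handler thread.

    The "scsi_eh" prefix is an assumption about the thread naming.
    """
    return any(name.startswith("scsi_eh") for name in names)


def running_names(proc="/proc"):
    """Collect the command names of all processes listed under `proc`."""
    names = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                names.append(comm_from_stat(f.read()))
        except (OSError, ValueError):
            continue  # process exited meanwhile, or unexpected format
    return names
```

On a Linux system, `has_scsi_eh(running_names())` returning False after the SCSI modules are loaded would match the situation described above, in which no error handler thread had been created.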
*** This bug has been marked as a duplicate of 112426 ***
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.