114941 – Data corruption on SCSI disks

Keywords:
Status:	CLOSED DUPLICATE of bug 112426
Alias:	None
Product:	Red Hat Enterprise Linux 3
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	3.0
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Doug Ledford
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2004-02-04 18:32 UTC by Martin Peschke
Modified:	2007-11-30 22:07 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2006-02-21 19:01:05 UTC
Target Upstream Version:
Embargoed:

Description Martin Peschke 2004-02-04 18:32:51 UTC

Description of problem:
We have been running a disk exerciser (blast, an IBM-internal tool
that can verify written data) on 40 LUNs attached via one adapter
driven by zfcp. Within a few hours, verification of sectors written to
some disks failed: the data read back did not match the data
previously written.
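
For readers without access to blast, the write-then-verify idea can be
sketched roughly as below. This is not blast and makes no claim about
how blast works: the 512-byte sector size, the pattern layout, and all
function names are assumptions chosen for illustration. Note that
reading back through the page cache may never touch the disk, so a
real exerciser would need to bypass or invalidate the cache.

```python
import os

SECTOR_SIZE = 512  # assumed sector size, not taken from the report


def make_pattern(sector_no, seed=0):
    """Build a deterministic sector-sized pattern tagged with the sector number."""
    header = sector_no.to_bytes(8, "big") + seed.to_bytes(8, "big")
    # Repeat the 16-byte header until the sector is full.
    return (header * (SECTOR_SIZE // len(header)))[:SECTOR_SIZE]


def write_sectors(path, count, seed=0):
    """Write the pattern to the first `count` sectors of an existing file or device."""
    with open(path, "r+b") as f:
        for n in range(count):
            f.seek(n * SECTOR_SIZE)
            f.write(make_pattern(n, seed))
        f.flush()
        os.fsync(f.fileno())  # push the data out of user-space and kernel buffers


def verify_sectors(path, count, seed=0):
    """Return the sector numbers whose content differs from what was written."""
    bad = []
    with open(path, "rb") as f:
        for n in range(count):
            f.seek(n * SECTOR_SIZE)
            if f.read(SECTOR_SIZE) != make_pattern(n, seed):
                bad.append(n)
    return bad
```

Tagging each sector with its own number means that a sector that ends
up with another sector's data is reported as a mismatch, not only a
sector with damaged bytes.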

Version-Release number of selected component (if applicable):
2.4.21-EL.9

How reproducible:
Almost 100%; tried 3-4 times.

Steps to Reproduce:
1. Load scsi_mod, zfcp, sd_mod
2. Configure 40 disks by means of add-single-device
3. Start up the disk exerciser
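
Step 2 can be sketched as follows, assuming the usual 2.4 procfs
interface in which the line "scsi add-single-device <host> <channel>
<id> <lun>" is written to /proc/scsi/scsi. The host, channel, id, and
LUN numbers used here are hypothetical, as are the function names; the
actual addressing of the 40 LUNs is not given in this report.

```python
PROC_SCSI = "/proc/scsi/scsi"  # procfs control file of the 2.4 SCSI mid-layer


def add_single_device_commands(host, channel, target, luns):
    """Build one add-single-device command line per LUN."""
    return [
        f"scsi add-single-device {host} {channel} {target} {lun}"
        for lun in luns
    ]


def apply_commands(commands, proc_path=PROC_SCSI):
    """Write each command to the procfs file (requires root on a real system)."""
    for cmd in commands:
        # One write per command: open and close the file each time.
        with open(proc_path, "w") as f:
            f.write(cmd + "\n")


# Hypothetical addressing: 40 LUNs behind host 0, channel 0, target 1.
commands = add_single_device_commands(0, 0, 1, range(40))
```

Calling apply_commands(commands) would then register the devices; it
is deliberately left out of the sketch so that nothing is written to
procfs by accident.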

Additional info:
I can attach blast reports about failed sectors, if requested.

Comment 1 Martin Peschke 2004-02-06 16:22:43 UTC

The problem was most likely caused by a missing scsi_eh thread. This
kernel thread is essential for SCSI I/O. Without proper recovery, the
outcome of SCSI commands that need to be recovered seems to be
unpredictable, and the system seems to be railroaded silently (at the
default logging level) into data corruption. The eh thread was not
created because not a single SCSI device/host was available when the
modules were loaded. I have just realized that our setup included only
devices/hosts that were added on the fly via procfs. A first retest
with at least one device per host available when the SCSI modules were
loaded has not shown any data corruption so far. I will close this
Bugzilla entry as a duplicate of either 112426 or 106214 if the
problem does not occur again.
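
A quick way to check for the missing thread before starting I/O might
look like the sketch below. It assumes the error-handler threads show
up in the process list under names of the form scsi_eh_<host number>;
that naming, the sample input format, and the function names are
assumptions, not something stated in this report.

```python
import re


def find_scsi_eh_threads(ps_output):
    """Return the distinct scsi_eh_<n> thread names found in process-list text."""
    return sorted(set(re.findall(r"scsi_eh_\d+", ps_output)))


def hosts_without_eh_thread(ps_output, host_numbers):
    """Return the host numbers for which no scsi_eh_<n> thread is listed."""
    present = {
        int(name.rsplit("_", 1)[1]) for name in find_scsi_eh_threads(ps_output)
    }
    return [h for h in host_numbers if h not in present]
```

The text passed in would typically be the captured output of a process
listing such as ps; any host number returned by
hosts_without_eh_thread would be a candidate for the situation
described above.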

Comment 2 Martin Peschke 2004-02-09 22:02:57 UTC


*** This bug has been marked as a duplicate of 112426 ***

Comment 3 Red Hat Bugzilla 2006-02-21 19:01:05 UTC

Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.
