86227 – System fails with version 2 of sym53c8xx driver, works fine with version 1

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	kernel
Sub Component:
Version:	9
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Arjan van de Ven
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2003-03-17 17:48 UTC by Göran Uddeborg
Modified:	2008-01-17 17:49 UTC
CC List:	0 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2004-09-30 15:40:39 UTC
Embargoed:

Attachments
Configuration for kernel which fails to write. (21.80 KB, text/plain) 2003-03-17 17:49 UTC, Göran Uddeborg
Configuration for kernel where I can't trigger this problem (21.99 KB, text/plain) 2003-03-17 17:50 UTC, Göran Uddeborg
Contents of /proc/scsi/sym53c8xx/0 with version 1 driver (stable case) (174 bytes, text/plain) 2003-03-19 23:20 UTC, Göran Uddeborg
SCSI-related dmesg messages with version 1 driver (stable case) (1.21 KB, text/plain) 2003-03-19 23:24 UTC, Göran Uddeborg
Contents of /proc/scsi/sym53c8xx/0 with version 2 driver (unstable case) (184 bytes, text/plain) 2003-03-20 22:55 UTC, Göran Uddeborg
SCSI-related dmesg messages with version 2 driver (unstable case) (1.58 KB, text/plain) 2003-03-20 22:57 UTC, Göran Uddeborg
Messages from driver in log (94.39 KB, text/plain) 2003-03-26 22:21 UTC, Göran Uddeborg

Description Göran Uddeborg 2003-03-17 17:48:36 UTC

Description of problem:
When using version 2 of the sym53c8xx driver, the kernel rather soon starts to
emit messages that it can't write to sd(8,2) (my root disk).  I can't copy the
messages verbatim since no logs are written (they are on the same disk), but it
says it can't write various blocks on the disk, reporting the inode number.  I
include two configurations, one which fails and one which works.

I realize this is a very vague error report, and don't expect anyone to fix the
problem from this data.  But if possible I would appreciate some help in how to
debug this.  I don't really know what to do next here.
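
One possible first step is to compare the two attached configurations option by option. Below is a minimal sketch in Python, assuming both attachments are saved as ordinary kernel .config files. The function names are hypothetical, and the option names in the sample strings are only an illustration of what might differ, not a copy of the attachments.

```python
def parse_config(text):
    """Return {option: value} for a kernel .config; unset options map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            # "# CONFIG_FOO is not set" -> CONFIG_FOO = n
            options[line[2:-len(" is not set")]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def diff_configs(failing_text, working_text):
    """Return sorted (option, failing_value, working_value) for options that differ."""
    failing = parse_config(failing_text)
    working = parse_config(working_text)
    names = sorted(set(failing) | set(working))
    return [
        (name, failing.get(name, "n"), working.get(name, "n"))
        for name in names
        if failing.get(name, "n") != working.get(name, "n")
    ]


if __name__ == "__main__":
    # Illustrative sample data only (assumed, not the real attachments).
    failing = "CONFIG_SCSI=y\nCONFIG_SCSI_SYM53C8XX_2=y\n# CONFIG_SCSI_SYM53C8XX is not set\n"
    working = "CONFIG_SCSI=y\n# CONFIG_SCSI_SYM53C8XX_2 is not set\nCONFIG_SCSI_SYM53C8XX=y\n"
    for name, bad, good in diff_configs(failing, working):
        print(f"{name}: failing={bad} working={good}")
```

Any option that differs besides the driver selection itself would be a candidate to flip one at a time when narrowing down the failure.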

Version-Release number of selected component (if applicable):
kernel-source-2.4.20-2.48

How reproducible:
It does not show immediately with the failing kernel.  It comes after some
time, apparently at random.  I have a feeling that stressing the system might
trigger it, but I have not been able to verify this.

Comment 1 Göran Uddeborg 2003-03-17 17:49:52 UTC

Created attachment 90628 [details]
Configuration for kernel which fails to write.

Comment 2 Göran Uddeborg 2003-03-17 17:50:41 UTC

Created attachment 90629 [details]
Configuration for kernel where I can't trigger this problem

Comment 3 Göran Uddeborg 2003-03-19 23:20:40 UTC

Created attachment 90666 [details]
Contents of /proc/scsi/sym53c8xx/0 with version 1 driver (stable case)

Comment 4 Göran Uddeborg 2003-03-19 23:24:06 UTC

Created attachment 90667 [details]
SCSI-related dmesg messages with version 1 driver (stable case)

Comment 5 Göran Uddeborg 2003-03-20 22:55:52 UTC

Created attachment 90680 [details]
Contents of /proc/scsi/sym53c8xx/0 with version 2 driver (unstable case)

Comment 6 Göran Uddeborg 2003-03-20 22:57:16 UTC

Created attachment 90681 [details]
SCSI-related dmesg messages with version 2 driver (unstable case)

Comment 7 Göran Uddeborg 2003-03-26 22:21:27 UTC

Created attachment 90736 [details]
Messages from driver in log

After putting the logs on a different partition as Alan suggested, I've got a
crash now where a lot of info was written to the messages file.

The complete messages are in the attachment.  They come in a number of phases,
briefly shown below.  To me it seems like the driver is trying harder and
harder to reset things, and then gives up, consequently causing problems
for the file system using the disk.

But I don't know how to figure out why this happens only with the version 2 driver.


Phase 1 consists of some initial messages

    Mar 25 17:51:16 uebn kernel: sym0:0:0: ABORT operation started.
    Mar 25 17:51:16 uebn kernel: sym0:0:control msgout: 80 20 63 d.
    Mar 25 17:51:16 uebn kernel: sym0:0:0: ABORT operation complete.
    Mar 25 17:51:16 uebn kernel: sym0:0:0: ABORT operation started.
    Mar 25 17:51:16 uebn kernel: sym0:0:0: ABORT operation failed.

The last two are then repeated a lot of times.  The next phase does this
once:

    Mar 25 17:51:17 uebn kernel: sym0:0:0: DEVICE RESET operation started.
    Mar 25 17:51:17 uebn kernel: sym0:0:0: DEVICE RESET operation failed.

Then a lot of times this:

    Mar 25 17:51:17 uebn kernel: sym0:0:0: BUS RESET operation started.
    Mar 25 17:51:17 uebn kernel: sym0:0:0: BUS RESET operation failed.

Then, again a lot of times:

    Mar 25 17:52:36 uebn kernel: sym0:0:0: HOST RESET operation started.
    Mar 25 17:52:36 uebn kernel: sym0:0:0: HOST RESET operation failed.
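
The escalation through ABORT, DEVICE RESET, BUS RESET and HOST RESET can be tallied from the attached log rather than counted by eye. Below is a rough sketch in Python; the regular expression is an assumption based only on the message format quoted above, and `tally_recovery` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches lines such as "... kernel: sym0:0:0: BUS RESET operation failed."
# The format is inferred from the excerpts in this report (an assumption).
PATTERN = re.compile(
    r"sym\d+:\d+:\d+: ([A-Z ]+?) operation (started|complete|failed)\."
)


def tally_recovery(lines):
    """Count (operation, outcome) pairs, e.g. ("BUS RESET", "failed")."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python tally.py < messages
    for (operation, outcome), count in sorted(tally_recovery(sys.stdin).items()):
        print(f"{operation:13s} {outcome:9s} {count}")
```

Comparing the counts per operation would show how many attempts the driver makes at each level before moving on to the next.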

Then this once:

    Mar 25 17:55:16 uebn kernel: scsi: device set offline - command error recover failed: host 0 channel 0 id 0 lun 0
    Mar 25 17:55:16 uebn kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 6000028

Then this a lot of times, for different sectors:

    Mar 25 17:55:16 uebn kernel:  I/O error: dev 08:02, sector 1458226

Then there is this a couple of times.  The return code varies between
these two, the sector varies:

    Mar 25 17:55:17 uebn kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 6000028
    Mar 25 17:55:17 uebn kernel:  I/O error: dev 08:02, sector 2
    Mar 25 17:55:17 uebn kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 6050000
    Mar 25 17:55:17 uebn kernel:  I/O error: dev 08:02, sector 4853152
    Mar 25 17:55:17 uebn kernel:  I/O error: dev 08:02, sector 4853154
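
The two return codes can be split into their component bytes. The sketch below assumes the value is printed in hexadecimal and packed as driver byte, host byte, message byte and status byte from high to low, which is how I understand the 2.4 SCSI layer to lay out a result word. The name tables are partial and written from memory, so they should be checked against the kernel's scsi.h before relying on them.

```python
# Partial name tables (assumed; verify against the kernel headers).
HOST_BYTE = {0x00: "DID_OK", 0x03: "DID_TIME_OUT", 0x05: "DID_ABORT",
             0x07: "DID_ERROR", 0x08: "DID_RESET"}
DRIVER_BYTE = {0x00: "DRIVER_OK", 0x06: "DRIVER_TIMEOUT", 0x08: "DRIVER_SENSE"}
STATUS_BYTE = {0x00: "GOOD", 0x02: "CHECK CONDITION", 0x08: "BUSY",
               0x28: "TASK SET FULL"}


def decode_result(hex_text):
    """Split a SCSI result word, given as hex text, into its four bytes."""
    result = int(hex_text, 16)
    return {
        "driver": (result >> 24) & 0xFF,
        "host": (result >> 16) & 0xFF,
        "msg": (result >> 8) & 0xFF,
        "status": result & 0xFF,
    }


def describe(hex_text):
    """Return a one-line description using the partial tables above."""
    fields = decode_result(hex_text)
    return "driver={} host={} status={}".format(
        DRIVER_BYTE.get(fields["driver"], hex(fields["driver"])),
        HOST_BYTE.get(fields["host"], hex(fields["host"])),
        STATUS_BYTE.get(fields["status"], hex(fields["status"])),
    )


if __name__ == "__main__":
    for code in ("6000028", "6050000"):
        print(code, describe(code))
```

Under these assumptions both codes carry the same driver byte (0x06); the first has a non-zero status byte (0x28) and the second a non-zero host byte (0x05).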

The final phase also gives file system error messages.  It repeats for
various sectors until I reboot:

    Mar 25 17:55:18 uebn kernel:  I/O error: dev 08:02, sector 1458232
    Mar 25 17:55:18 uebn kernel:  I/O error: dev 08:02, sector 2
    Mar 25 17:55:18 uebn kernel: EXT2-fs error (device sd(8,2)): ext2_write_inode: unable to read inode block - inode=182342, block=729116
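
The numbers in these messages appear to line up: block 729116 in the ext2 error corresponds to sector 1458232 in the I/O error if the file system uses 1 KiB blocks on 512-byte sectors. Both sizes are assumptions inferred from the figures, not stated anywhere in the log. The small Python sketch below shows the arithmetic, and also decodes the "08:02" device field as hexadecimal major and minor numbers, which matches the sd(8,2) in the ext2 message. The helper names are hypothetical.

```python
SECTOR_SIZE = 512  # bytes per sector (assumed)


def sectors_for_block(block, block_size=1024):
    """Return the sector numbers covered by a file system block.

    block_size defaults to 1024 bytes, an assumption that makes the
    block and sector numbers in the log agree.
    """
    per_block = block_size // SECTOR_SIZE
    first = block * per_block
    return list(range(first, first + per_block))


def parse_dev(dev_text):
    """Parse a "major:minor" device field written as two hex numbers."""
    major, minor = (int(part, 16) for part in dev_text.split(":"))
    return major, minor


if __name__ == "__main__":
    print(sectors_for_block(729116))
    print(parse_dev("08:02"))
```

Major 8, minor 2 is conventionally the second partition of the first SCSI disk, which is consistent with the root disk mentioned in the description.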

Comment 8 Bugzilla owner 2004-09-30 15:40:39 UTC

Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/
