Bug 86227

Summary: System fails with version 2 of sym53c8xx driver, works fine with version 1
Product: [Retired] Red Hat Linux
Reporter: Göran Uddeborg <goeran>
Component: kernel
Assignee: Arjan van de Ven <arjanv>
Status: CLOSED WONTFIX
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium    
Version: 9   
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Last Closed: 2004-09-30 15:40:39 UTC
Attachments:
Configuration for kernel which fails to write.
Configuration for kernel where I can't trigger this problem
Contents of /proc/scsi/sym53c8xx/0 with version 1 driver (stable case)
SCSI-related dmesg messages with version 1 driver (stable case)
Contents of /proc/scsi/sym53c8xx/0 with version 2 driver (unstable case)
SCSI-related dmesg messages with version 2 driver (unstable case)
Messages from driver in log

Description Göran Uddeborg 2003-03-17 17:48:36 UTC
Description of problem:
When using version 2 of the sym53c8xx driver, the kernel rather soon starts to
emit messages that it can't write to sd(8,2) (my root disk).  I can't copy the
messages verbatim since no logs are written (they are on the same disk), but it
says it can't write various blocks on the disk, reporting the inode number.  I
include two configurations, one which fails and one which works.

I realize this is a very vague error report, and don't expect anyone to fix the
problem from this data.  But if possible I would appreciate some help in how to
debug this.  I don't really know what to do next here.

Version-Release number of selected component (if applicable):
kernel-source-2.4.20-2.48

How reproducible:
It does not show up immediately with the failing kernel.  It appears after some
time, apparently at random.  I have a feeling that stressing the system might
trigger it, but this is nothing I can verify.

Comment 1 Göran Uddeborg 2003-03-17 17:49:52 UTC
Created attachment 90628
Configuration for kernel which fails to write.

Comment 2 Göran Uddeborg 2003-03-17 17:50:41 UTC
Created attachment 90629
Configuration for kernel where I can't trigger this problem

Comment 3 Göran Uddeborg 2003-03-19 23:20:40 UTC
Created attachment 90666
Contents of /proc/scsi/sym53c8xx/0 with version 1 driver (stable case)

Comment 4 Göran Uddeborg 2003-03-19 23:24:06 UTC
Created attachment 90667
SCSI-related dmesg messages with version 1 driver (stable case)

Comment 5 Göran Uddeborg 2003-03-20 22:55:52 UTC
Created attachment 90680
Contents of /proc/scsi/sym53c8xx/0 with version 2 driver (unstable case)

Comment 6 Göran Uddeborg 2003-03-20 22:57:16 UTC
Created attachment 90681
SCSI-related dmesg messages with version 2 driver (unstable case)

Comment 7 Göran Uddeborg 2003-03-26 22:21:27 UTC
Created attachment 90736
Messages from driver in log

After putting the logs on a different partition as Alan suggested, I've got a
crash now where a lot of info was written to the messages file.

The complete messages are in the attachment.  They come in a number of phases,
briefly shown below.  To me it seems like the driver is trying harder and
harder to reset things, and then gives up, consequently causing problems
for the file system using the disk.

But I don't know how to figure out why this happens only with the version 2 driver.


Phase 1 consists of some initial messages

    Mar 25 17:51:16 uebn kernel: sym0:0:0: ABORT operation started.
    Mar 25 17:51:16 uebn kernel: sym0:0:control msgout: 80 20 63 d.
    Mar 25 17:51:16 uebn kernel: sym0:0:0: ABORT operation complete.
    Mar 25 17:51:16 uebn kernel: sym0:0:0: ABORT operation started.
    Mar 25 17:51:16 uebn kernel: sym0:0:0: ABORT operation failed.

The last two are then repeated a lot of times.  The next phase does this
once:

    Mar 25 17:51:17 uebn kernel: sym0:0:0: DEVICE RESET operation started.
    Mar 25 17:51:17 uebn kernel: sym0:0:0: DEVICE RESET operation failed.

Then a lot of times this:

    Mar 25 17:51:17 uebn kernel: sym0:0:0: BUS RESET operation started.
    Mar 25 17:51:17 uebn kernel: sym0:0:0: BUS RESET operation failed.

Then, again a lot of times:

    Mar 25 17:52:36 uebn kernel: sym0:0:0: HOST RESET operation started.
    Mar 25 17:52:36 uebn kernel: sym0:0:0: HOST RESET operation failed.

Then this once:

    Mar 25 17:55:16 uebn kernel: scsi: device set offline - command error recover failed: host 0 channel 0 id 0 lun 0
    Mar 25 17:55:16 uebn kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 6000028
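
To see the escalation at a glance without reading the whole attachment, the
recovery messages can be tallied per operation and outcome.  The following is
only a rough sketch: the regular expression is written against the message
wording quoted above, and the function name tally_recovery and the sample list
are made up for illustration.  Other driver versions may word these messages
differently.

```python
import re
from collections import Counter

# Pattern modelled on the lines quoted in this report, e.g.
#   "... kernel: sym0:0:0: BUS RESET operation failed."
# (assumption: all recovery messages follow this wording)
OP_RE = re.compile(
    r"kernel: (?P<dev>sym\d+:\d+:\d+): "
    r"(?P<op>ABORT|DEVICE RESET|BUS RESET|HOST RESET) "
    r"operation (?P<state>started|complete|failed)\."
)


def tally_recovery(lines):
    """Count (operation, state) pairs found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = OP_RE.search(line)
        if match:
            counts[(match.group("op"), match.group("state"))] += 1
    return counts


# A few lines copied from the excerpts above.
sample = [
    "Mar 25 17:51:16 uebn kernel: sym0:0:0: ABORT operation started.",
    "Mar 25 17:51:16 uebn kernel: sym0:0:control msgout: 80 20 63 d.",
    "Mar 25 17:51:16 uebn kernel: sym0:0:0: ABORT operation complete.",
    "Mar 25 17:51:16 uebn kernel: sym0:0:0: ABORT operation started.",
    "Mar 25 17:51:16 uebn kernel: sym0:0:0: ABORT operation failed.",
    "Mar 25 17:51:17 uebn kernel: sym0:0:0: DEVICE RESET operation started.",
    "Mar 25 17:51:17 uebn kernel: sym0:0:0: DEVICE RESET operation failed.",
]

if __name__ == "__main__":
    for (op, state), n in sorted(tally_recovery(sample).items()):
        print(f"{op:13s} {state:9s} {n}")
```

For the real log, the sample list would be replaced by the open messages file
(a file object iterates over its lines).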

Then this, a lot of times, for different sectors:

    Mar 25 17:55:16 uebn kernel:  I/O error: dev 08:02, sector 1458226

Then there is this a couple of times.  The return code varies between
these two, the sector varies:

    Mar 25 17:55:17 uebn kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 6000028
    Mar 25 17:55:17 uebn kernel:  I/O error: dev 08:02, sector 2
    Mar 25 17:55:17 uebn kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 6050000
    Mar 25 17:55:17 uebn kernel:  I/O error: dev 08:02, sector 4853152
    Mar 25 17:55:17 uebn kernel:  I/O error: dev 08:02, sector 4853154
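
The two return codes can be taken apart into their fields.  The sketch below
assumes the value is printed in hexadecimal and uses the layout I believe the
2.4 SCSI mid-layer uses for a command result: status in the low byte, then
message byte, host byte and driver byte.  The name tables hold only the values
seen here and are quoted from memory of the kernel's scsi.h, so they should be
checked against the actual kernel source.  decode_result is a hypothetical
helper name.

```python
# Assumed layout of the 32-bit result (to be verified against scsi.h):
#   bits  0-7   SCSI status byte
#   bits  8-15  message byte
#   bits 16-23  host byte   (DID_* codes)
#   bits 24-31  driver byte (DRIVER_* codes)

# Only the values that occur in this report; names recalled from memory.
HOST_NAMES = {0x00: "DID_OK", 0x05: "DID_ABORT"}
DRIVER_NAMES = {0x06: "DRIVER_TIMEOUT"}
STATUS_NAMES = {0x00: "GOOD", 0x28: "QUEUE FULL"}


def decode_result(code_hex):
    """Split a hex 'return code' string from the log into its four bytes."""
    value = int(code_hex, 16)
    return {
        "status": value & 0xFF,
        "msg": (value >> 8) & 0xFF,
        "host": (value >> 16) & 0xFF,
        "driver": (value >> 24) & 0xFF,
    }


if __name__ == "__main__":
    for code in ("6000028", "6050000"):
        f = decode_result(code)
        print(
            code,
            "driver=%s" % DRIVER_NAMES.get(f["driver"], hex(f["driver"])),
            "host=%s" % HOST_NAMES.get(f["host"], hex(f["host"])),
            "msg=%#x" % f["msg"],
            "status=%s" % STATUS_NAMES.get(f["status"], hex(f["status"])),
        )
```

If the assumed layout is right, both codes carry the same driver byte and
differ only in the host byte and the status byte.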

Final phase also gives file system error messages.  Repeats for
various sectors until I reboot:

    Mar 25 17:55:18 uebn kernel:  I/O error: dev 08:02, sector 1458232
    Mar 25 17:55:18 uebn kernel:  I/O error: dev 08:02, sector 2
    Mar 25 17:55:18 uebn kernel: EXT2-fs error (device sd(8,2)): ext2_write_inode: unable to read inode block - inode=182342, block=729116
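
The block number in the EXT2-fs message and the sector numbers in the I/O
errors appear to describe the same place on the disk.  The sketch below
assumes 512-byte sectors, a 1 KiB file system block size, and sector numbers
relative to the start of the partition.  None of these is stated in the log,
but the numbers fit: block 729116 then starts at sector 1458232, which is the
sector in the I/O error just before the EXT2-fs message.  The function names
are hypothetical.

```python
SECTOR_SIZE = 512  # bytes per sector (assumed)


def block_to_sectors(block, block_size=1024):
    """Return the partition-relative sectors covered by a file system block."""
    per_block = block_size // SECTOR_SIZE
    first = block * per_block
    return list(range(first, first + per_block))


def sector_to_block(sector, block_size=1024):
    """Return the file system block that contains the given sector."""
    return sector // (block_size // SECTOR_SIZE)


if __name__ == "__main__":
    print(block_to_sectors(729116))
    print(sector_to_block(1458232))
    print(sector_to_block(2))
```

Under the same assumptions, sector 2 falls in block 1, which is where an ext2
file system with 1 KiB blocks normally keeps its primary superblock.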

Comment 8 Bugzilla owner 2004-09-30 15:40:39 UTC
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/