Bug 92056

Summary: Symbios SCSI timeout
Product: [Retired] Red Hat Linux
Reporter: neil <neilcuk>
Component: kernel
Assignee: Arjan van de Ven <arjanv>
Status: CLOSED CURRENTRELEASE
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 7.3
CC: alan
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2003-06-09 08:53:02 UTC

Description neil 2003-06-02 08:53:38 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.2) Gecko/20030208
Netscape/7.02

Description of problem:
I am using a dual 1.8GHz Xeon HP X4000 workstation. Running the kernel listed
below (kernel-2.4.20-13.7smp) causes a SCSI timeout when the workstation has
been left idle for a number of hours (usually overnight).

Sample from Messages:

May 30 22:39:37 xmasterz kernel: scsi : aborting command due to timeout : pid 72609, scsi0, channel 0, id 0, lun 0 Write (10) 00 00 04 be e3 00 00 58 00
May 30 22:39:37 xmasterz kernel: sym53c8xx_abort: pid=72609 serial_number=72609 serial_number_at_timeout=72609
May 30 22:39:37 xmasterz kernel: SCSI host 0 abort (pid 72609) timed out - resetting
May 30 22:39:37 xmasterz kernel: SCSI bus is being reset for host 0 channel 0.
May 30 22:39:37 xmasterz kernel: sym53c8xx_reset: pid=72609 reset_flags=2 serial_number=72609 serial_number_at_timeout=72609
May 30 22:39:37 xmasterz kernel: sym53c1010-66-0: restart (scsi reset).
May 30 22:39:37 xmasterz kernel: sym53c1010-66-0: handling phase mismatch from SCRIPTS.
May 30 22:39:37 xmasterz kernel: sym53c1010-66-0: Downloading SCSI SCRIPTS.
May 30 22:40:32 xmasterz kernel: sym53c1010-66-0-<0,0>: ordered tag forced.
May 30 22:40:39 xmasterz kernel: SCSI host 0 abort (pid 72610) timed out - resetting
May 30 22:40:39 xmasterz kernel: SCSI bus is being reset for host 0 channel 0.

This has actually left the BIOS unable to pick up disc signatures.
I have flashed the BIOS, but the error persists.

Version-Release number of selected component (if applicable):
kernel-2.4.20-13.7smp

How reproducible:
Always

Steps to Reproduce:
1. Get the hardware mentioned above.
2. Update to the kernel version mentioned above.
3. Leave overnight on gas mark 1.
4. System nicely cooked.

Actual Results:  porkage

Expected Results:  porkage

Additional info:

The problem is not apparent during the day (when I am using the system).
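
One way to check the "only when idle overnight" pattern against the logs is to tally the abort timeouts by hour of day. The following is a minimal sketch, not part of the original report: it assumes the syslog line format shown in the sample above, and the function name `abort_timeouts_by_hour` is made up for illustration. The log path is passed on the command line (on Red Hat systems this is usually /var/log/messages).

```python
import re
import sys
from collections import Counter

# Matches syslog lines of the form seen in the sample, e.g.
# "May 30 22:39:37 xmasterz kernel: SCSI host 0 abort (pid 72609) timed out - resetting"
ABORT_RE = re.compile(
    r"^\w{3}\s+\d{1,2}\s+(?P<hour>\d{2}):\d{2}:\d{2}\s+\S+\s+kernel:\s+"
    r"SCSI host \d+ abort \(pid \d+\) timed out"
)


def abort_timeouts_by_hour(lines):
    """Count 'abort ... timed out' kernel messages per hour of day (0-23)."""
    counts = Counter()
    for line in lines:
        match = ABORT_RE.match(line)
        if match:
            counts[int(match.group("hour"))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scsi_timeouts.py /var/log/messages
    with open(sys.argv[1], errors="replace") as fh:
        for hour, count in sorted(abort_timeouts_by_hour(fh).items()):
            print(f"{hour:02d}:00  {count}")
```

If the timeouts cluster in the hours when the machine is unattended, that supports the idle-related theory; a spread across the whole day would point elsewhere.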

Comment 1 Alan Cox 2003-06-08 12:19:39 UTC
I can think of two causes for this. Firstly, the abort could be due to a
genuine problem, for example a drive aborting a command. Secondly, it might be a
really weird interaction with BIOS power management. The BIOS not finding the
disk sounds like the disk firmware crashed.

First thing to try would be disabling any power management in the BIOS and then
booting with apm=off as a boot option. I'm not sure it will change anything, but
it eliminates one suspicion.
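
After rebooting, it is worth confirming that the option actually reached the kernel before drawing conclusions from an overnight run. The sketch below is an illustration only, not something from this thread: it reads /proc/cmdline, which Linux exposes as the running kernel's command line, and the helper names `has_boot_option` and `read_cmdline` are hypothetical.

```python
def has_boot_option(cmdline, option="apm=off"):
    """Return True if `option` appears as a whole token on the kernel command line."""
    return option in cmdline.split()


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with (Linux only)."""
    with open(path) as fh:
        return fh.read()


if __name__ == "__main__":
    try:
        cmdline = read_cmdline()
    except OSError as exc:
        print(f"could not read kernel command line: {exc}")
    else:
        state = "present" if has_boot_option(cmdline) else "missing"
        print(f"apm=off is {state}: {cmdline.strip()}")
```

Matching on whole tokens avoids a false positive from a longer option that merely ends in the same text.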


Comment 2 neil 2003-06-09 08:53:02 UTC
I had to get a quick resolution for this so I slapped in a HP NetRaid
controller, disabled the on-board SCSI and rebuilt the system. No more SCSI
time-outs. I'm unlikely to return the system to its faulty state but I really
appreciated the comments. The BIOS power management sounds a good contender if
somewhat troubling (it really shouldn't do that!). If I do get my hands on a
couple of extra disks I'll re-enable the on-board controller - slap them in and
let you know what happens.
Cheers :n)