Bug 63457 - multiple deaths by SCSI errors, Alpha 21264 DP, sym53c1010 with sym53c8xx attached to IBM DDYS-T36950N and SEAGATE ST39205LW
Status: CLOSED WONTFIX
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.1
Hardware: alphaev6 Linux
Priority: medium, Severity: high
Assigned To: Arjan van de Ven
QA Contact: Brian Brock
Reported: 2002-04-14 02:58 EDT by Need Real Name
Modified: 2007-04-18 12:41 EDT
Doc Type: Bug Fix
Last Closed: 2003-06-09 08:43:22 EDT
Description Need Real Name 2002-04-14 02:58:58 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.9) Gecko/20020408

Description of problem:
On one of our dual Alpha 21264 666 MHz machines, which has an Intraserver 64-bit
SCSI controller based on the sym53c1010 and uses the sym53c8xx driver from Red Hat
kernel 2.4.9-31smp, we have had two catastrophic failures in the last 5 days, each
prefaced by SCSI errors.  By catastrophic, I mean that you can ping the system, but
you can't log in via the network or via mgetty on the serial port.  I've included
info from our logs under "Additional info" below.  We'll try to identify a pattern
if this continues to happen (I expect it will), but we haven't identified one yet.

Version-Release number of selected component (if applicable):


How reproducible:
Sometimes

Steps to Reproduce:
1. Use the machine for a few days, and pray it doesn't happen.
2. It eventually happens.  We haven't identified a pattern yet.

Actual Results:  The machine still responds to ping, but that's it.  Remote
logging shows SCSI death.

Additional info:

Here is some log info, with a small amount of context.  The first few messages
concern NFS: the Alpha is the NFS v3-over-TCP client named "liver", and the
NFS server, running FreeBSD, is "lofty".  fxp0 is one of lofty's Ethernet devices.
Please let me know if more or fewer log messages would be appropriate.
******************************************************************

Apr 13 22:48:49 lofty /kernel: m_clalloc failed, consider increase NMBCLUSTERS value
Apr 13 22:48:49 lofty /kernel: fxp0: cluster allocation failed, packet dropped!
Apr 13 22:48:51 liver kernel: nfs: server lofty.auton.cs.cmu.edu not responding, still trying
Apr 13 22:48:51 liver kernel: nfs: server lofty.auton.cs.cmu.edu OK
Apr 13 22:54:33 liver kernel: sym53c1010-33-0:3: ERROR (40:0) (c-2c-0) (3e/18) @ (script ce0:14001e00).
Apr 13 22:54:33 liver kernel: sym53c1010-33-0: script cmd = 88080000
Apr 13 22:54:33 liver kernel: sym53c1010-33-0: regdump: da 10 c0 18 47 3e 03 0e 1e 0c 83 2c 80 00 0c 00 00 80 6c 40 08 00 00 00.
Apr 13 22:54:33 liver kernel: sym53c1010-33-0: ctest4/sist original 0x8/0x0 mod: 0x18/0x0
Apr 13 22:54:33 liver kernel: sym53c1010-33-0: Downloading SCSI SCRIPTS.
Apr 13 22:54:33 liver kernel: sym53c1010-33-0:3: ERROR (40:0) (4-24-0) (3e/18) @ (script ce0:14007e00).
Apr 13 22:54:33 liver kernel: sym53c1010-33-0: script cmd = 88080000
Apr 13 22:54:33 liver kernel: sym53c1010-33-0: regdump: da 10 c0 18 47 3e 03 0e 32 0c 83 2c 80 00 04 00 00 e0 c7 b5 08 00 00 00.
Apr 13 22:54:33 liver kernel: sym53c1010-33-0: ctest4/sist original 0x8/0x0 mod: 0x18/0x0
Apr 13 22:54:33 liver kernel: sym53c1010-33-0: Downloading SCSI SCRIPTS.
Apr 13 22:55:31 liver kernel: scsi : aborting command due to timeout : pid 0, scsi2, channel 0, id 3, lun 0 Write (10) 00 00 0f e9 0f 00 00 40 00
Apr 13 22:55:31 liver kernel: sym53c8xx_abort: pid=0 serial_number=60936 serial_number_at_timeout=60936

<last two lines repeated 15 more times, with the "e9 0f" part rotating through 8
values:
   e9 0f,  e9 8f,  e8 4f,  e8 8f,  e9 cf,  e8 0f,  e8 cf,  e9 4f
and the serial numbers incrementing by one>
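
The byte strings in the timed-out commands can be unpacked to see which disk
blocks were involved.  The following is a minimal Python sketch, not part of the
original logs.  It assumes the kernel message prints the nine CDB bytes that
follow the WRITE(10) opcode, laid out as in the standard 10-byte CDB (flags,
4-byte big-endian LBA, group number, 2-byte big-endian transfer length,
control).  decode_rw10_tail and rotating_lbas are hypothetical helper names.

```python
# Sketch only: assumes the log shows the 9 bytes after the WRITE(10) opcode.

LOGGED = "00 00 0f e9 0f 00 00 40 00"
ROTATING = ["e9 0f", "e9 8f", "e8 4f", "e8 8f",
            "e9 cf", "e8 0f", "e8 cf", "e9 4f"]


def decode_rw10_tail(tail_hex):
    """Decode the nine bytes following the opcode of a 10-byte READ/WRITE CDB."""
    tail = bytes.fromhex(tail_hex)  # spaces between byte pairs are skipped
    if len(tail) != 9:
        raise ValueError("expected 9 bytes after the opcode, got %d" % len(tail))
    return {
        "flags": tail[0],
        "lba": int.from_bytes(tail[1:5], "big"),     # logical block address
        "group": tail[5],
        "blocks": int.from_bytes(tail[6:8], "big"),  # transfer length in blocks
        "control": tail[8],
    }


def rotating_lbas():
    """Return the LBAs for the eight rotating byte pairs, in ascending order."""
    return sorted(
        decode_rw10_tail("00 00 0f %s 00 00 40 00" % pair)["lba"]
        for pair in ROTATING
    )


if __name__ == "__main__":
    cdb = decode_rw10_tail(LOGGED)
    print("LBA 0x%08x, %d blocks" % (cdb["lba"], cdb["blocks"]))
    for lba in rotating_lbas():
        print("0x%08x" % lba)
```

Under that assumption, the logged command is a 64-block write at LBA 0x000fe90f,
and the eight rotating values sort into eight consecutive 64-block writes
starting at LBA 0x000fe80f.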

Apr 13 22:57:32 liver kernel: SCSI host 2 abort (pid 0) timed out - resetting 
Apr 13 22:57:32 liver kernel: SCSI bus is being reset for host 2 channel 0. 
Apr 13 22:57:32 liver kernel: sym53c8xx_reset: pid=0 reset_flags=2 serial_number=61025 serial_number_at_timeout=61025
Apr 13 22:57:32 liver kernel: sym53c1010-33-0: Downloading SCSI SCRIPTS. 

<these four lines repeat 10 times, with serial numbers bouncing around>

<Apr 13 23:07:52 was the time of the final message>
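
To make the sequence easier to follow, the timestamps quoted above can be turned
into intervals.  This is a small Python sketch, assuming all four events fall on
the same day (Apr 13) so that no date handling is needed.  The event labels and
the gaps helper are illustrative shorthand, not kernel terminology.

```python
from datetime import datetime

# Timestamps copied from the log excerpt; labels are illustrative only.
EVENTS = [
    ("22:54:33", "first sym53c1010 ERROR"),
    ("22:55:31", "first abort due to timeout"),
    ("22:57:32", "first SCSI bus reset"),
    ("23:07:52", "final logged message"),
]


def gaps(events):
    """Return (label, seconds since the previous event) for each later event."""
    times = [datetime.strptime(stamp, "%H:%M:%S") for stamp, _ in events]
    return [
        (events[i][1], int((times[i] - times[i - 1]).total_seconds()))
        for i in range(1, len(events))
    ]


if __name__ == "__main__":
    for label, seconds in gaps(EVENTS):
        print("%-28s +%d s" % (label, seconds))
```

This gives 58 seconds from the first controller error to the first timeout
abort, 121 seconds from there to the first bus reset, and 620 seconds from the
first bus reset to the final message.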
Comment 1 Alan Cox 2003-06-09 08:43:22 EDT
Alpha is no longer a supported platform
