Bug 4608

Summary:	kernel-2.2.5-22 causes scsi bus errors on Sparc5
Product:	[Retired] Red Hat Linux	Reporter:	wjacobs
Component:	kernel	Assignee:	Cristian Gafton <gafton>
Status:	CLOSED CURRENTRELEASE	QA Contact:
Severity:	medium	Docs Contact:
Priority:	medium
Version:	6.2	CC:	dan.carter, rddavis1, tolson, wjacobs
Target Milestone:	---
Target Release:	---
Hardware:	sparc
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix

Description wjacobs 1999-08-19 14:51:41 UTC

I have a Sparc5 with two 1GB drives.  Both have a valid Sun
label.  After updating the kernel to 2.2.5-22, I get lots of
SCSI bus reset errors on my console and in
/var/log/messages.  Also, the CD-ROM drive no longer works,
except for getting a list of files from it.

If I boot back into 2.2.5-15, these particular errors go
away.

When doing the upgrade, I followed the instructions found on
the Red Hat kernel-upgrade page.

This problem occurred on a setup that had been upgraded from
RH 5.2 to 6.0 and on a clean 6.0 install.

Comment 1 Bill Nottingham 1999-08-31 22:12:59 UTC

*** Bug 4737 has been marked as a duplicate of this bug. ***

I have a Sun Sparc5, and after upgrading my kernel from
2.2.5-15 to 2.2.5-22, I get numerous scsi bus resets that
I didn't get before the upgrade.

In /var/log/messages, besides the scsi bus reset messages,
another message I noticed was "AIEEE wide msg received and
not HME".

After looking around in the source code, I saw that the
message originates from the function check_multibyte_msg()
in /usr/src/linux/drivers/scsi/esp.c.
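
For readers unfamiliar with the message, the following is a minimal sketch, assuming the standard SCSI-2 extended message layout, of how a driver might tell a wide negotiation request from a synchronous one and refuse the former on a controller that cannot do wide transfers. The function name, the `controller_supports_wide` flag, and the returned strings are hypothetical and are not taken from esp.c.

```python
# SCSI-2 message codes (from the SCSI-2 standard, not copied from esp.c).
EXTENDED_MESSAGE = 0x01  # first byte of any extended message
EXTENDED_SDTR = 0x01     # synchronous data transfer request
EXTENDED_WDTR = 0x03     # wide data transfer request


def classify_extended_message(msg, controller_supports_wide):
    """Classify a message-in byte sequence (hypothetical helper).

    msg[0] is the message code, msg[1] the length, msg[2] the
    extended message code.
    """
    if len(msg) < 3 or msg[0] != EXTENDED_MESSAGE:
        return "not-extended"
    code = msg[2]
    if code == EXTENDED_SDTR:
        return "sync-negotiation"
    if code == EXTENDED_WDTR:
        # A narrow-only controller has to refuse a wide request.
        if controller_supports_wide:
            return "wide-negotiation"
        return "reject-wide"
    return "unknown-extended"


# A wide request arriving at a narrow-only controller.
print(classify_extended_message([0x01, 0x02, 0x03, 0x01], False))
```

If the driver behaves along these lines, the "wide msg received and not HME" text would correspond to the refused case, but that reading is an assumption and not something confirmed in this report.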

The scsi bus resets occur in two cases on my machine:

1.  during the boot sequence

    During the boot sequence, just before the login screen
    appears, the disk drive starts clicking (meaning the
    scsi bus resets are occurring).  It clicks for almost
    2 minutes with just the blank blue screen before the
    Red Hat login screen appears.

2.  Whenever a command involving files on my second disk
    drive is run (e.g. ls, cp, mv).


From the file /var/log/dmesg...

using fastest function: SPARC (57.150 MB/sec)
esp0: IRQ 36 SCSI ID 7 Clk 40MHz CCF=8 TOut 167 NCR53C9XF(espfast)
ESP: Total of 1 ESP hosts found, 1 actually in use.
scsi0 : Sparc ESP100A-FAST
scsi : 1 host.
  Vendor: SEAGATE   Model: ST5660N  SUN0535  Rev: 0644
  Type:   Direct-Access                      ANSI SCSI revision: 02
Detected scsi disk sda at scsi0, channel 0, id 1, lun 0
esp0: AIEEE wide msg received and not HME.
esp0: hoping for msgout
  Vendor: COMPAQ    Model: ST34501WC         Rev: AF03
  Type:   Direct-Access                      ANSI SCSI revision: 02
Detected scsi disk sdb at scsi0, channel 0, id 3, lun 0
  Vendor: TOSHIBA   Model: XM-4101TASUNSLCD  Rev: 1084
  Type:   CD-ROM                             ANSI SCSI revision: 02
Detected scsi CD-ROM sr0 at scsi0, channel 0, id 6, lun 0

===========================================================


Output from --->  cat /proc/scsi/esp/0


Sparc ESP Host Adapter:
        PROM node               ffd3d230
        PROM name               esp
        ESP Model               FAS100A
        DMA Revision            Rev 2
        Live Targets            [ 1 3 6 ]

Target #   config3    Sync Capabilities  Disconnect Wide
1          00000003   [2f,04]            yes        no
3          0000000    [2f,04]            yes        no
6          00000001   [2f,04]            yes        no


==========================================================
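
To compare this table between kernels, a small parser can help. The sketch below assumes the column layout shown above (target number, config3 in hex, a bracketed pair of sync values, then yes/no for Disconnect and Wide); the function and key names are made up for illustration.

```python
import re

SAMPLE = """\
Target #   config3    Sync Capabilities  Disconnect Wide
1          00000003   [2f,04]            yes        no
3          0000000    [2f,04]            yes        no
6          00000001   [2f,04]            yes        no
"""

# One table row: id, hex config3, [hex,hex], yes/no, yes/no.
ROW = re.compile(
    r"^(\d+)\s+([0-9a-fA-F]+)\s+\[([0-9a-fA-F]+),([0-9a-fA-F]+)\]"
    r"\s+(yes|no)\s+(yes|no)$"
)


def parse_targets(text):
    """Return {target_id: info} for each row of the target table."""
    targets = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # header or unrelated line
        targets[int(m.group(1))] = {
            "config3": int(m.group(2), 16),
            "sync": (m.group(3), m.group(4)),  # kept as raw hex strings
            "disconnect": m.group(5) == "yes",
            "wide": m.group(6) == "yes",
        }
    return targets


targets = parse_targets(SAMPLE)
wide_targets = [t for t, info in targets.items() if info["wide"]]
print(sorted(targets), wide_targets)
```

On a live system the same function could be fed the contents of /proc/scsi/esp/0 in place of the sample string.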

a few lines from --->  /var/log/messages
(there are a bunch of these)


Aug 26 04:32:28 localhost kernel: esp0: Resetting scsi bus
Aug 26 04:32:28 localhost kernel: esp0: SCSI bus reset interrupt
Aug 26 04:32:28 localhost kernel: esp0: SCSI bus reset interrupt
Aug 26 04:32:28 localhost kernel: esp0: AIEEE wide msg received and not HME.
Aug 26 04:32:28 localhost kernel: esp0: hoping for msgout
Aug 26 04:32:28 localhost kernel: esp0: Resetting scsi bus
Aug 26 04:32:28 localhost kernel: esp0: SCSI bus reset interrupt
Aug 26 04:32:28 localhost kernel: esp0: SCSI bus reset interrupt
Aug 26 04:32:29 localhost kernel: esp0: AIEEE wide msg received and not HME.
Aug 26 04:32:29 localhost kernel: esp0: hoping for msgout
Aug 26 04:32:29 localhost kernel: esp0: Resetting scsi bus
Aug 26 04:32:29 localhost kernel: esp0: SCSI bus reset interrupt
Aug 26 04:32:29 localhost kernel: esp0: SCSI bus reset interrupt
Aug 26 04:32:29 localhost kernel: esp0: AIEEE wide msg received and not HME.
Aug 26 04:32:29 localhost kernel: esp0: hoping for msgout


==========================================================
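
Since there are a bunch of these, a quick tally makes it easier to compare one kernel against another. This is a rough sketch that assumes each syslog entry sits on a single line in the usual "kernel: espN: text" form; the function name and the sample lines are illustrative only.

```python
import re
from collections import Counter

# Matches the host name (esp0, esp1, ...) and the message text.
LINE = re.compile(r"kernel: (esp\d+): (.*)$")

SAMPLE_LINES = [
    "Aug 26 04:32:28 localhost kernel: esp0: Resetting scsi bus",
    "Aug 26 04:32:28 localhost kernel: esp0: SCSI bus reset interrupt",
    "Aug 26 04:32:28 localhost kernel: esp0: SCSI bus reset interrupt",
    "Aug 26 04:32:28 localhost kernel: esp0: AIEEE wide msg received and not HME.",
    "Aug 26 04:32:28 localhost kernel: esp0: hoping for msgout",
    "Aug 26 04:32:28 localhost kernel: esp0: Resetting scsi bus",
]


def count_esp_messages(lines):
    """Count ESP driver messages, keyed by (host, message text)."""
    counts = Counter()
    for line in lines:
        m = LINE.search(line)
        if m:
            counts[(m.group(1), m.group(2).strip())] += 1
    return counts


for (host, text), n in count_esp_messages(SAMPLE_LINES).most_common():
    print(n, host, text)
```

To run it against the real log, pass an open file instead of the sample list, for example `count_esp_messages(open("/var/log/messages"))`.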

If you can help, please let me know.


THANKS!!!

Richard Davis, Jr.

Comment 2 Cristian Gafton 2000-04-03 21:29:59 UTC

Can you try the sparc kernel that shipped as part of 6.2?

Comment 3 dan carter 2000-04-06 23:55:59 UTC

I just downloaded and tried out the kernel from redhat6.2/sparc
(kernel-2.2.14-5.0.sparc.rpm).  Identical results.
It boots, gets into init, and goes OK until it starts cron, and then starts
spitting out scsi errors:
esp0: resetting scsi bus
esp0: bus reset interrupt
esp0: bus reset interrupt
EXT2-fs error host 0 channel 0 id 4 lun=0 return code = 28000000
Additional sense indicates logical unit not ready, cause not reportable.

The things normally started by cron do not start, e.g. I get an email from cron:
Subject: Cron <dcarter@mowgli> /home/dcarter/distributed.net/start-pproxy
/home/dcarter/distributed.net/start-pproxy:
/home/dcarter/distributed.net/proxyper-current/proxyper: Input/output error

While it completes booting, this sequence of messages:
esp0: resetting scsi bus
esp0: bus reset interrupt
esp0: bus reset interrupt
keeps being printed to the console.

Eventually it finishes booting and logging in works, but doing anything like
'ls' causes this error sequence to be printed again; eventually ls succeeds.

I've rebooted back to an egcs-1.1.2-12 compiled kernel and all is well again.

Comment 4 dan carter 2000-05-17 02:57:59 UTC

OK, after months of work, here's what I did.

The compiler that ships with Red Hat 6.0 doesn't cause scsi errors, but does
produce an unstable kernel (random lockups, uptimes rarely reaching 6 days).
Any more recent compiler causes the scsi errors.  I have the gcc 2.95
compiler from Mandrake 7.0/sparc installed at the moment, and I got the same
scsi errors with that.

However, I have just tried the 2.3.99-pre8 kernel.  That does not have scsi
errors, so it appears there is a bug in the scsi driver that only shows up
with recent compilers and is not present in the 2.3 kernels.  You might like
to give 2.3 a try yourself.  I imagine it will work with the updated Red Hat
versions of gcc/egcs too.

Comment 5 tolson 2000-07-06 20:26:33 UTC

I would not think that this is "resolved" at this time.  This scsi bus reset
interrupt has been causing problems ever since switching from the original
linux 6.0 kernel.  It happens on Sparc 5's any time there are 2 hard drives
on the system.  After just loading RedHat 6.2 it is still unresolved.
When will a kernel come out that doesn't have this problem?  I have multiple
"spare" sparc 5's and have tried this on a few of them.  I'd even be happy
to donate one to the cause if it would do any good.

Tim