Bug 4608
Summary: | kernel-2.2.5-22 causes scsi bus errors on Sparc5 | ||
---|---|---|---|
Product: | [Retired] Red Hat Linux | Reporter: | wjacobs |
Component: | kernel | Assignee: | Cristian Gafton <gafton> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 6.2 | CC: | dan.carter, rddavis1, tolson, wjacobs |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | sparc | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Description
wjacobs
1999-08-19 14:51:41 UTC
*** Bug 4737 has been marked as a duplicate of this bug. ***

I have a Sun Sparc5, and after upgrading my kernel from 2.2.5-15 to 2.2.5-22, I get numerous SCSI bus resets that I didn't get before the upgrade. In /var/log/messages, besides the SCSI bus reset messages, another message I noticed was "AIEEE wide msg received and not HME". Looking through the source code, I found that the message originates from the function check_multibyte_msg() in /usr/src/linux/drivers/scsi/esp.c.

The SCSI bus resets occur in two cases on my machine:

1. During the boot sequence. Just before the login screen appears, the disk drive starts clicking (meaning the SCSI bus resets are occurring). It clicks for almost 2 minutes with just the blank blue screen before the Red Hat login screen appears.

2. Whenever a command involving files on my second disk drive is run (e.g. ls, cp, mv).

From the file /var/log/dmesg:

    using fastest function: SPARC (57.150 MB/sec)
    esp0: IRQ 36 SCSI ID 7 Clk 40MHz CCF=8 TOut 167 NCR53C9XF(espfast)
    ESP: Total of 1 ESP hosts found, 1 actually in use.
    scsi0 : Sparc ESP100A-FAST
    scsi : 1 host.
      Vendor: SEAGATE   Model: ST5660N SUN0535    Rev: 0644
      Type:   Direct-Access                       ANSI SCSI revision: 02
    Detected scsi disk sda at scsi0, channel 0, id 1, lun 0
    esp0: AIEEE wide msg received and not HME.
    esp0: hoping for msgout
      Vendor: COMPAQ    Model: ST34501WC          Rev: AF03
      Type:   Direct-Access                       ANSI SCSI revision: 02
    Detected scsi disk sdb at scsi0, channel 0, id 3, lun 0
      Vendor: TOSHIBA   Model: XM-4101TASUNSLCD   Rev: 1084
      Type:   CD-ROM                              ANSI SCSI revision: 02
    Detected scsi CD-ROM sr0 at scsi0, channel 0, id 6, lun 0

===========================================================

Output from ---> cat /proc/scsi/esp/0

    Sparc ESP Host Adapter:
        PROM node       ffd3d230
        PROM name       esp
        ESP Model       FAS100A
        DMA Revision    Rev 2
        Live Targets    [ 1 3 6 ]

    Target #  config3   Sync Capabilities  Disconnect  Wide
    1         00000003  [2f,04]            yes         no
    3         0000000   [2f,04]            yes         no
    6         00000001  [2f,04]            yes         no

==========================================================

A few lines from ---> /var/log/messages (there are a bunch of these)

    Aug 26 04:32:28 localhost kernel: esp0: Resetting scsi bus
    Aug 26 04:32:28 localhost kernel: esp0: SCSI bus reset interrupt
    Aug 26 04:32:28 localhost kernel: esp0: SCSI bus reset interrupt
    Aug 26 04:32:28 localhost kernel: esp0: AIEEE wide msg received and not HME.
    Aug 26 04:32:28 localhost kernel: esp0: hoping for msgout
    Aug 26 04:32:28 localhost kernel: esp0: Resetting scsi bus
    Aug 26 04:32:28 localhost kernel: esp0: SCSI bus reset interrupt
    Aug 26 04:32:28 localhost kernel: esp0: SCSI bus reset interrupt
    Aug 26 04:32:29 localhost kernel: esp0: AIEEE wide msg received and not HME.
    Aug 26 04:32:29 localhost kernel: esp0: hoping for msgout
    Aug 26 04:32:29 localhost kernel: esp0: Resetting scsi bus
    Aug 26 04:32:29 localhost kernel: esp0: SCSI bus reset interrupt
    Aug 26 04:32:29 localhost kernel: esp0: SCSI bus reset interrupt
    Aug 26 04:32:29 localhost kernel: esp0: AIEEE wide msg received and not HME.
    Aug 26 04:32:29 localhost kernel: esp0: hoping for msgout

==========================================================

If you can help, please let me know. THANKS!!!

Richard Davis, Jr.

Can you try the sparc kernel that is shipped as part of 6.2?

I just downloaded and tried out the kernel from redhat6.2/sparc (kernel-2.2.14-5.0.sparc.rpm). Identical results. It boots up, gets into init, and goes OK until it starts cron, and then it starts spitting out SCSI errors:

    esp0: resetting scsi bus
    esp0: bus reset interrupt
    esp0: bus reset interrupt
    EXT2-fs error host 0 channel 0 id 4 lun=0 return code = 28000000
    Additional sense indicates logical unit not ready, cause not reportable.

The things normally started by cron do not start. For example, I get an email from cron:

    Subject: Cron <dcarter@mowgli> /home/dcarter/distributed.net/start-pproxy

    /home/dcarter/distributed.net/start-pproxy: /home/dcarter/distributed.net/proxyper-current/proxyper: Input/output error

While it completes booting, this sequence of messages:

    esp0: resetting scsi bus
    esp0: bus reset interrupt
    esp0: bus reset interrupt

continues to be printed to the console. Eventually it finishes booting and logging in works, but doing anything like 'ls' causes this error sequence to be printed again; eventually ls succeeds. I've rebooted back to an egcs-1.1.2-12 compiled kernel and all is well again.

OK, after months of work, here's what I did. The compiler that ships with Red Hat 6.0 doesn't cause SCSI errors, but it does produce an unstable kernel (random lockups, uptimes rarely reaching 6 days). Any more recent compiler causes the SCSI errors. I have the gcc 2.95 compiler from Mandrake 7.0/sparc installed at the moment, and I got the same SCSI errors with that. However, I have just tried the 2.3.99-pre8 kernel. That does not have SCSI errors, so it appears there is a bug in the SCSI driver that only shows up with recent compilers and is not present in the 2.3 kernels. You might like to give 2.3 a try yourself. I imagine it will work with the updated Red Hat versions of gcc/egcs too.

I would not think that this is "resolved" at this time. This SCSI bus reset interrupt has been causing problems ever since switching from the original Linux 6.0 kernel. It happens on Sparc 5's any time there are 2 hard drives on the system. After just loading Red Hat 6.2 it is still unresolved. When will a kernel come out that doesn't have this problem? I have multiple "spare" Sparc 5's and have tried this on a few of them. I'd even be happy to donate one to the cause if it would do any good.

Tim
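
For anyone comparing kernels on affected hardware, it may help to put a number on how often the bus is being reset. The following is only a rough sketch, assuming syslog lines in the same format as the /var/log/messages excerpt quoted in the description; the function name summarize_esp_messages and the SAMPLE data are hypothetical and not part of any kernel or Red Hat tool.

```python
import re
from collections import Counter

# Matches syslog lines shaped like the excerpt in this report, e.g.
# "Aug 26 04:32:28 localhost kernel: esp0: Resetting scsi bus"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"(?P<host>esp\d+):\s+(?P<msg>.*\S)\s*$"
)


def summarize_esp_messages(lines):
    """Count esp driver messages by text, and bus resets per timestamp."""
    by_message = Counter()
    resets_per_second = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # not an esp kernel message in the assumed format
        msg = match.group("msg").rstrip(".")
        by_message[msg] += 1
        if msg.lower() == "resetting scsi bus":
            resets_per_second[match.group("stamp")] += 1
    return by_message, resets_per_second


# A few lines copied from the excerpt in the description.
SAMPLE = """\
Aug 26 04:32:28 localhost kernel: esp0: Resetting scsi bus
Aug 26 04:32:28 localhost kernel: esp0: SCSI bus reset interrupt
Aug 26 04:32:28 localhost kernel: esp0: SCSI bus reset interrupt
Aug 26 04:32:28 localhost kernel: esp0: AIEEE wide msg received and not HME.
Aug 26 04:32:28 localhost kernel: esp0: hoping for msgout
Aug 26 04:32:28 localhost kernel: esp0: Resetting scsi bus
""".splitlines()

if __name__ == "__main__":
    messages, resets = summarize_esp_messages(SAMPLE)
    for text, count in messages.most_common():
        print(count, text)
    for stamp, count in sorted(resets.items()):
        print(stamp, "resets:", count)
```

To run it against a real log, pass an open file instead of SAMPLE, for example summarize_esp_messages(open("/var/log/messages")). Comparing the reset counts between a working and a failing kernel on the same machine would give a simple before/after figure to attach to the report.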