Bug 44438 - (SCSI AIC7XXX)Adaptec driver emitting spurious errors
Status: CLOSED WORKSFORME
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.1
Hardware: i686 Linux
Priority: high
Severity: high
Assigned To: Doug Ledford
Reported: 2001-06-13 12:26 EDT by Peter Beresford
Modified: 2005-10-31 17:00 EST (History)
Doc Type: Bug Fix
Last Closed: 2003-12-16 22:28:21 EST


Description Peter Beresford 2001-06-13 12:26:03 EDT
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)

Description of problem:
After moving three machines from 7.0 to 7.1 (a full overwrite, not an
upgrade installation), all of them emit the SCSI error messages below at
approximately four-hour intervals. All machines display exactly the same
error message:

SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 28000002
[valid = 0] Info fld=0x0, Current sd08:11: sense key Hardware Error
Additional sense indicates Internal target failure
  I/O error: dev 08:11, sector 535008
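
For readers decoding the message above, here is a minimal Python sketch. It assumes the usual Linux SCSI mid-layer convention that the hexadecimal return code packs four bytes (driver, host, message, status, from high to low) and that SCSI disks use major 8 with 16 minor numbers per disk. The function names `decode_result` and `decode_dev` are hypothetical helpers, not kernel or library APIs.

```python
def decode_result(code: int) -> dict:
    """Split a 32-bit SCSI result word into its four bytes (assumed layout)."""
    return {
        "status": code & 0xFF,          # raw SCSI status byte from the target
        "msg": (code >> 8) & 0xFF,      # SCSI message byte
        "host": (code >> 16) & 0xFF,    # host adapter / low-level driver code
        "driver": (code >> 24) & 0xFF,  # mid-layer driver and suggestion flags
    }


def decode_dev(dev: str) -> str:
    """Map a hex 'major:minor' pair on SCSI disk major 8 to a device name."""
    major, minor = (int(part, 16) for part in dev.split(":"))
    if major != 8:
        raise ValueError("this sketch only handles SCSI disk major 8")
    disk = chr(ord("a") + (minor >> 4))  # 16 minors per disk: sda, sdb, ...
    part = minor & 0x0F                  # 0 means the whole disk
    return f"/dev/sd{disk}{part if part else ''}"


fields = decode_result(0x28000002)
print({name: hex(value) for name, value in fields.items()})
print(decode_dev("08:11"))  # /dev/sdb1
```

Under these assumptions the status byte is 0x02, which is CHECK CONDITION in the SCSI standard, and the host byte is 0x00. The driver byte 0x28 has the 0x08 bit set, which the kernel headers call DRIVER_SENSE, consistent with sense data being printed. Device 08:11 works out to the first partition of the second SCSI disk, which fits the "id 1" in the message.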


How reproducible:
Always

Steps to Reproduce:
1. While machines are running, all the time

Additional info:

All machines are identical:

Dell, dual Pentium III (600 MHz), running the SMP kernel.
Dual Adaptec SCSI adapters (AIC 7890 and AIC 7860, both with
SCSI BIOS v2.01).
The 7890 adapters have 6 SCSI drives each (all Ultra2 LVD drives).
The errors come from the AIC 7890 adapter in each case.
These machines have been running Red Hat Linux 6.1 through 7.0 quite
happily for the past 18 months.
This does not appear to be disk failure, because all 3 machines are
emitting exactly the same errors.

I have opened this as High severity due to potential data loss.
Comment 1 Peter Beresford 2001-06-20 09:34:17 EDT
Additional info:
The sector and device change with each error message (I didn't say that before).
I now have a 4th machine (also a Dell PowerEdge, but a completely different
configuration) which is doing the same thing.
These machines appear to be slowly destroying the integrity of their disks.
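
To see how the affected devices and sectors vary across messages, the "I/O error" lines can be tallied per device. This is a rough Python sketch: the regular expression is modelled on the single message quoted in the description, `sectors_by_device` is a hypothetical helper name, and every sample line except the first is invented for illustration and is not taken from these machines' logs.

```python
import re
from collections import defaultdict

# Pattern modelled on the one message quoted in this report.
IO_ERROR = re.compile(r"I/O error: dev ([0-9a-f]{2}:[0-9a-f]{2}), sector (\d+)")


def sectors_by_device(lines):
    """Group the failing sector numbers by the device they were reported on."""
    seen = defaultdict(list)
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            seen[match.group(1)].append(int(match.group(2)))
    return dict(seen)


sample = [
    "  I/O error: dev 08:11, sector 535008",   # quoted in the description
    "  I/O error: dev 08:21, sector 120400",   # invented example line
    "  I/O error: dev 08:11, sector 9981120",  # invented example line
]
print(sectors_by_device(sample))
```

If the same sectors recur on one device, that points toward the media; errors scattered across devices and sectors on several machines point elsewhere.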

Comment 2 Peter Beresford 2001-06-20 09:44:27 EDT
Here's a copy of the lspci output for the main servers (they are busy
  web servers):

00:00.0 Host bridge: Intel Corporation 440GX - 82443GX Host bridge
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
	Latency: 32
	Region 0: Memory at f0000000 (32-bit, prefetchable) [size=64M]
	Capabilities: [a0] AGP version 1.0
		Status: RQ=31 SBA+ 64bit- FW- Rate=x1,x2
		Command: RQ=0 SBA- AGP- 64bit- FW- Rate=<none>

00:01.0 PCI bridge: Intel Corporation 440GX - 82443GX AGP bridge (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
	Status: Cap- 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=32
	I/O behind bridge: 0000f000-0000ffff
	Memory behind bridge: fb000000-fdffffff
	Prefetchable memory behind bridge: f6000000-f6ffffff
	BridgeCtl: Parity- SERR- NoISA- VGA+ MAbort- >Reset- FastB2B+

00:02.0 PCI bridge: Digital Equipment Corporation DECchip 21152 (rev 03) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32, cache line size 08
	Bus: primary=00, secondary=02, subordinate=02, sec-latency=32
	I/O behind bridge: 0000e000-0000efff
	Memory behind bridge: f9000000-faffffff
	Prefetchable memory behind bridge: 00000000f5000000-00000000f5f00000
	BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
	Capabilities: [dc] Power Management version 1
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=220mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
		Bridge: PM- B3+

00:07.0 ISA bridge: Intel Corporation 82371AB PIIX4 ISA (rev 02)
	Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 0

00:07.1 IDE interface: Intel Corporation 82371AB PIIX4 IDE (rev 01) (prog-if 80 [Master])
	Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Region 4: I/O ports at 1000 [size=16]

00:07.2 USB Controller: Intel Corporation 82371AB PIIX4 USB (rev 01) (prog-if 00 [UHCI])
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Interrupt: pin D routed to IRQ 0
	Region 4: I/O ports at 1020 [disabled] [size=32]

00:07.3 Bridge: Intel Corporation 82371AB PIIX4 ACPI (rev 02)
	Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Interrupt: pin ? routed to IRQ 9

00:08.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 08)
	Subsystem: Intel Corporation EtherExpress PRO/100+ Server Adapter (PILA8470B)
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32 (2000ns min, 14000ns max), cache line size 08
	Interrupt: pin A routed to IRQ 20
	Region 0: Memory at fe201000 (32-bit, non-prefetchable) [size=4K]
	Region 1: I/O ports at dcc0 [size=64]
	Region 2: Memory at fe100000 (32-bit, non-prefetchable) [size=1M]
	Capabilities: [dc] Power Management version 2
		Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=2 PME-

00:0a.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 08)
	Subsystem: Intel Corporation EtherExpress PRO/100+ Server Adapter (PILA8470B)
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32 (2000ns min, 14000ns max), cache line size 08
	Interrupt: pin A routed to IRQ 21
	Region 0: Memory at fe200000 (32-bit, non-prefetchable) [size=4K]
	Region 1: I/O ports at dc80 [size=64]
	Region 2: Memory at fe000000 (32-bit, non-prefetchable) [size=1M]
	Capabilities: [dc] Power Management version 2
		Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=2 PME-

01:00.0 VGA compatible controller: ATI Technologies Inc 3D Rage Pro AGP 1X (rev 5c) (prog-if 00 [VGA])
	Subsystem: Dell Computer Corporation: Unknown device 007c
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop+ ParErr- Stepping+ SERR- FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32 (2000ns min), cache line size 08
	Region 0: Memory at fc000000 (32-bit, non-prefetchable) [size=16M]
	Region 1: I/O ports at fc00 [size=256]
	Region 2: Memory at fbfff000 (32-bit, non-prefetchable) [size=4K]
	Expansion ROM at <unassigned> [disabled] [size=128K]
	Capabilities: [50] AGP version 1.0
		Status: RQ=255 SBA+ 64bit- FW- Rate=x1
		Command: RQ=0 SBA- AGP- 64bit- FW- Rate=<none>

02:04.0 SCSI storage controller: Adaptec AHA-2940U2/W / 7890
	Subsystem: Dell Computer Corporation: Unknown device 007c
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32 (9750ns min, 6250ns max), cache line size 08
	Interrupt: pin A routed to IRQ 16
	BIST result: 00
	Region 0: I/O ports at ec00 [size=256]
	Region 1: Memory at f9fff000 (64-bit, non-prefetchable) [size=4K]
	Expansion ROM at fa000000 [disabled] [size=128K]
	Capabilities: [dc] Power Management version 1
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

02:06.0 SCSI storage controller: Adaptec AIC-7860 (rev 03)
	Subsystem: Adaptec: Unknown device 7860
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32 (1000ns min, 1000ns max), cache line size 08
	Interrupt: pin A routed to IRQ 16
	Region 0: I/O ports at e800 [size=256]
	Region 1: Memory at f9ffe000 (32-bit, non-prefetchable) [size=4K]
	Capabilities: [dc] Power Management version 1
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

Comment 3 enedeen 2001-06-20 13:19:14 EDT
We are noticing the same errors with the (single) machine that we
upgraded to 7.1, but at a less frequent rate.  We see approximately one
of these messages each day.
We have a Dell PowerEdge 1400SC.  Ours is a single-processor version.
Let me know if you need additional information.

Ene.
Comment 4 Brent Fox 2001-06-21 19:56:48 EDT
It seems that the installer is working ok.  This seems like a kernel issue.
Comment 5 Peter Beresford 2001-07-05 02:51:24 EDT
Do I need to re-assign this problem?  There has been no update in the past
couple of weeks.  This was opened on 6-13; has anyone even begun to look
at this yet?  Do you need more info?
Any feedback would be helpful!
Comment 6 Doug Ledford 2001-08-24 14:27:58 EDT
I'm unable to reproduce this problem (and the specific error message you have
cited is not something that is generated by the aic7xxx driver).  I'm inclined
to think that there is some compatibility problem between your machine and the
2.4 Linux kernel that shows up at a regular interval.  I'm assuming there is
some command being run every four hours that is triggering the problem.  If you
could find what command is causing it, then we might be able to isolate the problem.
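
One way to start looking for the triggering command is to pull the error timestamps out of the system log and compare the gaps with the cron schedule. The Python sketch below assumes the classic syslog line prefix ("Jun 13 12:26:03 host kernel: ...") and an English locale for month names. The helper names `error_times` and `gaps_in_minutes` are hypothetical, and the sample lines are invented for illustration; they are not taken from the reporter's logs.

```python
from datetime import datetime


def error_times(lines, year=2001, marker="SCSI disk error"):
    """Return the timestamps of syslog lines that contain the marker text."""
    times = []
    for line in lines:
        if marker in line:
            # The first three fields are month, day and time of day.
            stamp = " ".join(line.split()[:3])
            times.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return times


def gaps_in_minutes(times):
    """Return the time between consecutive errors, in minutes."""
    return [(later - earlier).total_seconds() / 60
            for earlier, later in zip(times, times[1:])]


# Invented sample data, in the assumed syslog format.
sample = [
    "Jun 13 04:02:10 web1 kernel: SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 28000002",
    "Jun 13 04:02:10 web1 kernel:  I/O error: dev 08:11, sector 535008",
    "Jun 13 08:02:10 web1 kernel: SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 28000002",
    "Jun 13 12:03:10 web1 kernel: SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 28000002",
]
print(gaps_in_minutes(error_times(sample)))
```

If the gaps cluster tightly around a fixed period and the timestamps line up with a scheduled job, that job is a reasonable first suspect.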
