164970 – PCI error interrupt in /var/log/messages (aic7xxx)

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 3
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	3.0
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Tom Coughlan
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2005-08-03 06:19 UTC by Steven Roberts
Modified:	2007-11-30 22:07 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2007-10-19 18:56:39 UTC
Target Upstream Version:
Embargoed:

Description Steven Roberts 2005-08-03 06:19:29 UTC

I am seeing something similar to bug 140311, but I am on x86_64.

Got a report from our Oracle DBAs that they saw a 'PCI error Interrupt' error
message in /var/log/messages.

kernel: 2.4.21-20.ELsmp
system: quad Opteron 248's w/32GB RAM

scsi0 and 1 are an internal aic79xx controller (OS and oracle software).  scsi2
and 3 are a dual qlogic HBA hooked into our SAN storage.

an example from /var/log/messages:
Aug  2 05:13:17 db09 kernel: scsi0: PCI error Interrupt
Aug  2 05:13:17 db09 kernel: scsi1: PCI error Interrupt
Aug  2 05:13:17 db09 kernel: >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
Aug  2 05:13:17 db09 kernel: scsi1: Dumping Card State at program address 0x26 Mode 0x22
Aug  2 05:13:17 db09 kernel: Card was paused
Aug  2 05:13:17 db09 kernel: HS_MAILBOX[0x0] INTCTL[0x0] SEQINTSTAT[0x0] SAVED_MODE[0x0]
Aug  2 05:13:17 db09 kernel: DFFSTAT[0x33] SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0]
Aug  2 05:13:17 db09 kernel: LASTPHASE[0x1] SCSISEQ0[0x0] SCSISEQ1[0x12] SEQCTL0[0x0]
Aug  2 05:13:17 db09 kernel: SEQINTCTL[0x0] SEQ_FLAGS[0x0] SEQ_FLAGS2[0x0] SSTAT0[0x0]
Aug  2 05:13:17 db09 kernel: SSTAT1[0x0] SSTAT2[0x0] SSTAT3[0x0] PERRDIAG[0x0]
Aug  2 05:13:17 db09 kernel: SIMODE1[0xac] LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x0]
Aug  2 05:13:17 db09 kernel: LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x0]
Aug  2 05:13:17 db09 kernel: 
Aug  2 05:13:18 db09 kernel: SCB Count = 8 CMDS_PENDING = 0 LASTSCB 0xffff CURRSCB 0x7 NEXTSCB 0x0
Aug  2 05:13:18 db09 kernel: qinstart = 30 qinfifonext = 30
Aug  2 05:13:18 db09 kernel: QINFIFO:
Aug  2 05:13:18 db09 kernel: WAITING_TID_QUEUES:
Aug  2 05:13:18 db09 kernel: Pending list:
Aug  2 05:13:18 db09 kernel: Total 0
Aug  2 05:13:18 db09 kernel: Kernel Free SCB list: 7 6 5 4 3 2 1 0 
Aug  2 05:13:18 db09 kernel: Sequencer Complete DMA-inprog list: 
Aug  2 05:13:18 db09 kernel: Sequencer Complete list: 
Aug  2 05:13:18 db09 kernel: Sequencer DMA-Up and Complete list: 
Aug  2 05:13:18 db09 kernel: 
Aug  2 05:13:18 db09 kernel: scsi1: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x0
Aug  2 05:13:18 db09 kernel: SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
Aug  2 05:13:18 db09 kernel: SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
Aug  2 05:13:18 db09 kernel: SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
Aug  2 05:13:18 db09 kernel: HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
Aug  2 05:13:18 db09 kernel: scsi1: FIFO1 Free, LONGJMP == 0x80ff, SCB 0x0
Aug  2 05:13:18 db09 kernel: SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
Aug  2 05:13:18 db09 kernel: SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
Aug  2 05:13:18 db09 kernel: SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
Aug  2 05:13:18 db09 kernel: HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
Aug  2 05:13:18 db09 kernel: LQIN: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
Aug  2 05:13:18 db09 kernel: scsi1: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x52
Aug  2 05:13:18 db09 kernel: scsi1: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x0
Aug  2 05:13:18 db09 kernel: SIMODE0[0xc]
Aug  2 05:13:18 db09 kernel: CCSCBCTL[0x4]
Aug  2 05:13:18 db09 kernel: scsi1: REG0 == 0x7, SINDEX = 0x11d, DINDEX = 0x120
Aug  2 05:13:18 db09 kernel: scsi1: SCBPTR == 0x7, SCB_NEXT == 0xff00, SCB_NEXT2 == 0xff02
Aug  2 05:13:18 db09 kernel: CDB 12 0 0 0 ff 0
Aug  2 05:13:18 db09 kernel: STACK: 0x14 0x0 0x0 0x0 0x0 0x0 0x0 0x0
Aug  2 05:13:18 db09 kernel: <<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
Aug  2 05:13:18 db09 kernel: scsi1: Address or Write Phase Parity Error Detected in TARG.

Now an interesting thing is that there are no drives hooked to scsi1, only scsi0
and scsi2/3.

(Filing here as RH support has told me to file bugs here and then create a
support case in the RH support tool if I need to get priority assigned to it.)

Comment 1 Ernie Petrides 2005-08-03 20:38:31 UTC

Most of the log messages above come from ahd_dump_card_state()
in the "aic7xxx" SCSI driver.

Comment 2 Tom Coughlan 2005-08-10 14:49:11 UTC

The aic79xx driver is called to process an interrupt. The driver determines it
is a PCI error (in ahd_pci_intr).

ahd_pci_intr prints "PCI error Interrupt", then dumps the state of the HBA.
(This is an extremely verbose output that the driver dumps routinely. It is
useless to anyone other than an aic79xx firmware/hardware expert). Next,
ahd_pci_intr determines that the error type is "Address or Write Phase Parity
Error Detected" and the "pci_status_source" is TARG. As far as I can tell, TARG
is a reference to a SCSI target. This is relevant because you (and the reporter
in bug 140311) do not have any SCSI targets attached.
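
The error-type step described above amounts to mapping set bits in a status
value to messages. A rough Python sketch follows, using the error bits of the
standard PCI Status register as a stand-in. The aic79xx driver reads its own
per-source status registers and has its own message strings, so the table and
the name decode_pci_status are illustrative assumptions, not the driver's code.

```python
# Error bits of the standard PCI Status register (configuration offset 0x06).
# Assumption: used here only as a generic example of bit-to-message decoding;
# the aic79xx driver uses its own registers and strings.
PCI_STATUS_ERROR_BITS = {
    0x8000: "Detected Parity Error",
    0x4000: "Signaled System Error",
    0x2000: "Received Master Abort",
    0x1000: "Received Target Abort",
    0x0800: "Signaled Target Abort",
    0x0100: "Master Data Parity Error",
}


def decode_pci_status(status):
    """Return the names of all error bits set in a 16-bit PCI status value."""
    return [name for bit, name in PCI_STATUS_ERROR_BITS.items() if status & bit]


if __name__ == "__main__":
    # Hypothetical status value with two error bits set.
    for message in decode_pci_status(0x9000):
        print(message)
```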

This is apparently just noise caused by the fact that there is no attached SCSI
bus. It would be ideal for us to determine the source of this and stop it, but
as far as I can see, it is not a critical or dangerous situation.
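
For anyone who wants to see only the meaningful lines, the card-state dump can
be filtered out of the log. The following is a minimal Python sketch, not part
of the driver or any Red Hat tool. The function name summarize_pci_errors and
the regular expression are assumptions based on the syslog format shown in the
description, and it expects each kernel message on a single unwrapped line.

```python
import re

DUMP_BEGIN = "Dump Card State Begins"
DUMP_END = "Dump Card State Ends"

# Assumed syslog layout: "Aug  2 05:13:17 db09 kernel: <message>"
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+[\d:]+)\s+(\S+)\s+kernel:\s?(.*)$")


def summarize_pci_errors(lines):
    """Return (timestamp, host, message) tuples for PCI error messages,
    skipping everything between the dump begin and end markers."""
    kept = []
    in_dump = False
    for raw in lines:
        match = LINE_RE.match(raw.rstrip("\n"))
        if not match:
            continue  # not a kernel syslog line
        timestamp, host, message = match.groups()
        if DUMP_BEGIN in message:
            in_dump = True
            continue
        if DUMP_END in message:
            in_dump = False
            continue
        if in_dump:
            continue
        if "PCI error Interrupt" in message or "Parity Error" in message:
            kept.append((timestamp, host, message.strip()))
    return kept


if __name__ == "__main__":
    with open("/var/log/messages") as log:
        for timestamp, host, message in summarize_pci_errors(log):
            print(timestamp, host, message)
```

Run against the log in the description, this should keep only the two
"PCI error Interrupt" lines and the final parity error line for scsi1.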

Comment 3 RHEL Program Management 2007-10-19 18:56:39 UTC

This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet those criteria, it is now being closed.
 
For more information on the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/
 
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.
