Bug 164970

Summary: PCI error interrupt in /var/log/messages (aic7xxx)
Product: Red Hat Enterprise Linux 3
Reporter: Steven Roberts <strobert>
Component: kernel
Assignee: Tom Coughlan <coughlan>
Status: CLOSED WONTFIX
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 3.0
CC: jparadis, petrides
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2007-10-19 18:56:39 UTC

Description Steven Roberts 2005-08-03 06:19:29 UTC
I am seeing something similar to bug 140311, but I am on x86_64.

I got a report from our Oracle DBAs that they saw a 'PCI error Interrupt'
message in /var/log/messages.

kernel: 2.4.21-20.ELsmp
system: quad Opteron 248s with 32 GB RAM

scsi0 and scsi1 are an internal aic79xx controller (OS and Oracle software).
scsi2 and scsi3 are a dual QLogic HBA connected to our SAN storage.

An example from /var/log/messages:
Aug  2 05:13:17 db09 kernel: scsi0: PCI error Interrupt
Aug  2 05:13:17 db09 kernel: scsi1: PCI error Interrupt
Aug  2 05:13:17 db09 kernel: >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
Aug  2 05:13:17 db09 kernel: scsi1: Dumping Card State at program address 0x26 Mode 0x22
Aug  2 05:13:17 db09 kernel: Card was paused
Aug  2 05:13:17 db09 kernel: HS_MAILBOX[0x0] INTCTL[0x0] SEQINTSTAT[0x0] SAVED_MODE[0x0]
Aug  2 05:13:17 db09 kernel: DFFSTAT[0x33] SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0]
Aug  2 05:13:17 db09 kernel: LASTPHASE[0x1] SCSISEQ0[0x0] SCSISEQ1[0x12] SEQCTL0[0x0]
Aug  2 05:13:17 db09 kernel: SEQINTCTL[0x0] SEQ_FLAGS[0x0] SEQ_FLAGS2[0x0] SSTAT0[0x0]
Aug  2 05:13:17 db09 kernel: SSTAT1[0x0] SSTAT2[0x0] SSTAT3[0x0] PERRDIAG[0x0]
Aug  2 05:13:17 db09 kernel: SIMODE1[0xac] LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x0]
Aug  2 05:13:17 db09 kernel: LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x0]
Aug  2 05:13:17 db09 kernel:
Aug  2 05:13:18 db09 kernel: SCB Count = 8 CMDS_PENDING = 0 LASTSCB 0xffff CURRSCB 0x7 NEXTSCB 0x0
Aug  2 05:13:18 db09 kernel: qinstart = 30 qinfifonext = 30
Aug  2 05:13:18 db09 kernel: QINFIFO:
Aug  2 05:13:18 db09 kernel: WAITING_TID_QUEUES:
Aug  2 05:13:18 db09 kernel: Pending list:
Aug  2 05:13:18 db09 kernel: Total 0
Aug  2 05:13:18 db09 kernel: Kernel Free SCB list: 7 6 5 4 3 2 1 0
Aug  2 05:13:18 db09 kernel: Sequencer Complete DMA-inprog list:
Aug  2 05:13:18 db09 kernel: Sequencer Complete list:
Aug  2 05:13:18 db09 kernel: Sequencer DMA-Up and Complete list:
Aug  2 05:13:18 db09 kernel:
Aug  2 05:13:18 db09 kernel: scsi1: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x0
Aug  2 05:13:18 db09 kernel: SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
Aug  2 05:13:18 db09 kernel: SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
Aug  2 05:13:18 db09 kernel: SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
Aug  2 05:13:18 db09 kernel: HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
Aug  2 05:13:18 db09 kernel: scsi1: FIFO1 Free, LONGJMP == 0x80ff, SCB 0x0
Aug  2 05:13:18 db09 kernel: SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
Aug  2 05:13:18 db09 kernel: SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
Aug  2 05:13:18 db09 kernel: SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
Aug  2 05:13:18 db09 kernel: HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
Aug  2 05:13:18 db09 kernel: LQIN: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
Aug  2 05:13:18 db09 kernel: scsi1: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x52
Aug  2 05:13:18 db09 kernel: scsi1: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x0
Aug  2 05:13:18 db09 kernel: SIMODE0[0xc]
Aug  2 05:13:18 db09 kernel: CCSCBCTL[0x4]
Aug  2 05:13:18 db09 kernel: scsi1: REG0 == 0x7, SINDEX = 0x11d, DINDEX = 0x120
Aug  2 05:13:18 db09 kernel: scsi1: SCBPTR == 0x7, SCB_NEXT == 0xff00, SCB_NEXT2 == 0xff02
Aug  2 05:13:18 db09 kernel: CDB 12 0 0 0 ff 0
Aug  2 05:13:18 db09 kernel: STACK: 0x14 0x0 0x0 0x0 0x0 0x0 0x0 0x0
Aug  2 05:13:18 db09 kernel: <<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
Aug  2 05:13:18 db09 kernel: scsi1: Address or Write Phase Parity Error Detected in TARG.
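
Most of the dump consists of NAME[0xVALUE] register pairs. A minimal Python sketch for pulling them into a dictionary, so that the nonzero registers stand out, might look like the following. The function name parse_registers and the regular expression are illustrative assumptions, not part of the driver or of any Red Hat tool, and the sample lines are copied from the log above.

```python
import re

# Matches one "NAME[0xVALUE]" pair as printed in the card-state dump.
REGISTER_RE = re.compile(r"([A-Z][A-Z0-9_]*)\[0x([0-9a-fA-F]+)\]")


def parse_registers(lines):
    """Collect register name -> integer value from dump lines.

    If a register is printed more than once (for example once per FIFO),
    the last value seen wins.
    """
    registers = {}
    for line in lines:
        for name, value in REGISTER_RE.findall(line):
            registers[name] = int(value, 16)
    return registers


# Sample lines taken from the log in this report.
sample = [
    "Aug  2 05:13:17 db09 kernel: DFFSTAT[0x33] SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0]",
    "Aug  2 05:13:17 db09 kernel: SSTAT1[0x0] SSTAT2[0x0] SSTAT3[0x0] PERRDIAG[0x0]",
]

regs = parse_registers(sample)
# Keep only registers with a nonzero value, formatted back as hex strings.
nonzero = {name: hex(value) for name, value in regs.items() if value}
print(nonzero)
```

Because each pair is matched on its own, the same function also works on lines that a mail client or web page has wrapped. Values such as "SHADDR = 0x00" use a different notation and are deliberately not captured here.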

Interestingly, there are no drives attached to scsi1, only to scsi0 and
scsi2/3.

(Filing here because RH support told me to file bugs here first and then create
a support case in the RH support tool if I need a priority assigned to it.)

Comment 1 Ernie Petrides 2005-08-03 20:38:31 UTC
Most of the log messages above come from ahd_dump_card_state() in the aic79xx
SCSI driver (its source lives under the kernel's "aic7xxx" driver directory).

Comment 2 Tom Coughlan 2005-08-10 14:49:11 UTC
The aic79xx driver is called to process an interrupt. The driver determines it
is a PCI error (in ahd_pci_intr).

ahd_pci_intr prints "PCI error Interrupt", then dumps the state of the HBA.
(This is an extremely verbose output that the driver dumps routinely. It is
useless to anyone other than an aic79xx firmware/hardware expert). Next,
ahd_pci_intr determines that the error type is "Address or Write Phase Parity
Error Detected" and the "pci_status_source" is TARG. As far as I can tell, TARG
is a reference to a SCSI target. This is relevant because you (and the reporter
in bug 140311) do not have any SCSI targets attached.

This is apparently just noise caused by the fact that there is no attached SCSI
bus. It would be ideal for us to determine the source of this and stop it, but
as far as I can see, it is not a critical or dangerous situation. 
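
To check whether these events only ever hit adapters with nothing attached, the message sequence described above (the "PCI error Interrupt" line, the state dump, then the error type and source) can be summarized straight from /var/log/messages. The following Python sketch is one way to do that. The function name summarize and both regular expressions are assumptions based only on the log lines quoted in this report, not on the driver source.

```python
import re
from collections import Counter

# "kernel: scsi1: PCI error Interrupt" -> adapter name
INTERRUPT_RE = re.compile(r"kernel: (scsi\d+): PCI error Interrupt")

# "scsi1: <error text> Detected in TARG." -> adapter, error text, source.
# \s+ before "in" tolerates a line wrap inserted by a mail client or web page.
DETAIL_RE = re.compile(r"(scsi\d+): (.+?) Detected\s+in (\w+)\.")


def summarize(log_text):
    """Return (interrupt count per adapter, list of decoded error details)."""
    interrupts = Counter(INTERRUPT_RE.findall(log_text))
    details = DETAIL_RE.findall(log_text)  # (adapter, error text, source)
    return interrupts, details


# Sample lines taken from the log in this report.
sample = (
    "Aug  2 05:13:17 db09 kernel: scsi0: PCI error Interrupt\n"
    "Aug  2 05:13:17 db09 kernel: scsi1: PCI error Interrupt\n"
    "Aug  2 05:13:18 db09 kernel: scsi1: Address or Write Phase Parity Error"
    " Detected in TARG.\n"
)

interrupts, details = summarize(sample)
for adapter, count in sorted(interrupts.items()):
    print(adapter, count)
for adapter, error, source in details:
    print(adapter, "|", error, "|", source)
```

On a live system the text would come from reading /var/log/messages rather than from the inline sample. Comparing the adapters reported here against the ones that actually have devices attached is left to the reader.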



Comment 3 RHEL Program Management 2007-10-19 18:56:39 UTC
This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission-critical
bug fixes will be released for enterprise products. Since this bug does not
meet those criteria, it is now being closed.

For more information on the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/
 
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.