I am seeing something similar to bug 140311, but I am on x86_64. We got a report from our Oracle DBAs that they saw a 'PCI error Interrupt' message in /var/log/messages.

kernel: 2.4.21-20.ELsmp
system: quad Opteron 248s w/32GB RAM

scsi0 and scsi1 are an internal aic79xx controller (OS and Oracle software). scsi2 and scsi3 are a dual QLogic HBA hooked into our SAN storage.

An example from /var/log/messages:

Aug 2 05:13:17 db09 kernel: scsi0: PCI error Interrupt
Aug 2 05:13:17 db09 kernel: scsi1: PCI error Interrupt
Aug 2 05:13:17 db09 kernel: >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
Aug 2 05:13:17 db09 kernel: scsi1: Dumping Card State at program address 0x26 Mode 0x22
Aug 2 05:13:17 db09 kernel: Card was paused
Aug 2 05:13:17 db09 kernel: HS_MAILBOX[0x0] INTCTL[0x0] SEQINTSTAT[0x0] SAVED_MODE[0x0]
Aug 2 05:13:17 db09 kernel: DFFSTAT[0x33] SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0]
Aug 2 05:13:17 db09 kernel: LASTPHASE[0x1] SCSISEQ0[0x0] SCSISEQ1[0x12] SEQCTL0[0x0]
Aug 2 05:13:17 db09 kernel: SEQINTCTL[0x0] SEQ_FLAGS[0x0] SEQ_FLAGS2[0x0] SSTAT0[0x0]
Aug 2 05:13:17 db09 kernel: SSTAT1[0x0] SSTAT2[0x0] SSTAT3[0x0] PERRDIAG[0x0]
Aug 2 05:13:17 db09 kernel: SIMODE1[0xac] LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x0]
Aug 2 05:13:17 db09 kernel: LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x0]
Aug 2 05:13:17 db09 kernel:
Aug 2 05:13:18 db09 kernel: SCB Count = 8 CMDS_PENDING = 0 LASTSCB 0xffff CURRSCB 0x7 NEXTSCB 0x0
Aug 2 05:13:18 db09 kernel: qinstart = 30 qinfifonext = 30
Aug 2 05:13:18 db09 kernel: QINFIFO:
Aug 2 05:13:18 db09 kernel: WAITING_TID_QUEUES:
Aug 2 05:13:18 db09 kernel: Pending list:
Aug 2 05:13:18 db09 kernel: Total 0
Aug 2 05:13:18 db09 kernel: Kernel Free SCB list: 7 6 5 4 3 2 1 0
Aug 2 05:13:18 db09 kernel: Sequencer Complete DMA-inprog list:
Aug 2 05:13:18 db09 kernel: Sequencer Complete list:
Aug 2 05:13:18 db09 kernel: Sequencer DMA-Up and Complete list:
Aug 2 05:13:18 db09 kernel:
Aug 2 05:13:18 db09 kernel: scsi1: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x0
Aug 2 05:13:18 db09 kernel: SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
Aug 2 05:13:18 db09 kernel: SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
Aug 2 05:13:18 db09 kernel: SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
Aug 2 05:13:18 db09 kernel: HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
Aug 2 05:13:18 db09 kernel: scsi1: FIFO1 Free, LONGJMP == 0x80ff, SCB 0x0
Aug 2 05:13:18 db09 kernel: SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
Aug 2 05:13:18 db09 kernel: SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
Aug 2 05:13:18 db09 kernel: SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
Aug 2 05:13:18 db09 kernel: HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
Aug 2 05:13:18 db09 kernel: LQIN: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
Aug 2 05:13:18 db09 kernel: scsi1: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x52
Aug 2 05:13:18 db09 kernel: scsi1: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x0
Aug 2 05:13:18 db09 kernel: SIMODE0[0xc]
Aug 2 05:13:18 db09 kernel: CCSCBCTL[0x4]
Aug 2 05:13:18 db09 kernel: scsi1: REG0 == 0x7, SINDEX = 0x11d, DINDEX = 0x120
Aug 2 05:13:18 db09 kernel: scsi1: SCBPTR == 0x7, SCB_NEXT == 0xff00, SCB_NEXT2 == 0xff02
Aug 2 05:13:18 db09 kernel: CDB 12 0 0 0 ff 0
Aug 2 05:13:18 db09 kernel: STACK: 0x14 0x0 0x0 0x0 0x0 0x0 0x0 0x0
Aug 2 05:13:18 db09 kernel: <<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
Aug 2 05:13:18 db09 kernel: scsi1: Address or Write Phase Parity Error Detected in TARG.

Now, an interesting thing is that there are no drives hooked to scsi1, only to scsi0 and scsi2/3.

(Filing here as RH support has told me to file bugs here and then create a support case in the RH support tool if I need to get priority assigned to it.)
Most of the log messages above come from ahd_dump_card_state() in the aic79xx SCSI driver.
The aic79xx driver is called to process an interrupt. The driver determines that it is a PCI error (in ahd_pci_intr). ahd_pci_intr prints "PCI error Interrupt", then dumps the state of the HBA. (This output is extremely verbose, and the driver dumps it routinely. It is useless to anyone other than an aic79xx firmware/hardware expert.) Next, ahd_pci_intr determines that the error type is "Address or Write Phase Parity Error Detected" and that the "pci_status_source" is TARG. As far as I can tell, TARG is a reference to a SCSI target. This is relevant because you (and the reporter in bug 140311) do not have any SCSI targets attached. This is apparently just noise caused by the fact that there is no attached SCSI bus. It would be ideal for us to determine the source of this and stop it, but as far as I can see, it is not a critical or dangerous situation.
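Since the card state dump drowns out the two lines that actually matter (the "PCI error Interrupt" notice and the error type), here is a minimal Python sketch for pulling just those lines out of /var/log/messages. It assumes the log looks like the excerpt in this report: the dump is bracketed by "Dump Card State Begins" and "Dump Card State Ends", and the interesting lines are tagged "kernel: scsiN:". The function name summarize_aic79xx is made up for illustration and is not part of any tool.

```python
import re

# Markers taken from the log excerpt in this report.
DUMP_BEGIN = "Dump Card State Begins"
DUMP_END = "Dump Card State Ends"

# Assumption: per-controller messages are tagged "kernel: scsiN: ...".
SCSI_LINE = re.compile(r"kernel: scsi\d+: ")


def summarize_aic79xx(lines):
    """Return scsiN kernel messages that fall outside a card state dump."""
    in_dump = False
    kept = []
    for line in lines:
        if DUMP_BEGIN in line:
            in_dump = True      # start skipping the register dump
            continue
        if DUMP_END in line:
            in_dump = False     # dump finished, resume collecting
            continue
        if not in_dump and SCSI_LINE.search(line):
            kept.append(line.rstrip("\n"))
    return kept


if __name__ == "__main__":
    # Path is the one mentioned in this report; adjust as needed.
    with open("/var/log/messages", errors="replace") as log:
        for entry in summarize_aic79xx(log):
            print(entry)
```

Run against the excerpt above, this should leave only the two "PCI error Interrupt" lines and the "Address or Write Phase Parity Error Detected in TARG." line, which makes it easier to see how often the error recurs and on which controller.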
This bug is filed against RHEL 3, which is in maintenance phase. During the maintenance phase, only security errata and select mission-critical bug fixes will be released for enterprise products. Since this bug does not meet those criteria, it is now being closed. For more information on the RHEL errata support policy, please visit: http://www.redhat.com/security/updates/errata/ If you feel this bug is indeed mission critical, please contact your support representative. You may be asked to provide detailed information on how this bug is affecting you.