Bug 197596

Summary: aic7xxx driver causes 'Infinite interrupt loop'.
Product: Red Hat Enterprise Linux 3
Component: kernel
Version: 3.0
Hardware: i686
OS: Linux
Status: CLOSED CANTFIX
Severity: high
Priority: medium
Reporter: Chris Gilbert <chris.gilbert>
Assignee: Red Hat Kernel Manager <kernel-mgr>
QA Contact: Brian Brock <bbrock>
CC: petrides
Doc Type: Bug Fix
Last Closed: 2007-09-13 12:55:54 UTC

Description Chris Gilbert 2006-07-04 13:00:18 UTC
Description of problem:

The aic7xxx driver causes an 'Infinite interrupt loop'.  This happens when using
Netbackup 5.1 with the latest updates.  Backups normally succeed, but they
often fail for no discernible cause (Netbackup throws an error, 'unable to
contact storage device').  It appears to be a SCSI-related issue, and indeed we
have already had several SCSI-related problems with earlier kernels. The errors
suggest this driver is the cause.

Version-Release number of selected component (if applicable):

Kernel: 2.4.21-40ELsmp

/proc/scsi/aic7xxx/0:

Adaptec AIC7xxx driver version: 6.2.36
Adaptec aic7892 Ultra160 SCSI adapter
aic7892: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
Allocated SCBs: 6, SG List Length: 85
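
When the same check has to be repeated on several hosts, the driver version can
be pulled out of the /proc text and compared numerically. The following is a
minimal Python sketch, not part of the original report. It assumes the
"driver version: X.Y.Z" line format shown above; the function name is
hypothetical, and 6.3.9 is used only because it is the source release mentioned
later in this report.

```python
import re


def aic7xxx_driver_version(proc_text):
    """Return the driver version as a tuple of ints, e.g. (6, 2, 36)."""
    match = re.search(r"driver version:\s*(\d+(?:\.\d+)*)", proc_text)
    if match is None:
        raise ValueError("driver version line not found")
    return tuple(int(part) for part in match.group(1).split("."))


if __name__ == "__main__":
    # Sample text copied from the /proc/scsi/aic7xxx/0 output in this report.
    sample = (
        "Adaptec AIC7xxx driver version: 6.2.36\n"
        "Adaptec aic7892 Ultra160 SCSI adapter\n"
    )
    installed = aic7xxx_driver_version(sample)
    # Tuples compare element by element, so (6, 2, 36) sorts before (6, 3, 9).
    print(installed, installed < (6, 3, 9))
```

Comparing tuples of integers avoids the string-comparison trap where "6.2.36"
would sort after "6.10.0".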


How reproducible:

Difficult to reproduce.  It only occurs after backups to a tape changer
(Dell 110T) have been running correctly for some time, but it recurs at least
weekly.  We are using the latest patches from RHN.

I have attempted to install newer Adaptec drivers (from the RPM here
http://www.adaptec.com/en-US/downloads/rh/rhel_3?productId=ASC-39160&dn=Adaptec+SCSI+Card+39160
) on a test server of the same hardware configuration, but there are no kernel
modules available for our kernel version.

I will attempt to install the source distribution for 6.3.9 (at
http://www.adaptec.com/en-US/downloads/rh/rhel_3?productId=ASC-39160&dn=Adaptec+SCSI+Card+39160)
if necessary, but I haven't had a lot of luck with that yet.

Steps to Reproduce:

Not yet able to reproduce at will.
  
Additional info:

The output from dmesg is as follows:


scsi2:0:5:0: Cmd aborted from QINFIFO
aic7xxx_abort returns 0x2002
scsi2:0:5:0: Attempting to queue a TARGET RESET message
CDB: 0x12 0x0 0x0 0x0 0x60 0x0
scsi2:0:5:0: Command not found
aic7xxx_dev_reset returns 0x2002
scsi2:0:5:0: Attempting to queue an ABORT message
CDB: 0x0 0x0 0x0 0x0 0x0 0x0
Infinite interrupt loop, INTSTAT = 0scsi2: At time of recovery, card was not paused
>>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
scsi2: Dumping Card State while idle, at SEQADDR 0x18
Card was paused
ACCUM = 0x3, SINDEX = 0x48, DINDEX = 0xe4, ARG_2 = 0x0
HCNT = 0x0 SCBPTR = 0x1
SCSIPHASE[0x0] SCSISIGI[0x18] ERROR[0x0] SCSIBUSL[0x0]
LASTPHASE[0x1] SCSISEQ[0x1a] SBLKCTL[0xa] SCSIRATE[0x0]
SEQCTL[0x10] SEQ_FLAGS[0xc0] SSTAT0[0x10] SSTAT1[0x8]
SSTAT2[0x0] SSTAT3[0x0] SIMODE0[0x8] SIMODE1[0xac]
SXFRCTL0[0x80] DFCNTRL[0x0] DFSTATUS[0x89]
STACK: 0x0 0x16a 0x17f 0x17
SCB count = 6
Kernel NEXTQSCB = 5
Card NEXTQSCB = 3
QINFIFO entries: 3
Waiting Queue entries:
Disconnected Queue entries:
QOUTFIFO entries:
Sequencer Free SCB List: 1 0 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
Sequencer SCB Info:
  0 SCB_CONTROL[0xc0] SCB_SCSIID[0x67] SCB_LUN[0x0] SCB_TAG[0xff]
  1 SCB_CONTROL[0x0] SCB_SCSIID[0x57] SCB_LUN[0x0] SCB_TAG[0xff]
  2 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
  3 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
  4 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
  5 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
  6 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
  7 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
  8 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
  9 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 10 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 11 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 12 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 13 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 14 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 15 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 16 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 17 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 18 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 19 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 20 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 21 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 22 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 23 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 24 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 25 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 26 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 27 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 28 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 29 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 30 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 31 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Pending list:
  3 SCB_CONTROL[0x40] SCB_SCSIID[0x57] SCB_LUN[0x0]
Kernel Free SCB list: 4 2 1 0
Untagged Q(5): 3
DevQ(0:0:0): 0 waiting
DevQ(0:5:0): 0 waiting
DevQ(0:6:0): 0 waiting

<<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
scsi2:0:5:0: Cmd aborted from QINFIFO
aic7xxx_abort returns 0x2002
st0: Error with sense data: Current st09:00: sense key Illegal Request
Additional sense indicates Medium removal prevented
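
For anyone triaging the log above, here is a minimal Python sketch, not part of
the original report, that decodes two of its fields: the CDB opcode bytes and
the SCB_SCSIID values. The opcode table covers only the two standard SCSI
opcodes that appear here (0x00 TEST UNIT READY, 0x12 INQUIRY). The SCB_SCSIID
layout (target ID in the high nibble, initiator ID in the low nibble) is an
assumption; it agrees with the adapter's "SCSI Id=7" and the DevQ(0:5:0) and
DevQ(0:6:0) entries, but should be confirmed against the driver source. The
function names are hypothetical.

```python
import re

# Standard SCSI opcodes for the two commands seen in this log only.
OPCODES = {0x00: "TEST UNIT READY", 0x12: "INQUIRY"}


def decode_cdb(line):
    """Return (opcode_name, raw_bytes) for a 'CDB: 0x12 0x0 ...' log line."""
    raw = [int(tok, 16) for tok in re.findall(r"0x[0-9a-fA-F]+", line)]
    if not raw:
        raise ValueError("no CDB bytes found in: %r" % line)
    return OPCODES.get(raw[0], "unknown (0x%02x)" % raw[0]), raw


def decode_scsiid(value):
    """Split SCB_SCSIID into (target_id, initiator_id).

    Assumed layout: high nibble = target, low nibble = initiator.
    """
    return (value >> 4) & 0xF, value & 0xF


if __name__ == "__main__":
    for cdb in ("CDB: 0x12 0x0 0x0 0x0 0x60 0x0",
                "CDB: 0x0 0x0 0x0 0x0 0x0 0x0"):
        name, raw = decode_cdb(cdb)
        print(name, raw)
    for scsiid in (0x67, 0x57):
        print("SCB_SCSIID 0x%02x -> target %d, initiator %d"
              % ((scsiid,) + decode_scsiid(scsiid)))
```

Under these assumptions, the command targeted by the TARGET RESET is an INQUIRY
and the one targeted by the ABORT is a TEST UNIT READY, and the pending SCB 3
(SCB_SCSIID 0x57) addresses target 5, which matches the scsi2:0:5:0 device named
in the error lines.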



Output from lspci:


00:00.0 Host bridge: Broadcom CMIC-HE (rev 22)
00:00.1 Host bridge: Broadcom CMIC-HE
00:00.2 Host bridge: Broadcom CMIC-HE
00:00.3 Host bridge: Broadcom CMIC-HE
00:03.0 SCSI storage controller: Adaptec AIC-7892P U160/m (rev 02)
00:04.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
00:05.0 Class ff00: Dell Remote Access Card III
00:05.1 Class ff00: Dell Remote Access Card III
00:05.2 Class ff00: Dell Remote Access Card III: BMC/SMIC device not present
00:0f.0 Host bridge: Broadcom CSB5 South Bridge (rev 93)
00:0f.1 IDE interface: Broadcom CSB5 IDE Controller (rev 93)
00:0f.2 USB Controller: Broadcom OSB4/CSB5 OHCI USB Controller (rev 05)
00:0f.3 ISA bridge: Broadcom CSB5 LPC bridge
00:10.0 Host bridge: Broadcom CIOB30 (rev 03)
00:10.2 Host bridge: Broadcom CIOB30 (rev 03)
00:11.0 Host bridge: Broadcom CIOB30 (rev 03)
00:11.2 Host bridge: Broadcom CIOB30 (rev 03)
00:12.0 Host bridge: Broadcom CIOB30 (rev 03)
00:12.2 Host bridge: Broadcom CIOB30 (rev 03)
03:01.0 I2O: LSI Logic / Symbios Logic MegaRAID (rev 01)
08:01.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5700 Gigabit Ethernet (rev 14)
08:02.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5700 Gigabit Ethernet (rev 14)
09:01.0 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 01)
09:01.1 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 01)
13:01.0 SCSI storage controller: Adaptec AHA-3960D / AIC-7899A U160/m (rev 01)
13:01.1 SCSI storage controller: Adaptec AHA-3960D / AIC-7899A U160/m (rev 01)

Comment 1 Chris Gilbert 2006-07-04 13:07:30 UTC
Correction:  The second URL should have been
http://www.adaptec.com/en-US/speed/scsi/linux/aic7Yxx-2_0_15-6_3_11-linux-2_4_tgz.htm
for the Linux source drivers.

Comment 2 Ernie Petrides 2006-07-06 01:28:00 UTC
RHEL3 is now closed.

Comment 3 Chris Gilbert 2006-07-06 08:58:12 UTC
Does this mean that a fix would not go into the 2.4.21 kernel, but only into RHEL4?

Comment 4 Ernie Petrides 2006-07-06 22:04:46 UTC
Probably, although I'm not authorized to make any official statements.

It would be better if you contacted Customer Support.

Comment 5 Prarit Bhargava 2007-09-13 12:55:54 UTC
Closing re: Comment #2.

P.