172552 – aic7xxx detects `data overrun' for any AHA-39160 devices

Bug 172552 - aic7xxx detects `data overrun' for any AHA-39160 devices

Summary: aic7xxx detects `data overrun' for any AHA-39160 devices

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 3
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	3.0
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Tom Coughlan
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2005-11-07 05:58 UTC by James Ashton
Modified:	2007-11-30 22:07 UTC (History)
CC List:	1 user (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2007-10-19 18:51:43 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Description James Ashton 2005-11-07 05:58:37 UTC

Description of problem:

Hardware: Dell PowerEdge 1850 with integrated RAID (PERC 4/Di, LSI Logic
MegaRAID 521S) and Dual Ultra 320 PCI SCSI card AHA-39160.  OS installed on
mirrored SCSI drives.  Tape robot with two LTO3 drives on one channel of AHA-39160.

Connecting any SCSI device to either AHA-39160 SCSI bus generates messages
during boot:

kernel: (scsi0:A:1:0): data overrun detected in Data-in phase.  Tag == 0x5.
kernel: (scsi0:A:1:0): Have seen Data Phase.  Length = 50.  NumSGs = 1.

The robot always generates

kernel: (scsi0:A:0:0): parity error detected in Data-in phase. SEQADDR(0x83)
SCSIRATE(0x0)
kernel: (scsi0:A:0:0): parity error detected in Message-in phase. SEQADDR(0x1a5)
SCSIRATE(0x0)
last message repeated 647 times

four times before the kernel gives up:

kernel: scsi: device set offline - not ready or command retry failed after bus
reset: host 0 channel 0 id 0 lun 0

The tapes drives then operate correctly but the robot is inaccessible.

Relevant kernel boot messages:

kernel: SCSI subsystem driver Revision: 1.00
kernel: scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.36
kernel:         <Adaptec 3960D Ultra160 SCSI adapter>
kernel:         aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
kernel:
kernel: scsi1 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.36
kernel:         <Adaptec 3960D Ultra160 SCSI adapter>
kernel:         aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs
kernel:
kernel: blk: queue c36e2e18, I/O limit 4095Mb (mask 0xffffffff)
kernel: (scsi0:A:1): 160.000MB/s transfers (80.000MHz DT, offset 127, 16bit)
kernel: (scsi0:A:2): 160.000MB/s transfers (80.000MHz DT, offset 127, 16bit)
kernel:   Vendor: DELL      Model: PV-136T           Rev: 3.35
kernel:   Type:   Medium Changer                     ANSI SCSI revision: 02
kernel: blk: queue c36e2c18, I/O limit 4095Mb (mask 0xffffffff)
kernel:   Vendor: IBM       Model: ULTRIUM-TD3       Rev: 54K1
kernel:   Type:   Sequential-Access                  ANSI SCSI revision: 03
kernel: blk: queue f7fcf418, I/O limit 4095Mb (mask 0xffffffff)
kernel:   Vendor: IBM       Model: ULTRIUM-TD3       Rev: 54K1
kernel:   Type:   Sequential-Access                  ANSI SCSI revision: 03
kernel: blk: queue c36e2818, I/O limit 4095Mb (mask 0xffffffff)
kernel: megaraid: v2.10.8.2-RH1 (Release Date: Mon Jul 26 12:15:51 EDT 2004)
kernel: megaraid: found 0x1028:0x0013:bus 2:slot 14:func 0
kernel: scsi2:Found MegaRAID controller at 0xf886f000, IRQ:38
kernel: megaraid: [521S:H430] detected 1 logical drives.
kernel: megaraid: supports extended CDBs.
kernel: megaraid: channel[0] is raid.
kernel: scsi2 : LSI Logic MegaRAID 521S 254 commands 16 targs 4 chans 7 luns
kernel: scsi2: scanning scsi channel 0 for logical drives.
kernel:   Vendor: MegaRAID  Model: LD 0 RAID1   69G  Rev: 521S
kernel:   Type:   Direct-Access                      ANSI SCSI revision: 02
kernel: blk: queue f7fcf618, I/O limit 4294967295Mb (mask 0xffffffffffffffff)
kernel: scsi2: scanning scsi channel 1 for logical drives.
kernel: scsi2: scanning scsi channel 2 for logical drives.
kernel: scsi2: scanning scsi channel 3 for logical drives.
kernel: scsi2: scanning scsi channel 4 [P0] for physical devices.
kernel:   Vendor: PE/PV     Model: 1x2 SCSI BP       Rev: 1.0
kernel:   Type:   Processor                          ANSI SCSI revision: 02
kernel: blk: queue f7fcf818, I/O limit 4294967295Mb (mask 0xffffffffffffffff)
kernel: Attached scsi disk sda at scsi2, channel 0, id 0, lun 0

Version-Release number of selected component (if applicable):

kernel-2.4.21-37.EL

How reproducible:

Every time

Steps to Reproduce:
1. Correctly connect SCSI devices to either AHA-39160 SCSI bus
2. Boot
3. mtx -f /dev/sg0 status
  
Actual results:

cannot open SCSI device '/dev/changer' - No such device

Expected results:

Report of robot (tape media changer) status.

Additional info:

aic7xxx version 6.2.36.  Have tried latest version from adaptec (6.3.11) with
very similar results.  Swapping motherboard, SCSI card, SCSI cables, terminator
has no effect.  Tried connecting each SCSI device (2 tape drives, 1 robot)
individually in turn.  Problem occurs any time any device is connected to either
SCSI bus on the AHA-39160 card.

Installing Microsoft Windows 2000 Server on the same hardware shows no errors
using diagnostic tools.  Able to successfully backup and install using the tape
drives and robot.  This is _not_ a hardware problem.

Tried adding

options scsi_mod max_scsi_luns=128 scsi_allow_ghost_devices=1

to /etc/modules.conf, running mkinitrd and rebooting but no effect seen.

Tried installing Debian (stable/sarge) linux with 2.6.10 kernel and both 6.2.36
and 6.3.11 versions of aic7xxx.  Same problem seen as under Redhat.

A log showing messages relating to the robot being taken offline follows.  The
robot is an 8-bit SCSI device whereas the tape drives are 16-bit.

Nov  7 03:30:14 shrek kernel: (scsi0:A:0:0): parity error detected in Data-in
phase. SEQADDR(0x1a6) SCSIRATE(0x0)
Nov  7 03:30:14 shrek kernel: (scsi0:A:0:0): parity error detected in Message-in
phase. SEQADDR(0x1a5) SCSIRATE(0x0)
Nov  7 03:30:19 shrek last message repeated 647 times
Nov  7 03:30:19 shrek kernel: scsi0:0:0:0: Attempting to queue an ABORT message
Nov  7 03:30:19 shrek kernel: CDB: 0x0 0x0 0x0 0x0 0x0 0x0
Nov  7 03:30:19 shrek kernel: scsi0: At time of recovery, card was not paused
Nov  7 03:30:19 shrek kernel: >>>>>>>>>>>>>>>>>> Dump Card State Begins
<<<<<<<<<<<<<<<<<
Nov  7 03:30:19 shrek kernel: scsi0: Dumping Card State in Message-out phase, at
SEQADDR 0x16b
Nov  7 03:30:19 shrek kernel: Card was paused
Nov  7 03:30:19 shrek kernel: ACCUM = 0xa0, SINDEX = 0x81, DINDEX = 0xe4, ARG_2
= 0x1
Nov  7 03:30:19 shrek kernel: HCNT = 0x0 SCBPTR = 0x0
Nov  7 03:30:19 shrek kernel: SCSIPHASE[0x0] SCSISIGI[0xf4] ERROR[0x0] SCSIBUSL[0x3]
Nov  7 03:30:19 shrek kernel: LASTPHASE[0xa0] SCSISEQ[0x12] SBLKCTL[0xa]
SCSIRATE[0x0]
Nov  7 03:30:19 shrek kernel: SEQCTL[0x10] SEQ_FLAGS[0x20] SSTAT0[0x0] SSTAT1[0x4]
Nov  7 03:30:19 shrek kernel: SSTAT2[0x0] SSTAT3[0x0] SIMODE0[0x8] SIMODE1[0xac]
Nov  7 03:30:19 shrek kernel: SXFRCTL0[0x88] DFCNTRL[0x0] DFSTATUS[0x89]
Nov  7 03:30:19 shrek kernel: STACK: 0xd2 0xe1 0xe1 0x16a
Nov  7 03:30:19 shrek kernel: SCB count = 6
Nov  7 03:30:19 shrek kernel: Kernel NEXTQSCB = 4
Nov  7 03:30:19 shrek kernel: Card NEXTQSCB = 4
Nov  7 03:30:19 shrek kernel: QINFIFO entries:
Nov  7 03:30:19 shrek kernel: Waiting Queue entries:
Nov  7 03:30:19 shrek kernel: Disconnected Queue entries:
Nov  7 03:30:19 shrek kernel: QOUTFIFO entries:
Nov  7 03:30:19 shrek kernel: Sequencer Free SCB List: 1 2 3 4 5 6 7 8 9 10 11
12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
Nov  7 03:30:19 shrek kernel: Sequencer SCB Info:
Nov  7 03:30:19 shrek kernel:   0 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0]
SCB_TAG[0x5]
Nov  7 03:30:19 shrek kernel:   1 SCB_CONTROL[0x0] SCB_SCSIID[0xff]
SCB_LUN[0xff] SCB_TAG[0xff]
Nov  7 03:30:19 shrek kernel:   2 SCB_CONTROL[0x0] SCB_SCSIID[0xff]
SCB_LUN[0xff] SCB_TAG[0xff]
Nov  7 03:30:19 shrek kernel:   3 SCB_CONTROL[0x0] SCB_SCSIID[0xff]
SCB_LUN[0xff] SCB_TAG[0xff]
Nov  7 03:30:19 shrek kernel:   4 SCB_CONTROL[0x0] SCB_SCSIID[0xff]
SCB_LUN[0xff] SCB_TAG[0xff]
Nov  7 03:30:19 shrek kernel:   5 SCB_CONTROL[0x0] SCB_SCSIID[0xff]
SCB_LUN[0xff] SCB_TAG[0xff]
Nov  7 03:30:19 shrek kernel:   6 SCB_CONTROL[0x0] SCB_SCSIID[0xff]
SCB_LUN[0xff] SCB_TAG[0xff]
Nov  7 03:30:19 shrek kernel:   7 SCB_CONTROL[0x0] SCB_SCSIID[0xff]
SCB_LUN[0xff] SCB_TAG[0xff]
Nov  7 03:30:19 shrek kernel:   8 SCB_CONTROL[0x0] SCB_SCSIID[0xff]
SCB_LUN[0xff] SCB_TAG[0xff]
Nov  7 03:30:19 shrek kernel:   9 SCB_CONTROL[0x0] SCB_SCSIID[0xff]
SCB_LUN[0xff] SCB_TAG[0xff]
Nov  7 03:30:19 shrek kernel:  10 SCB_CONTROL[0x0] SCB_SCSIID[0xff]
SCB_LUN[0xff] SCB_TAG[0xff]
Nov  7 03:30:19 shrek kernel:  11 SCB_CONTROL[0x0] SCB_SCSIID[0xff]
SCB_LUN[0xff] SCB_TAG[0xff]
Nov  7 03:30:19 shrek kernel:  12 SCB_CONTROL[0x0] SCB_SCSIID[0xff]
SCB_LUN[0xff] SCB_TAG[0xff]
Nov  7 03:30:19 shrek kernel:  13 SCB_CONTROL[0x0] SCB_SCSIID[0xff]
SCB_LUN[0xff] SCB_TAG[0xff]
Nov  7 03:30:19 shrek kernel:  14 SCB_CONTROL[0x0] SCB_SCSIID[0xff]
SCB_LUN[0xff] SCB_TAG[0xff]
Nov  7 03:30:19 shrek kernel:  15 SCB_CONTROL[0x0] SCB_SCSIID[0xff]
SCB_LUN[0xff] SCB_TAG[0xff]
Nov  7 03:30:19 shrek kernel:  16 SCB_CONTROL[0x0] SCB_SCSIID[0xff]
SCB_LUN[0xff] SCB_TAG[0xff]
Nov  7 03:30:19 shrek kernel:  17 SCB_CONTROL[0x0] SCB_SCSIID[0xff]
SCB_LUN[0xff] SCB_TAG[0xff]
Nov  7 03:30:19 shrek kernel:  18 SCB_CONTROL[0x0] SCB_SCSIID[0xff]
SCB_LUN[0xff] SCB_TAG[0xff]
Nov  7 03:30:19 shrek kernel:  19 SCB_CONTROL[0x0] SCB_SCSIID[0xff]
SCB_LUN[0xff] SCB_TAG[0xff]
Nov  7 03:30:19 shrek kernel:  20 SCB_CONTROL[0x0] SCB_SCSIID[0xff]
SCB_LUN[0xff] SCB_TAG[0xff]
Nov  7 03:30:19 shrek kernel:  21 SCB_CONTROL[0x0] SCB_SCSIID[0xff]
SCB_LUN[0xff] SCB_TAG[0xff]
Nov  7 03:30:19 shrek kernel:  22 SCB_CONTROL[0x0] SCB_SCSIID[0xff]
SCB_LUN[0xff] SCB_TAG[0xff]
Nov  7 03:30:19 shrek kernel:  23 SCB_CONTROL[0x0] SCB_SCSIID[0xff]
SCB_LUN[0xff] SCB_TAG[0xff]
Nov  7 03:30:19 shrek kernel:  24 SCB_CONTROL[0x0] SCB_SCSIID[0xff]
SCB_LUN[0xff] SCB_TAG[0xff]
Nov  7 03:30:19 shrek kernel:  25 SCB_CONTROL[0x0] SCB_SCSIID[0xff]
SCB_LUN[0xff] SCB_TAG[0xff]
Nov  7 03:30:19 shrek kernel:  26 SCB_CONTROL[0x0] SCB_SCSIID[0xff]
SCB_LUN[0xff] SCB_TAG[0xff]
Nov  7 03:30:19 shrek kernel:  27 SCB_CONTROL[0x0] SCB_SCSIID[0xff]
SCB_LUN[0xff] SCB_TAG[0xff]
Nov  7 03:30:19 shrek kernel:  28 SCB_CONTROL[0x0] SCB_SCSIID[0xff]
SCB_LUN[0xff] SCB_TAG[0xff]
Nov  7 03:30:19 shrek kernel:  29 SCB_CONTROL[0x0] SCB_SCSIID[0xff]
SCB_LUN[0xff] SCB_TAG[0xff]
Nov  7 03:30:19 shrek kernel:  30 SCB_CONTROL[0x0] SCB_SCSIID[0xff]
SCB_LUN[0xff] SCB_TAG[0xff]
Nov  7 03:30:19 shrek kernel:  31 SCB_CONTROL[0x0] SCB_SCSIID[0xff]
SCB_LUN[0xff] SCB_TAG[0xff]
Nov  7 03:30:19 shrek kernel: Pending list:
Nov  7 03:30:19 shrek kernel:   5 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0]
Nov  7 03:30:19 shrek kernel: Kernel Free SCB list: 3 2 1 0
Nov  7 03:30:19 shrek kernel: Untagged Q(0): 5
Nov  7 03:30:19 shrek kernel: DevQ(0:0:0): 0 waiting
Nov  7 03:30:19 shrek kernel: DevQ(0:1:0): 0 waiting
Nov  7 03:30:19 shrek kernel: DevQ(0:2:0): 0 waiting
Nov  7 03:30:19 shrek kernel:
Nov  7 03:30:19 shrek kernel: <<<<<<<<<<<<<<<<< Dump Card State Ends
>>>>>>>>>>>>>>>>>>
Nov  7 03:30:19 shrek kernel: scsi0:0:0:0: Device is active, asserting ATN
Nov  7 03:30:19 shrek kernel: Recovery code sleeping
Nov  7 03:30:19 shrek kernel: (scsi0:A:0:0): Abort Message Sent
Nov  7 03:30:19 shrek kernel: (scsi0:A:0:0): SCB 5 - Abort Completed.
Nov  7 03:30:19 shrek kernel: Recovery SCB completes
Nov  7 03:30:19 shrek kernel: Recovery code awake
Nov  7 03:30:19 shrek kernel: aic7xxx_abort returns 0x2002
Nov  7 03:30:19 shrek kernel: scsi: device set offline - not ready or command
retry failed after bus reset: host 0 channel 0 id 0 lun 0

Comment 1 RHEL Program Management 2007-10-19 18:51:43 UTC

This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet that criteria, it is now being closed.
 
For more information of the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/
 
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.

Note You need to log in before you can comment on or make changes to this bug.