Description of problem:

Hardware: Dell PowerEdge 1850 with integrated RAID (PERC 4/Di, LSI Logic MegaRAID 521S) and Dual Ultra 320 PCI SCSI card AHA-39160. The OS is installed on mirrored SCSI drives. A tape robot with two LTO3 drives is attached to one channel of the AHA-39160.

Connecting any SCSI device to either AHA-39160 SCSI bus generates messages during boot:

kernel: (scsi0:A:1:0): data overrun detected in Data-in phase. Tag == 0x5.
kernel: (scsi0:A:1:0): Have seen Data Phase. Length = 50. NumSGs = 1.

The robot always generates

kernel: (scsi0:A:0:0): parity error detected in Data-in phase. SEQADDR(0x83) SCSIRATE(0x0)
kernel: (scsi0:A:0:0): parity error detected in Message-in phase. SEQADDR(0x1a5) SCSIRATE(0x0)
last message repeated 647 times

four times before the kernel gives up:

kernel: scsi: device set offline - not ready or command retry failed after bus reset: host 0 channel 0 id 0 lun 0

The tape drives then operate correctly but the robot is inaccessible.

Relevant kernel boot messages:

kernel: SCSI subsystem driver Revision: 1.00
kernel: scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.36
kernel: <Adaptec 3960D Ultra160 SCSI adapter>
kernel: aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
kernel:
kernel: scsi1 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.36
kernel: <Adaptec 3960D Ultra160 SCSI adapter>
kernel: aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs
kernel:
kernel: blk: queue c36e2e18, I/O limit 4095Mb (mask 0xffffffff)
kernel: (scsi0:A:1): 160.000MB/s transfers (80.000MHz DT, offset 127, 16bit)
kernel: (scsi0:A:2): 160.000MB/s transfers (80.000MHz DT, offset 127, 16bit)
kernel: Vendor: DELL Model: PV-136T Rev: 3.35
kernel: Type: Medium Changer ANSI SCSI revision: 02
kernel: blk: queue c36e2c18, I/O limit 4095Mb (mask 0xffffffff)
kernel: Vendor: IBM Model: ULTRIUM-TD3 Rev: 54K1
kernel: Type: Sequential-Access ANSI SCSI revision: 03
kernel: blk: queue f7fcf418, I/O limit 4095Mb (mask 0xffffffff)
kernel: Vendor: IBM Model: ULTRIUM-TD3 Rev: 54K1
kernel: Type: Sequential-Access ANSI SCSI revision: 03
kernel: blk: queue c36e2818, I/O limit 4095Mb (mask 0xffffffff)
kernel: megaraid: v2.10.8.2-RH1 (Release Date: Mon Jul 26 12:15:51 EDT 2004)
kernel: megaraid: found 0x1028:0x0013:bus 2:slot 14:func 0
kernel: scsi2:Found MegaRAID controller at 0xf886f000, IRQ:38
kernel: megaraid: [521S:H430] detected 1 logical drives.
kernel: megaraid: supports extended CDBs.
kernel: megaraid: channel[0] is raid.
kernel: scsi2 : LSI Logic MegaRAID 521S 254 commands 16 targs 4 chans 7 luns
kernel: scsi2: scanning scsi channel 0 for logical drives.
kernel: Vendor: MegaRAID Model: LD 0 RAID1 69G Rev: 521S
kernel: Type: Direct-Access ANSI SCSI revision: 02
kernel: blk: queue f7fcf618, I/O limit 4294967295Mb (mask 0xffffffffffffffff)
kernel: scsi2: scanning scsi channel 1 for logical drives.
kernel: scsi2: scanning scsi channel 2 for logical drives.
kernel: scsi2: scanning scsi channel 3 for logical drives.
kernel: scsi2: scanning scsi channel 4 [P0] for physical devices.
kernel: Vendor: PE/PV Model: 1x2 SCSI BP Rev: 1.0
kernel: Type: Processor ANSI SCSI revision: 02
kernel: blk: queue f7fcf818, I/O limit 4294967295Mb (mask 0xffffffffffffffff)
kernel: Attached scsi disk sda at scsi2, channel 0, id 0, lun 0

Version-Release number of selected component (if applicable):
kernel-2.4.21-37.EL

How reproducible:
Every time

Steps to Reproduce:
1. Correctly connect SCSI devices to either AHA-39160 SCSI bus
2. Boot
3. mtx -f /dev/sg0 status

Actual results:
cannot open SCSI device '/dev/changer' - No such device

Expected results:
Report of robot (tape media changer) status.

Additional info:
aic7xxx version 6.2.36. Have tried the latest version from Adaptec (6.3.11) with very similar results.

Swapping motherboard, SCSI card, SCSI cables and terminator has no effect.

Tried connecting each SCSI device (2 tape drives, 1 robot) individually in turn.
The problem occurs any time any device is connected to either SCSI bus on the AHA-39160 card.

Installing Microsoft Windows 2000 Server on the same hardware shows no errors using diagnostic tools, and I was able to successfully back up and install using the tape drives and robot. This is _not_ a hardware problem.

Tried adding

options scsi_mod max_scsi_luns=128 scsi_allow_ghost_devices=1

to /etc/modules.conf, running mkinitrd and rebooting, but no effect seen.

Tried installing Debian (stable/sarge) Linux with a 2.6.10 kernel and both the 6.2.36 and 6.3.11 versions of aic7xxx. Same problem seen as under Red Hat.

The robot is an 8-bit SCSI device whereas the tape drives are 16-bit.

A log showing messages relating to the robot being taken offline follows.

Nov 7 03:30:14 shrek kernel: (scsi0:A:0:0): parity error detected in Data-in phase. SEQADDR(0x1a6) SCSIRATE(0x0)
Nov 7 03:30:14 shrek kernel: (scsi0:A:0:0): parity error detected in Message-in phase. SEQADDR(0x1a5) SCSIRATE(0x0)
Nov 7 03:30:19 shrek last message repeated 647 times
Nov 7 03:30:19 shrek kernel: scsi0:0:0:0: Attempting to queue an ABORT message
Nov 7 03:30:19 shrek kernel: CDB: 0x0 0x0 0x0 0x0 0x0 0x0
Nov 7 03:30:19 shrek kernel: scsi0: At time of recovery, card was not paused
Nov 7 03:30:19 shrek kernel: >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
Nov 7 03:30:19 shrek kernel: scsi0: Dumping Card State in Message-out phase, at SEQADDR 0x16b
Nov 7 03:30:19 shrek kernel: Card was paused
Nov 7 03:30:19 shrek kernel: ACCUM = 0xa0, SINDEX = 0x81, DINDEX = 0xe4, ARG_2 = 0x1
Nov 7 03:30:19 shrek kernel: HCNT = 0x0 SCBPTR = 0x0
Nov 7 03:30:19 shrek kernel: SCSIPHASE[0x0] SCSISIGI[0xf4] ERROR[0x0] SCSIBUSL[0x3]
Nov 7 03:30:19 shrek kernel: LASTPHASE[0xa0] SCSISEQ[0x12] SBLKCTL[0xa] SCSIRATE[0x0]
Nov 7 03:30:19 shrek kernel: SEQCTL[0x10] SEQ_FLAGS[0x20] SSTAT0[0x0] SSTAT1[0x4]
Nov 7 03:30:19 shrek kernel: SSTAT2[0x0] SSTAT3[0x0] SIMODE0[0x8] SIMODE1[0xac]
Nov 7 03:30:19 shrek kernel: SXFRCTL0[0x88] DFCNTRL[0x0] DFSTATUS[0x89]
Nov 7 03:30:19 shrek kernel: STACK: 0xd2 0xe1 0xe1 0x16a
Nov 7 03:30:19 shrek kernel: SCB count = 6
Nov 7 03:30:19 shrek kernel: Kernel NEXTQSCB = 4
Nov 7 03:30:19 shrek kernel: Card NEXTQSCB = 4
Nov 7 03:30:19 shrek kernel: QINFIFO entries:
Nov 7 03:30:19 shrek kernel: Waiting Queue entries:
Nov 7 03:30:19 shrek kernel: Disconnected Queue entries:
Nov 7 03:30:19 shrek kernel: QOUTFIFO entries:
Nov 7 03:30:19 shrek kernel: Sequencer Free SCB List: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
Nov 7 03:30:19 shrek kernel: Sequencer SCB Info:
Nov 7 03:30:19 shrek kernel: 0 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0x5]
Nov 7 03:30:19 shrek kernel: 1 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Nov 7 03:30:19 shrek kernel: 2 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Nov 7 03:30:19 shrek kernel: 3 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Nov 7 03:30:19 shrek kernel: 4 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Nov 7 03:30:19 shrek kernel: 5 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Nov 7 03:30:19 shrek kernel: 6 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Nov 7 03:30:19 shrek kernel: 7 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Nov 7 03:30:19 shrek kernel: 8 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Nov 7 03:30:19 shrek kernel: 9 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Nov 7 03:30:19 shrek kernel: 10 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Nov 7 03:30:19 shrek kernel: 11 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Nov 7 03:30:19 shrek kernel: 12 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Nov 7 03:30:19 shrek kernel: 13 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Nov 7 03:30:19 shrek kernel: 14 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Nov 7 03:30:19 shrek kernel: 15 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Nov 7 03:30:19 shrek kernel: 16 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Nov 7 03:30:19 shrek kernel: 17 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Nov 7 03:30:19 shrek kernel: 18 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Nov 7 03:30:19 shrek kernel: 19 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Nov 7 03:30:19 shrek kernel: 20 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Nov 7 03:30:19 shrek kernel: 21 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Nov 7 03:30:19 shrek kernel: 22 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Nov 7 03:30:19 shrek kernel: 23 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Nov 7 03:30:19 shrek kernel: 24 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Nov 7 03:30:19 shrek kernel: 25 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Nov 7 03:30:19 shrek kernel: 26 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Nov 7 03:30:19 shrek kernel: 27 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Nov 7 03:30:19 shrek kernel: 28 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Nov 7 03:30:19 shrek kernel: 29 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Nov 7 03:30:19 shrek kernel: 30 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Nov 7 03:30:19 shrek kernel: 31 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Nov 7 03:30:19 shrek kernel: Pending list:
Nov 7 03:30:19 shrek kernel: 5 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0]
Nov 7 03:30:19 shrek kernel: Kernel Free SCB list: 3 2 1 0
Nov 7 03:30:19 shrek kernel: Untagged Q(0): 5
Nov 7 03:30:19 shrek kernel: DevQ(0:0:0): 0 waiting
Nov 7 03:30:19 shrek kernel: DevQ(0:1:0): 0 waiting
Nov 7 03:30:19 shrek kernel: DevQ(0:2:0): 0 waiting
Nov 7 03:30:19 shrek kernel:
Nov 7 03:30:19 shrek kernel: <<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
Nov 7 03:30:19 shrek kernel: scsi0:0:0:0: Device is active, asserting ATN
Nov 7 03:30:19 shrek kernel: Recovery code sleeping
Nov 7 03:30:19 shrek kernel: (scsi0:A:0:0): Abort Message Sent
Nov 7 03:30:19 shrek kernel: (scsi0:A:0:0): SCB 5 - Abort Completed.
Nov 7 03:30:19 shrek kernel: Recovery SCB completes
Nov 7 03:30:19 shrek kernel: Recovery code awake
Nov 7 03:30:19 shrek kernel: aic7xxx_abort returns 0x2002
Nov 7 03:30:19 shrek kernel: scsi: device set offline - not ready or command retry failed after bus reset: host 0 channel 0 id 0 lun 0
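For anyone comparing logs across driver versions or devices, the bus errors in a log like the one above can be tallied per device with a short script. This is only a sketch: the regular expressions are assumptions based on the message wording quoted in this report, and tally_errors is a hypothetical helper name, not part of any driver tooling.

```python
import re
from collections import Counter

# Assumed pattern, modelled on messages quoted in this report, e.g.
# "(scsi0:A:0:0): parity error detected in Data-in phase."
ERROR_RE = re.compile(
    r"\(scsi(?P<host>\d+):(?P<channel>[A-Z]):(?P<target>\d+):(?P<lun>\d+)\): "
    r"(?P<kind>parity error|data overrun) detected in (?P<phase>[\w-]+) phase"
)
# syslog collapses identical consecutive messages into this form.
REPEAT_RE = re.compile(r"last message repeated (?P<count>\d+) times")


def tally_errors(lines):
    """Count errors per (host, channel, target, lun, kind, phase)."""
    counts = Counter()
    last_key = None  # key of the immediately preceding error line, if any
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            last_key = (
                int(match["host"]), match["channel"],
                int(match["target"]), int(match["lun"]),
                match["kind"], match["phase"],
            )
            counts[last_key] += 1
            continue
        repeat = REPEAT_RE.search(line)
        if repeat and last_key is not None:
            # The repeat count applies to the error line just before it.
            counts[last_key] += int(repeat["count"])
        else:
            # Any other message breaks the "last message" chain.
            last_key = None
    return counts


# Lines copied from the log in this report.
sample = [
    "Nov 7 03:30:14 shrek kernel: (scsi0:A:0:0): parity error detected in Data-in phase. SEQADDR(0x1a6) SCSIRATE(0x0)",
    "Nov 7 03:30:14 shrek kernel: (scsi0:A:0:0): parity error detected in Message-in phase. SEQADDR(0x1a5) SCSIRATE(0x0)",
    "Nov 7 03:30:19 shrek last message repeated 647 times",
    "Nov 7 03:30:19 shrek kernel: scsi0:0:0:0: Attempting to queue an ABORT message",
]

for key, count in sorted(tally_errors(sample).items()):
    print(key, count)
```

On the four sample lines, every error lands on target 0 of channel A (the robot), which matches the observation that the tape drives keep working while the robot is taken offline.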
This bug is filed against RHEL 3, which is in maintenance phase. During the maintenance phase, only security errata and select mission-critical bug fixes will be released for enterprise products. Since this bug does not meet those criteria, it is now being closed.

For more information on the RHEL errata support policy, please visit: http://www.redhat.com/security/updates/errata/

If you feel this bug is indeed mission critical, please contact your support representative. You may be asked to provide detailed information on how this bug is affecting you.