Red Hat Bugzilla – Bug 115391
Problems scanning multiple LUNs with aic7xxx module
Last modified: 2007-11-30 17:07:00 EST
Description of problem: I have a pair of RHEL 3 servers connected via Adaptec 29320LP adapters to an nStor 4110S hardware RAID SCSI array. The array is configured with a pair of volumes, at LUN 0 and LUN 1 (target 1). With kernel-smp-2.4.21-4.0.2, only the first LUN would be detected at boot time. I tried adding "options scsi_mod max_scsi_luns=127" to modules.conf, running mkinitrd, and appending "max_scsi_luns=127" to the kernel line in grub.conf. None of these made any difference. When I manually loaded scsi_mod, I got a message saying that the max_scsi_luns option was not supported. Recently I upgraded to kernel-smp-2.4.21-9. Immediately I saw new behavior. When the aic79xx driver loads, it spews messages to the console which suggest that it is trying and failing to probe all of the LUNs (up through 255) on target 1. This takes about ten minutes to complete. Afterwards, system startup continues normally and the two LUNs are successfully configured as /dev/sda and /dev/sdb. This behavior occurred out of the box, without max_scsi_luns being set. I tried setting the option in modules.conf, the initrd and grub.conf with max_scsi_luns=3, but the driver still probes everything through LUN 255. Version-Release number of selected component (if applicable): kernel-smp-2.4.21-9 How reproducible: Steps to Reproduce: 1.Configure SCSI RAID array with two LUNs 2.Attach array to Adaptec 29320LP 3.Boot system Actual results: System startup pauses for ~10 minutes as it probes LUNs and reports errors to console. Afterwards, the system starts normally. Expected results: I would expect the driver to only probe LUN 0 if max_scsi_luns is not set. If I do set it, I would expect not expect the probing to continue passed the LUN number specified in max_scsi_luns. The RHEL 3 update 1 release notes also imply that probing should stop as soon as a non-responsive LUN is tested. Additional info: Sample console output when aic79xx module loads: Feb 7 00:13:28 qadb1 kernel: TAT0[0x0] Feb 7 00:13:28 qadb1 kernel: SSTAT1[0x11] SSTAT2[0x0] SSTAT3[0x0] PERRDIAG[0x0] Feb 7 00:13:28 qadb1 kernel: SIMODE1[0xac] LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x0] Feb 7 00:13:28 qadb1 kernel: LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x80] Feb 7 00:13:28 qadb1 kernel: Feb 7 00:13:28 qadb1 kernel: SCB Count = 6 CMDS_PENDING = 1 LASTSCB 0x5 CURRSCB 0x5 NEXTSCB 0x0 Feb 7 00:13:28 qadb1 kernel: qinstart = 1 qinfifonext = 1 Feb 7 00:13:28 qadb1 kernel: QINFIFO: Feb 7 00:13:28 qadb1 kernel: WAITING_TID_QUEUES: Feb 7 00:13:28 qadb1 kernel: Pending list: Feb 7 00:13:28 qadb1 kernel: 5 FIFO_USE[0x0] SCB_CONTROL[0x44] SCB_SCSIID[0x17] Feb 7 00:13:28 qadb1 kernel: Total 1 Feb 7 00:13:28 qadb1 syslog: klogd startup succeeded Feb 7 00:13:28 qadb1 kernel: Kernel Free SCB list: 4 3 2 1 0 Feb 7 00:13:28 qadb1 kernel: Sequencer Complete DMA-inprog list: Feb 7 00:13:28 qadb1 kernel: Sequencer Complete list: Feb 7 00:13:28 qadb1 kernel: Sequencer DMA-Up and Complete list: Feb 7 00:13:28 qadb1 kernel: Feb 7 00:13:28 qadb1 kernel: scsi0: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x0 Feb 7 00:13:28 qadb1 kernel: SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89] Feb 7 00:13:28 qadb1 kernel: SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0] Feb 7 00:13:28 qadb1 kernel: SOFFCNT[0x1] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0 Feb 7 00:13:28 qadb1 kernel: HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10] Feb 7 00:13:28 qadb1 kernel: scsi0: FIFO1 Free, LONGJMP == 0x8072, SCB 0x5 Feb 7 00:13:28 qadb1 kernel: SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x4] DFSTATUS[0x88] Feb 7 00:13:28 qadb1 kernel: SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0] Feb 7 00:13:28 qadb1 irqbalance: irqbalance startup succeeded Feb 7 00:13:28 qadb1 kernel: SOFFCNT[0x1] MDFFSTAT[0x5] SHADDR = 0x09e, SHCNT = 0xffff62 Feb 7 00:13:28 qadb1 kernel: HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10] Feb 7 00:13:28 qadb1 kernel: LQIN: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 Feb 7 00:13:28 qadb1 kernel: scsi0: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x42 Feb 7 00:13:28 qadb1 kernel: scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x0 Feb 7 00:13:28 qadb1 kernel: SIMODE0[0xc] Feb 7 00:13:28 qadb1 kernel: CCSCBCTL[0x4] Feb 7 00:13:28 qadb1 kernel: scsi0: REG0 == 0xff00, SINDEX = 0x1bc, DINDEX = 0x1ba Feb 7 00:13:28 qadb1 kernel: scsi0: SCBPTR == 0x27, SCB_NEXT == 0xff00, SCB_NEXT2 == 0x0 Feb 7 00:13:28 qadb1 kernel: CDB 27 0 0 0 0 0 Feb 7 00:13:28 qadb1 kernel: STACK: 0x0 0x0 0x0 0x0 0x0 0x0 0x29 0x14d Feb 7 00:13:28 qadb1 kernel: <<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>> Feb 7 00:13:28 qadb1 kernel: DevQ(0:1:0): 0 waiting Feb 7 00:13:28 qadb1 kernel: DevQ(0:1:231): 0 waiting Feb 7 00:13:28 qadb1 kernel: scsi0:A:1:39: Target did not send an IDENTIFY message. LASTPHASE = 0x60. Feb 7 00:13:28 qadb1 kernel: scsi0: Issued Channel A Bus Reset. 1 SCBs aborted Feb 7 00:13:28 qadb1 kernel: scsi0:A:1: no active SCB for reconnecting target - issuing BUS DEVICE RESET Feb 7 00:13:28 qadb1 kernel: SAVED_SCSIID == 0x17, SAVED_LUN == 0x28, REG0 == 0xff00 ACCUM = 0x0 Feb 7 00:13:28 qadb1 kernel: SEQ_FLAGS == 0xc0, SCBPTR == 0x28, BTT == 0xff00, SINDEX == 0x1bc Feb 7 00:13:28 qadb1 kernel: SELID == 0x10, SCB_SCSIID == 0x0, SCB_LUN == 0x0, SCB_CONTROL == 0x0 Feb 7 00:13:28 qadb1 kernel: SCSIBUS[0] == 0xb1, SCSISIGI == 0x65 Feb 7 00:13:28 qadb1 kernel: SXFRCTL0 == 0x88 Feb 7 00:13:28 qadb1 kernel: SEQCTL0 == 0x10 Feb 7 00:13:28 qadb1 kernel: >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<< Feb 7 00:13:28 qadb1 kernel: scsi0: Dumping Card State at program address 0x165 Mode 0x33 Feb 7 00:13:28 qadb1 kernel: Card was paused Feb 7 00:13:28 qadb1 kernel: HS_MAILBOX[0x0] INTCTL[0x0] SEQINTSTAT[0x0] SAVED_MODE[0x11] Feb 7 00:13:28 qadb1 kernel: DFFSTAT[0x31] SCSISIGI[0x65] SCSIPHASE[0x2] SCSIBUS[0xb1] Feb 7 00:13:28 qadb1 portmap: portmap startup succeeded Feb 7 00:13:28 qadb1 kernel: LASTPHASE[0x60] SCSISEQ0[0x0] SCSISEQ1[0x12] SEQCTL0[0x10] Feb 7 00:13:28 qadb1 kernel: SEQINTCTL[0x0] SEQ_FLAGS[0xc0] SEQ_FLAGS2[0x0] SSTAT0[0x0] Feb 7 00:13:28 qadb1 kernel: SSTAT1[0x11] SSTAT2[0x0] SSTAT3[0x0] PERRDIAG[0x0] Feb 7 00:13:28 qadb1 kernel: SIMODE1[0xac] LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x0] Feb 7 00:13:28 qadb1 kernel: LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x80] Feb 7 00:13:28 qadb1 kernel: Feb 7 00:13:28 qadb1 kernel: SCB Count = 6 CMDS_PENDING = 1 LASTSCB 0x5 CURRSCB 0x5 NEXTSCB 0x0 Feb 7 00:13:28 qadb1 kernel: qinstart = 1 qinfifonext = 1 Feb 7 00:13:28 qadb1 kernel: QINFIFO: Feb 7 00:13:28 qadb1 kernel: WAITING_TID_QUEUES: Feb 7 00:13:28 qadb1 kernel: Pending list: Feb 7 00:13:28 qadb1 kernel: 5 FIFO_USE[0x0] SCB_CONTROL[0x44] SCB_SCSIID[0x17] Feb 7 00:13:28 qadb1 kernel: Total 1 Feb 7 00:13:28 qadb1 kernel: Kernel Free SCB list: 4 3 2 1 0 Feb 7 00:13:28 qadb1 kernel: Sequencer Complete DMA-inprog list: Feb 7 00:13:28 qadb1 kernel: Sequencer Complete list: Feb 7 00:13:28 qadb1 kernel: Sequencer DMA-Up and Complete list: Feb 7 00:13:28 qadb1 kernel: Feb 7 00:13:28 qadb1 kernel: scsi0: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x0 Feb 7 00:13:28 qadb1 kernel: SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89] Feb 7 00:13:28 qadb1 kernel: SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0] Feb 7 00:13:28 qadb1 kernel: SOFFCNT[0x1] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0 Feb 7 00:13:28 qadb1 kernel: HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10] Feb 7 00:13:28 qadb1 kernel: scsi0: FIFO1 Free, LONGJMP == 0x8072, SCB 0x5 Feb 7 00:13:28 qadb1 kernel: SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x4] DFSTATUS[0x88] Feb 7 00:13:28 qadb1 kernel: SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0] Feb 7 00:13:28 qadb1 kernel: SOFFCNT[0x1] MDFFSTAT[0x5] SHADDR = 0x09e, SHCNT = 0xffff62 Feb 7 00:13:28 qadb1 kernel: HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10] Feb 7 00:13:28 qadb1 rpc.statd[1345]: Version 1.0.5 Starting Feb 7 00:13:28 qadb1 kernel: LQIN: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 Feb 7 00:13:28 qadb1 kernel: scsi0: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x42 Feb 7 00:13:28 qadb1 kernel: scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x0 Feb 7 00:13:28 qadb1 kernel: SIMODE0[0xc] Feb 7 00:13:28 qadb1 kernel: CCSCBCTL[0x4] Feb 7 00:13:28 qadb1 kernel: scsi0: REG0 == 0xff00, SINDEX = 0x1bc, DINDEX = 0x1ba Feb 7 00:13:28 qadb1 nfslock: rpc.statd startup succeeded Feb 7 00:13:28 qadb1 kernel: scsi0: SCBPTR == 0x28, SCB_NEXT == 0xff00, SCB_NEXT2 == 0x0 Feb 7 00:13:28 qadb1 kernel: CDB 28 0 0 0 0 0 Feb 7 00:13:28 qadb1 kernel: STACK: 0x0 0x0 0x0 0x0 0x0 0x0 0x29 0x14d Feb 7 00:13:28 qadb1 kernel: <<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>> Feb 7 00:13:28 qadb1 kernel: DevQ(0:1:0): 0 waiting Feb 7 00:13:28 qadb1 kernel: DevQ(0:1:232): 0 waiting Feb 7 00:13:28 qadb1 kernel: scsi0:A:1:40: Target did not send an IDENTIFY message. LASTPHASE = 0x60. Feb 7 00:13:28 qadb1 kernel: scsi0: Issued Channel A Bus Reset. 1 SCBs aborted
I apologize for the long delay in replying to this. All the LUNs on the target are scanned if the storage device is listed in the scsi_scan.c "whitelist", with the SPARSELUN flag set. This overrides the setting of max_scsi_luns. If you post the output of dmesg showing the boot messages, or /proc/scsi/scsi, I will check the list for your vendor-id and product-id. Aside from that, it appears that the adapter is having trouble scanning the LUN space. I know of one problem with the current version of the aic79xx driver that could be the cause. These drivers increase the max LUN they scan from 64 to 256. The problem occurs because on a parallel SCSI bus, the driver must use the packetized protocol to address LUNs > 63. The driver is not doing this, and instead, it is using the non-packetized protocol for LUN 64 and above. This can cause a veriety of problems, including command timeouts, and non-existant device configuration. If you are still having this problem, please post the vendor-id and product-id information, and the full listing of errors.
Since there are insufficient details provided in this report for us to investigate the issue further, and we have not received the feedback we requested, we will assume the problem was not reproduceable or has been fixed in a later update for this product. Users who have experienced this problem are encouraged to upgrade to the latest update release, and if this issue is still reproducible, please contact the Red Hat Global Support Services page on our website for technical support options: https://www.redhat.com/support If you have a telephone based support contract, you may contact Red Hat at 1-888-GO-REDHAT for technical support for the problem you are experiencing.