From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; HP-UX 9000/785; en-US; rv:1.0.1) Gecko/20020827 Netscape/7.0

Description of problem:
Hi,
We have a RH7.3 system running kernel 2.4.18-19.7.xsmp. We are getting panics with messages like these:

aacraid: panic: length of sg list is too long

It seems to happen after we read a lot of data in from a raid device. Please help, this problem is a show-stopper.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Run a program which reads a large amount of data

Actual Results: system panic

Expected Results: normal operation

Additional info:
The disks (an entire RAID container) were moved from a system running 2.4.18-10smp.
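The reproduction step above ("run a program which reads a large amount of data") could be approximated with something like the following. This is only a sketch: the function name, the 1 MiB chunk size, and the idea of pointing it at a large file on the affected container are assumptions, not the reporter's actual application.

```python
def read_sequentially(path, chunk_size=1024 * 1024):
    """Read `path` from start to end in fixed-size chunks.

    Returns the total number of bytes read. `path` is a hypothetical
    placeholder for a large file on the RAID container under test.
    """
    total = 0
    # buffering=0 disables Python's own buffering, so each read() call
    # issues one read request of up to chunk_size bytes.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total
```

Note that data already in the page cache is served from memory, so the file should be larger than RAM (or freshly written on another boot) for the reads to actually reach the controller.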
Here are the contents of /proc/scsi/sg/debug

dev_max(currently)=11 max_active_device=5 (origin 1)
 scsi_dma_free_sectors=3872 sg_pool_secs_aval=320 def_reserved_size=32768
 >>> device=sg0 scsi0 chan=0 id=0 lun=0 em=0 sg_tablesize=16 excl=0
   FD(1): timeout=60000ms bufflen=32768 (res)sgat=0 low_dma=0
   cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active
   FD(2): timeout=60000ms bufflen=32768 (res)sgat=0 low_dma=0
   cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active
 >>> device=sg1 scsi0 chan=0 id=2 lun=0 em=0 sg_tablesize=16 excl=0
   FD(1): timeout=60000ms bufflen=32768 (res)sgat=0 low_dma=0
   cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active
   FD(2): timeout=60000ms bufflen=32768 (res)sgat=0 low_dma=0
   cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active
 >>> device=sg2 scsi2 chan=0 id=2 lun=0 em=0 sg_tablesize=128 excl=0
   FD(1): timeout=60000ms bufflen=32768 (res)sgat=0 low_dma=0
   cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active
   FD(2): timeout=300000ms bufflen=32768 (res)sgat=0 low_dma=0
   cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active
   FD(3): timeout=60000ms bufflen=32768 (res)sgat=0 low_dma=0
   cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active
   FD(4): timeout=300000ms bufflen=32768 (res)sgat=0 low_dma=0
   cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active
   FD(5): timeout=300000ms bufflen=32768 (res)sgat=0 low_dma=0
   cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active
 >>> device=sg3 scsi2 chan=0 id=3 lun=0 em=0 sg_tablesize=128 excl=0
   FD(1): timeout=60000ms bufflen=32768 (res)sgat=0 low_dma=0
   cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active
   FD(2): timeout=60000ms bufflen=32768 (res)sgat=0 low_dma=0
   cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active

Here are the rest of the contents of /proc/scsi/sg/

# cat def_reserved_size
32768

# cat device_hdr
host chan id lun type opens qdepth busy online

# cat devices
0 0 0 0 0 8 10 0 1
0 0 2 0 0 3 10 0 1
2 0 2 0 0 6 253 0 1
2 0 3 0 0 3 253 0 1
2 0 4 0 0 1 253 0 1

# cat device_strs
DELL PERCRAID Mirror V1.0
DELL PERCRAID RAID5 V1.0
IFT SR2000 0312
IFT SR2000 0312
IFT SR2000 0312

# cat host_hdr
uid busy cpl scatg isa emul

# cat hosts
0 0 512 16 0 0
0 0 2 128 0 0
1 0 2 128 0 0
2 0 2 128 0 0
1280 0 63 26 0 0

# cat host_strs
percraid
Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.8
        <Adaptec aic7899 Ultra160 SCSI adapter>
        aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs
Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.8
        <Adaptec 3960D Ultra160 SCSI adapter>
        aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.8
        <Adaptec 3960D Ultra160 SCSI adapter>
        aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs
LSI Logic MegaRAID 161N 254 commands 15 targs 7 chans 7 luns

# cat version
30124 Version: 3.1.24 (20020505)

# cat allow_dio
0

Any ideas?
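For anyone trying to read the numbers above, the device_hdr line names the columns of the devices file, so the two can be paired up. A rough sketch follows; the split of the devices output into rows of nine fields is inferred from the nine names in device_hdr, and parse_table is a hypothetical helper, not part of any sg tooling.

```python
# Values copied from the /proc/scsi/sg output quoted in this report.
DEVICE_HDR = "host chan id lun type opens qdepth busy online"
DEVICES = """\
0 0 0 0 0 8 10 0 1
0 0 2 0 0 3 10 0 1
2 0 2 0 0 6 253 0 1
2 0 3 0 0 3 253 0 1
2 0 4 0 0 1 253 0 1
"""
DEVICE_STRS = [
    "DELL PERCRAID Mirror V1.0",
    "DELL PERCRAID RAID5 V1.0",
    "IFT SR2000 0312",
    "IFT SR2000 0312",
    "IFT SR2000 0312",
]


def parse_table(header, body):
    """Pair each whitespace-separated row of `body` with the names in `header`."""
    names = header.split()
    rows = []
    for line in body.splitlines():
        fields = line.split()
        if len(fields) != len(names):
            continue  # skip blank or malformed lines
        rows.append(dict(zip(names, (int(f) for f in fields))))
    return rows


# Assumes device_strs lists devices in the same order as the devices file.
for row, name in zip(parse_table(DEVICE_HDR, DEVICES), DEVICE_STRS):
    print(f"host={row['host']} id={row['id']} qdepth={row['qdepth']}  {name}")
```

Read this way, the two PERCRAID containers sit on host 0 at ids 0 and 2, and the three SR2000 disks on host 2 at ids 2, 3 and 4.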
I'm initially baffled. The log you posted seems to show an AMI MegaRAID, not an aacraid. Also, the base aacraid driver doesn't contain the error strings you report, although clearly you got an error from somewhere. Can you attach the output of lspci -vxx? Also, can you tell me which application is triggering the problem: is this just a big cp/tar, or is something like Oracle in use? Finally, was it stable with the -10 kernel?
Thanks for helping us with this problem. It happens when we do a lot of I/O, using one of our user-level applications. I may be able to reproduce it using a big tar command. I don't know yet if it will be stable in the -10 kernel. I will be doing some tests on it.

We do have a Megaraid board installed, but nothing is connected to it. The internal drives are connected to the aacraid controller. We also have three external SCSI disks, connected to an Adaptec PCI card. (Those are the SR2000 entries shown above.)

Here is the output from the lspci:

lspci -vxx

00:00.0 Host bridge: ServerWorks CMIC-HE (rev 22)
        Flags: fast devsel
00: 66 11 11 00 00 00 00 00 22 00 00 06 10 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:00.1 Host bridge: ServerWorks CMIC-HE
        Flags: fast devsel
00: 66 11 11 00 00 00 00 00 00 00 00 06 10 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:00.2 Host bridge: ServerWorks CMIC-HE
        Flags: fast devsel
00: 66 11 11 00 00 00 00 00 00 00 00 06 10 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:00.3 Host bridge: ServerWorks CMIC-HE
        Flags: fast devsel
00: 66 11 11 00 00 00 00 00 00 00 00 06 10 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:04.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 08)
        Subsystem: Dell Computer Corporation: Unknown device 0106
        Flags: bus master, medium devsel, latency 32, IRQ 19
        Memory at fe201000 (32-bit, non-prefetchable) [size=4K]
        I/O ports at ecc0 [size=64]
        Memory at fe000000 (32-bit, non-prefetchable) [size=1M]
        Capabilities: [dc] Power Management version 2
00: 86 80 29 12 17 01 90 02 08 00 00 02 08 20 00 00
10: 00 10 20 fe c1 ec 00 00 00 00 00 fe 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 28 10 06 01
30: 00 00 00 00 dc 00 00 00 00 00 00 00 0b 01 08 38

00:0e.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27) (prog-if 00 [VGA])
        Subsystem: Dell Computer Corporation: Unknown device 0106
        Flags: bus master, VGA palette snoop, stepping, medium devsel, latency 32
        Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
        I/O ports at e800 [size=256]
        Memory at fe200000 (32-bit, non-prefetchable) [size=4K]
        Expansion ROM at <unassigned> [disabled] [size=128K]
        Capabilities: [5c] Power Management version 2
00: 02 10 52 47 a7 00 90 02 27 00 00 03 08 20 00 00
10: 00 00 00 fd 01 e8 00 00 00 00 20 fe 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 28 10 06 01
30: 00 00 00 00 5c 00 00 00 00 00 00 00 ff 00 08 00

00:0f.3 ISA bridge: ServerWorks: Unknown device 0225
        Subsystem: ServerWorks: Unknown device 0230
        Flags: bus master, medium devsel, latency 0
00: 66 11 25 02 05 00 00 02 00 00 01 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 66 11 30 02
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:10.0 Host bridge: ServerWorks CIOB30 (rev 03)
        Flags: 66Mhz, medium devsel
        Capabilities: [60] PCI-X non-bridge device.
00: 66 11 10 00 42 01 b0 22 03 00 00 06 00 20 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 60 00 00 00 00 00 00 00 00 00 00 00

00:10.2 Host bridge: ServerWorks CIOB30 (rev 03)
        Flags: 66Mhz, medium devsel
        Capabilities: [60] PCI-X non-bridge device.
00: 66 11 10 00 42 01 b0 22 03 00 00 06 00 20 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 60 00 00 00 00 00 00 00 00 00 00 00

00:11.0 Host bridge: ServerWorks CIOB30 (rev 03)
        Flags: 66Mhz, medium devsel
        Capabilities: [60] PCI-X non-bridge device.
00: 66 11 10 00 42 01 b0 22 03 00 00 06 00 20 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 60 00 00 00 00 00 00 00 00 00 00 00

00:11.2 Host bridge: ServerWorks CIOB30 (rev 03)
        Flags: 66Mhz, medium devsel
        Capabilities: [60] PCI-X non-bridge device.
00: 66 11 10 00 42 01 b0 22 03 00 00 06 00 20 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 60 00 00 00 00 00 00 00 00 00 00 00

01:06.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5700 Gigabit Ethernet (rev 14)
        Subsystem: Dell Computer Corporation Broadcom BCM5700
        Flags: bus master, 66Mhz, medium devsel, latency 32, IRQ 22
        Memory at fcf00000 (64-bit, non-prefetchable) [size=64K]
        Capabilities: [40] PCI-X non-bridge device.
        Capabilities: [48] Power Management version 2
        Capabilities: [50] Vital Product Data
        Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable-
00: e4 14 44 16 06 01 b0 02 14 00 00 02 08 20 00 00
10: 04 00 f0 fc 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 28 10 06 01
30: 00 00 00 00 40 00 00 00 00 00 00 00 0a 01 40 00

01:08.1 RAID bus controller: Dell Computer Corporation PowerEdge Expandable RAID Controller 3 (rev 01)
        Subsystem: Dell Computer Corporation PowerEdge Expandable RAID Controller 3/Di
        Flags: bus master, 66Mhz, slow devsel, latency 32, IRQ 20
        Memory at f0000000 (32-bit, prefetchable) [size=128M]
        Expansion ROM at fcc00000 [disabled] [size=64K]
        Capabilities: [80] Power Management version 2
00: 28 10 0a 00 16 01 b0 04 01 00 04 01 08 20 80 00
10: 08 00 00 f0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 28 10 06 01
30: 00 00 c0 fc 80 00 00 00 00 00 00 00 05 01 00 00

02:06.0 SCSI storage controller: Adaptec RAID subsystem HBA (rev 01)
        Subsystem: Dell Computer Corporation: Unknown device 00c5
        Flags: bus master, 66Mhz, medium devsel, latency 32, IRQ 20
        BIST result: 00
        I/O ports at dc00 [size=256]
        Memory at fcdff000 (64-bit, non-prefetchable) [size=4K]
        Expansion ROM at fce00000 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2
00: 05 90 c5 00 17 01 b0 02 01 00 00 01 08 20 80 80
10: 01 dc 00 00 04 f0 df fc 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 28 10 c5 00
30: 00 00 e0 fc dc 00 00 00 00 00 00 00 05 01 28 19

02:06.1 SCSI storage controller: Adaptec AIC-7899P U160/m (rev 01)
        Subsystem: Dell Computer Corporation: Unknown device 0106
        Flags: bus master, 66Mhz, medium devsel, latency 32, IRQ 21
        BIST result: 00
        I/O ports at d800 [disabled] [size=256]
        Memory at fcdfe000 (64-bit, non-prefetchable) [size=4K]
        Expansion ROM at fce00000 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2
00: 05 90 cf 00 16 01 b0 02 01 00 00 01 08 20 80 80
10: 01 d8 00 00 04 e0 df fc 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 28 10 06 01
30: 00 00 e0 fc dc 00 00 00 00 00 00 00 0b 02 28 19

03:08.0 PCI bridge: Intel Corp. 21154 PCI-to-PCI Bridge (prog-if 00 [Normal decode])
        Flags: bus master, 66Mhz, medium devsel, latency 32
        Bus: primary=03, secondary=04, subordinate=05, sec-latency=32
        I/O behind bridge: 0000b000-0000bfff
        Memory behind bridge: efd00000-efffffff
        Prefetchable memory behind bridge: 00000000e0000000-00000000e7f00000
        Capabilities: [dc] Power Management version 1
00: 86 80 54 b1 07 01 b0 02 00 00 04 06 00 20 01 00
10: 00 00 00 00 00 00 00 00 03 04 05 20 b1 b1 a0 22
20: d0 ef f0 ef 01 e0 f1 e7 00 00 00 00 00 00 00 00
30: 00 00 00 00 dc 00 00 00 00 00 00 00 00 00 06 00

04:00.0 PCI bridge: Intel Corp. 21154 PCI-to-PCI Bridge (prog-if 00 [Normal decode])
        Flags: bus master, 66Mhz, medium devsel, latency 32
        Bus: primary=04, secondary=05, subordinate=05, sec-latency=32
        Memory behind bridge: eff00000-efffffff
        Prefetchable memory behind bridge: 00000000e0000000-00000000e7f00000
        Capabilities: [dc] Power Management version 1
00: 86 80 54 b1 07 01 b0 02 00 00 04 06 00 20 01 00
10: 00 00 00 00 00 00 00 00 04 05 05 20 f1 01 a0 22
20: f0 ef f0 ef 01 e0 f1 e7 00 00 00 00 00 00 00 00
30: 00 00 00 00 dc 00 00 00 00 00 00 00 00 00 06 00

04:01.0 SCSI storage controller: QLogic Corp. ISP12160 Dual Channel Ultra3 SCSI Processor (rev 06)
        Subsystem: American Megatrends Inc. QLA12160 on AMI MegaRAID
        Flags: bus master, 66Mhz, medium devsel, latency 32, IRQ 24
        I/O ports at bc00 [size=256]
        Memory at efdff000 (32-bit, non-prefetchable) [size=4K]
        Expansion ROM at efe00000 [disabled] [size=128K]
        Capabilities: [44] Power Management version 1
00: 77 10 16 12 17 01 b0 02 06 00 00 01 08 20 00 00
10: 01 bc 00 00 00 f0 df ef 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 1e 10 71 84
30: 00 00 e0 ef 44 00 00 00 00 00 00 00 0a 01 40 00

04:02.0 SCSI storage controller: QLogic Corp. ISP12160 Dual Channel Ultra3 SCSI Processor (rev 06)
        Subsystem: American Megatrends Inc. QLA12160 on AMI MegaRAID
        Flags: bus master, 66Mhz, medium devsel, latency 32, IRQ 25
        I/O ports at b800 [size=256]
        Memory at efdfe000 (32-bit, non-prefetchable) [size=4K]
        Expansion ROM at efe00000 [disabled] [size=128K]
        Capabilities: [44] Power Management version 1
00: 77 10 16 12 17 01 b0 02 06 00 00 01 08 20 00 00
10: 01 b8 00 00 00 e0 df ef 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 1e 10 71 84
30: 00 00 e0 ef 44 00 00 00 00 00 00 00 05 01 40 00

05:00.0 RAID bus controller: American Megatrends Inc. MegaRAID (rev 20)
        Subsystem: Dell Computer Corporation PowerEdge RAID Controller 3/QC
        Flags: bus master, medium devsel, latency 32, IRQ 23
        Memory at e0000000 (32-bit, prefetchable) [size=128M]
        Expansion ROM at eff00000 [disabled] [size=32K]
        Capabilities: [80] Power Management version 2
00: 1e 10 60 19 16 01 90 02 20 00 04 01 08 20 00 00
10: 08 00 00 e0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 28 10 71 04
30: 00 00 f0 ef 80 00 00 00 00 00 00 00 0b 01 00 00

0f:06.1 SCSI storage controller: Adaptec AHA-3960D / AIC-7899A U160/m (rev 01)
        Subsystem: Adaptec AHA-3960D U160/m
        Flags: bus master, 66Mhz, medium devsel, latency 32, IRQ 44
        BIST result: 00
        I/O ports at 6800 [disabled] [size=256]
        Memory at dee00000 (64-bit, non-prefetchable) [size=4K]
        Expansion ROM at ded00000 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2
00: 05 90 c0 00 16 01 b0 02 01 00 00 01 08 20 80 80
10: 01 68 00 00 04 00 e0 de 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 05 90 20 f6
30: 00 00 d0 de dc 00 00 00 00 00 00 00 05 02 28 19
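Since the question is which controller each entry really is, the hex rows in the lspci -vxx output can be decoded directly: in a standard (type 0) PCI configuration header the vendor ID is the little-endian 16-bit value at offset 0x00, the device ID is at 0x02, and the subsystem vendor and subsystem IDs are at 0x2c and 0x2e. The sketch below decodes the 02:06.0 entry from the dump above; the helper names are hypothetical, and it assumes the rows are contiguous from offset 00.

```python
# Hex rows copied from the 02:06.0 entry of the lspci -vxx output above.
DUMP_02_06_0 = """\
00: 05 90 c5 00 17 01 b0 02 01 00 00 01 08 20 80 80
10: 01 dc 00 00 04 f0 df fc 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 28 10 c5 00
30: 00 00 e0 fc dc 00 00 00 00 00 00 00 05 01 28 19
"""


def parse_hex_rows(text):
    """Turn 'OO: b0 b1 ...' rows into bytes (assumes rows start at offset 0, no gaps)."""
    data = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        offset_str, _, rest = line.partition(":")
        offset = int(offset_str, 16)
        for i, byte_str in enumerate(rest.split()):
            data[offset + i] = int(byte_str, 16)
    return bytes(data[i] for i in range(len(data)))


def pci_ids(cfg):
    """Extract the ID fields of a type 0 header (not valid for PCI bridges)."""
    def le16(off):
        return int.from_bytes(cfg[off:off + 2], "little")

    return {
        "vendor": le16(0x00),
        "device": le16(0x02),
        "subsys_vendor": le16(0x2C),
        "subsys_device": le16(0x2E),
    }


ids = pci_ids(parse_hex_rows(DUMP_02_06_0))
print({name: f"{value:04x}" for name, value in ids.items()})
```

For this entry the result is vendor 9005 and device 00c5 with subsystem 1028:00c5, which matches the "Adaptec" and "Dell Computer Corporation: Unknown device 00c5" text that lspci printed. The same decoding applied to 01:08.1 gives the IDs to compare against the aacraid driver's device table.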
Do you think that rebuilding the RAID-5 data container would help?
I don't think that is involved. What firmware revision do you have? The message is apparently coming from the firmware, not the Linux side (i.e. we fed it something that made it unhappy). I need to talk to Adaptec about this.
The firmware version is 2.7.0 build 3153. I have some more messages from the firmware log:

Nameserv: bogus mount index requested[0] (I think they start at 0)
...
Nameserv: bogus mount index requested[31]
Writecache: out of NVERR, into the NVUP state
Panic: length of sg list is too big
Fatal error: see system event log

I found these by going into the percraid configuration at bootup, and then using ctrl-P to see the firmware log.
One thing that seems strange is that the SCSI IDs on the containers that seem to have the problems are {0,2}. I would have expected {0,1}, and indeed our other similar system uses {0,1}. Could that be related?
I ran another set of tests with it. Here is some more info:

The "bogus" messages in the firmware log start with [3].
I unmounted the partition which occupies all of container 2 and reproduced the error.
I ran a big 'tar' on container 2 and the error did not occur.

I am now thinking maybe the error is with container 0 (the mirrored O/S container). Any ideas?
Thanks for the bug report. However, Red Hat no longer maintains this version of the product. Please upgrade to the latest version and open a new bug if the problem persists. The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, and if you believe this bug is interesting to them, please report the problem in the bug tracker at: http://bugzilla.fedora.us/