Bug 84574 - (SCSI AACRAID)aacraid panic: sg list too long
Status: CLOSED WONTFIX
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.3
Hardware: i686 Linux
Priority: medium Severity: high
Assigned To: Arjan van de Ven
QA Contact: Brian Brock
Reported: 2003-02-18 21:12 EST by Need Real Name
Modified: 2007-04-18 12:51 EDT
Last Closed: 2004-09-30 11:40:33 EDT
Description Need Real Name 2003-02-18 21:12:29 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; HP-UX 9000/785; en-US; rv:1.0.1) Gecko/20020827
Netscape/7.0

Description of problem:
Hi,

We have a RH7.3 system running kernel 2.4.18-19.7.xsmp.

We are getting panics with messages like these:

aacraid: panic: length of sg list is too long

It seems to happen after we read a lot of data in from a raid
device.

Please help, this problem is a show-stopper.



Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Run a program which reads a large amount of data

Actual Results:  system panic

Expected Results:  normal operation

Additional info:

The disks (an entire RAID container) were moved from a system
running 2.4.18-10smp.
Comment 1 Need Real Name 2003-02-18 22:25:39 EST
Here are the contents of /proc/scsi/sg/debug

dev_max(currently)=11 max_active_device=5 (origin 1)
 scsi_dma_free_sectors=3872 sg_pool_secs_aval=320 def_reserved_size=32768
 >>> device=sg0 scsi0 chan=0 id=0 lun=0   em=0 sg_tablesize=16 excl=0
   FD(1): timeout=60000ms bufflen=32768 (res)sgat=0 low_dma=0
   cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active
   FD(2): timeout=60000ms bufflen=32768 (res)sgat=0 low_dma=0
   cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active
 >>> device=sg1 scsi0 chan=0 id=2 lun=0   em=0 sg_tablesize=16 excl=0
   FD(1): timeout=60000ms bufflen=32768 (res)sgat=0 low_dma=0
   cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active
   FD(2): timeout=60000ms bufflen=32768 (res)sgat=0 low_dma=0
   cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active
 >>> device=sg2 scsi2 chan=0 id=2 lun=0   em=0 sg_tablesize=128 excl=0
   FD(1): timeout=60000ms bufflen=32768 (res)sgat=0 low_dma=0
   cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active
   FD(2): timeout=300000ms bufflen=32768 (res)sgat=0 low_dma=0
   cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active
   FD(3): timeout=60000ms bufflen=32768 (res)sgat=0 low_dma=0
   cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active
   FD(4): timeout=300000ms bufflen=32768 (res)sgat=0 low_dma=0
   cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active
   FD(5): timeout=300000ms bufflen=32768 (res)sgat=0 low_dma=0
   cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active
 >>> device=sg3 scsi2 chan=0 id=3 lun=0   em=0 sg_tablesize=128 excl=0
   FD(1): timeout=60000ms bufflen=32768 (res)sgat=0 low_dma=0
   cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active
   FD(2): timeout=60000ms bufflen=32768 (res)sgat=0 low_dma=0
   cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active
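
As a reading aid for the dump above: each ">>> device=" line carries the host number and the scatter-gather table size reported for that device. A minimal Python sketch, assuming the line layout stays exactly as pasted (the regex and the `sg_limits` helper are illustrative names, not part of any sg tooling), pulls those values out per device:

```python
import re

# Matches the ">>> device=..." lines exactly as they appear in the pasted dump.
DEVICE_RE = re.compile(
    r">>> device=(\w+) scsi(\d+) chan=(\d+) id=(\d+) lun=(\d+)\s+"
    r"em=\d+ sg_tablesize=(\d+)"
)

def sg_limits(debug_text):
    """Map each sg device name to its host number, SCSI id and sg_tablesize."""
    limits = {}
    for match in DEVICE_RE.finditer(debug_text):
        name, host, _chan, target, _lun, tablesize = match.groups()
        limits[name] = {
            "host": int(host),
            "id": int(target),
            "sg_tablesize": int(tablesize),
        }
    return limits

# Device lines copied from the dump above.
sample = """\
 >>> device=sg0 scsi0 chan=0 id=0 lun=0   em=0 sg_tablesize=16 excl=0
 >>> device=sg1 scsi0 chan=0 id=2 lun=0   em=0 sg_tablesize=16 excl=0
 >>> device=sg2 scsi2 chan=0 id=2 lun=0   em=0 sg_tablesize=128 excl=0
 >>> device=sg3 scsi2 chan=0 id=3 lun=0   em=0 sg_tablesize=128 excl=0
"""

limits = sg_limits(sample)
for name, info in sorted(limits.items()):
    print(name, info)
```

On the pasted data this lists the two devices on scsi0 with a table size of 16 and the two devices on scsi2 with a table size of 128.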

Here are the rest of the contents of /proc/scsi/sg/ 

# cat def_reserved_size
32768

# cat device_hdr
host    chan    id      lun     type    opens   qdepth  busy    online

# cat devices
0       0       0       0       0       8       10      0       1
0       0       2       0       0       3       10      0       1
2       0       2       0       0       6       253     0       1
2       0       3       0       0       3       253     0       1
2       0       4       0       0       1       253     0       1

# cat device_strs
DELL            PERCRAID Mirror         V1.0
DELL            PERCRAID RAID5          V1.0
IFT             SR2000                  0312
IFT             SR2000                  0312
IFT             SR2000                  0312

# cat host_hdr
uid     busy    cpl     scatg   isa     emul

# cat hosts
0       0       512     16      0       0
0       0       2       128     0       0
1       0       2       128     0       0
2       0       2       128     0       0
1280    0       63      26      0       0

# cat host_strs
percraid
Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.8         <Adaptec aic7899 Ultra160 SCSI adapter>         aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs
Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.8         <Adaptec 3960D Ultra160 SCSI adapter>         aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.8         <Adaptec 3960D Ultra160 SCSI adapter>         aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs
LSI Logic MegaRAID 161N 254 commands 15 targs 7 chans 7 luns

# cat version
30124   Version: 3.1.24 (20020505)

# cat allow_dio
0

Any ideas?
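
The header files and data files listed above (device_hdr with devices, host_hdr with hosts) are easier to read once the column names are attached to each row. A minimal Python sketch, assuming whitespace-separated integer columns as pasted, and assuming that the rows of hosts appear in host-number order (which is consistent with the sg_tablesize values in the debug output, but is an assumption here); `label_rows` is a hypothetical helper name:

```python
def label_rows(header, body):
    """Pair each whitespace-separated row of integers with the header names."""
    names = header.split()
    rows = []
    for line in body.splitlines():
        if line.strip():
            rows.append(dict(zip(names, (int(field) for field in line.split()))))
    return rows

# Contents copied from the listing above.
device_hdr = "host    chan    id      lun     type    opens   qdepth  busy    online"
devices = """\
0       0       0       0       0       8       10      0       1
0       0       2       0       0       3       10      0       1
2       0       2       0       0       6       253     0       1
2       0       3       0       0       3       253     0       1
2       0       4       0       0       1       253     0       1
"""

host_hdr = "uid     busy    cpl     scatg   isa     emul"
hosts = """\
0       0       512     16      0       0
0       0       2       128     0       0
1       0       2       128     0       0
2       0       2       128     0       0
1280    0       63      26      0       0
"""

device_rows = label_rows(device_hdr, devices)
host_rows = label_rows(host_hdr, hosts)

# Assumption: the index into host_rows is the SCSI host number.
for dev in device_rows:
    host = host_rows[dev["host"]]
    print(f"host {dev['host']} id {dev['id']}: "
          f"qdepth={dev['qdepth']} scatg={host['scatg']}")
```

Read this way, the two devices on host 0 have a queue depth of 10 and sit behind a host whose scatg value is 16, while the three devices on host 2 have a queue depth of 253 and a scatg value of 128.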
Comment 2 Alan Cox 2003-02-19 06:52:30 EST
I'm initially baffled. The log you posted seems to show an AMI MegaRAID, not an
aacraid. Also, the base aacraid driver doesn't contain the error strings you
report, although clearly you got an error from somewhere.

Can you attach an lspci -vxx?
Also, can you tell me what app is triggering the problem: is this just a big
cp/tar, or is something like Oracle in use?
Finally, was it stable with the -10 kernel?
Comment 3 Need Real Name 2003-02-19 09:34:36 EST
Thanks for helping us with this problem.  

It happens when we do a lot of I/O, using one of our user-level applications.
I may be able to reproduce it using a big tar command.

I don't know yet if it will be stable in the -10 kernel.  I will be doing
some tests on it.

We do have a Megaraid board installed, but nothing is connected to it.
The internal drives are connected to the aacraid controller.  

We also have three external SCSI disks, connected to an adaptec PCI card.
(Those are the SR2000 entries shown above.)

Here is the output from the lspci:
lspci -vxx 
00:00.0 Host bridge: ServerWorks CMIC-HE (rev 22)
        Flags: fast devsel
00: 66 11 11 00 00 00 00 00 22 00 00 06 10 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:00.1 Host bridge: ServerWorks CMIC-HE
        Flags: fast devsel
00: 66 11 11 00 00 00 00 00 00 00 00 06 10 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:00.2 Host bridge: ServerWorks CMIC-HE
        Flags: fast devsel
00: 66 11 11 00 00 00 00 00 00 00 00 06 10 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:00.3 Host bridge: ServerWorks CMIC-HE
        Flags: fast devsel
00: 66 11 11 00 00 00 00 00 00 00 00 06 10 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:04.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 08)
        Subsystem: Dell Computer Corporation: Unknown device 0106
        Flags: bus master, medium devsel, latency 32, IRQ 19
        Memory at fe201000 (32-bit, non-prefetchable) [size=4K]
        I/O ports at ecc0 [size=64]
        Memory at fe000000 (32-bit, non-prefetchable) [size=1M]
        Capabilities: [dc] Power Management version 2
00: 86 80 29 12 17 01 90 02 08 00 00 02 08 20 00 00
10: 00 10 20 fe c1 ec 00 00 00 00 00 fe 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 28 10 06 01
30: 00 00 00 00 dc 00 00 00 00 00 00 00 0b 01 08 38

00:0e.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27) (prog-if 00 [VGA])
        Subsystem: Dell Computer Corporation: Unknown device 0106
        Flags: bus master, VGA palette snoop, stepping, medium devsel, latency 32
        Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
        I/O ports at e800 [size=256]
        Memory at fe200000 (32-bit, non-prefetchable) [size=4K]
        Expansion ROM at <unassigned> [disabled] [size=128K]
        Capabilities: [5c] Power Management version 2
00: 02 10 52 47 a7 00 90 02 27 00 00 03 08 20 00 00
10: 00 00 00 fd 01 e8 00 00 00 00 20 fe 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 28 10 06 01
30: 00 00 00 00 5c 00 00 00 00 00 00 00 ff 00 08 00

00:0f.3 ISA bridge: ServerWorks: Unknown device 0225
        Subsystem: ServerWorks: Unknown device 0230
        Flags: bus master, medium devsel, latency 0
00: 66 11 25 02 05 00 00 02 00 00 01 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 66 11 30 02
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:10.0 Host bridge: ServerWorks CIOB30 (rev 03)
        Flags: 66Mhz, medium devsel
        Capabilities: [60] PCI-X non-bridge device.
00: 66 11 10 00 42 01 b0 22 03 00 00 06 00 20 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 60 00 00 00 00 00 00 00 00 00 00 00

00:10.2 Host bridge: ServerWorks CIOB30 (rev 03)
        Flags: 66Mhz, medium devsel
        Capabilities: [60] PCI-X non-bridge device.
00: 66 11 10 00 42 01 b0 22 03 00 00 06 00 20 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 60 00 00 00 00 00 00 00 00 00 00 00

00:11.0 Host bridge: ServerWorks CIOB30 (rev 03)
        Flags: 66Mhz, medium devsel
        Capabilities: [60] PCI-X non-bridge device.
00: 66 11 10 00 42 01 b0 22 03 00 00 06 00 20 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 60 00 00 00 00 00 00 00 00 00 00 00

00:11.2 Host bridge: ServerWorks CIOB30 (rev 03)
        Flags: 66Mhz, medium devsel
        Capabilities: [60] PCI-X non-bridge device.
00: 66 11 10 00 42 01 b0 22 03 00 00 06 00 20 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 60 00 00 00 00 00 00 00 00 00 00 00

01:06.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5700 Gigabit Ethernet (rev 14)
        Subsystem: Dell Computer Corporation Broadcom BCM5700
        Flags: bus master, 66Mhz, medium devsel, latency 32, IRQ 22
        Memory at fcf00000 (64-bit, non-prefetchable) [size=64K]
        Capabilities: [40] PCI-X non-bridge device.
        Capabilities: [48] Power Management version 2
        Capabilities: [50] Vital Product Data
        Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable-
00: e4 14 44 16 06 01 b0 02 14 00 00 02 08 20 00 00
10: 04 00 f0 fc 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 28 10 06 01
30: 00 00 00 00 40 00 00 00 00 00 00 00 0a 01 40 00

01:08.1 RAID bus controller: Dell Computer Corporation PowerEdge Expandable RAID Controller 3 (rev 01)
        Subsystem: Dell Computer Corporation PowerEdge Expandable RAID Controller 3/Di
        Flags: bus master, 66Mhz, slow devsel, latency 32, IRQ 20
        Memory at f0000000 (32-bit, prefetchable) [size=128M]
        Expansion ROM at fcc00000 [disabled] [size=64K]
        Capabilities: [80] Power Management version 2
00: 28 10 0a 00 16 01 b0 04 01 00 04 01 08 20 80 00
10: 08 00 00 f0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 28 10 06 01
30: 00 00 c0 fc 80 00 00 00 00 00 00 00 05 01 00 00

02:06.0 SCSI storage controller: Adaptec RAID subsystem HBA (rev 01)
        Subsystem: Dell Computer Corporation: Unknown device 00c5
        Flags: bus master, 66Mhz, medium devsel, latency 32, IRQ 20
        BIST result: 00
        I/O ports at dc00 [size=256]
        Memory at fcdff000 (64-bit, non-prefetchable) [size=4K]
        Expansion ROM at fce00000 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2
00: 05 90 c5 00 17 01 b0 02 01 00 00 01 08 20 80 80
10: 01 dc 00 00 04 f0 df fc 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 28 10 c5 00
30: 00 00 e0 fc dc 00 00 00 00 00 00 00 05 01 28 19

02:06.1 SCSI storage controller: Adaptec AIC-7899P U160/m (rev 01)
        Subsystem: Dell Computer Corporation: Unknown device 0106
        Flags: bus master, 66Mhz, medium devsel, latency 32, IRQ 21
        BIST result: 00
        I/O ports at d800 [disabled] [size=256]
        Memory at fcdfe000 (64-bit, non-prefetchable) [size=4K]
        Expansion ROM at fce00000 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2
00: 05 90 cf 00 16 01 b0 02 01 00 00 01 08 20 80 80
10: 01 d8 00 00 04 e0 df fc 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 28 10 06 01
30: 00 00 e0 fc dc 00 00 00 00 00 00 00 0b 02 28 19

03:08.0 PCI bridge: Intel Corp. 21154 PCI-to-PCI Bridge (prog-if 00 [Normal decode])
        Flags: bus master, 66Mhz, medium devsel, latency 32
        Bus: primary=03, secondary=04, subordinate=05, sec-latency=32
        I/O behind bridge: 0000b000-0000bfff
        Memory behind bridge: efd00000-efffffff
        Prefetchable memory behind bridge: 00000000e0000000-00000000e7f00000
        Capabilities: [dc] Power Management version 1
00: 86 80 54 b1 07 01 b0 02 00 00 04 06 00 20 01 00
10: 00 00 00 00 00 00 00 00 03 04 05 20 b1 b1 a0 22
20: d0 ef f0 ef 01 e0 f1 e7 00 00 00 00 00 00 00 00
30: 00 00 00 00 dc 00 00 00 00 00 00 00 00 00 06 00

04:00.0 PCI bridge: Intel Corp. 21154 PCI-to-PCI Bridge (prog-if 00 [Normal decode])
        Flags: bus master, 66Mhz, medium devsel, latency 32
        Bus: primary=04, secondary=05, subordinate=05, sec-latency=32
        Memory behind bridge: eff00000-efffffff
        Prefetchable memory behind bridge: 00000000e0000000-00000000e7f00000
        Capabilities: [dc] Power Management version 1
00: 86 80 54 b1 07 01 b0 02 00 00 04 06 00 20 01 00
10: 00 00 00 00 00 00 00 00 04 05 05 20 f1 01 a0 22
20: f0 ef f0 ef 01 e0 f1 e7 00 00 00 00 00 00 00 00
30: 00 00 00 00 dc 00 00 00 00 00 00 00 00 00 06 00

04:01.0 SCSI storage controller: QLogic Corp. ISP12160 Dual Channel Ultra3 SCSI Processor (rev 06)
        Subsystem: American Megatrends Inc. QLA12160 on AMI MegaRAID
        Flags: bus master, 66Mhz, medium devsel, latency 32, IRQ 24
        I/O ports at bc00 [size=256]
        Memory at efdff000 (32-bit, non-prefetchable) [size=4K]
        Expansion ROM at efe00000 [disabled] [size=128K]
        Capabilities: [44] Power Management version 1
00: 77 10 16 12 17 01 b0 02 06 00 00 01 08 20 00 00
10: 01 bc 00 00 00 f0 df ef 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 1e 10 71 84
30: 00 00 e0 ef 44 00 00 00 00 00 00 00 0a 01 40 00

04:02.0 SCSI storage controller: QLogic Corp. ISP12160 Dual Channel Ultra3 SCSI Processor (rev 06)
        Subsystem: American Megatrends Inc. QLA12160 on AMI MegaRAID
        Flags: bus master, 66Mhz, medium devsel, latency 32, IRQ 25
        I/O ports at b800 [size=256]
        Memory at efdfe000 (32-bit, non-prefetchable) [size=4K]
        Expansion ROM at efe00000 [disabled] [size=128K]
        Capabilities: [44] Power Management version 1
00: 77 10 16 12 17 01 b0 02 06 00 00 01 08 20 00 00
10: 01 b8 00 00 00 e0 df ef 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 1e 10 71 84
30: 00 00 e0 ef 44 00 00 00 00 00 00 00 05 01 40 00

05:00.0 RAID bus controller: American Megatrends Inc. MegaRAID (rev 20)
        Subsystem: Dell Computer Corporation PowerEdge RAID Controller 3/QC
        Flags: bus master, medium devsel, latency 32, IRQ 23
        Memory at e0000000 (32-bit, prefetchable) [size=128M]
        Expansion ROM at eff00000 [disabled] [size=32K]
        Capabilities: [80] Power Management version 2
00: 1e 10 60 19 16 01 90 02 20 00 04 01 08 20 00 00
10: 08 00 00 e0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 28 10 71 04
30: 00 00 f0 ef 80 00 00 00 00 00 00 00 0b 01 00 00

0f:06.1 SCSI storage controller: Adaptec AHA-3960D / AIC-7899A U160/m (rev 01)
        Subsystem: Adaptec AHA-3960D U160/m
        Flags: bus master, 66Mhz, medium devsel, latency 32, IRQ 44
        BIST result: 00
        I/O ports at 6800 [disabled] [size=256]
        Memory at dee00000 (64-bit, non-prefetchable) [size=4K]
        Expansion ROM at ded00000 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2
00: 05 90 c0 00 16 01 b0 02 01 00 00 01 08 20 80 80
10: 01 68 00 00 04 00 e0 de 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 05 90 20 f6
30: 00 00 d0 de dc 00 00 00 00 00 00 00 05 02 28 19
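
To tell the RAID controllers in this listing apart without relying on the name strings, the IDs can be read straight from the hex rows. In a standard (type 0) PCI configuration header the vendor ID is at offset 0x00, the device ID at 0x02, the subsystem vendor ID at 0x2C and the subsystem ID at 0x2E, each stored as a little-endian 16-bit value. A minimal Python sketch, assuming the "offset: byte byte ..." row format shown above (the helper names are illustrative; the subsystem offsets do not apply to the PCI bridge entries):

```python
def parse_config_dump(lines):
    """Turn 'offset: b0 b1 ...' hex rows into a dict of offset -> byte value."""
    cfg = {}
    for line in lines:
        offset, _, rest = line.partition(":")
        base = int(offset, 16)
        for index, token in enumerate(rest.split()):
            cfg[base + index] = int(token, 16)
    return cfg

def le16(cfg, offset):
    """Read a little-endian 16-bit value from the parsed config space."""
    return cfg[offset] | (cfg[offset + 1] << 8)

def pci_ids(cfg):
    """Extract the ID fields of a type 0 (non-bridge) configuration header."""
    return {
        "vendor": le16(cfg, 0x00),
        "device": le16(cfg, 0x02),
        "subsys_vendor": le16(cfg, 0x2C),
        "subsys_device": le16(cfg, 0x2E),
    }

# Rows copied from the 01:08.1 and 05:00.0 entries above.
dump_01_08_1 = [
    "00: 28 10 0a 00 16 01 b0 04 01 00 04 01 08 20 80 00",
    "10: 08 00 00 f0 00 00 00 00 00 00 00 00 00 00 00 00",
    "20: 00 00 00 00 00 00 00 00 00 00 00 00 28 10 06 01",
    "30: 00 00 c0 fc 80 00 00 00 00 00 00 00 05 01 00 00",
]
dump_05_00_0 = [
    "00: 1e 10 60 19 16 01 90 02 20 00 04 01 08 20 00 00",
    "10: 08 00 00 e0 00 00 00 00 00 00 00 00 00 00 00 00",
    "20: 00 00 00 00 00 00 00 00 00 00 00 00 28 10 71 04",
    "30: 00 00 f0 ef 80 00 00 00 00 00 00 00 0b 01 00 00",
]

ids_01_08_1 = pci_ids(parse_config_dump(dump_01_08_1))
ids_05_00_0 = pci_ids(parse_config_dump(dump_05_00_0))

for slot, ids in (("01:08.1", ids_01_08_1), ("05:00.0", ids_05_00_0)):
    print(slot, {key: f"{value:04x}" for key, value in ids.items()})
```

The decoded values can then be looked up in a PCI ID database or compared against a driver's PCI ID table; which driver claims which ID is not something this sketch determines.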
Comment 4 Need Real Name 2003-02-19 16:16:23 EST
Do you think that rebuilding the RAID-5 data container would help?
Comment 5 Alan Cox 2003-02-19 17:43:19 EST
I don't think that is involved. What firmware revision do you have? The message
is apparently coming from the firmware, not the Linux side (i.e. we fed it
something that made it unhappy). I need to talk to Adaptec about this.
Comment 6 Need Real Name 2003-02-19 17:58:57 EST
The firmware version is 2.7.0 build 3153.

I have some more messages from the firmware log:

   Nameserv: bogus mount index requested[0]  (I think they start at 0)
   ...
   Nameserv: bogus mount index requested[31]

   Writecache: out of NVERR, into the NVUP state

   Panic: length of sg list is too big

   Fatal error: see system event log

I found these by going into the percraid configuration at bootup, and then
using ctrl-P to see the firmware log.


Comment 7 Need Real Name 2003-02-19 19:21:37 EST
One thing that seems strange is that the SCSI IDs on the containers
that seem to have the problems are {0,2}.  I would have expected {0,1}, and
indeed our other similar system uses {0,1}.

Could that be related?
Comment 8 Need Real Name 2003-02-19 20:24:19 EST
I ran another set of tests with it.  Here is some more info:

   The "bogus" messages in the firmware log start with [3].

   I unmounted the partition which occupies all of container 2
   and reproduced the error.

   I ran a big 'tar' on container 2 and the error did not occur.

I am now thinking maybe the error is with container 0 (the mirrored
O/S container).

Any ideas?
Comment 9 Bugzilla owner 2004-09-30 11:40:33 EDT
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/
