Bug 118432 - Megaraid driver does not work reliably with 8G memory
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: x86_64 Linux
Priority: high, Severity: high
Assigned To: Peter Martuccelli
Blocks: 170417
Reported: 2004-03-16 13:20 EST by Tymm Twillman
Modified: 2010-10-21 22:32 EDT (History)
Doc Type: Bug Fix
Last Closed: 2007-01-18 10:42:08 EST


Attachments:
patches that seem to fix problem w/megaraid, >8G memory on opteron (20.39 KB, patch)
2004-03-19 20:38 EST, Tymm Twillman
log of 2.4.21-9.0.1 w/megaraid2 watchdog dump (12.91 KB, text/plain)
2004-03-25 19:58 EST, Tymm Twillman
megaraid2 failure on boot (26.20 KB, text/plain)
2006-03-14 18:45 EST, murray lotnicz
output from lspci & lspci -nvv (10.17 KB, text/plain)
2006-03-14 18:46 EST, murray lotnicz

Description Tymm Twillman 2004-03-16 13:20:35 EST
Description of problem:
megaraid driver often freezes (but only if loaded from initrd) in
first call to megaIssueCmd (from mega_i_query_adapter); (freezes in
while loops after first WRINDOOR).  If loaded from command line, it
works fine.  debugging has shown nothing too weird (or at least not
much different between working case and nonworking case) about memory
addresses used; doesn't *look* like a 64-bit issue from this
perspective, but not sure.  setting the DMA mask to 0xffffffff
(instead of 0xffffffffffffffffUL) doesn't make a difference.
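
For readers unfamiliar with what the DMA mask controls, here is a minimal sketch of the arithmetic only. It is not kernel code and does not model the IOMMU or bounce buffering; the function name `dma_addressable` and the example addresses are hypothetical. It assumes a mask is a contiguous run of low bits, as both values quoted above are.

```python
MASK_32 = 0xFFFFFFFF
MASK_64 = 0xFFFFFFFFFFFFFFFF

def dma_addressable(bus_addr, length, mask):
    """Return True if every byte of [bus_addr, bus_addr + length) fits under mask.

    Assumes mask is a contiguous run of low bits (e.g. 32 or 64 ones),
    so "fits under the mask" is the same as "last byte <= mask".
    """
    if bus_addr < 0 or length <= 0:
        raise ValueError("bus_addr must be non-negative and length positive")
    last_byte = bus_addr + length - 1
    return last_byte <= mask

# Hypothetical buffers: one below 4 GiB, one above, one straddling the boundary.
below_4g = dma_addressable(0x0FFF0000, 4096, MASK_32)       # reachable
above_4g = dma_addressable(0x180000000, 4096, MASK_32)      # not reachable
above_4g_64 = dma_addressable(0x180000000, 4096, MASK_64)   # reachable
straddling = dma_addressable(0xFFFFF000, 8192, MASK_32)     # crosses 4 GiB
```

With 8G installed, part of RAM sits above the 4 GiB line, so a 32-bit mask would exclude it; since changing the mask made no difference here, the sketch only shows what was being varied, not the cause of the freeze.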

megaraid2 driver often panics on loading from initrd; works fine also
when loaded from command line.

Version-Release number of selected component (if applicable):
kernel-2.4.21-4EL, kernel-2.4.21-9.0.1EL

How reproducible:
almost always

Steps to Reproduce:
1. install 8G ram in x86_64 system
2. install megaraid 320 board
3. set up initrd to load megaraid driver on boot (mostly tried with it    
   being boot device)
4. boot
  
Actual results:
kernel freezes when initializing megaraid driver

Expected results:
kernel probes megaraid board, gets config, scans SCSI devices, boots

Additional info:

-bash-2.05b# cat /etc/redhat-release
Red Hat Enterprise Linux WS release 3 (Taroon)
-bash-2.05b# free
             total       used       free     shared    buffers     cached
Mem:       8036584     611612    7424972          0     117516     120876
-/+ buffers/cache:     373220    7663364
Swap:      2048276          0    2048276

-bash-2.05b# lsmod
Module                  Size  Used by    Not tainted
parport_pc             20100   1  (autoclean)
lp                     10088   0  (autoclean)
parport                42656   1  (autoclean) [parport_pc lp]
button                  6032   0  (unused)
autofs                 13924   0  (autoclean) (unused)
tg3                    55664   1
floppy                 63448   0  (autoclean)
sd_mod                 14548   0  (autoclean) (unused)
scsi_mod              130372   1  (autoclean) [sd_mod]
keybdev                 3136   0  (unused)
mousedev                6728   0  (unused)
hid                    21864   0  (unused)
input                   7520   0  [keybdev mousedev hid]
usb-ohci               23376   0  (unused)
usbcore                86848   1  [hid usb-ohci]
ext3                   89872   4
jbd                    58576   4  [ext3]

-bash-2.05b# cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 5
model name      : AMD Opteron(tm) Processor 246
stepping        : 8
cpu MHz         : 2004.571
cache size      : 1024 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov
pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm 3dnowext 3dnow
bogomips        : 3997.69
TLB size        : 1088 4K pages
clflush size    : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts ttp
 
processor       : 1
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 5
model name      : AMD Opteron(tm) Processor 246
stepping        : 8
cpu MHz         : 2004.571
cache size      : 1024 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov
pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm 3dnowext 3dnow
bogomips        : 3997.69
TLB size        : 1088 4K pages
clflush size    : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts ttp

-bash-2.05b# cat /proc/pci
PCI devices found:
  Bus  0, device   6, function  0:
    PCI bridge: Advanced Micro Devices [AMD] AMD-8111 PCI (rev 7).
      Master Capable.  Latency=115.  Min Gnt=14.
  Bus  0, device   7, function  0:
    ISA bridge: Advanced Micro Devices [AMD] AMD-8111 LPC (rev 5).
  Bus  0, device   7, function  1:
    IDE interface: Advanced Micro Devices [AMD] AMD-8111 IDE (rev 3).
      Master Capable.  Latency=64.
      I/O at 0x1020 [0x102f].
  Bus  0, device   7, function  2:
    SMBus: Advanced Micro Devices [AMD] AMD-8111 SMBus 2.0 (rev 2).
      IRQ 255.
      Master Capable.  Latency=64.
      I/O at 0x1000 [0x101f].
  Bus  0, device   7, function  3:
    Bridge: Advanced Micro Devices [AMD] AMD-8111 ACPI (rev 5).
      Master Capable.  Latency=64.
  Bus  0, device  10, function  0:
    PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge
(rev 18).
      Master Capable.  Latency=64.  Min Gnt=4.
  Bus  0, device  10, function  1:
    PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X APIC (rev 1).
      Non-prefetchable 64 bit memory at 0xfc000000 [0xfc000fff].
  Bus  0, device  11, function  0:
    PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge
(#2) (rev 18).
      Master Capable.  Latency=64.  Min Gnt=4.
  Bus  0, device  11, function  1:
    PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X APIC (#2) (rev 1).
      Non-prefetchable 64 bit memory at 0xfc001000 [0xfc001fff].
  Bus  0, device  24, function  0:
    Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge (rev 0).
  Bus  0, device  24, function  1:
    Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge (rev 0).
  Bus  0, device  24, function  2:
    Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge (rev 0).
  Bus  0, device  24, function  3:
    Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge (rev 0).
  Bus  0, device  25, function  0:
    Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge (#2) (rev 0).
  Bus  0, device  25, function  1:
    Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge (#2) (rev 0).
  Bus  0, device  25, function  2:
    Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge (#2) (rev 0).
  Bus  0, device  25, function  3:
    Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge (#2) (rev 0).
  Bus  1, device   0, function  0:
    USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB (rev 11).
      IRQ 19.
      Master Capable.  Latency=64.  Max Lat=80.
      Non-prefetchable 32 bit memory at 0xfc100000 [0xfc100fff].
  Bus  1, device   0, function  1:
    USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB (#2)
(rev 11).
      IRQ 19.
      Master Capable.  Latency=64.  Max Lat=80.
      Non-prefetchable 32 bit memory at 0xfc101000 [0xfc101fff].
  Bus  1, device   6, function  0:
    VGA compatible controller: ATI Technologies Inc Rage XL (rev 39).
      IRQ 18.
      Master Capable.  Latency=66.  Min Gnt=8.
      Non-prefetchable 32 bit memory at 0xfd000000 [0xfdffffff].
      I/O at 0x2000 [0x20ff].
      Non-prefetchable 32 bit memory at 0xfc102000 [0xfc102fff].
  Bus  2, device   2, function  0:
    RAID bus controller: LSI Logic / Symbios Logic PowerEdge
Expandable RAID Controller 4 (rev 1).
      IRQ 26.
      Master Capable.  Latency=64.
      Prefetchable 32 bit memory at 0xfe300000 [0xfe30ffff].
  Bus  2, device   3, function  0:
    Ethernet controller: Broadcom Corporation NetXtreme BCM5702
Gigabit Ethernet (rev 2).
      IRQ 27.
      Master Capable.  Latency=64.  Min Gnt=64.
      Non-prefetchable 64 bit memory at 0xfe000000 [0xfe00ffff].
  Bus  2, device   4, function  0:
    Ethernet controller: Broadcom Corporation NetXtreme BCM5702
Gigabit Ethernet (#2) (rev 2).
      IRQ 27.
      Master Capable.  Latency=64.  Min Gnt=64.
      Non-prefetchable 64 bit memory at 0xfe010000 [0xfe01ffff].
Comment 1 Tom Coughlan 2004-03-18 14:01:09 EST
We will need to do some reconfiguring to get a x86_64 system with 8GB.

In the meantime, would you post the panic message and stack trace when
the megaraid2 panics?

Thanks. 
Comment 2 Tymm Twillman 2004-03-19 20:38:33 EST
Created attachment 98703 [details]
patches that seem to fix problem w/megaraid, >8G memory on opteron

includes additional bits w/mtrr fix, consistent dma mask, exception handling...
unfortunately don't have enough available time on this machine (rebooting &
testing) to separate out & verify exactly which pieces specifically fixed the
problem...
Comment 3 Tom Coughlan 2004-03-22 07:30:12 EST
Reassigning to the arch/x86_64/ maintainer.  Jim, please review.  Are
you familiar with this patch?
Comment 4 Tymm Twillman 2004-03-22 12:26:59 EST
BTW patches are from diffs with suse kernel (which worked with
megaraid); there are a good # of other differences but these looked to
be key.
Comment 5 Tymm Twillman 2004-03-22 13:58:30 EST
drat.  

seems the patches disable the extra >4G ram.  looks like i'll have the
system a little while longer after all.

will try to get serial console up so kernel panic w/megaraid2 can be
captured.
Comment 6 Tymm Twillman 2004-03-25 18:39:30 EST
This is full boot log for -4.ELsmp including dump -- ok, so it's a
lockup...  will try to get dump of -9.0.1.ELsmp.

Also found that kernel.org version 2.4.22 works ok (also tried 2.4.25
and that works just fine too); 2.4.21 boots fine most of the time but
then crashes at some point later (will work on getting dump info for
that too).

dumb mistake with patch i appended -- turns out that someone pulled 4G
out of the system :( -- sadly i'm remote so wasn't able to catch that
when system booted.  that has since been taken care of :)  but that
patch doesn't seem to fix problem.


Any word on when you may be able to get 8G smp opteron system up
w/megaraid?  If necessary we can look into loaning hardware.


Linux version 2.4.21-4.ELsmp (bhcompile@dolly.devel.redhat.com) (gcc
version 3.2.3 20030502 (Red Hat Linux 3.2.3-20)) #1 SMP Fri Oct 3
17:32:58 EDT 2003
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009b800 (usable)
 BIOS-e820: 000000000009b800 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000ce000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000fbf70000 (usable)
 BIOS-e820: 00000000fbf70000 - 00000000fbf76000 (ACPI data)
 BIOS-e820: 00000000fbf76000 - 00000000fbf80000 (ACPI NVS)
 BIOS-e820: 00000000fbf80000 - 00000000fc000000 (reserved)
 BIOS-e820: 00000000fec00000 - 00000000fec00400 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000200000000 (usable)
kernel direct mapping tables upto 10200000000 @ 8000-11000
found SMP MP-table at 000f69a0
hm, page 000f6000 reserved twice.
hm, page 000f7000 reserved twice.
hm, page 0009b000 reserved twice.
hm, page 0009c000 reserved twice.
On node 0 totalpages: 2097152
zone(0): 4096 pages.
zone(1): 2093056 pages.
zone(2): 0 pages.
ACPI: RSDP (v002 PTLTD                      ) @ 0x00000000000f6920
ACPI: XSDT (v001 PTLTD           XSDT   01540.00000) @ 0x00000000fbf734d8
ACPI: FADT (v003 AMD    HAMMER   01540.00000) @ 0x00000000fbf75e46
ACPI: MADT (v001 PTLTD           APIC   01540.00000) @ 0x00000000fbf75f3a
ACPI: SPCR (v001 PTLTD  $UCRTBL$ 01540.00000) @ 0x00000000fbf75fb0
ACPI: DSDT (v001 AMD-K8  AMDACPI 01540.00000) @ 0x0000000000000000
ACPI: BIOS passes blacklist
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 15:5 APIC version 16
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
Processor #1 15:5 APIC version 16
ACPI: LAPIC_NMI (acpi_id[0x00] polarity[0x1] trigger[0x1] lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] polarity[0x1] trigger[0x1] lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec00000] global_irq_base[0x0])
IOAPIC[0]: Assigned apic_id 2
IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, IRQ 0-23
ACPI: IOAPIC (id[0x03] address[0xfc000000] global_irq_base[0x18])
IOAPIC[1]: Assigned apic_id 3
IOAPIC[1]: apic_id 3, version 17, address 0xfc000000, IRQ 24-27
ACPI: IOAPIC (id[0x04] address[0xfc001000] global_irq_base[0x1c])
IOAPIC[2]: Assigned apic_id 4
IOAPIC[2]: apic_id 4, version 17, address 0xfc001000, IRQ 28-31
ACPI: INT_SRC_OVR (bus[0] irq[0x0] global_irq[0x2] polarity[0x1]
trigger[0x1])
Using ACPI (MADT) for SMP configuration information
Checking aperture...
CPU 0: aperture @ 0 size 65536 KB
Your BIOS doesn't leave a aperture memory hole
Please enable the IOMMU option in the BIOS setup
Mapping aperture over 65536 KB of RAM @ 10000000
Kernel command line: ro root=LABEL=/ console=ttyS0,38400
Initializing CPU#0
time.c: Detected 1.193182 MHz PIT timer.
time.c: Detected 2004.598 MHz TSC timer.
Console: colour VGA+ 80x25
Calibrating delay loop... 3997.69 BogoMIPS
Memory: 8027092k/8388608k available (1873k kernel code, 292824k
reserved, 1932k data,
220k init)
Dentry cache hash table entries: 262144 (order: 10, 4194304 bytes)
Inode cache hash table entries: 262144 (order: 10, 4194304 bytes)
Mount cache hash table entries: 256 (order: 0, 4096 bytes)
Buffer cache hash table entries: 524288 (order: 10, 4194304 bytes)
Page-cache hash table entries: 524288 (order: 10, 4194304 bytes)
CPU: L1 I Cache: 64K (64 bytes/line/2 way), D cache 64K (64
bytes/line/2 way)
CPU: L2 Cache: 1024K (64 bytes/line/8 way)
Machine Check Reporting enabled for CPU#0
POSIX conformance testing by UNIFIX
mtrr: v2.02 (20020716))
CPU: L1 I Cache: 64K (64 bytes/line/2 way), D cache 64K (64
bytes/line/2 way)
CPU: L2 Cache: 1024K (64 bytes/line/8 way)
CPU0: AMD Opteron(tm) Processor 246 stepping 08
per-CPU timeslice cutoff: 5119.91 usecs.
task migration cache decay timeout: 10 msecs.
Booting processor 1/1 rip 6000 page 000001001e64c000
Initializing CPU#1
Calibrating delay loop... 3997.69 BogoMIPS
CPU: L1 I Cache: 64K (64 bytes/line/2 way), D cache 64K (64
bytes/line/2 way)
CPU: L2 Cache: 1024K (64 bytes/line/8 way)
Machine Check Reporting enabled for CPU#1
CPU1: AMD Opteron(tm) Processor 246 stepping 08
Total of 2 processors activated (7995.39 BogoMIPS).
ENABLING IO-APIC IRQs
Setting 2 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 2 ... ok.
Setting 3 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 3 ... ok.
Setting 4 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 4 ... ok.
..TIMER: vector=0x31 pin1=2 pin2=0
testing the IO APIC.......................
 
 
 
.................................... done.
Using local APIC timer interrupts.
Detected 12.528 MHz APIC timer.
cpu: 0, clocks: 2004597, slice: 668199
CPU0<T0:2004592,T1:1336384,D:9,S:668199,C:2004597>
cpu: 1, clocks: 2004597, slice: 668199
CPU1<T0:2004592,T1:668192,D:2,S:668199,C:2004597>
checking TSC synchronization across CPUs: passed.
time.c: Using PIT/TSC based timekeeping.
Starting migration thread for cpu 0
Starting migration thread for cpu 1
ACPI: Subsystem revision 20030619
PCI: Using configuration type 1
 tbxface-0117 [03] acpi_load_tables      : ACPI Tables successfully
acquired
Parsing all Control
Methods:............................................................
Table [DSDT](id F004) - 297 Objects with 30 Devices 60 Methods 30 Regions
ACPI Namespace successfully loaded at root ffffffff8055c1e0
evxfevnt-0093 [04] acpi_enable           : Transition to ACPI mode
successful
evgpeblk-0748 [06] ev_create_gpe_block   : GPE 00 to 15 [_GPE] 2 regs
at 0000000000008020 on int 9
evgpeblk-0748 [06] ev_create_gpe_block   : GPE 176 to 207 [_GPE] 4
regs at 00000000000080B0 on int 9
Completing Region/Field/Buffer/Package
initialization:..............................................................
Initialized 30/30 Regions 0/0 Fields 15/15 Buffers 17/17 Packages (305
nodes)
Executing all Device _STA and_INI methods:...............................
31 Devices found containing: 31 _STA, 0 _INI methods
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: System [ACPI] (supports S0 S1 S4 S5)
ACPI: PCI Root Bridge [PCI0] (00:00)
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 5 10 11, disabled)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 5 10 11, disabled)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 5 *10 11)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 5 10 *11)
ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 5
ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 11
 pci_irq-0297 [11] acpi_pci_irq_derive   : Unable to derive IRQ for
device 00:07.2
PCI: No IRQ known for interrupt pin D of device 00:07.2 - using IRQ 255
PCI: Using ACPI for IRQ routing
Linux agpgart interface v0.99 (c) Jeff Hartmann
agpgart: Maximum main memory to use for agp memory: 7956M
PCI-DMA: Disabling AGP.
PCI-DMA: aperture base @ 10000000 size 65536 KB
PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
Starting kswapd
VFS: Disk quotas vdquot_6.5.1
aio_setup: num_physpages = 524288
aio_setup: sizeof(struct page) = 104
Hugetlbfs mounted.
Total HugeTLB memory allocated, 0
IA32 emulation $Id: sys_ia32.c,v 1.56 2003/04/10 10:45:37 ak Exp $
pty: 2048 Unix98 ptys configured
Serial driver version 5.05c (2001-07-08) with MANY_PORTS MULTIPORT
SHARE_IRQ SERIAL_PCI SERIAL_ACPI enabled
ttyS0 at 0x03f8 (irq = 4) is a 16550A
ttyS1 at 0x02f8 (irq = 3) is a 16550A
register_serial(): autoconfig failed
register_serial(): autoconfig failed
Real Time Clock Driver v1.10e
NET4: Frame Diverter 0.46
RAMDISK driver initialized: 256 RAM disks of 8192K size 1024 blocksize
Uniform Multi-Platform E-IDE driver Revision: 7.00beta4-2.4
ide: Assuming 33MHz system bus speed for PIO modes; override with
idebus=xx
AMD8111: IDE controller at PCI slot 00:07.1
AMD8111: chipset revision 3
AMD8111: not 100% native mode: will probe irqs later
ide: Assuming 33MHz system bus speed for PIO modes; override with
idebus=xx
AMD_IDE: Advanced Micro Devices [AMD] AMD-8111 IDE (rev 03) UDMA100
controller on pci00:07.1
    ide0: BM-DMA at 0x1020-0x1027, BIOS settings: hda:DMA, hdb:pio
    ide1: BM-DMA at 0x1028-0x102f, BIOS settings: hdc:pio, hdd:pio
hda: ST320410A, ATA DISK drive
blk: queue ffffffff805b8820, no I/O memory limit
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hda: attached ide-disk driver.
hda: host protected area => 1
hda: 39102336 sectors (20020 MB) w/2048KiB Cache, CHS=38792/16/63,
UDMA(100)
ide-floppy driver 0.99.newide
Partition check:
 hda: [PTBL] [2434/255/63] hda1 hda2 hda3 hda4 < hda5 hda6 >
ide-floppy driver 0.99.newide
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
Initializing Cryptographic API
NET4: Linux TCP/IP 1.0 for NET4.0
IP: routing cache hash table of 32768 buckets, 512Kbytes
TCP: Hash tables configured (established 262144 bind 65536)
Linux IP multicast router 0.06 plus PIM-SM
Initializing IPsec netlink socket
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
RAMDISK: ext2 filesystem found at block 0
RAMDISK: Loading 8000 blocks [1 disk] into ram disk... done.
VFS: Mounted root (ext2 filesystem).
Red Hat nash version 3.5.13 starting
SCSI subsystem driver Revision: 1.00
Loading scsi_mod.o module
megaraid: v2.00.5 (Release Date: Thu Apr 24 14:06:55 EDT 2003)
Loading sd_mod.o module
megaraid: found 0x1000:0x1960:bus 2:slot 2:func 0
Loading megaraid2.o module
scsi0:Found MegaRAID controller at 0xffffff000001f000, IRQ:26
NMI Watchdog detected LOCKUP on CPU1, eip ffffffffa0024a02, registers:
CPU 1
Pid: 16, comm: insmod Not tainted
RIP: 0010:[<ffffffffa0024a02>]{:megaraid2:issue_scb_block+194}
RSP: 0018:00000101fdf4bc58  EFLAGS: 00000056
RAX: 0000000000000000 RBX: 00000101fdf4bca8 RCX: 0000000000000000
RDX: 0000000010021011 RSI: 00000101fdf4bcb8 RDI: 00000101fde57020
RBP: 00000101fde57010 R08: 0100000010020000 R09: 0000000000000000
R10: 00000101fde60000 R11: 0000000000000800 R12: 00000101fde63100
R13: 00000101fde57008 R14: 0000000000001000 R15: 0000000000001960
FS:  0000000000000000(0000) GS:ffffffff805fb540(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000005ee000 CR3: 000000001e636000 CR4: 00000000000006e0
 
Call Trace:  <EOE> [<ffffffffa0023c4b>]{:megaraid2:mega_query_adapter+91}
       [<ffffffff8011cfb5>]{pci_alloc_consistent+597}
[<ffffffffa00236f9>]{:megaraid2:mega_find_card+1193}
       [<ffffffffa002b680>]{:megaraid2:driver_template+0}
       [<ffffffffa002b680>]{:megaraid2:driver_template+0}
       [<ffffffffa002b680>]{:megaraid2:driver_template+0}
       [<ffffffffa0023197>]{:megaraid2:megaraid_detect+167}
       [<ffffffffa0002115>]{:scsi_mod:scsi_register_host+117}
       [<ffffffff801528f0>]{__alloc_pages+112}
[<ffffffffa0028780>]{:megaraid2:init_this_scsi_driver+32}
       [<ffffffff801256b6>]{sys_init_module+1686} [<ffffffffa00230b8>]
       [<ffffffff801100c7>]{system_call+119}
Process insmod (pid: 16, stackpage=101fdf4b000)
Stack: 00000101fdf4bc58 0000000000000018 0000000000000000 0000000000000000
       0000000000000000 0000000000000000 0000000000000000 0000000000000000
       0000000000000000 0000000000000000 0000000000000000 0000000000000000
       0000000000000000 0000000000000000 0000000000000000 0000000000000000
       0000000000000000 0000000000000000 0000000000000000 0000000000000000
       0000000000000000 0000000000000000 0000000000000000 0000000000000000
       0000000000000000 0000000000000000 0000000000000000 0000000000000000
       0000000000000000 0000000000000000 0000000000000000 0000000000000000
       0000000000000000 0000000000000000 0000000000000000 0000000000000000
       0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:  <EOE> [<ffffffffa0023c4b>]{:megaraid2:mega_query_adapter+91}
       [<ffffffff8011cfb5>]{pci_alloc_consistent+597}
[<ffffffffa00236f9>]{:megaraid2:mega_find_card+1193}
       [<ffffffffa002b680>]{:megaraid2:driver_template+0}
       [<ffffffffa002b680>]{:megaraid2:driver_template+0}
       [<ffffffffa002b680>]{:megaraid2:driver_template+0}
       [<ffffffffa0023197>]{:megaraid2:megaraid_detect+167}
       [<ffffffffa0002115>]{:scsi_mod:scsi_register_host+117}
       [<ffffffff801528f0>]{__alloc_pages+112}
[<ffffffffa0028780>]{:megaraid2:init_this_scsi_driver+32}
       [<ffffffff801256b6>]{sys_init_module+1686} [<ffffffffa00230b8>]
       [<ffffffff801100c7>]{system_call+119}
 
Code: 0f b6 45 10 fe c0 74 f6 c6 45 10 ff 0f b6 45 40 3c 77 74 0a
 
console shuts up ...
 ERROR: /bin/insmod exited abnormally!
Loading jbd.o module
Loading ext3.o module
Mounting /proc filesystem
Creating block devices
Creating root device
 
 
 
 
[root@penguin root]#
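
A note on reading the `Code:` line in the dump above. The following is a minimal sketch, not a real disassembler: it decodes only the three instruction forms found at the start of that byte string, using standard x86-64 encodings. The helper name `decode_prefix` and the `dump+N` label notation (byte offset relative to the first dumped byte) are made up for illustration. Which driver field lives at offset 0x10 is an assumption; the decode only shows that the CPU is re-reading one byte in a tight loop.

```python
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def signed8(byte):
    """Interpret an unsigned byte as a two's-complement 8-bit value."""
    return byte - 256 if byte >= 128 else byte

def decode_prefix(code):
    """Decode leading instructions, stopping at the first unsupported one.

    Supported forms only:
      0F B6 /r disp8  movzx r32, byte [r64 + disp8]
      FE C0           inc al
      74/75 rel8      je/jne (target shown relative to the dump start)
    """
    out = []
    i = 0
    while i < len(code):
        op = code[i]
        if op == 0x0F and code[i + 1:i + 2] == b"\xb6" and i + 3 < len(code):
            modrm = code[i + 2]
            mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
            if mod != 1 or rm == 4:  # only [reg + disp8] without a SIB byte
                break
            disp = signed8(code[i + 3])
            out.append((i, f"movzx {REG32[reg]}, byte [{REG64[rm]}{disp:+#x}]"))
            i += 4
        elif op == 0xFE and code[i + 1:i + 2] == b"\xc0":
            out.append((i, "inc al"))
            i += 2
        elif op in (0x74, 0x75) and i + 1 < len(code):
            target = i + 2 + signed8(code[i + 1])
            name = "je" if op == 0x74 else "jne"
            out.append((i, f"{name} dump{target:+d}"))
            i += 2
        else:
            break
    return out

# Bytes copied from the "Code:" line of the dump above.
CODE = bytes.fromhex("0fb64510fec074f6c64510ff0fb645403c77740a")
decoded = [text for _, text in decode_prefix(CODE)]
```

Here `decoded` is `["movzx eax, byte [rbp+0x10]", "inc al", "je dump-2"]`: the byte at RBP+0x10 is loaded and incremented, and the branch is taken back to just before the dumped bytes whenever the result wraps to zero, i.e. while the byte reads 0xFF. That is consistent with the "freezes in while loops" observation in the description, with the CPU waiting for a byte that never changes.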

Comment 7 Tymm Twillman 2004-03-25 18:42:15 EST
add'l note: setting IOMMU w/64M aperture doesn't seem to affect
complaint about mapping over memory/losing 64M... I assume the bios is
a bit buggy.  
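
To make the "mapping over memory" complaint concrete, here is a small sketch that parses a few of the `BIOS-e820` lines from the boot log in the previous comment and checks how much usable RAM the reported aperture (65536 KB at 0x10000000) covers. The helper names `parse_e820` and `usable_overlap` are hypothetical, and only a subset of the log lines is embedded; it is an illustration of the arithmetic, not of what the kernel does internally.

```python
import re

E820_LINE = re.compile(r"BIOS-e820:\s+([0-9a-f]+)\s+-\s+([0-9a-f]+)\s+\((.+?)\)")

# Subset of the e820 map copied from the boot log in the previous comment.
LOG = """\
 BIOS-e820: 0000000000000000 - 000000000009b800 (usable)
 BIOS-e820: 0000000000100000 - 00000000fbf70000 (usable)
 BIOS-e820: 00000000fbf70000 - 00000000fbf76000 (ACPI data)
 BIOS-e820: 0000000100000000 - 0000000200000000 (usable)
"""

def parse_e820(text):
    """Return a list of (start, end, kind) tuples; end is exclusive."""
    return [
        (int(m.group(1), 16), int(m.group(2), 16), m.group(3))
        for m in E820_LINE.finditer(text)
    ]

def usable_overlap(ranges, start, end):
    """Number of bytes in [start, end) that fall inside 'usable' ranges."""
    total = 0
    for lo, hi, kind in ranges:
        if kind == "usable":
            total += max(0, min(hi, end) - max(lo, start))
    return total

FOUR_GIB = 1 << 32
ranges = parse_e820(LOG)

# "Mapping aperture over 65536 KB of RAM @ 10000000"
aperture = (0x10000000, 0x10000000 + 65536 * 1024)
ram_under_aperture = usable_overlap(ranges, *aperture)
ram_above_4g = usable_overlap(ranges, FOUR_GIB, 1 << 64)
```

Under these inputs the whole 64 MiB aperture lands on usable RAM, which matches the lost 64M, and 4 GiB of usable RAM sits above the 4 GiB boundary.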
Comment 8 Tymm Twillman 2004-03-25 19:58:24 EST
Created attachment 98867 [details]
log of 2.4.21-9.0.1 w/megaraid2 watchdog dump
Comment 9 Marc Mondragon 2004-06-25 17:13:30 EDT
FWIW,

I'm seeing this as well, Penguin Computing Altus 3200 8GB RAM.
Megaraid 320-2 controller.   I tried a couple of suggestions from
Penguin, setting kernel parameters acpi=off and numa=off with no
success.  Here is the dump:

Loading sd_mod.o module
Loading megaraid2.o module
megaraid: v2.10.1.1 (Release Date: Fri Jan 16 14:47:19 EST 2004)
megaraid: found 0x1000:0x1960:bus 2:slot 1:func 0
scsi0:Found MegaRAID controller at 0xffffff0000013000, IRQ:25
NMI Watchdog detected LOCKUP on CPU1, eip ffffffffa0024a18, registers:
CPU 1
Pid: 16, comm: insmod Not tainted
RIP: 0010:[<ffffffffa0024a18>]{:megaraid2:issue_scb_block+200}
RSP: 0018:000001000e2a7c08  EFLAGS: 00000056
RAX: 0000000000000000 RBX: 00000101febc1010 RCX: 0000000000000000
RDX: 0000000010021011 RSI: 000001000e2a7c68 RDI: 00000101febc1020
RBP: 00000101febc4100 R08: 0000000010020000 R09: 0000000000000000
R10: 00000101febc2000 R11: 0000000000000800 R12: 000001000e2a7c58
R13: 00000101febc1008 R14: 0000000000001000 R15: 0000000000001960
FS:  0000000000000000(0000) GS:ffffffff805d98c0(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000005ed000 CR3: 000000001e676000 CR4: 00000000000006e0
                                                                     
          
Call Trace: [<ffffffff801ea176>]{vt_console_print+742}
       [<ffffffff80124402>]{__call_console_drivers+82}
[<ffffffff801248c9>]{printk+473}
       [<ffffffff8011e20d>]{wake_up_cpu+29}
[<ffffffff8011ed1e>]{load_balance+830}
       [<ffffffff8011ed9a>]{rebalance_tick+58}
[<ffffffff8011f143>]{scheduler_tick+659}
       [<ffffffff8011b263>]{smp_apic_timer_interrupt+291}
       [<ffffffff8010de20>]{default_idle+0}
[<ffffffff8010de20>]{default_idle+0}                                 
                                              
       [<ffffffff80110804>]{reschedule_interrupt+64}  <EOI>
[<ffffffff8011f8c2>]{thread_return+0}
       [<ffffffff8010de20>]{default_idle+0}
[<ffffffff8010de20>]{default_idle+0}                                 
                                              
       [<ffffffff8010dee0>]{cpu_idle+96}
Process insmod (pid: 16, stackpage=1000e2a7000)
Stack: 000001000e2a7c08 0000000000000018 0000000000000000 0000000000000000
       0000000000000000 0000000000000000 0000000000000000 0000000000000000
       0000000000000000 0000000000000000 0000000000000000 0000000000000000
       0000000000000000 0000000000000000 0000000000000000 0000000000000000
       0000000000000000 0000000000000000 0000000000000000 0000000000000000
       0000000000000000 0000000000000000 0000000000000000 0000000000000000
       0000000000000000 0000000000000000 0000000000000000 0000000000000000
       0000000000000000 0000000000000000 0000000000000000 0000000000000000
       0000000000000000 0000000000000000 0000000000000000 0000000000000000
       0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace: [<ffffffff801ea176>]{vt_console_print+742}
       [<ffffffff80124402>]{__call_console_drivers+82}
[<ffffffff801248c9>]{printk+473}
       [<ffffffff8011e20d>]{wake_up_cpu+29}
[<ffffffff8011ed1e>]{load_balance+830}
       [<ffffffff8011ed9a>]{rebalance_tick+58}
[<ffffffff8011f143>]{scheduler_tick+659}
       [<ffffffff8011b263>]{smp_apic_timer_interrupt+291}
       [<ffffffff8010de20>]{default_idle+0}
[<ffffffff8010de20>]{default_idle+0}                                 
                                              
       [<ffffffff80110804>]{reschedule_interrupt+64}  <EOI>
[<ffffffff8011f8c2>]{thread_return+0}
       [<ffffffff8010de20>]{default_idle+0}
[<ffffffff8010de20>]{default_idle+0}                                 
                                              
       [<ffffffff8010dee0>]{cpu_idle+96}
                                                                     
          
Code: 0f b6 43 10 fe c0 74 f6 c6 43 10 ff 0f b6 43 11 fe c0 75 0e
                                                                     
          
console shuts up ...
 ERROR: /bin/insmod exited abnormally!

I believe non-smp kernels boot fine, the smp's have the problem.  I've
also tried using the parameter nmi_watchdog=1 alone and it boots -- go
figure.  It's a dev machine so I can provide testing if it helps.

Thanks,

Marc Mondragon
Comment 10 Philip Pokorny 2004-06-25 21:23:27 EDT
LSI has diagnosed this hang as being caused by "phantom" commands
being written to the RAID controller.  They believe this is a kernel
issue.

Red Hat has assigned this an enterprise issue tracker entry with ID
40242.  The ID was created sometime before 6/7/2004, but it seems to
be visible only inside Red Hat.
Comment 15 Jim Paradis 2004-12-13 10:26:04 EST
Tymm,

I'm revisiting some outstanding issues.  Has there been any change on this
issue, or any change in behavior with more recent update kernels?
Comment 16 Tymm Twillman 2004-12-13 14:10:50 EST
Recent (RHEL3U3) kernels do seem to work for the most part.  I believe we've
been seeing some issues with 16G machines however (12G mostly seem to be ok, I
think).

I have moved into another area of Penguin and I'm not working with the RAID
issues directly any more, however I will ask current engineers to update the bug.
Comment 17 glshank 2005-01-04 13:38:32 EST
when?
Comment 18 Jim Paradis 2005-09-07 16:17:50 EDT
Changing bug state to NEEDINFO_REPORTER.

Please let us know whether or not this problem manifests in the latest RHEL3
update.  Also, please add the current responsible individuals at Penguin to the
cc list for this bug report.
Comment 19 murray lotnicz 2006-03-14 18:42:14 EST
i'm running into the same problem with 2.4.21-37.0.1.ELsmp and 2.4.21-37.ELsmp.
the uniprocessor kernels are booting ok, i attached a dump from boot along with
output from `lspci` and `lspci -nvv`.

hardware:

tyan s2882 mobo                         bios V3.05
lsi logic megaraid 320-2                FW_1L37
2 dual-core opteron 275
16gb ram
dual broadcom gigabit nics
intel e100 nic  
Comment 20 murray lotnicz 2006-03-14 18:45:42 EST
Created attachment 126130 [details]
megaraid2 failure on boot
Comment 21 murray lotnicz 2006-03-14 18:46:55 EST
Created attachment 126131 [details]
output from lspci & lspci -nvv
Comment 22 murray lotnicz 2006-03-18 15:55:09 EST
i'm seeing the same behavior as Tymm mentioned in his last post. removing 4gb of
ram enables the machine to boot to SMP kernels and it's been stable with
bonnie++ and sysbench benchmarking. it's running rhws3_x86_64 update 6. is there
any ETA on when 16gb will be supported?
Comment 25 Peter Martuccelli 2006-12-19 14:21:11 EST
Is this still a problem on RHEL3 U8, 2.4.21-47.EL, on ia64 systems running with
> 4GB of memory?
Comment 26 Philip Pokorny 2006-12-19 14:48:32 EST
ia64 would imply Itanium.  This was an x86_64 (AMD Opteron, now also Intel
EM64T) system.

We stopped shipping LSI MegaRAID controllers due to these problems, but we're
about to test them again.  So hopefully we can get you an answer soon.

See also 64-bit DMA bug against RHEL4 (bug 194533).  Perhaps that was related to
this problem?
Comment 28 RHEL Product and Program Management 2007-01-18 10:42:08 EST
Product Management has reviewed and declined this request.  You may appeal this
decision by reopening this request. 
