Bug 79924 - Kernel BUG at page_alloc.c:220!
Summary: Kernel BUG at page_alloc.c:220!
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.2
Hardware: i686
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Arjan van de Ven
QA Contact: Brian Brock
URL:
Whiteboard:
Duplicates: 80023
Depends On:
Blocks:
 
Reported: 2002-12-18 02:39 UTC by Paul Zimdars
Modified: 2005-10-31 22:00 UTC
CC List: 1 user

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-09-30 15:40:18 UTC
Embargoed:



Description Paul Zimdars 2002-12-18 02:39:29 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20020823 Netscape/7.0

Description of problem:
We have a 64-node cluster. We run a scientific job that depends heavily on
memory and CPU.

Here is the uname output from a node:

Linux mach-0-0 2.4.18-17.7.xsmp #6 Tue Dec 17 16:41:44 PST 2002 i686 unknown

The error below can be triggered by any process (bash, sh, kswapd, etc.).
I also turned off SMP and ran the test without a single crash. When I
turned SMP back on, the nodes started to die again. We lose 5-10 of the 64
nodes on each run, usually within the first 10-15 minutes.


Nov 22 18:51:59 mach-0-35 kernel: kernel BUG at page_alloc.c:220!
Nov 22 18:51:59 mach-0-35 kernel: invalid operand: 0000
Nov 22 18:51:59 mach-0-35 kernel: CPU:    0
Nov 22 18:51:59 mach-0-35 kernel: EIP:    0010:[rmqueue+525/592]    Not tainted
Nov 22 18:51:59 mach-0-35 kernel: EIP:    0010:[<c0132c6d>]    Not tainted
Nov 22 18:51:59 mach-0-35 kernel: EFLAGS: 00010202
Nov 22 18:51:59 mach-0-35 kernel: eax: 00000040   ebx: c23bc8f0   ecx: 00038000   edx: 0006942f
Nov 22 18:51:59 mach-0-35 kernel: esi: c028b128   edi: 00048000   ebp: c1000020   esp: efe31dcc
Nov 22 18:51:59 mach-0-35 kernel: ds: 0018   es: 0018   ss: 0018
Nov 22 18:51:59 mach-0-35 kernel: Process mlsl2 (pid: 1928, stackpage=efe31000)
Nov 22 18:51:59 mach-0-35 kernel: Stack: 00038000 0003142f 00000296 00000000 c028b128 c028b200 000001ff 00000000
Nov 22 18:51:59 mach-0-35 kernel:        00000025 c0132f01 c028b128 c028b1fc 000001d2 00000018 00104025 00000000
Nov 22 18:51:59 mach-0-35 kernel:        00000001 00000025 c0127ded 69430025 00000000 f69451c0 f61bec60 efef2118
Nov 22 18:51:59 mach-0-35 kernel: Call Trace:    [__alloc_pages+81/384] [do_anonymous_page+93/368] [do_no_page+71/576] [it_real_fn+16/80] [handle_mm_fault+154/288]
Nov 22 18:51:59 mach-0-35 kernel: Call Trace:    [<c0132f01>] [<c0127ded>] [<c0127f47>] [<c011c5e0>] [<c01281da>]
Nov 22 18:51:59 mach-0-35 kernel:   [<c011d57b>] [<c011d431>] [<c012900a>] [<c010a64d>] [<c011472a>] [<c012939b>]
Nov 22 18:51:59 mach-0-35 kernel:   [<c01293ab>] [<c010ea9e>] [<c0114570>] [<c0108bfc>]
Nov 22 18:51:59 mach-0-35 kernel: Code: 0f 0b dc 00 81 4b 25 c0 8b 43 18 a9 80 00 00 00 74 08 0f 0b
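The first bytes of the Code: line can be cross-checked against the "kernel BUG at page_alloc.c:220!" message. The sketch below assumes the i386 2.4 BUG() layout in which ud2 (0f 0b) is followed by the source line number as a little-endian 16-bit value and the address of the file-name string as a little-endian 32-bit value. Under that assumption "dc 00" decodes to 0x00dc = 220, matching the message. The function name decode_bug_code is hypothetical. The layout is not universal: in the 2.4.7 oops decoded with ksymoops in Comment 9, ud2a is followed directly by ordinary instructions (pop %ecx), so this decoder would return a meaningless line number there.

```python
import struct


def decode_bug_code(code_line: str) -> tuple[int, int]:
    """Decode the leading BUG() bytes of an oops 'Code:' line.

    Assumption (2.4 i386 kernels that embed the location in BUG()):
    ud2 (0f 0b), then __LINE__ as a little-endian u16, then the address
    of the __FILE__ string as a little-endian u32.
    """
    raw = bytes(int(tok, 16) for tok in code_line.split())
    if raw[:2] != b"\x0f\x0b":
        raise ValueError("Code: bytes do not start with ud2 (0f 0b)")
    # '<HI' = little-endian, no padding: 2-byte line number, 4-byte pointer.
    line, file_ptr = struct.unpack_from("<HI", raw, 2)
    return line, file_ptr


code = "0f 0b dc 00 81 4b 25 c0 8b 43 18 a9 80 00 00 00 74 08 0f 0b"
line, file_ptr = decode_bug_code(code)
print(f"line {line}, file string at {file_ptr:#010x}")
```

The bytes after the first eight (8b 43 18 a9 80 00 00 00 74 08 0f 0b) appear to be ordinary code (mov 0x18(%ebx),%eax; test $0x80,%eax; je; ud2), presumably the next BUG() check, so the sketch only interprets the leading ud2 and the six bytes after it.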


Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. We have a program that processes satellite data using PVM.
2. We tried it with and without PVM, with the same results.


    

Actual Results:  5-10 nodes die on each run.

Expected Results:  No crash.

Additional info:

64-node cluster configuration. The drives are IDE; we used Red Hat 7.2, ext3,
2 GB of RAM and 4 GB of swap.

Comment 1 Arjan van de Ven 2002-12-18 10:14:41 UTC
First of all, this trace looks like it may be from a modified kernel.
Can you attach the dmesg, lsmod and lspci output from such a system before it oopses?

Comment 2 Paul Zimdars 2002-12-18 12:12:40 UTC
Hi,

Well, I have used my own modified 2.4.19 kernel, the 2.4.18 xsmp kernel, and a
modified 2.4.18 Red Hat source (I removed almost everything and SMP for a
test), but still had the same crashes. The only time none of the nodes
crashed was when I disabled SMP. I can provide more oopses; another one, from a
different node, has been placed at the end.

Here is the output of lsmod, lspci and dmesg:
[root@mach-0-35 root]# lsmod
Module                  Size  Used by    Not tainted
[root@mach-0-35 root]#

[root@mach-0-35 root]# lspci
00:00.0 Host bridge: ServerWorks: Unknown device 0012 (rev 13)
00:00.1 Host bridge: ServerWorks: Unknown device 0012
00:00.2 Host bridge: ServerWorks: Unknown device 0000
00:02.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
00:04.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 0d)
00:0f.0 ISA bridge: ServerWorks CSB5 South Bridge (rev 93)
00:0f.1 IDE interface: ServerWorks CSB5 IDE Controller (rev 93)
00:0f.3 Host bridge: ServerWorks: Unknown device 0225
00:10.0 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
00:10.2 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
00:11.0 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
00:11.2 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
01:03.0 Ethernet controller: BROADCOM Corporation NetXtreme BCM5701 Gigabit Ethernet (rev 15)

Linux version 2.4.18-17.7.x (root@mach-0-0) (gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-98)) #6 Tue Dec 17 16:41:44 PST 2002
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
 BIOS-e820: 000000000009f400 - 000000000009f800 (reserved)
 BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 0000000080000000 (usable)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
1152MB HIGHMEM available.
896MB LOWMEM available.
On node 0 totalpages: 524288
zone(0): 4096 pages.
zone(1): 225280 pages.
zone(2): 294912 pages.
Kernel command line: auto BOOT_IMAGE=bzImage ro root=303 BOOT_FILE=/boot/bzImage console=ttyS0
Initializing CPU#0
Detected 2199.941 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 4364.15 BogoMIPS
Memory: 2061592k/2097152k available (1205k kernel code, 30948k reserved, 337k data, 236k init, 1179648k highmem)
Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
Inode cache hash table entries: 131072 (order: 8, 1048576 bytes)
Mount cache hash table entries: 32768 (order: 6, 262144 bytes)
ramfs: mounted with options: <defaults>
ramfs: max_pages=258227 max_file_pages=0 max_inodes=0 max_dentries=258227
Buffer cache hash table entries: 131072 (order: 7, 524288 bytes)
Page-cache hash table entries: 524288 (order: 9, 2097152 bytes)
CPU: Before vendor init, caps: 3febfbff 00000000 00000000, vendor = 0
CPU: L1 I cache: 0K, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: After vendor init, caps: 3febfbff 00000000 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU:     After generic, caps: 3febfbff 00000000 00000000 00000000
CPU:             Common caps: 3febfbff 00000000 00000000 00000000
CPU: Intel(R) XEON(TM) CPU 2.20GHz stepping 04
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
mtrr: v1.40 (20010327) Richard Gooch (rgooch.au)
mtrr: detected mtrr type: Intel
PCI: PCI BIOS revision 2.10 entry at 0xfdba1, last bus=4
PCI: Using configuration type 1
PCI: Probing PCI hardware
PCI: Discovered primary peer bus 01 [IRQ]
PCI: Discovered primary peer bus 02 [IRQ]
PCI: Discovered primary peer bus 03 [IRQ]
PCI: Discovered primary peer bus 04 [IRQ]
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
cpufreq: Intel(R) SpeedStep(TM) support $Revision: 1.34 $
cpufreq: Intel(R) SpeedStep(TM) for this chipset not (yet) available.
cpufreq: CPU#0 P4/Xeon(TM) CPU On-Demand Clock Modulation available
CPU clock: 2199.941 MHz (219.994-2199.941 MHz)
Starting kswapd
allocated 64 pages and 64 bhs reserved for the highmem bounces
Journalled Block Device driver loaded
Installing knfsd (copyright (C) 1996 okir.de).
pty: 2048 Unix98 ptys configured
Serial driver version 5.05c (2001-07-08) with MANY_PORTS SHARE_IRQ SERIAL_PCI enabled
ttyS0 at 0x03f8 (irq = 4) is a 16550A
ttyS1 at 0x02f8 (irq = 3) is a 16550A
Real Time Clock Driver v1.10e
oprofile: can't get RTC I/O Ports
block: 1024 slots per queue, batch=256
Uniform Multi-Platform E-IDE driver Revision: 6.31
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
SvrWks CSB5: IDE controller on PCI bus 00 dev 79
SvrWks CSB5: chipset revision 147
SvrWks CSB5: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:pio
 ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:pio, hdd:pio
hda: ST340016A, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hda: 78165360 sectors (40021 MB) w/2048KiB Cache, CHS=77545/16/63, UDMA(100)
Partition check:
 hda: hda1 hda2 hda3 hda4 < hda5 hda6 hda7 >
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
eepro100.c:v1.09j-t 9/29/99 Donald Becker http://www.scyld.com/network/eepro100.html
eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin <saw.com.sg> and others
eth0: OEM i82557/i82558 10/100 Ethernet, 00:30:48:51:7E:7E, IRQ 10.
  Board assembly 000000-000, Physical connectors present: RJ45
  Primary interface chip i82555 PHY #1.
  General self-test: passed.
  Serial sub-system self-test: passed.
  Internal registers self-test: passed.
  ROM checksum self-test: passed (0xb874c1d3).
tg3.c:v1.1 (Aug 30, 2002)
eth1: Tigon3 [partno(BCM95700A6) rev 0105 PHY(5701)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:30:48:51:7c:8d
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 16384 buckets, 128Kbytes
TCP: Hash tables configured (established 262144 bind 65536)
Linux IP multicast router 0.06 plus PIM-SM
ip_conntrack (8192 buckets, 65536 max)
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 236k freed
Adding Swap: 2048216k swap-space (priority -1)
Adding Swap: 2048248k swap-space (priority -2)
EXT3 FS 2.4-0.9.18, 14 May 2002 on ide0(3,3), internal journal
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.18, 14 May 2002 on ide0(3,1), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.18, 14 May 2002 on ide0(3,5), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
tg3: eth1: Link is up at 1000 Mbps, full duplex.
tg3: eth1: Flow control is off for TX and off for RX.
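The zone lines in the dmesg above can be checked against the LOWMEM/HIGHMEM summary. In a 2.4 i386 kernel, zone(0), zone(1) and zone(2) are the DMA, Normal and HighMem zones, and each page is 4 KiB. The short sketch below (variable names are illustrative) converts the page counts: DMA plus Normal gives the reported 896MB LOWMEM, HighMem gives 1152MB (the "1179648k highmem" figure), and the 524288 pages total 2 GB, so the kernel sees all of the installed RAM.

```python
PAGE_KIB = 4  # i386 page size: 4 KiB

# Page counts copied from the "zone(N): ... pages." lines of the dmesg.
zone_pages = {"DMA": 4096, "Normal": 225280, "HighMem": 294912}

zone_mib = {name: pages * PAGE_KIB // 1024 for name, pages in zone_pages.items()}
lowmem_mib = zone_mib["DMA"] + zone_mib["Normal"]  # directly mapped by the kernel
highmem_kib = zone_pages["HighMem"] * PAGE_KIB     # compare with "1179648k highmem"
total_pages = sum(zone_pages.values())             # compare with "totalpages: 524288"

print(zone_mib, lowmem_mib, highmem_kib, total_pages)
```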

Nov 20 18:57:06 mach-0-51 kernel: kernel BUG at page_alloc.c:220!
Nov 20 18:57:06 mach-0-51 kernel: invalid operand: 0000
Nov 20 18:57:06 mach-0-51 kernel: CPU:    0
Nov 20 18:57:06 mach-0-51 kernel: EIP:    0010:[<c0130cdd>]    Not tainted
Nov 20 18:57:06 mach-0-51 kernel: EFLAGS: 00010202
Nov 20 18:57:06 mach-0-51 kernel: eax: 00000040   ebx: c1b39ea0   ecx: 00038000   edx: 0003bdf8
Nov 20 18:57:06 mach-0-51 kernel: esi: c02a9a88   edi: 00048000   ebp: c1000020   esp: f55f3e14
Nov 20 18:57:06 mach-0-51 kernel: ds: 0018   es: 0018   ss: 0018
Nov 20 18:57:06 mach-0-51 kernel: Process mlsl2 (pid: 1415, stackpage=f55f3000)
Nov 20 18:57:06 mach-0-51 kernel: Stack: 00038000 00003df8 00000286 00000000 c02a9a88 c02a9b60 000001ff 00000000
Nov 20 18:57:06 mach-0-51 kernel:        00181002 c0130f71 c02a9a88 c02a9b5c 000001d2 00002945 00000000 00000000
Nov 20 18:57:06 mach-0-51 kernel:        0c1ab98c 00181002 c0131674 00000002 00000000 00000008 f60789ac c01260dd
Nov 20 18:57:06 mach-0-51 kernel: Call Trace:    [<c0130f71>] [<c0131674>] [<c01260dd>] [<c0126121>] [<c0126592>]
Nov 20 18:57:06 mach-0-51 kernel:   [<c01086ad>] [<c0113502>] [<c011f3c6>] [<c011f619>] [<c011c10b>] [<c011bfc1>]
Nov 20 18:57:06 mach-0-51 kernel:   [<c011bd4b>] [<c0113350>] [<c0108bfc>]
Nov 20 18:57:06 mach-0-51 kernel: Code: 0f 0b dc 00 56 aa 26 c0 8b 43 18 a9 80 00 00 00 74 08 0f 0b

Below is one from another node:

Nov 19 18:02:48 mach-0-55 kernel: kernel BUG at page_alloc.c:220!
Nov 19 18:02:48 mach-0-55 kernel: invalid operand: 0000
Nov 19 18:02:48 mach-0-55 kernel: CPU:    0
Nov 19 18:02:48 mach-0-55 kernel: EIP:    0010:[<c0130cdd>]    Not tainted
Nov 19 18:02:48 mach-0-55 kernel: EFLAGS: 00010202
Nov 19 18:02:48 mach-0-55 kernel: eax: 00000040   ebx: c122dea0   ecx: 00001000   edx: 0000b9f8
Nov 19 18:02:48 mach-0-55 kernel: esi: c02a99d4   edi: 00037000   ebp: c1000020   esp: f60dfe24
Nov 19 18:02:48 mach-0-55 kernel: ds: 0018   es: 0018   ss: 0018
Nov 19 18:02:48 mach-0-55 kernel: Process mlsl2 (pid: 1573, stackpage=f60df000)
Nov 19 18:02:48 mach-0-55 kernel: Stack: 00001000 0000a9f8 00000286 00000000 c02a99d4 c02a9b64 000003fd 00000000
Nov 19 18:02:48 mach-0-55 kernel:        f6ae4180 c0130f71 c02a9a88 c02a9b5c 000001d2 00000018 00000001 f67e4a80
Nov 19 18:02:48 mach-0-55 kernel:        00104025 f6ae4180 c012623b f67e4a80 63052000 f67e4a80 f6ae4180 c0126344
Nov 19 18:02:48 mach-0-55 kernel: Call Trace:    [<c0130f71>] [<c012623b>] [<c0126344>] [<c0126582>] [<c0126fe9>]
Nov 19 18:02:48 mach-0-55 kernel:   [<c0113502>] [<c010ea2e>] [<c011bd4b>] [<c0113350>] [<c0108bfc>]
Nov 19 18:02:48 mach-0-55 kernel:
Nov 19 18:02:48 mach-0-55 kernel: Code: 0f 0b dc 00 56 aa 26 c0 8b 43 18 a9 80 00 00 00 74 08 0f 0b

Comment 3 Arjan van de Ven 2002-12-18 12:17:47 UTC
Well, in your modified kernel you have disabled the config option that we enable
to get usable backtraces... which makes it hard to investigate, you know.

Comment 4 Paul Zimdars 2002-12-18 18:44:25 UTC
Hi,

Here is some more information; do any of these help? If not, I will put the
Red Hat XSMP kernel back and collect more info, or I can enable all the kernel
debugging options.

Dec  5 01:06:01 mach-0-7 kernel: kernel BUG at page_alloc.c:220!
Dec  5 01:06:01 mach-0-7 kernel: invalid operand: 0000
Dec  5 01:06:01 mach-0-7 kernel: CPU:    0
Dec  5 01:06:01 mach-0-7 kernel: EIP:    0010:[rmqueue+525/592]    Not tainted
Dec  5 01:06:01 mach-0-7 kernel: EIP:    0010:[<c0132c6d>]    Not tainted
Dec  5 01:06:01 mach-0-7 kernel: EFLAGS: 00010202
Dec  5 01:06:01 mach-0-7 kernel: eax: 00000040   ebx: c2020d70   ecx: 00038000   edx: 00056047
Dec  5 01:06:01 mach-0-7 kernel: esi: c028b128   edi: 00048000   ebp: c1000020   esp: ef16fdcc
Dec  5 01:06:01 mach-0-7 kernel: ds: 0018   es: 0018   ss: 0018
Dec  5 01:06:01 mach-0-7 kernel: Process mlsl2 (pid: 2775, stackpage=ef16f000)
Dec  5 01:06:01 mach-0-7 kernel: Stack: 00038000 0001e047 00000296 00000000 c028b128 c028b200 000001ff 00000000
Dec  5 01:06:01 mach-0-7 kernel:        00000025 c0132f01 c028b128 c028b1fc 000001d2 00000044 00104025 00000000
Dec  5 01:06:01 mach-0-7 kernel:        00000001 00000025 c0127ded 66e48025 00000000 f6997400 f6a48920 ef25c0d8
Dec  5 01:06:01 mach-0-7 kernel: Call Trace:    [__alloc_pages+81/384] [do_anonymous_page+93/368] [do_no_page+71/576] [handle_mm_fault+154/288] [ip_frag_create+16/192]
Dec  5 01:06:01 mach-0-7 kernel: Call Trace:    [<c0132f01>] [<c0127ded>] [<c0127f47>] [<c01281da>] [<c0206860>]
Dec  5 01:06:01 mach-0-7 kernel:   [ip_frag_queue+516/864] [ip_frag_queue+128/864] [eth_type_trans+115/192] [do_page_fault+442/1401] [ip_frag_queue+128/864] [set_rx_mode+369/1504]
Dec  5 01:06:01 mach-0-7 kernel:   [<c0206b14>] [<c0206990>] [<c01fec83>] [<c011472a>] [<c0206990>] [<c01c4b41>]
Dec  5 01:06:01 mach-0-7 kernel:   [update_wall_time+38/80] [timer_bh+73/976] [update_process_times+48/160] [do_page_fault+0/1401] [error_code+52/60]
Dec  5 01:06:01 mach-0-7 kernel:   [<c0120836>] [<c0120a89>] [<c0120980>] [<c0114570>] [<c0108bfc>]
Dec  5 01:06:01 mach-0-7 kernel:
Dec  5 01:06:01 mach-0-7 kernel: Code: 0f 0b dc 00 81 4b 25 c0 8b 43 18 a9 80 00 00 00 74 08 0f 0b
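The Red Hat kernel prints each frame twice, once as symbol+offset/size and once as a raw address, so the two can be cross-checked. A minimal sketch, assuming the offset and size are decimal (ksymoops, used in Comment 9, prints them in hex instead, e.g. rmqueue+7a/300): subtracting the offset from the raw address gives the function's start, and adding the size gives its end. For the memory-management frames of the mach-0-7 trace, each function ends exactly where the next begins, which supports the decimal reading; reading the same numbers as hex leaves the ranges overlapping. The helper names symbol_range and contiguous are illustrative.

```python
def symbol_range(addr: int, offset: int, size: int) -> tuple[int, int]:
    """Start and end of a function, given a raw EIP/return address and the
    'symbol+offset/size' annotation printed beside it (assumed decimal)."""
    start = addr - offset
    return start, start + size


def contiguous(ranges: list[tuple[int, int]]) -> bool:
    """True if, after sorting by start, each range ends where the next begins."""
    ordered = sorted(ranges)
    return all(end == nxt for (_, end), (nxt, _) in zip(ordered, ordered[1:]))


# (raw address, offset, size) taken from the Dec 5 mach-0-7 call trace.
frames = {
    "do_anonymous_page": (0xc0127ded, 93, 368),
    "do_no_page": (0xc0127f47, 71, 576),
    "handle_mm_fault": (0xc01281da, 154, 288),
}
ranges = {name: symbol_range(*frame) for name, frame in frames.items()}
for name, (start, end) in ranges.items():
    print(f"{name:18} {start:#010x}-{end:#010x}")
print("contiguous:", contiguous(list(ranges.values())))
```

The same rule places rmqueue (EIP c0132c6d, rmqueue+525/592) at 0xc0132a60-0xc0132cb0.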


Nov 26 17:53:00 mach-0-39 kernel: kernel BUG at page_alloc.c:220!
Nov 26 17:53:00 mach-0-39 kernel: invalid operand: 0000
Nov 26 17:53:00 mach-0-39 kernel: CPU:    0
Nov 26 17:53:00 mach-0-39 kernel: EIP:    0010:[rmqueue+525/592]    Not tainted
Nov 26 17:53:00 mach-0-39 kernel: EIP:    0010:[<c0132c6d>]    Not tainted
Nov 26 17:53:00 mach-0-39 kernel: EFLAGS: 00010202
Nov 26 17:53:00 mach-0-39 kernel: eax: 00000040   ebx: c25e5cd0   ecx: 00038000   edx: 00074c99
Nov 26 17:53:00 mach-0-39 kernel: esi: c028b128   edi: 00048000   ebp: c1000020   esp: e0507dbc
Nov 26 17:53:00 mach-0-39 kernel: ds: 0018   es: 0018   ss: 0018
Nov 26 17:53:00 mach-0-39 kernel: Process mlsl2 (pid: 1441, stackpage=e0507000)
Nov 26 17:53:00 mach-0-39 kernel: Stack: 00038000 0003cc99 00000292 00000000 c028b128 c028b200 000001ff 00000000
Nov 26 17:53:00 mach-0-39 kernel:        0005ca02 c0132f01 c028b128 c028b1fc 000001d2 0005c902 00000000 00000000
Nov 26 17:53:00 mach-0-39 kernel:        0c0854f6 0005ca02 c0133604 00000002 00000002 00000008 00000000 c0127bfd
Nov 26 17:53:00 mach-0-39 kernel: Call Trace:    [__alloc_pages+81/384] [read_swap_cache_async+116/158] [swapin_readahead+77/80] [do_swap_page+70/400] [handle_mm_fault+180/288]
Nov 26 17:53:00 mach-0-39 kernel: Call Trace:    [<c0132f01>] [<c0133604>] [<c0127bfd>] [<c0127c46>] [<c01281f4>]
Nov 26 17:53:00 mach-0-39 kernel:   [sys_getsockname+44/128] [sys_sendto+166/240] [ip_route_output_slow+740/1648] [ip_route_output_slow+304/1648] [neigh_proxy_process+243/288] [do_page_fault+442/1401]
Nov 26 17:53:00 mach-0-39 kernel:   [<c01f483c>] [<c01f49b6>] [<c0206b44>] [<c0206990>] [<c01fec83>] [<c011472a>]
Nov 26 17:53:00 mach-0-39 kernel:   [process_timeout+0/96] [update_wall_time+38/80] [timer_bh+73/976] [update_process_times+48/160] [smp_apic_timer_interrupt+239/288] [do_page_fault+0/1401]
Nov 26 17:53:00 mach-0-39 kernel:   [<c01151f0>] [<c0120836>] [<c0120a89>] [<c0120980>] [<c0112b4f>] [<c0114570>]
Nov 26 17:53:00 mach-0-39 kernel:   [error_code+52/60]
Nov 26 17:53:00 mach-0-39 kernel:   [<c0108bfc>]
Nov 26 17:53:00 mach-0-39 kernel:
Nov 26 17:53:00 mach-0-39 kernel: Code: 0f 0b dc 00 81 4b 25 c0 8b 43 18 a9 80 00 00 00 74 08 0f 0b


Nov 27 12:53:48 mach-0-39 kernel: CPU:    0
Nov 27 12:53:48 mach-0-39 kernel: EIP:    0010:[rmqueue+525/592]    Not tainted
Nov 27 12:53:48 mach-0-39 kernel: EIP:    0010:[<c0132c6d>]    Not tainted
Nov 27 12:53:48 mach-0-39 kernel: EFLAGS: 00010202
Nov 27 12:53:48 mach-0-39 kernel: eax: 00000040   ebx: c1fc8a80   ecx: 00038000   edx: 000542e2
Nov 27 12:53:48 mach-0-39 kernel: esi: c028b128   edi: 00048000   ebp: c1000020   esp: ef9e9dcc
Nov 27 12:53:48 mach-0-39 kernel: ds: 0018   es: 0018   ss: 0018
Nov 27 12:53:48 mach-0-39 kernel: Process mlsl2 (pid: 2040, stackpage=ef9e9000)
Nov 27 12:53:48 mach-0-39 kernel: Stack: 00038000 0001c2e2 00000296 00000000 c028b128 c028b200 000001ff 00000000
Nov 27 12:53:48 mach-0-39 kernel:        00000025 c0132f01 c028b128 c028b1fc 000001d2 00000018 00104025 00000000
Nov 27 12:53:48 mach-0-39 kernel:        00000001 00000025 c0127ded 442b7025 00000000 f65a5f20 f6421b60 c4fc7848
Nov 27 12:53:48 mach-0-39 kernel: Call Trace:    [__alloc_pages+81/384] [do_anonymous_page+93/368] [do_no_page+71/576] [handle_mm_fault+154/288] [ip_route_output_slow+0/1648]
Nov 27 12:53:48 mach-0-39 kernel: Call Trace:    [<c0132f01>] [<c0127ded>] [<c0127f47>] [<c01281da>] [<c0206860>]
Nov 27 12:53:48 mach-0-39 kernel:   [ip_route_output_slow+692/1648] [ip_route_output_slow+304/1648] [neigh_proxy_process+243/288] [do_page_fault+442/1401] [update_process_times+48/160] [sys_brk+202/240]
Nov 27 12:53:48 mach-0-39 kernel:   [<c0206b14>] [<c0206990>] [<c01fec83>] [<c011472a>] [<c0120980>] [<c012874a>]
Nov 27 12:53:48 mach-0-39 kernel:   [do_page_fault+0/1401] [error_code+52/60]
Nov 27 12:53:48 mach-0-39 kernel:   [<c0114570>] [<c0108bfc>]
Nov 27 12:53:48 mach-0-39 kernel:
Nov 27 12:53:48 mach-0-39 kernel: Code: 0f 0b dc 00 81 4b 25 c0 8b 43 18 a9 80 00 00 00 74 08 0f 0b
Nov 27 12:53:49 mach-0-39 kernel:  kernel BUG at page_alloc.c:220!
Nov 27 12:53:49 mach-0-39 kernel: invalid operand: 0000
Nov 27 12:53:49 mach-0-39 kernel: CPU:    0
Nov 27 12:53:49 mach-0-39 kernel: EIP:    0010:[rmqueue+525/592]    Not tainted
Nov 27 12:53:49 mach-0-39 kernel: EIP:    0010:[<c0132c6d>]    Not tainted
Nov 27 12:53:49 mach-0-39 kernel: EFLAGS: 00010202
Nov 27 12:53:49 mach-0-39 kernel: eax: 00000040   ebx: c1ccbfc0   ecx: 00038000   edx: 000443fe
Nov 27 12:53:49 mach-0-39 kernel: esi: c028b128   edi: 00048000   ebp: c1000020   esp: ef773dd0
Nov 27 12:53:49 mach-0-39 kernel: ds: 0018   es: 0018   ss: 0018
Nov 27 12:53:49 mach-0-39 kernel: Process pvmd3 (pid: 1993, stackpage=ef773000)
Nov 27 12:53:49 mach-0-39 kernel: Stack: 00038000 0000c3fe 00000282 00000000 c028b128 c028b200 000001ff 00000000
Nov 27 12:53:49 mach-0-39 kernel:        0016f502 c0132f01 c028b128 c028b1fc 000001d2 0016f502 00000000 00000000
Nov 27 12:53:49 mach-0-39 kernel:        0c197ff6 0016f502 c0133604 0016f502 00000000 080876c4 00000000 c0127c4c
Nov 27 12:53:49 mach-0-39 kernel: Call Trace:    [__alloc_pages+81/384] [read_swap_cache_async+116/158] [do_swap_page+76/400] [handle_mm_fault+180/288] [do_page_fault+442/1401]
Nov 27 12:53:49 mach-0-39 kernel: Call Trace:    [<c0132f01>] [<c0133604>] [<c0127c4c>] [<c01281f4>] [<c011472a>]
Nov 27 12:53:49 mach-0-39 kernel:   [copy_page_range+397/624] [build_mmap_rb+84/96] [do_fork+1746/2048] [sys_close+4/112] [do_page_fault+0/1401] [error_code+52/60]
Nov 27 12:53:49 mach-0-39 kernel:   [<c01267bd>] [<c0129994>] [<c0117e02>] [<c0139784>] [<c0114570>] [<c0108bfc>]
Nov 27 12:53:49 mach-0-39 kernel:
Nov 27 12:53:49 mach-0-39 kernel: Code: 0f 0b dc 00 81 4b 25 c0 8b 43 18 a9 80 00 00 00 74 08 0f 0b
Nov 27 12:53:49 mach-0-39 kernel:  kernel BUG at page_alloc.c:220!
Nov 27 12:53:49 mach-0-39 kernel: invalid operand: 0000
Nov 27 12:53:49 mach-0-39 kernel: CPU:    0
Nov 27 12:53:49 mach-0-39 kernel: EIP:    0010:[rmqueue+525/592]    Not tainted
Nov 27 12:53:49 mach-0-39 kernel: EIP:    0010:[<c0132c6d>]    Not tainted
Nov 27 12:53:49 mach-0-39 kernel: EFLAGS: 00010202
Nov 27 12:53:49 mach-0-39 kernel: eax: 00000040   ebx: c20c4c60   ecx: 00038000   edx: 000596ec
Nov 27 12:53:49 mach-0-39 kernel: esi: c028b128   edi: 00048000   ebp: c1000020   esp: e246ddcc
Nov 27 12:53:49 mach-0-39 kernel: ds: 0018   es: 0018   ss: 0018
Nov 27 12:53:49 mach-0-39 kernel: Process mlsl2 (pid: 2047, stackpage=e246d000)
Nov 27 12:53:49 mach-0-39 kernel: Stack: 00038000 000216ec 00000296 00000000 c028b128 c028b200 000001ff 00000000
Nov 27 12:53:49 mach-0-39 kernel:        00000025 c0132f01 c028b128 c028b1fc 000001d2 0000055e 00104025 00000000
Nov 27 12:53:49 mach-0-39 kernel:        00000001 00000025 c0127ded 00000000 f65a5f20 f65a5f20 f6421aa0 ef7e8008
Nov 27 12:53:49 mach-0-39 kernel: Call Trace:    [__alloc_pages+81/384] [do_anonymous_page+93/368] [do_no_page+71/576] [handle_mm_fault+154/288] [svcauth_null+192/240]
Nov 27 12:53:49 mach-0-39 kernel: Call Trace:    [<c0132f01>] [<c0127ded>] [<c0127f47>] [<c01281da>] [<c0246120>]
Nov 27 12:53:49 mach-0-39 kernel:   [__vma_link+116/192] [do_page_fault+442/1401] [do_mmap_pgoff+1220/1392] [blk_ioctl+407/1184] [old_mmap+238/304] [do_page_fault+0/1401]
Nov 27 12:53:49 mach-0-39 kernel:   [<c0128854>] [<c011472a>] [<c0128e74>] [<c01bb4d7>] [<c010ea9e>] [<c0114570>]
Nov 27 12:53:49 mach-0-39 kernel:   [error_code+52/60]
Nov 27 12:53:49 mach-0-39 kernel:   [<c0108bfc>]
Nov 27 12:53:49 mach-0-39 kernel:
Nov 27 12:53:49 mach-0-39 kernel: Code: 0f 0b dc 00 81 4b 25 c0 8b 43 18 a9 80 00 00 00 74 08 0f 0b

Nov 27 13:00:00 mach-0-39 kernel:  kernel BUG at page_alloc.c:220!
Nov 27 13:00:00 mach-0-39 kernel: invalid operand: 0000
Nov 27 13:00:00 mach-0-39 kernel: CPU:    0
Nov 27 13:00:00 mach-0-39 kernel: EIP:    0010:[rmqueue+525/592]    Not tainted
Nov 27 13:00:00 mach-0-39 kernel: EIP:    0010:[<c0132c6d>]    Not tainted
Nov 27 13:00:00 mach-0-39 kernel: EFLAGS: 00010202
Nov 27 13:00:00 mach-0-39 kernel: eax: 00000040   ebx: c2210cd0   ecx: 00038000   edx: 00060599
Nov 27 13:00:00 mach-0-39 kernel: esi: c028b128   edi: 00048000   ebp: c1000020   esp: e246ddcc
Nov 27 13:00:00 mach-0-39 kernel: ds: 0018   es: 0018   ss: 0018
Nov 27 13:00:00 mach-0-39 kernel: Process sh (pid: 2049, stackpage=e246d000)
Nov 27 13:00:00 mach-0-39 kernel: Stack: 00038000 00028599 00000296 00000000 c028b128 c028b200 000001ff 00000000
Nov 27 13:00:00 mach-0-39 kernel:        00000025 c0132f01 c028b128 c028b1fc 000001d2 00000132 00104025 00000000
Nov 27 13:00:00 mach-0-39 kernel:        00000001 00000025 c0127ded 00000000 f4ebe3c0 f4ebe3c0 f63ee920 f53f1af8
Nov 27 13:00:00 mach-0-39 kernel: Call Trace:    [__alloc_pages+81/384] [do_anonymous_page+93/368] [do_no_page+71/576] [handle_mm_fault+154/288] [sys_munmap+2/80]
Nov 27 13:00:00 mach-0-39 kernel: Call Trace:    [<c0132f01>] [<c0127ded>] [<c0127f47>] [<c01281da>] [<c01296c2>]
Nov 27 13:00:00 mach-0-39 kernel:   [__vma_link+116/192] [do_page_fault+442/1401] [do_mmap_pgoff+1220/1392] [zap_page_range+945/1056] [unmap_fixup+115/352] [sys_munmap+2/80]
Nov 27 13:00:00 mach-0-39 kernel:   [<c0128854>] [<c011472a>] [<c0128e74>] [<c0126c51>] [<c01292c3>] [<c01296c2>]
Nov 27 13:00:00 mach-0-39 kernel:   [sys_close+4/112] [sys_munmap+67/80] [do_page_fault+0/1401] [error_code+52/60]
Nov 27 13:00:00 mach-0-39 kernel:   [<c0139784>] [<c0129703>] [<c0114570>] [<c0108bfc>]
Nov 27 13:00:00 mach-0-39 kernel:
Nov 27 13:00:00 mach-0-39 kernel: Code: 0f 0b dc 00 81 4b 25 c0 8b 43 18 a9 80 00 00 00 74 08 0f 0b



Comment 5 Paul Zimdars 2002-12-19 00:17:07 UTC
Hi,

I was just wondering whether the output above helped at all. I went into the
kernel configuration and turned on all the debugging options under the kernel
debugging section. I enabled SMP and have now lost 4 separate nodes. I will
bring them back up and see what information they can provide.

Comment 6 Mike McLean 2003-01-02 17:28:00 UTC
This bug has been inappropriately marked MODIFIED. Please review the bug life
cycle information at 
http://bugzilla.redhat.com/bugzilla/bug_status.cgi


Comment 7 Paul Zimdars 2003-01-08 18:55:44 UTC
Hi,

I was wondering if anyone has been able to respond???

Thanks,

Pauld


Comment 8 Paul Zimdars 2003-01-17 08:34:57 UTC
Hi,

After looking through all the logs, I noticed that this message is common to
every machine:

..MP-BIOS bug: 8254 timer not connected to IO-APIC

[root@mach-0-30 log]# cat /proc/interrupts
           CPU0       CPU1
  0:     288851          0    IO-APIC-edge  timer
  1:          2          0    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  4:        207          0    IO-APIC-edge  serial
  8:          1          0    IO-APIC-edge  rtc
 14:       8073          0    IO-APIC-edge  ide0
 30:      53277          0   IO-APIC-level  eth0
 31:     161010          0   IO-APIC-level  eth1
NMI:          0          0
LOC:     288532     288541
ERR:          0
MIS:          0
[root@mach-0-30 log]#

[root@mach-0-30 log]# uname -a
Linux mach-0-30 2.4.19 #7 SMP Thu Dec 12 13:49:51 PST 2002 i686 unknown
[root@mach-0-30 log]#
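In the /proc/interrupts output above, every device interrupt, including the timer, has been delivered to CPU0, while the local APIC timer (LOC) ticks on both CPUs. To compare this across all 64 nodes, the per-CPU columns can be summed with a small script. A minimal sketch, assuming the 2.4 layout shown here (a header row of CPUn labels, one row per numbered IRQ, then summary rows such as NMI, LOC, ERR and MIS); the function name per_cpu_irq_totals is hypothetical.

```python
def per_cpu_irq_totals(text: str) -> list[int]:
    """Sum the per-CPU counts of the numbered IRQ rows in /proc/interrupts text."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = [0] * ncpu
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        if not label.strip().isdigit():
            continue  # skip NMI, LOC, ERR, MIS summary rows
        for cpu, count in enumerate(rest.split()[:ncpu]):
            totals[cpu] += int(count)
    return totals


# On a live node: per_cpu_irq_totals(open("/proc/interrupts").read())
sample = """\
           CPU0       CPU1
  0:     288851          0    IO-APIC-edge  timer
  1:          2          0    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  4:        207          0    IO-APIC-edge  serial
  8:          1          0    IO-APIC-edge  rtc
 14:       8073          0    IO-APIC-edge  ide0
 30:      53277          0   IO-APIC-level  eth0
 31:     161010          0   IO-APIC-level  eth1
NMI:          0          0
LOC:     288532     288541
ERR:          0
MIS:          0
"""
print(per_cpu_irq_totals(sample))
```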



Comment 9 Need Real Name 2003-04-22 00:04:35 UTC
This happened to me over the weekend.  Is there anything else I can provide to help?

[root@kmc2 log]# ksymoops -k ./ksyms.3 < ./oops.txt 
ksymoops 2.4.1 on i686 2.4.7-10enterprise.  Options used
     -V (default)
     -k ./ksyms.3 (specified)
     -l /proc/modules (default)
     -o /lib/modules/2.4.7-10enterprise/ (default)
     -m /boot/System.map-2.4.7-10enterprise (default)

Error (expand_objects): cannot stat(/lib/aic7xxx.o) for aic7xxx
ksymoops: No such file or directory
Error (expand_objects): cannot stat(/lib/sd_mod.o) for sd_mod
ksymoops: No such file or directory
Error (expand_objects): cannot stat(/lib/scsi_mod.o) for scsi_mod
ksymoops: No such file or directory
Warning (compare_ksyms_lsmod): module 3c59x is in lsmod but not in ksyms, probably no symbols exported
Warning (compare_ksyms_lsmod): module appletalk is in lsmod but not in ksyms, probably no symbols exported
Warning (compare_ksyms_lsmod): module eepro100 is in lsmod but not in ksyms, probably no symbols exported
Warning (compare_ksyms_lsmod): module ipx is in lsmod but not in ksyms, probably no symbols exported
Warning (compare_maps): mismatch on symbol partition_name  , ksyms_base says c01c09e0, System.map says c0160900.  Ignoring ksyms_base entry
Warning (compare_maps): mismatch on symbol sd  , sd_mod says f881cce4, /lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/sd_mod.o says f881cba0.  Ignoring /lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/sd_mod.o entry
Warning (compare_maps): mismatch on symbol proc_scsi  , scsi_mod says f8818088, /lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/scsi_mod.o says f8816910.  Ignoring /lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/scsi_mod.o entry
Warning (compare_maps): mismatch on symbol scsi_devicelist  , scsi_mod says f88180b4, /lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/scsi_mod.o says f881693c.  Ignoring /lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/scsi_mod.o entry
Warning (compare_maps): mismatch on symbol scsi_hostlist  , scsi_mod says f88180b0, /lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/scsi_mod.o says f8816938.  Ignoring /lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/scsi_mod.o entry
Warning (compare_maps): mismatch on symbol scsi_hosts  , scsi_mod says f88180b8, /lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/scsi_mod.o says f8816940.  Ignoring /lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/scsi_mod.o entry
Warning (compare_maps): mismatch on symbol scsi_logging_level  , scsi_mod says f8818084, /lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/scsi_mod.o says f881690c.  Ignoring /lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/scsi_mod.o entry
 kernel BUG at page_alloc.c:220!
invalid operand: 0000
CPU:    0
EIP:    0010:[<c013620a>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010086
eax: 00000020   ebx: c0263540   ecx: c02616dc   edx: 12f58a7a
esi: c0263540   edi: 00000000   ebp: 00000000   esp: e6101da8
ds: 0018   es: 0018   ss: 0018
Process sh (pid: 5906, stackpage=e6101000)
Stack: c02490a3 000000dc 00000000 00000283 c0263964 00000000 c0263540 c0263540
       c0263a24 00000000 000000d2 c01365c4 00000001 000000d2 dcbf5220 00000000
       c99bd464 c01367df 000000d2 00000000 c0263a20 d8f030c0 dcbf5220 00104000
Call Trace: [<c02490a3>] [<c01365c4>] [<c01367df>] [<c0129d75>] [<c012a973>]
   [<c01172c0>] [<c0117466>] [<c0125443>] [<c01172c0>] [<c0107268>]
Code: 0f 0b 59 8b 56 08 5b 89 d3 8b 53 04 8b 03 89 50 04 89 02 ff

>>EIP; c013620a <rmqueue+7a/300>   <=====
Trace; c02490a3 <call_spurious_interrupt+1eaca/24d47>
Trace; c01365c4 <_wrapped_alloc_pages+74/280>
Trace; c01367df <__alloc_pages+f/a0>
Trace; c0129d75 <do_wp_page+1b5/410>
Trace; c012a973 <handle_mm_fault+103/150>
Trace; c01172c0 <do_page_fault+0/540>
Trace; c0117466 <do_page_fault+1a6/540>
Trace; c0125443 <sys_rt_sigaction+93/f0>
Trace; c01172c0 <do_page_fault+0/540>
Trace; c0107268 <error_code+38/40>
Code;  c013620a <rmqueue+7a/300>
00000000 <_EIP>:
Code;  c013620a <rmqueue+7a/300>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c013620c <rmqueue+7c/300>
   2:   59                        pop    %ecx
Code;  c013620d <rmqueue+7d/300>
   3:   8b 56 08                  mov    0x8(%esi),%edx
Code;  c0136210 <rmqueue+80/300>
   6:   5b                        pop    %ebx
Code;  c0136211 <rmqueue+81/300>
   7:   89 d3                     mov    %edx,%ebx
Code;  c0136213 <rmqueue+83/300>
   9:   8b 53 04                  mov    0x4(%ebx),%edx
Code;  c0136216 <rmqueue+86/300>
   c:   8b 03                     mov    (%ebx),%eax
Code;  c0136218 <rmqueue+88/300>
   e:   89 50 04                  mov    %edx,0x4(%eax)
Code;  c013621b <rmqueue+8b/300>
  11:   89 02                     mov    %eax,(%edx)
Code;  c013621d <rmqueue+8d/300>
  13:   ff 00                     incl   (%eax)

 kernel BUG at page_alloc.c:220!
invalid operand: 0000
CPU:    0
EIP:    0010:[<c013620a>]
EFLAGS: 00010086
Warning (Oops_read): Code line not seen, dumping what data is available

>>EIP; c013620a <rmqueue+7a/300>   <=====


12 warnings and 3 errors issued.  Results may not be reliable.
[root@kmc2 log]#

Comment 10 Dave Jones 2003-12-17 02:28:41 UTC
*** Bug 80023 has been marked as a duplicate of this bug. ***

Comment 11 Bugzilla owner 2004-09-30 15:40:18 UTC
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/


