Bug 80023 - Kernel BUG at page_alloc.c:220!
Summary: Kernel BUG at page_alloc.c:220!
Keywords:
Status: CLOSED DUPLICATE of bug 79924
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.2
Hardware: i686
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Arjan van de Ven
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2002-12-18 21:46 UTC by Paul Zimdars
Modified: 2006-02-21 18:50 UTC (History)
CC List: 0 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-02-21 18:50:24 UTC
Embargoed:



Description Paul Zimdars 2002-12-18 21:46:51 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20020823 Netscape/7.0

Description of problem:
We have a 64-node cluster. We run a scientific job that is heavily memory- and
CPU-intensive.

Here is the uname output from a node:

Linux mach-0-0 2.4.18-17.7.xsmp #6 Tue Dec 17 16:41:44 PST 2002 i686 unknown

The error below can be triggered by any process (bash, sh, kswapd, etc.).
With SMP turned off, the test ran without a single crash. When I turned SMP
back on, the nodes started to die. We lose between 5 and 10 nodes out of 64 on
each run, usually within the first 10-15 minutes.


Nov 22 18:51:59 mach-0-35 kernel: kernel BUG at page_alloc.c:220!
Nov 22 18:51:59 mach-0-35 kernel: invalid operand: 0000
Nov 22 18:51:59 mach-0-35 kernel: CPU:    0
Nov 22 18:51:59 mach-0-35 kernel: EIP:    0010:[rmqueue+525/592]    Not tainted
Nov 22 18:51:59 mach-0-35 kernel: EIP:    0010:[<c0132c6d>]    Not tainted
Nov 22 18:51:59 mach-0-35 kernel: EFLAGS: 00010202
Nov 22 18:51:59 mach-0-35 kernel: eax: 00000040   ebx: c23bc8f0   ecx: 00038000   edx: 0006942f
Nov 22 18:51:59 mach-0-35 kernel: esi: c028b128   edi: 00048000   ebp: c1000020   esp: efe31dcc
Nov 22 18:51:59 mach-0-35 kernel: ds: 0018   es: 0018   ss: 0018
Nov 22 18:51:59 mach-0-35 kernel: Process mlsl2 (pid: 1928, stackpage=efe31000)
Nov 22 18:51:59 mach-0-35 kernel: Stack: 00038000 0003142f 00000296 00000000 c028b128 c028b200 000001ff 00000000
Nov 22 18:51:59 mach-0-35 kernel:        00000025 c0132f01 c028b128 c028b1fc 000001d2 00000018 00104025 00000000
Nov 22 18:51:59 mach-0-35 kernel:        00000001 00000025 c0127ded 69430025 00000000 f69451c0 f61bec60 efef2118
Nov 22 18:51:59 mach-0-35 kernel: Call Trace:    [__alloc_pages+81/384] [do_anonymous_page+93/368] [do_no_page+71/576] [it_real_fn+16/80] [handle_mm_fault+154/288]
Nov 22 18:51:59 mach-0-35 kernel: Call Trace:    [<c0132f01>] [<c0127ded>] [<c0127f47>] [<c011c5e0>] [<c01281da>]
Nov 22 18:51:59 mach-0-35 kernel:   [<c011d57b>] [<c011d431>] [<c012900a>] [<c010a64d>] [<c011472a>] [<c012939b>]
Nov 22 18:51:59 mach-0-35 kernel:   [<c01293ab>] [<c010ea9e>] [<c0114570>] [<c0108bfc>]
Nov 22 18:51:59 mach-0-35 kernel: Code: 0f 0b dc 00 81 4b 25 c0 8b 43 18 a9 80 00 00 00 74 08 0f 0b
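
As a cross-check on the trace, the first bytes of the "Code:" line can be decoded by hand. The sketch below assumes this kernel uses the usual 2.4 i386 BUG() layout (a ud2 instruction, 0f 0b, followed by a 16-bit line number and a 32-bit pointer to the file name string); that layout is an assumption about this particular build, and decode_bug_bytes is a hypothetical helper name, not part of any kernel tool.

```python
import struct

def decode_bug_bytes(code_hex):
    """Decode the assumed ud2 + line number + file pointer layout of BUG()."""
    raw = bytes.fromhex(code_hex.replace(" ", ""))
    if raw[:2] != b"\x0f\x0b":
        raise ValueError("code bytes do not start with ud2 (0f 0b)")
    # Little-endian, no padding: 16-bit line number, then 32-bit pointer.
    line, file_ptr = struct.unpack_from("<HI", raw, 2)
    return line, file_ptr

# First eight bytes of the Code: line from the oops above.
line, file_ptr = decode_bug_bytes("0f 0b dc 00 81 4b 25 c0")
print(line, hex(file_ptr))
```

Under that assumption the decoded line number is 220, which agrees with the "kernel BUG at page_alloc.c:220!" message, and the remaining four bytes would be the address of the file name string.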


Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. We have a program that processes satellite data using PVM.
2. Tried it with and without PVM. Same results.


    

Actual Results:  5-10 nodes would die.
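
One way to count the affected nodes after a run is to tally the "kernel BUG at" lines by hostname. This is only a minimal sketch: it assumes the syslog lines from all nodes have been gathered as plain text in the format shown in the trace above, and tally_bugs and the sample list are hypothetical names used for illustration.

```python
import re
from collections import Counter

# Matches syslog lines such as:
#   Nov 22 18:51:59 mach-0-35 kernel: kernel BUG at page_alloc.c:220!
BUG_LINE = re.compile(
    r"^\w{3}\s+\d+ [\d:]{8} (\S+) kernel: kernel BUG at (\S+?)!"
)

def tally_bugs(lines):
    """Count kernel BUG reports per (hostname, file:line) pair."""
    counts = Counter()
    for text in lines:
        match = BUG_LINE.match(text)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts

# Two lines copied from the trace in this report; only the first is a BUG line.
sample = [
    "Nov 22 18:51:59 mach-0-35 kernel: kernel BUG at page_alloc.c:220!",
    "Nov 22 18:51:59 mach-0-35 kernel: invalid operand: 0000",
]
for (host, location), count in tally_bugs(sample).items():
    print(host, location, count)
```

The number of distinct hostnames in the result gives the number of nodes lost in that run.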

Expected Results:  No crash.

Additional info:

64-node cluster configuration. The drives are IDE; we use Red Hat 7.2 with ext3,
2 GB virtual memory, and 4 GB swap.

Comment 1 Paul Zimdars 2002-12-18 21:49:01 UTC
Ack, sorry, ignore this one. I had the wrong window open and hit enter. Must have
recreated the same bug as #79924.

Comment 2 Dave Jones 2003-12-17 02:28:39 UTC

*** This bug has been marked as a duplicate of 79924 ***

Comment 3 Red Hat Bugzilla 2006-02-21 18:50:24 UTC
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.

