97581 – (HIGHMEM)machines hangs during multinode mpi job (seems to be in kswapd)

Summary: (HIGHMEM)machines hangs during multinode mpi job (seems to be in kswapd)

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	kernel
Sub Component:
Version:	7.3
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Arjan van de Ven
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2003-06-17 22:29 UTC by jdavis
Modified:	2007-04-18 16:54 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2004-09-30 15:41:09 UTC
Embargoed:

Description jdavis 2003-06-17 22:29:24 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20020918

Description of problem:
Dell Precision 450N, running Red Hat 7.3 with the latest errata kernel,
2.4.20-18.7smp.
Dual 2.8GHz CPUs, 2GB DDR memory, IDE disks, Intel 10/100/1000 NIC running
at 100Mbps full duplex (using the e1000 driver). The computers are brand new and
we have run the Dell diagnostics without errors. We have also run memtest for 30 minutes.

We use Red Hat's kernel, with no recompilation.

glibc-2.2.5-43

Most critical RPM updates have been applied.

We are running MPI jobs with heavy floating point, built with the PGI Fortran/C++
compilers.

if you have any ideas, please cc jdavis in prompt


We see similar behavior on several Precision 450N machines, but are not always
able to get ksymoops output.
Jun 16 18:44:26 wolf65 kernel: Unable to handle kernel NULL pointer
dereference at virtual address 00000004
Jun 16 18:44:26 wolf65 kernel: c013d8d4
Jun 16 18:44:26 wolf65 kernel: *pde = 00000000
Jun 16 18:44:26 wolf65 kernel: Oops: 0002
Jun 16 18:44:26 wolf65 kernel: CPU:    0
Jun 16 18:44:26 wolf65 kernel: EIP:    0010:[<c013d8d4>]    Not
tainted
Using defaults from ksymoops -t elf32-i386 -a i386
Jun 16 18:44:26 wolf65 kernel: EFLAGS: 00010046
Jun 16 18:44:26 wolf65 kernel: eax: 00000000   ebx: c102c250   ecx:
00000000   edx: 00000000
Jun 16 18:44:26 wolf65 kernel: esi: 00001000   edi: c1000030   ebp:
c030c0a0   esp: c46fbed8
Jun 16 18:44:26 wolf65 kernel: ds: 0018   es: 0018   ss: 0018
Jun 16 18:44:26 wolf65 kernel: Process kswapd (pid: 7,
stackpage=c46fb000)
Jun 16 18:44:26 wolf65 kernel: Stack: c102c288 c1000030 c030d270
00000246 ffffffff 00000c9d 0000064e 00000000
Jun 16 18:44:26 wolf65 kernel:        c102c288 00000008 c030c0a0
c013b927 c102c288 000001f4 c46fa000 00000006
Jun 16 18:44:26 wolf65 kernel:        000000f9 c92f6600 ffffffff
c015a1ec c92f6600 00000100 c030c0a0 000001d0
Jun 16 18:44:26 wolf65 kernel: Call Trace:   [<c013b927>]
rebalance_laundry_zone [kernel] 0x157 (0xc46fbf04))
Jun 16 18:44:26 wolf65 kernel: [<c015a1ec>] dput [kernel] 0x1c
(0xc46fbf24))
Jun 16 18:44:26 wolf65 kernel: [<c013c18f>] rebalance_inactive_zone
[kernel] 0x32f (0xc46fbf3c))
Jun 16 18:44:26 wolf65 kernel: [<c013c1ed>] rebalance_inactive
[kernel] 0x3d (0xc46fbf6c))
Jun 16 18:44:26 wolf65 kernel: [<c013c321>]
do_try_to_free_pages_kswapd [kernel] 0x31 (0xc46fbf90))
Jun 16 18:44:26 wolf65 kernel: [<c013c7d1>] kswapd [kernel] 0x141
(0xc46fbfd4))
Jun 16 18:44:26 wolf65 kernel: [<c0105000>] stext [kernel] 0x0
(0xc46fbfe8))
Jun 16 18:44:26 wolf65 kernel: [<c0107266>] arch_kernel_thread
[kernel] 0x26 (0xc46fbff0))
Jun 16 18:44:26 wolf65 kernel: [<c013c690>] kswapd [kernel] 0x0
(0xc46fbff8))
Jun 16 18:44:26 wolf65 kernel: Code: 89 50 04 89 02 c7 03 00 00 00 00
c7 43 04 00 00 00 00 d1 64
>>EIP; c013d8d4 <__free_pages_ok+384/420>   <=====
Trace; c013b927 <rebalance_laundry_zone+157/590>
Trace; c015a1ec <dput+1c/170>
Trace; c013c18f <rebalance_inactive_zone+32f/350>
Trace; c013c1ed <rebalance_inactive+3d/80>
Trace; c013c321 <do_try_to_free_pages_kswapd+31/310>
Trace; c013c7d1 <kswapd+141/4e0>
Trace; c0105000 <_stext+0/0>
Trace; c0107266 <arch_kernel_thread+26/30>
Trace; c013c690 <kswapd+0/4e0>
Code;  c013d8d4 <__free_pages_ok+384/420>
00000000 <_EIP>:
Code;  c013d8d4 <__free_pages_ok+384/420>   <=====
   0:   89 50 04                  mov    %edx,0x4(%eax)   <=====
Code;  c013d8d7 <__free_pages_ok+387/420>
   3:   89 02                     mov    %eax,(%edx)
Code;  c013d8d9 <__free_pages_ok+389/420>
   5:   c7 03 00 00 00 00         movl   $0x0,(%ebx)
Code;  c013d8df <__free_pages_ok+38f/420>
   b:   c7 43 04 00 00 00 00      movl   $0x0,0x4(%ebx)
Code;  c013d8e6 <__free_pages_ok+396/420>
  12:   d1 64 00 00               shll   0x0(%eax,%eax,1)
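
A reading aid for the oops above, as a minimal sketch rather than anything from the original report: on i386 the number after `Oops:` is the page-fault error code, and the faulting instruction `mov %edx,0x4(%eax)` with `eax: 00000000` writes to address 4, which matches the reported virtual address 00000004. The helper names `decode_oops_error_code` and `effective_address` are made up for illustration and are not part of ksymoops or the kernel.

```python
# Sketch: decode the i386 page-fault error code printed as "Oops: 0002"
# and recompute the faulting address from the register dump above.
# Both helper names are hypothetical and exist only for this illustration.

def decode_oops_error_code(code: int) -> dict:
    """Interpret the low three bits of an i386 page-fault error code."""
    return {
        "page_present": bool(code & 0x1),  # False: no page mapped at the address
        "write_access": bool(code & 0x2),  # True: the fault was caused by a write
        "user_mode": bool(code & 0x4),     # False: the fault happened in kernel mode
    }

def effective_address(base: int, displacement: int) -> int:
    """Address of an operand written as disp(%reg), wrapped to 32 bits."""
    return (base + displacement) & 0xFFFFFFFF

info = decode_oops_error_code(0x0002)
addr = effective_address(0x00000000, 0x4)  # mov %edx,0x4(%eax) with eax = 0
print(info)
print(f"fault address: {addr:08x}")
```

Read this way, the oops is a kernel-mode write through a NULL base pointer plus an offset of 4, inside `__free_pages_ok`.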

Jun 16 22:38:59 wolf65 kernel: 1151MB HIGHMEM available.
Jun 16 22:38:59 wolf65 kernel: CPU 0 (0x0000) enabledProcessor #0
Pentium 4(tm) XEON(tm) APIC version 16
Jun 16 22:38:59 wolf65 kernel: CPU 1 (0x0200) enabledProcessor #2
Pentium 4(tm) XEON(tm) APIC version 16
Jun 16 22:38:59 wolf65 kernel: CPU 2 (0x0100) enabledProcessor #1
Pentium 4(tm) XEON(tm) APIC version 16
Jun 16 22:38:59 wolf65 kernel: CPU 3 (0x0300) enabledProcessor #3
Pentium 4(tm) XEON(tm) APIC version 16
Jun 16 22:38:59 wolf65 kernel: cpu: 0, clocks: 1328987, slice: 265797
Jun 16 22:38:59 wolf65 kernel: cpu: 1, clocks: 1328987, slice: 265797
Jun 16 22:38:59 wolf65 kernel: cpu: 3, clocks: 1328987, slice: 265797
Jun 16 22:38:59 wolf65 kernel: cpu: 2, clocks: 1328987, slice: 265797
Jun 16 22:38:59 wolf65 kernel: e1000: eth0 NIC Link is Up 100 Mbps
Full Duplex
Jun 17 10:21:03 wolf65 kernel: 1151MB HIGHMEM available.

9 of 80 systems have exhibited this problem. Only 1 produced an oops.
On another one I was able to use Alt-SysRq-T, -M and -P and got
information that looked very much like the oops.

I have seen the problem on various Red Hat errata kernels. It seems like it might
be better since I downloaded and installed Intel's latest e1000 driver, 5.1.11,
which is not in the latest Red Hat kernels.

Thanks for any help; I'll be glad to post more information.

Jeff Davis
jdavis


Version-Release number of selected component (if applicable):
kernel-smp-2.4.20-18.7

How reproducible:
Sometimes

Steps to Reproduce:
1. Run a multinode MPI job.
2. One or more nodes hang.
3. The nodes respond to pings, but it is not possible to log in by any means, including the console.
4. The affected machines have to be power cycled.
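
The symptom in step 3 (answers ping, cannot log in) can be checked for across a cluster with something like the following sketch. It assumes Linux `ping` from iputils and an SSH daemon on port 22, neither of which is stated in the report; the function names are hypothetical. It waits for the SSH banner rather than just a TCP connect, since a machine whose userspace is stuck may still complete the TCP handshake in the kernel.

```python
import socket
import subprocess

def answers_ping(host, timeout_s=2):
    """Return True if one ICMP echo request gets a reply (Linux iputils ping)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def sends_ssh_banner(host, port=22, timeout_s=5):
    """Return True if the SSH daemon sends its banner within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s) as sock:
            sock.settimeout(timeout_s)
            return sock.recv(64).startswith(b"SSH-")
    except OSError:  # refused, unreachable, or timed out
        return False

def classify(ping_ok, login_ok):
    """Map the two probe results onto the states described in the report."""
    if login_ok:
        return "healthy"
    if ping_ok:
        return "hung"  # answers ping but no login service responds
    return "down"

def check_nodes(hosts):
    """Probe every host and return a {host: state} dictionary."""
    return {h: classify(answers_ping(h), sends_ssh_banner(h)) for h in hosts}
```

For example, `check_nodes(["wolf65", "wolf93"])` would report `"hung"` for a node in the state described above.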
    

Actual Results:  Affected machines have to be power cycled.

Additional info:

Comment 1 Terje Marthinussen 2003-06-19 16:32:55 UTC

Might be a different issue, but I experience similar symptoms on a brand new PC 
with an asustek P4P800 motherboard.

Same kernel, with 2GB of memory and a 2.8GHz HT CPU.

For me it occurs during load (some network, a fair bit of CPU, and potentially 
very heavy IO load).

I've done extensive IO stress and the system has been stable. Disk IO seems to 
be the troublemaker.

I have another identical machine where I haven't seen the problem yet; however, 
it is not doing similarly stressful tasks (I might try it there as well).

I first thought this was due to the main chunk of IO taking place on a 2 disk 
raid0  on the serial ATA adapter on this motherboard.

I later tried a Promise SATA 150 TX4 card, and finally managed to provoke it 
when using the built-in IDE controller (it just happened).

I've got one oops which I cannot reach right now as the machine is in a 
physically different location, but I'll try to get it.

It is different from the one reported here. However, it occurs when executing a user 
process that does a lot of disk IO (kswapd is probably doing a lot of IO here 
as well).

It occurs more often when doing IO on raid0 (linux softraid) vs. a single disk.

The end state is the same as reported.
The machine will continue to answer pings, but it is not possible to log in or 
do anything on the console, and the machine has to be reset or power cycled.

Might take anything from 2 minutes to a day or 2 before this happens.

Comment 2 Terje Marthinussen 2003-06-19 16:34:50 UTC

To add a bit more: the previous time this happened, it wiped out the softraid 
running on the Promise controller (that is, md could not recognize that there 
was a raid there anymore).

I'll try to get the single oops I've got tomorrow.

Comment 3 Terje Marthinussen 2003-06-19 16:45:41 UTC

Ok, came to think about one more thing (last update until tomorrow, promise, I 
know this is bad bug reporting of me)...

This only seems to happen when doing operations that involve a lot of writing.

I am doing some basic search engine testing on this machine. I can stress it 
extensively on searching, and it will stay stable for days.

If I do index updates, it will usually die within hours if I do those on the 
raid0. 

I've only managed to make the same happen 1 time on the /drive (regular PATA).

I don't have any experience with debugging Linux kernel issues (my experience 
is mainly with *BSD and that is a few years ago now...). If you have 
information you need to help solve this, tell me what and how, and I'll try to 
get that.

Comment 4 Terje Marthinussen 2003-06-21 19:19:07 UTC

Sorry, didn't find the Oops from earlier again. However, have tried a few 
things.

First returned to the RH 7.3 release kernel. That very quickly crashed with the 
error reported in http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=64107#c4

So, jumped to the other end of the kernels and compiled 2.4.21-rc8.

This kernel ran for a couple of days without any hiccup; however, 
things went very slowly and I realized that I had forgotten to compile it with 
highmem support (2 GB of memory on this machine).
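
Some background arithmetic on why highmem matters on a 2 GB machine, as a rough sketch: with the usual 3G/1G address-space split on i386, the kernel maps roughly 896 MB directly ("low memory") and needs highmem support to use the rest. The 896 MB figure is the commonly cited default and is assumed here, not taken from this report; `split_memory` is a hypothetical helper.

```python
# Assumed default: about 896 MB of directly mapped low memory on i386
# with the standard 3G/1G split. Exact values vary with kernel config.
LOWMEM_LIMIT_MB = 896

def split_memory(total_mb, highmem_enabled):
    """Estimate how installed RAM divides into low, high and unused memory."""
    low = min(total_mb, LOWMEM_LIMIT_MB)
    high = total_mb - low if highmem_enabled else 0
    return {"low_mb": low, "high_mb": high, "unused_mb": total_mb - low - high}

with_highmem = split_memory(2048, highmem_enabled=True)
without_highmem = split_memory(2048, highmem_enabled=False)
print(with_highmem)
print(without_highmem)
```

The estimate of 1152 MB of high memory is close to the `1151MB HIGHMEM available` line in the boot log in the description, and without highmem support that same amount of RAM would simply go unused.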

A few hours ago, I recompiled 2.4.21-rc8 with highmem (4GB option) support and 
highmem I/O support.

3 hours later the machine crashed again.
I cannot get the ksymoops output as I'm not at the machine's location anymore, but I 
hooked a serial console up to it, and that currently shows:

 kernel BUG at vmscan.c:358!                  
invalid operand: 0000
CPU:    0
EIP:    0010:[<c012fd1e>]    Not tainted
EFLAGS: 00010202
eax: 00000040   ebx: 00000000   ecx: c2461800   edx: c286e000
esi: c24617e4   edi: 00000006   ebp: 00012f62   esp: c286ff34
ds: 0018   es: 0018   ss: 0018
Process kswapd (pid: 5, stackpage=c286f000)
Stack: 00000046 c286e000 00000166 000001d0 c02fd1e8 c286e000 00000000 00000000 
       00000000 00000020 000001d0 00000006 00000020 c0130182 00000006 00000000 
       c02fd1e8 00000006 000001d0 c02fd1e8 00000000 c01301ec 00000020 c02fd1e8 
Call Trace:    [<c0130182>] [<c01301ec>] [<c01302ff>] [<c0130376>] [<c01304b1>]
  [<c0130410>] [<c0105000>] [<c0105716>] [<c0130410>]

Code: 0f 0b 66 01 b8 87 2a c0 8b 41 fc a9 80 00 00 00 74 08 0f 0b 

Unfortunately I didn't compile it with the kernel debugger. 
I guess I have to add that this time.

Comment 5 jdavis 2003-06-23 14:52:46 UTC

The e1000 driver update didn't appear to make a difference.

We had 3 servers experience the problem this weekend.

Comment 6 Terje Marthinussen 2003-07-02 04:34:25 UTC

Well, I made a little detour into the -aa kernels to see how they behaved (it was
recommended that I try that).

I've kept the machine under low load for the last week, as I needed to keep it
running but did not need to put much load on it.

However, yesterday I had to update some data, and voila:
kernel BUG at vmscan.c:358!
invalid operand: 0000
CPU:    0
EIP:    0010:[<c012ffce>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010202 
eax: 00000040   ebx: 00000000   ecx: e7e08e30   edx: c286e000 
esi: e7e08e14   edi: 00000017   ebp: 0000f70e   esp: c286ff34 
ds: 0018   es: 0018   ss: 0018  
Process kswapd (pid: 5, stackpage=c286f000)   
Stack: f7283c00 c286e000 00000200 000001d0 c02fdae8 c286e000 00000000 00000000
       00000000 00000020 000001d0 00000006 00000020 c0130432 00000006 00000004 
       c02fdae8 00000006 000001d0 c02fdae8 00000000 c013049c 00000020 c02fdae8
Call Trace:    [<c0130432>] [<c013049c>] [<c01305af>] [<c0130626>] [<c0130761>]
  [<c01306c0>] [<c0105000>] [<c0105716>] [<c01306c0>]
Code: 0f 0b 66 01 b8 90 2a c0 8b 41 fc a9 80 00 00 00 74 08 0f 0b 

>>EIP; c012ffce <END_OF_CODE+3ce226cf/????>   <=====
Trace; c0130432 <END_OF_CODE+3ce22b33/????>
Trace; c013049c <END_OF_CODE+3ce22b9d/????>
Trace; c01305af <END_OF_CODE+3ce22cb0/????>
Trace; c0130626 <END_OF_CODE+3ce22d27/????>
Trace; c0130761 <END_OF_CODE+3ce22e62/????>
Trace; c01306c0 <END_OF_CODE+3ce22dc1/????>
Trace; c0105000 <END_OF_CODE+3cdf7701/????>
Trace; c0105716 <END_OF_CODE+3cdf7e17/????>
Trace; c01306c0 <END_OF_CODE+3ce22dc1/????>
Code;  c012ffce <END_OF_CODE+3ce226cf/????>
00000000 <_EIP>:
Code;  c012ffce <END_OF_CODE+3ce226cf/????>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c012ffd0 <END_OF_CODE+3ce226d1/????>
   2:   66 01 b8 90 2a c0 8b      add    %di,0x8bc02a90(%eax)
Code;  c012ffd7 <END_OF_CODE+3ce226d8/????>
   9:   41                        inc    %ecx
Code;  c012ffd8 <END_OF_CODE+3ce226d9/????>
   a:   fc                        cld    
Code;  c012ffd9 <END_OF_CODE+3ce226da/????>
   b:   a9 80 00 00 00            test   $0x80,%eax
Code;  c012ffde <END_OF_CODE+3ce226df/????>
  10:   74 08                     je     1a <_EIP+0x1a> c012ffe8
<END_OF_CODE+3ce226e9/????>
Code;  c012ffe0 <END_OF_CODE+3ce226e1/????>
  12:   0f 0b                     ud2a   

Not sure what that means.

This oops is somewhat different from the original one on 2.4.20-18, however, so
I'll compile the Red Hat kernel with serial support and try to reproduce that oops.

Comment 7 Terje Marthinussen 2003-07-03 06:13:20 UTC

FYI, the previous crash was with 2.4.21-rc8aa1.

Back at 2.4.20-18.7smp since then.
Crash today. Got the following.

Jul  3 12:48:08 sashimi kernel: Page has mapping still set. This is a serious
situation. However if you 
Jul  3 12:48:08 sashimi kernel: are using the NVidia binary only module please
report this bug to 
Jul  3 12:48:08 sashimi kernel: NVidia and not to the linux kernel mailinglist.
(Not sure what this is doing here. The machine has ATI graphics.)
Jul  3 12:48:08 sashimi kernel: ------------[ cut here ]------------
Jul  3 12:48:08 sashimi kernel: kernel BUG at page_alloc.c:114!
Jul  3 12:48:08 sashimi kernel: invalid operand: 0000
Jul  3 12:48:08 sashimi kernel: sg 3c2000 raid0 sym53c8xx sd_mod scsi_mod ext3 jbd  
Jul  3 12:48:08 sashimi kernel: CPU:    0
Jul  3 12:48:08 sashimi kernel: EIP:    0010:[<c013d5c7>]    Not tainted
Jul  3 12:48:08 sashimi kernel: EFLAGS: 00010296
Jul  3 12:48:08 sashimi kernel: 
Jul  3 12:48:08 sashimi kernel: EIP is at __free_pages_ok [kernel] 0x77
(2.4.20-18.7smp)
Jul  3 12:48:08 sashimi kernel: eax: 00000033   ebx: c12e8e90   ecx: c2da2000  
edx: 00000001
Jul  3 12:48:08 sashimi kernel: esi: 00000000   edi: c030d310   ebp: dd7ff3b4  
esp: c44c5ec4
Jul  3 12:48:08 sashimi kernel: ds: 0018   es: 0018   ss: 0018
Jul  3 12:48:08 sashimi kernel: Process kswapd (pid: 4, stackpage=c44c5000)
Jul  3 12:48:08 sashimi kernel: Stack: c0254fa0 c0254f40 c0254ee0 f243cb20
c12e8e90 c014a903 c44bce00 c12e8e90 
Jul  3 12:48:08 sashimi kernel:        000001d0 c014895f c030d310 c12e8e90
c030d310 00000000 c0139d15 c12e8e90 
Jul  3 12:48:08 sashimi kernel:        000001d0 00000001 000000ec c030d310
c030e4c8 00000013 c013bdfa c030d310 
Jul  3 12:48:08 sashimi kernel: Call Trace:   [<c014a903>] try_to_free_buffers
[kernel] 0xd3 (0xc44c5ed8))
Jul  3 12:48:08 sashimi kernel: [<c014895f>] try_to_release_page [kernel] 0x2f
(0xc44c5ee8))
Jul  3 12:48:08 sashimi kernel: [<c0139d15>] launder_page [kernel] 0x8b5
(0xc44c5efc))
Jul  3 12:48:08 sashimi kernel: [<c013bdfa>] rebalance_dirty_zone [kernel] 0x9a
(0xc44c5f1c))
Jul  3 12:48:08 sashimi kernel: [<c013c07e>] rebalance_inactive_zone [kernel]
0x21e (0xc44c5f3c))
Jul  3 12:48:08 sashimi kernel: [<c013c1ed>] rebalance_inactive [kernel] 0x3d
(0xc44c5f6c))
Jul  3 12:48:08 sashimi kernel: [<c013c321>] do_try_to_free_pages_kswapd
[kernel] 0x31 (0xc44c5f90))
Jul  3 12:48:08 sashimi kernel: [<c013c7d1>] kswapd [kernel] 0x141 (0xc44c5fd4))
Jul  3 12:48:08 sashimi kernel: [<c0105000>] stext [kernel] 0x0 (0xc44c5fe8))
Jul  3 12:48:08 sashimi kernel: [<c0107266>] arch_kernel_thread [kernel] 0x26
(0xc44c5ff0))
Jul  3 12:48:08 sashimi kernel: [<c013c690>] kswapd [kernel] 0x0 (0xc44c5ff8))
Jul  3 12:48:08 sashimi kernel: 
Jul  3 12:48:08 sashimi kernel: 
Jul  3 12:48:08 sashimi kernel: Code: 0f 0b 72 00 e8 56 25 c0 83 c4 0c 8b 3d b0
84 3b c0 89 d8 29 


Ksymoops adds
>>EIP; c013d5c7 <__free_pages_ok+77/420>   <=====
Trace; c014a903 <try_to_free_buffers+d3/150>
Trace; c014895f <try_to_release_page+2f/50>
Trace; c0139d15 <launder_page+8b5/990>
Trace; c013bdfa <rebalance_dirty_zone+9a/100>
Trace; c013c07e <rebalance_inactive_zone+21e/350>
Trace; c013c1ed <rebalance_inactive+3d/80>
Trace; c013c321 <do_try_to_free_pages_kswapd+31/310>
Trace; c013c7d1 <kswapd+141/4e0>
Trace; c0105000 <_stext+0/0>
Trace; c0107266 <arch_kernel_thread+26/30>
Trace; c013c690 <kswapd+0/4e0>
Code;  c013d5c7 <__free_pages_ok+77/420>
00000000 <_EIP>:
Code;  c013d5c7 <__free_pages_ok+77/420>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c013d5c9 <__free_pages_ok+79/420>
   2:   72 00                     jb     4 <_EIP+0x4> c013d5cb
<__free_pages_ok+7b/420>
Code;  c013d5cb <__free_pages_ok+7b/420>
   4:   e8 56 25 c0 83            call   83c0255f <_EIP+0x83c0255f> 43d3fb26
Before first symbol
Code;  c013d5d0 <__free_pages_ok+80/420>
   9:   c4 0c 8b                  les    (%ebx,%ecx,4),%ecx
Code;  c013d5d3 <__free_pages_ok+83/420>
   c:   3d b0 84 3b c0            cmp    $0xc03b84b0,%eax
Code;  c013d5d8 <__free_pages_ok+88/420>
  11:   89 d8                     mov    %ebx,%eax
Code;  c013d5da <__free_pages_ok+8a/420>
  13:   29 00                     sub    %eax,(%eax)

Is there any easy way to get a debugger on Linux without applying third-party
patches? Like on *BSD?

Comment 8 Terje Marthinussen 2003-07-08 19:05:23 UTC

Here is a 2.4.21 oops I had lying around, in case it is of any help. It's not a RH
kernel, but it might still be helpful.

I haven't really tried much more, either for debugging or testing.
I installed 2.5.74 just to see what happened. No crash so far, but I've only
loaded it hard for the last 24 hours, so it's early to say for sure.

However, except for a single occurrence while running 2.4.21-rc8aa1, I've so far
never managed to keep this system running under this load for 24 hours, so this
looks very promising.

Had to update modutils, glibc and change network adapter as the onboard 3com is
not supported, but otherwise the system is the same on 2.5.74

Could of course be the 3com chip which is causing the trouble but I don't see
anything indicating that this would be the case, so I hope this can rule out HW
as the problem.

Terje

Warning (compare_maps): ksyms_base symbol create_bounce_R__ver_create_bounce not found in System.map.  Ignoring ksyms_base entry
Warning (compare_maps): ksyms_base symbol highmem_start_page_R__ver_highmem_start_page not found in System.map.  Ignoring ksyms_base entry
Warning (compare_maps): ksyms_base symbol kmap_high_R__ver_kmap_high not found in System.map.  Ignoring ksyms_base entry
Warning (compare_maps): ksyms_base symbol kmap_prot_R__ver_kmap_prot not found in System.map.  Ignoring ksyms_base entry
Warning (compare_maps): ksyms_base symbol kmap_pte_R__ver_kmap_pte not found in System.map.  Ignoring ksyms_base entry
Warning (compare_maps): ksyms_base symbol kunmap_high_R__ver_kunmap_high not found in System.map.  Ignoring ksyms_base entry
kernel BUG at vmscan.c:358!                  
invalid operand: 0000
CPU:    0
EIP:    0010:[<c012fd1e>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010202
eax: 00000040   ebx: 00000000   ecx: c2461800   edx: c286e000
esi: c24617e4   edi: 00000006   ebp: 00012f62   esp: c286ff34
ds: 0018   es: 0018   ss: 0018
Process kswapd (pid: 5, stackpage=c286f000)
Stack: 00000046 c286e000 00000166 000001d0 c02fd1e8 c286e000 00000000 00000000 
       00000000 00000020 000001d0 00000006 00000020 c0130182 00000006 00000000 
       c02fd1e8 00000006 000001d0 c02fd1e8 00000000 c01301ec 00000020 c02fd1e8 
Call Trace:    [<c0130182>] [<c01301ec>] [<c01302ff>] [<c0130376>] [<c01304b1>]
  [<c0130410>] [<c0105000>] [<c0105716>] [<c0130410>]
Code: 0f 0b 66 01 b8 87 2a c0 8b 41 fc a9 80 00 00 00 74 08 0f 0b 

>>EIP; c012fd1e <swap_out+2ce/4b0>   <=====
Trace; c0130182 <shrink_cache+282/3c0>
Trace; c01301ec <shrink_cache+2ec/3c0>
Trace; c01302ff <refill_inactive+3f/120>
Trace; c0130376 <refill_inactive+b6/120>
Trace; c01304b1 <try_to_free_pages_zone+51/60>
Trace; c0130410 <shrink_caches+30/80>
Trace; c0105000 <_stext+0/0>
Trace; c0105716 <arch_kernel_thread+26/30>
Trace; c0130410 <shrink_caches+30/80>
Code;  c012fd1e <swap_out+2ce/4b0>
00000000 <_EIP>:
Code;  c012fd1e <swap_out+2ce/4b0>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c012fd20 <swap_out+2d0/4b0>
   2:   66 01 b8 87 2a c0 8b      add    %di,0x8bc02a87(%eax)
Code;  c012fd27 <swap_out+2d7/4b0>
   9:   41                        inc    %ecx
Code;  c012fd28 <swap_out+2d8/4b0>
   a:   fc                        cld    
Code;  c012fd29 <swap_out+2d9/4b0>
   b:   a9 80 00 00 00            test   $0x80,%eax
Code;  c012fd2e <swap_out+2de/4b0>
  10:   74 08                     je     1a <_EIP+0x1a> c012fd38 <swap_out+2e8/4b0>
Code;  c012fd30 <swap_out+2e0/4b0>
  12:   0f 0b                     ud2a

Comment 9 jdavis 2003-07-14 21:00:23 UTC

Just got the following ksymoops output as well.

By the way, no NVidia driver is loaded on the system. Also, we are not even running X.

Jul 14 15:03:38 wolf93 kernel: are using the NVidia binary only module please
report this bug to 
Jul 14 15:03:38 wolf93 kernel: kernel BUG at page_alloc.c:114!
Jul 14 15:03:38 wolf93 kernel: invalid operand: 0000
Jul 14 15:03:38 wolf93 kernel: CPU:    1
Jul 14 15:03:38 wolf93 kernel: EIP:    0010:[<c013d5c7>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
Jul 14 15:03:38 wolf93 kernel: EFLAGS: 00010296
Jul 14 15:03:38 wolf93 kernel: eax: 00000033   ebx: c2ad7940   ecx: 00000001  
edx: 00000001
Jul 14 15:03:38 wolf93 kernel: esi: 00000000   edi: 44400000   ebp: d88111b4  
esp: e94f5e54
Jul 14 15:03:38 wolf93 kernel: ds: 0018   es: 0018   ss: 0018
Jul 14 15:03:38 wolf93 kernel: Process fxmig_field (pid: 17386, stackpage=e94f5000)
Jul 14 15:03:38 wolf93 kernel: Stack: c0254fa0 c0254f40 c0254ee0 c1fdbcb0
c1c40030 c030f750 00000202 ffffffff 
Jul 14 15:03:38 wolf93 kernel:        d88111b4 00000040 c2ad7940 00000040
44400000 c0393320 c012b76a c2ad7940 
Jul 14 15:03:38 wolf93 kernel:        7ab4e067 00000163 c012c175 e08d5900
4441e000 fffe5390 4441e000 4441f000 
Jul 14 15:03:38 wolf93 kernel: Call Trace:   [<c012b76a>] __free_pte [kernel]
0x4a (0xe94f5e8c))
Jul 14 15:03:38 wolf93 kernel: [<c012c175>] zap_page_range [kernel] 0x385
(0xe94f5e9c))
Jul 14 15:03:38 wolf93 kernel: [<f8aa1fb2>] nfs_file_write [nfs] 0xa2 (0xe94f5f28))
Jul 14 15:03:38 wolf93 kernel: [<c012f00f>] do_munmap [kernel] 0x1ff (0xe94f5f64))
Jul 14 15:03:38 wolf93 kernel: [<c012106b>] do_softirq [kernel] 0x6b (0xe94f5f8c))
Jul 14 15:03:38 wolf93 kernel: [<c012f0c3>] sys_munmap [kernel] 0x33 (0xe94f5fa4))
Jul 14 15:03:38 wolf93 kernel: [<c0108be3>] system_call [kernel] 0x33 (0xe94f5fc0))
Jul 14 15:03:38 wolf93 kernel: Code: 0f 0b 72 00 e8 56 25 c0 83 c4 0c 8b 3d b0
84 3b c0 89 d8 29 

>>EIP; c013d5c7 <__free_pages_ok+77/420>   <=====
Trace; c012b76a <__free_pte+4a/50>
Trace; c012c175 <zap_page_range+385/4d0>
Trace; f8aa1fb2 <[nfs]nfs_file_write+a2/c0>
Trace; c012f00f <do_munmap+1ff/280>
Trace; c012106b <do_softirq+6b/d0>
Trace; c012f0c3 <sys_munmap+33/50>
Trace; c0108be3 <system_call+33/38>
Code;  c013d5c7 <__free_pages_ok+77/420>
00000000 <_EIP>:
Code;  c013d5c7 <__free_pages_ok+77/420>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c013d5c9 <__free_pages_ok+79/420>
   2:   72 00                     jb     4 <_EIP+0x4> c013d5cb
<__free_pages_ok+7b/420>
Code;  c013d5cb <__free_pages_ok+7b/420>
   4:   e8 56 25 c0 83            call   83c0255f <_EIP+0x83c0255f> 43d3fb26
Before first symbol
Code;  c013d5d0 <__free_pages_ok+80/420>
   9:   c4 0c 8b                  les    (%ebx,%ecx,4),%ecx
Code;  c013d5d3 <__free_pages_ok+83/420>
   c:   3d b0 84 3b c0            cmp    $0xc03b84b0,%eax
Code;  c013d5d8 <__free_pages_ok+88/420>
  11:   89 d8                     mov    %ebx,%eax
Code;  c013d5da <__free_pages_ok+8a/420>
  13:   29 00                     sub    %eax,(%eax)

Comment 10 jdavis 2003-07-14 21:12:32 UTC

I have logged a problem that happens on a similar machine as bug 99135.


The ksymoops output is different, though the hardware, OS and apps are all the same.
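
When comparing ksymoops output from different machines, it can help to reduce each report to its list of function names and see what they share. The following is a minimal sketch under that assumption; `extract_symbols` and the regular expression are illustrative only and handle just the `>>EIP;` and `Trace;` line shapes that appear in this bug. The two samples are shortened excerpts of the kswapd oops in the description and the oops in comment 9.

```python
import re

# Matches lines such as ">>EIP; c013d5c7 <__free_pages_ok+77/420>"
# and "Trace; f8aa1fb2 <[nfs]nfs_file_write+a2/c0>".
SYMBOL_RE = re.compile(r"^(>>EIP;|Trace;)\s+([0-9a-f]{8})\s+<([^+>]+)\+")

def extract_symbols(ksymoops_text):
    """Return the function names from the EIP and Trace lines, in order."""
    symbols = []
    for line in ksymoops_text.splitlines():
        match = SYMBOL_RE.match(line.strip())
        if match:
            symbols.append(match.group(3))
    return symbols

sample_a = """>>EIP; c013d8d4 <__free_pages_ok+384/420>   <=====
Trace; c013b927 <rebalance_laundry_zone+157/590>
Trace; c015a1ec <dput+1c/170>"""

sample_b = """>>EIP; c013d5c7 <__free_pages_ok+77/420>   <=====
Trace; c012b76a <__free_pte+4a/50>
Trace; f8aa1fb2 <[nfs]nfs_file_write+a2/c0>"""

symbols_a = extract_symbols(sample_a)
symbols_b = extract_symbols(sample_b)
shared = [name for name in symbols_a if name in symbols_b]
print(shared)
```

In these two excerpts the call paths differ but both end in `__free_pages_ok`.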

Comment 11 Bugzilla owner 2004-09-30 15:41:09 UTC

Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/