Bug 102282 - kernel pagefault in VM cleanup
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Hardware: i686 Linux
Priority: medium, Severity: medium
Assigned To: Larry Woodman
QA Contact: Brian Brock
Duplicates: 101946
Depends On:
Blocks: 101028 106472
Reported: 2003-08-13 08:50 EDT by Todd Palino
Modified: 2007-11-30 17:06 EST

Doc Type: Bug Fix
Last Closed: 2004-10-11 11:26:12 EDT

Attachments
Description of bug-hunting session (2.88 KB, text/plain)
2003-08-29 16:33 EDT, Jim Paradis
log from test run (33.26 KB, text/plain)
2003-09-10 09:44 EDT, Jeffrey Moyer
Page-free debug patch (1.23 KB, patch)
2003-09-12 06:33 EDT, Stephen Tweedie
VM list-operation debug patch (4.61 KB, patch)
2003-09-12 06:35 EDT, Stephen Tweedie

Description Todd Palino 2003-08-13 08:50:01 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4) Gecko/20030624

Description of problem:
Under a stress test (the stress tool with 8 CPU threads, 4 I/O threads, 3 VM
threads, and 3 HDD threads), a pagefault occurs in the kernel after
approximately 14 hours of load.  The page fault routines then hit a second
problem (an invalid operand) while handling it.  This is the second crash we
have experienced while testing with this tool (the first appeared to be in the
SCSI modules, but not enough instrumentation was in place to capture the fault).

Here is the stack trace:

invalid kernel-mode pagefault 2! [addr:00000004, eip:02154c51]

Pid/TGid: 560/560, comm:               stress
EIP: 0060:[<02154c51>] CPU: 0
EIP is at __free_pages_ok [kernel] 0x2b1 (2.4.21-1.1931.2.349.2.2.entsmp)
 ESP: 0002:02399f80 EFLAGS: 00010046    Not tainted
EAX: 00000000 EBX: 03778dcc ECX: 03778e08 EDX: 00000000
ESI: 02399f80 EDI: 0001ee19 EBP: 0300002c DS: 0068 ES: 0068 FS: 0000 GS: 0033
CR0: 8005003b CR2: 00000004 CR3: 00101000 CR4: 000006f0
Call Trace:   [<02145339>] wait_on_page_timeout [kernel] 0xc9 (0x20a73dac)
[<021524ef>] rebalance_laundry_zone [kernel] 0x12f (0x20a73de8)
[<02153054>] do_try_to_free_pages [kernel] 0x134 (0x20a73e1c)
[<02153691>] try_to_free_pages [kernel] 0x51 (0x20a73e38)
[<021553a7>] __alloc_pages [kernel] 0x167 (0x20a73e48)
[<02140070>] do_anonymous_page [kernel] 0xf0 (0x20a73e88)
[<02140b63>] handle_mm_fault [kernel] 0xf3 (0x20a73ec0)
[<0211f5cc>] do_page_fault [kernel] 0x1bc (0x20a73ef4)
[<0212518e>] context_switch [kernel] 0x9e (0x20a73f40)
[<02123377>] schedule [kernel] 0x2f7 (0x20a73f5c)
[<0211f410>] do_page_fault [kernel] 0x0 (0x20a73fa0)
[<0211f410>] do_page_fault [kernel] 0x0 (0x20a73fb0)

invalid operand: 0000
iptable_filter ip_tables ide-cd cdrom autofs eepro100 mii microcode keybdev
mousedev hid input usbcore ext3 jbd ips aic7xxx sd_mod scsi_mod  
CPU:    0
EIP:    0060:[<0211f488>]    Not tainted
EFLAGS: 00010006

EIP is at do_page_fault [kernel] 0x78 (2.4.21-1.1931.2.349.2.2.entsmp)
eax: 00000001   ebx: 00000004   ecx: 00000001   edx: 02375e14
esi: 00000002   edi: 0211f410   ebp: 00000002   esp: 20a73ca4
ds: 0068   es: 0068   ss: 0068
Process stress (pid: 560, stackpage=20a73000)
Stack: 20a74000 00000002 00000004 02154c51 00000080 00000001 02109dcb 00000004 
       00000008 00000400 0372f320 0372f2e4 0241f000 20a9e580 fffd7d8d 0242d180 
       00000000 00000000 20a9e000 20a9e000 20a73d70 0242d200 21c70068 02420068 
Call Trace:   [<02154c51>] __free_pages_ok [kernel] 0x2b1 (0x20a73cb0)
[<02109dcb>] __switch_to [kernel] 0x2fb (0x20a73cbc)
[<021233a7>] schedule [kernel] 0x327 (0x20a73d08)
[<021cbd8c>] submit_bh_rsector [kernel] 0x4c (0x20a73d20)
[<02121b80>] wake_up_cpu [kernel] 0x20 (0x20a73d48)
[<0211f410>] do_page_fault [kernel] 0x0 (0x20a73d5c)
[<02154c51>] __free_pages_ok [kernel] 0x2b1 (0x20a73d98)
[<02145339>] wait_on_page_timeout [kernel] 0xc9 (0x20a73dac)
[<021524ef>] rebalance_laundry_zone [kernel] 0x12f (0x20a73de8)
[<02153054>] do_try_to_free_pages [kernel] 0x134 (0x20a73e1c)
[<02153691>] try_to_free_pages [kernel] 0x51 (0x20a73e38)
[<021553a7>] __alloc_pages [kernel] 0x167 (0x20a73e48)
[<02140070>] do_anonymous_page [kernel] 0xf0 (0x20a73e88)
[<02140b63>] handle_mm_fault [kernel] 0xf3 (0x20a73ec0)
[<0211f5cc>] do_page_fault [kernel] 0x1bc (0x20a73ef4)
[<0212518e>] context_switch [kernel] 0x9e (0x20a73f40)
[<02123377>] schedule [kernel] 0x2f7 (0x20a73f5c)
[<0211f410>] do_page_fault [kernel] 0x0 (0x20a73fa0)
[<0211f410>] do_page_fault [kernel] 0x0 (0x20a73fb0)

Code:  Bad EIP value.

Version-Release number of selected component (if applicable):

How reproducible:
Didn't try

Steps to Reproduce:
1. Get the stress tool from http://weather.ou.edu/~apw/projects/stress/
2. Run stress with command line options "-c 8 -i 4 -m 3 -d 3"
3. Wait for crash (approx. 14-15 hours)

Actual Results:  After 14 hours have elapsed, the kernel will pagefault with the
above trace.

Expected Results:  Stress tool should have continued until interrupted.

Additional info:

Hardware: IBM x330, dual P3 1.2GHz, 512 MB, ServeRAID 4MX RAID card with dual 36
GB drives attached (RAID 1)

Will update further.  Currently attempting to reproduce.
Comment 1 Todd Palino 2003-08-13 13:51:07 EDT
This bug is reproducible.  The second attempt produced the same pagefault and
then invalid operand in the pagefault routines after less than 5 hours.
Comment 2 Arjan van de Ven 2003-08-15 10:00:40 EDT
can you reproduce this with the latest kernel available via RHN ?
Comment 3 Todd Palino 2003-08-18 14:11:21 EDT
Yes, but it now appears as a NULL pointer dereference, rather than a pagefault.
This is reproducible as well (first time happened after 13 hours, second after
less than 5).  Here's the dump:

Unable to handle kernel NULL pointer dereference at virtual address 00000004
 printing eip:
*pde = 11db7001
*pte = 1fc2e067
Oops: 0002
iptable_filter ip_tables ide-cd cdrom autofs e100 microcode keybdev mousedev hid
input usbcore ext3 jbd ips aic7xxx sd_mod scsi_mod  
CPU:    0
EIP:    0060:[<c0154ac1>]    Not tainted
EFLAGS: 00010046

EIP is at __free_pages_ok [kernel] 0x2b1 (2.4.21-1.1931.2.393.entsmp)
eax: 00000000   ebx: c17508cc   ecx: c1750908   edx: 00000000
esi: c03a0f80   edi: 0001e359   ebp: c100002c   esp: dc6cbd8c
ds: 0068   es: 0068   ss: 0068
Process sleep (pid: 1006, stackpage=dc6cb000)
Stack: c03a0f80 00000002 c129f558 c03a3e10 c1781330 c03a21dc c1750908 c03a0f80 
       c103c02c c03a2158 00000282 ffffffff 0000f1ac c1750908 00000000 00000011 
       c03a0f80 c015235f c1750908 000001f4 00000000 c03a2148 c0152c0c c137a7f0 
Call Trace:   [<c015235f>] rebalance_laundry_zone [kernel] 0x12f (0xdc6cbdd0)
[<c0152c0c>] rebalance_dirty_zone [kernel] 0x9c (0xdc6cbde4)
[<c0152ec4>] do_try_to_free_pages [kernel] 0x134 (0xdc6cbe04)
[<c0153501>] try_to_free_pages [kernel] 0x51 (0xdc6cbe20)
[<c0155217>] __alloc_pages [kernel] 0x167 (0xdc6cbe30)
[<c0140c08>] do_no_page [kernel] 0x398 (0xdc6cbe70)
[<c0143b9d>] unmap_fixup [kernel] 0x11d (0xdc6cbea0)
[<c0140f11>] handle_mm_fault [kernel] 0xd1 (0xdc6cbec0)
[<c011f5ec>] do_page_fault [kernel] 0x13c (0xdc6cbef4)
[<c014343d>] do_mmap_pgoff [kernel] 0x4ad (0xdc6cbf08)
[<c0112eb5>] old_mmap [kernel] 0x105 (0xdc6cbf64)
[<c011f4b0>] do_page_fault [kernel] 0x0 (0xdc6cbfb0)

Code: 89 50 04 89 02 c7 43 04 00 00 00 00 c7 03 00 00 00 00 d1 64 
Comment 4 Arjan van de Ven 2003-08-21 14:53:57 EDT
Rik: what's the version number of the kernel that got the flags update patch ?
Comment 5 Rik van Riel 2003-08-21 17:22:38 EDT
The page->flags atomic update patch went into kernel 2.4.21-1.1931.2.399

The symptoms of this bug report suggest that the problem may be fixed by the
page->flags fix.  However, I am not 100% sure.

Todd, could you please test kernel .399 or newer to verify whether the bug still
exists ?

Thank you,
Comment 6 Todd Palino 2003-08-25 15:23:27 EDT
Tested with the .399 kernel.  The faulting offset moved slightly (0x2c1 instead
of 0x2b1) and it took 34 hours to trigger, but it is still basically the same
problem.

Unable to handle kernel NULL pointer dereference at virtual address 00000004
 printing eip:
*pde = 1e6ee001
*pte = 1e6e8067
Oops: 0002
iptable_filter ip_tables ide-cd cdrom autofs e100 microcode keybdev mousedev hid
input usbcore ext3 jbd ips aic7xxx sd_mod scsi_mod  
CPU:    0
EIP:    0060:[<c0154c31>]    Not tainted
EFLAGS: 00010046

EIP is at __free_pages_ok [kernel] 0x2c1 (2.4.21-1.1931.2.399.entsmp)
eax: 00000000   ebx: c119bff8   ecx: c119bfbc   edx: 00000000
esi: c03a0f80   edi: 00005ddc   ebp: c100002c   esp: de6edda4
ds: 0068   es: 0068   ss: 0068
Process stress (pid: 584, stackpage=de6ed000)
Stack: c03a0f80 00000002 c0145a09 c03a3e10 c1781b70 c03a21dc c119bfbc c03a0f80 
       c103c02c c03a2158 00000282 ffffffff 00002eee c119bfbc 00000000 0000003a 
       c03a0f80 c01524af c119bfbc 000001f4 00000000 c03a2148 c1632664 c03a0f80 
Call Trace:   [<c0145a09>] wait_on_page_timeout [kernel] 0xc9 (0xde6eddac)
[<c01524af>] rebalance_laundry_zone [kernel] 0x12f (0xde6edde8)
[<c0153024>] do_try_to_free_pages [kernel] 0x134 (0xde6ede1c)
[<c0153661>] try_to_free_pages [kernel] 0x51 (0xde6ede38)
[<c0155387>] __alloc_pages [kernel] 0x167 (0xde6ede48)
[<c0140470>] do_anonymous_page [kernel] 0xf0 (0xde6ede88)
[<c0140f31>] handle_mm_fault [kernel] 0xd1 (0xde6edec0)
[<c011f60c>] do_page_fault [kernel] 0x13c (0xde6edef4)
[<e080e461>] scsi_finish_command [scsi_mod] 0x81 (0xde6edf44)
[<e080e1b6>] scsi_softirq_handler [scsi_mod] 0x76 (0xde6edf58)
[<c010dbd8>] do_IRQ [kernel] 0x148 (0xde6edf98)
[<c011f4d0>] do_page_fault [kernel] 0x0 (0xde6edfb0)

Code: 89 50 04 89 02 c7 43 04 00 00 00 00 c7 03 00 00 00 00 d1 64 

Comment 8 Rik van Riel 2003-08-26 09:05:13 EDT
I just started the crash tool on my test system (also a dual CPU system with
512MB RAM). I am running the .411 kernel. I'll let you know if/when I reproduce
the crash.
Comment 9 Rik van Riel 2003-08-28 12:32:32 EDT
Is this bug reproducible on any other machine, or has it only been seen on this
one system ?

I've been running the test here now and all that happens is that my little test
system gets so overloaded the cron jobs can't finish in time for new ones to be
started and system load spirals higher and higher.

There are no crashes, though ...
Comment 10 Todd Palino 2003-08-28 12:37:59 EDT
I had been testing a couple of other systems, but power problems from the storms
have made it hard to keep the systems online during the tests.

I will start a test today, to run over the weekend (or until systems crash) with
several platforms.
Comment 11 Rik van Riel 2003-08-28 17:16:51 EDT
We have reproduced an eerily similar (quite likely the same) bug on an AMD64
system here. We know which line of code is causing the oops.  What we don't yet
know is why...

The oops is happening in the list_del() in the following piece of code from

                if (BAD_RANGE(zone,buddy1))
                if (BAD_RANGE(zone,buddy2))
                mask <<= 1;
                index >>= 1;
                page_idx &= mask;

I am about to audit the VM code (and our individual VM patches) to figure out
what could cause this problem. 
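
For anyone following along, the fault address of 00000004 in the oopses above is consistent with an unlink through a NULL next pointer. Below is a minimal userspace sketch, assuming the stock 2.4 field order of struct list_head (next first, then prev). list_del_sketch() and fault_addr_for_null_next() are hypothetical names for illustration, not the code in the affected kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Same field order as the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Unlink in the style of stock 2.4 __list_del(): two stores, no checks. */
static void list_del_sketch(struct list_head *entry)
{
    struct list_head *prev = entry->prev;
    struct list_head *next = entry->next;

    next->prev = prev;   /* this store faults if next == NULL */
    prev->next = next;
}

/*
 * Hypothetical helper: the address the first store above would touch
 * if entry->next were NULL, i.e. the offset of the prev field.
 */
static uintptr_t fault_addr_for_null_next(void)
{
    return (uintptr_t)offsetof(struct list_head, prev);
}
```

With 4-byte pointers the helper yields 4, which lines up with the reported address and with the leading code bytes "89 50 04" (a store of %edx to 4(%eax)) while eax is 00000000 in the register dumps.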
Comment 12 Jim Paradis 2003-08-29 16:33:07 EDT
Created attachment 94076 [details]
Description of bug-hunting session

Short form of the story: somehow the zone free area lists are getting screwed
up such that the self-pointers for two lists end up getting swapped, with bad
results.  Read the attachment for details.

I booby-trapped list_add() to try to catch this state of affairs at its
creation, and I'm currently running stress tests to try to make it happen.
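
One plausible shape for a booby-trapped list_add() is to check that the neighbours still point at each other before linking. The actual instrumentation is not shown in this report, so the following is only a userspace sketch under that assumption; list_add_checked() and init_list_head() are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* An empty list head points at itself in both directions. */
static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/*
 * Hypothetical checked insert: refuse to link if head and its
 * successor no longer point at each other. Returns false where a
 * kernel version would BUG() or dump state.
 */
static bool list_add_checked(struct list_head *entry, struct list_head *head)
{
    struct list_head *next = head->next;

    if (next == NULL || next->prev != head)
        return false;   /* list is already inconsistent */

    next->prev = entry;
    entry->next = next;
    entry->prev = head;
    head->next = entry;
    return true;
}
```

A check like this only sees inconsistencies between a head and its immediate successor, so it narrows down when corruption was introduced rather than proving a list is intact.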
Comment 13 Jim Paradis 2003-08-29 17:15:21 EDT
*** Bug 101946 has been marked as a duplicate of this bug. ***
Comment 14 Rik van Riel 2003-08-29 19:31:12 EDT
*** Bug 101946 has been marked as a duplicate of this bug. ***
Comment 16 Jim Paradis 2003-09-02 18:31:50 EDT
Another run; this time the zone free lists weren't corrupted but one of the
buddies in buddy coalescing was:

(gdb) print *((struct page *)$r9)
$18 = {list = {next = 0x100015fc4a0, prev = 0xffffffff8045f628},
  mapping = 0x0, index = 0x3893a, next_hash = 0x10002101f98, count = {
    counter = 0x0}, flags = 0x100000000000000, lru = {next = 0x0, prev = 0x0},
  pte = {chain = 0x0, direct = 0x0}, age = 0xfe, pprev_hash = 0x0,
  buffers = 0x0}

Here list.prev is pointing to one of the free area entries rather
than a valid page.  It could be that the corruption is happening on page 
lists and these lists eventually get linked into the free area lists...

Still investigating.
Comment 19 Jeffrey Moyer 2003-09-04 16:46:50 EDT
Here is a mail message Jim sent me, discussing how to reproduce the issue on
x86.  Note: I was unable to reproduce the problem on pro5 (dual xeon w/HT).



Here's the x86 version of the test I used to reproduce

First, obtain and install the following RPMs from ~jparadis/QA:


If they complain about missing pieces, install them with
--nodeps... they oughta still work.

Next, open up two shell windows.

In the first window, cd to a Linux source tree (e.g.

Issue the following command:

	% contest -n 10 io_load mem_load

In the second window, cd to /usr/bin/ctcs and issue the

	% ./hell-hound.sh

Select the default (0) for additional memory, then answer "no"
to all of the tests *except* the memory test.  Answer "yes" to

Hopefully, your system will crash in about half an hour...

Let me know how it goes!

Comment 20 Jim Paradis 2003-09-05 11:55:41 EDT
Actually, that was my formula for reproducing it on one of the hammer boxes. 
I've since tried it on taroon-latest with other hammer boxes and been unable to
reproduce.  jmoyer has tried on x86 and also been unable to reproduce.  I'm
going to try different loads (maybe run them over the weekend) and see if this
recurs or not...
Comment 21 Jay Turner 2003-09-05 13:23:22 EDT
I've been able to replicate once this morning on a single-proc i686 with 512M
RAM running contest and stress in parallel.  Unfortunately I didn't have a
serial console attached when I did it, but I'm attempting to replicate again
with one attached and will post the results.
Comment 22 Arjan van de Ven 2003-09-06 04:44:15 EDT
can we get a module list of every system that reproduced this?
(and see if there's something in common that's not on systems where we tried
hard but failed to reproduce)
Comment 23 Todd Palino 2003-09-08 09:45:06 EDT
I re-ran my tests over the weekend on i686 hardware.  I was able to duplicate
the issue (using just the stress tool originally mentioned) on 3 systems: a Xeon
2proc with HT enabled, and 2 separate P3 2proc systems.  All systems were
running the 421 SMP kernel with all of the latest Taroon packages installed (as
of Friday afternoon).  OS configurations were identical.

The P3 systems run with the following modules (according to /proc/modules):
  in use: e100, usbcore, ext3, jbd, ips, sd_mod, scsi_mod
  loaded: ide-cd, cdrom, autofs, microcode, keybdev, mousedev, hid, input, aic7xxx
The Xeon systems have the following:
  in use: tg3, usbcore, ext3, jbd, mptscsih, mptbase, sd_mod, scsi_mod
  loaded: autofs, microcode, keybdev, mousedev, hid, input, mptctl
Comment 25 Jay Turner 2003-09-09 07:08:31 EDT
I'm getting a slightly different oops, but it appears to be related to the same
thing.  I've seen this repeatedly on a couple of machines here in Centennial
(both UP and SMP.)  In addition, I was able to replicate on a kernel with
CONFIG_HUGETLBFS off.  I have some debug code from Ingo that I'm going to apply
and will post results of that as soon as the machines fall over.

VM: reclaim_page, found unknown page
Unable to handle kernel NULL pointer dereference at virtual address 00000004
 printing eip:
*pde = 00000000
Oops: 0002
parport_pc lp parport nfs lockd sunrpc e100 floppy microcode keybdev mousedev
hid input ehci-hcd usb-uhci usbcore ext3 jbd
CPU:    0
EIP:    0060:[<c0144cb0>]    Not tainted
EFLAGS: 00010246
EIP is at __lru_cache_del [kernel] 0x1e0 (2.4.21-1.1931.2.423.ent)
eax: 00000000   ebx: c14e9240   ecx: c14e925c   edx: 00000000
esi: c0349e80   edi: 0000003f   ebp: 00000000   esp: c1d85d98
ds: 0068   es: 0068   ss: 0068
Process stress (pid: 3638, stackpage=c1d85000)
Stack: c1d84000 c14e9240 00000000 c0144d65 c0148461 c14e9240 00000141 c013b612
       c14e9240 c16fb0d0 00000000 c1d84000 00000000 00000000 00000000 c1019dd8
       c01571fe d9dd2988 00000000 c14e9240 0000003f 00000000 c0146239 c14e9240
Call Trace:   [<c0144d65>] lru_cache_del [kernel] 0x5 (0xc1d85da4)
[<c0148461>] __free_pages_ok [kernel] 0x31 (0xc1d85da8)
[<c013b612>] wait_on_page_timeout [kernel] 0xc2 (0xc1d85db4)
[<c01571fe>] try_to_free_buffers [kernel] 0x8e (0xc1d85dd8)
[<c0146239>] rebalance_laundry_zone [kernel] 0xd9 (0xc1d85df0)
[<c0146c04>] do_try_to_free_pages [kernel] 0x134 (0xc1d85e14)
[<c0147231>] try_to_free_pages [kernel] 0x51 (0xc1d85e30)
[<c0148dd7>] __alloc_pages [kernel] 0x167 (0xc1d85e40)
[<c013b150>] add_to_page_cache_unique [kernel] 0x50 (0xc1d85e54)
[<c0149d2c>] read_swap_cache_async [kernel] 0xac (0xc1d85e84)
[<c01367c1>] swapin_readahead [kernel] 0x51 (0xc1d85ea4)
[<c0136a2f>] do_swap_page [kernel] 0x24f (0xc1d85ec0)
[<c0137344>] handle_mm_fault [kernel] 0xf4 (0xc1d85edc)
[<c011a80c>] do_page_fault [kernel] 0x13c (0xc1d85f0c)
[<e0850e12>] rh_init_int_timer [usb-uhci] 0x62 (0xc1d85f2c)
[<e0850d60>] rh_int_timer_do [usb-uhci] 0x0 (0xc1d85f34)
[<c012b66e>] __run_timers [kernel] 0xae (0xc1d85f38)
[<c012b327>] timer_bh [kernel] 0x47 (0xc1d85f64)
[<c012a63d>] tqueue_bh [kernel] 0x1d (0xc1d85f6c)
[<c011e59b>] context_switch [kernel] 0x7b (0xc1d85f84)
[<c011d105>] schedule [kernel] 0x125 (0xc1d85fa0)
[<c011a6d0>] do_page_fault [kernel] 0x0 (0xc1d85fb0)
Code: 89 50 04 89 02 c7 41 04 00 00 00 00 c7 43 1c 00 00 00 00 0f
Kernel panic: Fatal exception
Comment 26 Ingo Molnar 2003-09-09 09:44:48 EDT
could you please also try a test with swap turned off - if that is possible
without OOM-ing quickly.
Comment 27 Jim Paradis 2003-09-09 10:00:33 EDT
Ingo:  ~jparadis/QA is on the Boston share... I'll copy the rpms over to an
equivalent place on the Centennial share.

Comment 28 Jim Paradis 2003-09-09 10:27:18 EDT
Here's another possible way to reproduce this bug a bit faster.

I've only run this once and lost the output (durn screen blanking), so treat it
as tentative: run "stress" (not stress-kernel, but the original stress test
that tpalino used) opposite "contest -c -n 10 io_load mem_load".  The "-c" flag
prevents the swapon/swapoff that contest usually does.  Without it stress gets
oom-killed too much to be useful.

I tried this last nite and logged uptime to a file, then went home.  Came in the
next day and discovered that my system (dual-Xeon DELL WS, 512Mb) only stayed up
38 minutes after I left...
Comment 29 Stephen Tweedie 2003-09-10 07:34:53 EDT
Jay, is there any chance you can capture a vmcore dump of one of these failures?
Comment 30 Jay Turner 2003-09-10 08:48:20 EDT
I will indeed try.  To this point, I've had code running on three boxes since
yesterday and nothing has fallen over.  It must know that we're looking for it :-)
Comment 31 Jeffrey Moyer 2003-09-10 09:43:23 EDT
I reproduced this on a UP celeron w/ 128MB of RAM running a UP kernel with
Ingo's patch and hugetlbfs turned off.  The bug was reproduced using the
instructions from Jim's email posted above.  I'll attach the log output.
Comment 32 Jeffrey Moyer 2003-09-10 09:44:16 EDT
Created attachment 94370 [details]
log from test run
Comment 33 Jay Turner 2003-09-10 10:15:18 EDT
Ditto for me, also on a UP system but with 512M RAM.  I have the netdump log,
vmcore and serial console output if anyone is interested (already handed off
vmcore and log to sct.)
Comment 34 Larry Woodman 2003-09-10 10:56:43 EDT
Can someone make Jay's latest vmcore, log and console output available?

Comment 35 Jay Turner 2003-09-10 11:10:37 EDT
yakko.test.redhat.com:/var/crash (root/standard testlab password)
Comment 36 Jay Turner 2003-09-11 13:51:40 EDT
Crashed my SMP ix86 box again running the 431 kernel.  vmcore and log file on
yakko.test.redhat.com:/var/crash (root/standard testlab password)
Comment 37 Stephen Tweedie 2003-09-12 05:44:40 EDT
Ingo asked a few comments above whether we could reproduce without swap.

Well, my last 4 reproducer runs all died with solid lockups after exhausting
swap, so I've been forced to add another 1G LVM swap on my 256MB test box. 
Looks like swapless reproducer is a non-starter, at least with the reproducer
recipes we've been using so far.
Comment 38 Stephen Tweedie 2003-09-12 06:33:43 EDT
Created attachment 94438 [details]
Page-free debug patch

First of two debug booby-trap patches I've been using while chasing this:

Catches any double-frees simply by tracking the free state of a page
independently of page->count: set the free bit in __free_pages_ok(), clear it
in rmqueue(), and BUG() if it's already set/clear.
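
The scheme described above can be sketched in userspace roughly as follows. This is not the attached patch: struct page_sketch, the debug_free field, and both function names are hypothetical, and the functions return false at the points where the kernel version would BUG().

```c
#include <stdbool.h>

/* Toy stand-in for struct page; only the fields the sketch needs. */
struct page_sketch {
    int  count;        /* reference count, like page->count */
    bool debug_free;   /* hypothetical "page is currently free" bit */
};

/* Free path (the role of __free_pages_ok()): set the bit. */
static bool free_page_sketch(struct page_sketch *page)
{
    if (page->debug_free)
        return false;          /* bit already set: double free */
    page->debug_free = true;
    return true;
}

/* Allocation path (the role of rmqueue()): clear the bit. */
static bool alloc_page_sketch(struct page_sketch *page)
{
    if (!page->debug_free)
        return false;          /* bit already clear: page was not free */
    page->debug_free = false;
    page->count = 1;
    return true;
}
```

Keeping the bit separate from the reference count is the point of the design: a stray extra put can drive count to zero twice, but the bit only flips once per genuine free.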
Comment 39 Stephen Tweedie 2003-09-12 06:35:13 EDT
Created attachment 94439 [details]
VM list-operation debug patch

Second patch, relies on the page-free debug patch.  Trap any list add/del
operations in the VM if the page is already free.  Also adds a BUG() to a
couple of VM corner cases which are marked as "can't happen" and which
currently only printk a warning.
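
As a rough illustration of trapping list operations on pages that are already free, here is a userspace sketch. It assumes a per-page free bit like the one the page-free debug patch describes; struct page_sketch, debug_free, lru_add_checked() and lru_del_checked() are hypothetical names, and a false return marks where a kernel version would BUG().

```c
#include <stdbool.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Toy page with one list linkage and the hypothetical free bit. */
struct page_sketch {
    struct list_head lru;
    bool debug_free;
};

/* Link the page after head, unless the page is marked free. */
static bool lru_add_checked(struct page_sketch *page, struct list_head *head)
{
    if (page->debug_free)
        return false;              /* free page put back on a VM list */

    page->lru.next = head->next;
    page->lru.prev = head;
    head->next->prev = &page->lru;
    head->next = &page->lru;
    return true;
}

/* Unlink the page, unless the page is marked free. */
static bool lru_del_checked(struct page_sketch *page)
{
    if (page->debug_free)
        return false;              /* free page still on a VM list */

    page->lru.prev->next = page->lru.next;
    page->lru.next->prev = page->lru.prev;
    page->lru.next = NULL;
    page->lru.prev = NULL;
    return true;
}
```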
Comment 40 Larry Woodman 2003-09-12 16:08:51 EDT
Bug was found, fix is currently undergoing test.

Comment 41 Jay Turner 2003-09-13 17:46:13 EDT
I built a kernel which included Arjan's vmdebug.patch and Stephen's
reclaim-fix.patch against 431.  That kernel just oops'd on an SMP machine. 
Looking at the trace, it appears that the swapoff process got OOMkilled, which
led to an oops from the swapoff process.  Definitely not the same footprint as
the earlier oops, but I'm not sure if it's something to be worried about or not.
 When OOMkill kicks in, all bets are kind of off for what's going to happen. 
Anyway, the full log, the kernel in question (2.4.21-1.1931.2.431.jkt3entsmp) as
well as a bzip'd vmcore are available on yakko.test.redhat.com (root/standard
lab password.)
Comment 44 Stephen Tweedie 2003-09-15 04:30:58 EDT
EIP is at atomic_dec_and_lock [kernel] 0x10 (2.4.21-1.1931.2.431.jkt3entsmp)
Call Trace:   [<c0178950>] dput [kernel] 0x30 (0xcaa21f58)
[<c017df0d>] __mntput [kernel] 0x1d (0xcaa21f6c)
[<c0156d42>] sys_swapoff [kernel] 0x252 (0xcaa21f7c)
[<c01443fb>] sys_munmap [kernel] 0x4b (0xcaa21fa4)
Comment 48 Jim Paradis 2003-09-15 11:05:50 EDT
Just another datapoint: I applied arjan's and sct's patches to a .431 kernel and
tried my stress tests on the particular hammer box that tends to reproduce the
problem quickly.  Without the patches it crashed in under 40 minutes.  With the
patches the system ran the same stress tests for twelve hours and stayed up
afterwards.  I'd say that's a strong indication that this one is fixed.
Comment 49 Jim Paradis 2003-09-15 11:50:35 EDT
amending my previous comment to state that the *original* issue appears to be
fixed, the new swapoff issue is still obviously open.
Comment 50 Bill Nottingham 2004-10-11 11:26:12 EDT
Closing MODIFIED bugs as fixed. Please reopen if the problem persists.
