Bug 129965 - crash: Bad page state at prep_new_page
Summary: crash: Bad page state at prep_new_page
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Dave Jones
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2004-08-15 23:04 UTC by Ellen Shull
Modified: 2015-01-04 22:08 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-10-06 02:36:13 UTC
Type: ---
Embargoed:



Description Ellen Shull 2004-08-15 23:04:48 UTC
Description of problem: 
System was running normally when I suddenly got a wall message from the 
kernel. The system locked up shortly thereafter; X was totally 
unresponsive and I could not Ctrl-Alt-F1 to a console, but I was able to 
use Alt-SysRq-E/S/U/B (or at least the B worked; I couldn't see any 
output). Hopefully that tells you something about how locked it was. 
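For readers unfamiliar with the key sequence above, here is a small sketch of what each Magic SysRq key does, with descriptions paraphrased from the kernel's sysrq documentation. The same keys can be sent from a root shell by writing them to /proc/sysrq-trigger on kernels built with CONFIG_MAGIC_SYSRQ. The helper names are hypothetical, and the write is off by default because 'b' reboots the machine at once.

```python
from pathlib import Path

# Keys used in the report, paraphrased from the kernel's sysrq documentation.
SYSRQ_ACTIONS = {
    "e": "send SIGTERM to all processes except init",
    "s": "attempt an emergency sync of all mounted filesystems",
    "u": "attempt to remount all mounted filesystems read-only",
    "b": "reboot immediately, without syncing or unmounting",
}

# Interface equivalent to the keyboard combination (root only).
SYSRQ_TRIGGER = Path("/proc/sysrq-trigger")


def describe(sequence: str) -> list[str]:
    """Return a human-readable description for each key in the sequence."""
    return [f"{key}: {SYSRQ_ACTIONS[key]}" for key in sequence.lower()]


def trigger(key: str, dry_run: bool = True) -> str:
    """Write one SysRq key to /proc/sysrq-trigger (hypothetical helper).

    With dry_run=True (the default) nothing is written; the function only
    reports what it would do.
    """
    if key not in SYSRQ_ACTIONS:
        raise ValueError(f"unsupported key: {key!r}")
    if not dry_run:
        SYSRQ_TRIGGER.write_text(key)
    return f"{'would write' if dry_run else 'wrote'} {key!r} to {SYSRQ_TRIGGER}"


if __name__ == "__main__":
    for line in describe("esub"):
        print(line)
    print(trigger("s"))
```

The order E, S, U, B asks processes to exit, flushes dirty data, makes filesystems read-only, and only then reboots, so that the final reboot loses as little as possible.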
 
The complete message from the kernel, from logs: 
 
kernel: Bad page state at prep_new_page (in process 'X', page 03007560) 
kernel: flags:0x00000004 mapping:00001c00 mapcount:0 count:0 
kernel: Backtrace: 
kernel:  [<0212ddaf>] bad_page+0x56/0x80 
kernel:  [<0212e107>] prep_new_page+0x23/0x39 
kernel:  [<0212e541>] buffered_rmqueue+0x11c/0x144 
kernel:  [<0212e60c>] __alloc_pages+0xa3/0x282 
kernel:  [<02136725>] do_anonymous_page+0x59/0x133 
kernel:  [<0212e541>] buffered_rmqueue+0x11c/0x144 
kernel:  [<02136858>] do_no_page+0x59/0x249 
kernel:  [<02105faf>] do_IRQ+0x168/0x174 
kernel:  [<02136b66>] handle_mm_fault+0x70/0xe8 
kernel:  [<0211358c>] do_page_fault+0x170/0x4a0 
kernel:  [<021380d1>] do_mmap_pgoff+0x3af/0x60f 
kernel:  [<02103859>] __switch_to+0x17e/0x199 
kernel:  [<0228f5fd>] schedule+0x371/0x38e 
kernel:  [<0211341c>] do_page_fault+0x0/0x4a0 
kernel: Trying to fix it up, but a reboot is needed 
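As a rough, simplified model of what this message means (our reading of 2.6-era mm/page_alloc.c, not the kernel's code): when the allocator hands out a page, prep_new_page() expects it to look free, with a NULL mapping, a mapcount of 0, and none of a set of page-state flags set. If any check fails, it calls bad_page(), which prints a report like the one above, clears the offending fields, and lets the allocation continue. In this report the mapping field alone (00001c00, which does not look like a plausible kernel pointer) is enough to trip the check. The flag mask is a hypothetical parameter because the PG_* bit numbers differ between kernel versions.

```python
from dataclasses import dataclass


@dataclass
class PageState:
    """The four fields bad_page() prints for the offending page."""
    flags: int
    mapping: int
    mapcount: int
    count: int  # printed in the report, but not part of this check


def bad_page_reasons(page: PageState, checked_flags_mask: int = 0) -> list[str]:
    """Return why a page just taken off the free list would fail the check.

    An empty list means the page looks like a clean free page.
    checked_flags_mask is HYPOTHETICAL: supply the kernel-version-specific
    mask of PG_* bits (private, locked, lru, active, dirty, ...); the
    default of 0 skips the flag test entirely.
    """
    reasons = []
    if page.mapping != 0:
        reasons.append(f"mapping is non-NULL (0x{page.mapping:08x})")
    if page.mapcount != 0:
        reasons.append(f"page is still mapped (mapcount={page.mapcount})")
    if page.flags & checked_flags_mask:
        reasons.append(f"checked flag bits set (0x{page.flags & checked_flags_mask:08x})")
    return reasons


if __name__ == "__main__":
    # Values copied from the 2.6.7-1.517 report above.
    first = PageState(flags=0x00000004, mapping=0x00001C00, mapcount=0, count=0)
    print(bad_page_reasons(first))  # prints ['mapping is non-NULL (0x00001c00)']
```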
 
Version-Release number of selected component (if applicable): 
kernel-2.6.7-1.517.i686.rpm 
 
All other packages are current rawhide as of 20040814 (i.e. I haven't 
installed xorg-x11*6.7.99.2-5.i386.rpm yet). 
 
How reproducible: 
This is the first occurrence  
 
Steps to Reproduce: 
No idea yet. The CPU was running at 100% with md5crk (http://md5crk.com), and 
Azureus (a BitTorrent client) was busy downloading, so there was lots of disk and 
network activity. But I put the same load on the system on a daily 
basis. I upgraded to kernel build 517 from 503 just yesterday, so I'm 
suspecting the new kernel build for now.

Comment 1 Arjan van de Ven 2004-08-16 06:48:38 UTC
which video driver is this?

Comment 2 Ellen Shull 2004-08-16 08:09:38 UTC
Quoth the xorg.conf: 
 
Section "Device" 
        Identifier  "Videocard0" 
        Driver      "mga" 
        VendorName  "Matrox" 
        BoardName   "Crappy old Mil2/8N" 
        VideoRam    8192 
EndSection 
 
The card is a Millennium II 8 MB PCI. 
 
Should I be filing this under X instead?  Should I go ahead and 
install the new Xorg? 

Comment 3 Ole Holm Nielsen 2004-10-12 11:18:10 UTC
I've seen the same "Bad page state at prep_new_page" on a Fedora Core 2
box with kernel 2.6.8-1.521. No X at all (runlevel 3).
The kernel error was reported against a user's executable.
Suspecting RAM errors, I ran MEMTEST86 v3.1 for several days, but
it found no RAM errors whatsoever.
We've only seen these errors after upgrading to the Fedora kernel 2.6
series, so I suspect a kernel bug is biting us here.

Comment 4 Ole Holm Nielsen 2004-10-12 12:02:50 UTC
Another piece of info:  The error occurred under heavy CPU load
(single process using close to 100%).

I've now found a second instance of the error on a different machine
with identical software setup.  This machine also passes the MEMTEST86
test perfectly. The "Bad page state at prep_new_page" occurred for 
the /usr/bin/updatedb executable. Evidently, the error has nothing 
to do with any specific application; it seems to be related 
to the kernel.

Comment 5 Dave Jones 2005-01-17 08:10:17 UTC
are you still getting this with the latest update kernels ?

Comment 6 Ole Holm Nielsen 2005-01-17 08:25:09 UTC
I haven't seen this error again in the past 2 months, even though I'm
still running the same old kernel 2.6.8-1.521 on many nodes of a cluster.
Some of my IBM ThinkCentre S50 PCs (Intel i865 chipset) do freeze
occasionally, but *nothing* is logged to syslog and there is no output on
the screen.
Maybe the problem is related to hardware errors (RAM?), even though
MEMTEST86 was never able to find any bad RAM modules. Or maybe the problem
is with ACPI? I currently have no idea what's causing these crashes...

Comment 7 Dave Jones 2005-01-17 08:31:27 UTC
bad page states have been indicative of faulty hardware a number of times, so
it's a possibility. Given that you were running the cpu at 100% for a long
period of time, it's likely the system was running hot. Is your cooling
adequate? Strong enough power supply?


Comment 8 Ole Holm Nielsen 2005-01-17 08:52:58 UTC
We're running these hundreds of PCs on shelves. Cooling should be OK;
the inlet temperature is between 20 and 28 degrees Celsius. If the CPU runs hot,
the P4 should do thermal throttling instead of freezing, right?
Power supply is a good question; only the IBM engineers would know...
Hopefully professional office PCs are designed to be reliable enough.
Maybe some of the PCs have marginal specs on some components.

Comment 9 Dave Jones 2005-10-06 00:31:38 UTC
You mentioned above you hadn't seen this in 2 months; that was in January.
Any recurrence, or is it safe to close this?


Comment 10 Ellen Shull 2005-10-06 01:18:30 UTC
Since I reported this (I'm the original reporter, not the person you were querying), 
I have changed a lot in my system; in particular, I'm now running a GeForce 
6600 using the closed-source nvidia drivers, so my data points are likely no 
longer of use to you. Feel free to just hit delete now. 
 
I went many months without seeing this error again (or any kind of 
oops/panic/crash), including roughly a month with the new card using kernel 
build 1369 (which shipped with FC4). However, I have seen the "Bad page state" 
a couple of times in the last month, on two different FC4 kernels (I figured 
there was really no point in tracking the rawhide kernels with the binary 
drivers): 
 
2.6.12-1.1447_FC4: 
Sep 17 17:08:57 ip68-110-7-34 kernel: Bad page state at prep_new_page (in process 'mplayer', page c134db20) 
Sep 17 17:08:57 ip68-110-7-34 kernel: flags:0x20020008 mapping:0000e200 mapcount:0 count:0 (Tainted: P     ) 
Sep 17 17:08:57 ip68-110-7-34 kernel: Backtrace: 
Sep 17 17:08:57 ip68-110-7-34 kernel:  [<c0156eed>] bad_page+0x8c/0xc1 
Sep 17 17:08:57 ip68-110-7-34 kernel:  [<c0157473>] prep_new_page+0x19/0x48 
Sep 17 17:08:57 ip68-110-7-34 kernel:  [<c0157c03>] buffered_rmqueue+0xb8/0x31b 
Sep 17 17:08:57 ip68-110-7-34 kernel:  [<f0830b43>] do_get_write_access+0x32f/0x691 [jbd] 
Sep 17 17:08:57 ip68-110-7-34 kernel:  [<c017fece>] __getblk+0x2c/0x52 
Sep 17 17:08:57 ip68-110-7-34 kernel:  [<c0157fc4>] __alloc_pages+0xd3/0x3ef 
Sep 17 17:08:57 ip68-110-7-34 kernel:  [<c015a7e9>] __do_page_cache_readahead+0xc7/0x118 
Sep 17 17:08:57 ip68-110-7-34 kernel:  [<c015a94d>] blockable_page_cache_readahead+0x53/0xbc 
Sep 17 17:08:57 ip68-110-7-34 kernel:  [<c015aa11>] make_ahead_window+0x5b/0x98 
Sep 17 17:08:57 ip68-110-7-34 kernel:  [<c015aad3>] page_cache_readahead+0x85/0x161 
Sep 17 17:08:57 ip68-110-7-34 kernel:  [<c015363e>] do_generic_mapping_read+0x3bc/0x4b0 
Sep 17 17:08:57 ip68-110-7-34 kernel:  [<c01538be>] __generic_file_aio_read+0xb2/0x1fe 
Sep 17 17:08:57 ip68-110-7-34 kernel:  [<c0153732>] file_read_actor+0x0/0xda 
Sep 17 17:08:57 ip68-110-7-34 kernel:  [<c01f41fb>] avc_has_perm+0x4e/0x63 
Sep 17 17:08:57 ip68-110-7-34 kernel:  [<c0153a48>] generic_file_aio_read+0x3e/0x4f 
Sep 17 17:08:57 ip68-110-7-34 kernel:  [<c017bf6e>] do_sync_read+0x9e/0xec 
Sep 17 17:08:57 ip68-110-7-34 kernel:  [<c025ec8d>] opost_block+0x7a/0x128 
Sep 17 17:08:57 ip68-110-7-34 kernel:  [<c01f7f2b>] selinux_file_permission+0xe0/0x152 
Sep 17 17:08:57 ip68-110-7-34 kernel:  [<c0140512>] autoremove_wake_function+0x0/0x37 
Sep 17 17:08:57 ip68-110-7-34 kernel:  [<c017bed0>] do_sync_read+0x0/0xec 
Sep 17 17:08:57 ip68-110-7-34 kernel:  [<c017c058>] vfs_read+0x9c/0x10e 
Sep 17 17:08:57 ip68-110-7-34 kernel:  [<c017c307>] sys_read+0x41/0x6a 
Sep 17 17:08:58 ip68-110-7-34 kernel:  [<c0103a61>] syscall_call+0x7/0xb 
Sep 17 17:08:58 ip68-110-7-34 kernel: Trying to fix it up, but a reboot is needed 
 
2.6.13-1.1524_FC4: (from FC4 updates-testing) 
Sep 25 23:19:40 ip68-110-7-34 kernel: Bad page state at prep_new_page (in process 'mplayer', page c155df00) 
Sep 25 23:19:40 ip68-110-7-34 kernel: flags:0x4000a208 mapping:00000000 mapcount:0 count:0 (Tainted: P     ) 
Sep 25 23:19:40 ip68-110-7-34 kernel: Backtrace: 
Sep 25 23:19:40 ip68-110-7-34 kernel:  [<c016f17d>] bad_page+0x8c/0xc1 
Sep 25 23:19:40 ip68-110-7-34 kernel:  [<c016f87c>] prep_new_page+0x1a/0x60 
Sep 25 23:19:40 ip68-110-7-34 kernel:  [<c0170293>] buffered_rmqueue+0xb8/0x43b 
Sep 25 23:19:40 ip68-110-7-34 kernel:  [<c0170788>] __alloc_pages+0xe7/0x3ff 
Sep 25 23:19:40 ip68-110-7-34 kernel:  [<c0173e10>] __do_page_cache_readahead+0xc9/0x11a 
Sep 25 23:19:40 ip68-110-7-34 kernel:  [<c0173f74>] blockable_page_cache_readahead+0x53/0xbc 
Sep 25 23:19:40 ip68-110-7-34 kernel:  [<c0174038>] make_ahead_window+0x5b/0x98 
Sep 25 23:19:40 ip68-110-7-34 kernel:  [<c01740fa>] page_cache_readahead+0x85/0x162 
Sep 25 23:19:40 ip68-110-7-34 kernel:  [<c016af70>] file_read_actor+0x77/0xdf 
Sep 25 23:19:40 ip68-110-7-34 kernel:  [<c016ae05>] do_generic_mapping_read+0x3c9/0x4bd 
Sep 25 23:19:40 ip68-110-7-34 kernel:  [<c016b08a>] __generic_file_aio_read+0xb2/0x1fe 
Sep 25 23:19:40 ip68-110-7-34 kernel:  [<c016aef9>] file_read_actor+0x0/0xdf 
Sep 25 23:19:41 ip68-110-7-34 kernel:  [<c016b225>] generic_file_read+0x0/0xbd 
Sep 25 23:19:41 ip68-110-7-34 kernel:  [<c016b2c4>] generic_file_read+0x9f/0xbd 
Sep 25 23:19:41 ip68-110-7-34 kernel:  [<c01519c2>] autoremove_wake_function+0x0/0x37 
Sep 25 23:19:41 ip68-110-7-34 kernel:  [<c01a0db3>] vfs_read+0xa0/0x158 
Sep 25 23:19:41 ip68-110-7-34 kernel:  [<c01a1120>] sys_read+0x41/0x6a 
Sep 25 23:19:41 ip68-110-7-34 kernel:  [<c0104465>] syscall_call+0x7/0xb 
Sep 25 23:19:41 ip68-110-7-34 kernel: Trying to fix it up, but a reboot is needed 
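Both FC4 traces reach __alloc_pages through the same page-cache readahead path (page_cache_readahead, make_ahead_window, blockable_page_cache_readahead, __do_page_cache_readahead). Below is a minimal sketch for pulling the function names out of backtraces like these and listing the frames two traces share. It assumes each frame sits on one line as `[<address>] function+0xoffset/0xsize`, optionally followed by a module name in brackets. The helper names are hypothetical.

```python
import re

# One backtrace frame, e.g. "[<c0157473>] prep_new_page+0x19/0x48" or
# "[<f0830b43>] do_get_write_access+0x32f/0x691 [jbd]".
FRAME_RE = re.compile(r"\[<([0-9a-f]+)>\]\s+([A-Za-z0-9_.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def frame_functions(trace: str) -> list[str]:
    """Return the function name of every frame in the trace, in order."""
    return [m.group(2) for m in FRAME_RE.finditer(trace)]


def common_frames(trace_a: str, trace_b: str) -> list[str]:
    """Return functions that appear in both traces, in trace_a's order.

    A function listed twice in trace_a is reported twice.
    """
    in_b = set(frame_functions(trace_b))
    return [name for name in frame_functions(trace_a) if name in in_b]


if __name__ == "__main__":
    # Short excerpts from the 2.6.12 and 2.6.13 traces above.
    trace_1447 = """
    [<c0156eed>] bad_page+0x8c/0xc1
    [<c0157c03>] buffered_rmqueue+0xb8/0x31b
    [<f0830b43>] do_get_write_access+0x32f/0x691 [jbd]
    [<c015aad3>] page_cache_readahead+0x85/0x161
    """
    trace_1524 = """
    [<c016f17d>] bad_page+0x8c/0xc1
    [<c0170293>] buffered_rmqueue+0xb8/0x43b
    [<c01740fa>] page_cache_readahead+0x85/0x162
    """
    print(common_frames(trace_1447, trace_1524))
    # prints ['bad_page', 'buffered_rmqueue', 'page_cache_readahead']
```

On i386 kernels of this era built without frame pointers, the stack dump may also include stale entries (often with a +0x0 offset, such as file_read_actor+0x0). Shared frames therefore suggest, rather than prove, a common call path.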
 
Other reasons you probably shouldn't even be reading this report: 
* The first occurrence after each boot happened in process 'mplayer', a 
suspiciously video-related app. 
* Unless I'm interpreting this wrong, /proc/mtrr indicates that the page addresses 
where the problem occurred are in video memory: 
reg02: base=0xc0000000 (3072MB), size= 256MB: write-combining, count=1 
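The reporter's reading can be checked mechanically with a minimal sketch that parses the quoted /proc/mtrr line and tests whether the two page values fall inside it. One caveat, stated as our understanding rather than a conclusion of this bug: on 2.6 kernels, bad_page() prints the address of the page's struct page descriptor, which is a kernel virtual address and not the physical address of the page. On i386 with the default 3G/1G split (assumed here for these FC4 kernels), lowmem kernel virtual addresses start at PAGE_OFFSET = 0xc0000000. A match against a physical MTRR range that starts at the same number may therefore be a coincidence. The regex and helper names are hypothetical.

```python
import re

# Assumes the /proc/mtrr line format quoted above, with sizes in MB.
MTRR_RE = re.compile(
    r"reg(\d+): base=0x([0-9a-f]+) \(\s*\d+MB\), size=\s*(\d+)MB: ([\w-]+), count=(\d+)"
)

# Assumption: default i386 3G/1G split, where lowmem kernel virtual
# addresses start at PAGE_OFFSET = 0xc0000000.
PAGE_OFFSET_3G_1G = 0xC0000000


def parse_mtrr_line(line: str) -> tuple[int, int, str]:
    """Return (start, end_exclusive, memory_type) for one /proc/mtrr line."""
    m = MTRR_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognised /proc/mtrr line: {line!r}")
    start = int(m.group(2), 16)
    end = start + int(m.group(3)) * 1024 * 1024
    return start, end, m.group(4)


def in_range(address: int, start: int, end: int) -> bool:
    """True if address lies in the half-open range [start, end)."""
    return start <= address < end


if __name__ == "__main__":
    line = "reg02: base=0xc0000000 (3072MB), size= 256MB: write-combining, count=1"
    start, end, kind = parse_mtrr_line(line)
    for page in (0xC134DB20, 0xC155DF00):  # values printed by bad_page() above
        print(f"0x{page:08x}: in MTRR range={in_range(page, start, end)}, "
              f"at/above PAGE_OFFSET={page >= PAGE_OFFSET_3G_1G}")
```

Both tests come out true for both values. The printed numbers are consistent with either reading, so the overlap by itself does not show that the pages were in video memory.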
 
I have been running kernel build 1526 since Sep 30 and haven't seen it on that build yet. 

Comment 11 Dave Jones 2005-10-06 02:36:13 UTC
hmm, call it a cop out, but I really don't trust that driver, and bad page
states have been seen in other cases with that loaded in the past too.
I'm going to close this I think.

If you manage to reproduce without having had it loaded, I'm all ears.
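The reasoning above rests on the "Tainted: P" marker in the FC4 logs, which records that a proprietary (non-GPL) module, here the nvidia driver, was loaded when the error occurred. Below is a small decoder sketch for that marker. It assumes the six-character layout printed by 2.6.12/2.6.13-era kernels (P or G, then F, S, R, M, B, with a space for each unset flag). Later kernels print more flags, so the table is tied to that era. The helper name is hypothetical.

```python
# Order of the six flag characters after "Tainted: " in 2.6.12/2.6.13-era
# kernels (our understanding; later kernels added more flags).
TAINT_FLAGS = [
    ("P", "proprietary (non-GPL) module loaded"),  # printed as 'G' when unset
    ("F", "module was force-loaded"),
    ("S", "SMP kernel on CPUs not certified for SMP"),
    ("R", "module was force-unloaded"),
    ("M", "machine check exception occurred"),
    ("B", "bad page state was reported"),
]


def decode_taint(line: str) -> list[str]:
    """Return descriptions of the taint flags set in a 'Tainted: ...' string."""
    marker = "Tainted: "
    start = line.find(marker)
    if start < 0:
        return []  # no taint string in this line
    begin = start + len(marker)
    chars = line[begin:begin + len(TAINT_FLAGS)]
    chars = chars.ljust(len(TAINT_FLAGS))  # pad if trailing spaces were trimmed
    return [desc for (flag, desc), c in zip(TAINT_FLAGS, chars) if c == flag]


if __name__ == "__main__":
    print(decode_taint("mapcount:0 count:0 (Tainted: P     )"))
    # prints ['proprietary (non-GPL) module loaded']
```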


