Description of problem:
System was running normally, when I suddenly got a wall from the kernel. System locked up shortly thereafter; X was totally unresponsive, could not ctrl-alt-f1 to console, but was able to alt-sysreq-e/s/u/b (or at least the b worked, couldn't see any output); hopefully that tells you something about how locked it was. The complete message from the kernel, from logs:
kernel: Bad page state at prep_new_page (in process 'X', page 03007560)
kernel: flags:0x00000004 mapping:00001c00 mapcount:0 count:0
kernel: Backtrace:
kernel: [<0212ddaf>] bad_page+0x56/0x80
kernel: [<0212e107>] prep_new_page+0x23/0x39
kernel: [<0212e541>] buffered_rmqueue+0x11c/0x144
kernel: [<0212e60c>] __alloc_pages+0xa3/0x282
kernel: [<02136725>] do_anonymous_page+0x59/0x133
kernel: [<0212e541>] buffered_rmqueue+0x11c/0x144
kernel: [<02136858>] do_no_page+0x59/0x249
kernel: [<02105faf>] do_IRQ+0x168/0x174
kernel: [<02136b66>] handle_mm_fault+0x70/0xe8
kernel: [<0211358c>] do_page_fault+0x170/0x4a0
kernel: [<021380d1>] do_mmap_pgoff+0x3af/0x60f
kernel: [<02103859>] __switch_to+0x17e/0x199
kernel: [<0228f5fd>] schedule+0x371/0x38e
kernel: [<0211341c>] do_page_fault+0x0/0x4a0
kernel: Trying to fix it up, but a reboot is needed
Version-Release number of selected component (if applicable):
kernel-2.6.7-1.517.i686.rpm
all other packages current rawhide as of 20040814 (i.e. haven't installed xorg-x11*6.7.99.2-5.i386.rpm yet)
How reproducible:
This is the first occurrence
Steps to Reproduce:
No idea yet. CPU was running 100% by md5crk (http://md5crk.com) and Azureus (bittorrent client) was busy downloading, so lots of disk and net activity. But I put the same load on the system on a daily basis. Upgraded to kernel build 517 from 503 just yesterday, so I'm suspecting the new kernel build for now.
which video driver is this?
Quoth the xorg.conf: Section "Device" Identifier "Videocard0" Driver "mga" VendorName "Matrox" BoardName "Crappy old Mil2/8N" VideoRam 8192 EndSection The card is a Millennium II 8 MB PCI. Should I be filing this under X instead? Should I go ahead and install the new Xorg?
I've seen the same "Bad page state at prep_new_page" on a Fedora Core 2 box with kernel 2.6.8-1.521. No X-windows at all (runlevel=3). The kernel error was reported against a user's executable. Suspecting RAM errors, I ran MEMTEST86 v3.1 for several days, but without any RAM errors whatsoever. We've only seen these errors after upgrading to the Fedora kernel 2.6 series, so I suspect a kernel bug is biting us here.
Another piece of info: The error occurred under heavy CPU load (single process using close to 100%). I've now found a second instance of the error on a different machine with identical software setup. This machine also passes the MEMTEST86 test perfectly. The "Bad page state at prep_new_page" occurred for the /usr/bin/updatedb executable. So the error does not appear to be tied to any particular application; it seems to be related to the kernel.
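For anyone comparing reports across machines, here is a minimal sketch of how the "Bad page state" header lines could be pulled out of syslog text and tallied by process name. The function names are made up for illustration, and the regular expression is only assumed to match the message format quoted in this report; other kernel versions may word it differently.

```python
import re
from collections import Counter

# Matches the header line of a report as quoted in this bug, e.g.
# "Bad page state at prep_new_page (in process 'X', page 03007560)".
BAD_PAGE_RE = re.compile(
    r"Bad page state at (?P<where>\w+) "
    r"\(in process '(?P<process>[^']+)', page (?P<page>[0-9a-fA-F]+)\)"
)


def find_bad_page_reports(lines):
    """Return one dict (where, process, page) per matching log line."""
    reports = []
    for line in lines:
        match = BAD_PAGE_RE.search(line)
        if match:
            reports.append(match.groupdict())
    return reports


def count_by_process(reports):
    """Count how many reports name each process."""
    return Counter(report["process"] for report in reports)
```

If the counts are spread over unrelated programs (X, updatedb, mplayer), that is consistent with the fault not being specific to one application, though it does not by itself distinguish a kernel bug from bad hardware.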
are you still getting this with the latest update kernels ?
I haven't seen this error again in the past 2 months, even though I'm still running the same old kernel 2.6.8-1.521 on many nodes of a cluster. Some of my IBM ThinkCentre S50 PCs (Intel i865 chipset) do freeze occasionally, but *nothing* is logged to the syslog, no output on the screen. Maybe the problem is related to hardware errors (RAM ?), even though MEMTEST86 was never able to find any bad RAM modules. Or maybe the problem is with ACPI ? I currently have no idea what's causing these crashes...
bad page states have been indicative of faulty hardware a number of times, so it's a possibility. Given that you were running the cpu at 100% for a long period of time, it's likely the system was running hot. Is your cooling adequate ? Strong enough power supply ?
We're running these hundreds of PCs on shelves. Cooling should be OK, inlet temperature is between 20-28 Centigrade. If the CPU runs hot, the P4 CPU should do thermal throttling instead of freezing, right ? Power supply is a good question - only the IBM engineers would know... Hopefully professional office PCs are designed to be reliable enough. Maybe some of the PCs have marginal specs on some components.
You mentioned above you hadn't seen this in 2 months, that was January.. Any recurrence, or is it safe to close this ?
Since I (original reporter, not the guy you were querying) reported, I have changed a lot of stuff in my system; in particular I'm now running a GeForce 6600 using the closed-source nvidia drivers, so my data points are likely no longer of use to you; just hit delete now. I went many months without seeing this error again (or any kind of oops/panic/crash), including roughly a month with the new card using kernel build 1369 (which shipped with FC4). However I have seen the "Bad page state" a couple of times in the last month, on two different FC4 kernels (figured there was really no point in tracking the rawhide kernels with the binary drivers)
2.6.12-1.1447_FC4:
Sep 17 17:08:57 ip68-110-7-34 kernel: Bad page state at prep_new_page (in process 'mplayer', page c134db20)
Sep 17 17:08:57 ip68-110-7-34 kernel: flags:0x20020008 mapping:0000e200 mapcount:0 count:0 (Tainted: P )
Sep 17 17:08:57 ip68-110-7-34 kernel: Backtrace:
Sep 17 17:08:57 ip68-110-7-34 kernel: [<c0156eed>] bad_page+0x8c/0xc1
Sep 17 17:08:57 ip68-110-7-34 kernel: [<c0157473>] prep_new_page+0x19/0x48
Sep 17 17:08:57 ip68-110-7-34 kernel: [<c0157c03>] buffered_rmqueue+0xb8/0x31b
Sep 17 17:08:57 ip68-110-7-34 kernel: [<f0830b43>] do_get_write_access+0x32f/0x691 [jbd]
Sep 17 17:08:57 ip68-110-7-34 kernel: [<c017fece>] __getblk+0x2c/0x52
Sep 17 17:08:57 ip68-110-7-34 kernel: [<c0157fc4>] __alloc_pages+0xd3/0x3ef
Sep 17 17:08:57 ip68-110-7-34 kernel: [<c015a7e9>] __do_page_cache_readahead+0xc7/0x118
Sep 17 17:08:57 ip68-110-7-34 kernel: [<c015a94d>] blockable_page_cache_readahead+0x53/0xbc
Sep 17 17:08:57 ip68-110-7-34 kernel: [<c015aa11>] make_ahead_window+0x5b/0x98
Sep 17 17:08:57 ip68-110-7-34 kernel: [<c015aad3>] page_cache_readahead+0x85/0x161
Sep 17 17:08:57 ip68-110-7-34 kernel: [<c015363e>] do_generic_mapping_read+0x3bc/0x4b0
Sep 17 17:08:57 ip68-110-7-34 kernel: [<c01538be>] __generic_file_aio_read+0xb2/0x1fe
Sep 17 17:08:57 ip68-110-7-34 kernel: [<c0153732>] file_read_actor+0x0/0xda
Sep 17 17:08:57 ip68-110-7-34 kernel: [<c01f41fb>] avc_has_perm+0x4e/0x63
Sep 17 17:08:57 ip68-110-7-34 kernel: [<c0153a48>] generic_file_aio_read+0x3e/0x4f
Sep 17 17:08:57 ip68-110-7-34 kernel: [<c017bf6e>] do_sync_read+0x9e/0xec
Sep 17 17:08:57 ip68-110-7-34 kernel: [<c025ec8d>] opost_block+0x7a/0x128
Sep 17 17:08:57 ip68-110-7-34 kernel: [<c01f7f2b>] selinux_file_permission+0xe0/0x152
Sep 17 17:08:57 ip68-110-7-34 kernel: [<c0140512>] autoremove_wake_function+0x0/0x37
Sep 17 17:08:57 ip68-110-7-34 kernel: [<c017bed0>] do_sync_read+0x0/0xec
Sep 17 17:08:57 ip68-110-7-34 kernel: [<c017c058>] vfs_read+0x9c/0x10e
Sep 17 17:08:57 ip68-110-7-34 kernel: [<c017c307>] sys_read+0x41/0x6a
Sep 17 17:08:58 ip68-110-7-34 kernel: [<c0103a61>] syscall_call+0x7/0xb
Sep 17 17:08:58 ip68-110-7-34 kernel: Trying to fix it up, but a reboot is needed
2.6.13-1.1524_FC4: (from FC4 updates-testing)
Sep 25 23:19:40 ip68-110-7-34 kernel: Bad page state at prep_new_page (in process 'mplayer', page c155df00)
Sep 25 23:19:40 ip68-110-7-34 kernel: flags:0x4000a208 mapping:00000000 mapcount:0 count:0 (Tainted: P )
Sep 25 23:19:40 ip68-110-7-34 kernel: Backtrace:
Sep 25 23:19:40 ip68-110-7-34 kernel: [<c016f17d>] bad_page+0x8c/0xc1
Sep 25 23:19:40 ip68-110-7-34 kernel: [<c016f87c>] prep_new_page+0x1a/0x60
Sep 25 23:19:40 ip68-110-7-34 kernel: [<c0170293>] buffered_rmqueue+0xb8/0x43b
Sep 25 23:19:40 ip68-110-7-34 kernel: [<c0170788>] __alloc_pages+0xe7/0x3ff
Sep 25 23:19:40 ip68-110-7-34 kernel: [<c0173e10>] __do_page_cache_readahead+0xc9/0x11a
Sep 25 23:19:40 ip68-110-7-34 kernel: [<c0173f74>] blockable_page_cache_readahead+0x53/0xbc
Sep 25 23:19:40 ip68-110-7-34 kernel: [<c0174038>] make_ahead_window+0x5b/0x98
Sep 25 23:19:40 ip68-110-7-34 kernel: [<c01740fa>] page_cache_readahead+0x85/0x162
Sep 25 23:19:40 ip68-110-7-34 kernel: [<c016af70>] file_read_actor+0x77/0xdf
Sep 25 23:19:40 ip68-110-7-34 kernel: [<c016ae05>] do_generic_mapping_read+0x3c9/0x4bd
Sep 25 23:19:40 ip68-110-7-34 kernel: [<c016b08a>] __generic_file_aio_read+0xb2/0x1fe
Sep 25 23:19:40 ip68-110-7-34 kernel: [<c016aef9>] file_read_actor+0x0/0xdf
Sep 25 23:19:41 ip68-110-7-34 kernel: [<c016b225>] generic_file_read+0x0/0xbd
Sep 25 23:19:41 ip68-110-7-34 kernel: [<c016b2c4>] generic_file_read+0x9f/0xbd
Sep 25 23:19:41 ip68-110-7-34 kernel: [<c01519c2>] autoremove_wake_function+0x0/0x37
Sep 25 23:19:41 ip68-110-7-34 kernel: [<c01a0db3>] vfs_read+0xa0/0x158
Sep 25 23:19:41 ip68-110-7-34 kernel: [<c01a1120>] sys_read+0x41/0x6a
Sep 25 23:19:41 ip68-110-7-34 kernel: [<c0104465>] syscall_call+0x7/0xb
Sep 25 23:19:41 ip68-110-7-34 kernel: Trying to fix it up, but a reboot is needed
Other reasons you probably shouldn't even be reading this report:
* The first occurrence after each boot happened in process 'mplayer', suspiciously video-related app
* Unless I'm interpreting this wrong, /proc/mtrr indicates the page addresses where the problem occurred are in video memory:
reg02: base=0xc0000000 (3072MB), size= 256MB: write-combining, count=1
Have been running kernel 1526 since Sep 30, haven't seen it yet on that build.
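The /proc/mtrr comparison above can be sketched as a simple range check. This is only an illustration: the helper names are hypothetical, and the regular expression assumes the line layout quoted in this comment. One caveat on the interpretation itself: MTRR ranges describe physical addresses, whereas the `page` value in the log is, as far as I know, the kernel's pointer to its page descriptor, and kernel virtual addresses on 32-bit x86 commonly start at 0xc0000000. A numeric overlap may therefore be a coincidence rather than evidence that the page lies in video memory.

```python
import re

# Assumed layout, taken from the line quoted in this report:
# "reg02: base=0xc0000000 (3072MB), size= 256MB: write-combining, count=1"
MTRR_RE = re.compile(
    r"reg\d+: base=0x(?P<base>[0-9a-fA-F]+) \(\s*\d+MB\), "
    r"size=\s*(?P<size>\d+)(?P<unit>[KM])B"
)


def parse_mtrr_line(line):
    """Return (start, end) of the range, end exclusive, or None if no match."""
    match = MTRR_RE.search(line)
    if match is None:
        return None
    base = int(match.group("base"), 16)
    factor = 1024 if match.group("unit") == "K" else 1024 * 1024
    return base, base + int(match.group("size")) * factor


def in_range(addr, rng):
    """True if addr falls numerically inside the half-open range."""
    start, end = rng
    return start <= addr < end


mtrr = parse_mtrr_line(
    "reg02: base=0xc0000000 (3072MB), size= 256MB: write-combining, count=1"
)
for page in (0xC134DB20, 0xC155DF00):
    print(hex(page), in_range(page, mtrr))
```

Both logged values fall numerically inside the range, which is what prompted the observation; whether that comparison is meaningful depends on the address-space caveat above.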
hmm, call it a cop out, but I really don't trust that driver, and bad page states have been seen in other cases with that loaded in the past too. I'm going to close this I think. If you manage to reproduce without having had it loaded, I'm all ears.