Description of problem:
Although they seem harmless, and a vmcore can still be captured successfully afterwards, there were numerous warnings while the capture kernel was booting up. The RHEL 5.1 version of kexec-tools (1.101-194.el5) does not have this problem.

...
Freeing initrd memory: 5664kB freed
Bad page state in process 'swapper'
page:e0000000110eeed8 flags:0x0000000000000000 mapping:0000000000000000 mapcount:1 count:0 (Not tainted)
Trying to fix it up, but a reboot is needed
Backtrace:
Call Trace:
 [<a000000100013ae0>] show_stack+0x40/0xa0  sp=e000000015697b40 bsp=e0000000156912a8
 [<a000000100013b70>] dump_stack+0x30/0x60  sp=e000000015697d10 bsp=e000000015691290
 [<a00000010010a260>] bad_page+0xe0/0x160  sp=e000000015697d10 bsp=e000000015691248
 [<a00000010010aa30>] free_hot_cold_page+0x110/0x320  sp=e000000015697d20 bsp=e000000015691200
 [<a00000010010ad70>] free_hot_page+0x30/0x60  sp=e000000015697d20 bsp=e0000000156911d8
 [<a00000010010d010>] __free_pages+0xb0/0x100  sp=e000000015697d20 bsp=e0000000156911b0
 [<a00000010010d1e0>] free_pages+0x180/0x1a0  sp=e000000015697d20 bsp=e000000015691188
 [<a000000100760dc0>] free_initrd_mem+0x1e0/0x2e0  sp=e000000015697d20 bsp=e000000015691160
 [<a000000100753410>] free_initrd+0x130/0x180  sp=e000000015697d30 bsp=e000000015691128
 [<a000000100756460>] populate_rootfs+0x1e0/0x200  sp=e000000015697d30 bsp=e0000000156910f8
 [<a0000001007487d0>] init+0x3d0/0x780  sp=e000000015697d30 bsp=e0000000156910c8
 [<a0000001000121b0>] kernel_thread_helper+0x30/0x60  sp=e000000015697e30 bsp=e0000000156910a0
 [<a0000001000090c0>] start_kernel_thread+0x20/0x40  sp=e000000015697e30 bsp=e0000000156910a0
Bad page state in process 'swapper'
page:e0000000110eef10 flags:0x0000000000000000 mapping:0000000000000000 mapcount:1 count:0 (Tainted: G B)
Trying to fix it up, but a reboot is needed
...
Full log: https://bugzilla.redhat.com/attachment.cgi?id=296597

Version-Release number of selected component (if applicable):
kexec-tools-1.102pre-10.el5 with the patch from BZ 434927#28
kernel-2.6.18-83.el5
RHEL5.2-Server-20080224.nightly

How reproducible:
Always. You can try hp-lp1.rhts.boston.redhat.com or hp-rx1620-01.rhts.boston.redhat.com.

Steps to Reproduce:
1. Configure kdump with crashkernel=512M@256M.
2. echo c > /proc/sysrq-trigger
They are harmless, as long as they can be fixed up. I'm not sure that I'll be able to get to this by 5.2, but I'll try.
Note to self: So, I'm looking a little deeper into this, and it seems these calls are occurring because our page tables are perhaps running off the edge of memory. Not sure why this is happening all of a sudden. A bisect of the kernel may be in order here.
Cai, can you try the --noio option on this bug as well, with kexec-tools-1.102 without my kexec patch? This may be closeable as well. Thanks!
Those warnings ONLY happen when the patch from BZ #434927 is added to kexec-tools. Without that patch, even without the "--noio" option, we don't see those warnings on most IA64 systems, but we get a zero-size vmcore instead.
I tried as an experiment changing my crashkernel param from 512M@256M to 1024M@256M and I get different behavior. Not sure if this sheds any light on the problem or not:

...
Kernel command line: BOOT_IMAGE=scsi0:EFI\redhat\vmlinuz-2.6.18-prep root=/dev/VolGroup00/LogVol00 ro irqpoll maxcpus=1 reset_devices machvec=dig machvec=dig verbose elfcorehdr=5242768K max_addr=5120M min_addr=4096M
Misrouted IRQ fixup and polling support enabled
This may significantly impact system performance
PID hash table entries: 4096 (order: 12, 32768 bytes)
Console: colour VGA+ 80x25
low bootmem alloc of 67108864 bytes failed!
Kernel panic - not syncing: Out of low memory
I have been doing a bunch of digging on this. So far I am unable to find a culprit; everything appears to be getting done properly by kexec, but _something_ obviously is wrong. I know the following so far:
1. The old kexec-tools from RHEL 5.1 works just fine (with either this kernel or the RHEL 5.1 kernel).
2. I can reproduce the same issues with the stock kexec tool without any Red Hat patches.
Because of #2, I think the best debug method at this point is to try to figure out what broke kexec upstream by using git-bisect.
Thanks to Doug, we've found the upstream change that causes this regression, and it coincides with the change that we need to revert to fix the zero-size vmcore on ia64 bug. Closing this as a duplicate of 434927.

*** This bug has been marked as a duplicate of 434927 ***