Description of problem:
Although they seem harmless, and a vmcore can still be captured successfully afterwards, there were numerous warnings while the capture kernel was booting up. The RHEL 5.1 version of kexec-tools (1.101-194.el5) does not have this problem.

...
Freeing initrd memory: 5664kB freed
Bad page state in process 'swapper'
page:e0000000110eeed8 flags:0x0000000000000000 mapping:0000000000000000 mapcount:1 count:0 (Not tainted)
Trying to fix it up, but a reboot is needed
Backtrace:
Call Trace:
 [<a000000100013ae0>] show_stack+0x40/0xa0  sp=e000000015697b40 bsp=e0000000156912a8
 [<a000000100013b70>] dump_stack+0x30/0x60  sp=e000000015697d10 bsp=e000000015691290
 [<a00000010010a260>] bad_page+0xe0/0x160  sp=e000000015697d10 bsp=e000000015691248
 [<a00000010010aa30>] free_hot_cold_page+0x110/0x320  sp=e000000015697d20 bsp=e000000015691200
 [<a00000010010ad70>] free_hot_page+0x30/0x60  sp=e000000015697d20 bsp=e0000000156911d8
 [<a00000010010d010>] __free_pages+0xb0/0x100  sp=e000000015697d20 bsp=e0000000156911b0
 [<a00000010010d1e0>] free_pages+0x180/0x1a0  sp=e000000015697d20 bsp=e000000015691188
 [<a000000100760dc0>] free_initrd_mem+0x1e0/0x2e0  sp=e000000015697d20 bsp=e000000015691160
 [<a000000100753410>] free_initrd+0x130/0x180  sp=e000000015697d30 bsp=e000000015691128
 [<a000000100756460>] populate_rootfs+0x1e0/0x200  sp=e000000015697d30 bsp=e0000000156910f8
 [<a0000001007487d0>] init+0x3d0/0x780  sp=e000000015697d30 bsp=e0000000156910c8
 [<a0000001000121b0>] kernel_thread_helper+0x30/0x60  sp=e000000015697e30 bsp=e0000000156910a0
 [<a0000001000090c0>] start_kernel_thread+0x20/0x40  sp=e000000015697e30 bsp=e0000000156910a0
Bad page state in process 'swapper'
page:e0000000110eef10 flags:0x0000000000000000 mapping:0000000000000000 mapcount:1 count:0 (Tainted: G B)
Trying to fix it up, but a reboot is needed
...
Full log: https://bugzilla.redhat.com/attachment.cgi?id=296597

Version-Release number of selected component (if applicable):
kexec-tools-1.102pre-10.el5 with the patch from BZ 434927#28
kernel-2.6.18-83.el5
RHEL5.2-Server-20080224.nightly

How reproducible:
Always. You can try hp-lp1.rhts.boston.redhat.com or hp-rx1620-01.rhts.boston.redhat.com.

Steps to Reproduce:
1. Configure kdump with crashkernel=512M@256M.
2. echo c > /proc/sysrq-trigger
They are harmless, as long as they can be fixed up. I'm not sure that I'll be able to get to this by 5.2, but I'll try.
Note to self: So, I'm looking a little deeper into this, and it seems these calls are occurring because our page tables are perhaps running off the edge of memory. Not sure why this is happening all of a sudden. A bisect of the kernel may be in order here.
Cai, can you try the --noio option on this bug as well, with kexec-tools-1.102 without my kexec patch? This may be closeable as well. Thanks!
Those warnings ONLY happen when the patch from BZ #434927 is added to kexec-tools. Without that patch, even without the "--noio" option, we don't see those warnings on most IA64 systems, but we get a zero-size vmcore instead.
I tried as an experiment changing my crashkernel param from 512M@256M to 1024M@256M and I get different behavior. Not sure if this sheds any light on the problem or not:

...
Kernel command line: BOOT_IMAGE=scsi0:EFI\redhat\vmlinuz-2.6.18-prep root=/dev/VolGroup00/LogVol00 ro irqpoll maxcpus=1 reset_devices machvec=dig machvec=dig verbose elfcorehdr=5242768K max_addr=5120M min_addr=4096M
Misrouted IRQ fixup and polling support enabled
This may significantly impact system performance
PID hash table entries: 4096 (order: 12, 32768 bytes)
Console: colour VGA+ 80x25
low bootmem alloc of 67108864 bytes failed!
Kernel panic - not syncing: Out of low memory
I have been doing a bunch of digging on this. So far I am unable to find a culprit; everything appears to be getting done properly by kexec, but _something_ obviously is wrong. I know the following so far:
1. The old kexec-tools from RHEL 5.1 works just fine (with either this kernel or the RHEL 5.1 kernel).
2. I can reproduce the same issues with the stock kexec tool without any Red Hat patches.
Because of #2, I think the best debug method at this point is to try to figure out what broke kexec upstream by using git-bisect.
Thanks to Doug, we've found the upstream change that causes this regression, and it coincides with the change that we need to revert to fix the zero-size vmcore on ia64 bug. Closing this as a duplicate of 434927.

*** This bug has been marked as a duplicate of 434927 ***