Description of problem:
KDump kernel panics on specific host running 2.6.32-131.0.10.el6

Version-Release number of selected component (if applicable):
2.6.32-131.0.10.el6

How reproducible:
See details in following comment.

Actual results:
System PANICS while booting into Kdump kernel.

Expected results:
Kdump kernel should successfully boot.

Additional info:
This issue was seen during Kdump testing for 2.6.32-131.0.10.el6. See details
in following comment.
Larry, does this sound like some sort of free pages list corruption in the kdump kernel?
Paul, is this bug reproducible?
> BUG: unable to handle kernel paging request at ffffea0000deddb0

The crash happens when the page structure at ffffea0000deddb0 is accessed by
the list_del(&page->lru) call below:

static inline
struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
						int migratetype)
{
	unsigned int current_order;
	struct free_area * area;
	struct page *page;

	/* Find a page of the appropriate size in the preferred list */
	for (current_order = order; current_order < MAX_ORDER; ++current_order) {
		area = &(zone->free_area[current_order]);
		if (list_empty(&area->free_list[migratetype]))
			continue;

		page = list_entry(area->free_list[migratetype].next,
						struct page, lru);
		list_del(&page->lru);
		rmv_page_order(page);
		area->nr_free--;
		expand(zone, page, order, current_order, area, migratetype);
		return page;
	}

	return NULL;
}

It's a vmemmap'd page structure address, where the vmemmap region starts at
ffffea0000000000, so ffffea0000deddb0 references the page at
0xdeddb0/sizeof(struct page):

crash> eval deddb0 / 56
hexadecimal: 3fad0
    decimal: 260816
      octal: 775320
     binary: 0000000000000000000000000000000000000000000000111111101011010000
crash> eval 260816 * 4k
hexadecimal: 3fad0000  (1043264KB)
    decimal: 1068302336
      octal: 7753200000
     binary: 0000000000000000000000000000000000111111101011010000000000000000
crash> eval 3fad0000 / 1m
hexadecimal: 3fa
    decimal: 1018
      octal: 1772
     binary: 0000000000000000000000000000000000000000000000000000001111111010
crash>

And physical address 3fad0000 is almost at the 1GB physical address.

I'm not clear on what all the "memmap=" parameters are for, but the one that
reads memmap=261488K@33404K would seemingly imply that it was
crashkernel=256M@32M.  So there shouldn't be any pages up at the 1GB region
AFAIK.

But I may be completely wrong, because I don't know what all of the other
"memmap=" parameters are there for.  Are they also regions that are used by
the second kernel?
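For anyone redoing the arithmetic outside of crash, a minimal standalone
sketch is below. The constants are assumptions about this particular x86_64
kernel (vmemmap base at ffffea0000000000, sizeof(struct page) == 56, 4 KB
pages) rather than values taken from the source tree.

#include <stdio.h>
#include <stdint.h>

#define VMEMMAP_BASE    0xffffea0000000000ULL   /* assumed vmemmap start */
#define STRUCT_PAGE_SZ  56ULL                   /* assumed sizeof(struct page) */
#define PAGE_SZ         4096ULL                 /* 4 KB pages */

int main(void)
{
        uint64_t page_addr = 0xffffea0000deddb0ULL; /* faulting address from the BUG */
        uint64_t pfn  = (page_addr - VMEMMAP_BASE) / STRUCT_PAGE_SZ;
        uint64_t phys = pfn * PAGE_SZ;

        /* prints pfn = 0x3fad0 and phys = 0x3fad0000 (~1018 MB), matching
         * the crash> eval output above */
        printf("pfn = %#llx, phys = %#llx (~%llu MB)\n",
               (unsigned long long)pfn,
               (unsigned long long)phys,
               (unsigned long long)(phys >> 20));
        return 0;
}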
Vivek, See bottom of comment #1 for reproducer. -pbunyan
(In reply to comment #5)
> And physical address 3fad0000 is almost at the 1GB physical address.
>
> I'm not clear on what all the "memmap=" parameters are for, but
> the one that reads memmap=261488K@33404K would seemingly imply
> that it was crashkernel=256M@32M.  So there shouldn't be any
> pages up at the 1GB region AFAIK.
>
> But I may be completely wrong, because I don't know what all of the
> other "memmap=" parameters are there for.  Are they also regions that
> are used by the second kernel?

Dave, you are right that it looks like we used 256M@32M. It is interesting
that we are trying to access the physical page at 1GB, which is not even
present in the memmap passed to the second kernel.

BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000100 - 00000000000a0000 (usable)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000003fe8cc00 (usable)
 BIOS-e820: 000000003fe8cc00 - 000000003fe8ec00 (ACPI NVS)
 BIOS-e820: 000000003fe8ec00 - 000000003fe90c00 (ACPI data)
 BIOS-e820: 000000003fe90c00 - 0000000040000000 (reserved)
 BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
 BIOS-e820: 00000000fec00000 - 00000000fed00400 (reserved)
 BIOS-e820: 00000000fed20000 - 00000000feda0000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fef00000 (reserved)
 BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved)
last_pfn = 0x3fe8c max_arch_pfn = 0x400000000
user-defined physical RAM map:
 user: 0000000000000000 - 0000000000001000 (reserved)
 user: 0000000000001000 - 00000000000a0000 (usable)
 user: 00000000000f0000 - 0000000000100000 (reserved)
 user: 000000000209f000 - 0000000011ffb000 (usable)
 user: 000000003fe8cc00 - 000000003fe90c00 (ACPI data)
 user: 000000003fe90c00 - 0000000040000000 (reserved)
 user: 00000000e0000000 - 00000000f0000000 (reserved)
 user: 00000000fec00000 - 00000000fed00400 (reserved)
 user: 00000000fed20000 - 00000000feda0000 (reserved)
 user: 00000000fee00000 - 00000000fef00000 (reserved)
 user: 00000000ffb00000 - 0000000100000000 (reserved)

Notice that the BIOS memory map says that there is physical memory at 1GB. We
should have marked it as reserved (via kexec-tools) in the user-defined memory
map, but that does not seem to be the case. So that sounds a little bit fishy.

I see the following memmap entries, which ask the second kernel to mark some
ranges as reserved:

memmap=1469K$1047107K memmap=262144K$3670016K memmap=1025K$4173824K
memmap=512K$4174976K memmap=1024K$4175872K memmap=5120K$4189184K

So we have not asked the second kernel to mark that memory as reserved, and
that's why it is not marked as reserved. It might not necessarily be a bug:
Neil mentioned that marking some ranges as reserved was introduced primarily
because ACPI data or other data was present there on some HP systems, and one
would not generally put that data in regular RAM. So while it might be
desirable to handle regular RAM as well, it is not necessarily a bug.

The bigger question is to figure out why we are putting a page from outside
the reserved region on the free list in the second kernel.
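For reference on the memmap= syntax being discussed: memmap=nn[KMG]@ss[KMG]
asks the kernel to treat the range as usable RAM, memmap=nn[KMG]$ss[KMG] asks
it to treat the range as reserved, and memmap=exactmap makes it ignore the
BIOS map and use only these entries. The snippet below is a minimal sketch,
not the actual kexec-tools code, of how one reserved range from the BIOS map
above could be rendered as such an entry; the helper name is made up for
illustration.

#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* Hypothetical helper: append one "reserve [start, end)" entry to the
 * command line that will be handed to the kdump kernel. */
static void append_reserved_memmap(char *cmdline, size_t len,
                                   uint64_t start, uint64_t end)
{
        size_t used = strlen(cmdline);

        snprintf(cmdline + used, len - used, " memmap=%lluK$%lluK",
                 (unsigned long long)((end - start) >> 10),
                 (unsigned long long)(start >> 10));
}

int main(void)
{
        char cmdline[256] = "memmap=exactmap memmap=261488K@33404K";

        /* Reserved range from the BIOS map above: 3fe90c00 - 40000000.
         * This prints "... memmap=1469K$1047107K", i.e. the first of the
         * reserved entries listed in this comment. */
        append_reserved_memmap(cmdline, sizeof(cmdline), 0x3fe90c00ULL,
                               0x40000000ULL);
        printf("%s\n", cmdline);
        return 0;
}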
> Dave, you are right that it looks like we used 256M@32M

I wonder if it's reproducible on the primary kernel by passing "mem=288M" on
the boot command line?
(In reply to comment #8)
> I wonder if it's reproducible on the primary kernel by passing "mem=288M"
> on the boot command line?

I booted the first kernel with mem=256M and it boots fine. So restricting the
kernel to 256MB does not seem to be an issue.
I am trying to reproduce the issue but it seems to be failing in different
ways. This time it failed because it can't find the root device in the second
kernel.

Initalizing network drop monitor service
md: Waiting for all devices to be available before autodetect
md: If you don't use raid, use raid=noautodetect
md: Autodetecting RAID arrays.
md: Scanned 0 and added 0 devices.
md: autorun ...
md: ... autorun DONE.
VFS: Cannot open root device "mapper/vg_dellpesc142001-lv_root" or unknown-block(0,0)
Please append a correct "root=" boot option; here are the available partitions:
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
Pid: 1, comm: swapper Not tainted 2.6.32-131.0.12.el6.i686 #1
Call Trace:
 [<c2821fde>] ? panic+0x42/0xf9
 [<c2a60dba>] ? mount_block_root+0x1ce/0x263
 [<c2a60ff4>] ? prepare_namespace+0x14b/0x191
 [<c25265ef>] ? sys_access+0x1f/0x30
 [<c2a6043b>] ? kernel_init+0x227/0x235
 [<c2a60214>] ? kernel_init+0x0/0x235
 [<c240a03f>] ? kernel_thread_helper+0x7/0x10
The original issue was reported against 2.6.32-131.0.10.el6.x86_64. I would
recommend staying with the kernel and arch that this issue was reported
against. By changing both the kernel and the arch you may be hitting a new
issue that we have not seen yet.
(In reply to comment #7)
> Notice that the BIOS memory map says that there is physical memory at 1GB.
> We should have marked it as reserved (via kexec-tools) in the user-defined
> memory map, but that does not seem to be the case. So that sounds a little
> bit fishy.

Yes, this sounds like a bug.

> I see the following memmap entries, which ask the second kernel to mark
> some ranges as reserved:
>
> memmap=1469K$1047107K memmap=262144K$3670016K memmap=1025K$4173824K
> memmap=512K$4174976K memmap=1024K$4175872K memmap=5120K$4189184K
>
> So we have not asked the second kernel to mark that memory as reserved, and
> that's why it is not marked as reserved.

I would like to see some debugging information from that machine. Please
enable DEBUG (-DDEBUG) and recompile the kexec-tools srpm; let's see what
we get.

> The bigger question is to figure out why we are putting a page from outside
> the reserved region on the free list in the second kernel.

This is interesting. I think the kernel might be treating this memory as RAM?
Otherwise we wouldn't have this bug, right?
(In reply to comment #12)
> > The bigger question is to figure out why we are putting a page from
> > outside the reserved region on the free list in the second kernel.
>
> This is interesting. I think the kernel might be treating this memory as
> RAM? Otherwise we wouldn't have this bug, right?

Er, we use memmap=exactmap in the second kernel, so the kernel should only
use what we specified via memmap=.
This bug seems to be a duplicate of bug 690301. There we also see some vm data
structure corruption and a trace like:

 [<ffffffff814de462>] ? do_general_protection+0x152/0x160
 [<ffffffff814ddc35>] ? general_protection+0x25/0x30
 [<ffffffff81272f20>] ? list_del+0x10/0xa0
 [<ffffffff8111cc85>] ? __rmqueue+0xc5/0x490
 [<ffffffff8111eb08>] ? get_page_from_freelist+0x598/0x820

I am not closing it as a duplicate of that bz yet.
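As a reasoning aid for why the trace ends in a general protection fault inside
list_del(): the removal path dereferences both neighbours of the entry, so
garbage in page->lru.next or page->lru.prev faults immediately. The sketch
below is a simplified paraphrase of the CONFIG_DEBUG_LIST version (the real
code lives in lib/list_debug.c and differs in detail); it is not copied from
this kernel tree.

void list_del(struct list_head *entry)
{
        /* the sanity checks already dereference both neighbours */
        WARN(entry->prev->next != entry,
             "list_del corruption. prev->next should be %p, but was %p\n",
             entry, entry->prev->next);
        WARN(entry->next->prev != entry,
             "list_del corruption. next->prev should be %p, but was %p\n",
             entry, entry->next->prev);

        entry->next->prev = entry->prev;  /* unlink: faults on a bad next */
        entry->prev->next = entry->next;  /* unlink: faults on a bad prev */
        entry->next = LIST_POISON1;       /* poison to catch later reuse */
        entry->prev = LIST_POISON2;
}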
Yes, probably. I am still trying to get some debugging info on that machine.
Ok, I freshly installed this system and reproduced the issue the very first
time I tried. I had reserved 128MB of physical memory at the 32MB physical
address, which means the second kernel should not have accessed any physical
memory beyond 160MB. Uploading the dmesg.
Created attachment 500661 [details] console logs of the crash of second kernel
I did another test where I enabled the "bootmem_debug" and "debug" kernel
command line options in the kdump kernel. Looking at the bootmem debug output,
it looks like the bootmem allocator released the right amount of memory. There
are also two WARN() messages in __list_add() which tell us that some list is
corrupted. I think the bootmem allocator did its job right; it is later that
things got corrupted (while freeing some slab/slub cache etc.). Attaching the
boot log.
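For context on the two __list_add() warnings: with CONFIG_DEBUG_LIST the
kernel verifies the neighbouring links before inserting a node, so these
WARN()s firing means the surrounding list_head fields were already corrupted
by the time something was added to the list. The sketch below is paraphrased
from memory of lib/list_debug.c, not copied from this tree, and the exact
message text may differ.

void __list_add(struct list_head *new,
                struct list_head *prev,
                struct list_head *next)
{
        WARN(next->prev != prev,
             "list_add corruption. next->prev should be prev (%p), "
             "but was %p. (next=%p).\n", prev, next->prev, next);
        WARN(prev->next != next,
             "list_add corruption. prev->next should be next (%p), "
             "but was %p. (prev=%p).\n", next, prev->next, prev);

        next->prev = new;
        new->next  = next;
        new->prev  = prev;
        prev->next = new;
}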
Created attachment 500676 [details] boot logs with "bootmem_debug" and "debug" kernel command line options
So to me it looks like the freelist somehow got corrupted. How we got there, I
have no clue yet.
Vivek, that still doesn't address Dave's concern. Can you add
"mminit_loglevel=4" too?