Bug 700902 - [RHEL6.1] Kdump kernel panics on specific host running 2.6.32-131.0.10.el6
Summary: [RHEL6.1] Kdump kernel panics on specific host running 2.6.32-131.0.10.el6
Keywords:
Status: CLOSED DUPLICATE of bug 689026
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Vivek Goyal
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-04-29 19:04 UTC by PaulB
Modified: 2011-08-04 17:28 UTC (History)
CC List: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-07-29 15:11:29 UTC
Target Upstream Version:


Attachments (Terms of Use)
console logs of the crash of second kernel (19.76 KB, text/plain), 2011-05-24 17:11 UTC, Vivek Goyal
boot logs with "bootmem_debug" and "debug" kernel command line options (39.18 KB, text/plain), 2011-05-24 19:23 UTC, Vivek Goyal

Description PaulB 2011-04-29 19:04:56 UTC
Description of problem:
Kdump kernel panics on specific host running 2.6.32-131.0.10.el6

Version-Release number of selected component (if applicable):
2.6.32-131.0.10.el6

How reproducible:
See details in the following comment.
  
Actual results:
The system panics while booting into the kdump kernel.

Expected results:
The kdump kernel should boot successfully.

Additional info:
This issue was seen during kdump testing of 2.6.32-131.0.10.el6.

See details in the following comment.

Comment 3 Vivek Goyal 2011-05-02 14:04:08 UTC
Larry, 

Does this sound like some sort of free pages list corruption in the kdump kernel?

Comment 4 Vivek Goyal 2011-05-02 14:25:41 UTC
Paul, is this bug reproducible?

Comment 5 Dave Anderson 2011-05-02 14:35:25 UTC
> BUG: unable to handle kernel paging request at ffffea0000deddb0

The crash happens when the page structure at ffffea0000deddb0
is accessed by the list_del(&page->lru) call below:
  
  static inline
  struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
                                                  int migratetype)
  {
          unsigned int current_order;
          struct free_area * area;
          struct page *page;
  
          /* Find a page of the appropriate size in the preferred list */
          for (current_order = order; current_order < MAX_ORDER; ++current_order) {
                  area = &(zone->free_area[current_order]);
                  if (list_empty(&area->free_list[migratetype]))
                          continue;
  
                  page = list_entry(area->free_list[migratetype].next,
                                                          struct page, lru);
                  list_del(&page->lru);
                  rmv_page_order(page);
                  area->nr_free--;
                  expand(zone, page, order, current_order, area, migratetype);
                  return page;
          }
  
          return NULL;
  }
  
It's a vmemmap'd page structure address; the vmemmap region
starts at ffffea0000000000, so ffffea0000deddb0 would reference
the page at 0xdeddb0/sizeof(struct page):
  
  crash> eval deddb0 / 56
  hexadecimal: 3fad0  
      decimal: 260816  
        octal: 775320
       binary: 0000000000000000000000000000000000000000000000111111101011010000
  crash> eval 260816 * 4k
  hexadecimal: 3fad0000  (1043264KB)
      decimal: 1068302336  
        octal: 7753200000
       binary: 0000000000000000000000000000000000111111101011010000000000000000
  crash> eval 3fad0000 / 1m
  hexadecimal: 3fa  
      decimal: 1018  
        octal: 1772
       binary: 0000000000000000000000000000000000000000000000000000001111111010
  crash> 
  
And physical address 3fad0000 is almost at the 1GB boundary.
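
The same back-calculation in C, as a minimal sketch (it assumes
sizeof(struct page) == 56 and the vmemmap base above, per the crash
output):

  #include <stdio.h>

  /* Sketch: recover the page frame number and physical address that
   * a faulting vmemmap'd struct page address corresponds to. */
  int main(void)
  {
          unsigned long fault   = 0xffffea0000deddb0UL;
          unsigned long vmemmap = 0xffffea0000000000UL;

          unsigned long pfn  = (fault - vmemmap) / 56; /* sizeof(struct page) */
          unsigned long phys = pfn << 12;              /* 4K pages */

          printf("pfn %#lx -> phys %#lx (~%lu MB)\n", pfn, phys, phys >> 20);
          /* pfn 0x3fad0 -> phys 0x3fad0000 (~1018 MB) */
          return 0;
  }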

I'm not clear on what all the "memmap=" parameters are for, but
the one that reads memmap=261488K@33404K would seemingly imply
that it was crashkernel=256M@32M.  So there shouldn't be any
pages up at the 1GB region AFAIK.  

But I may be completely wrong, because I don't know what all of the
other "memmap=" entries are there for.  Are they also regions that are
used by the second kernel?

Comment 6 PaulB 2011-05-02 15:30:15 UTC
Vivek,
See bottom of comment #1 for reproducer.

-pbunyan

Comment 7 Vivek Goyal 2011-05-02 15:44:10 UTC
(In reply to comment #5)
> And physical address 3fad0000 is almost at the 1GB physical address.
> 
> I'm not clear on what all the "memmap=" parameters are for, but
> the one that reads memmap=261488K@33404K would seemingly imply
> that it was crashkernel=256M@32M.  So there shouldn't be any
> pages up at the 1GB region AFAIK.  
> 
> But I may be completely wrong, because I don't know what all of the
> other "memmap=" entries are there for.  Are they also regions that are
> used by the second kernel?

Dave, you are right that it looks like we used 256M@32M. It is interesting that we are trying to access a physical page at 1GB, which is not even present in the memmap passed to the second kernel.

BIOS-provided physical RAM map: 
 BIOS-e820: 0000000000000100 - 00000000000a0000 (usable) 
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved) 
 BIOS-e820: 0000000000100000 - 000000003fe8cc00 (usable) 
 BIOS-e820: 000000003fe8cc00 - 000000003fe8ec00 (ACPI NVS) 
 BIOS-e820: 000000003fe8ec00 - 000000003fe90c00 (ACPI data) 
 BIOS-e820: 000000003fe90c00 - 0000000040000000 (reserved) 
 BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved) 
 BIOS-e820: 00000000fec00000 - 00000000fed00400 (reserved) 
 BIOS-e820: 00000000fed20000 - 00000000feda0000 (reserved) 
 BIOS-e820: 00000000fee00000 - 00000000fef00000 (reserved) 
 BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved) 
last_pfn = 0x3fe8c max_arch_pfn = 0x400000000 
user-defined physical RAM map: 
 user: 0000000000000000 - 0000000000001000 (reserved) 
 user: 0000000000001000 - 00000000000a0000 (usable) 
 user: 00000000000f0000 - 0000000000100000 (reserved) 
 user: 000000000209f000 - 0000000011ffb000 (usable) 
 user: 000000003fe8cc00 - 000000003fe90c00 (ACPI data) 
 user: 000000003fe90c00 - 0000000040000000 (reserved) 
 user: 00000000e0000000 - 00000000f0000000 (reserved) 
 user: 00000000fec00000 - 00000000fed00400 (reserved) 
 user: 00000000fed20000 - 00000000feda0000 (reserved) 
 user: 00000000fee00000 - 00000000fef00000 (reserved) 
 user: 00000000ffb00000 - 0000000100000000 (reserved) 

Notice that the BIOS memory map says there is physical memory at 1GB. We should have marked it as reserved (by kexec-tools) in the user-defined memory
map, but that does not seem to be the case. So that sounds a little bit fishy.

I see the following memmap entries, which ask the second kernel to mark some ranges as reserved.

memmap=1469K$1047107K memmap=262144K$3670016K memmap=1025K$4173824K memmap=512K$4174976K memmap=1024K$4175872K memmap=5120K$4189184K 

So we have not asked the second kernel to mark that memory as reserved. That's why it is not marked as reserved.

It might not necessarily be a bug: Neil mentioned that marking some ranges as reserved was introduced primarily because some ACPI data or other data was present there on some HP systems. Generally that data would not live in regular RAM.

So while it might be desirable to mark regular RAM as reserved too, it is not necessarily a bug.

The bigger question first is to figure out why we are putting a page from outside the reserved region onto the free list in the second kernel.
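
For reference, here is what one of those entries means; '$' asks the kernel to reserve the range, while '@' marks it as usable RAM. A minimal sketch decoding the first entry:

  #include <stdio.h>

  /* Sketch: decode memmap=1469K$1047107K, i.e. "reserve 1469K
   * starting at the 1047107K offset".  Sizes are in KiB. */
  int main(void)
  {
          unsigned long long start = 1047107ULL << 10;        /* KiB -> bytes */
          unsigned long long end   = start + (1469ULL << 10);

          printf("%#llx - %#llx (reserved)\n", start, end);
          /* prints 0x3fe90c00 - 0x40000000, exactly the "reserved"
           * firmware region in the e820 maps above */
          return 0;
  }

Note the decoded range matches the 3fe90c00 - 40000000 reserved region, so these entries cover firmware regions only; nothing reserves the usable RAM around 1GB for the second kernel.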

Comment 8 Dave Anderson 2011-05-02 15:57:12 UTC
> Dave, you are right that it looks like we used 256M@32M

I wonder if it's reproducible on the primary kernel by passing "mem=288M"
on the boot command line?

Comment 9 Vivek Goyal 2011-05-02 21:16:43 UTC
(In reply to comment #8)
 
> I wonder if it's reproducible on the primary kernel by passing "mem=288M"
> on the boot command line?

I booted the first kernel with mem=256M and it boots fine. So limiting the kernel to 256MB does not seem to be an issue by itself.

Comment 10 Vivek Goyal 2011-05-02 21:35:22 UTC
I am trying to reproduce the issue, but it seems to fail in different ways. This time it failed because it could not find the root device in the second kernel.

Initalizing network drop monitor service
md: Waiting for all devices to be available before autodetect
md: If you don't use raid, use raid=noautodetect
md: Autodetecting RAID arrays.
md: Scanned 0 and added 0 devices.
md: autorun ...
md: ... autorun DONE.
VFS: Cannot open root device "mapper/vg_dellpesc142001-lv_root" or unknown-block(0,0)
Please append a correct "root=" boot option; here are the available partitions:
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
Pid: 1, comm: swapper Not tainted 2.6.32-131.0.12.el6.i686 #1
Call Trace:
 [<c2821fde>] ? panic+0x42/0xf9
 [<c2a60dba>] ? mount_block_root+0x1ce/0x263
 [<c2a60ff4>] ? prepare_namespace+0x14b/0x191
 [<c25265ef>] ? sys_access+0x1f/0x30
 [<c2a6043b>] ? kernel_init+0x227/0x235
 [<c2a60214>] ? kernel_init+0x0/0x235
 [<c240a03f>] ? kernel_thread_helper+0x7/0x10

Comment 11 Jeff Burke 2011-05-03 00:40:33 UTC
The original issue was reported against 2.6.32-131.0.10.el6.x86_64. I would recommend staying with the kernel and arch that the issue was reported against; by changing both the kernel and the arch you may be hitting a new issue that we have not seen yet.

Comment 12 Cong Wang 2011-05-03 05:00:44 UTC
(In reply to comment #7)
> 
> Notice that the BIOS memory map says there is physical memory at 1GB. We
> should have marked it as reserved (by kexec-tools) in the user-defined memory
> map, but that does not seem to be the case. So that sounds a little bit fishy.

Yes, this sounds like a bug.

> 
> I see the following memmap entries, which ask the second kernel to mark some
> ranges as reserved.
> 
> memmap=1469K$1047107K memmap=262144K$3670016K memmap=1025K$4173824K
> memmap=512K$4174976K memmap=1024K$4175872K memmap=5120K$4189184K 
> 
> So we have not asked the second kernel to mark that memory as reserved.
> That's why it is not marked as reserved.


I would like to see some debugging information from that machine. Please enable DEBUG (-DDEBUG) and recompile the kexec-tools SRPM; let's see what we get.
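
(A rough sketch of one way to do that; the spec file details and package version vary by release:)

  rpm -ivh kexec-tools-*.src.rpm
  vi ~/rpmbuild/SPECS/kexec-tools.spec   # add -DDEBUG to the build CFLAGS
  rpmbuild -ba ~/rpmbuild/SPECS/kexec-tools.spec
  rpm -Uvh --force ~/rpmbuild/RPMS/$(uname -m)/kexec-tools-*.rpm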

> 
> The bigger question first is to figure out why we are putting a page from
> outside the reserved region onto the free list in the second kernel.

This is interesting. I think the kernel might be treating this memory as RAM? Otherwise we would not have this bug, right?

Comment 13 Cong Wang 2011-05-03 05:04:56 UTC
(In reply to comment #12)
> > The bigger question first is to figure out why we are putting a page from
> > outside the reserved region onto the free list in the second kernel.
> 
> This is interesting. I think the kernel might be treating this memory as RAM?
> Otherwise we would not have this bug, right?

Er, we use memmap=exactmap in the second kernel, so the kernel should only use what we specified via memmap=.
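
For illustration, the command line kexec-tools builds here has roughly this shape (usable range from comment #5, reserved ranges from comment #7; exactmap tells the kernel to discard the BIOS e820 map and construct its memory map only from the memmap= entries that follow):

  memmap=exactmap memmap=261488K@33404K memmap=1469K$1047107K memmap=262144K$3670016K memmap=1025K$4173824K ...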

Comment 14 Vivek Goyal 2011-05-16 22:22:13 UTC
This bug seems to be a duplicate of bug 690301. There we also see some VM data structure corruption, with a trace like:

 [<ffffffff814de462>] ? do_general_protection+0x152/0x160
 [<ffffffff814ddc35>] ? general_protection+0x25/0x30
 [<ffffffff81272f20>] ? list_del+0x10/0xa0
 [<ffffffff8111cc85>] ? __rmqueue+0xc5/0x490
 [<ffffffff8111eb08>] ? get_page_from_freelist+0x598/0x820

I am not closing it as a duplicate of that bz yet.

Comment 15 Cong Wang 2011-05-17 02:51:04 UTC
Yes, probably. I am still trying to get some debugging info on that machine.

Comment 16 Vivek Goyal 2011-05-24 17:09:53 UTC
OK, freshly installed this system and reproduced the issue the very first time I tried it. I had reserved 128MB of physical memory at the 32MB physical address, which means the second kernel should not have accessed any physical memory beyond 160MB.

Uploading the dmesg.

Comment 17 Vivek Goyal 2011-05-24 17:11:15 UTC
Created attachment 500661 [details]
console logs of the crash of second kernel

Comment 18 Vivek Goyal 2011-05-24 19:22:34 UTC
Did another test where I enabled the "bootmem_debug" and "debug" kernel command line options in the kdump kernel.

Looking at the bootmem debug output, it looks like the bootmem allocator released the right amount of memory.

There are also two WARN() messages in __list_add() which indicate that some list is corrupted.

I think the bootmem allocator did its job right; things got corrupted later (while freeing some slab/slub caches, etc.).

Attaching the boot log.
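
For reference, those two warnings come from the CONFIG_DEBUG_LIST checks in __list_add(); in this era the code in lib/list_debug.c looks roughly like the sketch below, verifying that the neighbours' links agree before splicing in the new entry:

  void __list_add(struct list_head *new, struct list_head *prev,
                  struct list_head *next)
  {
          /* both WARN()s fire when a neighbour's link no longer
           * points where the list structure says it should */
          WARN(next->prev != prev,
               "list_add corruption. next->prev should be "
               "prev (%p), but was %p. (next=%p).\n",
               prev, next->prev, next);
          WARN(prev->next != next,
               "list_add corruption. prev->next should be "
               "next (%p), but was %p. (prev=%p).\n",
               next, prev->next, prev);
          next->prev = new;
          new->next = next;
          new->prev = prev;
          prev->next = new;
  }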

Comment 19 Vivek Goyal 2011-05-24 19:23:40 UTC
Created attachment 500676 [details]
boot logs with "bootmem_debug" and "debug" kernel command line options

Comment 20 Vivek Goyal 2011-05-24 19:25:57 UTC
So to me it looks like the freelist somehow got corrupted. How we got there, no clue yet.

Comment 21 Cong Wang 2011-06-10 09:23:09 UTC
Vivek, that still doesn't address Dave's concern; can you add "mminit_loglevel=4" too?
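
(One way to get extra options onto the capture kernel's command line on RHEL6 is the append setting in /etc/sysconfig/kdump; a sketch, assuming the stock config file:)

  # /etc/sysconfig/kdump -- extra options appended to the command
  # line that kexec-tools builds for the capture kernel
  KDUMP_COMMANDLINE_APPEND="debug bootmem_debug mminit_loglevel=4"

Then restart the kdump service ("service kdump restart") to reload the capture kernel with the new options.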

