Bug 239791

Summary: kdump incomplete if you set "mem=" parameter on x86_64
Product: Red Hat Enterprise Linux 5
Reporter: masanari iida <masanari_iida>
Component: kexec-tools
Assignee: Neil Horman <nhorman>
Status: CLOSED ERRATA
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: low
Docs Contact:
Priority: medium
Version: 5.0
CC: duck, dzickus, jarod, mgahagan, phan, qcai, reto.kisseleff
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
During a core dump, kexec creates a header to reference all the memory blocks in the system. Previously, the kdump kernel command line supported a "mem=" parameter that limited the memory that was dumped. When this parameter was set, kexec could not read past the limit when creating the header, and the dump would result in an I/O error. The "mem=" parameter has been removed from kexec to ensure that core dumps succeed. Users of kexec should use makedumpfile filtering to hide superfluous results.
Story Points: ---
Clone Of:
600584 (view as bug list)
Environment:
Last Closed: 2010-03-30 07:47:27 UTC
Type: ---
Attachments (all flags: none):
patch to limit vmcoreinfo size when mem= kernel parameter is used
test rpm
new patch to fix this problem
version 3 of the honor mem= patch
new approach to honoring mem=

Description masanari iida 2007-05-11 12:26:19 UTC
Description of problem:
During my kdump testing, in order to speed up the dump tests, I set the "mem="
parameter in /etc/grub.conf.
After that, every type of kdump on x86_64 started to fail.
The resulting file is always "vmcore-incomplete", and its actual size is
smaller than expected.

This symptom is reproducible with any type of kdump (network or local disk)
on an x86_64 box.
It cannot be reproduced on an IA32 box.

Version-Release number of selected component (if applicable):
kernel 2.6.18-8
kexec-tools-1.101-164.el5
Architecture: x86_64

How reproducible:
Always

Steps to Reproduce:
1. Set "mem=" in /etc/grub.conf.
   The system has 5 GB of RAM.

--------- /etc/grub.conf --------
default=0
timeout=5
title RHEL5-Server-x8664 (2.6.18-8.el5)
        root (hd0,0)
        kernel /boot/vmlinuz-2.6.18-8.el5 ro root=LABEL=/ rhgb quiet
crashkernel=128M@16M console=tty0 console=ttyS0,115200 mem=2000m
        initrd /boot/initrd-2.6.18-8.el5.img
--------------------------------------

2. Configure kdump.
3. Crash the system.
  
Actual results:

# cd /var/crash/
 # ls
 127.0.0.1-2007-05-09-09:58:36  127.0.0.1-2007-05-09-10:31:57  2007-05-09-09:51
 127.0.0.1-2007-05-09-10:05:09  127.0.0.1-2007-05-09-14:30:17  2007-05-09-14:31
 # cd 2007-05-09-14\:31/
 # ls -al
 total 298264
 drwxr-xr-x 2 root root       4096 May  9 14:31 .
 drwxr-xr-x 8 root root       4096 May  9 14:31 ..
 -r-------- 1 root root 1967460352 May  9 14:31 vmcore-incomplete
 # ls -s
 total 298252
 298252 vmcore-incomplete
 #

Expected results:
kdump completes successfully with the "mem=" parameter set.


Additional info:

Comment 1 Dave Anderson 2007-08-01 12:50:50 UTC
> # cd /var/crash/
> # ls
> 127.0.0.1-2007-05-09-09:58:36  127.0.0.1-2007-05-09-10:31:57  2007-05-09-09:51
> 127.0.0.1-2007-05-09-10:05:09  127.0.0.1-2007-05-09-14:30:17  2007-05-09-14:31
> # cd 2007-05-09-14\:31/
> # ls -al
> total 298264
> drwxr-xr-x 2 root root       4096 May  9 14:31 .
> drwxr-xr-x 8 root root       4096 May  9 14:31 ..
> -r-------- 1 root root 1967460352 May  9 14:31 vmcore-incomplete
> # ls -s
> total 298252
> 298252 vmcore-incomplete
> #

The only way a "vmcore-incomplete" file can be created when the
secondary kernel runs is here in the "kdump.init" script:



function save_core()
{
        coredir="/var/crash/`date +"%Y-%m-%d-%H:%M"`"

        mkdir -p $coredir
        cp /proc/vmcore $coredir/vmcore-incomplete
        if [ $? == 0 ]; then
                mv $coredir/vmcore-incomplete $coredir/vmcore
                $LOGGER "saved a vmcore to $coredir"
        else
                $LOGGER "failed to save a vmcore to $coredir"
        fi
}

So the question is why did the "cp" fail?  Is there enough space
in the /var/crash partition?




Comment 2 Dave Anderson 2007-08-01 15:41:51 UTC
Interesting, on a 4GB system with this grub line:

  kernel /vmlinuz-2.6.18-36.el5 ro root=/dev/VolGroup00/LogVol00
  console=ttyS0,115200 crashkernel=128M@16M mem=2000m

I can reproduce the symptom:
 
  # ls -l
  total 435300
  -r-------- 1 root root 1967558656 Aug  1 10:57 vmcore-incomplete
  #

Although the vmcore-incomplete file is really not "incomplete", and
is quite usable:

  # crash /usr/lib/debug/lib/modules/2.6.18-36.el5/vmlinux vmcore-incomplete

  crash 4.0-4.3.1
  Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007  Red Hat, Inc.
  Copyright (C) 2004, 2005, 2006  IBM Corporation
  Copyright (C) 1999-2006  Hewlett-Packard Co
  Copyright (C) 2005, 2006  Fujitsu Limited
  Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
  Copyright (C) 2005  NEC Corporation
  Copyright (C) 1999, 2002  Silicon Graphics, Inc.
  Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
  This program is free software, covered by the GNU General Public License,
  and you are welcome to change it and/or distribute copies of it under
  certain conditions.  Enter "help copying" to see the conditions.
  This program has absolutely no warranty.  Enter "help warranty" for details.

  GNU gdb 6.1
  Copyright 2004 Free Software Foundation, Inc.
  GDB is free software, covered by the GNU General Public License, and you are
  welcome to change it and/or distribute copies of it under certain conditions.
  Type "show copying" to see the conditions.
  There is absolutely no warranty for GDB.  Type "show warranty" for details.
  This GDB was configured as "x86_64-unknown-linux-gnu"...

        KERNEL: /usr/lib/debug/lib/modules/2.6.18-36.el5/vmlinux
      DUMPFILE: vmcore-incomplete
          CPUS: 4
          DATE: Wed Aug  1 10:57:05 2007
        UPTIME: 00:04:47
  LOAD AVERAGE: 0.02, 0.13, 0.07
         TASKS: 118
      NODENAME: nec-em17.rhts.boston.redhat.com
       RELEASE: 2.6.18-36.el5
       VERSION: #1 SMP Fri Jul 20 14:26:46 EDT 2007
       MACHINE: x86_64  (2992 Mhz)
        MEMORY: 1.9 GB
         PANIC: "SysRq : Trigger a crashdump"
           PID: 3033
       COMMAND: "bash"
          TASK: ffff810076f11100  [THREAD_INFO: ffff81006e82e000]
           CPU: 0
         STATE: TASK_RUNNING (SYSRQ)

  crash>

I did capture the size of the /proc/vmcore file as it existed just prior
to the "cp", which shows it as 4GB in size:

  -r-------- 1 root root 4164192032 Aug  1 10:57 /proc/vmcore

I would have thought it would have indicated the restricted size?

Also, I don't know why the "cp" failed -- I'll try capturing the
exit status.



Comment 3 Dave Anderson 2007-08-01 18:08:05 UTC
Well, "cp" just returns 1.  Not much help there...

I also tried putting the mem= before the crashkernel=,
but the behaviour is the same.  

The (usable) vmcore-incomplete file is consistently the same
(correct) size, the /proc/vmcore file is consistently the same
size, and "cp" always returns a 1:

  # ls -l
  -r-------- 1 root root 4164192032 Aug  1 10:57 /proc/vmcore
  #

  # ls -l
  -r-------- 1 root root 1967558656 Aug  1 11:47 vmcore-incomplete
  #

I've put a query on the kexec mailing list.



Comment 4 masanari iida 2007-08-02 09:32:50 UTC
> So the question is why did the "cp" fail?  Is there enough space
> in the /var/crash partition?

Yes.
I had 143GB free for this 2GB vmcore file.
As you have successfully reproduced this symptom yourself,
I believe you can see that it really exists even
with enough free space.
Thanks for your support.



Comment 6 Neil Horman 2009-06-05 13:43:28 UTC
This looks like it is just a problem with the size computation of the /proc/vmcore file.  The information we base that size on likely doesn't take into account the mem= parameter restriction.  I'll see if I can put together a patch.

Comment 7 Neil Horman 2009-06-05 14:50:45 UTC
Note to self: this looks to be a userspace problem.  /proc/vmcore has its size set based on the elfcorehdr that gets passed in when kexec loads.  The kexec userspace utility builds that data structure and populates the amount of RAM it points to based on /proc/iomem, which does not reflect any mem= kernel command-line setting.  Since /proc/iomem sees all the memory, we build an ELF header that covers it, and as such we get a larger-than-necessary /proc/vmcore file.  The cp utility tries to copy it, but once we pass the mem= border we start getting read failures.  So we get the whole core, but cp exits with an error, which causes us not to rename the vmcore.  I think the best fix is to limit the size of the ELF core header that we build by making sure we don't indicate more RAM than we see available in /proc/meminfo.  I'll have a patch to test this afternoon, I think.
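A rough way to see the mismatch described above on a running system (an illustration only, not the actual kexec code): total up the "System RAM" ranges in /proc/iomem, which kexec uses to build the ELF core header, and compare the result with MemTotal from /proc/meminfo, which does respect mem=.

#!/bin/bash
# Illustration only: show the gap between the memory kexec sees in
# /proc/iomem and the memory the kernel actually uses under mem=.
# (Run as root so /proc/iomem shows real addresses.)

iomem_kb=0
while IFS= read -r line; do
        case "$line" in
        *"System RAM"*)
                range=${line%% :*}              # e.g. "00100000-7fffffff"
                start=$((16#${range%-*}))
                end=$((16#${range#*-}))
                iomem_kb=$(( iomem_kb + (end - start + 1) / 1024 ))
                ;;
        esac
done < /proc/iomem

meminfo_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)

echo "System RAM per /proc/iomem: ${iomem_kb} kB"
echo "MemTotal per /proc/meminfo: ${meminfo_kb} kB"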

Comment 8 Neil Horman 2009-06-05 16:57:28 UTC
Created attachment 346683 [details]
patch to limit vmcoreinfo size when mem= kernel parameter is used

Not tested yet, but this is the patch that should work to limit the vmcoreinfo size to that of the used memory when the mem= parameter is in use.

Comment 9 Neil Horman 2009-06-05 16:59:58 UTC
Created attachment 346684 [details]
test rpm

I'm installing a machine to test this on, but if you could test it on your system while I'm waiting, that would accelerate the fix process significantly.  Thanks!  Let me know what the results are.

Comment 10 Neil Horman 2009-06-05 20:38:10 UTC
Scratch the testing; I just tried it out and it's busted.  I know what's wrong, though: the memory-limiting code operates on blocks of RAM as they are found in /proc/iomem.  If the first block it finds is larger than the mem= parameter, we won't find _any_ RAM, so we need to deal with that on a more fine-grained basis, as sketched below.  I'll work on this as soon as I'm able (of course, if anyone wants to take a stab at fixing it over the weekend, that would be great :) )
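To make that fine-grained handling concrete, here is a tiny shell sketch (purely illustrative -- the real change belongs in kexec's C code): rather than dropping a /proc/iomem range because it extends past the mem= limit, trim it at the limit so the part below it still gets counted.

clamp_range() {
        # clamp_range START END LIMIT -> prints the usable part of the range
        local start=$1 end=$2 limit=$3
        if [ "$start" -ge "$limit" ]; then
                return 1                        # entirely above the limit: drop it
        fi
        if [ "$end" -ge "$limit" ]; then
                end=$((limit - 1))              # straddles the limit: trim the top
        fi
        echo "$start $end"
}

# Example: a 4 GB block against a 2 GB limit keeps only its lower half.
clamp_range 0 $((4 * 1024 * 1024 * 1024 - 1)) $((2 * 1024 * 1024 * 1024))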

Comment 11 Neil Horman 2009-06-07 01:09:19 UTC
Created attachment 346769 [details]
new patch to fix this problem

OK, I've not fully tested it yet, but this version of the patch lets us load a kexec kernel and boot into it.  This variant properly picks up all the other info we need from /proc/iomem, even when we have all the needed RAM.  I'll finish testing Monday, but if you all could try this out as well, that would be great.

Comment 12 Neil Horman 2009-06-08 16:35:15 UTC
Testing is closer on this one.  It fixes the vmcore-incomplete issue, but it truncates the vmcore to too small a size, meaning my math is off on the memory ranges.  I'll have a new patch soon.

Comment 13 Neil Horman 2009-06-10 20:03:31 UTC
Created attachment 347284 [details]
version 3 of the honor mem= patch

OK, here's version 3 of the patch.  It works properly for me on x86_64 at the moment, but it's going to need lots of testing.  Specifically, it will need testing on x86_64, x86, and ia64, with and without the mem= parameter on the normal kernel.  I'm working on x86, but if you all could test the four cases of x86 with and without mem= and ia64 with and without mem=, that would be a big help to me.  Thanks!

Comment 14 Neil Horman 2009-06-10 20:03:55 UTC
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=1837884

Here's the build to test with.  Thanks!

Comment 15 Qian Cai 2009-06-11 14:31:05 UTC
I have not tested with the "mem=" option yet, but the test results did not look good without the option.  The generated VMCores seem broken.

crash 4.0-8.9.el5
Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.
 
NOTE: stdin: not a tty

GNU gdb 6.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

crash: read error: kernel virtual address: ffff81022d68ea84  type: "array cache limit"
crash: unable to initialize kmem slab cache subsystem


WARNING: cannot access vmalloc'd module memory


crash: cannot read pid_hash node pid_link
[the message above was repeated 92 times in total]
crash: read error: kernel virtual address: ffff81022fc38000  type: "fill_thread_info"
crash: read error: kernel virtual address: ffff81022fc16000  type: "fill_thread_info"
crash: read error: kernel virtual address: ffff81022ff7d7a0  type: "fill_task_struct"
crash: read error: kernel virtual address: ffff81022f0b07e0  type: "fill_task_struct"
crash: read error: kernel virtual address: ffff81022ffd9820  type: "fill_task_struct"
crash: read error: kernel virtual address: ffff81022ff72820  type: "fill_task_struct"
crash: read error: kernel virtual address: ffff81022f0fe7a0  type: "fill_task_struct"
crash: read error: kernel virtual address: ffff81022f0ad100  type: "fill_task_struct"
crash: read error: kernel virtual address: ffff81022c11c7a0  type: "fill_task_struct"
crash: read error: kernel virtual address: ffff81022fa347e0  type: "fill_task_struct"
crash: read error: kernel virtual address: ffff81022ff70040  type: "fill_task_struct"
crash: read error: kernel virtual address: ffff81022ff720c0  type: "fill_task_struct"
crash: read error: kernel virtual address: ffff81022fc0e000  type: "fill_thread_info"
crash: read error: kernel virtual address: ffff81022ff7d040  type: "fill_task_struct"
crash: read error: kernel virtual address: ffff81022f0af040  type: "fill_task_struct"
crash: read error: kernel virtual address: ffff81022f0c3100  type: "fill_task_struct"
crash: read error: kernel virtual address: ffff81022ff71080  type: "fill_task_struct"
crash: read error: kernel virtual address: ffff81022f0ad860  type: "fill_task_struct"
crash: read error: kernel virtual address: ffff81022ff707a0  type: "fill_task_struct"
crash: read error: kernel virtual address: ffff81022ff7e080  type: "fill_task_struct"
crash: read error: kernel virtual address: ffff81022ff79860  type: "fill_task_struct"
crash: read error: kernel virtual address: ffff81022fc0c000  type: "fill_thread_info"
crash: read error: kernel virtual address: ffff81022ffcd080  type: "fill_task_struct"
crash: read error: kernel virtual address: ffff81022f0c3860  type: "fill_task_struct"
crash: read error: kernel virtual address: ffff81022ff717e0  type: "fill_task_struct"
WARNING: active task ffff81022914a080 on cpu 0 not found in PID hash

crash: read error: kernel virtual address: ffff81022914a080  type: "fill_task_struct"

crash: task does not exist: ffff81022914a080


crash 4.0-8.9.el5
Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.
 
NOTE: stdin: not a tty

GNU gdb 6.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "ia64-unknown-linux-gnu"...

WARNING: cannot read ia64_boot_param: memory verification will not be performed

crash: read error: kernel virtual address: e0000040f1b2b484  type: "array cache limit"
crash: unable to initialize kmem slab cache subsystem

crash: read error: kernel virtual address: e0000040ffc6c000  type: "pmd page"

Comment 17 Neil Horman 2009-06-11 15:11:56 UTC
Dang it, I see the problem.  Unfortunately, I'm not sure how to fix it.  The needed space that we have to add back can only be computed after we know information that we gather during the /proc/iomem interrogation.  Grr, we'd need two passes through that code.

I have another approach we can try.  Give me a bit.

Comment 18 Neil Horman 2009-06-12 20:02:42 UTC
Created attachment 347659 [details]
new approach to honoring mem=

Ok, here's a new approach to honoring the mem= parameter.  Since computing the real size of the /proc/vmcore file is somewhat difficult, and very invasive to kexec, I'd like to go with this safer approach.  It's not a great solution, but I think it's safer than all the rewriting we'd have to do in kexec.  It basically tells mkdumprd how big, at minimum, /proc/vmcore should be, based on the amount of memory in /proc/meminfo.  As long as we copy more than that amount of data, we ignore read errors from cp.  I've tested this and it works well for me.  Cai, can you please confirm?  Thanks!
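A minimal sketch of what that approach amounts to in the kdump save path (names and details are illustrative and not the actual mkdumprd change; in particular, the real patch presumably records the expected size from the first kernel's /proc/meminfo when the initrd is built, rather than reading it at dump time):

function save_core()
{
        coredir="/var/crash/`date +"%Y-%m-%d-%H:%M"`"
        mkdir -p $coredir

        # Minimum amount of data a usable vmcore must contain, derived from
        # MemTotal (kB -> bytes).
        min_size=$(( $(awk '/^MemTotal:/ {print $2}' /proc/meminfo) * 1024 ))

        cp /proc/vmcore $coredir/vmcore-incomplete
        rc=$?
        copied=`stat -c %s $coredir/vmcore-incomplete`

        # Treat a failed cp as success if at least min_size was copied: the
        # failure is then just the read error past the mem= boundary.
        if [ $rc -eq 0 -o $copied -ge $min_size ]; then
                mv $coredir/vmcore-incomplete $coredir/vmcore
                $LOGGER "saved a vmcore to $coredir"
        else
                $LOGGER "failed to save a vmcore to $coredir"
        fi
}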

Comment 19 Qian Cai 2009-06-15 07:48:46 UTC
Applying the patch on top of the -73.el5 version, things are getting interesting:

* If kdump is configured to use makedumpfile (-c -d 31) to capture VMCores,
  it took hours during the capture process. Normally, it should only take a
  few minutes on the ia64 system.

* Even without using makedumpfile, the cp command is still trying to copy
  the full VMCore,

  EXT3 FS on dm-0, internal journal
  EXT3-fs: mounted filesystem with ordered data mode.
  Copied 12910.7 MB / 97768.2 MB
  ...

  although the first kernel showed that the "mem=" option was working,

  # cat /proc/cmdline 
  BOOT_IMAGE=scsi0:EFI\redhat\vmlinuz-2.6.18-153.el5 root=/dev/VolGroup00/LogVol01
  crashkernel=512M@256M mem=1024M ro

  # free -m
             total       used       free     shared    buffers     cached
Mem:           425        417          7          0         12        229
-/+ buffers/cache:        175        249
Swap:         4095          0       4095

Comment 20 Qian Cai 2009-06-15 07:51:53 UTC
Please provide a brew build for any patch you would like me to test in the future
if possible, so it is easier for me to run the automated tests on all platforms.

Comment 21 Neil Horman 2009-06-15 10:52:26 UTC
What you're seeing isn't surprising at all.  The size argument below:
Copied 12910.7 MB / 97768.2 MB
results from the fact that the second value is taken directly from the file size.  I'll adjust that accordingly in a bit.  Currently the number is off, though, as you'll see since the copy ends early.

As for the slowness issue, this patch isn't going to have any effect on how fast makedumpfile runs; that code hasn't changed at all.  You might be experiencing bz 493127, as makedumpfile may be building a big memory map to decide which pages to filter, but makedumpfile should run in exactly the same length of time that it does without this patch, given the same options.  If you can demonstrate that makedumpfile is running more slowly with this patch than without it, let's track that in a new bug, as you've almost certainly found a different problem there.

I'll make the additional changes noted above, and will check this in shortly.  Thanks!

Comment 22 Qian Cai 2009-06-15 13:23:36 UTC
Neil, the size isn't normal, as you can see from the above -- mem=1024M was used.

So it means that the copying of VMCores did not honour the "mem" option at all.

Copied 12910.7 MB / 97768.2 MB

The above was just a snapshot of the capture process. In other words, it was still copying, and had not yet finished beyond 12910.7 MB.

Comment 23 Neil Horman 2009-06-15 14:10:15 UTC
Cai, please read my response more closely.  I specifically acknowledged that the size of /proc/vmcore is incorrect.  The code in the kexec utility that is responsible for determining that size is convoluted at best, and is used for several things that no longer really track.  The short version of the story is that I can't easily change the code that tells us how big /proc/vmcore should be without potentially breaking several other things.  So instead I came up with the patch above, which independently computes at run time what the size of /proc/vmcore _should_ be, so that it knows when it has a full vmcore, even if it encounters errors in the copy process.  It honors the mem= option in that you get a vmcore file (rather than vmcore-incomplete) when you specify a mem= line on the kernel.  I tested this, and it works, in that (as I said above) we get a vmcore file.  As for the copy sizes above, I explained that I'm going to correct those.  If there is something beyond those two items, please elaborate.

Comment 24 Qian Cai 2009-06-15 15:11:32 UTC
OK. The problem I was describing beyond those two items is:

If I specify mem=1024M, the kdump kernel still captures the FULL VMCore using the cp command. I have just checked the kdump kernel serial output, and it was still copying after the line I mentioned above,

Copied 12910.7 MB / 97768.2 MB

until,

Copied 97521.6 MB / 97768.2 MB
Saving core complete

Isn't the cp command supposed to get an error after saving 1024M of data?

If the patch was working for you, I suppose the problem I met might be something else on that ia64 system. Anyway, do you have a brew build with the patch? I would like to test it more thoroughly on all platforms in RHTS tomorrow when I am in the office. Thanks!

Comment 25 Neil Horman 2009-06-15 15:42:45 UTC
OK, that explains our discrepancy: what you describe as what should have happened for you is exactly how it did work for me.  I'm not sure why it wouldn't, although I did my testing on the HP system you pointed me to, and it worked fine for me, copying only enough data to satisfy the mem= command line.  Here's a brew build with the patch:
 http://brewweb.devel.redhat.com/brew/taskinfo?taskID=1844737
I'm not sure what else is going wrong; let me know what your test results are.

Comment 26 Neil Horman 2009-06-15 16:39:40 UTC
Wait a second, Cai, in comment #19, you showed this:

>although the first kernel showed that "mem=" option was working,
>
>  # cat /proc/cmdline 
>  BOOT_IMAGE=scsi0:EFI\redhat\vmlinuz-2.6.18-153.el5
>root=/dev/VolGroup00/LogVol01
>  crashkernel=512M@256M mem=1024M ro
>
>  # free -m
>             total       used       free     shared    buffers     cached
>Mem:           425        417          7          0         12        229
>-/+ buffers/cache:        175        249
>Swap:         4095          0       4095  

But that free command looks like it came from a point in time when the kdump kernel was booted (total memory shows 425M, which matches up to the 512M crashkernel reserve minus what's used by the resident kernel + initramfs).  Do you have /proc/cmdline and free output from the kernel prior to the kdump kernel booting?  Let's be sure that the mem= line was working properly there before we go looking for problems.

Comment 27 Qian Cai 2009-06-16 02:47:30 UTC
(In reply to comment #26)
> 
> But that free command looks like it came from a point in time when the kdump
> kernel was booted (total memory shows 425M, which matches up to the 512M
> crashkernel reserve minus what's used by the resident kernel + initramfs).
> Do you have /proc/cmdline and free output from the kernel prior to the kdump
> kernel booting?  Let's be sure that the mem= line was working properly there
> before we go looking for problems.

Neil, the above was taken from the first kernel, as you can see from /proc/cmdline, since the file in the second kernel will normally include options such as maxcpus, irqpoll, etc., like:

BOOT_IMAGE=scsi0:EFI\redhat\vmlinuz-2.6.18-153.el5 root=/dev/VolGroup00/LogVol00  mem=1024m ro irqpoll maxcpus=1 reset_devices machvec=dig machvec=dig  elfcorehdr=655248K max_addr=640M min_addr=128M

Comment 28 Neil Horman 2009-06-16 11:01:12 UTC
Nm, my bad, I forgot to account for the crashkernel memory in the normal kernel.  If you set mem=1024m and then set crashkernel=512M, you're only left with 512M of usable memory anyway; I'm just not used to seeing an ia64 system run nominally on 512M of RAM.

Regardless, I tried this again on an x86_64 system and it worked well, so whatever this is must be some ia64 idiosyncrasy.

Something else also occurred to me.  There should be no reason that we are unable to reach memory beyond the mem= boundary (even if we don't need to).  I wonder if the use of mem=1024M on the kdump kernel command line is interfering with the operation of kdump here.  Understanding that reading that extra memory might not always be desirable, can you run the current kdump on a system in which mem= is set on the nominal command line, but not on the kdump command line?  If we can do that without getting read errors from kdump, then we probably need to consider just leaving this all alone, since I can see reasons why people might want to copy that memory.

Comment 29 Qian Cai 2009-06-16 12:00:30 UTC
Here are the results using kexec-tools-1.102pre-73.el5.bz239791 and the "mem=1024m" option.

without using makedumpfile,
---------------------------
i386: UNKNOWN

x86_64: PASSED, but we were getting these warnings during initialization when analysing with the crash utility.
WARNING: /var/crash/127.0.0.1-2009-06-16-04:00:22/vmcore: may be truncated or incomplete
         PT_LOAD p_offset: 21617580
                 p_filesz: 3739877376
           bytes required: 3761494956
            dumpfile size: 944365568

ppc: FAILED. The VMCore is the same size as without setting "mem=1024m".
-r-------- 1 root root 7861586140 Jun 16 04:02 /var/crash/127.0.0.1-2009-06-16-03:57:40/vmcore

ia64: FAILED. It was still copying after 9502.34 MB, which caused the test system to run out of disk space.



using makedumpfile -c -d 31,
----------------------------

i386: UNKNOWN; without the "mem=" option, the size of the VMCore taken from the full memory (2G) is
11401559
with the option, the size is
10575798

x86_64: UNKNOWN; it ran out of disk space.

ppc: UNKNOWN, but the resulting VMCore from 1G memory was much bigger than the one from the 3.7G full memory,
1G:   700996036
3.7G: 26951362

ia64: UNKNOWN

Comment 30 Neil Horman 2009-06-16 13:05:09 UTC
Ok, were you using mem=1024m only in the production kernel, or in both the production and kdump kernels?

Comment 31 Qian Cai 2009-06-16 13:36:34 UTC
Both.  I can try not using the "mem" option in the kdump kernel tomorrow to see if there is any difference.

Comment 32 Neil Horman 2009-06-16 13:47:07 UTC
Yes, please.  That is what I was trying to ask you to test in comment #28.

Comment 33 Qian Cai 2009-06-17 12:02:33 UTC
Neil, removing the "mem" option from the kdump kernel did not seem to make any difference on the ia64 system I have just tested -- it still saved the FULL memory without honouring the "mem" setting in the first kernel.  The test machine I used was:

altix4.rhts.bos.redhat.com

Feel free to grab it.

Comment 34 Neil Horman 2009-06-17 12:27:00 UTC
Yeah, that's expected.  What I'm more interested in is whether it saves the whole core on systems that previously exhibited the problem (i.e. x86_64).  My thought above was that kexec sets up the vmcore ELF header to reference all the memory blocks in the system (based on /proc/iomem).  It occurred to me that, despite limiting the memory seen by the OS (with mem=), users may still want to record that additional memory (it can be used by devices for DMA, etc.).  As such it would be wrong to always exclude it from a vmcore.  But we're still seeing this problem in which, when you use mem=, you get I/O errors when you read beyond what's specified with mem=.  My thought is that using mem= limits the page tables such that accesses beyond that range fail.  By not specifying mem= on the kdump kernel command line, we can avoid that error.  Then, should people not want that additional memory beyond mem= (if it's truly unused), collecting the core with makedumpfile -d 31 should exclude it all (as zero pages).  Sorry, I thought I was clear previously.
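For reference, the filtering mentioned above is the same invocation already used in the tests in comment 19: -d 31 excludes zero, cache, user, and free pages, and -c compresses what remains.  A typical call (using the same $coredir convention as the kdump scripts quoted earlier) would look like:

  makedumpfile -c -d 31 /proc/vmcore $coredir/vmcore-incomplete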

Comment 35 Qian Cai 2009-06-18 08:40:00 UTC
When the "mem" option was removed from the kdump kernel, there were no read errors anymore on the x86_64 machine I have just tested.  It captured the full VMCore.

Comment 36 Neil Horman 2009-06-18 10:51:50 UTC
Ok, thank you.  I think the best solution to this problem, then, is to remove the mem= parameter from the kdump kernel command line while loading, and tell people to use aggressive makedumpfile filtering if they don't want that memory.  That way, if people want to see the additional memory in the system, they can.
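A rough sketch of what that fix looks like at kernel-load time (variable names are illustrative, not the actual kdump service script): build the capture kernel's command line from /proc/cmdline but strip any mem= argument, so the capture kernel can read above the first kernel's mem= boundary.

KDUMP_KERNEL=/boot/vmlinuz-`uname -r`
KDUMP_INITRD=/boot/initrd-`uname -r`kdump.img

# Reuse the running kernel's command line, minus mem= (and crashkernel=,
# which is meaningless in the capture kernel).
KDUMP_COMMANDLINE=`sed -e 's/mem=[^ ]*//g' -e 's/crashkernel=[^ ]*//g' /proc/cmdline`

kexec -p --command-line="$KDUMP_COMMANDLINE irqpoll maxcpus=1 reset_devices" \
        --initrd=$KDUMP_INITRD $KDUMP_KERNEL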

Comment 41 Neil Horman 2009-11-25 02:44:16 UTC
No, that's just an artifact of how makedumpfile works.  We save dumps to vmcore-incomplete and then move it to vmcore when the dump is done and we're sure we have it all.  To do that with makedumpfile, we have to tell makedumpfile to save to the name vmcore-incomplete, and if you crank up the message output level as you've done, you get a message telling you that that's the filename you're saving to.  There's no sane way around that, as makedumpfile is printing out the correct save name.  If you don't want to see that, don't turn up the message output level.
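In script form, the flow described above looks roughly like this (illustrative; the exact invocation in the generated kdump initrd may differ):

# Save to a temporary name and rename it only once makedumpfile has finished
# successfully, so a partial dump is never mistaken for a complete one.
makedumpfile -c -d 31 /proc/vmcore $coredir/vmcore-incomplete && \
        mv $coredir/vmcore-incomplete $coredir/vmcore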

Comment 46 Ruediger Landmann 2010-03-19 04:22:34 UTC
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
During a core dump, kexec creates a header to reference all the memory
blocks in the system. Previously, the kdump kernel command line supported a 
"mem=" parameter that limited the memory that was dumped. When this parameter 
was set, kexec could not read past the limit when creating the header, and the
dump would result in an I/O error. The "mem=" parameter has been removed
from kexec to ensure that core dumps succeed. Users of kexec should use
makedumpfile filtering to hide superfluous results.

Comment 47 errata-xmlrpc 2010-03-30 07:47:27 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0179.html