Bug 458435

Summary: makedumpfile could not compress vmcore
Product: Red Hat Enterprise Linux 5
Component: kexec-tools
Version: 5.2
Hardware: ia64
OS: Linux
Status: CLOSED WORKSFORME
Severity: low
Priority: low
Target Milestone: rc
Reporter: Qian Cai <qcai>
Assignee: Neil Horman <nhorman>
QA Contact: Red Hat Kernel QE team <kernel-qe>
Doc Type: Bug Fix
Last Closed: 2008-08-11 20:22:27 UTC

Description Qian Cai 2008-08-08 11:07:35 UTC
Description of problem:
On an IA64 machine (hp-rx2660-03.rhts.bos.redhat.com), makedumpfile -c does not seem to have any effect. The system has a lot of memory. A few other IA64 boxes I have tried do not show this problem. If vmcore compression is not working on this machine, saving the vmcore takes a long time and a lot of disk space.

[root@hp-rx2660-03 ~]# ls -lh /var/crash/127.0.0.1-2008-08-08-05\:52\:07/vmcore
-rw------- 1 root root 31G Aug  8 06:25 /var/crash/127.0.0.1-2008-08-08-05:52:07/vmcore

[root@hp-rx2660-03 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:         31965        640      31325          0         33        270
-/+ buffers/cache:        336      31629
Swap:         1983          0       1983

/etc/kdump.conf,
ext3 /dev/mapper/VolGroup00-LogVol00
core_collector makedumpfile -c

Output from crash when opening this vmcore,
[root@hp-rx2660-03 ~]# crash  /usr/lib/debug/lib/modules/2.6.18-92.el5/vmlinux /var/crash/127.0.0.1-2008-08-08-05\:52\:07/vmcore 

crash 4.0-5.0.3
Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.
 
GNU gdb 6.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "ia64-unknown-linux-gnu"...

      KERNEL: /usr/lib/debug/lib/modules/2.6.18-92.el5/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2008-08-08-05:52:07/vmcore
        CPUS: 4
        DATE: Fri Aug  8 05:51:09 2008
      UPTIME: 00:05:09
LOAD AVERAGE: 0.33, 0.26, 0.12
       TASKS: 114
    NODENAME: hp-rx2660-03.rhts.bos.redhat.com
     RELEASE: 2.6.18-92.el5
     VERSION: #1 SMP Tue Apr 29 13:18:26 EDT 2008
     MACHINE: ia64  (1594 Mhz)
      MEMORY: 31.4 GB
       PANIC: "SysRq : Trigger a crashdump"
         PID: 4355
     COMMAND: "bash"
        TASK: e0000100e3800000  [THREAD_INFO: e0000100e3801040]
         CPU: 2
       STATE: TASK_RUNNING (SYSRQ)

crash> 


Version-Release number of selected component (if applicable):
kernel-2.6.18-92.el5
kexec-tools-1.102pre-21.el5

How reproducible:
always

Comment 1 Qian Cai 2008-08-08 11:08:52 UTC
readelf could not read it though,

[root@hp-rx2660-03 ~]# readelf -a /var/crash/127.0.0.1-2008-08-08-05\:52\:07/vmcore 
readelf: Error: Not an ELF file - it has the wrong magic bytes at the start
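
The readelf error is consistent with the file not being ELF at all: makedumpfile -c writes its kdump-compressed format rather than ELF. A minimal sketch for checking the leading bytes of a dump follows. The function name classify_dump is hypothetical, and the 8-byte "KDUMP   " signature is an assumption about the kdump-compressed header that should be verified against the installed makedumpfile sources.

```python
# Sketch only: classify a dump file by its leading bytes.
ELF_MAGIC = b"\x7fELF"
# Assumption: the kdump-compressed header starts with "KDUMP" plus three spaces.
KDUMP_SIGNATURE = b"KDUMP   "


def classify_dump(path):
    """Return 'elf', 'kdump-compressed', or 'unknown' for the file at path."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(ELF_MAGIC):
        return "elf"
    if head == KDUMP_SIGNATURE:
        return "kdump-compressed"
    return "unknown"


if __name__ == "__main__":
    import sys

    # Usage: python classify_dump.py /path/to/vmcore [more files...]
    for p in sys.argv[1:]:
        print(p, classify_dump(p))
```

If the file reports as kdump-compressed, crash can still open it (as shown in the description), while ELF-only tools such as readelf cannot.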

Comment 2 Neil Horman 2008-08-08 11:19:38 UTC
Can you capture the core without the use of makedumpfile, and then run makedumpfile manually to see what errors, if any, are produced?

Also, you may want to try the latest kexec-tools from brew. An IA64 problem was fixed in release -25.el5 or so that may have a bearing on the output of makedumpfile (making this a duplicate of bz 449111). Thanks!

Comment 3 Qian Cai 2008-08-08 16:29:41 UTC
I logged into the capture kernel and ran the following command,

# makedumpfile --message-level 15 -c /proc/vmcore good

and there is no error message,

LOAD (0)
  phys_start : 4000000
  phys_end   : 4638ce0
  virt_start : a000000100000000
  virt_end   : a000000100638ce0
LOAD (1)
  phys_start : 0
  phys_end   : 1000
  virt_start : e000000000000000
  virt_end   : e000000000001000
LOAD (2)
  phys_start : 1000
  phys_end   : a0000
  virt_start : e000000000001000
  virt_end   : e0000000000a0000
LOAD (3)
  phys_start : 100000
  phys_end   : 101000
  virt_start : e000000000100000
  virt_end   : e000000000101000
LOAD (4)
  phys_start : 101000
  phys_end   : 4000000
  virt_start : e000000000101000
  virt_end   : e000000004000000
LOAD (5)
  phys_start : 4000000
  phys_end   : 4db3000
  virt_start : e000000004000000
  virt_end   : e000000004db3000
LOAD (6)
  phys_start : 4db3000
  phys_end   : 8000000
  virt_start : e000000004db3000
  virt_end   : e000000008000000
LOAD (7)
  phys_start : 28000000
  phys_end   : 3e862000
  virt_start : e000000028000000
  virt_end   : e00000003e862000
LOAD (8)
  phys_start : 3eb86000
  phys_end   : 3ee7a000
  virt_start : e00000003eb86000
  virt_end   : e00000003ee7a000
LOAD (9)
  phys_start : 3fc00000
  phys_end   : 3fdd8000
  virt_start : e00000003fc00000
  virt_end   : e00000003fdd8000
LOAD (10)
  phys_start : 3fdd8000
  phys_end   : 3fde4000
  virt_start : e00000003fdd8000
  virt_end   : e00000003fde4000
LOAD (11)
  phys_start : 100000000
  phys_end   : 7ffffe000
  virt_start : e000000100000000
  virt_end   : e0000007ffffe000
LOAD (12)
  phys_start : 10040000000
  phys_end   : 100fea88000
  virt_start : e000010040000000
  virt_end   : e0000100fea88000
LOAD (13)
  phys_start : 100fea88000
  phys_end   : 100fefa0000
  virt_start : e0000100fea88000
  virt_end   : e0000100fefa0000
LOAD (14)
  phys_start : 100fefa0000
  phys_end   : 100feffe000
  virt_start : e0000100fefa0000
  virt_end   : e0000100feffe000
LOAD (15)
  phys_start : 100ff000000
  phys_end   : 100ff052000
  virt_start : e0000100ff000000
  virt_end   : e0000100ff052000
LOAD (16)
  phys_start : 100ff052000
  phys_end   : 100ff801000
  virt_start : e0000100ff052000
  virt_end   : e0000100ff801000
LOAD (17)
  phys_start : 100ff801000
  phys_end   : 100ff8d4000
  virt_start : e0000100ff801000
  virt_end   : e0000100ff8d4000
LOAD (18)
  phys_start : 100ff8d4000
  phys_end   : 100ff8d6000
  virt_start : e0000100ff8d4000
  virt_end   : e0000100ff8d6000
LOAD (19)
  phys_start : 100ff8d6000
  phys_end   : 100ff8d8000
  virt_start : e0000100ff8d6000
  virt_end   : e0000100ff8d8000
LOAD (20)
  phys_start : 100ff8d8000
  phys_end   : 100ffa00000
  virt_start : e0000100ff8d8000
  virt_end   : e0000100ffa00000
LOAD (21)
  phys_start : 100ffa00000
  phys_end   : 100ffc2e000
  virt_start : e0000100ffa00000
  virt_end   : e0000100ffc2e000
LOAD (22)
  phys_start : 100ffc2e000
  phys_end   : 100ffc70000
  virt_start : e0000100ffc2e000
  virt_end   : e0000100ffc70000
LOAD (23)
  phys_start : 100ffcda000
  phys_end   : 100ffe00000
  virt_start : e0000100ffcda000
  virt_end   : e0000100ffe00000
LOAD (24)
  phys_start : 100ffe00000
  phys_end   : 100ffe40000
  virt_start : e0000100ffe00000
  virt_end   : e0000100ffe40000
LOAD (25)
  phys_start : 100ffe80000
  phys_end   : 100fffe4000
  virt_start : e0000100ffe80000
  virt_end   : e0000100fffe4000

max_mapnr    : 403fff9
mem_map (0)
  mem_map    : 0
  pfn_start  : 0
  pfn_end    : 403fff9
[  0 %]
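
To see how much memory each LOAD segment in a listing like the one above actually covers, the phys_start and phys_end values can be subtracted per segment. This is a rough sketch: the helper name load_sizes and the regular expression are illustrative assumptions about the message format shown here, and only the first two segments from the listing are used as sample input.

```python
import re

# Matches one "LOAD (n)" entry and its physical range as printed above.
LOAD_RE = re.compile(
    r"LOAD \((\d+)\)\s+"
    r"phys_start\s*:\s*([0-9a-f]+)\s+"
    r"phys_end\s*:\s*([0-9a-f]+)"
)


def load_sizes(text):
    """Map each LOAD index to its size in bytes (phys_end - phys_start)."""
    return {
        int(idx): int(end, 16) - int(start, 16)
        for idx, start, end in LOAD_RE.findall(text)
    }


# First two segments copied from the output above.
SAMPLE = """\
LOAD (0)
  phys_start : 4000000
  phys_end   : 4638ce0
  virt_start : a000000100000000
  virt_end   : a000000100638ce0
LOAD (1)
  phys_start : 0
  phys_end   : 1000
  virt_start : e000000000000000
  virt_end   : e000000000001000
"""

if __name__ == "__main__":
    sizes = load_sizes(SAMPLE)
    for idx, size in sorted(sizes.items()):
        print(f"LOAD ({idx}): {size} bytes")
    print("total:", sum(sizes.values()), "bytes")
```

Feeding the full listing to load_sizes and summing the values gives the total physical memory described by the dump, which can be compared against the size of the resulting vmcore.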

I have also tried the z-stream version of kexec-tools, 1.102pre-21.el5_2.2, which includes the fix for the BZ you mentioned, and it has the same problem here.

Comment 4 Neil Horman 2008-08-08 17:31:37 UTC
Ok, so no error, but no reduction in size of the vmcore either? Can you bring up the system you're testing on so I can look at it myself (hp-rx2660 seems to be down ATM).  Thanks

Comment 5 Qian Cai 2008-08-10 11:27:37 UTC
Please make a new reservation of hp-rx2660-03.rhts.bos.redhat.com from RHTS. I could reserve it for you now, but the reservation will likely have expired by the time you come online.

Comment 6 Neil Horman 2008-08-11 15:16:57 UTC
So I just manually took an uncompressed dump and ran it through makedumpfile with this command:

makedumpfile -c ./vmcore ./vmcore.comp

And I got this result:

-r-------- 1 root root 33808410252 Aug 11 09:38 vmcore
-rw------- 1 root root 32998696809 Aug 11 10:20 vmcore.comp

That's about a 2% reduction in size, which IIRC is pretty typical for compression in makedumpfile.
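
The figure can be checked directly from the two file sizes in the ls output above. A small sketch of the arithmetic, with variable names chosen only for illustration:

```python
# Sizes in bytes, taken from the ls output above.
original = 33808410252    # vmcore
compressed = 32998696809  # vmcore.comp

saved = original - compressed
percent = saved / original * 100
saved_mib = saved / (1024 * 1024)

if __name__ == "__main__":
    print(f"saved {saved} bytes ({saved_mib:.1f} MiB), {percent:.2f}% smaller")
```

This works out to a little under 2.4%, or roughly 770 MiB saved.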

So I appear unable to reproduce.  I've got hp-rx2660-03.rhts.bos.redhat.com reserved for the next few days where these cores live, if you want to try to reproduce the problem.

Comment 7 Qian Cai 2008-08-11 18:33:53 UTC
OK, I believe I saw the same thing here. I'd expect the compression ratio to be much higher than 2%; otherwise it does not help much in this case. Anyway, feel free to close this bug if 2% is as expected.

Comment 8 Neil Horman 2008-08-11 20:22:27 UTC
I agree, it's not a great savings when you look at it as just two percent, but that's several hundred MB smaller. Sorry.