Bug 1240497
Summary: | qemu-kvm-rhev: dump-guest-memory creates invalid header with format kdump-{zlib,lzo,snappy} on ppc64 | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Dan Zheng <dzheng> |
Component: | qemu-kvm-rhev | Assignee: | Laurent Vivier <lvivier> |
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 7.3 | CC: | abologna, anderson, dgibson, dyuan, fjin, gsun, jsuchane, lhuang, lilu, lvivier, michen, mrezanin, mzhan, qzhang, rbalakri, thuth, virt-maint, xuhan, xuma, zhengtli |
Target Milestone: | rc | Keywords: | Reopened |
Target Release: | --- | ||
Hardware: | ppc64le | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | qemu-kvm-rhev-2.6.0-21.el7 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2016-11-07 20:27:06 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1288337 |
Description
Dan Zheng
2015-07-07 06:25:07 UTC
On ppc64le the format of the dump file changes depending on how I've obtained it:

  run 'virsh dump' -> QEMU suspend to disk image
  crash the guest  -> ELF 64-bit LSB core file 64-bit PowerPC \
                      or cisco 7500, version 1 (SYSV), SVR4-style

I'm trying to check whether the same is true on x86_64 as well, but so far I've been unable to obtain a dump by crashing the guest for some reason... Can you please run 'file' on both x86_64 dumps and post the output?

The following packages are used:

  kernel-3.10.0-319.el7.x86_64
  libvirt-1.2.17-9.el7.x86_64
  qemu-kvm-rhev-2.3.0-26.el7.x86_64
  kernel-debuginfo-common-x86_64-3.10.0-319.el7.x86_64
  kernel-debuginfo-3.10.0-319.el7.x86_64

Guest: 3.10.0-319.el7.x86_64

I cannot make crash read the dumpfile from virsh dump on x86. This problem did not exist before. Should a new bug be filed?

Guest XML:

  <on_crash>coredump-restart</on_crash>
  <devices>
    ...
    <panic/>
  </devices>

# virsh dump d2 /tmp/dump6
Domain d2 dumped to /tmp/dump6

# crash /usr/lib/debug/lib/modules/3.10.0-319.el7.x86_64/vmlinux /tmp/dump6
crash 7.1.2-2.el7
Copyright (C) 2002-2014 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

crash: /tmp/dump6: initialization failed

********************************************
crash the guest. See attachment d2-2015-09-28-17:10:57.
# file d2-dump-from-guest d2-dump-from-guest: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style # file dump6 dump6: QEMU suspend to disk image FWIW, I had the same problem today - "crash" did not like the dump that I obtained with "virsh dump". However, after playing a little bit with the parameters, I found out that you can use "virsh dump --format elf --memory-only ..." to create a dump file that is usable with "crash" - so this seems to be just a question of using the right options? There seems be a regression problem. I did multiple versions of libvirt and qemu and found this problem was introduced from qemu-kvm-rhev-2.3.0-17.el7.x86_64. Before qemu-kvm-rhev-2.3.0-17.el7.x86_64, crash tool can read the dumpfile from virsh dump without any option. # virsh dump d2 /tmp/z72-16 Domain d2 dumped to /tmp/z72-16 # crash /usr/lib/debug/lib/modules/3.10.0-319.el7.x86_64/vmlinux /tmp/z72-16 KERNEL: /usr/lib/debug/lib/modules/3.10.0-319.el7.x86_64/vmlinux DUMPFILE: /tmp/z72-16 CPUS: 2 DATE: Tue Sep 29 17:39:21 2015 UPTIME: 00:00:07 LOAD AVERAGE: 0.00, 0.00, 0.00 TASKS: 112 NODENAME: localhost.localdomain RELEASE: 3.10.0-319.el7.x86_64 VERSION: #1 SMP Tue Sep 22 07:30:47 EDT 2015 MACHINE: x86_64 (3392 Mhz) MEMORY: 1 GB PANIC: "" PID: 0 COMMAND: "swapper/0" TASK: ffffffff81951440 (1 of 2) [THREAD_INFO: ffffffff8193c000] CPU: 0 STATE: TASK_RUNNING (ACTIVE) WARNING: panic task not found @Thomas: I believe the code paths used when analyzing a dump created by 'virsh dump' and one produced by a guest crash are different because of the different file formats, as shown in Comment 2 and confirmed in Comment 4. So forcing a different format will change the result, but crash should be able to handle both formats the way it already does for x86. @Dan: can you please file a separate BZ against qemu-kvm-rhev for this regression? This BZ is to track the fact that crash arbitrarily rejects ppc64 dump files. Thanks. 
To add to the above, kvmdump_init() in kvmdump.c contains the following snippet:

    if (!machine_type("X86") && !machine_type("X86_64")) {
            error(FATAL,
                "invalid or unsupported host architecture for KVM: %s\n",
                MACHINE_TYPE);
            return FALSE;
    }

Is there a technical reason why ppc64 dump files are not handled, or is it just something that's not been implemented yet? Cheers.

> To add to the above, kvmdump_init() in kvmdump.c
> contains the following snippet:
>
>     if (!machine_type("X86") && !machine_type("X86_64")) {
>             error(FATAL,
>                 "invalid or unsupported host architecture for KVM: %s\n",
>                 MACHINE_TYPE);
>             return FALSE;
>     }

The snippet above is associated with the use of the original "virsh dump" facility that used the old qemu migration file format, which for all practical purposes is no longer being supported even on x86/x86-64. That code was written by Paolo Bonzini, supporting only x86 and x86_64, and was pretty much unsuitable for use as a kernel crash dump, primarily because it is not "random-access". And what's worse, the format itself kept changing.

Anyway, that format was a "holdover" for use until kdump became supported in KVM guests, and subsequent to that, the newer "virsh dump --memory-only" format, which writes the dumpfile in ELF format.

If you're asking whether "virsh dump" in the old qemu migration file format will be developed further, much less supported, the answer is no. I wish it had never been introduced in the first place.

I have never seen a ppc64 dumpfile created with "virsh dump --memory-only", so I can't comment on whether it works with the crash utility out of the box -- but it should work since it's ELF format.

> Steps to Reproduce:
> 1. Start guest and run # virsh dump guest /tmp/dump. The dump is created.
Don't do this -- use "virsh dump --memory-only". And actually I was not
even aware of the "--format elf" option. I was under the impression that
using just "--memory-only" alone implied that ELF format would be used.
Am I wrong about that?
I see this in the man page for "virsh dump": If --memory-only is specified, the file is elf file, and will only include domain's memory and cpu common register value. OK, so I was correct in that ELF is the default for --memory-only. And then there is this: --format string is used to specify the format of 'memory-only' dump, and string can be one of them: elf, kdump-zlib(kdump-compressed format with zlib-compressed), kdump-lzo(kdump-compressed format with lzo-compressed), kdump-snappy(kdump-compressed format with snappy-compressed). I was under the impression that any of the compressed kdump formats would be generated by running "makedumpfile" on the ELF dumpfile created with "--memory-only". But I see that it can be done automatically by using one of the "--format kdump-xxx" format strings. That being the case, proper QE testing for "virsh dump" should use: virsh dump --memory-only [--format-elf] virsh dump --memory-only --format kdump-zlib virsh dump --memory-only --format kdump-lzo virsh dump --memory-only --format kdump-snappy It makes no sense to continue testing it *without* --memory-only. ELF dumps seem to work just fine[1] on ppc64. I guess the documentation[2] should be updated then, given it currently states --memory-only [...] This option should be used in cases where running a full dump will fail. [1] All I've done is run crash on one and see that some information was reported and no error was raised [2] https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Virtualization_Deployment_and_Administration_Guide/sect-Managing_guest_virtual_machines_with_virsh-Domain_Commands.html#sect-Domain_Commands-Creating_a_dump_file_of_a_domains_core (In reply to Andrea Bolognani from comment #12) > ELF dumps seem to work just fine[1] on ppc64. Cool, good, thanks! > > I guess the documentation[2] should be updated then, > given it currently states > > --memory-only > [...] 
> This option should be used in cases where running > a full dump will fail. I don't know what that means, at least with respect to the crash utility. Are any other customers of "virsh dump"? As far as the crash utility is concerned, "virsh dump --memory-only" *is* a "full" dump, in that it contains all physical memory. All of the other stuff in the old migration file format is just unused junk that serves no purpose -- other than to unexpectedly change the file format, which will eventually cause the crash utility to "fail". In other words, what I'm saying is that ELF format should be used by default -- at least if the output is solely for use by the crash utility. (In reply to Andrea Bolognani from comment #12) > ELF dumps seem to work just fine[1] on ppc64. BTW, can you give me a pointer to your ppc64 dumpfile? I'd like to have a copy of it for further testing. File a bug for the regression problem in comment 6. https://bugzilla.redhat.com/show_bug.cgi?id=1267435 The ELF dumpfile comes up OK. But the snappy dumpfile fails: $ crash vmlinux-3.10.0-320.el7.ppc64le ../../Downloads/abologna-rhel72-0916-le.kdump-snappy.dump crash 7.1.2-2.el7 Copyright (C) 2002-2014 Red Hat, Inc. Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation Copyright (C) 1999-2006 Hewlett-Packard Co Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited Copyright (C) 2006, 2007 VA Linux Systems Japan K.K. Copyright (C) 2005, 2011 NEC Corporation Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc. Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc. This program is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Enter "help copying" to see the conditions. This program has absolutely no warranty. Enter "help warranty" for details. GNU gdb (GDB) 7.6 Copyright (C) 2013 Free Software Foundation, Inc. 
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "powerpc64le-unknown-linux-gnu"... please wait... (gathering module symbol data) crash: seek error: kernel virtual address: c0001e4000000000 type: "pmd page" $ Do the lzo and zlib formats also fail? I see that the snappy dumpfile was written out in the "flat format", which the crash utility can rearrange and read without having to run the dumpfile through "makedumpfile -R". But taking that capability of the crash utility out of the picture, I did the rearrange operation with makedumpfile first, and then ran crash on the resultant file -- with the same result: $ cat abologna-rhel72-0916-le.kdump-snappy.dump | makedumpfile -R snappy.R $ crash /home/boston/anderson/crash_utility/ppc64le_dumpfiles/vmlinux-3.10.0-320.el7.ppc64le snappy.R crash 7.1.2-2.el7 Copyright (C) 2002-2014 Red Hat, Inc. Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation Copyright (C) 1999-2006 Hewlett-Packard Co Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited Copyright (C) 2006, 2007 VA Linux Systems Japan K.K. Copyright (C) 2005, 2011 NEC Corporation Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc. Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc. This program is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Enter "help copying" to see the conditions. This program has absolutely no warranty. Enter "help warranty" for details. GNU gdb (GDB) 7.6 Copyright (C) 2013 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. 
Type "show copying" and "show warranty" for details. This GDB was configured as "powerpc64le-unknown-linux-gnu"... please wait... (gathering module symbol data) crash: seek error: kernel virtual address: c0001e4000000000 type: "pmd page" $ I have tested both lzo and zlib, they seem to fail in the same exact way. I'll try to see if I can make any sense out of the dumpfiles, I do see that crash is reading junk from the dumpfile from the get-go, i.e., long before it gets to the eventual failure. Now I'm wondering whether the "--format kdump-xxx" options even work on x86_64? There's clearly a bug with the "virsh dump --format kdump-xxx" facility when run on ppc64le. Here's why I say that. The "virsh dump --format kdump-xxx" options are supposed to simulate/mimic what the "makedumpfile -c" facility does with an ELF vmcore in order to create a compressed kdump. So let's do just that with the ELF vmcore you provided: # makedumpfile -c -x vmlinux abologna-rhel72-0916-le.elf.dump dumpfile.compressed Copying data : [100.0 %] | The dumpfile is saved to dumpfile.compressed. makedumpfile Completed. # That works... And now let's run crash on the resultant compressed dumpfile: # crash vmlinux dumpfile.compressed crash 7.1.3 Copyright (C) 2002-2014 Red Hat, Inc. Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation Copyright (C) 1999-2006 Hewlett-Packard Co Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited Copyright (C) 2006, 2007 VA Linux Systems Japan K.K. Copyright (C) 2005, 2011 NEC Corporation Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc. Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc. This program is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Enter "help copying" to see the conditions. This program has absolutely no warranty. Enter "help warranty" for details. GNU gdb (GDB) 7.6 Copyright (C) 2013 Free Software Foundation, Inc. 
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "powerpc64le-unknown-linux-gnu"... KERNEL: vmlinux DUMPFILE: dumpfile.compressed CPUS: 1 DATE: Thu Oct 1 09:28:49 2015 UPTIME: 00:01:58 LOAD AVERAGE: 2.41, 0.92, 0.34 TASKS: 100 NODENAME: localhost.localdomain RELEASE: 3.10.0-320.el7.ppc64le VERSION: #1 SMP Mon Sep 28 08:12:29 EDT 2015 MACHINE: ppc64le (3425 Mhz) MEMORY: 512 MB PANIC: "" PID: 0 COMMAND: "swapper/0" TASK: c0000000010b5190 [THREAD_INFO: c000000001120000] CPU: 0 STATE: TASK_RUNNING (ACTIVE) WARNING: panic task not found crash> I'm going to continue debugging the bogus dumpfiles to see if I can help describe *why* they're bogus, but clearly this is not a crash bug. I wonder whether it's some kind of endian issue -- is the host machine that was used to create these dumpfiles a big-endian ppc64? > I'm going to continue debugging the bogus dumpfiles to see if I can help > describe *why* they're bogus, but clearly this is not a crash bug. OK, here's the problem... If the crash utility is invoked with a -d debug level, the contents of the dumpfile header are displayed. So if I take the compressed ppc64le kdump referenced in comment 24, which was created using "makedumpfile -c" on your provided ELF dumpfile, the relevant part of the header is shown below: # crash -d1 vmlinux dumpfile.compressed ... [ cut ] ... 
diskdump_data:
          filename: dumpfile.compressed
             flags: 6 (KDUMP_CMPRS_LOCAL|ERROR_EXCLUDED)
               dfd: 3
               ofp: 0
      machine_type: 21 (EM_PPC64)
            header: 10020722fe0
           signature: "KDUMP   "
      header_version: 6
             utsname:
               sysname: Linux
              nodename: localhost.localdomain
               release: 3.10.0-320.el7.ppc64le
               version: #1 SMP Mon Sep 28 08:12:29 EDT 2015
               machine: ppc64le
            domainname: (none)
           timestamp:
                tv_sec: 0
               tv_usec: 0
              status: 1 (DUMP_DH_COMPRESSED_ZLIB)
          block_size: 65536
        sub_hdr_size: 1
       bitmap_blocks: 2
           max_mapnr: 8192
    total_ram_blocks: 0
       device_blocks: 0
      written_blocks: 0
         current_cpu: 0
             nr_cpus: 1
      tasks[nr_cpus]: 0
...

The key element above is the "block_size", which is the crashed system's page size. That is a fundamental value which is used, for example, to come up with the "max_mapnr" value of 8192, which basically took the crashed system's memory size of 512MB and divided it by the 64K page size. There are other subsequent fields in the "sub_header_kdump" (not shown above) which are also based upon the page size.

Now, if I take any of the 3 compressed kdumps, the headers show the problem:

# crash -d1 vmlinux abologna-rhel72-0916-le.kdump-zlib.dump
... [ cut ] ...
diskdump_data:
          filename: abologna-rhel72-0916-le.kdump-zlib.dump
             flags: 6 (KDUMP_CMPRS_LOCAL|ERROR_EXCLUDED) [FLAT]
               dfd: 3
               ofp: 0
      machine_type: 21 (EM_PPC64)
            header: 10034149ab0
           signature: "KDUMP   "
      header_version: 6
             utsname:
               sysname:
              nodename:
               release:
               version:
               machine: Unknown
            domainname:
           timestamp:
                tv_sec: 0
               tv_usec: 0
              status: 1 (DUMP_DH_COMPRESSED_ZLIB)
          block_size: 4096
        sub_hdr_size: 1
       bitmap_blocks: 8
           max_mapnr: 131072
    total_ram_blocks: 0
       device_blocks: 0
      written_blocks: 0
         current_cpu: 0
             nr_cpus: 1
      tasks[nr_cpus]: 0
...

Note that while the utsname section is pretty much empty, except for the "Unknown" machine type, that's not the problem. The bug is that it contains a bogus "block_size" value of 4096, which in turn leads to a bogus "max_mapnr" field, and then to other bogus values in the "sub_header_kdump" that follows.
The block_size value absolutely must be legitimate or the dumpfile becomes useless.

*** This bug has been marked as a duplicate of bug 1304222 ***

(In reply to Andrea Bolognani from comment #27)
>
> *** This bug has been marked as a duplicate of bug 1304222 ***

This is not a duplicate of BZ 1304222. BZ 1304222 exists to force users to use the "virsh dump --memory-only" format. That's fine, but has nothing to do with this bugzilla.

This bugzilla is targeting the --memory-only format, which fails when used in conjunction with --format on ppc64. And that's because the "virsh dump --memory-only --format" creates a header with an invalid block_size (page size) field in the header as described in comment #26.

(In reply to Dave Anderson from comment #28)
> This is not a duplicate of BZ 1304222. BZ 1304222 exists to force
> users to use the "virsh dump --memory-only" format. That's fine, but
> has nothing to do with this bugzilla.
>
> This bugzilla is targeting the --memory-only format, which fails when
> used in conjunction with --format on ppc64. And that's because the
> "virsh dump --memory-only --format" creates a header with an invalid
> block_size (page size) field in the header as described in comment #26.

Dave, sorry for the noise. I didn't go through the BZ properly after not looking at it for a while, hence the error in closing it as a duplicate when it's clearly not one. Thanks for spotting and pointing out my mistake :)

*** Bug 1353835 has been marked as a duplicate of this bug. ***

I have verified that libvirt is doing nothing more than setting up a file descriptor and calling the dump-guest-memory QMP command; additionally, Bug 1353835 shows that the same issue can be reproduced without using libvirt at all, so I'm moving this to QEMU.
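The relationship between block_size, max_mapnr and bitmap_blocks described in comment 26 can be checked with a few lines of arithmetic. This is only a minimal sketch and is not taken from crash or QEMU: the function names are hypothetical, and the bitmap_blocks formula (two bitmaps, one bit per page, each rounded up to whole blocks) is an assumption that happens to reproduce the values in both headers quoted in that comment.

```python
def div_round_up(n, d):
    """Integer division that rounds up."""
    return (n + d - 1) // d


def max_mapnr(mem_bytes, block_size):
    """Number of page frames covered by the dump (memory size / page size)."""
    return div_round_up(mem_bytes, block_size)


def bitmap_blocks(mapnr, block_size):
    """Blocks needed for two bitmaps holding one bit per page frame.

    Assumption: this mirrors how the header field is derived; it is not
    copied from any source shown in this report.
    """
    bitmap_bytes = div_round_up(mapnr, 8)
    return 2 * div_round_up(bitmap_bytes, block_size)


MEM = 512 * 1024 * 1024  # the 512 MB guest from comment 26

for block_size in (65536, 4096):
    n = max_mapnr(MEM, block_size)
    # 65536 -> max_mapnr 8192, bitmap_blocks 2 (valid makedumpfile header)
    # 4096  -> max_mapnr 131072, bitmap_blocks 8 (invalid QEMU header)
    print(block_size, n, bitmap_blocks(n, block_size))
```

With the wrong 4096-byte block size every derived field shifts by the same factor of 16, which is consistent with crash reading junk from the very start of the file.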
Some more data: the error message for an invalid RHEL 7.3 dump file (kdump-zlib format) has changed, and now looks like:

  WARNING: cannot access vmalloc'd module memory
  crash: cannot determine idle task addresses from init_tasks[] or runqueues[]
  crash: cannot resolve "init_task_union"

elf format dump files still work fine, and so do elf format dump files that have been compressed using 'makedumpfile -c'. Running 'crash -d1' as explained in Comment 26 shows that the problem is still the invalid block size.

On the other hand, Fedora 24 dump files result in the old error message:

  crash: invalid kernel virtual address: 7e470000 type: "pmd page"

regardless of the format.

Now that I think about it, I'm a bit confused about one detail: if I want to obtain a valid compressed dump file starting from an elf format dump file, I need to call makedumpfile like so:

  makedumpfile -c -x vmlinux dump.elf dump.elf.zlib

where 'vmlinux' is the debug version of the kernel, taken from the kernel-debuginfo package. If I don't pass -x, the resulting dump file can't be consumed by crash:

  crash: invalid kernel virtual address: 1f701f7 type: "possible"
  WARNING: cannot read cpu_possible_map
  crash: invalid kernel virtual address: 2000000100000002 type: "online"
  WARNING: cannot read cpu_online_map
  crash: invalid kernel virtual address: 801ef type: "active"
  WARNING: cannot read cpu_active_map
  crash: vmlinux and dump.elf.zlib do not match!

So how is QEMU supposed to be able to build a valid compressed dump file on its own?

I can provide dumps if useful for debugging.

Host:
  kernel-3.10.0-481.el7.ppc64le
  qemu-kvm-rhev-2.6.0-17.el7.ppc64le
  gdb-7.6.1-93.el7.ppc64le
  crash-7.1.5-1.el7.ppc64le

RHEL 7.3 guest:
  kernel-3.10.0-481.el7.ppc64le

Fedora 24 guest:
  kernel-4.6.4-301.fc24.ppc64le

I've tested upstream qemu 2.2, 2.3, 2.4 and 2.7.0-rc1 (LE guest + LE host); none of them is able to generate a valid compressed dump.
I've tested LE and BE guests on BE host with qemu-kvm-rhev-2.6.0-17, and compressed dump doesn't work either. (In reply to Andrea Bolognani from comment #31) > Now that I think about it, I'm a bit confused about one > detail: if I want to obtain a valid compressed dump file > starting from an elf format dump file, I need to call > makedumpfile like so: > > makedumpfile -c -x vmlinux dump.elf dump.elf.zlib > > where 'vmlinux' is the debug version of the kernel, taken > from the kernel-debuginfo package. If I don't pass -x, the > resulting dump file can't be consumed by crash: The "vmlinux" is not needed to create the compressed dump file, just remove "-x" option to do. The compressed file created by qemu and the compressed file created by makedumpfile differ by their header: QEMU: 00000000 6d 61 6b 65 64 75 6d 70 66 69 6c 65 00 00 00 00 |makedumpfile....| 00000010 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 01 |................| 00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 00001000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 d0 |................| 00001010 4b 44 55 4d 50 20 20 20 06 00 00 00 00 00 00 00 |KDUMP ........| 00001020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 00001120 55 6e 6b 6e 6f 77 6e 00 00 00 00 00 00 00 00 00 |Unknown.........| 00001130 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| makedumpfile: 00000000 4b 44 55 4d 50 20 20 20 06 00 00 00 00 00 00 00 |KDUMP ........| 00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 000001a0 00 00 00 00 00 00 00 00 01 00 00 00 00 00 01 00 |................| 000001b0 01 00 00 00 02 00 00 00 00 00 04 00 00 00 00 00 |................| 000001c0 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 |................| 000001d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| It seems QEMU wants to use the flatten format: commit fda053875e69120b2fde5fb34975ef5a49290f12 Author: qiaonuohan 
<qiaonuohan.com> Date: Tue Feb 18 14:11:27 2014 +0800 dump: add API to write header of flatten format flatten format will be used when writing kdump-compressed format. The format is also used by makedumpfile, you can refer to the following URL to get more detailed information about flatten format of kdump-compressed format: http://sourceforge.net/projects/makedumpfile/ The two functions here are used to write start flat header and end flat header to vmcore, and they will be called later when flatten format is used. struct MakedumpfileHeader stored at the head of vmcore is used to indicate the vmcore is in flatten format. struct MakedumpfileHeader { char signature[16]; /* = "makedumpfile" */ int64_t type; /* = 1 */ int64_t version; /* = 1 */ }; And struct MakedumpfileDataHeader, with offset and buf_size set to -1, is used to indicate the end of vmcore in flatten format. struct MakedumpfileDataHeader { int64_t offset; /* = -1 */ int64_t buf_size; /* = -1 */ }; There is something really interesting if we compare a file generated on x86_64: 00000000 6d 61 6b 65 64 75 6d 70 66 69 6c 65 00 00 00 00 |makedumpfile....| 00000010 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 01 |................| 00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 00001000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 d0 |................| 00001010 4b 44 55 4d 50 20 20 20 06 00 00 00 00 00 00 00 |KDUMP ........| 00001020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 00001120 78 38 36 5f 36 34 00 00 00 00 00 00 00 00 00 00 |x86_64..........| 00001130 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * With the one from ppc64le: 00000000 6d 61 6b 65 64 75 6d 70 66 69 6c 65 00 00 00 00 |makedumpfile....| 00000010 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 01 |................| 00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 00001000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 d0 |................| 
00001010 4b 44 55 4d 50 20 20 20 06 00 00 00 00 00 00 00 |KDUMP   ........|
00001020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001120 55 6e 6b 6e 6f 77 6e 00 00 00 00 00 00 00 00 00 |Unknown.........|
00001130 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*

Where we have "x86_64", we have "Unknown"... It seems to be the "utsname.machine" field of the header:

include/sysemu/dump.h:

    typedef struct QEMU_PACKED NewUtsname {
        char sysname[65];
        char nodename[65];
        char release[65];
        char version[65];
        char machine[65];
        char domainname[65];
    } NewUtsname;

    typedef struct QEMU_PACKED DiskDumpHeader32 {
        char signature[SIG_LEN];        /* = "KDUMP   " */
        uint32_t header_version;        /* Dump header version */
        NewUtsname utsname;             /* copy of system_utsname */
        ...

dump.c:

    #ifndef ELF_MACHINE_UNAME
    #define ELF_MACHINE_UNAME "Unknown"
    #endif
    ...
    static void create_header64(DumpState *s, Error **errp)
    {
        ...
        strncpy(dh->utsname.machine, ELF_MACHINE_UNAME,
                sizeof(dh->utsname.machine));
        ...

ELF_MACHINE_UNAME is declared in:

    target-i386/cpu.h
    target-s390x/cpu.h

but not in target-ppc/cpu.h.

(In reply to Laurent Vivier from comment #33)
> The "vmlinux" is not needed to create the compressed dump file, just remove
> "-x" option to do.

You're right, -x is not needed... I must have messed up while testing that the first time around. Sorry for the noise.

If, with the help of "crash -d4", I compare the header of the compressed dump generated by QEMU with that of the compressed dump generated by makedumpfile from the ELF dump, we can see:

    makedumpfile  block_size: 65536
    QEMU          block_size: 4096

and in the guest we have:

    # getconf PAGESIZE
    65536

So QEMU doesn't use the correct page size. This is because in dump.c:

    ...
    if (!s->dump_info.page_size) {
        s->dump_info.page_size = TARGET_PAGE_SIZE;
    }
    ...

and in target-ppc/arch_dump.c, cpu_get_dump_info() doesn't set page_size.
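The two header fields compared in this thread (utsname.machine and block_size) can also be read straight from a dump file without crash. The following is a rough sketch only. The function name and constants are hypothetical. The offsets are inferred from the hexdumps quoted in this report: the flattened start header occupies 0x1000 bytes, the machine string sits 0x110 bytes into the KDUMP header, and block_size sits at 0x1ac in the makedumpfile output. It is an assumption that QEMU's 64-bit header uses the same layout, and that the header fields are little-endian (LE guest).

```python
import struct

FLAT_SIGNATURE = b"makedumpfile"
FLAT_HEADER_SIZE = 4096        # first data header appears at 0x1000 in the hexdumps
UTSNAME_FIELD = 65             # each NewUtsname member is char[65]
MACHINE_OFFSET = 8 + 4 + 4 * UTSNAME_FIELD   # signature, header_version, 4 fields = 0x110
BLOCK_SIZE_OFFSET = 0x1AC      # inferred from the makedumpfile hexdump (assumption)


def read_kdump_header(data, endian="<"):
    """Return (machine, block_size) from the first bytes of a kdump file.

    `data` must hold at least the first 8 KiB of the file. Both the plain
    and the flattened layout are accepted.
    """
    base = 0
    if data.startswith(FLAT_SIGNATURE):
        # Flattened format: start header, then a data header made of two
        # big-endian int64 values (offset, buf_size), then the payload.
        offset, _buf_size = struct.unpack_from(">qq", data, FLAT_HEADER_SIZE)
        if offset != 0:
            raise ValueError("first flattened chunk is not the dump header")
        base = FLAT_HEADER_SIZE + 16
    if data[base:base + 5] != b"KDUMP":
        raise ValueError("KDUMP signature not found")
    start = base + MACHINE_OFFSET
    raw = data[start:start + UTSNAME_FIELD]
    machine = raw.split(b"\0", 1)[0].decode("ascii", "replace")
    (block_size,) = struct.unpack_from(endian + "I", data, base + BLOCK_SIZE_OFFSET)
    return machine, block_size
```

A typical use would be `read_kdump_header(open(path, "rb").read(8192))`, comparing the returned block size with the output of `getconf PAGESIZE` in the guest. For a big-endian guest, `endian=">"` would presumably be needed.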
If I force page_size to 65536 in cpu_get_dump_info(), the compressed dump can be read:

      KERNEL: /usr/lib/debug/lib/modules/3.10.0-481.el7.ppc64le/vmlinux
    DUMPFILE: qemu.dump.zlib-64k  [PARTIAL DUMP]
        CPUS: 1
        DATE: Wed Aug 3 04:21:39 2016
      UPTIME: 00:00:29
LOAD AVERAGE: 0.83, 0.22, 0.07
       TASKS: 116
    NODENAME: localhost.localdomain
     RELEASE: 3.10.0-481.el7.ppc64le
     VERSION: #1 SMP Wed Jul 27 18:25:08 EDT 2016
     MACHINE: ppc64le (3026 Mhz)
      MEMORY: 16 GB
       PANIC: ""
         PID: 0
     COMMAND: "swapper/0"
        TASK: c000000001127960 [THREAD_INFO: c000000001194000]
         CPU: 0
       STATE: TASK_RUNNING
     WARNING: panic task not found

In fact, it was already pointed out in comment 26... I am preparing a patch to fix that.

qemu-kvm-rhev-2.6.0-17.el7.ppc64le
kernel: 3.10.0-481.el7.ppc64le
crash-7.1.5-1.el7.ppc64le

I hit the same problem with dump files in zlib, snappy and lzo format. The problem does not occur on x86. The error message is as follows:

crash 7.1.5-1.el7
Copyright (C) 2002-2016 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "powerpc64le-unknown-linux-gnu"...
please wait... (gathering module symbol data) WARNING: cannot access vmalloc'd module memory crash: cannot determine idle task addresses from init_tasks[] or runqueues[] crash: cannot resolve "init_task_union" The dump file of zlib and lzo format can be analysed by crash command with above build. But can't create snappy format dump file with error as below: {"execute": "dump-guest-memory", "arguments": { "paging": true, "protocol": "file:/home/dump.snappy", "format": "kdump-snappy"}} {"error": {"class": "GenericError", "desc": "kdump-compressed format doesn't support paging or filter"}} According to the comment in the code: dump.c: /* * kdump-compressed format need the whole memory dumped, so paging or * filter is not supported here. */ And the comment in the original commit: https://lists.nongnu.org/archive/html/qemu-devel/2014-03/msg00233.html Without 'format' being set, it is same as 'elf'. And if non-elf format is specified, paging and filter is not allowed. I think your command line is not supported: { "paging": true, "protocol": "file:/home/dump.snappy", "format": "kdump-snappy"} Andrea, is this what libvirt is doing: paging=true and format=kdump-snappy? No, libvirt always uses paging=false when calling the dump-guest-memory command. 
http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/qemu/qemu_monitor_json.c;h=d455adf73db69ff3fdcc7bd77d4c48ed8c68b96f;hb=HEAD#l2976

Fix included in qemu-kvm-rhev-2.6.0-21.el7

Reproduced the issue on the old version:

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.6.0-17.el7.ppc64le
kernel: 3.10.0-495.el7.ppc64le
crash-7.1.5-1.el7.ppc64le

Steps to Reproduce:
1. Boot up a guest with the command:

/usr/libexec/qemu-kvm \
    -name test \
    -smp 4 \
    -m 2048 \
    -monitor stdio \
    -vnc :20 \
    -qmp tcp:0:4444,server,nowait \
    -device virtio-scsi-pci,bus=pci.0 \
    -device scsi-hd,id=scsi-hd0,drive=scsi-hd0-dr0,bootindex=0 \
    -drive file=/root/RHEL-7.2.qcow2,if=none,id=scsi-hd0-dr0,format=qcow2,cache=none \
    -device virtio-net-pci,netdev=net0,id=nic0,mac=52:54:00:c4:e7:84 \
    -netdev tap,id=net0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,vhost=on

2. Create dump files in all four formats via QMP:

{"execute": "dump-guest-memory", "arguments": { "paging": false, "protocol": "file:/home/dump.elf", "format": "elf"}}
{"execute": "dump-guest-memory", "arguments": { "paging": false, "protocol": "file:/home/dump.zlib", "format": "kdump-zlib"}}
{"execute": "dump-guest-memory", "arguments": { "paging": false, "protocol": "file:/home/dump.lzo", "format": "kdump-lzo"}}
{"execute": "dump-guest-memory", "arguments": { "paging": false, "protocol": "file:/home/dump.snappy", "format": "kdump-snappy"}}

3. Analyse each dump file with the crash command:

crash /usr/lib/debug/lib/modules/3.10.0-495.el7.ppc64le/vmlinux dump.elf
crash /usr/lib/debug/lib/modules/3.10.0-495.el7.ppc64le/vmlinux dump.zlib
crash /usr/lib/debug/lib/modules/3.10.0-495.el7.ppc64le/vmlinux dump.lzo
crash /usr/lib/debug/lib/modules/3.10.0-495.el7.ppc64le/vmlinux dump.snappy

Actual results:
The snappy, zlib and lzo format dumps can't be analysed by the crash command.
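The four QMP commands from step 2 can be generated rather than typed by hand. The sketch below is illustrative only: the helper name is hypothetical, it only builds the JSON text and does not talk to QEMU, and the paging check is a client-side guard that mirrors the "kdump-compressed format doesn't support paging or filter" error reported earlier in this thread, not QEMU's own validation code.

```python
import json

KDUMP_FORMATS = ("kdump-zlib", "kdump-lzo", "kdump-snappy")


def dump_guest_memory_cmd(path, fmt="elf", paging=False):
    """Build the JSON text of a dump-guest-memory QMP command."""
    if fmt != "elf" and fmt not in KDUMP_FORMATS:
        raise ValueError("unknown dump format: %s" % fmt)
    if fmt in KDUMP_FORMATS and paging:
        # QEMU rejects this combination, so fail early on the client side.
        raise ValueError("kdump-compressed format doesn't support paging")
    return json.dumps({
        "execute": "dump-guest-memory",
        "arguments": {
            "paging": paging,
            "protocol": "file:" + path,
            "format": fmt,
        },
    })


# Same four files as in the reproduction steps.
for fmt, ext in (("elf", "elf"), ("kdump-zlib", "zlib"),
                 ("kdump-lzo", "lzo"), ("kdump-snappy", "snappy")):
    print(dump_guest_memory_cmd("/home/dump." + ext, fmt))
```

Each printed line can then be pasted into the QMP session opened with `-qmp tcp:0:4444,server,nowait`.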
Verified the issue on the latest build:

Version-Release number of selected component (if applicable):
kernel-3.10.0-495.el7.ppc64le
qemu-kvm-rhev-2.6.0-22.el7.ppc64le
crash-7.1.5-1.el7.ppc64le

Steps to Reproduce:
1. The same steps as above

Actual results:
Dump files in all four formats can be analysed by the crash command.

Based on the above results, the bug has been fixed.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2673.html