Bug 507551 - RFE: Implement core dump API for QEMU driver
RFE: Implement core dump API for QEMU driver
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: libvirt
Version: 5.4
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Paolo Bonzini
QA Contact: Virtualization Bugs
Keywords: FutureFeature
Depends On: 510244
Blocks: 507548 510519
Reported: 2009-06-23 05:46 EDT by Chris Lalancette
Modified: 2010-10-23 06:18 EDT
CC List: 12 users

See Also:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Clones: 510244 510519
Environment:
Last Closed: 2010-03-30 04:11:07 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
proof of concept (5.67 KB, application/x-compressed-tar)
2009-07-07 16:51 EDT, Paolo Bonzini
qemu bits (4.90 KB, patch)
2009-07-08 07:30 EDT, Paolo Bonzini
libvirt bits (for upstream) (3.41 KB, patch)
2009-07-08 07:38 EDT, Paolo Bonzini
libvirt bits (for RHEL) (3.41 KB, patch)
2009-07-08 08:03 EDT, Paolo Bonzini
new, simpler patch for upstream (4.02 KB, patch)
2009-07-09 12:36 EDT, Paolo Bonzini
new, simpler patch for RHEL (4.02 KB, patch)
2009-07-09 12:36 EDT, Paolo Bonzini
libvirt-0.7.0-qemud-dump.patch (4.91 KB, patch)
2009-08-17 10:54 EDT, Paolo Bonzini

Description Chris Lalancette 2009-06-23 05:46:38 EDT
Description of problem:
When using a qemu/kvm guest, currently virsh dump is unimplemented.  That's mostly because we lack a mechanism in qemu-kvm itself to generate a proper corefile.  Once we have that mechanism, implementing virsh dump should be very straightforward.
Comment 1 Dave Anderson 2009-06-23 10:10:14 EDT
With respect to the newly-added blocker and rhel-5.4.0 flags, this
was discussed earlier in the original BZ #505527:

------------------------------------------------------------------

https://bugzilla.redhat.com/show_bug.cgi?id=505527#c18
Comment #18 From  CAI Qian (caiqian@redhat.com)  2009-06-16 18:02:34 EDT

Raise severity/priority to high, and set RHEL5.4 Beta blocker flags. The reason
behind this is that it is a feature that should be in for beta testing.  

------------------------------------------------------------------

https://bugzilla.redhat.com/show_bug.cgi?id=505527#c19
Comment #19 From  Chris Lalancette (clalance@redhat.com)  2009-06-17 04:23:47 EDT   

(In reply to comment #18)
> Raise severity/priority to high, and set RHEL5.4 Beta blocker flags. The reason
> behind this is that it is a feature that should be in for beta testing.  

You have to be careful what you are talking about.  Getting "virsh dumpcore"
working is a significant amount of work that runs across several components,
including qemu, libvirt, and crash.  A reasonable target for that might be
RHEL-6, but certainly not 5.4.

We may, however, be able to do enough bugfixing on KVM to get kdump working in
5.4.  That will have to be looked at in detail.

Chris Lalancette  

------------------------------------------------------------------

https://bugzilla.redhat.com/show_bug.cgi?id=505527#c21
Comment #21 From  CAI Qian (caiqian@redhat.com)  2009-06-17 05:28:42 EDT   

Yes, I was talking about getting kdump working inside KVM guests, not "virsh
dump".  

------------------------------------------------------------------
Comment 2 Paolo Bonzini 2009-06-26 09:41:22 EDT
Don't hold your breath, but I'm looking at qemu. :-)
Comment 3 Paolo Bonzini 2009-06-30 14:12:51 EDT
Generating a proper corefile from qemu-kvm is not as easy as it sounds, and does not completely make sense in a FV environment.

For example, qemu-kvm has no idea of the kernel memory map, or the startup arguments of the kernel.  I wrote something that would generate an ELF file for a running qemu image, vaguely resembling /proc/kcore, but it needs to get the kernel memory map from the user (it's not hard for the user to extract it, but that's not the point).  It may make more sense to have some kind of helper kernel module and then to implement the generation of the core file purely in libvirt, with no need for qemu help.
Comment 4 Chris Lalancette 2009-06-30 16:48:36 EDT
(In reply to comment #3)
> Generating a proper corefile from qemu-kvm is not as easy as it sounds, and
> does not completely make sense in a FV environment.
> 
> For example, qemu-kvm has no idea of the kernel memory map, or the startup
> arguments of the kernel.  I wrote something that would generate an ELF file for

But why would qemu-kvm want to know anything about the guest kernel memory map?  The whole idea is to have qemu-kvm dump out a "raw" version of memory; from there, the crash utility knows (or can be taught) what to do with it.  It's basically equivalent to what we can do for Xen FV guests.  Or am I missing something?

Chris Lalancette
Comment 5 Dave Anderson 2009-06-30 17:14:29 EDT
> But why would qemu-kvm want to know anything about the guest kernel memory map?

If you wanted to debug the core file with gdb, then yes, you'd need 
virtual address references in the PT_LOAD segments.  But the crash
utility only really cares about physical memory.  
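To make the "crash only cares about physical memory" point concrete, here is a minimal C sketch of how a reader can locate a physical address in an ELF dump using only the PT_LOAD p_paddr, p_offset and p_filesz fields, ignoring p_vaddr entirely. The helper name phys_to_file_offset is hypothetical (it is not crash's actual code), and the sketch assumes the program headers have already been read into memory.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: map a guest physical address to a byte offset in
 * an ELF dump by scanning PT_LOAD program headers.  Only p_paddr,
 * p_offset and p_filesz are consulted; p_vaddr is ignored.
 * Returns -1 if no PT_LOAD segment holds the address in the file. */
static int64_t phys_to_file_offset(const Elf64_Phdr *phdrs, size_t nphdrs,
                                   uint64_t paddr)
{
    for (size_t i = 0; i < nphdrs; i++) {
        const Elf64_Phdr *ph = &phdrs[i];

        if (ph->p_type != PT_LOAD)
            continue;
        /* Compare via subtraction so the check cannot overflow. */
        if (paddr >= ph->p_paddr && paddr - ph->p_paddr < ph->p_filesz)
            return (int64_t)(ph->p_offset + (paddr - ph->p_paddr));
    }
    return -1;
}
```

A kdump vmcore can cover the same physical range from more than one segment (for example, the kernel-text segment and a unity-mapped segment); this sketch simply returns the first match.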

That being said, crash does need to know any physical base address 
relocations being done on architectures like x86_64 where kernel unity
mapped addresses do not directly yield the physical address by just
stripping PAGE_OFFSET.  From kdump ELF vmcore files, crash calculates
the physical base address from the kdump's vmcore PT_LOAD segment's
virtual address references.  And when kdump vmcores are transformed
by makedumpfile into its unique "compressed kdump" format, the 
physical base address is shoved into the unique header used by that
format.
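A minimal sketch of that calculation, assuming the kdump layout described above, in which one PT_LOAD segment carries a __START_KERNEL_map-based virtual address. The function name and the upper bound MODULES_VADDR (0xffffffff88000000, where the module area begins on the RHEL5 kernel in the /proc/kcore listing later in this report) are assumptions for illustration, not crash's implementation.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

#define START_KERNEL_MAP 0xffffffff80000000ULL
/* Assumed start of the module area (RHEL5 x86_64); bounds the search. */
#define MODULES_VADDR    0xffffffff88000000ULL

/* Hypothetical helper: derive phys_base from the PT_LOAD segment whose
 * virtual address lies in the __START_KERNEL_map region:
 *   phys_base = p_paddr - (p_vaddr - __START_KERNEL_map)
 * Returns 0 and stores the value in *phys_base, or -1 if none is found. */
static int phys_base_from_phdrs(const Elf64_Phdr *phdrs, size_t nphdrs,
                                uint64_t *phys_base)
{
    for (size_t i = 0; i < nphdrs; i++) {
        const Elf64_Phdr *ph = &phdrs[i];

        if (ph->p_type == PT_LOAD &&
            ph->p_vaddr >= START_KERNEL_MAP && ph->p_vaddr < MODULES_VADDR) {
            *phys_base = ph->p_paddr - (ph->p_vaddr - START_KERNEL_MAP);
            return 0;
        }
    }
    return -1;
}
```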

For xen kernels, though, x86_64 physical base address is hardwired
to 0x200000 on x86_64 because the xen dumpfile headers have no such
information.  So far, the hard-wiring has "held"...

So Paolo's statement that it "does not completely make sense in
a FV environment" is valid in that respect.  When Chris Smith did
his KVM/dump prototype, he generated a vmcore file from a KVM guest
that ran just fine with the crash utility -- but he "cheated" in
that he did know the particulars of the FV kernel that was running,
and did put the virtual address particulars in the PT_LOAD segments.

(BTW, he never responded to my query as to the status of his KVM
dumpfile prototype that he posted on the qemu-devel list -- maybe he
doesn't work at HP any more?)
Comment 6 Paolo Bonzini 2009-06-30 18:42:06 EDT
Yes, the advantage of doing it in qemu is that you leverage all the arch-dependent code to translate kernel addresses to physical addresses.  But actually the exact same information that qemu uses is available in a saved VM file.
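As a first step toward reading that information, a tool has to recognize the saved-VM file itself. This sketch checks the file header, assuming the QEMU savevm format of this era: a big-endian magic 0x5145564d ("QEVM") followed by a big-endian format version of 3. The constants are assumed to mirror QEMU's savevm.c and should be checked against the QEMU version in use; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed savevm constants: "QEVM" magic and format version 3. */
#define QEMU_VM_FILE_MAGIC   0x5145564dU
#define QEMU_VM_FILE_VERSION 3U

/* Read a big-endian 32-bit value. */
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 if buf starts with a savevm header this sketch understands,
 * -1 otherwise (too short, wrong magic, or unexpected version). */
static int check_savevm_header(const uint8_t *buf, size_t len)
{
    if (len < 8)
        return -1;
    if (be32(buf) != QEMU_VM_FILE_MAGIC)
        return -1;
    return be32(buf + 4) == QEMU_VM_FILE_VERSION ? 0 : -1;
}
```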
Comment 7 Dave Anderson 2009-07-01 09:20:37 EDT
> That being said, crash does need to know any physical base address 
> relocations being done on architectures like x86_64 where kernel unity
> mapped addresses do not directly yield the physical address by just
> stripping PAGE_OFFSET.

Sorry -- I misspoke above -- I was referring to the x86_64 __START_KERNEL_map 
region above, not the PAGE_OFFSET/unity-mapped region.  The x86_64 kernel
has two primary virtual mappings of physical memory, one that is PAGE_OFFSET
(0xffff880000000000) based, where: 

  physaddr = virtaddr - 0xffff880000000000 

and a second mapping of the kernel text and static data, which is based
at __START_KERNEL_map (0xffffffff80000000) -- but has not been "unity-mapped"
since x86_64 kernels became relocatable.  For translating kernel/static-data
virtual addresses into their physical address, the physical base address
must also be applied like so:  

  physaddr = virtaddr - __START_KERNEL_map + physical_base 
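The two formulas above can be written as a small C helper. This is a sketch: kvtop is a hypothetical name, PAGE_OFFSET is passed in because it differs between kernel versions (0xffff880000000000 above, 0xffff810000000000 on the RHEL5 kernels below), and no range checking is done, so addresses outside the two direct mappings (vmalloc space, modules) give meaningless results.

```c
#include <stdint.h>

#define START_KERNEL_MAP 0xffffffff80000000ULL

/* Hypothetical helper: translate an x86_64 kernel virtual address to a
 * physical address.  Addresses at or above __START_KERNEL_map are treated
 * as kernel text/static data and relocated by phys_base; everything else
 * is assumed to be in the PAGE_OFFSET unity map.  No range checking. */
static uint64_t kvtop(uint64_t vaddr, uint64_t page_offset, uint64_t phys_base)
{
    if (vaddr >= START_KERNEL_MAP)
        return vaddr - START_KERNEL_MAP + phys_base;
    return vaddr - page_offset;
}
```

For instance, with the RHEL5 values used elsewhere in this report (phys_base 0x200000), 0xffffffff80001000 translates to 0x201000.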

So, for example, a RHEL5 kdump vmcore has one mapping that references
the __START_KERNEL_map region at 0xffffffff80000000, which maps to
physical address 0x0000000000200000, and then a bunch of PAGE_OFFSET
based unity-mapped regions based from 0xffff810000000000:

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  NOTE           0x0000000000000270 0x0000000000000000 0x0000000000000000
                 0x0000000000000f20 0x0000000000000f20         0
  LOAD           0x0000000000001190 0xffffffff80000000 0x0000000000200000
                 0x00000000004e6000 0x00000000004e6000  RWE    0
  LOAD           0x00000000004e7190 0xffff810000000000 0x0000000000000000
                 0x00000000000a0000 0x00000000000a0000  RWE    0
  LOAD           0x0000000000587190 0xffff810000100000 0x0000000000100000
                 0x0000000000f00000 0x0000000000f00000  RWE    0
  LOAD           0x0000000001487190 0xffff810009000000 0x0000000009000000
                 0x00000000953bf000 0x00000000953bf000  RWE    0
  LOAD           0x0000000096846190 0xffff81009e486000 0x000000009e486000
                 0x00000000015ac000 0x00000000015ac000  RWE    0
  LOAD           0x0000000097df2190 0xffff81009fa9a000 0x000000009fa9a000
                 0x000000000000f000 0x000000000000f000  RWE    0
  LOAD           0x0000000097e01190 0xffff81009fb1a000 0x000000009fb1a000
                 0x000000000000b000 0x000000000000b000  RWE    0
  LOAD           0x0000000097e0c190 0xffff81009fb3a000 0x000000009fb3a000
                 0x00000000000c6000 0x00000000000c6000  RWE    0
  LOAD           0x0000000097ed2190 0xffff810100000000 0x0000000100000000
                 0x0000000ee0000000 0x0000000ee0000000  RWE    0

So presuming that KVM FV kernels are given a contiguous block of
"pseudo-physical" memory (probably an invalid assumption), a KVM dump
could be expressed in two PT_LOAD segments, one for the __START_KERNEL_map,
and one for the PAGE_OFFSET unity-mapped region.
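A sketch of what such a two-segment ELF core header could look like, under the same contiguous-memory assumption. The struct, function and parameter names (guest_core_hdr, build_core_hdr, kernel_size) are hypothetical; the guest RAM image is assumed to follow the headers directly, starting at guest physical address 0, and the PT_NOTE segment with CPU register state that a real kdump vmcore carries is left out.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define START_KERNEL_MAP 0xffffffff80000000ULL

/* Hypothetical layout: ELF header, two program headers, then raw RAM. */
struct guest_core_hdr {
    Elf64_Ehdr ehdr;
    Elf64_Phdr phdr[2];
};

static void build_core_hdr(struct guest_core_hdr *h, uint64_t mem_size,
                           uint64_t page_offset, uint64_t phys_base,
                           uint64_t kernel_size)
{
    uint64_t data_off = sizeof(*h);   /* RAM image starts after headers */

    memset(h, 0, sizeof(*h));
    memcpy(h->ehdr.e_ident, ELFMAG, SELFMAG);
    h->ehdr.e_ident[EI_CLASS] = ELFCLASS64;
    h->ehdr.e_ident[EI_DATA] = ELFDATA2LSB;   /* x86_64 guest */
    h->ehdr.e_ident[EI_VERSION] = EV_CURRENT;
    h->ehdr.e_type = ET_CORE;
    h->ehdr.e_machine = EM_X86_64;
    h->ehdr.e_version = EV_CURRENT;
    h->ehdr.e_phoff = offsetof(struct guest_core_hdr, phdr);
    h->ehdr.e_ehsize = sizeof(Elf64_Ehdr);
    h->ehdr.e_phentsize = sizeof(Elf64_Phdr);
    h->ehdr.e_phnum = 2;

    /* Segment 0: __START_KERNEL_map -> phys_base (kernel text/data). */
    h->phdr[0].p_type = PT_LOAD;
    h->phdr[0].p_flags = PF_R | PF_W | PF_X;
    h->phdr[0].p_offset = data_off + phys_base;
    h->phdr[0].p_vaddr = START_KERNEL_MAP;
    h->phdr[0].p_paddr = phys_base;
    h->phdr[0].p_filesz = h->phdr[0].p_memsz = kernel_size;

    /* Segment 1: PAGE_OFFSET unity map covering all of guest RAM. */
    h->phdr[1].p_type = PT_LOAD;
    h->phdr[1].p_flags = PF_R | PF_W | PF_X;
    h->phdr[1].p_offset = data_off;
    h->phdr[1].p_vaddr = page_offset;
    h->phdr[1].p_paddr = 0;
    h->phdr[1].p_filesz = h->phdr[1].p_memsz = mem_size;
}
```

Both segments point into the same RAM image, so the kernel text is not stored twice; only the virtual-to-physical description differs.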

> But actually the exact same information that qemu uses is available in
> a saved VM file.
  
I'm wondering whether the __START_KERNEL_map-to-physical_base relationship 
can be gleaned from a saved-VM file?  It sounds like the answer is no,
but that's the one missing piece of the puzzle.
Comment 8 Paolo Bonzini 2009-07-01 10:52:41 EDT
> I'm wondering whether the __START_KERNEL_map-to-physical_base relationship 
> can be gleaned from a saved-VM file?  It sounds like the answer is no,
> but that's the one missing piece of the puzzle.

The saved VM file should have all the necessary pieces of the puzzle: CR3, the GDT base (which actually should not be needed), and the page table.  With the kernel debug info, you could also get the kcore_list and regenerate /proc/kcore.
Comment 9 Dave Anderson 2009-07-01 11:36:44 EDT
But /proc/kcore doesn't help much with the phys_base determination.  
On my RHEL5 machine, with a __START_KERNEL_map of ffffffff80000000, and
which has a phys_base of 0x200000 (2MB), here's its /proc/kcore:

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  NOTE           0x0000000000000190 0x0000000000000000 0x0000000000000000
                 0x0000000000000974 0x0000000000000000         0
  LOAD           0x0000ffffff601000 0xffffffffff600000 0x0000000000000000
                 0x0000000000800000 0x0000000000800000  RWE    1000
  LOAD           0x0000ffff88001000 0xffffffff88000000 0x0000000000000000
                 0x0000000077f00000 0x0000000077f00000  RWE    1000
  LOAD           0x0000ffff80002000 0xffffffff80001000 0x0000000000000000
                 0x00000000004e4cec 0x00000000004e4cec  RWE    1000
  LOAD           0x0000c20000001000 0xffffc20000000000 0x0000000000000000
                 0x00001fffffffffff 0x00001fffffffffff  RWE    1000
  LOAD           0x0000810000001000 0xffff810000000000 0x0000000000000000
                 0x000000003fe0a000 0x000000003fe0a000  RWE    1000

There is that one PT_LOAD segment at 0xffffffff80001000, which seemingly
says that it maps to physical address 0, which is clearly not the case.

In reality, virtual address 0xffffffff80001000 maps to physical address
0x201000.  Kdump vmcores create a PT_LOAD segment for the __START_KERNEL_map
region that factors in the phys_base offset.
Comment 10 Dave Anderson 2009-07-01 12:50:20 EDT
...and for that matter, even if the kcore_list was useful, accessing *it*
requires the phys_base.
Comment 11 Paolo Bonzini 2009-07-01 13:38:41 EDT
No, the phys_base would be taken from the page table.  What I was saying is that the page table (whose physical address is in CR3) would be the missing piece, and with that you can make something like /proc/kcore.
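A minimal sketch of that page-table walk, assuming a 4-level x86_64 guest and, for simplicity, guest RAM available as one contiguous buffer (the proof of concept instead reads pages from the saved-VM file). The names are hypothetical. Translating __START_KERNEL_map itself through the walk yields phys_base directly.

```c
#include <stdint.h>
#include <string.h>

#define PTE_PRESENT 0x1ULL
#define PTE_PSE     0x80ULL                 /* large page at PDPT/PD level */
#define ADDR_MASK   0x000ffffffffff000ULL   /* bits 51:12 of an entry */

/* Hypothetical view of guest RAM as one contiguous buffer. */
struct guest_mem {
    const uint8_t *base;
    uint64_t size;
};

/* Read one little-endian 64-bit table entry from guest physical memory. */
static int read_u64(const struct guest_mem *m, uint64_t paddr, uint64_t *out)
{
    if (paddr > m->size || m->size - paddr < 8)
        return -1;
    memcpy(out, m->base + paddr, 8);
    return 0;
}

/* 4-level walk starting from CR3.  Returns 0 and stores the physical
 * address in *paddr, or -1 if the address is not mapped. */
static int walk_page_tables(const struct guest_mem *m, uint64_t cr3,
                            uint64_t vaddr, uint64_t *paddr)
{
    static const int shift[4] = { 39, 30, 21, 12 };  /* PML4, PDPT, PD, PT */
    uint64_t table = cr3 & ADDR_MASK;

    for (int level = 0; level < 4; level++) {
        uint64_t entry;
        uint64_t idx = (vaddr >> shift[level]) & 0x1ff;

        if (read_u64(m, table + idx * 8, &entry) != 0 || !(entry & PTE_PRESENT))
            return -1;
        /* 1 GB (PDPT) or 2 MB (PD) page: the walk stops here. */
        if ((level == 1 || level == 2) && (entry & PTE_PSE)) {
            uint64_t page_mask = (1ULL << shift[level]) - 1;
            *paddr = (entry & ADDR_MASK & ~page_mask) | (vaddr & page_mask);
            return 0;
        }
        table = entry & ADDR_MASK;
    }
    *paddr = table | (vaddr & 0xfff);   /* 4 KB page frame + offset */
    return 0;
}
```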
Comment 12 Dave Anderson 2009-07-01 13:59:31 EDT
(In reply to comment #11)
> No, the phys_base would be taken from the page table.  What I was saying is that
> the page table (whose physical address is in CR3) would be the missing piece,
> and with that you can make something like /proc/kcore.  

Can you whip up a proof-of-concept?

I don't need to "make something like /proc/kcore" -- I just need to be 
able to translate __START_KERNEL_map-based virtual addresses into their
(pseudo) physical addresses -- as seen by the FV kernel -- and then know
how to find those physical addresses in the saved-VM file.
Comment 14 Paolo Bonzini 2009-07-07 16:51:54 EDT
Created attachment 350861 [details]
proof of concept

Here it is.  The files in the tarball are:

- qemu-load.c: Generic library to load QEMU save VM files.  Users need to know if the host OS was 32- or 64-bit; tested only for 64-bit host.

- qemu-load.h: Matching header file.

- test.c: Example of how to use the library, plus code to actually map virtual addresses to physical.  In the proof of concept, instead of using the __START_KERNEL_map I round the address of the IDT down to a 1GB boundary.  The mapping code, however, is totally independent from this part.
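The IDT heuristic in test.c amounts to a single mask. A sketch, with a hypothetical function name, assuming the IDT base has been taken from the saved CPU state and lies inside the kernel image; translating the result through a page-table walk from CR3 then gives phys_base.

```c
#include <stdint.h>

#define GB_MASK ((1ULL << 30) - 1)

/* Approximate __START_KERNEL_map by rounding the guest IDT base
 * (a kernel-image address) down to a 1 GB boundary. */
static uint64_t guess_start_kernel_map(uint64_t idt_base)
{
    return idt_base & ~GB_MASK;
}
```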
Comment 15 Dave Anderson 2009-07-07 16:57:45 EDT
Can I get a pointer to a vmlinux/saved-vm-file pair to work with?
Comment 17 Paolo Bonzini 2009-07-08 07:30:33 EDT
Created attachment 350915 [details]
qemu bits
Comment 18 Paolo Bonzini 2009-07-08 07:38:25 EDT
Created attachment 350918 [details]
libvirt bits (for upstream)
Comment 19 Paolo Bonzini 2009-07-08 08:03:06 EDT
Created attachment 350920 [details]
libvirt bits (for RHEL)
Comment 20 Paolo Bonzini 2009-07-08 09:23:06 EDT
Comment on attachment 350915 [details]
qemu bits

created bug 510244 to track the qemu bits; moved the qemu-rhel.patch attachment there
Comment 21 Daniel Veillard 2009-07-08 09:31:59 EDT
This wasn't pushed to upstream in time for Update 4, so the best we can do
at this point is reassign this for Update 5 and push the bits upstream where
this wasn't done. I don't see why this wouldn't be accepted in libvirt.

Daniel
Comment 22 Paolo Bonzini 2009-07-09 07:58:21 EDT
Waiting for upstream qemu to accept at least the idea, and decide on the name for the dump command.
Comment 23 Paolo Bonzini 2009-07-09 12:36:04 EDT
Created attachment 351100 [details]
new, simpler patch for upstream
Comment 24 Paolo Bonzini 2009-07-09 12:36:43 EDT
Created attachment 351101 [details]
new, simpler patch for RHEL
Comment 25 Paolo Bonzini 2009-07-21 10:46:20 EDT
Committed upstream at http://libvirt.org/git/?p=libvirt.git;a=commit;h=e1abc448143d83db8aad8962fc24b13465dbc69b

Should this be CLOSED/UPSTREAM?
Comment 26 Daniel Berrange 2009-07-21 10:53:00 EDT
No, the patch should be backported to the RHEL5 version of libvirt, attached to this BZ, and this bug put into POST state ready for RHEL-5.5.
Comment 27 Paolo Bonzini 2009-08-17 10:54:55 EDT
Created attachment 357661 [details]
libvirt-0.7.0-qemud-dump.patch

Backport of upstream e1abc44.
Comment 28 Daniel Veillard 2009-12-10 05:59:36 EST
libvirt-0.6.3-24.el5 has been built in dist-5E-qu-candidate with the fixes

Daniel
Comment 30 Alex Jia 2009-12-30 03:17:54 EST
This bug has been verified with libvirt 0.6.3-24.el5 on RHEL-5.5. The fix works,
so setting status to VERIFIED. 

Steps to Verify:
[root@dhcp-66-70-62 libvirt]# ll -h /tmp
total 76K
drwx------ 3 root root 4.0K Dec 30 11:45 gconfd-root
drwx------ 2 root root 4.0K Dec 30 11:45 keyring-4qWfI1
srwxr-xr-x 1 root root    0 Dec 30 11:45 mapping-root
drwx------ 2 root root 4.0K Dec 30 15:00 orbit-root
drwx------ 2 root root 4.0K Dec 30 11:45 ssh-EEyqiF3551
drwx------ 2 root root 4.0K Dec 30 11:45 virtual-root.CZEllr
[root@dhcp-66-70-62 libvirt]# virsh start rhel5u5
Domain rhel5u5 started

[root@dhcp-66-70-62 libvirt]# virsh dump rhel5u5 /tmp/dump
Domain rhel5u5 dumped to /tmp/dump

[root@dhcp-66-70-62 libvirt]# ll -h /tmp
total 445M
-rw-r--r-- 1 root root 445M Dec 30 15:24 dump
drwx------ 3 root root 4.0K Dec 30 11:45 gconfd-root
drwx------ 2 root root 4.0K Dec 30 11:45 keyring-4qWfI1
srwxr-xr-x 1 root root    0 Dec 30 11:45 mapping-root
drwx------ 2 root root 4.0K Dec 30 15:00 orbit-root
drwx------ 2 root root 4.0K Dec 30 11:45 ssh-EEyqiF3551
drwx------ 2 root root 4.0K Dec 30 11:45 virtual-root.CZEllr

[root@dhcp-66-70-62 libvirt]# ssh 192.168.122.77 uname -r
The authenticity of host '192.168.122.77 (192.168.122.77)' can't be established.
RSA key fingerprint is f0:91:5c:a1:88:47:45:87:52:ba:48:75:9c:8e:80:52.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.122.77' (RSA) to the list of known hosts.
root@192.168.122.77's password: 
2.6.18-183.el5

[root@dhcp-66-70-62 libvirt]# rpm -ivh kernel-debuginfo-2.6.18-183.el5.x86_64.rpm kernel-debuginfo-common-2.6.18-183.el5.x86_64.rpm
Preparing...                ########################################### [100%]
   1:kernel-debuginfo-common########################################### [ 50%]
   2:kernel-debuginfo       ########################################### [100%]
[root@dhcp-66-70-62 libvirt]# crash /usr/lib/debug/lib/modules/2.6.18-183.el5/vmlinux /tmp/dump 

crash 4.1.2-1.el5
Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.
 
GNU gdb 6.1                                     
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

      KERNEL: /usr/lib/debug/lib/modules/2.6.18-183.el5/vmlinux
    DUMPFILE: /tmp/dump
        CPUS: 1
        DATE: Wed Dec 30 15:24:09 2009
      UPTIME: 00:22:14
LOAD AVERAGE: 0.02, 0.05, 0.23
       TASKS: 91
    NODENAME: localhost.localdomain
     RELEASE: 2.6.18-183.el5
     VERSION: #1 SMP Mon Dec 21 18:37:42 EST 2009
     MACHINE: x86_64  (2992 Mhz)
      MEMORY: 1 GB
       PANIC: ""
         PID: 0
     COMMAND: "swapper"
        TASK: ffffffff80308b60  [THREAD_INFO: ffffffff803fa000]
         CPU: 0
       STATE: TASK_RUNNING (ACTIVE)
     WARNING: panic task not found

crash> bt
PID: 0      TASK: ffffffff80308b60  CPU: 0   COMMAND: "swapper"
 #0 [ffffffff803fbeb8] schedule at ffffffff80063f96
 #1 [ffffffff803fbec0] thread_return at ffffffff80063ff8
 #2 [ffffffff803fbf68] default_idle at ffffffff8006c3a5
 #3 [ffffffff803fbf90] cpu_idle at ffffffff800497b7
crash> 


Version-Release number of selected component (if applicable):
[root@dhcp-66-70-62 libvirt]# uname -a
Linux dhcp-66-70-62.nay.redhat.com 2.6.18-183.el5 #1 SMP Mon Dec 21 18:37:42 EST 2009 x86_64 x86_64 x86_64 GNU/Linux
[root@dhcp-66-70-62 libvirt]# lsmod|grep kvm
kvm_intel              86664  0 
kvm                   223648  2 ksm,kvm_intel
[root@dhcp-66-70-62 libvirt]# rpm -qa|grep libvirt
libvirt-0.6.3-24.el5
libvirt-debuginfo-0.6.3-24.el5
libvirt-python-0.6.3-24.el5
[root@dhcp-66-70-62 libvirt]# rpm -q kvm
kvm-83-140.el5
Comment 34 errata-xmlrpc 2010-03-30 04:11:07 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0205.html
