Bug 1441646

Summary: Level-2 guest boot crashes libvirtd due to NULL vendor field in 'qemu64' CPU model
Product: Red Hat Enterprise Linux 7
Reporter: Kashyap Chamarthy <kchamart>
Component: libvirt
Assignee: Jiri Denemark <jdenemar>
Status: CLOSED ERRATA
QA Contact: Jing Qi <jinqi>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 7.4
CC: dyuan, jsuchane, kchamart, lhuang, mtessun, rbalakri, xuzhang, yalzhang
Target Milestone: rc
Keywords: Reopened
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: libvirt-3.2.0-2.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1441655
Environment:
Last Closed: 2017-08-02 00:05:54 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1441655
Attachments:
  Description: GDB traceback of libvirtd during guest boot   Flags: none

Description Kashyap Chamarthy 2017-04-12 12:04:22 UTC
Description of problem
----------------------

In a nested virtualization environment, booting a level-2 guest with
CPU mode 'host-model' crashes the libvirt daemon in the level-1 guest
with SIGSEGV.


Version
-------

L0:

$ uname -r; rpm -q libvirt qemu-kvm
3.10.0-514.el7.x86_64
libvirt-2.0.0-10.el7.x86_64
qemu-kvm-1.5.3-126.el7.x86_64

L1:

$ uname -r; rpm -q libvirt qemu-kvm-rhev
3.10.0-514.10.2.el7.x86_64
libvirt-2.0.0-10.el7_3.6.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64


How reproducible: Consistently.


Steps to Reproduce
------------------

In a nested KVM environment (setup instructions are …), boot a level-2
RHEL 7.3 guest with:

  <cpu mode='host-model'>
    <model fallback='allow'/>
  </cpu>

Actual results
--------------

The guest fails to boot and libvirtd crashes with the following (from GDB analysis):

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fa3eeb53700 (LWP 177029)]
0x00007fa3ff4da823 in x86DataCpuid (cpuid=0x8, cpuid=0x8, data=data@entry=0x7fa3c4016c88) at cpu/cpu_x86.c:287
287     for (i = 0; i < data->len; i++) {
#0  0x00007fa3ff4da823 in x86DataCpuid (cpuid=0x8, cpuid=0x8, data=data@entry=0x7fa3c4016c88) at cpu/cpu_x86.c:287
#1  virCPUx86DataAddCPUID (data=data@entry=0x7fa3c4016c88, cpuid=0x8) at cpu/cpu_x86.c:355
#2  0x00007fa3ff4dd917 in x86Compute (host=<optimized out>, cpu=0x7fa3c400eea0, guest=0x7fa3eeb52360, message=<optimized out>) at cpu/cpu_x86.c:1604
#3  0x00007fa3d6a07843 in qemuBuildCpuModelArgStr (driver=driver@entry=0x7fa38c0f56c0, def=def@entry=0x7fa3dc00ddc0, buf=buf@entry=0x7fa3eeb524a0, qemuCaps=qemuCaps@entry=0x7fa3c4000bb0, 
[...]

Expected results
----------------

The guest boots successfully, and libvirtd does not crash.

Comment 2 Kashyap Chamarthy 2017-04-12 12:07:49 UTC
Created attachment 1271121 [details]
GDB traceback of libvirtd during guest boot

Comment 3 Kashyap Chamarthy 2017-04-12 12:12:16 UTC
The root cause: when a guest boots and libvirt copies the host vendor CPUID to the guest CPU, libvirtd crashes if that host CPU has a NULL vendor field. Indeed, another GDB session showed 'vendor_id' to be '0x0':

-----
[...]
(gdb) p host
$4 = (virCPUDef *) 0x7fa38c1d2930
(gdb) p* host
$5 = {type = 0, mode = 0, match = 0, arch = VIR_ARCH_X86_64, model = 0x7fa38c1d3510 "qemu64", vendor_id = 0x0, fallback = 0, vendor = 0x7fa38c1d34f0 "AMD", sockets = 4, cores = 1, 
  threads = 1, nfeatures = 25, nfeatures_max = 0, features = 0x7fa38c1d35d0}
(gdb) p *cpu
$6 = {type = 1, mode = 1, match = 1, arch = VIR_ARCH_NONE, model = 0x7fa3c4014ab0 "qemu64", vendor_id = 0x0, fallback = 0, vendor = 0x7fa3c40149f0 "AMD", sockets = 1, cores = 1, 
  threads = 1, nfeatures = 25, nfeatures_max = 25, features = 0x7fa3c400f5b0}
(gdb) down
#2  0x00007fa3ff4dd917 in x86Compute (host=<optimized out>, cpu=0x7fa3c400eea0, guest=0x7fa3eeb52360, message=<optimized out>) at cpu/cpu_x86.c:1604
1604                virCPUx86DataAddCPUID(&guest_model->data,
[...]
-----


After a joint GDB session, Jiri Denemark (thanks!) identified the upstream libvirt commit that fixes it:

$ git show 541e9ae6d4
commit 541e9ae6d4290b9004ed73648ea663563b329b3d
Author: Jim Fehlig <jfehlig>
Date:   Fri Aug 5 15:23:47 2016 -0600

    cpu_x86: fix libvirtd crash when host cpu vendor is not available
    
    When starting a guest and copying host vendor cpuid to the guest
    cpu, libvirtd would crash if the host cpu contained a NULL vendor
    field. Avoid the crash by checking for a valid vendor in the host
    cpu before copying the cpuid to the guest cpu.
    
    For completeness, here is a backtrace from the crash
    
    (gdb) bt
    f0  0x00007ffff739bf33 in x86DataCpuid (cpuid=0x8, cpuid=0x8,
        data=data@entry=0x7fffb800ee78) at cpu/cpu_x86.c:287
    f1  virCPUx86DataAddCPUID (data=data@entry=0x7fffb800ee78, cpuid=0x8)
        at cpu/cpu_x86.c:355
    f2  0x00007ffff739ef47 in x86Compute (host=<optimized out>, cpu=0x7fffb8000cc0,
        guest=0x7fffecca7348, message=<optimized out>) at cpu/cpu_x86.c:1580
    f3  0x00007fffd2b38e53 in qemuBuildCpuModelArgStr (migrating=false,
        hasHwVirt=<synthetic pointer>, qemuCaps=0x7fffb8001040, buf=0x7fffecca7360,
        def=0x7fffc400ce20, driver=0x1c) at qemu/qemu_command.c:6283
    f4  qemuBuildCpuCommandLine (cmd=cmd@entry=0x7fffb8002f60,
        driver=driver@entry=0x7fffc80882c0, def=def@entry=0x7fffc400ce20,
        qemuCaps=qemuCaps@entry=0x7fffb8001040, migrating=<optimized out>)
        at qemu/qemu_command.c:6445
    (gdb) f2
    (gdb) p *host_model
    $23 = {name = 0x7fffb800ec50 "qemu64", vendor = 0x0, signature = 0, data = {
        len = 2, data = 0x7fffb800e720}}

diff --git a/src/cpu/cpu_x86.c b/src/cpu/cpu_x86.c
index 670b02e..ee5b57d 100644
--- a/src/cpu/cpu_x86.c
+++ b/src/cpu/cpu_x86.c
@@ -1592,7 +1592,7 @@ x86Compute(virCPUDefPtr host,
         if (!(guest_model = x86ModelCopy(host_model)))
             goto error;
 
-        if (cpu->vendor &&
+        if (cpu->vendor && host_model->vendor &&
             virCPUx86DataAddCPUID(&guest_model->data,
                                   &host_model->vendor->cpuid) < 0)
             goto error;

Comment 4 Peter Krempa 2017-04-12 12:22:22 UTC
*** Bug 1441655 has been marked as a duplicate of this bug. ***

Comment 7 Jiri Denemark 2017-04-12 15:30:35 UTC
Some more details about this bug... libvirt stores its CPU model definitions in cpu_map.xml (installed in /usr/share/libvirt), where some models (usually older or artificial ones) are not defined with a specific <vendor>...</vendor> element. If libvirt decides to use one of these models as the model that best describes the host CPU, it will crash every time it tries to start a domain.

So while this can easily be reproduced in a nested environment (it's trivial to change the host CPU that nested libvirt will see), it is not impossible to hit this bug on real hardware, although the CPU would need to be either pretty old or very strange.

Comment 13 Jing Qi 2017-05-12 05:37:28 UTC
Verified with libvirt-3.2.0-4.el7.x86_64 and qemu-kvm-rhev-2.9.0-3.el7.x86_64 on the host.
L1 XML:
  <cpu mode='host-passthrough'>
    <model fallback='allow'/>
  </cpu>

L2 XML:

  <cpu mode='host-model'>
    <model fallback='allow'/>
  </cpu>

The L2 VM starts successfully.

Comment 15 errata-xmlrpc 2017-08-02 00:05:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1846
