Bug 1843970

Summary: Crash on GCP nested instance when using qemu-kvm-2.12.0-99.module+el8.2.0+5827+8c39933c.x86_64
Product: Red Hat Enterprise Linux 8
Reporter: Christophe Fergeau <cfergeau>
Component: qemu-kvm
Assignee: Amnon Ilan <ailan>
qemu-kvm sub component: General
QA Contact: Wei Shi <wshi>
Status: CLOSED WONTFIX
Docs Contact:
Severity: unspecified
Priority: unspecified
CC: ailan, coli, jinzhao, juzhang, prkumar, virt-maint, vkuznets, wshi, ymao
Version: ---
Keywords: Regression, Reopened
Target Milestone: rc
Target Release: 8.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-12-30 12:38:23 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Christophe Fergeau 2020-06-04 14:09:57 UTC
Description of problem:
When trying to start a nested VM on a GCP instance, qemu crashes with:
2020-06-03T14:07:04.871770Z qemu-kvm: error: failed to set MSR 0x48b to 0x59ff00000000
qemu-kvm: /builddir/build/BUILD/qemu-2.12.0/target/i386/kvm.c:2119: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.


This happens with qemu-kvm-2.12.0-99.module+el8.2.0+5827+8c39933c.x86_64
Downgrading to qemu-kvm-2.12.0-88.module+el8.1.0+5708+85d8e057.3.x86_64 gets rid of the crash
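
For reference, the value in the error message can be decoded by hand. The sketch below assumes that MSR 0x48b is IA32_VMX_PROCBASED_CTLS2 (the secondary processor-based VM-execution controls) and that its upper 32 bits hold the "allowed-1" settings, as described in the Intel SDM. The bit-name table and the helper `decode_allowed1` are illustrative and not taken from QEMU. The output only shows which controls QEMU tried to advertise; it does not show which of them the write was rejected for.

```python
# Illustrative sketch: decode the value QEMU failed to write to MSR 0x48b.
# Assumption: 0x48b is IA32_VMX_PROCBASED_CTLS2 and bits 63:32 are the
# "allowed-1" settings. The bit names follow the Intel SDM layout as I
# understand it; only bits 0-14 are named here.
SECONDARY_CTLS = {
    0: "virtualize APIC accesses",
    1: "enable EPT",
    2: "descriptor-table exiting",
    3: "enable RDTSCP",
    4: "virtualize x2APIC mode",
    5: "enable VPID",
    6: "WBINVD exiting",
    7: "unrestricted guest",
    8: "APIC-register virtualization",
    9: "virtual-interrupt delivery",
    10: "PAUSE-loop exiting",
    11: "RDRAND exiting",
    12: "enable INVPCID",
    13: "enable VM functions",
    14: "VMCS shadowing",
}


def decode_allowed1(msr_value):
    """Return the names of the controls set in the upper 32 bits."""
    allowed1 = (msr_value >> 32) & 0xFFFFFFFF
    return [
        SECONDARY_CTLS.get(bit, f"bit {bit}")
        for bit in range(32)
        if allowed1 & (1 << bit)
    ]


if __name__ == "__main__":
    # Value taken from the error message above.
    for name in decode_allowed1(0x59FF00000000):
        print(name)
```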

The libvirt VM XML is 

$ sudo virsh dumpxml nested-8x69s-master-0
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <name>nested-8x69s-master-0</name>
  <uuid>9e9ea8f4-ecdf-4173-b5d7-df239ce87d3c</uuid>
  <memory unit='KiB'>14680064</memory>
  <currentMemory unit='KiB'>14680064</currentMemory>
  <vcpu placement='static'>4</vcpu>
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.6.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <cpu mode='host-passthrough' check='none'/>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='volume' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source pool='nested-8x69s' volume='nested-8x69s-master-0'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
    <controller type='usb' index='0' model='piix3-uhci'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>
    <interface type='network'>
      <mac address='52:54:00:9e:a9:70'/>
      <source network='nested-8x69s'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='pty'>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='spice' autoport='yes'>
      <listen type='address'/>
    </graphics>
    <video>
      <model type='cirrus' vram='16384' heads='1' primary='yes'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </memballoon>
    <rng model='virtio'>
      <backend model='random'>/dev/urandom</backend>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </rng>
  </devices>
  <qemu:commandline>
    <qemu:arg value='-fw_cfg'/>
    <qemu:arg value='name=opt/com.coreos/config,file=/var/lib/libvirt/openshift-images/nested-8x69s/nested-8x69s-master.ign'/>
  </qemu:commandline>
</domain>


$ lscpu 
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              16
On-line CPU(s) list: 0-15
Thread(s) per core:  2
Core(s) per socket:  8
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               63
Model name:          Intel(R) Xeon(R) CPU @ 2.30GHz
Stepping:            0
CPU MHz:             2300.000
BogoMIPS:            4600.00
Virtualization:      VT-x
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            46080K
NUMA node0 CPU(s):   0-15
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt arat md_clear arch_capabilities

This is reliably reproducible, so I can gather more information or run further tests if needed.

Comment 1 Qinghua Cheng 2020-06-05 01:42:36 UTC
Hi Christophe,

Which kernel version is running on your host?

Thanks,
Qinghua Cheng

Comment 2 Christophe Fergeau 2020-06-05 13:52:47 UTC
It's 4.18.0-193.1.2.el8_2.x86_64

Comment 3 Qinghua Cheng 2020-06-08 07:47:02 UTC
Hi Christophe,

Does this crash happen on a Linux L1 guest or a Windows L1 guest?

Thanks,
Qinghua Cheng

Comment 4 Christophe Fergeau 2020-06-08 09:21:51 UTC
The kernel version I gave is for the L1 guest, so it's Linux. The L2 guest is also a Linux guest, running RHCOS.

Comment 5 John Ferlan 2020-06-09 14:44:39 UTC
Assigned to Amnon for initial triage, per the BZ process, based on the age of the bug (created or assigned to virt-maint without triage).

Could be miscategorized in General - perhaps more Machine/CPU related or maybe we need a new sub-component.

Comment 6 Qinghua Cheng 2020-06-11 03:53:02 UTC
I tried to reproduce this bug on our env:

Host:
# uname -r 
4.18.0-193.el8.x86_64
qemu-kvm build: qemu-kvm-2.12.0-99.module+el8.2.0+5827+8c39933c.x86_64

L1 guest: 
# uname -r 
4.18.0-193.1.2.el8_2.x86_64
qemu-kvm build: qemu-kvm-2.12.0-99.module+el8.2.0+5827+8c39933c.x86_64

I used the XML file from this bug to boot the guest VM. Both the L1 guest VM and the L2 guest VM booted successfully and work well, so the bug does not reproduce in this environment.

Comment 7 Wei Shi 2020-06-11 06:56:30 UTC
Vitaly,
  It seems this is the same bug as we fixed in RHEL-AV (RHBZ#1822682)

Can NOT reproduce on
qemu-kvm-2.12.0-88.module+el8.1.0+4233+bc44be3f.x86_64
qemu-kvm -cpu host ...

Can reproduce on
qemu-kvm-2.12.0-99.module+el8.2.0+5827+8c39933c.x86_64
qemu-kvm -cpu host ...
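
To make the comparison between the two builds explicit, the package strings can be split into their RPM fields. This is a minimal sketch that assumes the usual `name-version-release.arch` layout; the helpers `parse_rpm_name` and `build_number` are hypothetical names used only for illustration and are not part of any RPM tooling.

```python
# Illustrative sketch: split an RPM package string into its fields.
# Assumption: the string follows the usual name-version-release.arch layout.
def parse_rpm_name(pkg):
    """Split 'name-version-release.arch' into a dict of its four parts."""
    rest, arch = pkg.rsplit(".", 1)
    name, version, release = rest.rsplit("-", 2)
    return {"name": name, "version": version, "release": release, "arch": arch}


def build_number(release):
    """Return the leading numeric part of the release field, e.g. 99."""
    return int(release.split(".", 1)[0])


if __name__ == "__main__":
    good = parse_rpm_name("qemu-kvm-2.12.0-88.module+el8.1.0+4233+bc44be3f.x86_64")
    bad = parse_rpm_name("qemu-kvm-2.12.0-99.module+el8.2.0+5827+8c39933c.x86_64")
    # Both builds share the same upstream version; only the release differs.
    print(good["version"], build_number(good["release"]))
    print(bad["version"], build_number(bad["release"]))
```

Both builds carry the same upstream version (2.12.0), so the change in behaviour lies somewhere between release 88 and release 99 of the downstream package.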

Comment 8 Vitaly Kuznetsov 2020-06-11 10:15:31 UTC
(In reply to Wei Shi from comment #7)
> Vitaly,
>   It seems this is the same bug as we fixed in RHEL-AV (RHBZ#1822682)
> 
> Can NOT reproduce on
> qemu-kvm-2.12.0-88.module+el8.1.0+4233+bc44be3f.x86_64
> qemu-kvm -cpu host ...
> 
> Can reproduce on
> qemu-kvm-2.12.0-99.module+el8.2.0+5827+8c39933c.x86_64
> qemu-kvm -cpu host ...

Yes, most likely it's the same bug. You can check that it's not
reproducible with qemu-kvm-4.2.0-19.module+el8.2.0+6296+6b821950

Comment 10 Wei Shi 2020-06-15 03:54:32 UTC
Verified nested VM can be launched successfully with qemu-kvm-4.2.0-19.module+el8.2.0+6296+6b821950.x86_64 on GCP

*** This bug has been marked as a duplicate of bug 1822682 ***

Comment 11 Christophe Fergeau 2020-06-16 09:45:40 UTC
Aren't qemu-kvm-2.12.0-99.module+el8.2.0+5827+8c39933c.x86_64 and qemu-kvm-4.2.0-19.module+el8.2.0+6296+6b821950.x86_64 in different streams? Is there a bug tracking a backport to qemu-kvm-2.12.0 for the fix which was done in qemu-kvm-4.2.0?

Comment 17 Red Hat Bugzilla 2023-09-15 00:32:28 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days