Bug 1741863 - Guest reboot fail and qemu crash when: adding "default_hugepagesz=1G" > adding "intel_iommu=on" > enable tuned -> reboot guest
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: 8.0
Assignee: Peter Xu
QA Contact: Pei Zhang
URL:
Whiteboard:
Depends On:
Blocks: 1771318
 
Reported: 2019-08-16 09:47 UTC by Pei Zhang
Modified: 2020-02-18 16:55 UTC (History)
5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-02-18 16:55:23 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
Guest reboot hang screen (56.30 KB, image/png)
2019-08-16 09:47 UTC, Pei Zhang

Description Pei Zhang 2019-08-16 09:47:58 UTC
Created attachment 1604320 [details]
Guest reboot hang screen

Description of problem:
Boot a guest, then add "default_hugepagesz=1G" and "intel_iommu=on" to the guest kernel command line and enable tuned in the guest. The guest then always fails to reboot, and sometimes qemu crashes.

Version-Release number of selected component (if applicable):
4.18.0-132.el8.x86_64
qemu-kvm-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64


How reproducible:
100%

Steps to Reproduce:
1. Boot a RHEL 8 guest; see the libvirt XML in reference [1]

2. Set the default hugepage size on the guest kernel command line
# grubby --args="default_hugepagesz=1G" --update-kernel=`grubby --default-kernel`

3. Enable the Intel IOMMU on the guest kernel command line
# grubby --args="intel_iommu=on" --update-kernel=`grubby --default-kernel`

4. Enable the tuned cpu-partitioning profile
# tuned-adm profile cpu-partitioning
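
(The active profile can be confirmed with, for example:)
# tuned-adm active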

5. Reboot the guest. The reboot fails (see the attached screenshot), and sometimes, after several minutes, qemu crashes with the error below:

kvm_mem_ioeventfd_add: error adding ioeventfd: No space left on device (28)
2019-08-16 06:59:09.846+0000: shutting down, reason=crashed


Actual results:
The guest fails to reboot and qemu crashes.

Expected results:
The guest should reboot and keep working normally, and qemu should not crash.

Additional info:
1. Skipping step 2 (all other steps unchanged), both the guest and qemu work well.

2. Skipping step 3 (all other steps unchanged), both the guest and qemu work well.

3. Skipping step 4 (all other steps unchanged), both the guest and qemu work well.

4. Replacing "intel_iommu=on" with "iommu=pt intel_iommu=on" in step 3, both the guest and qemu work well.

5. I'm not quite sure whether this is a kernel bug or a qemu bug, so please feel free to change the component if necessary. Thanks.
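
(For the retests in items 1-4 above, the kernel arguments added in steps 2 and 3 can be removed again with grubby, e.g.:
# grubby --remove-args="default_hugepagesz=1G" --update-kernel=`grubby --default-kernel`
and the tuned profile can be cleared with "tuned-adm off".)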

Reference:
[1]
<domain type='kvm'>
  <name>rhel8.1</name>
  <uuid>c41df6ae-bfa0-11e9-940a-525400cd2f8d</uuid>
  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <memoryBacking>
    <hugepages>
      <page size='1048576' unit='KiB'/>
    </hugepages>
    <locked/>
  </memoryBacking>
  <vcpu placement='static'>6</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='30'/>
    <vcpupin vcpu='1' cpuset='28'/>
    <vcpupin vcpu='2' cpuset='26'/>
    <vcpupin vcpu='3' cpuset='24'/>
    <vcpupin vcpu='4' cpuset='22'/>
    <vcpupin vcpu='5' cpuset='20'/>
    <emulatorpin cpuset='25,27,29,31'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='0'/>
    <memnode cellid='0' mode='strict' nodeset='0'/>
  </numatune>
  <os>
    <type arch='x86_64' machine='pc-q35-rhel8.0.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <pmu state='off'/>
    <vmport state='off'/>
    <ioapic driver='qemu'/>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <feature policy='require' name='tsc-deadline'/>
    <numa>
      <cell id='0' cpus='0-5' memory='8388608' unit='KiB' memAccess='shared'/>
    </numa>
  </cpu>
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none' io='threads'/>
      <source file='/home/images_nfv-virt-rt-kvm/rhel8.1.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </disk>
    <controller type='usb' index='0' model='none'/>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0x0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0x0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0x0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0x0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </controller>
    <controller type='pci' index='7' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='7' port='0x0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <interface type='bridge'>
      <mac address='88:66:da:5f:dd:01'/>
      <source bridge='switch'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes' listen='0.0.0.0' keymap='en-us'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <video>
      <model type='cirrus' vram='16384' heads='1' primary='yes'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </memballoon>
    <iommu model='intel'>
      <driver intremap='on' caching_mode='on' iotlb='on'/>
    </iommu>
  </devices>
</domain>

Comment 4 Pei Zhang 2019-08-19 10:51:22 UTC
More info:

1. After the guest reboot fails following the steps in the Description, the guest can never boot again, even with the simplest command line; it looks like the image is broken.

2. Peter debugged this issue and found that the guest seems to keep rebooting.

Comment 5 Pei Zhang 2019-08-19 11:28:17 UTC
More info: (continued)

3. After about 18 minutes, qemu crashes.

4. With the patch provided by Peter in Comment 2, qemu no longer crashes; the only remaining issue is that the guest fails to reboot.

Comment 6 Peter Xu 2019-08-21 10:11:17 UTC
The disk seems to have been broken somehow. After the bug is triggered, we can easily reproduce the reboot loop even with a simple command line like this:

  qemu-system-x86_64 -enable-kvm -m 2G -smp 8 -nographic rhel8.1_broken.qcow2

rhel8.1_broken.qcow2 was the broken image.
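
As a side note, qcow2-level metadata corruption can probably be ruled out with something like:

  qemu-img check rhel8.1_broken.qcow2

though corruption of the guest's own disk contents (e.g. the boot sector) would not show up there.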

The system reboots frequently because of a triple fault:

390928  qemu-system-x86-17596 [005]  3564.810087: kvm_entry:            vcpu 0
390929  qemu-system-x86-17596 [005]  3564.810090: kvm_exit:             reason CPUID rip 0xa272 info 0 0
390930  qemu-system-x86-17596 [005]  3564.810090: kvm_cpuid:            func 0 rax d rbx 756e6547 rcx 6c65746e rdx 49656e69, cpuid entry found
390931  qemu-system-x86-17596 [005]  3564.810090: kvm_entry:            vcpu 0
390932  qemu-system-x86-17596 [005]  3564.810250: kvm_exit:             reason CPUID rip 0xa272 info 0 0
390933  qemu-system-x86-17596 [005]  3564.810250: kvm_cpuid:            func 0 rax d rbx 756e6547 rcx 6c65746e rdx 49656e69, cpuid entry found
390934  qemu-system-x86-17596 [005]  3564.810250: kvm_entry:            vcpu 0
390935  qemu-system-x86-17596 [005]  3564.810335: kvm_exit:             reason EXCEPTION_NMI rip 0x15fad info 0 80000306
390936  qemu-system-x86-17596 [005]  3564.810337: kvm_emulate_insn:     0:15fad:dd f4 (prot32)
390937  qemu-system-x86-17596 [005]  3564.810338: kvm_inj_exception:    #UD (0x0)
390938  qemu-system-x86-17596 [005]  3564.810338: kvm_entry:            vcpu 0
390939  qemu-system-x86-17596 [005]  3564.810340: kvm_exit:             reason TRIPLE_FAULT rip 0x15fad info 0 0
390940  qemu-system-x86-17596 [005]  3564.810340: kvm_fpu:              unload
390941  qemu-system-x86-17596 [005]  3564.810341: kvm_userspace_exit:   reason KVM_EXIT_SHUTDOWN (8)

I'm still trying to figure out where that NMI comes from.
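
(For reference, a trace like the one above can be collected on the host with the kvm trace events, e.g.:

  trace-cmd record -e kvm
  trace-cmd report

though the exact tooling used here may differ.)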

Comment 7 Peter Xu 2019-08-22 07:06:26 UTC
I think I've worked out the meaning of the log above. It's simply a #UD: KVM traps #UD in update_exception_bitmap(), so it shows up as an EXCEPTION_NMI vmexit. There should also be a #DF, but it is not captured because #DF is not set in the exception bitmap, and then comes the triple fault.

So I think what we need to figure out is why the disk, especially the boot section, was broken by those configuration commands. By boot section I mean what runs after SeaBIOS loads the disk and before we can see the grub boot menu.

Comment 8 Ademar Reis 2020-02-05 23:03:17 UTC
QEMU has recently been split into sub-components, and as a one-time operation to avoid breaking tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks.

Comment 9 Peter Xu 2020-02-17 17:25:37 UTC
Pei, I wanted to reproduce this on my host but failed.  My versions are (both host and guest):

grub2-common-2.02-78.el8.noarch
kernel-4.18.0-147.el8.x86_64
qemu-kvm-4.1.0-14.module+el8.1.0+5710+be701bf6.2.x86_64

I'm not using libvirt, and my cmdline is as simple as:

bin=/usr/libexec/qemu-kvm
$bin -M q35,accel=kvm,kernel-irqchip=split -smp 4 -m 4G -nographic \
     -device intel-iommu,intremap=on /images/rhel8.qcow2

I see that your packages are a bit older than mine; would you mind trying the latest packages to see whether it's still reproducible? I also suggest trying the same simple qemu command line so we can tell whether that matters too.

Comment 10 Pei Zhang 2020-02-18 09:59:12 UTC
(In reply to Peter Xu from comment #9)
> Pei, I wanted to reproduce this on my host but failed.  My versions are
> (both host and guest):
> 
> grub2-common-2.02-78.el8.noarch
> kernel-4.18.0-147.el8.x86_64
> qemu-kvm-4.1.0-14.module+el8.1.0+5710+be701bf6.2.x86_64
> 
> I'm not using libvirt, and my cmdline is as simple as:
> 
> bin=/usr/libexec/qemu-kvm
> $bin -M q35,accel=kvm,kernel-irqchip=split -smp 4 -m 4G -nographic \
>      -device intel-iommu,intremap=on /images/rhel8.qcow2
> 
> I see that your packages are a bit older than mine; would you mind trying
> the latest packages to see whether it's still reproducible? I also suggest
> trying the same simple qemu command line so we can tell whether that
> matters too.

Hi Peter, 

This issue can no longer be reproduced with the latest packages.

5/5 PASS.

Versions:
4.18.0-178.el8.x86_64
qemu-kvm-4.2.0-9.module+el8.2.0+5699+b5331ee5.x86_64
python3-libvirt-6.0.0-1.module+el8.2.0+5453+31b2b136.x86_64
tuned-2.13.0-5.el8.noarch

I would suggest closing this bug as "CurrentRelease". Is this OK? Thank you.

Best regards,

Pei

Comment 11 Peter Xu 2020-02-18 14:17:17 UTC
(In reply to Pei Zhang from comment #10)
> I would suggest closing this bug as "CurrentRelease". Is this OK? Thank you.

Yes please.  Thanks!

