Created attachment 1604320 [details]
Guest reboot hang screen

Description of problem:
Boot a guest, then add "default_hugepagesz=1G" and "intel_iommu=on" to the guest kernel command line and enable the tuned cpu-partitioning profile in the guest. The guest then always fails to reboot, and qemu sometimes crashes.

Version-Release number of selected component (if applicable):
4.18.0-132.el8.x86_64
qemu-kvm-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot a rhel8 guest, refer to [1]

2. Enable hugepages in the guest kernel command line
# grubby --args="default_hugepagesz=1G" --update-kernel=`grubby --default-kernel`

3. Enable iommu in the guest kernel command line
# grubby --args="intel_iommu=on" --update-kernel=`grubby --default-kernel`

4. Enable the tuned cpu-partitioning profile
# tuned-adm profile cpu-partitioning

5. Reboot the guest. The reboot fails; see the attachment for the guest screen. Sometimes, after several minutes, qemu crashes with the error below:

kvm_mem_ioeventfd_add: error adding ioeventfd: No space left on device (28)
2019-08-16 06:59:09.846+0000: shutting down, reason=crashed

Actual results:
The guest fails to reboot and qemu crashes.

Expected results:
The guest should keep working well and qemu should not crash.

Additional info:
1. With step 2 removed and everything else unchanged, both guest and qemu work well.
2. With step 3 removed and everything else unchanged, both guest and qemu work well.
3. With step 4 removed and everything else unchanged, both guest and qemu work well.
4. With "intel_iommu=on" replaced by "iommu=pt intel_iommu=on" in step 3, both guest and qemu work well.
5. I'm not quite sure whether this is a kernel bug or a qemu bug, so please feel free to change the component if necessary. Thanks.
Reference:
[1]
<domain type='kvm'>
  <name>rhel8.1</name>
  <uuid>c41df6ae-bfa0-11e9-940a-525400cd2f8d</uuid>
  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <memoryBacking>
    <hugepages>
      <page size='1048576' unit='KiB'/>
    </hugepages>
    <locked/>
  </memoryBacking>
  <vcpu placement='static'>6</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='30'/>
    <vcpupin vcpu='1' cpuset='28'/>
    <vcpupin vcpu='2' cpuset='26'/>
    <vcpupin vcpu='3' cpuset='24'/>
    <vcpupin vcpu='4' cpuset='22'/>
    <vcpupin vcpu='5' cpuset='20'/>
    <emulatorpin cpuset='25,27,29,31'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='0'/>
    <memnode cellid='0' mode='strict' nodeset='0'/>
  </numatune>
  <os>
    <type arch='x86_64' machine='pc-q35-rhel8.0.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <pmu state='off'/>
    <vmport state='off'/>
    <ioapic driver='qemu'/>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <feature policy='require' name='tsc-deadline'/>
    <numa>
      <cell id='0' cpus='0-5' memory='8388608' unit='KiB' memAccess='shared'/>
    </numa>
  </cpu>
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none' io='threads'/>
      <source file='/home/images_nfv-virt-rt-kvm/rhel8.1.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </disk>
    <controller type='usb' index='0' model='none'/>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0x0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0x0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0x0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0x0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </controller>
    <controller type='pci' index='7' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='7' port='0x0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <interface type='bridge'>
      <mac address='88:66:da:5f:dd:01'/>
      <source bridge='switch'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes' listen='0.0.0.0' keymap='en-us'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <video>
      <model type='cirrus' vram='16384' heads='1' primary='yes'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </memballoon>
    <iommu model='intel'>
      <driver intremap='on' caching_mode='on' iotlb='on'/>
    </iommu>
  </devices>
</domain>
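Before rebooting, it can help to confirm that the two kernel arguments from steps 2 and 3 of the description are actually on the command line that will be used. The following is a minimal sketch, not part of the original report: the function name and the sample command line are made up for illustration, and only the two argument strings come from the reproduction steps.

```python
# Arguments added by steps 2 and 3 of the reproduction steps.
REQUIRED_ARGS = ("default_hugepagesz=1G", "intel_iommu=on")


def missing_kernel_args(cmdline, required=REQUIRED_ARGS):
    """Return the required arguments that are absent from a kernel command line."""
    present = set(cmdline.split())
    return [arg for arg in required if arg not in present]


# Hypothetical command line, for illustration only.
SAMPLE_CMDLINE = "ro crashkernel=auto default_hugepagesz=1G intel_iommu=on"

print(missing_kernel_args(SAMPLE_CMDLINE))
print(missing_kernel_args("ro crashkernel=auto intel_iommu=on"))
```

On a running guest, the string to check would be the contents of /proc/cmdline; after a grubby update but before the reboot, it would be the args line that `grubby --info` reports for the default kernel.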
More info:
1. After the guest reboot fails following the steps in the Description, the guest can never boot again, even with the simplest command line. It looks like the image is broken.
2. Peter debugged this issue and found that the guest appears to keep rebooting.
More info: (continued)
3. After waiting about 18 minutes, qemu will crash.
4. With the patch provided by Peter in Comment 2, qemu no longer crashes; the only remaining issue is that the guest fails to reboot.
The disk seems to have been broken somehow. After the bug is triggered we can easily reproduce the reboot even with a simple command line like this:

qemu-system-x86_64 -enable-kvm -m 2G -smp 8 -nographic rhel8.1_broken.qcow2

where rhel8.1_broken.qcow2 is the broken image.

The system reboots frequently because of a triple fault:

390928 qemu-system-x86-17596 [005] 3564.810087: kvm_entry: vcpu 0
390929 qemu-system-x86-17596 [005] 3564.810090: kvm_exit: reason CPUID rip 0xa272 info 0 0
390930 qemu-system-x86-17596 [005] 3564.810090: kvm_cpuid: func 0 rax d rbx 756e6547 rcx 6c65746e rdx 49656e69, cpuid entry found
390931 qemu-system-x86-17596 [005] 3564.810090: kvm_entry: vcpu 0
390932 qemu-system-x86-17596 [005] 3564.810250: kvm_exit: reason CPUID rip 0xa272 info 0 0
390933 qemu-system-x86-17596 [005] 3564.810250: kvm_cpuid: func 0 rax d rbx 756e6547 rcx 6c65746e rdx 49656e69, cpuid entry found
390934 qemu-system-x86-17596 [005] 3564.810250: kvm_entry: vcpu 0
390935 qemu-system-x86-17596 [005] 3564.810335: kvm_exit: reason EXCEPTION_NMI rip 0x15fad info 0 80000306
390936 qemu-system-x86-17596 [005] 3564.810337: kvm_emulate_insn: 0:15fad:dd f4 (prot32)
390937 qemu-system-x86-17596 [005] 3564.810338: kvm_inj_exception: #UD (0x0)
390938 qemu-system-x86-17596 [005] 3564.810338: kvm_entry: vcpu 0
390939 qemu-system-x86-17596 [005] 3564.810340: kvm_exit: reason TRIPLE_FAULT rip 0x15fad info 0 0
390940 qemu-system-x86-17596 [005] 3564.810340: kvm_fpu: unload
390941 qemu-system-x86-17596 [005] 3564.810341: kvm_userspace_exit: reason KVM_EXIT_SHUTDOWN (8)

I'm still trying to figure out where that NMI comes from.
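When a trace like the one above runs to hundreds of thousands of lines, a small script can pull out the exit that immediately precedes each triple fault. This is only a sketch: the function names are made up here, the regular expression assumes the `kvm_exit: reason <NAME> rip <addr>` layout seen in this comment, and the sample text is three lines copied from the trace.

```python
import re

# Assumes the "kvm_exit: reason <NAME> rip <addr>" layout shown in the trace.
EXIT_RE = re.compile(r"kvm_exit: reason (\S+) rip (0x[0-9a-fA-F]+)")


def exit_reasons(trace_text):
    """Return (reason, rip) pairs for every kvm_exit event, in order."""
    return EXIT_RE.findall(trace_text)


def last_exit_before_triple_fault(trace_text):
    """Return the exit reason just before the first TRIPLE_FAULT, or None."""
    reasons = [reason for reason, _rip in exit_reasons(trace_text)]
    if "TRIPLE_FAULT" not in reasons:
        return None
    index = reasons.index("TRIPLE_FAULT")
    return reasons[index - 1] if index > 0 else None


# Three kvm_exit lines copied from the trace in this comment.
TRACE = """\
390932 qemu-system-x86-17596 [005] 3564.810250: kvm_exit: reason CPUID rip 0xa272 info 0 0
390935 qemu-system-x86-17596 [005] 3564.810335: kvm_exit: reason EXCEPTION_NMI rip 0x15fad info 0 80000306
390939 qemu-system-x86-17596 [005] 3564.810340: kvm_exit: reason TRIPLE_FAULT rip 0x15fad info 0 0
"""

print(exit_reasons(TRACE))
print(last_exit_before_triple_fault(TRACE))
```

For the excerpt above, the exit preceding the triple fault is the EXCEPTION_NMI one at rip 0x15fad, which is the event in question.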
I think I've got the meaning of the log above. It's simply a #UD: KVM intercepts #UD (see update_exception_bitmap()), so it is reported as an EXCEPTION_NMI vmexit rather than being a real NMI. There should be a #DF too, but it is not captured because #DF is not set in the exception bitmap; then comes the triple fault.

So I think what we need to figure out is why the disk, especially the boot section, was broken by those configuration commands. By boot section I mean the stage after seabios loads the disk and before we can see the grub boot menu.
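The #UD reading can be cross-checked against the `info 0 80000306` value in the EXCEPTION_NMI line of the trace. The sketch below assumes that the second info value is the VMX VM-exit interruption information field (bits 7:0 vector, bits 10:8 type, bit 11 error-code valid, bit 31 valid); the function name and the small lookup tables are illustrative and list only a few entries.

```python
# Partial lookup tables, enough for this trace.
VECTOR_NAMES = {6: "#UD", 8: "#DF", 13: "#GP", 14: "#PF"}
TYPE_NAMES = {0: "external interrupt", 2: "NMI", 3: "hardware exception"}


def decode_intr_info(value):
    """Split a VM-exit interruption information value into its fields."""
    vector = value & 0xFF
    kind = (value >> 8) & 0x7
    return {
        "valid": bool((value >> 31) & 1),
        "vector": vector,
        "vector_name": VECTOR_NAMES.get(vector, "unknown"),
        "type": TYPE_NAMES.get(kind, "other"),
        "has_error_code": bool((value >> 11) & 1),
    }


# Second "info" value from the EXCEPTION_NMI line in the trace.
print(decode_intr_info(0x80000306))
```

Under that assumption, 0x80000306 decodes to a valid hardware exception with vector 6 and no error code, that is, a #UD and not an NMI, which matches the kvm_inj_exception line that follows it in the trace.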
QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks
Pei, I tried to reproduce this on my host but failed. My versions are (both host and guest):

grub2-common-2.02-78.el8.noarch
kernel-4.18.0-147.el8.x86_64
qemu-kvm-4.1.0-14.module+el8.1.0+5710+be701bf6.2.x86_64

I'm not using libvirt, and my cmdline is as simple as:

bin=/usr/libexec/qemu-kvm
$bin -M q35,accel=kvm,kernel-irqchip=split -smp 4 -m 4G -nographic \
     -device intel-iommu,intremap=on /images/rhel8.qcow2

I see that your packages are a bit older than mine. Would you mind trying the latest packages to see whether it's still reproducible? I also suggest trying the same simple version of the qemu cmdline so we can know whether it matters too.
(In reply to Peter Xu from comment #9)
> Pei, I tried to reproduce this on my host but failed. My versions are
> (both host and guest):
>
> grub2-common-2.02-78.el8.noarch
> kernel-4.18.0-147.el8.x86_64
> qemu-kvm-4.1.0-14.module+el8.1.0+5710+be701bf6.2.x86_64
>
> I'm not using libvirt, and my cmdline is as simple as:
>
> bin=/usr/libexec/qemu-kvm
> $bin -M q35,accel=kvm,kernel-irqchip=split -smp 4 -m 4G -nographic \
>      -device intel-iommu,intremap=on /images/rhel8.qcow2
>
> I see that your packages are a bit older than mine. Would you mind trying
> the latest packages to see whether it's still reproducible? I also suggest
> trying the same simple version of the qemu cmdline so we can know whether
> it matters too.

Hi Peter,

This issue can no longer be reproduced with the latest packages. 5/5 PASS.

Versions:
4.18.0-178.el8.x86_64
qemu-kvm-4.2.0-9.module+el8.2.0+5699+b5331ee5.x86_64
python3-libvirt-6.0.0-1.module+el8.2.0+5453+31b2b136.x86_64
tuned-2.13.0-5.el8.noarch

I would suggest closing this bug as "CurrentRelease". Is this OK? Thank you.

Best regards,
Pei
(In reply to Pei Zhang from comment #10)
> I would suggest closing this bug as "CurrentRelease". Is this OK? Thank you.

Yes please. Thanks!