Bug 2024073

Summary: Latest qemu-kvm does not work
Product: Red Hat Enterprise Linux 8
Component: qemu-kvm
qemu-kvm sub component: Devices
Version: CentOS Stream
Status: CLOSED DUPLICATE
Severity: unspecified
Priority: unspecified
Keywords: Triaged
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Reporter: Igor Raits <igor.raits>
Assignee: Amnon Ilan <ailan>
QA Contact: jingzhao <jinzhao>
CC: bstinson, coli, jinzhao, juzhang, jwboyer, ngompa13, virt-maint, yiwei
Type: Bug
Last Closed: 2022-02-03 16:54:41 UTC

Description Igor Raits 2021-11-17 09:18:27 UTC
Description of problem:
We have upgraded our systems from the advanced-virtualization stack to the latest packages provided by CentOS Stream 8 (AppStream).

Essentially:
qemu-kvm-6.0.0-3[13].el8s → qemu-kvm-6.1.0-4.module_el8.6.0+983+a7505f3f
libvirt-7.6.0-[1-6].el8s → libvirt-7.9.0-1.module_el8.6.0+983+a7505f3f

And now starting any VM does not actually do anything useful: the console stays empty and the VM does not come up. It shows as running in virsh list, but otherwise it does not do anything.

***Simply downgrading the packages back to the previous versions and stopping/starting the VM fixes the issue.***

Version-Release number of selected component (if applicable):
qemu-kvm-6.1.0-4.module_el8.6.0+983+a7505f3f.x86_64
libvirt-7.9.0-1.module_el8.6.0+983+a7505f3f.x86_64


How reproducible:
Always.

Steps to Reproduce:
1. Spawn a VM
2. Try "virsh console" or check whether the VM is alive (e.g. by ping/ssh or any other means); a minimal check sequence is sketched below
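
A minimal check along these lines (a sketch, not the exact commands used; the domain name is taken from the qemu command line below and the guest address is a placeholder):

# Domain shows as "running" even though nothing boots
virsh list --all
# Console stays empty, no boot output ever appears
virsh console instance-0003e769
# Guest never answers (10.0.0.10 is a placeholder address)
ping -c 3 10.0.0.10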

Actual results:
VM is not really running.

Expected results:
VM is working as before.

Additional info:
Here's the full cmdline of the qemu-kvm process (spawned by libvirt, created by OpenStack):
1806324 ?        Sl     0:04 /usr/libexec/qemu-kvm -name guest=instance-0003e769,debug-threads=on -S -object {"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-1-instance-0003e769/master-key.aes"} -machine pc-q35-rhel8.5.0,accel=kvm,usb=off,dump-guest-core=off -cpu host,migratable=on -m 20480 -overcommit mem-lock=off -smp 2,sockets=1,dies=1,cores=1,threads=2 -object {"qom-type":"iothread","id":"iothread1"} -object {"qom-type":"iothread","id":"iothread2"} -object {"qom-type":"memory-backend-file","id":"ram-node0","mem-path":"/dev/hugepages/libvirt/qemu/1-instance-0003e769","share":true,"prealloc":true,"size":21474836480,"host-nodes":[0],"policy":"bind"} -numa node,nodeid=0,cpus=0-1,memdev=ram-node0 -uuid 3f7ed13b-c7ee-4e7c-8651-fbe74234eb70 -smbios type=1,manufacturer=RDO,product=OpenStack Compute,version=0.0.0-1.el8+gdc,serial=3f7ed13b-c7ee-4e7c-8651-fbe74234eb70,uuid=3f7ed13b-c7ee-4e7c-8651-fbe74234eb70,family=Virtual Machine -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=37,server=on,wait=off -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -boot strict=on -device pcie-root-port,port=16,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 -device pcie-root-port,port=17,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 -device pcie-root-port,port=18,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 -device pcie-root-port,port=19,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 -device pcie-root-port,port=20,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 -device pcie-root-port,port=21,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 -device pcie-root-port,port=22,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 -device pcie-root-port,port=23,chassis=8,id=pci.8,bus=pcie.0,addr=0x2.0x7 -device pcie-root-port,port=24,chassis=9,id=pci.9,bus=pcie.0,multifunction=on,addr=0x3 -device pcie-root-port,port=25,chassis=10,id=pci.10,bus=pcie.0,addr=0x3.0x1 -device pcie-root-port,port=26,chassis=11,id=pci.11,bus=pcie.0,addr=0x3.0x2 -device pcie-root-port,port=27,chassis=12,id=pci.12,bus=pcie.0,addr=0x3.0x3 -device pcie-root-port,port=28,chassis=13,id=pci.13,bus=pcie.0,addr=0x3.0x4 -device pcie-root-port,port=29,chassis=14,id=pci.14,bus=pcie.0,addr=0x3.0x5 -device pcie-root-port,port=30,chassis=15,id=pci.15,bus=pcie.0,addr=0x3.0x6 -device pcie-root-port,port=31,chassis=16,id=pci.16,bus=pcie.0,addr=0x3.0x7 -device pcie-root-port,port=32,chassis=17,id=pci.17,bus=pcie.0,addr=0x4 -device pcie-pci-bridge,id=pci.18,bus=pci.1,addr=0x0 -device piix3-usb-uhci,id=usb,bus=pci.18,addr=0x1 -blockdev {"driver":"host_device","filename":"/dev/volumes/3f7ed13b-c7ee-4e7c-8651-fbe74234eb70_disk","aio":"native","node-name":"libvirt-3-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-3-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-3-storage"} -device virtio-blk-pci,iothread=iothread1,bus=pci.4,addr=0x0,drive=libvirt-3-format,id=virtio-disk0,bootindex=1,write-cache=on -blockdev {"driver":"host_device","filename":"/dev/volumes/3f7ed13b-c7ee-4e7c-8651-fbe74234eb70_disk.eph0","aio":"native","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-2-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-2-storage"} -device 
virtio-blk-pci,iothread=iothread1,bus=pci.5,addr=0x0,drive=libvirt-2-format,id=virtio-disk1,write-cache=on -blockdev {"driver":"file","filename":"/var/lib/nova/instances/3f7ed13b-c7ee-4e7c-8651-fbe74234eb70/disk.config","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-1-format","read-only":true,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-1-storage"} -device ide-cd,bus=ide.0,drive=libvirt-1-format,id=sata0-0-0,write-cache=on -netdev tap,fd=39,id=hostnet0,vhost=on,vhostfd=41 -device virtio-net-pci,host_mtu=9000,netdev=hostnet0,id=net0,mac=fa:16:3e:0f:68:e2,bus=pci.2,addr=0x0 -netdev tap,fd=42,id=hostnet1,vhost=on,vhostfd=43 -device virtio-net-pci,host_mtu=9000,netdev=hostnet1,id=net1,mac=fa:16:3e:f3:86:d7,bus=pci.3,addr=0x0 -add-fd set=5,fd=45 -chardev pty,id=charserial0,logfile=/dev/fdset/5,logappend=on -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0,bus=usb.0,port=1 -audiodev id=audio1,driver=none -vnc 10.12.48.177:0,audiodev=audio1 -device cirrus-vga,id=video0,bus=pcie.0,addr=0x1 -device virtio-balloon-pci,id=balloon0,bus=pci.6,addr=0x0 -object {"qom-type":"rng-random","id":"objrng0","filename":"/dev/urandom"} -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.7,addr=0x0 -global virtio-blk-pci.config-wce=off -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on

And strace on this process does not show anything useful (on the contrary, it is quite silent):
[NA1][CHECK][root@cmp0078 3f7ed13b-c7ee-4e7c-8651-fbe74234eb70]# strace -fff -p 1806324
strace: Process 1806324 attached with 8 threads
[pid 1806385] futex(0x55c35cf14278, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 1806376] futex(0x55c35c9751c0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 1806375] futex(0x55c35c9751c0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 1806374] restart_syscall(<... resuming interrupted read ...> <unfinished ...>
[pid 1806356] ppoll([{fd=15, events=POLLIN|POLLERR|POLLHUP}], 1, NULL, NULL, 8 <unfinished ...>
[pid 1806354] ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, NULL, NULL, 8 <unfinished ...>
[pid 1806349] futex(0x55c35c9941c8, FUTEX_WAIT, 4294967295, NULL <unfinished ...>
[pid 1806324] ppoll([{fd=4, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=22, events=POLLIN}, {fd=23, events=POLLIN}], 6, {tv_sec=0, tv_nsec=220919453}, NULL, 8 <unfinished ...>
[pid 1806376] <... futex resumed>)      = 0
[pid 1806375] <... futex resumed>)      = -1 EAGAIN (Resource temporarily unavailable)
[pid 1806376] ioctl(38, KVM_RUN <unfinished ...>
[pid 1806375] futex(0x55c35c9751c0, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 1806375] ioctl(36, KVM_RUN <unfinished ...>
[pid 1806324] <... ppoll resumed>)      = 0 (Timeout)
[pid 1806324] ppoll([{fd=4, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=22, events=POLLIN}, {fd=23, events=POLLIN}], 6, {tv_sec=0, tv_nsec=0}, NULL, 8) = 0 (Timeout)
[pid 1806324] ppoll([{fd=4, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=22, events=POLLIN}, {fd=23, events=POLLIN}], 6, {tv_sec=0, tv_nsec=457839868}, NULL, 8) = 0 (Timeout)
[pid 1806324] ppoll([{fd=4, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=22, events=POLLIN}, {fd=23, events=POLLIN}], 6, {tv_sec=0, tv_nsec=0}, NULL, 8) = 0 (Timeout)
[pid 1806324] ppoll([{fd=4, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=22, events=POLLIN}, {fd=23, events=POLLIN}], 6, {tv_sec=2, tv_nsec=537702149}, NULL, 8) = 0 (Timeout)
[pid 1806324] ppoll([{fd=4, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=22, events=POLLIN}, {fd=23, events=POLLIN}], 6, {tv_sec=0, tv_nsec=0}, NULL, 8) = 0 (Timeout)
[pid 1806324] ppoll([{fd=4, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=22, events=POLLIN}, {fd=23, events=POLLIN}], 6, {tv_sec=2, tv_nsec=998020126}, NULL, 8) = 0 (Timeout)
[pid 1806324] ppoll([{fd=4, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=22, events=POLLIN}, {fd=23, events=POLLIN}], 6, {tv_sec=0, tv_nsec=0}, NULL, 8) = 0 (Timeout)
[pid 1806324] ppoll([{fd=4, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=22, events=POLLIN}, {fd=23, events=POLLIN}], 6, {tv_sec=2, tv_nsec=998577848}, NULL, 8) = 0 (Timeout)
[pid 1806324] ppoll([{fd=4, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=22, events=POLLIN}, {fd=23, events=POLLIN}], 6, {tv_sec=0, tv_nsec=0}, NULL, 8) = 0 (Timeout)
[pid 1806324] ppoll([{fd=4, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=22, events=POLLIN}, {fd=23, events=POLLIN}], 6, {tv_sec=1, tv_nsec=448096579}, NULL, 8^Cstrace: Process 1806324 detached
 <detached ...>
strace: Process 1806349 detached
strace: Process 1806354 detached
strace: Process 1806356 detached
strace: Process 1806374 detached
strace: Process 1806375 detached
strace: Process 1806376 detached
strace: Process 1806385 detached

Comment 1 Igor Raits 2021-11-17 09:28:08 UTC
I compared the qemu-kvm command lines from the new and old versions and they are pretty much the same, nothing suspicious there…

But strace shows hundreds (thousands?) of syscalls per second with the old qemu-kvm, while the new one just keeps looping on those ppoll timeouts.
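
One rough way to quantify that difference (a sketch; 1806324 is the PID from the strace above, and the same command can be run against a guest on the old qemu-kvm for comparison):

# Attach with syscall counting, let it run a few seconds, then Ctrl-C
# to get the per-syscall summary.
strace -c -f -p 1806324
# With the old qemu-kvm the summary shows a large volume of ioctl/ppoll
# activity; with the new one it is almost exclusively ppoll timeouts.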

Comment 2 Igor Raits 2021-11-17 09:29:46 UTC
And just in case it matters, I've tried both the default CentOS kernel (4.18.0-348.el8) and our custom one (5.15.1), with the same result on both.

Comment 5 Amnon Ilan 2021-11-25 18:25:56 UTC
This seems to be a known issue that is already resolved in qemu upstream and should reach CentOS Stream 8 soon. 
In the meantime, please try working with the "-global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off" option and see if it works for you.
You can also track the progress in bug#2001525.
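
For anyone applying that option through libvirt rather than on the qemu command line directly, the usual argument passthrough looks roughly like this (a sketch only; for an OpenStack-managed instance Nova regenerates the domain XML, so the change may not persist):

virsh edit instance-0003e769
# then, inside the <domain> element (which needs the qemu XML namespace), add:
#   <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
#     ...
#     <qemu:commandline>
#       <qemu:arg value='-global'/>
#       <qemu:arg value='ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off'/>
#     </qemu:commandline>
#   </domain>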

Comment 6 Amnon Ilan 2021-11-25 18:27:21 UTC
If you try with 16 or fewer pcie-root-ports, it should work as well.
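
The command line in the description defines 17 pcie-root-port devices (ports 16-32), so this guest is just over that limit. A quick way to count them on a running guest (a sketch, using the PID from the strace above):

# Split the qemu argv on NULs and count the pcie-root-port device args;
# prints 17 for the command line quoted in the description.
tr '\0' '\n' < /proc/1806324/cmdline | grep -c '^pcie-root-port,'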

Comment 7 Amnon Ilan 2021-12-21 18:31:02 UTC
Planning to close this bug as a duplicate of bug#2001525

Comment 8 Amnon Ilan 2022-02-03 16:54:41 UTC

*** This bug has been marked as a duplicate of bug 2001525 ***