Description of problem:
libvirtd hangs at qemuTPMEmulatorStop() when a VM with a vTPM device fails to start

Version-Release number of selected component:
libvirt-6.0.0-17.module+el8.2.0+6257+0d066c28.x86_64
swtpm-0.2.0-2.20200127gitff5a83b.module+el8.2.0+5579+d71178e0.x86_64
qemu-kvm-4.2.0-17.module+el8.2.0+6141+0f540f16.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Install swtpm and swtpm-tools.

2. Prepare a VM that cannot start successfully, e.g.:
# virsh dumpxml rhel8.1.0
...
  <cpu mode='host-model' check='full'>
    <topology sockets='2' dies='1' cores='4' threads='2'/>
    <feature policy='require' name='vme'/>
    <feature policy='require' name='x2apic'/>
    <feature policy='require' name='hypervisor'/>
  </cpu>
...
# virsh start rhel8.1.0
error: Failed to start domain rhel8.1.0
error: operation failed: guest CPU doesn't match specification: missing features: mpx

3. Add a vTPM device to the VM configuration:
  <tpm model='tpm-crb'>
    <backend type='emulator' version='2.0'/>
  </tpm>

4. Start the VM again:
# virsh start rhel8.1.0
(The command just hangs)

Actual results:
libvirtd hangs when a VM with a vTPM device fails to start

Expected results:
libvirtd doesn't hang

Additional info:
# gstack `pidof libvirtd`
...
Thread 3 (Thread 0x7efd9cb26700 (LWP 6371)):
#0  0x00007efda384af21 in poll () from /lib64/libc.so.6
#1  0x00007efda770ddf3 in poll (__timeout=-1, __nfds=2, __fds=<optimized out>) at /usr/include/bits/poll2.h:46
#2  virCommandProcessIO (cmd=<optimized out>) at ../../src/util/vircommand.c:2181
#3  0x00007efda7711efd in virCommandRun (cmd=cmd@entry=0x7efd9003d430, exitstatus=exitstatus@entry=0x0) at ../../src/util/vircommand.c:2415
#4  0x00007efd5da8da67 in qemuTPMEmulatorStop (swtpmStateDir=<optimized out>, shortName=0x7efd90066190 "12-rhel8.1.0") at ../../src/qemu/qemu_tpm.c:696
#5  0x00007efd5da8ecba in qemuExtTPMStop (driver=driver@entry=0x7efd3c0f74e0, vm=vm@entry=0x7efd3c17c2e0) at ../../src/qemu/qemu_tpm.c:907
#6  0x00007efd5d9f9c4d in qemuExtDevicesStop (driver=driver@entry=0x7efd3c0f74e0, vm=vm@entry=0x7efd3c17c2e0) at ../../src/qemu/qemu_extdevice.c:228
#7  0x00007efd5da18bb0 in qemuProcessLaunch (conn=<optimized out>, driver=0x7efd3c0f74e0, vm=0x7efd3c17c2e0, asyncJob=<optimized out>, incoming=<optimized out>, snapshot=<optimized out>, vmop=<optimized out>, flags=<optimized out>) at ../../src/qemu/qemu_process.c:6966
#8  0x00007efd5da1db7e in qemuProcessStart (conn=conn@entry=0x7efd680074e0, driver=driver@entry=0x7efd3c0f74e0, vm=vm@entry=0x7efd3c17c2e0, updatedCPU=updatedCPU@entry=0x0, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, migrateFrom=migrateFrom@entry=0x0, migrateFd=-1, migratePath=0x0, snapshot=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17) at ../../src/qemu/qemu_process.c:7106
#9  0x00007efd5da82934 in qemuDomainObjStart (conn=0x7efd680074e0, driver=0x7efd3c0f74e0, vm=0x7efd3c17c2e0, flags=<optimized out>, asyncJob=QEMU_ASYNC_JOB_START) at ../../src/qemu/qemu_driver.c:7499
#10 0x00007efd5da82f99 in qemuDomainCreateWithFlags (dom=0x7efd900327b0, flags=0) at ../../src/qemu/qemu_driver.c:7550
#11 0x00007efda79240a7 in virDomainCreate (domain=domain@entry=0x7efd900327b0) at ../../src/libvirt-domain.c:6536
#12 0x000055911eb09156 in remoteDispatchDomainCreate (server=0x55911faa71e0, msg=0x55911fae2100, args=<optimized out>, rerr=0x7efd9cb258e0, client=0x55911fb2d960) at ./remote/remote_daemon_dispatch_stubs.h:4894
#13 remoteDispatchDomainCreateHelper (server=0x55911faa71e0, client=0x55911fb2d960, msg=0x55911fae2100, rerr=0x7efd9cb258e0, args=<optimized out>, ret=0x0) at ./remote/remote_daemon_dispatch_stubs.h:4873
#14 0x00007efda785c2b9 in virNetServerProgramDispatchCall (msg=0x55911fae2100, client=0x55911fb2d960, server=0x55911faa71e0, prog=0x55911fae5110) at ../../src/rpc/virnetserverprogram.c:430
#15 virNetServerProgramDispatch (prog=0x55911fae5110, server=server@entry=0x55911faa71e0, client=0x55911fb2d960, msg=0x55911fae2100) at ../../src/rpc/virnetserverprogram.c:302
#16 0x00007efda786147c in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x55911faa71e0) at ../../src/rpc/virnetserver.c:136
#17 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x55911faa71e0) at ../../src/rpc/virnetserver.c:153
#18 0x00007efda777d780 in virThreadPoolWorker (opaque=opaque@entry=0x55911fa8cbf0) at ../../src/util/virthreadpool.c:163
#19 0x00007efda777cb0c in virThreadHelper (data=<optimized out>) at ../../src/util/virthread.c:196
#20 0x00007efda3b242de in start_thread () from /lib64/libpthread.so.0
#21 0x00007efda3855e83 in clone () from /lib64/libc.so.6
...
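For context, frames #2-#4 show the RPC worker thread blocked inside virCommandRun() while qemuTPMEmulatorStop() waits for the swtpm shutdown helper to exit. Below is a minimal C sketch of that stop path, reconstructed from the backtrace and the upstream commit message quoted in a later comment; the swtpm_ioctl arguments and the socket-path pattern are assumptions for illustration, not copied from the libvirt source.

#include "vircommand.h"   /* libvirt-internal: virCommandNew/AddArgList/Run/Free */
#include <glib.h>         /* g_strdup_printf, g_free */

/* Simplified sketch of the swtpm shutdown path in qemuTPMEmulatorStop()
 * (src/qemu/qemu_tpm.c, libvirt 6.0) -- not the verbatim upstream code. */
static void
qemuTPMEmulatorStop(const char *swtpmStateDir, const char *shortName)
{
    virCommandPtr cmd;
    /* Per-domain control socket, e.g.
     * /run/libvirt/qemu/swtpm/12-rhel8.1.0-swtpm.sock
     * (pattern assumed from the swtpm command line seen on a working start). */
    char *pathname = g_strdup_printf("%s/%s-swtpm.sock",
                                     swtpmStateDir, shortName);

    /* Ask the swtpm daemon to shut down over its control channel.
     * If the half-started QEMU process is still alive and attached to the
     * emulator, swtpm_ioctl never exits, so virCommandRun() sits in poll()
     * forever -- frames #0 through #3 of the backtrace above. */
    cmd = virCommandNew("swtpm_ioctl");
    virCommandAddArgList(cmd, "--unix", pathname, "-s", NULL);
    virCommandRun(cmd, NULL);          /* <-- the reported hang */

    virCommandFree(cmd);
    g_free(pathname);
}

Because virCommandRun() waits for the child with no timeout, a stuck swtpm_ioctl blocks this worker thread indefinitely, which is why the whole daemon appears hung.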
After step 4, other virsh commands hang as well:
# virsh list
(the command never returns)
Can you please attach libvirt debug logs? Thanks.
Created attachment 1686043: libvirtd log
Upstream patch: https://www.redhat.com/archives/libvir-list/2020-May/msg00467.html
Pushed upstream as:

commit 0048d9a5b10f9efa8b3a221199cc0080959b85c0
Author:     Ján Tomko <jtomko>
CommitDate: 2020-05-13 12:46:27 +0200

    qemu: only stop external devices after the domain

    A failure in qemuProcessLaunch would lead to qemuExtDevicesStop
    being called twice - once in the cleanup section and then again in
    qemuProcessStop.

    However, the first one is called while the QEMU process is still
    running, which is too soon for the swtpm process, because the
    swtpm_ioctl command can lock up:
    https://bugzilla.redhat.com/show_bug.cgi?id=1822523

    Remove the first call and only leave the one in qemuProcessStop,
    which is called after the QEMU process is killed.

    Signed-off-by: Ján Tomko <jtomko>
    Reviewed-by: Daniel Henrique Barboza <danielhb413>
    Reviewed-by: Erik Skultety <eskultet>

git describe: v6.3.0-80-g0048d9a5b1
This is the correct commit ID:

commit 006782a8bc5a27125211946fcb12a40f7d4ed12a
Author:     Ján Tomko <jtomko>
CommitDate: 2020-05-13 15:29:37 +0200
...
git describe: v6.3.0-85-g006782a8bc
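Put differently, the fix only changes ordering: external helper processes such as swtpm are stopped after the QEMU process is gone. A simplified sketch of the post-fix ordering, based on the commit message above; the function bodies are illustrative, not the actual upstream source.

/* Simplified ordering sketch; the real qemuProcessStop() does much more. */
static void
qemuProcessStop(virQEMUDriverPtr driver, virDomainObjPtr vm)
{
    /* 1. Kill the QEMU process first ... */
    qemuProcessKill(vm, VIR_QEMU_PROCESS_KILL_FORCE);

    /* 2. ... then stop external helpers such as swtpm.  QEMU no longer
     *    holds the emulator, so the swtpm_ioctl shutdown issued by
     *    qemuTPMEmulatorStop() (via qemuExtTPMStop) returns promptly
     *    instead of hanging. */
    qemuExtDevicesStop(driver, vm);
}

/* The cleanup section of qemuProcessLaunch() no longer calls
 * qemuExtDevicesStop() itself; on failure the caller ends up in
 * qemuProcessStop(), which gives the ordering above. */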
Downstream backport: http://post-office.corp.redhat.com/archives/rhvirt-patches/2020-May/msg00447.html
Can be reproduced on libvirt-daemon-6.0.0-19.module+el8.2.1+6538+c148631f.x86_64.

Verify this bug on:
libvirt-daemon-6.0.0-22.el8.x86_64
qemu-kvm-4.2.0-22.module+el8.2.1+6758+cb8d64c2.x86_64

Steps:
1. Start the unstartable vTPM guest; virsh start now returns in a few seconds with the expected CPU error instead of hanging:
# date & virsh start avocado-vt-vm1-uefi ; date
[1] 53561
Wed Jun  3 23:59:00 EDT 2020
error: Failed to start domain avocado-vt-vm1-uefi
error: operation failed: guest CPU doesn't match specification: missing features: mpx
[1]+  Done                    date
Wed Jun  3 23:59:03 EDT 2020

# virsh dumpxml avocado-vt-vm1-uefi | grep 'tpm m' -A2
    <tpm model='tpm-crb'>
      <backend type='emulator' version='2.0'/>
    </tpm>

No leftover swtpm or qemu processes, and no hung virsh client:
# ps aux | grep swtpm
root   54124  0.0  0.0  12108  1052 pts/1  S+  23:59  0:00 grep --color=auto swtpm
# ps aux | grep qemu
root   54128  0.0  0.0  12108  1080 pts/1  S+  23:59  0:00 grep --color=auto qemu
# ps aux | grep virsh
root   54145  0.0  0.0  12108  1044 pts/1  S+  00:00  0:00 grep --color=auto virsh

2. Remove the unstartable XML config and confirm the vTPM works after a successful start:
# virsh edit avocado-vt-vm1-uefi
Domain avocado-vt-vm1-uefi XML configuration edited.

# virsh start avocado-vt-vm1-uefi
Domain avocado-vt-vm1-uefi started

# ps aux | grep swtpm -i
tss    54216  0.5  0.0  23696  3320 ?  Ss  00:01  0:00 /usr/bin/swtpm socket --daemon --ctrl type=unixio,path=/run/libvirt/qemu/swtpm/3-avocado-vt-vm1-uefi-swtpm.sock,mode=0600 --tpmstate dir=/var/lib/libvirt/swtpm/904ffe09-3b9f-4193-bbed-12061176512e/tpm2,mode=0600 --log file=/var/log/swtpm/libvirt/qemu/avocado-vt-vm1-uefi-swtpm.log --tpm2 --pid file=/run/libvirt/qemu/swtpm/3-avocado-vt-vm1-uefi-swtpm.pid

qemu … -tpmdev emulator,id=tpm-tpm0,chardev=chrtpm -chardev socket,id=chrtpm,path=/run/libvirt/qemu/swtpm/3-avocado-vt-vm1-uefi-swtpm.sock -device tpm-crb,tpmdev=tpm-tpm0,id=tpm0

# virsh console avocado-vt-vm1-uefi
Connected to domain avocado-vt-vm1-uefi
Escape character is ^]

Red Hat Enterprise Linux 8.2 (Ootpa)
Kernel 4.18.0-193.el8.x86_64 on an x86_64

localhost login: root
Password:
Last login: Wed Apr 29 21:31:19 from 192.168.122.1
[root@localhost ~]# ls /dev/tpm0
/dev/tpm0
[root@localhost ~]# tpm2_getrandom 7
0x32 0xE3 0x4F 0x9D 0x68 0xAF 0xC6

Test result is as expected, so mark this bug as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:3172