Bug 1822523 - libvirtd hangs at qemuTPMEmulatorStop() when vm with vTPM device fails to start
Summary: libvirtd hangs at qemuTPMEmulatorStop() when vm with vTPM device fails to start
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libvirt
Version: 8.2
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: rc
: 8.3
Assignee: Ján Tomko
QA Contact: Yanqiu Zhang
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-04-09 09:09 UTC by Fangge Jin
Modified: 2020-07-28 07:13 UTC (History)
8 users (show)

Fixed In Version: libvirt-6.0.0-20.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-28 07:12:15 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
libvirtd log (34.28 KB, application/x-bzip)
2020-05-07 03:50 UTC, Fangge Jin
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:3172 0 None None None 2020-07-28 07:13:08 UTC

Description Fangge Jin 2020-04-09 09:09:50 UTC
Description of problem:
libvirtd hangs at qemuTPMEmulatorStop() when vm with vTPM device fails to start

Version-Release number of selected component:
libvirt-6.0.0-17.module+el8.2.0+6257+0d066c28.x86_64
swtpm-0.2.0-2.20200127gitff5a83b.module+el8.2.0+5579+d71178e0.x86_64
qemu-kvm-4.2.0-17.module+el8.2.0+6141+0f540f16.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Install swtpm swtpm-tools

2. Prepare a vm that can't start successfully, e.g.
# virsh dumpxml rhel8.1.0
...
  <cpu mode='host-model' check='full'>
    <topology sockets='2' dies='1' cores='4' threads='2'/>
    <feature policy='require' name='vme'/>
    <feature policy='require' name='x2apic'/>
    <feature policy='require' name='hypervisor'/>
  </cpu>
...

# virsh start rhel8.1.0
error: Failed to start domain rhel8.1.0
error: operation failed: guest CPU doesn't match specification: missing features: mpx

3. Add vTPM into vm configuration:
    <tpm model='tpm-crb'>
      <backend type='emulator' version='2.0'/>
    </tpm>

4. Start vm again:
# virsh start rhel8.1.0
(The command just hangs)

Actual results:
libvirtd hangs when vm with vTPM device fails to start

Expected results:
libvirtd doesn't hang

Additional info:
# gstack `pidof libvirtd`
...
Thread 3 (Thread 0x7efd9cb26700 (LWP 6371)):
#0  0x00007efda384af21 in poll () from /lib64/libc.so.6
#1  0x00007efda770ddf3 in poll (__timeout=-1, __nfds=2, __fds=<optimized out>) at /usr/include/bits/poll2.h:46
#2  virCommandProcessIO (cmd=<optimized out>) at ../../src/util/vircommand.c:2181
#3  0x00007efda7711efd in virCommandRun (cmd=cmd@entry=0x7efd9003d430, exitstatus=exitstatus@entry=0x0) at ../../src/util/vircommand.c:2415
#4  0x00007efd5da8da67 in qemuTPMEmulatorStop (swtpmStateDir=<optimized out>, shortName=0x7efd90066190 "12-rhel8.1.0") at ../../src/qemu/qemu_tpm.c:696
#5  0x00007efd5da8ecba in qemuExtTPMStop (driver=driver@entry=0x7efd3c0f74e0, vm=vm@entry=0x7efd3c17c2e0) at ../../src/qemu/qemu_tpm.c:907
#6  0x00007efd5d9f9c4d in qemuExtDevicesStop (driver=driver@entry=0x7efd3c0f74e0, vm=vm@entry=0x7efd3c17c2e0) at ../../src/qemu/qemu_extdevice.c:228
#7  0x00007efd5da18bb0 in qemuProcessLaunch (conn=<optimized out>, driver=0x7efd3c0f74e0, vm=0x7efd3c17c2e0, asyncJob=<optimized out>, incoming=<optimized out>, snapshot=<optimized out>, vmop=<optimized out>, flags=<optimized out>) at ../../src/qemu/qemu_process.c:6966
#8  0x00007efd5da1db7e in qemuProcessStart (conn=conn@entry=0x7efd680074e0, driver=driver@entry=0x7efd3c0f74e0, vm=vm@entry=0x7efd3c17c2e0, updatedCPU=updatedCPU@entry=0x0, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, migrateFrom=migrateFrom@entry=0x0, migrateFd=-1, migratePath=0x0, snapshot=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17) at ../../src/qemu/qemu_process.c:7106
#9  0x00007efd5da82934 in qemuDomainObjStart (conn=0x7efd680074e0, driver=0x7efd3c0f74e0, vm=0x7efd3c17c2e0, flags=<optimized out>, asyncJob=QEMU_ASYNC_JOB_START) at ../../src/qemu/qemu_driver.c:7499
#10 0x00007efd5da82f99 in qemuDomainCreateWithFlags (dom=0x7efd900327b0, flags=0) at ../../src/qemu/qemu_driver.c:7550
#11 0x00007efda79240a7 in virDomainCreate (domain=domain@entry=0x7efd900327b0) at ../../src/libvirt-domain.c:6536
#12 0x000055911eb09156 in remoteDispatchDomainCreate (server=0x55911faa71e0, msg=0x55911fae2100, args=<optimized out>, rerr=0x7efd9cb258e0, client=0x55911fb2d960) at ./remote/remote_daemon_dispatch_stubs.h:4894
#13 remoteDispatchDomainCreateHelper (server=0x55911faa71e0, client=0x55911fb2d960, msg=0x55911fae2100, rerr=0x7efd9cb258e0, args=<optimized out>, ret=0x0) at ./remote/remote_daemon_dispatch_stubs.h:4873
#14 0x00007efda785c2b9 in virNetServerProgramDispatchCall (msg=0x55911fae2100, client=0x55911fb2d960, server=0x55911faa71e0, prog=0x55911fae5110) at ../../src/rpc/virnetserverprogram.c:430
#15 virNetServerProgramDispatch (prog=0x55911fae5110, server=server@entry=0x55911faa71e0, client=0x55911fb2d960, msg=0x55911fae2100) at ../../src/rpc/virnetserverprogram.c:302
#16 0x00007efda786147c in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x55911faa71e0) at ../../src/rpc/virnetserver.c:136
#17 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x55911faa71e0) at ../../src/rpc/virnetserver.c:153
#18 0x00007efda777d780 in virThreadPoolWorker (opaque=opaque@entry=0x55911fa8cbf0) at ../../src/util/virthreadpool.c:163
#19 0x00007efda777cb0c in virThreadHelper (data=<optimized out>) at ../../src/util/virthread.c:196
#20 0x00007efda3b242de in start_thread () from /lib64/libpthread.so.0
#21 0x00007efda3855e83 in clone () from /lib64/libc.so.6
...

Comment 1 Fangge Jin 2020-04-09 10:48:11 UTC
After step4:
# virsh list
(it can't return)

Comment 2 Jaroslav Suchanek 2020-05-06 08:51:53 UTC
Can you please attach libvirt debugs logs? Thanks.

Comment 3 Fangge Jin 2020-05-07 03:50:42 UTC
Created attachment 1686043 [details]
libvirtd log

Comment 4 Ján Tomko 2020-05-12 11:27:35 UTC
Upstream patch:
https://www.redhat.com/archives/libvir-list/2020-May/msg00467.html

Comment 5 Ján Tomko 2020-05-13 11:38:56 UTC
Pushed upstream as:
commit 0048d9a5b10f9efa8b3a221199cc0080959b85c0
Author:     Ján Tomko <jtomko>
CommitDate: 2020-05-13 12:46:27 +0200

    qemu: only stop external devices after the domain
    
    A failure in qemuProcessLaunch would lead to qemuExtDevicesStop
    being called twice - once in the cleanup section and then again
    in qemuProcessStop.
    
    However, the first one is called while the QEMU process is
    still running, which is too soon for the swtpm process, because
    the swtmp_ioctl command can lock up::
    
       https://bugzilla.redhat.com/show_bug.cgi?id=1822523
    
    Remove the first call and only leave the one in qemuProcessStop,
    which is called after the QEMU process is killed.
    
    Signed-off-by: Ján Tomko <jtomko>
    Reviewed-by: Daniel Henrique Barboza <danielhb413>
    Reviewed-by: Erik Skultety <eskultet>

git describe: v6.3.0-80-g0048d9a5b1

Comment 6 Ján Tomko 2020-05-13 13:40:10 UTC
This is the correct commit ID:
commit 006782a8bc5a27125211946fcb12a40f7d4ed12a
Author:     Ján Tomko <jtomko>
CommitDate: 2020-05-13 15:29:37 +0200

...

git describe: v6.3.0-85-g006782a8bc

Comment 10 Yanqiu Zhang 2020-06-04 04:41:20 UTC
Can be reproduced on libvirt-daemon-6.0.0-19.module+el8.2.1+6538+c148631f.x86_64.

Verify this bug on:
libvirt-daemon-6.0.0-22.el8.x86_64
qemu-kvm-4.2.0-22.module+el8.2.1+6758+cb8d64c2.x86_64

Steps:
1. date & virsh start avocado-vt-vm1-uefi ; date
[1] 53561
Wed Jun  3 23:59:00 EDT 2020
error: Failed to start domain avocado-vt-vm1-uefi
error: operation failed: guest CPU doesn't match specification: missing features: mpx

[1]+  Done                    date
Wed Jun  3 23:59:03 EDT 2020

# virsh dumpxml avocado-vt-vm1-uefi |grep 'tpm m' -A2
    <tpm model='tpm-crb'>
      <backend type='emulator' version='2.0'/>
    </tpm>
# ps aux|grep swtpm
root       54124  0.0  0.0  12108  1052 pts/1    S+   23:59   0:00 grep --color=auto swtpm
# ps aux|grep qemu
root       54128  0.0  0.0  12108  1080 pts/1    S+   23:59   0:00 grep --color=auto qemu
# ps aux|grep virsh
root       54145  0.0  0.0  12108  1044 pts/1    S+   00:00   0:00 grep --color=auto virsh

Remove unstartable xml config:
# virsh edit avocado-vt-vm1-uefi 
Domain avocado-vt-vm1-uefi XML configuration edited.

# virsh start avocado-vt-vm1-uefi
Domain avocado-vt-vm1-uefi started

# ps aux|grep swtpm -i
tss        54216  0.5  0.0  23696  3320 ?        Ss   00:01   0:00 /usr/bin/swtpm socket --daemon --ctrl type=unixio,path=/run/libvirt/qemu/swtpm/3-avocado-vt-vm1-uefi-swtpm.sock,mode=0600 --tpmstate dir=/var/lib/libvirt/swtpm/904ffe09-3b9f-4193-bbed-12061176512e/tpm2,mode=0600 --log file=/var/log/swtpm/libvirt/qemu/avocado-vt-vm1-uefi-swtpm.log --tpm2 --pid file=/run/libvirt/qemu/swtpm/3-avocado-vt-vm1-uefi-swtpm.pid
qemu … -tpmdev emulator,id=tpm-tpm0,chardev=chrtpm -chardev socket,id=chrtpm,path=/run/libvirt/qemu/swtpm/3-avocado-vt-vm1-uefi-swtpm.sock -device tpm-crb,tpmdev=tpm-tpm0,id=tpm0

# virsh console avocado-vt-vm1-uefi 
Connected to domain avocado-vt-vm1-uefi
Escape character is ^]

Red Hat Enterprise Linux 8.2 (Ootpa)
Kernel 4.18.0-193.el8.x86_64 on an x86_64

localhost login: root
Password:
Last login: Wed Apr 29 21:31:19 from 192.168.122.1
[root@localhost ~]# ls /dev/tpm0
/dev/tpm0
[root@localhost ~]# tpm2_getrandom 7
0x32 0xE3 0x4F 0x9D 0x68 0xAF 0xC6

Test result is as expected, so mark as verified.

Comment 12 errata-xmlrpc 2020-07-28 07:12:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3172


Note You need to log in before you can comment on or make changes to this bug.