Bug 1402921
Summary: qemu monitor event handlers requiring VM lock should not be run from daemon event loop

Product: Red Hat Enterprise Linux Advanced Virtualization
Component: libvirt
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: unspecified
Reporter: Peter Krempa <pkrempa>
Assignee: Daniel Berrangé <berrange>
QA Contact: Han Han <hhan>
CC: dm, dyuan, jdenemar, lmen, mprivozn, pkrempa, xuzhang
Keywords: TestOnly, Triaged
Target Milestone: rc
Target Release: 8.1
Type: Bug
Last Closed: 2021-11-19 16:19:42 UTC
Description — Peter Krempa, 2016-12-08 16:03:13 UTC
Stopping of the qemu process is handled via the event thread now, so the event loop should no longer be blocked:

commit 8c9ff9960b29d4703a99efdd1cadcf6f48799cc0
Author: Jiri Denemark <jdenemar>
Date:   Thu Feb 11 15:32:48 2016 +0100

    qemu: Process monitor EOF in a job

Stopping a domain without a job risks a race condition with another thread which started a job and which does not expect anyone else to be messing around with the same domain object.

This should already be fixed since RHEL-7.3.

Hi Peter, I do not understand the bug description clearly. Could you please provide more detailed steps to verify this bug? Thank you very much.

According to deeper investigation from QE, it is not enough just to punt the handlers out of the event loop; we also need to guarantee that the callbacks do not get stuck on a lock:

Thread 21 (Thread 0x7f6ac5ba1700 (LWP 6658)):
#0  0x00007f6ad53f48f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f6ab68ea893 in glfs_init_wait () from /lib64/libgfapi.so.0
#2  0x00007f6ab68ead58 in glfs_init () from /lib64/libgfapi.so.0
#3  0x00007f6ab6b0ffad in virStorageFileBackendGlusterInit (src=0x7f6a80002200) at storage/storage_backend_gluster.c:660
#4  0x00007f6ac2247140 in virStorageFileInitAs (src=src@entry=0x7f6a80002200, uid=uid@entry=4294967295, gid=gid@entry=4294967295) at storage/storage_source.c:153
#5  0x00007f6ac22471f8 in virStorageFileInit (src=src@entry=0x7f6a80002200) at storage/storage_source.c:173
#6  0x00007f6ab53c3b08 in qemuSecurityChownCallback (src=<optimized out>, uid=0, gid=0) at qemu/qemu_driver.c:353
#7  0x00007f6ad82e8046 in virSecurityDACSetOwnershipInternal (priv=priv@entry=0x7f6aac1ad700, src=src@entry=0x7f6aac1b6570, path=path@entry=0x0, uid=uid@entry=0, gid=gid@entry=0) at security/security_dac.c:569
#8  0x00007f6ad82e849b in virSecurityDACRestoreFileLabelInternal (priv=priv@entry=0x7f6aac1ad700, src=src@entry=0x7f6aac1b6570, path=path@entry=0x0) at security/security_dac.c:678
#9  0x00007f6ad82ea315 in virSecurityDACRestoreImageLabelInt (mgr=mgr@entry=0x7f6aac1ad730, def=def@entry=0x7f6aac1b5e50, src=0x7f6aac1b6570, migrated=migrated@entry=false) at security/security_dac.c:793
#10 0x00007f6ad82ea5b9 in virSecurityDACRestoreAllLabel (mgr=0x7f6aac1ad730, def=0x7f6aac1b5e50, migrated=<optimized out>, chardevStdioLogd=<optimized out>) at security/security_dac.c:1553
#11 0x00007f6ad82ed2b1 in virSecurityManagerRestoreAllLabel (mgr=0x7f6aac1ad730, vm=vm@entry=0x7f6aac1b5e50, migrated=migrated@entry=false, chardevStdioLogd=chardevStdioLogd@entry=true) at security/security_manager.c:889
#12 0x00007f6ad82e6c22 in virSecurityStackRestoreAllLabel (mgr=<optimized out>, vm=0x7f6aac1b5e50, migrated=<optimized out>, chardevStdioLogd=<optimized out>) at security/security_stack.c:381
#13 0x00007f6ad82ed2b1 in virSecurityManagerRestoreAllLabel (mgr=0x7f6aac0fb7f0, vm=0x7f6aac1b5e50, migrated=migrated@entry=false, chardevStdioLogd=<optimized out>) at security/security_manager.c:889
#14 0x00007f6ab540642d in qemuSecurityRestoreAllLabel (driver=driver@entry=0x7f6aac1042e0, vm=vm@entry=0x7f6aac1b6af0, migrated=migrated@entry=false) at qemu/qemu_security.c:78
#15 0x00007f6ab5371b11 in qemuProcessStop (driver=driver@entry=0x7f6aac1042e0, vm=0x7f6aac1b6af0, reason=reason@entry=VIR_DOMAIN_SHUTOFF_DESTROYED, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_NONE, flags=0) at qemu/qemu_process.c:6453
#16 0x00007f6ab53e4aa2 in qemuDomainDestroyFlags (dom=<optimized out>, flags=flags@entry=0) at qemu/qemu_driver.c:2300
#17 0x00007f6ab53e4da3 in qemuDomainDestroy (dom=<optimized out>) at qemu/qemu_driver.c:2322
#18 0x00007f6ad819d2d0 in virDomainDestroy (domain=domain@entry=0x7f6a800025e0) at libvirt-domain.c:475
#19 0x000055a25ec4cb7f in remoteDispatchDomainDestroy (server=0x55a2604b9f90, msg=0x55a2605088d0, args=<optimized out>, rerr=0x7f6ac5ba0bc0, client=0x55a260508a60) at remote_dispatch.h:4615
#20 remoteDispatchDomainDestroyHelper (server=0x55a2604b9f90, client=0x55a260508a60, msg=0x55a2605088d0, rerr=0x7f6ac5ba0bc0, args=<optimized out>, ret=0x7f6a80002510) at remote_dispatch.h:4591
#21 0x00007f6ad824bd5c in virNetServerProgramDispatchCall (msg=0x55a2605088d0, client=0x55a260508a60, server=0x55a2604b9f90, prog=0x55a260505b70) at rpc/virnetserverprogram.c:437
#22 virNetServerProgramDispatch (prog=0x55a260505b70, server=server@entry=0x55a2604b9f90, client=client@entry=0x55a260508a60, msg=msg@entry=0x55a2605088d0) at rpc/virnetserverprogram.c:307
#23 0x000055a25ec670da in virNetServerProcessMsg (srv=srv@entry=0x55a2604b9f90, client=0x55a260508a60, prog=<optimized out>, msg=0x55a2605088d0) at rpc/virnetserver.c:148
#24 0x000055a25ec674e8 in virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x55a2604b9f90) at rpc/virnetserver.c:169
#25 0x00007f6ad80b4aa1 in virThreadPoolWorker (opaque=opaque@entry=0x55a2604ae4c0) at util/virthreadpool.c:167
#26 0x00007f6ad80b3870 in virThreadHelper (data=<optimized out>) at util/virthread.c:206
#27 0x00007f6ad53f0dd5 in start_thread () from /lib64/libpthread.so.0
#28 0x00007f6ad511a94d in clone () from /lib64/libc.so.6
...
Thread 1 (Thread 0x7f6ad8fb48c0 (LWP 6650)):
#0  0x00007f6ad53f747d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f6ad53f2d7b in _L_lock_812 () from /lib64/libpthread.so.0
#2  0x00007f6ad53f2c48 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f6ad80b3ac1 in virMutexLock (m=<optimized out>) at util/virthread.c:89
#4  0x00007f6ad8085fdf in virObjectLock (anyobj=anyobj@entry=0x7f6aac1b6af0) at util/virobject.c:435
#5  0x00007f6ab536652c in qemuProcessHandleEvent (mon=<optimized out>, vm=0x7f6aac1b6af0, eventName=0x55a2604bbf90 "SHUTDOWN", seconds=1516949172, micros=523889, details=0x55a2604bbdd0 "{\"guest\":false}", opaque=0x7f6aac1042e0) at qemu/qemu_process.c:640
#6  0x00007f6ab538f9ce in qemuMonitorEmitEvent (mon=mon@entry=0x7f6a980230f0, event=event@entry=0x55a2604bbf90 "SHUTDOWN", seconds=1516949172, micros=523889, details=0x55a2604bbdd0 "{\"guest\":false}") at qemu/qemu_monitor.c:1326
#7  0x00007f6ab53aac43 in qemuMonitorJSONIOProcessEvent (obj=0x55a260507520, mon=0x7f6a980230f0) at qemu/qemu_monitor_json.c:172
#8  qemuMonitorJSONIOProcessLine (mon=mon@entry=0x7f6a980230f0, line=<optimized out>, msg=msg@entry=0x0) at qemu/qemu_monitor_json.c:209
#9  0x00007f6ab53aafa0 in qemuMonitorJSONIOProcess (mon=mon@entry=0x7f6a980230f0, data=0x55a260509000 "{\"timestamp\": {\"seconds\": 1516949172, \"microseconds\": 523889}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111, msg=msg@entry=0x0) at qemu/qemu_monitor_json.c:251
#10 0x00007f6ab538dc0e in qemuMonitorIOProcess (mon=0x7f6a980230f0) at qemu/qemu_monitor.c:441
#11 qemuMonitorIO (watch=watch@entry=15, fd=<optimized out>, events=0, events@entry=1, opaque=opaque@entry=0x7f6a980230f0) at qemu/qemu_monitor.c:695
#12 0x00007f6ad803827f in virEventPollDispatchHandles (fds=<optimized out>, nfds=<optimized out>) at util/vireventpoll.c:508
#13 virEventPollRunOnce () at util/vireventpoll.c:657
#14 0x00007f6ad8035f3a in virEventRunDefaultImpl () at util/virevent.c:327
#15 0x00007f6ad8243235 in virNetDaemonRun (dmn=dmn@entry=0x55a2604bbd40) at rpc/virnetdaemon.c:837
#16 0x000055a25ec1357c in main (argc=<optimized out>, argv=<optimized out>) at libvirtd.c:1494

Moving back to assigned.

Hello! It looks like we hit this yesterday. Could you tell me whether it is already fixed? Thank you!

btw, I think this https://bugzilla.redhat.com/show_bug.cgi?id=1559722 is the same bug.

(In reply to Dmitry Melekhov from comment #8)
> btw, I think this https://bugzilla.redhat.com/show_bug.cgi?id=1559722
> is the same bug.

The above bug tracks the stuck gluster connection, while here we track that a second libvirtd thread can get stuck too. This bug is going to be addressed in the next major release.

I think this was fixed upstream by the following series (merged for v6.2.0-rc1~236):

a18f2c52ac qemu: convert agent to use the per-VM event loop
0d62faf62a qemu: fix variable naming in agent code
436a56e37d qemu: convert monitor to use the per-VM event loop
ba906ab1c0 tests: start/stop an event thread for QEMU monitor/agent tests
92890fbfa1 qemu: start/stop an event thread for QMP probing
e6afacb0fe qemu: start/stop an event loop thread for domains
5c146f6e57 src: introduce an abstraction for running event loops

Moving to POST per comment 13.

Libvirt creates a per-VM thread for talking on the monitor and agent sockets:

# gdb -p $(pgrep libvirtd)
...
(gdb) info threads
  Id  Target Id                                          Frame
* 1   Thread 0x7ffacade2d80 (LWP 9466) "libvirtd"        0x00007ffacde26fff in poll () from /lib64/libc.so.6
  2   Thread 0x7ffaca70b640 (LWP 9467) "rpc-worker"      0x00007ffacd673732 in __futex_abstimed_wait_common64 () from /lib64/libpthread.so.0
  3   Thread 0x7ffac9f0a640 (LWP 9468) "rpc-worker"      0x00007ffacd673732 in __futex_abstimed_wait_common64 () from /lib64/libpthread.so.0
  4   Thread 0x7ffac9709640 (LWP 9469) "prio-rpc-worker" 0x00007ffacd673732 in __futex_abstimed_wait_common64 () from /lib64/libpthread.so.0
  5   Thread 0x7ffac3fff640 (LWP 9470) "rpc-worker"      0x00007ffacd673732 in __futex_abstimed_wait_common64 () from /lib64/libpthread.so.0
  6   Thread 0x7ffac2ffd640 (LWP 9472) "gmain"           0x00007ffacde26fff in poll () from /lib64/libc.so.6
  7   Thread 0x7ffac27fc640 (LWP 9473) "gdbus"           0x00007ffacde26fff in poll () from /lib64/libc.so.6
  8   Thread 0x7ffac1ffb640 (LWP 9522) "udev-event"      0x00007ffacd673732 in __futex_abstimed_wait_common64 () from /lib64/libpthread.so.0
  9   Thread 0x7ffac37fe640 (LWP 9724) "vm-fedora"       0x00007ffacde26fff in poll () from /lib64/libc.so.6

(the thread 9 "vm-fedora" is for processing QMP on guest "fedora")

Dan, since the referenced patches from Michal in comment 13 are yours, I'm assigning this to you, under the premise that no bug in POST with ITR/ITM set should be assigned to virt-maint.

According to the commits in comment #13 this was already released in RHEL-AV 8.3.0. Marking as TestOnly for 8.5.0.

Unable to reproduce the bug as reported on libvirt-6.0.0-36.module+el8.5.0+11222+c889b3f3.x86_64 with these steps:
1. Start a VM with a gluster network disk
2. Stop the gluster cluster (2 nodes)
3. Destroy the VM

The VM is not stuck. I have checked the threads of libvirtd on libvirt-6.0.0-36.module+el8.5.0+11222+c889b3f3.x86_64 and libvirt-7.5.0-1.module+el8.5.0+11664+59f87560.x86_64.

VM XML:

<domain>
...
  <devices>
    <disk type="network" device="disk">
      <driver name="qemu" type="qcow2"/>
      <source protocol="gluster" name="gv0/hhan.qcow2" index="1">
        <host name="10.0.148.14" port="24007"/>
      </source>
      <backingStore/>
      <target dev="vda" bus="virtio"/>
      <alias name="virtio-disk0"/>
      <address type="pci" domain="0x0000" bus="0x06" slot="0x00" function="0x0"/>
    </disk>
    ...
    <channel type="unix">
      <target type="virtio" name="org.qemu.guest_agent.0"/>
      <address type="virtio-serial" controller="0" bus="0" port="1"/>
    </channel>
  </devices>
</domain>

After the VM started, check the threads of libvirtd.

Results of libvirt-6.0.0-36.module+el8.5.0+11222+c889b3f3.x86_64:

(gdb) info threads
  Id  Target Id                                          Frame
* 1   Thread 0x7fc8d62f4c00 (LWP 506213) "libvirtd"      0x00007fc8d1881a41 in poll () from /lib64/libc.so.6
  2   Thread 0x7fc8be617700 (LWP 506214) "libvirtd"      0x00007fc8d1b632fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  3   Thread 0x7fc8bde16700 (LWP 506215) "libvirtd"      0x00007fc8d1b632fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  4   Thread 0x7fc8bd615700 (LWP 506216) "libvirtd"      0x00007fc8d1b632fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5   Thread 0x7fc8bce14700 (LWP 506217) "libvirtd"      0x00007fc8d1b632fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  6   Thread 0x7fc8bc613700 (LWP 506218) "libvirtd"      0x00007fc8d1b632fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  7   Thread 0x7fc8bbe12700 (LWP 506219) "libvirtd"      0x00007fc8d1b632fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  8   Thread 0x7fc8bb611700 (LWP 506220) "libvirtd"      0x00007fc8d1b632fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  9   Thread 0x7fc8bae10700 (LWP 506221) "libvirtd"      0x00007fc8d1b632fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  10  Thread 0x7fc8ba60f700 (LWP 506222) "libvirtd"      0x00007fc8d1b632fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  11  Thread 0x7fc8b9e0e700 (LWP 506223) "libvirtd"      0x00007fc8d1b632fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  12  Thread 0x7fc87efac700 (LWP 506224) "libvirtd"      0x00007fc8d1b632fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  13  Thread 0x7fc87e7ab700 (LWP 506225) "libvirtd"      0x00007fc8d1b632fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  14  Thread 0x7fc87dfaa700 (LWP 506226) "libvirtd"      0x00007fc8d1b632fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  15  Thread 0x7fc87d7a9700 (LWP 506227) "libvirtd"      0x00007fc8d1b632fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  16  Thread 0x7fc87cfa8700 (LWP 506228) "libvirtd"      0x00007fc8d1b632fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  17  Thread 0x7fc86688b700 (LWP 506276) "libvirtd"      0x00007fc8d1b632fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  18  Thread 0x7fc86608a700 (LWP 506308) "glfs_sproc0"   0x00007fc8d1b6364a in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  19  Thread 0x7fc863b92700 (LWP 506309) "glfs_sproc1"   0x00007fc8d1b6364a in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  20  Thread 0x7fc867fff700 (LWP 506353) "libvirtd"      0x00007fc8d1b632fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

Results of libvirt-7.5.0-1.module+el8.5.0+11664+59f87560.x86_64:

(gdb) info threads
  Id  Target Id                                            Frame
* 1   Thread 0x7f04af9a7b00 (LWP 503626) "libvirtd"        0x00007f04adcfda41 in poll () from /lib64/libc.so.6
  2   Thread 0x7f04995cb700 (LWP 503627) "rpc-worker"      0x00007f04ab5602fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  3   Thread 0x7f0498dca700 (LWP 503628) "rpc-worker"      0x00007f04ab5602fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  4   Thread 0x7f04985c9700 (LWP 503629) "rpc-worker"      0x00007f04ab5602fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5   Thread 0x7f0497dc8700 (LWP 503630) "rpc-worker"      0x00007f04ab5602fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  6   Thread 0x7f048f5c7700 (LWP 503631) "rpc-worker"      0x00007f04ab5602fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  7   Thread 0x7f04975c7700 (LWP 503632) "prio-rpc-worker" 0x00007f04ab5602fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  8   Thread 0x7f0496dc6700 (LWP 503633) "prio-rpc-worker" 0x00007f04ab5602fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  9   Thread 0x7f04965c5700 (LWP 503634) "prio-rpc-worker" 0x00007f04ab5602fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  10  Thread 0x7f0495dc4700 (LWP 503635) "prio-rpc-worker" 0x00007f04ab5602fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  11  Thread 0x7f04955c3700 (LWP 503636) "prio-rpc-worker" 0x00007f04ab5602fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  12  Thread 0x7f0469a00700 (LWP 503637) "rpc-worker"      0x00007f04ab5602fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  13  Thread 0x7f04691ff700 (LWP 503638) "rpc-worker"      0x00007f04ab5602fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  14  Thread 0x7f04689fe700 (LWP 503639) "rpc-worker"      0x00007f04ab5602fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  15  Thread 0x7f04532f6700 (LWP 503640) "rpc-worker"      0x00007f04ab5602fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  16  Thread 0x7f0452af5700 (LWP 503641) "rpc-worker"      0x00007f04ab5602fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  17  Thread 0x7f0443fff700 (LWP 503643) "gmain"           0x00007f04adcfda41 in poll () from /lib64/libc.so.6
  18  Thread 0x7f04437fe700 (LWP 503644) "gdbus"           0x00007f04adcfda41 in poll () from /lib64/libc.so.6
  19  Thread 0x7f0442ffd700 (LWP 503691) "udev-event"      0x00007f04ab5602fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  20  Thread 0x7f04408ee700 (LWP 503718) "glfs_sproc0"     0x00007f04ab56064a in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  21  Thread 0x7f043bfff700 (LWP 503719) "glfs_sproc1"     0x00007f04ab56064a in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  22  Thread 0x7f04522f4700 (LWP 503921) "vm-hhan"         0x00007f04adcfda41 in poll () from /lib64/libc.so.6
  23  Thread 0x7f0437fd6700 (LWP 503947) "qemu-event"      0x00007f04ab5602fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

Hi Peter, are there any other methods to reproduce this bug?

I'm sorry, I didn't follow this bug too closely any more, so I don't have any suggestions for better testing.

Michal, do you have any idea how to reproduce this bug?

I'm sorry, I don't have any clever idea either. I think attaching gdb to a daemon that has a domain running and typing 'info threads' to observe 'vm-$domname' is enough. That's the per-domain thread that handles the events.