Description of problem:
libvirtd hangs on a deadlock.

Version-Release number of selected component (if applicable):
- Current head (3.6 and 3.7-rc*)

How reproducible:
- Takes a while, but close to 100%

Steps to Reproduce:
- Run concurrent create/start/stop/destroy loops
- OpenStack reboot --hard
- Probably others
- Further details are at pad.lv/1714254

Actual results:
- libvirtd hangs on a deadlock

Expected results:
- libvirtd continues to work

Additional info:
The change introducing the issue is https://libvirt.org/git/?p=libvirt.git;a=commit;h=ace45e67abbd9a033be54602db71a9dbc71408d7

    Author: Michal Privoznik <mprivozn>
    Date:   Tue Aug 1 11:04:22 2017 +0200

        virNetDaemonCallInhibit: Call virNetDaemonGotInhibitReply properly

Since then there is a race that can cause the execution to lock the same object twice. Because a pthread mutex with default attributes does no error checking, the second lock is a deadlock.

Stack trace:

Thread 3 (Thread 0x7f07e941e700 (LWP 30193)):
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f07f4891dbd in __GI___pthread_mutex_lock (mutex=mutex@entry=0x55e8a5ae4aa0) at ../nptl/pthread_mutex_lock.c:80
#2  0x00007f07f51d9c45 in virMutexLock (m=m@entry=0x55e8a5ae4aa0) at ../../../src/util/virthread.c:89
#3  0x00007f07f51ba0ca in virObjectLock (anyobj=anyobj@entry=0x55e8a5ae4a90) at ../../../src/util/virobject.c:388
#4  0x00007f07f52f7558 in virNetDaemonGotInhibitReply (pending=0x7f07b8018fc0, opaque=opaque@entry=0x55e8a5ae4a90) at ../../../src/rpc/virnetdaemon.c:449
#5  0x00007f07f52f83e7 in virNetDaemonCallInhibit (what=0x7f07f53a6419 "shutdown", mode=0x7f07f53d8ce2 "delay", why=0x7f07f53f99a0 "Virtual machines need to be saved", who=0x7f07f53f9b46 "Libvirt", dmn=0x55e8a5ae4a90) at ../../../src/rpc/virnetdaemon.c:519
#6  virNetDaemonAddShutdownInhibition (dmn=0x55e8a5ae4a90) at ../../../src/rpc/virnetdaemon.c:541
#7  0x00007f07b2bbe4d2 in qemuProcessInit (driver=driver@entry=0x7f07a01e1b70, vm=vm@entry=0x7f07ac003360, updatedCPU=updatedCPU@entry=0x0, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, migration=migration@entry=false, flags=flags@entry=17) at ../../../src/qemu/qemu_process.c:4726
#8  0x00007f07b2bc1313 in qemuProcessStart (conn=conn@entry=0x7f07bc000c70, driver=driver@entry=0x7f07a01e1b70, vm=vm@entry=0x7f07ac003360, updatedCPU=updatedCPU@entry=0x0, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, migrateFrom=migrateFrom@entry=0x0, migrateFd=-1, migratePath=0x0, snapshot=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17) at ../../../src/qemu/qemu_process.c:5850
#9  0x00007f07b2c1f7ff in qemuDomainObjStart (conn=0x7f07bc000c70, driver=driver@entry=0x7f07a01e1b70, vm=0x7f07ac003360, flags=flags@entry=0, asyncJob=QEMU_ASYNC_JOB_START) at ../../../src/qemu/qemu_driver.c:7121
#10 0x00007f07b2c1fef6 in qemuDomainCreateWithFlags (dom=0x7f07b8000b20, flags=0) at ../../../src/qemu/qemu_driver.c:7175
#11 0x00007f07f5295ef1 in virDomainCreateWithFlags (domain=domain@entry=0x7f07b8000b20, flags=0) at ../../../src/libvirt-domain.c:6595
#12 0x000055e8a4e3227a in remoteDispatchDomainCreateWithFlags (server=0x55e8a5ae1200, msg=0x55e8a5ae4b90, ret=0x7f07b8015fe0, args=0x7f07b8000af0, rerr=0x7f07e941dbc0, client=<optimized out>) at ../../../daemon/remote_dispatch.h:4307
#13 remoteDispatchDomainCreateWithFlagsHelper (server=0x55e8a5ae1200, client=<optimized out>, msg=0x55e8a5ae4b90, rerr=0x7f07e941dbc0, args=0x7f07b8000af0, ret=0x7f07b8015fe0) at ../../../daemon/remote_dispatch.h:4282
#14 0x00007f07f52fe599 in virNetServerProgramDispatchCall (msg=0x55e8a5ae4b90, client=0x55e8a5b03840, server=0x55e8a5ae1200, prog=0x55e8a5b27070) at ../../../src/rpc/virnetserverprogram.c:437
#15 virNetServerProgramDispatch (prog=0x55e8a5b27070, server=server@entry=0x55e8a5ae1200, client=0x55e8a5b03840, msg=0x55e8a5ae4b90) at ../../../src/rpc/virnetserverprogram.c:307
#16 0x000055e8a4e53268 in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x55e8a5ae1200) at ../../../src/rpc/virnetserver.c:148
#17 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x55e8a5ae1200) at ../../../src/rpc/virnetserver.c:169
#18 0x00007f07f51da71b in virThreadPoolWorker (opaque=opaque@entry=0x55e8a5ad6f60) at ../../../src/util/virthreadpool.c:167
#19 0x00007f07f51d9ac8 in virThreadHelper (data=<optimized out>) at ../../../src/util/virthread.c:206
#20 0x00007f07f488f6ba in start_thread (arg=0x7f07e941e700) at pthread_create.c:333
#21 0x00007f07f45c53dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

The lock is first taken at frame #6 (virnetdaemon.c:534):

    534        virObjectLock(dmn);

But under certain conditions (the race) the same object is locked again in frame #4 (virnetdaemon.c:449):

    449        virObjectLock(dmn);
There is a suggested patch that we will use as a hotfix, while waiting for a proper solution to be discussed and picked up upstream. See: https://launchpadlibrarian.net/335315545/avoid-double-locking.patch
Hmm, that is actually just the "partial revert" hotfix; the suggestion I actually meant is at http://paste.ubuntu.com/25439049/
I've proposed a patch on the upstream list: https://www.redhat.com/archives/libvir-list/2017-September/msg00015.html
The change builds fine and seems to fix the issue. It survived 20 minutes in my stress loop, which it never did before.
And I've pushed the patch:

commit 054c6d27214ef90feab3897b052f450492d5ef3f
Author:     Michal Privoznik <mprivozn>
AuthorDate: Fri Sep 1 10:33:12 2017 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Fri Sep 1 13:21:33 2017 +0200

    virnetdaemon: Don't deadlock when talking to D-Bus

    https://bugzilla.redhat.com/show_bug.cgi?id=1487322

    In ace45e67abbd I tried to fix a problem where we get the reply to a
    D-Bus call while we are sleeping. In that case the callback was never
    set, so I changed the code so that the callback is called directly.
    However, I hadn't realized that since the callback is then called out
    of order, it locks the virNetDaemon: exactly the same virNetDaemon
    object that we are dealing with right now and that we have locked
    already (in virNetDaemonAddShutdownInhibition()).

    Signed-off-by: Michal Privoznik <mprivozn>

v3.7.0-rc2-5-g054c6d272
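The shape of the fix can be sketched as follows. This is illustrative code with made-up names, not the actual libvirt patch: the handler is split into a variant that assumes the caller already holds the lock, and the synchronous (out-of-order) path calls that variant instead of the public one that re-takes the lock:

```c
#include <pthread.h>

/* Hypothetical stand-in for virNetDaemon: a lockable object plus
 * some state mutated by the D-Bus reply handler. */
typedef struct {
    pthread_mutex_t lock;
    int inhibitions;
} Daemon;

/* Variant that requires the caller to hold dmn->lock. */
static void got_inhibit_reply_locked(Daemon *dmn)
{
    dmn->inhibitions++;
}

/* Public entry point for the normal async path: takes the lock
 * itself, then delegates to the locked variant. */
void got_inhibit_reply(Daemon *dmn)
{
    pthread_mutex_lock(&dmn->lock);
    got_inhibit_reply_locked(dmn);
    pthread_mutex_unlock(&dmn->lock);
}

/* The caller already holds the lock when the reply arrives
 * synchronously. Calling got_inhibit_reply() here would re-lock
 * dmn->lock and deadlock; calling the *_locked variant is safe. */
int call_inhibit(Daemon *dmn)
{
    pthread_mutex_lock(&dmn->lock);
    /* ... issue the D-Bus Inhibit call; suppose the reply
     * is already available before we return to the event loop ... */
    got_inhibit_reply_locked(dmn);
    pthread_mutex_unlock(&dmn->lock);
    return dmn->inhibitions;
}
```

This "foo() / fooLocked()" split is a common pattern for handlers that can be invoked both from an event loop (unlocked) and inline from a caller that already holds the object lock.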