Bug 727254
| Field | Value |
|---|---|
| Summary | unable to kill libvirtd in the middle of managedsave |
| Product | [Community] Virtualization Tools |
| Component | libvirt |
| Version | unspecified |
| Hardware | Unspecified |
| OS | Unspecified |
| Status | CLOSED DEFERRED |
| Severity | medium |
| Priority | medium |
| Reporter | Eric Blake <eblake> |
| Assignee | Libvirt Maintainers <libvirt-maint> |
| QA Contact | Virtualization Bugs <virt-bugs> |
| CC | crobinso, cwei, dyuan, jdenemar, libvirt-maint, mzhan, yafu |
| Doc Type | Bug Fix |
| Bug Blocks | 1117142 |
| Last Closed | 2020-11-03 16:30:50 UTC |
Description (Eric Blake, 2011-08-01 16:46:57 UTC)
Definitely an independent bug - I was able to reproduce it using libvirt 0.8.8-4.fc14.x86_64 from virt-preview on Fedora 14. Here's the backtrace that I got when using libvirt.git at commit 193cd0f3:

```
Thread 2 (Thread 0x7fb708b65700 (LWP 19323)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007fb710dc0015 in virCondWait (c=0x7fb70015b328, m=0x7fb70015b300) at util/threads-pthread.c:117
#2  0x00000000004ba7c5 in qemuMonitorSend (mon=0x7fb70015b300, msg=0x7fb708b64340) at qemu/qemu_monitor.c:802
#3  0x00000000004c84f2 in qemuMonitorJSONCommandWithFd (mon=0x7fb70015b300, cmd=0x7fb6f0000a80, scm_fd=-1, reply=0x7fb708b64440) at qemu/qemu_monitor_json.c:225
#4  0x00000000004c8629 in qemuMonitorJSONCommand (mon=0x7fb70015b300, cmd=0x7fb6f0000a80, reply=0x7fb708b64440) at qemu/qemu_monitor_json.c:254
#5  0x00000000004ccee0 in qemuMonitorJSONGetMigrationStatus (mon=0x7fb70015b300, status=0x7fb708b64540, transferred=0x7fb708b64530, remaining=0x7fb708b64528, total=0x7fb708b64520) at qemu/qemu_monitor_json.c:1920
#6  0x00000000004bcef7 in qemuMonitorGetMigrationStatus (mon=0x7fb70015b300, status=0x7fb708b64540, transferred=0x7fb708b64530, remaining=0x7fb708b64528, total=0x7fb708b64520) at qemu/qemu_monitor.c:1532
#7  0x00000000004b2d06 in qemuMigrationUpdateJobStatus (driver=0x7fb70008ef40, vm=0x7fb700091f20, job=0x5435fe "domain save job", asyncJob=QEMU_ASYNC_JOB_SAVE) at qemu/qemu_migration.c:764
#8  0x00000000004b3075 in qemuMigrationWaitForCompletion (driver=0x7fb70008ef40, vm=0x7fb700091f20, asyncJob=QEMU_ASYNC_JOB_SAVE) at qemu/qemu_migration.c:845
#9  0x00000000004b854d in qemuMigrationToFile (driver=0x7fb70008ef40, vm=0x7fb700091f20, fd=20, offset=4096, path=0x7fb6f0000a40 "/var/lib/libvirt/qemu/save/fedora_12.save", compressor=0x0, is_reg=true, bypassSecurityDriver=true, asyncJob=QEMU_ASYNC_JOB_SAVE) at qemu/qemu_migration.c:2777
#10 0x000000000046ab4c in qemuDomainSaveInternal (driver=0x7fb70008ef40, dom=0x7fb6f00008c0, vm=0x7fb700091f20, path=0x7fb6f0000a40 "/var/lib/libvirt/qemu/save/fedora_12.save", compressed=0, bypass_cache=false, xmlin=0x0) at qemu/qemu_driver.c:2407
#11 0x000000000046b4d2 in qemuDomainManagedSave (dom=0x7fb6f00008c0, flags=0) at qemu/qemu_driver.c:2589
#12 0x00007fb710e5935c in virDomainManagedSave (dom=0x7fb6f00008c0, flags=0) at libvirt.c:15319
#13 0x000000000042875f in remoteDispatchDomainManagedSave (server=0x195b280, client=0x1966a30, hdr=0x19a6ce8, rerr=0x7fb708b64b30, args=0x7fb6f0000900) at remote_dispatch.h:2573
#14 0x0000000000428658 in remoteDispatchDomainManagedSaveHelper (server=0x195b280, client=0x1966a30, hdr=0x19a6ce8, rerr=0x7fb708b64b30, args=0x7fb6f0000900, ret=0x7fb6f0000930) at remote_dispatch.h:2551
#15 0x0000000000453efe in virNetServerProgramDispatchCall (prog=0x195b250, server=0x195b280, client=0x1966a30, msg=0x1966cd0) at rpc/virnetserverprogram.c:375
#16 0x0000000000453a00 in virNetServerProgramDispatch (prog=0x195b250, server=0x195b280, client=0x1966a30, msg=0x1966cd0) at rpc/virnetserverprogram.c:252
#17 0x0000000000456b21 in virNetServerHandleJob (jobOpaque=0x1960050, opaque=0x195b280) at rpc/virnetserver.c:155
#18 0x00007fb710dc06a4 in virThreadPoolWorker (opaque=0x195b370) at util/threadpool.c:98
#19 0x00007fb710dc01d7 in virThreadHelper (data=0x1965120) at util/threads-pthread.c:157
#20 0x0000003d16606ccb in start_thread (arg=0x7fb708b65700) at pthread_create.c:301
#21 0x0000003d15ee0c2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 1 (Thread 0x7fb71042b860 (LWP 19301)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007fb710dc0015 in virCondWait (c=0x195b3f0, m=0x195b398) at util/threads-pthread.c:117
#2  0x00007fb710dc096d in virThreadPoolFree (pool=0x195b370) at util/threadpool.c:172
#3  0x000000000045846f in virNetServerFree (srv=0x195b280) at rpc/virnetserver.c:757
#4  0x0000000000422337 in main (argc=1, argv=0x7fff94dfbf88) at libvirtd.c:1561
```

The thing is that virThreadPoolFree asks all threads to quit and waits until they do so. If any of the threads is waiting for an I/O event and the I/O thread quits before signaling the waiting thread, that thread will never quit, since its condition will never be signaled. This bug is easier to spot when one of the threads is inside qemuMigrationWaitForCompletion, which sends commands to the qemu monitor in a loop; even if the I/O thread signals the right condition before quitting, the migration thread will send another command to qemu and wait on a condition that no one will ever signal.

I can reproduce this bug with "kill -SIGINT" but can't reproduce it with "service libvirtd stop".

```
# virsh managedsave dom & sleep 1; service libvirtd stop
[1] 19235
Stopping libvirtd daemon: error: Failed to save domain dom state
error: End of file while reading data: Input/output error
[  OK  ]
[1]+  Exit 1                  virsh managedsave dom
```

```
# virsh managedsave dom & sleep 1; kill -SIGINT `pidof libvirtd`
[2] 19067
# ps aux|grep libvirtd
root     18957  1.3  0.1 639660 14468 ?      Sl   06:47   0:00 libvirtd --daemon
root     19073  0.0  0.0 103304   876 pts/0  S+   06:48   0:00 grep libvirtd
# service libvirtd status
libvirtd (pid 18957) is running...
# virsh list --all
error: Failed to reconnect to the hypervisor
error: no valid connection
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory
# ps aux|grep virsh
root     19067  0.0  0.0 225332  4324 pts/0  S    06:48   0:00 virsh managedsave dom
```

Has anyone verified this is still an issue? It hasn't been materially updated in over 4 years.

Yes.

Thank you for reporting this issue to the libvirt project. Unfortunately we have been unable to resolve this issue due to insufficient maintainer capacity and it will now be closed. This is not a reflection on the possible validity of the issue, merely the lack of resources to investigate and address it, for which we apologise.
If you nonetheless feel the issue is still important, you may choose to report it again at the new project issue tracker: https://gitlab.com/libvirt/libvirt/-/issues. The project also welcomes contributions from anyone who believes they can provide a solution.