Description of problem:

(gdb) t a a bt

Thread 11 (Thread 0x7f16d4e28700 (LWP 19196)):
#0  0x00000030d0e0b5e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000003f8cc6da36 in virCondWait (c=c@entry=0x1684e38, m=m@entry=0x1684d78) at util/threads-pthread.c:117
#2  0x0000003f8cc6de9b in virThreadPoolWorker (opaque=opaque@entry=0x1675d80) at util/threadpool.c:103
#3  0x0000003f8cc6d869 in virThreadHelper (data=<optimized out>) at util/threads-pthread.c:161
#4  0x00000030d0e07d15 in start_thread () from /lib64/libpthread.so.0
#5  0x00000030d0af196d in clone () from /lib64/libc.so.6

Thread 10 (Thread 0x7f16d862f700 (LWP 19189)):
#0  0x00000030d0e0b5e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000003f8cc6da36 in virCondWait (c=c@entry=0x1684da0, m=m@entry=0x1684d78) at util/threads-pthread.c:117
#2  0x0000003f8cc6de7b in virThreadPoolWorker (opaque=opaque@entry=0x1676090) at util/threadpool.c:103
#3  0x0000003f8cc6d869 in virThreadHelper (data=<optimized out>) at util/threads-pthread.c:161
#4  0x00000030d0e07d15 in start_thread () from /lib64/libpthread.so.0
#5  0x00000030d0af196d in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7f16d662b700 (LWP 19193)):
#0  0x00000030d0e0b5e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000003f8cc6da36 in virCondWait (c=c@entry=0x1684e38, m=m@entry=0x1684d78) at util/threads-pthread.c:117
#2  0x0000003f8cc6de9b in virThreadPoolWorker (opaque=opaque@entry=0x1676090) at util/threadpool.c:103
#3  0x0000003f8cc6d869 in virThreadHelper (data=<optimized out>) at util/threads-pthread.c:161
#4  0x00000030d0e07d15 in start_thread () from /lib64/libpthread.so.0
#5  0x00000030d0af196d in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f16d6e2c700 (LWP 19192)):
#0  0x00000030d0e0b5e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000003f8cc6da36 in virCondWait (c=c@entry=0x1684da0, m=m@entry=0x1684d78) at util/threads-pthread.c:117
#2  0x0000003f8cc6de7b in virThreadPoolWorker (opaque=opaque@entry=0x1675d80) at util/threadpool.c:103
#3  0x0000003f8cc6d869 in virThreadHelper (data=<optimized out>) at util/threads-pthread.c:161
#4  0x00000030d0e07d15 in start_thread () from /lib64/libpthread.so.0
#5  0x00000030d0af196d in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f16d762d700 (LWP 19191)):
#0  0x00000030d0e0b5e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000003f8cc6da36 in virCondWait (c=c@entry=0x1684da0, m=m@entry=0x1684d78) at util/threads-pthread.c:117
#2  0x0000003f8cc6de7b in virThreadPoolWorker (opaque=opaque@entry=0x1676090) at util/threadpool.c:103
#3  0x0000003f8cc6d869 in virThreadHelper (data=<optimized out>) at util/threads-pthread.c:161
#4  0x00000030d0e07d15 in start_thread () from /lib64/libpthread.so.0
#5  0x00000030d0af196d in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f16d8e30700 (LWP 19188)):
#0  0x00000030d0e0b5e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000003f8cc6da36 in virCondWait (c=c@entry=0x1684da0, m=m@entry=0x1684d78) at util/threads-pthread.c:117
#2  0x0000003f8cc6de7b in virThreadPoolWorker (opaque=opaque@entry=0x1675d80) at util/threadpool.c:103
#3  0x0000003f8cc6d869 in virThreadHelper (data=<optimized out>) at util/threads-pthread.c:161
#4  0x00000030d0e07d15 in start_thread () from /lib64/libpthread.so.0
#5  0x00000030d0af196d in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f16d9252840 (LWP 19187)):
#0  0x00000030d0ae8e8d in poll () from /lib64/libc.so.6
#1  0x0000003f8cc5d8fb in poll (__timeout=-1, __nfds=9, __fds=<optimized out>) at /usr/include/bits/poll2.h:46
#2  virEventPollRunOnce () at util/event_poll.c:615
#3  0x0000003f8cc5c687 in virEventRunDefaultImpl () at util/event.c:247
#4  0x0000003f8cd4204d in virNetServerRun (srv=0x1684bf0) at rpc/virnetserver.c:751
#5  0x000000000040bebd in main (argc=<optimized out>, argv=<optimized out>) at libvirtd.c:1332

Thread 4 (Thread 0x7f16d5e2a700 (LWP 19194)):
#0  0x00000030d0e0b5e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000003f8cc6da36 in virCondWait (c=c@entry=0x1684e38, m=m@entry=0x1684d78) at util/threads-pthread.c:117
#2  0x0000003f8cc6de9b in virThreadPoolWorker (opaque=opaque@entry=0x1675d80) at util/threadpool.c:103
#3  0x0000003f8cc6d869 in virThreadHelper (data=<optimized out>) at util/threads-pthread.c:161
#4  0x00000030d0e07d15 in start_thread () from /lib64/libpthread.so.0
#5  0x00000030d0af196d in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f16d7e2e700 (LWP 19190)):
#0  0x00000030d0e0b5e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000003f8cc6da36 in virCondWait (c=c@entry=0x1684da0, m=m@entry=0x1684d78) at util/threads-pthread.c:117
#2  0x0000003f8cc6de7b in virThreadPoolWorker (opaque=opaque@entry=0x1675d80) at util/threadpool.c:103
#3  0x0000003f8cc6d869 in virThreadHelper (data=<optimized out>) at util/threads-pthread.c:161
#4  0x00000030d0e07d15 in start_thread () from /lib64/libpthread.so.0
#5  0x00000030d0af196d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f16d4627700 (LWP 19197)):
#0  0x00000030d0e0b5e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000003f8cc6da36 in virCondWait (c=c@entry=0x1684e38, m=m@entry=0x1684d78) at util/threads-pthread.c:117
#2  0x0000003f8cc6de9b in virThreadPoolWorker (opaque=opaque@entry=0x1676090) at util/threadpool.c:103
#3  0x0000003f8cc6d869 in virThreadHelper (data=<optimized out>) at util/threads-pthread.c:161
#4  0x00000030d0e07d15 in start_thread () from /lib64/libpthread.so.0
#5  0x00000030d0af196d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f16d5629700 (LWP 19195)):
#0  qemuDomainObjBeginJobInternal (driver=driver@entry=0x7f16c808ae10, driver_locked=driver_locked@entry=true, obj=obj@entry=0x7f16ac0038e0, job=job@entry=QEMU_JOB_DESTROY, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_NONE) at qemu/qemu_domain.c:771
#1  0x00007f16d17d6aaa in qemuDomainObjBeginJobWithDriver (driver=driver@entry=0x7f16c808ae10, obj=obj@entry=0x7f16ac0038e0, job=job@entry=QEMU_JOB_DESTROY) at qemu/qemu_domain.c:906
#2  0x00007f16d1822afc in qemuDomainDestroyFlags (dom=<optimized out>, flags=<optimized out>) at qemu/qemu_driver.c:1988
#3  0x0000003f8cce8531 in virDomainDestroyFlags (domain=domain@entry=0x7f16c4000be0, flags=1) at libvirt.c:2253
#4  0x00000000004133ad in remoteDispatchDomainDestroyFlags (args=0x7f16c40008c0, rerr=0x7f16d5628c70, client=<optimized out>, server=<optimized out>, msg=<optimized out>) at remote_dispatch.h:1137
#5  remoteDispatchDomainDestroyFlagsHelper (server=<optimized out>, client=<optimized out>, msg=<optimized out>, rerr=0x7f16d5628c70, args=0x7f16c40008c0, ret=<optimized out>) at remote_dispatch.h:1115
#6  0x0000003f8cd45686 in virNetServerProgramDispatchCall (msg=0x16ae860, client=0x16ae200, server=0x1684bf0, prog=0x16a9f00) at rpc/virnetserverprogram.c:424
#7  virNetServerProgramDispatch (prog=0x16a9f00, server=server@entry=0x1684bf0, client=0x16ae200, msg=0x16ae860) at rpc/virnetserverprogram.c:297
#8  0x0000003f8cd41691 in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x1684bf0) at rpc/virnetserver.c:170
#9  virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x1684bf0) at rpc/virnetserver.c:191
#10 0x0000003f8cc6ddde in virThreadPoolWorker (opaque=opaque@entry=0x1676090) at util/threadpool.c:144
#11 0x0000003f8cc6d869 in virThreadHelper (data=<optimized out>) at util/threads-pthread.c:161
#12 0x00000030d0e07d15 in start_thread () from /lib64/libpthread.so.0
#13 0x00000030d0af196d in clone () from /lib64/libc.so.6

(gdb) print *priv
Cannot access memory at address 0x0

Version-Release number of selected component (if applicable):
libvirt-0.10.0-1.fc19.x86_64

How reproducible:
? Happened once.

Steps to Reproduce:
1. Unknown.
Created attachment 612123 [details] core file (xz compressed)
Sorry, I omitted the first frame, which is:

#0  qemuDomainObjBeginJobInternal (driver=driver@entry=0x7f16c808ae10, driver_locked=driver_locked@entry=true, obj=obj@entry=0x7f16ac0038e0, job=job@entry=QEMU_JOB_DESTROY, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_NONE) at qemu/qemu_domain.c:771
771         priv->jobs_queued++;
Still occurs in libvirt-0.10.1-2.fc18.x86_64. The stack trace is essentially identical to above.
For 0.10.1-2:

Core was generated by `/usr/sbin/libvirtd --timeout=30'.
Program terminated with signal 11, Segmentation fault.
#0  qemuDomainObjBeginJobInternal (driver=driver@entry=0x7f11d0097d80, driver_locked=driver_locked@entry=true, obj=obj@entry=0x7f11c4003700, job=job@entry=QEMU_JOB_DESTROY, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_NONE) at qemu/qemu_domain.c:771
771         priv->jobs_queued++;
(gdb) bt
#0  qemuDomainObjBeginJobInternal (driver=driver@entry=0x7f11d0097d80, driver_locked=driver_locked@entry=true, obj=obj@entry=0x7f11c4003700, job=job@entry=QEMU_JOB_DESTROY, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_NONE) at qemu/qemu_domain.c:771
#1  0x00007f11d91a6a9a in qemuDomainObjBeginJobWithDriver (driver=driver@entry=0x7f11d0097d80, obj=obj@entry=0x7f11c4003700, job=job@entry=QEMU_JOB_DESTROY) at qemu/qemu_domain.c:906
#2  0x00007f11d91f2abc in qemuDomainDestroyFlags (dom=<optimized out>, flags=<optimized out>) at qemu/qemu_driver.c:1960
#3  0x00007f11e1c3e591 in virDomainDestroyFlags (domain=domain@entry=0x7f11b8000910, flags=1) at libvirt.c:2253
#4  0x00000000004133ad in remoteDispatchDomainDestroyFlags (args=0x7f11b80008c0, rerr=0x7f11dd710c70, client=<optimized out>, server=<optimized out>, msg=<optimized out>) at remote_dispatch.h:1137
#5  remoteDispatchDomainDestroyFlagsHelper (server=<optimized out>, client=<optimized out>, msg=<optimized out>, rerr=0x7f11dd710c70, args=0x7f11b80008c0, ret=<optimized out>) at remote_dispatch.h:1115
#6  0x00007f11e1c9b6e6 in virNetServerProgramDispatchCall (msg=0x2667af0, client=0x2667490, server=0x263edb0, prog=0x2663f40) at rpc/virnetserverprogram.c:424
#7  virNetServerProgramDispatch (prog=0x2663f40, server=server@entry=0x263edb0, client=0x2667490, msg=0x2667af0) at rpc/virnetserverprogram.c:297
#8  0x00007f11e1c976f1 in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x263edb0) at rpc/virnetserver.c:170
#9  virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x263edb0) at rpc/virnetserver.c:191
#10 0x00007f11e1bc3e0e in virThreadPoolWorker (opaque=opaque@entry=0x262fd80) at util/threadpool.c:144
#11 0x00007f11e1bc3899 in virThreadHelper (data=<optimized out>) at util/threads-pthread.c:161
#12 0x00000030d0e07d15 in start_thread () from /lib64/libpthread.so.0
#13 0x00000030d0af196d in clone () from /lib64/libc.so.6
(gdb) print *obj
$1 = {object = {magic = 3288365072, refs = 32529, klass = 0x7f11c4000078},
  lock = {lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}},
  pid = 0, state = {state = 0, reason = 0}, autostart = 0, persistent = 0, updated = 0,
  def = 0x0, newDef = 0x0, snapshots = 0x0, current_snapshot = 0x0, hasManagedSave = false,
  privateData = 0x0, privateDataFreeFunc = 0x0, taint = 0}
(gdb) print/x *obj->object.klass
$4 = {magic = 0xc400bd50, name = 0x7f11c40041a0, objectSize = 0x7f11c40036f0, dispose = 0x7f11c40041a0}
Created attachment 612139 [details] core file (xz compressed) for libvirt 0.10.1-2.fc18.x86_64
Hmm, the reference count looks well bogus - I struggle to see any reason why it's so large:

(gdb) print *obj
$1 = {object = {magic = 3288365072, refs = 32529, klass = 0x7f11c4000078}, ...

The magic value looks pretty bogus too. Magic values start being allocated at 0xCAFE0000. All-round memory bogosity.
Possibly relevant to this is that qemu-kvm segfaults on shutdown, and libvirtd segfaults ~ 2 seconds later.
(In reply to comment #7) > Possibly relevant to this is that qemu-kvm segfaults on shutdown, > and libvirtd segfaults ~ 2 seconds later. The qemu segfault being bug 853408.
I tried to reproduce it simply by using virsh with the same libvirt and randomly killing QEMU with SIGSEGV, but failed. Could you see if you can reproduce it while running libvirtd under valgrind?
I tried running the test suite while libvirtd was running under valgrind, but libvirtd appeared to deadlock. Quite possibly a problem with valgrind though, so this is inconclusive.
I should add that libvirtd, run in exactly the same way but without valgrind, runs fine (at the moment -- but I did see this bug earlier today). So I guess valgrind is causing the deadlock. Does libvirtd normally run OK under valgrind? Ditto if you run it as non-root with:

./run valgrind --logfile=/tmp/log daemon/libvirtd --timeout=30
> Does libvirtd normally run OK under valgrind?

Yes, it should run normally - the only known issue is that you can't run LXC guests. Does it deadlock immediately, or only when running specific APIs?
I'm assuming this was resolved by:

commit 25f582e36a1c066b6c82303b5e4f18eec337a25b
Author: Daniel P. Berrange <berrange>
Date:   Wed Sep 26 15:54:58 2012 +0100

    Fix (rare) deadlock in QEMU monitor callbacks

    Some users report (very rarely) seeing a deadlock in the QEMU
    monitor callbacks

But move out of POST if I'm wrong
*** This bug has been marked as a duplicate of bug 859009 ***