Bug 856639

Summary: libvirtd segfaults in qemuDomainObjBeginJobInternal, priv is NULL
Summary: libvirtd segfaults in qemuDomainObjBeginJobInternal, priv is NULL
Product: Fedora
Component: libvirt
Version: 18
Reporter: Richard W.M. Jones <rjones>
Assignee: Libvirt Maintainers <libvirt-maint>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Status: CLOSED DUPLICATE
Severity: unspecified
Priority: unspecified
Type: Bug
Doc Type: Bug Fix
Last Closed: 2012-10-27 16:21:54 EDT
CC: berrange, clalancette, crobinso, itamar, jforbes, jyang, laine, libvirt-maint, veillard, virt-maint
Attachments:
- core file (xz compressed)
- core file (xz compressed) for libvirt 0.10.1-2.fc18.x86_64

Description Richard W.M. Jones 2012-09-12 09:49:20 EDT
Description of problem:

(gdb) t a a bt

Thread 11 (Thread 0x7f16d4e28700 (LWP 19196)):
#0  0x00000030d0e0b5e5 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1  0x0000003f8cc6da36 in virCondWait (c=c@entry=0x1684e38, 
    m=m@entry=0x1684d78) at util/threads-pthread.c:117
#2  0x0000003f8cc6de9b in virThreadPoolWorker (opaque=opaque@entry=0x1675d80)
    at util/threadpool.c:103
#3  0x0000003f8cc6d869 in virThreadHelper (data=<optimized out>)
    at util/threads-pthread.c:161
#4  0x00000030d0e07d15 in start_thread () from /lib64/libpthread.so.0
#5  0x00000030d0af196d in clone () from /lib64/libc.so.6

Thread 10 (Thread 0x7f16d862f700 (LWP 19189)):
#0  0x00000030d0e0b5e5 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1  0x0000003f8cc6da36 in virCondWait (c=c@entry=0x1684da0, 
    m=m@entry=0x1684d78) at util/threads-pthread.c:117
#2  0x0000003f8cc6de7b in virThreadPoolWorker (opaque=opaque@entry=0x1676090)
    at util/threadpool.c:103
#3  0x0000003f8cc6d869 in virThreadHelper (data=<optimized out>)
    at util/threads-pthread.c:161
#4  0x00000030d0e07d15 in start_thread () from /lib64/libpthread.so.0
#5  0x00000030d0af196d in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7f16d662b700 (LWP 19193)):
#0  0x00000030d0e0b5e5 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1  0x0000003f8cc6da36 in virCondWait (c=c@entry=0x1684e38, 
    m=m@entry=0x1684d78) at util/threads-pthread.c:117
#2  0x0000003f8cc6de9b in virThreadPoolWorker (opaque=opaque@entry=0x1676090)
    at util/threadpool.c:103
#3  0x0000003f8cc6d869 in virThreadHelper (data=<optimized out>)
    at util/threads-pthread.c:161
#4  0x00000030d0e07d15 in start_thread () from /lib64/libpthread.so.0
#5  0x00000030d0af196d in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f16d6e2c700 (LWP 19192)):
#0  0x00000030d0e0b5e5 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1  0x0000003f8cc6da36 in virCondWait (c=c@entry=0x1684da0, 
    m=m@entry=0x1684d78) at util/threads-pthread.c:117
#2  0x0000003f8cc6de7b in virThreadPoolWorker (opaque=opaque@entry=0x1675d80)
    at util/threadpool.c:103
#3  0x0000003f8cc6d869 in virThreadHelper (data=<optimized out>)
    at util/threads-pthread.c:161
#4  0x00000030d0e07d15 in start_thread () from /lib64/libpthread.so.0
#5  0x00000030d0af196d in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f16d762d700 (LWP 19191)):
#0  0x00000030d0e0b5e5 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1  0x0000003f8cc6da36 in virCondWait (c=c@entry=0x1684da0, 
    m=m@entry=0x1684d78) at util/threads-pthread.c:117
#2  0x0000003f8cc6de7b in virThreadPoolWorker (opaque=opaque@entry=0x1676090)
    at util/threadpool.c:103
#3  0x0000003f8cc6d869 in virThreadHelper (data=<optimized out>)
    at util/threads-pthread.c:161
#4  0x00000030d0e07d15 in start_thread () from /lib64/libpthread.so.0
#5  0x00000030d0af196d in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f16d8e30700 (LWP 19188)):
#0  0x00000030d0e0b5e5 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1  0x0000003f8cc6da36 in virCondWait (c=c@entry=0x1684da0, 
    m=m@entry=0x1684d78) at util/threads-pthread.c:117
#2  0x0000003f8cc6de7b in virThreadPoolWorker (opaque=opaque@entry=0x1675d80)
    at util/threadpool.c:103
#3  0x0000003f8cc6d869 in virThreadHelper (data=<optimized out>)
    at util/threads-pthread.c:161
#4  0x00000030d0e07d15 in start_thread () from /lib64/libpthread.so.0
#5  0x00000030d0af196d in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f16d9252840 (LWP 19187)):
#0  0x00000030d0ae8e8d in poll () from /lib64/libc.so.6
#1  0x0000003f8cc5d8fb in poll (__timeout=-1, __nfds=9, __fds=<optimized out>)
    at /usr/include/bits/poll2.h:46
#2  virEventPollRunOnce () at util/event_poll.c:615
#3  0x0000003f8cc5c687 in virEventRunDefaultImpl () at util/event.c:247
#4  0x0000003f8cd4204d in virNetServerRun (srv=0x1684bf0)
    at rpc/virnetserver.c:751
#5  0x000000000040bebd in main (argc=<optimized out>, argv=<optimized out>)
    at libvirtd.c:1332

Thread 4 (Thread 0x7f16d5e2a700 (LWP 19194)):
#0  0x00000030d0e0b5e5 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1  0x0000003f8cc6da36 in virCondWait (c=c@entry=0x1684e38, 
    m=m@entry=0x1684d78) at util/threads-pthread.c:117
#2  0x0000003f8cc6de9b in virThreadPoolWorker (opaque=opaque@entry=0x1675d80)
    at util/threadpool.c:103
#3  0x0000003f8cc6d869 in virThreadHelper (data=<optimized out>)
    at util/threads-pthread.c:161
#4  0x00000030d0e07d15 in start_thread () from /lib64/libpthread.so.0
#5  0x00000030d0af196d in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f16d7e2e700 (LWP 19190)):
#0  0x00000030d0e0b5e5 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1  0x0000003f8cc6da36 in virCondWait (c=c@entry=0x1684da0, 
    m=m@entry=0x1684d78) at util/threads-pthread.c:117
#2  0x0000003f8cc6de7b in virThreadPoolWorker (opaque=opaque@entry=0x1675d80)
    at util/threadpool.c:103
#3  0x0000003f8cc6d869 in virThreadHelper (data=<optimized out>)
    at util/threads-pthread.c:161
#4  0x00000030d0e07d15 in start_thread () from /lib64/libpthread.so.0
#5  0x00000030d0af196d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f16d4627700 (LWP 19197)):
#0  0x00000030d0e0b5e5 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1  0x0000003f8cc6da36 in virCondWait (c=c@entry=0x1684e38, 
    m=m@entry=0x1684d78) at util/threads-pthread.c:117
#2  0x0000003f8cc6de9b in virThreadPoolWorker (opaque=opaque@entry=0x1676090)
    at util/threadpool.c:103
#3  0x0000003f8cc6d869 in virThreadHelper (data=<optimized out>)
    at util/threads-pthread.c:161
#4  0x00000030d0e07d15 in start_thread () from /lib64/libpthread.so.0
#5  0x00000030d0af196d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f16d5629700 (LWP 19195)):
#0  qemuDomainObjBeginJobInternal (driver=driver@entry=0x7f16c808ae10, 
    driver_locked=driver_locked@entry=true, obj=obj@entry=0x7f16ac0038e0, 
    job=job@entry=QEMU_JOB_DESTROY, 
    asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_NONE) at qemu/qemu_domain.c:771
#1  0x00007f16d17d6aaa in qemuDomainObjBeginJobWithDriver (
    driver=driver@entry=0x7f16c808ae10, obj=obj@entry=0x7f16ac0038e0, 
    job=job@entry=QEMU_JOB_DESTROY) at qemu/qemu_domain.c:906
#2  0x00007f16d1822afc in qemuDomainDestroyFlags (dom=<optimized out>, 
    flags=<optimized out>) at qemu/qemu_driver.c:1988
#3  0x0000003f8cce8531 in virDomainDestroyFlags (
    domain=domain@entry=0x7f16c4000be0, flags=1) at libvirt.c:2253
#4  0x00000000004133ad in remoteDispatchDomainDestroyFlags (
    args=0x7f16c40008c0, rerr=0x7f16d5628c70, client=<optimized out>, 
    server=<optimized out>, msg=<optimized out>) at remote_dispatch.h:1137
#5  remoteDispatchDomainDestroyFlagsHelper (server=<optimized out>, 
    client=<optimized out>, msg=<optimized out>, rerr=0x7f16d5628c70, 
    args=0x7f16c40008c0, ret=<optimized out>) at remote_dispatch.h:1115
#6  0x0000003f8cd45686 in virNetServerProgramDispatchCall (msg=0x16ae860, 
    client=0x16ae200, server=0x1684bf0, prog=0x16a9f00)
    at rpc/virnetserverprogram.c:424
#7  virNetServerProgramDispatch (prog=0x16a9f00, 
    server=server@entry=0x1684bf0, client=0x16ae200, msg=0x16ae860)
    at rpc/virnetserverprogram.c:297
#8  0x0000003f8cd41691 in virNetServerProcessMsg (msg=<optimized out>, 
    prog=<optimized out>, client=<optimized out>, srv=0x1684bf0)
    at rpc/virnetserver.c:170
#9  virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x1684bf0)
    at rpc/virnetserver.c:191
#10 0x0000003f8cc6ddde in virThreadPoolWorker (opaque=opaque@entry=0x1676090)
    at util/threadpool.c:144
#11 0x0000003f8cc6d869 in virThreadHelper (data=<optimized out>)
    at util/threads-pthread.c:161
#12 0x00000030d0e07d15 in start_thread () from /lib64/libpthread.so.0
#13 0x00000030d0af196d in clone () from /lib64/libc.so.6

(gdb) print *priv
Cannot access memory at address 0x0

Version-Release number of selected component (if applicable):

libvirt-0.10.0-1.fc19.x86_64

How reproducible:

Unknown; it has happened once so far.

Steps to Reproduce:
1. Unknown.
Comment 1 Richard W.M. Jones 2012-09-12 09:50:48 EDT
Created attachment 612123 [details]
core file (xz compressed)
Comment 2 Richard W.M. Jones 2012-09-12 09:51:54 EDT
Sorry, I omitted the first frame, which is:

#0  qemuDomainObjBeginJobInternal (driver=driver@entry=0x7f16c808ae10, 
    driver_locked=driver_locked@entry=true, obj=obj@entry=0x7f16ac0038e0, 
    job=job@entry=QEMU_JOB_DESTROY, 
    asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_NONE) at qemu/qemu_domain.c:771
771	    priv->jobs_queued++;
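The faulting line dereferences priv, which qemuDomainObjBeginJobInternal takes from obj->privateData; the core shows that pointer is NULL. A minimal C sketch of the failure mode follows. The struct and function names here (domain_priv, domain_obj, begin_job) are illustrative stand-ins, not libvirt's real definitions; it only shows how a cleared privateData pointer turns this increment into a segfault, and what a defensive check would look like:

```c
#include <assert.h>
#include <stddef.h>

struct domain_priv { int jobs_queued; };
struct domain_obj  { struct domain_priv *privateData; };

/* Hypothetical analogue of the faulting code path: return an error
 * instead of crashing when privateData has already been cleared. */
static int begin_job(struct domain_obj *obj)
{
    struct domain_priv *priv = obj->privateData;
    if (priv == NULL)   /* in this core dump, priv was NULL */
        return -1;      /* priv->jobs_queued++ would segfault here */
    priv->jobs_queued++;
    return 0;
}
```

In the real crash the object appears to have been disposed (or its memory reused) while this thread still held a pointer to it, so a NULL check alone would mask rather than fix the underlying lifetime bug.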
Comment 3 Richard W.M. Jones 2012-09-12 10:50:31 EDT
Still occurs in libvirt-0.10.1-2.fc18.x86_64.  The
stack trace is essentially identical to the one above.
Comment 4 Richard W.M. Jones 2012-09-12 10:55:33 EDT
For 0.10.1-2:

Core was generated by `/usr/sbin/libvirtd --timeout=30'.
Program terminated with signal 11, Segmentation fault.
#0  qemuDomainObjBeginJobInternal (driver=driver@entry=0x7f11d0097d80, 
    driver_locked=driver_locked@entry=true, obj=obj@entry=0x7f11c4003700, 
    job=job@entry=QEMU_JOB_DESTROY, 
    asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_NONE) at qemu/qemu_domain.c:771
771	    priv->jobs_queued++;
(gdb) bt
#0  qemuDomainObjBeginJobInternal (driver=driver@entry=0x7f11d0097d80, 
    driver_locked=driver_locked@entry=true, obj=obj@entry=0x7f11c4003700, 
    job=job@entry=QEMU_JOB_DESTROY, 
    asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_NONE) at qemu/qemu_domain.c:771
#1  0x00007f11d91a6a9a in qemuDomainObjBeginJobWithDriver (
    driver=driver@entry=0x7f11d0097d80, obj=obj@entry=0x7f11c4003700, 
    job=job@entry=QEMU_JOB_DESTROY) at qemu/qemu_domain.c:906
#2  0x00007f11d91f2abc in qemuDomainDestroyFlags (dom=<optimized out>, 
    flags=<optimized out>) at qemu/qemu_driver.c:1960
#3  0x00007f11e1c3e591 in virDomainDestroyFlags (
    domain=domain@entry=0x7f11b8000910, flags=1) at libvirt.c:2253
#4  0x00000000004133ad in remoteDispatchDomainDestroyFlags (
    args=0x7f11b80008c0, rerr=0x7f11dd710c70, client=<optimized out>, 
    server=<optimized out>, msg=<optimized out>) at remote_dispatch.h:1137
#5  remoteDispatchDomainDestroyFlagsHelper (server=<optimized out>, 
    client=<optimized out>, msg=<optimized out>, rerr=0x7f11dd710c70, 
    args=0x7f11b80008c0, ret=<optimized out>) at remote_dispatch.h:1115
#6  0x00007f11e1c9b6e6 in virNetServerProgramDispatchCall (msg=0x2667af0, 
    client=0x2667490, server=0x263edb0, prog=0x2663f40)
    at rpc/virnetserverprogram.c:424
#7  virNetServerProgramDispatch (prog=0x2663f40, 
    server=server@entry=0x263edb0, client=0x2667490, msg=0x2667af0)
    at rpc/virnetserverprogram.c:297
#8  0x00007f11e1c976f1 in virNetServerProcessMsg (msg=<optimized out>, 
    prog=<optimized out>, client=<optimized out>, srv=0x263edb0)
    at rpc/virnetserver.c:170
#9  virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x263edb0)
    at rpc/virnetserver.c:191
#10 0x00007f11e1bc3e0e in virThreadPoolWorker (opaque=opaque@entry=0x262fd80)
    at util/threadpool.c:144
#11 0x00007f11e1bc3899 in virThreadHelper (data=<optimized out>)
    at util/threads-pthread.c:161
#12 0x00000030d0e07d15 in start_thread () from /lib64/libpthread.so.0
#13 0x00000030d0af196d in clone () from /lib64/libc.so.6

(gdb) print *obj
$1 = {
  object = {
    magic = 3288365072, 
    refs = 32529, 
    klass = 0x7f11c4000078
  }, 
  lock = {
    lock = {
      __data = {
        __lock = 0, 
        __count = 0, 
        __owner = 0, 
        __nusers = 0, 
        __kind = 0, 
        __spins = 0, 
        __list = {
          __prev = 0x0, 
          __next = 0x0
        }
      }, 
      __size = '\000' <repeats 39 times>, 
      __align = 0
    }
  }, 
  pid = 0, 
  state = {
    state = 0, 
    reason = 0
  }, 
  autostart = 0, 
  persistent = 0, 
  updated = 0, 
  def = 0x0, 
  newDef = 0x0, 
  snapshots = 0x0, 
  current_snapshot = 0x0, 
  hasManagedSave = false, 
  privateData = 0x0, 
  privateDataFreeFunc = 0x0, 
  taint = 0
}

(gdb) print/x *obj->object.klass
$4 = {
  magic = 0xc400bd50, 
  name = 0x7f11c40041a0, 
  objectSize = 0x7f11c40036f0, 
  dispose = 0x7f11c40041a0
}
Comment 5 Richard W.M. Jones 2012-09-12 10:57:17 EDT
Created attachment 612139 [details]
core file (xz compressed) for libvirt 0.10.1-2.fc18.x86_64
Comment 6 Daniel Berrange 2012-09-12 11:00:41 EDT
Hmm, the reference count looks bogus; I struggle to see any reason why it's so large:

(gdb) print *obj
$1 = {
  object = {
    magic = 3288365072, 
    refs = 32529, 
    klass = 0x7f11c4000078
  }, 

The magic value looks pretty bogus too; magic values start being allocated at 0xCAFE0000.

All round memory bogosity.
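To illustrate the sanity check being applied in the comment above: libvirt allocates object-class magic numbers upward from 0xCAFE0000, and the value seen in the core (3288365072, i.e. 0xC4007810) fails even a coarse range test, consistent with the object memory having been freed or overwritten. This is a simplified sketch of such a check, not libvirt's actual virObject validation code:

```c
#include <assert.h>
#include <stdbool.h>

/* libvirt hands out object-class magic numbers starting at 0xCAFE0000.
 * A magic field below that base suggests the object memory has been
 * freed or scribbled over. Simplified range check for illustration. */
#define VIR_OBJECT_MAGIC_BASE 0xCAFE0000u

static bool magic_looks_valid(unsigned int magic)
{
    return magic >= VIR_OBJECT_MAGIC_BASE;
}
```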
Comment 7 Richard W.M. Jones 2012-09-12 11:04:14 EDT
Possibly relevant to this is that qemu-kvm segfaults on shutdown,
and libvirtd segfaults ~ 2 seconds later.
Comment 8 Richard W.M. Jones 2012-09-12 11:04:50 EDT
(In reply to comment #7)
> Possibly relevant to this is that qemu-kvm segfaults on shutdown,
> and libvirtd segfaults ~ 2 seconds later.

The qemu segfault being bug 853408.
Comment 9 Daniel Berrange 2012-09-12 11:49:16 EDT
I tried to reproduce it simply using virsh and the same libvirt, randomly killing QEMU with SIGSEGV, but failed. Could you see if you can reproduce it while running libvirtd under valgrind?
Comment 10 Richard W.M. Jones 2012-09-17 16:57:50 EDT
I tried running the test suite while libvirtd was running
under valgrind, but libvirtd appeared to deadlock.  Quite
possibly a problem with valgrind though, so this is inconclusive.
Comment 11 Richard W.M. Jones 2012-09-17 17:11:36 EDT
I should add that libvirtd run in exactly the same way,
but without valgrind, runs fine (at the moment; I did
see this bug earlier today).  So I guess valgrind is
causing the deadlock.

Does libvirtd normally run OK under valgrind?

Ditto if you run it as non-root with
./run valgrind --log-file=/tmp/log daemon/libvirtd --timeout=30 ?
Comment 12 Daniel Berrange 2012-09-18 04:52:02 EDT
> Does libvirtd normally run OK under valgrind?

Yes, it should run normally - the only known issue is that you can't run LXC guests.

Does it deadlock immediately, or only when running specific APIs?
Comment 13 Cole Robinson 2012-10-27 14:19:35 EDT
I'm assuming this was resolved by:

commit 25f582e36a1c066b6c82303b5e4f18eec337a25b
Author: Daniel P. Berrange <berrange@redhat.com>
Date:   Wed Sep 26 15:54:58 2012 +0100

    Fix (rare) deadlock in QEMU monitor callbacks
    
    Some users report (very rarely) seeing a deadlock in the QEMU
    monitor callbacks

But move it out of POST if I'm wrong.
Comment 14 Cole Robinson 2012-10-27 16:21:54 EDT

*** This bug has been marked as a duplicate of bug 859009 ***