Bug 676770

Summary: libvirt daemon killed with segmentation fault about 30mins after starting libvirtd
Product: [Community] Virtualization Tools
Component: libvirt
Reporter: Steve Yong <akayong>
Assignee: Daniel Veillard <veillard>
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: unspecified
Version: unspecified
CC: crobinso, xen-maint
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2011-02-20 16:25:26 UTC
Attachments:
Core files from when libvirtd died (flags: none)

Description Steve Yong 2011-02-11 04:52:18 UTC
Created attachment 478179
Core files from when libvirtd died.

Description of problem:
The libvirtd daemon is killed by a segmentation fault about 30 minutes after it is started.

Version-Release number of selected component (if applicable):
OS : CentOS 5.3
Kernel : 2.6.27.29-0.1.1
Arch : x86_64
xen : 3.4.2

daemon : libvirtd 0.8.7

library:
libdevmapper 1.02
libhal 1.0.0
libdbus-1 3.4.0
libaudit 0.0.0
libnuma 1
libgnutls 13.0.6
libcrypt 11.2.3
libsasl2 2.0.22
libxenstore 3.0.0
libxml2 2.7.6
libavahi-common 3.4.3
libavahi-client 3.2.1
libpthread 2.5
libc 2.5
libselinux 1
libsepol 1
libcap 1.10
ld 2.5
libz 1.2.3
libgpg-error 0.3.0
libnsl 2.5
libdl 2.5
libresolv 2.5
libcrypt 2.5
libm 2.5

How reproducible:
Unknown; I have no specific steps.
I start libvirtd (service libvirtd start), and
after about 30 minutes it dies with a segmentation fault.

Additional info: 
I examined the core dump file.
The gdb output is shown below.

(gdb) where
#0  0x0000000000416b9b in virNodeDeviceDefFree ()
#1  0x0000000000416e3a in virNodeDeviceDefFree ()
#2  0x000000000041aa24 in virNodeDeviceDefFree ()
#3  0x00007f43dbf0873d in start_thread () from /lib64/libpthread.so.0
#4  0x00007f43dbc7ef6d in clone () from /lib64/libc.so.6
(gdb) thread apply all bt

Thread 7 (Thread 11811):
#0  0x00007f43dbf0cee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f43ddc1b4f6 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x000000000041c9bd in virNodeDeviceDefFree ()
#3  0x00007f43dbf0873d in start_thread () from /lib64/libpthread.so.0
#4  0x00007f43dbc7ef6d in clone () from /lib64/libc.so.6

Thread 6 (Thread 3648):
#0  0x00007f43dbf0cee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f43ddc1b4f6 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x000000000041c9bd in virNodeDeviceDefFree ()
#3  0x00007f43dbf0873d in start_thread () from /lib64/libpthread.so.0
#4  0x00007f43dbc7ef6d in clone () from /lib64/libc.so.6

Thread 5 (Thread 2931):
#0  0x00007f43dbf09b35 in pthread_join () from /lib64/libpthread.so.0
#1  0x000000000041dcc2 in virNodeDeviceDefFree ()
#2  0x00007f43dbbc8994 in __libc_start_main () from /lib64/libc.so.6
#3  0x00000000004164e9 in virNodeDeviceDefFree ()
#4  0x00007fff8e5fe5c8 in ?? ()
#5  0x0000000000000000 in ?? ()

Thread 4 (Thread 3736):
#0  0x00007f43dbf0cee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f43ddc1b4f6 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x000000000041c9bd in virNodeDeviceDefFree ()
#3  0x00007f43dbf0873d in start_thread () from /lib64/libpthread.so.0
#4  0x00007f43dbc7ef6d in clone () from /lib64/libc.so.6

Thread 3 (Thread 11801):
#0  0x00007f43dbf0cee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f43ddc1b4f6 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x000000000041c9bd in virNodeDeviceDefFree ()
#3  0x00007f43dbf0873d in start_thread () from /lib64/libpthread.so.0
#4  0x00007f43dbc7ef6d in clone () from /lib64/libc.so.6

Thread 2 (Thread 11796):
#0  0x00007f43dbf0cee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f43ddc1b4f6 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x000000000041c9bd in virNodeDeviceDefFree ()
#3  0x00007f43dbf0873d in start_thread () from /lib64/libpthread.so.0
#4  0x00007f43dbc7ef6d in clone () from /lib64/libc.so.6

Thread 1 (Thread 2932):
#0  0x0000000000416b9b in virNodeDeviceDefFree ()
#1  0x0000000000416e3a in virNodeDeviceDefFree ()
#2  0x000000000041aa24 in virNodeDeviceDefFree ()
#3  0x00007f43dbf0873d in start_thread () from /lib64/libpthread.so.0
#4  0x00007f43dbc7ef6d in clone () from /lib64/libc.so.6
(gdb)



The core files are attached.

Comment 1 Steve Yong 2011-02-18 07:19:08 UTC
I debugged the core file with gdb;
the results follow.




(gdb) frame 0
#0  0x000000000041854b in virEventCleanupHandles () at event.c:528
528     in event.c
(gdb) p i
$1 = 0
(gdb) p eventLoop
$2 = {lock = {lock = {__data = {__lock = 1, __count = 0, __owner = 19517, __nusers = 1, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
      __size = "\001\000\000\000\000\000\000\000=L\000\000\001", '\000' <repeats 26 times>, __align = 1}}, running = 1, leader = {thread = 1096337728}, wakeupfd = {3, 5},
  handlesCount = 9, handlesAlloc = 0, handles = 0x0, timeoutsCount = 2, timeoutsAlloc = 10, timeouts = 0x7059c0}
(gdb) p eventLoop.handles
$3 = (struct virEventHandle *) 0x0
(gdb) p eventLoop.handlesCount
$4 = 9
(gdb) p eventLoop.handles[0]
Cannot access memory at address 0x0
(gdb) p eventLoop.handles
$5 = (struct virEventHandle *) 0x0
(gdb) p *eventLoop.handles
Cannot access memory at address 0x0
(gdb) p eventLoop.handlesAlloc
$6 = 0
(gdb)



Something looks abnormal
in the virEventCleanupHandles function in event.c:
==================================================================
static int virEventCleanupHandles(void) {
    int i;
    DEBUG("Cleanup %zu", eventLoop.handlesCount);

    /* Remove deleted entries, shuffling down remaining
     * entries as needed to form contiguous series
     */
    for (i = 0 ; i < eventLoop.handlesCount ; ) {
        if (!eventLoop.handles[i].deleted) {
            i++;
            continue;
        }

        if (eventLoop.handles[i].ff)
            (eventLoop.handles[i].ff)(eventLoop.handles[i].opaque);

        if ((i+1) < eventLoop.handlesCount) {
            memmove(eventLoop.handles+i,
                    eventLoop.handles+i+1,
                    sizeof(struct virEventHandle)*(eventLoop.handlesCount-(i+1)));
        }
        eventLoop.handlesCount--;
    }

    /* Release some memory if we've got a big chunk free */
    if ((eventLoop.handlesAlloc - EVENT_ALLOC_EXTENT) > eventLoop.handlesCount) {
        EVENT_DEBUG("Releasing %zu out of %zu handles slots used, releasing %d",
                   eventLoop.handlesCount, eventLoop.handlesAlloc, EVENT_ALLOC_EXTENT);
        VIR_SHRINK_N(eventLoop.handles, eventLoop.handlesAlloc,
                     EVENT_ALLOC_EXTENT);
    }
    return 0;
}

====================================================================

At the first if statement,
eventLoop.handles is NULL but eventLoop.handlesCount is 9, so reading eventLoop.handles[i] with i = 0 accesses address 0x0.

Also,
when I checked eventLoop, it showed:
handlesCount = 9, handlesAlloc = 0, handles = 0x0, timeoutsCount = 2, timeoutsAlloc = 10

How can handlesCount be 9 when handlesAlloc is 0?
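
To make the inconsistency concrete, here is a minimal sketch of the invariant that the dumped values break. It is not libvirt code: the struct handleTable type and both function names are hypothetical, and only the three fields printed by gdb are modelled. The assumption is that a sane table never has more entries in use than allocated, and never has entries in use with a NULL array.

```c
#include <assert.h>   /* for assert() in callers that want a hard check */
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the handle bookkeeping;
 * it is NOT the real eventLoop structure from event.c. */
struct handleTable {
    size_t handlesCount;   /* entries currently in use */
    size_t handlesAlloc;   /* entries allocated in the array */
    void  *handles;        /* backing array, NULL if nothing is allocated */
};

/* Returns true when the bookkeeping is self-consistent:
 * the used count does not exceed the allocation, and a
 * non-empty table has a backing array. */
bool handleTableIsConsistent(const struct handleTable *t)
{
    if (t->handlesCount > t->handlesAlloc)
        return false;
    if (t->handlesCount > 0 && t->handles == NULL)
        return false;
    return true;
}

/* Example guard a cleanup routine could run before indexing the
 * array: returns -1 instead of walking a NULL or undersized array,
 * 0 when it is safe to proceed. */
int handleTableCleanupGuarded(const struct handleTable *t)
{
    if (!handleTableIsConsistent(t))
        return -1;
    return 0;
}
```

With the values from the core dump (handlesCount = 9, handlesAlloc = 0, handles = NULL) the check fails on the first condition, which suggests the count and the allocation were updated out of step somewhere before the cleanup ran. This sketch only detects the state; it does not explain how it was reached.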

Comment 2 Steve Yong 2011-02-20 16:25:26 UTC
This bug is fixed in 0.8.8.
Thanks anyway.