Bug 618663 - libvirtd daemon died (caused vdsm to die as well)
Summary: libvirtd daemon died (caused vdsm to die as well)
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.0
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Daniel Veillard
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Keywords: RHELNAK
Depends On:
Blocks: 581275
 
Reported: 2010-07-27 13:54 UTC by Haim
Modified: 2014-01-13 00:46 UTC
CC List: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-11-18 15:59:32 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
backtrace - libvirtd and vdsm (109.47 KB, text/plain)
2010-07-27 13:54 UTC, Haim
no flags

Description Haim 2010-07-27 13:54:22 UTC
Created attachment 434698
backtrace - libvirtd and vdsm

Description of problem:

During routine VM life-cycle operations (nothing in particular), I noticed that the host went down (the vdsm service). When I tried to restart the vdsm service, I noticed that the libvirtd process had died as well. I then looked for a core dump, and it appears libvirtd died before vdsm.

Attached is the backtrace extracted by gdb.

I don't have specific reproduction steps, but I will try to reproduce.

libvirt-0.8.1-17.el6.x86_64
vdsm-4.9-10.el6.x86_64
qemu-kvm-0.12.1.2-2.97.el6.x86_64
kernel-2.6.32-44.el6.x86_64

Comment 2 Daniel Berrange 2010-07-27 14:04:41 UTC
There are two problems here:

 - The backtrace is only of VDSM; there is no backtrace for libvirtd, so we have no idea why libvirtd crashed.

 - Some debuginfo RPMs appear to be missing - it isn't resolving symbols in the libvirt Python module, e.g. see the '??' here:

#3  0x0000003707275736 in malloc_printerr (action=3, str=0x3707343ae8 "munmap_chunk(): invalid pointer", ptr=<value optimized out>) at malloc.c:6283
        buf = "0000000002063894"
        cp = <value optimized out>
#4  0x00007f1990636433 in ?? () from /usr/lib64/python2.6/site-packages/libvirtmod.so
No symbol table info available.
#5  0x0000003709edeb01 in call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:3750


Can you make sure libvirt-debuginfo is installed when generating the backtrace of VDSM? Also, can you provide a backtrace for libvirtd?
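
For reference, a minimal sketch of capturing both traces on RHEL 6 (assuming the debuginfo repositories are enabled; the output paths are illustrative):

# Install the debug symbols needed to resolve the '??' frames.
debuginfo-install -y libvirt libvirt-python python

# Attach gdb to the running libvirtd and dump a full backtrace of every thread.
gdb -p $(pidof libvirtd) -batch \
    -ex 'set pagination off' \
    -ex 'thread apply all bt full' > /tmp/libvirtd-backtrace.txt

# If libvirtd has already crashed and left a core file, analyse that instead.
gdb /usr/sbin/libvirtd /path/to/core -batch \
    -ex 'thread apply all bt full' > /tmp/libvirtd-core-backtrace.txt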

Comment 3 RHEL Product and Program Management 2010-07-27 14:17:56 UTC
This issue has been proposed at a time when we are only considering
blocker issues in the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **

Comment 4 Haim 2010-07-27 14:38:41 UTC
Reproduced once again:

1) running with 3 hosts (several guests on them)
2) access the storage server via ssh and add an iptables rule to block communication
   to the host that runs the SPM (Storage Pool Manager - the vdsm term for a logical
   role owned by one of the hosts, which manages all storage actions to prevent
   data corruption); the kind of rule used is sketched at the end of this comment.
3) the vdsm process and libvirtd die 40 seconds later.

I will attach gdb to the libvirtd process and provide more info.
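
A minimal sketch of the fencing rule described in step 2 (run on the storage server; 192.0.2.10 is a placeholder for the SPM host's address, not a value from this setup):

# Drop all traffic from the SPM host to simulate a storage outage.
iptables -I INPUT -s 192.0.2.10 -j DROP

# Remove the rule once the test is done.
iptables -D INPUT -s 192.0.2.10 -j DROP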

Comment 6 Daniel Berrange 2010-07-27 14:57:10 UTC
> 2) access the storage server via ssh and add an iptables rule to block communication
>    to the host that runs the SPM (Storage Pool Manager - the vdsm term for a logical
>    role owned by one of the hosts, which manages all storage actions to prevent
>    data corruption).

So IIUC, you are fencing block I/O from the guests. This should generate block I/O error notifications from qemu to libvirt to vdsm.
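
One way to check whether those notifications are actually surfacing (a sketch, assuming the default RHEL 6 libvirt log layout; 'myguest' is a placeholder domain name):

# Guests that hit a block I/O error are typically paused, depending on the
# configured error policy; a quick state check:
virsh list --all

# The per-domain qemu log may also record the failing block operations.
tail -n 50 /var/log/libvirt/qemu/myguest.log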

Comment 9 RHEL Product and Program Management 2010-08-18 21:18:51 UTC
Thank you for your bug report. This issue was evaluated for inclusion
in the current release of Red Hat Enterprise Linux. Unfortunately, we
are unable to address this request in the current release. Because we
are in the final stage of Red Hat Enterprise Linux 6 development, only
significant, release-blocking issues involving serious regressions and
data corruption can be considered.

If you believe this issue meets the release blocking criteria as
defined and communicated to you by your Red Hat Support representative,
please ask your representative to file this issue as a blocker for the
current release. Otherwise, ask that it be evaluated for inclusion in
the next minor release of Red Hat Enterprise Linux.

Comment 10 Dave Allan 2010-11-08 22:04:02 UTC
Haim, have you seen this crash again recently?

Comment 11 Haim 2010-11-18 15:59:32 UTC
No. It's hard to reproduce, and libvirtd doesn't dump cores by default, so it's also hard to catch. If I hit it again, I will reopen.
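
For the record, a sketch of enabling core dumps for libvirtd on RHEL 6 so a future crash leaves something to analyse (DAEMON_COREFILE_LIMIT is honoured by the stock init scripts; the core_pattern path is illustrative):

# Let the init script raise libvirtd's core file limit.
echo 'DAEMON_COREFILE_LIMIT=unlimited' >> /etc/sysconfig/libvirtd

# Write cores to a known location (illustrative pattern).
mkdir -p /var/crash
echo '/var/crash/core-%e-%p' > /proc/sys/kernel/core_pattern

# Restart the daemon so the new limit takes effect.
service libvirtd restart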

