Created attachment 434698 [details]
backtrace - libvirtd and vdsm
Description of problem:
During normal VM life-cycle operation (nothing in particular) I noticed that the host went down (vdsm service). When I tried to restart the vdsm service, I noticed that the libvirtd process had died. I then looked for a core dump, and it seems libvirtd died before vdsm.
Attached is the backtrace extracted by gdb.
I don't have particular repro steps, but I will try to reproduce.
Two problems here:
- The backtrace is only of VDSM; there is no backtrace for libvirtd, so we have no idea why libvirtd crashed.
- Some debuginfo RPMs appear to be missing - gdb isn't resolving symbols in the libvirt Python bindings, e.g. see the '??' here:
#3 0x0000003707275736 in malloc_printerr (action=3, str=0x3707343ae8 "munmap_chunk(): invalid pointer", ptr=<value optimized out>) at malloc.c:6283
buf = "0000000002063894"
cp = <value optimized out>
#4 0x00007f1990636433 in ?? () from /usr/lib64/python2.6/site-packages/libvirtmod.so
No symbol table info available.
#5 0x0000003709edeb01 in call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:3750
Can you make sure libvirt-debuginfo is installed when generating the backtrace of VDSM? Also, can you provide a backtrace for libvirtd?
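For reference, a rough sketch of how to pull in the debuginfo and regenerate the backtraces (the exact package names and the debuginfo repos being enabled are assumptions about this RHEL 6 setup):

```shell
# Install debuginfo so gdb can resolve symbols in libvirtmod.so
# (requires the *-debuginfo yum repos to be enabled; package names assumed)
debuginfo-install -y libvirt libvirt-python python

# Full backtrace of the running libvirtd, all threads
gdb -p "$(pidof libvirtd)" -batch -ex 'thread apply all bt full' > libvirtd-bt.txt

# Same idea against a core file instead of a live pid:
# gdb /usr/sbin/libvirtd /path/to/core -batch -ex 'thread apply all bt full'
```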
This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release.
** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **
Reproduced once again:
1) running with 3 hosts (several guests on them)
2) access the storage server via ssh and add an iptables rule to block communication
to the host that runs the SPM (Storage Pool Manager, the vdsm term for the logical
role, owned by one of the hosts, that manages all storage actions to prevent data
corruption)
3) the vdsm and libvirtd processes die 40 seconds later
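For step 2, the blocking rule looks roughly like this (the SPM host address below is a placeholder, not the real one from my setup):

```shell
# On the storage server: drop all traffic from/to the SPM host
# (10.0.0.5 is a placeholder for the SPM host's address)
iptables -I INPUT  -s 10.0.0.5 -j DROP
iptables -I OUTPUT -d 10.0.0.5 -j DROP

# To undo once done:
# iptables -D INPUT  -s 10.0.0.5 -j DROP
# iptables -D OUTPUT -d 10.0.0.5 -j DROP
```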
I will attach gdb to the libvirtd process and provide more info.
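One way to catch the crash in the act is to attach gdb ahead of time and let it stop on the fatal signal (a sketch; the signal handling shown is a typical setup, not something confirmed for this bug):

```shell
# Attach to the running libvirtd before triggering the repro
gdb -p "$(pidof libvirtd)"
# Inside gdb:
#   (gdb) handle SIGPIPE nostop noprint   # don't stop on benign signals
#   (gdb) continue
# When libvirtd crashes, gdb stops on the fatal signal; then:
#   (gdb) thread apply all bt full
```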
> 2) access the storage server via ssh and add an iptables rule to block communication
> to the host that runs the SPM (Storage Pool Manager, the vdsm term for the logical
> role, owned by one of the hosts, that manages all storage actions to prevent data
> corruption)
So IIUC, you are fencing block I/O from the guests. This should be generating block I/O error notifications from qemu to libvirt to vdsm.
Thank you for your bug report. This issue was evaluated for inclusion
in the current release of Red Hat Enterprise Linux. Unfortunately, we
are unable to address this request in the current release. Because we
are in the final stage of Red Hat Enterprise Linux 6 development, only
significant, release-blocking issues involving serious regressions and
data corruption can be considered.
If you believe this issue meets the release blocking criteria as
defined and communicated to you by your Red Hat Support representative,
please ask your representative to file this issue as a blocker for the
current release. Otherwise, ask that it be evaluated for inclusion in
the next minor release of Red Hat Enterprise Linux.
Haim, have you seen this crash again recently?
No. It's hard to reproduce, and libvirtd doesn't dump cores by default, so it's also hard to catch. If I hit it again, I will reopen.
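In case it helps next time, core dumps for libvirtd can be enabled ahead of time. A sketch for RHEL 6 (the DAEMON_COREFILE_LIMIT variable is read by the stock init-script functions; the core_pattern path is my choice, not a site convention):

```shell
# Allow unlimited core files for the daemon started by the init script
echo 'DAEMON_COREFILE_LIMIT="unlimited"' >> /etc/sysconfig/libvirtd
service libvirtd restart

# Put cores somewhere findable, named by executable and pid
mkdir -p /var/crash
echo '/var/crash/core.%e.%p' > /proc/sys/kernel/core_pattern
```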