Red Hat Bugzilla – Bug 876572
[rhevm] host remains in 'UP' state although vdsm is not functional (getVdsCaps doesn't return) as engine use getVdsStats
Last modified: 2016-02-10 14:40:01 EST
Created attachment 644870
Logs: vdsm, rhevm
Description of problem: VDSM reports the wrong state to the engine. The host stays in the “UP” state although libvirt is not responding. While libvirt is deadlocked, the host still stays in the “UP” state.
Version-Release number of selected component (if applicable):
RHEVM 3.1 - SI24.1
QEMU & KVM: qemu-kvm-rhev-0.12.1.2-2.295.el6_3.5.x86_64
Steps to Reproduce:
Put libvirt into a deadlock on the HSM server:
Attach gdb to the libvirt process.
Libvirt enters a deadlock.
VDSM fails to respond to “vdsClient -s 0 getVdsCaps”.
The engine sends the “vdsClient -s 0 getVdsStats” command, gets a response, and concludes that there is no problem on the host.
VDSM reports status “UP”.
The engine needs to send the “vdsClient -s 0 getVdsCaps” command and, if no response arrives, move the host to the “Non Responsive” state.
In our case only the “vdsClient -s 0 getVdsStats” command is sent, and it gets a response.
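The check requested above can be sketched roughly as follows. This is only an illustration in Python, not the engine's actual code: the function name probe_host, the timeout values, and the returned status strings are assumptions made for this example. The idea is to run a probe command with a deadline and treat a hang, a missing binary, or a non-zero exit as "Non Responsive".

```python
import subprocess
import sys


def probe_host(cmd, timeout_s):
    """Run cmd and return "Up" if it exits 0 within timeout_s seconds.

    Hypothetical helper for illustration only. A timeout, a command that
    cannot be started, or a non-zero exit code all map to "Non Responsive".
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # the child is killed when the deadline passes
        )
    except (subprocess.TimeoutExpired, OSError):
        return "Non Responsive"
    return "Up" if result.returncode == 0 else "Non Responsive"


if __name__ == "__main__":
    # Stand-in for a hung getVdsCaps call: a child that sleeps past the deadline.
    hung = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(probe_host(hung, timeout_s=0.5))
```

On a host, the same helper could be pointed at the command from this report, for example probe_host(["vdsClient", "-s", "0", "getVdsCaps"], timeout_s=30), where the 30 second limit is an arbitrary choice for the sketch.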
Real life scenario:
1. Create iSCSI DC with 2 hosts
2. Create VM with multiple disks on multiple storage domains
3. Run VM on HSM
4. Install OS (RHEL 6.3)
5. Install RHEV Agent (Guest Agent)
6. Create a snapshot
7. Snapshot --> Preview
8. Snapshot --> Commit
9. Snapshot --> Delete snapshot
10. Power on the VM; the OS gets stuck on boot
11. Power off fails
12. Powering on newly created VMs in the same DC fails
After discussion with Yaniv.K & Miki, we reached the following decision:
It seems that there is only one scenario in which libvirt goes into a deadlock.
If more issues come up, we'll reopen this BZ.
For now we close this as CLOSED DEFERRED.