Bug 876572

Summary: [rhevm] host remains in 'UP' state although vdsm is not functional (getVdsCaps doesn't return) as engine use getVdsStats
Product: Red Hat Enterprise Virtualization Manager Reporter: vvyazmin <vvyazmin>
Component: ovirt-engine Assignee: Ayal Baron <abaron>
Status: CLOSED DEFERRED QA Contact: vvyazmin <vvyazmin>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.1.0 CC: bazulay, dyasny, hateya, iheim, lpeer, Rhev-m-bugs, yeylon, ykaul
Target Milestone: --- Keywords: StudentProject
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard: infra
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-11-18 15:14:07 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
## Logs vdsm, rhevm none

Description vvyazmin@redhat.com 2012-11-14 13:48:19 UTC
Created attachment 644870 [details]
## Logs vdsm, rhevm

Description of problem: VDSM reports the wrong state to the engine. The host stays in the “UP” state although libvirt is not responding: while libvirt is deadlocked, the host remains “UP”.

Version-Release number of selected component (if applicable):
RHEVM 3.1 - SI24.1

RHEVM: rhevm-3.1.0-28.el6ev.noarch
VDSM: vdsm-4.9.6-42.0.el6_3.x86_64
LIBVIRT: libvirt-0.9.10-21.el6_3.5.x86_64
QEMU & KVM: qemu-kvm-rhev-0.12.1.2-2.295.el6_3.5.x86_64
SANLOCK: sanlock-2.3-4.el6_3.x86_64

How reproducible:
100%

Steps to Reproduce:
Put libvirt into a deadlock on the HSM server
(gdb attached to the libvirt process)
  
Actual results:
libvirt enters a deadlock.
VDSM fails to respond to “vdsClient -s 0 getVdsCaps”.
The engine sends “vdsClient -s 0 getVdsStats”, gets a response, and concludes that there is no problem on the host.
VDSM reports status “UP”.
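
The observation above can be checked by hand with a small script. This is a minimal sketch, not part of the original report: it assumes vdsClient is installed on the host, and the helper names (probe, probe_vdsm_verbs) and the 30-second timeout are hypothetical choices for illustration.

```python
import subprocess


def probe(cmd, timeout_s=30.0):
    """Run a command and classify the outcome as 'ok', 'error' or 'timeout'."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # The command did not return in time (e.g. the call hangs).
        return "timeout"
    except OSError:
        # The command could not be started (e.g. binary not installed).
        return "error"
    return "ok" if result.returncode == 0 else "error"


def probe_vdsm_verbs(timeout_s=30.0):
    """Probe both verbs named in this report (assumes vdsClient is on PATH)."""
    verbs = ("getVdsStats", "getVdsCaps")
    return {
        verb: probe(["vdsClient", "-s", "0", verb], timeout_s)
        for verb in verbs
    }
```

On a host in the state described here, probe_vdsm_verbs() would be expected to report 'ok' for getVdsStats and 'timeout' for getVdsCaps.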

Expected results:
The engine should send “vdsClient -s 0 getVdsCaps” and, if the call fails, move the host to the “Non Responsive” state.
In our case only “vdsClient -s 0 getVdsStats” is sent, and it gets a response.
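
The expected behaviour can be sketched as a small decision function. This only illustrates the rule requested in this report and is not the engine's actual monitoring code; the names REQUIRED_VERBS and decide_host_state are hypothetical.

```python
# Verbs that must all answer before the host is considered healthy.
# Requiring getVdsCaps in addition to getVdsStats is the change requested here.
REQUIRED_VERBS = ("getVdsStats", "getVdsCaps")


def decide_host_state(responded):
    """Return (state, missing_verbs).

    `responded` maps a verb name to True if VDSM answered it in time.
    A verb absent from the mapping is treated as not answered.
    """
    missing = [verb for verb in REQUIRED_VERBS if not responded.get(verb, False)]
    if missing:
        return "Non Responsive", missing
    return "UP", []
```

With the results from this report (getVdsStats answered, getVdsCaps did not), the function returns "Non Responsive" and names getVdsCaps as the missing verb.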

Additional info:
Real life scenario:
1. Create iSCSI DC with 2 hosts
2. Create VM with multiple disks on multiple storage domains
3. Run VM on HSM
4. Install OS (RHEL 6.3)
5. Install RHEV Agent (Guest Agent)
6. Create a snapshot
7. Snapshot --> Preview
8. Snapshot --> Commit
9. Snapshot --> Delete snapshot 
10. Power on the VM; the OS gets stuck on boot
11. Power off fails
12. Powering on newly created VMs in the same DC fails

Comment 2 Barak 2012-11-18 15:14:07 UTC
After discussion with Yaniv.K & Miki we reached a decision:
it seems there is only one scenario in which libvirt goes into a deadlock.
If more issues come up we'll reopen this BZ.
For now we CLOSE DEFERRED.