Bug 876572 - [rhevm] host remains in 'UP' state although vdsm is not functional (getVdsCaps doesn't return) as engine use getVdsStats
Status: CLOSED DEFERRED
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.1.0
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Assigned To: Ayal Baron
vvyazmin@redhat.com
infra
: StudentProject
Depends On:
Blocks:
Reported: 2012-11-14 08:48 EST by vvyazmin@redhat.com
Modified: 2016-02-10 14:40 EST (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-11-18 10:14:07 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
## Logs vdsm, rhevm (1.18 MB, application/x-gzip)
2012-11-14 08:48 EST, vvyazmin@redhat.com

Description vvyazmin@redhat.com 2012-11-14 08:48:19 EST
Created attachment 644870
## Logs vdsm, rhevm

Description of problem: VDSM reports the wrong state to the engine. While libvirt is deadlocked and not responding, the host stays in the “UP” state.

Version-Release number of selected component (if applicable):
RHEVM 3.1 - SI24.1

RHEVM: rhevm-3.1.0-28.el6ev.noarch
VDSM: vdsm-4.9.6-42.0.el6_3.x86_64
LIBVIRT: libvirt-0.9.10-21.el6_3.5.x86_64
QEMU & KVM: qemu-kvm-rhev-0.12.1.2-2.295.el6_3.5.x86_64
SANLOCK: sanlock-2.3-4.el6_3.x86_64

How reproducible:
100%

Steps to Reproduce:
Put libvirt into a deadlock on the HSM server.
Attach gdb to the libvirt process.
  
Actual results:
Libvirt enters a deadlock.
VDSM fails to respond to “vdsClient -s 0 getVdsCaps”.
The engine sends the “vdsClient -s 0 getVdsStats” command, gets a response, and concludes that there is no problem on the host.
VDSM reports status “UP”.

Expected results:
The engine should send the “vdsClient -s 0 getVdsCaps” command and, if there is no response, move the host to the “Non Responsive” state.
In our case only the “vdsClient -s 0 getVdsStats” command is sent, and it gets a response.
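
To illustrate the expected behaviour, here is a minimal Python sketch of a health check that probes more than one verb with a timeout. It is not the engine's actual monitoring code; the function names `responds_within` and `host_status`, the timeout values, and the callables passed in are all hypothetical stand-ins for the real getVdsStats / getVdsCaps calls.

```python
import threading
from typing import Callable, Dict


def responds_within(call: Callable[[], object], timeout_s: float) -> bool:
    """Return True if call() finishes without raising within timeout_s seconds."""
    outcome = {}

    def runner() -> None:
        try:
            call()
            outcome["ok"] = True
        except Exception:
            outcome["ok"] = False

    # Daemon thread, so a verb that never returns cannot block interpreter exit.
    worker = threading.Thread(target=runner, daemon=True)
    worker.start()
    worker.join(timeout_s)
    # A worker that is still alive after the join means the verb hung.
    return (not worker.is_alive()) and outcome.get("ok", False)


def host_status(verbs: Dict[str, Callable[[], object]], timeout_s: float = 5.0) -> str:
    """Hypothetical policy: the host is "UP" only if every probed verb answers."""
    for call in verbs.values():
        if not responds_within(call, timeout_s):
            return "Non Responsive"
    return "UP"
```

With only a getVdsStats stand-in in `verbs`, a host whose getVdsCaps hangs would still be reported as “UP”, which matches the actual results above; adding a getVdsCaps stand-in to `verbs` makes the same host come out as “Non Responsive”.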

Additional info:
Real life scenario:
1. Create iSCSI DC with 2 hosts
2. Create VM with multiple disks on multiple storage domains
3. Run VM on HSM
4. Install OS (RHEL 6.3)
5. Install RHEV Agent (Guest Agent)
6. Create a snapshot
7. Snapshot --> Preview
8. Snapshot --> Commit
9. Snapshot --> Delete snapshot 
10. Power on the VM; the OS gets stuck on boot
11. Power off fails
12. Powering on newly created VMs in the same DC fails
Comment 2 Barak 2012-11-18 10:14:07 EST
After discussion with Yaniv.K & Miki we reached a decision:
it seems that there is only one scenario in which libvirt goes into a deadlock.
If more issues come up we'll reopen this BZ.
For now we CLOSE DEFERRED.
