Red Hat Bugzilla – Bug 973213
RHEV-M shows VM state UP even after host is down
Last modified: 2015-09-22 09:09 EDT
Description of problem: In a RHEV 3.1 environment, if a host on which a VM is running crashes, the virtual machine state in RHEV-M is still shown as "UP". RHEV-M detects the host failure and moves the host to the "non-responsive" state, but the VM state is still shown as "UP". If RHEV-M is aware that the host is not accessible, then the virtual machine state should be changed accordingly, because the VM is no longer running.
Version-Release number of selected component (if applicable): RHEV-M 3.1
How reproducible: Always
Steps to Reproduce:
1. Pull the power cable of the host on which the VM is running.
Actual results: Host goes into non-operational state, but the VM is still in "UP" state even though the virtual machine is not running. In this case, no other host can start the VM, because RHEV-M rejects the VM start operation.
Expected results: RHEV-M should show the VM state as down and allow another host to start the VM. Additionally, RHEV-M should provide a way to forcefully clear the VM state and start the VM on any other host. Configuring power management does not solve the problem in all situations. If the power management device is also down along with the host, the VM will remain in down state and, in the worst case, in an unrecoverable state.
We opened a case for this (case #00823899), but it was closed. Logs and details are attached in the case history.
This bugzilla is filed to get this issue fixed.
RHEV-M cannot determine in this case whether the host is really down or whether it has only lost connectivity to the host.
RHEV-M needs to know the host is "really down". This can happen automatically, by RHEV-M fencing the host, or manually, by the admin confirming to the engine that the host is down. Only then can RHEV-M safely release the resources associated with it.
Does this (manual fencing by admin/REST API) cover your use case?
A future item is planned around monitoring, via another host, whether the host is maintaining its sanlock lease, which would make it possible to determine whether it is down or not.
feel free to reopen once there are answers
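For readers following the thread, the rule described in the engine comment above (a VM on an unreachable host is only treated as down once the host has been fenced or an admin has confirmed it is down) can be sketched roughly as below. This is an illustrative model only, not RHEV-M code: the names HostState, VmState and resolve_vm_state are hypothetical, and the model assumes the VM was last seen "UP" on the host.

```python
from enum import Enum


class HostState(Enum):
    UP = "up"
    NON_RESPONSIVE = "non-responsive"


class VmState(Enum):
    UP = "up"
    DOWN = "down"


def resolve_vm_state(host_state, fenced_ok=False, admin_confirmed_down=False):
    """Hypothetical model: state the engine may report for a VM last seen UP.

    A non-responsive host may have crashed or may only be unreachable over
    the network, so the VM is kept at its last known state until there is
    proof that the host is really down.
    """
    if host_state is HostState.UP:
        return VmState.UP
    # Proof that the host is down: a successful fence operation, or an
    # explicit confirmation by the admin (manual fencing).
    if fenced_ok or admin_confirmed_down:
        return VmState.DOWN
    # No proof: releasing the VM here could let it run on two hosts at once.
    return VmState.UP
```

In this sketch the reporter's scenario (host and power management device both down) corresponds to fenced_ok=False, so the VM stays "UP" until admin_confirmed_down=True is supplied.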