Created attachment 709977 [details]
virtual machine tab

Description of problem:
=======================
VM status stays green (Up) after the hypervisor is brought down, which causes the Data centre to become non-responsive.

Version-Release number of selected component (if applicable):
=============================================================

RHEVH:
======
Red Hat Enterprise Virtualization Hypervisor release 6.4 (20130306.2.el6_4)

Setup:
======
1. rhs-client43.lab.eng.blr.redhat.com (RHEV-H PosixFS datacenter)
2. rhs-client17.lab.eng.blr.redhat.com
3. rhs-client18.lab.eng.blr.redhat.com

Setup Involves:
1. client43: Installed rhev-h
2. Created DataCentre and added hypervisor
3. Formatted client17 and 18 with rhel 6.4
4. Created 2 RHS VMs (10.70.37.147 and 10.70.37.219), one on client17 and one on client18
5. Created a 1*2 volume from 10.70.37.147 and 10.70.37.219
6. Added the above volume to the DC
7. Rhevm: mario.lab.eng.blr.redhat.com

Steps Carried:
===============
1. Powered off the hypervisor
2. Data centre went to non-responsive state. Storage domain was in unknown state.
3. The VMs were not accessible, but the VM status was still green on the RHEVM virtual machines tab.

Actual results:
===============
VM status is Up though the VMs are not accessible

Expected results:
=================
VM status should go to paused state
Created attachment 709978 [details] Data centre
the VMs should go to pause state only on EIO issue. are you sure they tried to perform any write activity?
(In reply to comment #3)
> the VMs should go to pause state only on EIO issue. are you sure they tried
> to perform any write activity?

they should go to unknown state, adding additional question:

1) could be related to UI refresh time? we had similar issue with data-center status.
2) did eventually they turned to unknown?
(In reply to comment #4)
> (In reply to comment #3)
> > the VMs should go to pause state only on EIO issue. are you sure they tried
> > to perform any write activity?
>
> they should go to unknown state, adding additional question:
>
> 1) could be related to UI refresh time? we had similar issue with
> data-center status.
> 2) did eventually they turned to unknown?

The host was shut down while it had running VMs, and the VM statuses did not update properly. This has nothing to do with storage.
The VMs are not and should not be in PAUSED, as they are no longer running at all (the processes are dead since the host was shut down).
They should indeed be in UNKNOWN state, as Haim stated above, since the engine cannot know whether it lost communication with the host or the host shut down.
(In reply to comment #4)
> (In reply to comment #3)
> > the VMs should go to pause state only on EIO issue. are you sure they tried
> > to perform any write activity?
>
> they should go to unknown state, adding additional question:
>
> 1) could be related to UI refresh time? we had similar issue with
> data-center status.
> 2) did eventually they turned to unknown?

They did not turn to unknown state. I also tried refreshing the UI, and signed out and logged in again. The VMs were always green (Up) in status.
When I stop the VDSM, the communication is not available...

2013-04-08 10:29:20,903 WARN [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler_Worker-3) [6c279b77] Failed to refresh VDS , vds = 6c8ab668-3cab-4af4-a111-4b02dc694393 : XXXXXX.redhat.com, VDS Network Error, continuing.
java.net.ConnectException: Connection refused
...

2 possible log items detected after a while:

- VdsNotRespondingTreatmentCommand (handles migrating VMs in current code only and sets them to Unknown status)

2013-04-08 10:30:18,783 INFO [org.ovirt.engine.core.bll.VdsEventListener] (pool-10-thread-49) ResourceManager::vdsNotResponding entered for Host 6c8ab668-3cab-4af4-a111-4b02dc694393, 10.34.63.178
2013-04-08 10:30:18,976 WARN [org.ovirt.engine.core.bll.VdsNotRespondingTreatmentCommand] (pool-10-thread-49) [3de1b0d2] CanDoAction of action VdsNotRespondingTreatment failed. Reasons:VDS_FENCE_DISABLED

- VdsManager (different test)

2013-04-08 11:07:46,843 ERROR [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler_Worker-10) Server failed to respond, vds_id = 6c8ab668-3cab-4af4-a111-4b02dc694393, vds_name = XXXXXX.redhat.com, error = java.net.ConnectException: Connection refused
2013-04-08 11:07:46,892 INFO [org.ovirt.engine.core.bll.VdsEventListener] (pool-10-thread-49) ResourceManager::vdsNotResponding entered for Host 6c8ab668-3cab-4af4-a111-4b02dc694393, 10.34.63.178

Proposed solution:
- when the host is set to NON-RESPONSIVE after a timeout, all running VMs on the hosts are set to UNKNOWN status (class VdsManager) and a new message about VM status transition to Unknown is inserted into audit log
- when the VDSM is available again, the host is Up after a short while and a new message is logged into audit log - 'VM {} status is restored to {}' (class: VdsUpdateRunTimeInfo)
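
For illustration only, the proposed status transitions could be sketched roughly as below. This is a minimal Python model, not the engine's Java code: the Host and Vm classes, the method names, and the wording of the first audit message are assumptions made up for this sketch. Only the 'VM {} status is restored to {}' message text comes from the proposal above.

```python
from dataclasses import dataclass, field
from enum import Enum


class VmStatus(Enum):
    UP = "Up"
    UNKNOWN = "Unknown"


class HostStatus(Enum):
    UP = "Up"
    NON_RESPONSIVE = "NonResponsive"


@dataclass
class Vm:
    name: str
    status: VmStatus = VmStatus.UP


@dataclass
class Host:
    # Hypothetical model of a host; the real engine classes differ.
    name: str
    status: HostStatus = HostStatus.UP
    vms: list = field(default_factory=list)
    audit_log: list = field(default_factory=list)

    def mark_non_responsive(self):
        """Host timed out: move every VM on it to Unknown and audit each change."""
        self.status = HostStatus.NON_RESPONSIVE
        for vm in self.vms:
            if vm.status is not VmStatus.UNKNOWN:
                vm.status = VmStatus.UNKNOWN
                # Assumed message wording, not taken from the engine.
                self.audit_log.append(f"VM {vm.name} was set to the Unknown status")

    def mark_up(self, reported):
        """Host is reachable again: restore statuses it reports (name -> VmStatus)."""
        self.status = HostStatus.UP
        for vm in self.vms:
            if vm.status is VmStatus.UNKNOWN and vm.name in reported:
                vm.status = reported[vm.name]
                self.audit_log.append(
                    f"VM {vm.name} status is restored to {vm.status.value}"
                )


host = Host("host1", vms=[Vm("vm1"), Vm("vm2")])
host.mark_non_responsive()           # both VMs become Unknown
host.mark_up({"vm1": VmStatus.UP})   # only vm1 is reported, so only vm1 is restored
```

In this sketch a VM the host no longer reports after coming back stays Unknown; what the engine should do in that case is not covered by the proposal.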
The problem is that the VdsNotRespondingTreatment command fails for hosts with disabled PM, and doesn't call the HandleError method that handles this situation.
Since this is a regression and severe in my opinion, raising priority.
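
A rough sketch of the control-flow gap described here, and one way it could be closed, is shown below. It is a simplified Python model under stated assumptions, not the merged fix: the class, its fields, and the behaviour of handle_error (moving the host's VMs to Unknown) are hypothetical stand-ins for the engine's Java command.

```python
class NotRespondingTreatment:
    """Hypothetical stand-in for the host 'not responding' treatment command."""

    def __init__(self, host_name, power_management_enabled, vm_statuses):
        self.host_name = host_name
        self.power_management_enabled = power_management_enabled
        self.vm_statuses = vm_statuses  # dict: VM name -> status string
        self.fenced = False

    def can_do_action(self):
        # Fencing needs power management (PM); with PM disabled the
        # validation fails, as in the VDS_FENCE_DISABLED log line.
        return self.power_management_enabled

    def execute(self):
        # Placeholder for the actual fencing of the host.
        self.fenced = True

    def handle_error(self):
        # Assumed behaviour: the host cannot be fenced, so its VMs are
        # marked Unknown instead of being left as Up.
        for name in self.vm_statuses:
            self.vm_statuses[name] = "Unknown"

    def run(self):
        if not self.can_do_action():
            # The reported bug: returning here without error handling
            # leaves the VMs shown as Up. Calling it closes the gap.
            self.handle_error()
            return False
        self.execute()
        return True


cmd = NotRespondingTreatment("host1", False, {"vm1": "Up"})
cmd.run()  # validation fails, but vm1 still ends up as "Unknown"
```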
*** Bug 953546 has been marked as a duplicate of this bug. ***
Merged u/s: a32eb72e62d076393e82fe1e7c908cfd98271ba0
*** Bug 974297 has been marked as a duplicate of this bug. ***
Tested on rhevm3.3 is2: the VM moved to status Unknown after the hypervisor was brought down.
Closing - RHEV 3.3 Released