Description of problem:
A running VM is not moved to "Unknown" when its host becomes non-responsive. Trying to open the VM's console fails with the error: "Error while executing action SetVmTicket: Network error during communication with the Host."
This may be a new feature I am not aware of, but as far as I know, all VMs should move to the "Unknown" state once the host they are running on becomes non-responsive and cannot be fenced. I ran the reproducer twice and the problem persisted.

Version-Release number of selected component (if applicable):
rhevm-3.2.0-11.30.el6ev
vdsm-4.10.2-22.0.el6ev (on RHEL host)

How reproducible:
100%

Steps to Reproduce:
1. Start a VM on host A and set it as HA (not important for the test, but that is what I did).
2. ssh to host A and disconnect the network, either with
# ifdown ethX
or
# service network stop

Actual results:
The host becomes non-responsive. The VM is still reported as up and running.

Expected results:
The VM should be reported as Unknown.

Additional info:
After performing "confirm host was rebooted" on the non-responsive host, the VM was automatically started on another host in the cluster.
Created attachment 760989: Engine.log
2013-06-13 17:01:05,013 INFO [org.ovirt.engine.core.vdsbroker.VdsManager] (QuartzScheduler_Worker-54) [da6e479] Server failed to respond, vds_id = 32571578-460a-11e2-a75e-00163e758d0e, vds_name = rhevh-4, vm_count = 1, spm_status = None, non-responsive_timeout (seconds) = 60, error = java.net.NoRouteToHostException: No route to host
2013-06-13 17:01:05,023 INFO [org.ovirt.engine.core.bll.VdsEventListener] (pool-3-thread-49) [da6e479] ResourceManager::vdsNotResponding entered for Host 32571578-460a-11e2-a75e-00163e758d0e, rhevh-4.gsslab.rdu2.redhat.com
2013-06-13 17:01:05,039 WARN [org.ovirt.engine.core.bll.VdsNotRespondingTreatmentCommand] (pool-3-thread-49) [da6e479] CanDoAction of action VdsNotRespondingTreatment failed. Reasons:VDS_FENCE_DISABLED

--------

So the canDoAction() check run by VdsNotRespondingTreatmentCommand is implemented in FenceVdsBaseCommand.java. From what I can see:

~~~
protected boolean canDoAction() {
    boolean retValue = false;
    ....
    // We do not fall into this branch, because power management (fencing)
    // is not enabled on this host.
    if (getVds().getpm_enabled()
            && IsPowerManagementLegal(getVds().getStaticData(),
                    getVdsGroup().getcompatibility_version().toString())) {
        ...
        // HandleError() is the method that is actually supposed to move the
        // VMs to Unknown, but it is only reachable from inside this branch.
        if (!retValue) {
            HandleError();
        }
    } else {
        // So we go directly to this last else with retValue still false,
        // and HandleError() is never called.
        addCanDoActionMessage(VdcBllMessages.VDS_FENCE_DISABLED);
    }
    getReturnValue().setSucceeded(retValue);
    return retValue;
}
~~~
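To make the control-flow problem concrete, here is a minimal, hypothetical Java model of the flow described in the analysis. All names in it (FenceCheckSketch, Host, Vm, handleError, canDoActionAsReported, canDoActionVmsToUnknown) are illustrative assumptions, not oVirt engine APIs. The "variant" method is only one way to express the expected behaviour (VMs go to Unknown even when fencing is disabled), not necessarily how bug 921521 was fixed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified model of the canDoAction() flow.
// Vm, Host, handleError() and both canDoAction*() methods are illustrative
// stand-ins, not the real oVirt engine classes or methods.
public class FenceCheckSketch {

    enum VmStatus { UP, UNKNOWN }

    static final class Vm {
        final String name;
        VmStatus status = VmStatus.UP;

        Vm(String name) {
            this.name = name;
        }
    }

    static final class Host {
        final boolean pmEnabled; // stands in for getVds().getpm_enabled()
        final List<Vm> vms = new ArrayList<>();

        Host(boolean pmEnabled) {
            this.pmEnabled = pmEnabled;
        }
    }

    // Stand-in for HandleError(): marks every VM on the host as Unknown.
    static void handleError(Host host) {
        for (Vm vm : host.vms) {
            vm.status = VmStatus.UNKNOWN;
        }
    }

    // Same shape as the reported code: handleError() is reachable only
    // when power management is enabled.
    static boolean canDoActionAsReported(Host host, List<String> messages) {
        boolean retValue = false;
        if (host.pmEnabled) {
            // Fencing pre-checks elided; in this sketch they never set retValue.
            if (!retValue) {
                handleError(host);
            }
        } else {
            messages.add("VDS_FENCE_DISABLED");
        }
        return retValue;
    }

    // Hypothetical variant: the failure handling runs after the branch,
    // so the VMs are marked Unknown even when fencing is disabled.
    static boolean canDoActionVmsToUnknown(Host host, List<String> messages) {
        boolean retValue = false;
        if (!host.pmEnabled) {
            messages.add("VDS_FENCE_DISABLED");
        }
        // Fencing pre-checks elided; in this sketch they never set retValue.
        if (!retValue) {
            handleError(host);
        }
        return retValue;
    }

    public static void main(String[] args) {
        List<String> messages = new ArrayList<>();

        Host reported = new Host(false);
        reported.vms.add(new Vm("vm1"));
        canDoActionAsReported(reported, messages);
        System.out.println("as reported: " + reported.vms.get(0).status);

        Host variant = new Host(false);
        variant.vms.add(new Vm("vm1"));
        canDoActionVmsToUnknown(variant, messages);
        System.out.println("variant: " + variant.vms.get(0).status);
    }
}
```

Running main prints `as reported: UP` followed by `variant: UNKNOWN`, which mirrors the Actual results and Expected results of this report: with power management disabled, the reported structure never reaches the step that marks VMs as Unknown.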
*** This bug has been marked as a duplicate of bug 921521 ***