Bug 974297 - vm is not moved to "Unknown" when host becomes non-responsive and has no pm configured
Keywords:
Status: CLOSED DUPLICATE of bug 921521
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.2.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: Omer Frenkel
QA Contact:
URL:
Whiteboard: virt
Depends On:
Blocks:
Reported: 2013-06-13 21:54 UTC by Marina Kalinin
Modified: 2015-10-27 23:57 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-06-16 10:58:43 UTC
oVirt Team: ---
Target Upstream Version:
Embargoed:


Attachments
Engine.log (6.76 MB, text/x-log), 2013-06-13 21:59 UTC, Marina Kalinin


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 399233 0 None None None Never

Description Marina Kalinin 2013-06-13 21:54:13 UTC
Description of problem:
A running VM is not moved to "Unknown" when its host becomes non-responsive.
Trying to open the VM's console fails with the error: "Error while executing action SetVmTicket: Network error during communication with the Host."

Maybe this is a new feature I am not aware of, but as far as I know, all VMs should move to the "Unknown" state once the host they are running on becomes non-responsive and cannot be fenced.

I tried the reproducer twice and the problem persists.

Version-Release number of selected component (if applicable):
rhevm-3.2.0-11.30.el6ev
vdsm-4.10.2-22.0.el6ev (on RHEL host)

How reproducible:
100%

Steps to Reproduce:
1. Start a VM on host A and set it as HA
(probably not important for the test, but that is what I did).

2. SSH to host A and disconnect its network, either with
# ifdown ethX
or
# service network stop


Actual results:
Host becomes non-responsive. VM is still reported as up and running.

Expected results:
VM should be reported as unknown.

Additional info:
After I performed "confirm host was rebooted" on the non-responsive host, the VM was automatically started on another host in the cluster.

Comment 2 Marina Kalinin 2013-06-13 21:59:39 UTC
Created attachment 760989
Engine.log

Comment 4 Marina Kalinin 2013-06-13 22:44:58 UTC
2013-06-13 17:01:05,013 INFO  [org.ovirt.engine.core.vdsbroker.VdsManager] (QuartzScheduler_Worker-54) [da6e479] Server failed to respond, vds_id = 32571578-460a-11e2-a75e-00163e758d0e, vds_name = rhevh-4, vm_count = 1, spm_status = None, non-responsive_timeout (seconds) = 60, error = java.net.NoRouteToHostException: No route to host
2013-06-13 17:01:05,023 INFO  [org.ovirt.engine.core.bll.VdsEventListener] (pool-3-thread-49) [da6e479] ResourceManager::vdsNotResponding entered for Host 32571578-460a-11e2-a75e-00163e758d0e, rhevh-4.gsslab.rdu2.redhat.com
2013-06-13 17:01:05,039 WARN  [org.ovirt.engine.core.bll.VdsNotRespondingTreatmentCommand] (pool-3-thread-49) [da6e479] CanDoAction of action VdsNotRespondingTreatment failed. Reasons:VDS_FENCE_DISABLED
--------
So the CanDoAction of VdsNotRespondingTreatmentCommand is implemented in FenceVdsBaseCommand.java.
From what I see there:
~~~
    protected boolean canDoAction() {
        boolean retValue = false;
        ....
        if (getVds().getpm_enabled()
                && IsPowerManagementLegal(getVds().getStaticData(), getVdsGroup().getcompatibility_version().toString())) {
            ... // We do not fall into this branch, since we don't have fencing (pm) enabled.
            // HandleError(), which is actually supposed to move the VMs to Unknown, is only called from here:
            if (!retValue) {
                HandleError();
            }
        } else {
            // We go directly to this last else with retValue still false, so HandleError() is never reached.
            addCanDoActionMessage(VdcBllMessages.VDS_FENCING_DISABLED);
        }
        getReturnValue().setSucceeded(retValue);
        return retValue;
    }
~~~
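
To make the control flow in the excerpt above easier to follow, here is a minimal, self-contained sketch of it. It is not engine code: `FenceFlowSketch`, `Host`, `VmStatus`, `markVmsUnknown` and both `canDoAction...` methods are hypothetical names invented for illustration, and `markVmsUnknown` merely stands in for whatever HandleError does. The second method shows one possible shape of the behavior this report expects. It is an assumption, not the actual fix.

```java
import java.util.ArrayList;
import java.util.List;

public class FenceFlowSketch {
    enum VmStatus { UP, UNKNOWN }

    /** Hypothetical stand-in for a host and its VMs; not the engine's VDS class. */
    static class Host {
        final boolean pmEnabled;
        final List<VmStatus> vms = new ArrayList<>();

        Host(boolean pmEnabled, int vmCount) {
            this.pmEnabled = pmEnabled;
            for (int i = 0; i < vmCount; i++) {
                vms.add(VmStatus.UP);
            }
        }

        /** Stand-in for the error handling that is supposed to move VMs to Unknown. */
        void markVmsUnknown() {
            vms.replaceAll(status -> VmStatus.UNKNOWN);
        }
    }

    /** Mirrors the quoted flow: error handling is reachable only in the pm-enabled branch. */
    static boolean canDoActionAsReported(Host host, boolean fencePrerequisitesOk) {
        boolean retValue = false;
        if (host.pmEnabled) {
            retValue = fencePrerequisitesOk;
            if (!retValue) {
                host.markVmsUnknown();
            }
        }
        // With pm disabled, only a "fencing disabled" message would be added here.
        return retValue;
    }

    /** Assumed alternative: handle the error whenever the action cannot proceed. */
    static boolean canDoActionExpected(Host host, boolean fencePrerequisitesOk) {
        boolean retValue = host.pmEnabled && fencePrerequisitesOk;
        if (!retValue) {
            host.markVmsUnknown();
        }
        return retValue;
    }

    public static void main(String[] args) {
        Host reported = new Host(false, 1);
        canDoActionAsReported(reported, false);
        System.out.println("as reported: " + reported.vms); // prints "as reported: [UP]"

        Host expected = new Host(false, 1);
        canDoActionExpected(expected, false);
        System.out.println("expected:    " + expected.vms); // prints "expected:    [UNKNOWN]"
    }
}
```

With power management disabled, the first variant leaves the VM in UP, which matches the actual results described in this bug. The second moves it to UNKNOWN, which matches the expected results.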

Comment 6 Omer Frenkel 2013-06-16 10:58:43 UTC

*** This bug has been marked as a duplicate of bug 921521 ***

