Bug 974297 - vm is not moved to "Unknown" when host becomes non-responsive and has no pm configured
Status: CLOSED DUPLICATE of bug 921521
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.2.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Assigned To: Omer Frenkel
Whiteboard: virt
Keywords: Regression, Triaged
Reported: 2013-06-13 17:54 EDT by Marina
Modified: 2015-10-27 19:57 EDT

Doc Type: Bug Fix
Last Closed: 2013-06-16 06:58:43 EDT
Type: Bug


Attachments
Engine.log (6.76 MB, text/x-log)
2013-06-13 17:59 EDT, Marina


External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Solution) | 399233 | None | None | None | Never

Description Marina 2013-06-13 17:54:13 EDT
Description of problem:
A running VM is not moved to "Unknown" when its host becomes non-responsive.
Trying to open the VM's console fails with the error: "Error while executing action SetVmTicket: Network error during communication with the Host."

Maybe this is a new feature I am not aware of, but as far as I know, all VMs should move to the "Unknown" state once the host they are running on becomes non-responsive and cannot be fenced.

I tried the reproducer twice and the problem persists.

Version-Release number of selected component (if applicable):
rhevm-3.2.0-11.30.el6ev
vdsm-4.10.2-22.0.el6ev (on RHEL host)

How reproducible:
100%

Steps to Reproduce:
1. Start a VM on host A. Set it as HA
(not that important for the test, but that's what I did).

2. SSH to host A and disconnect its network, either with
# ifdown ethX
or
# service network stop


Actual results:
Host becomes non-responsive. VM is still reported as up and running.

Expected results:
The VM should be reported as "Unknown".

Additional info:
After performing "confirm host was rebooted" on the non-responsive host, the VM was automatically started on another host in the cluster.
Comment 2 Marina 2013-06-13 17:59:39 EDT
Created attachment 760989
Engine.log
Comment 4 Marina 2013-06-13 18:44:58 EDT
2013-06-13 17:01:05,013 INFO  [org.ovirt.engine.core.vdsbroker.VdsManager] (QuartzScheduler_Worker-54) [da6e479] Server failed to respond, vds_id = 32571578-460a-11e2-a75e-00163e758d0e, vds_name = rhevh-4, vm_count = 1, spm_status = None, non-responsive_timeout (seconds) = 60, error = java.net.NoRouteToHostException: No route to host
2013-06-13 17:01:05,023 INFO  [org.ovirt.engine.core.bll.VdsEventListener] (pool-3-thread-49) [da6e479] ResourceManager::vdsNotResponding entered for Host 32571578-460a-11e2-a75e-00163e758d0e, rhevh-4.gsslab.rdu2.redhat.com
2013-06-13 17:01:05,039 WARN  [org.ovirt.engine.core.bll.VdsNotRespondingTreatmentCommand] (pool-3-thread-49) [da6e479] CanDoAction of action VdsNotRespondingTreatment failed. Reasons:VDS_FENCE_DISABLED
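
When looking for these entries in a large Engine.log, it can help to pull the "key = value" pairs out of the "Server failed to respond" message, for example to confirm that the engine still counted a VM on the host (vm_count = 1). The following is only a minimal sketch: the class name EngineLogFields and its parse method are made up for illustration and are not part of ovirt-engine, and it assumes the message keeps the comma-separated layout shown above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of ovirt-engine.
class EngineLogFields {

    /**
     * Extracts the "key = value" pairs from a VdsManager
     * "Server failed to respond" log line. Returns an empty map
     * when the line does not contain that message.
     */
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        int start = line.indexOf("Server failed to respond");
        if (start < 0) {
            return fields;
        }
        // Assumption: pairs are separated by ", " and values contain no ", ".
        for (String part : line.substring(start).split(", ")) {
            String[] kv = part.split(" = ", 2);
            if (kv.length == 2) {
                fields.put(kv[0].trim(), kv[1].trim());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Message portion copied from the log excerpt above.
        String line = "Server failed to respond, vds_id = 32571578-460a-11e2-a75e-00163e758d0e, "
                + "vds_name = rhevh-4, vm_count = 1, spm_status = None, "
                + "non-responsive_timeout (seconds) = 60, "
                + "error = java.net.NoRouteToHostException: No route to host";
        Map<String, String> fields = parse(line);
        System.out.println("host=" + fields.get("vds_name")
                + " vms=" + fields.get("vm_count")
                + " error=" + fields.get("error"));
    }
}
```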
--------
So the canDoAction() that VdsNotRespondingTreatmentCommand runs is in FenceVdsBaseCommand.java.
From what I see there:
~~~
    protected boolean canDoAction() {
        boolean retValue = false;
        // ....
        if (getVds().getpm_enabled()
                && IsPowerManagementLegal(getVds().getStaticData(), getVdsGroup().getcompatibility_version().toString())) {
            // ... we do not fall into this category, since we don't have fencing enabled.
            // HandleError(), which is actually supposed to move the VMs to Unknown,
            // is only called from inside this branch:
            if (!retValue) {
                HandleError();
            }
        } else {
            // We go directly here with retValue still false, so HandleError() is never reached.
            // The engine log above reports this reason as VDS_FENCE_DISABLED.
            addCanDoActionMessage(VdcBllMessages.VDS_FENCE_DISABLED);
        }
        getReturnValue().setSucceeded(retValue);
        return retValue;
    }
~~~
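
The control flow in the excerpt can be reduced to a small stand-alone model to make the suspected gap easier to see. This is only a sketch under stated assumptions: Host, markVmsUnknown, canDoActionAsQuoted and canDoActionHypothetical are invented names rather than engine classes, markVmsUnknown stands in for what HandleError() is assumed to do, and the second method is just one possible shape of the expected behaviour, not the fix applied in ovirt-engine.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone model of the quoted control flow; none of these types exist in ovirt-engine.
class FenceFlowSketch {

    /** Simplified stand-in for a host and the statuses of the VMs running on it. */
    static class Host {
        final boolean pmEnabled;
        final List<String> vmStatuses = new ArrayList<>();

        Host(boolean pmEnabled, int vmCount) {
            this.pmEnabled = pmEnabled;
            for (int i = 0; i < vmCount; i++) {
                vmStatuses.add("Up");
            }
        }

        /** Assumed effect of HandleError(): every VM on the host becomes Unknown. */
        void markVmsUnknown() {
            vmStatuses.replaceAll(status -> "Unknown");
        }
    }

    /** Mirrors the excerpt: the error handler is reachable only when power management is enabled. */
    static boolean canDoActionAsQuoted(Host host, boolean fenceChecksPass) {
        boolean retValue = false;
        if (host.pmEnabled) {
            retValue = fenceChecksPass;
            if (!retValue) {
                host.markVmsUnknown();
            }
        }
        // else: only a "fence disabled" message is recorded; the VMs are left untouched.
        return retValue;
    }

    /** Hypothetical variant: run the error handler on every failure, including "no power management". */
    static boolean canDoActionHypothetical(Host host, boolean fenceChecksPass) {
        boolean retValue = host.pmEnabled && fenceChecksPass;
        if (!retValue) {
            host.markVmsUnknown();
        }
        return retValue;
    }

    public static void main(String[] args) {
        Host quoted = new Host(false, 1);
        canDoActionAsQuoted(quoted, false);
        System.out.println("as quoted:    " + quoted.vmStatuses);

        Host hypothetical = new Host(false, 1);
        canDoActionHypothetical(hypothetical, false);
        System.out.println("hypothetical: " + hypothetical.vmStatuses);
    }
}
```

With power management disabled, the first method leaves the VM as "Up", which matches the behaviour reported here, while the hypothetical variant moves it to "Unknown".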
Comment 6 Omer Frenkel 2013-06-16 06:58:43 EDT

*** This bug has been marked as a duplicate of bug 921521 ***
