Bug 921521 - VM status is still green after the hypervisor was brought down, which caused the Data centre to become non-responsive.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine-webadmin-portal
Version: 3.1.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 3.3.0
Assignee: Omer Frenkel
QA Contact: Lukas Svaty
URL:
Whiteboard: virt
Duplicates: 953546 974297
Depends On:
Blocks: 984943
Reported: 2013-03-14 11:13 UTC by Rahul Hinduja
Modified: 2015-09-22 13:09 UTC (History)

Fixed In Version: is1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 984943
Environment:
Last Closed:
oVirt Team: ---
Target Upstream Version:
Embargoed:


Attachments
virtual machine tab (140.44 KB, image/png)
2013-03-14 11:13 UTC, Rahul Hinduja
Data centre (104.90 KB, image/png)
2013-03-14 11:14 UTC, Rahul Hinduja


Links
System: oVirt gerrit; ID: 13706

Description Rahul Hinduja 2013-03-14 11:13:56 UTC
Created attachment 709977
virtual machine tab

Description of problem:
=======================

The VM status is still green (Up) after the hypervisor was brought down, which caused the Data centre to become non-responsive.

Version-Release number of selected component (if applicable):
=============================================================

RHEVH:
======

Red Hat Enterprise Virtualization Hypervisor release 6.4 (20130306.2.el6_4)


Setup:
======

    1. rhs-client43.lab.eng.blr.redhat.com (RHEV-H PosixFS datacenter)
    2. rhs-client17.lab.eng.blr.redhat.com
    3. rhs-client18.lab.eng.blr.redhat.com

Setup Involves:

1. client43: Installed RHEV-H

2. Created the Data centre and added the hypervisor

3. Formatted client17 and client18 with RHEL 6.4

4. Created 2 RHS VMs (10.70.37.147 and 10.70.37.219), one on client17 and one on client18

5. Created a 1*2 volume from 10.70.37.147 and 10.70.37.219

6. Added the above volume to the DC

7. Rhevm: mario.lab.eng.blr.redhat.com


Steps Carried:
===============
1. Powered off the hypervisor.
2. The Data centre went to the non-responsive state. The storage domain was in the unknown state.
3. The VMs were not accessible, but their status was still shown as green (Up) on the RHEVM Virtual Machines tab.
  
Actual results:
===============

VM status is Up even though the VMs are not accessible

Expected results:
=================

VM status should go to paused state

Comment 1 Rahul Hinduja 2013-03-14 11:14:47 UTC
Created attachment 709978
Data centre

Comment 3 Itamar Heim 2013-03-16 20:08:11 UTC
The VMs should go to the paused state only on an EIO issue. Are you sure they tried to perform any write activity?

Comment 4 Haim 2013-03-17 18:54:26 UTC
(In reply to comment #3)
> the VMs should go to pause state only on EIO issue. are you sure they tried
> to perform any write activity?

They should go to the unknown state. Adding further questions:

1) Could this be related to the UI refresh time? We had a similar issue with the data-center status.
2) Did they eventually turn to unknown?

Comment 5 Ayal Baron 2013-03-17 19:33:24 UTC
(In reply to comment #4)
> (In reply to comment #3)
> > the VMs should go to pause state only on EIO issue. are you sure they tried
> > to perform any write activity?
> 
> they should go to unknown state, adding additional question:
> 
> 1) could be related to UI refresh time? we had similar issue with
> data-center status.
> 2) did eventually they turned to unknown?

The host was shut down while it had running VMs, and the VM statuses did not update properly; this has nothing to do with storage.
The VMs are not and should not be in PAUSED, as they are no longer running at all (the processes are dead since the host was shut down).
They should indeed be in the UNKNOWN state, as Haim stated above, since the engine cannot know whether it lost communication with the host or the host shut down.
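
The distinction drawn here and in comment 3 can be sketched as a small decision function. This is only an illustration in Python, not engine code: the names `VmStatus` and `expected_vm_status` and the two boolean inputs are hypothetical.

```python
from enum import Enum


class VmStatus(Enum):
    UP = "Up"
    PAUSED = "Paused"
    UNKNOWN = "Unknown"


def expected_vm_status(host_reachable: bool, storage_io_error: bool) -> VmStatus:
    """Return the status the engine should show for a VM that was running.

    Hypothetical helper: both flags are assumptions made for illustration.
    """
    if not host_reachable:
        # The engine cannot tell a dead host from a lost connection,
        # so it cannot claim that the VM is Up or Paused.
        return VmStatus.UNKNOWN
    if storage_io_error:
        # Pausing applies only when a live VM hits an I/O error (EIO).
        return VmStatus.PAUSED
    return VmStatus.UP
```

Under these assumptions, a powered-off hypervisor always maps to `VmStatus.UNKNOWN`, regardless of any storage error.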

Comment 6 Rahul Hinduja 2013-03-18 09:56:35 UTC
(In reply to comment #4)
> (In reply to comment #3)
> > the VMs should go to pause state only on EIO issue. are you sure they tried
> > to perform any write activity?
> 
> they should go to unknown state, adding additional question:
> 
> 1) could be related to UI refresh time? we had similar issue with
> data-center status.
> 2) did eventually they turned to unknown?

They did not turn to the unknown state. I also tried refreshing the UI, as well as signing out and logging in again. The VMs were always green (Up) in status.

Comment 7 Libor Spevak 2013-04-08 13:46:45 UTC
When I stop the VDSM, the communication is not available...

2013-04-08 10:29:20,903 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler_Worker-3) [6c279b77] Failed to refresh VDS , vds = 6c8ab668-3cab-4af4-a111-4b02dc694393 : XXXXXX.redhat.com, VDS Network Error, continuing.
java.net.ConnectException: Connection refused
...

2 possible log items detected after a while:
- VdsNotRespondingTreatmentCommand (handles migrating VMs in current code only and sets them to Unknown status)

2013-04-08 10:30:18,783 INFO  [org.ovirt.engine.core.bll.VdsEventListener] (pool-10-thread-49) ResourceManager::vdsNotResponding entered for Host 6c8ab668-3cab-4af4-a111-4b02dc694393, 10.34.63.178
2013-04-08 10:30:18,976 WARN  [org.ovirt.engine.core.bll.VdsNotRespondingTreatmentCommand] (pool-10-thread-49) [3de1b0d2] CanDoAction of action VdsNotRespondingTreatment failed. Reasons:VDS_FENCE_DISABLED

- VdsManager (different test)

2013-04-08 11:07:46,843 ERROR [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler_Worker-10) Server failed to respond, vds_id = 6c8ab668-3cab-4af4-a111-4b02dc694393, vds_name = XXXXXX.redhat.com, error = java.net.ConnectException: Connection refused
2013-04-08 11:07:46,892 INFO  [org.ovirt.engine.core.bll.VdsEventListener] (pool-10-thread-49) ResourceManager::vdsNotResponding entered for Host 6c8ab668-3cab-4af4-a111-4b02dc694393, 10.34.63.178
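
When checking an engine log for this condition, the ERROR line above can be picked out with a regular expression. The following is a rough Python sketch that assumes the message format is exactly as quoted in this comment; the names `FAILED_TO_RESPOND` and `parse_failed_to_respond` are hypothetical.

```python
import re
from typing import Dict, Optional

# Matches the "Server failed to respond" message as quoted in this comment.
# The exact format is an assumption based on that single sample line.
FAILED_TO_RESPOND = re.compile(
    r"Server failed to respond, vds_id = (?P<vds_id>[0-9a-f-]{36}), "
    r"vds_name = (?P<vds_name>[^,]+), error = (?P<error>.+)$"
)


def parse_failed_to_respond(line: str) -> Optional[Dict[str, str]]:
    """Return the host id, host name and error text, or None if no match."""
    match = FAILED_TO_RESPOND.search(line)
    return match.groupdict() if match else None
```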

Proposed solution:
- when the host is set to NON-RESPONSIVE after a timeout, all running VMs on that host are set to UNKNOWN status (class VdsManager) and a new message about the VM status transition to Unknown is inserted into the audit log
- when the VDSM is available again, the host is Up after a short while and a new message is logged into the audit log - 'VM {} status is restored to {}' (class: VdsUpdateRunTimeInfo)
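
The two transitions proposed above can be sketched as follows. This is a simplified Python model, not the VdsManager or VdsUpdateRunTimeInfo code: the `Host` class, the function names, and the wording of the "set to Unknown" audit message are assumptions. Only the 'status is restored to' message text comes from the proposal.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Host:
    name: str
    status: str = "Up"
    # VM name -> status currently shown by the engine
    vms: Dict[str, str] = field(default_factory=dict)


def mark_non_responsive(host: Host, audit_log: List[str]) -> None:
    """Host timed out: move its running VMs to Unknown and audit each one."""
    host.status = "NonResponsive"
    for vm, status in list(host.vms.items()):
        if status in ("Down", "Unknown"):
            continue  # only running VMs are affected
        host.vms[vm] = "Unknown"
        # Hypothetical wording; the proposal does not give this message text.
        audit_log.append(f"VM {vm} status was set to Unknown")


def mark_up(host: Host, reported: Dict[str, str], audit_log: List[str]) -> None:
    """VDSM is back: apply the reported statuses and audit each restore."""
    host.status = "Up"
    for vm, status in reported.items():
        if host.vms.get(vm) == "Unknown":
            audit_log.append(f"VM {vm} status is restored to {status}")
        host.vms[vm] = status
```

In this model a VM that was already Down keeps its status, so only running VMs produce audit entries.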

Comment 8 Omer Frenkel 2013-04-22 12:11:29 UTC
The problem is that the VdsNotRespondingTreatment command fails for hosts with disabled PM, and doesn't call the HandleError method that handles this situation.
Since this is a regression, and severe in my opinion, I am raising the priority.
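
One way to read the defect described here is as a missing fallback on the validation-failure path. The Python sketch below illustrates that control flow only. It is not the merged patch, and the function name and callback parameters are hypothetical; the `VDS_FENCE_DISABLED` reason is taken from the log in comment 7.

```python
from typing import Callable, List


def run_not_responding_treatment(
    fencing_enabled: bool,
    fence_host: Callable[[], None],
    handle_error: Callable[[], None],
) -> List[str]:
    """Return the validation failure reasons (empty when fencing ran)."""
    reasons: List[str] = []
    if not fencing_enabled:
        reasons.append("VDS_FENCE_DISABLED")
    if reasons:
        # Returning here without the fallback would leave the VMs shown
        # as Up, which matches the behaviour reported in this bug.
        handle_error()
        return reasons
    fence_host()
    return reasons
```

With power management disabled, the sketch still runs the fallback, which is where the VM statuses would be moved to Unknown.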

Comment 10 Omer Frenkel 2013-04-22 12:21:03 UTC
*** Bug 953546 has been marked as a duplicate of this bug. ***

Comment 11 Libor Spevak 2013-04-24 15:12:22 UTC
Merged u/s: a32eb72e62d076393e82fe1e7c908cfd98271ba0

Comment 13 Omer Frenkel 2013-06-16 10:58:43 UTC
*** Bug 974297 has been marked as a duplicate of this bug. ***

Comment 14 Lukas Svaty 2013-06-27 09:03:13 UTC
tested on rhevm3.3 is2

VM moved to status Unknown after hypervisor was brought down

Comment 17 Itamar Heim 2014-01-21 22:26:57 UTC
Closing - RHEV 3.3 Released

Comment 18 Itamar Heim 2014-01-21 22:27:33 UTC
Closing - RHEV 3.3 Released

Comment 19 Itamar Heim 2014-01-21 22:30:12 UTC
Closing - RHEV 3.3 Released

