Bug 921521

Summary: VM status is still green when the hypervisor was brought down, which causes the Data centre to be non-responsive.
Product: Red Hat Enterprise Virtualization Manager
Reporter: Rahul Hinduja <rhinduja>
Component: ovirt-engine-webadmin-portal
Assignee: Omer Frenkel <ofrenkel>
Status: CLOSED CURRENTRELEASE
QA Contact: Lukas Svaty <lsvaty>
Severity: medium
Docs Contact:
Priority: high
Version: 3.1.0
CC: acathrow, cpelland, dron, ecohen, iheim, jkt, michal.skrivanek, mkalinin, ofrenkel, Rhev-m-bugs, scohen, sputhenp, yeylon
Target Milestone: ---
Keywords: Regression, ZStream
Target Release: 3.3.0
Hardware: x86_64
OS: Linux
Whiteboard: virt
Fixed In Version: is1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 984943
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 984943
Attachments:
Description            Flags
virtual machine tab    none
Data centre            none

Description Rahul Hinduja 2013-03-14 11:13:56 UTC
Created attachment 709977
virtual machine tab

Description of problem:
=======================

The VM status is still green after the hypervisor was brought down, which caused the Data centre to become non-responsive.

Version-Release number of selected component (if applicable):
=============================================================

RHEVH:
======

Red Hat Enterprise Virtualization Hypervisor release 6.4 (20130306.2.el6_4)


Setup:
======

    1. rhs-client43.lab.eng.blr.redhat.com (RHEV-H PosixFS datacenter)
    2. rhs-client17.lab.eng.blr.redhat.com
    3. rhs-client18.lab.eng.blr.redhat.com

Setup Involves:

1. client43: Installed RHEV-H

2. Created the DataCentre and added the hypervisor

3. Formatted client17 and client18 with RHEL 6.4

4. Created 2 RHS VMs (10.70.37.147 and 10.70.37.219), one on client17 and one on client18

5. Created a 1*2 volume from 10.70.37.147 and 10.70.37.219

6. Added the above volume to the DC

7. RHEVM: mario.lab.eng.blr.redhat.com


Steps Carried Out:
==================
1. Powered off the hypervisor.
2. The Data centre went to the non-responsive state. The storage domain was in the unknown state.
3. The VMs were not accessible, but their status still showed green on the RHEVM virtual machines tab.
  
Actual results:
===============

The VM status is Up even though the VMs are not accessible.

Expected results:
=================

The VM status should go to the paused state.

Comment 1 Rahul Hinduja 2013-03-14 11:14:47 UTC
Created attachment 709978
Data centre

Comment 3 Itamar Heim 2013-03-16 20:08:11 UTC
the VMs should go to pause state only on EIO issue. are you sure they tried to perform any write activity?

Comment 4 Haim 2013-03-17 18:54:26 UTC
(In reply to comment #3)
> the VMs should go to pause state only on EIO issue. are you sure they tried
> to perform any write activity?

they should go to unknown state, adding additional question:

1) could be related to UI refresh time? we had similar issue with data-center status.
2) did eventually they turned to unknown?

Comment 5 Ayal Baron 2013-03-17 19:33:24 UTC
(In reply to comment #4)
> (In reply to comment #3)
> > the VMs should go to pause state only on EIO issue. are you sure they tried
> > to perform any write activity?
> 
> they should go to unknown state, adding additional question:
> 
> 1) could be related to UI refresh time? we had similar issue with
> data-center status.
> 2) did eventually they turned to unknown?

The host was shut down while it had running VMs, and the VM statuses did not update properly; this has nothing to do with storage.
The VMs are not, and should not be, in PAUSED, as they are no longer running at all (the processes are dead since the host was shut down).
They should indeed be in the UNKNOWN state, as Haim stated above, since the engine cannot know whether it lost communication with the host or the host shut down.

Comment 6 Rahul Hinduja 2013-03-18 09:56:35 UTC
(In reply to comment #4)
> (In reply to comment #3)
> > the VMs should go to pause state only on EIO issue. are you sure they tried
> > to perform any write activity?
> 
> they should go to unknown state, adding additional question:
> 
> 1) could be related to UI refresh time? we had similar issue with
> data-center status.
> 2) did eventually they turned to unknown?

They never turned to the unknown state. I also tried refreshing the UI, as well as signing out and logging in again. The VMs were always green (Up) in status.

Comment 7 Libor Spevak 2013-04-08 13:46:45 UTC
When I stop the VDSM, the communication is not available...

2013-04-08 10:29:20,903 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler_Worker-3) [6c279b77] Failed to refresh VDS , vds = 6c8ab668-3cab-4af4-a111-4b02dc694393 : XXXXXX.redhat.com, VDS Network Error, continuing.
java.net.ConnectException: Connection refused
...

2 possible log items detected after a while:
- VdsNotRespondingTreatmentCommand (handles migrating VMs in current code only and sets them to Unknown status)

2013-04-08 10:30:18,783 INFO  [org.ovirt.engine.core.bll.VdsEventListener] (pool-10-thread-49) ResourceManager::vdsNotResponding entered for Host 6c8ab668-3cab-4af4-a111-4b02dc694393, 10.34.63.178
2013-04-08 10:30:18,976 WARN  [org.ovirt.engine.core.bll.VdsNotRespondingTreatmentCommand] (pool-10-thread-49) [3de1b0d2] CanDoAction of action VdsNotRespondingTreatment failed. Reasons:VDS_FENCE_DISABLED

- VdsManager (different test)

2013-04-08 11:07:46,843 ERROR [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler_Worker-10) Server failed to respond, vds_id = 6c8ab668-3cab-4af4-a111-4b02dc694393, vds_name = XXXXXX.redhat.com, error = java.net.ConnectException: Connection refused
2013-04-08 11:07:46,892 INFO  [org.ovirt.engine.core.bll.VdsEventListener] (pool-10-thread-49) ResourceManager::vdsNotResponding entered for Host 6c8ab668-3cab-4af4-a111-4b02dc694393, 10.34.63.178
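
To find these events in a longer engine log, the lines can be split into fields and filtered. The following is only a rough sketch: the regular expression is inferred from the log lines quoted above, not from any documented log format, and the names `parse_engine_log_line` and `fence_disabled_failures` are hypothetical helpers.

```python
import re

# Pattern inferred from the quoted log lines (an assumption, not a documented format):
# "<date> <time>,<ms> <LEVEL> [<logger>] (<thread>) <message>"
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]+)\)\s+"
    r"(?P<message>.*)$"
)


def parse_engine_log_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


def fence_disabled_failures(lines):
    """Collect parsed entries whose message mentions VDS_FENCE_DISABLED."""
    hits = []
    for line in lines:
        entry = parse_engine_log_line(line)
        if entry and "VDS_FENCE_DISABLED" in entry["message"]:
            hits.append(entry)
    return hits
```

Stack-trace lines such as `java.net.ConnectException: Connection refused` do not match the pattern and are skipped.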

Proposed solution:
- When the host is set to NON-RESPONSIVE after a timeout, all running VMs on the host are set to UNKNOWN status (class VdsManager) and a new message about the VM status transition to Unknown is inserted into the audit log.
- When VDSM is available again, the host is Up after a short while and a new message is logged into the audit log: 'VM {} status is restored to {}' (class: VdsUpdateRunTimeInfo).
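
The two transitions proposed above can be illustrated with a small model. This is a minimal sketch in Python, not the engine's Java code: the classes `Vm` and `Host` and the functions `mark_host_non_responsive` and `restore_host` are hypothetical, and the wording of the "set to Unknown" audit message is assumed. Only the 'VM {} status is restored to {}' text comes from the proposal.

```python
from dataclasses import dataclass, field
from enum import Enum


class VmStatus(Enum):
    UP = "Up"
    PAUSED = "Paused"
    UNKNOWN = "Unknown"


class HostStatus(Enum):
    UP = "Up"
    NON_RESPONSIVE = "NonResponsive"


@dataclass
class Vm:
    name: str
    status: VmStatus = VmStatus.UP


@dataclass
class Host:
    name: str
    status: HostStatus = HostStatus.UP
    vms: list = field(default_factory=list)  # VMs last known to run on this host


def mark_host_non_responsive(host, audit_log):
    """After the timeout: flag the host and move its VMs to Unknown."""
    host.status = HostStatus.NON_RESPONSIVE
    for vm in host.vms:
        if vm.status is not VmStatus.UNKNOWN:
            vm.status = VmStatus.UNKNOWN
            # Message wording is an assumption made for this sketch.
            audit_log.append(f"VM {vm.name} status was set to Unknown")


def restore_host(host, reported, audit_log):
    """When the host answers again: restore each VM it reports on."""
    host.status = HostStatus.UP
    for vm in host.vms:
        if vm.name in reported and vm.status is VmStatus.UNKNOWN:
            vm.status = reported[vm.name]
            audit_log.append(f"VM {vm.name} status is restored to {vm.status.value}")
```

A VM that the host no longer reports stays in Unknown in this sketch; what the engine should do with such VMs is not covered by the proposal.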

Comment 8 Omer Frenkel 2013-04-22 12:11:29 UTC
The problem is that the VdsNotRespondingTreatment command fails for hosts with disabled PM and doesn't call the HandleError method that handles this situation.
Since this is a regression, and severe in my opinion, I am raising the priority.
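
The control-flow problem described here can be shown with a toy model. This is a sketch under assumptions, written in Python rather than the engine's Java: `handle_error` and `treat_non_responding_host` are hypothetical stand-ins, and the `call_fallback_on_failure` flag exists only to contrast the behaviour before and after the fix. The fencing path for hosts with PM enabled is not modelled.

```python
def handle_error(vm_statuses):
    """Fallback for an unreachable host: mark every VM on it as Unknown."""
    return {name: "Unknown" for name in vm_statuses}


def treat_non_responding_host(pm_enabled, vm_statuses, call_fallback_on_failure):
    """Toy version of the treatment flow for a non-responding host.

    With power management (PM) disabled, validation fails (the log shows
    VDS_FENCE_DISABLED). The reported bug is that the fallback is skipped
    in that case, so the VM statuses stay as they were.
    """
    can_do_action = pm_enabled
    if not can_do_action:
        if call_fallback_on_failure:
            return handle_error(vm_statuses)  # intended behaviour
        return dict(vm_statuses)  # buggy behaviour: VMs keep showing Up
    # PM enabled: fencing would run here; it is outside this sketch.
    return dict(vm_statuses)
```

With PM disabled, passing `call_fallback_on_failure=False` reproduces the symptom from the description (VMs stay Up), while `True` yields Unknown for every VM.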

Comment 10 Omer Frenkel 2013-04-22 12:21:03 UTC
*** Bug 953546 has been marked as a duplicate of this bug. ***

Comment 11 Libor Spevak 2013-04-24 15:12:22 UTC
Merged u/s: a32eb72e62d076393e82fe1e7c908cfd98271ba0

Comment 13 Omer Frenkel 2013-06-16 10:58:43 UTC
*** Bug 974297 has been marked as a duplicate of this bug. ***

Comment 14 Lukas Svaty 2013-06-27 09:03:13 UTC
Tested on RHEVM 3.3 is2.

VM moved to status Unknown after hypervisor was brought down

Comment 17 Itamar Heim 2014-01-21 22:26:57 UTC
Closing - RHEV 3.3 Released

Comment 18 Itamar Heim 2014-01-21 22:27:33 UTC
Closing - RHEV 3.3 Released

Comment 19 Itamar Heim 2014-01-21 22:30:12 UTC
Closing - RHEV 3.3 Released