Bug 1316692 - [HC] vm high availability is not working against a glusterfs storage domain
Summary: [HC] vm high availability is not working against a glusterfs storage domain
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: General
Version: 3.6.3.3
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: high
Target Milestone: ovirt-4.1.0-alpha
Target Release: 4.1.1.2
Assignee: Sahina Bose
QA Contact: RamaKasturi
URL:
Whiteboard:
Depends On: 1205641 1316358 1361115
Blocks: Gluster-HC-2 1422341
Reported: 2016-03-10 19:36 UTC by Paul Cuzner
Modified: 2019-04-28 14:35 UTC (History)

Fixed In Version:
Doc Type: Enhancement
Doc Text:
High availability could previously not be enabled for virtual machines in Hyper-converged mode. Previous fencing policies ignored Gluster processes. But in Hyper-converged mode, fencing policies are required to ensure that a host is not fenced if there is a brick process running, or to ensure no loss of quorum when shutting down the host with an active brick.
The following fencing policies have been added to Hyper-converged clusters:
- SkipFencingIfGlusterBricksUp: Fencing will be skipped if bricks are running and can be reached from other peers.
- SkipFencingIfGlusterQuorumNotMet: Fencing will be skipped if bricks are running and shutting down the host will cause loss of quorum.
Virtual machine high availability can be tested by enabling power management on hyper-converged nodes.
Clone Of: 1316358
Environment:
Last Closed: 2017-04-27 09:36:35 UTC
oVirt Team: Gluster
Embargoed:
rule-engine: ovirt-4.1+
rule-engine: planning_ack+
rule-engine: devel_ack+
sasundar: testing_ack+


Attachments
engine log (1.40 MB, application/x-gzip)
2016-03-10 19:37 UTC, Paul Cuzner
vdsm log from the host that was powered off during the test (847.06 KB, application/x-gzip)
2016-03-10 19:38 UTC, Paul Cuzner

Description Paul Cuzner 2016-03-10 19:36:12 UTC
+++ This bug was initially created as a clone of Bug #1316358 +++

Description of problem:
While testing a hyperconverged setup, I set the VMs to be highly available and defined a fencing agent (idrac7) for the hosts. When a host running VMs is powered off through the DRAC, the VMs do not restart on any of the other nodes.

When the host down is detected, the event is shown in the UI and log as "User shutdown from within the guest" - which is clearly wrong.

Version-Release number of selected component (if applicable):
RHEV 3.6.3.4-0.1

How reproducible:
This is observed each time.

Steps to Reproduce:
1. Hyperconverged setup with RHEV 3.6 and Gluster 3.7
2. set vm's to be highly available
3. power off a host running a vm that is tagged as highly available
4. confirm that
   a) message shows that the engine believes the vm to have shut itself down
   b) vm is not restarted

Actual results:
VMs marked as highly available do NOT get restarted on the other nodes in the cluster.

Expected results:
VMs with the highly available attribute should be restarted on another node in the cluster.

Additional info:
This issue was reported informally to Doron Fediuck and Roy Golan a couple of weeks ago.

Attaching engine.log and vdsm.log from the node shutdown for analysis

--- Additional comment from Paul Cuzner on 2016-03-09 22:29 EST ---



--- Additional comment from Yaniv Dary on 2016-03-10 04:09:45 EST ---

If this reproduces on non HC setup, please move it to SLA.

Comment 1 Paul Cuzner 2016-03-10 19:37:35 UTC
Created attachment 1135041
engine log

Comment 2 Paul Cuzner 2016-03-10 19:38:07 UTC
Created attachment 1135042
vdsm log from the host that was powered off during the test

Comment 3 Allon Mureinik 2016-03-21 13:56:03 UTC
Paul, can you explain why this was cloned from Bug #1316358 ? Is there an additional action item here?

Comment 4 Paul Cuzner 2016-03-21 19:17:59 UTC
I initially raised the BZ against RHEV, since I was testing against downstream - but Yaniv D requested that the BZ should be opened against ovirt not rhev.

HTH

Comment 5 Sahina Bose 2016-03-31 09:42:22 UTC
The fencing logic needs to change so that it takes into account Gluster processes running on the nodes.

Comment 6 Sahina Bose 2016-10-26 10:54:58 UTC
Fencing policies related to gluster hosts have been merged. HA for VMs can now be tested by enabling power management on HC nodes.
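
For readers unfamiliar with the two policies named in the Doc Text, the decision they describe can be sketched roughly as below. This is only an illustration of the stated rules, not the ovirt-engine implementation: the class names, field names, and the `skip_fencing_reason` helper are all hypothetical, and how the engine actually learns brick and quorum status is not covered here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FencingPolicy:
    # Hypothetical flags mirroring the two policy names from the Doc Text.
    skip_if_gluster_bricks_up: bool = False
    skip_if_gluster_quorum_not_met: bool = False


@dataclass
class HostGlusterState:
    # Hypothetical snapshot of a host, as seen from the other peers.
    bricks_up: bool            # bricks are running and reachable from other peers
    quorum_lost_if_down: bool  # shutting the host down would cause loss of quorum


def skip_fencing_reason(policy: FencingPolicy,
                        state: HostGlusterState) -> Optional[str]:
    """Return the reason fencing should be skipped, or None to allow fencing."""
    if policy.skip_if_gluster_bricks_up and state.bricks_up:
        return "SkipFencingIfGlusterBricksUp"
    if (policy.skip_if_gluster_quorum_not_met
            and state.bricks_up
            and state.quorum_lost_if_down):
        return "SkipFencingIfGlusterQuorumNotMet"
    return None


if __name__ == "__main__":
    policy = FencingPolicy(skip_if_gluster_bricks_up=True,
                           skip_if_gluster_quorum_not_met=True)
    # Host is really powered off: no bricks reachable, so fencing may proceed.
    print(skip_fencing_reason(policy, HostGlusterState(False, False)))
    # Host is unresponsive to the engine but its bricks still serve peers.
    print(skip_fencing_reason(policy, HostGlusterState(True, False)))
```

Under these assumptions, a host that is genuinely powered off (no reachable bricks) is still fenced, which is what allows its highly available VMs to be restarted elsewhere.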

Comment 7 Sandro Bonazzola 2016-12-12 14:00:44 UTC
The fix for this issue should be included in oVirt 4.1.0 beta 1 released on December 1st. If not included please move back to modified.

Comment 8 RamaKasturi 2017-03-30 11:02:49 UTC
Verified and works fine with build ovirt-engine-4.1.1.2-0.1.el7.noarch

While testing a hyperconverged setup, I set the VMs to be highly available and defined a fencing agent for the hosts. When a host running VMs is powered off, the VMs are restarted on one of the other nodes, and the host is fenced and comes back up after a while.

Comment 9 Marina Kalinin 2017-05-24 18:52:27 UTC
Should be included in 4.1 GA d/s.

