Bug 1316692 - [HC] vm high availability is not working against a glusterfs storage domain
Status: CLOSED CURRENTRELEASE
Product: ovirt-engine
Classification: oVirt
Component: General
Version: 3.6.3.3
Hardware: x86_64 Linux
Priority: urgent
Severity: high
Target Milestone: ovirt-4.1.0-alpha
Target Release: 4.1.1.2
Assigned To: Sahina Bose
QA Contact: RamaKasturi
Depends On: 1205641 1316358 1361115
Blocks: Gluster-HC-2 1422341
Reported: 2016-03-10 14:36 EST by Paul Cuzner
Modified: 2017-05-24 14:52 EDT

See Also:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Previously, high availability could not be enabled for virtual machines in hyper-converged mode. The earlier fencing policies ignored Gluster processes, but in hyper-converged mode fencing policies are required to ensure that a host is not fenced while a brick process is running on it, and that shutting down a host with an active brick does not cause loss of quorum.
The following fencing policies have been added to hyper-converged clusters:
- SkipFencingIfGlusterBricksUp: Fencing will be skipped if bricks are running and can be reached from other peers.
- SkipFencingIfGlusterQuorumNotMet: Fencing will be skipped if bricks are running and shutting down the host will cause loss of quorum.
Virtual machine high availability can be tested by enabling power management on hyper-converged nodes.
Story Points: ---
Clone Of: 1316358
Environment:
Last Closed: 2017-04-27 05:36:35 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Gluster
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt‑4.1+
rule-engine: planning_ack+
rule-engine: devel_ack+
sasundar: testing_ack+


Attachments
engine log (1.40 MB, application/x-gzip)
2016-03-10 14:37 EST, Paul Cuzner
vdsm log from the host that was powered off during the test (847.06 KB, application/x-gzip)
2016-03-10 14:38 EST, Paul Cuzner

Description Paul Cuzner 2016-03-10 14:36:12 EST
+++ This bug was initially created as a clone of Bug #1316358 +++

Description of problem:
While testing a hyperconverged setup, I set VMs to highly available and configured the fencing agent (idrac7) on the hosts. When a host running VMs is powered off through the DRAC, the VMs do not restart on any of the other nodes.

When the host-down condition is detected, the event is shown in the UI and the log as "User shutdown from within the guest", which is clearly wrong.

Version-Release number of selected component (if applicable):
RHEV 3.6.3.4-0.1

How reproducible:
This is observed each time.

Steps to Reproduce:
1. Build a hyperconverged setup with RHEV 3.6 and Gluster 3.7
2. Set VMs to be highly available
3. Power off a host running a VM that is tagged as highly available
4. Confirm that
   a) the message shows that the engine believes the VM shut itself down
   b) the VM is not restarted

Actual results:
VMs marked as highly available do NOT get restarted on the other nodes in the cluster.

Expected results:
VMs with the highly available attribute should be restarted on another node in the cluster.

Additional info:
This issue was reported informally to Doron Fediuck and Roy Golan a couple of weeks ago.

Attaching engine.log and vdsm.log from the node that was shut down, for analysis.

--- Additional comment from Paul Cuzner on 2016-03-09 22:29 EST ---



--- Additional comment from Yaniv Dary on 2016-03-10 04:09:45 EST ---

If this reproduces on a non-HC setup, please move it to SLA.
Comment 1 Paul Cuzner 2016-03-10 14:37 EST
Created attachment 1135041
engine log
Comment 2 Paul Cuzner 2016-03-10 14:38 EST
Created attachment 1135042
vdsm log from the host that was powered off during the test
Comment 3 Allon Mureinik 2016-03-21 09:56:03 EDT
Paul, can you explain why this was cloned from Bug #1316358? Is there an additional action item here?
Comment 4 Paul Cuzner 2016-03-21 15:17:59 EDT
I initially raised the BZ against RHEV, since I was testing against downstream, but Yaniv D requested that the BZ be opened against oVirt, not RHEV.

HTH
Comment 5 Sahina Bose 2016-03-31 05:42:22 EDT
The fencing logic needs changes so that it takes into account Gluster running on the nodes.
Comment 6 Sahina Bose 2016-10-26 06:54:58 EDT
Fencing policies related to gluster hosts have been merged. HA for VMs can now be tested by enabling power management on HC nodes.
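
For anyone testing this, here is a rough Python sketch of how the two policies named in the Doc Text (SkipFencingIfGlusterBricksUp and SkipFencingIfGlusterQuorumNotMet) could gate a fencing decision. It is an illustration only and not the ovirt-engine code. The names Brick, quorum_lost_without and should_fence are hypothetical, "up" is assumed to mean "brick process running and reachable from other peers", and quorum is assumed to be a strict majority of the bricks in each replica set.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Brick:
    """Hypothetical view of one Gluster brick as seen by the other peers."""
    host: str
    volume: str
    replica_set: int
    up: bool  # assumed: process running AND reachable from other peers


def quorum_lost_without(host: str, bricks: List[Brick]) -> bool:
    """True if taking `host` down leaves any of its replica sets
    without a strict majority of bricks up (assumed quorum rule)."""
    groups: Dict[Tuple[str, int], List[Brick]] = {}
    for b in bricks:
        groups.setdefault((b.volume, b.replica_set), []).append(b)
    for members in groups.values():
        if not any(b.host == host for b in members):
            continue  # this replica set has no brick on the host
        remaining_up = sum(1 for b in members if b.up and b.host != host)
        if remaining_up * 2 <= len(members):
            return True
    return False


def should_fence(
    host: str,
    bricks: Iterable[Brick],
    skip_if_bricks_up: bool = True,
    skip_if_quorum_not_met: bool = True,
) -> bool:
    """Return False when one of the enabled policies says to skip fencing."""
    bricks = list(bricks)
    host_has_up_brick = any(b.up and b.host == host for b in bricks)
    # SkipFencingIfGlusterBricksUp
    if skip_if_bricks_up and host_has_up_brick:
        return False
    # SkipFencingIfGlusterQuorumNotMet
    if (skip_if_quorum_not_met and host_has_up_brick
            and quorum_lost_without(host, bricks)):
        return False
    return True


def replica3(up1: bool, up2: bool, up3: bool) -> List[Brick]:
    """Helper for examples: one replica-3 volume across host1..host3."""
    return [
        Brick("host1", "vol", 0, up1),
        Brick("host2", "vol", 0, up2),
        Brick("host3", "vol", 0, up3),
    ]
```

In this sketch a host that was really powered off (as in the reproduction steps) has no bricks up, so neither policy blocks fencing and the HA VMs can then be restarted elsewhere. A host that is merely unreachable by the engine but still serving bricks is left alone.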
Comment 7 Sandro Bonazzola 2016-12-12 09:00:44 EST
The fix for this issue should be included in oVirt 4.1.0 beta 1 released on December 1st. If not included please move back to modified.
Comment 8 RamaKasturi 2017-03-30 07:02:49 EDT
Verified and works fine with build ovirt-engine-4.1.1.2-0.1.el7.noarch

While testing a hyperconverged setup, I set VMs to highly available and configured the fencing agent on the hosts. When a host running VMs is powered off, the VMs get restarted on one of the other nodes, and the host gets fenced and comes back up after a while.
Comment 9 Marina 2017-05-24 14:52:27 EDT
Should be included in 4.1 GA d/s.
