Description of problem:
In a hyperconverged use case with GlusterFS, when a node goes down and then returns, GlusterFS initiates an automatic self-heal. Bringing the bricks back into sync may take time, during which a subsequent maintenance mode action within the same GlusterFS replica set could result in a split-brain scenario.
This RFE seeks to link the status of the volume to the maintenance mode workflow:
- if the volume is not in self heal, maintenance mode continues as before
- if the volume is healing and the maintenance mode request is for a node taking part in the self-heal, the request should be denied with a message to the admin
- if the volume is healing but the maintenance mode request is for a node not participating in self-heal operations, the request can continue
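The three rules above can be sketched as a small decision function. This is only an illustration of the requested behaviour, not oVirt engine code; the names `can_enter_maintenance` and `healing_hosts`, and the returned message text, are hypothetical.

```python
from typing import Iterable, Tuple


def can_enter_maintenance(host: str, healing_hosts: Iterable[str]) -> Tuple[bool, str]:
    """Decide whether `host` may enter maintenance mode.

    `healing_hosts` is the (hypothetical) set of hosts whose bricks are
    currently taking part in a self-heal; it is empty when nothing is healing.
    """
    healing = set(healing_hosts)
    if not healing:
        # Rule 1: nothing is healing, maintenance continues as before.
        return True, "no self-heal in progress"
    if host in healing:
        # Rule 2: the host takes part in the self-heal, so deny the request.
        return False, f"{host} is participating in a self-heal"
    # Rule 3: a heal is running, but this host is not involved.
    return True, f"self-heal in progress, but {host} is not involved"
```

How the set of healing hosts is obtained is left open here; the sketch only captures the allow/deny decision.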
Background operations like self-heal and rebalance also need greater visibility in the oVirt UI.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
Is there a way to know when self-heal is going on for a volume?
Using "gluster status all tasks" - we know when rebalance/remove-brick is going on. Is there something similar for self-heal?
Added Pranith's reply -
"I think gluster volume heal statistics command tells whether self-heal is going on or not. But once every 10 minutes it will show in-progress."
Target release should be set once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.
*** Bug 1196438 has been marked as a duplicate of this bug. ***
Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has already been released and the bug is not ON_QA.
oVirt 4.0 beta has been released, moving to RC milestone.
Will this feature affect the UI in any way? Does this change need to be described in the Administration Guide, possibly in the Gluster chapter:
(In reply to emma heftman from comment #8)
> Hi Ramesh
> Will this feature affect the UI in any way? Does this change need to be
> described in the Administration Guide, possibly in the Gluster chapter:
Yes, it does affect the UI. You will see the following new options in the host maintenance dialog box. They are shown only when the host supports Gluster services.
1. Ignore Gluster Quorum and Self-Heal validations
By default, oVirt/RHEV-M checks that Gluster quorum is not lost when you move the host to maintenance. It also checks that no self-heal activity would be affected by moving the host to maintenance. The user can skip these checks by selecting this option. It should be used only in rare situations when there is no other way to perform maintenance on the node.
2. Stop Gluster service
This option can be used if the user wants to stop all Gluster services while moving the host to maintenance.
Verified and works fine with build Red Hat Virtualization Manager Version: 188.8.131.52-0.1.el7
oVirt does not allow the host to be moved to maintenance if there are any unsynced entries present in its bricks. It throws the following error: "Error while executing action: Cannot switch the following Host(s) to Maintenance mode: host_name.
Unsynced entries present in following gluster bricks: [<gluster_ip>:/gluster_bricks/data/data, <gluster_ip>:/gluster_bricks/engine/engine, <gluster_ip>:/gluster_bricks/vmstore/vmstore]."
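A check for unsynced entries like the one described above can be approximated outside the engine by reading the per-brick pending entry counts. The sketch below parses text in the shape commonly printed by `gluster volume heal <VOLNAME> info` (a `Brick` line followed later by a `Number of entries:` line). The exact output format is an assumption and may differ between Gluster versions, this is not necessarily how oVirt collects the data, and the function name and sample text are hypothetical.

```python
import re
from typing import Dict


def pending_heal_entries(heal_info_output: str) -> Dict[str, int]:
    """Map each brick to its pending heal entry count.

    Assumes each brick is reported as a line starting with "Brick " followed
    by a "Number of entries: N" line; bricks without a numeric count (for
    example, bricks that are down) are skipped.
    """
    counts: Dict[str, int] = {}
    brick = None
    for line in heal_info_output.splitlines():
        line = line.strip()
        if line.startswith("Brick "):
            brick = line[len("Brick "):]
        else:
            match = re.match(r"Number of entries:\s*(\d+)$", line)
            if match and brick is not None:
                counts[brick] = int(match.group(1))
                brick = None
    return counts


# Illustrative sample only, not captured from a real cluster.
sample = """\
Brick host1:/gluster_bricks/data/data
Status: Connected
Number of entries: 3

Brick host2:/gluster_bricks/data/data
Status: Connected
Number of entries: 0
"""
# Bricks that still have entries waiting to be healed.
unsynced = [b for b, n in pending_heal_entries(sample).items() if n > 0]
print(unsynced)
```

Any brick with a non-zero count would correspond to an entry in the "Unsynced entries present" list of the error message.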
When one of the bricks in the volume is down and the user tries to move another node to maintenance while stopping glusterd services, oVirt displays the error "Error while executing action: Cannot switch the following Host(s) to Maintenance mode: <hostname>.Gluster quorum will be lost for the following Volumes: data,vmstore,engine."
When one of the nodes is already in maintenance with glusterd services stopped, oVirt does not allow you to move another node into maintenance, since quorum would be lost for the volumes.
oVirt allows the user to move more than one node to maintenance without stopping glusterd services, because all the bricks stay up on the nodes and quorum for the volumes is not lost in this case.
oVirt allows the user to move a node to maintenance even though self-heal is in progress if the user bypasses the quorum and self-heal validations by checking "Ignore quorum and self-heal validations".
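The quorum scenarios verified above can be illustrated with a small check. It assumes a simple majority rule (more than half of a volume's bricks must stay up) and that a host's bricks go down only when Gluster services are stopped on it. The real quorum behaviour depends on the volume's quorum settings, and all names below are hypothetical.

```python
from typing import Dict, List, Set


def volumes_losing_quorum(
    volumes: Dict[str, List[str]],
    hosts_with_bricks_down: Set[str],
    host: str,
    stop_gluster: bool,
) -> List[str]:
    """Return the volumes that would lose quorum if `host` enters maintenance.

    `volumes` maps a volume name to the hosts that hold its bricks.
    """
    down = set(hosts_with_bricks_down)
    if stop_gluster:
        # Bricks on the host only go down when Gluster services are stopped.
        down.add(host)
    lost = []
    for name, brick_hosts in volumes.items():
        up = sum(1 for h in brick_hosts if h not in down)
        # Assumed rule: strictly more than half of the bricks must be up.
        if up * 2 <= len(brick_hosts):
            lost.append(name)
    return lost


volumes = {v: ["host1", "host2", "host3"] for v in ("data", "vmstore", "engine")}
# host1's bricks are already down; stopping Gluster on host2 leaves 1 of 3 up.
print(volumes_losing_quorum(volumes, {"host1"}, "host2", stop_gluster=True))
# Without stopping Gluster services the bricks stay up and quorum holds.
print(volumes_losing_quorum(volumes, set(), "host2", stop_gluster=False))
```

Under these assumptions the first call reports all three volumes and the second reports none, which matches the behaviour described in the verification notes.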