Description of problem:
Upgrading a CNS cluster entails cascading updates from node to node via the openshift-ansible playbooks, but those playbooks don't inspect the health of the gluster cluster to ensure that no healing is in progress before moving on to upgrade the next node.

This RFE is about adding heketi-cli and/or openshift-ansible functionality to indicate whether or not the Gluster pool for CNS is healing, and allowing this to tie into the openshift-ansible playbooks that upgrade the gluster nodes, so as to avoid breaking the cluster's consistency.

We have experienced a case where a customer's OCP w/ CNS cluster was upgraded via openshift-ansible. The playbooks don't stop to check the healing state of the gluster pool, so the upgrade continued regardless and ruined the data consistency, necessitating a rebuild and resulting in potential data loss.

Version-Release number of selected component (if applicable):
CNS 3.6 with OpenShift 3.7

How reproducible:
Upgrade a functional OCP 3.6 cluster with CNS 3.6 to OCP 3.7.

Steps to Reproduce:
1. Build OCP 3.6 with 3 dedicated gluster nodes for CNS 3.6
2. Use openshift-ansible to install CNS 3.6
3. Upgrade the cluster to OCP 3.7 with openshift-ansible

Actual results:
There is no stage where a health check validates that the gluster cluster has completed healing before the next node is upgraded, which results in an inconsistent gluster cluster.

Expected results:
A health check confirms that the gluster storage is completely healthy before the next node in line is upgraded.

Additional info:
suggest: assign to jrivera
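
For illustration only, here is a rough sketch of the kind of check the expected results describe. It is not what openshift-ansible or heketi-cli does. It assumes the pending-heal state is read from the text output of `gluster volume heal <VOLNAME> info`, and that each brick section of that output carries a `Number of entries: N` line (a non-numeric value such as `-` is treated as "unknown", hence not safe to proceed). The function names are hypothetical, and in a CNS deployment the command would most likely have to be run inside a gluster pod rather than directly on the host.

```python
import re
import subprocess

# Assumed per-brick line in `gluster volume heal <VOLNAME> info` output.
ENTRY_RE = re.compile(r"^Number of entries:\s*(\S+)\s*$", re.MULTILINE)


def pending_heal_entries(heal_info_output):
    """Sum pending heal entries across bricks.

    Returns None when no count is found or any brick reports a
    non-numeric count (for example a brick that is not connected).
    """
    counts = ENTRY_RE.findall(heal_info_output)
    if not counts:
        return None
    total = 0
    for count in counts:
        if not count.isdigit():
            return None
        total += int(count)
    return total


def volume_is_healed(heal_info_output):
    """True only when every brick reports zero pending entries."""
    return pending_heal_entries(heal_info_output) == 0


def heal_info(volume):
    """Hypothetical helper: fetch heal info for one volume via the gluster CLI."""
    result = subprocess.run(
        ["gluster", "volume", "heal", volume, "info"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

An upgrade playbook could poll a check like `volume_is_healed(heal_info(vol))` for every volume and only move to the next node once all of them return true, failing the run after a timeout instead of continuing.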
https://bugzilla.redhat.com/show_bug.cgi?id=1540685
(In reply to Aaren from comment #2)
> https://bugzilla.redhat.com/show_bug.cgi?id=1540685

related ^^
Jose, I changed the component to cns-ansible, as the bug asks for better checks before running the CNS/OCS upgrade playbooks. Please triage this bug based on the current status of the OCS upgrade playbook.
This is already taken care of in the downstream builds.