1540680 – [RFE] Deeper CNS Gluster health status checks needed in order to validate health of pools during OCP/CNS upgrades

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	cns-ansible
Sub Component:
Version:	cns-3.6
Hardware:	All
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Jose A. Rivera
QA Contact:	Prasanth
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	OCS-3.11.1-devel-triage-done

Reported:	2018-01-31 17:41 UTC by Aaren
Modified:	2019-03-21 20:03 UTC
CC List:	11 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-03-21 20:03:58 UTC
Embargoed:
Dependent Products:

Description Aaren 2018-01-31 17:41:30 UTC

Description of problem:
Upgrading a CNS cluster entails cascading updates from node to node via the openshift-ansible playbooks, but those playbooks do not inspect Gluster cluster health to ensure that no healing is in progress before moving on to the next node. This RFE asks for heketi-cli and/or openshift-ansible functionality that reports whether the Gluster pool for CNS is healing, and for that check to tie into the openshift-ansible playbooks that upgrade the Gluster nodes, so that an upgrade does not break the cluster's consistency.

We have seen a case where a customer's OCP cluster with CNS was upgraded via openshift-ansible. Because the playbooks do not stop to check the healing state of the Gluster pool, the upgrade continued regardless and ruined data consistency, necessitating a rebuild and resulting in potential data loss.
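As an illustration of the kind of check requested here, the following is a minimal Python sketch. It assumes the input is the text printed by `gluster volume heal <volume> info`, and that each brick section starts with a "Brick ..." line and carries a "Number of entries:" line. The function names are hypothetical; they are not part of heketi-cli or openshift-ansible.

```python
def pending_heal_entries(heal_info_output):
    """Return {brick: pending_count} parsed from heal-info text.

    Assumption: the text follows the per-brick layout of
    `gluster volume heal <volume> info`. A count that is not a plain
    number (for example when a brick cannot be reached) is stored as None.
    """
    pending = {}
    brick = None
    for raw_line in heal_info_output.splitlines():
        line = raw_line.strip()
        if line.startswith("Brick "):
            # Start of a new brick section.
            brick = line[len("Brick "):]
        elif line.startswith("Number of entries:") and brick is not None:
            value = line.split(":", 1)[1].strip()
            pending[brick] = int(value) if value.isdigit() else None
    return pending


def pool_is_healed(heal_info_output):
    """True only if at least one brick was reported and every brick has 0 entries."""
    counts = pending_heal_entries(heal_info_output)
    return bool(counts) and all(count == 0 for count in counts.values())
```

Treating an empty or unparseable report as "not healed" is a deliberate choice in this sketch: an upgrade gate should fail closed when the health of the pool cannot be determined.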

Version-Release number of selected component (if applicable):
CNS 3.6 with OpenShift 3.7

How reproducible:
Upgrade a functional OCP 3.6 cluster with CNS 3.6 to OCP 3.7.

Steps to Reproduce:
1. Build OCP 3.6 with 3 dedicated Gluster nodes for CNS 3.6.
2. Use openshift-ansible to install CNS 3.6.
3. Upgrade the cluster to OCP 3.7 with openshift-ansible.

Actual results:
Observe that there is no stage at which a health check can sufficiently validate that the Gluster cluster has finished healing before the next node is upgraded, which results in an inconsistent Gluster cluster.

Expected results:
A health check confirms that Gluster storage is completely healthy before the next node in line is upgraded.
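The expected behaviour can be sketched as a gate around each node upgrade. This is a hedged Python illustration only, not how openshift-ansible implements it: `rolling_upgrade`, `upgrade_node`, and `is_healed` are hypothetical names, and the caller supplies both callables (for example, `is_healed` could wrap a heal-status query against the Gluster pool).

```python
import time


def wait_until_healed(is_healed, timeout, interval, sleep=time.sleep, clock=time.monotonic):
    """Poll is_healed() until it returns True, or raise TimeoutError."""
    deadline = clock() + timeout
    while not is_healed():
        if clock() >= deadline:
            raise TimeoutError("Gluster pool still healing after %s seconds" % timeout)
        sleep(interval)


def rolling_upgrade(nodes, upgrade_node, is_healed, timeout=600, interval=10, sleep=time.sleep):
    """Upgrade nodes one at a time, refusing to proceed while the pool is healing.

    upgrade_node and is_healed are caller-supplied callables (assumptions
    of this sketch). Returns the list of nodes that were upgraded.
    """
    upgraded = []
    for node in nodes:
        # Gate: do not touch this node until the pool reports healthy.
        wait_until_healed(is_healed, timeout, interval, sleep=sleep)
        upgrade_node(node)
        upgraded.append(node)
    # Final check so the run only succeeds once the last node has healed too.
    wait_until_healed(is_healed, timeout, interval, sleep=sleep)
    return upgraded
```

Raising on timeout, rather than continuing, stops the run before the next node is taken down, which is the failure mode this report describes.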

Additional info:

suggest: assign to jrivera

Comment 2 Aaren 2018-01-31 18:29:39 UTC

https://bugzilla.redhat.com/show_bug.cgi?id=1540685

Comment 3 Aaren 2018-01-31 18:30:04 UTC

(In reply to Aaren from comment #2)
> https://bugzilla.redhat.com/show_bug.cgi?id=1540685

related ^^

Comment 5 Raghavendra Talur 2019-01-23 20:27:06 UTC

Jose,

I changed the component to cns-ansible, as the bug asks for better checks before running the CNS/OCS upgrade playbooks. Triage this bug depending on the current status of the OCS upgrade playbook.

Comment 6 Jose A. Rivera 2019-01-23 22:26:41 UTC

This is already taken care of in the downstream builds.