1651754 – After bumping up of op-version we need to reboot the pods


Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	doc-Container_Native_Storage_with_OpenShift
Sub Component:
Version:	ocs-3.11
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	urgent
Target Milestone:	---
Target Release:	OCS 3.11.1
Assignee:	Chandrakanth Pai
QA Contact:	RamaKasturi
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1644168

Reported:	2018-11-20 16:58 UTC by Raghavendra Talur
Modified:	2019-02-08 13:28 UTC
CC List:	14 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-02-08 13:28:16 UTC
Embargoed:
Dependent Products:


Description Raghavendra Talur 2018-11-20 16:58:15 UTC

Description of problem:
When the op-version of a gluster cluster is upgraded, the volfiles are regenerated. After this step, the pods need to be restarted one at a time, waiting for self-heal to complete before restarting the next pod.

Comment 3 Raghavendra Talur 2018-11-21 16:37:53 UTC

There are two changes that are required:


1. The second command in step 14 of section 6.4[1] needs to be updated to the command given below. 
gluster --timeout=10800 volume set all cluster.op-version 31302

We could also add a short explanation such as "Increasing the op-version of the cluster might take a long time, depending on the number of volumes in the cluster. To prevent the CLI from timing out, we run the command with a very high timeout value of 10800 seconds."

2. After step 14, we need another step added. This step will be identical to step 11 of the same section. Preface the step with the following explanation.
When the op-version of the cluster is changed, the pods must be restarted. Restart the pods one at a time, waiting for the restarted pod to be ready and for pending self-heals to complete before restarting the next pod.
<steps copied from step 11> 


[1] https://access.redhat.com/documentation/en-us/red_hat_openshift_container_storage/3.11/html-single/deployment_guide/index#chap-Documentation-Red_Hat_Gluster_Storage_Container_Native_with_OpenShift_Platform-Upgrade-Gluster_pods
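
A minimal sketch of how the two changes above could fit together, offered as an illustration rather than as the documented procedure. The gluster command is the one given in point 1. Everything else is an assumption: the label selector glusterfs=storage-pod, the helper function names, and the sleep intervals are hypothetical and would need to be adapted to the actual deployment.

```shell
#!/bin/bash
# Sketch only: the label selector, helper names, and sleep intervals are assumptions.
GLUSTER_POD_LABEL="${GLUSTER_POD_LABEL:-glusterfs=storage-pod}"

# Read `oc get pods --no-headers` output on stdin. Succeed only if at least one
# pod is listed and every READY column (for example 1/1) shows all containers ready.
all_ready() {
    awk '{ split($2, r, "/"); if (r[1] != r[2]) bad = 1 }
         END { exit (NR == 0 || bad) ? 1 : 0 }'
}

# Read heal-info output on stdin. Succeed only if no "Number of entries" line
# reports anything other than 0. Empty input also counts as done, so the caller
# should make sure the gluster command itself succeeded.
heals_done() {
    ! grep "Number of entries: " | grep -qv "Number of entries: 0$"
}

# Name of the first gluster pod that matches the (assumed) label.
first_pod() {
    oc get pods -l "$GLUSTER_POD_LABEL" -o name | head -n 1 | sed 's|.*/||'
}

# Raise the op-version with the long CLI timeout from point 1.
bump_op_version() {
    oc exec "$(first_pod)" -- gluster --timeout=10800 volume set all cluster.op-version 31302
}

# Print heal info for every volume, run from inside the given pod.
heal_info_all() {
    oc exec "$1" -- sh -c 'for v in $(gluster volume list); do gluster volume heal "$v" info; done'
}

# Delete the pods one at a time, waiting for readiness and heals in between.
rolling_restart() {
    local pod
    for pod in $(oc get pods -l "$GLUSTER_POD_LABEL" -o name | sed 's|.*/||'); do
        oc delete pod "$pod"
        sleep 30   # give the replacement pod time to be created before polling
        until oc get pods -l "$GLUSTER_POD_LABEL" --no-headers | all_ready; do sleep 10; done
        until heal_info_all "$(first_pod)" | heals_done; do sleep 30; done
    done
}

# bump_op_version && rolling_restart   # uncomment to run against a real cluster
```

The readiness and heal checks read from stdin so they can be tried against saved command output before being pointed at a live cluster.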


Atin, Michael,

Please review.

Comment 11 RamaKasturi 2018-12-06 10:42:22 UTC

Step 16 says to restart the pods, but the text immediately after it tells the reader to delete the pod. This might confuse the user or customer who reads it.

Can we say something like the following in point i of step 16?

Pods are restarted by deleting them. To delete the pods, execute the following command.

Comment 12 RamaKasturi 2018-12-20 09:15:57 UTC

Moving this bug to the ASSIGNED state because I would like the review comments in comment 11 to be addressed.

Clearing the needinfo as I am moving the bug to the ASSIGNED state.

Comment 13 Chandrakanth Pai 2018-12-21 06:28:19 UTC

Hi Kasturi,

I have made the change suggested in comment 11.

The change can be seen in step 17.

Link to review - https://access.qa.redhat.com/documentation/en-us/red_hat_openshift_container_storage/3.10/html-single/deployment_guide/#chap-Documentation-Red_Hat_Gluster_Storage_Container_Native_with_OpenShift_Platform-Upgrade-Gluster_pods

Comment 14 RamaKasturi 2018-12-21 07:43:07 UTC

Verified in the link provided in comment 13.

The changes look good to me.

However, one thing still needs to be corrected: the command to check the volume heal info. Can you use the command below instead of what is currently in the doc?

for each_volume in `gluster volume list`; do gluster volume heal $each_volume info ; done | grep "Number of entries: [^0]$"

And point ii can be removed.
ii. Run the following command to obtain the volume names:
# gluster volume list
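
A hedged variation on the heal check above, for illustration only. In the command above, the pattern `[^0]$` matches exactly one non-zero character at the end of the line, so a count with more than one digit (for example 10 or 12) would not be printed. The sketch below instead prints every "Number of entries" line whose value is not exactly 0. The function name pending_heal_lines is hypothetical and not part of the documented procedure.

```shell
# Sketch only: pending_heal_lines is a hypothetical helper name.
# Read heal-info output on stdin and print each "Number of entries" line
# whose value is anything other than exactly 0.
pending_heal_lines() {
    grep "Number of entries: " | grep -v "Number of entries: 0$"
}

# Usage inside a gluster pod (requires the gluster CLI):
# for each_volume in $(gluster volume list); do
#     gluster volume heal "$each_volume" info
# done | pending_heal_lines
```

No output, together with a non-zero exit status from the function, means that no line reported a value other than 0.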

Comment 15 Chandrakanth Pai 2018-12-21 09:36:21 UTC

Hi Kasturi,

I have incorporated the additional feedback given in comment 14.

Link to verify - https://access.qa.redhat.com/documentation/en-us/red_hat_openshift_container_storage/3.10/html-single/deployment_guide/#chap-Documentation-Red_Hat_Gluster_Storage_Container_Native_with_OpenShift_Platform-Upgrade-Gluster_pods

Comment 16 RamaKasturi 2018-12-21 11:02:38 UTC

Verified in the link given in comment 15.

A new step 17 has been added to restart the pods one after the other after bumping up the op-version.

Restart the pods after the op-version of the cluster is changed. Wait for the restarted pod to be ready and pending self heals to be completed before proceeding to restart the next pod. Ensure that all pods are restarted one at a time and not simultaneously.

Moving this to verified.
