Bug 1563690 - [3.7] GlusterFS pods fail to deploy without restarting OpenShift services
Summary: [3.7] GlusterFS pods fail to deploy without restarting OpenShift services
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: CNS-deployment
Version: rhgs-3.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: CNS 3.10
Assignee: Jose A. Rivera
QA Contact: Prasanth
Depends On:
Reported: 2018-04-04 13:10 UTC by mmariyan
Modified: 2018-11-08 08:46 UTC
CC List: 14 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2018-05-16 11:12:48 UTC
Target Upstream Version:


Description mmariyan 2018-04-04 13:10:32 UTC
Description of problem:

Fresh installation on OCP 3.7

1> The customer noted that the GlusterFS installation playbook failed at the "Wait for GlusterFS pods" task, which is defined in /usr/share/ansible/openshift-ansible/roles/openshift_storage_glusterfs/tasks/glusterfs_deploy.yml. After making the change below, the playbook ran successfully:


#  retries: "{{ (glusterfs_timeout | int / 10) | int }}"
   retries: 150
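The commented-out formula turns glusterfs_timeout into a retry count, with the task polling roughly every 10 seconds. A minimal sketch of that arithmetic, assuming a default glusterfs_timeout of 300 seconds (the value I believe the openshift-ansible role ships with; verify against your installed version):

```shell
# Sketch of the retry arithmetic (values are assumptions; check your role/inventory).
glusterfs_timeout=300                          # assumed role default, in seconds
default_retries=$(( glusterfs_timeout / 10 ))  # the commented-out formula: 30 retries
hardcoded_retries=150                          # the customer's hardcoded override

# Each retry waits ~10 seconds, so:
echo "default:  ${default_retries} retries (~$(( default_retries * 10 / 60 )) min)"
echo "override: ${hardcoded_retries} retries (~$(( hardcoded_retries * 10 / 60 )) min)"
```

In other words, the hardcoded 150 raised the wait from roughly 5 minutes to roughly 25 minutes. Rather than editing the role file, the same effect can likely be achieved by raising the timeout in the inventory (for example, openshift_storage_glusterfs_timeout=1500); treat that variable name as an assumption and confirm it against your openshift-ansible version.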

> At that point, not all of the GlusterFS pods reached the "1/1 Running" state,
so the customer restarted the docker and node services, after which the pods reached the "1/1 Running" state. Is this restart mandatory?

systemctl restart docker.service
systemctl restart atomic-openshift-node.service

After the restarts, the pods came up in the correct state:
1/1           Running

Actual results:

Playbook fails

Expected results:
The GlusterFS installation playbook should succeed on OpenShift 3.7 without requiring docker/node service restarts.

Additional info:

Comment 1 Jose A. Rivera 2018-04-04 15:18:25 UTC
No, this should not be required. I'm not sure I fully understand the problem, however: are you saying that the customer set the retries to 150, and the playbook then succeeded but the pods were not actually running?

Did you wipe the failed installation before re-running the playbooks? The GlusterFS playbooks are not idempotent, and it is not supported to run them more than once without uninstalling/wiping the failed deployment first. If you did do a second run, I suspect the playbooks detected the extant pods and skipped over the task that initially gave you problems. I don't know how they would have completed without heketi, however.

Please inspect the logs found in /var/log/glusterfs/glusterd.log as well as systemctl/journalctl logs for docker and atomic-openshift-node for additional information.
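The log inspection suggested above can be sketched as follows (a hedged sketch: the log path and unit names are taken from this comment and may differ on your hosts, and with containerized GlusterFS the glusterd log lives inside the pod):

```shell
# Hedged sketch of the suggested log collection; the guards keep it safe
# on hosts where a file or command is missing.
GLUSTERD_LOG=/var/log/glusterfs/glusterd.log

# glusterd log (for containerized GlusterFS, check inside the pod via `oc rsh`)
if [ -f "$GLUSTERD_LOG" ]; then
    tail -n 100 "$GLUSTERD_LOG"
fi

# node-side service logs around the failure window
if command -v journalctl >/dev/null 2>&1; then
    journalctl -u docker.service --no-pager | tail -n 100
    journalctl -u atomic-openshift-node.service --no-pager | tail -n 100
fi
```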

Updating the summary line to a more useful sentence.
