Bug 1726608
| Summary: | [RFE] Limit the number of retries for pre-requisite tasks in upgrade playbook | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Nitin Goyal <nigoyal> |
| Component: | cns-ansible | Assignee: | John Mulligan <jmulligan> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Prasanth <pprakash> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | ocs-3.11 | CC: | arukumar, dpivonka, hchiramm, jarrpa, knarra, kramdoss, madam, pasik, rhs-bugs, rtalur, sarumuga |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | OCS 3.11.z Batch Update 4 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | openshift-ansible-3.11.147-1 | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-02-13 05:22:03 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1703695 | | |
| Attachments: | | | |

Description
Nitin Goyal
2019-07-03 09:28:39 UTC
Created attachment 1587013
Ansible logs showing the last task being retried again and again.

You can set "openshift_storage_glusterfs_timeout" to a smaller interval if so desired. I do not want to change the default.

PR posted upstream: https://github.com/openshift/openshift-ansible/pull/11777
Also fixes bz#1728184.

The PR above is merged. Fixed In Version updated.

This bug can be verified by putting the cluster in a bad state, such as unhealed volumes or full bricks. If the bug is fixed, the playbook should try the health check 3 times and then fail. The 3 comes from the variable openshift_storage_glusterfs_timeout, which defaults to 30 and is then divided by 10, resulting in 3 retries. If changed, openshift_storage_glusterfs_timeout should be a multiple of 10.

That is not the expected result. Setting openshift_storage_glusterfs_timeout to 50, for example, should result in 5 retries.

Based on comment 24, I am moving this bug to the fail-qa state.

The variable that changes the number of health-check retries is openshift_storage_glusterfs_health_timeout, not openshift_storage_glusterfs_timeout. This variable is still overwritten to 30 here: https://github.com/openshift/openshift-ansible/blob/5218f3c57ea9b5b5570bf0bc61b9bfea6df0632d/roles/openshift_storage_glusterfs/tasks/glusterfs_upgrade.yml#L12 and its default value is 1200 here: https://github.com/openshift/openshift-ansible/blob/5218f3c57ea9b5b5570bf0bc61b9bfea6df0632d/roles/openshift_storage_glusterfs/defaults/main.yml#L27
I will open a PR to change the default to 30 and remove the overwrite. That should resolve this.

PR merged; Fixed In Version updated.

I have verified the bug and am moving it to the verified state.
Snippet of ansible logs and inventory file arguments are as follows:

version:
========
[root@master ~]# rpm -qa|grep ansible
openshift-ansible-playbooks-3.11.147-1.git.0.bd6c010.el7.noarch
ansible-2.6.19-1.el7ae.noarch
openshift-ansible-3.11.147-1.git.0.bd6c010.el7.noarch
openshift-ansible-roles-3.11.147-1.git.0.bd6c010.el7.noarch
openshift-ansible-docs-3.11.147-1.git.0.bd6c010.el7.noarch

case 1: openshift_storage_glusterfs_timeout=70
==============================================
inventory file arguments:
--------------------------
openshift_storage_glusterfs_block_host_vol_create=true
openshift_storage_glusterfs_block_host_vol_size=100
openshift_storage_glusterfs_health_timeout=70
openshift_storage_gluster_update_techpreview=true

attempts=7 (passed), ansible logs:
---------------------------------
2019-10-07 12:56:26,245 p=100682 u=root | Using module file /usr/share/ansible/openshift-ansible/roles/lib_utils/library/glusterfs_check_containerized.py
2019-10-07 12:56:29,208 p=100682 u=root | fatal: [master -> master]: FAILED! => {
    "attempts": 7,
    "changed": false,
    "invocation": {
        "module_args": {
            "check_bricks": true,
            "cluster_name": "storage",
            "exclude_node": "master",
            "oc_bin": "oc",
            "oc_conf": "/etc/origin/master/admin.kubeconfig",
            "oc_namespace": "glusterfs",
            "target_nodes": null
        }
    },
    "msg": "volume vol_82011932030d7bc34672f20a537b35d3 is not ready",
    "state": "unknown"
}

case 2: openshift_storage_glusterfs_timeout=20
==============================================
inventory file arguments:
--------------------------
openshift_storage_glusterfs_block_host_vol_size=100
openshift_storage_glusterfs_health_timeout=20
openshift_storage_gluster_update_techpreview=true

attempts=2 (passed), ansible logs:
---------------------------------
2019-10-07 13:02:28,038 p=113347 u=root | Using module file /usr/share/ansible/openshift-ansible/roles/lib_utils/library/glusterfs_check_containerized.py
2019-10-07 13:02:30,989 p=113347 u=root | fatal: [master -> master]: FAILED! => {
    "attempts": 2,
    "changed": false,
    "invocation": {
        "module_args": {
            "check_bricks": true,
            "cluster_name": "storage",
            "exclude_node": "master",
            "oc_bin": "oc",
            "oc_conf": "/etc/origin/master/admin.kubeconfig",
            "oc_namespace": "glusterfs",
            "target_nodes": null
        }
    },
    "msg": "volume vol_82011932030d7bc34672f20a537b35d3 is not ready",
    "state": "unknown"
}

case 3: variable not mentioned in the inventory file
====================================================
attempts=3 (default value, passed), ansible logs:
-------------------------------------------------
2019-10-07 13:17:14,103 p=2657 u=root | Using module file /usr/share/ansible/openshift-ansible/roles/lib_utils/library/glusterfs_check_containerized.py
2019-10-07 13:17:17,028 p=2657 u=root | fatal: [master -> master]: FAILED! => {
    "attempts": 3,
    "changed": false,
    "invocation": {
        "module_args": {
            "check_bricks": true,
            "cluster_name": "storage",
            "exclude_node": "master",
            "oc_bin": "oc",
            "oc_conf": "/etc/origin/master/admin.kubeconfig",
            "oc_namespace": "glusterfs",
            "target_nodes": null
        }
    },
    "msg": "volume vol_82011932030d7bc34672f20a537b35d3 is not ready",
    "state": "unknown"
}
2019-10-07 13:17:17,036 p=2657 u=root | to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/openshift-glusterfs/upgrade.retry
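
For anyone re-running this verification, the relationship described in the comments (the health timeout divided by 10 gives the number of health-check attempts) can be sanity-checked against the three cases above. The following is only a minimal sketch: the function name `expected_attempts` and the constants are hypothetical, the 10-second divisor is taken from the comments rather than from the role's task file, and values that are not a multiple of 10 are rejected rather than guessing how the playbook would round them.

```python
# Hypothetical helper, not part of openshift-ansible.
# Assumption (from the bug comments): attempts = health timeout / 10.
ASSUMED_DIVISOR = 10

# Assumption: default of openshift_storage_glusterfs_health_timeout after the
# fix, as stated in the comments and matched by case 3 (attempts=3).
DEFAULT_HEALTH_TIMEOUT = 30


def expected_attempts(health_timeout=DEFAULT_HEALTH_TIMEOUT):
    """Return the number of health-check attempts expected for a timeout."""
    if health_timeout <= 0 or health_timeout % ASSUMED_DIVISOR != 0:
        # The comments ask for a multiple of 10; other values are not covered
        # by the bug report, so this sketch refuses them instead of rounding.
        raise ValueError("health timeout should be a positive multiple of 10")
    return health_timeout // ASSUMED_DIVISOR


# Timeout used in each verification case mapped to the "attempts" value
# reported in the corresponding Ansible failure output.
observed = {70: 7, 20: 2, DEFAULT_HEALTH_TIMEOUT: 3}

for timeout, attempts in observed.items():
    assert expected_attempts(timeout) == attempts
    print(f"timeout={timeout} attempts={attempts}")
```

The same check applies to the value discussed earlier in the thread: under this assumption a timeout of 50 would correspond to 5 attempts.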