Bug 1645656
| Summary: | Director deployed OCP 3.11: replacing an Infra node fails during TASK [openshift_storage_glusterfs : Verify heketi service] | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Marius Cornea <mcornea> |
| Component: | Installer | Assignee: | Martin André <m.andre> |
| Installer sub component: | openshift-ansible | QA Contact: | Johnny Liu <jialiu> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | urgent | | |
| Priority: | urgent | CC: | dbecker, jtrowbri, ltomasbo, m.andre, mburns, morazi |
| Version: | 3.11.0 | Keywords: | ZStream |
| Target Milestone: | --- | | |
| Target Release: | 3.11.z | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | openshift-ansible-3.11.74-1.git.0.cde4c69.el7 | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-06-26 09:07:51 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description (Marius Cornea, 2018-11-02 18:43:17 UTC)
Note: https://bugzilla.redhat.com/show_bug.cgi?id=1640382#c7 was applied as a workaround on the env.

(In reply to Marius Cornea from comment #2)
> Note: https://bugzilla.redhat.com/show_bug.cgi?id=1640382#c7 was applied as
> a workaround on the env.

For the infra node scale up, you would have to set the openshift-ansible openshift_storage_glusterfs_registry_heketi_admin_key variable instead. Now, I'm less sure about how to retrieve the heketi secret... According to the code, I would expect the secret to be named heketi-registry-admin-secret, but all I can see in my environment is a heketi-storage-admin-secret secret. Possibly the storage and registry share the same secret? Anyway, here is how to retrieve the heketi-storage-admin-secret secret:

    sudo oc get secret heketi-storage-admin-secret --namespace glusterfs -o json | jq -r .data.key | base64 -d

Scaling out the infra node indeed succeeds if I set openshift_storage_glusterfs_registry_heketi_admin_key to the output of:

    sudo oc get secret heketi-storage-admin-secret --namespace glusterfs -o json | jq -r .data.key | base64 -d

Submitted a partial fix in openshift-ansible: https://github.com/openshift/openshift-ansible/pull/10710

However, there is an issue with the name of the registry heketi secret (https://github.com/openshift/openshift-ansible/issues/10712), so the above patch does not completely fix the issue. Removing the blocker flag because we have a workaround: https://bugzilla.redhat.com/show_bug.cgi?id=1640382#c7

The workaround for this is actually in comment 4: https://bugzilla.redhat.com/show_bug.cgi?id=1645656#c4

Proposed a fix to openshift-ansible: https://github.com/openshift/openshift-ansible/pull/11072

Fix included in openshift-ansible-3.11.74-1. The ose-ansible container image was updated to v3.11.82-5 on the registry and should have the fix: https://access.redhat.com/containers/?tab=tags#/registry.access.redhat.com/openshift3/ose-ansible

    TASK [openshift_storage_glusterfs : Verify heketi service] *********************
    ok: [openshift-master-2]

    TASK [openshift_storage_glusterfs : Wait for GlusterFS pods] *******************
    FAILED - RETRYING: Wait for GlusterFS pods (30 retries left).
    FAILED - RETRYING: Wait for GlusterFS pods (29 retries left).
    FAILED - RETRYING: Wait for GlusterFS pods (28 retries left).
    FAILED - RETRYING: Wait for GlusterFS pods (27 retries left).
    FAILED - RETRYING: Wait for GlusterFS pods (26 retries left).
    ok: [openshift-master-2]

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1605
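For reference, the workaround described in the comments above can be scripted before running the infra scale-up play. This is a minimal sketch, assuming the glusterfs namespace and heketi-storage-admin-secret name shown earlier; the inventory path is illustrative, and the playbook path is the standard openshift-ansible node scale-up play, which may differ in a director-deployed environment:

    # Read the heketi admin key from the existing secret (namespace and
    # secret name as reported in this bug; adjust for your deployment).
    HEKETI_ADMIN_KEY=$(sudo oc get secret heketi-storage-admin-secret \
        --namespace glusterfs -o json | jq -r .data.key | base64 -d)

    # Pass it to the scale-up play as an extra var (inventory path is
    # illustrative)...
    ansible-playbook -i /path/to/inventory \
        playbooks/openshift-node/scaleup.yml \
        -e "openshift_storage_glusterfs_registry_heketi_admin_key=${HEKETI_ADMIN_KEY}"

    # ...or add the equivalent line under the existing [OSEv3:vars]
    # section of the inventory:
    #   openshift_storage_glusterfs_registry_heketi_admin_key=<key from above>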