Bug 1640382
Summary: Director deployed OCP: replacing worker node fails during TASK [openshift_storage_glusterfs : Verify heketi service]
Product: OpenShift Container Platform
Component: Installer
Installer sub component: openshift-ansible
Reporter: Marius Cornea <mcornea>
Assignee: Martin André <m.andre>
QA Contact: Johnny Liu <jialiu>
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
CC: dbecker, jtrowbri, ltomasbo, m.andre, mburns, mlopes, morazi, racedoro, tsedovic
Version: 3.11.0
Keywords: ZStream
Target Milestone: ---
Target Release: 3.11.z
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openshift-ansible-3.11.74-1.git.0.cde4c69.el7
Doc Type: Known Issue
Doc Text: |
On a director-deployed OpenShift environment, the GlusterFS playbooks auto-generate a new heketi secret key on each run.
As a result, operations such as scale out or configuration changes on CNS deployments fail.
As a workaround, complete the following steps:
1. After deployment, retrieve the heketi secret key by running this command on one of the master nodes:
   sudo oc get secret heketi-storage-admin-secret --namespace glusterfs -o json | jq -r .data.key | base64 -d
2. In an environment file, set the following parameters to that value:
   openshift_storage_glusterfs_heketi_admin_key
   openshift_storage_glusterfs_registry_heketi_admin_key
With this workaround in place, operations such as scale out or configuration changes on CNS deployments succeed, provided the parameters are set to the manually extracted key.
|
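For illustration, the `jq -r .data.key | base64 -d` pipeline in the workaround above can be mirrored in a small Python sketch. The secret JSON below is a made-up placeholder standing in for real `oc get secret ... -o json` output:

```python
import base64
import json

# Made-up stand-in for the JSON printed by:
#   oc get secret heketi-storage-admin-secret --namespace glusterfs -o json
# Kubernetes secrets store their values base64-encoded under .data.
secret_json = json.dumps(
    {"data": {"key": base64.b64encode(b"example-admin-key").decode("ascii")}}
)

# Equivalent of `jq -r .data.key | base64 -d`: pick out .data.key, then
# base64-decode it to recover the plain-text admin key.
admin_key = base64.b64decode(json.loads(secret_json)["data"]["key"]).decode("utf-8")
print(admin_key)  # -> example-admin-key
```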
Last Closed: 2019-06-26 09:07:51 UTC
Type: Bug
Description
Marius Cornea, 2018-10-17 23:23:26 UTC

Created attachment 1494991 [details]: openshift.tar.gz

Attaching /var/lib/mistral/openshift
Possibly fixed with https://review.openstack.org/#/c/611306/ ? New node detection was broken downstream due to different service naming; this patch fixed it.

Just remembered that node replacement is targeted at OSP15 (https://bugzilla.redhat.com/show_bug.cgi?id=1591288). Removing the triaged information so that the bug appears in our triage meeting.

Looks awfully similar to https://bugzilla.redhat.com/show_bug.cgi?id=1637105. Maybe one element of the response: "By default, the GlusterFS playbooks will auto-generate a new heketi secret key for each run. You need to extract the key from the heketi config secret and set it as the value for "openshift_storage_glusterfs_heketi_admin_key" in your inventory file. That will reuse the existing key in your cluster when running the scaleup playbook."

(In reply to Martin André from comment #5) Yes, it looks like the same issue, but in my opinion we can't consider this a viable solution, because it requires manual actions by the operator on the overcloud nodes. Can we do these steps automatically on the TripleO side?

I can confirm that the scale up completes if I set the openshift_storage_glusterfs_heketi_admin_key variable to the existing heketi secret. Here is a one-liner to get the heketi secret in an environment where openshift_storage_glusterfs_namespace=glusterfs (the default):

sudo oc get secret heketi-storage-admin-secret --namespace glusterfs -o json | jq -r .data.key | base64 -d

I've chatted a little with Jose A. Rivera about the issue, and this looks like a regression in openshift-ansible. Our options are:
1) Fix the issue in openshift-ansible and ship it in 3.11 in time for OSP14.
2) Document how to set the openshift_storage_glusterfs_heketi_admin_key variable for a scale up.
3) Implement a way in TripleO to retrieve the secret and inject it into openshift-ansible before the scale up operation.
Since we have a workaround, I suggest we remove the "blocker?" flag.

Submitted a fix in openshift-ansible: https://github.com/openshift/openshift-ansible/pull/10710

Removing the blocker flag because we have a workaround: https://bugzilla.redhat.com/show_bug.cgi?id=1640382#c7

Fix included in openshift-ansible-3.11.74-1.

The ose-ansible container image was updated to v3.11.82-5 on the registry and should have the fix: https://access.redhat.com/containers/?tab=tags#/registry.access.redhat.com/openshift3/ose-ansible

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1605
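For reference, a minimal, hypothetical inventory fragment implementing the documented workaround might look like the following. The <EXISTING_KEY> value is a placeholder for the output of the `oc get secret` one-liner shown in the comments above:

```ini
# Hypothetical inventory fragment: reuse the existing heketi admin key on
# scale up instead of letting the playbooks generate a new one.
# Replace <EXISTING_KEY> with the key extracted from the cluster.
[OSEv3:vars]
openshift_storage_glusterfs_heketi_admin_key=<EXISTING_KEY>
openshift_storage_glusterfs_registry_heketi_admin_key=<EXISTING_KEY>
```

Setting both parameters keeps the glusterfs and glusterfs-registry deployments in sync with the key already stored in the cluster, so subsequent playbook runs authenticate against heketi instead of regenerating the secret.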