Bug 2114721
Summary: | telemeter-client pod does not use the updated pull secret when it is changed | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Karthik Perumal <kramraja> |
Component: | Monitoring | Assignee: | Joao Marcal <jmarcal> |
Status: | CLOSED ERRATA | QA Contact: | Junqi Zhao <juzhao> |
Severity: | medium | Docs Contact: | Brian Burt <bburt> |
Priority: | medium | ||
Version: | 4.10 | CC: | anpicker, bburt, jmarcal, kgordeev, spasquie, tremes, wking |
Target Milestone: | --- | Keywords: | ServiceDeliveryImpact |
Target Release: | 4.12.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: |
* Before this update, the Telemeter Client (TC) loaded new pull secrets only when it was manually restarted. As a result, if a pull secret was changed or updated and the TC had not been restarted, the TC failed to authenticate with the server. With this update, when the secret is rotated, the deployment restarts automatically and uses the updated token to authenticate.
(link:https://bugzilla.redhat.com/show_bug.cgi?id=2114721[*BZ#2114721*])
|
Story Points: | --- |
Clone Of: | | Environment: |
Last Closed: | 2023-01-17 19:54:14 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
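The Doc Text above says that when the pull secret is rotated, the deployment is restarted automatically so it picks up the new token. A common way operators achieve this is to hash the secret's contents into a pod-template annotation, so that any change to the secret changes the pod template and the Deployment controller rolls out new pods. The sketch below is a minimal Python illustration of that pattern, not the actual cluster-monitoring-operator code; the annotation key `checksum/telemeter-client-secret` is hypothetical.

```python
import hashlib
import json

def secret_checksum(data: dict) -> str:
    """Deterministic hash of a Secret's data map (keys sorted)."""
    payload = json.dumps(data, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def annotate_pod_template(deployment: dict, secret_data: dict) -> dict:
    """Stamp the secret checksum into the pod template's annotations.

    Any change to the annotation value changes the pod template, which
    makes the Deployment controller roll out new pods -- so a rotated
    secret restarts the client without manual intervention."""
    annotations = deployment["spec"]["template"]["metadata"].setdefault("annotations", {})
    annotations["checksum/telemeter-client-secret"] = secret_checksum(secret_data)
    return deployment

deployment = {"spec": {"template": {"metadata": {}}}}
old_sum = annotate_pod_template(deployment, {"token": "old-token"})[
    "spec"]["template"]["metadata"]["annotations"]["checksum/telemeter-client-secret"]
new_sum = annotate_pod_template(deployment, {"token": "rotated-token"})[
    "spec"]["template"]["metadata"]["annotations"]["checksum/telemeter-client-secret"]
assert old_sum != new_sum  # rotation changes the pod template, forcing a restart
```

Reconciling the checksum on every sync, rather than watching for a restart signal, keeps the behavior level-triggered: the deployment always reflects the current secret.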
Description (Karthik Perumal, 2022-08-03 07:01:42 UTC)
[1] looks like the monitoring operator is grabbing the pull secret once and then assuming it remains unchanged, although if it's the monitoring operator that's not noticing, I'm not clear on why comment 0's telemeter-client restart alone was sufficient to recover.

[1]: https://github.com/openshift/cluster-monitoring-operator/blob/fcc377d33b5c41bcdacecb5838ac5d60fd5010ac/pkg/operator/operator.go#L845-L857

Hello @kramraj, do you know whether attention was given to the "IMPORTANT" warning about this issue in the OpenShift docs [1]? Was that procedure followed?

[1] https://docs.openshift.com/container-platform/4.10/openshift_images/managing_images/using-image-pull-secrets.html#images-update-global-pull-secret_using-image-pull-secrets

For managed OpenShift clusters (OSD/ROSA), the ownership transfer is carried out by SRE following an internal SOP; see https://access.redhat.com/solutions/6126691. Do you know if the OCM process (for self-managed OCP clusters) includes a step, perhaps under the hood, that tells the in-cluster telemeter client to use the new, updated pull secret?

The telemeter-client pod now uses the updated pull secret when it is changed. Verification steps:

1. Dump the current pull secret:

```
# oc -n openshift-config get secret pull-secret -o jsonpath="{.data.\.dockerconfigjson}" | base64 -d
```

Change the `"cloud.openshift.com"."auth"` entry to an invalid value, base64-encode the whole pull secret, and update the `pull-secret` secret with it.

2. Wait for the telemeter pod to restart; the error appears in its logs:

```
# oc -n openshift-monitoring logs -c telemeter-client $(oc -n openshift-monitoring get pod --no-headers | grep telemeter-client | awk '{print $1}')
level=info caller=main.go:97 ts=2022-09-30T03:00:40.608472233Z msg="telemeter client initialized"
level=warn caller=forwarder.go:137 ts=2022-09-30T03:00:40.608657708Z component=forwarder msg="not anonymizing any labels"
level=info caller=main.go:292 ts=2022-09-30T03:00:40.62568095Z msg="starting telemeter-client" from=https://prometheus-k8s.openshift-monitoring.svc:9091 to=https://infogw.api.openshift.com/ listen=localhost:8080
level=error caller=forwarder.go:276 ts=2022-09-30T03:00:40.854837755Z component=forwarder/worker msg="unable to forward results" err="unable to authorize to server: unable to exchange initial token for a long lived token: 404:\nnot found\n"
level=warn caller=forwarder.go:137 ts=2022-09-30T03:00:42.968917576Z component=forwarder msg="not anonymizing any labels"
level=error caller=forwarder.go:276 ts=2022-09-30T03:00:43.112688892Z component=forwarder/worker msg="unable to forward results" err="unable to authorize to server: unable to exchange initial token for a long lived token: 404:\nnot found\n"
```

3. Update the `"cloud.openshift.com"."auth"` entry back to a valid value and wait for the telemeter pod to restart.
No errors appear in the logs:

```
# oc -n openshift-monitoring logs -c telemeter-client $(oc -n openshift-monitoring get pod --no-headers | grep telemeter-client | awk '{print $1}')
level=info caller=main.go:97 ts=2022-09-30T03:10:43.361386255Z msg="telemeter client initialized"
level=warn caller=forwarder.go:137 ts=2022-09-30T03:10:43.361537316Z component=forwarder msg="not anonymizing any labels"
level=info caller=main.go:292 ts=2022-09-30T03:10:43.380925875Z msg="starting telemeter-client" from=https://prometheus-k8s.openshift-monitoring.svc:9091 to=https://infogw.api.openshift.com/ listen=localhost:8080
level=warn caller=forwarder.go:137 ts=2022-09-30T03:10:45.699834751Z component=forwarder msg="not anonymizing any labels"
```

4. Confirm that the telemeter-client token is also updated; its value matches the `"cloud.openshift.com"."auth"` entry:

```
# oc -n openshift-monitoring get secret telemeter-client -o jsonpath="{.data.token}" | base64 -d
```

5. Check on the telemeter server that metrics can be pushed from the client to the server.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399
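Step 1 of the verification above edits the pull secret by hand: decode the `.dockerconfigjson`, change the `cloud.openshift.com` auth, and re-encode the result before updating the Secret. The decode/modify/re-encode round trip can be sketched in Python as follows; the token values are hypothetical placeholders, not real credentials.

```python
import base64
import json

def set_cloud_auth(dockerconfigjson_b64: str, new_auth: str) -> str:
    """Decode a base64 pull secret, replace the cloud.openshift.com
    auth entry, and re-encode it for updating the Secret."""
    cfg = json.loads(base64.b64decode(dockerconfigjson_b64))
    cfg["auths"]["cloud.openshift.com"]["auth"] = new_auth
    return base64.b64encode(json.dumps(cfg).encode()).decode()

# Hypothetical pull secret containing a single auth entry.
original = base64.b64encode(json.dumps(
    {"auths": {"cloud.openshift.com": {"auth": "valid-token"}}}
).encode()).decode()

# Break the auth (verification step 1) ...
broken = set_cloud_auth(original, "invalid-token")
decoded = json.loads(base64.b64decode(broken))
assert decoded["auths"]["cloud.openshift.com"]["auth"] == "invalid-token"

# ... then restore it (verification step 3).
restored = set_cloud_auth(broken, "valid-token")
assert json.loads(base64.b64decode(restored)) == json.loads(base64.b64decode(original))
```

With the fix in place, each of these updates should be followed by an automatic telemeter-client restart, as the logs in steps 2 and 3 show.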