Bug 2033652 - node tuning operator metrics endpoint serving old certificates after certificate rotation
Summary: node tuning operator metrics endpoint serving old certificates after certific...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node Tuning Operator
Version: 4.7
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.9.z
Assignee: dagray
QA Contact: liqcui
URL:
Whiteboard:
Depends On: 2026387
Blocks: 2039062
Reported: 2021-12-17 14:32 UTC by OpenShift BugZilla Robot
Modified: 2022-01-17 08:07 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-01-17 08:07:31 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-node-tuning-operator pull 305 0 None open [release-4.9] Bug 2033652: Handle certificate rotation in pkg/metrics/server.go 2022-01-07 20:16:52 UTC
Red Hat Product Errata RHBA-2022:0110 0 None None None 2022-01-17 08:07:49 UTC

Comment 2 liqcui 2022-01-12 02:12:49 UTC
Verified Result:
[mirroradmin@ec2-18-217-45-133 ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2022-01-11-155222   True        False         13m     Cluster version is 4.9.0-0.nightly-2022-01-11-155222

[mirroradmin@ec2-18-217-45-133 ~]$ oc project openshift-cluster-node-tuning-operator
Now using project "openshift-cluster-node-tuning-operator" on server "https://api.liqcui-oc4908.qe.gcp.devcluster.openshift.com:6443".
[mirroradmin@ec2-18-217-45-133 ~]$ oc describe service/node-tuning-operator | grep Endpoints
Endpoints:         10.129.0.23:60000

export METRICS_ENDPOINT="10.129.0.23:60000"

oc debug node/liqcui-oc4908-8kbh6-worker-a-x6sb7.c.openshift-qe.internal -- /bin/bash -c "/host/bin/openssl s_client -connect $METRICS_ENDPOINT 2>/dev/null </dev/null" | tee openssl_output_before.txt

oc debug node/liqcui-oc4908-8kbh6-worker-a-x6sb7.c.openshift-qe.internal -- /bin/bash -c "/host/bin/openssl s_client -connect $METRICS_ENDPOINT 2>/dev/null </dev/null | openssl x509 -noout -dates" | tee cert_dates_before.txt
Starting pod/liqcui-oc4908-8kbh6-worker-a-x6sb7copenshift-qeinternal-debug ...
To use host binaries, run `chroot /host`
notBefore=Jan 12 01:30:27 2022 GMT
notAfter=Jan 12 01:30:28 2024 GMT

oc delete secret/signing-key -n openshift-service-ca
secret "signing-key" deleted

oc debug node/liqcui-oc4908-8kbh6-worker-a-x6sb7.c.openshift-qe.internal -- /bin/bash -c "/host/bin/openssl s_client -connect $METRICS_ENDPOINT 2>/dev/null </dev/null | openssl x509 -noout -dates" | tee cert_dates_after.txt
Starting pod/liqcui-oc4908-8kbh6-worker-a-x6sb7copenshift-qeinternal-debug ...
To use host binaries, run `chroot /host`
notBefore=Jan 12 02:07:48 2022 GMT
notAfter=Jan 12 02:07:49 2024 GMT
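The before/after comparison above is done by eye. A minimal sketch of the same check as a script, assuming the two date files were written by the `tee` commands shown earlier; the function name `cert_rotated` is hypothetical and not part of the operator or of `oc`:

```shell
# Hypothetical helper: succeeds only when both files contain a notBefore
# line and the two values differ, i.e. the endpoint now serves a
# certificate issued at a different time.
cert_rotated() {
    before=$(grep '^notBefore=' "$1" 2>/dev/null | head -n 1)
    after=$(grep '^notBefore=' "$2" 2>/dev/null | head -n 1)
    [ -n "$before" ] && [ -n "$after" ] && [ "$before" != "$after" ]
}

# Usage (file names taken from the transcript):
#   cert_rotated cert_dates_before.txt cert_dates_after.txt \
#       && echo "certificate rotated" || echo "certificate unchanged"
```

Comparing only notBefore is a simplification: it detects re-issuance but does not prove the served certificate matches the secret, which is what the later diff step checks.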

oc get secret/node-tuning-operator-tls -o json  | jq -r '.data | ."tls.crt"' | base64 -d |  sed '/-END CERTIFICATE-/q' > cert_secret_after.txt
[mirroradmin@ec2-18-217-45-133 ~]$ diff cert_after.txt cert_secret_after.txt
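The transcript does not show how cert_after.txt was produced. One plausible way, offered only as a sketch, is to cut the PEM block out of saved `openssl s_client` output; the function name `extract_leaf_cert` and the input file name in the usage comment are assumptions:

```shell
# Hypothetical helper: print the first PEM certificate found in a file of
# saved `openssl s_client` output. The first sed keeps only lines inside
# BEGIN/END CERTIFICATE markers; the second stops after the first END
# marker, so a chain printed with -showcerts is reduced to the leaf.
extract_leaf_cert() {
    sed -n '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' "$1" \
        | sed '/-END CERTIFICATE-/q'
}

# Usage (openssl_output_after.txt is an assumed file name):
#   extract_leaf_cert openssl_output_after.txt > cert_after.txt
#   diff cert_after.txt cert_secret_after.txt && echo "served cert matches secret"
```

An empty diff, as in the transcript, means the certificate served on the metrics endpoint is byte-for-byte the one stored in the node-tuning-operator-tls secret.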

oc logs cluster-node-tuning-operator-695d4898b4-x68kk -n openshift-cluster-node-tuning-operator | tail -20
I0112 01:31:03.070293       1 controller.go:1024] started events processor/controller
I0112 01:31:03.169575       1 controller.go:482] created Tuned rendered
I0112 01:31:03.349059       1 controller.go:586] created profile liqcui-oc4908-8kbh6-master-2.c.openshift-qe.internal [openshift-control-plane]
I0112 01:31:03.375337       1 controller.go:586] created profile liqcui-oc4908-8kbh6-master-1.c.openshift-qe.internal [openshift-control-plane]
I0112 01:31:03.388824       1 controller.go:586] created profile liqcui-oc4908-8kbh6-master-0.c.openshift-qe.internal [openshift-control-plane]
E0112 01:31:03.441910       1 status.go:56] unable to update ClusterOperator: Operation cannot be fulfilled on clusteroperators.config.openshift.io "node-tuning": the object has been modified; please apply your changes to the latest version and try again
E0112 01:31:03.442015       1 controller.go:181] unable to sync(profile/openshift-cluster-node-tuning-operator/liqcui-oc4908-8kbh6-master-2.c.openshift-qe.internal) requeued (0): failed to sync Profile liqcui-oc4908-8kbh6-master-2.c.openshift-qe.internal: failed to sync OperatorStatus: Operation cannot be fulfilled on clusteroperators.config.openshift.io "node-tuning": the object has been modified; please apply your changes to the latest version and try again
I0112 01:31:15.829391       1 status.go:259] at least one Profile application failing
I0112 01:31:15.849253       1 status.go:259] at least one Profile application failing
I0112 01:31:15.892946       1 status.go:259] at least one Profile application failing
E0112 01:35:57.101494       1 leaderelection.go:330] error retrieving resource lock openshift-cluster-node-tuning-operator/node-tuning-operator-lock: the server was unable to return a response in the time allotted, but may still be processing the request (get configmaps node-tuning-operator-lock)
I0112 01:36:29.203835       1 controller.go:586] created profile liqcui-oc4908-8kbh6-worker-b-wccj6.c.openshift-qe.internal [openshift-node]
I0112 01:36:29.728934       1 controller.go:586] created profile liqcui-oc4908-8kbh6-worker-a-x6sb7.c.openshift-qe.internal [openshift-node]
I0112 01:36:30.970045       1 controller.go:586] created profile liqcui-oc4908-8kbh6-worker-c-p6n54.c.openshift-qe.internal [openshift-node]
E0112 01:36:32.824938       1 status.go:56] unable to update ClusterOperator: Operation cannot be fulfilled on clusteroperators.config.openshift.io "node-tuning": the object has been modified; please apply your changes to the latest version and try again
E0112 01:36:32.825075       1 controller.go:181] unable to sync(clusteroperator//node-tuning) requeued (0): failed to sync OperatorStatus: Operation cannot be fulfilled on clusteroperators.config.openshift.io "node-tuning": the object has been modified; please apply your changes to the latest version and try again
I0112 02:08:04.521759       1 server.go:144] cert and key changed, need to restart the server.
I0112 02:08:04.521980       1 server.go:107] restarting metrics server to rotate certificates
I0112 02:08:04.522003       1 server.go:60] stopping metrics server
I0112 02:08:04.523589       1 server.go:51] starting metrics server
[mirroradmin@ec2-18-217-45-133 ~]$ 
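The last four log lines are the evidence that the fix works: the operator notices the changed cert and key, stops the metrics server, and starts it again. A rough sketch of checking for that sequence automatically, assuming the log messages keep the wording shown above; the function name `metrics_server_restarted` is hypothetical:

```shell
# Hypothetical helper: reads operator log lines on stdin and succeeds only
# if a "restarting metrics server to rotate certificates" message is
# followed later by a "starting metrics server" message.
metrics_server_restarted() {
    awk '
        /restarting metrics server to rotate certificates/ { rotating = 1 }
        rotating && /[]] starting metrics server/          { restarted = 1 }
        END { exit restarted ? 0 : 1 }
    '
}

# Usage (pod name taken from the transcript):
#   oc logs cluster-node-tuning-operator-695d4898b4-x68kk \
#       -n openshift-cluster-node-tuning-operator | metrics_server_restarted \
#       && echo "metrics server restarted after rotation"
```

The `[]] starting` pattern anchors on the closing bracket of the klog source location so that the "restarting" line itself does not count as the restart.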


The metrics server picked up the new certificate after the CA was rotated.

Comment 5 errata-xmlrpc 2022-01-17 08:07:31 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9.15 bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0110

