Description of problem:
prometheus-adapter becomes inaccessible during rollout. Since prometheus-adapter is updated every 15 days [1], it is rolled out each time. While the rollout is in progress, no data can be obtained from prometheus-adapter:

```
the server is currently unable to handle the request (get pods.metrics.k8s.io)
```

[1] https://github.com/openshift/cluster-kube-apiserver-operator/blob/release-4.8/pkg/operator/certrotationcontroller/certrotationcontroller.go#L131-L133

```
openshift-cluster-machine-approver   machine-approver-58488dbb64-dlb4f      2/2   Running   0   26d
openshift-monitoring                 prometheus-adapter-7c878c4b69-mkhhd    1/1   Running   0   11d   <== here
openshift-monitoring                 prometheus-adapter-7c878c4b69-rmcdk    1/1   Running   0   11d   <== here
openshift-monitoring                 prometheus-k8s-0                       7/7   Running   8   26d
openshift-monitoring                 prometheus-k8s-1                       7/7   Running   8   26d
openshift-monitoring                 prometheus-operator-5f86995d86-dclzv   2/2   Running   3   26d
```

Version-Release number of selected component (if applicable):
OCP 4.8

How reproducible:
Every time

Steps to Reproduce:
1. Try to get data from the prometheus-adapter while the rollout takes place

Actual results:
Data is temporarily unavailable.

Expected results:
Since there are multiple prometheus-adapter pods, data should remain available during the rollout.

Additional info:
This may be a timing-dependent issue in the rollout. Because the prometheus-adapter does not have any probes (readiness/startup), a pod is reported ready before its data is available, and the problem can be seen during that window.
I could reproduce the problem by simulating a rollout of the prometheus-adapter deployment. As described in the report, I believe the absence of liveness/readiness probes is causing the loss of service during rollout.
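A simulated rollout like the one mentioned above could be scripted roughly as follows. This is only a sketch, not the exact steps used in this bug: it assumes `oc` is logged in to the cluster, that the deployment is `prometheus-adapter` in `openshift-monitoring`, and that querying the metrics API with `oc get --raw` is an acceptable stand-in for whatever client hit the error. The helper names (`fetch_pod_metrics`, `poll`, `restart_adapter`) and the polling parameters are hypothetical.

```python
import subprocess
import time
from typing import Callable, List, Tuple

# Assumed query path; any pods.metrics.k8s.io request should behave the same way.
METRICS_PATH = "/apis/metrics.k8s.io/v1beta1/namespaces/openshift-monitoring/pods"


def fetch_pod_metrics() -> Tuple[bool, str]:
    """Query the metrics API once and return (ok, error text)."""
    result = subprocess.run(
        ["oc", "get", "--raw", METRICS_PATH],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stderr.strip()


def poll(
    fetch: Callable[[], Tuple[bool, str]],
    attempts: int,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[int, str]]:
    """Call fetch() `attempts` times and collect (attempt index, error) for failures."""
    failures = []
    for attempt in range(attempts):
        ok, err = fetch()
        if not ok:
            failures.append((attempt, err))
        sleep(interval)
    return failures


def restart_adapter() -> None:
    """Trigger a rollout of the prometheus-adapter deployment (assumed name/namespace)."""
    subprocess.run(
        ["oc", "-n", "openshift-monitoring", "rollout", "restart",
         "deployment/prometheus-adapter"],
        check=True,
    )


if __name__ == "__main__":
    restart_adapter()
    # Poll for roughly two minutes while the new pods come up.
    for attempt, err in poll(fetch_pod_metrics, attempts=120):
        print(f"attempt {attempt}: {err}")
```

If the rollout causes a gap in service, the failed attempts are printed together with the error text returned by `oc`; an empty output means every request during the polling window succeeded.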
Issue still exists in payload 4.11.0-0.nightly-2022-04-01-172551
There are no accepted payloads at 4.11.0-0.nightly-2022-04-04-224437 or later for now.
Tested with payload 4.11.0-0.nightly-2022-04-06-213816, following the steps in #c3. The issue no longer occurs.
*** Bug 2099373 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069