Created attachment 1810385
cluster-monitoring-operator pod logs

Description of problem:
This is a negative case: deploy the openshift-state-metrics/telemeter-client/thanos-querier pods with a nodeSelector that no node matches. In this case, no node is labeled with deploy=new.

configmap
..................
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    telemeterClient:
      nodeSelector:
        deploy: new
    openshiftStateMetrics:
      nodeSelector:
        deploy: new
    thanosQuerier:
      nodeSelector:
        deploy: new
..................
From the CMO logs, for example:
..................
W0803 07:55:07.333089 1 tasks.go:71] task 9 of 15: Updating openshift-state-metrics failed: reconciling openshift-state-metrics Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/openshift-state-metrics: expected 2 replicas, got 1 updated replicas
..................
We actually expect 1 replica for openshift-state-metrics, but the log reports "expected 2 replicas, got 1 updated replicas".
The same happens for telemeter-client (1 replica expected, but the log reports "expected 2 replicas") and thanos-querier (2 replicas expected, but the log reports "expected 3 replicas").

# oc -n openshift-monitoring logs $(oc -n openshift-monitoring get po | grep cluster-monitoring-operator | awk '{print $1}') -c cluster-monitoring-operator | grep "updating Deployment object failed: waiting for DeploymentRollout"
W0803 07:44:55.214731 1 tasks.go:71] task 9 of 15: Updating openshift-state-metrics failed: reconciling openshift-state-metrics Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/openshift-state-metrics: expected 2 replicas, got 1 updated replicas
W0803 07:44:55.691078 1 tasks.go:71] task 11 of 15: Updating Telemeter client failed: reconciling Telemeter client Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/telemeter-client: expected 2 replicas, got 1 updated replicas
W0803 07:44:58.568087 1 tasks.go:71] task 13 of 15: Updating Thanos Querier failed: reconciling Thanos Querier Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/thanos-querier: expected 3 replicas, got 2 updated replicas
W0803 07:50:01.855932 1 tasks.go:71] task 9 of 15: Updating openshift-state-metrics failed: reconciling openshift-state-metrics Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/openshift-state-metrics: expected 2 replicas, got 1 updated replicas
W0803 07:50:02.513818 1 tasks.go:71] task 11 of 15: Updating Telemeter client failed: reconciling Telemeter client Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/telemeter-client: expected 2 replicas, got 1 updated replicas
W0803 07:50:04.632149 1 tasks.go:71] task 13 of 15: Updating Thanos Querier failed: reconciling Thanos Querier Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/thanos-querier: expected 3 replicas, got 2 updated replicas
W0803 07:55:07.333089 1 tasks.go:71] task 9 of 15: Updating openshift-state-metrics failed: reconciling openshift-state-metrics Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/openshift-state-metrics: expected 2 replicas, got 1 updated replicas
W0803 07:55:07.533440 1 tasks.go:71] task 11 of 15: Updating Telemeter client failed: reconciling Telemeter client Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/telemeter-client: expected 2 replicas, got 1 updated replicas
W0803 07:55:10.435813 1 tasks.go:71] task 13 of 15: Updating Thanos Querier failed: reconciling Thanos Querier Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/thanos-querier: expected 3 replicas, got 2 updated replicas
..................
# oc -n openshift-monitoring get rs | grep -E "openshift-state-metrics|telemeter-client|thanos-querier"
openshift-state-metrics-667855d8cb   1   1   0   38m
openshift-state-metrics-7bb78c4978   1   1   1   6h14m
telemeter-client-64457bfb68          1   1   1   6h2m
telemeter-client-9cbd9f797           1   1   0   38m
thanos-querier-5644d48fbd            2   2   0   38m
thanos-querier-7958b75d7             0   0   0   79m
thanos-querier-86b84c6756            1   1   1   6h3m

# oc -n openshift-monitoring get deploy | grep -E "openshift-state-metrics|telemeter-client|thanos-querier"
openshift-state-metrics   1/1   1   1   6h15m
telemeter-client          1/1   1   1   6h3m
thanos-querier            1/2   2   1   6h4m

# oc -n openshift-monitoring get pod | grep -E "openshift-state-metrics|telemeter-client|thanos-querier"
openshift-state-metrics-667855d8cb-ht265   0/3   Pending   0   49m
openshift-state-metrics-7bb78c4978-vlvtx   3/3   Running   0   6h25m
telemeter-client-64457bfb68-2drsp          3/3   Running   0   6h14m
telemeter-client-9cbd9f797-gqmrg           0/3   Pending   0   49m
thanos-querier-5644d48fbd-7tl8j            0/5   Pending   0   49m
thanos-querier-5644d48fbd-m9bxj            0/5   Pending   0   49m
thanos-querier-86b84c6756-fvb8g            5/5   Running   0   51m
..................
# oc get co monitoring -oyaml
...
  - lastTransitionTime: "2021-08-03T08:15:33Z"
    message: Rolling out the stack.
    reason: RollOutInProgress
    status: "True"
    type: Progressing
  - lastTransitionTime: "2021-08-03T07:55:10Z"
    message: |-
      Failed to rollout the stack. Error: updating openshift-state-metrics: reconciling openshift-state-metrics Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/openshift-state-metrics: expected 2 replicas, got 1 updated replicas
      updating telemeter client: reconciling Telemeter client Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/telemeter-client: expected 2 replicas, got 1 updated replicas
      updating thanos querier: reconciling Thanos Querier Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/thanos-querier: expected 3 replicas, got 2 updated replicas
    reason: MultipleTasksFailed
    status: "True"
    type: Degraded

Version-Release number of selected component (if applicable):
4.9.0-0.nightly-2021-08-02-145924

How reproducible:
always

Steps to Reproduce:
1. see the description
2.
3.

Actual results:

Expected results:

Additional info:
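
To illustrate why the new pods stay Pending in this negative case, here is a minimal Python sketch of nodeSelector matching: a pod can only be scheduled on a node whose labels contain every key/value pair of the selector. The function name and the node label sets are hypothetical and only for illustration; the real check is performed by the Kubernetes scheduler.

```python
def node_matches(node_selector, node_labels):
    """Return True if every selector key/value pair is present in the node's labels."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


# nodeSelector from the cluster-monitoring-config ConfigMap in this report.
selector = {"deploy": "new"}

# Hypothetical node labels: none of them carries deploy=new.
nodes = {
    "worker-a": {"kubernetes.io/os": "linux"},
    "worker-b": {"kubernetes.io/os": "linux", "deploy": "old"},
}

# Names of nodes on which a pod with this selector could be scheduled.
schedulable = [name for name, labels in nodes.items() if node_matches(selector, labels)]
print(schedulable)
```

With no matching node the list is empty, so the pods of the new ReplicaSet remain Pending while the pods of the old ReplicaSet keep running.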
Tested with the PR; the error now looks normal.

# oc -n openshift-monitoring get pod | grep -E "openshift-state-metrics|telemeter-client|thanos-querier"
openshift-state-metrics-59dc557c86-6jcfb   3/3   Running   0   69m
openshift-state-metrics-b8557c78d-dxzq4    0/3   Pending   0   15m
telemeter-client-584b7d88d8-b4h2h          0/3   Pending   0   15m
telemeter-client-b48ddbc69-w8rbp           3/3   Running   0   69m
thanos-querier-5696fff86b-5lzpp            0/5   Pending   0   15m
thanos-querier-5696fff86b-v7bkm            0/5   Pending   0   15m
thanos-querier-7589f7578d-cqdx6            5/5   Running   0   63m

# oc get co monitoring -oyaml
...
  - lastTransitionTime: "2021-09-22T07:45:04Z"
    message: |-
      Failed to rollout the stack. Error: updating openshift-state-metrics: reconciling openshift-state-metrics Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/openshift-state-metrics: the number of pods targeted by the deployment (2 pods) is different from the number of pods targeted by the deployment that have the desired template spec (1 pods)
      updating telemeter client: reconciling Telemeter client Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/telemeter-client: the number of pods targeted by the deployment (2 pods) is different from the number of pods targeted by the deployment that have the desired template spec (1 pods)
      updating thanos querier: reconciling Thanos Querier Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/thanos-querier: the number of pods targeted by the deployment (3 pods) is different from the number of pods targeted by the deployment that have the desired template spec (2 pods)
    reason: MultipleTasksFailed
    status: "True"
    type: Degraded
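
The numbers in both versions of the message can be reproduced from the pod listings in this report: the first number is the total of pods across the old and new ReplicaSets, and the second is only the pods of the new ReplicaSet. Below is a rough Python sketch, assuming the check compares the Deployment's status.replicas with status.updatedReplicas. That is inferred from the message wording, which matches the Kubernetes API descriptions of those two fields, and is not taken from the CMO source. The function names are hypothetical.

```python
def rollout_counts(old_rs_pods, new_rs_pods):
    """Return (replicas, updated_replicas) as a Deployment status would report them.

    replicas counts all non-terminated pods of the Deployment (old and new
    ReplicaSets); updated_replicas counts only pods with the desired template.
    """
    return old_rs_pods + new_rs_pods, new_rs_pods


def old_message(replicas, updated):
    # Wording seen in the CMO log before the fix.
    return f"expected {replicas} replicas, got {updated} updated replicas"


def new_message(replicas, updated):
    # Wording seen in the Degraded condition after the fix.
    return (
        f"the number of pods targeted by the deployment ({replicas} pods) "
        "is different from the number of pods targeted by the deployment "
        f"that have the desired template spec ({updated} pods)"
    )


# Pod counts (old ReplicaSet, new ReplicaSet) taken from the listings in this report.
stuck = {
    "openshift-state-metrics": (1, 1),
    "telemeter-client": (1, 1),
    "thanos-querier": (1, 2),
}

for name, (old_pods, new_pods) in stuck.items():
    replicas, updated = rollout_counts(old_pods, new_pods)
    print(f"{name}: {old_message(replicas, updated)}")
    print(f"{name}: {new_message(replicas, updated)}")
```

Read this way, the old "expected 2 replicas" was never the configured replica count; it was the running old pod plus the Pending new pod, which the reworded message now states explicitly.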
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056