Bug 1989438 - expected replicas is wrong
Summary: expected replicas is wrong
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: low
Target Milestone: ---
Target Release: 4.10.0
Assignee: Prashant Balachandran
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-08-03 08:33 UTC by Junqi Zhao
Modified: 2022-03-12 04:37 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-12 04:37:00 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
cluster-monitoring-operator pod logs (218.79 KB, text/plain)
2021-08-03 08:33 UTC, Junqi Zhao


Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-monitoring-operator pull 1322 0 None None None 2021-09-06 15:38:25 UTC
Red Hat Product Errata RHSA-2022:0056 0 None None None 2022-03-12 04:37:13 UTC

Description Junqi Zhao 2021-08-03 08:33:45 UTC
Created attachment 1810385 [details]
cluster-monitoring-operator pod logs

Description of problem:
This is a negative test case: configure the openshift-state-metrics, telemeter-client, and thanos-querier pods with a nodeSelector that matches no node. In this case, no node is labeled with deploy=new.
ConfigMap:
..................
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    telemeterClient:
      nodeSelector:
        deploy: new
    openshiftStateMetrics:
      nodeSelector:
        deploy: new
    thanosQuerier:
      nodeSelector:
        deploy: new
..................
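Why the new pods stay Pending can be sketched as follows. This is a minimal illustration of nodeSelector matching (every key/value pair in the selector must appear in the node's labels), not the scheduler's actual code. The node names and labels are hypothetical; the only detail taken from this report is that no node carries deploy=new.

```python
def matches_node_selector(node_labels, node_selector):
    """Return True if every selector key/value pair is present in the node's labels."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


# Hypothetical node labels: none of them includes deploy=new, as in this report.
nodes = {
    "worker-0": {"kubernetes.io/os": "linux", "node-role.kubernetes.io/worker": ""},
    "worker-1": {"kubernetes.io/os": "linux", "node-role.kubernetes.io/worker": ""},
}

# The nodeSelector from the ConfigMap above.
selector = {"deploy": "new"}

schedulable = [name for name, labels in nodes.items()
               if matches_node_selector(labels, selector)]
print(schedulable)  # empty list: no node is eligible, so the new pods stay Pending
```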
From the cluster-monitoring-operator (CMO) logs, for example:
..................
W0803 07:55:07.333089       1 tasks.go:71] task 9 of 15: Updating openshift-state-metrics failed: reconciling openshift-state-metrics Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/openshift-state-metrics: expected 2 replicas, got 1 updated replicas
..................
We actually expect 1 replica for openshift-state-metrics, but the log reports "expected 2 replicas, got 1 updated replicas". The same happens for telemeter-client (1 replica expected, but "expected 2 replicas" reported) and thanos-querier (2 replicas expected, but "expected 3 replicas" reported).
# oc -n openshift-monitoring logs $(oc -n openshift-monitoring get po | grep cluster-monitoring-operator | awk '{print $1}') -c cluster-monitoring-operator | grep "updating Deployment object failed: waiting for DeploymentRollout"
W0803 07:44:55.214731       1 tasks.go:71] task 9 of 15: Updating openshift-state-metrics failed: reconciling openshift-state-metrics Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/openshift-state-metrics: expected 2 replicas, got 1 updated replicas
W0803 07:44:55.691078       1 tasks.go:71] task 11 of 15: Updating Telemeter client failed: reconciling Telemeter client Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/telemeter-client: expected 2 replicas, got 1 updated replicas
W0803 07:44:58.568087       1 tasks.go:71] task 13 of 15: Updating Thanos Querier failed: reconciling Thanos Querier Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/thanos-querier: expected 3 replicas, got 2 updated replicas
W0803 07:50:01.855932       1 tasks.go:71] task 9 of 15: Updating openshift-state-metrics failed: reconciling openshift-state-metrics Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/openshift-state-metrics: expected 2 replicas, got 1 updated replicas
W0803 07:50:02.513818       1 tasks.go:71] task 11 of 15: Updating Telemeter client failed: reconciling Telemeter client Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/telemeter-client: expected 2 replicas, got 1 updated replicas
W0803 07:50:04.632149       1 tasks.go:71] task 13 of 15: Updating Thanos Querier failed: reconciling Thanos Querier Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/thanos-querier: expected 3 replicas, got 2 updated replicas
W0803 07:55:07.333089       1 tasks.go:71] task 9 of 15: Updating openshift-state-metrics failed: reconciling openshift-state-metrics Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/openshift-state-metrics: expected 2 replicas, got 1 updated replicas
W0803 07:55:07.533440       1 tasks.go:71] task 11 of 15: Updating Telemeter client failed: reconciling Telemeter client Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/telemeter-client: expected 2 replicas, got 1 updated replicas
W0803 07:55:10.435813       1 tasks.go:71] task 13 of 15: Updating Thanos Querier failed: reconciling Thanos Querier Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/thanos-querier: expected 3 replicas, got 2 updated replicas
..................
# oc -n openshift-monitoring get rs | grep -E "openshift-state-metrics|telemeter-client|thanos-querier"
openshift-state-metrics-667855d8cb       1         1         0       38m
openshift-state-metrics-7bb78c4978       1         1         1       6h14m
telemeter-client-64457bfb68              1         1         1       6h2m
telemeter-client-9cbd9f797               1         1         0       38m
thanos-querier-5644d48fbd                2         2         0       38m
thanos-querier-7958b75d7                 0         0         0       79m
thanos-querier-86b84c6756                1         1         1       6h3m

# oc -n openshift-monitoring get deploy | grep -E "openshift-state-metrics|telemeter-client|thanos-querier"
openshift-state-metrics       1/1     1            1           6h15m
telemeter-client              1/1     1            1           6h3m
thanos-querier                1/2     2            1           6h4m

# oc -n openshift-monitoring get pod | grep -E "openshift-state-metrics|telemeter-client|thanos-querier"
openshift-state-metrics-667855d8cb-ht265       0/3     Pending   0          49m
openshift-state-metrics-7bb78c4978-vlvtx       3/3     Running   0          6h25m
telemeter-client-64457bfb68-2drsp              3/3     Running   0          6h14m
telemeter-client-9cbd9f797-gqmrg               0/3     Pending   0          49m
thanos-querier-5644d48fbd-7tl8j                0/5     Pending   0          49m
thanos-querier-5644d48fbd-m9bxj                0/5     Pending   0          49m
thanos-querier-86b84c6756-fvb8g                5/5     Running   0          51m
..................
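The reported numbers can be reproduced from the `oc get rs` output above. The sketch below assumes that the "expected" figure in the log is the Deployment's status.replicas (pods across all of its ReplicaSets, old and new) and that "updated" is status.updatedReplicas (pods of the ReplicaSet with the current template). It also assumes the 38m-old ReplicaSets are the ones created by the nodeSelector change. The per-ReplicaSet counts are copied from the report; everything else is illustrative.

```python
# (ReplicaSet name, desired pod count) copied from the `oc get rs` output.
replica_sets = [
    ("openshift-state-metrics-667855d8cb", 1),
    ("openshift-state-metrics-7bb78c4978", 1),
    ("telemeter-client-64457bfb68", 1),
    ("telemeter-client-9cbd9f797", 1),
    ("thanos-querier-5644d48fbd", 2),
    ("thanos-querier-7958b75d7", 0),
    ("thanos-querier-86b84c6756", 1),
]

# Assumption: the ReplicaSets aged 38m carry the new template (with the nodeSelector).
new_replica_sets = {
    "openshift-state-metrics-667855d8cb",
    "telemeter-client-9cbd9f797",
    "thanos-querier-5644d48fbd",
}


def deployment_of(rs_name):
    """Strip the trailing pod-template hash to get the owning Deployment name."""
    return rs_name.rsplit("-", 1)[0]


totals = {}   # stands in for status.replicas
updated = {}  # stands in for status.updatedReplicas
for name, pods in replica_sets:
    deploy = deployment_of(name)
    totals[deploy] = totals.get(deploy, 0) + pods
    if name in new_replica_sets:
        updated[deploy] = updated.get(deploy, 0) + pods

for deploy in totals:
    print(f"{deploy}: expected {totals[deploy]} replicas, got {updated[deploy]} updated replicas")
```

Under these assumptions the totals come out as 2, 2, and 3, which matches the log lines even though the configured replica counts are 1, 1, and 2. The extra pod in each case is the old pod kept running while the replacement is Pending.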
# oc get co monitoring -oyaml
...
  - lastTransitionTime: "2021-08-03T08:15:33Z"
    message: Rolling out the stack.
    reason: RollOutInProgress
    status: "True"
    type: Progressing
  - lastTransitionTime: "2021-08-03T07:55:10Z"
    message: |-
      Failed to rollout the stack. Error: updating openshift-state-metrics: reconciling openshift-state-metrics Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/openshift-state-metrics: expected 2 replicas, got 1 updated replicas
      updating telemeter client: reconciling Telemeter client Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/telemeter-client: expected 2 replicas, got 1 updated replicas
      updating thanos querier: reconciling Thanos Querier Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/thanos-querier: expected 3 replicas, got 2 updated replicas
    reason: MultipleTasksFailed
    status: "True"
    type: Degraded


Version-Release number of selected component (if applicable):
4.9.0-0.nightly-2021-08-02-145924

How reproducible:
always

Steps to Reproduce:
1. See the description.

Actual results:


Expected results:


Additional info:

Comment 4 Junqi Zhao 2021-09-22 07:51:47 UTC
Tested with the PR; the error message now looks correct:
# oc -n openshift-monitoring get pod | grep -E "openshift-state-metrics|telemeter-client|thanos-querier"
openshift-state-metrics-59dc557c86-6jcfb       3/3     Running   0             69m
openshift-state-metrics-b8557c78d-dxzq4        0/3     Pending   0             15m
telemeter-client-584b7d88d8-b4h2h              0/3     Pending   0             15m
telemeter-client-b48ddbc69-w8rbp               3/3     Running   0             69m
thanos-querier-5696fff86b-5lzpp                0/5     Pending   0             15m
thanos-querier-5696fff86b-v7bkm                0/5     Pending   0             15m
thanos-querier-7589f7578d-cqdx6                5/5     Running   0             63m

# oc get co monitoring -oyaml
...
  - lastTransitionTime: "2021-09-22T07:45:04Z"
    message: |-
      Failed to rollout the stack. Error: updating openshift-state-metrics: reconciling openshift-state-metrics Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/openshift-state-metrics: the number of pods targeted by the deployment (2 pods) is different from the number of pods targeted by the deployment that have the desired template spec (1 pods)
      updating telemeter client: reconciling Telemeter client Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/telemeter-client: the number of pods targeted by the deployment (2 pods) is different from the number of pods targeted by the deployment that have the desired template spec (1 pods)
      updating thanos querier: reconciling Thanos Querier Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/thanos-querier: the number of pods targeted by the deployment (3 pods) is different from the number of pods targeted by the deployment that have the desired template spec (2 pods)
    reason: MultipleTasksFailed
    status: "True"
    type: Degraded
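The reworded message can be read as a comparison of two Deployment status fields. The following is a Python sketch of such a check, not the operator's actual Go code: the function and class names are hypothetical, and the comparison of status.replicas against status.updatedReplicas is inferred from the wording of the message above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeploymentStatus:
    replicas: int          # status.replicas: pods targeted by the deployment
    updated_replicas: int  # status.updatedReplicas: pods with the desired template spec


def check_rollout(namespace: str, name: str, status: DeploymentStatus) -> Optional[str]:
    """Return an error message while old and new pods coexist, else None."""
    if status.updated_replicas != status.replicas:
        return (
            f"waiting for DeploymentRollout of {namespace}/{name}: "
            f"the number of pods targeted by the deployment ({status.replicas} pods) "
            f"is different from the number of pods targeted by the deployment "
            f"that have the desired template spec ({status.updated_replicas} pods)"
        )
    return None


# Values taken from the thanos-querier line in the status above.
message = check_rollout("openshift-monitoring", "thanos-querier", DeploymentStatus(3, 2))
print(message)
```

The point of the rewording is that the first number is described as what it is, a count of all pods currently targeted by the deployment, rather than as an "expected" replica count.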

Comment 10 errata-xmlrpc 2022-03-12 04:37:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

