Bug 1866782
| Summary: | deploy replicas number is changed wrongly under some conditions | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Junqi Zhao <juzhao> |
| Component: | kube-scheduler | Assignee: | Tomáš Nožička <tnozicka> |
| Status: | CLOSED DUPLICATE | QA Contact: | RamaKasturi <knarra> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 4.6 | CC: | aos-bugs, mfojtik |
| Target Milestone: | --- | ||
| Target Release: | 4.6.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Last Closed: | 2020-08-21 11:49:42 UTC | Type: | Bug |
Hit a similar issue when upgrading from 4.5.5-x86_64 -> 4.6.0-0.nightly-2020-08-05-103641 on matrix 23_IPI on OSP13 & FIPS on & OVN. The error details are below:
[ramakasturinarra@dhcp35-60 ~]$ oc describe co/monitoring
Name: monitoring
Namespace:
Labels: <none>
Annotations: <none>
API Version: config.openshift.io/v1
Kind: ClusterOperator
Metadata:
Creation Timestamp: 2020-08-06T07:08:53Z
Generation: 1
Managed Fields:
API Version: config.openshift.io/v1
Fields Type: FieldsV1
fieldsV1:
f:spec:
f:status:
.:
f:extension:
Manager: cluster-version-operator
Operation: Update
Time: 2020-08-06T07:08:53Z
API Version: config.openshift.io/v1
Fields Type: FieldsV1
fieldsV1:
f:status:
f:conditions:
f:relatedObjects:
f:versions:
Manager: operator
Operation: Update
Time: 2020-08-06T09:57:09Z
Resource Version: 132621
Self Link: /apis/config.openshift.io/v1/clusteroperators/monitoring
UID: 22961c8a-e86f-4300-b128-79a4ad5554d3
Spec:
Status:
Conditions:
Last Transition Time: 2020-08-06T09:57:09Z
Message: Rollout of the monitoring stack is in progress. Please wait until it finishes.
Reason: RollOutInProgress
Status: True
Type: Upgradeable
Last Transition Time: 2020-08-06T08:50:08Z
Status: False
Type: Available
Last Transition Time: 2020-08-06T09:57:09Z
Message: Rolling out the stack.
Reason: RollOutInProgress
Status: True
Type: Progressing
Last Transition Time: 2020-08-06T08:50:08Z
Message: Failed to rollout the stack. Error: running task Updating Prometheus Operator failed: reconciling Prometheus Operator Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator: expected 2 replicas, got 1 updated replicas
Reason: UpdatingPrometheusOperatorFailed
Status: True
Type: Degraded
Extension: <nil>
Related Objects:
Group:
Name: openshift-monitoring
Resource: namespaces
Group: monitoring.coreos.com
Name:
Resource: servicemonitors
Group: monitoring.coreos.com
Name:
Resource: prometheusrules
Group: monitoring.coreos.com
Name:
Resource: alertmanagers
Group: monitoring.coreos.com
Name:
Resource: prometheuses
Versions:
Name: operator
Version: 4.6.0-0.nightly-2020-08-05-103641
Events: <none>
[ramakasturinarra@dhcp35-60 ~]$ oc describe co/monitoring
Name: monitoring
Namespace:
Labels: <none>
Annotations: <none>
API Version: config.openshift.io/v1
Kind: ClusterOperator
Metadata:
Creation Timestamp: 2020-08-06T07:08:53Z
Generation: 1
Managed Fields:
API Version: config.openshift.io/v1
Fields Type: FieldsV1
fieldsV1:
f:spec:
f:status:
.:
f:extension:
Manager: cluster-version-operator
Operation: Update
Time: 2020-08-06T07:08:53Z
API Version: config.openshift.io/v1
Fields Type: FieldsV1
fieldsV1:
f:status:
f:conditions:
f:relatedObjects:
f:versions:
Manager: operator
Operation: Update
Time: 2020-08-06T15:32:16Z
Resource Version: 481839
Self Link: /apis/config.openshift.io/v1/clusteroperators/monitoring
UID: 22961c8a-e86f-4300-b128-79a4ad5554d3
Spec:
Status:
Conditions:
Last Transition Time: 2020-08-06T15:32:16Z
Message: Rolling out the stack.
Reason: RollOutInProgress
Status: True
Type: Progressing
Last Transition Time: 2020-08-06T08:50:08Z
Message: Failed to rollout the stack. Error: running task Updating Prometheus Operator failed: reconciling Prometheus Operator Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator: expected 2 replicas, got 1 updated replicas
Reason: UpdatingPrometheusOperatorFailed
Status: True
Type: Degraded
Last Transition Time: 2020-08-06T15:32:16Z
Message: Rollout of the monitoring stack is in progress. Please wait until it finishes.
Reason: RollOutInProgress
Status: True
Type: Upgradeable
Last Transition Time: 2020-08-06T08:50:08Z
Status: False
Type: Available
Extension: <nil>
Related Objects:
Group:
Name: openshift-monitoring
Resource: namespaces
Group: monitoring.coreos.com
Name:
Resource: servicemonitors
Group: monitoring.coreos.com
Name:
Resource: prometheusrules
Group: monitoring.coreos.com
Name:
Resource: alertmanagers
Group: monitoring.coreos.com
Name:
Resource: prometheuses
Versions:
Name: operator
Version: 4.6.0-0.nightly-2020-08-05-103641
Events: <none>
It would be good to get the KCM logs when this happens.
KCM gets Unauthorized and doesn't react for some time until KAS lets it proceed. This is being investigated in bug 1868750.
*** This bug has been marked as a duplicate of bug 1868750 ***
Description of problem:
During an upgrade from 4.5.5 to 4.6.0-0.nightly-2020-08-05-103641, a network issue left the monitoring cluster operator Degraded. Its message says it expected 2 prometheus-operator replicas, but only 1 is needed: .spec.replicas of the prometheus-operator deployment is 1.
# oc get co/monitoring -oyaml
...
status:
  conditions:
  - lastTransitionTime: "2020-08-06T08:50:08Z"
    message: 'Failed to rollout the stack. Error: running task Updating Prometheus Operator failed: reconciling Prometheus Operator Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator: expected 2 replicas, got 1 updated replicas'
    reason: UpdatingPrometheusOperatorFailed
    status: "True"
    type: Degraded
...
# oc -n openshift-monitoring get deploy prometheus-operator -oyaml
...
spec:
  progressDeadlineSeconds: 600
  replicas: 1
...
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2020-08-06T07:17:52Z"
    lastUpdateTime: "2020-08-06T07:17:52Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2020-08-06T08:55:09Z"
    lastUpdateTime: "2020-08-06T08:55:09Z"
    message: ReplicaSet "prometheus-operator-7896ccc77c" has timed out progressing.
    reason: ProgressDeadlineExceeded
    status: "False"
    type: Progressing
  observedGeneration: 57
  readyReplicas: 1
  replicas: 2
  unavailableReplicas: 1
  updatedReplicas: 1
...
# oc -n openshift-monitoring get pod | grep prometheus-operator
prometheus-operator-7795d56f7-d84bb 2/2 Running 0 94m
prometheus-operator-7896ccc77c-jv2tx 0/2 ContainerCreating 0 89m
# oc -n openshift-monitoring describe pod prometheus-operator-7896ccc77c-jv2tx
Warning FailedCreatePodSandBox <invalid> (x45 over 63m) kubelet, kasturi-upg1-5hljl-master-2 (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_prometheus-operator-7896ccc77c-jv2tx_openshift-monitoring_a8c431f8-4235-42a4-bd2b-16364ade8fbd_0(aeb95da994e35dcea7be7a4122ec3ddca7afc909463de399377e2b26d72ef907): Multus: [openshift-monitoring/prometheus-operator-7896ccc77c-jv2tx]: PollImmediate error waiting for ReadinessIndicatorFile: timed out waiting for the condition
# oc -n openshift-monitoring get event | grep prometheus-operator-7795d56f7
118m Normal Scheduled pod/prometheus-operator-7795d56f7-d84bb Successfully assigned openshift-monitoring/prometheus-operator-7795d56f7-d84bb to kasturi-upg1-5hljl-master-2
118m Normal AddedInterface pod/prometheus-operator-7795d56f7-d84bb Add eth0 [10.129.0.13/23]
118m Normal Pulling pod/prometheus-operator-7795d56f7-d84bb Pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:848dfc960804c25aef2ec9f45f0c9d236dc1616879785384d5111f14c70dd52c"
118m Normal Pulled pod/prometheus-operator-7795d56f7-d84bb Successfully pulled image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:848dfc960804c25aef2ec9f45f0c9d236dc1616879785384d5111f14c70dd52c"
118m Normal Created pod/prometheus-operator-7795d56f7-d84bb Created container prometheus-operator
118m Normal Started pod/prometheus-operator-7795d56f7-d84bb Started container prometheus-operator
118m Normal Pulling pod/prometheus-operator-7795d56f7-d84bb Pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7c80cd7ddfcd963384b55bf43b801f4bd551bddf560bfa2354c5195552b52f4c"
118m Normal Pulled pod/prometheus-operator-7795d56f7-d84bb Successfully pulled image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7c80cd7ddfcd963384b55bf43b801f4bd551bddf560bfa2354c5195552b52f4c"
118m Normal Created pod/prometheus-operator-7795d56f7-d84bb Created container kube-rbac-proxy
118m Normal Started pod/prometheus-operator-7795d56f7-d84bb Started container kube-rbac-proxy
118m Normal SuccessfulCreate replicaset/prometheus-operator-7795d56f7 Created pod: prometheus-operator-7795d56f7-d84bb
118m Normal ScalingReplicaSet deployment/prometheus-operator Scaled up replica set prometheus-operator-7795d56f7 to 1
# oc -n openshift-monitoring get event | grep prometheus-operator-7896ccc77c
114m Normal Scheduled pod/prometheus-operator-7896ccc77c-jv2tx Successfully assigned openshift-monitoring/prometheus-operator-7896ccc77c-jv2tx to kasturi-upg1-5hljl-master-2
112m Warning FailedCreatePodSandBox pod/prometheus-operator-7896ccc77c-jv2tx Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_prometheus-operator-7896ccc77c-jv2tx_openshift-monitoring_a8c431f8-4235-42a4-bd2b-16364ade8fbd_0(3a36a03fdd25d271bed46b1fac03f370c72535f6410b5d8985a06342ea72c9e8): Multus: [openshift-monitoring/prometheus-operator-7896ccc77c-jv2tx]: PollImmediate error waiting for ReadinessIndicatorFile: timed out waiting for the condition
111m Warning FailedCreatePodSandBox pod/prometheus-operator-7896ccc77c-jv2tx Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_prometheus-operator-7896ccc77c-jv2tx_openshift-monitoring_a8c431f8-4235-42a4-bd2b-16364ade8fbd_0(91cafe671005b7494dbfc30899fdd36e15108a7c22a307b9b65f2bb4185792a2): Multus: [openshift-monitoring/prometheus-operator-7896ccc77c-jv2tx]: PollImmediate error waiting for ReadinessIndicatorFile: timed out waiting for the condition
109m Warning FailedCreatePodSandBox pod/prometheus-operator-7896ccc77c-jv2tx Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_prometheus-operator-7896ccc77c-jv2tx_openshift-monitoring_a8c431f8-4235-42a4-bd2b-16364ade8fbd_0(b5844ffe1d629effe817a5497af382eb53315066e91f625f6692f47f8acb4f7f): Multus: [openshift-monitoring/prometheus-operator-7896ccc77c-jv2tx]: PollImmediate error waiting for ReadinessIndicatorFile: timed out waiting for the condition
108m Warning FailedCreatePodSandBox pod/prometheus-operator-7896ccc77c-jv2tx Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_prometheus-operator-7896ccc77c-jv2tx_openshift-monitoring_a8c431f8-4235-42a4-bd2b-16364ade8fbd_0(de52607f4fe180751cea839bfe337e85db52720b9ddc90141ef6330ff99b10ea): Multus: [openshift-monitoring/prometheus-operator-7896ccc77c-jv2tx]: PollImmediate error waiting for ReadinessIndicatorFile: timed out waiting for the condition
106m Warning FailedCreatePodSandBox pod/prometheus-operator-7896ccc77c-jv2tx Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_prometheus-operator-7896ccc77c-jv2tx_openshift-monitoring_a8c431f8-4235-42a4-bd2b-16364ade8fbd_0(ef67e63a9e0fac2df5615e34ee94a50c7c3d08f6435b620823bf1773bf554aef): Multus: [openshift-monitoring/prometheus-operator-7896ccc77c-jv2tx]: PollImmediate error waiting for ReadinessIndicatorFile: timed out waiting for the condition
105m Warning FailedCreatePodSandBox pod/prometheus-operator-7896ccc77c-jv2tx Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_prometheus-operator-7896ccc77c-jv2tx_openshift-monitoring_a8c431f8-4235-42a4-bd2b-16364ade8fbd_0(451838746e2515be66f900b5a25d7fe31ffa6ab49bd4e0e327b6820ed6f97955): Multus: [openshift-monitoring/prometheus-operator-7896ccc77c-jv2tx]: PollImmediate error waiting for ReadinessIndicatorFile: timed out waiting for the condition
103m Warning FailedCreatePodSandBox pod/prometheus-operator-7896ccc77c-jv2tx Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_prometheus-operator-7896ccc77c-jv2tx_openshift-monitoring_a8c431f8-4235-42a4-bd2b-16364ade8fbd_0(70d15d9acdfca080aa37de9dcff25a878b623260c5c75f1fc776c16c551fa072): Multus: [openshift-monitoring/prometheus-operator-7896ccc77c-jv2tx]: PollImmediate error waiting for ReadinessIndicatorFile: timed out waiting for the condition
102m Warning FailedCreatePodSandBox pod/prometheus-operator-7896ccc77c-jv2tx Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_prometheus-operator-7896ccc77c-jv2tx_openshift-monitoring_a8c431f8-4235-42a4-bd2b-16364ade8fbd_0(606a2733bcdf53687d6703a3e03273fb1b799d60d7706ebce7248cddb280b57b): Multus: [openshift-monitoring/prometheus-operator-7896ccc77c-jv2tx]: PollImmediate error waiting for ReadinessIndicatorFile: timed out waiting for the condition
100m Warning FailedCreatePodSandBox pod/prometheus-operator-7896ccc77c-jv2tx Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_prometheus-operator-7896ccc77c-jv2tx_openshift-monitoring_a8c431f8-4235-42a4-bd2b-16364ade8fbd_0(78152b8f85cb85c145720c8149e7e62b8bff70c179447141673f3f8b610982b7): Multus: [openshift-monitoring/prometheus-operator-7896ccc77c-jv2tx]: PollImmediate error waiting for ReadinessIndicatorFile: timed out waiting for the condition
2m19s Warning FailedCreatePodSandBox pod/prometheus-operator-7896ccc77c-jv2tx (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_prometheus-operator-7896ccc77c-jv2tx_openshift-monitoring_a8c431f8-4235-42a4-bd2b-16364ade8fbd_0(b1285ff64718d51f69941edcc53c0f9a61e78a86f956a9a8fb435fda3a200b34): Multus: [openshift-monitoring/prometheus-operator-7896ccc77c-jv2tx]: PollImmediate error waiting for ReadinessIndicatorFile: timed out waiting for the condition
114m Normal SuccessfulCreate replicaset/prometheus-operator-7896ccc77c Created pod: prometheus-operator-7896ccc77c-jv2tx
114m Normal ScalingReplicaSet deployment/prometheus-operator Scaled up replica set prometheus-operator-7896ccc77c to 1
Version-Release number of selected component (if applicable):
upgrade from 4.5.5 to 4.6.0-0.nightly-2020-08-05-103641
How reproducible:
not sure
Steps to Reproduce:
1. see the description
2.
3.
Actual results:
Expected results:
Additional info:
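One possible reading of the replica counts in the Deployment status quoted in the description: in the Kubernetes API, status.replicas is the total number of non-terminated pods selected by the Deployment across old and new ReplicaSets, while status.updatedReplicas counts only pods with the current template. During a rolling update that surges, status.replicas can therefore exceed .spec.replicas. The sketch below is hypothetical: the function names are illustrative, and it is an assumption (not confirmed in this report) that the monitoring operator compares these two status fields. It only shows that such a comparison yields the reported message from the reported values.

```python
# Values copied from the `oc get deploy prometheus-operator -oyaml` output in this report.
deployment = {
    "spec": {"replicas": 1},
    "status": {
        "replicas": 2,
        "updatedReplicas": 1,
        "readyReplicas": 1,
        "availableReplicas": 1,
        "unavailableReplicas": 1,
    },
}


def rollout_error(deploy):
    """Hypothetical rollout check (an assumption, not the operator's actual code).

    Returns an error string while pods from an older ReplicaSet still exist,
    otherwise None.
    """
    status = deploy["status"]
    total = status.get("replicas", 0)           # non-terminated pods, all ReplicaSets
    updated = status.get("updatedReplicas", 0)  # pods created from the current template
    if updated != total:
        return f"expected {total} replicas, got {updated} updated replicas"
    return None


def surge_pods(deploy):
    """Pods beyond the desired count; nonzero while a rolling update is surging."""
    return deploy["status"].get("replicas", 0) - deploy["spec"]["replicas"]


if __name__ == "__main__":
    print(rollout_error(deployment))
    print("pods above .spec.replicas:", surge_pods(deployment))
```

Under this reading, the "2" in the message would be the old pod (prometheus-operator-7795d56f7-d84bb, Running) plus the new pod (prometheus-operator-7896ccc77c-jv2tx, stuck in ContainerCreating), not a change to .spec.replicas.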