Description of problem:

During an upgrade from 4.5.5 to 4.6.0-0.nightly-2020-08-05-103641, a network issue left the monitoring cluster operator Degraded: it reports that it expected 2 prometheus-operator replicas, while the Deployment only requests 1 (see spec.replicas: 1 below).

# oc get co/monitoring -oyaml
...
status:
  conditions:
  - lastTransitionTime: "2020-08-06T08:50:08Z"
    message: 'Failed to rollout the stack. Error: running task Updating Prometheus
      Operator failed: reconciling Prometheus Operator Deployment failed: updating
      Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator:
      expected 2 replicas, got 1 updated replicas'
    reason: UpdatingPrometheusOperatorFailed
    status: "True"
    type: Degraded
...

# oc -n openshift-monitoring get deploy prometheus-operator -oyaml
...
spec:
  progressDeadlineSeconds: 600
  replicas: 1
...
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2020-08-06T07:17:52Z"
    lastUpdateTime: "2020-08-06T07:17:52Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2020-08-06T08:55:09Z"
    lastUpdateTime: "2020-08-06T08:55:09Z"
    message: ReplicaSet "prometheus-operator-7896ccc77c" has timed out progressing.
    reason: ProgressDeadlineExceeded
    status: "False"
    type: Progressing
  observedGeneration: 57
  readyReplicas: 1
  replicas: 2
  unavailableReplicas: 1
  updatedReplicas: 1
...
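The "expected 2 replicas" despite spec.replicas: 1 is explained by the status block: during a RollingUpdate the Deployment surges one extra pod, so status.replicas (2) exceeds spec.replicas (1) until the new pod becomes ready. A minimal sketch of the rollout wait that yields this message, not the operator's actual Go code; the exact fields compared are an assumption, and the values come from the status above:

```shell
# Sketch only: compare total replicas (including the surge pod of the
# RollingUpdate) against replicas updated to the new ReplicaSet.
status_replicas=2      # total pods: the old pod plus the stuck surge pod
updated_replicas=1     # pods created from the new ReplicaSet
if [ "$updated_replicas" -lt "$status_replicas" ]; then
  msg="expected ${status_replicas} replicas, got ${updated_replicas} updated replicas"
fi
echo "$msg"
```

Under this reading, the rollout can never complete while the surge pod stays in ContainerCreating, which matches the ProgressDeadlineExceeded condition above.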
# oc -n openshift-monitoring get pod | grep prometheus-operator
prometheus-operator-7795d56f7-d84bb    2/2   Running             0   94m
prometheus-operator-7896ccc77c-jv2tx   0/2   ContainerCreating   0   89m

# oc -n openshift-monitoring describe pod prometheus-operator-7896ccc77c-jv2tx
  Warning  FailedCreatePodSandBox  <invalid> (x45 over 63m)  kubelet, kasturi-upg1-5hljl-master-2  (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_prometheus-operator-7896ccc77c-jv2tx_openshift-monitoring_a8c431f8-4235-42a4-bd2b-16364ade8fbd_0(aeb95da994e35dcea7be7a4122ec3ddca7afc909463de399377e2b26d72ef907): Multus: [openshift-monitoring/prometheus-operator-7896ccc77c-jv2tx]: PollImmediate error waiting for ReadinessIndicatorFile: timed out waiting for the condition

# oc -n openshift-monitoring get event | grep prometheus-operator-7795d56f7
118m   Normal   Scheduled           pod/prometheus-operator-7795d56f7-d84bb    Successfully assigned openshift-monitoring/prometheus-operator-7795d56f7-d84bb to kasturi-upg1-5hljl-master-2
118m   Normal   AddedInterface      pod/prometheus-operator-7795d56f7-d84bb    Add eth0 [10.129.0.13/23]
118m   Normal   Pulling             pod/prometheus-operator-7795d56f7-d84bb    Pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:848dfc960804c25aef2ec9f45f0c9d236dc1616879785384d5111f14c70dd52c"
118m   Normal   Pulled              pod/prometheus-operator-7795d56f7-d84bb    Successfully pulled image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:848dfc960804c25aef2ec9f45f0c9d236dc1616879785384d5111f14c70dd52c"
118m   Normal   Created             pod/prometheus-operator-7795d56f7-d84bb    Created container prometheus-operator
118m   Normal   Started             pod/prometheus-operator-7795d56f7-d84bb    Started container prometheus-operator
118m   Normal   Pulling             pod/prometheus-operator-7795d56f7-d84bb    Pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7c80cd7ddfcd963384b55bf43b801f4bd551bddf560bfa2354c5195552b52f4c"
118m   Normal   Pulled              pod/prometheus-operator-7795d56f7-d84bb    Successfully pulled image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7c80cd7ddfcd963384b55bf43b801f4bd551bddf560bfa2354c5195552b52f4c"
118m   Normal   Created             pod/prometheus-operator-7795d56f7-d84bb    Created container kube-rbac-proxy
118m   Normal   Started             pod/prometheus-operator-7795d56f7-d84bb    Started container kube-rbac-proxy
118m   Normal   SuccessfulCreate    replicaset/prometheus-operator-7795d56f7   Created pod: prometheus-operator-7795d56f7-d84bb
118m   Normal   ScalingReplicaSet   deployment/prometheus-operator             Scaled up replica set prometheus-operator-7795d56f7 to 1

# oc -n openshift-monitoring get event | grep prometheus-operator-7896ccc77c
114m    Normal    Scheduled                pod/prometheus-operator-7896ccc77c-jv2tx   Successfully assigned openshift-monitoring/prometheus-operator-7896ccc77c-jv2tx to kasturi-upg1-5hljl-master-2
112m    Warning   FailedCreatePodSandBox   pod/prometheus-operator-7896ccc77c-jv2tx   Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_prometheus-operator-7896ccc77c-jv2tx_openshift-monitoring_a8c431f8-4235-42a4-bd2b-16364ade8fbd_0(3a36a03fdd25d271bed46b1fac03f370c72535f6410b5d8985a06342ea72c9e8): Multus: [openshift-monitoring/prometheus-operator-7896ccc77c-jv2tx]: PollImmediate error waiting for ReadinessIndicatorFile: timed out waiting for the condition
111m    Warning   FailedCreatePodSandBox   pod/prometheus-operator-7896ccc77c-jv2tx   Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_prometheus-operator-7896ccc77c-jv2tx_openshift-monitoring_a8c431f8-4235-42a4-bd2b-16364ade8fbd_0(91cafe671005b7494dbfc30899fdd36e15108a7c22a307b9b65f2bb4185792a2): Multus: [openshift-monitoring/prometheus-operator-7896ccc77c-jv2tx]: PollImmediate error waiting for ReadinessIndicatorFile: timed out waiting for the condition
109m    Warning   FailedCreatePodSandBox   pod/prometheus-operator-7896ccc77c-jv2tx   Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_prometheus-operator-7896ccc77c-jv2tx_openshift-monitoring_a8c431f8-4235-42a4-bd2b-16364ade8fbd_0(b5844ffe1d629effe817a5497af382eb53315066e91f625f6692f47f8acb4f7f): Multus: [openshift-monitoring/prometheus-operator-7896ccc77c-jv2tx]: PollImmediate error waiting for ReadinessIndicatorFile: timed out waiting for the condition
108m    Warning   FailedCreatePodSandBox   pod/prometheus-operator-7896ccc77c-jv2tx   Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_prometheus-operator-7896ccc77c-jv2tx_openshift-monitoring_a8c431f8-4235-42a4-bd2b-16364ade8fbd_0(de52607f4fe180751cea839bfe337e85db52720b9ddc90141ef6330ff99b10ea): Multus: [openshift-monitoring/prometheus-operator-7896ccc77c-jv2tx]: PollImmediate error waiting for ReadinessIndicatorFile: timed out waiting for the condition
106m    Warning   FailedCreatePodSandBox   pod/prometheus-operator-7896ccc77c-jv2tx   Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_prometheus-operator-7896ccc77c-jv2tx_openshift-monitoring_a8c431f8-4235-42a4-bd2b-16364ade8fbd_0(ef67e63a9e0fac2df5615e34ee94a50c7c3d08f6435b620823bf1773bf554aef): Multus: [openshift-monitoring/prometheus-operator-7896ccc77c-jv2tx]: PollImmediate error waiting for ReadinessIndicatorFile: timed out waiting for the condition
105m    Warning   FailedCreatePodSandBox   pod/prometheus-operator-7896ccc77c-jv2tx   Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_prometheus-operator-7896ccc77c-jv2tx_openshift-monitoring_a8c431f8-4235-42a4-bd2b-16364ade8fbd_0(451838746e2515be66f900b5a25d7fe31ffa6ab49bd4e0e327b6820ed6f97955): Multus: [openshift-monitoring/prometheus-operator-7896ccc77c-jv2tx]: PollImmediate error waiting for ReadinessIndicatorFile: timed out waiting for the condition
103m    Warning   FailedCreatePodSandBox   pod/prometheus-operator-7896ccc77c-jv2tx   Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_prometheus-operator-7896ccc77c-jv2tx_openshift-monitoring_a8c431f8-4235-42a4-bd2b-16364ade8fbd_0(70d15d9acdfca080aa37de9dcff25a878b623260c5c75f1fc776c16c551fa072): Multus: [openshift-monitoring/prometheus-operator-7896ccc77c-jv2tx]: PollImmediate error waiting for ReadinessIndicatorFile: timed out waiting for the condition
102m    Warning   FailedCreatePodSandBox   pod/prometheus-operator-7896ccc77c-jv2tx   Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_prometheus-operator-7896ccc77c-jv2tx_openshift-monitoring_a8c431f8-4235-42a4-bd2b-16364ade8fbd_0(606a2733bcdf53687d6703a3e03273fb1b799d60d7706ebce7248cddb280b57b): Multus: [openshift-monitoring/prometheus-operator-7896ccc77c-jv2tx]: PollImmediate error waiting for ReadinessIndicatorFile: timed out waiting for the condition
100m    Warning   FailedCreatePodSandBox   pod/prometheus-operator-7896ccc77c-jv2tx   Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_prometheus-operator-7896ccc77c-jv2tx_openshift-monitoring_a8c431f8-4235-42a4-bd2b-16364ade8fbd_0(78152b8f85cb85c145720c8149e7e62b8bff70c179447141673f3f8b610982b7): Multus: [openshift-monitoring/prometheus-operator-7896ccc77c-jv2tx]: PollImmediate error waiting for ReadinessIndicatorFile: timed out waiting for the condition
2m19s   Warning   FailedCreatePodSandBox   pod/prometheus-operator-7896ccc77c-jv2tx   (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_prometheus-operator-7896ccc77c-jv2tx_openshift-monitoring_a8c431f8-4235-42a4-bd2b-16364ade8fbd_0(b1285ff64718d51f69941edcc53c0f9a61e78a86f956a9a8fb435fda3a200b34): Multus: [openshift-monitoring/prometheus-operator-7896ccc77c-jv2tx]: PollImmediate error waiting for ReadinessIndicatorFile: timed out waiting for the condition
114m    Normal    SuccessfulCreate         replicaset/prometheus-operator-7896ccc77c   Created pod: prometheus-operator-7896ccc77c-jv2tx
114m    Normal    ScalingReplicaSet        deployment/prometheus-operator              Scaled up replica set prometheus-operator-7896ccc77c to 1

Version-Release number of selected component (if applicable):
upgrade from 4.5.5 to 4.6.0-0.nightly-2020-08-05-103641

How reproducible:
not sure

Steps to Reproduce:
1. see the description
2.
3.

Actual results:

Expected results:

Additional info:
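All the FailedCreatePodSandBox events stop at the same point: Multus will not set up pod networking until the default network plugin has written the CNI config file named by its readiness indicator setting, so these events indicate the default network never became ready on that node. A minimal sketch of what that wait amounts to; the indicator path below is a stand-in under a temp dir (on a real node it points at the default plugin's generated CNI config), and the timeout is shortened for illustration:

```shell
# Sketch only: poll for the readiness indicator file, then give up,
# mirroring the "PollImmediate ... timed out" wording in the events.
indicator="$(mktemp -d)/80-openshift-network.conf"   # hypothetical path, never created here
elapsed=0
while [ ! -f "$indicator" ] && [ "$elapsed" -lt 3 ]; do
  sleep 1
  elapsed=$((elapsed + 1))
done
if [ -f "$indicator" ]; then
  msg="default network ready"
else
  msg="PollImmediate error waiting for ReadinessIndicatorFile: timed out waiting for the condition"
fi
echo "$msg"
```

Since the pod sandbox can never be created while the file is missing, the new prometheus-operator pod stays in ContainerCreating indefinitely, which is what blocks the Deployment rollout.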
Hit a similar issue upgrading from 4.5.5-x86_64 to 4.6.0-0.nightly-2020-08-05-103641 on matrix 23_IPI on OSP13 & FIPS on & OVN. Below are the error details:

[ramakasturinarra@dhcp35-60 ~]$ oc describe co/monitoring
Name:         monitoring
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2020-08-06T07:08:53Z
  Generation:          1
  Managed Fields:
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
      f:status:
        .:
        f:extension:
    Manager:      cluster-version-operator
    Operation:    Update
    Time:         2020-08-06T07:08:53Z
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
        f:relatedObjects:
        f:versions:
    Manager:         operator
    Operation:       Update
    Time:            2020-08-06T09:57:09Z
  Resource Version:  132621
  Self Link:         /apis/config.openshift.io/v1/clusteroperators/monitoring
  UID:               22961c8a-e86f-4300-b128-79a4ad5554d3
Spec:
Status:
  Conditions:
    Last Transition Time:  2020-08-06T09:57:09Z
    Message:               Rollout of the monitoring stack is in progress. Please wait until it finishes.
    Reason:                RollOutInProgress
    Status:                True
    Type:                  Upgradeable
    Last Transition Time:  2020-08-06T08:50:08Z
    Status:                False
    Type:                  Available
    Last Transition Time:  2020-08-06T09:57:09Z
    Message:               Rolling out the stack.
    Reason:                RollOutInProgress
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2020-08-06T08:50:08Z
    Message:               Failed to rollout the stack. Error: running task Updating Prometheus Operator failed: reconciling Prometheus Operator Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator: expected 2 replicas, got 1 updated replicas
    Reason:                UpdatingPrometheusOperatorFailed
    Status:                True
    Type:                  Degraded
  Extension:               <nil>
  Related Objects:
    Group:
    Name:      openshift-monitoring
    Resource:  namespaces
    Group:     monitoring.coreos.com
    Name:
    Resource:  servicemonitors
    Group:     monitoring.coreos.com
    Name:
    Resource:  prometheusrules
    Group:     monitoring.coreos.com
    Name:
    Resource:  alertmanagers
    Group:     monitoring.coreos.com
    Name:
    Resource:  prometheuses
  Versions:
    Name:     operator
    Version:  4.6.0-0.nightly-2020-08-05-103641
Events:       <none>

[ramakasturinarra@dhcp35-60 ~]$ oc describe co/monitoring
Name:         monitoring
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2020-08-06T07:08:53Z
  Generation:          1
  Managed Fields:
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
      f:status:
        .:
        f:extension:
    Manager:      cluster-version-operator
    Operation:    Update
    Time:         2020-08-06T07:08:53Z
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
        f:relatedObjects:
        f:versions:
    Manager:         operator
    Operation:       Update
    Time:            2020-08-06T15:32:16Z
  Resource Version:  481839
  Self Link:         /apis/config.openshift.io/v1/clusteroperators/monitoring
  UID:               22961c8a-e86f-4300-b128-79a4ad5554d3
Spec:
Status:
  Conditions:
    Last Transition Time:  2020-08-06T15:32:16Z
    Message:               Rolling out the stack.
    Reason:                RollOutInProgress
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2020-08-06T08:50:08Z
    Message:               Failed to rollout the stack. Error: running task Updating Prometheus Operator failed: reconciling Prometheus Operator Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator: expected 2 replicas, got 1 updated replicas
    Reason:                UpdatingPrometheusOperatorFailed
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2020-08-06T15:32:16Z
    Message:               Rollout of the monitoring stack is in progress. Please wait until it finishes.
    Reason:                RollOutInProgress
    Status:                True
    Type:                  Upgradeable
    Last Transition Time:  2020-08-06T08:50:08Z
    Status:                False
    Type:                  Available
  Extension:               <nil>
  Related Objects:
    Group:
    Name:      openshift-monitoring
    Resource:  namespaces
    Group:     monitoring.coreos.com
    Name:
    Resource:  servicemonitors
    Group:     monitoring.coreos.com
    Name:
    Resource:  prometheusrules
    Group:     monitoring.coreos.com
    Name:
    Resource:  alertmanagers
    Group:     monitoring.coreos.com
    Name:
    Resource:  prometheuses
  Versions:
    Name:     operator
    Version:  4.6.0-0.nightly-2020-08-05-103641
Events:       <none>
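The two describe dumps differ only in timestamps, resource version, and condition ordering; when triaging, it helps to pull out just the Degraded condition's reason. A sketch using a saved copy of the conditions (the inline sample reproduces the relevant fields from this report; on a live cluster you would feed it `oc get co/monitoring -oyaml` instead):

```shell
# Extract the reason of the Degraded condition from a saved ClusterOperator
# YAML. The here-doc is a sample; replace it with real output when triaging.
cat > /tmp/co-monitoring.yaml <<'EOF'
status:
  conditions:
  - reason: RollOutInProgress
    status: "True"
    type: Progressing
  - reason: UpdatingPrometheusOperatorFailed
    status: "True"
    type: Degraded
EOF
reason=$(grep -B2 'type: Degraded' /tmp/co-monitoring.yaml | sed -n 's/.*reason: //p')
echo "$reason"
```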
It would be good to get the kube-controller-manager (KCM) logs when this happens.
KCM gets Unauthorized and doesn't react for some time until KAS lets it proceed. Being investigated in bug 1868750.

*** This bug has been marked as a duplicate of bug 1868750 ***