Bug originating from bug 1956308. Looking at [1], the initial error reported by CMO is indeed the same:

  creating Deployment object failed after update failed: object is being deleted: deployments.apps "prometheus-operator" already exists

But the subsequent reconciliations fail for different reasons [2]. The next failure is because the prometheus operator isn't ready yet and the admission webhook fails (see bug 1949840):

  E0602 05:01:20.514522 1 operator.go:400] sync "openshift-monitoring/cluster-monitoring-config" failed: running task Updating Control Plane components failed: reconciling etcd rules PrometheusRule failed: updating PrometheusRule object failed: Internal error occurred: failed calling webhook "prometheusrules.openshift.io": Post "https://prometheus-operator.openshift-monitoring.svc:8080/admission-prometheusrules/validate?timeout=5s": no endpoints available for service "prometheus-operator"

Then CMO fails repeatedly because the prometheus-k8s statefulset never converges to the desired state:

  E0602 05:06:24.595880 1 operator.go:400] sync "openshift-monitoring/cluster-monitoring-config" failed: running task Updating Prometheus-k8s failed: waiting for Prometheus object changes failed: waiting for Prometheus openshift-monitoring/k8s: expected 2 replicas, got 1 updated replicas

Now looking at the pods [3], there's no node on which the prometheus-k8s-1 pod can be scheduled:

  "status": {
    "conditions": [
      {
        "lastProbeTime": null,
        "lastTransitionTime": "2021-06-02T05:02:17Z",
        "message": "0/6 nodes are available: 1 node(s) didn't match pod affinity/anti-affinity rules, 1 node(s) didn't match pod anti-affinity rules, 2 node(s) had volume node affinity conflict, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.",
        "reason": "Unschedulable",
        "status": "False",
        "type": "PodScheduled"
      }
    ],

IIUC the mis-scheduling happens because some worker nodes have moved away from the us-east-1b zone while the Prometheus PVs are bound to that zone.
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade/1399935465968111616/artifacts/e2e-aws-ovn-upgrade/gather-extra/artifacts/nodes.json | jq '.items | map(select(.metadata.labels["node-role.kubernetes.io/worker"] == "")) | map(.metadata.name + ": " + .metadata.labels["topology.ebs.csi.aws.com/zone"])'
[
  "ip-10-0-144-164.ec2.internal: us-east-1a",
  "ip-10-0-176-27.ec2.internal: us-east-1a",
  "ip-10-0-207-235.ec2.internal: us-east-1b"
]

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade/1399935465968111616/artifacts/e2e-aws-ovn-upgrade/gather-extra/artifacts/persistentvolumes.json | jq '.items | map(.metadata.name + ": " + .spec.claimRef.name + ": " + .metadata.labels["failure-domain.beta.kubernetes.io/zone"])'
[
  "pvc-4b072738-5319-4337-8a3c-d819f28c4bf5: prometheus-data-prometheus-k8s-1: us-east-1b",
  "pvc-64a7fce5-45c3-4551-a955-5c7bb3cd5a89: prometheus-data-prometheus-k8s-0: us-east-1b"
]

[1] https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade/1399935465968111616
[2] https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade/1399935465968111616/artifacts/e2e-aws-ovn-upgrade/gather-extra/artifacts/pods/openshift-monitoring_cluster-monitoring-operator-7556c4b9c6-7vlhj_cluster-monitoring-operator.log
[3] https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade/1399935465968111616/artifacts/e2e-aws-ovn-upgrade/gather-extra/artifacts/pods.json
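For triaging similar incidents on a live cluster, the same cross-check can be scripted: collect the zones that still hold worker nodes and the zones the monitoring PVs are pinned to, then flag PV zones with no workers left. A minimal sketch, assuming jq is available and the same label keys as in the outputs above:

$ oc get nodes -o json | jq -r '.items[] | select(.metadata.labels["node-role.kubernetes.io/worker"] == "") | .metadata.labels["topology.kubernetes.io/zone"]' | sort -u >/tmp/worker-zones
$ oc get persistentvolumes -o json | jq -r '.items[] | select(.spec.claimRef.namespace == "openshift-monitoring") | .metadata.labels["failure-domain.beta.kubernetes.io/zone"]' | sort -u >/tmp/pv-zones
# Zones that hold a monitoring PV but no longer hold any worker node:
$ comm -13 /tmp/worker-zones /tmp/pv-zones

Any zone printed by the last command is a PV that can no longer be attached anywhere, which is exactly the state this job landed in.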
Timeline for [1]:

* 4:13Z, prometheus-k8s-0 on ip-10-0-207-235 [2], which is the only node in us-east-1b [3].
* 4:13Z, prometheus-k8s-1 also on ip-10-0-207-235 [2].
* 5:02Z, prometheus-k8s-0 drained and rescheduled on ip-10-0-207-235 [2].
* 5:02Z, prometheus-k8s-1 drained, but sticks [4], because it can no longer land on ip-10-0-207-235 (hard anti-affinity [5], bug 1949262, recently ported back to 4.7 as bug 1957703, about to go out with 4.7.14 [6]), but it doesn't want to leave behind the persistent volume it had been using, which is in us-east-1b and not available on the other nodes, which are both in us-east-1a [3].

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade/1399935465968111616

[2]: $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade/1399935465968111616/artifacts/e2e-aws-ovn-upgrade/gather-extra/artifacts/events.json | jq -r '.items[] | select(.reason == "Scheduled" and .metadata.namespace == "openshift-monitoring" and (.involvedObject.name | contains("prometheus-k8s"))) | .eventTime + " " + .message' | sort
2021-06-02T04:13:44.787896Z Successfully assigned openshift-monitoring/prometheus-k8s-0 to ip-10-0-207-235.ec2.internal
2021-06-02T04:13:44.906128Z Successfully assigned openshift-monitoring/prometheus-k8s-1 to ip-10-0-207-235.ec2.internal
2021-06-02T05:02:17.225115Z Successfully assigned openshift-monitoring/prometheus-k8s-0 to ip-10-0-207-235.ec2.internal

[3]: $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade/1399935465968111616/artifacts/e2e-aws-ovn-upgrade/gather-extra/artifacts/nodes.json | jq -r '.items[].metadata | select(.labels["node-role.kubernetes.io/worker"] == "") | .labels["failure-domain.beta.kubernetes.io/zone"] + " " + .name' | sort
us-east-1a ip-10-0-144-164.ec2.internal
us-east-1a ip-10-0-176-27.ec2.internal
us-east-1b ip-10-0-207-235.ec2.internal

[4]: $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade/1399935465968111616/artifacts/e2e-aws-ovn-upgrade/gather-extra/artifacts/pods.json | jq -r '.items[] | select(.metadata.name == "prometheus-k8s-1").status'
{
  "conditions": [
    {
      "lastProbeTime": null,
      "lastTransitionTime": "2021-06-02T05:02:17Z",
      "message": "0/6 nodes are available: 1 node(s) didn't match pod affinity/anti-affinity rules, 1 node(s) didn't match pod anti-affinity rules, 2 node(s) had volume node affinity conflict, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.",
      "reason": "Unschedulable",
      "status": "False",
      "type": "PodScheduled"
    }
  ],
  "phase": "Pending",
  "qosClass": "Burstable"
}

[5]: https://github.com/openshift/cluster-monitoring-operator/pull/1135
[6]: https://amd64.ocp.releases.ci.openshift.org/releasestream/4-stable/release/4.7.14
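For context on [5]: "hard" anti-affinity means the statefulset's pod template carries a required rather than preferred term, so the scheduler refuses (instead of merely preferring not) to co-locate the two replicas. A sketch of roughly what that looks like, extrapolated from the soft variant quoted later in this bug (the exact manifest in the PR may differ):

      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app.kubernetes.io/component: prometheus
              app.kubernetes.io/name: prometheus
              app.kubernetes.io/part-of: openshift-monitoring
              prometheus: k8s
          namespaces:
          - openshift-monitoring
          topologyKey: kubernetes.io/hostname

Note there is no weight: required terms are hard constraints, which is why prometheus-k8s-1 sticks rather than falling back onto the node already running prometheus-k8s-0.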
Ok, as comment 1 pointed out, hard anti-affinity was recently ported back to 4.7 as bug 1957703 and is in 4.7.14. Testing with a cluster-bot cluster [1]:

$ oc get clusterversion version -o jsonpath='{.status.desired.version}{"\n"}'
4.7.13

$ oc -n openshift-monitoring get -o wide pods | grep prometheus-k8s
prometheus-k8s-0   7/7   Running   1   50m   10.131.0.23   ip-10-0-184-219.us-west-1.compute.internal   <none>   <none>
prometheus-k8s-1   7/7   Running   1   50m   10.131.0.25   ip-10-0-184-219.us-west-1.compute.internal   <none>   <none>

$ oc get -o json nodes | jq -r '.items[].metadata | select(.labels["node-role.kubernetes.io/worker"] == "") | .labels["failure-domain.beta.kubernetes.io/zone"] + " " + .name' | sort
us-west-1a ip-10-0-129-15.us-west-1.compute.internal
us-west-1a ip-10-0-184-219.us-west-1.compute.internal
us-west-1b ip-10-0-253-132.us-west-1.compute.internal

I want to squeeze them down onto that 1b node:

$ oc adm cordon ip-10-0-129-15.us-west-1.compute.internal
$ oc adm cordon ip-10-0-184-219.us-west-1.compute.internal
$ oc -n openshift-monitoring delete pod prometheus-k8s-0
$ oc -n openshift-monitoring delete pod prometheus-k8s-1
$ oc -n openshift-monitoring get -o wide pods | grep prometheus-k8s
prometheus-k8s-0   7/7   Running   1   31s   10.129.2.8   ip-10-0-253-132.us-west-1.compute.internal   <none>   <none>
prometheus-k8s-1   7/7   Running   1   25s   10.129.2.9   ip-10-0-253-132.us-west-1.compute.internal   <none>   <none>

Give them a PV, following [2]:

$ cat <<EOF >manifest_cluster-monitoring-pvc.yml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      volumeClaimTemplate:
        metadata:
          name: pvc
        spec:
          resources:
            requests:
              storage: 5Gi
EOF
$ oc apply -f manifest_cluster-monitoring-pvc.yml
$ oc -n openshift-monitoring get -o wide pods | grep prometheus-k8s
prometheus-k8s-0   0/7   ContainerCreating   0   16s   <none>   ip-10-0-253-132.us-west-1.compute.internal   <none>   <none>
prometheus-k8s-1   0/7   ContainerCreating   0   16s   <none>   ip-10-0-253-132.us-west-1.compute.internal   <none>   <none>

Uncordon:

$ oc adm uncordon ip-10-0-129-15.us-west-1.compute.internal
$ oc adm uncordon ip-10-0-184-219.us-west-1.compute.internal

Update:

$ oc patch clusterversion version --type json -p '[{"op": "add", "path": "/spec/channel", "value": "candidate-4.7"}]'
$ oc adm upgrade --to 4.7.14

Wait a while. Confirm that, as expected, it hung:

$ oc adm upgrade
info: An upgrade is in progress. Working towards 4.7.14: 524 of 669 done (78% complete), waiting on monitoring
...
$ oc get -o json clusteroperator monitoring | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'
2021-06-03T22:10:36Z Available=False :
2021-06-03T23:01:16Z Progressing=True RollOutInProgress: Rolling out the stack.
2021-06-03T22:10:36Z Degraded=True UpdatingPrometheusK8SFailed: Failed to rollout the stack. Error: running task Updating Prometheus-k8s failed: waiting for Prometheus object changes failed: waiting for Prometheus openshift-monitoring/k8s: expected 2 replicas, got 0 updated replicas
2021-06-03T23:01:16Z Upgradeable=True RollOutInProgress: Rollout of the monitoring stack is in progress. Please wait until it finishes.
$ oc -n openshift-monitoring get -o wide pods | grep prometheus-k8s
prometheus-k8s-0   7/7   Running   1   94m   10.129.2.10   ip-10-0-253-132.us-west-1.compute.internal   <none>   <none>
prometheus-k8s-1   0/7   Pending   0   60m   <none>        <none>                                       <none>   <none>

$ oc -n openshift-monitoring get -o json pod prometheus-k8s-1 | jq .status
{
  "conditions": [
    {
      "lastProbeTime": null,
      "lastTransitionTime": "2021-06-03T22:05:41Z",
      "message": "0/6 nodes are available: 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't match pod anti-affinity rules, 2 node(s) had volume node affinity conflict, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.",
      "reason": "Unschedulable",
      "status": "False",
      "type": "PodScheduled"
    }
  ],
  "phase": "Pending",
  "qosClass": "Burstable"
}

Check the PVs:

$ oc get persistentvolumes
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                       STORAGECLASS   REASON   AGE
pvc-9e6673ed-31d4-4e6c-ab64-bd801210c126   5Gi        RWO            Delete           Bound    openshift-monitoring/pvc-prometheus-k8s-0   gp2                     95m
pvc-bc4f1b64-c233-4c51-a78e-5ac793b6c025   5Gi        RWO            Delete           Bound    openshift-monitoring/pvc-prometheus-k8s-1   gp2                     95m

Hopefully unstick by deleting the stuck pod's PV:

$ oc -n openshift-monitoring delete persistentvolume pvc-bc4f1b64-c233-4c51-a78e-5ac793b6c025

This didn't actually complete; it just moved the PV to Terminating status, because the PVC blocks PV deletion [3]. Remove the PVC too:

$ oc -n openshift-monitoring delete persistentvolumeclaim pvc-prometheus-k8s-1

Ok, that unblocked the PV:

$ oc get persistentvolumes
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                       STORAGECLASS   REASON   AGE
pvc-9e6673ed-31d4-4e6c-ab64-bd801210c126   5Gi        RWO            Delete           Bound    openshift-monitoring/pvc-prometheus-k8s-0   gp2                     103m

But not the pod:

$ oc -n openshift-monitoring get -o json pod prometheus-k8s-1 | jq .status
{
  "conditions": [
    {
      "lastProbeTime": null,
      "lastTransitionTime": "2021-06-03T22:05:41Z",
      "message": "0/6 nodes are available: 6 persistentvolumeclaim \"pvc-prometheus-k8s-1\" not found.",
      "reason": "Unschedulable",
      "status": "False",
      "type": "PodScheduled"
    }
  ],
  "phase": "Pending",
  "qosClass": "Burstable"
}

Delete the pod to get a fresh replacement:

$ oc -n openshift-monitoring delete pod prometheus-k8s-1
pod "prometheus-k8s-1" deleted

Hooray:

$ oc -n openshift-monitoring get -o wide pods | grep prometheus-k8s
prometheus-k8s-0   7/7   Running   1   105m   10.129.2.10   ip-10-0-253-132.us-west-1.compute.internal   <none>   <none>
prometheus-k8s-1   6/7   Running   1   40s    10.131.0.56   ip-10-0-184-219.us-west-1.compute.internal   <none>   <none>

And the update is flowing again:

$ oc adm upgrade
info: An upgrade is in progress. Working towards 4.7.14: 531 of 669 done (79% complete)
...

So, for folks who are running 4.7.13 or earlier and thus have soft anti-affinity, and who happen to have their Prom pods scheduled to the same node, and who happen to have only that node as a possible PV attachment point (e.g. because their storage provider pins PVs to a single availability zone, and they only have one node in that zone), 4.7.14 can stick, and recovering requires manual intervention on the order of three 'oc ... delete ...' calls (it's possible the three I used can be optimized). I'm moving the affected version back to 4.7 and setting the Regression keyword, and we can sort out whether we think this corner case is large enough to be worth tombstoning 4.7.14.
[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-aws/1400546094630309888
[2]: https://github.com/openshift/release/pull/11546/files
[3]: https://kubernetes.io/docs/concepts/storage/persistent-volumes/#storage-object-in-use-protection
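For reference, the unstick sequence above condenses to three deletes; a sketch, using the object names from this reproducer (substitute the stuck replica's own PV, PVC, and pod names):

$ oc delete persistentvolume pvc-bc4f1b64-c233-4c51-a78e-5ac793b6c025          # sits in Terminating until the claim goes
$ oc -n openshift-monitoring delete persistentvolumeclaim pvc-prometheus-k8s-1  # releases the PV
$ oc -n openshift-monitoring delete pod prometheus-k8s-1                        # the statefulset recreates it with a fresh claim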
Also interesting from the comment 2 reproducer, here's the pod that stuck:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-launch-aws/1400546094630309888/artifacts/launch/events.json | jq -r '.items[] | select(tostring | (contains("prometheus-k8s-1") and (contains("pvc-") or contains("Scheduled")))) | (.lastTimestamp // .eventTime) + " " + .reason + ": " + .message' | sort
2021-06-03T20:35:35.514455Z Scheduled: Successfully assigned openshift-monitoring/prometheus-k8s-1 to ip-10-0-184-219.us-west-1.compute.internal
2021-06-03T21:28:44.430634Z Scheduled: Successfully assigned openshift-monitoring/prometheus-k8s-1 to ip-10-0-253-132.us-west-1.compute.internal
2021-06-03T21:31:11Z SuccessfulCreate: create Claim pvc-prometheus-k8s-1 Pod prometheus-k8s-1 in StatefulSet prometheus-k8s success
2021-06-03T21:31:11Z WaitForFirstConsumer: waiting for first consumer to be created before binding
2021-06-03T21:31:17.993036Z Scheduled: Successfully assigned openshift-monitoring/prometheus-k8s-1 to ip-10-0-253-132.us-west-1.compute.internal
2021-06-03T21:31:17Z ProvisioningSucceeded: Successfully provisioned volume pvc-bc4f1b64-c233-4c51-a78e-5ac793b6c025 using kubernetes.io/aws-ebs
2021-06-03T21:31:22Z SuccessfulAttachVolume: AttachVolume.Attach succeeded for volume "pvc-bc4f1b64-c233-4c51-a78e-5ac793b6c025"
2021-06-03T23:14:42.080514Z FailedScheduling: 0/6 nodes are available: 6 persistentvolumeclaim "pvc-prometheus-k8s-1" not found.
2021-06-03T23:14:52.410034Z FailedScheduling: 0/6 nodes are available: 6 persistentvolumeclaim "pvc-prometheus-k8s-1" not found.
2021-06-03T23:16:13Z SuccessfulCreate: create Claim pvc-prometheus-k8s-1 Pod prometheus-k8s-1 in StatefulSet prometheus-k8s success
2021-06-03T23:16:13Z WaitForFirstConsumer: waiting for first consumer to be created before binding
2021-06-03T23:16:18Z ProvisioningSucceeded: Successfully provisioned volume pvc-6f62185b-198a-4846-aa74-145c948d89ed using kubernetes.io/aws-ebs
2021-06-03T23:16:19.549401Z Scheduled: Successfully assigned openshift-monitoring/prometheus-k8s-1 to ip-10-0-184-219.us-west-1.compute.internal
2021-06-03T23:16:24Z SuccessfulAttachVolume: AttachVolume.Attach succeeded for volume "pvc-6f62185b-198a-4846-aa74-145c948d89ed"
2021-06-03T23:41:40.236834Z Scheduled: Successfully assigned openshift-monitoring/prometheus-k8s-1 to ip-10-0-129-15.us-west-1.compute.internal
2021-06-03T23:41:40Z FailedAttachVolume: Multi-Attach error for volume "pvc-6f62185b-198a-4846-aa74-145c948d89ed" Volume is already exclusively attached to one node and can't be attached to another
2021-06-03T23:41:53Z SuccessfulAttachVolume: AttachVolume.Attach succeeded for volume "pvc-6f62185b-198a-4846-aa74-145c948d89ed"

Here's the other Prom pod:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-launch-aws/1400546094630309888/artifacts/launch/events.json | jq -r '.items[] | select(tostring | (contains("prometheus-k8s-0") and (contains("pvc-") or contains("Scheduled")))) | (.lastTimestamp // .eventTime) + " " + .reason + ": " + .message' | sort
2021-06-03T20:35:35.384820Z Scheduled: Successfully assigned openshift-monitoring/prometheus-k8s-0 to ip-10-0-184-219.us-west-1.compute.internal
2021-06-03T21:28:38.436446Z Scheduled: Successfully assigned openshift-monitoring/prometheus-k8s-0 to ip-10-0-253-132.us-west-1.compute.internal
2021-06-03T21:31:11Z SuccessfulCreate: create Claim pvc-prometheus-k8s-0 Pod prometheus-k8s-0 in StatefulSet prometheus-k8s success
2021-06-03T21:31:11Z WaitForFirstConsumer: waiting for first consumer to be created before binding
2021-06-03T21:31:17.836268Z Scheduled: Successfully assigned openshift-monitoring/prometheus-k8s-0 to ip-10-0-253-132.us-west-1.compute.internal
2021-06-03T21:31:17Z ProvisioningSucceeded: Successfully provisioned volume pvc-9e6673ed-31d4-4e6c-ab64-bd801210c126 using kubernetes.io/aws-ebs
2021-06-03T21:31:20Z SuccessfulAttachVolume: AttachVolume.Attach succeeded for volume "pvc-9e6673ed-31d4-4e6c-ab64-bd801210c126"
2021-06-03T23:17:01.551702Z Scheduled: Successfully assigned openshift-monitoring/prometheus-k8s-0 to ip-10-0-253-132.us-west-1.compute.internal
2021-06-03T23:17:09Z SuccessfulAttachVolume: AttachVolume.Attach succeeded for volume "pvc-9e6673ed-31d4-4e6c-ab64-bd801210c126"
2021-06-03T23:35:19.165148Z Scheduled: Successfully assigned openshift-monitoring/prometheus-k8s-0 to ip-10-0-253-132.us-west-1.compute.internal
2021-06-03T23:35:21Z SuccessfulAttachVolume: AttachVolume.Attach succeeded for volume "pvc-9e6673ed-31d4-4e6c-ab64-bd801210c126"

And here are the nodes:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-launch-aws/1400546094630309888/artifacts/launch/events.json | jq -r '.items[] | select(.metadata.namespace == "default" and (.reason | match("Node.*Ready"))) | .lastTimestamp + " " + .reason + " " + .message' | sort
2021-06-03T20:26:48Z NodeReady Node ip-10-0-189-50.us-west-1.compute.internal status is now: NodeReady
2021-06-03T20:26:50Z NodeReady Node ip-10-0-140-103.us-west-1.compute.internal status is now: NodeReady
2021-06-03T20:34:20Z NodeReady Node ip-10-0-184-219.us-west-1.compute.internal status is now: NodeReady
2021-06-03T20:37:31Z NodeReady Node ip-10-0-129-15.us-west-1.compute.internal status is now: NodeReady
2021-06-03T20:37:31Z NodeReady Node ip-10-0-253-132.us-west-1.compute.internal status is now: NodeReady
2021-06-03T23:35:09Z NodeNotReady Node ip-10-0-140-103.us-west-1.compute.internal status is now: NodeNotReady
2021-06-03T23:36:09Z NodeReady Node ip-10-0-140-103.us-west-1.compute.internal status is now: NodeReady
2021-06-03T23:39:24Z NodeNotReady Node ip-10-0-189-50.us-west-1.compute.internal status is now: NodeNotReady
2021-06-03T23:40:24Z NodeNotReady Node ip-10-0-129-15.us-west-1.compute.internal status is now: NodeNotReady
2021-06-03T23:40:51Z NodeReady Node ip-10-0-189-50.us-west-1.compute.internal status is now: NodeReady
2021-06-03T23:41:07Z NodeReady Node ip-10-0-129-15.us-west-1.compute.internal status is now: NodeReady
2021-06-03T23:44:27Z NodeNotReady Node ip-10-0-184-219.us-west-1.compute.internal status is now: NodeNotReady
2021-06-03T23:44:27Z NodeNotReady Node ip-10-0-204-184.us-west-1.compute.internal status is now: NodeNotReady
2021-06-03T23:44:33Z NodeReady Node ip-10-0-184-219.us-west-1.compute.internal status is now: NodeReady
2021-06-03T23:45:35Z NodeReady Node ip-10-0-204-184.us-west-1.compute.internal status is now: NodeReady

Putting those all together:

* 20:35Z was the initial prometheus-k8s-0 and prometheus-k8s-1 installs.
* 21:28Z I push prometheus-k8s-0 and prometheus-k8s-1 onto node 132, in zone 1b, using cordons.
* 21:31Z I configure the volumeClaimTemplate and the initial persistent volume creation.
* 23:14Z the scheduler gets mad about prometheus-k8s-1 after I'd deleted the persistent volume and claim, but before I'd deleted the pod.
* 23:16Z the new prometheus-k8s-1, persistent volume (now pvc-6f62...), and claim are created after I'd deleted the pod too. Interestingly, the new pod came back up on node 219, in zone 1a.
* 23:17Z new prometheus-k8s-0 too, as it gets bumped to the 4.7.14 images.
* 23:35Z new prometheus-k8s-0, still on node 132, in zone 1b. More on this below.
* 23:40Z node 15 goes down.
* 23:41Z node 15 comes back.
* 23:41Z prometheus-k8s-1 is rescheduled during the pool-rolling period, moving over to node 15, also in 1a, presumably because 219 is being cordoned and drained and 15 is in the same zone and can handle the volume attachment.
* 23:44Z node 219 finishes draining and goes NodeNotReady.

I dunno why we don't have Node*Ready events for node 132 around 23:35Z. But events are best-effort. However, the node conditions also have old timestamps:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-launch-aws/1400546094630309888/artifacts/launch/nodes.json | jq -r '.items[] | select(.metadata.name == "ip-10-0-253-132.us-west-1.compute.internal").status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message' | sort
2021-06-03T20:36:31Z DiskPressure=False KubeletHasNoDiskPressure: kubelet has no disk pressure
2021-06-03T20:36:31Z MemoryPressure=False KubeletHasSufficientMemory: kubelet has sufficient memory available
2021-06-03T20:36:31Z PIDPressure=False KubeletHasSufficientPID: kubelet has sufficient PID available
2021-06-03T20:37:31Z Ready=True KubeletReady: kubelet is posting ready status

Node journals show we actually did reboot:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-launch-aws/1400546094630309888/artifacts/launch/nodes/ip-10-0-253-132.us-west-1.compute.internal/journal | gunzip | grep -A6 'Starting Reboot'
Jun 03 23:33:49.058030 ip-10-0-253-132 systemd[1]: Starting Reboot...
Jun 03 23:33:49.066748 ip-10-0-253-132 systemd[1]: Shutting down.
Jun 03 23:33:49.124865 ip-10-0-253-132 systemd-shutdown[1]: Syncing filesystems and block devices.
Jun 03 23:33:49.186939 ip-10-0-253-132 systemd-shutdown[1]: Sending SIGTERM to remaining processes...
Jun 03 23:33:49.195302 ip-10-0-253-132 systemd-journald[845]: Journal stopped
-- Logs begin at Thu 2021-06-03 20:30:10 UTC, end at Thu 2021-06-03 23:47:29 UTC. --
Jun 03 23:34:39.524762 localhost kernel: Linux version 4.18.0-240.22.1.el8_3.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC)) #1 SMP Thu Mar 25 14:36:04 EDT 2021

So maybe some sort of kubelet or API-server bug in getting the node object updated? Anyhow, this lack of node 132 condition changes seems unrelated to this affinity bug.
The revert does not help:

# oc -n openshift-monitoring get sts prometheus-k8s -oyaml | grep podAntiAffinity
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - podAffinityTerm:
            labelSelector:
              matchLabels:
                app.kubernetes.io/component: prometheus
                app.kubernetes.io/name: prometheus
                app.kubernetes.io/part-of: openshift-monitoring
                prometheus: k8s
            namespaces:
            - openshift-monitoring
            topologyKey: kubernetes.io/hostname
          weight: 100
...

# oc get node
NAME                                         STATUS   ROLES    AGE    VERSION
ip-10-0-143-99.us-west-2.compute.internal    Ready    master   100m   v1.21.0-rc.0+2dfc46b
ip-10-0-146-41.us-west-2.compute.internal    Ready    master   99m    v1.21.0-rc.0+2dfc46b
ip-10-0-169-141.us-west-2.compute.internal   Ready    worker   93m    v1.21.0-rc.0+2dfc46b
ip-10-0-171-63.us-west-2.compute.internal    Ready    worker   92m    v1.21.0-rc.0+2dfc46b
ip-10-0-206-43.us-west-2.compute.internal    Ready    worker   92m    v1.21.0-rc.0+2dfc46b
ip-10-0-218-125.us-west-2.compute.internal   Ready    master   100m   v1.21.0-rc.0+2dfc46b

# oc -n openshift-monitoring get pod -o wide | grep prometheus-k8s
prometheus-k8s-0   7/7   Running   1   13m   10.129.2.37   ip-10-0-171-63.us-west-2.compute.internal   <none>   <none>
prometheus-k8s-1   7/7   Running   1   13m   10.128.2.13   ip-10-0-206-43.us-west-2.compute.internal   <none>   <none>

Remove the node that prometheus-k8s-1 is scheduled on:

# oc delete node ip-10-0-206-43.us-west-2.compute.internal
node "ip-10-0-206-43.us-west-2.compute.internal" deleted

# oc -n openshift-monitoring get pod -o wide | grep prometheus-k8s
prometheus-k8s-0   7/7   Running   1   24m     10.129.2.37   ip-10-0-171-63.us-west-2.compute.internal   <none>   <none>
prometheus-k8s-1   0/7   Pending   0   7m29s   <none>        <none>                                      <none>   <none>

# oc -n openshift-monitoring describe pod prometheus-k8s-1
Events:
  Type     Reason            Age        From               Message
  ----     ------            ----       ----               -------
  Warning  FailedScheduling  <invalid>  default-scheduler  0/6 nodes are available: 1 node(s) were unschedulable, 2 node(s) had volume node affinity conflict, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
  Warning  FailedScheduling  <invalid>  default-scheduler  0/6 nodes are available: 1 node(s) were unschedulable, 2 node(s) had volume node affinity conflict, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
  Warning  FailedScheduling  <invalid>  default-scheduler  0/5 nodes are available: 2 node(s) had volume node affinity conflict, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
This is the PVC used by monitoring:

# oc -n openshift-monitoring get pvc
NAME                               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
alertmanager-alertmanager-main-0   Bound    pvc-979311f1-e47a-45e1-8571-a3c255f2141a   4Gi        RWO            gp2            25m
alertmanager-alertmanager-main-1   Bound    pvc-3a17a889-0f57-4c0b-9fc0-f4f7da3b448a   4Gi        RWO            gp2            25m
alertmanager-alertmanager-main-2   Bound    pvc-ad50e689-57a3-41b5-b53c-7a6b0bfe68c1   4Gi        RWO            gp2            25m
prometheus-prometheus-k8s-0        Bound    pvc-aababb58-2840-46b9-b653-f252d3036d86   10Gi       RWO            gp2            25m
prometheus-prometheus-k8s-1        Bound    pvc-66347a06-8766-4d43-bce6-319c6d410fcf   10Gi       RWO            gp2            25m
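A quick way to tell which flavor of anti-affinity a given statefulset is carrying after a revert like this (a sketch; the field path follows the spec excerpt in the previous comment):

# oc -n openshift-monitoring get sts prometheus-k8s -o json | jq '.spec.template.spec.affinity.podAntiAffinity | keys'
[
  "preferredDuringSchedulingIgnoredDuringExecution"
]

Seeing "requiredDuringSchedulingIgnoredDuringExecution" there instead would mean the hard variant is still in effect.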
The machine-config and monitoring operators try to upgrade to 4.8 but hit errors. We can now see the "volume node affinity conflict" error if we attach PVs for the alertmanager/prometheus pods and those pods all sit on the same node; in this case we still have the issue.

# oc -n openshift-monitoring get po -o wide | grep -E "alertmanager-main|prometheus-k8s"
alertmanager-main-0   5/5   Running   0   59m   10.128.2.53   ip-10-0-205-141.us-east-2.compute.internal   <none>   <none>
alertmanager-main-1   5/5   Running   0   59m   10.128.2.54   ip-10-0-205-141.us-east-2.compute.internal   <none>   <none>
alertmanager-main-2   0/5   Pending   0   38m   <none>        <none>                                       <none>   <none>
prometheus-k8s-0      0/7   Pending   0   38m   <none>        <none>                                       <none>   <none>
prometheus-k8s-1      7/7   Running   1   59m   10.128.2.58   ip-10-0-205-141.us-east-2.compute.internal   <none>   <none>

# oc get co monitoring machine-config
NAME             VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
monitoring       4.8.0-0.nightly-2021-06-07-034343   False       True          True       23m
machine-config   4.7.0-0.nightly-2021-06-06-160728   False       True          True       45m

# oc get no ip-10-0-205-141.us-east-2.compute.internal
NAME                                         STATUS                     ROLES    AGE    VERSION
ip-10-0-205-141.us-east-2.compute.internal   Ready,SchedulingDisabled   worker   141m   v1.20.0+2817867

# oc -n openshift-monitoring describe pod alertmanager-main-2
  Warning  FailedScheduling  7m23s  default-scheduler  0/6 nodes are available: 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) had volume node affinity conflict, 2 node(s) were unschedulable.
  Warning  FailedScheduling  4m32s  default-scheduler  0/6 nodes are available: 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) had volume node affinity conflict, 2 node(s) were unschedulable.
  Warning  FailedScheduling  105s   default-scheduler  0/6 nodes are available: 1 node(s) were unschedulable, 2 node(s) had volume node affinity conflict, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.

# oc -n openshift-monitoring describe pod prometheus-k8s-0
  Warning  FailedScheduling  8m34s  default-scheduler  0/6 nodes are available: 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) had volume node affinity conflict, 2 node(s) were unschedulable.
  Warning  FailedScheduling  5m45s  default-scheduler  0/6 nodes are available: 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) had volume node affinity conflict, 2 node(s) were unschedulable.
  Warning  FailedScheduling  2m58s  default-scheduler  0/6 nodes are available: 1 node(s) were unschedulable, 2 node(s) had volume node affinity conflict, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
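When an upgrade hangs like this, a one-liner to list exactly which operators are Degraded can save scrolling through the full `oc get co` table (a sketch; on this cluster it would print machine-config and monitoring):

# oc get clusteroperators -o json | jq -r '.items[] | select(any(.status.conditions[]?; .type == "Degraded" and .status == "True")) | .metadata.name'
machine-config
monitoring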
This is the info before the upgrade; the cluster version is 4.7.0-0.nightly-2021-06-06-160728. Comment 10 is the result after upgrading to 4.8.0-0.nightly-2021-06-07-034343.

# oc -n openshift-monitoring get po -o wide | grep -E "alertmanager-main|prometheus-k8s"
alertmanager-main-0   5/5   Running   0   12m   10.128.2.53   ip-10-0-205-141.us-east-2.compute.internal   <none>   <none>
alertmanager-main-1   5/5   Running   0   12m   10.128.2.54   ip-10-0-205-141.us-east-2.compute.internal   <none>   <none>
alertmanager-main-2   5/5   Running   0   12m   10.128.2.55   ip-10-0-205-141.us-east-2.compute.internal   <none>   <none>
prometheus-k8s-0      7/7   Running   1   11m   10.128.2.52   ip-10-0-205-141.us-east-2.compute.internal   <none>   <none>
prometheus-k8s-1      7/7   Running   1   11m   10.128.2.58   ip-10-0-205-141.us-east-2.compute.internal   <none>   <none>

# oc -n openshift-monitoring get pvc
NAME                               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
alertmanager-alertmanager-main-0   Bound    pvc-98e390a7-5dba-423a-8424-f83df96e51f4   4Gi        RWO            gp2            49m
alertmanager-alertmanager-main-1   Bound    pvc-4fbc08a3-5a91-4cbf-949e-13dc20249684   4Gi        RWO            gp2            49m
alertmanager-alertmanager-main-2   Bound    pvc-00c9e5dc-c531-4610-adf1-e27e4eea0b67   4Gi        RWO            gp2            49m
prometheus-prometheus-k8s-0        Bound    pvc-fc0c50f4-0d8e-4725-b00d-76336d382dae   10Gi       RWO            gp2            48m
prometheus-prometheus-k8s-1        Bound    pvc-8b12b298-693e-488f-93dc-d218f706b20a   10Gi       RWO            gp2            48m
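Before kicking off the upgrade, the co-location can also be confirmed in one step (a sketch; the label selector matches the statefulset's matchLabels quoted in an earlier comment):

# oc -n openshift-monitoring get pods -l app.kubernetes.io/name=prometheus -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.nodeName}{"\n"}{end}'
prometheus-k8s-0	ip-10-0-205-141.us-east-2.compute.internal
prometheus-k8s-1	ip-10-0-205-141.us-east-2.compute.internal

Both replicas reporting the same nodeName is the precondition for hitting this bug.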
Followed these steps:

• OCP 4.7 cluster
• node A is in AZ 1, nodes B and C in AZ 2
• prom0 and prom1 scheduled on node A with persistent volumes
• upgrade to 4.8
• CMO goes unavailable/degraded because the hard affinity makes it impossible to schedule prom1 (or prom0) on nodes which are in AZ 2 (the PV sticks to AZ 1)

Tried again and bound PVs only for the prometheus pods; we still hit the "volume node affinity conflict" issue.

# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-06-06-160728   True        False         3m49s   Cluster version is 4.7.0-0.nightly-2021-06-06-160728

# oc get node --show-labels
ip-10-0-156-55.us-west-1.compute.internal    Ready   master   31m   v1.20.0+2817867
  beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m5.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-west-1,failure-domain.beta.kubernetes.io/zone=us-west-1a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-156-55,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.kubernetes.io/instance-type=m5.xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-west-1a,topology.kubernetes.io/region=us-west-1,topology.kubernetes.io/zone=us-west-1a
ip-10-0-167-70.us-west-1.compute.internal    Ready   master   31m   v1.20.0+2817867
  beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m5.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-west-1,failure-domain.beta.kubernetes.io/zone=us-west-1a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-167-70,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.kubernetes.io/instance-type=m5.xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-west-1a,topology.kubernetes.io/region=us-west-1,topology.kubernetes.io/zone=us-west-1a
ip-10-0-177-19.us-west-1.compute.internal    Ready   worker   24m   v1.20.0+2817867
  beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m4.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-west-1,failure-domain.beta.kubernetes.io/zone=us-west-1a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-177-19,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=m4.xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-west-1a,topology.kubernetes.io/region=us-west-1,topology.kubernetes.io/zone=us-west-1a
ip-10-0-178-86.us-west-1.compute.internal    Ready   worker   22m   v1.20.0+2817867
  beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m4.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-west-1,failure-domain.beta.kubernetes.io/zone=us-west-1a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-178-86,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=m4.xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-west-1a,topology.kubernetes.io/region=us-west-1,topology.kubernetes.io/zone=us-west-1a
ip-10-0-216-216.us-west-1.compute.internal   Ready   worker   22m   v1.20.0+2817867
  beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m4.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-west-1,failure-domain.beta.kubernetes.io/zone=us-west-1b,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-216-216,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=m4.xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-west-1b,topology.kubernetes.io/region=us-west-1,topology.kubernetes.io/zone=us-west-1b
ip-10-0-253-73.us-west-1.compute.internal    Ready   master   31m   v1.20.0+2817867
  beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m5.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-west-1,failure-domain.beta.kubernetes.io/zone=us-west-1b,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-253-73,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.kubernetes.io/instance-type=m5.xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-west-1b,topology.kubernetes.io/region=us-west-1,topology.kubernetes.io/zone=us-west-1b

ip-10-0-216-216.us-west-1.compute.internal is the worker whose topology.kubernetes.io/zone=us-west-1b; the other workers' zone is topology.kubernetes.io/zone=us-west-1a. Attach PVs for the prometheus pods and schedule the prometheus pods to ip-10-0-216-216.us-west-1.compute.internal:

# oc -n openshift-monitoring get po -o wide | grep "prometheus-k8s"
prometheus-k8s-0   7/7   Running   1   38s   10.129.2.22   ip-10-0-216-216.us-west-1.compute.internal   <none>   <none>
prometheus-k8s-1   7/7   Running   1   38s   10.129.2.21   ip-10-0-216-216.us-west-1.compute.internal   <none>   <none>

# oc -n openshift-monitoring get pvc
NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
prometheus-prometheus-k8s-0   Bound    pvc-d3af3885-d7a1-48b0-a8f9-776f0b4debc2   10Gi       RWO            gp2            56s
prometheus-prometheus-k8s-1   Bound    pvc-b7b330f1-1ffe-43de-ba00-0eb11e995cdc   10Gi       RWO            gp2            56s

Upgrade to 4.8.0-0.nightly-2021-06-07-034343:

# oc -n openshift-monitoring get po -o wide | grep "prometheus-k8s"
prometheus-k8s-0   0/7   Pending   0   47m   <none>        <none>                                       <none>   <none>
prometheus-k8s-1   7/7   Running   1   82m   10.129.2.35   ip-10-0-216-216.us-west-1.compute.internal   <none>   <none>

# oc -n openshift-monitoring describe pod prometheus-k8s-0
...
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  24m   default-scheduler  0/6 nodes are available: 1 node(s) were unschedulable, 2 node(s) had volume node affinity conflict, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
  Warning  FailedScheduling  24m   default-scheduler  0/6 nodes are available: 1 node(s) were unschedulable, 2 node(s) had volume node affinity conflict, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
  Warning  FailedScheduling  24m   default-scheduler  0/6 nodes are available: 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) had volume node affinity conflict, 2 node(s) were unschedulable.
  Warning  FailedScheduling  21m   default-scheduler  0/6 nodes are available: 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) had volume node affinity conflict, 2 node(s) were unschedulable.
  Warning  FailedScheduling  18m   default-scheduler  0/6 nodes are available: 1 node(s) were unschedulable, 2 node(s) had volume node affinity conflict, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.

# oc get co machine-config -oyaml
...
  - lastTransitionTime: "2021-06-08T06:52:43Z"
    message: Cluster not available for 4.8.0-0.nightly-2021-06-07-034343
    status: "False"
    type: Available
  extension:
    master: all 3 nodes are at latest configuration rendered-master-311d2a1cedd6275cd6ff0f9e6e7f355c
    worker: 'pool is degraded because nodes fail with "1 nodes are reporting degraded status on sync": "Node ip-10-0-216-216.us-west-1.compute.internal is reporting: \"failed to drain node (5 tries): timed out waiting for the condition: error when evicting pods/\\\"prometheus-k8s-1\\\" -n \\\"openshift-monitoring\\\": global timeout reached: 1m30s\""'

# oc get co monitoring machine-config
NAME             VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
monitoring       4.8.0-0.nightly-2021-06-07-034343   False       True          True       41m
machine-config   4.8.0-0.nightly-2021-06-07-034343   False       False         True       17m
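The drain timeout in that degraded message is an eviction being refused. Whether a PodDisruptionBudget is what's blocking can be checked directly; a sketch (the exact PDB name in 4.8 is an assumption, so list first):

# oc -n openshift-monitoring get poddisruptionbudget
# oc -n openshift-monitoring get pdb prometheus-k8s -o jsonpath='{.status.disruptionsAllowed}{"\n"}'
0

A disruptionsAllowed of 0 means evicting prometheus-k8s-1 can never succeed while its peer replica is stuck Pending, which is why the drain loops until the global timeout.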
Continuing from comment 12: after the upgrade to 4.8, ip-10-0-216-216.us-west-1.compute.internal is SchedulingDisabled, since machine-config needs to cordon the node (mark it SchedulingDisabled) in order to upgrade it to 4.8.

# oc get no ip-10-0-216-216.us-west-1.compute.internal
NAME                                         STATUS                     ROLES    AGE    VERSION
ip-10-0-216-216.us-west-1.compute.internal   Ready,SchedulingDisabled   worker   150m   v1.20.0+2817867
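The machine-config side of the hang is visible on the worker pool as well (a sketch):

# oc get machineconfigpool worker -o jsonpath='{range .status.conditions[?(@.status=="True")]}{.type}{": "}{.message}{"\n"}{end}'

On this cluster that should surface the Degraded condition with the same failed-to-drain message quoted above.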
We tombstoned 4.7.14 on this issue [1]. That's not technically a blocked edge, but it's pretty similar, so I'll mark up this bug as if it were a blocked edge [2].

[1]: https://github.com/openshift/cincinnati-graph-data/pull/839
[2]: https://github.com/openshift/enhancements/pull/475
4.7.0-0.nightly-2021-06-07-203428 cluster. ip-10-0-209-88.us-east-2.compute.internal is the worker node which has topology.kubernetes.io/zone=us-east-2b; it differs from the other worker nodes. Bind PVs for the alertmanager/prometheus pods and schedule these pods to ip-10-0-209-88.us-east-2.compute.internal, then upgrade to 4.8.0-0.nightly-2021-06-09-000526. No "volume node affinity conflict" error now.

# oc get node --show-labels
ip-10-0-132-29.us-east-2.compute.internal    Ready   worker   39m   v1.20.0+2817867
  beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m4.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-132-29,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=m4.xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2a,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2a
ip-10-0-148-89.us-east-2.compute.internal    Ready   master   46m   v1.20.0+2817867
  beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m5.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-148-89,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.kubernetes.io/instance-type=m5.xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2a,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2a
ip-10-0-151-181.us-east-2.compute.internal   Ready   worker   40m   v1.20.0+2817867
  beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m4.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-151-181,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=m4.xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2a,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2a
ip-10-0-164-105.us-east-2.compute.internal   Ready   master   46m   v1.20.0+2817867
  beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m5.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-164-105,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.kubernetes.io/instance-type=m5.xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2a,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2a
ip-10-0-209-88.us-east-2.compute.internal    Ready   worker   40m   v1.20.0+2817867
  beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m4.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2b,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-209-88.us-east-2.compute.internal,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=m4.xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2b,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2b
ip-10-0-217-206.us-east-2.compute.internal   Ready   master   46m   v1.20.0+2817867
  beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m5.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2b,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-217-206,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.kubernetes.io/instance-type=m5.xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2b,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2b

# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-06-07-203428   True        False         4m21s   Cluster version is 4.7.0-0.nightly-2021-06-07-203428

# oc -n openshift-monitoring get po -o wide | grep -E "prometheus-k8s|alertmanager-main"
alertmanager-main-0   5/5   Running   0   2m18s   10.131.0.34   ip-10-0-209-88.us-east-2.compute.internal   <none>   <none>
alertmanager-main-1   5/5   Running   0   2m18s   10.131.0.35   ip-10-0-209-88.us-east-2.compute.internal   <none>   <none>
alertmanager-main-2   5/5   Running   0   2m18s   10.131.0.36   ip-10-0-209-88.us-east-2.compute.internal   <none>   <none>
prometheus-k8s-0      7/7   Running   1   2m18s   10.131.0.32   ip-10-0-209-88.us-east-2.compute.internal   <none>   <none>
prometheus-k8s-1      7/7   Running   1   2m18s   10.131.0.33   ip-10-0-209-88.us-east-2.compute.internal   <none>   <none>

# oc -n openshift-monitoring get pvc
NAME                                       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
alertmanager-main-db-alertmanager-main-0   Bound    pvc-9fb04e7e-6498-450a-a77b-421a02b55870   4Gi        RWO            gp2            2m25s
alertmanager-main-db-alertmanager-main-1   Bound    pvc-7022640d-3bda-45ec-bbf1-0f572804323e   4Gi        RWO            gp2            2m25s
alertmanager-main-db-alertmanager-main-2   Bound    pvc-4a62d6ab-18a6-4650-a92b-c8f831e61c34   4Gi        RWO            gp2            2m25s
prometheus-k8s-db-prometheus-k8s-0         Bound    pvc-ba0ec8c9-c0aa-412f-b8ed-2d696588d2eb   10Gi       RWO            gp2            2m25s
prometheus-k8s-db-prometheus-k8s-1         Bound    pvc-79c19466-d33a-47f4-8841-d2ee5ec98a1b   10Gi       RWO            gp2            2m25s

Upgrade to 4.8.0-0.nightly-2021-06-09-000526:

# oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release:4.8.0-0.nightly-2021-06-09-000526 --allow-explicit-upgrade=true --force

After the upgrade:

# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-06-09-000526   True        False         3m12s   Cluster version is 4.8.0-0.nightly-2021-06-09-000526

# oc -n openshift-monitoring get po -o wide | grep -E "prometheus-k8s|alertmanager-main"
alertmanager-main-0   5/5   Running   0   21m   10.131.0.14   ip-10-0-209-88.us-east-2.compute.internal   <none>   <none>
alertmanager-main-1   5/5   Running   0   21m   10.131.0.16   ip-10-0-209-88.us-east-2.compute.internal   <none>   <none>
alertmanager-main-2   5/5   Running   0   21m   10.131.0.15   ip-10-0-209-88.us-east-2.compute.internal   <none>   <none>
prometheus-k8s-0      7/7   Running   1   21m   10.131.0.12   ip-10-0-209-88.us-east-2.compute.internal   <none>   <none>
prometheus-k8s-1      7/7   Running   1   21m   10.131.0.11   ip-10-0-209-88.us-east-2.compute.internal   <none>   <none>

# oc -n openshift-monitoring get pvc
NAME                                       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
alertmanager-main-db-alertmanager-main-0   Bound    pvc-9fb04e7e-6498-450a-a77b-421a02b55870   4Gi        RWO            gp2            75m
alertmanager-main-db-alertmanager-main-1   Bound    pvc-7022640d-3bda-45ec-bbf1-0f572804323e   4Gi        RWO            gp2            75m
alertmanager-main-db-alertmanager-main-2   Bound    pvc-4a62d6ab-18a6-4650-a92b-c8f831e61c34   4Gi        RWO            gp2            75m
prometheus-k8s-db-prometheus-k8s-0         Bound    pvc-ba0ec8c9-c0aa-412f-b8ed-2d696588d2eb   10Gi       RWO            gp2            75m
prometheus-k8s-db-prometheus-k8s-1         Bound    pvc-79c19466-d33a-47f4-8841-d2ee5ec98a1b   10Gi       RWO            gp2            75m
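One more sanity check after an upgrade like this is confirming no scheduling failures remain in the namespace (a sketch; an empty result is the healthy outcome):

# oc -n openshift-monitoring get events --field-selector reason=FailedScheduling
No resources found in openshift-monitoring namespace.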
These clusters are "broken". How are we getting customers to fix these clusters, and how are we alerting them that there is a problem? Two prometheus replicas tied to one instance due to volumes is a high-severity bug, and the admin needs to take corrective action. Are we alerting on this situation now?

The PDB is what we want: these users are wasting resources (they expect prometheus to be HA) and are not able to fix it. The product bug is not the PDB; the bug is that we allowed the cluster to get into this state and didn't notify the admin of why.

I expect us to:

a) deliver an alert that flags this with corrective action
b) once that alert rate is down, redeliver the PDB in 4.9 to fix the issue
c) potentially broaden the alert if necessary to other similar cases
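For illustration, an alert along the lines of (a) could be expressed as a PrometheusRule; this is only a sketch of the idea, with an assumed rule name, expression, and threshold, not what eventually shipped (kube_pod_info comes from kube-state-metrics):

$ cat <<EOF >prometheus-colocation-alert.yml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: prometheus-colocation-sketch
  namespace: openshift-monitoring
spec:
  groups:
  - name: general.rules
    rules:
    - alert: PrometheusReplicasColocated
      # Fires when more than one prometheus-k8s replica sits on the same node,
      # i.e. the non-HA state described in this comment.
      expr: count by (node) (kube_pod_info{namespace="openshift-monitoring", pod=~"prometheus-k8s-.*"}) > 1
      for: 30m
      labels:
        severity: warning
EOF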
(In reply to Clayton Coleman from comment #21)

> a) deliver an alert that flags this with corrective action

Bug 1974832 has been created to track this.

> b) once that alert rate is down, redeliver the PDB in 4.9 to fix the issue

With the reversions from this bug landing, bug 1949262 was re-opened to track this redelivery.

> c) potentially broaden the alert if necessary to other similar cases

If this needs to be extended, it sounds like a separate bug too.

With (a) and (b) being tracked in other places, and this bug's revert already being taken back to 4.7.16 via bug 1967966, I'm going to move this back to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438