Description of problem:
When a pod has a PodDisruptionBudget with maxUnavailable set to 1 and disruptionsAllowed set to 0, an OpenShift upgrade gets stuck:
the MCO fails to upgrade the OS version, but nothing shows as degraded.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Create a pod
# oc new-app --template httpd-example
2. Create poddisruptionbudget
# oc create poddisruptionbudget --min-available=1 test --selector="name=httpd-example"
3. Go to upgrade.
# oc adm upgrade --to-latest
- The upgrade appears to proceed but loops through degraded states of seemingly random operators, mostly because of scheduling failures.
- Nothing tells you why the upgrade failed until you look at the logs of the machine-config-daemon:
# oc logs machine-config-daemon-v44gn
update.go:89] error when evicting pod "httpd-example-4-8fkpr" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
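The eviction API refuses because status.disruptionsAllowed on the PDB is 0. A minimal sketch of the budget arithmetic for this reproducer (simplified illustration, not the actual controller code; it ignores maxUnavailable budgets and expectedPods edge cases):

```python
def disruptions_allowed(current_healthy: int, min_available: int) -> int:
    """Roughly how the controller derives status.disruptionsAllowed
    for a minAvailable-style PodDisruptionBudget (simplified)."""
    return max(0, current_healthy - min_available)

# httpd-example runs a single replica and the PDB requires minAvailable=1,
# so no eviction is ever allowed and the node drain retries forever.
print(disruptions_allowed(current_healthy=1, min_available=1))  # 0
```

With a second replica the budget would allow one disruption and the drain could make progress.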
Then, looking at `oc get node`, we see all worker nodes on a different version than the masters.
The upgrade should either succeed, or fail on just the one node and tell us why it failed, explaining that it is due to the PodDisruptionBudget.
# oc get nodes
NAME                                         STATUS                     ROLES    AGE    VERSION
ip-10-0-130-250.us-west-2.compute.internal   Ready                      master   115d   v1.13.4+a80aad556
ip-10-0-134-211.us-west-2.compute.internal   Ready                      worker   115d   v1.13.4+12ee15d4a
ip-10-0-150-54.us-west-2.compute.internal    Ready                      worker   115d   v1.13.4+12ee15d4a
ip-10-0-155-30.us-west-2.compute.internal    Ready                      master   115d   v1.13.4+a80aad556
ip-10-0-171-64.us-west-2.compute.internal    Ready,SchedulingDisabled   worker   51d    v1.13.4+12ee15d4a
ip-10-0-173-199.us-west-2.compute.internal   Ready                      master   115d   v1.13.4+a80aad556
Is https://bugzilla.redhat.com/show_bug.cgi?id=1747472 a side-effect of this issue?
PDBs are working exactly as expected. There is a related, though NOT duplicate, bug: https://bugzilla.redhat.com/show_bug.cgi?id=1752111
For this bug I expect the workload team to generate an Info level alert any time there is a PDB in the system which has no ability to be disrupted for $some period of time (say 5m?).
We need to pro-actively alert customers that they have something configured which is likely to cause them problems. This bug is tracking that Info level proactive alert.
1752111 is tracking a reactive MCO alert which will WARN a customer when such a PDB situation has broken the MCO's ability to do its job.
If you would like to coordinate your efforts on these two alerts, that is fine; however, the workloads team owns the info-level alert if the situation ever happens. The MCO team owns the warn-level alert if it affects the MCO.
If you are unclear what is required here, please don't hesitate to ask me or Clayton.
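A Prometheus rule implementing the requested info-level alert could look roughly like the following. This is an illustrative sketch only: the rule name, severity, and `for` window are assumptions, not the shipped rule; the metrics are the standard kube-state-metrics PDB series.

```yaml
- alert: PodDisruptionBudgetAtLimit
  # Fires when a PDB has exactly as many healthy pods as it requires,
  # i.e. status.disruptionsAllowed is 0 and any eviction would violate it.
  expr: kube_poddisruptionbudget_status_current_healthy == on(namespace, poddisruptionbudget, service) kube_poddisruptionbudget_status_desired_healthy
  for: 5m
  labels:
    severity: info
  annotations:
    message: The pod disruption budget is preventing further disruption to pods because it is at the minimum allowed level.
```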
Verified this against a test system by creating a PDB at its limit and verifying the alert fired, then switching the PDB to require more pods than were possible and verifying it failed.
However, I noticed that the namespace of the failing PDB is not listed, which complicates finding the offending PDB. I think the reported alert needs to have the namespace label set in some form.
Opened https://github.com/openshift/cluster-kube-controller-manager-operator/pull/309 with the namespace
Ge Liu, it seems the upgrade failure is expected; you need to check that the alert is displayed in the Prometheus page's "Alerts" tab.
(In reply to Xingxing Xia from comment #10)
> Ge Liu, it seems the upgrade failure is expected; you need to check that the
> alert is displayed in the Prometheus page's "Alerts" tab.
That is correct. Because of this problem, updates will fail; that is why we set up the alert in the first place.
Confirmed with payload: 4.3.0-0.nightly-2019-12-05-073829 upgrade to payload: 4.3.0-0.nightly-2019-12-05-213858:
[root@dhcp-140-138 ~]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.0-0.nightly-2019-12-05-213858   True        False         68m     Error while reconciling 4.3.0-0.nightly-2019-12-05-213858: the cluster operator ingress is degraded
[root@dhcp-140-138 ~]# oc get node
NAME                                         STATUS                     ROLES    AGE    VERSION
ip-10-0-135-45.us-east-2.compute.internal    Ready                      master   139m   v1.16.2
ip-10-0-135-7.us-east-2.compute.internal     Ready,SchedulingDisabled   worker   129m   v1.16.2
ip-10-0-147-20.us-east-2.compute.internal    Ready                      worker   129m   v1.16.2
ip-10-0-159-247.us-east-2.compute.internal   Ready                      master   139m   v1.16.2
ip-10-0-160-104.us-east-2.compute.internal   Ready                      master   139m   v1.16.2
Check alert in Prometheus:
expr: kube_poddisruptionbudget_status_current_healthy == on(namespace, poddisruptionbudget, service) kube_poddisruptionbudget_status_desired_healthy
message: The pod disruption budget is preventing further disruption to pods because
it is at the minimum allowed level.