Description of problem:
On OpenShift 4.4.5, we have observed pods being scheduled onto master nodes.

Version-Release number of selected component (if applicable):
OpenShift 4.4.5 bare-metal IPI (upgraded from 4.4.3)

How reproducible:
Every time this workload is created.

Steps to Reproduce:
1. Ensure that master nodes have the "master" role and only the "master" role.
2. Ensure that the `.spec.mastersSchedulable` field of the `Scheduler/cluster` object is `false`.
3. Create the application workload (via Helm 3).

Actual results:
Some of the newly created pods have been scheduled to master nodes.

Expected results:
All of the newly created pods are scheduled to *worker* nodes.

Additional info:
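The checks behind the steps and the actual results above can be scripted. What follows is only a minimal sketch, assuming the JSON output of `oc get nodes -o json` and `oc get pods --all-namespaces -o json` has already been loaded into Python dicts. The function names are hypothetical and are not part of any OpenShift tooling. The sketch also assumes that masters are normally kept unschedulable by a `node-role.kubernetes.io/master` taint with effect `NoSchedule`.

```python
# Hypothetical helpers for inspecting node and pod lists (as parsed JSON dicts).
# Assumption: masters carry the label and taint key below.
MASTER_LABEL = "node-role.kubernetes.io/master"
MASTER_TAINT_KEY = "node-role.kubernetes.io/master"


def is_master(node):
    """Return True if the node carries the master role label."""
    labels = node.get("metadata", {}).get("labels") or {}
    return MASTER_LABEL in labels


def has_noschedule_taint(node):
    """Return True if the node has the master NoSchedule taint."""
    taints = node.get("spec", {}).get("taints") or []
    return any(
        t.get("key") == MASTER_TAINT_KEY and t.get("effect") == "NoSchedule"
        for t in taints
    )


def untainted_masters(node_list):
    """Names of master nodes that are missing the NoSchedule taint."""
    return [
        n["metadata"]["name"]
        for n in node_list.get("items", [])
        if is_master(n) and not has_noschedule_taint(n)
    ]


def pods_on_masters(pod_list, node_list):
    """(namespace, pod name, node name) for every pod bound to a master node."""
    masters = {
        n["metadata"]["name"]
        for n in node_list.get("items", [])
        if is_master(n)
    }
    return [
        (p["metadata"]["namespace"], p["metadata"]["name"], p["spec"]["nodeName"])
        for p in pod_list.get("items", [])
        if p.get("spec", {}).get("nodeName") in masters
    ]
```

Note that `pods_on_masters` also reports platform pods that tolerate the master taint by design, so its output would need filtering down to the application's namespace before being treated as evidence of this bug.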
@Amit, could your team handle the verification of this BZ?
I opened https://bugzilla.redhat.com/show_bug.cgi?id=1870665 and https://github.com/openshift/machine-config-operator/pull/2016 to backport the MCO fix to 4.5.
Verified on:
Client Version: 4.6.0-0.nightly-2020-08-24-110601
Server Version: 4.6.0-0.nightly-2020-08-24-110601
Kubernetes Version: v1.19.0-rc.2+3e083ac-dirty

The same problem was fixed in https://bugzilla.redhat.com/show_bug.cgi?id=1828250, and the fix was backported to 4.5 (https://bugzilla.redhat.com/show_bug.cgi?id=1846503) and 4.4 (https://bugzilla.redhat.com/show_bug.cgi?id=1849217). I have re-verified it now.
I re-verified on 4.6 for the current change to ensure it is still working. In 4.4 (4.4.0-0.nightly-2020-07-18-033102) it has worked since the verification of https://bugzilla.redhat.com/show_bug.cgi?id=1849217. As far as I understand, this fix is included in 4.4.13 and above. I did not re-verify 4.4 this time.
To update this: I think the discussion in https://bugzilla.redhat.com/show_bug.cgi?id=1828250 (which describes a similar issue) might have found the root cause of how the taints were removed after an upgrade. However, I don't see any updates to the faulty logic that reconciles the mastersSchedulable field, so while the two bugs are related, our fix is still necessary.
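To make the expected behaviour concrete, here is a rough sketch of the invariant that reconciling mastersSchedulable should preserve. It is not the MCO's actual code. It assumes that a master is kept unschedulable by a `node-role.kubernetes.io/master` taint with effect `NoSchedule`, and that this taint should be present exactly when mastersSchedulable is `false`. The function name is hypothetical.

```python
# Illustrative only: the desired taint set for a master node, given the value
# of mastersSchedulable. Assumption: the taint key and effect below are what
# keep ordinary workloads off masters.
MASTER_TAINT_KEY = "node-role.kubernetes.io/master"
MASTER_TAINT = {"key": MASTER_TAINT_KEY, "effect": "NoSchedule"}


def desired_master_taints(current_taints, masters_schedulable):
    """Return the taints a master node should have after reconciliation.

    Unrelated taints are preserved. The master NoSchedule taint is present
    if and only if masters_schedulable is False.
    """
    others = [
        t for t in (current_taints or [])
        if not (t.get("key") == MASTER_TAINT_KEY and t.get("effect") == "NoSchedule")
    ]
    if masters_schedulable:
        return others
    return others + [dict(MASTER_TAINT)]
```

Under this invariant, a master that lost its taint during an upgrade would have it restored on the next reconciliation while mastersSchedulable is `false`, which matches the expected results in this report.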
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196