Bug 1851226 - OpenShift 4.4.5: workload scheduled on master nodes
Summary: OpenShift 4.4.5: workload scheduled on master nodes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.4
Hardware: x86_64
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.6.0
Assignee: ravig
QA Contact: Lubov
URL:
Whiteboard:
Depends On:
Blocks: 1870665
Reported: 2020-06-25 19:54 UTC by jteagno
Modified: 2020-10-27 16:10 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1870665
Environment:
Last Closed: 2020-10-27 16:09:46 UTC
Target Upstream Version:



Links
| System | ID | Private | Priority | Status | Summary | Last Updated |
|---|---|---|---|---|---|---|
| Github | openshift machine-config-operator pull 1878 | 0 | None | closed | Bug 1851226: Refactor mastersSchedulable reconciliation checks | 2020-12-08 12:18:29 UTC |
| Red Hat Product Errata | RHBA-2020:4196 | 0 | None | None | None | 2020-10-27 16:10:13 UTC |

Description jteagno 2020-06-25 19:54:08 UTC
Description of problem:

On OpenShift 4.4.5, we have observed pods being scheduled onto master nodes.

Version-Release number of selected component (if applicable):

OpenShift 4.4.5 bare-metal IPI (upgraded from 4.4.3)


How reproducible:

Every time this workload is created.


Steps to Reproduce:
1. Ensure that master nodes have the "master" role and only the "master" role.
2. Ensure that the `.spec.mastersSchedulable` field of the `Scheduler/cluster` object (`schedulers.config.openshift.io`) is `false`.
3. Create application workload (via Helm 3).
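
The preconditions in steps 1 and 2 can be checked with a short script. This is a minimal sketch, assuming the inputs are the parsed JSON output of `oc get nodes -o json` and `oc get schedulers.config.openshift.io cluster -o json`. The function names are hypothetical and not part of any OpenShift tooling. The `node-role.kubernetes.io/` label prefix is the usual Kubernetes role-label convention.

```python
ROLE_PREFIX = "node-role.kubernetes.io/"


def node_roles(node):
    """Return the set of roles encoded in a node's labels."""
    labels = node.get("metadata", {}).get("labels", {})
    return {key[len(ROLE_PREFIX):] for key in labels if key.startswith(ROLE_PREFIX)}


def masters_with_extra_roles(node_list):
    """Names of nodes that carry the master role plus at least one other role."""
    result = []
    for node in node_list.get("items", []):
        roles = node_roles(node)
        if "master" in roles and roles != {"master"}:
            result.append(node["metadata"]["name"])
    return result


def masters_schedulable(scheduler):
    """Value of .spec.mastersSchedulable; treated as False when the field is unset."""
    return bool(scheduler.get("spec", {}).get("mastersSchedulable", False))
```

An empty list from `masters_with_extra_roles` together with `False` from `masters_schedulable` corresponds to the state described in steps 1 and 2.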

Actual results:

Some of the newly-created pods have been scheduled to master nodes.


Expected results:

All of the newly-created pods are scheduled to *worker* nodes.
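
One way to compare actual and expected results is to list the application pods bound to master nodes. The sketch below assumes the input is the parsed JSON output of `oc get pods --all-namespaces -o json` and that the master node names are already known. The function name and the namespace prefixes used to skip platform pods are assumptions for illustration and may need adjusting for a given cluster.

```python
def pods_on_masters(pod_list, master_names, ignore_prefixes=("openshift-", "kube-")):
    """Return (namespace, name, node) for pods bound to a master node.

    Pods in namespaces starting with one of ignore_prefixes are skipped,
    since platform components are expected to run on masters.
    """
    masters = set(master_names)
    found = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        namespace = meta.get("namespace", "")
        # .spec.nodeName is unset for pods that have not been scheduled yet.
        node = pod.get("spec", {}).get("nodeName")
        if node in masters and not namespace.startswith(tuple(ignore_prefixes)):
            found.append((namespace, meta.get("name", ""), node))
    return found
```

With the expected behaviour, the returned list is empty for the newly created workload.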


Additional info:

Comment 8 Micah Abbott 2020-08-11 19:03:51 UTC
@Amit, could your team handle the verification of this BZ?

Comment 13 Mike Dame 2020-08-20 14:58:11 UTC
I opened https://bugzilla.redhat.com/show_bug.cgi?id=1870665 and https://github.com/openshift/machine-config-operator/pull/2016 to backport the MCO fix to 4.5

Comment 15 Lubov 2020-08-25 10:08:03 UTC
Verified on
Client Version: 4.6.0-0.nightly-2020-08-24-110601
Server Version: 4.6.0-0.nightly-2020-08-24-110601
Kubernetes Version: v1.19.0-rc.2+3e083ac-dirty

The same problem was fixed in https://bugzilla.redhat.com/show_bug.cgi?id=1828250, and the fix was backported to 4.5 (https://bugzilla.redhat.com/show_bug.cgi?id=1846503) and 4.4 (https://bugzilla.redhat.com/show_bug.cgi?id=1849217).
I have now re-verified it.

Comment 17 Lubov 2020-08-25 11:49:17 UTC
I re-verified on 4.6 for the current change to ensure it is still working

In 4.4 (4.4.0-0.nightly-2020-07-18-033102) it has worked since the verification of https://bugzilla.redhat.com/show_bug.cgi?id=1849217. As far as I understand, this fix is in 4.4.13 and above. I did not re-verify 4.4 now.

Comment 18 Mike Dame 2020-08-31 13:54:18 UTC
To update this, I think the discussion in https://bugzilla.redhat.com/show_bug.cgi?id=1828250 (which describes a similar issue) might have found the root cause of how the taints were removed after an upgrade. However, I don't see any updates to the faulty logic that reconciles the mastersSchedulable field, so while the two bugs are related, our fix is still necessary.
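
For readers unfamiliar with the mechanism, the relationship between the mastersSchedulable field and the master taint can be sketched as an idempotent reconciliation step. This is an illustration only and not the operator's code; the actual change is in the linked machine-config-operator pull request. It assumes the taint in question is `node-role.kubernetes.io/master` with effect `NoSchedule`, and the function names are hypothetical.

```python
MASTER_TAINT = {"key": "node-role.kubernetes.io/master", "effect": "NoSchedule"}


def _is_master_taint(taint):
    """True if the taint matches the master NoSchedule taint by key and effect."""
    return (taint.get("key") == MASTER_TAINT["key"]
            and taint.get("effect") == MASTER_TAINT["effect"])


def reconcile_master_taints(taints, masters_schedulable):
    """Return the taint list a master node should carry for the given setting.

    Unrelated taints are preserved. The master taint is present exactly
    when masters_schedulable is False, so repeated calls are idempotent.
    """
    others = [t for t in taints if not _is_master_taint(t)]
    if masters_schedulable:
        return others
    return others + [dict(MASTER_TAINT)]
```

Under this sketch, a master whose taint was lost (for example after an upgrade) regains it on the next reconciliation while mastersSchedulable is `false`.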

Comment 20 errata-xmlrpc 2020-10-27 16:09:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

