Bug 1851226

Summary: OpenShift 4.4.5: workload scheduled on master nodes
Product: OpenShift Container Platform
Reporter: jteagno <jteagno+bugzilla>
Component: Machine Config Operator
Assignee: ravig <rgudimet>
Status: CLOSED ERRATA
QA Contact: Lubov <lshilin>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.4
CC: aos-bugs, bhershbe, ealcaniz, lshilin, markmc, mdame, mfojtik, miabbott, rgudimet, sgordon, sreichar
Target Milestone: ---   
Target Release: 4.6.0   
Hardware: x86_64   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1870665
Environment:
Last Closed: 2020-10-27 16:09:46 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1870665

Description jteagno 2020-06-25 19:54:08 UTC
Description of problem:

On OpenShift 4.4.5, we have observed pods being scheduled onto master nodes.

Version-Release number of selected component (if applicable):

OpenShift 4.4.5 bare-metal IPI (upgraded from 4.4.3)


How reproducible:

Every time this workload is created.


Steps to Reproduce:
1. Ensure that master nodes have the "master" role and only the "master" role.
2. Ensure that the `.spec.mastersSchedulable` field of the `Scheduler/cluster` object is `false`.
3. Create application workload (via Helm 3).
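
For anyone reproducing this, steps 1 and 2 can be checked mechanically. The following is only a minimal sketch, not part of the original report: it assumes the node list and the cluster Scheduler object have already been dumped to JSON (for example with `oc get nodes -o json` and `oc get scheduler cluster -o json`) and parsed into Python dicts. The function names `node_roles` and `check_preconditions` are hypothetical.

```python
# Assumption: roles are expressed as "node-role.kubernetes.io/<role>" labels.
ROLE_PREFIX = "node-role.kubernetes.io/"


def node_roles(node):
    """Return the set of role names found in a node's labels."""
    labels = node.get("metadata", {}).get("labels") or {}
    return {key[len(ROLE_PREFIX):] for key in labels if key.startswith(ROLE_PREFIX)}


def check_preconditions(node_list, scheduler):
    """Return a list of human-readable problems; empty means steps 1 and 2 hold."""
    problems = []
    for node in node_list.get("items", []):
        roles = node_roles(node)
        if "master" in roles and roles != {"master"}:
            name = node.get("metadata", {}).get("name", "<unknown>")
            extra = sorted(roles - {"master"})
            problems.append(f"node {name} has roles besides master: {extra}")
    # A missing field is treated as false here; that default is an assumption.
    if scheduler.get("spec", {}).get("mastersSchedulable", False):
        problems.append("mastersSchedulable is true")
    return problems
```

An empty result means both preconditions from the steps above are satisfied for the data that was passed in.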

Actual results:

Some of the newly-created pods have been scheduled to master nodes.


Expected results:

All of the newly-created pods are scheduled to *worker* nodes.
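
To compare actual against expected results, one can list the workload pods that landed on master nodes. This is a hedged sketch under assumptions not stated in the report: the inputs are parsed JSON from `oc get nodes -o json` and `oc get pods -A -o json`, the workload lives in a single namespace chosen by the caller, and the helper name `pods_on_masters` is hypothetical. Platform pods that tolerate the master taint legitimately run on masters, which is why the check is restricted to one namespace.

```python
MASTER_LABEL = "node-role.kubernetes.io/master"


def pods_on_masters(node_list, pod_list, namespace):
    """Return sorted names of pods in `namespace` that are bound to master nodes."""
    masters = {
        node["metadata"]["name"]
        for node in node_list.get("items", [])
        if MASTER_LABEL in (node["metadata"].get("labels") or {})
    }
    return sorted(
        pod["metadata"]["name"]
        for pod in pod_list.get("items", [])
        if pod["metadata"].get("namespace") == namespace
        # spec.nodeName is unset for pods that are still pending.
        and pod.get("spec", {}).get("nodeName") in masters
    )
```

With the expected behavior, the returned list is empty for the application namespace.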


Additional info:

Comment 8 Micah Abbott 2020-08-11 19:03:51 UTC
@Amit, could your team handle the verification of this BZ?

Comment 13 Mike Dame 2020-08-20 14:58:11 UTC
I opened https://bugzilla.redhat.com/show_bug.cgi?id=1870665 and https://github.com/openshift/machine-config-operator/pull/2016 to backport the MCO fix to 4.5.

Comment 15 Lubov 2020-08-25 10:08:03 UTC
Verified on
Client Version: 4.6.0-0.nightly-2020-08-24-110601
Server Version: 4.6.0-0.nightly-2020-08-24-110601
Kubernetes Version: v1.19.0-rc.2+3e083ac-dirty

The same problem was fixed in https://bugzilla.redhat.com/show_bug.cgi?id=1828250, and the fix was backported to 4.5 (https://bugzilla.redhat.com/show_bug.cgi?id=1846503) and 4.4 (https://bugzilla.redhat.com/show_bug.cgi?id=1849217).
I have now re-verified it.

Comment 17 Lubov 2020-08-25 11:49:17 UTC
I re-verified on 4.6 for the current change to ensure it is still working.

In 4.4 (4.4.0-0.nightly-2020-07-18-033102) it has worked since the verification of https://bugzilla.redhat.com/show_bug.cgi?id=1849217. As far as I understand, this fix is included in 4.4.13 and above. I did not re-verify 4.4 now.

Comment 18 Mike Dame 2020-08-31 13:54:18 UTC
To update this: I think the discussion in https://bugzilla.redhat.com/show_bug.cgi?id=1828250 (which describes a similar issue) might have found the root cause of how the taints were removed after an upgrade. However, I don't see any updates to the faulty logic that reconciles the mastersSchedulable field, so while the two bugs are related, our fix is still necessary.
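
Since the taints are what keep ordinary pods off masters, a quick way to spot an affected cluster is to look for master nodes that lack the standard `node-role.kubernetes.io/master` taint with effect `NoSchedule`. The sketch below is illustrative only and is not the MCO reconciliation logic: it assumes parsed output of `oc get nodes -o json`, and the function name `masters_missing_taint` is hypothetical.

```python
MASTER_KEY = "node-role.kubernetes.io/master"


def masters_missing_taint(node_list):
    """Return sorted names of master-labelled nodes without the NoSchedule master taint."""
    missing = []
    for node in node_list.get("items", []):
        metadata = node.get("metadata", {})
        if MASTER_KEY not in (metadata.get("labels") or {}):
            continue  # not a master node
        taints = node.get("spec", {}).get("taints") or []
        tainted = any(
            taint.get("key") == MASTER_KEY and taint.get("effect") == "NoSchedule"
            for taint in taints
        )
        if not tainted:
            missing.append(metadata.get("name", "<unknown>"))
    return sorted(missing)
```

With `mastersSchedulable` set to `false`, the expectation is that this returns an empty list; any node it reports can accept regular workload pods.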

Comment 20 errata-xmlrpc 2020-10-27 16:09:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196