Bug 1828250 - [baremetal] Master nodes should be tagged as NoSchedule
Summary: [baremetal] Master nodes should be tagged as NoSchedule
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.4
Hardware: Unspecified
OS: Linux
Target Milestone: ---
Target Release: 4.6.0
Assignee: Stephen Benjamin
QA Contact: Lubov
Depends On:
Blocks: 1846503
Reported: 2020-04-27 12:44 UTC by Lubov
Modified: 2020-10-27 15:58 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, the control plane would always be schedulable for user workloads on the baremetal platform. Now the control plane nodes are correctly configured as NoSchedule in a typical deployment with workers.
Clone Of:
: 1846503 (view as bug list)
Last Closed: 2020-10-27 15:58:27 UTC
Target Upstream Version:

Attachments (Terms of Use)
master description (16.28 KB, text/plain)
2020-05-21 06:21 UTC, Lubov

System ID Private Priority Status Summary Last Updated
Github openshift machine-config-operator pull 1817 0 None closed Bug 1828250: Sync kublelet config across platforms 2020-12-28 12:25:54 UTC
Red Hat Bugzilla 1827996 0 medium CLOSED Pod is running on master node after scale up or evacuation 2021-02-22 00:41:40 UTC
Red Hat Product Errata RHBA-2020:4196 0 None None None 2020-10-27 15:58:46 UTC

Description Lubov 2020-04-27 12:44:04 UTC
Description of problem:
After automated cluster provisioning and deployment:
scaling up an application pod resulted in pods running on both workers and masters.

After investigation it became clear that the masters are not tainted properly (see #1827996).

How reproducible:

Steps to Reproduce:
1. Provision and deploy a cluster using automation
2. Create an application pod:
# oc new-project httpd-proj
# oc new-app httpd
# oc expose dc/httpd
3. Scale up the pod replicas:
# oc scale dc httpd --replicas=3

Actual results:
Application pods are running both on workers and masters

Expected results:
Application pods should run on workers only

Additional info:

Comment 1 Stephen Benjamin 2020-04-27 13:12:16 UTC
Can you include what your install-config.yaml looks like (minus secrets)? This is likely intentional: if you do not specify more than 0 compute replicas, the masters must be made schedulable. Generally we recommend you deploy with a minimum of 2 compute replicas and 3 control plane replicas.
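
The rule described in this comment can be sketched as follows. This is only an illustration in Python, not the installer's actual code (which is written in Go); the function name and the shape of the compute pool list are assumptions, and every pool is assumed to state its replica count explicitly.

```python
def masters_schedulable(compute_pools):
    """Return True when control plane nodes must accept user workloads.

    compute_pools is a hypothetical list of dicts, each carrying an
    integer "replicas" key, loosely modeled on the compute section of
    install-config.yaml.
    """
    total_workers = sum(pool["replicas"] for pool in compute_pools)
    # With no compute replicas at all, the masters are the only nodes
    # available, so they have to be schedulable.
    return total_workers == 0


# The recommended layout (2 compute replicas) keeps masters unschedulable.
print(masters_schedulable([{"name": "worker", "replicas": 2}]))  # False
# A deployment without workers has to schedule onto the masters.
print(masters_schedulable([{"name": "worker", "replicas": 0}]))  # True
```

Under this rule, a cluster deployed with workers would be expected to report mastersSchedulable: false.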

Comment 3 Stephen Benjamin 2020-04-27 14:09:20 UTC
Could you provide a must-gather (oc adm must-gather) from this cluster? Given your compute replicas, the masters should not be schedulable.

Comment 4 Stephen Cuppett 2020-04-27 14:20:04 UTC
Setting target release to current development version (4.5) for investigation. Where fixes (if any) are required/requested for prior versions, cloned BZs will be created when appropriate.

Comment 6 Stephen Benjamin 2020-04-27 16:09:24 UTC
From your must-gather:

./cluster-scoped-resources/config.openshift.io/schedulers.yaml:    mastersSchedulable: false

Masters are not schedulable. I'm not sure why your httpd app would end up on the masters in that case, so I'm moving this to the kube-scheduler folks for their feedback.

Comment 7 Maciej Szulik 2020-04-28 08:44:50 UTC
From what I see the spec of each and every node is empty:

spec: {}

kube-scheduler-operator is not in the business of setting those tolerations, and kube-scheduler will react accordingly
to what is being set. I don't see a problem here from this perspective.
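
Why an empty node spec lets any pod land on a master can be illustrated with a simplified model of taint and toleration matching. This is a sketch in Python under stated assumptions, not kube-scheduler code: nodes and pods are plain dicts shaped like their API objects, the helper names are hypothetical, and only the NoSchedule and NoExecute effects are treated as blocking.

```python
def tolerates(toleration, taint):
    """Simplified check of whether one toleration covers one taint."""
    # An empty effect on the toleration matches any taint effect.
    if toleration.get("effect") and toleration["effect"] != taint.get("effect"):
        return False
    if toleration.get("operator", "Equal") == "Exists":
        # An empty key combined with Exists matches every taint key.
        return not toleration.get("key") or toleration["key"] == taint.get("key")
    return (toleration.get("key") == taint.get("key")
            and toleration.get("value", "") == taint.get("value", ""))


def node_admits_pod(node, pod):
    """True if every blocking taint on the node is tolerated by the pod."""
    taints = node.get("spec", {}).get("taints") or []
    tolerations = pod.get("spec", {}).get("tolerations") or []
    return all(
        any(tolerates(tol, taint) for tol in tolerations)
        for taint in taints
        if taint.get("effect") in ("NoSchedule", "NoExecute")
    )


plain_pod = {"spec": {}}
empty_node = {"spec": {}}
tainted_master = {"spec": {"taints": [
    {"key": "node-role.kubernetes.io/master", "effect": "NoSchedule"},
]}}

print(node_admits_pod(empty_node, plain_pod))      # True
print(node_admits_pod(tainted_master, plain_pod))  # False
```

With spec: {} there is nothing for the scheduler to filter on, so an ordinary application pod is admitted on that node.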

I took a quick peek at https://github.com/openshift/machine-config-operator/blob/6c690dafbbea5ab76c2e197239e1b70e386e753b/templates/master/01-master-kubelet/vsphere/units/kubelet.yaml
which should register a node with appropriate taints, but I'll let them check this one out.

Comment 8 Stephen Benjamin 2020-05-04 12:35:18 UTC
Antonio, why have you moved this back to be assigned to me?

Maciej, you've linked to a vsphere template. This is baremetal: https://github.com/openshift/machine-config-operator/blob/6c690dafbbea5ab76c2e197239e1b70e386e753b/templates/master/01-master-kubelet/baremetal/units/kubelet.yaml

We rely on this functionality: https://github.com/openshift/installer/blob/master/pkg/asset/manifests/scheduler.go#L73

Comment 9 Maciej Szulik 2020-05-04 14:12:55 UTC
> Maciej, you've linked to a vsphere template. This is baremetal:
> https://github.com/openshift/machine-config-operator/blob/6c690dafbbea5ab76c2e197239e1b70e386e753b/templates/master/01-master-kubelet/baremetal/units/kubelet.yaml

Good point, thx!

Comment 10 Antonio Murdaca 2020-05-04 18:14:02 UTC
(In reply to Stephen Benjamin from comment #8)
> Antonio, why have you moved this back to be assigned to me?
> Maciej, you've linked to a vsphere template. This is baremetal:
> https://github.com/openshift/machine-config-operator/blob/6c690dafbbea5ab76c2e197239e1b70e386e753b/templates/master/01-master-kubelet/baremetal/units/kubelet.yaml
> We rely on this functionality:
> https://github.com/openshift/installer/blob/master/pkg/asset/manifests/scheduler.go#L73

my bad, this should go to vsphere

Comment 13 Joseph Callen 2020-05-20 20:09:40 UTC
Sorry this fell through the cracks. 

Looking at the install-config, it's for baremetal; I'm not sure why this is specified as vSphere.
I created manifests with the latest 4.5 just to be sure:

apiVersion: config.openshift.io/v1
kind: Scheduler
metadata:
  creationTimestamp: null
  name: cluster
spec:
  mastersSchedulable: false
  policy:
    name: ""
status: {}

and a recent cluster just built

$ oc -o yaml get node jcallen-cfsz2-master-0

spec:
  providerID: vsphere://423b8b69-7c78-d68b-7f2d-711fcbb3cfd6
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master

and must-gather

$ cat schedulers.yaml
apiVersion: config.openshift.io/v1
items:
- apiVersion: config.openshift.io/v1
  kind: Scheduler
  spec:
    mastersSchedulable: false

Comment 14 Steve Milner 2020-05-20 21:00:56 UTC
Lubov, can you provide some more info here?

Comment 15 Lubov 2020-05-21 06:21:34 UTC
Created attachment 1690512 [details]
master description

Comment 16 Lubov 2020-05-21 06:23:05 UTC
It is baremetal

Attached the output of "oc -o yaml get node master-0-0": there are no taints there.
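
A check like the one performed by eye on the attachment could be scripted roughly as below. This is a hypothetical helper in Python, not part of any OpenShift tooling; it assumes the node object has already been loaded into a dict, for example from the JSON printed by "oc get node master-0-0 -o json".

```python
MASTER_TAINT_KEY = "node-role.kubernetes.io/master"


def has_master_noschedule_taint(node):
    """Return True if the node carries the master NoSchedule taint."""
    # spec.taints is absent entirely on an untainted node.
    taints = node.get("spec", {}).get("taints") or []
    return any(
        taint.get("key") == MASTER_TAINT_KEY
        and taint.get("effect") == "NoSchedule"
        for taint in taints
    )


# Node shaped like the baremetal master in this report (empty spec).
untainted = {"spec": {}}
# Node shaped like the vSphere master shown in comment 13.
tainted = {"spec": {"taints": [
    {"effect": "NoSchedule", "key": MASTER_TAINT_KEY},
]}}

print(has_master_noschedule_taint(untainted))  # False
print(has_master_noschedule_taint(tainted))    # True
```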

Comment 17 Kirsten Garrison 2020-05-21 20:59:11 UTC
From an MCO perspective the pools are not degraded, configs seem to be applied correctly. If this is indeed a baremetal cluster with nodes not tainted correctly and those taints are set in https://github.com/openshift/installer/blob/master/pkg/asset/manifests/scheduler.go#L73 and not in the MCO templates, can the baremetal installer team look more closely at this? 

Looking at cluster-scoped-resources/config.openshift.io/schedulers.yaml in the must-gather, I see:

apiVersion: config.openshift.io/v1
items:
- apiVersion: config.openshift.io/v1
  kind: Scheduler
  metadata:
    creationTimestamp: "2020-04-26T10:11:02Z"
    generation: 1
    name: cluster
    resourceVersion: "1121"
    selfLink: /apis/config.openshift.io/v1/schedulers/cluster
    uid: b285072f-de86-4e15-92b2-6ad64b4ae59c
  spec:
    mastersSchedulable: false

This doesn't seem like a Bug that the MCO team should be owning....

Comment 18 Kirsten Garrison 2020-05-21 21:22:20 UTC
Digging into this a little more to get some background, I found this PR: https://github.com/openshift/machine-config-operator/pull/846

I see:
For IPI baremetal, we need to support the platform in MCO. This PR also overrides the kubelet config to remove the NoSchedule taint.

Further down:
Baremetal IPI environment is not installable without removing the NoSchedule taint from the masters.

I don't know whether all of this means the baremetal template needs to be updated to add --register-with-taints=node-role.kubernetes.io/master=:NoSchedule, as the kubelet.yaml in the base/vsphere/openstack templates does, or whether the installer needs to do something else. Reassigning to @Stephen as he's more familiar with these templates and installer functionality and can reassign as appropriate within the baremetal team for investigation.
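
For readers unfamiliar with the flag value quoted above, the kubelet takes taints as a comma-separated list of key=value:effect entries, and the value part may be empty. The following is a minimal parsing sketch in Python with hypothetical helper names; it is not how the kubelet itself parses the flag.

```python
def parse_taint(entry):
    """Split one "key=value:effect" entry into its three parts."""
    key_value, sep, effect = entry.rpartition(":")
    if not sep or not effect:
        raise ValueError(f"missing effect in taint {entry!r}")
    # The value is optional, so "key=:effect" yields an empty value.
    key, _, value = key_value.partition("=")
    if not key:
        raise ValueError(f"missing key in taint {entry!r}")
    return {"key": key, "value": value, "effect": effect}


def parse_register_with_taints(flag_value):
    """Parse the comma-separated value of --register-with-taints."""
    return [parse_taint(entry) for entry in flag_value.split(",") if entry]


taints = parse_register_with_taints("node-role.kubernetes.io/master=:NoSchedule")
print(taints)
# [{'key': 'node-role.kubernetes.io/master', 'value': '', 'effect': 'NoSchedule'}]
```

The parsed result has the same key and effect as the taint reported on the vSphere master in comment 13.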

Comment 20 Kirsten Garrison 2020-06-09 18:44:03 UTC
@Stephen Can you PTAL

Comment 21 Steven Hardy 2020-06-10 10:38:12 UTC
Stephen is out for a few days so I took a look, and I think we should be relying on the installer telling the scheduler to make the masters schedulable, only in the case where there aren't any workers defined:


That relies on some MCO changes which landed in https://github.com/openshift/machine-config-operator/pull/937

However, I think we missed this PR, which removes the customized baremetal kubelet conf: https://github.com/openshift/machine-config-operator/pull/993

That got incorrectly closed without merging, and never revisited for review.

So I think we need to revive that PR and it should resolve this issue?

Comment 22 Kirsten Garrison 2020-06-10 17:46:26 UTC
Thanks for picking this up @Steven !

Comment 23 Russell Bryant 2020-06-11 18:12:50 UTC
I've opened a new PR to address this: https://github.com/openshift/machine-config-operator/pull/1817

Comment 28 errata-xmlrpc 2020-10-27 15:58:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

