Bug 1828250

Summary: [baremetal] Master nodes should be tagged as NoSchedule
Product: OpenShift Container Platform
Component: Machine Config Operator
Reporter: Lubov <lshilin>
Assignee: Stephen Benjamin <stbenjam>
QA Contact: Lubov <lshilin>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Version: 4.4
Target Release: 4.6.0
Hardware: Unspecified
OS: Linux
Type: Bug
CC: amurdaca, aos-bugs, dhellmann, dmaizel, eminguez, fdeutsch, jcallen, kgarriso, lshilin, maszulik, mfojtik, nstielau, prabinov, rbartal, rbryant, scuppett, shardy, smilner, yprokule
Doc Type: Bug Fix
Doc Text: Previously, the control plane would always be schedulable for user workloads on the baremetal platform. Now the control plane nodes are correctly configured as NoSchedule in a typical deployment with workers.
Cloned As: 1846503 (view as bug list)
Last Closed: 2020-10-27 15:58:27 UTC
Bug Blocks: 1846503    
Attachments: master description

Description Lubov 2020-04-27 12:44:04 UTC
Description of problem:
After automated cluster provisioning and deployment, scaling up an application pod resulted in pods running on both workers and masters.

Investigation showed that the masters are not tainted properly (see bug #1827996).


How reproducible:
Always

Steps to Reproduce:
1. Provision and deploy a cluster using automation
2. Create an application pod:
# oc new-project httpd-proj
# oc new-app httpd
# oc expose dc/httpd
3. Scale up the pod's replicas:
# oc scale dc httpd --replicas=3
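
To see where the replicas land, standard oc commands can be used (httpd-proj is the project created in step 2):

```
# Show each pod together with the node it was scheduled on:
oc get pods -n httpd-proj -o wide
```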

Actual results:
Application pods are running both on workers and masters

Expected results:
Application pods should run on workers only

Additional info:

Comment 1 Stephen Benjamin 2020-04-27 13:12:16 UTC
Can you include what your install-config.yaml looks like (minus secrets)? This is likely intentional: if you do not specify > 0 compute replicas, the masters must be made schedulable. Generally we recommend deploying with a minimum of 2 compute replicas and 3 control plane replicas.
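
For reference, a minimal install-config.yaml sketch of the fields in question (field names per the standard install-config schema; the replica values are the recommended example, not the reporter's actual config):

```
# Illustrative fragment only -- the rest of install-config.yaml is omitted:
compute:
- name: worker
  replicas: 2
controlPlane:
  name: master
  replicas: 3
```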

Comment 3 Stephen Benjamin 2020-04-27 14:09:20 UTC
Could you provide a must-gather (oc adm must-gather) from this cluster? Given your compute replicas, the masters should not be schedulable.

Comment 4 Stephen Cuppett 2020-04-27 14:20:04 UTC
Setting target release to current development version (4.5) for investigation. Where fixes (if any) are required/requested for prior versions, cloned BZs will be created when appropriate.

Comment 6 Stephen Benjamin 2020-04-27 16:09:24 UTC
From your must-gather:

```
./cluster-scoped-resources/config.openshift.io/schedulers.yaml:    mastersSchedulable: false
```

Masters are not schedulable. I'm not sure why your httpd app would end up on the masters in that case; moving this to the kube-scheduler folks for their feedback.
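
For anyone following along, the live value can also be read straight from the cluster (standard oc usage):

```
# Print only the mastersSchedulable field from the cluster Scheduler config:
oc get scheduler cluster -o jsonpath='{.spec.mastersSchedulable}'
```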

Comment 7 Maciej Szulik 2020-04-28 08:44:50 UTC
From what I see the spec of each and every node is empty:

spec: {}

kube-scheduler-operator is not in the business of setting those taints, and kube-scheduler will react according to what is set on the nodes. I don't see a problem here from this perspective.
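
A quick way to confirm that across all nodes (standard oc/jsonpath usage):

```
# Print each node's name alongside its spec.taints; properly tainted masters
# would show node-role.kubernetes.io/master:NoSchedule here:
oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'
```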

I took a quick peek at https://github.com/openshift/machine-config-operator/blob/6c690dafbbea5ab76c2e197239e1b70e386e753b/templates/master/01-master-kubelet/vsphere/units/kubelet.yaml
which should register a node with the appropriate taints, but I'll let them check this one out.

Comment 8 Stephen Benjamin 2020-05-04 12:35:18 UTC
Antonio, why have you moved this back to be assigned to me?

Maciej, you've linked to a vsphere template. This is baremetal: https://github.com/openshift/machine-config-operator/blob/6c690dafbbea5ab76c2e197239e1b70e386e753b/templates/master/01-master-kubelet/baremetal/units/kubelet.yaml

We rely on this functionality: https://github.com/openshift/installer/blob/master/pkg/asset/manifests/scheduler.go#L73

Comment 9 Maciej Szulik 2020-05-04 14:12:55 UTC
> Maciej, you've linked to a vsphere template. This is baremetal:
> https://github.com/openshift/machine-config-operator/blob/
> 6c690dafbbea5ab76c2e197239e1b70e386e753b/templates/master/01-master-kubelet/
> baremetal/units/kubelet.yaml

Good point, thx!

Comment 10 Antonio Murdaca 2020-05-04 18:14:02 UTC
(In reply to Stephen Benjamin from comment #8)
> Antonio, why have you moved this back to be assigned to me?
> 
> Maciej, you've linked to a vsphere template. This is baremetal:
> https://github.com/openshift/machine-config-operator/blob/
> 6c690dafbbea5ab76c2e197239e1b70e386e753b/templates/master/01-master-kubelet/
> baremetal/units/kubelet.yaml
> 
> We rely on this functionality:
> https://github.com/openshift/installer/blob/master/pkg/asset/manifests/
> scheduler.go#L73

my bad, this should go to vsphere

Comment 13 Joseph Callen 2020-05-20 20:09:40 UTC
Sorry, this fell through the cracks.

Looking at the install-config, it's for baremetal; I'm not sure why this was assigned as vSphere.
I just created manifests with the latest 4.5 to be sure:

apiVersion: config.openshift.io/v1
kind: Scheduler
metadata:
  creationTimestamp: null
  name: cluster
spec:
  mastersSchedulable: false
  policy:
    name: ""
status: {}

And from a recently built cluster:

$ oc -o yaml get node jcallen-cfsz2-master-0

spec:                   
  providerID: vsphere://423b8b69-7c78-d68b-7f2d-711fcbb3cfd6
  taints:            
  - effect: NoSchedule
    key: node-role.kubernetes.io/master    


And from the must-gather, cluster-scoped-resources/config.openshift.io/schedulers.yaml:

$ cat schedulers.yaml 
---
apiVersion: config.openshift.io/v1
items:
- apiVersion: config.openshift.io/v1
  kind: Scheduler
...
  spec:
    mastersSchedulable: false
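
For completeness, the manifest check above can be reproduced from an install-config with the standard installer workflow (assuming the usual output layout, where the scheduler config lands in manifests/cluster-scheduler-02-config.yml):

```
# Render manifests from install-config.yaml and inspect the scheduler config:
openshift-install create manifests --dir=mycluster
cat mycluster/manifests/cluster-scheduler-02-config.yml
```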

Comment 14 Steve Milner 2020-05-20 21:00:56 UTC
Lubov, can you provide some more info here?

Comment 15 Lubov 2020-05-21 06:21:34 UTC
Created attachment 1690512 [details]
master description

Comment 16 Lubov 2020-05-21 06:23:05 UTC
It is baremetal

Attached the output of "oc -o yaml get node master-0-0": no taints there.

Comment 17 Kirsten Garrison 2020-05-21 20:59:11 UTC
From an MCO perspective the pools are not degraded and the configs seem to be applied correctly. If this is indeed a baremetal cluster whose nodes are not tainted correctly, and those taints are set in https://github.com/openshift/installer/blob/master/pkg/asset/manifests/scheduler.go#L73 and not in the MCO templates, can the baremetal installer team look more closely at this?

Looking at cluster-scoped-resources/config.openshift.io/schedulers.yaml in the must-gather I see:

apiVersion: config.openshift.io/v1
items:
- apiVersion: config.openshift.io/v1
  kind: Scheduler
  metadata:
    creationTimestamp: "2020-04-26T10:11:02Z"
    generation: 1
    name: cluster
    resourceVersion: "1121"
    selfLink: /apis/config.openshift.io/v1/schedulers/cluster
    uid: b285072f-de86-4e15-92b2-6ad64b4ae59c
  spec:
    mastersSchedulable: false

This doesn't seem like a Bug that the MCO team should be owning....

Comment 18 Kirsten Garrison 2020-05-21 21:22:20 UTC
Digging into this a little more to get background, I found this PR: https://github.com/openshift/machine-config-operator/pull/846

I see:
```
For IPI baremetal, we need to support the platform in MCO. This PR also overrides the kubelet config to remove the NoSchedule taint.
```


Further down:
```
Baremetal IPI environment is not installable without removing the NoSchedule taint from the masters.
```


I don't know whether all of this means the baremetal template needs to be updated to add --register-with-taints=node-role.kubernetes.io/master=:NoSchedule, as kubelet.yaml does in the base/vsphere/openstack templates, or whether the installer needs to do something else. Reassigning to @Stephen as he's more familiar with these templates and installer functionality and can reassign as appropriate within the baremetal team for investigation.
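
For context, a rough sketch of how that flag appears in the rendered kubelet unit on platforms that keep the taint (an illustrative fragment, not the verbatim template):

```
# Excerpt of a master kubelet.service unit; the baremetal variant dropped
# this flag, which is why baremetal masters register without the taint:
[Service]
ExecStart=/usr/bin/hyperkube kubelet \
    --register-with-taints=node-role.kubernetes.io/master=:NoSchedule \
    ...
```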

Comment 20 Kirsten Garrison 2020-06-09 18:44:03 UTC
@Stephen Can you PTAL

Comment 21 Steven Hardy 2020-06-10 10:38:12 UTC
Stephen is out for a few days, so I took a look. I think we should be relying on the installer to tell the scheduler to make the masters schedulable, and only in the case where no workers are defined:

https://github.com/openshift/installer/pull/2004

That relies on some MCO changes which landed in https://github.com/openshift/machine-config-operator/pull/937

However, I think we missed this PR, which removes the customized baremetal kubelet conf: https://github.com/openshift/machine-config-operator/pull/993

That got incorrectly closed without merging and was never revisited for review.

So I think we need to revive that PR; it should resolve this issue.
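
For reference, a sketch of the Scheduler manifest the installer would emit in the no-workers case (an assumption based on the PRs above, mirroring the mastersSchedulable: false manifests quoted earlier in this bug):

```
apiVersion: config.openshift.io/v1
kind: Scheduler
metadata:
  name: cluster
spec:
  mastersSchedulable: true
```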

Comment 22 Kirsten Garrison 2020-06-10 17:46:26 UTC
Thanks for picking this up @Steven !

Comment 23 Russell Bryant 2020-06-11 18:12:50 UTC
I've opened a new PR to address this: https://github.com/openshift/machine-config-operator/pull/1817

Comment 28 errata-xmlrpc 2020-10-27 15:58:27 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196