Bug 1535673
| Summary: | Need to mark masters schedulable | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Scott Dodson <sdodson> |
| Component: | Installer | Assignee: | Vadim Rutkovsky <vrutkovs> |
| Status: | CLOSED ERRATA | QA Contact: | Weihua Meng <wmeng> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.9.0 | CC: | aos-bugs, jiajliu, jokerman, mifiedle, mmccomas, spadgett, vrutkovs, wmeng, xxia |
| Target Milestone: | --- | | |
| Target Release: | 3.9.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Enhancement |
| Doc Text: | Feature: master nodes are now schedulable. Reason: web console pods are now restricted to run on masters only. Result: master nodes are no longer marked as non-schedulable. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-12-13 19:26:51 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Scott Dodson
2018-01-17 21:18:51 UTC
This needs to be done on upgrade from 3.7 to 3.8 as well as on clean 3.9 installs. This should be a blocker for 3.9. It's been requested that we transition to labeling nodes in a defined manner like this:

node-role.kubernetes.io/{master,node,infra}=true

A node could be all three if it were an all-in-one installation.

See https://trello.com/c/7m7A7Vpu/579-5-standardize-on-rolenodekubernetesio-masternodeinfratrue

For 3.9 let's just make sure that all masters are labeled:

node-role.kubernetes.io/master=true
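
For reference, applying and checking such a label by hand is straightforward; a minimal sketch, assuming a hypothetical node named master-1.example.com (openshift-ansible does the equivalent automatically during install/upgrade):

```
# apply the master role label by hand ("master-1.example.com" is a placeholder)
oc label node master-1.example.com node-role.kubernetes.io/master=true

# confirm it took effect; the node should now report the "master" role
oc get node master-1.example.com --show-labels
```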
PR to mark master nodes: https://github.com/openshift/openshift-ansible/pull/6849

Just labeling the master and setting a nodeSelector for the console namespace is not going to keep other pods off of the master. User pods can land on the labeled master if the nodeSelector from their project/deployment/pod/etc. spec is "" (or if the cluster defaultNodeSelector is "" or not specified). An OOTB configuration which allows pods on the master is probably not desirable.

(In reply to Mike Fiedler from comment #4)
> Just labeling the master and setting a nodeSelector for the console
> namespace is not going to keep other pods off of the master.

Agree; probably https://docs.openshift.com/container-platform/3.7/admin_guide/scheduling/scheduler.html#constraining-pod-placement-nodeselector would be helpful here. The other option is tainting the node - https://docs.openshift.com/container-platform/3.7/admin_guide/scheduling/taints_tolerations.html#admin-guide-taints - but it seems it could be overcome as well.

Created https://github.com/openshift/openshift-ansible/pull/6932 to taint masters (unless there are no dedicated nodes); a rough sketch of the commands involved appears below, after the verification comments.

*** Bug 1540038 has been marked as a duplicate of this bug. ***

Changed the scope of this bug to say this is only about making masters schedulable: https://github.com/openshift/openshift-ansible/pull/6949

Fix is available in openshift-ansible-3.9.0-0.36.0.git.0.da68f13.el7

Verified this bug with openshift-ansible-3.9.0-0.36.0.git.0.da68f13.el7.noarch, and PASS.

Now master nodes are scheduled.

# oc get nodes
NAME             STATUS    ROLES     AGE       VERSION
192.168.100.10   Ready     <none>    1d        v1.9.1+a0ce1bc657
192.168.100.15   Ready     master    1d        v1.9.1+a0ce1bc657
192.168.100.17   Ready     master    1d        v1.9.1+a0ce1bc657
192.168.100.6    Ready     master    1d        v1.9.1+a0ce1bc657
192.168.100.8    Ready     <none>    1d        v1.9.1+a0ce1bc657

About the "taint" change: it would introduce some other issues, which will be tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1539691

(In reply to Johnny Liu from comment #10)
> Verified this bug with
> openshift-ansible-3.9.0-0.36.0.git.0.da68f13.el7.noarch, and PASS.

Forgot to say: this is for a fresh install. Upgrading to OCP v3.9 already marks all master hosts as schedulable.
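
Picking up the taint and node-selector discussion above, the two approaches look roughly as follows; a minimal sketch with placeholder node and project names, not the exact change from PR 6932:

```
# option 1: taint the master so pods without a matching toleration stay off it
# ("master-1.example.com" is a placeholder node name)
oc adm taint nodes master-1.example.com node-role.kubernetes.io/master=true:NoSchedule

# the trailing "-" removes the taint again
oc adm taint nodes master-1.example.com node-role.kubernetes.io/master:NoSchedule-

# option 2: pin a project's pods to non-master nodes via the project node selector
# ("myproject" is a placeholder; the target nodes must carry the label)
oc annotate namespace myproject openshift.io/node-selector='node-role.kubernetes.io/node=true' --overwrite
```

As the thread notes, neither mechanism alone is airtight: a nodeSelector keeps a project's pods elsewhere, while a taint keeps pods off the master unless they tolerate it.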
Version: openshift-ansible-3.9.0-0.38.0.git.0.57e1184.el7.noarch

# oc version
oc v3.9.0-0.38.0
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://qe-jliu-rpm-master-etcd-1:8443
openshift v3.9.0-0.38.0
kubernetes v1.9.1+a0ce1bc657

# oc get node --show-labels
NAME                                 STATUS    ROLES     AGE       VERSION             LABELS
qe-jliu-rpm-master-etcd-1            Ready     <none>    4h        v1.9.1+a0ce1bc657   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-1,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a,kubernetes.io/hostname=qe-jliu-rpm-master-etcd-1,openshift-infra=apiserver,role=node
qe-jliu-rpm-node-registry-router-1   Ready     <none>    4h        v1.9.1+a0ce1bc657   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-1,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a,kubernetes.io/hostname=qe-jliu-rpm-node-registry-router-1,registry=enabled,role=node,router=enabled

After upgrade, the master was schedulable, but the master label was not added, so assigning the bug back.

Created https://github.com/openshift/openshift-ansible/pull/7020 to fix it.

Fix available in openshift-ansible-3.9.0-0.39.0.git.0.fea6997.el7

Fixed in openshift-ansible-3.9.0-0.41.0.git.0.8290c01.el7.noarch. After upgrade to OCP v3.9 the master is the same as on a fresh install -- schedulable and with the right label node-role.kubernetes.io/master=true.

# oc get nodes --show-labels
NAME             STATUS    ROLES     AGE       VERSION             LABELS
172.16.120.124   Ready     <none>    7h        v1.9.1+a0ce1bc657   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=3,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=regionOne,failure-domain.beta.kubernetes.io/zone=nova,kubernetes.io/hostname=172.16.120.124,registry=enabled,role=node,router=enabled
172.16.120.82    Ready     master    7h        v1.9.1+a0ce1bc657   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=3,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=regionOne,failure-domain.beta.kubernetes.io/zone=nova,kubernetes.io/hostname=172.16.120.82,node-role.kubernetes.io/master=true,role=node

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3748
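
As a closing aside, the role label verified above also doubles as a selector for a quick post-install or post-upgrade check; a minimal sketch:

```
# list only the nodes carrying the master role label; should match all masters
oc get nodes -l node-role.kubernetes.io/master=true
```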