Bug 1535673 - Need to mark masters schedulable
Summary: Need to mark masters schedulable
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 3.9.0
Assignee: Vadim Rutkovsky
QA Contact: Weihua Meng
URL:
Whiteboard:
Duplicates: 1540038
Depends On:
Blocks:
 
Reported: 2018-01-17 21:18 UTC by Scott Dodson
Modified: 2018-12-13 19:26 UTC
CC List: 9 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Feature: Master nodes are now schedulable.
Reason: Web console pods are now restricted to run on masters only.
Result: Master nodes are no longer marked as non-schedulable.
Clone Of:
Environment:
Last Closed: 2018-12-13 19:26:51 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1525342 0 unspecified CLOSED oc get node show "<none>" role 2021-02-22 00:41:40 UTC
Red Hat Product Errata RHBA-2018:3748 0 None None None 2018-12-13 19:26:58 UTC

Internal Links: 1525342

Description Scott Dodson 2018-01-17 21:18:51 UTC
In order to land the console pods on masters we intend to make the masters schedulable and add a taint to prevent normal pods from running on them.
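A minimal sketch of the intended approach (the node name and exact commands are illustrative, not necessarily the installer's steps): the master is made schedulable, then tainted so that ordinary pods avoid it.

# oc adm manage-node master-1.example.com --schedulable=true
# oc adm taint nodes master-1.example.com node-role.kubernetes.io/master=true:NoSchedule

Pods that must run on masters (such as the console pods) would then carry a matching toleration in their pod spec.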

Comment 1 Scott Dodson 2018-01-17 21:20:40 UTC
This needs to be done on upgrade from 3.7 to 3.8 as well as clean 3.9 installs. This should be a blocker for 3.9.

Comment 2 Scott Dodson 2018-01-24 13:26:07 UTC
It's been requested that we transition to labeling nodes in a defined manner like this

node-role.kubernetes.io/{master,node,infra}=true

A node could be all three if it were an all-in-one installation.

See https://trello.com/c/7m7A7Vpu/579-5-standardize-on-rolenodekubernetesio-masternodeinfratrue

For 3.9, let's just make sure that all masters are labeled

node-role.kubernetes.io/master=true
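Applied manually, the labeling above would look like this (the node name is a placeholder):

# oc label node master-1.example.com node-role.kubernetes.io/master=true

after which the node shows up with the master role in oc get nodes.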

Comment 3 Vadim Rutkovsky 2018-01-24 16:28:27 UTC
PR to mark master nodes: https://github.com/openshift/openshift-ansible/pull/6849

Comment 4 Mike Fiedler 2018-01-24 17:57:07 UTC
Just labeling the master and setting a nodeSelector for the console namespace is not going to keep other pods off of the master.

User pods can land on the labeled master if the nodeSelector from their project/deployment/pod/etc. spec is "" (or if the cluster defaultNodeSelector is "" or not specified).

An out-of-the-box configuration which allows pods on the master is probably not desirable.

Comment 5 Vadim Rutkovsky 2018-01-24 18:27:44 UTC
(In reply to Mike Fiedler from comment #4)
> Just labeling the master and setting a nodeSelector for the console
> namespace is not going to keep other pods off of the master.   

Agree, probably https://docs.openshift.com/container-platform/3.7/admin_guide/scheduling/scheduler.html#constraining-pod-placement-nodeselector would be helpful here.

The other option is tainting the node - https://docs.openshift.com/container-platform/3.7/admin_guide/scheduling/taints_tolerations.html#admin-guide-taints - but it seems it could be overcome as well
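For reference, the two options discussed above can be sketched as follows; the selector value and node name are illustrative. The cluster-wide default node selector lives in master-config.yaml:

  # in /etc/origin/master/master-config.yaml
  projectConfig:
    defaultNodeSelector: "node-role.kubernetes.io/node=true"

and a taint can be applied so that only pods with a matching toleration schedule onto the master:

# oc adm taint nodes master-1.example.com node-role.kubernetes.io/master=true:NoSchedule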

Comment 6 Vadim Rutkovsky 2018-01-30 10:34:00 UTC
Created https://github.com/openshift/openshift-ansible/pull/6932 to taint masters (unless there are no dedicated nodes)

Comment 7 Scott Dodson 2018-01-30 14:27:45 UTC
*** Bug 1540038 has been marked as a duplicate of this bug. ***

Comment 8 Scott Dodson 2018-02-01 15:13:48 UTC
Changed scope of this bug to say this is only about making masters schedulable

https://github.com/openshift/openshift-ansible/pull/6949

Comment 9 Vadim Rutkovsky 2018-02-02 17:51:52 UTC
Fix is available in openshift-ansible-3.9.0-0.36.0.git.0.da68f13.el7

Comment 10 Johnny Liu 2018-02-05 09:26:17 UTC
Verified this bug with openshift-ansible-3.9.0-0.36.0.git.0.da68f13.el7.noarch, and PASS.

Now master nodes are schedulable.

# oc get nodes
NAME             STATUS    ROLES     AGE       VERSION
192.168.100.10   Ready     <none>    1d        v1.9.1+a0ce1bc657
192.168.100.15   Ready     master    1d        v1.9.1+a0ce1bc657
192.168.100.17   Ready     master    1d        v1.9.1+a0ce1bc657
192.168.100.6    Ready     master    1d        v1.9.1+a0ce1bc657
192.168.100.8    Ready     <none>    1d        v1.9.1+a0ce1bc657


The "taint" change would introduce some other issues; those will be tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1539691

Comment 11 Johnny Liu 2018-02-05 09:30:56 UTC
(In reply to Johnny Liu from comment #10)
> Verified this bug with openshift-ansible-3.9.0-0.36.0.git.0.da68f13.el7.noarch, and PASS.

Forgot to mention: this was for a fresh install.

Comment 12 Weihua Meng 2018-02-05 09:35:10 UTC
Upgrading to OCP v3.9 already marks all master hosts as schedulable.

Comment 13 liujia 2018-02-06 06:37:32 UTC
Version:
openshift-ansible-3.9.0-0.38.0.git.0.57e1184.el7.noarch

# oc version
oc v3.9.0-0.38.0
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://qe-jliu-rpm-master-etcd-1:8443
openshift v3.9.0-0.38.0
kubernetes v1.9.1+a0ce1bc657


# oc get node --show-labels
NAME                                 STATUS    ROLES     AGE       VERSION             LABELS
qe-jliu-rpm-master-etcd-1            Ready     <none>    4h        v1.9.1+a0ce1bc657   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-1,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a,kubernetes.io/hostname=qe-jliu-rpm-master-etcd-1,openshift-infra=apiserver,role=node
qe-jliu-rpm-node-registry-router-1   Ready     <none>    4h        v1.9.1+a0ce1bc657   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-1,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a,kubernetes.io/hostname=qe-jliu-rpm-node-registry-router-1,registry=enabled,role=node,router=enabled


After upgrade, the master was schedulable, but the master label was not added, so I am assigning the bug back.

Comment 14 Vadim Rutkovsky 2018-02-06 13:14:26 UTC
Created https://github.com/openshift/openshift-ansible/pull/7020 to fix it

Comment 15 Vadim Rutkovsky 2018-02-07 08:25:14 UTC
Fix available in openshift-ansible-3.9.0-0.39.0.git.0.fea6997.el7

Comment 16 Weihua Meng 2018-02-08 14:20:46 UTC
Fixed.
openshift-ansible-3.9.0-0.41.0.git.0.8290c01.el7.noarch

After upgrading to OCP v3.9, the master is the same as in a fresh install: schedulable and with the correct label node-role.kubernetes.io/master=true.

# oc get nodes --show-labels
NAME             STATUS    ROLES     AGE       VERSION             LABELS
172.16.120.124   Ready     <none>    7h        v1.9.1+a0ce1bc657   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=3,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=regionOne,failure-domain.beta.kubernetes.io/zone=nova,kubernetes.io/hostname=172.16.120.124,registry=enabled,role=node,router=enabled
172.16.120.82    Ready     master    7h        v1.9.1+a0ce1bc657   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=3,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=regionOne,failure-domain.beta.kubernetes.io/zone=nova,kubernetes.io/hostname=172.16.120.82,node-role.kubernetes.io/master=true,role=node

Comment 19 errata-xmlrpc 2018-12-13 19:26:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3748

