Bug 1535673
| Summary: | Need to mark masters schedulable | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Scott Dodson <sdodson> |
| Component: | Installer | Assignee: | Vadim Rutkovsky <vrutkovs> |
| Status: | CLOSED ERRATA | QA Contact: | Weihua Meng <wmeng> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.9.0 | CC: | aos-bugs, jiajliu, jokerman, mifiedle, mmccomas, spadgett, vrutkovs, wmeng, xxia |
| Target Milestone: | --- | | |
| Target Release: | 3.9.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Enhancement |
| Doc Text: | Feature: master nodes are now schedulable. Reason: web console pods are now restricted to run on masters only. Result: master nodes are no longer marked as non-schedulable. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-12-13 19:26:51 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Scott Dodson
2018-01-17 21:18:51 UTC
This needs to be done on upgrade from 3.7 to 3.8 as well as on clean 3.9 installs. This should be a blocker for 3.9. It's been requested that we transition to labeling nodes in a defined manner like this:

node-role.kubernetes.io/{master,node,infra}=true

A node could be all three if it were an all-in-one installation.

See https://trello.com/c/7m7A7Vpu/579-5-standardize-on-rolenodekubernetesio-masternodeinfratrue

For 3.9 let's just make sure that all masters are labeled:

node-role.kubernetes.io/master=true
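
For reference, applying and checking such a label by hand is straightforward; a minimal sketch, assuming a hypothetical node named master-1.example.com (openshift-ansible does the equivalent automatically during install/upgrade):

```
# apply the master role label by hand ("master-1.example.com" is a placeholder)
oc label node master-1.example.com node-role.kubernetes.io/master=true

# confirm it took effect; the node should now report the "master" role
oc get node master-1.example.com --show-labels
```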
PR to mark master nodes: https://github.com/openshift/openshift-ansible/pull/6849

Just labeling the master and setting a nodeSelector for the console namespace is not going to keep other pods off of the master. User pods can land on the labeled master if the nodeSelector from their project/deployment/pod/etc. spec is "" (or if the cluster defaultNodeSelector is "" or not specified). An OOTB configuration which allows pods on the master is probably not desirable.

(In reply to Mike Fiedler from comment #4)
> Just labeling the master and setting a nodeSelector for the console
> namespace is not going to keep other pods off of the master.

Agree; probably https://docs.openshift.com/container-platform/3.7/admin_guide/scheduling/scheduler.html#constraining-pod-placement-nodeselector would be helpful here. The other option is tainting the node - https://docs.openshift.com/container-platform/3.7/admin_guide/scheduling/taints_tolerations.html#admin-guide-taints - but it seems it could be overcome as well.

Created https://github.com/openshift/openshift-ansible/pull/6932 to taint masters (unless there are no dedicated nodes); a rough sketch of the commands involved appears below, after the verification comments.

*** Bug 1540038 has been marked as a duplicate of this bug. ***

Changed the scope of this bug to say this is only about making masters schedulable: https://github.com/openshift/openshift-ansible/pull/6949

Fix is available in openshift-ansible-3.9.0-0.36.0.git.0.da68f13.el7

Verified this bug with openshift-ansible-3.9.0-0.36.0.git.0.da68f13.el7.noarch, and PASS.

Now master nodes are scheduled.

# oc get nodes
NAME             STATUS    ROLES     AGE       VERSION
192.168.100.10   Ready     <none>    1d        v1.9.1+a0ce1bc657
192.168.100.15   Ready     master    1d        v1.9.1+a0ce1bc657
192.168.100.17   Ready     master    1d        v1.9.1+a0ce1bc657
192.168.100.6    Ready     master    1d        v1.9.1+a0ce1bc657
192.168.100.8    Ready     <none>    1d        v1.9.1+a0ce1bc657

About the "taint" change: it would introduce some other issues, which will be tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1539691

(In reply to Johnny Liu from comment #10)
> Verified this bug with
> openshift-ansible-3.9.0-0.36.0.git.0.da68f13.el7.noarch, and PASS.

Forgot to say: this is for a fresh install. Upgrading to OCP v3.9 already marks all master hosts as schedulable.
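
Picking up the taint and node-selector discussion above, the two approaches look roughly as follows; a minimal sketch with placeholder node and project names, not the exact change from PR 6932:

```
# option 1: taint the master so pods without a matching toleration stay off it
# ("master-1.example.com" is a placeholder node name)
oc adm taint nodes master-1.example.com node-role.kubernetes.io/master=true:NoSchedule

# the trailing "-" removes the taint again
oc adm taint nodes master-1.example.com node-role.kubernetes.io/master:NoSchedule-

# option 2: pin a project's pods to non-master nodes via the project node selector
# ("myproject" is a placeholder; the target nodes must carry the label)
oc annotate namespace myproject openshift.io/node-selector='node-role.kubernetes.io/node=true' --overwrite
```

As the thread notes, neither mechanism alone is airtight: a nodeSelector keeps a project's pods elsewhere, while a taint keeps pods off the master unless they tolerate it.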
Version: openshift-ansible-3.9.0-0.38.0.git.0.57e1184.el7.noarch

# oc version
oc v3.9.0-0.38.0
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://qe-jliu-rpm-master-etcd-1:8443
openshift v3.9.0-0.38.0
kubernetes v1.9.1+a0ce1bc657

# oc get node --show-labels
NAME                                 STATUS    ROLES     AGE       VERSION             LABELS
qe-jliu-rpm-master-etcd-1            Ready     <none>    4h        v1.9.1+a0ce1bc657   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-1,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a,kubernetes.io/hostname=qe-jliu-rpm-master-etcd-1,openshift-infra=apiserver,role=node
qe-jliu-rpm-node-registry-router-1   Ready     <none>    4h        v1.9.1+a0ce1bc657   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-1,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a,kubernetes.io/hostname=qe-jliu-rpm-node-registry-router-1,registry=enabled,role=node,router=enabled

After upgrade, the master was schedulable, but the master label was not added, so assigning the bug back.

Created https://github.com/openshift/openshift-ansible/pull/7020 to fix it.

Fix available in openshift-ansible-3.9.0-0.39.0.git.0.fea6997.el7

Fixed in openshift-ansible-3.9.0-0.41.0.git.0.8290c01.el7.noarch. After upgrade to OCP v3.9 the master is the same as on a fresh install -- schedulable and with the right label node-role.kubernetes.io/master=true.

# oc get nodes --show-labels
NAME             STATUS    ROLES     AGE       VERSION             LABELS
172.16.120.124   Ready     <none>    7h        v1.9.1+a0ce1bc657   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=3,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=regionOne,failure-domain.beta.kubernetes.io/zone=nova,kubernetes.io/hostname=172.16.120.124,registry=enabled,role=node,router=enabled
172.16.120.82    Ready     master    7h        v1.9.1+a0ce1bc657   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=3,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=regionOne,failure-domain.beta.kubernetes.io/zone=nova,kubernetes.io/hostname=172.16.120.82,node-role.kubernetes.io/master=true,role=node

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3748
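
As a closing aside, the role label verified above also doubles as a selector for a quick post-install or post-upgrade check; a minimal sketch:

```
# list only the nodes carrying the master role label; should match all masters
oc get nodes -l node-role.kubernetes.io/master=true
```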