Bug 1554828
| Summary: | During upgrade some masters are labeled node-role.kubernetes.io/compute=true | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Scott Dodson <sdodson> |
| Component: | Cluster Version Operator | Assignee: | Fabian von Feilitzsch <fabian> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | liujia <jiajliu> |
| Severity: | urgent | Docs Contact: | |
| Priority: | high | ||
| Version: | 3.9.0 | CC: | aos-bugs, dma, fabian, jokerman, mgugino, mmccomas, sdodson, wmeng, xtian |
| Target Milestone: | --- | ||
| Target Release: | 3.9.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | No Doc Update | |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1548641 | Environment: | |
| Last Closed: | 2018-03-16 16:55:26 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Scott Dodson
2018-03-13 12:48:19 UTC
This looks like it would be the case if the masters are not labeled with node-role.kubernetes.io/master: true. We added this label in 3.9, so we'll need to ensure it takes place against all masters.

https://github.com/openshift/openshift-ansible/pull/7512 is the proposed fix for this.

Commits pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/72cc39c57c56d707d3d0edee582e86bfd85aeed6
Bug 1554828- Nodes are now labeled compute after other labels have been applied

https://github.com/openshift/openshift-ansible/commit/f67de0196b7f1b5798cc651f3825f360b895623b
Merge pull request #7528 from openshift-cherrypick-robot/cherry-pick-7512-to-master

[master] Bug 1554828- Nodes are now labeled compute after all other labels have been applied

Verify:
ansible-2.4.3.0-1.el7ae.noarch
openshift-ansible-3.9.9-1.git.0.1a1f7d8.el7.noarch

Steps:
1. HA install ocp v3.7 with osm_default_node_selector: 'region=primary' setting in inventory file.
# oc get node
NAME                         STATUS                     AGE   VERSION
qe-jliu-ha1-master-etcd-1    Ready,SchedulingDisabled   19m   v1.7.6+a08f5eeb62
qe-jliu-ha1-master-etcd-2    Ready,SchedulingDisabled   19m   v1.7.6+a08f5eeb62
qe-jliu-ha1-master-etcd-3    Ready,SchedulingDisabled   19m   v1.7.6+a08f5eeb62
qe-jliu-ha1-node-primary-1   Ready                      19m   v1.7.6+a08f5eeb62
qe-jliu-ha1-node-primary-2   Ready                      19m   v1.7.6+a08f5eeb62
qe-jliu-ha1-nrri-1           Ready                      19m   v1.7.6+a08f5eeb62
qe-jliu-ha1-nrri-2           Ready                      19m   v1.7.6+a08f5eeb62

# oc get pod -o wide --all-namespaces
NAMESPACE                           NAME                             READY   STATUS      RESTARTS   AGE   IP             NODE
default                             docker-registry-1-7vd6p          1/1     Running     0          13m   10.2.6.3       qe-jliu-ha1-nrri-2
default                             docker-registry-1-zvr6l          1/1     Running     0          13m   10.2.8.3       qe-jliu-ha1-nrri-1
default                             registry-console-1-b97qw         1/1     Running     0          12m   10.2.12.2      qe-jliu-ha1-node-primary-1
default                             router-1-kg964                   1/1     Running     0          15m   10.240.0.134   qe-jliu-ha1-nrri-1
default                             router-1-zswpq                   1/1     Running     0          15m   10.240.0.135   qe-jliu-ha1-nrri-2
install-test                        mongodb-1-jswhh                  1/1     Running     0          10m   10.2.12.5      qe-jliu-ha1-node-primary-1
install-test                        nodejs-mongodb-example-1-build   0/1     Completed   0          10m   10.2.12.4      qe-jliu-ha1-node-primary-1
install-test                        nodejs-mongodb-example-1-z8jmr   1/1     Running     0          9m    10.2.12.6      qe-jliu-ha1-node-primary-1
kube-service-catalog                apiserver-92tm4                  1/1     Running     0          12m   10.2.2.2       qe-jliu-ha1-master-etcd-1
kube-service-catalog                controller-manager-n5mmn         1/1     Running     0          12m   10.2.2.3       qe-jliu-ha1-master-etcd-1
openshift-ansible-service-broker    asb-1-deploy                     0/1     Error       0          11m   10.2.10.3      qe-jliu-ha1-node-primary-2
openshift-ansible-service-broker    asb-etcd-1-deploy                0/1     Error       0          11m   10.2.12.3      qe-jliu-ha1-node-primary-1
openshift-template-service-broker   apiserver-85dz2                  1/1     Running     0          11m   10.2.6.4       qe-jliu-ha1-nrri-2
openshift-template-service-broker   apiserver-thgb2                  1/1     Running     0          11m   10.2.8.4       qe-jliu-ha1-nrri-1

# cat /etc/origin/master/master-config.yaml|grep "defaultNodeSelector"
defaultNodeSelector: region=primary

2. Trigger the upgrade with the inventory file from step 1.

Result:
Masters were schedulable and the web console was running on a master node.
The default node selector was the same as it was before the upgrade.
Old and new apps were scheduled according to the default node selector config.
Masters and infra nodes were not given the compute label.

# oc get node
NAME                         STATUS    ROLES     AGE   VERSION
qe-jliu-ha1-master-etcd-1    Ready     master    1h    v1.9.1+a0ce1bc657
qe-jliu-ha1-master-etcd-2    Ready     master    1h    v1.9.1+a0ce1bc657
qe-jliu-ha1-master-etcd-3    Ready     master    1h    v1.9.1+a0ce1bc657
qe-jliu-ha1-node-primary-1   Ready     compute   1h    v1.9.1+a0ce1bc657
qe-jliu-ha1-node-primary-2   Ready     compute   1h    v1.9.1+a0ce1bc657
qe-jliu-ha1-nrri-1           Ready     <none>    1h    v1.9.1+a0ce1bc657
qe-jliu-ha1-nrri-2           Ready     <none>    1h    v1.9.1+a0ce1bc657

# cat /etc/origin/master/master-config.yaml|grep "defaultNodeSelector"
defaultNodeSelector: region=primary

@Fabian von Feilitzsch @Scott
One result still needs to be confirmed: the non-infra nodes were given the compute label. Although adding the compute label to non-infra nodes did not cause any issue in my verification, according to comment 2 and comment 8 no compute labels should be added to any node when a default node selector is defined. Please confirm this point.

I believe we discussed that all non-master, non-infra nodes should have the compute label applied in addition to whatever the user specifies. The reason: we don't want to set a default selector on a cluster that didn't previously have one and have the new selector match a label from an existing node. We can't know which of the labels they were previously using should become the new default label. We needed a way to keep things consistent in the future. Labeling all non-master, non-infra nodes as compute and setting the default selector to compute if none is provided lets us keep the users' scheduling unmodified.

^ This was also my impression, though it was an evolution of the original discussion.
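
For readers following the thread, the rule described above can be sketched in a few lines of Python. This is only an illustration of the ordering and the fallback, not the openshift-ansible implementation: the function names are hypothetical, and recognising infra nodes by a `region=infra` label is an assumption that does not come from this bug.

```python
MASTER_LABEL = "node-role.kubernetes.io/master"
COMPUTE_LABEL = "node-role.kubernetes.io/compute"

# Assumption for illustration only: infra nodes carry this label.
INFRA_LABEL = ("region", "infra")


def apply_role_labels(labels, is_master):
    """Return a copy of `labels` with role labels added (hypothetical helper)."""
    result = dict(labels)

    # Step 1: apply the master label first, so the compute decision can see it.
    if is_master:
        result[MASTER_LABEL] = "true"

    # Step 2: only after all other labels are in place, decide about compute.
    # Every non-master, non-infra node gets it, on top of the user's labels.
    is_infra = result.get(INFRA_LABEL[0]) == INFRA_LABEL[1]
    if MASTER_LABEL not in result and not is_infra:
        result[COMPUTE_LABEL] = "true"

    return result


def effective_default_selector(configured):
    """Keep the user's selector; fall back to compute when none is set."""
    return configured if configured else f"{COMPUTE_LABEL}=true"
```

If step 2 ran before step 1, a master without the master label would look like an ordinary node and pick up the compute label, which matches the symptom in the summary. With the selector from this verification, `effective_default_selector("region=primary")` leaves the configured value untouched.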
@Michael Gugino @Fabian von Feilitzsch
Got it now, thanks for confirming. Changing the bug to verified. |
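
The post-upgrade check in the verification steps could also be scripted. The sketch below is a rough example rather than part of the QA procedure: it assumes the whitespace-separated `oc get node` layout with a ROLES column shown in this bug, and the helper names are hypothetical.

```python
# A few rows copied from the post-upgrade output in this bug.
SAMPLE = """\
NAME                         STATUS    ROLES     AGE   VERSION
qe-jliu-ha1-master-etcd-1    Ready     master    1h    v1.9.1+a0ce1bc657
qe-jliu-ha1-node-primary-1   Ready     compute   1h    v1.9.1+a0ce1bc657
qe-jliu-ha1-nrri-1           Ready     <none>    1h    v1.9.1+a0ce1bc657
"""


def parse_roles(oc_output):
    """Map node name -> set of roles from `oc get node` text output."""
    lines = [ln for ln in oc_output.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    name_idx = header.index("NAME")
    roles_idx = header.index("ROLES")

    roles = {}
    for line in lines[1:]:
        fields = line.split()
        value = fields[roles_idx]
        # "<none>" means no node-role label; several roles are comma-separated.
        roles[fields[name_idx]] = set() if value == "<none>" else set(value.split(","))
    return roles


def masters_labeled_compute(roles):
    """Return the nodes that carry both the master and the compute role."""
    return sorted(name for name, r in roles.items() if {"master", "compute"} <= r)


if __name__ == "__main__":
    offenders = masters_labeled_compute(parse_roles(SAMPLE))
    print("masters labeled compute:", offenders)
```

An empty list corresponds to the verified state reported here; any node name in the list would indicate the behaviour from the bug summary.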