Bug 1699460 - defaultNodeSelector does not work in crd Scheduler
Summary: defaultNodeSelector does not work in crd Scheduler
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.1.0
Assignee: ravig
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-04-12 19:04 UTC by Hongkai Liu
Modified: 2019-06-04 10:47 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-04 10:47:31 UTC
Target Upstream Version:


Attachments (Terms of Use)
api-server log (446.95 KB, text/plain)
2019-05-28 11:47 UTC, Sunil Choudhary


Links
Red Hat Product Errata RHBA-2019:0758 (last updated 2019-06-04 10:47:40 UTC)

Description Hongkai Liu 2019-04-12 19:04:02 UTC
Description of problem:
Follow-up to https://github.com/openshift/installer/issues/1020

Version-Release number of selected component (if applicable):
$ oc get clusterversions.config.openshift.io 
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-04-10-182914   True        False         124m    Cluster version is 4.0.0-0.nightly-2019-04-10-182914

How reproducible:


Steps to Reproduce:

$ oc get scheduler
NAME      AGE
cluster   13m
$ oc get scheduler cluster -o yaml
apiVersion: config.openshift.io/v1
kind: Scheduler
metadata:
  creationTimestamp: "2019-04-12T18:35:04Z"
  generation: 1
  name: cluster
  resourceVersion: "67588"
  selfLink: /apis/config.openshift.io/v1/schedulers/cluster
  uid: b069e611-5d51-11e9-9957-065019240360
spec:
  defaultNodeSelector: aaa=bbb

$ oc new-project aaa
$ oc create -f https://raw.githubusercontent.com/hongkailiu/svt-case-doc/master/files/pod_test.yaml
pod/web created
$ oc get pod -o wide
NAME   READY   STATUS    RESTARTS   AGE   IP            NODE                                         NOMINATED NODE   READINESS GATES
web    1/1     Running   0          9s    10.128.2.58   ip-10-0-142-150.us-east-2.compute.internal   <none>           <none>

$ oc get node --show-labels ip-10-0-142-150.us-east-2.compute.internal
NAME                                         STATUS   ROLES    AGE    VERSION             LABELS
ip-10-0-142-150.us-east-2.compute.internal   Ready    worker   130m   v1.12.4+509916ce1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m4.large,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/hostname=ip-10-0-142-150,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,node.openshift.io/os_version=4.1
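
For reference, pod_test.yaml is approximately the following, reconstructed from the `oc describe pod web` output later in this bug (the linked file is authoritative):

apiVersion: v1
kind: Pod
metadata:
  name: web
  labels:
    app: web
spec:
  containers:
  - name: test-go
    image: quay.io/hongkailiu/test-go:testctl-0.0.6-83ce61e2
    command: ["/testctl"]
    args: ["http", "start", "-v"]
    ports:
    - containerPort: 8080
    env:
    - name: GIN_MODE
      value: release

Note that the pod spec itself sets no nodeSelector; any aaa=bbb selector should be injected at admission time from the cluster-wide default.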


Actual results:
The pod is scheduled and Running on a node that does not carry the `aaa=bbb` label; the defaultNodeSelector from the Scheduler CR is not applied.

Expected results:
No node in the cluster carries the label `aaa=bbb`, so the test pod should be `Pending` instead of `Running`.
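
A quick way to confirm that no node carries the label (a plain label-selector query, not part of the original report):

$ oc get nodes -l aaa=bbb
No resources found.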


Additional info:

More questions:
1. Why does the `metadata.name` have to be `cluster`?
$ oc create -f abc.yaml 
The Scheduler "" is invalid: metadata.name: Invalid value: "my-scheduler": must be cluster

2. How can the Scheduler CR be deleted?
$ oc delete scheduler cluster
Error from server (Forbidden): schedulers.config.openshift.io "cluster" is forbidden: deleting required schedulers.config.openshift.io resource, named cluster, is not allowed

Comment 1 ravig 2019-04-15 14:42:22 UTC
https://github.com/openshift/cluster-kube-apiserver-operator/pull/394

> Error from server (Forbidden): schedulers.config.openshift.io "cluster" is forbidden: deleting required schedulers.config.openshift.io resource, named cluster, is not allowed

It's the validation code that prohibits deletion.

> why does the `metadata.name` have to be `cluster`?

Because we want the operator configuration to be a singleton, meaning we want only the scheduler CR named `cluster` and nothing else.
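
Since the `cluster` CR can be neither deleted nor created under another name, configuration changes go through the existing object; one possible way (a sketch, not from the original thread):

$ oc patch schedulers.config.openshift.io cluster --type=merge \
    -p '{"spec":{"defaultNodeSelector":"aaa=bbb"}}'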

Comment 2 Seth Jennings 2019-04-26 13:22:11 UTC
PRs adding the `release.openshift.io/create-only` annotation:
https://github.com/openshift/cluster-authentication-operator/pull/117
https://github.com/operator-framework/operator-marketplace/pull/173
https://github.com/openshift/cluster-kube-apiserver-operator/pull/439
https://github.com/openshift/cluster-image-registry-operator/pull/266
https://github.com/openshift/cluster-version-operator/pull/174
https://github.com/openshift/service-ca-operator/pull/50
https://github.com/operator-framework/operator-lifecycle-manager/pull/824
https://github.com/openshift/machine-config-operator/pull/667
https://github.com/openshift/cluster-network-operator/pull/157
https://github.com/openshift/cluster-ingress-operator/pull/217
https://github.com/openshift/cluster-openshift-controller-manager-operator/pull/93
https://github.com/openshift/console-operator/pull/214
https://github.com/openshift/cluster-dns-operator/pull/100
https://github.com/openshift/machine-api-operator/pull/300
https://github.com/openshift/cluster-kube-scheduler-operator/pull/115
https://github.com/openshift/cluster-samples-operator/pull/139
https://github.com/openshift/cluster-kube-controller-manager-operator/pull/237
https://github.com/openshift/cluster-openshift-apiserver-operator/pull/195
https://github.com/openshift/cloud-credential-operator/pull/61
https://github.com/openshift/cluster-etcd-operator/pull/10
https://github.com/openshift/cluster-storage-operator/pull/30
https://github.com/openshift/cluster-machine-approver/pull/20

The PR that enables defaultNodeSelector:
https://github.com/openshift/cluster-kube-apiserver-operator/pull/394

Comment 4 Simon 2019-04-30 15:14:41 UTC
Retest 4/30

$ oc get clusterversions.config.openshift.io 
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-04-28-064010   True        False         88m     Cluster version is 4.1.0-0.nightly-2019-04-28-064010


$ oc get schedulers.config.openshift.io cluster -o yaml
apiVersion: config.openshift.io/v1
kind: Scheduler
metadata:
  annotations:
    release.openshift.io/create-only: "true"
  creationTimestamp: "2019-04-30T13:09:32Z"
  generation: 2
  name: cluster
  resourceVersion: "39437"
  selfLink: /apis/config.openshift.io/v1/schedulers/cluster
  uid: 31c2c846-6b49-11e9-983d-023db2b3209c
spec:
  defaultNodeSelector: aaa=bbb

$ oc new-project aaa
$ oc create -f https://raw.githubusercontent.com/hongkailiu/svt-case-doc/master/files/pod_test.yaml
$ oc get pods
NAME   READY   STATUS    RESTARTS   AGE
web    1/1     Running   0          35m

$ oc get node $(oc get pods -o wide --no-headers | awk {'print $7'}) --show-labels
NAME                                         STATUS   ROLES    AGE    VERSION             LABELS
ip-10-0-173-225.us-east-2.compute.internal   Ready    worker   116m   v1.13.4+27a00af64   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m4.large,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2c,kubernetes.io/hostname=ip-10-0-173-225,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,node.openshift.io/os_version=4.1

No node with aaa=bbb label
Pod status: Running

Retest FAILED!

Comment 5 Sunil Choudhary 2019-04-30 15:18:48 UTC
Simon, I made the same mistake a few minutes ago :)

Edit the scheduler `cluster` and change "defaultNodeSelector: aaa=bbb" to "DefaultNodeSelector: aaa=bbb", i.e. with the D of default capitalized.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-04-28-064010   True        False         11h     Cluster version is 4.1.0-0.nightly-2019-04-28-064010


$ oc get scheduler cluster -o yaml
apiVersion: config.openshift.io/v1
kind: Scheduler
metadata:
  annotations:
    release.openshift.io/create-only: "true"
  creationTimestamp: "2019-04-30T03:49:30Z"
  generation: 3
  name: cluster
  resourceVersion: "223489"
  selfLink: /apis/config.openshift.io/v1/schedulers/cluster
  uid: f57d14ab-6afa-11e9-b805-0673e8707602
spec:
  DefaultNodeSelector: aaa=bbb


$ oc describe pod web
Name:               web
Namespace:          sunilc
Priority:           0
PriorityClassName:  <none>
Node:               <none>
Labels:             app=web
Annotations:        openshift.io/scc: anyuid
Status:             Pending
IP:                 
Containers:
  test-go:
    Image:      quay.io/hongkailiu/test-go:testctl-0.0.6-83ce61e2
    Port:       8080/TCP
    Host Port:  0/TCP
    Command:
      /testctl
    Args:
      http
      start
      -v
    Environment:
      GIN_MODE:  release
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-mw988 (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  default-token-mw988:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-mw988
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  aaa=bbb
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age               From               Message
  ----     ------            ----              ----               -------
  Warning  FailedScheduling  5s (x5 over 93s)  default-scheduler  0/5 nodes are available: 5 node(s) didn't match node selector.

Comment 6 Simon 2019-04-30 15:41:28 UTC
Retest 2

$ oc get clusterversions.config.openshift.io 
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-04-28-064010   True        False         130m    Cluster version is 4.1.0-0.nightly-2019-04-28-064010

Using the corrected key name (capital first letter):
DefaultNodeSelector: aaa=bbb

$ oc get pods -o wide
NAME   READY   STATUS    RESTARTS   AGE     IP       NODE     NOMINATED NODE   READINESS GATES
web    0/1     Pending   0          2m29s   <none>   <none>   <none>           <none>

$ oc describe pod web
Name:               web
Namespace:          aaa
Priority:           0
PriorityClassName:  <none>
Node:               <none>
Labels:             app=web
Annotations:        openshift.io/scc: anyuid
Status:             Pending
IP:                 
Containers:
  test-go:
    Image:      quay.io/hongkailiu/test-go:testctl-0.0.6-83ce61e2
    Port:       8080/TCP
    Host Port:  0/TCP
    Command:
      /testctl
    Args:
      http
      start
      -v
    Environment:
      GIN_MODE:  release
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-rwtgm (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  default-token-rwtgm:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-rwtgm
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  aaa=bbb
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  22s (x8 over 2m49s)  default-scheduler  0/6 nodes are available: 6 node(s) didn't match node selector.

Retest PASS!

Comment 7 Hongkai Liu 2019-04-30 15:43:44 UTC
This looks odd, then:
https://github.com/openshift/api/blob/master/config/v1/types_scheduling.go#L50

DefaultNodeSelector string `json:"defaultNodeSelector,omitempty"`

Perhaps I am looking at the wrong source file.
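
In Go, the `json:"defaultNodeSelector,omitempty"` tag, not the exported field name `DefaultNodeSelector`, determines the key used when the object is serialized to JSON/YAML, so the lowercase spelling in that source file is the correct one. Comments 12 and 13 below bear this out: the earlier retest only appeared to require the capitalized key because the kube-apiserver pods had not yet redeployed.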

Comment 9 ravig 2019-05-28 11:26:40 UTC
Shouldn't this be `defaultNodeSelector`? Can you provide me with access to the cluster? I specifically need the kube-apiserver logs.
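
One way to pull those logs, assuming the standard openshift-kube-apiserver namespace (pod names vary per node):

$ oc get pods -n openshift-kube-apiserver
$ oc logs -n openshift-kube-apiserver kube-apiserver-<node-name>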

Comment 11 Sunil Choudhary 2019-05-28 11:47:12 UTC
Created attachment 1574322 [details]
api-server log

Comment 12 Seth Jennings 2019-05-28 14:10:36 UTC
Sunil, be advised: after changing the defaultNodeSelector in the cluster Scheduler config resource, the kube-apiserver pods must redeploy, which can take several minutes. Until the kube-apiserver pods redeploy, the defaultNodeSelector will not take effect.
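
One way to watch the rollout, assuming the standard namespace and the kubeapiserver operator resource (a sketch, not from the original comment):

$ oc get pods -n openshift-kube-apiserver -w
$ oc get kubeapiserver cluster -o jsonpath='{.status.nodeStatuses[*].currentRevision}'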

Comment 13 Sunil Choudhary 2019-05-28 14:46:10 UTC
Thanks ravig & Seth for the clarification. After applying the change, I was watching the kube-scheduler pods for a restart, but as ravig advised, those are not the pods that redeploy.
After waiting several minutes, I did see all 3 kube-apiserver pods restart, after which newly created pods get the defaultNodeSelector applied.

Also, `DefaultNodeSelector`, which I assumed earlier to be correct, does not work; only `defaultNodeSelector` works.
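
For the record, the working configuration uses the field exactly as the API serializes it, matching the YAML shown earlier in this bug:

apiVersion: config.openshift.io/v1
kind: Scheduler
metadata:
  name: cluster
spec:
  defaultNodeSelector: aaa=bbb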

Comment 15 errata-xmlrpc 2019-06-04 10:47:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

