This bug was initially created as a copy of Bug #1788112

I am copying this bug because:

This bug was initially created as a copy of Bug #1782151

I am copying this bug because:

Backport to 4.3

Description of problem:

The Insights Operator pod cannot be scheduled on clusters that have a default node selector set in the cluster scheduler.

I note that other system namespaces are annotated to ignore global default scheduler settings:

metadata:
  annotations:
    openshift.io/node-selector: ""

However, openshift-insights does not have this annotation. So if the cluster scheduler sets a default selector of node-role.kubernetes.io/worker, it is combined with the node selector in the ReplicaSet's spec template. Since the latter specifies node-role.kubernetes.io/master, the pod is created with conflicting selectors and cannot be scheduled.

Version-Release number of selected component (if applicable):

Tested on OCP 4.2.4, 4.2.8

How reproducible:

100%

Steps to Reproduce:
1. oc patch scheduler/cluster --type='json' -p='[{"op":"replace","path":"/spec/defaultNodeSelector","value":"node-role.kubernetes.io/worker="}]'
2. Kill insights operator pod if already running

Actual results:

$ oc get rs -o json -n openshift-insights | jq '.items[].spec.template.spec.nodeSelector'
{
  "beta.kubernetes.io/os": "linux",
  "node-role.kubernetes.io/master": ""
}

$ oc get pods -o json -n openshift-insights | jq '.items[].spec.nodeSelector'
{
  "beta.kubernetes.io/os": "linux",
  "node-role.kubernetes.io/master": "",
  "node-role.kubernetes.io/worker": ""
}

$ oc describe pod
Name:               insights-operator-5db58db885-5rxv2
Namespace:          openshift-insights
Priority:           2000000000
PriorityClassName:  system-cluster-critical
Node:               <none>
Labels:             app=insights-operator
                    pod-template-hash=5db58db885
Annotations:        <none>
Status:             Pending
IP:
Controlled By:      ReplicaSet/insights-operator-5db58db885
Containers:
  operator:
    Image:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b4fb29b2018b1c56e67e73a8d91dd265eb82c6670e6bad3ed8cad3aabaac1824
    Port:       8443/TCP
    Host Port:  0/TCP
    Args:
      start
      -v=4
      --config=/etc/insights-operator/server.yaml
    Requests:
      cpu:     10m
      memory:  30Mi
    Environment:
      POD_NAME:         insights-operator-5db58db885-5rxv2 (v1:metadata.name)
      POD_NAMESPACE:    openshift-insights (v1:metadata.namespace)
      RELEASE_VERSION:  4.2.8
    Mounts:
      /var/lib/insights-operator from snapshots (rw)
      /var/run/configmaps/trusted-ca-bundle from trusted-ca-bundle (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from operator-token-vq4kd (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  snapshots:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  trusted-ca-bundle:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      trusted-ca-bundle
    Optional:  true
  operator-token-vq4kd:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  operator-token-vq4kd
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/os=linux
                 node-role.kubernetes.io/master=
                 node-role.kubernetes.io/worker=
Tolerations:     node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 900s
                 node.kubernetes.io/unreachable:NoExecute for 900s
Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  118m (x25 over 118m)   default-scheduler  0/10 nodes are available: 10 node(s) didn't match node selector.
  Warning  FailedScheduling  118s (x121 over 117m)  default-scheduler  0/10 nodes are available: 10 node(s) didn't match node selector.

Expected results:

The namespace should set openshift.io/node-selector to an empty string to avoid merging the cluster-wide default when creating the operator pod.

Additional info:

Example found in Red Hat case 02538590, in which a 4.1.24 -> 4.2.9 upgrade hung because insights-operator could not be scheduled. After updating the project to include the empty node selector and deleting the original pod, the replacement pod was able to be scheduled.

$ oc get pods
NAME                                 READY   STATUS    RESTARTS   AGE
insights-operator-5db58db885-5rxv2   0/1     Pending   0          139m

$ oc patch project openshift-insights --type=json -p='[{"op":"replace","path":"/metadata/annotations/openshift.io~1node-selector","value":""}]'
project.project.openshift.io/openshift-insights patched

$ oc get project openshift-insights -o yaml
apiVersion: project.openshift.io/v1
kind: Project
metadata:
  annotations:
    openshift.io/display-name: ""
    openshift.io/node-selector: ""
    openshift.io/sa.scc.mcs: s0:c23,c22
    openshift.io/sa.scc.supplemental-groups: 1000550000/10000
    openshift.io/sa.scc.uid-range: 1000550000/10000
  creationTimestamp: "2019-12-11T07:17:38Z"
  labels:
    name: openshift-insights
    openshift.io/run-level: "1"
  name: openshift-insights
  resourceVersion: "17330464"
  selfLink: /apis/project.openshift.io/v1/projects/openshift-insights
  uid: 5012afaf-1be6-11ea-8777-005056a50823
spec:
  finalizers:
  - kubernetes
status:
  phase: Active

$ oc delete pod insights-operator-5db58db885-5rxv2
pod "insights-operator-5db58db885-5rxv2" deleted

$ oc get pods
NAME                                 READY   STATUS              RESTARTS   AGE
insights-operator-5db58db885-j6gxz   0/1     ContainerCreating   0          2s

$ oc describe pod insights-operator-5db58db885-j6gxz
Name:               insights-operator-5db58db885-j6gxz
Namespace:          openshift-insights
Priority:           2000000000
PriorityClassName:  system-cluster-critical
Node:               etcd-1.prod.openshift.tcc.etn.com/172.20.72.13
Start Time:         Wed, 11 Dec 2019 04:37:51 -0500
Labels:             app=insights-operator
                    pod-template-hash=5db58db885
Annotations:        k8s.v1.cni.cncf.io/networks-status:
                      [{
                          "name": "openshift-sdn",
                          "interface": "eth0",
                          "ips": [
                              "10.129.1.43"
                          ],
                          "default": true,
                          "dns": {}
                      }]
Status:             Pending
IP:
Controlled By:      ReplicaSet/insights-operator-5db58db885
Containers:
  operator:
    Container ID:
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b4fb29b2018b1c56e67e73a8d91dd265eb82c6670e6bad3ed8cad3aabaac1824
    Image ID:
    Port:          8443/TCP
    Host Port:     0/TCP
    Args:
      start
      -v=4
      --config=/etc/insights-operator/server.yaml
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:     10m
      memory:  30Mi
    Environment:
      POD_NAME:         insights-operator-5db58db885-j6gxz (v1:metadata.name)
      POD_NAMESPACE:    openshift-insights (v1:metadata.namespace)
      RELEASE_VERSION:  4.2.8
    Mounts:
      /var/lib/insights-operator from snapshots (rw)
      /var/run/configmaps/trusted-ca-bundle from trusted-ca-bundle (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from operator-token-vq4kd (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  snapshots:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  trusted-ca-bundle:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      trusted-ca-bundle
    Optional:  true
  operator-token-vq4kd:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  operator-token-vq4kd
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/os=linux
                 node-role.kubernetes.io/master=
Tolerations:     node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 900s
                 node.kubernetes.io/unreachable:NoExecute for 900s
Events:
  Type    Reason     Age   From                                         Message
  ----    ------     ----  ----                                         -------
  Normal  Scheduled  11s   default-scheduler                            Successfully assigned openshift-insights/insights-operator-5db58db885-j6gxz to etcd-1.prod.openshift.tcc.etn.com
  Normal  Pulling    3s    kubelet, etcd-1.prod.openshift.tcc.etn.com   Pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b4fb29b2018b1c56e67e73a8d91dd265eb82c6670e6bad3ed8cad3aabaac1824"
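
For readers who want to reason about the selector merge described in this report without a cluster, here is a minimal Python sketch. It is an illustrative model only, not the OpenShift admission or scheduler code: the function names, the node names, and the assumption that each node carries exactly one role label are all hypothetical. It assumes, as the report describes, that a namespace annotation (even an empty one) takes the place of the cluster-wide default.

```python
# Illustrative model only; names and node data are hypothetical and this is
# not the actual OpenShift admission or scheduler implementation.

NODE_SELECTOR_ANNOTATION = "openshift.io/node-selector"


def parse_selector(text):
    """Turn 'key=value,key2=' into a dict; an empty string gives {}."""
    selector = {}
    for term in filter(None, (t.strip() for t in text.split(","))):
        key, _, value = term.partition("=")
        selector[key] = value
    return selector


def effective_selector(pod_selector, namespace_annotations, cluster_default):
    """Merge the namespace-level or cluster-wide selector into the pod's own.

    Assumption: if the namespace annotation is present, even as "", it is
    used instead of the cluster-wide default.
    """
    annotation = namespace_annotations.get(NODE_SELECTOR_ANNOTATION)
    source = cluster_default if annotation is None else annotation
    merged = dict(pod_selector)
    merged.update(parse_selector(source))
    return merged


def schedulable_nodes(nodes, selector):
    """Return names of nodes whose labels satisfy every selector term."""
    return [
        name
        for name, labels in nodes.items()
        if all(labels.get(key) == value for key, value in selector.items())
    ]


# Hypothetical two-node cluster: one master, one worker.
NODES = {
    "master-0": {"beta.kubernetes.io/os": "linux", "node-role.kubernetes.io/master": ""},
    "worker-0": {"beta.kubernetes.io/os": "linux", "node-role.kubernetes.io/worker": ""},
}

# Selector taken from the ReplicaSet template shown in the report.
POD_SELECTOR = {"beta.kubernetes.io/os": "linux", "node-role.kubernetes.io/master": ""}
CLUSTER_DEFAULT = "node-role.kubernetes.io/worker="

without_annotation = effective_selector(POD_SELECTOR, {}, CLUSTER_DEFAULT)
with_annotation = effective_selector(
    POD_SELECTOR, {NODE_SELECTOR_ANNOTATION: ""}, CLUSTER_DEFAULT
)

print(schedulable_nodes(NODES, without_annotation))
print(schedulable_nodes(NODES, with_annotation))
```

Under these assumptions, the merged selector without the annotation requires both the master and the worker label, so no node qualifies; with the empty annotation only the pod's own selector applies and the master node qualifies.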
Fixed and verified in 4.2.0-0.nightly-2020-02-10-153446. The insights-operator pod is successfully scheduled.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0460