Description of problem:
NPD pods are not deployed on master nodes; only the non-master nodes are selected for deployment.

Version-Release number of selected component (if applicable):
[root@ip-172-18-13-30 ~]# oc version
oc v3.10.0-0.47.0
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-13-30.ec2.internal:8443
openshift v3.10.0-0.47.0
kubernetes v1.10.0+b81c8f8

[root@ip-172-18-13-30 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.5 (Maipo)

How reproducible:
always

Steps to Reproduce:
1. Enable NPD by ansible playbook or during cluster deployment
2. # oc get ds -n openshift-node-problem-detector
3. # oc get project openshift-node-problem-detector -o yaml

Actual results:
[root@ip-172-18-13-30 ~]# oc get ds -n openshift-node-problem-detector
NAME                    DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
node-problem-detector   3         2         2       2            2           <none>          1m

(openshift.io/node-selector: "" is missing in the annotations below)
[root@ip-172-18-13-30 ~]# oc get project openshift-node-problem-detector -o yaml
apiVersion: project.openshift.io/v1
kind: Project
metadata:
  annotations:
    openshift.io/description: ""
    openshift.io/display-name: ""
    openshift.io/sa.scc.mcs: s0:c12,c4
    openshift.io/sa.scc.supplemental-groups: 1000140000/10000
    openshift.io/sa.scc.uid-range: 1000140000/10000
  creationTimestamp: 2018-05-21T05:57:14Z
  name: openshift-node-problem-detector
  resourceVersion: "24293"
  selfLink: /apis/project.openshift.io/v1/projects/openshift-node-problem-detector
  uid: cfbd3686-5cbb-11e8-8b8a-0ed9d4bed224
spec:
  finalizers:
  - openshift.io/origin
  - kubernetes
status:
  phase: Active

Expected results:
# oc get ds -n openshift-node-problem-detector
NAME                    DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
node-problem-detector   3         3         3       3            3           <none>          1m

openshift.io/node-selector: "" is added to the annotations of the project openshift-node-problem-detector, so that it overrides the defaultNodeSelector shown below.

# cat /etc/origin/master/master-config.yaml|grep defaultNodeSelector:
  defaultNodeSelector: node-role.kubernetes.io/compute=true

Additional info:
N/A
Thanks for reporting. I have your suggested fix posted in this PR: https://github.com/openshift/openshift-ansible/pull/8459
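For readers following along, here is a minimal sketch of the annotation the report asks for, shown as it would appear in the project metadata. It illustrates the requested end state only; it is not copied from the PR, and how the playbook actually sets it is not shown in this thread.

```yaml
# Sketch only: project metadata with the annotation requested in the report.
# An empty node selector on the project overrides the cluster-wide
# defaultNodeSelector from master-config.yaml for this project.
metadata:
  name: openshift-node-problem-detector
  annotations:
    openshift.io/node-selector: ""
```

On an already deployed cluster, the same annotation could presumably be applied by hand with `oc annotate namespace openshift-node-problem-detector openshift.io/node-selector="" --overwrite`; that is a workaround suggested here as an assumption, not something tested in this report.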
https://trello.com/c/OA9z4cMU/701-8-node-deliver-node-problem-detector-for-tech-preview says NPD is tech preview, so this is not a 3.10 blocker.
Still not fixed on 3.10.0-0.58.0

Using project "default" on server "https://qe-weinliu-r5-nonpdmaster-etcd-1:8443".
[root@qe-weinliu-r5-nonpdmaster-etcd-1 ~]# oc project openshift-node-problem-detector
Now using project "openshift-node-problem-detector" on server "https://qe-weinliu-r5-nonpdmaster-etcd-1:8443".
[root@qe-weinliu-r5-nonpdmaster-etcd-1 ~]# oc get pod
NAME                          READY   STATUS    RESTARTS   AGE
node-problem-detector-js9kf   1/1     Running   0          2m
[root@qe-weinliu-r5-nonpdmaster-etcd-1 ~]# oc version
oc v3.10.0-0.58.0
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://qe-weinliu-r5-nonpdmaster-etcd-1:8443
openshift v3.10.0-0.58.0
kubernetes v1.10.0+b81c8f8
Verified to be fixed on openshift-ansible-3.10.0-0.60.0

[root@qe-weinliu-r5-nonpd-master-etcd-1 ~]# oc project openshift-node-problem-detector
Now using project "openshift-node-problem-detector" on server "https://qe-weinliu-r5-nonpd-master-etcd-1:8443".
[root@qe-weinliu-r5-nonpd-master-etcd-1 ~]# oc get po
NAME                          READY   STATUS    RESTARTS   AGE
node-problem-detector-pwszg   1/1     Running   0          16s
node-problem-detector-sqbml   1/1     Running   0          16s
[root@qe-weinliu-r5-nonpd-master-etcd-1 ~]# oc version
oc v3.10.0-0.60.0
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://qe-weinliu-r5-nonpd-master-etcd-1:8443
openshift v3.10.0-0.60.0
kubernetes v1.10.0+b81c8f8
[root@qe-weinliu-r5-nonpd-master-etcd-1 ~]#

commit bf95bf8acb5e52282eae6b7d5bb0c30a3f18d615
Author: Justin Pierce <jupierce>
Date:   Tue Jun 5 10:22:19 2018 -0400

    Automatic commit of package [openshift-ansible] release [3.10.0-0.60.0].

    Created by command: :
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:1816