Description of problem:
oadm diagnostics NetworkCheck cannot deploy diagnostic pods on some nodes.

Version-Release number of selected component (if applicable):
oc v3.5.5.15
kubernetes v1.5.2+43a9be4
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://ocp-l01.ocp.trkc.tgc:443
openshift v3.5.5.15
kubernetes v1.5.2+43a9be4

How reproducible:

Steps to Reproduce:
1. Create a cluster in which some nodes do not carry the default node selector labels.
2. Execute oadm diagnostics NetworkCheck.

Actual results:
------------------
ERROR: [DNet2008 from diagnostic NetworkCheck@openshift/origin/pkg/diagnostics/network/run_pod.go:147]
       [Logs for network diagnostic pod on node "ocp-i03.ocp.trkc.tgc" failed: container "network-diag-pod-gsxm7" in pod "network-diag-pod-gsxm7" is not available,
        Logs for network diagnostic pod on node "ocp-i02.ocp.trkc.tgc" failed: container "network-diag-pod-gx2x0" in pod "network-diag-pod-gx2x0" is not available,
        Logs for network diagnostic pod on node "ocp-i01.ocp.trkc.tgc" failed: container "network-diag-pod-w4tt4" in pod "network-diag-pod-w4tt4" is not available]

Node's syslog:
Jun 4 11:27:48 ocp-i01 atomic-openshift-node: I0604 11:27:48.813533 13536 kubelet.go:1782] SyncLoop (ADD, "api"): "network-diag-pod-j06c0_network-diag-ns-6k9vs(b163c02d-48ff-11e7-b9d6-00505697fb55)"
Jun 4 11:27:48 ocp-i01 atomic-openshift-node: I0604 11:27:48.813695 13536 predicate.go:84] Predicate failed on Pod: network-diag-pod-j06c0_network-diag-ns-6k9vs(b163c02d-48ff-11e7-b9d6-00505697fb55), for reason: Predicate MatchNodeSelector failed

Expected results:
Successful execution.

Additional info:
Might be the same bug as https://bugzilla.redhat.com/show_bug.cgi?id=1431588
"Predicate MatchNodeSelector failed" simply means that the pods have a nodeSelector that the node label doesn't match. This is a normal scheduling message where the pod doesn't fit the node. The likely reason is that there is a default node selector in the master config, and the projects created for this diagnostic just inherit that. Then they won't run on any nodes that aren't selected by the default node selector. Indeed is same bug as (RFE) https://bugzilla.redhat.com/show_bug.cgi?id=1431588 Not exactly a bug, just normal functioning, but if users are expecting the network pods to land everywhere, I think it should be possible to implement by just creating the projects with an empty node selector.
(In reply to Luke Meyer from comment #2)
> "Predicate MatchNodeSelector failed" simply means that the pods have a
> nodeSelector that the node label doesn't match. This is a normal scheduling
> message where the pod doesn't fit the node.
>
> The likely reason is that there is a default node selector in the master
> config, and the projects created for this diagnostic just inherit that. Then
> they won't run on any nodes that aren't selected by the default node
> selector.
>
> Indeed is same bug as (RFE)
> https://bugzilla.redhat.com/show_bug.cgi?id=1431588
>
> Not exactly a bug, just normal functioning, but if users are expecting the
> network pods to land everywhere, I think it should be possible to implement
> by just creating the projects with an empty node selector.

I think it's better to change the default behavior to "run it everywhere", because large clusters often have special groups of nodes that are excluded from the default set selected by "osm_default_node_selector"; infrastructure nodes are just one example. When running diagnostic tools, users are not trying to make deployments, they're simply trying to diagnose the cluster, so from their point of view how OCP runs the diagnostics internally is irrelevant.
https://github.com/openshift/origin/pull/14686
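The fix direction discussed above (creating the diagnostic projects with an empty node selector) can be expressed through the project annotation that overrides the cluster-wide default; a hedged sketch, with an illustrative namespace name (real diagnostic namespaces get a random suffix like network-diag-ns-6k9vs):

```yaml
# Namespace for the network diagnostic, with an empty node selector.
# The openshift.io/node-selector annotation overrides the master's
# defaultNodeSelector; an empty value lets the diagnostic pods be
# scheduled on any node, including those outside the default selector.
apiVersion: v1
kind: Namespace
metadata:
  name: network-diag-ns-example   # illustrative name
  annotations:
    openshift.io/node-selector: ""
```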
Verified this bug on:

# openshift version
openshift v3.6.126.1
kubernetes v1.6.1+5115d708d7
etcd 3.2.0

1. Changed master-config.yaml to set defaultNodeSelector: "test=zzhao"
2. Restarted the master service.
3. Ran 'oadm diagnostics NetworkCheck'.
4. Checked that the pods were scheduled on the nodes.
*** Bug 1431588 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1716