Bug 1459241
| Summary: | oadm diagnostics NetworkCheck cannot deploy pods on non default nodes | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Serhat Dirik <hdirik> |
| Component: | oc | Assignee: | Luke Meyer <lmeyer> |
| Status: | CLOSED ERRATA | QA Contact: | zhaozhanqi <zzhao> |
| Severity: | urgent | Docs Contact: | |
| Priority: | medium | ||
| Version: | 3.5.1 | CC: | aos-bugs, bmeng, erich, jokerman, knakayam, mmccomas, myllynen, nnosenzo, orhan.biyiklioglu, wmeng, xiazhao |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: |
Cause:
When the master config specified a default nodeSelector for the cluster, test projects created by oadm diagnostics NetworkCheck inherited this nodeSelector, and the test pods were therefore confined to the nodes it selects.
Consequence:
NetworkCheck test pods could only be scheduled on a subset of nodes, preventing the diagnostic from covering the entire cluster; in some clusters this could even leave too few pods running for the diagnostic to succeed, even when the cluster is healthy.
Fix:
NetworkCheck now creates the test projects with an empty nodeSelector so they can land on any schedulable node.
Result:
The diagnostic should be more robust and meaningful.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2017-08-10 05:26:47 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Serhat Dirik
2017-06-06 15:31:29 UTC
"Predicate MatchNodeSelector failed" simply means that the pods have a nodeSelector that the node's labels don't match. This is a normal scheduling message where the pod doesn't fit the node.

The likely reason is that there is a default node selector in the master config, and the projects created for this diagnostic just inherit it. They then won't run on any nodes that aren't selected by the default node selector.

Indeed, this is the same bug as (RFE) https://bugzilla.redhat.com/show_bug.cgi?id=1431588

Not exactly a bug, just normal functioning, but if users are expecting the network pods to land everywhere, I think it should be possible to implement that by just creating the projects with an empty node selector.

(In reply to Luke Meyer from comment #2)
> "Predicate MatchNodeSelector failed" simply means that the pods have a
> nodeSelector that the node label doesn't match. This is a normal scheduling
> message where the pod doesn't fit the node.
>
> The likely reason is that there is a default node selector in the master
> config, and the projects created for this diagnostic just inherit that. Then
> they won't run on any nodes that aren't selected by the default node
> selector.
>
> Indeed is same bug as (RFE)
> https://bugzilla.redhat.com/show_bug.cgi?id=1431588
>
> Not exactly a bug, just normal functioning, but if users are expecting the
> network pods to land everywhere, I think it should be possible to implement
> by just creating the projects with an empty node selector.

I think it's better to change the default behavior to "run it everywhere", because large clusters always have some special groups of nodes that are kept out of the default nodes specified with "osm_default_node_selector". Infrastructure nodes are just one example of that. When running diagnostic tools, users are not trying to make deployments; they're simply trying to diagnose the cluster, so from their point of view, how OCP runs these diagnostics internally is irrelevant.

Verified this bug on
# openshift version
openshift v3.6.126.1
kubernetes v1.6.1+5115d708d7
etcd 3.2.0
1. Change master-config.yaml to set:
defaultNodeSelector: "test=zzhao"
2. Restart the master service.
3. Run 'oadm diagnostics NetworkCheck'.
4. Check that the test pods are scheduled on the nodes.
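
The scheduling behavior discussed in this report can be illustrated with a small model. The following is a minimal Python sketch, not OpenShift code. It assumes that a project with no selector of its own inherits the cluster default, while a project with an explicitly empty selector is unrestricted. The node names, the `region=infra` label, and all function names are hypothetical; only the `test=zzhao` default comes from the verification steps above.

```python
from typing import Dict, List, Optional

Labels = Dict[str, str]


def parse_selector(selector: str) -> Labels:
    """Parse a 'key=value,key2=value2' selector string into a dict."""
    result: Labels = {}
    for item in selector.split(","):
        if not item.strip():
            continue  # an empty selector yields no requirements
        key, _, value = item.partition("=")
        result[key.strip()] = value.strip()
    return result


def effective_selector(cluster_default: str,
                       project_selector: Optional[str]) -> Labels:
    """Pick the selector that applies to pods in a project.

    Assumption of this sketch: None means the project sets no selector and
    inherits the cluster default; an empty string is an explicit override
    meaning 'no restriction'.
    """
    chosen = cluster_default if project_selector is None else project_selector
    return parse_selector(chosen)


def matching_nodes(selector: Labels, nodes: Dict[str, Labels]) -> List[str]:
    """Return the nodes whose labels satisfy every key=value in the selector."""
    return sorted(
        name for name, labels in nodes.items()
        if all(labels.get(key) == value for key, value in selector.items())
    )


# Hypothetical cluster: only node-a carries the default selector's label.
nodes = {
    "node-a": {"test": "zzhao"},
    "node-b": {"region": "infra"},
    "node-c": {},
}
cluster_default = "test=zzhao"

# Before the fix: test projects inherit the default selector.
inherited = matching_nodes(effective_selector(cluster_default, None), nodes)
# After the fix: test projects are created with an empty selector.
unrestricted = matching_nodes(effective_selector(cluster_default, ""), nodes)

print("inherited default:", inherited)
print("empty selector:   ", unrestricted)
```

In this model, the inherited selector limits the test pods to `node-a`, whereas the empty selector lets them land on all three nodes, which is the coverage the diagnostic needs.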
*** Bug 1431588 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716