Bug 1534775
Summary: | oadm diagnostics NetworkCheck fails to schedule pods if there is a default node selector in the master-config | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Eric Jones <erjones> |
Component: | oc | Assignee: | Luke Meyer <lmeyer> |
Status: | CLOSED ERRATA | QA Contact: | zhaozhanqi <zzhao> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 3.5.0 | CC: | aos-bugs, erjones, jokerman, lmeyer, mmccomas |
Target Milestone: | --- | ||
Target Release: | 3.5.z | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause:
When the master config specifies a default nodeSelector for the cluster, test projects created by oadm diagnostics NetworkCheck got this nodeSelector, and therefore the test pods were also confined to this nodeSelector.
Consequence:
NetworkCheck test pods could only be scheduled on a subset of nodes, preventing the diagnostic from covering the entire cluster; in some clusters this could even leave too few pods running for the diagnostic to succeed, even when the cluster is healthy.
Fix:
NetworkCheck now creates the test projects with an empty nodeSelector so they can land on any schedulable node.
Result:
The diagnostic can now place test pods on any schedulable node, so it should be more robust and its results should reflect the whole cluster.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2018-04-12 06:01:18 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Eric Jones
2018-01-15 23:14:04 UTC
This problem was fixed in origin with https://github.com/openshift/origin/pull/14686 which was released in 3.6.z per https://access.redhat.com/errata/RHEA-2017:1716

The 3.5 backport occurred in the private repo with https://github.com/openshift/ose/pull/849 which merged 2017-08-23. There was no formal bug created to track this into an errata; however, I would expect it to have been built and released with https://access.redhat.com/errata/RHBA-2017:1828 a week later in atomic-openshift-clients-3.5.5.31.19-1.git.0.b23f57a.el7.x86_64.rpm, although that could have been built before the merge and released after, so we might need to look a little later to be exact.

I can't see the specific version of the client in this case to see if it's as recent as that. It is the client that defines the projects with or without an empty node selector, so there is no need to make server-side updates; you should just be able to test with an updated client. If it doesn't work with a more recent 3.5 (or even 3.6/3.7) client then we need to figure out why the fix isn't being included.

@Luke, apologies, you didn't see the exact version the customer is running because I failed to include it, but they recently upgraded to 3.5.5.31 and are still seeing this behavior, which makes me think we likely did not include it in that release.

I saw 3.5.5.31 in the client version, but the errata release is more specific: 3.5.5.31.19-1 -- that's why I don't know if this is expected or not. I would be surprised if the most recent 3.5 client exhibited this problem though. It's also fixed in 3.6+.

Verified this bug on oc v3.5.5.31.60; it has been fixed. Steps:
1. Change the master-config.yaml setting defaultNodeSelector: "test=zzhao"
2. Restart the master service
3. Run 'oadm diagnostics NetworkCheck'
4. Check that the pod is scheduled on the node.

So, it seems this was fixed with a previous errata. @Luke, do you know what version of OpenShift the oc v3.5.5.31.60 came with?
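To illustrate why the verification steps above distinguish the fixed client from the unfixed one, here is a minimal Python model of the selector rule described in this bug. It is a sketch under assumptions, not the scheduler or the oadm code: the function names, the three-node cluster, and the node labels other than test=zzhao are hypothetical. A project with no selector of its own inherits the cluster default, while an explicitly empty selector means "no restriction".

```python
from typing import Dict, List, Optional


def parse_selector(selector: str) -> Dict[str, str]:
    """Parse a "key=value,key2=value2" selector string into a dict."""
    pairs = [item.strip() for item in selector.split(",") if item.strip()]
    return dict(pair.split("=", 1) for pair in pairs)


def effective_selector(project_selector: Optional[str],
                       default_selector: str) -> Dict[str, str]:
    """None: the project sets nothing and inherits the cluster default.
    Empty string: an explicit override meaning "no restriction"."""
    chosen = default_selector if project_selector is None else project_selector
    return parse_selector(chosen)


def schedulable_nodes(nodes: Dict[str, Dict[str, str]],
                      selector: Dict[str, str]) -> List[str]:
    """Return the names of nodes whose labels satisfy every selector pair."""
    return sorted(
        name for name, labels in nodes.items()
        if all(labels.get(key) == value for key, value in selector.items())
    )


# Hypothetical cluster: only node1 carries the label used in step 1.
NODES = {
    "node1": {"test": "zzhao"},
    "node2": {"region": "infra"},
    "node3": {},
}
DEFAULT_NODE_SELECTOR = "test=zzhao"

# Unfixed client: test project has no selector, so it inherits the default.
before_fix = schedulable_nodes(NODES, effective_selector(None, DEFAULT_NODE_SELECTOR))
# Fixed client: test project is created with an empty selector.
after_fix = schedulable_nodes(NODES, effective_selector("", DEFAULT_NODE_SELECTOR))

print("before fix:", before_fix)
print("after fix: ", after_fix)
```

In this model the unfixed behavior confines test pods to node1, while the empty selector lets them land on all three nodes, which is the difference step 4 is checking for.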
I got my customer to give me the following:

$ rpm -qa | grep -ie openshift -ie ocp -ie ansible -ie ose
atomic-openshift-master-3.5.5.31-1.git.0.b6f55a2.el7.x86_64
openshift-ansible-filter-plugins-3.5.78-1.git.0.f7be576.el7.noarch
tuned-profiles-atomic-openshift-node-3.5.5.31-1.git.0.b6f55a2.el7.x86_64
openshift-ansible-3.5.78-1.git.0.f7be576.el7.noarch
atomic-openshift-excluder-3.5.5.31.36-1.git.0.fd415e7.el7.noarch
atomic-openshift-node-3.5.5.31-1.git.0.b6f55a2.el7.x86_64
openshift-ansible-lookup-plugins-3.5.78-1.git.0.f7be576.el7.noarch
openshift-ansible-playbooks-3.5.78-1.git.0.f7be576.el7.noarch
atomic-openshift-docker-excluder-3.5.5.31.36-1.git.0.fd415e7.el7.noarch
atomic-openshift-3.5.5.31-1.git.0.b6f55a2.el7.x86_64
openshift-ansible-callback-plugins-3.5.78-1.git.0.f7be576.el7.noarch
atomic-openshift-utils-3.5.78-1.git.0.f7be576.el7.noarch
atomic-openshift-clients-3.5.5.31-1.git.0.b6f55a2.el7.x86_64
openshift-ansible-docs-3.5.78-1.git.0.f7be576.el7.noarch
ansible-2.2.3.0-1.el7.noarch
atomic-openshift-sdn-ovs-3.5.5.31-1.git.0.b6f55a2.el7.x86_64
openshift-ansible-roles-3.5.78-1.git.0.f7be576.el7.noarch

(In reply to Eric Jones from comment #11)
> @Luke, do you know what version of OpenShift the oc v3.5.5.31.60 came with?

It doesn't look like 3.5.5.31.60 is released yet but 3.5.5.31.48 was released two months ago and that should be fine:
https://access.redhat.com/errata/RHBA-2017:3438
https://access.redhat.com/downloads/content/rhel---7/x86_64/5801/atomic-openshift-clients/3.5.5.31.48-1.git.0.245c039.el7/x86_64/fd431d51/package

Again, this is purely a client-side fix.

> atomic-openshift-clients-3.5.5.31-1.git.0.b6f55a2.el7.x86_64

I think that's the client at GA.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1106
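Since the fix is purely client-side, the question in this thread comes down to comparing the installed atomic-openshift-clients version against the first build expected to carry the backport. A rough Python sketch of that comparison follows. The helper names are hypothetical, the 3.5.5.31.19 threshold is only the build the thread expects to contain the fix (it was not confirmed here), and a plain numeric tuple comparison is a simplification of real RPM version ordering.

```python
import re
from typing import Tuple


def client_version(package: str) -> Tuple[int, ...]:
    """Extract the dotted version from an atomic-openshift-clients package name."""
    match = re.search(r"atomic-openshift-clients-([0-9.]+)-", package)
    if match is None:
        raise ValueError(f"not an atomic-openshift-clients package: {package}")
    return tuple(int(part) for part in match.group(1).split("."))


def has_expected_fix(package: str, first_fixed: str) -> bool:
    """True if the package version is at or above the first expected fixed build."""
    return client_version(package) >= client_version(first_fixed)


# Build the thread expects to be the first with the backport (an assumption).
FIRST_EXPECTED_FIX = "atomic-openshift-clients-3.5.5.31.19-1.git.0.b23f57a.el7.x86_64.rpm"

# Client from the customer's rpm -qa output, and the later released client.
customer_client = "atomic-openshift-clients-3.5.5.31-1.git.0.b6f55a2.el7.x86_64"
released_client = "atomic-openshift-clients-3.5.5.31.48-1.git.0.245c039.el7.x86_64"

customer_has_fix = has_expected_fix(customer_client, FIRST_EXPECTED_FIX)
released_has_fix = has_expected_fix(released_client, FIRST_EXPECTED_FIX)

print("customer client has expected fix:", customer_has_fix)
print("3.5.5.31.48 client has expected fix:", released_has_fix)
```

Under these assumptions the customer's 3.5.5.31 client sorts below 3.5.5.31.19, consistent with it being the GA client without the backport, while 3.5.5.31.48 sorts above it.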