Bug 1336939
| Summary: | OpenShift master continues to attempt to schedule on nodes after they are disabled | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Max Whittingham <mwhittin> |
| Component: | Node | Assignee: | Abhishek Gupta <abhgupta> |
| Status: | CLOSED ERRATA | QA Contact: | Weihua Meng <wmeng> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.2.0 | CC: | abhgupta, anli, aos-bugs, jokerman, mmccomas, mwhittin, qixuan.wang, whearn, wmeng |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | Previously, nodes marked unschedulable would have had additional pods scheduled on them during scale-up operations due to outdated caches. The schedulability information is now refreshed properly, ensuring that unschedulable nodes do not receive additional pods when replication controllers are scaled up. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-07-05 16:53:26 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1303130 | | |
Description (Max Whittingham, 2016-05-17 20:46:55 UTC)
Related and possible fix: https://github.com/kubernetes/kubernetes/pull/22568

I can't reproduce the issue. How do you upgrade masters? Are you following https://docs.openshift.com/enterprise/latest/install_config/upgrading/index.html ?

(In reply to Anping Li from comment #2)
> I can't reproduce the issue. How do you upgrade masters? Are you following
> https://docs.openshift.com/enterprise/latest/install_config/upgrading/index.html

Yes, manually. We performed the yum upgrade followed by a reboot in our case, but still followed https://docs.openshift.com/enterprise/latest/install_config/upgrading/manual_upgrades.html#upgrading-masters

This is a bit of a corner case, so I am listing the steps to reproduce it. I suggest first following these steps on a cluster that does NOT yet have this bug fix, to confirm that they do reproduce the issue (a hedged command-line sketch of these steps appears at the end of this report):

1. Start with a multi-node cluster that has 2 user/compute nodes.
2. Create an RC with a replica count of 1.
3. Create a corresponding service.
4. Mark the node that has the pod as Unschedulable.
5. Scale the RC to 2.

The pod created in step 5 should fail to schedule, because the predicate looks up the node on which the existing pod belonging to the same service landed. This lookup is done to read the node labels when determining service affinity. Since the node is Unschedulable and therefore missing from the predicate cache, the lookup fails and the scheduler returns an error.

Verified on oc v3.2.1.2. Fixed.

2016-06-17 14:46:25 +0800 CST   2016-06-17 14:46:26 +0800 CST   2   database-1-5hb31   Pod   Warning   FailedScheduling   {default-scheduler }   pod (database-1-5hb31) failed to fit in any node fit failure on node (ip-172-18-1-30.ec2.internal): Region
2016-06-17 14:46:25 +0800 CST   2016-06-17 14:46:28 +0800 CST   3   database-1-5hb31   Pod   Warning   FailedScheduling   {default-scheduler }   pod (database-1-5hb31) failed to fit in any node fit failure on node (ip-172-18-1-30.ec2.internal): Region

wmeng: Were you able to verify that, after the fix, the pod in step 5 was created successfully on the node that was still schedulable?

The result depends on the node configuration and the scheduler configuration. If the scheduler is configured with the argument {"argument": {"serviceAffinity": {"labels": ["region"]}}, "name": "Region"} (see the policy sketch at the end of this report), the RC has 1 replica, the running pod is on NodeA, and no other schedulable node has the same region label as NodeA, then marking NodeA unschedulable and scaling the RC up leaves the new pods pending. This is expected. If there are schedulable nodes with the same region label as NodeA (the node the running pod is on), the new pods are created on them when scaling up. If the serviceAffinity argument is not configured on the scheduler, the new pods are created on the other schedulable nodes.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1383
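
For reference, the reproduction steps above map roughly onto the following commands. This is a minimal sketch, assuming an OSE 3.2 cluster administered with oc/oadm as a cluster admin; the manifest file rc.yaml, the RC/service name "database", and the port are placeholders rather than objects recorded in this bug.

```bash
# Sketch of reproduction steps 2-5 above; rc.yaml and "database" are placeholders.

# Step 2: create an RC with replicas: 1 from a (hypothetical) manifest.
oc create -f rc.yaml

# Step 3: create a corresponding service so the serviceAffinity predicate
# has a service to match against.
oc expose rc database --port=5432

# Step 4: find the node running the pod and mark it unschedulable.
oc get pods -o wide
oadm manage-node <node-running-the-pod> --schedulable=false

# Step 5: scale the RC to 2 and check where (or whether) the new pod lands.
oc scale rc database --replicas=2
oc get pods -o wide
oc get events | grep FailedScheduling
```

On a cluster without the fix, the last command should show FailedScheduling events like the ones quoted in the verification comment; with the fix, the new pod should land on a remaining schedulable node that satisfies the configured predicates.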
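
The serviceAffinity argument quoted in the final QA comment lives in the scheduler policy file. A minimal sketch of such a policy follows, assuming the conventional OSE 3.x location /etc/origin/master/scheduler.json and an RPM-installed master; the other predicates, the priority, and the service name to restart are illustrative assumptions, not details taken from this bug.

```bash
# Minimal scheduler policy sketch containing the "Region" serviceAffinity
# predicate discussed above. File path and master service name are assumptions
# for a typical RPM install.
cat > /etc/origin/master/scheduler.json <<'EOF'
{
  "kind": "Policy",
  "apiVersion": "v1",
  "predicates": [
    {"name": "MatchNodeSelector"},
    {"name": "PodFitsResources"},
    {"name": "Region", "argument": {"serviceAffinity": {"labels": ["region"]}}}
  ],
  "priorities": [
    {"name": "LeastRequestedPriority", "weight": 1}
  ]
}
EOF

# Restart the master so the scheduler picks up the policy.
systemctl restart atomic-openshift-master
```

With this predicate in place and only one node carrying the matching region label, the pending pods described in that comment are the expected outcome once the labeled node is marked unschedulable.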