Bug 1483119
| Summary: | Node affinity alpha feature can cause scheduling failures across the cluster. | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Ryan Howe <rhowe> | |
| Component: | Node | Assignee: | ravig <rgudimet> | |
| Status: | CLOSED ERRATA | QA Contact: | Weihua Meng <wmeng> | |
| Severity: | high | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | 3.5.0 | CC: | aos-bugs, decarr, jokerman, mmccomas, rgudimet, sjenning, wmeng | |
| Target Milestone: | --- | |||
| Target Release: | 3.6.z | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | Bug Fix | ||
| Doc Text: |
Cause: Pod anti-affinity was being respected across projects.
Consequence: podA from project1 would not land on a node where podB from project2 was running if pod anti-affinity was in effect when scheduling podA.
Fix: While scheduling podA, check for pod anti-affinity only within the project of podA.
Result: Pod anti-affinity is no longer respected across projects.
|
Story Points: | --- | |
| Clone Of: | ||||
| : | 1492194 (view as bug list) | Environment: | ||
| Last Closed: | 2017-10-25 13:06:40 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1492194 | |||
|
Description
Ryan Howe
2017-08-18 19:03:11 UTC
Ryan, I was able to reproduce the issue in Origin 1.5. My observations:
- The upstream fix you mentioned (https://github.com/kubernetes/kubernetes/pull/45352) solves this problem. I tested it against 1.5, and we no longer see this behaviour once the patch has been applied.
- But OCP 3.6 (which is based on the Origin branch release-3.6) has the same problem (https://github.com/openshift/origin/blob/v3.6.0/vendor/k8s.io/kubernetes/plugin/pkg/scheduler/algorithm/predicates/predicates.go#L1077). The cherry-pick to the kube 1.6 branch happened on May 5th, and our kube rebase happened on April 27th.

Hi ravig,
Could you give detailed steps to reproduce? Thanks.

Hi Weihua Meng,
Steps to reproduce:
Create a 2-node OCP 3.5 cluster (1 master + 2 nodes + 1 optional infra node).
Use the file at http://pastebin.test.redhat.com/512145 (after saving it as sample.yaml):
- oc new-project sample
- oc new-project sample1
(Alternatively, the yaml could include a namespace and avoid the 2 steps below.)
- oc create -f sample.yaml -n sample
- oc create -f sample.yaml -n sample1
If we then run
- oc get pods -n sample1
we will see an error about one of the pods not reaching the Running state.
The original upstream issue for Kube 1.6 can be found at https://github.com/kubernetes/kubernetes/issues/45484 (this could be used for OCP 3.6 testing).

Thanks ravig, it is very helpful.
I tried it. This bug not only causes pending pods for the same user, but may also cause pending pods for different users.
Just curious whether we officially announced this alpha feature to customers in 3.5.

Yes, this would happen to any user. As a matter of fact, the multi-tenancy is at the project level.
I believe it is in tech-preview mode in 3.5 (which is based on Origin 1.5). https://github.com/openshift/origin/tree/release-1.5 contains a table listing the features. Pod affinity and anti-affinity are in tech preview mode, but I am not sure about the support terms.

This is a tech preview, correct, but there is no way to disable any tech preview (alpha/beta) features. Because of this, any user can use this feature and cause an issue in the cluster, which is why we are requesting that the fix be backported to 3.5 and 3.6. Thank you.

Verified on openshift v3.6.173.0.37.
Fixed. All pods are scheduled in different projects.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2017:3049
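
For readers who want to see the behaviour change in isolation, here is a minimal Python sketch of the rule described in the Doc Text ("check for pod anti-affinity only within the project of podA"). It is not the scheduler's code. The `Pod` class, the `node_is_feasible` function, the `same_project_only` flag, and the `app: demo` label are all hypothetical names made up for illustration. The sketch also assumes the anti-affinity rule is a plain label match scoped to a single node, which is much simpler than the real predicate.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Pod:
    """Hypothetical, heavily simplified stand-in for a pod."""
    name: str
    namespace: str                      # the OpenShift project
    labels: Dict[str, str]
    node: Optional[str] = None          # None means not yet scheduled
    # Simplified anti-affinity rule: do not share a node with pods
    # carrying all of these labels. Empty dict means no rule.
    anti_affinity: Dict[str, str] = field(default_factory=dict)


def matches(selector: Dict[str, str], labels: Dict[str, str]) -> bool:
    """True if every key/value in the selector is present in labels."""
    return all(labels.get(k) == v for k, v in selector.items())


def node_is_feasible(pod: Pod, node: str, existing: List[Pod],
                     same_project_only: bool = True) -> bool:
    """Return False if placing `pod` on `node` would violate its anti-affinity.

    same_project_only=False models the behaviour reported in this bug
    (pods in every project are considered); True models the behaviour
    described in the Doc Text fix.
    """
    if not pod.anti_affinity:
        return True
    for other in existing:
        if other.node != node:
            continue
        if same_project_only and other.namespace != pod.namespace:
            continue  # ignore pods that belong to other projects
        if matches(pod.anti_affinity, other.labels):
            return False
    return True


# Scenario loosely mirroring the reproduction steps: project "sample"
# already has one matching pod on each of the two nodes.
existing = [
    Pod("a", "sample", {"app": "demo"}, node="node1"),
    Pod("b", "sample", {"app": "demo"}, node="node2"),
]
new_pod = Pod("c", "sample1", {"app": "demo"}, anti_affinity={"app": "demo"})
nodes = ["node1", "node2"]

blocked_everywhere = not any(
    node_is_feasible(new_pod, n, existing, same_project_only=False) for n in nodes
)
schedulable_after_fix = all(node_is_feasible(new_pod, n, existing) for n in nodes)
```

In this scenario the pod in `sample1` has no feasible node when pods from every project are considered, which corresponds to the pending pod seen in the reproduction. Once only pods from its own project are considered, both nodes become feasible.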