Bug 1483119 - Node affinity alpha feature can cause scheduling failures across the cluster.
Summary: Node affinity alpha feature can cause scheduling failures across the cluster.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.5.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.6.z
Assignee: ravig
QA Contact: Weihua Meng
URL:
Whiteboard:
Depends On:
Blocks: 1492194
 
Reported: 2017-08-18 19:03 UTC by Ryan Howe
Modified: 2017-10-25 13:06 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Pod anti-affinity is respected across projects.
Consequence: If pod anti-affinity is enabled when scheduling podA, podA from project1 will not land on a node where podB from project2 is running.
Fix: While scheduling podA, check pod anti-affinity only within podA's project.
Result: Pod anti-affinity is no longer respected across projects.
Clone Of:
Clones: 1492194
Environment:
Last Closed: 2017-10-25 13:06:40 UTC
Target Upstream Version:
Embargoed:




Links
System: Red Hat Product Errata
ID: RHBA-2017:3049
Priority: normal
Status: SHIPPED_LIVE
Summary: OpenShift Container Platform 3.6, 3.5, and 3.4 bug fix and enhancement update
Last Updated: 2017-10-25 15:57:15 UTC

Description Ryan Howe 2017-08-18 19:03:11 UTC
Description of problem:

Node affinity is an alpha feature but cannot be disabled in OpenShift. As a result, a user in one project can set it and cause scheduling issues across the cluster.

The fix for the issue was merged upstream with this PR:

https://github.com/kubernetes/kubernetes/pull/45352


Version-Release number of selected component (if applicable):
3.5

Additional info:


57s       4m        15        backend-27-mc7jp    Pod                                                   Warning   FailedScheduling   {default-scheduler }                  pod (backend-2-xxxx) failed to fit in any node
fit failure summary on nodes : CheckServiceAffinity (12), MatchInterPodAffinity (5), MatchNodeSelector (12)

When increasing the log level to 10, the master controllers log shows:

Cannot schedule project2/backend-2-xxxx onto node node1.example.com,because of PodAntiAffinityTerm &{&LabelSelector{MatchLabels:map[string]string{},MatchExpressions:[{deploymentconfig In [backend]}],} [] 

This message is logged for every node that matches the default node selector region=east.

The affinity rule of another user in another project is causing the scheduling failure:

dc/backend in project bugtest-1:

    scheduler.alpha.kubernetes.io/affinity: |
      {
        "podAntiAffinity": {
          "requiredDuringSchedulingIgnoredDuringExecution": [{
            "labelSelector": {
              "matchExpressions": [{
                "key": "deploymentconfig",
                "operator": "In",
                "values": ["backend"]
              }]
            },
            "topologyKey": "kubernetes.io/hostname"
          }]
        }
      }
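
For reference, a minimal sketch of where that annotation sits in the deployment config's pod template; the DC shape, replica count, and image below are illustrative assumptions, not taken from the affected cluster:

apiVersion: v1
kind: DeploymentConfig
metadata:
  name: backend
spec:
  replicas: 2
  selector:
    deploymentconfig: backend
  template:
    metadata:
      labels:
        deploymentconfig: backend   # the label the anti-affinity selector matches
      annotations:
        scheduler.alpha.kubernetes.io/affinity: |
          {"podAntiAffinity": {"requiredDuringSchedulingIgnoredDuringExecution": [{
            "labelSelector": {"matchExpressions": [{"key": "deploymentconfig",
              "operator": "In", "values": ["backend"]}]},
            "topologyKey": "kubernetes.io/hostname"}]}}
    spec:
      containers:
      - name: backend
        image: registry.example.com/backend:latest   # illustrative image

Pre-fix, the scheduler evaluates that labelSelector against pods in every project, so any pod labelled deploymentconfig=backend anywhere in the cluster blocks this pod from its node.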

Comment 1 ravig 2017-08-25 16:39:01 UTC
Ryan,

I was able to reproduce the issue in Origin 1.5. 

Following are my observations:

- The fix you mentioned upstream (https://github.com/kubernetes/kubernetes/pull/45352) solves this problem. I tested it against 1.5: once the patch is applied, we no longer see this behaviour.

- However, OCP 3.6 (which is based on the Origin branch release-3.6) has the same problem (https://github.com/openshift/origin/blob/v3.6.0/vendor/k8s.io/kubernetes/plugin/pkg/scheduler/algorithm/predicates/predicates.go#L1077). The cherry-pick to the kube 1.6 branch happened on May 5th, but our kube rebase happened on April 27th, so the fix was not picked up.

Comment 3 Weihua Meng 2017-08-29 09:41:23 UTC
Hi, ravig
Could you give detailed steps to reproduce?
Thanks.

Comment 4 ravig 2017-08-29 16:04:50 UTC
Hi Weihua Meng,

Steps to reproduce:

Create a 2-node OCP 3.5 cluster (1 master + 2 nodes + 1 optional infra node). Use the following file http://pastebin.test.redhat.com/512145 (after saving it as sample.yaml); a hypothetical reconstruction is sketched at the end of this comment.

- oc create project sample

- oc create project sample1
(Alternatively, the YAML could include a namespace, making the -n flags in the two steps below unnecessary.)

- oc create -f sample.yaml -n sample

- oc create -f sample.yaml -n sample1

If we then run
- oc get pods -n sample1
we will see that one of the pods does not reach the Running state.

The original upstream issue for Kube 1.6 can be found at https://github.com/kubernetes/kubernetes/issues/45484 (this can be used for OCP 3.6 testing).
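
Since the pastebin link above is internal-only, the following is a hypothetical reconstruction of what sample.yaml plausibly contains, based on the annotation in the description; the object kind, names, labels, and image are illustrative assumptions:

apiVersion: v1
kind: ReplicationController
metadata:
  name: backend
spec:
  replicas: 2
  selector:
    deploymentconfig: backend
  template:
    metadata:
      labels:
        deploymentconfig: backend
      annotations:
        # Required pod anti-affinity: at most one deploymentconfig=backend
        # pod per node (topologyKey kubernetes.io/hostname).
        scheduler.alpha.kubernetes.io/affinity: |
          {"podAntiAffinity": {"requiredDuringSchedulingIgnoredDuringExecution": [{
            "labelSelector": {"matchExpressions": [{"key": "deploymentconfig",
              "operator": "In", "values": ["backend"]}]},
            "topologyKey": "kubernetes.io/hostname"}]}}
    spec:
      containers:
      - name: backend
        image: gcr.io/google_containers/pause:3.0   # illustrative image

On a 2-node cluster, the pods from the first project occupy both nodes; with the buggy cross-project check, the pods from the second project then have nowhere to land and stay Pending.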

Comment 5 Weihua Meng 2017-08-30 10:14:32 UTC
Thanks, ravig.
That is very helpful.
I tried it: this bug not only causes Pending pods for the same user, it can also cause Pending pods for different users.
Just curious whether we officially announced this alpha feature to customers in 3.5.

Comment 6 ravig 2017-08-30 12:53:55 UTC
Yes, this would happen to any user. As a matter of fact, multi-tenancy is at the project level.

I believe it is in Tech Preview mode in 3.5 (which is based on Origin 1.5).

https://github.com/openshift/origin/tree/release-1.5 contains a features table. Pod affinity and anti-affinity are listed as Tech Preview, but I am not sure about the support terms.

Comment 7 Ryan Howe 2017-09-01 18:00:25 UTC
This is Tech Preview, correct, but there is no way to disable any Tech Preview (alpha/beta) features. Because of this, any user can implement this and cause an issue in the cluster, which is why we are requesting that this be backported to 3.5 and 3.6.

Thank you

Comment 12 Weihua Meng 2017-09-28 01:00:20 UTC
Verified on openshift v3.6.173.0.37
Fixed.
All pods across the different projects are scheduled.

Comment 14 errata-xmlrpc 2017-10-25 13:06:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3049

