Bug 1483119

Summary:	Node affinity alpha feature can cause scheduling failures across the cluster.
Product:	OpenShift Container Platform	Reporter:	Ryan Howe <rhowe>
Component:	Node	Assignee:	ravig <rgudimet>
Status:	CLOSED ERRATA	QA Contact:	Weihua Meng <wmeng>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	3.5.0	CC:	aos-bugs, decarr, jokerman, mmccomas, rgudimet, sjenning, wmeng
Target Milestone:	---
Target Release:	3.6.z
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:	Cause: Pod anti-affinity is currently respected across projects. Consequence: podA from project1 will not land on a node where podB from project2 is running if pod anti-affinity is enabled when scheduling podA. Fix: While scheduling podA, check pod anti-affinity only within podA's project. Result: Pod anti-affinity is no longer respected across projects.	Story Points:	---
Clone Of:
Clones:	1492194 (view as bug list)		Environment:
Last Closed:	2017-10-25 13:06:40 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1492194

Description Ryan Howe 2017-08-18 19:03:11 UTC

Description of problem:

Affinity (here, pod anti-affinity set through the scheduler.alpha.kubernetes.io/affinity annotation) is an alpha feature, but it cannot be disabled in OpenShift. As a result, a user in one project can set it and cause scheduling failures across the cluster.

The fix for the issue was merged upstream (Kubernetes 1.6) with this PR:

https://github.com/kubernetes/kubernetes/pull/45352


Version-Release number of selected component (if applicable):
3.5

Additional info:


57s       4m        15        backend-27-mc7jp    Pod                                                   Warning   FailedScheduling   {default-scheduler }                  pod (backend-2-xxxx) failed to fit in any node
fit failure summary on nodes : CheckServiceAffinity (12), MatchInterPodAffinity (5), MatchNodeSelector (12)

With the log level increased to 10, the master controllers log shows:

Cannot schedule project2/backend-2-xxxx onto node node1.example.com,because of PodAntiAffinityTerm &{&LabelSelector{MatchLabels:map[string]string{},MatchExpressions:[{deploymentconfig In [backend]}],} [] 

This message appears for every node that matches the default node selector region=east.

The anti-affinity rule set by another user in another project is causing the scheduling failure:

dc/backend in project bugtest-1:

   scheduler.alpha.kubernetes.io/affinity: |
          {
            "podAntiAffinity": {
              "requiredDuringSchedulingIgnoredDuringExecution": [{
                "labelSelector": {
                  "matchExpressions": [{
                    "key": "deploymentconfig",
                    "operator": "In",
                    "values":["backend"]
                  }]
                },
                "topologyKey": "kubernetes.io/hostname"
              }]
            }
          }
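A minimal Python sketch of why this rule rejects the project2 pod. It is a simplified model, not the scheduler's predicates.go code: `term_blocks`, the `fixed` flag and the pod name `backend-1-aaaa` are hypothetical. Without a namespace check the selector matches a backend pod from any project; scoping the term to the project of the pod that owns it (an empty `namespaces` list means "this pod's namespace") lets the project2 pod through.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    namespace: str
    labels: dict


@dataclass
class AntiAffinityTerm:
    # Simplified model of the annotation's term: one "In" matchExpression.
    # The topologyKey (kubernetes.io/hostname) is left to the caller.
    key: str
    values: tuple
    namespaces: tuple = ()  # empty: the owning pod's own namespace


def selector_matches(term, pod):
    return pod.labels.get(term.key) in term.values


def term_namespaces(term, owner):
    # An empty namespaces list means "the namespace of the pod that owns the term".
    return set(term.namespaces) or {owner.namespace}


def term_blocks(term, owner, incoming, fixed):
    """True if owner's anti-affinity term rejects incoming on owner's node."""
    if not selector_matches(term, incoming):
        return False
    if not fixed:
        # Simplified pre-fix model: the project boundary is not applied,
        # so a matching pod from any project is rejected.
        return True
    return incoming.namespace in term_namespaces(term, owner)


term = AntiAffinityTerm("deploymentconfig", ("backend",))
existing = Pod("backend-1-aaaa", "bugtest-1", {"deploymentconfig": "backend"})
incoming = Pod("backend-2-xxxx", "project2", {"deploymentconfig": "backend"})

print(term_blocks(term, existing, incoming, fixed=False))  # True
print(term_blocks(term, existing, incoming, fixed=True))   # False
```

With the project-scoped check, the term still keeps two backend pods from bugtest-1 off the same host, which is what the rule was written for.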

Comment 1 ravig 2017-08-25 16:39:01 UTC

Ryan,

I was able to reproduce the issue in Origin 1.5. 

Following are my observations:

- The upstream fix you mentioned (https://github.com/kubernetes/kubernetes/pull/45352) solves this problem. I tested it against 1.5; the behaviour no longer occurs once the patch is applied.

- However, OCP 3.6 (which is based on the Origin release-3.6 branch) has the same problem (https://github.com/openshift/origin/blob/v3.6.0/vendor/k8s.io/kubernetes/plugin/pkg/scheduler/algorithm/predicates/predicates.go#L1077). The fix was cherry-picked to the kube 1.6 branch on May 5th, but our kube rebase happened on April 27th, so it was not picked up.

Comment 3 Weihua Meng 2017-08-29 09:41:23 UTC

Hi, ravig
Could you give detailed steps to reproduce?
Thanks.

Comment 4 ravig 2017-08-29 16:04:50 UTC

Hi Weihua Meng,

Steps to reproduce:

Create a 2-node OCP 3.5 cluster (1 master + 2 nodes, plus 1 optional infra node). Use the following file http://pastebin.test.redhat.com/512145 (save it as sample.yaml).

- oc create project sample

- oc create project sample1
(Alternatively, the YAML could include a namespace, which avoids the 2 steps below.)

- oc create -f sample.yaml -n sample

- oc create -f sample.yaml -n sample1

If we then run
- oc get pods -n sample1
we see an error for one of the pods, which does not reach the Running state.

The original upstream issue for Kube 1.6 is https://github.com/kubernetes/kubernetes/issues/45484 (this could be used for OCP 3.6 testing).
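The reproduction can be modelled with a small Python simulation. This is a sketch under stated assumptions, not the scheduler: the pastebin contents are not shown here, so it assumes sample.yaml defines one DC labelled deploymentconfig=backend with 2 replicas, each carrying the hostname anti-affinity rule from the description, and it uses hypothetical helpers (`conflicts`, `schedule`, `reproduce`) and pod names.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    namespace: str
    dc: str          # value of the "deploymentconfig" label
    node: str = ""   # empty while the pod is Pending


def conflicts(a, b, fixed):
    """Both pods carry the rule 'deploymentconfig In [<own dc>]' on
    kubernetes.io/hostname, so only pods on the same node are compared."""
    if a.dc != b.dc:
        return False
    # Simplified pre-fix model: the project boundary is ignored.
    return a.namespace == b.namespace if fixed else True


def schedule(pods, nodes, fixed):
    placed = []
    for pod in pods:
        for node in nodes:
            on_node = [p for p in placed if p.node == node]
            if not any(conflicts(pod, p, fixed) for p in on_node):
                pod.node = node
                placed.append(pod)
                break
    return [p for p in pods if not p.node]  # pods left Pending


def reproduce(fixed, replicas=2):
    nodes = ["node1", "node2"]  # the 2 schedulable nodes
    # "oc create -f sample.yaml" in project sample, then in project sample1.
    pods = [Pod(f"backend-1-{i}", ns, "backend")
            for ns in ("sample", "sample1") for i in range(replicas)]
    return [f"{p.namespace}/{p.name}" for p in schedule(pods, nodes, fixed)]


print(reproduce(fixed=False))  # ['sample1/backend-1-0', 'sample1/backend-1-1']
print(reproduce(fixed=True))   # []
```

In this model the sample1 pods stay Pending only because the sample pods already occupy both hosts; how many pods stay Pending in a real cluster depends on the replica and node counts, which are assumptions here.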

Comment 5 Weihua Meng 2017-08-30 10:14:32 UTC

Thanks ravig
It is very helpful. 
I tried it: this bug not only leaves pods Pending for the same user, but may also leave pods Pending for different users.
Just curious whether we officially announced this alpha feature to customers in 3.5.

Comment 6 ravig 2017-08-30 12:53:55 UTC

Yes, this would happen to any user, since multi-tenancy is at the project level.

I believe it is in tech preview in 3.5 (which is based on Origin 1.5).

https://github.com/openshift/origin/tree/release-1.5 contains a features table. Pod affinity and anti-affinity are in tech preview there, but I am not sure about the support terms.

Comment 7 Ryan Howe 2017-09-01 18:00:25 UTC

Correct, this is a tech preview, but there is no way to disable any tech preview (alpha/beta) features. Because of this, any user can use this feature and cause an issue in the cluster, which is why we are requesting that the fix be backported to 3.5 and 3.6.

Thank you

Comment 12 Weihua Meng 2017-09-28 01:00:20 UTC

Verified on openshift v3.6.173.0.37
Fixed.
All pods are scheduled in different projects.

Comment 14 errata-xmlrpc 2017-10-25 13:06:40 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3049