Bug 1336939
| Summary: | OpenShift master continues to attempt to schedule on nodes after they are disabled | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Max Whittingham <mwhittin> |
| Component: | Node | Assignee: | Abhishek Gupta <abhgupta> |
| Status: | CLOSED ERRATA | QA Contact: | Weihua Meng <wmeng> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.2.0 | CC: | abhgupta, anli, aos-bugs, jokerman, mmccomas, mwhittin, qixuan.wang, whearn, wmeng |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | Previously, nodes marked unschedulable would have had additional pods scheduled on them during scale-up operations due to outdated caches. The schedulability information is now refreshed properly, ensuring that unschedulable nodes do not receive additional pods when replication controllers are scaled up. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-07-05 16:53:26 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1303130 | | |
Description (Max Whittingham, 2016-05-17 20:46:55 UTC)
Related and possible fix: https://github.com/kubernetes/kubernetes/pull/22568

I can't reproduce the issue. How do you upgrade masters? Are you following https://docs.openshift.com/enterprise/latest/install_config/upgrading/index.html ?

(In reply to Anping Li from comment #2)
> I can't reproduce the issue. How do you upgrade masters? Are you following
> https://docs.openshift.com/enterprise/latest/install_config/upgrading/index.html

Yes, manually. We performed the yum upgrade followed by a reboot in our case, but still followed https://docs.openshift.com/enterprise/latest/install_config/upgrading/manual_upgrades.html#upgrading-masters

This is a bit of a corner case, so I am listing the steps to reproduce it. I suggest first following these steps on a cluster that does NOT yet have this bug fix, to confirm that they do reproduce the issue (a hedged command-line sketch of these steps appears at the end of this report):

1. Start with a multi-node cluster that has 2 user/compute nodes.
2. Create an RC with a replica count of 1.
3. Create a corresponding service.
4. Mark the node that has the pod as Unschedulable.
5. Scale the RC to 2.

The pod created in step 5 should fail to schedule, because the predicate looks up the node on which the existing pod belonging to the same service landed. This lookup is done to read the node labels when determining service affinity. Since the node is Unschedulable and therefore missing from the predicate cache, the lookup fails and the scheduler returns an error.

Verified on oc v3.2.1.2. Fixed.

2016-06-17 14:46:25 +0800 CST   2016-06-17 14:46:26 +0800 CST   2   database-1-5hb31   Pod   Warning   FailedScheduling   {default-scheduler }   pod (database-1-5hb31) failed to fit in any node fit failure on node (ip-172-18-1-30.ec2.internal): Region
2016-06-17 14:46:25 +0800 CST   2016-06-17 14:46:28 +0800 CST   3   database-1-5hb31   Pod   Warning   FailedScheduling   {default-scheduler }   pod (database-1-5hb31) failed to fit in any node fit failure on node (ip-172-18-1-30.ec2.internal): Region

wmeng: Were you able to verify that, after the fix, the pod in step 5 was created successfully on the node that was still schedulable?

The result depends on the node configuration and the scheduler configuration. If the scheduler is configured with the argument {"argument": {"serviceAffinity": {"labels": ["region"]}}, "name": "Region"} (see the policy sketch at the end of this report), the RC has 1 replica, the running pod is on NodeA, and no other schedulable node has the same region label as NodeA, then marking NodeA unschedulable and scaling the RC up leaves the new pods pending. This is expected. If there are schedulable nodes with the same region label as NodeA (the node the running pod is on), the new pods are created on them when scaling up. If the serviceAffinity argument is not configured on the scheduler, the new pods are created on the other schedulable nodes.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1383
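
For reference, the reproduction steps above map roughly onto the following commands. This is a minimal sketch, assuming an OSE 3.2 cluster administered with oc/oadm as a cluster admin; the manifest file rc.yaml, the RC/service name "database", and the port are placeholders rather than objects recorded in this bug.

```bash
# Sketch of reproduction steps 2-5 above; rc.yaml and "database" are placeholders.

# Step 2: create an RC with replicas: 1 from a (hypothetical) manifest.
oc create -f rc.yaml

# Step 3: create a corresponding service so the serviceAffinity predicate
# has a service to match against.
oc expose rc database --port=5432

# Step 4: find the node running the pod and mark it unschedulable.
oc get pods -o wide
oadm manage-node <node-running-the-pod> --schedulable=false

# Step 5: scale the RC to 2 and check where (or whether) the new pod lands.
oc scale rc database --replicas=2
oc get pods -o wide
oc get events | grep FailedScheduling
```

On a cluster without the fix, the last command should show FailedScheduling events like the ones quoted in the verification comment; with the fix, the new pod should land on a remaining schedulable node that satisfies the configured predicates.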
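
The serviceAffinity argument quoted in the final QA comment lives in the scheduler policy file. A minimal sketch of such a policy follows, assuming the conventional OSE 3.x location /etc/origin/master/scheduler.json and an RPM-installed master; the other predicates, the priority, and the service name to restart are illustrative assumptions, not details taken from this bug.

```bash
# Minimal scheduler policy sketch containing the "Region" serviceAffinity
# predicate discussed above. File path and master service name are assumptions
# for a typical RPM install.
cat > /etc/origin/master/scheduler.json <<'EOF'
{
  "kind": "Policy",
  "apiVersion": "v1",
  "predicates": [
    {"name": "MatchNodeSelector"},
    {"name": "PodFitsResources"},
    {"name": "Region", "argument": {"serviceAffinity": {"labels": ["region"]}}}
  ],
  "priorities": [
    {"name": "LeastRequestedPriority", "weight": 1}
  ]
}
EOF

# Restart the master so the scheduler picks up the policy.
systemctl restart atomic-openshift-master
```

With this predicate in place and only one node carrying the matching region label, the pending pods described in that comment are the expected outcome once the labeled node is marked unschedulable.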