Bug 1703943

Summary: router pods are always running on the same node in a fresh-install AWS env
Product: OpenShift Container Platform
Component: Networking
Sub component: router
Version: 4.1.0
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: medium
Priority: high
Assignee: Dan Mace <dmace>
Reporter: Hongan Li <hongli>
QA Contact: Hongan Li <hongli>
CC: aos-bugs, bbennett, wking
Type: Bug
Last Closed: 2019-06-04 10:48:10 UTC

Description Hongan Li 2019-04-29 07:45:43 UTC
Description of problem:
Checked several fresh-install environments on AWS (both IPI and UPI) and found that the two router pods always run on the same node, although there is no functional impact.
After scaling the replicas down and back up, the two router pods run on different nodes.

Version-Release number of selected component (if applicable):
4.1.0-0.nightly-2019-04-28-064010

How reproducible:
100%

Steps to Reproduce:
1. fresh install on AWS
2. check the router pods
   $ oc get pod -o wide -n openshift-ingress
3. scale down
   $ oc -n openshift-ingress-operator patch ingresscontroller/default -p '{"spec":{"replicas": 0}}' --type=merge
4. scale up
   $ oc -n openshift-ingress-operator patch ingresscontroller/default -p '{"spec":{"replicas": 2}}' --type=merge
5. check the router pods again
   $ oc get pod -o wide -n openshift-ingress
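
To see why the scheduler is free to co-locate the two replicas, the router deployment's affinity stanza can be dumped; the jsonpath below assumes the deployment is named router-default, matching the pod names in the output further down:

$ oc -n openshift-ingress get deployment/router-default -o jsonpath='{.spec.template.spec.affinity}'

If this prints nothing, no pod anti-affinity is set and nothing forces the two replicas onto different nodes.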

Actual results:
step 2:
$ oc get pod -o wide -n openshift-ingress
NAME                              READY   STATUS    RESTARTS   AGE     IP           NODE                                                NOMINATED NODE   READINESS GATES
router-default-84c7f9d456-fgxqp   1/1     Running   0          4h35m   10.131.0.4   ip-172-31-134-171.ap-northeast-2.compute.internal   <none>           <none>
router-default-84c7f9d456-sl7k2   1/1     Running   0          4h35m   10.131.0.3   ip-172-31-134-171.ap-northeast-2.compute.internal   <none>           <none>

step 5:
$ oc get pod -o wide -n openshift-ingress
NAME                              READY   STATUS    RESTARTS   AGE   IP           NODE                                                NOMINATED NODE   READINESS GATES
router-default-84c7f9d456-2wvzv   1/1     Running   0          50s   10.128.2.9   ip-172-31-141-240.ap-northeast-2.compute.internal   <none>           <none>
router-default-84c7f9d456-d9ndb   1/1     Running   0          50s   10.129.2.9   ip-172-31-150-49.ap-northeast-2.compute.internal    <none>           <none>


Expected results:
The two router pods should run on different nodes.

Additional info:

Comment 1 W. Trevor King 2019-04-29 21:53:10 UTC
Because we don't redistribute pods unless they are evicted, maybe this is just:

1. Ingress requests pods, but we have no compute nodes yet.
2. Machine-API operator creates the first compute node.
3. Scheduler rejoices and drops both router pods on that node.
4. Machine API creates additional nodes, but since the router pods are already scheduled, it's too late for the scheduler to point an ingress pod at them.
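
A rough way to check this ordering (the custom-columns layout is just an illustration) is to compare when the worker nodes were created with when the router pods were created:

$ oc get nodes -l node-role.kubernetes.io/worker -o custom-columns=NAME:.metadata.name,CREATED:.metadata.creationTimestamp
$ oc -n openshift-ingress get pods -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,CREATED:.metadata.creationTimestamp

If only one worker node existed when the router pods were created, the co-location falls out of scheduling order rather than any affinity rule.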

This seems like something that should have a generic Kubernetes rebalancing solution.  I don't know whether one exists, but if not, a short-term fix might be to have your operator monitor for this condition and kill one of the pods when it detects it.
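
A manual version of that workaround might look like the following (the awk column index assumes the default -o wide column layout, and the pod name is a placeholder):

$ oc -n openshift-ingress get pods -o wide --no-headers | awk '{print $7}' | sort | uniq -d
$ oc -n openshift-ingress delete pod <one-of-the-co-located-router-pods>

The first command prints a node name only if both replicas share it; deleting one pod lets the replica set recreate it, and with more nodes available the scheduler can place it elsewhere.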

Comment 3 Hongan Li 2019-05-05 06:03:55 UTC
Verified with 4.1.0-0.nightly-2019-05-04-210601 on AWS; the issue has been fixed.

$ oc get pod -o wide -n openshift-ingress
NAME                              READY   STATUS    RESTARTS   AGE     IP           NODE                                               NOMINATED NODE   READINESS GATES
router-default-75956b9c8d-6bmf2   1/1     Running   0          4h56m   10.128.2.3   ip-172-31-159-55.ap-southeast-1.compute.internal   <none>           <none>
router-default-75956b9c8d-sg4z7   1/1     Running   0          4h56m   10.131.0.4   ip-172-31-169-47.ap-southeast-1.compute.internal   <none>           <none>
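
To confirm what changed in the fixed build, the deployment's affinity stanza can be checked again (same assumption about the deployment name as above):

$ oc -n openshift-ingress get deployment/router-default -o jsonpath='{.spec.template.spec.affinity}'

A non-empty podAntiAffinity stanza here would explain why the two replicas now land on separate nodes.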

Comment 5 errata-xmlrpc 2019-06-04 10:48:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758