Bug 1703943
Summary: | router pods are always running on same node in fresh install AWS env | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Hongan Li <hongli> |
Component: | Networking | Assignee: | Dan Mace <dmace> |
Networking sub component: | router | QA Contact: | Hongan Li <hongli> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | medium | ||
Priority: | high | CC: | aos-bugs, bbennett, wking |
Version: | 4.1.0 | ||
Target Milestone: | --- | ||
Target Release: | 4.1.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Last Closed: | 2019-06-04 10:48:10 UTC | Type: | Bug |
Description
Hongan Li
2019-04-29 07:45:43 UTC
Because we don't redistribute pods unless they are evicted, maybe this is just:

1. Ingress requests pods, but we have no compute nodes yet.
2. Machine-API operator creates the first compute node.
3. Scheduler rejoices and drops both router pods on that node.
4. Machine API creates additional nodes, but since the router pods are already scheduled, it's too late for the scheduler to point an ingress pod at them.

This seems like something that should have a generic Kubernetes rebalancing solution. I don't know if one exists or not, but if not, a short-term fix might be having your operator monitor for this and then kill one of the pods if it notices this condition.

Verified with 4.1.0-0.nightly-2019-05-04-210601 on AWS and issue has been fixed.

    $ oc get pod -o wide -n openshift-ingress
    NAME                              READY   STATUS    RESTARTS   AGE     IP           NODE                                               NOMINATED NODE   READINESS GATES
    router-default-75956b9c8d-6bmf2   1/1     Running   0          4h56m   10.128.2.3   ip-172-31-159-55.ap-southeast-1.compute.internal   <none>           <none>
    router-default-75956b9c8d-sg4z7   1/1     Running   0          4h56m   10.131.0.4   ip-172-31-169-47.ap-southeast-1.compute.internal   <none>           <none>

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758
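
For anyone who wants to check a cluster for the condition discussed in this report, the following is a minimal sketch of the "monitor for this" idea from the comments. It is not the fix that shipped in the advisory. It assumes the pod list has been fetched as JSON (for example with `oc get pod -n openshift-ingress -o json`) and relies only on the standard `metadata.name` and `spec.nodeName` pod fields. The function names and the `make_pod` helper are hypothetical, introduced here for illustration.

```python
from collections import defaultdict


def pods_by_node(pod_list):
    """Group pod names by the node they are scheduled on.

    `pod_list` is the parsed JSON of a Kubernetes pod List object.
    """
    nodes = defaultdict(list)
    for item in pod_list.get("items", []):
        node = item.get("spec", {}).get("nodeName")
        if node:  # pods that are not scheduled yet have no nodeName
            nodes[node].append(item["metadata"]["name"])
    return dict(nodes)


def colocated(pod_list):
    """Return only the nodes that host more than one pod from the list."""
    return {
        node: sorted(names)
        for node, names in pods_by_node(pod_list).items()
        if len(names) > 1
    }


def make_pod(name, node):
    """Hypothetical helper: build a minimal pod dict for experimenting."""
    return {"metadata": {"name": name}, "spec": {"nodeName": node}}
```

An empty result from `colocated(json.loads(output))` means every router pod is on its own node, as in the verified output above. A non-empty result names the node carrying both routers; what to do about it (for example deleting one pod so the scheduler places it again) is left to the operator.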