Bug 1744370 - Ingress erroneously reports available on cloud platforms when all nodes are masters
Summary: Ingress erroneously reports available on cloud platforms when all nodes are masters
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.3.0
Assignee: Andrew McDermott
QA Contact: Hongan Li
Depends On:
Reported: 2019-08-22 01:13 UTC by Dan Mace
Modified: 2019-09-24 20:21 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2019-09-24 20:17:09 UTC
Target Upstream Version:


Description Dan Mace 2019-08-22 01:13:49 UTC
Description of problem:

On a cloud platform like AWS and given node topology like:

NAME                                         STATUS   ROLES           AGE   VERSION
ip-10-0-139-134.us-east-2.compute.internal   Ready    master,worker   21h   v1.14.0+17b784327
ip-10-0-157-164.us-east-2.compute.internal   Ready    master,worker   21h   v1.14.0+17b784327
ip-10-0-167-50.us-east-2.compute.internal    Ready    master,worker   21h   v1.14.0+17b784327

the default ingress controller will be scheduled onto the masters even though master nodes will never be registered as ELB instances. This means ingress is effectively broken, but the ingress operator still reports available and not-degraded.

The fact that ingress controllers are scheduled to masters at all when using the LoadBalancer publishing strategy seems like a bug, as Kubernetes won't support the cloud load balancer wiring for those nodes. If the ingress controllers couldn't schedule on masters, the ingress operator would correctly report unavailable.

Version-Release number of selected component (if applicable):


How reproducible:

Steps to Reproduce:

Actual results:

Expected results:

Additional info:

Comment 2 Hongan Li 2019-08-22 09:57:00 UTC
Since the problem always affects certain special installations and misleads other components (e.g. auth and console), I expect we can fix it in 4.2.

Comment 3 Dan Mace 2019-08-22 14:00:08 UTC
Ingress is not supported for the topology under test in 4.2, so I do not agree with a decision to block the release.

I do agree we should add more signals to inform the user that the cluster is in a degraded state, and, stepping back, docs should have set users' expectations in the first place.

Comment 4 Dan Mace 2019-09-24 11:38:19 UTC
One option would be to change the node placement of the default ingresscontroller to explicitly exclude masters when using the LoadBalancer publishing strategy. That would codify the upstream rules.
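The option above could be expressed through the IngressController's nodePlacement field. A minimal sketch of what such a default ingresscontroller might look like, assuming the standard `node-role.kubernetes.io/worker` label is what distinguishes workers (this is an illustration, not the actual proposed fix):

```yaml
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  endpointPublishingStrategy:
    type: LoadBalancerService
  nodePlacement:
    # Restrict router pods to worker nodes so they are never
    # scheduled onto masters that the cloud LB won't target.
    nodeSelector:
      matchLabels:
        node-role.kubernetes.io/worker: ""
```

On the topology from the description, where every node carries both the master and worker roles, this selector alone would still match all nodes, so the operator would presumably also need an explicit anti-master rule to fully codify the upstream behavior.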

Comment 5 Clayton Coleman 2019-09-24 20:02:55 UTC
Note that in 4.3 we will allow this, so don't do anything that complicates 4.3

Comment 6 Dan Mace 2019-09-24 20:17:09 UTC
https://github.com/kubernetes/kubernetes/pull/80238 and https://github.com/openshift/cluster-kube-apiserver-operator/pull/572 will finally enable this use case, so going forward the current behavior won't actually be a bug.
