Bug 1671136
Summary: | openshift-ingress router-default pods do not tolerate masters | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | W. Trevor King <wking> |
Component: | Networking | Assignee: | Dan Mace <dmace> |
Networking sub component: | router | QA Contact: | Hongan Li <hongli> |
Status: | CLOSED NOTABUG | Docs Contact: | |
Severity: | high | ||
Priority: | unspecified | CC: | aos-bugs, bbennett, ccoleman |
Version: | 4.1.0 | ||
Target Milestone: | --- | ||
Target Release: | 4.1.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2019-03-21 15:39:42 UTC | Type: | Bug |
Description
W. Trevor King
2019-01-30 20:35:22 UTC
Kube explicitly prohibits masters from service load balancer target pools[1]. Given that, even if we allowed the routers to be scheduled on masters, no traffic would make it to them through the provisioned ELB. For other non-cloud platforms (e.g. libvirt), we don't use load balancer services (instead using host-networked routers with no managed LB; something we refer to as 'user defined' high availability). Given all this, should our operator add a master toleration only when using 'user defined' cluster ingress high availability?

[1] https://github.com/kubernetes/kubernetes/issues/65618

> Given all this, should our operator add a master toleration only when using 'user defined' cluster ingress high availability?

Currently we have:
Possibly? If only to set a more-specific ClusterOperator reason "master can't run a useable router" vs. our current "ingress "default" not available":
$ oc get clusteroperator -o yaml openshift-ingress-operator
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  creationTimestamp: 2019-01-30T20:00:05Z
  generation: 1
  name: openshift-ingress-operator
  resourceVersion: "7055"
  selfLink: /apis/config.openshift.io/v1/clusteroperators/openshift-ingress-operator
  uid: a346c6e0-24c9-11e9-8d1a-52fdfc072182
spec: {}
status:
  conditions:
  - lastTransitionTime: 2019-01-30T20:00:05Z
    status: "False"
    type: Failing
  - lastTransitionTime: 2019-01-30T20:00:05Z
    status: "False"
    type: Progressing
  - lastTransitionTime: 2019-01-30T20:00:05Z
    message: ingress "default" not available
    reason: IngressUnavailable
    status: "False"
    type: Available
  extension: null
  version: 0.0.1
Given that 4.0 is AWS only, I'm marking this a 4.1 bug.

> Given that 4.0 is AWS only, I'm marking this a 4.1 bug.

Is all-in-one (zero compute nodes) not a target for 4.0? I think resource constraints make a stronger case for that on libvirt, but folks trying to run AWS clusters on the cheap may also be interested in dropping compute nodes. And maybe the Kubernetes issue linked from comment 1 makes zero compute nodes infeasible in the short term anyway. So while punting to future targets may be appropriate, this is fundamentally an issue for all platforms.

I'm going to close this one, because:

1. Our default is consistent with upstream. When publishing an ingress controller with a LoadBalancer Service, masters are excluded from LB target pools by design in k8s. To change this assumption, I think we should take the discussion upstream.
2. Our defaults can be overridden. Admins can control ingress controller scheduling via .spec.nodePlacement. If someone wants to schedule ingress controllers on masters or non-Linux hosts, they can. We just won't by default.

Please feel free to re-open if you feel closing this was a mistake!

Today we landed [1], which should allow ingress/routing on the control-plane machines if you have no compute nodes. I'm not entirely clear on what happens when you have a single compute node; are we still prohibiting colocation ([2], bug 1703943)? We might be stuck there without schedulable control-plane machines (because we have a compute node), but without enough compute nodes for the full ingress deployment.

[1]: https://github.com/openshift/installer/pull/2004
[2]: https://github.com/openshift/cluster-ingress-operator/pull/222
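
For readers wondering what the .spec.nodePlacement override mentioned in the closing rationale might look like, here is a minimal sketch. It is not quoted from this bug: it assumes the operator.openshift.io/v1 IngressController resource named "default" in the openshift-ingress-operator namespace, and the conventional node-role.kubernetes.io/master label and NoSchedule taint on control-plane machines. Verify the API group, field names, and labels against your cluster version before using it.

```yaml
# Sketch only: the resource name, namespace, label, and taint key below are
# assumptions, not values taken from this bug report.
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  nodePlacement:
    # Select control-plane machines instead of the default worker nodes.
    nodeSelector:
      matchLabels:
        node-role.kubernetes.io/master: ""
    # Tolerate the master taint so router pods can be scheduled there.
    tolerations:
    - key: node-role.kubernetes.io/master
      operator: Exists
      effect: NoSchedule
```

As the first closing reason notes, scheduling routers on masters does not by itself make them reachable through a LoadBalancer Service, because masters are excluded from LB target pools; an override like this is mainly useful for host-networked ('user defined') setups.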