Bug 1671136 - openshift-ingress router-default pods do not tolerate masters
Summary: openshift-ingress router-default pods do not tolerate masters
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.1.0
Assignee: Dan Mace
QA Contact: Hongan Li
Depends On:
Reported: 2019-01-30 20:35 UTC by W. Trevor King
Modified: 2019-07-30 22:22 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2019-03-21 15:39:42 UTC
Target Upstream Version:

Description W. Trevor King 2019-01-30 20:35:22 UTC
Description of problem:

The router pod should tolerate masters, like every CVO-managed component without a special exception, but it does not.

Version-Release number of selected component (if applicable):

$ KUBECONFIG=wking/auth/kubeconfig oc adm release info --commits | grep ingress
  cluster-ingress-operator                      https://github.com/openshift/cluster-ingress-operator                      9478e28af89922fa4d54389b1ae8ae6fafb2662b

How reproducible:

Every time.

Steps to Reproduce:

1. Break your Machine API provider, e.g. by running libvirt with a non-standard volume pool before [1] lands.
2. Launch a cluster.
3. Wait for things to stabilize.


$ oc get pods --all-namespaces | grep Pending
openshift-ingress                            router-default-7688479d99-nbnj8                            0/1       Pending     0          31m
openshift-monitoring                         prometheus-operator-647d84b5c6-rsplb                       0/1       Pending     0          31m
openshift-operator-lifecycle-manager         olm-operators-sf5sm                                        0/1       Pending     0          36m
$ oc get pod -o "jsonpath={.status.conditions}{'\n'}" -n openshift-ingress router-default-7688479d99-nbnj8
[map[type:PodScheduled status:False lastProbeTime:<nil> lastTransitionTime:2019-01-30T20:00:04Z reason:Unschedulable message:0/1 nodes are available: 1 node(s) didn't match node selector.]]
$ oc get -o yaml deployment -n openshift-ingress router-default
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: 2019-01-30T20:00:04Z
  generation: 1
  labels:
    app: router
  name: router-default
  namespace: openshift-ingress
  resourceVersion: "12646"
  selfLink: /apis/extensions/v1beta1/namespaces/openshift-ingress/deployments/router-default
  uid: a2a9a529-24c9-11e9-8d1a-52fdfc072182
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: router
      router: router-default
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: router
        router: router-default
    spec:
      containers:
      - env:
        - name: STATS_PORT
          value: "1936"
          value: openshift-ingress
          value: /etc/pki/tls/private
        - name: ROUTER_SERVICE_NAME
          value: default
          value: apps.wking.installer.testing
        image: registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-01-30-150036@sha256:6991fb24697317cb8a1b8a4cfd129d77d05a199f382a4c5ba7eae7ad55bb386b
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            host: localhost
            path: /healthz
            port: 1936
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: router
        ports:
        - containerPort: 80
          hostPort: 80
          name: http
          protocol: TCP
        - containerPort: 443
          hostPort: 443
          name: https
          protocol: TCP
        - containerPort: 1936
          hostPort: 1936
          name: stats
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            host: localhost
            path: /healthz
            port: 1936
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/pki/tls/private
          name: default-certificate
          readOnly: true
      dnsPolicy: ClusterFirst
      hostNetwork: true
      nodeSelector:
        node-role.kubernetes.io/worker: ""
      priorityClassName: system-cluster-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: router
      serviceAccountName: router
      terminationGracePeriodSeconds: 30
      volumes:
      - name: default-certificate
        secret:
          defaultMode: 420
          secretName: router-certs-default
status:
  conditions:
  - lastTransitionTime: 2019-01-30T20:00:04Z
    lastUpdateTime: 2019-01-30T20:00:04Z
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: 2019-01-30T20:10:05Z
    lastUpdateTime: 2019-01-30T20:10:05Z
    message: ReplicaSet "router-default-7688479d99" has timed out progressing.
    reason: ProgressDeadlineExceeded
    status: "False"
    type: Progressing
  observedGeneration: 1
  replicas: 1
  unavailableReplicas: 1
  updatedReplicas: 1

Actual results:

Pending pod with "0/1 nodes are available: 1 node(s) didn't match node selector."
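
The scheduling failure can be modeled with a minimal sketch. This is a simplification for illustration, not the Kubernetes scheduler's code: the `Node` and `Pod` classes and the `schedulable` helper are hypothetical names, tolerations are matched by exact key and effect only, and the master taint shown is assumed rather than taken from this cluster.

```python
from dataclasses import dataclass, field

MASTER = "node-role.kubernetes.io/master"
WORKER = "node-role.kubernetes.io/worker"


@dataclass
class Node:
    labels: dict
    taints: list = field(default_factory=list)  # (key, effect) pairs


@dataclass
class Pod:
    node_selector: dict = field(default_factory=dict)
    tolerations: list = field(default_factory=list)  # (key, effect) pairs


def selector_matches(pod, node):
    """Every nodeSelector entry must appear with the same value in the node's labels."""
    return all(node.labels.get(k) == v for k, v in pod.node_selector.items())


def taints_tolerated(pod, node):
    """Every NoSchedule taint must be covered by a toleration (exact match only)."""
    return all(t in pod.tolerations for t in node.taints if t[1] == "NoSchedule")


def schedulable(pod, node):
    return selector_matches(pod, node) and taints_tolerated(pod, node)


# A cluster with a single master and no workers, as in this report.
# The NoSchedule taint is an assumption about how the master is configured.
master = Node(labels={MASTER: ""}, taints=[(MASTER, "NoSchedule")])

# The router deployment selects worker nodes and has no tolerations.
router = Pod(node_selector={WORKER: ""})
print(schedulable(router, master))  # False

# A pod that selects masters and tolerates the master taint would fit.
relaxed = Pod(node_selector={MASTER: ""}, tolerations=[(MASTER, "NoSchedule")])
print(schedulable(relaxed, master))  # True
```

In this model the worker node selector alone is enough to leave the router Pending, which matches the "didn't match node selector" message.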

Expected results:

A running pod.

Additional info:

"high" severity is based on Clayton's request [2].

[1]: https://github.com/openshift/cluster-api-provider-libvirt/pull/45
[2]: https://github.com/openshift/installer/pull/1146#issuecomment-459037176

Comment 1 Dan Mace 2019-01-30 20:45:33 UTC
Kube explicitly prohibits masters from service load balancer target pools[1]. Given that, even if we allowed the routers to be scheduled on masters, no traffic would make it to them through the provisioned ELB. For other non-cloud platforms (e.g. libvirt), we don't use load balancer services (instead using host-networked routers with no managed LB; something we refer to as 'user defined' high availability).

Given all this, should our operator add a master toleration only when using 'user defined' cluster ingress high availability?

[1] https://github.com/kubernetes/kubernetes/issues/65618
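
A rough sketch of why a router on a master would receive no traffic through the load balancer. It assumes the exclusion is keyed on the `node-role.kubernetes.io/master` label, which simplifies the behavior discussed in the linked issue; `lb_target_pool` and `reachable_routers` are hypothetical helpers, not Kubernetes functions.

```python
MASTER_LABEL = "node-role.kubernetes.io/master"


def lb_target_pool(nodes):
    """Return the names of nodes eligible as load balancer targets (masters are skipped)."""
    return [name for name, labels in nodes.items() if MASTER_LABEL not in labels]


def reachable_routers(router_nodes, nodes):
    """Keep only the router hosts that the load balancer would actually forward to."""
    pool = set(lb_target_pool(nodes))
    return [name for name in router_nodes if name in pool]


nodes = {
    "master-0": {MASTER_LABEL: ""},
    "worker-0": {"node-role.kubernetes.io/worker": ""},
}

print(lb_target_pool(nodes))                               # ['worker-0']
print(reachable_routers(["master-0", "worker-0"], nodes))  # ['worker-0']
```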

Comment 2 W. Trevor King 2019-01-30 21:41:20 UTC
> Given all this, should our operator add a master toleration only when using 'user defined' cluster ingress high availability?

Possibly?  If only to set a more specific ClusterOperator reason, such as "master can't run a usable router", instead of our current 'ingress "default" not available'.  Currently we have:

$ oc get clusteroperator -o yaml openshift-ingress-operator
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  creationTimestamp: 2019-01-30T20:00:05Z
  generation: 1
  name: openshift-ingress-operator
  resourceVersion: "7055"
  selfLink: /apis/config.openshift.io/v1/clusteroperators/openshift-ingress-operator
  uid: a346c6e0-24c9-11e9-8d1a-52fdfc072182
spec: {}
status:
  conditions:
  - lastTransitionTime: 2019-01-30T20:00:05Z
    status: "False"
    type: Failing
  - lastTransitionTime: 2019-01-30T20:00:05Z
    status: "False"
    type: Progressing
  - lastTransitionTime: 2019-01-30T20:00:05Z
    message: ingress "default" not available
    reason: IngressUnavailable
    status: "False"
    type: Available
  extension: null
  version: 0.0.1

Comment 3 Ben Bennett 2019-01-31 19:09:14 UTC
Given that 4.0 is AWS only, I'm marking this a 4.1 bug.

Comment 4 W. Trevor King 2019-01-31 19:16:19 UTC
> Given that 4.0 is AWS only, I'm marking this a 4.1 bug.

Is all-in-one (zero compute nodes) not a target for 4.0?  I think resource constraints make a stronger case for that on libvirt, but folks trying to run AWS clusters on the cheap may also be interested in dropping compute nodes.  And maybe the Kubernetes issue linked from comment 1 makes zero compute nodes infeasible in the short term anyway.  So while punting to future targets may be appropriate, this is fundamentally an issue for all platforms.

Comment 5 Dan Mace 2019-03-21 15:39:42 UTC
I'm going to close this one, because:

1. Our default is consistent with upstream. When publishing an ingress controller with a LoadBalancer Service, masters are excluded from LB target pools by design in k8s. To change this assumption, I think we should take the discussion upstream.
2. Our defaults can be overridden. Admins can control ingress controller scheduling via .spec.nodePlacement. If someone wants to schedule ingress controllers on masters or non-Linux hosts, they can. We just won't by default.

Please feel free to re-open if you feel closing this was a mistake!
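
As a hedged illustration of the `.spec.nodePlacement` override mentioned in point 2, the sketch below builds a merge-patch body that would place routers on masters. The field names under `nodePlacement` (`nodeSelector.matchLabels`, `tolerations`) are assumed to follow the ingress controller API and should be checked against the CRD on the cluster in question; `master_placement_patch` is a hypothetical helper.

```python
import json

MASTER = "node-role.kubernetes.io/master"


def master_placement_patch():
    """Build a merge-patch body that selects master nodes and tolerates the master taint."""
    return {
        "spec": {
            "nodePlacement": {
                # Assumed shape: a label selector with matchLabels.
                "nodeSelector": {"matchLabels": {MASTER: ""}},
                # Standard Kubernetes toleration fields.
                "tolerations": [
                    {"key": MASTER, "operator": "Exists", "effect": "NoSchedule"}
                ],
            }
        }
    }


print(json.dumps(master_placement_patch(), indent=2))
```

The printed JSON could then be applied with a merge patch (for example via `oc patch --type=merge`) against the cluster's ingress controller resource.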

Comment 6 W. Trevor King 2019-07-30 22:22:22 UTC
Today we landed [1], which should allow ingress/routing on the control-plane machines if you have no compute nodes.  I'm not entirely clear on what happens when you have a single compute node; are we still prohibiting colocation ([2], bug 1703943)?  We might be stuck there without schedulable control-plane machines (because we have a compute node), but without enough compute nodes for the full ingress deployment.

[1]: https://github.com/openshift/installer/pull/2004
[2]: https://github.com/openshift/cluster-ingress-operator/pull/222
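
The single-compute-node concern can be put as a toy model. Everything here is an assumption made for illustration: control-plane machines are treated as schedulable only when there are zero compute nodes, routers are never colocated, and the desired replica count of 2 is hypothetical. Neither function reflects actual installer or operator code.

```python
def schedulable_nodes(control_plane, compute):
    """Control-plane machines count as schedulable only when there are no compute nodes."""
    return compute if compute > 0 else control_plane


def placed_router_replicas(desired, control_plane, compute):
    """With at most one router per node, placement is capped by the schedulable node count."""
    return min(desired, schedulable_nodes(control_plane, compute))


print(placed_router_replicas(2, 3, 0))  # 2: no compute nodes, routers land on the control plane
print(placed_router_replicas(2, 3, 1))  # 1: one compute node, second replica stays Pending
print(placed_router_replicas(2, 3, 2))  # 2: enough compute nodes
```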