Bug 1794839 - Service load balancers cannot be used with pods on Azure master nodes
Summary: Service load balancers cannot be used with pods on Azure master nodes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.5.0
Assignee: Clayton Coleman
QA Contact: Hongan Li
URL:
Whiteboard:
Duplicates: 1812662 1818023 1820800 1830293 (view as bug list)
Depends On:
Blocks: 1812662 1818023
 
Reported: 2020-01-24 20:12 UTC by Cesar Wong
Modified: 2022-08-04 22:27 UTC
CC: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Azure only allows a node's network interface card (NIC) to be associated with a single load balancer at any point in time. On master nodes, the NIC was associated with the load balancer for the API, which prevented it from being associated with an additional load balancer for "LoadBalancer"-type Services.
Consequence: Service load balancers could not include Azure master nodes. In particular, this broke ingress on compact clusters, where the worker nodes (which host the ingress controller's pod replicas) are also master nodes.
Fix: The installer was changed to create a unified load balancer and network security group that are used for both the API and "LoadBalancer"-type Services.
Result: Service load balancers can include master nodes on Azure, and ingress works on compact clusters.
Clone Of:
Environment:
Last Closed: 2020-07-13 17:13:28 UTC
Target Upstream Version:
Embargoed:


Links:
- GitHub: openshift/installer pull 3440 (closed) - Bug 1794839: Azure masters should correctly support service load balancers (last updated 2021-01-14 00:34:31 UTC)
- GitHub: openshift/installer pull 3561 (closed) - Bug 1794839: data/azure: Use a single network security group for Azure clusters (last updated 2021-01-14 00:34:33 UTC)
- Red Hat Product Errata: RHBA-2020:2409 (last updated 2020-07-13 17:13:57 UTC)

Description Cesar Wong 2020-01-24 20:12:23 UTC
Description of problem:

Most tests that require the oauth server failed because the server does not seem to be running:

Jan 24 18:41:44.926: INFO: OAuth server pod is not ready: 
Container statuses: ([]v1.ContainerStatus) (len=1 cap=1) {
 (v1.ContainerStatus) &ContainerStatus{Name:oauth-server,State:ContainerState{Waiting:nil,Running:&ContainerStateRunning{StartedAt:2020-01-24 18:41:33 +0000 UTC,},Terminated:nil,},LastTerminationState:ContainerState{Waiting:nil,Running:nil,Terminated:nil,},Ready:false,RestartCount:0,Image:registry.svc.ci.openshift.org/ocp/4.3-2020-01-21-121240@sha256:98ebde80813be5465888692fb5d699eef4438df29dd7f376a60668204755f243,ImageID:registry.svc.ci.openshift.org/ocp/4.3-2020-01-21-121240@sha256:98ebde80813be5465888692fb5d699eef4438df29dd7f376a60668204755f243,ContainerID:cri-o://f13b967b807d28a6ee7d77e044237aee7709deb2e7b2f24535159c8ce5a0e8b9,Started:*true,}
}

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-compact-4.3/94

Comment 1 Standa Laznicka 2020-01-30 13:46:15 UTC
From the logs I can see that the oauth-server pod actually came to life and was responding to health checks, but the linked tests fail with i/o timeouts while trying to reach the server via its route. Moving to routing.
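
(For reference, one way to check the route path specifically, as opposed to the pod, is to hit the route's host directly. The route and namespace names below are the usual OCP 4.x defaults; treat the exact commands as an illustrative sketch.)

$ oc get route oauth-openshift -n openshift-authentication
$ # Pod healthy but route timing out points at the path in front of the pod:
$ curl -kv https://$(oc get route oauth-openshift -n openshift-authentication -o jsonpath='{.spec.host}')/healthz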

Comment 2 Dan Mace 2020-01-30 16:41:40 UTC
Took a quick look through the logs; nothing stands out yet. Highly unlikely this is a release blocker.

Comment 3 Dan Mace 2020-02-18 16:46:18 UTC
Had a chance today to reproduce this. Looks like the LBs created by K8s for LoadBalancer services on Azure have empty backend pools, which means ingress is totally broken in this topology and probably always has been. There were other fixes upstream in the service controller and GCP cloud provider to support the topology, but apparently no work was done for Azure.

Need to investigate the Azure cloud provider implementation.
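
(A quick way to confirm the empty pools, assuming the az CLI is available; the resource group and LB names are illustrative:)

$ az network lb list -g <cluster-resource-group> -o table
$ az network lb address-pool list -g <cluster-resource-group> --lb-name <lb-name> -o table
$ # A LoadBalancer Service's pool showing zero backend IP configurations matches the symptom above.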

Comment 4 Dan Mace 2020-02-18 16:49:53 UTC
Looks like the Azure cloud provider uses an `excludeMasterFromStandardLB` cloud provider configuration key to decide whether masters should be excluded, and the default is `true`.
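
(In OCP the provider config is kept in a config map; a sketch of how to inspect it follows. The JSON around the key is illustrative, not a verbatim dump.)

$ oc get cm cloud-provider-config -n openshift-config -o jsonpath='{.data.config}'
{
  ...
  "loadBalancerSku": "standard",
  "excludeMasterFromStandardLB": true,
  ...
}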

Comment 5 Dan Mace 2020-02-20 22:01:11 UTC
I think the Azure cloud provider code upstream needs some refactoring to honor LegacyNodeRoleBehavior (specifically, by only honoring node role labels when LegacyNodeRoleBehavior=True). I'm not sure we're going to have time to take that on for the release, but I'll leave the issue in 4.4 for now until I've had a chance to talk over my findings with some others on the team.

Since Azure is still tech preview, we won't block the release on a fix for this topology.

Comment 6 Clayton Coleman 2020-04-11 06:10:39 UTC
I think I found both the root cause and solution.

The Azure cloud provider expects to create a load balancer and backend pool for all eligible nodes and then steer traffic to it. Upstream thinks it is filtering out master nodes, but it's using the old label (kubernetes.io/role), which we don't set and which no one is supposed to use anyway (all the upstream code for filtering on the correct label was removed; I'll track removing it). The controller reconciles and only adds entries, so it is non-disruptive to any other rules on the LB.
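
(The mismatch is visible directly on the nodes; `-L` prints each label as a column. In an OCP cluster the kubernetes.io/role column comes back empty while node-role.kubernetes.io/master is set. Illustrative check:)

$ oc get nodes -L kubernetes.io/role -L node-role.kubernetes.io/master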

This is important because in Azure a single NIC can only be attached to one load balancer at a time. Fortunately, Azure load balancers are designed to support multiple frontends and backends, and health checks (and thus pool membership) are determined per frontend, so you can have both public API server traffic and service load balancer traffic on the same backend pool and the same Azure LB, just with different frontend IPs. Since in the long run we are looking to always use SLB even for the kube-apiserver, we can leverage that behavior on Azure by renaming the LB we create to just "cluster_name", and the cloud controller will automatically keep all nodes in the pool. The current health check on 6443 for the apiserver will filter the list of nodes down to the masters (although if someone listens on port 6443 on a node they could potentially DoS the frontend by injecting endpoints, those endpoints would not be able to impersonate a master without the TLS cert; this is currently a problem with the router on any cloud). I think this needs a bit of discussion but is reasonable on the surface.
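
(A sketch of the resulting shape in az CLI terms; the names and the probe/rule parameters are illustrative, not what the installer literally runs:)

$ # One standard LB and one backend pool shared by all nodes:
$ az network lb create -g <rg> -n <cluster_name> --sku Standard --backend-pool-name <cluster_name>
$ # The API frontend gets its own probe on 6443, so only masters pass its health check:
$ az network lb probe create -g <rg> --lb-name <cluster_name> -n api-probe --protocol Https --port 6443 --path /readyz
$ az network lb rule create -g <rg> --lb-name <cluster_name> -n api --protocol Tcp --frontend-port 6443 --backend-port 6443 --probe-name api-probe
$ # Each LoadBalancer Service later adds its own frontend IP, rule, and probe against the same pool.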

PR will be opened and I'll start discussion with ARO.

Comment 7 Sam Batschelet 2020-04-15 21:18:41 UTC
*** Bug 1820800 has been marked as a duplicate of this bug. ***

Comment 10 Hongan Li 2020-05-06 07:37:09 UTC
Checked the latest CI and still see many failed OAuth server tests.

fail [github.com/openshift/origin/test/extended/oauth/expiration.go:30]: Unexpected error:
    <*errors.errorString | 0xc0001d4970>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
occurred

see also: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-compact-4.5/77


Checked the CI for compact clusters on AWS, Azure, and GCP (links below); only the Azure jobs show this error.
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.5-informing#release-openshift-origin-installer-e2e-azure-compact-4.5

https://testgrid.k8s.io/redhat-openshift-ocp-release-4.5-informing#release-openshift-origin-installer-e2e-aws-compact-4.5

https://testgrid.k8s.io/redhat-openshift-ocp-release-4.5-informing#release-openshift-origin-installer-e2e-gcp-compact-4.5

Comment 12 Ben Bennett 2020-05-08 20:03:02 UTC
*** Bug 1818023 has been marked as a duplicate of this bug. ***

Comment 13 Hongan Li 2020-05-09 05:05:10 UTC
Checked the latest CI and the OAuth server tests pass.

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-compact-4.5/80

Also confirmed that ingress works in the three-node (compact) cluster:
$ oc get node
NAME                          STATUS   ROLES           AGE    VERSION
hongli-pl601-hvmwm-master-0   Ready    master,worker   122m   v1.18.0-rc.1
hongli-pl601-hvmwm-master-1   Ready    master,worker   122m   v1.18.0-rc.1
hongli-pl601-hvmwm-master-2   Ready    master,worker   122m   v1.18.0-rc.1

$ oc get co ingress authentication console
NAME             VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
ingress          4.5.0-0.nightly-2020-05-08-222601   True        False         False      67m
NAME             VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication   4.5.0-0.nightly-2020-05-08-222601   True        False         False      97m
NAME             VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
console          4.5.0-0.nightly-2020-05-08-222601   True        False         False      83m
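
(An end-to-end sketch for checking Service load balancers themselves; the project name and image are placeholders:)

$ oc new-project lb-check
$ oc create deployment hello --image=<any-http-server-image>
$ oc expose deployment hello --type=LoadBalancer --port=80 --target-port=8080
$ oc get svc hello        # wait for EXTERNAL-IP to be assigned
$ curl http://<external-ip>/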

Comment 14 Daneyon Hansen 2020-05-21 17:05:56 UTC
*** Bug 1812662 has been marked as a duplicate of this bug. ***

Comment 16 errata-xmlrpc 2020-07-13 17:13:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

Comment 17 Sam Batschelet 2020-08-17 21:34:49 UTC
*** Bug 1830293 has been marked as a duplicate of this bug. ***

