1908389 – Loadbalancer Sync failing on Azure

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.7
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Target Release:	4.7.0
Assignee:	Stephen Greene
QA Contact:	Hongan Li
Docs Contact:
URL:
Whiteboard:
Duplicates (3):	1908052 1908489 1909006
Depends On:
Blocks:

Reported:	2020-12-16 15:19 UTC by Fabian von Feilitzsch
Modified:	2022-08-04 22:30 UTC
CC List:	19 users
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:	operator conditions authentication; operator conditions console; operator conditions ingress; operator install authentication; operator install console; operator install ingress
Last Closed:	2021-02-24 15:45:23 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	kubernetes kubernetes issues 97352	None	closed	Azure LB NIC regex too restrictive	2021-02-18 09:46:16 UTC
Github	kubernetes kubernetes issues 97375	None	closed	Azure LB Availability Set assumptions too restrictive	2021-02-18 09:46:16 UTC
Github	openshift kubernetes pull 500	None	closed	Bug 1908389: UPSTREAM: 97635: Cherry pick 443 and 448 from cloud provider azure	2021-02-18 09:46:16 UTC
Red Hat Product Errata	RHSA-2020:5633	None	None	None	2021-02-24 15:45:41 UTC

Description Fabian von Feilitzsch 2020-12-16 15:19:44 UTC

Description of problem:
Ingress failing on Azure with 'SyncLoadBalancerFailed'
Azure cluster setup fails because ingress is broken. KCM reports:

level=error msg=Cluster operator ingress Degraded is True with IngressControllersDegraded: Some ingresscontrollers are degraded: ingresscontroller "default" is degraded: DegradedConditions: One or more other status conditions indicate a degraded state: LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: invalid ip config ID /subscriptions/d38f1e38-4bed-438e-b227-833f997adf6a/resourceGroups/ci-op-xsr7hy3v-9b656-xrfd2-rg/providers/Microsoft.Network/networkInterfaces/ci-op-xsr7hy3v-9b656-xrfd2-master0-nic/ipConfigurations/pipConfig
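
For illustration only, here is a minimal Python sketch of how an overly strict NIC name pattern could reject the ID shown in the error above, in line with the linked upstream issue title "Azure LB NIC regex too restrictive". Both patterns are hypothetical and written for this example; they are not copied from the cloud provider code, and the assumed "<vm>-nic-<suffix>" naming convention is a guess at the kind of assumption involved.

```python
import re

# ID copied from the error message in this report.
IP_CONFIG_ID = (
    "/subscriptions/d38f1e38-4bed-438e-b227-833f997adf6a"
    "/resourceGroups/ci-op-xsr7hy3v-9b656-xrfd2-rg"
    "/providers/Microsoft.Network/networkInterfaces"
    "/ci-op-xsr7hy3v-9b656-xrfd2-master0-nic/ipConfigurations/pipConfig"
)

# Hypothetical strict pattern: assumes every NIC is named "<vm>-nic-<suffix>".
STRICT_NIC_RE = re.compile(
    r"/subscriptions/[^/]+/resourceGroups/[^/]+/providers/Microsoft\.Network"
    r"/networkInterfaces/(?P<vm>.+)-nic-(?P<suffix>[^/]+)/ipConfigurations/[^/]+$"
)

# Hypothetical relaxed pattern: accepts any NIC name.
RELAXED_NIC_RE = re.compile(
    r"/subscriptions/[^/]+/resourceGroups/(?P<rg>[^/]+)/providers/Microsoft\.Network"
    r"/networkInterfaces/(?P<nic>[^/]+)/ipConfigurations/(?P<ipconfig>[^/]+)$"
)


def parse_ip_config_id(ip_config_id, pattern):
    """Return the named groups for a matching ID, or None if it does not match."""
    match = pattern.match(ip_config_id)
    return match.groupdict() if match else None


# The NIC here is named "...-master0-nic" with no trailing suffix, so the
# strict pattern rejects it while the relaxed pattern parses it.
strict_result = parse_ip_config_id(IP_CONFIG_ID, STRICT_NIC_RE)
relaxed_result = parse_ip_config_id(IP_CONFIG_ID, RELAXED_NIC_RE)
```

A parser that treats a failed match as a hard error would surface exactly this kind of "invalid ip config ID" failure for every NIC that does not follow the assumed naming scheme.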


Version-Release number of selected component (if applicable):
4.7

How reproducible:
100%

Additional info:
Example failing job: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-4.7/1339191619219361792

https://search.ci.openshift.org/?search=failed+to+ensure+load+balancer%3A+invalid+ip+config+ID&maxAge=336h&context=1&type=bug%2Bjunit&name=azure&maxMatches=5&maxBytes=20971520&groupBy=job

First appeared shortly after the 1.20 rebase: https://github.com/openshift/kubernetes/pull/471#event-4110268165

Comment 1 Maciej Szulik 2020-12-16 15:26:26 UTC

Sending this over to the network team, who own the ingress operator, to identify what is missing and needs updating after the k8s 1.20 rebase.

Comment 2 aaleman 2020-12-16 16:10:22 UTC

*** Bug 1908052 has been marked as a duplicate of this bug. ***

Comment 6 Michael Gugino 2020-12-17 15:44:46 UTC

We created an issue upstream: https://github.com/kubernetes/kubernetes/issues/97352

The person who introduced the breaking change has assigned themselves. Not sure about the timetable; we might want a patch to land downstream first, with an upstream fix hopefully in the works.

Comment 12 W. Trevor King 2020-12-18 01:15:57 UTC

I've filed [1] upstream with the Availability Set issue.

[1]: https://github.com/kubernetes/kubernetes/issues/97375

Comment 13 Stephen Greene 2020-12-18 16:35:39 UTC

*** Bug 1909006 has been marked as a duplicate of this bug. ***

Comment 15 Scott Dodson 2021-01-05 14:16:29 UTC

*** Bug 1908489 has been marked as a duplicate of this bug. ***

Comment 16 Haseeb Tariq 2021-01-06 22:30:47 UTC

Commenting for the benefit of build watchers and Sippy, to link this BZ to the following tests, which are currently failing because cluster installation fails in Azure environments:
- operator conditions authentication
- operator conditions console
- operator conditions ingress
- operator install authentication
- operator install console
- operator install ingress

Latest failure: https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-azure-4.7/1346875212452335616

Comment 18 Hongan Li 2021-01-07 12:03:56 UTC

Verified with 4.7.0-0.nightly-2021-01-07-080803; passed.

# oc -n openshift-ingress get svc
NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)                      AGE
router-default            LoadBalancer   172.30.104.234   52.252.144.92   80:32233/TCP,443:32292/TCP   36m
router-internal-default   ClusterIP      172.30.208.255   <none>          80/TCP,443/TCP,1936/TCP      36m

# oc get co/ingress
NAME      VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
ingress   4.7.0-0.nightly-2021-01-07-080803   True        False         False      30m

Creating a custom ingresscontroller also works:
# oc -n openshift-ingress get svc
NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)                      AGE
router-default            LoadBalancer   172.30.104.234   52.252.144.92   80:32233/TCP,443:32292/TCP   38m
router-internal-default   ClusterIP      172.30.208.255   <none>          80/TCP,443/TCP,1936/TCP      38m
router-internal-test      ClusterIP      172.30.211.21    <none>          80/TCP,443/TCP,1936/TCP      53s
router-test               LoadBalancer   172.30.211.65    10.0.32.7       80:30966/TCP,443:32636/TCP   53s
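
The manual check above could be scripted roughly as follows. This is a sketch in Python that only parses the text output pasted in this comment; the function name `pending_load_balancers` is hypothetical, and it assumes the default column layout with no spaces inside any cell.

```python
# Output copied from the verification comment above.
SVC_OUTPUT = """\
NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)                      AGE
router-default            LoadBalancer   172.30.104.234   52.252.144.92   80:32233/TCP,443:32292/TCP   38m
router-internal-default   ClusterIP      172.30.208.255   <none>          80/TCP,443/TCP,1936/TCP      38m
router-internal-test      ClusterIP      172.30.211.21    <none>          80/TCP,443/TCP,1936/TCP      53s
router-test               LoadBalancer   172.30.211.65    10.0.32.7       80:30966/TCP,443:32636/TCP   53s
"""


def pending_load_balancers(table_text):
    """Return names of LoadBalancer services that have no external IP yet."""
    lines = table_text.strip().splitlines()
    header = lines[0].split()
    pending = []
    for line in lines[1:]:
        # Pair each whitespace-separated cell with its column name.
        row = dict(zip(header, line.split()))
        if row["TYPE"] == "LoadBalancer" and row["EXTERNAL-IP"] in ("<none>", "<pending>"):
            pending.append(row["NAME"])
    return pending


# Both LoadBalancer services in the pasted output have an address assigned.
still_pending = pending_load_balancers(SVC_OUTPUT)
```

An empty result matches what the comment reports: both `router-default` and `router-test` were assigned an address, so the load balancer sync succeeded for each.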

Comment 21 errata-xmlrpc 2021-02-24 15:45:23 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633
