Bug 1908389 - Loadbalancer Sync failing on Azure
Summary: Loadbalancer Sync failing on Azure
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.7.0
Assignee: Stephen Greene
QA Contact: Hongan Li
URL:
Whiteboard:
Duplicates: 1908052 1908489 1909006
Depends On:
Blocks:
Reported: 2020-12-16 15:19 UTC by Fabian von Feilitzsch
Modified: 2022-08-04 22:30 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
operator conditions authentication; operator conditions console; operator conditions ingress; operator install authentication; operator install console; operator install ingress
Last Closed: 2021-02-24 15:45:23 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | kubernetes/kubernetes/issues/97352 | 0 | None | closed | Azure LB NIC regex too restrictive | 2021-02-18 09:46:16 UTC
Github | kubernetes/kubernetes/issues/97375 | 0 | None | closed | Azure LB Availability Set assumptions too restrictive | 2021-02-18 09:46:16 UTC
Github | openshift/kubernetes/pull/500 | 0 | None | closed | Bug 1908389: UPSTREAM: 97635: Cherry pick 443 and 448 from cloud provider azure | 2021-02-18 09:46:16 UTC
Red Hat Product Errata | RHSA-2020:5633 | 0 | None | None | None | 2021-02-24 15:45:41 UTC

Description Fabian von Feilitzsch 2020-12-16 15:19:44 UTC
Description of problem:
Ingress is failing on Azure with 'SyncLoadBalancerFailed'.
Azure cluster setup fails because ingress is broken. KCM reports:

level=error msg=Cluster operator ingress Degraded is True with IngressControllersDegraded: Some ingresscontrollers are degraded: ingresscontroller "default" is degraded: DegradedConditions: One or more other status conditions indicate a degraded state: LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: invalid ip config ID /subscriptions/d38f1e38-4bed-438e-b227-833f997adf6a/resourceGroups/ci-op-xsr7hy3v-9b656-xrfd2-rg/providers/Microsoft.Network/networkInterfaces/ci-op-xsr7hy3v-9b656-xrfd2-master0-nic/ipConfigurations/pipConfig
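The linked upstream issue is titled "Azure LB NIC regex too restrictive". As a minimal sketch of that failure mode, the Python below matches the ip config ID from the event above against two patterns. Both patterns and the helper name `parse_ip_config_id` are hypothetical illustrations, not the Azure cloud provider's actual code: the strict one assumes NIC names of the form `<vm>-nic-<suffix>`, the permissive one accepts any NIC name.

```python
import re

# The ID reported in the SyncLoadBalancerFailed event above.
IP_CONFIG_ID = (
    "/subscriptions/d38f1e38-4bed-438e-b227-833f997adf6a"
    "/resourceGroups/ci-op-xsr7hy3v-9b656-xrfd2-rg"
    "/providers/Microsoft.Network/networkInterfaces"
    "/ci-op-xsr7hy3v-9b656-xrfd2-master0-nic"
    "/ipConfigurations/pipConfig"
)

# Hypothetical strict pattern: assumes every NIC is named "<vm>-nic-<suffix>".
STRICT_NIC_RE = re.compile(
    r"/subscriptions/[^/]+/resourceGroups/[^/]+/providers/Microsoft\.Network"
    r"/networkInterfaces/(?P<vm>[^/]+)-nic-(?P<suffix>[^/]+)"
    r"/ipConfigurations/[^/]+$"
)

# Hypothetical permissive pattern: makes no assumption about the NIC name.
PERMISSIVE_NIC_RE = re.compile(
    r"/subscriptions/[^/]+/resourceGroups/(?P<rg>[^/]+)"
    r"/providers/Microsoft\.Network"
    r"/networkInterfaces/(?P<nic>[^/]+)"
    r"/ipConfigurations/(?P<ipconfig>[^/]+)$"
)


def parse_ip_config_id(pattern, ip_config_id):
    """Return the named groups, or raise ValueError if the ID does not match."""
    match = pattern.match(ip_config_id)
    if match is None:
        raise ValueError(f"invalid ip config ID {ip_config_id}")
    return match.groupdict()


if __name__ == "__main__":
    try:
        parse_ip_config_id(STRICT_NIC_RE, IP_CONFIG_ID)
    except ValueError as err:
        print("strict pattern rejects the ID:", err)
    print("permissive pattern accepts it:",
          parse_ip_config_id(PERMISSIVE_NIC_RE, IP_CONFIG_ID))
```

The NIC name here ends in `-nic` with nothing after it, so any pattern that requires a trailing suffix fails even though the ID is a valid Azure resource ID.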


Version-Release number of selected component (if applicable):
4.7

How reproducible:
100%

Additional info:
Example failing job: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-4.7/1339191619219361792

https://search.ci.openshift.org/?search=failed+to+ensure+load+balancer%3A+invalid+ip+config+ID&maxAge=336h&context=1&type=bug%2Bjunit&name=azure&maxMatches=5&maxBytes=20971520&groupBy=job
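For anyone who wants to repeat the CI search with a different error string or job name, here is a small sketch that rebuilds the query URL above. The function name `ci_search_url` is made up for illustration; the parameter names and values are copied from the URL in this report, and nothing beyond that is assumed about the search service.

```python
from urllib.parse import urlencode


def ci_search_url(search, name="azure", max_age="336h"):
    """Build a search.ci.openshift.org query URL with the parameters used above."""
    params = {
        "search": search,
        "maxAge": max_age,
        "context": "1",
        "type": "bug+junit",
        "name": name,
        "maxMatches": "5",
        "maxBytes": "20971520",
        "groupBy": "job",
    }
    # urlencode escapes spaces as "+", ":" as "%3A" and "+" as "%2B".
    return "https://search.ci.openshift.org/?" + urlencode(params)


if __name__ == "__main__":
    print(ci_search_url("failed to ensure load balancer: invalid ip config ID"))
```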

First appeared shortly after the 1.20 rebase: https://github.com/openshift/kubernetes/pull/471#event-4110268165

Comment 1 Maciej Szulik 2020-12-16 15:26:26 UTC
Sending this over to the network team, who own the ingress operator, to identify what is missing and needs updating after getting k8s 1.20.

Comment 2 aaleman 2020-12-16 16:10:22 UTC
*** Bug 1908052 has been marked as a duplicate of this bug. ***

Comment 6 Michael Gugino 2020-12-17 15:44:46 UTC
We created an issue upstream: https://github.com/kubernetes/kubernetes/issues/97352

The person who introduced the breaking change has assigned themselves. The timetable is unclear, so we might want a patch to land downstream first, with an upstream fix hopefully in the works.

Comment 12 W. Trevor King 2020-12-18 01:15:57 UTC
I've filed [1] upstream with the Availability Set issue.

[1]: https://github.com/kubernetes/kubernetes/issues/97375

Comment 13 Stephen Greene 2020-12-18 16:35:39 UTC
*** Bug 1909006 has been marked as a duplicate of this bug. ***

Comment 15 Scott Dodson 2021-01-05 14:16:29 UTC
*** Bug 1908489 has been marked as a duplicate of this bug. ***

Comment 16 Haseeb Tariq 2021-01-06 22:30:47 UTC
Commenting for the benefit of build watchers and Sippy, to link this BZ to the following tests, which are currently failing because cluster installation fails in an Azure environment.
- operator conditions authentication
- operator conditions console
- operator conditions ingress
- operator install authentication
- operator install console
- operator install ingress

Latest failure: https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-azure-4.7/1346875212452335616

Comment 18 Hongan Li 2021-01-07 12:03:56 UTC
Verified with 4.7.0-0.nightly-2021-01-07-080803; passed.

# oc -n openshift-ingress get svc
NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)                      AGE
router-default            LoadBalancer   172.30.104.234   52.252.144.92   80:32233/TCP,443:32292/TCP   36m
router-internal-default   ClusterIP      172.30.208.255   <none>          80/TCP,443/TCP,1936/TCP      36m

# oc get co/ingress
NAME      VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
ingress   4.7.0-0.nightly-2021-01-07-080803   True        False         False      30m

Creating a custom ingresscontroller also works well:
# oc -n openshift-ingress get svc
NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)                      AGE
router-default            LoadBalancer   172.30.104.234   52.252.144.92   80:32233/TCP,443:32292/TCP   38m
router-internal-default   ClusterIP      172.30.208.255   <none>          80/TCP,443/TCP,1936/TCP      38m
router-internal-test      ClusterIP      172.30.211.21    <none>          80/TCP,443/TCP,1936/TCP      53s
router-test               LoadBalancer   172.30.211.65    10.0.32.7       80:30966/TCP,443:32636/TCP   53s
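The check performed by eye above (every LoadBalancer service has an EXTERNAL-IP) could be scripted roughly as follows. This is only a sketch: the helper `services_missing_external_ip` is hypothetical, and it parses the default table output of `oc get svc` as pasted in this comment rather than calling the API.

```python
def services_missing_external_ip(oc_output):
    """Return names of LoadBalancer services whose EXTERNAL-IP is not assigned."""
    lines = [ln for ln in oc_output.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    missing = []
    for line in lines[1:]:
        # Columns are whitespace separated and contain no embedded spaces.
        row = dict(zip(header, line.split()))
        if row["TYPE"] == "LoadBalancer" and row["EXTERNAL-IP"] in ("<none>", "<pending>"):
            missing.append(row["NAME"])
    return missing


# Output copied from the verification above.
VERIFIED_OUTPUT = """
NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)                      AGE
router-default            LoadBalancer   172.30.104.234   52.252.144.92   80:32233/TCP,443:32292/TCP   38m
router-internal-default   ClusterIP      172.30.208.255   <none>          80/TCP,443/TCP,1936/TCP      38m
router-internal-test      ClusterIP      172.30.211.21    <none>          80/TCP,443/TCP,1936/TCP      53s
router-test               LoadBalancer   172.30.211.65    10.0.32.7       80:30966/TCP,443:32636/TCP   53s
"""

if __name__ == "__main__":
    print(services_missing_external_ip(VERIFIED_OUTPUT))
```

ClusterIP services legitimately show `<none>`, so only rows of type LoadBalancer are checked.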

Comment 21 errata-xmlrpc 2021-02-24 15:45:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

