1909006 – [OCP4.7] Installation fails in Azure environment with "Error syncing load balancer: failed to ensure load balancer: invalid ip config ID" errors

Keywords:
Status:	CLOSED DUPLICATE of bug 1908389
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.7
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	urgent
Target Milestone:	---
Target Release:	---
Assignee:	aos-network-edge-staff
QA Contact:	Hongan Li
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-12-18 07:07 UTC by Arvind iyengar
Modified:	2022-08-04 22:30 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-12-18 16:35:40 UTC
Target Upstream Version:
Embargoed:

Attachments
Reference must-gather data from one of the failing clusters (11.90 MB, application/x-bzip) 2020-12-18 08:01 UTC, Arvind iyengar

Description Arvind iyengar 2020-12-18 07:07:59 UTC

Description of problem:
Installation in the Azure environment fails. The failure occurs consistently across multiple installation attempts with different nightly images. The following error is the one most commonly seen in the ingress controller deployment logs:
-----
2020-12-18T01:49:09.371Z        INFO    operator.ingress_controller     controller/controller.go:235    reconciling     {"request": "openshift-ingress-operator/default"}
2020-12-18T01:49:09.568Z        ERROR   operator.ingress_controller     controller/controller.go:235    got retryable error; requeueing {"after": "1m0s", "error": "IngressController is degraded: LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: invalid ip config ID /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/xxia18az-djhkb-rg/providers/Microsoft.Network/networkInterfaces/xxia18az-djhkb-master0-nic/ipConfigurations/pipConfig\nThe kube-controller-manager logs may contain more details.), CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)"}
-----

The problem is not seen in other environments such as AWS or GCP.

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2020-12-17-224915
4.7.0-0.nightly-2020-12-17-201522


How reproducible:
Frequently

Steps to Reproduce:
1. Deploy a cluster in the Azure environment using the latest OCP 4.7 nightly images.

Actual results:
The deployment fails, and the following errors appear in the ingress operator logs:
-----
2020-12-18T04:37:18.108Z        ERROR   operator.ingress_controller     controller/controller.go:235    got retryable error; requeueing {"after": "48.906300155s", "error": "IngressController may become degraded soon: LoadBalancerReady=False, CanaryChecksSucceeding=False"}
2020-12-18T04:38:07.002Z        INFO    operator.ingress_controller     controller/controller.go:235    reconciling     {"request": "openshift-ingress-operator/default"}
2020-12-18T04:38:07.118Z        ERROR   operator.canary_controller      wait/wait.go:155        error performing canary route check     {"error": "error sending canary HTTP request: DNS error: Get \"http://canary-openshift-ingress-canary.apps.hongli-az47.qe.azure.devcluster.openshift.com\": dial tcp: lookup canary-openshift-ingress-canary.apps.hongli-az47.qe.azure.devcluster.openshift.com on 172.30.0.10:53: no such host"}
2020-12-18T04:38:07.150Z        INFO    operator.status_controller      controller/controller.go:235    Reconciling     {"request": "openshift-ingress-operator/default"}

2020-12-18T04:39:07.301Z        ERROR   operator.ingress_controller     controller/controller.go:235    got retryable error; requeueing {"after": "1m0s", "error": "IngressController is degraded: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)"}
2020-12-18T04:40:07.301Z        INFO    operator.ingress_controller     controller/controller.go:235    reconciling     {"request": "openshift-ingress-operator/default"}
2020-12-18T04:40:07.344Z        ERROR   operator.canary_controller      wait/wait.go:155        error performing canary route check     {"error": "error sending canary HTTP request: DNS error: Get \"http://canary-openshift-ingress-canary.apps.hongli-az47.qe.azure.devcluster.openshift.com\": dial tcp: lookup canary-openshift-ingress-canary.apps.hongli-az47.qe.azure.devcluster.openshift.com on 172.30.0.10:53: no such host"}
2020-12-18T04:40:07.462Z        ERROR   operator.ingress_controller     controller/controller.go:235    got retryable error; requeueing {"after": "1m0s", "error": "IngressController is degraded: LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: invalid ip config ID /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/hongli-az47-b4tcb-rg/providers/Microsoft.Network/networkInterfaces/hongli-az47-b4tcb-bootstrap-nic/ipConfigurations/bootstrap-nic-ip-v4\nThe kube-controller-manager logs may contain more details.), CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)"}
------
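
When comparing failing clusters, it can help to pull the NIC and IP configuration names out of the "invalid ip config ID" error text. The following is a minimal Python sketch and is not part of the original report: the function name parse_invalid_ip_config, the regular expression, and its group names are assumptions made for illustration. The sample string is copied from the last error in the log above.

```python
import re

# Matches the Azure resource ID that follows "invalid ip config ID".
# The group names are illustrative and do not come from any Azure SDK.
IP_CONFIG_RE = re.compile(
    r"invalid ip config ID "
    r"/subscriptions/(?P<subscription>[^/]+)"
    r"/resourceGroups/(?P<resource_group>[^/]+)"
    r"/providers/Microsoft\.Network/networkInterfaces/(?P<nic>[^/]+)"
    r"/ipConfigurations/(?P<ip_config>[^/\s\\]+)"
)


def parse_invalid_ip_config(message):
    """Return the parts of the rejected IP configuration ID, or None."""
    match = IP_CONFIG_RE.search(message)
    return match.groupdict() if match else None


# Error text as it appears in the log: the "\n" is a literal backslash
# followed by "n", because the message is embedded in a JSON string.
sample = (
    "Error syncing load balancer: failed to ensure load balancer: "
    "invalid ip config ID /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a"
    "/resourceGroups/hongli-az47-b4tcb-rg/providers/Microsoft.Network"
    "/networkInterfaces/hongli-az47-b4tcb-bootstrap-nic"
    "/ipConfigurations/bootstrap-nic-ip-v4\\n"
    "The kube-controller-manager logs may contain more details."
)

if __name__ == "__main__":
    print(parse_invalid_ip_config(sample))
```

For this sample the rejected ID points at the bootstrap NIC (hongli-az47-b4tcb-bootstrap-nic), while the error quoted in the description points at a master NIC (xxia18az-djhkb-master0-nic).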

Expected results:
The installation should succeed in the Azure environment.

Comment 1 Arvind iyengar 2020-12-18 08:01:20 UTC

Created attachment 1740167
Reference must-gather data from one of the failing clusters

Comment 2 Stephen Greene 2020-12-18 16:35:40 UTC


*** This bug has been marked as a duplicate of bug 1908389 ***
