Bug 1909006 - [OCP4.7] Installation fails in Azure environment with "Error syncing load balancer: failed to ensure load balancer: invalid ip config ID" errors
Keywords:
Status: CLOSED DUPLICATE of bug 1908389
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: aos-network-edge-staff
QA Contact: Hongan Li
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-12-18 07:07 UTC by Arvind iyengar
Modified: 2022-08-04 22:30 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-12-18 16:35:40 UTC
Target Upstream Version:
Embargoed:


Attachments
Reference must-gather data from one of the failing clusters (11.90 MB, application/x-bzip)
2020-12-18 08:01 UTC, Arvind iyengar

Description Arvind iyengar 2020-12-18 07:07:59 UTC
Description of problem:
Installation in the Azure environment fails. The failure is consistent across multiple installation attempts with different nightly images. The following error is the one most commonly seen in the ingress operator logs:
-----
2020-12-18T01:49:09.371Z        INFO    operator.ingress_controller     controller/controller.go:235    reconciling     {"request": "openshift-ingress-operator/default"}
2020-12-18T01:49:09.568Z        ERROR   operator.ingress_controller     controller/controller.go:235    got retryable error; requeueing {"after": "1m0s", "error": "IngressController is degraded: LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: invalid ip config ID /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/xxia18az-djhkb-rg/providers/Microsoft.Network/networkInterfaces/xxia18az-djhkb-master0-nic/ipConfigurations/pipConfig\nThe kube-controller-manager logs may contain more details.), CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)"}
-----
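
Triage aid (not part of the original report): a minimal Python sketch, assuming the error text keeps the resource ID layout shown in the log above, that pulls the resource group, NIC, and ipConfiguration names out of an "invalid ip config ID" message. The pattern, group names, and the `extract_ip_config` helper are hypothetical and only for illustration; they are not an Azure or OpenShift API.

```python
import re

# Assumed layout, copied from the error text in this report:
# /subscriptions/<id>/resourceGroups/<rg>/providers/Microsoft.Network/
#   networkInterfaces/<nic>/ipConfigurations/<name>
IP_CONFIG_ID = re.compile(
    r"/subscriptions/(?P<subscription>[^/]+)"
    r"/resourceGroups/(?P<resource_group>[^/]+)"
    r"/providers/Microsoft\.Network/networkInterfaces/(?P<nic>[^/]+)"
    # Stop at whitespace or a backslash so a trailing "\n" (escaped or real)
    # is not captured as part of the ipConfiguration name.
    r"/ipConfigurations/(?P<ip_config>[^/\s\\]+)"
)


def extract_ip_config(error_text):
    """Return the ID components as a dict, or None if no ID is present."""
    match = IP_CONFIG_ID.search(error_text)
    return match.groupdict() if match else None
```

Running this over the messages from several failed installs shows which NIC the service controller rejected in each case, which makes the reports easier to compare.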

The problem is not seen on other platforms such as AWS or GCP.

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2020-12-17-224915
4.7.0-0.nightly-2020-12-17-201522


How reproducible:
Frequently

Steps to Reproduce:
1. Deploy a cluster in the Azure environment using the latest OCP 4.7 nightly images.

Actual results:
The deployment fails, and the following errors appear in the ingress operator logs:
-----
2020-12-18T04:37:18.108Z        ERROR   operator.ingress_controller     controller/controller.go:235    got retryable error; requeueing {"after": "48.906300155s", "error": "IngressController may become degraded soon: LoadBalancerReady=False, CanaryChecksSucceeding=False"}
2020-12-18T04:38:07.002Z        INFO    operator.ingress_controller     controller/controller.go:235    reconciling     {"request": "openshift-ingress-operator/default"}
2020-12-18T04:38:07.118Z        ERROR   operator.canary_controller      wait/wait.go:155        error performing canary route check     {"error": "error sending canary HTTP request: DNS error: Get \"http://canary-openshift-ingress-canary.apps.hongli-az47.qe.azure.devcluster.openshift.com\": dial tcp: lookup canary-openshift-ingress-canary.apps.hongli-az47.qe.azure.devcluster.openshift.com on 172.30.0.10:53: no such host"}
2020-12-18T04:38:07.150Z        INFO    operator.status_controller      controller/controller.go:235    Reconciling     {"request": "openshift-ingress-operator/default"}

2020-12-18T04:39:07.301Z        ERROR   operator.ingress_controller     controller/controller.go:235    got retryable error; requeueing {"after": "1m0s", "error": "IngressController is degraded: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)"}
2020-12-18T04:40:07.301Z        INFO    operator.ingress_controller     controller/controller.go:235    reconciling     {"request": "openshift-ingress-operator/default"}
2020-12-18T04:40:07.344Z        ERROR   operator.canary_controller      wait/wait.go:155        error performing canary route check     {"error": "error sending canary HTTP request: DNS error: Get \"http://canary-openshift-ingress-canary.apps.hongli-az47.qe.azure.devcluster.openshift.com\": dial tcp: lookup canary-openshift-ingress-canary.apps.hongli-az47.qe.azure.devcluster.openshift.com on 172.30.0.10:53: no such host"}
2020-12-18T04:40:07.462Z        ERROR   operator.ingress_controller     controller/controller.go:235    got retryable error; requeueing {"after": "1m0s", "error": "IngressController is degraded: LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: invalid ip config ID /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/hongli-az47-b4tcb-rg/providers/Microsoft.Network/networkInterfaces/hongli-az47-b4tcb-bootstrap-nic/ipConfigurations/bootstrap-nic-ip-v4\nThe kube-controller-manager logs may contain more details.), CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)"}
-----
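
Triage aid (not part of the original report): a minimal Python sketch for summarizing an operator log like the one above. It assumes each entry is a single line with the layout seen here (timestamp, level, logger, caller, message, trailing JSON object). The `parse_line` and `count_errors` helpers are hypothetical names, and lines that do not fit the assumed layout are skipped.

```python
import json
import re
from collections import Counter

# Assumed layout: <timestamp> <LEVEL> <logger> <caller> <message> <JSON object>
LINE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<logger>\S+)\s+(?P<caller>\S+)\s+"
    r"(?P<msg>.*?)\s+(?P<fields>\{.*\})\s*$"
)


def parse_line(line):
    """Split one log line into its parts, or return None if it does not fit."""
    match = LINE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    try:
        record["fields"] = json.loads(record["fields"])
    except json.JSONDecodeError:
        return None
    return record


def count_errors(lines):
    """Count ERROR entries per logger name."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None and record["level"] == "ERROR":
            counts[record["logger"]] += 1
    return counts
```

Counting ERROR entries per logger separates the load balancer failures (`operator.ingress_controller`) from the canary DNS failures (`operator.canary_controller`) at a glance.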

Expected results:
The installation should succeed in the Azure environment.

Comment 1 Arvind iyengar 2020-12-18 08:01:20 UTC
Created attachment 1740167
Reference must-gather data from one of the failing clusters

Comment 2 Stephen Greene 2020-12-18 16:35:40 UTC

*** This bug has been marked as a duplicate of bug 1908389 ***

