Bug 1909006

Summary:

[OCP4.7] Installation fails in Azure environment with "Error syncing load balancer: failed to ensure load balancer: invalid ip config ID" errors

Product:

OpenShift Container Platform

Reporter:

Arvind iyengar <aiyengar>

Component:

Networking

Assignee:

aos-network-edge-staff <aos-network-edge-staff>

Networking sub component:

router

QA Contact:

Hongan Li <hongli>

Status:

CLOSED DUPLICATE

Docs Contact:

Severity:

urgent

Priority:

unspecified

CC:

aos-bugs, sgreene

Version:

4.7

Keywords:

TestBlocker

Target Milestone:

---

Target Release:

---

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

Fixed In Version:

Doc Type:

If docs needed, set a value

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2020-12-18 16:35:40 UTC

Type:

Bug

Regression:

---

Attachments:

Description	Flags
Reference must-gather data from one of the failing clusters	none

Description Arvind iyengar 2020-12-18 07:07:59 UTC

Description of problem:
Installation in an Azure environment fails. The failure is consistent across multiple installation attempts with different nightly images, and the following error is the one most commonly seen in the ingress operator logs:
-----
2020-12-18T01:49:09.371Z        INFO    operator.ingress_controller     controller/controller.go:235    reconciling     {"request": "openshift-ingress-operator/default"}
2020-12-18T01:49:09.568Z        ERROR   operator.ingress_controller     controller/controller.go:235    got retryable error; requeueing {"after": "1m0s", "error": "IngressController is degraded: LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: invalid ip config ID /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/xxia18az-djhkb-rg/providers/Microsoft.Network/networkInterfaces/xxia18az-djhkb-master0-nic/ipConfigurations/pipConfig\nThe kube-controller-manager logs may contain more details.), CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)"}
-----

The problem is not seen on other platforms such as AWS or GCP.

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2020-12-17-224915
4.7.0-0.nightly-2020-12-17-201522


How reproducible:
Frequently

Steps to Reproduce:
1. Deploy a cluster in an Azure environment using the latest OCP 4.7 nightly images.

Actual results:
The deployment fails, and the following errors are seen in the ingress operator logs:
-----
2020-12-18T04:37:18.108Z        ERROR   operator.ingress_controller     controller/controller.go:235    got retryable error; requeueing {"after": "48.906300155s", "error": "IngressController may become degraded soon: LoadBalancerReady=False, CanaryChecksSucceeding=False"}
2020-12-18T04:38:07.002Z        INFO    operator.ingress_controller     controller/controller.go:235    reconciling     {"request": "openshift-ingress-operator/default"}
2020-12-18T04:38:07.118Z        ERROR   operator.canary_controller      wait/wait.go:155        error performing canary route check     {"error": "error sending canary HTTP request: DNS error: Get \"http://canary-openshift-ingress-canary.apps.hongli-az47.qe.azure.devcluster.openshift.com\": dial tcp: lookup canary-openshift-ingress-canary.apps.hongli-az47.qe.azure.devcluster.openshift.com on 172.30.0.10:53: no such host"}
2020-12-18T04:38:07.150Z        INFO    operator.status_controller      controller/controller.go:235    Reconciling     {"request": "openshift-ingress-operator/default"}

2020-12-18T04:39:07.301Z        ERROR   operator.ingress_controller     controller/controller.go:235    got retryable error; requeueing {"after": "1m0s", "error": "IngressController is degraded: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)"}
2020-12-18T04:40:07.301Z        INFO    operator.ingress_controller     controller/controller.go:235    reconciling     {"request": "openshift-ingress-operator/default"}
2020-12-18T04:40:07.344Z        ERROR   operator.canary_controller      wait/wait.go:155        error performing canary route check     {"error": "error sending canary HTTP request: DNS error: Get \"http://canary-openshift-ingress-canary.apps.hongli-az47.qe.azure.devcluster.openshift.com\": dial tcp: lookup canary-openshift-ingress-canary.apps.hongli-az47.qe.azure.devcluster.openshift.com on 172.30.0.10:53: no such host"}
2020-12-18T04:40:07.462Z        ERROR   operator.ingress_controller     controller/controller.go:235    got retryable error; requeueing {"after": "1m0s", "error": "IngressController is degraded: LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: invalid ip config ID /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/hongli-az47-b4tcb-rg/providers/Microsoft.Network/networkInterfaces/hongli-az47-b4tcb-bootstrap-nic/ipConfigurations/bootstrap-nic-ip-v4\nThe kube-controller-manager logs may contain more details.), CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)"}
------
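
For triage, the offending IP configuration can be pulled out of these messages mechanically. The following is only a sketch: the helper name `extract_ip_configs` and the regular expression are illustrative assumptions based on the message shape in the logs above, not part of any OpenShift or Azure tooling.

```python
import re

# Assumption: the error text always follows the shape seen in the logs above,
# i.e. "invalid ip config ID " followed by a full Azure resource ID.
IP_CONFIG_RE = re.compile(
    r"invalid ip config ID "
    r"/subscriptions/(?P<subscription>[^/]+)"
    r"/resourceGroups/(?P<resource_group>[^/]+)"
    r"/providers/Microsoft\.Network/networkInterfaces/(?P<nic>[^/]+)"
    # Stop at whitespace or at the backslash of the escaped "\n" in the raw log.
    r"/ipConfigurations/(?P<ip_config>[^\s\\]+)"
)


def extract_ip_configs(log_text):
    """Return one dict per 'invalid ip config ID' occurrence in log_text.

    Hypothetical helper for reading operator logs; each dict holds the
    subscription, resource group, NIC name, and IP configuration name.
    """
    return [match.groupdict() for match in IP_CONFIG_RE.finditer(log_text)]
```

Run against the two clusters quoted in this report, such a helper would surface `xxia18az-djhkb-master0-nic` / `pipConfig` in one case and `hongli-az47-b4tcb-bootstrap-nic` / `bootstrap-nic-ip-v4` in the other, which makes it easier to see which NIC each failure refers to.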

Expected results:
The installation should succeed in the Azure environment.

Comment 1 Arvind iyengar 2020-12-18 08:01:20 UTC

Created attachment 1740167
Reference must-gather data from one of the failing clusters

Comment 2 Stephen Greene 2020-12-18 16:35:40 UTC


*** This bug has been marked as a duplicate of bug 1908389 ***