Bug 1909006
| Field | Value |
|---|---|
| Summary | [OCP4.7] Installation fails in Azure environment with "Error syncing load balancer: failed to ensure load balancer: invalid ip config ID" errors |
| Product | OpenShift Container Platform |
| Component | Networking |
| Networking sub component | router |
| Status | CLOSED DUPLICATE |
| Severity | urgent |
| Priority | unspecified |
| Version | 4.7 |
| Keywords | TestBlocker |
| Reporter | Arvind iyengar <aiyengar> |
| Assignee | aos-network-edge-staff <aos-network-edge-staff> |
| QA Contact | Hongan Li <hongli> |
| CC | aos-bugs, sgreene |
| Target Milestone | --- |
| Target Release | --- |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | If docs needed, set a value |
| Story Points | --- |
| Last Closed | 2020-12-18 16:35:40 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| Category | --- |
| oVirt Team | --- |
| Cloudforms Team | --- |

Created attachment 1740167
Reference must-gather data from one of the failing clusters
*** This bug has been marked as a duplicate of bug 1908389 ***

Description of problem:
Installation in an Azure environment ends in failure. This is seen consistently across multiple installation attempts with different nightly images. The following error is the one most commonly seen in the ingress controller deployment logs:
-----
2020-12-18T01:49:09.371Z INFO operator.ingress_controller controller/controller.go:235 reconciling {"request": "openshift-ingress-operator/default"}
2020-12-18T01:49:09.568Z ERROR operator.ingress_controller controller/controller.go:235 got retryable error; requeueing {"after": "1m0s", "error": "IngressController is degraded: LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: invalid ip config ID /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/xxia18az-djhkb-rg/providers/Microsoft.Network/networkInterfaces/xxia18az-djhkb-master0-nic/ipConfigurations/pipConfig\nThe kube-controller-manager logs may contain more details.), CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)"}
-----
The problem is not seen in other environments such as AWS or GCP.

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2020-12-17-224915
4.7.0-0.nightly-2020-12-17-201522

How reproducible:
Frequently

Steps to Reproduce:
1. Initiate deployment of a cluster using the latest OCP v4.7 nightly images in an Azure environment.

Actual results:
The deployment fails, and the following errors can be seen in the ingress operator logs:
-----
2020-12-18T04:37:18.108Z ERROR operator.ingress_controller controller/controller.go:235 got retryable error; requeueing {"after": "48.906300155s", "error": "IngressController may become degraded soon: LoadBalancerReady=False, CanaryChecksSucceeding=False"}
2020-12-18T04:38:07.002Z INFO operator.ingress_controller controller/controller.go:235 reconciling {"request": "openshift-ingress-operator/default"}
2020-12-18T04:38:07.118Z ERROR operator.canary_controller wait/wait.go:155 error performing canary route check {"error": "error sending canary HTTP request: DNS error: Get \"http://canary-openshift-ingress-canary.apps.hongli-az47.qe.azure.devcluster.openshift.com\": dial tcp: lookup canary-openshift-ingress-canary.apps.hongli-az47.qe.azure.devcluster.openshift.com on 172.30.0.10:53: no such host"}
2020-12-18T04:38:07.150Z INFO operator.status_controller controller/controller.go:235 Reconciling {"request": "openshift-ingress-operator/default"}
2020-12-18T04:39:07.301Z ERROR operator.ingress_controller controller/controller.go:235 got retryable error; requeueing {"after": "1m0s", "error": "IngressController is degraded: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)"}
2020-12-18T04:40:07.301Z INFO operator.ingress_controller controller/controller.go:235 reconciling {"request": "openshift-ingress-operator/default"}
2020-12-18T04:40:07.344Z ERROR operator.canary_controller wait/wait.go:155 error performing canary route check {"error": "error sending canary HTTP request: DNS error: Get \"http://canary-openshift-ingress-canary.apps.hongli-az47.qe.azure.devcluster.openshift.com\": dial tcp: lookup canary-openshift-ingress-canary.apps.hongli-az47.qe.azure.devcluster.openshift.com on 172.30.0.10:53: no such host"}
2020-12-18T04:40:07.462Z ERROR operator.ingress_controller controller/controller.go:235 got retryable error; requeueing {"after": "1m0s", "error": "IngressController is degraded: LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: invalid ip config ID /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/hongli-az47-b4tcb-rg/providers/Microsoft.Network/networkInterfaces/hongli-az47-b4tcb-bootstrap-nic/ipConfigurations/bootstrap-nic-ip-v4\nThe kube-controller-manager logs may contain more details.), CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)"}
-----

Expected results:
The installation should succeed in the Azure environment.
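
Triage note: when comparing failing clusters, it can help to pull the rejected resource IDs out of the operator logs and split them into their parts (resource group, NIC name, ipConfiguration name). The following is a minimal Python sketch for that. The regular expression is only inferred from the two IDs quoted in this report; it is an assumption and not the pattern the cloud provider itself uses to validate IDs. The helper name `extract_invalid_ip_configs` is hypothetical.

```python
import re

# Shape of a NIC ipConfiguration resource ID, inferred from the IDs quoted in
# this bug (assumption: this is NOT the cloud provider's own validation regex).
IP_CONFIG_ID = re.compile(
    r"/subscriptions/(?P<subscription>[^/]+)"
    r"/resourceGroups/(?P<resource_group>[^/]+)"
    r"/providers/Microsoft\.Network/networkInterfaces/(?P<nic>[^/]+)"
    # Stop at '/', whitespace, or a backslash, because the log embeds a
    # literal "\n" escape directly after the ID.
    r"/ipConfigurations/(?P<ip_config>[^/\\\s]+)"
)


def extract_invalid_ip_configs(log_text):
    """Return one dict per 'invalid ip config ID <id>' occurrence in log_text."""
    results = []
    for match in re.finditer(r"invalid ip config ID (\S+)", log_text):
        parsed = IP_CONFIG_ID.match(match.group(1))
        if parsed:
            results.append(parsed.groupdict())
    return results


# Excerpt of the first error quoted in this report (raw strings keep the
# literal backslash-n that appears in the JSON log line).
sample = (
    r"Error syncing load balancer: failed to ensure load balancer: "
    r"invalid ip config ID /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a"
    r"/resourceGroups/xxia18az-djhkb-rg/providers/Microsoft.Network"
    r"/networkInterfaces/xxia18az-djhkb-master0-nic/ipConfigurations/pipConfig"
    r"\nThe kube-controller-manager logs may contain more details."
)

for entry in extract_invalid_ip_configs(sample):
    print(entry["resource_group"], entry["nic"], entry["ip_config"])
```

Run against the full operator log, this lists every NIC and ipConfiguration the service controller rejected. In the excerpts above, those are a master NIC in one cluster and the bootstrap NIC in another.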