Description of problem:

When the API load balancer ends up in ERROR provisioning status, reconciliation is triggered, but the same LB in ERROR status is reused when creating the subsequent LBaaS resources (pools, listeners and members). As shown in the following logs:

2019/09/17 16:36:09 Failed to reconcile platform networking resources: failed to create OpenShift API loadbalancer: Timed out waiting for the LB 571bedae-2c08-4391-adb7-b3dfa6bf9167 to become ready
2019/09/17 16:36:09 Updated ClusterOperator with conditions:
- lastTransitionTime: "2019-09-17T16:36:09Z"
  message: 'Internal error while reconciling platform networking resources: failed to create OpenShift API loadbalancer: Timed out waiting for the LB 571bedae-2c08-4391-adb7-b3dfa6bf9167 to become ready'
  reason: BootstrapError
  status: "True"
  type: Degraded
- lastTransitionTime: "2019-09-17T16:30:57Z"
  status: "True"
  type: Upgradeable
2019/09/17 16:36:10 Reconciling Network.operator.openshift.io cluster
2019/09/17 16:36:10 Detected uplink MTU 1450
2019/09/17 16:36:10 Kuryr bootstrap started
2019/09/17 16:36:11 Using openshiftClusterID=ostest-xxvsx as resources tag
2019/09/17 16:36:11 Ensuring services network
2019/09/17 16:36:11 Services network 96891c50-1e13-41d4-9244-d6c029437d42 present
2019/09/17 16:36:11 Ensuring services subnet with 172.30.0.0/15 CIDR (services from 172.30.0.0/16) and 172.31.255.254 gateway with allocation pools [{Start:172.31.0.0 End:172.31.255.253}]
2019/09/17 16:36:11 Services subnet aeeac861-d423-40de-af44-10c8bca85b1e present
2019/09/17 16:36:11 Ensuring pod subnetpool with following CIDRs: [10.128.0.0/14]
2019/09/17 16:36:11 Pod subnetpool 6e03139d-5dbc-4f37-a25c-0bc461920c87 present
2019/09/17 16:36:11 Found worker nodes subnet e5d78311-b872-472f-b2fa-0c78db7762ad
2019/09/17 16:36:12 Found worker nodes router ee3e3e8c-7f69-4e4e-a31e-0d8cb9537737
2019/09/17 16:36:12 Found master nodes security group 446cd130-3037-4d96-92ad-b10ef8ce6e8b
2019/09/17 16:36:12 Found worker nodes security group 8749b9e7-8a71-4898-b43b-22f0effd627b
2019/09/17 16:36:12 Ensuring pods security group
2019/09/17 16:36:12 Pods security group 6ef77c00-5ecf-4f34-8bd7-5c1f0fe8fc58 present
2019/09/17 16:36:12 Allowing traffic from masters and nodes to pods
2019/09/17 16:36:12 Allowing traffic from pod to pod
2019/09/17 16:36:13 All requried traffic allowed
2019/09/17 16:36:13 Creating OpenShift API loadbalancer with IP 172.30.0.1
2019/09/17 16:36:13 OpenShift API loadbalancer 571bedae-2c08-4391-adb7-b3dfa6bf9167 present
2019/09/17 16:36:13 Creating OpenShift API loadbalancer pool
2019/09/17 16:36:14 Failed to reconcile platform networking resources: failed to create OpenShift API loadbalancer pool: failed to create LB pool: Expected HTTP response code [] when accessing [POST http://10.46.22.140:9876/v2.0/lbaas/pools], but got 409 instead
{"debuginfo": null, "faultcode": "Client", "faultstring": "Load Balancer 571bedae-2c08-4391-adb7-b3dfa6bf9167 is immutable and cannot be updated."}

The operator hangs in a loop, constantly trying to finish the Kuryr bootstrapping phase, and never succeeds, causing the installation to fail.

Version-Release number of selected component (if applicable):

How reproducible:
Only when Octavia fails to provision the load balancer.

Steps to Reproduce:
1. Trigger installation with 4.2.0-0.nightly-2019-09-16.

Actual results:

Expected results:
Installation finishes successfully.

Additional info:
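The expected behavior is for reconciliation to check the LB's provisioning_status and, when it is ERROR (terminal in Octavia: the LB is immutable, so any POST of pools or listeners against it returns 409), delete and recreate it instead of reusing it. A minimal sketch of that decision using the OpenStack CLI (the operator itself is written in Go; the function names and flow here are illustrative, not the operator's actual code):

```shell
#!/bin/sh
# Decide whether an Octavia LB must be recreated, based on its
# provisioning_status. ERROR is terminal: the LB cannot be updated.
lb_needs_recreate() {
    case "$1" in
        ERROR) return 0 ;;   # must be deleted and recreated
        *)     return 1 ;;   # ACTIVE / PENDING_* can be reused or waited on
    esac
}

# Illustrative reconciliation step (lb_id is a placeholder):
reconcile_api_lb() {
    lb_id="$1"
    status=$(openstack loadbalancer show "$lb_id" \
             -c provisioning_status -f value)
    if lb_needs_recreate "$status"; then
        # --cascade removes listeners/pools/members along with the LB
        openstack loadbalancer delete --cascade "$lb_id"
        # ...then recreate the LB and wait for it to become ACTIVE...
    fi
}
```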
Verified on 4.2.0-0.nightly-2019-10-02-150642 on top of OSP 13 2019-10-01.1 puddle.

Steps:
1. Install OSP 13 with Octavia.
2. Run the OCP 4.2 installer with Kuryr.
3. Once it's deployed, induce the API LB to ERROR status.
4. Check that the LB is re-created with a different ID by the network-operator.

API LB after fresh deployment (`openstack loadbalancer list`):
| dd466800-535b-4ef9-9187-a1350cbef135 | ostest-mp284-kuryr-api-loadbalancer | 4d589eb96cb04a4598056bc3679b63dc | 172.30.0.1 | ACTIVE | octavia |

Induce the LB to ERROR status (by changing it in the Octavia DB):
| dd466800-535b-4ef9-9187-a1350cbef135 | ostest-mp284-kuryr-api-loadbalancer | 4d589eb96cb04a4598056bc3679b63dc | 172.30.0.1 | ERROR | octavia |

Logs from the network-operator:
2019/10/04 15:15:14 Creating OpenShift API loadbalancer with IP 172.30.0.1
2019/10/04 15:15:14 Deleting Openstack LoadBalancer: dd466800-535b-4ef9-9187-a1350cbef135
2019/10/04 15:16:27 OpenShift API loadbalancer 9bb9d1db-8204-4fd2-8da7-927f1c62fc02 present
2019/10/04 15:16:27 Creating OpenShift API loadbalancer pool
2019/10/04 15:16:28 OpenShift API loadbalancer pool 3d482c92-b1ef-44cb-af69-30e7f3924e37 present
2019/10/04 15:16:28 Creating OpenShift API loadbalancer health monitor
2019/10/04 15:16:28 OpenShift API loadbalancer health monitor d00708db-0941-46a7-bc52-d5b707b4a7cc present
2019/10/04 15:16:28 Creating OpenShift API loadbalancer listener

New API LB (`openstack loadbalancer list`):
| 9bb9d1db-8204-4fd2-8da7-927f1c62fc02 | ostest-mp284-kuryr-api-loadbalancer | 4d589eb96cb04a4598056bc3679b63dc | 172.30.0.1 | ACTIVE | octavia |
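The ERROR status above was induced by editing the Octavia database directly. A hedged sketch of that step, assuming shell access to the controller hosting the Octavia DB (the helper function is hypothetical; `load_balancer.provisioning_status` is the Octavia schema's table and column):

```shell
#!/bin/sh
# Build the SQL that flips an Octavia LB to ERROR. Run it against the
# octavia DB on the controller; DB access is an assumption here.
induce_lb_error_sql() {
    printf "UPDATE load_balancer SET provisioning_status='ERROR' WHERE id='%s';" "$1"
}

# Usage on the controller (LB ID taken from `openstack loadbalancer list`):
#   sudo mysql octavia -e "$(induce_lb_error_sql dd466800-535b-4ef9-9187-a1350cbef135)"
# Then watch the network-operator delete and recreate the LB:
#   openstack loadbalancer list
```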
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922