Bug 1753014 - Kuryr bootstrap not finishing due to API lbaas with ERROR status
Summary: Kuryr bootstrap not finishing due to API lbaas with ERROR status
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 4.2.0
Assignee: Maysa Macedo
QA Contact: Jon Uriarte
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-09-17 19:13 UTC by Maysa Macedo
Modified: 2019-10-16 06:41 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-16 06:41:28 UTC
Target Upstream Version:
Embargoed:


Links
| System | ID | Private | Priority | Status | Summary | Last Updated |
| Github | openshift cluster-network-operator pull 319 | 0 | None | closed | Bug 1753014: Kuryr: Fix kuryr bootstrap when lbaas API has ERROR status | 2021-02-21 06:50:00 UTC |
| Red Hat Product Errata | RHBA-2019:2922 | 0 | None | None | None | 2019-10-16 06:41:41 UTC |

Description Maysa Macedo 2019-09-17 19:13:35 UTC
Description of problem:

If the API LBaaS load balancer ends up with a provisioning status of ERROR, reconciliation is triggered again and the same load balancer, still in ERROR status, is reused when creating the dependent LBaaS resources (pools, listeners and members). Octavia rejects those requests because the load balancer is immutable, as shown in the following logs:

2019/09/17 16:36:09 Failed to reconcile platform networking resources: failed to create OpenShift API loadbalancer: Timed out waiting for the LB 571bedae-2c08-4391-adb7-b3dfa6bf9167 to become ready
2019/09/17 16:36:09 Updated ClusterOperator with conditions:
- lastTransitionTime: "2019-09-17T16:36:09Z"
  message: 'Internal error while reconciling platform networking resources: failed
    to create OpenShift API loadbalancer: Timed out waiting for the LB 571bedae-2c08-4391-adb7-b3dfa6bf9167
    to become ready'
  reason: BootstrapError
  status: "True"
  type: Degraded
- lastTransitionTime: "2019-09-17T16:30:57Z"
  status: "True"
  type: Upgradeable
2019/09/17 16:36:10 Reconciling Network.operator.openshift.io cluster
2019/09/17 16:36:10 Detected uplink MTU 1450
2019/09/17 16:36:10 Kuryr bootstrap started
2019/09/17 16:36:11 Using openshiftClusterID=ostest-xxvsx as resources tag
2019/09/17 16:36:11 Ensuring services network
2019/09/17 16:36:11 Services network 96891c50-1e13-41d4-9244-d6c029437d42 present
2019/09/17 16:36:11 Ensuring services subnet with 172.30.0.0/15 CIDR (services from 172.30.0.0/16) and 172.31.255.254 gateway with allocation pools [{Start:172.31.0.0 End:172.31.255.253}]
2019/09/17 16:36:11 Services subnet aeeac861-d423-40de-af44-10c8bca85b1e present
2019/09/17 16:36:11 Ensuring pod subnetpool with following CIDRs: [10.128.0.0/14]
2019/09/17 16:36:11 Pod subnetpool 6e03139d-5dbc-4f37-a25c-0bc461920c87 present
2019/09/17 16:36:11 Found worker nodes subnet e5d78311-b872-472f-b2fa-0c78db7762ad
2019/09/17 16:36:12 Found worker nodes router ee3e3e8c-7f69-4e4e-a31e-0d8cb9537737
2019/09/17 16:36:12 Found master nodes security group 446cd130-3037-4d96-92ad-b10ef8ce6e8b
2019/09/17 16:36:12 Found worker nodes security group 8749b9e7-8a71-4898-b43b-22f0effd627b
2019/09/17 16:36:12 Ensuring pods security group
2019/09/17 16:36:12 Pods security group 6ef77c00-5ecf-4f34-8bd7-5c1f0fe8fc58 present
2019/09/17 16:36:12 Allowing traffic from masters and nodes to pods
2019/09/17 16:36:12 Allowing traffic from pod to pod
2019/09/17 16:36:13 All requried traffic allowed
2019/09/17 16:36:13 Creating OpenShift API loadbalancer with IP 172.30.0.1
2019/09/17 16:36:13 OpenShift API loadbalancer 571bedae-2c08-4391-adb7-b3dfa6bf9167 present
2019/09/17 16:36:13 Creating OpenShift API loadbalancer pool
2019/09/17 16:36:14 Failed to reconcile platform networking resources: failed to create OpenShift API loadbalancer pool: failed to create LB pool: Expected HTTP response code [] when accessing [POST http://10.46.22.140:9876/v2.0/lbaas/pools], but got 409 instead
{"debuginfo": null, "faultcode": "Client", "faultstring": "Load Balancer 571bedae-2c08-4391-adb7-b3dfa6bf9167 is immutable and cannot be updated."}

The operator gets stuck in a loop, repeatedly retrying the Kuryr bootstrapping phase without ever succeeding, which causes the installation to fail.
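
The following is a minimal sketch of one way a reconciliation step could avoid this loop: if the existing load balancer is in ERROR, delete it and create a new one instead of reusing it. This matches the behaviour seen in the verification logs in comment 2, but it is not the operator's actual code. `FakeOctavia`, `LoadBalancer` and `ensure_api_loadbalancer` are hypothetical names, and the in-memory client only stands in for a real Octavia API so that the example runs on its own.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Optional


@dataclass
class LoadBalancer:
    id: str
    name: str
    vip_address: str
    provisioning_status: str  # e.g. "ACTIVE", "PENDING_CREATE", "ERROR"


@dataclass
class FakeOctavia:
    """Hypothetical in-memory stand-in for an Octavia client (assumption)."""
    lbs: Dict[str, LoadBalancer] = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def find_by_name(self, name: str) -> Optional[LoadBalancer]:
        return next((lb for lb in self.lbs.values() if lb.name == name), None)

    def create(self, name: str, vip_address: str) -> LoadBalancer:
        # A real client would return PENDING_CREATE and need polling;
        # the fake goes straight to ACTIVE to keep the sketch short.
        lb = LoadBalancer(f"lb-{next(self._ids)}", name, vip_address, "ACTIVE")
        self.lbs[lb.id] = lb
        return lb

    def delete(self, lb_id: str) -> None:
        del self.lbs[lb_id]


def ensure_api_loadbalancer(client, name: str, vip_address: str) -> LoadBalancer:
    """Return a usable load balancer, replacing one that is stuck in ERROR."""
    existing = client.find_by_name(name)
    if existing is not None and existing.provisioning_status == "ERROR":
        # An ERROR load balancer is immutable, so child resources (pools,
        # listeners, members) cannot be attached to it. Drop it and start over.
        client.delete(existing.id)
        existing = None
    if existing is None:
        return client.create(name, vip_address)
    return existing
```

With this shape, a second reconciliation pass after a failed provisioning attempt returns a load balancer with a new ID rather than the broken one, so the subsequent pool creation is not rejected with a 409.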


Version-Release number of selected component (if applicable):


How reproducible: Only when Octavia fails to provision the API load balancer


Steps to Reproduce:
1. trigger installation with 4.2.0-0.nightly-2019-09-16

Actual results:


Expected results: Installation to finish successfully. 


Additional info:

Comment 2 Jon Uriarte 2019-10-04 15:35:47 UTC
Verified on 4.2.0-0.nightly-2019-10-02-150642 on top of OSP 13 2019-10-01.1 puddle.

Steps:
1. Install OSP 13 with Octavia
2. Run OCP 4.2 installer with Kuryr
3. Once it is deployed, force the API LB into ERROR status
4. Check that the network-operator re-creates it with a different ID

API LB after fresh deployment (`openstack loadbalancer list`):
| dd466800-535b-4ef9-9187-a1350cbef135 | ostest-mp284-kuryr-api-loadbalancer | 4d589eb96cb04a4598056bc3679b63dc | 172.30.0.1     | ACTIVE | octavia  |

Induce the LB to ERROR status (by changing it in Octavia DB):
| dd466800-535b-4ef9-9187-a1350cbef135 | ostest-mp284-kuryr-api-loadbalancer | 4d589eb96cb04a4598056bc3679b63dc | 172.30.0.1     | ERROR | octavia  |

Logs from network-operator:

2019/10/04 15:15:14 Creating OpenShift API loadbalancer with IP 172.30.0.1
2019/10/04 15:15:14 Deleting Openstack LoadBalancer: dd466800-535b-4ef9-9187-a1350cbef135
2019/10/04 15:16:27 OpenShift API loadbalancer 9bb9d1db-8204-4fd2-8da7-927f1c62fc02 present
2019/10/04 15:16:27 Creating OpenShift API loadbalancer pool
2019/10/04 15:16:28 OpenShift API loadbalancer pool 3d482c92-b1ef-44cb-af69-30e7f3924e37 present
2019/10/04 15:16:28 Creating OpenShift API loadbalancer health monitor
2019/10/04 15:16:28 OpenShift API loadbalancer health monitor d00708db-0941-46a7-bc52-d5b707b4a7cc present
2019/10/04 15:16:28 Creating OpenShift API loadbalancer listener

New API LB (`openstack loadbalancer list`):
| 9bb9d1db-8204-4fd2-8da7-927f1c62fc02 | ostest-mp284-kuryr-api-loadbalancer  | 4d589eb96cb04a4598056bc3679b63dc | 172.30.0.1     | ACTIVE | octavia  |
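
The before/after comparison above can also be checked mechanically. The sketch below parses table rows like the ones pasted from `openstack loadbalancer list` and confirms that the load balancer was replaced: same name and VIP, different ID, ACTIVE status. The field names in `LBRow` and the helpers `parse_row` and `was_recreated` are hypothetical and assume the six-column layout shown in this comment.

```python
from typing import NamedTuple


class LBRow(NamedTuple):
    # Field names are an assumption based on the six columns shown above.
    id: str
    name: str
    project_id: str
    vip_address: str
    provisioning_status: str
    provider: str


def parse_row(line: str) -> LBRow:
    """Split one '|'-delimited table row into trimmed cells."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    return LBRow(*cells)


def was_recreated(before: LBRow, after: LBRow) -> bool:
    """True if 'after' is a healthy replacement for 'before'."""
    return (
        after.id != before.id
        and after.name == before.name
        and after.vip_address == before.vip_address
        and after.provisioning_status == "ACTIVE"
    )


before = parse_row(
    "| dd466800-535b-4ef9-9187-a1350cbef135 | ostest-mp284-kuryr-api-loadbalancer "
    "| 4d589eb96cb04a4598056bc3679b63dc | 172.30.0.1     | ERROR | octavia  |"
)
after = parse_row(
    "| 9bb9d1db-8204-4fd2-8da7-927f1c62fc02 | ostest-mp284-kuryr-api-loadbalancer  "
    "| 4d589eb96cb04a4598056bc3679b63dc | 172.30.0.1     | ACTIVE | octavia  |"
)
print(was_recreated(before, after))
```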

Comment 3 errata-xmlrpc 2019-10-16 06:41:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

