Bug 1840187

Summary: [Kuryr] LB resources not deleted when are transitioned to ERROR
Product: OpenShift Container Platform Reporter: Maysa Macedo <mdemaced>
Component: Networking Assignee: Maysa Macedo <mdemaced>
Networking sub component: kuryr QA Contact: GenadiC <gcheresh>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: unspecified CC: rlobillo
Version: 4.5   
Target Milestone: ---   
Target Release: 4.5.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 1840611 Environment:
Last Closed: 2020-07-13 17:41:38 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1840611    

Description Maysa Macedo 2020-05-26 14:20:06 UTC
Description of problem:

When a Load Balancer resource (in this case, a pool) transitions to ERROR status, Kuryr does not attempt to recreate it. Consequently, the creation of any resource that depends on the one in ERROR fails, causing the controller to restart continuously.

[stack@undercloud-0 ~]$ openstack loadbalancer pool show caea3e31-db8d-4d1a-afd3-0dc3fa02fd4d
+----------------------+-----------------------------------------------------+
| Field | Value |
+----------------------+-----------------------------------------------------+
| admin_state_up | True |
| created_at | 2020-05-16T23:37:16 |
| description | |
| healthmonitor_id | |
| id | caea3e31-db8d-4d1a-afd3-0dc3fa02fd4d |
| lb_algorithm | ROUND_ROBIN |
| listeners | 39933e05-d9c6-4ed1-9a04-e4994e3ea4d3 |
| loadbalancers | 6218ba3d-0290-4c25-a5e1-502f374f8b6c |
| members | |
| name | openshift-machine-api/machine-api-operator:TCP:8443 |
| operating_status | ONLINE |
| project_id | f6b6420743ce45ac868c102a523ffde6 |
| protocol | TCP |
| provisioning_status | ERROR |
| session_persistence | None |
| updated_at | 2020-05-16T23:37:20 |
| tls_container_ref | None |
| ca_tls_container_ref | None |
| crl_container_ref | None |
| tls_enabled | False |
+----------------------+-----------------------------------------------------+

2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging [-] Failed to handle event {'type': 'ADDED', 'object': {'kind': 'Endpoints', 'apiVersion': 'v1', 'metadata': {'name': 'machine-api-operator', 'namespace': 'openshift-machine-api', 'selfLink': '/api/v1/namespaces/openshift-machine-api/endpoints/machine-api-operator', 'uid': 'c87b2853-180c-40ef-8086-1946c099476b', 'resourceVersion': '25606', 'creationTimestamp': '2020-05-16T23:23:16Z', 'labels': {'k8s-app': 'machine-api-operator'}, 'annotations': {'endpoints.kubernetes.io/last-change-trigger-time': '2020-05-16T23:35:05Z', 'openstack.org/kuryr-lbaas-spec': '{"versioned_object.data": {"ip": "172.30.53.190", "lb_ip": null, "ports": [{"versioned_object.data": {"name": "https", "port": 8443, "protocol": "TCP", "targetPort": "https"}, "versioned_object.name": "LBaaSPortSpec", "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version": "1.1"}], "project_id": "f6b6420743ce45ac868c102a523ffde6", "security_groups_ids": [], "subnet_id": "0e811b62-6954-4459-9d02-a232e06c040c", "type": "ClusterIP"}, "versioned_object.name": "LBaaSServiceSpec", "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version": "1.0"}'}, 'managedFields': [{'manager': 'python-requests', 'operation': 'Update', 'apiVersion': 'v1', 'time': '2020-05-16T23:26:23Z', 'fieldsType': 'FieldsV1', 'fieldsV1': {'f:metadata': {'f:annotations': {'f:openstack.org/kuryr-lbaas-spec': {}}}}}, {'manager': 'kube-controller-manager', 'operation': 'Update', 'apiVersion': 'v1', 'time': '2020-05-16T23:35:05Z', 'fieldsType': 'FieldsV1', 'fieldsV1': {'f:metadata': {'f:annotations': {'.': {}, 'f:endpoints.kubernetes.io/last-change-trigger-time': {}}, 'f:labels': {'.': {}, 'f:k8s-app': {}}}, 'f:subsets': {}}}]}, 'subsets': [{'addresses': [{'ip': '10.128.56.38', 'nodeName': 'ostest-zm4t6-master-1', 'targetRef': {'kind': 'Pod', 'namespace': 'openshift-machine-api', 'name': 'machine-api-operator-7dddccbf5-wcdf5', 'uid': '549d414d-9b8c-4adf-84f4-d101dbc8fe98', 'resourceVersion': '25604'}}], 'ports': [{'name': 'https', 'port': 8443, 'protocol': 'TCP'}]}]}}: kuryr_kubernetes.exceptions.ResourceNotReady: Resource not ready: LBaaSMember(id=<?>,ip=10.128.56.38,name='openshift-machine-api/machine-api-operator-7dddccbf5-wcdf5:8443',pool_id=caea3e31-db8d-4d1a-afd3-0dc3fa02fd4d,port=8443,project_id='f6b6420743ce45ac868c102a523ffde6',subnet_id=0e811b62-6954-4459-9d02-a232e06c040c)
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging Traceback (most recent call last):
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/logging.py", line 37, in __call__
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging self._handler(event)
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/retry.py", line 93, in __call__
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging self._handler.set_liveness(alive=False)
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging self.force_reraise()
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging six.reraise(self.type_, self.value, self.tb)
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging raise value
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/retry.py", line 78, in __call__
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging self._handler(event)
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/k8s_base.py", line 90, in __call__
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging self.on_present(obj)
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/handlers/lbaas.py", line 190, in on_present
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging if self._sync_lbaas_members(endpoints, lbaas_state, lbaas_spec):
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/handlers/lbaas.py", line 281, in _sync_lbaas_members
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging self._add_new_members(endpoints, lbaas_state, lbaas_spec)):
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/handlers/lbaas.py", line 388, in _add_new_members
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging listener_port=listener_port)
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/drivers/lbaasv2.py", line 450, in ensure_member
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging self._find_member)
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/drivers/lbaasv2.py", line 688, in _ensure_provisioned
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging raise k_exc.ResourceNotReady(obj)
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging kuryr_kubernetes.exceptions.ResourceNotReady: Resource not ready: LBaaSMember(id=<?>,ip=10.128.56.38,name='openshift-machine-api/machine-api-operator-7dddccbf5-wcdf5:8443',
pool_id=caea3e31-db8d-4d1a-afd3-0dc3fa02fd4d,port=8443,project_id='f6b6420743ce45ac868c102a523ffde6',subnet_id=0e811b62-6954-4459-9d02-a232e06c040c)
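
The traceback ends in the retry handler calling set_liveness(alive=False) after ResourceNotReady kept being raised. The pattern can be sketched roughly as follows. This is an illustration only, not Kuryr's actual code: the Liveness class, handle_with_retry, its timeout and interval parameters, and add_member_to_errored_pool are hypothetical names introduced here, and ResourceNotReady is a local stand-in for the exception seen in the log.

```python
import time


class ResourceNotReady(Exception):
    """Local stand-in for kuryr_kubernetes.exceptions.ResourceNotReady."""


class Liveness:
    """Hypothetical liveness flag that a health probe would read."""

    def __init__(self):
        self.alive = True

    def set_liveness(self, alive):
        self.alive = alive


def handle_with_retry(handler, event, liveness, timeout=0.05, interval=0.01,
                      sleep=time.sleep, clock=time.monotonic):
    """Retry handler(event) on ResourceNotReady until the timeout expires."""
    deadline = clock() + timeout
    while True:
        try:
            return handler(event)
        except ResourceNotReady:
            if clock() >= deadline:
                # Giving up: flag the controller as unhealthy and re-raise.
                liveness.set_liveness(alive=False)
                raise
            sleep(interval)


def add_member_to_errored_pool(event):
    # A pool stuck in ERROR never becomes usable, so this never succeeds.
    raise ResourceNotReady("LBaaSMember for %s" % event["name"])


if __name__ == "__main__":
    liveness = Liveness()
    try:
        handle_with_retry(add_member_to_errored_pool,
                          {"name": "machine-api-operator"}, liveness)
    except ResourceNotReady:
        pass
    print("controller alive:", liveness.alive)
```

Because the errored pool never recovers, every retry window ends the same way, which is consistent with the continuous restarts reported here.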

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Transition an LB resource (pool/listener) to ERROR while the LB creation is being handled
2.
3.

Actual results:

Kuryr-Controller restarts continuously because it is unable to create the other LB resources needed to finish handling the event.

Expected results: 

The Load Balancer resource in ERROR gets recreated.
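
The expected behaviour can be sketched as an "ensure" step that treats ERROR as a reason to replace the resource instead of waiting on it. This is a minimal illustration under stated assumptions, not necessarily how the fix is implemented: FakeOctavia is a hypothetical in-memory stand-in for the load balancer API, and ensure_pool, find_pool, create_pool and delete_pool are names made up for this example.

```python
class FakeOctavia:
    """Hypothetical in-memory stand-in for the load balancer API."""

    def __init__(self):
        self.pools = {}
        self._next_id = 0

    def find_pool(self, name):
        return next((p for p in self.pools.values() if p["name"] == name),
                    None)

    def create_pool(self, name):
        self._next_id += 1
        pool = {"id": "pool-%d" % self._next_id, "name": name,
                "provisioning_status": "ACTIVE"}
        self.pools[pool["id"]] = pool
        return pool

    def delete_pool(self, pool_id):
        del self.pools[pool_id]


def ensure_pool(client, name):
    """Return a usable pool, replacing one that is stuck in ERROR."""
    pool = client.find_pool(name)
    if pool is not None and pool["provisioning_status"] == "ERROR":
        # An errored pool will not recover on its own: drop it so the
        # create path below runs again.
        client.delete_pool(pool["id"])
        pool = None
    if pool is None:
        pool = client.create_pool(name)
    return pool


if __name__ == "__main__":
    client = FakeOctavia()
    first = ensure_pool(client, "test/demo:TCP:80")
    first["provisioning_status"] = "ERROR"  # simulate the failure
    second = ensure_pool(client, "test/demo:TCP:80")
    print(first["id"], "->", second["id"], second["provisioning_status"])
```

With this shape, dependent resources such as members are created against the fresh pool rather than failing repeatedly against the errored one.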

Additional info:

Comment 3 rlobillo 2020-06-02 08:54:13 UTC
Verified on OSP16 - RHOS_TRUNK-16.0-RHEL-8-20200513.n.1 and 4.5.0-0.nightly-2020-06-01-043833

1 Create the demo pods and a service in the test namespace:

(overcloud) [stack@undercloud-0 ~]$ oc run --image kuryr/demo demo
pod/demo created
(overcloud) [stack@undercloud-0 ~]$ oc run --image kuryr/demo demo-caller
pod/demo-caller created
(overcloud) [stack@undercloud-0 ~]$ oc expose pod/demo --port 80 --target-port 8080
service/demo exposed

(overcloud) [stack@undercloud-0 ~]$ oc get pods,svc -o wide --show-labels
NAME                      READY   STATUS    RESTARTS   AGE   IP               NODE                        NOMINATED NODE   READINESS GATES   LABELS
pod/demo                  1/1     Running   0          38s   10.128.115.234   ostest-k7pdd-worker-8cf4l   <none>           <none>            run=demo
pod/demo-caller           1/1     Running   0          27s   10.128.115.247   ostest-k7pdd-worker-jpsbt   <none>           <none>            run=demo-caller

NAME           TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE   SELECTOR   LABELS
service/demo   ClusterIP   172.30.202.242   <none>        80/TCP    22s   run=demo   run=demo

(overcloud) [stack@undercloud-0 ~]$ oc rsh demo-caller curl  172.30.202.242
demo: HELLO! I AM ALIVE!!!


2 Search for the LB resources:

(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer list | grep demo
| d9b5421d-366a-4dab-9d0f-7cd6a3cb7bb9 | test/demo                                                                   | 758d38e2352449eaa9d6ae554d0650e9 | 172.30.202.242 | ACTIVE              | ovn      |
(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer show d9b5421d-366a-4dab-9d0f-7cd6a3cb7bb9
+---------------------+--------------------------------------+
| Field               | Value                                |
+---------------------+--------------------------------------+
| admin_state_up      | True                                 |
| created_at          | 2020-06-01T14:25:55                  |
| description         |                                      |
| flavor_id           | None                                 |
| id                  | d9b5421d-366a-4dab-9d0f-7cd6a3cb7bb9 |
| listeners           | 9ebd024c-0872-40db-a056-5f514920b8ef |
| name                | test/demo                            |
| operating_status    | ONLINE                               |
| pools               | 80248153-89a8-4f4a-ad03-7a71d7ca5898 |
| project_id          | 758d38e2352449eaa9d6ae554d0650e9     |
| provider            | ovn                                  |
| provisioning_status | ACTIVE                               |
| updated_at          | 2020-06-01T14:26:13                  |
| vip_address         | 172.30.202.242                       |
| vip_network_id      | 0adf99a6-4b4e-4909-9537-680d4031de65 |
| vip_port_id         | 5e249dce-ca31-42c2-be80-e2fa4ac2ff35 |
| vip_qos_policy_id   | None                                 |
| vip_subnet_id       | 27dec9e5-623f-499d-a62d-512759da16cd |
+---------------------+--------------------------------------+
(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer member list 80248153-89a8-4f4a-ad03-7a71d7ca5898
+--------------------------------------+----------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+
| id                                   | name           | project_id                       | provisioning_status | address        | protocol_port | operating_status | weight |
+--------------------------------------+----------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+
| a45180e1-d15e-4ef3-9ed6-ac6a7067ac3f | test/demo:8080 | 758d38e2352449eaa9d6ae554d0650e9 | ACTIVE              | 10.128.115.234 |          8080 | NO_MONITOR       |      1 |
+--------------------------------------+----------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+

3 Set LB resources to ERROR:

MariaDB [octavia]> update member set provisioning_status='ERROR' where id='a45180e1-d15e-4ef3-9ed6-ac6a7067ac3f';
Query OK, 1 row affected (0.002 sec)
Rows matched: 1  Changed: 1  Warnings: 0

MariaDB [octavia]> update pool set provisioning_status='ERROR'  where id='80248153-89a8-4f4a-ad03-7a71d7ca5898';
Query OK, 1 row affected (0.002 sec)
Rows matched: 1  Changed: 1  Warnings: 0

MariaDB [octavia]> update listener set provisioning_status='ERROR'  where id='9ebd024c-0872-40db-a056-5f514920b8ef';
Query OK, 1 row affected (0.003 sec)
Rows matched: 1  Changed: 1  Warnings: 0

4 Check ERROR status:

(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer member list 80248153-89a8-4f4a-ad03-7a71d7ca5898
+--------------------------------------+----------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+
| id                                   | name           | project_id                       | provisioning_status | address        | protocol_port | operating_status | weight |
+--------------------------------------+----------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+
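| a45180e1-d15e-4ef3-9ed6-ac6a7067ac3f | test/demo:8080 | 758d38e2352449eaa9d6ae554d0650e9 | ERROR               | 10.128.115.234 |          8080 | NO_MONITOR       |      1 |
+--------------------------------------+----------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+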

(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer pool show 80248153-89a8-4f4a-ad03-7a71d7ca5898
+----------------------+--------------------------------------+
| Field                | Value                                |
+----------------------+--------------------------------------+
| admin_state_up       | True                                 |
| created_at           | 2020-06-01T14:26:11                  |
| description          |                                      |
| healthmonitor_id     |                                      |
| id                   | 80248153-89a8-4f4a-ad03-7a71d7ca5898 |
| lb_algorithm         | SOURCE_IP_PORT                       |
| listeners            | 9ebd024c-0872-40db-a056-5f514920b8ef |
| loadbalancers        | d9b5421d-366a-4dab-9d0f-7cd6a3cb7bb9 |
| members              | a45180e1-d15e-4ef3-9ed6-ac6a7067ac3f |
| name                 | test/demo:TCP:80                     |
| operating_status     | ONLINE                               |
| project_id           | 758d38e2352449eaa9d6ae554d0650e9     |
| protocol             | TCP                                  |
| provisioning_status  | ERROR                                |
| session_persistence  | None                                 |
| updated_at           | 2020-06-01T14:26:13                  |
| tls_container_ref    | None                                 |
| ca_tls_container_ref | None                                 |
| crl_container_ref    | None                                 |
| tls_enabled          | False                                |
+----------------------+--------------------------------------+

(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer listener show 9ebd024c-0872-40db-a056-5f514920b8ef
+-----------------------------+--------------------------------------+
| Field                       | Value                                |
+-----------------------------+--------------------------------------+
| admin_state_up              | True                                 |
| connection_limit            | -1                                   |
| created_at                  | 2020-06-01T14:26:09                  |
| default_pool_id             | 80248153-89a8-4f4a-ad03-7a71d7ca5898 |
| default_tls_container_ref   | None                                 |
| description                 |                                      |
| id                          | 9ebd024c-0872-40db-a056-5f514920b8ef |
| insert_headers              | None                                 |
| l7policies                  |                                      |
| loadbalancers               | d9b5421d-366a-4dab-9d0f-7cd6a3cb7bb9 |
| name                        | test/demo:TCP:80                     |
| operating_status            | ONLINE                               |
| project_id                  | 758d38e2352449eaa9d6ae554d0650e9     |
| protocol                    | TCP                                  |
| protocol_port               | 80                                   |
| provisioning_status         | ERROR                                |
| sni_container_refs          | []                                   |
| timeout_client_data         | 50000                                |
| timeout_member_connect      | 5000                                 |
| timeout_member_data         | 50000                                |
| timeout_tcp_inspect         | 0                                    |
| updated_at                  | 2020-06-01T14:26:13                  |
| client_ca_tls_container_ref | None                                 |
| client_authentication       | NONE                                 |
| client_crl_container_ref    | None                                 |
| allowed_cidrs               | None                                 |
+-----------------------------+--------------------------------------+
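
The status check above can also be scripted. A minimal sketch, assuming the openstack commands are run with the -f json output option and that the JSON keys match the column and field names shown in the tables; errored_resources is a hypothetical helper, and the sample record is copied from the member listing above.

```python
import json


def errored_resources(resources_json):
    """Return (id, name) for each resource whose provisioning_status is ERROR."""
    resources = json.loads(resources_json)
    if isinstance(resources, dict):
        # A "show" command yields one object, a "list" command an array.
        resources = [resources]
    return [(r["id"], r["name"]) for r in resources
            if r.get("provisioning_status") == "ERROR"]


if __name__ == "__main__":
    sample = json.dumps([{
        "id": "a45180e1-d15e-4ef3-9ed6-ac6a7067ac3f",
        "name": "test/demo:8080",
        "provisioning_status": "ERROR",
    }])
    print(errored_resources(sample))
```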



5 Trigger regeneration by generating a kuryr event: 

oc edit endpoints -n test

Remove the openstack.org/kuryr-lbaas-state annotation.
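
The manual edit can be expressed as a JSON merge patch (RFC 7386), where setting a key to null removes it. The sketch below only builds the patch and applies it to a local dict to show the effect; removal_patch and apply_merge_patch are hypothetical helpers, and the sample Endpoints object is invented for illustration. The printed JSON could plausibly be passed to oc patch with --type merge, but that was not part of the verification above.

```python
import copy
import json

STATE_ANNOTATION = "openstack.org/kuryr-lbaas-state"


def removal_patch(annotation=STATE_ANNOTATION):
    """Build a JSON merge patch that deletes one annotation."""
    return {"metadata": {"annotations": {annotation: None}}}


def apply_merge_patch(target, patch):
    """Apply a JSON merge patch to a dict and return the patched copy."""
    if not isinstance(patch, dict):
        return copy.deepcopy(patch)
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            # null in a merge patch means "remove this key".
            result.pop(key, None)
        else:
            result[key] = apply_merge_patch(result.get(key), value)
    return result


if __name__ == "__main__":
    # Invented sample object; real annotation values are much longer.
    endpoints = {"metadata": {"name": "demo", "annotations": {
        "openstack.org/kuryr-lbaas-spec": "{}",
        STATE_ANNOTATION: "{}",
    }}}
    print(json.dumps(removal_patch()))
    print(apply_merge_patch(endpoints, removal_patch()))
```

Only the state annotation is dropped; the kuryr-lbaas-spec annotation and the rest of the object are left untouched.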

6 Check new status:

(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer list | grep demo
| d9b5421d-366a-4dab-9d0f-7cd6a3cb7bb9 | test/demo                                                                   | 758d38e2352449eaa9d6ae554d0650e9 | 172.30.202.242 | ACTIVE              | ovn      |
(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer show d9b5421d-366a-4dab-9d0f-7cd6a3cb7bb9
+---------------------+--------------------------------------+
| Field               | Value                                |
+---------------------+--------------------------------------+
| admin_state_up      | True                                 |
| created_at          | 2020-06-01T14:25:55                  |
| description         |                                      |
| flavor_id           | None                                 |
| id                  | d9b5421d-366a-4dab-9d0f-7cd6a3cb7bb9 |
| listeners           | e665ca54-768c-4d5e-b9a9-5b27b9c34389 |
| name                | test/demo                            |
| operating_status    | ONLINE                               |
| pools               | 80248153-89a8-4f4a-ad03-7a71d7ca5898 |
|                     | 868bb9dc-241b-45b2-95e4-9d2cf34f87d7 |
| project_id          | 758d38e2352449eaa9d6ae554d0650e9     |
| provider            | ovn                                  |
| provisioning_status | ACTIVE                               |
| updated_at          | 2020-06-01T14:30:12                  |
| vip_address         | 172.30.202.242                       |
| vip_network_id      | 0adf99a6-4b4e-4909-9537-680d4031de65 |
| vip_port_id         | 5e249dce-ca31-42c2-be80-e2fa4ac2ff35 |
| vip_qos_policy_id   | None                                 |
| vip_subnet_id       | 27dec9e5-623f-499d-a62d-512759da16cd |
+---------------------+--------------------------------------+
(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer member list  868bb9dc-241b-45b2-95e4-9d2cf34f87d7
+--------------------------------------+----------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+
| id                                   | name           | project_id                       | provisioning_status | address        | protocol_port | operating_status | weight |
+--------------------------------------+----------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+
| 2dbc2049-fe36-4ea2-ad1f-b451759bf014 | test/demo:8080 | 758d38e2352449eaa9d6ae554d0650e9 | ACTIVE              | 10.128.115.234 |          8080 | NO_MONITOR       |      1 |
+--------------------------------------+----------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+
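
One way to confirm that a resource was recreated rather than updated in place is to compare IDs by name before and after the event. A small sketch, assuming both listings have been loaded as lists of dicts with "id" and "name" keys; replaced_ids is a hypothetical helper, and the sample values are the member IDs from steps 2 and 6.

```python
def replaced_ids(before, after):
    """Map resource name -> (old_id, new_id) for resources whose ID changed."""
    old = {r["name"]: r["id"] for r in before}
    return {r["name"]: (old[r["name"]], r["id"])
            for r in after
            if r["name"] in old and old[r["name"]] != r["id"]}


if __name__ == "__main__":
    before = [{"id": "a45180e1-d15e-4ef3-9ed6-ac6a7067ac3f",
               "name": "test/demo:8080"}]
    after = [{"id": "2dbc2049-fe36-4ea2-ad1f-b451759bf014",
              "name": "test/demo:8080"}]
    print(replaced_ids(before, after))
```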


7 Check that connectivity is working:

(overcloud) [stack@undercloud-0 ~]$ oc rsh demo-caller curl  172.30.202.242
demo: HELLO! I AM ALIVE!!!

No errors observed in kuryr-controller logs.

Comment 4 errata-xmlrpc 2020-07-13 17:41:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409