Bug 1840187 - [Kuryr] LB resources not deleted when transitioned to ERROR
Summary: [Kuryr] LB resources not deleted when transitioned to ERROR
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.5.0
Assignee: Maysa Macedo
QA Contact: GenadiC
URL:
Whiteboard:
Depends On:
Blocks: 1840611
 
Reported: 2020-05-26 14:20 UTC by Maysa Macedo
Modified: 2020-07-13 17:41 UTC (History)
1 user

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
: 1840611 (view as bug list)
Environment:
Last Closed: 2020-07-13 17:41:38 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift kuryr-kubernetes pull 245 0 None closed Bug 1840187: Ensure LB resources with ERROR status are deleted 2020-06-24 03:21:03 UTC
Red Hat Product Errata RHBA-2020:2409 0 None None None 2020-07-13 17:41:57 UTC

Description Maysa Macedo 2020-05-26 14:20:06 UTC
Description of problem:

When a Load Balancer resource (in this case a pool) transitions to ERROR status, Kuryr does not attempt to recreate it. Consequently, the creation of any resource that depends on the errored one fails, causing the controller to restart continuously.
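The expected behavior (see the linked PR) is to treat ERROR as terminal: rather than waiting for the resource to become ready, the controller should delete the errored resource and recreate it. The sketch below illustrates that idea only; the class and function names (`FakeOctavia`, `ensure_provisioned`, `create_pool`) are hypothetical stand-ins, not the actual kuryr-kubernetes or Octavia client API.

```python
# Illustrative sketch: an ensure-provisioned loop that deletes and
# recreates a pool stuck in ERROR instead of retrying until the
# controller gives up. All names here are hypothetical.

class FakeOctavia:
    """Stand-in for an Octavia client: the first pool lands in ERROR."""
    def __init__(self):
        self._next_status = "ERROR"
        self.deleted = []
        self.created = 0

    def create_pool(self):
        self.created += 1
        status = self._next_status
        self._next_status = "ACTIVE"  # recreation succeeds
        return {"id": f"pool-{self.created}", "provisioning_status": status}

    def delete_pool(self, pool_id):
        self.deleted.append(pool_id)


def ensure_provisioned(client, attempts=3):
    """Create the pool; if it transitions to ERROR, delete and retry."""
    for _ in range(attempts):
        pool = client.create_pool()
        if pool["provisioning_status"] == "ERROR":
            client.delete_pool(pool["id"])  # release the broken resource
            continue
        return pool
    raise RuntimeError("pool still not ACTIVE after retries")


client = FakeOctavia()
pool = ensure_provisioned(client)
print(pool["provisioning_status"])  # ACTIVE after one delete+recreate
```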

[stack@undercloud-0 ~]$ openstack loadbalancer pool show caea3e31-db8d-4d1a-afd3-0dc3fa02fd4d
+----------------------+-----------------------------------------------------+
| Field | Value |
+----------------------+-----------------------------------------------------+
| admin_state_up | True |
| created_at | 2020-05-16T23:37:16 |
| description | |
| healthmonitor_id | |
| id | caea3e31-db8d-4d1a-afd3-0dc3fa02fd4d |
| lb_algorithm | ROUND_ROBIN |
| listeners | 39933e05-d9c6-4ed1-9a04-e4994e3ea4d3 |
| loadbalancers | 6218ba3d-0290-4c25-a5e1-502f374f8b6c |
| members | |
| name | openshift-machine-api/machine-api-operator:TCP:8443 |
| operating_status | ONLINE |
| project_id | f6b6420743ce45ac868c102a523ffde6 |
| protocol | TCP |
| provisioning_status | ERROR |
| session_persistence | None |
| updated_at | 2020-05-16T23:37:20 |
| tls_container_ref | None |
| ca_tls_container_ref | None |
| crl_container_ref | None |
| tls_enabled | False |
+----------------------+-----------------------------------------------------+

2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging [-] Failed to handle event {'type': 'ADDED', 'object': {'kind': 'Endpoints', 'apiVersion': 'v1', 'metadata': {'name': 'machine-api-operator', 'namespace': 'openshift-machin
e-api', 'selfLink': '/api/v1/namespaces/openshift-machine-api/endpoints/machine-api-operator', 'uid': 'c87b2853-180c-40ef-8086-1946c099476b', 'resourceVersion': '25606', 'creationTimestamp': '2020-05-16T23:23:16Z', 'labels': {'k8s-app': '
machine-api-operator'}, 'annotations': {'endpoints.kubernetes.io/last-change-trigger-time': '2020-05-16T23:35:05Z', 'openstack.org/kuryr-lbaas-spec': '{"versioned_object.data": {"ip": "172.30.53.190", "lb_ip": null, "ports": [{"versioned_
object.data": {"name": "https", "port": 8443, "protocol": "TCP", "targetPort": "https"}, "versioned_object.name": "LBaaSPortSpec", "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version": "1.1"}], "project_id": "f6b64
20743ce45ac868c102a523ffde6", "security_groups_ids": [], "subnet_id": "0e811b62-6954-4459-9d02-a232e06c040c", "type": "ClusterIP"}, "versioned_object.name": "LBaaSServiceSpec", "versioned_object.namespace": "kuryr_kubernetes", "versioned_
object.version": "1.0"}'}, 'managedFields': [{'manager': 'python-requests', 'operation': 'Update', 'apiVersion': 'v1', 'time': '2020-05-16T23:26:23Z', 'fieldsType': 'FieldsV1', 'fieldsV1': {'f:metadata': {'f:annotations': {'f:openstack.or
g/kuryr-lbaas-spec': {}}}}}, {'manager': 'kube-controller-manager', 'operation': 'Update', 'apiVersion': 'v1', 'time': '2020-05-16T23:35:05Z', 'fieldsType': 'FieldsV1', 'fieldsV1': {'f:metadata': {'f:annotations': {'.': {}, 'f:endpoints.k
ubernetes.io/last-change-trigger-time': {}}, 'f:labels': {'.': {}, 'f:k8s-app': {}}}, 'f:subsets': {}}}]}, 'subsets': [{'addresses': [{'ip': '10.128.56.38', 'nodeName': 'ostest-zm4t6-master-1', 'targetRef': {'kind': 'Pod', 'namespace': 'o
penshift-machine-api', 'name': 'machine-api-operator-7dddccbf5-wcdf5', 'uid': '549d414d-9b8c-4adf-84f4-d101dbc8fe98', 'resourceVersion': '25604'}}], 'ports': [{'name': 'https', 'port': 8443, 'protocol': 'TCP'}]}]}}: kuryr_kubernetes.excep
tions.ResourceNotReady: Resource not ready: LBaaSMember(id=<?>,ip=10.128.56.38,name='openshift-machine-api/machine-api-operator-7dddccbf5-wcdf5:8443',pool_id=caea3e31-db8d-4d1a-afd3-0dc3fa02fd4d,port=8443,project_id='f6b6420743ce45ac868c1
02a523ffde6',subnet_id=0e811b62-6954-4459-9d02-a232e06c040c)
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging Traceback (most recent call last):
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/logging.py", line 37, in __call__
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging self._handler(event)
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/retry.py", line 93, in __call__
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging self._handler.set_liveness(alive=False)
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging self.force_reraise()
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging six.reraise(self.type_, self.value, self.tb)
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging raise value
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/retry.py", line 78, in __call__
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging self._handler(event)
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/k8s_base.py", line 90, in __call__
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging self.on_present(obj)
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/handlers/lbaas.py", line 190, in on_present
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging if self._sync_lbaas_members(endpoints, lbaas_state, lbaas_spec):
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/handlers/lbaas.py", line 281, in _sync_lbaas_members
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging self._add_new_members(endpoints, lbaas_state, lbaas_spec)):
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/handlers/lbaas.py", line 388, in _add_new_members
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging listener_port=listener_port)
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/drivers/lbaasv2.py", line 450, in ensure_member
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging self._find_member)
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/drivers/lbaasv2.py", line 688, in _ensure_provisioned
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging raise k_exc.ResourceNotReady(obj)
2020-05-17 17:32:25.734 1 ERROR kuryr_kubernetes.handlers.logging kuryr_kubernetes.exceptions.ResourceNotReady: Resource not ready: LBaaSMember(id=<?>,ip=10.128.56.38,name='openshift-machine-api/machine-api-operator-7dddccbf5-wcdf5:8443',
pool_id=caea3e31-db8d-4d1a-afd3-0dc3fa02fd4d,port=8443,project_id='f6b6420743ce45ac868c102a523ffde6',subnet_id=0e811b62-6954-4459-9d02-a232e06c040c)

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Transition an LB resource (pool/listener) to ERROR while the LB creation is being handled
2.
3.

Actual results:

The Kuryr controller restarts continuously because it is unable to create the remaining LB resources needed to finish handling the event.

Expected results: 

The Load Balancer resource gets recreated.

Additional info:

Comment 3 rlobillo 2020-06-02 08:54:13 UTC
Verified on OSP16 - RHOS_TRUNK-16.0-RHEL-8-20200513.n.1 and 4.5.0-0.nightly-2020-06-01-043833

1 Create pods in the test namespace and expose one as a service:

(overcloud) [stack@undercloud-0 ~]$ oc run --image kuryr/demo demo
pod/demo created
(overcloud) [stack@undercloud-0 ~]$ oc run --image kuryr/demo demo-caller
pod/demo-caller created
(overcloud) [stack@undercloud-0 ~]$ oc expose pod/demo --port 80 --target-port 8080
service/demo exposed

(overcloud) [stack@undercloud-0 ~]$ oc get pods,svc -o wide --show-labels
NAME                      READY   STATUS    RESTARTS   AGE   IP               NODE                        NOMINATED NODE   READINESS GATES   LABELS
pod/demo                  1/1     Running   0          38s   10.128.115.234   ostest-k7pdd-worker-8cf4l   <none>           <none>            run=demo
pod/demo-caller           1/1     Running   0          27s   10.128.115.247   ostest-k7pdd-worker-jpsbt   <none>           <none>            run=demo-caller

NAME           TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE   SELECTOR   LABELS
service/demo   ClusterIP   172.30.202.242   <none>        80/TCP    22s   run=demo   run=demo

(overcloud) [stack@undercloud-0 ~]$ oc rsh demo-caller curl  172.30.202.242
demo: HELLO! I AM ALIVE!!!


2 Search for the LB resources:

(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer list | grep demo
| d9b5421d-366a-4dab-9d0f-7cd6a3cb7bb9 | test/demo                                                                   | 758d38e2352449eaa9d6ae554d0650e9 | 172.30.202.242 | ACTIVE              | ovn      |
(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer show d9b5421d-366a-4dab-9d0f-7cd6a3cb7bb9
+---------------------+--------------------------------------+
| Field               | Value                                |
+---------------------+--------------------------------------+
| admin_state_up      | True                                 |
| created_at          | 2020-06-01T14:25:55                  |
| description         |                                      |
| flavor_id           | None                                 |
| id                  | d9b5421d-366a-4dab-9d0f-7cd6a3cb7bb9 |
| listeners           | 9ebd024c-0872-40db-a056-5f514920b8ef |
| name                | test/demo                            |
| operating_status    | ONLINE                               |
| pools               | 80248153-89a8-4f4a-ad03-7a71d7ca5898 |
| project_id          | 758d38e2352449eaa9d6ae554d0650e9     |
| provider            | ovn                                  |
| provisioning_status | ACTIVE                               |
| updated_at          | 2020-06-01T14:26:13                  |
| vip_address         | 172.30.202.242                       |
| vip_network_id      | 0adf99a6-4b4e-4909-9537-680d4031de65 |
| vip_port_id         | 5e249dce-ca31-42c2-be80-e2fa4ac2ff35 |
| vip_qos_policy_id   | None                                 |
| vip_subnet_id       | 27dec9e5-623f-499d-a62d-512759da16cd |
+---------------------+--------------------------------------+
(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer member list 80248153-89a8-4f4a-ad03-7a71d7ca5898
+--------------------------------------+----------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+
| id                                   | name           | project_id                       | provisioning_status | address        | protocol_port | operating_status | weight |
+--------------------------------------+----------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+
| a45180e1-d15e-4ef3-9ed6-ac6a7067ac3f | test/demo:8080 | 758d38e2352449eaa9d6ae554d0650e9 | ACTIVE              | 10.128.115.234 |          8080 | NO_MONITOR       |      1 |
+--------------------------------------+----------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+

3 Set LB resources to ERROR:

MariaDB [octavia]> update member set provisioning_status='ERROR' where id='a45180e1-d15e-4ef3-9ed6-ac6a7067ac3f';
Query OK, 1 row affected (0.002 sec)
Rows matched: 1  Changed: 1  Warnings: 0

MariaDB [octavia]> update pool set provisioning_status='ERROR'  where id='80248153-89a8-4f4a-ad03-7a71d7ca5898';
Query OK, 1 row affected (0.002 sec)
Rows matched: 1  Changed: 1  Warnings: 0

MariaDB [octavia]> update listener set provisioning_status='ERROR'  where id='9ebd024c-0872-40db-a056-5f514920b8ef';
Query OK, 1 row affected (0.003 sec)
Rows matched: 1  Changed: 1  Warnings: 0

4 Check ERROR status:

(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer member list 80248153-89a8-4f4a-ad03-7a71d7ca5898
+--------------------------------------+----------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+
| id                                   | name           | project_id                       | provisioning_status | address        | protocol_port | operating_status | weight |
+--------------------------------------+----------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+
| a45180e1-d15e-4ef3-9ed6-ac6a7067ac3f | test/demo:8080 | 758d38e2352449eaa9d6ae554d0650e9 | ERROR               | 10.128.115.234 |          8080 | NO_MONITOR       |      1 |

(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer pool show 80248153-89a8-4f4a-ad03-7a71d7ca5898
+----------------------+--------------------------------------+
| Field                | Value                                |
+----------------------+--------------------------------------+
| admin_state_up       | True                                 |
| created_at           | 2020-06-01T14:26:11                  |
| description          |                                      |
| healthmonitor_id     |                                      |
| id                   | 80248153-89a8-4f4a-ad03-7a71d7ca5898 |
| lb_algorithm         | SOURCE_IP_PORT                       |
| listeners            | 9ebd024c-0872-40db-a056-5f514920b8ef |
| loadbalancers        | d9b5421d-366a-4dab-9d0f-7cd6a3cb7bb9 |
| members              | a45180e1-d15e-4ef3-9ed6-ac6a7067ac3f |
| name                 | test/demo:TCP:80                     |
| operating_status     | ONLINE                               |
| project_id           | 758d38e2352449eaa9d6ae554d0650e9     |
| protocol             | TCP                                  |
| provisioning_status  | ERROR                                |
| session_persistence  | None                                 |
| updated_at           | 2020-06-01T14:26:13                  |
| tls_container_ref    | None                                 |
| ca_tls_container_ref | None                                 |
| crl_container_ref    | None                                 |
| tls_enabled          | False                                |
+----------------------+--------------------------------------+

(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer listener show 9ebd024c-0872-40db-a056-5f514920b8ef
+-----------------------------+--------------------------------------+
| Field                       | Value                                |
+-----------------------------+--------------------------------------+
| admin_state_up              | True                                 |
| connection_limit            | -1                                   |
| created_at                  | 2020-06-01T14:26:09                  |
| default_pool_id             | 80248153-89a8-4f4a-ad03-7a71d7ca5898 |
| default_tls_container_ref   | None                                 |
| description                 |                                      |
| id                          | 9ebd024c-0872-40db-a056-5f514920b8ef |
| insert_headers              | None                                 |
| l7policies                  |                                      |
| loadbalancers               | d9b5421d-366a-4dab-9d0f-7cd6a3cb7bb9 |
| name                        | test/demo:TCP:80                     |
| operating_status            | ONLINE                               |
| project_id                  | 758d38e2352449eaa9d6ae554d0650e9     |
| protocol                    | TCP                                  |
| protocol_port               | 80                                   |
| provisioning_status         | ERROR                                |
| sni_container_refs          | []                                   |
| timeout_client_data         | 50000                                |
| timeout_member_connect      | 5000                                 |
| timeout_member_data         | 50000                                |
| timeout_tcp_inspect         | 0                                    |
| updated_at                  | 2020-06-01T14:26:13                  |
| client_ca_tls_container_ref | None                                 |
| client_authentication       | NONE                                 |
| client_crl_container_ref    | None                                 |
| allowed_cidrs               | None                                 |
+-----------------------------+--------------------------------------+



5 Trigger regeneration by generating a Kuryr event:

oc edit endpoints -n test

Remove the openstack.org/kuryr-lbaas-state annotation.
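The same edit can be expressed as a JSON merge patch (RFC 7386), where setting the annotation to null removes it. The sketch below only builds the patch body; applying it (via `oc patch` or a Kubernetes client) is assumed and not shown, since it requires cluster access.

```python
import json

# Sketch: build the JSON merge patch that clears the Kuryr state
# annotation on the Endpoints object, so the controller re-handles
# the event. A null value removes the key under merge-patch semantics.
ANNOTATION = "openstack.org/kuryr-lbaas-state"

patch = {"metadata": {"annotations": {ANNOTATION: None}}}
body = json.dumps(patch)
print(body)
```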

6 Check new status:

(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer list | grep demo
| d9b5421d-366a-4dab-9d0f-7cd6a3cb7bb9 | test/demo                                                                   | 758d38e2352449eaa9d6ae554d0650e9 | 172.30.202.242 | ACTIVE              | ovn      |
(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer show d9b5421d-366a-4dab-9d0f-7cd6a3cb7bb9
+---------------------+--------------------------------------+
| Field               | Value                                |
+---------------------+--------------------------------------+
| admin_state_up      | True                                 |
| created_at          | 2020-06-01T14:25:55                  |
| description         |                                      |
| flavor_id           | None                                 |
| id                  | d9b5421d-366a-4dab-9d0f-7cd6a3cb7bb9 |
| listeners           | e665ca54-768c-4d5e-b9a9-5b27b9c34389 |
| name                | test/demo                            |
| operating_status    | ONLINE                               |
| pools               | 80248153-89a8-4f4a-ad03-7a71d7ca5898 |
|                     | 868bb9dc-241b-45b2-95e4-9d2cf34f87d7 |
| project_id          | 758d38e2352449eaa9d6ae554d0650e9     |
| provider            | ovn                                  |
| provisioning_status | ACTIVE                               |
| updated_at          | 2020-06-01T14:30:12                  |
| vip_address         | 172.30.202.242                       |
| vip_network_id      | 0adf99a6-4b4e-4909-9537-680d4031de65 |
| vip_port_id         | 5e249dce-ca31-42c2-be80-e2fa4ac2ff35 |
| vip_qos_policy_id   | None                                 |
| vip_subnet_id       | 27dec9e5-623f-499d-a62d-512759da16cd |
+---------------------+--------------------------------------+
(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer member list  868bb9dc-241b-45b2-95e4-9d2cf34f87d7
+--------------------------------------+----------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+
| id                                   | name           | project_id                       | provisioning_status | address        | protocol_port | operating_status | weight |
+--------------------------------------+----------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+
| 2dbc2049-fe36-4ea2-ad1f-b451759bf014 | test/demo:8080 | 758d38e2352449eaa9d6ae554d0650e9 | ACTIVE              | 10.128.115.234 |          8080 | NO_MONITOR       |      1 |
+--------------------------------------+----------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+


7 Check that connectivity is working:

(overcloud) [stack@undercloud-0 ~]$ oc rsh demo-caller curl  172.30.202.242
demo: HELLO! I AM ALIVE!!!

No errors observed in kuryr-controller logs.

Comment 4 errata-xmlrpc 2020-07-13 17:41:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

