Bug 1939961 - Kuryr controller continuously crashing with CRITICAL error "Asynchronous handler terminated abnormally"
Summary: Kuryr controller continuously crashing with CRITICAL error "Asynchronous hand...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
medium
high
Target Milestone: ---
: 3.11.z
Assignee: Maysa Macedo
QA Contact: Jon Uriarte
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-03-17 11:02 UTC by Mohammad
Modified: 2021-04-21 20:21 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-04-21 20:21:34 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github openshift kuryr-kubernetes pull 485 0 None closed Bug 1939961: Ensure LB member is updated if a conflict happens 2021-03-24 21:49:14 UTC
Red Hat Product Errata RHBA-2021:1147 0 None None None 2021-04-21 20:21:40 UTC

Description Mohammad 2021-03-17 11:02:05 UTC
Description of problem: Kuryr controller continuously crashing with CRITICAL error "Asynchronous handler terminated abnormally"


Version-Release number of selected component (if applicable): 3.11.394


How reproducible: This is hard to reproduce. Maybe a way to do it is to remove pod details in the endpoint annotations and restart the kuryr contoller so that when it starts up again it will report the same problem.

This is the error that is likely the cause of this CRITICAL error:

kuyr-controller logs:
------------------

2021-03-16 22:49:26.754 1 ERROR kuryr_kubernetes.watcher Traceback (most recent call last):
2021-03-16 22:49:26.754 1 ERROR kuryr_kubernetes.watcher   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/watcher.py", line 181, in _watch
2021-03-16 22:49:26.754 1 ERROR kuryr_kubernetes.watcher     for event in self._client.watch(path):
2021-03-16 22:49:26.754 1 ERROR kuryr_kubernetes.watcher   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/k8s_client.py", line 265, in watch
2021-03-16 22:49:26.754 1 ERROR kuryr_kubernetes.watcher     for line in response.iter_lines():
2021-03-16 22:49:26.754 1 ERROR kuryr_kubernetes.watcher   File "/usr/lib/python2.7/site-packages/requests/models.py", line 793, in iter_lines
2021-03-16 22:49:26.754 1 ERROR kuryr_kubernetes.watcher     for chunk in self.iter_content(chunk_size=chunk_size, decode_unicode=decode_unicode):
2021-03-16 22:49:26.754 1 ERROR kuryr_kubernetes.watcher   File "/usr/lib/python2.7/site-packages/requests/models.py", line 749, in generate
2021-03-16 22:49:26.754 1 ERROR kuryr_kubernetes.watcher     for chunk in self.raw.stream(chunk_size, decode_content=True):
2021-03-16 22:49:26.754 1 ERROR kuryr_kubernetes.watcher   File "/usr/lib/python2.7/site-packages/urllib3/response.py", line 572, in stream
2021-03-16 22:49:26.754 1 ERROR kuryr_kubernetes.watcher     for line in self.read_chunked(amt, decode_content=decode_content):
2021-03-16 22:49:26.754 1 ERROR kuryr_kubernetes.watcher   File "/usr/lib/python2.7/site-packages/urllib3/response.py", line 793, in read_chunked
2021-03-16 22:49:26.754 1 ERROR kuryr_kubernetes.watcher     self._original_response.close()
2021-03-16 22:49:26.754 1 ERROR kuryr_kubernetes.watcher   File "/usr/lib64/python2.7/contextlib.py", line 35, in __exit__
2021-03-16 22:49:26.754 1 ERROR kuryr_kubernetes.watcher     self.gen.throw(type, value, traceback)
2021-03-16 22:49:26.754 1 ERROR kuryr_kubernetes.watcher   File "/usr/lib/python2.7/site-packages/urllib3/response.py", line 449, in _error_catcher
2021-03-16 22:49:26.754 1 ERROR kuryr_kubernetes.watcher     raise SSLError(e)
2021-03-16 22:49:26.754 1 ERROR kuryr_kubernetes.watcher SSLError: ('timed out',)
2021-03-16 22:49:26.754 1 ERROR kuryr_kubernetes.watcher
2021-03-16 22:49:26.755 1 WARNING kuryr_kubernetes.watcher [-] Restarting(0) watching '/api/v1/namespaces'.: SSLError: ('timed out',)
2021-03-16 22:49:27.423 1 INFO kuryr_kubernetes.controller.managers.health [-] Kuryr Controller readiness verified.
2021-03-16 22:49:27.423 1 INFO werkzeug [-] XX.XX.192.214 - - [16/Mar/2021 22:49:27] "GET /ready HTTP/1.1" 200 -
2021-03-16 22:49:29.755 1 INFO kuryr_kubernetes.watcher [-] Started watching '/api/v1/namespaces'
2021-03-16 22:49:34.448 1 ERROR kuryr_kubernetes.controller.drivers.lbaasv2 [-] Error when creating member: {"debuginfo": null, "faultcode": "Client", "faultstring": "Another member on this pool is already using ip XX.XX.7.218 on protocol_port 9201"}
~~~

Comment 9 Jon Uriarte 2021-04-16 16:02:59 UTC
Verified in openshift-ansible-3.11.420 on top of OSP 13.0.15 (2021-03-24.1).

Verification steps:

+ Create a project and a pod
$ oc new-project ns1
$ oc run --image kuryr/demo pod1
deploymentconfig.apps.openshift.io/pod1 created


+ Create a service
$ oc expose dc/pod1 --port 80 --target-port 8080
service/pod1 exposed

$ oc get pods,svc
NAME               READY     STATUS    RESTARTS   AGE
pod/pod1-1-hjr62   1/1       Running   0          1h

NAME           TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
service/pod1   ClusterIP   172.30.37.46   <none>        80/TCP    1h


+ List the LB, the pool, and the pool member
$ openstack loadbalancer list
+--------------------------------------+------------------------------------------------+----------------------------------+----------------+---------------------+----------+
| id                                   | name                                           | project_id                       | vip_address    | provisioning_status | provider |
+--------------------------------------+------------------------------------------------+----------------------------------+----------------+---------------------+----------+
...
+--------------------------------------+------------------------------------------------+----------------------------------+----------------+---------------------+----------+
| 9be53f10-f613-432a-963d-1c8a8fece9c6 | ns1/pod1                                       | fb279e1b675448f79f6bda5aa86643c0 | 172.30.37.46   | ACTIVE              | amphora  |
+--------------------------------------+------------------------------------------------+----------------------------------+----------------+---------------------+----------+
$ openstack loadbalancer pool list --loadbalancer 9be53f10-f613-432a-963d-1c8a8fece9c6
+--------------------------------------+-----------------+----------------------------------+---------------------+----------+--------------+----------------+
| id                                   | name            | project_id                       | provisioning_status | protocol | lb_algorithm | admin_state_up |
+--------------------------------------+-----------------+----------------------------------+---------------------+----------+--------------+----------------+
| 68d0d610-1247-4c3b-b4b4-06cfa0cb8a2f | ns1/pod1:TCP:80 | fb279e1b675448f79f6bda5aa86643c0 | ACTIVE              | TCP      | ROUND_ROBIN  | True           |
+--------------------------------------+-----------------+----------------------------------+---------------------+----------+--------------+----------------+
$ openstack loadbalancer member list 68d0d610-1247-4c3b-b4b4-06cfa0cb8a2f
+--------------------------------------+-----------------------+----------------------------------+---------------------+--------------+---------------+------------------+--------+
| id                                   | name                  | project_id                       | provisioning_status | address      | protocol_port | operating_status | weight |
+--------------------------------------+-----------------------+----------------------------------+---------------------+--------------+---------------+------------------+--------+
| 248ab9e9-06ed-4d27-8977-166cf40a7241 | ns1/pod1-1-hjr62:8080 | fb279e1b675448f79f6bda5aa86643c0 | ACTIVE              | 10.11.13.100 |          8080 | NO_MONITOR       |      1 |
+--------------------------------------+-----------------------+----------------------------------+---------------------+--------------+---------------+------------------+--------+


+ Get the endpoint openstack.org/kuryr-lbaas-state annotation for the service (notice the member name ns1/pod1-1-hjr62:8080)

[openshift@master-0 ~]$ oc -n ns1 get endpoints pod1 -o yaml
...
    openstack.org/kuryr-lbaas-state: '{"versioned_object.data": {"listeners": [{"versioned_object.changes":
      ["id"], "versioned_object.data": {"id": "bd7773d2-a052-41e2-8dfa-0e844e1b00c4",
      "loadbalancer_id": "9be53f10-f613-432a-963d-1c8a8fece9c6", "name": "ns1/pod1:TCP:80",
      "port": 80, "project_id": "fb279e1b675448f79f6bda5aa86643c0", "protocol": "TCP"},
      "versioned_object.name": "LBaaSListener", "versioned_object.namespace": "kuryr_kubernetes",
      "versioned_object.version": "1.0"}], "loadbalancer": {"versioned_object.data":
      {"id": "9be53f10-f613-432a-963d-1c8a8fece9c6", "ip": "172.30.37.46", "name":
      "ns1/pod1", "port_id": "22bb950e-c5a0-4bca-a8ab-fe988cdd18df", "project_id":
      "fb279e1b675448f79f6bda5aa86643c0", "provider": "amphora", "security_groups":
      ["af8554fb-20e7-4b18-9829-74e391645bc0", "cb6248ab-19e5-48c1-b9bc-ef37355145da"],
      "subnet_id": "e88d0c86-3263-4602-b862-f0ca56abcb6f"}, "versioned_object.name":
      "LBaaSLoadBalancer", "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version":
      "1.3"}, "members": [{"versioned_object.changes": ["id"], "versioned_object.data":
      {"id": "248ab9e9-06ed-4d27-8977-166cf40a7241", "ip": "10.11.13.100", "name":
      "ns1/pod1-1-hjr62:8080", "pool_id": "68d0d610-1247-4c3b-b4b4-06cfa0cb8a2f",        <<<----
      "port": 8080, "project_id": "fb279e1b675448f79f6bda5aa86643c0", "subnet_id":
      "e88d0c86-3263-4602-b862-f0ca56abcb6f"}, "versioned_object.name": "LBaaSMember",
      "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version":
      "1.0"}], "pools": [{"versioned_object.changes": ["id"], "versioned_object.data":
      {"id": "68d0d610-1247-4c3b-b4b4-06cfa0cb8a2f", "listener_id": "bd7773d2-a052-41e2-8dfa-0e844e1b00c4",
      "loadbalancer_id": "9be53f10-f613-432a-963d-1c8a8fece9c6", "name": "ns1/pod1:TCP:80",
      "project_id": "fb279e1b675448f79f6bda5aa86643c0", "protocol": "TCP"}, "versioned_object.name":
      "LBaaSPool", "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version":
      "1.1"}], "service_pub_ip_info": null}, "versioned_object.name": "LBaaSState",
      "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version":
      "1.0"}'
...


+ Update the LB member name to a different name
$ openstack loadbalancer member set --name ns1/pod1-123-xxx45:8080 68d0d610-1247-4c3b-b4b4-06cfa0cb8a2f 248ab9e9-06ed-4d27-8977-166cf40a7241


+ Check the new LB member name in the pool (notice new name ns1/pod1-123-xxx45:8080)
$ openstack loadbalancer member list 68d0d610-1247-4c3b-b4b4-06cfa0cb8a2f
+--------------------------------------+-------------------------+----------------------------------+---------------------+--------------+---------------+------------------+--------+
| id                                   | name                    | project_id                       | provisioning_status | address      | protocol_port | operating_status | weight |
+--------------------------------------+-------------------------+----------------------------------+---------------------+--------------+---------------+------------------+--------+
| 248ab9e9-06ed-4d27-8977-166cf40a7241 | ns1/pod1-123-xxx45:8080 | fb279e1b675448f79f6bda5aa86643c0 | ACTIVE              | 10.11.13.100 |          8080 | NO_MONITOR       |      1 |
+--------------------------------------+-------------------------+----------------------------------+---------------------+--------------+---------------+------------------+--------+


+ Remove the endpoint openstack.org/kuryr-lbaas-state annotation (which still refers to correct LB name)
[openshift@master-0 ~]$ oc -n ns1 edit endpoints pod1
(remove openstack.org/kuryr-lbaas-state annotation)
endpoints/pod1 edited


+ Check LB member name is correctly set to original name (with target pod name) by Kuryr
$ openstack loadbalancer member list 68d0d610-1247-4c3b-b4b4-06cfa0cb8a2f
+--------------------------------------+-----------------------+----------------------------------+---------------------+--------------+---------------+------------------+--------+
| id                                   | name                  | project_id                       | provisioning_status | address      | protocol_port | operating_status | weight |
+--------------------------------------+-----------------------+----------------------------------+---------------------+--------------+---------------+------------------+--------+
| 248ab9e9-06ed-4d27-8977-166cf40a7241 | ns1/pod1-1-hjr62:8080 | fb279e1b675448f79f6bda5aa86643c0 | ACTIVE              | 10.11.13.100 |          8080 | NO_MONITOR       |      1 |
+--------------------------------------+-----------------------+----------------------------------+---------------------+--------------+---------------+------------------+--------+


+ Check the LB member is annotated in the endpoint
[openshift@master-0 ~]$ oc -n ns1 get endpoints pod1 -o yaml
...
    openstack.org/kuryr-lbaas-state: '{"versioned_object.data": {"listeners": [{"versioned_object.changes":
      ["id"], "versioned_object.data": {"id": "bd7773d2-a052-41e2-8dfa-0e844e1b00c4",
      "loadbalancer_id": "9be53f10-f613-432a-963d-1c8a8fece9c6", "name": "ns1/pod1:TCP:80",
      "port": 80, "project_id": "fb279e1b675448f79f6bda5aa86643c0", "protocol": "TCP"},
      "versioned_object.name": "LBaaSListener", "versioned_object.namespace": "kuryr_kubernetes",
      "versioned_object.version": "1.0"}], "loadbalancer": {"versioned_object.data":
      {"id": "9be53f10-f613-432a-963d-1c8a8fece9c6", "ip": "172.30.37.46", "name":
      "ns1/pod1", "port_id": "22bb950e-c5a0-4bca-a8ab-fe988cdd18df", "project_id":
      "fb279e1b675448f79f6bda5aa86643c0", "provider": "amphora", "security_groups":
      ["af8554fb-20e7-4b18-9829-74e391645bc0", "cb6248ab-19e5-48c1-b9bc-ef37355145da"],
      "subnet_id": "e88d0c86-3263-4602-b862-f0ca56abcb6f"}, "versioned_object.name":
      "LBaaSLoadBalancer", "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version":
      "1.3"}, "members": [{"versioned_object.changes": ["id", "name"], "versioned_object.data":
      {"id": "248ab9e9-06ed-4d27-8977-166cf40a7241", "ip": "10.11.13.100", "name":
      "ns1/pod1-1-hjr62:8080", "pool_id": "68d0d610-1247-4c3b-b4b4-06cfa0cb8a2f",        <<<----
      "port": 8080, "project_id": "fb279e1b675448f79f6bda5aa86643c0", "subnet_id":
      "e88d0c86-3263-4602-b862-f0ca56abcb6f"}, "versioned_object.name": "LBaaSMember",
      "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version":
      "1.0"}], "pools": [{"versioned_object.changes": ["id"], "versioned_object.data":
      {"id": "68d0d610-1247-4c3b-b4b4-06cfa0cb8a2f", "listener_id": "bd7773d2-a052-41e2-8dfa-0e844e1b00c4",
      "loadbalancer_id": "9be53f10-f613-432a-963d-1c8a8fece9c6", "name": "ns1/pod1:TCP:80",
      "project_id": "fb279e1b675448f79f6bda5aa86643c0", "protocol": "TCP"}, "versioned_object.name":
      "LBaaSPool", "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version":
      "1.1"}], "service_pub_ip_info": null}, "versioned_object.name": "LBaaSState",
      "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version":
      "1.0"}'
...


+ Check Kuryr controller pod wasn't restarted. Next error messages are expected as the result of updating the member name for an existing load balancer, pool and member:

ERROR kuryr_kubernetes.controller.drivers.lbaasv2 [-] Error when creating loadbalancer: {"debuginfo": null, "faultcode": "Server", "faultstring": "Provider 'amphora' reports error: IP address 172.30.37.46 already allocated in subnet e88d0c86-3263-4602-b862-f0ca56abcb6f\nNeutron server returns request_ids: ['req-e35ff60f-cbcb-484e-a709-90bb10957fe1']"}
ERROR kuryr_kubernetes.controller.drivers.lbaasv2 [-] Error when creating listener: {"debuginfo": null, "faultcode": "Client", "faultstring": "Another Listener on this Load Balancer is already using protocol TCP
and protocol_port 80"}
ERROR kuryr_kubernetes.controller.drivers.lbaasv2 [-] Error when creating pool: {"debuginfo": null, "faultcode": "Client", "faultstring": "This listener already has a default pool"}
ERROR kuryr_kubernetes.controller.drivers.lbaasv2 [-] Error when creating member: {"debuginfo": null, "faultcode": "Client", "faultstring": "Another member on this pool is already using ip 10.11.13.100 on protocol_port 8080"}

Comment 12 errata-xmlrpc 2021-04-21 20:21:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 3.11.420 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1147


Note You need to log in before you can comment on or make changes to this bug.