Description of problem:

Kuryr controller continuously crashing with CRITICAL error "Asynchronous handler terminated abnormally".

Version-Release number of selected component (if applicable):

3.11.394

How reproducible:

Hard to reproduce. One possible way is to remove the pod details from the endpoint annotations and restart the kuryr-controller, so that when it starts up again it reports the same problem.

The error below is the likely cause of the CRITICAL error.

kuryr-controller logs:
------------------
2021-03-16 22:49:26.754 1 ERROR kuryr_kubernetes.watcher Traceback (most recent call last):
2021-03-16 22:49:26.754 1 ERROR kuryr_kubernetes.watcher   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/watcher.py", line 181, in _watch
2021-03-16 22:49:26.754 1 ERROR kuryr_kubernetes.watcher     for event in self._client.watch(path):
2021-03-16 22:49:26.754 1 ERROR kuryr_kubernetes.watcher   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/k8s_client.py", line 265, in watch
2021-03-16 22:49:26.754 1 ERROR kuryr_kubernetes.watcher     for line in response.iter_lines():
2021-03-16 22:49:26.754 1 ERROR kuryr_kubernetes.watcher   File "/usr/lib/python2.7/site-packages/requests/models.py", line 793, in iter_lines
2021-03-16 22:49:26.754 1 ERROR kuryr_kubernetes.watcher     for chunk in self.iter_content(chunk_size=chunk_size, decode_unicode=decode_unicode):
2021-03-16 22:49:26.754 1 ERROR kuryr_kubernetes.watcher   File "/usr/lib/python2.7/site-packages/requests/models.py", line 749, in generate
2021-03-16 22:49:26.754 1 ERROR kuryr_kubernetes.watcher     for chunk in self.raw.stream(chunk_size, decode_content=True):
2021-03-16 22:49:26.754 1 ERROR kuryr_kubernetes.watcher   File "/usr/lib/python2.7/site-packages/urllib3/response.py", line 572, in stream
2021-03-16 22:49:26.754 1 ERROR kuryr_kubernetes.watcher     for line in self.read_chunked(amt, decode_content=decode_content):
2021-03-16 22:49:26.754 1 ERROR kuryr_kubernetes.watcher   File "/usr/lib/python2.7/site-packages/urllib3/response.py", line 793, in read_chunked
2021-03-16 22:49:26.754 1 ERROR kuryr_kubernetes.watcher     self._original_response.close()
2021-03-16 22:49:26.754 1 ERROR kuryr_kubernetes.watcher   File "/usr/lib64/python2.7/contextlib.py", line 35, in __exit__
2021-03-16 22:49:26.754 1 ERROR kuryr_kubernetes.watcher     self.gen.throw(type, value, traceback)
2021-03-16 22:49:26.754 1 ERROR kuryr_kubernetes.watcher   File "/usr/lib/python2.7/site-packages/urllib3/response.py", line 449, in _error_catcher
2021-03-16 22:49:26.754 1 ERROR kuryr_kubernetes.watcher     raise SSLError(e)
2021-03-16 22:49:26.754 1 ERROR kuryr_kubernetes.watcher SSLError: ('timed out',)
2021-03-16 22:49:26.754 1 ERROR kuryr_kubernetes.watcher
2021-03-16 22:49:26.755 1 WARNING kuryr_kubernetes.watcher [-] Restarting(0) watching '/api/v1/namespaces'.: SSLError: ('timed out',)
2021-03-16 22:49:27.423 1 INFO kuryr_kubernetes.controller.managers.health [-] Kuryr Controller readiness verified.
2021-03-16 22:49:27.423 1 INFO werkzeug [-] XX.XX.192.214 - - [16/Mar/2021 22:49:27] "GET /ready HTTP/1.1" 200 -
2021-03-16 22:49:29.755 1 INFO kuryr_kubernetes.watcher [-] Started watching '/api/v1/namespaces'
2021-03-16 22:49:34.448 1 ERROR kuryr_kubernetes.controller.drivers.lbaasv2 [-] Error when creating member: {"debuginfo": null, "faultcode": "Client", "faultstring": "Another member on this pool is already using ip XX.XX.7.218 on protocol_port 9201"}
~~~
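The traceback above shows the watcher's HTTP chunked stream raising SSLError('timed out') mid-iteration; the subsequent "Restarting(0) watching" line shows the expected behavior, namely that the controller logs a warning and restarts the watch rather than letting the handler terminate abnormally. As a rough illustration of that restart-on-transient-error pattern (this is a minimal sketch, not Kuryr's actual implementation; `read_events`, `handle`, and `max_restarts` are hypothetical names):

```python
# Minimal sketch of a restart-on-error watch loop (illustrative only;
# not Kuryr's actual code). `read_events` stands in for the HTTP
# chunked stream, which may raise mid-iteration, e.g. an SSL timeout.
def watch(read_events, handle, max_restarts=3):
    restarts = 0
    while restarts <= max_restarts:
        try:
            for event in read_events():
                handle(event)
            return True  # stream ended cleanly
        except IOError as exc:
            # Transient stream failure: log a warning and restart the
            # watch instead of crashing the asynchronous handler.
            print("Restarting(%d) watching: %s" % (restarts, exc))
            restarts += 1
    return False  # gave up after too many consecutive failures
```

The point of the pattern is that a timed-out stream is an expected, recoverable condition; only repeated failures should surface as a hard error.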
Verified in openshift-ansible-3.11.420 on top of OSP 13.0.15 (2021-03-24.1).

Verification steps:

+ Create a project and a pod

$ oc new-project ns1
$ oc run --image kuryr/demo pod1
deploymentconfig.apps.openshift.io/pod1 created

+ Create a service

$ oc expose dc/pod1 --port 80 --target-port 8080
service/pod1 exposed

$ oc get pods,svc
NAME               READY   STATUS    RESTARTS   AGE
pod/pod1-1-hjr62   1/1     Running   0          1h

NAME           TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
service/pod1   ClusterIP   172.30.37.46   <none>        80/TCP    1h

+ List the LB, the pool, and the pool member

$ openstack loadbalancer list
+--------------------------------------+----------+----------------------------------+--------------+---------------------+----------+
| id                                   | name     | project_id                       | vip_address  | provisioning_status | provider |
+--------------------------------------+----------+----------------------------------+--------------+---------------------+----------+
...
| 9be53f10-f613-432a-963d-1c8a8fece9c6 | ns1/pod1 | fb279e1b675448f79f6bda5aa86643c0 | 172.30.37.46 | ACTIVE              | amphora  |
+--------------------------------------+----------+----------------------------------+--------------+---------------------+----------+

$ openstack loadbalancer pool list --loadbalancer 9be53f10-f613-432a-963d-1c8a8fece9c6
+--------------------------------------+-----------------+----------------------------------+---------------------+----------+--------------+----------------+
| id                                   | name            | project_id                       | provisioning_status | protocol | lb_algorithm | admin_state_up |
+--------------------------------------+-----------------+----------------------------------+---------------------+----------+--------------+----------------+
| 68d0d610-1247-4c3b-b4b4-06cfa0cb8a2f | ns1/pod1:TCP:80 | fb279e1b675448f79f6bda5aa86643c0 | ACTIVE              | TCP      | ROUND_ROBIN  | True           |
+--------------------------------------+-----------------+----------------------------------+---------------------+----------+--------------+----------------+

$ openstack loadbalancer member list 68d0d610-1247-4c3b-b4b4-06cfa0cb8a2f
+--------------------------------------+-----------------------+----------------------------------+---------------------+--------------+---------------+------------------+--------+
| id                                   | name                  | project_id                       | provisioning_status | address      | protocol_port | operating_status | weight |
+--------------------------------------+-----------------------+----------------------------------+---------------------+--------------+---------------+------------------+--------+
| 248ab9e9-06ed-4d27-8977-166cf40a7241 | ns1/pod1-1-hjr62:8080 | fb279e1b675448f79f6bda5aa86643c0 | ACTIVE              | 10.11.13.100 | 8080          | NO_MONITOR       | 1      |
+--------------------------------------+-----------------------+----------------------------------+---------------------+--------------+---------------+------------------+--------+

+ Get the endpoint openstack.org/kuryr-lbaas-state annotation for the service (notice the member name ns1/pod1-1-hjr62:8080)

[openshift@master-0 ~]$ oc -n ns1 get endpoints pod1 -o yaml
...
openstack.org/kuryr-lbaas-state: '{"versioned_object.data":
{"listeners": [{"versioned_object.changes": ["id"], "versioned_object.data": {"id": "bd7773d2-a052-41e2-8dfa-0e844e1b00c4", "loadbalancer_id": "9be53f10-f613-432a-963d-1c8a8fece9c6", "name": "ns1/pod1:TCP:80", "port": 80, "project_id": "fb279e1b675448f79f6bda5aa86643c0", "protocol": "TCP"}, "versioned_object.name": "LBaaSListener", "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version": "1.0"}],
"loadbalancer": {"versioned_object.data": {"id": "9be53f10-f613-432a-963d-1c8a8fece9c6", "ip": "172.30.37.46", "name": "ns1/pod1", "port_id": "22bb950e-c5a0-4bca-a8ab-fe988cdd18df", "project_id": "fb279e1b675448f79f6bda5aa86643c0", "provider": "amphora", "security_groups": ["af8554fb-20e7-4b18-9829-74e391645bc0", "cb6248ab-19e5-48c1-b9bc-ef37355145da"], "subnet_id": "e88d0c86-3263-4602-b862-f0ca56abcb6f"}, "versioned_object.name": "LBaaSLoadBalancer", "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version": "1.3"},
"members": [{"versioned_object.changes": ["id"], "versioned_object.data": {"id": "248ab9e9-06ed-4d27-8977-166cf40a7241", "ip": "10.11.13.100", "name": "ns1/pod1-1-hjr62:8080", "pool_id": "68d0d610-1247-4c3b-b4b4-06cfa0cb8a2f", <<<---- "port": 8080, "project_id": "fb279e1b675448f79f6bda5aa86643c0", "subnet_id": "e88d0c86-3263-4602-b862-f0ca56abcb6f"}, "versioned_object.name": "LBaaSMember", "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version": "1.0"}],
"pools": [{"versioned_object.changes": ["id"], "versioned_object.data": {"id": "68d0d610-1247-4c3b-b4b4-06cfa0cb8a2f", "listener_id": "bd7773d2-a052-41e2-8dfa-0e844e1b00c4", "loadbalancer_id": "9be53f10-f613-432a-963d-1c8a8fece9c6", "name": "ns1/pod1:TCP:80", "project_id": "fb279e1b675448f79f6bda5aa86643c0", "protocol": "TCP"}, "versioned_object.name": "LBaaSPool", "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version": "1.1"}],
"service_pub_ip_info": null}, "versioned_object.name": "LBaaSState", "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version": "1.0"}'
...

+ Update the LB member name to a different name

$ openstack loadbalancer member set --name ns1/pod1-123-xxx45:8080 68d0d610-1247-4c3b-b4b4-06cfa0cb8a2f 248ab9e9-06ed-4d27-8977-166cf40a7241

+ Check the new LB member name in the pool (notice the new name ns1/pod1-123-xxx45:8080)

$ openstack loadbalancer member list 68d0d610-1247-4c3b-b4b4-06cfa0cb8a2f
+--------------------------------------+-------------------------+----------------------------------+---------------------+--------------+---------------+------------------+--------+
| id                                   | name                    | project_id                       | provisioning_status | address      | protocol_port | operating_status | weight |
+--------------------------------------+-------------------------+----------------------------------+---------------------+--------------+---------------+------------------+--------+
| 248ab9e9-06ed-4d27-8977-166cf40a7241 | ns1/pod1-123-xxx45:8080 | fb279e1b675448f79f6bda5aa86643c0 | ACTIVE              | 10.11.13.100 | 8080          | NO_MONITOR       | 1      |
+--------------------------------------+-------------------------+----------------------------------+---------------------+--------------+---------------+------------------+--------+

+ Remove the endpoint openstack.org/kuryr-lbaas-state annotation (which still contains the original member name)

[openshift@master-0 ~]$ oc -n ns1 edit endpoints pod1
(remove openstack.org/kuryr-lbaas-state annotation)
endpoints/pod1 edited

+ Check the LB member name is correctly reset to the original name (with the target pod name) by Kuryr

$ openstack loadbalancer member list 68d0d610-1247-4c3b-b4b4-06cfa0cb8a2f
+--------------------------------------+-----------------------+----------------------------------+---------------------+--------------+---------------+------------------+--------+
| id                                   | name                  | project_id                       | provisioning_status | address      | protocol_port | operating_status | weight |
+--------------------------------------+-----------------------+----------------------------------+---------------------+--------------+---------------+------------------+--------+
| 248ab9e9-06ed-4d27-8977-166cf40a7241 | ns1/pod1-1-hjr62:8080 | fb279e1b675448f79f6bda5aa86643c0 | ACTIVE              | 10.11.13.100 | 8080          | NO_MONITOR       | 1      |
+--------------------------------------+-----------------------+----------------------------------+---------------------+--------------+---------------+------------------+--------+

+ Check the LB member is annotated in the endpoint

[openshift@master-0 ~]$ oc -n ns1 get endpoints pod1 -o yaml
...
openstack.org/kuryr-lbaas-state: '{"versioned_object.data":
{"listeners": [{"versioned_object.changes": ["id"], "versioned_object.data": {"id": "bd7773d2-a052-41e2-8dfa-0e844e1b00c4", "loadbalancer_id": "9be53f10-f613-432a-963d-1c8a8fece9c6", "name": "ns1/pod1:TCP:80", "port": 80, "project_id": "fb279e1b675448f79f6bda5aa86643c0", "protocol": "TCP"}, "versioned_object.name": "LBaaSListener", "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version": "1.0"}],
"loadbalancer": {"versioned_object.data": {"id": "9be53f10-f613-432a-963d-1c8a8fece9c6", "ip": "172.30.37.46", "name": "ns1/pod1", "port_id": "22bb950e-c5a0-4bca-a8ab-fe988cdd18df", "project_id": "fb279e1b675448f79f6bda5aa86643c0", "provider": "amphora", "security_groups": ["af8554fb-20e7-4b18-9829-74e391645bc0", "cb6248ab-19e5-48c1-b9bc-ef37355145da"], "subnet_id": "e88d0c86-3263-4602-b862-f0ca56abcb6f"}, "versioned_object.name": "LBaaSLoadBalancer", "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version": "1.3"},
"members": [{"versioned_object.changes": ["id", "name"], "versioned_object.data": {"id": "248ab9e9-06ed-4d27-8977-166cf40a7241", "ip": "10.11.13.100", "name": "ns1/pod1-1-hjr62:8080", "pool_id": "68d0d610-1247-4c3b-b4b4-06cfa0cb8a2f", <<<---- "port": 8080, "project_id": "fb279e1b675448f79f6bda5aa86643c0", "subnet_id": "e88d0c86-3263-4602-b862-f0ca56abcb6f"}, "versioned_object.name": "LBaaSMember", "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version": "1.0"}],
"pools": [{"versioned_object.changes": ["id"], "versioned_object.data": {"id": "68d0d610-1247-4c3b-b4b4-06cfa0cb8a2f", "listener_id": "bd7773d2-a052-41e2-8dfa-0e844e1b00c4", "loadbalancer_id": "9be53f10-f613-432a-963d-1c8a8fece9c6", "name": "ns1/pod1:TCP:80", "project_id": "fb279e1b675448f79f6bda5aa86643c0", "protocol": "TCP"}, "versioned_object.name": "LBaaSPool", "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version": "1.1"}],
"service_pub_ip_info": null}, "versioned_object.name": "LBaaSState", "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version": "1.0"}'
...

+ Check that the Kuryr controller pod wasn't restarted. The following error messages are expected as the result of updating the member name for an existing load balancer, pool, and member:

ERROR kuryr_kubernetes.controller.drivers.lbaasv2 [-] Error when creating loadbalancer: {"debuginfo": null, "faultcode": "Server", "faultstring": "Provider 'amphora' reports error: IP address 172.30.37.46 already allocated in subnet e88d0c86-3263-4602-b862-f0ca56abcb6f\nNeutron server returns request_ids: ['req-e35ff60f-cbcb-484e-a709-90bb10957fe1']"}
ERROR kuryr_kubernetes.controller.drivers.lbaasv2 [-] Error when creating listener: {"debuginfo": null, "faultcode": "Client", "faultstring": "Another Listener on this Load Balancer is already using protocol TCP and protocol_port 80"}
ERROR kuryr_kubernetes.controller.drivers.lbaasv2 [-] Error when creating pool: {"debuginfo": null, "faultcode": "Client", "faultstring": "This listener already has a default pool"}
ERROR kuryr_kubernetes.controller.drivers.lbaasv2 [-] Error when creating member: {"debuginfo": null, "faultcode": "Client", "faultstring": "Another member on this pool is already using ip 10.11.13.100 on protocol_port 8080"}
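The openstack.org/kuryr-lbaas-state annotation checked throughout the steps above is a serialized oslo versioned object; the verification hinges on the member name recorded under "members" being restored once Kuryr reconciles. As a small sketch of reading that recorded name back out (assuming only the JSON layout visible in the annotation dumps above; `member_names` is a hypothetical helper, not a Kuryr function):

```python
import json

# Sketch: extract the pool-member names recorded in an
# openstack.org/kuryr-lbaas-state annotation value. The key layout
# ("versioned_object.data" wrappers around each object) follows the
# annotation shown above; this is illustrative, not Kuryr code.
def member_names(annotation_value):
    state = json.loads(annotation_value)["versioned_object.data"]
    return [member["versioned_object.data"]["name"]
            for member in state.get("members", [])]
```

Comparing the names this returns against `openstack loadbalancer member list` output is essentially what the verification steps above do by eye.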
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 3.11.420 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1147