Bug 1786141 - Timeout adding member with host networking causes continuous kuryr-controller restart
Summary: Timeout adding member with host networking causes continuous kuryr-controller restart
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 4.3.0
Assignee: Maysa Macedo
QA Contact: GenadiC
URL:
Whiteboard:
Depends On: 1786140
Blocks:
Reported: 2019-12-23 16:01 UTC by Luis Tomas Bolivar
Modified: 2020-01-23 11:20 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1786140
Environment:
Last Closed: 2020-01-23 11:19:28 UTC
Target Upstream Version:
Embargoed:



Links
- Github openshift kuryr-kubernetes pull 125 (closed): [release-4.3] Bug 1786141: Ensure LB member is removed upon pod removal (last updated 2020-01-27 10:27:40 UTC)
- Red Hat Product Errata RHBA-2020:0062 (last updated 2020-01-23 11:20:00 UTC)

Description Luis Tomas Bolivar 2019-12-23 16:01:19 UTC
+++ This bug was initially created as a clone of Bug #1786140 +++

When pods that run on the host network are recreated and a service points to them, the removal of the old load balancer member is skipped: right now only the pod IP is considered, and because the pods use host networking, the IPs match. Creating the new member then fails because the pool already has a member with the same IP and port, and in the end the kuryr-controller restarts after timing out while adding that member:

2019-12-18 12:56:32.040 1 ERROR kuryr_kubernetes.handlers.logging [-] Failed to handle event {'type': 'ADDED', 'object': {'kind': 'Endpoints', 'apiVersion': 'v1', 'metadata': {'name': 'router-internal-default', 'namespace': 'openshift-ingress', 'selfLink': '/api/v1/namespaces/openshift-ingress/endpoints/router-internal-default', 'uid': '437a211f-f600-41aa-950f-ef1dbe1c086f', 'resourceVersion': '616957', 'creationTimestamp': '2019-12-16T13:55:14Z', 'labels': {'ingresscontroller.operator.openshift.io/owning-ingresscontroller': 'default'}, 'annotations': {'openstack.org/kuryr-lbaas-spec': '{"versioned_object.data": {"ip": "172.30.166.47", "lb_ip": null, "ports": [{"versioned_object.data": {"name": "http", "port": 80, "protocol": "TCP", "targetPort": "http"}, "versioned_object.name": "LBaaSPortSpec", "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version": "1.1"}, {"versioned_object.data": {"name": "https", "port": 443, "protocol": "TCP", "targetPort": "https"}, "versioned_object.name": "LBaaSPortSpec", "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version": "1.1"}, {"versioned_object.data": {"name": "metrics", "port": 1936, "protocol": "TCP", "targetPort": "1936"}, "versioned_object.name": "LBaaSPortSpec", "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version": "1.1"}], "project_id": "2e8ae5df09fe4be7a87766c4840c44a9", "security_groups_ids": ["e57be5a7-233e-4967-9f01-db93ee88cfe4"], "subnet_id": "5a3c9983-6941-49c7-a3a3-cfeefd7206ef", "type": "ClusterIP"}, "versioned_object.name": "LBaaSServiceSpec", "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version": "1.0"}', 'openstack.org/kuryr-lbaas-state': '{"versioned_object.data": {"listeners": [{"versioned_object.changes": ["id"], "versioned_object.data": {"id": "9f37a80e-3321-48bd-9747-ab593fd3c28d", "loadbalancer_id": "657bbd30-0d06-4973-bced-c805c7d6f9b4", "name": "openshift-ingress/router-internal-default:TCP:80", "port": 80, "project_id": "2e8ae5df09fe4be7a87766c4840c44a9", "protocol": "TCP"}, "versioned_object.name": "LBaaSListener", "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version": "1.0"}, {"versioned_object.changes": ["id"], "versioned_object.data": {"id": "cbdc346f-9c61-4ba3-8ece-20da0d02f24a", "loadbalancer_id": "657bbd30-0d06-4973-bced-c805c7d6f9b4", "name": "openshift-ingress/router-internal-default:TCP:443", "port": 443, "project_id": "2e8ae5df09fe4be7a87766c4840c44a9", "protocol": "TCP"}, "versioned_object.name": "LBaaSListener", "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version": "1.0"}, {"versioned_object.changes": ["id"], "versioned_object.data": {"id": "2d02e0a4-d2cc-48fe-a90c-505d2c032c03", "loadbalancer_id": "657bbd30-0d06-4973-bced-c805c7d6f9b4", "name": "openshift-ingress/router-internal-default:TCP:1936", "port": 1936, "project_id": "2e8ae5df09fe4be7a87766c4840c44a9", "protocol": "TCP"}, "versioned_object.name": "LBaaSListener", "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version": "1.0"}], "loadbalancer": {"versioned_object.data": {"id": "657bbd30-0d06-4973-bced-c805c7d6f9b4", "ip": "172.30.166.47", "name": "openshift-ingress/router-internal-default", "port_id": "aea8425e-d7a4-4643-aa89-76042f6b9527", "project_id": "2e8ae5df09fe4be7a87766c4840c44a9", "provider": "octavia", "security_groups": ["e57be5a7-233e-4967-9f01-db93ee88cfe4"], "subnet_id": "5a3c9983-6941-49c7-a3a3-cfeefd7206ef"}, "versioned_object.name": "LBaaSLoadBalancer", "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version": "1.3"}, "members": [{"versioned_object.data": {"id": "25329e4d-2465-47ad-9cb0-8d55b60cb253", "ip": "10.196.0.29", "name": "openshift-ingress/router-default-65bb9fc54f-wxx2p:80", "pool_id": "7b1a80d6-c692-45be-90fd-1e5605d1bb2c", "port": 80, "project_id": "2e8ae5df09fe4be7a87766c4840c44a9", "subnet_id": "5a3c9983-6941-49c7-a3a3-cfeefd7206ef"}, "versioned_object.name": "LBaaSMember", "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version": "1.0"}, {"versioned_object.data": {"id": "c0f4ebbe-4aa5-45bf-88c0-6307c3fba090", "ip": "10.196.0.29", "name": "openshift-ingress/router-default-65bb9fc54f-wxx2p:1936", "pool_id": "12832d99-de79-4aeb-ac48-9a9e2db74d29", "port": 1936, "project_id": "2e8ae5df09fe4be7a87766c4840c44a9", "subnet_id": "5a3c9983-6941-49c7-a3a3-cfeefd7206ef"}, "versioned_object.name": "LBaaSMember", "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version": "1.0"}, {"versioned_object.data": {"id": "59b91e5e-6406-4985-bf06-99ee5fd68c3b", "ip": "10.196.0.29", "name": "openshift-ingress/router-default-65bb9fc54f-wxx2p:443", "pool_id": "42e6575a-b823-4bb3-9251-ac3bf19be8fc", "port": 443, "project_id": "2e8ae5df09fe4be7a87766c4840c44a9", "subnet_id": "5a3c9983-6941-49c7-a3a3-cfeefd7206ef"}, "versioned_object.name": "LBaaSMember", "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version": "1.0"}], "pools": [{"versioned_object.changes": ["id"], "versioned_object.data": {"id": "7b1a80d6-c692-45be-90fd-1e5605d1bb2c", "listener_id": "9f37a80e-3321-48bd-9747-ab593fd3c28d", "loadbalancer_id": "657bbd30-0d06-4973-bced-c805c7d6f9b4", "name": "openshift-ingress/router-internal-default:TCP:80", "project_id": "2e8ae5df09fe4be7a87766c4840c44a9", "protocol": "TCP"}, "versioned_object.name": "LBaaSPool", "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version": "1.1"}, {"versioned_object.changes": ["id"], "versioned_object.data": {"id": "42e6575a-b823-4bb3-9251-ac3bf19be8fc", "listener_id": "cbdc346f-9c61-4ba3-8ece-20da0d02f24a", "loadbalancer_id": "657bbd30-0d06-4973-bced-c805c7d6f9b4", "name": "openshift-ingress/router-internal-default:TCP:443", "project_id": "2e8ae5df09fe4be7a87766c4840c44a9", "protocol": "TCP"}, "versioned_object.name": "LBaaSPool", "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version": "1.1"}, {"versioned_object.changes": ["id"], "versioned_object.data": {"id": "12832d99-de79-4aeb-ac48-9a9e2db74d29", "listener_id": "2d02e0a4-d2cc-48fe-a90c-505d2c032c03", "loadbalancer_id": "657bbd30-0d06-4973-bced-c805c7d6f9b4", "name": "openshift-ingress/router-internal-default:TCP:1936", "project_id": "2e8ae5df09fe4be7a87766c4840c44a9", "protocol": "TCP"}, "versioned_object.name": "LBaaSPool", "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version": "1.1"}], "service_pub_ip_info": null}, "versioned_object.name": "LBaaSState", "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version": "1.0"}'}}, 'subsets': [{'addresses': [{'ip': '10.196.0.18', 'nodeName': 'ostest-2sf2t-worker-6vfqx', 'targetRef': {'kind': 'Pod', 'namespace': 'openshift-ingress', 'name': 'router-default-65bb9fc54f-8cwlf', 'uid': 'c3bc3998-9235-4f62-84c7-ce5d1dd6a195', 'resourceVersion': '76385'}}, {'ip': '10.196.0.29', 'nodeName': 'ostest-2sf2t-worker-5r9z9', 'targetRef': {'kind': 'Pod', 'namespace': 'openshift-ingress', 'name': 'router-default-65bb9fc54f-wxx2p', 'uid': '9a145551-932a-4343-b73d-16461f3419f8', 'resourceVersion': '32199'}}], 'ports': [{'name': 'http', 'port': 80, 'protocol': 'TCP'}, {'name': 'metrics', 'port': 1936, 'protocol': 'TCP'}, {'name': 'https', 'port': 443, 'protocol': 'TCP'}]}]}}: kuryr_kubernetes.exceptions.ResourceNotReady: Resource not ready: LBaaSMember(id=<?>,ip=10.196.0.18,name='openshift-ingress/router-default-65bb9fc54f-8cwlf:80',pool_id=7b1a80d6-c692-45be-90fd-1e5605d1bb2c,port=80,project_id='2e8ae5df09fe4be7a87766c4840c44a9',subnet_id=5a3c9983-6941-49c7-a3a3-cfeefd7206ef)
2019-12-18 12:56:32.040 1 ERROR kuryr_kubernetes.handlers.logging Traceback (most recent call last):
2019-12-18 12:56:32.040 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/logging.py", line 37, in __call__
2019-12-18 12:56:32.040 1 ERROR kuryr_kubernetes.handlers.logging self._handler(event)
2019-12-18 12:56:32.040 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/retry.py", line 90, in __call__
2019-12-18 12:56:32.040 1 ERROR kuryr_kubernetes.handlers.logging self._handler.set_liveness(alive=False)
2019-12-18 12:56:32.040 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-12-18 12:56:32.040 1 ERROR kuryr_kubernetes.handlers.logging self.force_reraise()
2019-12-18 12:56:32.040 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-12-18 12:56:32.040 1 ERROR kuryr_kubernetes.handlers.logging six.reraise(self.type_, self.value, self.tb)
2019-12-18 12:56:32.040 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
2019-12-18 12:56:32.040 1 ERROR kuryr_kubernetes.handlers.logging raise value
2019-12-18 12:56:32.040 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/retry.py", line 78, in __call__
2019-12-18 12:56:32.040 1 ERROR kuryr_kubernetes.handlers.logging self._handler(event)
2019-12-18 12:56:32.040 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/k8s_base.py", line 75, in __call__
2019-12-18 12:56:32.040 1 ERROR kuryr_kubernetes.handlers.logging self.on_present(obj)
2019-12-18 12:56:32.040 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/handlers/lbaas.py", line 183, in on_present
2019-12-18 12:56:32.040 1 ERROR kuryr_kubernetes.handlers.logging if self._sync_lbaas_members(endpoints, lbaas_state, lbaas_spec):
2019-12-18 12:56:32.040 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/handlers/lbaas.py", line 274, in _sync_lbaas_members
2019-12-18 12:56:32.040 1 ERROR kuryr_kubernetes.handlers.logging self._add_new_members(endpoints, lbaas_state, lbaas_spec)):
2019-12-18 12:56:32.040 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/handlers/lbaas.py", line 394, in _add_new_members
2019-12-18 12:56:32.040 1 ERROR kuryr_kubernetes.handlers.logging listener_port=listener_port)
2019-12-18 12:56:32.040 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/drivers/lbaasv2.py", line 494, in ensure_member
2019-12-18 12:56:32.040 1 ERROR kuryr_kubernetes.handlers.logging self._find_member)
2019-12-18 12:56:32.040 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/drivers/lbaasv2.py", line 723, in _ensure_provisioned
2019-12-18 12:56:32.040 1 ERROR kuryr_kubernetes.handlers.logging raise k_exc.ResourceNotReady(obj)
2019-12-18 12:56:32.040 1 ERROR kuryr_kubernetes.handlers.logging kuryr_kubernetes.exceptions.ResourceNotReady: Resource not ready: LBaaSMember(id=<?>,ip=10.196.0.18,name='openshift-ingress/router-default-65bb9fc54f-8cwlf:80',pool_id=7b1a80d6-c692-45be-90fd-1e5605d1bb2c,port=80,project_id='2e8ae5df09fe4be7a87766c4840c44a9',subnet_id=5a3c9983-6941-49c7-a3a3-cfeefd7206ef)
2019-12-18 12:56:32.040 1 ERROR kuryr_kubernetes.handlers.logging
2019-12-18 12:56:36.350 1 INFO werkzeug [-] 10.196.0.30 - - [18/Dec/2019 12:56:36] "GET /alive HTTP/1.1" 500 -

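The traceback also shows why the error becomes a restart loop rather than a single failure: the retry handler gives up, calls `set_liveness(alive=False)`, and the next `GET /alive` probe returns 500. The sketch below models that path with hypothetical stand-ins; `ResourceNotReady`, `Liveness` and `handle_with_retry` are simplified illustrations, not the kuryr classes, and the timeout values are arbitrary. Presumably, because the stale member is never removed, the restarted controller processes the same Endpoints again and hits the same conflict.

```python
import time
from typing import Callable


class ResourceNotReady(Exception):
    """Hypothetical stand-in for kuryr_kubernetes.exceptions.ResourceNotReady."""


class Liveness:
    """Hypothetical liveness flag backing an /alive health endpoint."""

    def __init__(self) -> None:
        self.alive = True

    def status_code(self) -> int:
        # A 500 makes the liveness probe fail; repeated failures restart the container.
        return 200 if self.alive else 500


def handle_with_retry(handler: Callable[[], None], liveness: Liveness,
                      timeout: float = 1.0, interval: float = 0.1) -> None:
    """Retry `handler` on ResourceNotReady until `timeout` seconds have passed,
    then mark the process as not alive and re-raise the last error."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            handler()
            return
        except ResourceNotReady:
            if time.monotonic() >= deadline:
                liveness.alive = False
                raise
            time.sleep(interval)


def always_conflicts() -> None:
    # Simulates ensure_member never succeeding because an IP:port member already exists.
    raise ResourceNotReady("LBaaSMember 10.196.0.18:80")


liveness = Liveness()
try:
    handle_with_retry(always_conflicts, liveness, timeout=0.05, interval=0.01)
except ResourceNotReady:
    pass
print(liveness.status_code())  # 500
```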
Comment 2 Itzik Brown 2020-01-05 08:48:43 UTC
Hi,
Please add information about how to verify.

Comment 3 Luis Tomas Bolivar 2020-01-07 08:11:25 UTC
Remove a host-networking pod that backs a service (such as the router pods) and check that the new member is added correctly in place of the previous one. Also check that the kuryr-controller does not log the error message above.
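One way to script this check is to compare the member names expected from the service's Endpoints with the names actually present across the load balancer's pools. The helper below is hypothetical, not part of kuryr-kubernetes or the QA procedure used here. It assumes the `<namespace>/<pod>:<port>` member naming seen in the log, and that the inputs were already fetched, for example with `oc get endpoints <name> -n <namespace> -o json` and `openstack loadbalancer member list <pool>` for each pool.

```python
from typing import Dict, Iterable, Set, Tuple


def expected_member_names(endpoints: Dict) -> Set[str]:
    """Build "<namespace>/<pod>:<port>" names for every pod-backed address/port pair."""
    names = set()
    for subset in endpoints.get("subsets", []):
        for address in subset.get("addresses", []):
            ref = address.get("targetRef") or {}
            if ref.get("kind") != "Pod":
                continue  # this sketch only checks pod-backed addresses
            for port in subset.get("ports", []):
                names.add(f"{ref['namespace']}/{ref['name']}:{port['port']}")
    return names


def diff_members(endpoints: Dict, member_names: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (missing, leftover): members that should exist but do not, and
    members that no longer correspond to any current pod."""
    expected = expected_member_names(endpoints)
    actual = set(member_names)
    return expected - actual, actual - expected


# Hypothetical state right after deleting "router-old": its member is still
# in the pool and the replacement "router-new" has no member yet.
endpoints = {"subsets": [{
    "addresses": [{"ip": "10.196.0.18",
                   "targetRef": {"kind": "Pod", "namespace": "openshift-ingress",
                                 "name": "router-new"}}],
    "ports": [{"name": "http", "port": 80, "protocol": "TCP"}],
}]}
missing, leftover = diff_members(endpoints, ["openshift-ingress/router-old:80"])
print(missing)   # {'openshift-ingress/router-new:80'}
print(leftover)  # {'openshift-ingress/router-old:80'}
```

With the fix in place, both sets should become empty once the controller has reconciled the service.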

Comment 4 Itzik Brown 2020-01-07 15:14:35 UTC
Removed one of the router pods and did not see any error message like the one above.
OSP13
OCP 4.3.0-0.nightly-2020-01-06-185654

Comment 6 errata-xmlrpc 2020-01-23 11:19:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062

