Bug 1933880
| Field | Value |
|---|---|
| Summary | Kuryr-Controller crashes when it's missing the status object |
| Product | OpenShift Container Platform |
| Component | Networking |
| Networking sub component | kuryr |
| Status | CLOSED ERRATA |
| Severity | medium |
| Priority | medium |
| Version | 4.7 |
| Target Release | 4.8.0 |
| Reporter | Brendan Shephard <bshephar> |
| Assignee | Michał Dulko <mdulko> |
| QA Contact | rlobillo |
| CC | juriarte, mdulko, openshift-bugzilla-robot, pmannidi |
| Keywords | Triaged |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | No Doc Update |
| Type | Bug |
| Bug Blocks | 1949540, 1949541 |
| Last Closed | 2021-07-27 22:48:44 UTC |
Description
Brendan Shephard
2021-03-01 22:38:04 UTC
Putting this on medium sev/prio as we have an easy workaround - just make sure to put `{}` as status if you want to clear it.

*** Bug 1949540 has been marked as a duplicate of this bug. ***

Failed on OCP4.8.0-0.nightly-2021-04-17-044339 over OSP16.1 (RHOS-16.1-RHEL-8-20210323.n.0) with OVN-Octavia. After performing the procedure to replace a load balancer, the kuryr-controller got restarted. Given this klb:

```yaml
apiVersion: openstack.org/v1
kind: KuryrLoadBalancer
metadata:
  creationTimestamp: "2021-04-17T09:56:51Z"
  finalizers:
  - kuryr.openstack.org/kuryrloadbalancer-finalizers
  generation: 890
  name: demo
  namespace: demo
  resourceVersion: "4498869"
  uid: 3da6ac76-469d-4e9f-a8f8-dbec06f6f516
spec:
  endpointSlices:
  - endpoints:
    - addresses:
      - 10.128.124.60
      conditions:
        ready: true
      targetRef:
        kind: Pod
        name: demo-7897db69cc-jwc42
        namespace: demo
        resourceVersion: "1798942"
        uid: 3031075f-8ace-4410-90a0-de46baff5383
    - addresses:
      - 10.128.124.75
      conditions:
        ready: true
      targetRef:
        kind: Pod
        name: demo-7897db69cc-84k96
        namespace: demo
        resourceVersion: "1799470"
        uid: 0a63bd82-9aa1-4188-b86f-2f09909a61df
    - addresses:
      - 10.128.125.112
      conditions:
        ready: true
      targetRef:
        kind: Pod
        name: demo-7897db69cc-gckm8
        namespace: demo
        resourceVersion: "1799771"
        uid: 782b581a-ef0b-4af4-87f1-3cd1a991c304
    ports:
    - port: 8080
      protocol: TCP
  ip: 172.30.176.102
  ports:
  - port: 80
    protocol: TCP
    targetPort: "8080"
  project_id: b3a48f657fc144e18838c3dc5db2fac6
  provider: ovn
  security_groups_ids:
  - 3121fad5-3d32-4c8b-a205-f8f6cbe316e4
  subnet_id: 49fbfb3b-f432-4f80-8286-cab626764e85
  timeout_client_data: 0
  timeout_member_data: 0
  type: ClusterIP
status:
  listeners:
  - id: fb2df0af-bdbc-48b1-a6a1-b24e64a6c5fa
    loadbalancer_id: 0d194642-1189-4a81-bad4-503c705aaee7
    name: demo/demo:TCP:80
    port: 80
    project_id: b3a48f657fc144e18838c3dc5db2fac6
    protocol: TCP
    timeout_client_data: 50000
    timeout_member_data: 50000
  loadbalancer:
    id: 0d194642-1189-4a81-bad4-503c705aaee7
    ip: 172.30.176.102
    name: demo/demo
    port_id: 6b7e2e09-6589-42fe-a966-9f6d86f5e70f
    project_id: b3a48f657fc144e18838c3dc5db2fac6
    provider: ovn
    security_groups:
    - 3121fad5-3d32-4c8b-a205-f8f6cbe316e4
    subnet_id: 49fbfb3b-f432-4f80-8286-cab626764e85
  members:
  - id: 9819b690-de3a-4f83-91cb-b8015113063e
    ip: 10.128.124.60
    name: demo/demo-7897db69cc-jwc42:8080
    pool_id: abed4e5a-1a48-4b0f-b244-b86344702957
    port: 8080
    project_id: b3a48f657fc144e18838c3dc5db2fac6
    subnet_id: 537193b4-6ca3-47ed-a759-653f26de318f
  - id: ad17c769-d857-41d2-860e-020c623a88e5
    ip: 10.128.124.75
    name: demo/demo-7897db69cc-84k96:8080
    pool_id: abed4e5a-1a48-4b0f-b244-b86344702957
    port: 8080
    project_id: b3a48f657fc144e18838c3dc5db2fac6
    subnet_id: 537193b4-6ca3-47ed-a759-653f26de318f
  - id: 62947cbc-dab9-4bf1-a347-1e18ad25a84b
    ip: 10.128.125.112
    name: demo/demo-7897db69cc-gckm8:8080
    pool_id: abed4e5a-1a48-4b0f-b244-b86344702957
    port: 8080
    project_id: b3a48f657fc144e18838c3dc5db2fac6
    subnet_id: 537193b4-6ca3-47ed-a759-653f26de318f
  pools:
  - id: abed4e5a-1a48-4b0f-b244-b86344702957
    listener_id: fb2df0af-bdbc-48b1-a6a1-b24e64a6c5fa
    loadbalancer_id: 0d194642-1189-4a81-bad4-503c705aaee7
    name: demo/demo:TCP:80
    project_id: b3a48f657fc144e18838c3dc5db2fac6
    protocol: TCP
```

The following steps were done:

```shell
$ openstack loadbalancer delete demo/demo --cascade
$ oc edit -n demo klb/demo   # remove everything from "status:" to the end of the file
```
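The traceback below shows the controller passing `loadbalancer_crd['status']` straight into `patch_crd()`, so a klb whose status object was deleted outright (instead of being set to `{}`) produces an invalid patch. The following is a hypothetical sketch of the defensive lookup that avoids this, not the actual kuryr-kubernetes patch:

```python
# Hypothetical sketch (not the real kuryr-kubernetes fix): guard against a
# KuryrLoadBalancer CRD whose 'status' object was removed by hand before
# the controller sends it back to the API server in a patch.
def safe_status(loadbalancer_crd):
    """Return the CRD's status object, substituting {} when it is absent.

    Patching a missing/None status is what triggered the 422 Unprocessable
    Entity response and the controller crash described in this bug.
    """
    status = loadbalancer_crd.get('status')
    return status if status is not None else {}

# A trimmed-down klb dict with the status section edited out entirely:
crd = {'apiVersion': 'openstack.org/v1', 'kind': 'KuryrLoadBalancer',
       'metadata': {'name': 'demo', 'namespace': 'demo'},
       'spec': {'type': 'ClusterIP'}}
print(safe_status(crd))  # -> {}
```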
After some minutes, the kuryr-controller crashes:

```
2021-04-22 12:28:41.915 1 ERROR kuryr_kubernetes.controller.handlers.loadbalancer [-] Error updating KuryrLoadbalancer CRD {'apiVersion': 'openstack.org/v1', 'kind': 'KuryrLoadBalancer', 'metadata': {'creationTimestamp': '2021-04-17T09:56:51Z', 'finalizers': ['kuryr.openstack.org/kuryrloadbalancer-finalizers'], 'generation': 899, 'managedFields': [{'apiVersion': 'openstack.org/v1', 'fieldsType': 'FieldsV1', 'fieldsV1': {'f:metadata': {'f:finalizers': {'.': {}, 'v:"kuryr.openstack.org/kuryrloadbalancer-finalizers"': {}}}, 'f:spec': {'.': {}, 'f:endpointSlices': {}, 'f:ip': {}, 'f:ports': {}, 'f:project_id': {}, 'f:provider': {}, 'f:security_groups_ids': {}, 'f:subnet_id': {}, 'f:timeout_client_data': {}, 'f:timeout_member_data': {}, 'f:type': {}}, 'f:status': {}}, 'manager': 'python-requests', 'operation': 'Update', 'time': '2021-04-22T12:15:44Z'}], 'name': 'demo', 'namespace': 'demo', 'resourceVersion': '4499113', 'uid': '3da6ac76-469d-4e9f-a8f8-dbec06f6f516'}, 'spec': {'endpointSlices': [{'endpoints': [{'addresses': ['10.128.124.60'], 'conditions': {'ready': True}, 'targetRef': {'kind': 'Pod', 'name': 'demo-7897db69cc-jwc42', 'namespace': 'demo', 'resourceVersion': '1798942', 'uid': '3031075f-8ace-4410-90a0-de46baff5383'}}, {'addresses': ['10.128.124.75'], 'conditions': {'ready': True}, 'targetRef': {'kind': 'Pod', 'name': 'demo-7897db69cc-84k96', 'namespace': 'demo', 'resourceVersion': '1799470', 'uid': '0a63bd82-9aa1-4188-b86f-2f09909a61df'}}, {'addresses': ['10.128.125.112'], 'conditions': {'ready': True}, 'targetRef': {'kind': 'Pod', 'name': 'demo-7897db69cc-gckm8', 'namespace': 'demo', 'resourceVersion': '1799771', 'uid': '782b581a-ef0b-4af4-87f1-3cd1a991c304'}}], 'ports': [{'port': 8080, 'protocol': 'TCP'}]}], 'ip': '172.30.176.102', 'ports': [{'port': 80, 'protocol': 'TCP', 'targetPort': '8080'}], 'project_id': 'b3a48f657fc144e18838c3dc5db2fac6', 'provider': 'ovn', 'security_groups_ids': ['3121fad5-3d32-4c8b-a205-f8f6cbe316e4'], 'subnet_id': '49fbfb3b-f432-4f80-8286-cab626764e85', 'timeout_client_data': 0, 'timeout_member_data': 0, 'type': 'ClusterIP'}, 'status': {'loadbalancer': {'name': 'demo/demo', 'project_id': 'b3a48f657fc144e18838c3dc5db2fac6', 'subnet_id': '49fbfb3b-f432-4f80-8286-cab626764e85', 'ip': '172.30.176.102', 'security_groups': [], 'provider': 'ovn', 'id': 'f09c3033-4d1b-496c-b557-8bd4d08fb1e9', 'port_id': '6c04636a-b7db-4248-a62a-32e6500d248d'}, 'listeners': [{'name': 'demo/demo:TCP:80', 'project_id': 'b3a48f657fc144e18838c3dc5db2fac6', 'loadbalancer_id': 'f09c3033-4d1b-496c-b557-8bd4d08fb1e9', 'protocol': 'TCP', 'port': 80, 'id': '3fab2f4a-1fc5-444e-ab01-d152ecafd3e4'}]}}: kuryr_kubernetes.exceptions.K8sUnprocessableEntity: Unprocessable: '{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"the server rejected our request due to an error in our request","reason":"Invalid","details":{},"code":422}\n'
2021-04-22 12:28:41.915 1 ERROR kuryr_kubernetes.controller.handlers.loadbalancer Traceback (most recent call last):
2021-04-22 12:28:41.915 1 ERROR kuryr_kubernetes.controller.handlers.loadbalancer   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/handlers/loadbalancer.py", line 659, in _add_new_listeners
2021-04-22 12:28:41.915 1 ERROR kuryr_kubernetes.controller.handlers.loadbalancer     loadbalancer_crd['status'])
2021-04-22 12:28:41.915 1 ERROR kuryr_kubernetes.controller.handlers.loadbalancer   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/k8s_client.py", line 195, in patch_crd
2021-04-22 12:28:41.915 1 ERROR kuryr_kubernetes.controller.handlers.loadbalancer     self._raise_from_response(response)
2021-04-22 12:28:41.915 1 ERROR kuryr_kubernetes.controller.handlers.loadbalancer   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/k8s_client.py", line 110, in _raise_from_response
2021-04-22 12:28:41.915 1 ERROR kuryr_kubernetes.controller.handlers.loadbalancer     raise exc.K8sUnprocessableEntity(response.text)
2021-04-22 12:28:41.915 1 ERROR kuryr_kubernetes.controller.handlers.loadbalancer kuryr_kubernetes.exceptions.K8sUnprocessableEntity: Unprocessable: '{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"the server rejected our request due to an error in our request","reason":"Invalid","details":{},"code":422}\n'
2021-04-22 12:28:41.922 1 ERROR kuryr_kubernetes.handlers.retry [-] Report handler unhealthy KuryrLoadBalancerHandler: kuryr_kubernetes.exceptions.K8sUnprocessableEntity: Unprocessable: '{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"the server rejected our request due to an error in our request","reason":"Invalid","details":{},"code":422}\n'
2021-04-22 12:28:41.922 1 ERROR kuryr_kubernetes.handlers.retry Traceback (most recent call last):
2021-04-22 12:28:41.922 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/retry.py", line 80, in __call__
2021-04-22 12:28:41.922 1 ERROR kuryr_kubernetes.handlers.retry     self._handler(event, *args, **kwargs)
2021-04-22 12:28:41.922 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/k8s_base.py", line 84, in __call__
2021-04-22 12:28:41.922 1 ERROR kuryr_kubernetes.handlers.retry     self.on_present(obj)
2021-04-22 12:28:41.922 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/handlers/loadbalancer.py", line 93, in on_present
2021-04-22 12:28:41.922 1 ERROR kuryr_kubernetes.handlers.retry     if self._sync_lbaas_members(loadbalancer_crd):
2021-04-22 12:28:41.922 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/handlers/loadbalancer.py", line 189, in _sync_lbaas_members
2021-04-22 12:28:41.922 1 ERROR kuryr_kubernetes.handlers.retry     if self._sync_lbaas_pools(loadbalancer_crd):
2021-04-22 12:28:41.922 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/handlers/loadbalancer.py", line 490, in _sync_lbaas_pools
2021-04-22 12:28:41.922 1 ERROR kuryr_kubernetes.handlers.retry     if self._sync_lbaas_listeners(loadbalancer_crd):
2021-04-22 12:28:41.922 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/handlers/loadbalancer.py", line 590, in _sync_lbaas_listeners
2021-04-22 12:28:41.922 1 ERROR kuryr_kubernetes.handlers.retry     if self._add_new_listeners(loadbalancer_crd):
2021-04-22 12:28:41.922 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/handlers/loadbalancer.py", line 659, in _add_new_listeners
2021-04-22 12:28:41.922 1 ERROR kuryr_kubernetes.handlers.retry     loadbalancer_crd['status'])
2021-04-22 12:28:41.922 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/k8s_client.py", line 195, in patch_crd
2021-04-22 12:28:41.922 1 ERROR kuryr_kubernetes.handlers.retry     self._raise_from_response(response)
2021-04-22 12:28:41.922 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/k8s_client.py", line 110, in _raise_from_response
2021-04-22 12:28:41.922 1 ERROR kuryr_kubernetes.handlers.retry     raise exc.K8sUnprocessableEntity(response.text)
2021-04-22 12:28:41.922 1 ERROR kuryr_kubernetes.handlers.retry kuryr_kubernetes.exceptions.K8sUnprocessableEntity: Unprocessable: '{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"the server rejected our request due to an error in our request","reason":"Invalid","details":{},"code":422}\n'
```

After the kuryr-controller restart, the lb is successfully recreated on OSP and the klb status is fulfilled. Attaching Kuryr Controller logs.
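The traceback shows the client translating the API server's 422 response into a `K8sUnprocessableEntity` exception, which propagates up through the retry handler until the handler is reported unhealthy and the controller restarts. A simplified, self-contained sketch of that translation pattern (the names mirror those in the traceback, but the bodies are assumptions, not the project's actual code):

```python
# Simplified sketch of the error-translation pattern visible in the traceback
# above. The class/function names echo kuryr_kubernetes.k8s_client, but the
# implementations here are illustrative assumptions.
class K8sUnprocessableEntity(Exception):
    """Raised when the API server answers 422 Unprocessable Entity."""

class FakeResponse:
    # Stand-in for requests.Response, so the sketch runs without a cluster.
    def __init__(self, status_code, text):
        self.status_code = status_code
        self.text = text

def raise_from_response(response):
    # A patch of a CRD whose status object was removed is rejected with 422;
    # the client turns that into an exception the handlers must deal with.
    if response.status_code == 422:
        raise K8sUnprocessableEntity(response.text)

try:
    raise_from_response(FakeResponse(422, '{"kind":"Status","code":422}'))
except K8sUnprocessableEntity as exc:
    print('caught:', exc)
```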
Created attachment 1774485 [details]
kuryr controller logs
Verified on OCP4.8.0-0.nightly-2021-05-29-114625 over OSP16.1 (RHOS-16.1-RHEL-8-20210323.n.0) with OVN-Octavia enabled. The loadbalancer replacement procedure worked fine.

Given the project below:

```shell
$ oc get all -n demo
NAME                        READY   STATUS    RESTARTS   AGE
pod/demo-7897db69cc-c2nrz   1/1     Running   0          43h
pod/demo-7897db69cc-m8hsd   1/1     Running   0          43h
pod/demo-7897db69cc-n8zcw   1/1     Running   0          43h

NAME           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
service/demo   ClusterIP   172.30.64.198   <none>        80/TCP    43h

NAME                   READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/demo   3/3     3            3           43h

NAME                              DESIRED   CURRENT   READY   AGE
replicaset.apps/demo-7897db69cc   3         3         3       43h

$ oc rsh -n demo pod/demo-7897db69cc-c2nrz curl 172.30.64.198
demo-7897db69cc-m8hsd: HELLO! I AM ALIVE!!!
```

Destroying the loadbalancer and removing the status section from the klb resource:

```shell
$ openstack loadbalancer delete demo/demo --cascade
$ oc edit -n demo klb/demo
kuryrloadbalancer.openstack.org/demo edited
$ oc rsh -n demo pod/demo-7897db69cc-c2nrz curl 172.30.64.198
^Ccommand terminated with exit code 130
```

triggers the replacement of the loadbalancer after a few minutes:

```shell
$ oc rsh -n demo pod/demo-7897db69cc-c2nrz curl 172.30.64.198
demo-7897db69cc-n8zcw: HELLO! I AM ALIVE!!!
```

During this process, the kuryr-controller remains stable:

```shell
$ oc get pods -n openshift-kuryr
NAME                                READY   STATUS    RESTARTS   AGE
kuryr-cni-2fbw7                     1/1     Running   0          44h
kuryr-cni-dtsqx                     1/1     Running   0          44h
kuryr-cni-ngnsw                     1/1     Running   0          45h
kuryr-cni-qmw74                     1/1     Running   0          45h
kuryr-cni-v9sbw                     1/1     Running   0          44h
kuryr-cni-xr7k5                     1/1     Running   0          45h
kuryr-controller-7f67c7ffd9-mhrqd   1/1     Running   0          72m
```

and the status section is updated on the klb resource:

```shell
$ oc get klb -n demo demo -o json | jq .status
{
  "listeners": [
    {
      "id": "f254bafb-d452-4d1b-b4b0-8cc12f8f7390",
      "loadbalancer_id": "5de8028b-04a6-4415-885f-3ec3097986a8",
      "name": "demo/demo:TCP:80",
      "port": 80,
      "project_id": "c1ac743dc7274e31b3f9fb7c6fa0b4b4",
      "protocol": "TCP"
    }
  ],
  "loadbalancer": {
    "id": "5de8028b-04a6-4415-885f-3ec3097986a8",
    "ip": "172.30.64.198",
    "name": "demo/demo",
    "port_id": "675e50fa-c31d-4c3c-b52a-00bf2f06aa7c",
    "project_id": "c1ac743dc7274e31b3f9fb7c6fa0b4b4",
    "provider": "ovn",
    "security_groups": [
      "9d27b2e0-ea78-4b26-b853-4310ed166751"
    ],
    "subnet_id": "a69803aa-fdb8-4c34-b8c3-ee149e508d9f"
  },
  "members": [
    {
      "id": "4eea3096-5ac9-415b-986a-504b97e00678",
      "ip": "10.128.124.232",
      "name": "demo/demo-7897db69cc-m8hsd:8080",
      "pool_id": "7f38f694-8d66-487c-a101-27465c1a315a",
      "port": 8080,
      "project_id": "c1ac743dc7274e31b3f9fb7c6fa0b4b4",
      "subnet_id": "1bd0f571-eb74-4108-b00f-248f216c1604"
    },
    {
      "id": "a906cbb8-ff29-4635-b5e8-18d95c3437a8",
      "ip": "10.128.125.251",
      "name": "demo/demo-7897db69cc-n8zcw:8080",
      "pool_id": "7f38f694-8d66-487c-a101-27465c1a315a",
      "port": 8080,
      "project_id": "c1ac743dc7274e31b3f9fb7c6fa0b4b4",
      "subnet_id": "1bd0f571-eb74-4108-b00f-248f216c1604"
    },
    {
      "id": "88b8c178-25f5-4674-af7f-9353701c7b08",
      "ip": "10.128.125.76",
      "name": "demo/demo-7897db69cc-c2nrz:8080",
      "pool_id": "7f38f694-8d66-487c-a101-27465c1a315a",
      "port": 8080,
      "project_id": "c1ac743dc7274e31b3f9fb7c6fa0b4b4",
      "subnet_id": "1bd0f571-eb74-4108-b00f-248f216c1604"
    }
  ],
  "pools": [
    {
      "id": "7f38f694-8d66-487c-a101-27465c1a315a",
      "listener_id": "f254bafb-d452-4d1b-b4b0-8cc12f8f7390",
      "loadbalancer_id": "5de8028b-04a6-4415-885f-3ec3097986a8",
      "name": "demo/demo:TCP:80",
      "project_id": "c1ac743dc7274e31b3f9fb7c6fa0b4b4",
      "protocol": "TCP"
    }
  ]
}
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438