+++ This bug was initially created as a clone of Bug #1933880 +++

Description of problem:

In some situations it is necessary to forcefully delete Octavia LBs from OpenStack. When that happens, the way we prompt Kuryr to recreate them is by removing the information from the status object in the kuryrloadbalancer CRD, i.e. the status needs to be changed to an empty object:

$ oc get kuryrloadbalancer -n openshift-monitoring grafana -o jsonpath='{.status}' | jq .
{}

If the user inadvertently deletes the status object itself, however, kuryr-controller raises a traceback that it is unable to recover from until the status object is restored.

Version-Release number of selected component (if applicable):

bash-4.4$ rpm -qa | grep kuryr
python3-kuryr-lib-1.1.1-0.20190923160834.41e6964.el8ost.noarch
python3-kuryr-kubernetes-4.7.0-202101262230.p0.git.2494.cd95ce5.el8.noarch
openshift-kuryr-controller-4.7.0-202101262230.p0.git.2494.cd95ce5.el8.noarch
openshift-kuryr-common-4.7.0-202101262230.p0.git.2494.cd95ce5.el8.noarch

How reproducible: 100%

Steps to Reproduce:

1. Edit the kuryrloadbalancer CRD for one of the LBs:

   oc edit kuryrloadbalancer -n openshift-monitoring grafana

   Remove everything from "status:" down, including the "status:" line itself. E.g. change this:

   [...]
     - name: https
       port: 3000
       protocol: TCP
       targetPort: https
     project_id: e75466bcb2eb4cf590026be2d94d95ef
     provider: ovn
     security_groups_ids:
     - e9d30328-ea13-4434-9ed2-fe8f4ddb3173
     subnet_id: 0b048882-9b6c-4a5d-97eb-e613645c90fd
     type: ClusterIP
   status:
     listeners:
     - id: ea42c50c-b86f-40d7-a98a-310b46f16b70
       loadbalancer_id: 88648171-6441-4e16-8bd8-7959b9a52fae
       name: openshift-monitoring/grafana:TCP:3000
   [...]

   to this:

   [...]
     - name: https
       port: 3000
       protocol: TCP
       targetPort: https
     project_id: e75466bcb2eb4cf590026be2d94d95ef
     provider: ovn
     security_groups_ids:
     - e9d30328-ea13-4434-9ed2-fe8f4ddb3173
     subnet_id: 0b048882-9b6c-4a5d-97eb-e613645c90fd
     type: ClusterIP
   [...]

2.
Observe that kuryr-controller starts failing with the following traceback:

2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging [-] Failed to handle event {'type': 'MODIFIED', 'object': {'apiVersion': 'openstack.org/v1', 'kind': 'KuryrLoadBalancer', 'metadata': {'creationTimestamp': '2021-03-01T06:08:28Z', 'finalizers': ['kuryr.openstack.org/kuryrloadbalancer-finalizers'], 'generation': 34, 'managedFields': [{'apiVersion': 'openstack.org/v1', 'fieldsType': 'FieldsV1', 'fieldsV1': {'f:metadata': {'f:finalizers': {'.': {}, 'v:"kuryr.openstack.org/kuryrloadbalancer-finalizers"': {}}}, 'f:spec': {'.': {}, 'f:endpointSlices': {}, 'f:ip': {}, 'f:ports': {}, 'f:project_id': {}, 'f:provider': {}, 'f:security_groups_ids': {}, 'f:subnet_id': {}, 'f:type': {}}}, 'manager': 'python-requests', 'operation': 'Update', 'time': '2021-03-01T22:30:36Z'}], 'name': 'grafana', 'namespace': 'openshift-monitoring', 'resourceVersion': '2140553', 'selfLink': '/apis/openstack.org/v1/namespaces/openshift-monitoring/kuryrloadbalancers/grafana', 'uid': '1e8a70c2-350d-418c-b876-152cbb7d2f4b'}, 'spec': {'endpointSlices': [{'endpoints': [{'addresses': ['10.128.57.183'], 'conditions': {'ready': True}, 'targetRef': {'kind': 'Pod', 'name': 'grafana-6f4d96d7fd-vm8sv', 'namespace': 'openshift-monitoring', 'resourceVersion': '63165', 'uid': '04630764-2c7e-4e86-a4e8-f986f26931cd'}}], 'ports': [{'name': 'https', 'port': 3000, 'protocol': 'TCP'}]}], 'ip': '172.30.88.169', 'ports': [{'name': 'https', 'port': 3000, 'protocol': 'TCP', 'targetPort': 'https'}], 'project_id': 'e75466bcb2eb4cf590026be2d94d95ef', 'provider': 'ovn', 'security_groups_ids': ['e9d30328-ea13-4434-9ed2-fe8f4ddb3173'], 'subnet_id': '0b048882-9b6c-4a5d-97eb-e613645c90fd', 'type': 'ClusterIP'}}}: KeyError: 'status'
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging Traceback (most recent call last):
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/logging.py", line 37, in __call__
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging     self._handler(event, *args, **kwargs)
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/retry.py", line 80, in __call__
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging     self._handler(event, *args, **kwargs)
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/k8s_base.py", line 84, in __call__
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging     self.on_present(obj)
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/handlers/loadbalancer.py", line 65, in on_present
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging     crd_lb = loadbalancer_crd['status'].get('loadbalancer')
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging KeyError: 'status'
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging
2021-03-01 22:35:01.243 1 ERROR kuryr_kubernetes.controller.managers.health [-] Component KuryrLoadBalancerHandler is dead.

Actual results:

kuryr-controller crashes when the status object is missing.

Expected results:

If the status object is required, it shouldn't be something that can be removed (or its absence should be handled gracefully).

Additional info:

I only tested this on OCP 4.7, but I suspect it would be the same on 4.6.

--- Additional comment from mdulko on 2021-03-03 11:42:44 UTC ---

Putting this on medium sev/prio as we have an easy workaround - just make sure to put {} as the status if you want to clear it.
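The crash comes from the unguarded dictionary access `loadbalancer_crd['status']` on line 65 of `loadbalancer.py`. A minimal sketch of the defensive pattern that avoids it (an illustration of the shape of the fix, not the actual kuryr-kubernetes patch; the CRD dicts below are trimmed stand-ins for real KuryrLoadBalancer objects):

```python
def get_crd_lb_unsafe(loadbalancer_crd):
    # Mirrors the failing line: raises KeyError when 'status' was deleted.
    return loadbalancer_crd['status'].get('loadbalancer')


def get_crd_lb_safe(loadbalancer_crd):
    # Treat a missing 'status' key the same as an empty status ({}), so
    # deleting the whole key no longer kills the handler.
    return loadbalancer_crd.get('status', {}).get('loadbalancer')


crd_without_status = {'spec': {'type': 'ClusterIP'}}
crd_cleared = {'spec': {'type': 'ClusterIP'}, 'status': {}}

print(get_crd_lb_safe(crd_without_status))  # None, instead of KeyError
print(get_crd_lb_safe(crd_cleared))         # None
```

With the safe accessor both "status deleted" and "status cleared to {}" behave identically, which matches the expected result above: the controller should tolerate the key's absence rather than crash-loop.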
Verified on 4.7.0-0.nightly-2021-06-07-203428 on OSP 16.1 (RHOS-16.1-RHEL-8-20210323.n.0) with OVN-Octavia enabled. The loadbalancer replacement procedure worked fine.

$ oc get pods
NAME                    READY   STATUS    RESTARTS   AGE
demo-56c97d6845-2jfkc   1/1     Running   0          34s
demo-56c97d6845-7zwtp   1/1     Running   0          34s
demo-56c97d6845-ggcrf   1/1     Running   0          34s

$ oc rsh -n demo demo-56c97d6845-2jfkc curl 172.30.36.144
demo-56c97d6845-2jfkc: HELLO! I AM ALIVE!!!

$ openstack loadbalancer delete demo/demo --cascade

$ oc edit -n demo klb/demo
kuryrloadbalancer.openstack.org/demo edited
# ^ remove everything from status until the end, including the 'status' key.

$ oc rsh -n demo demo-56c97d6845-2jfkc curl 172.30.36.144
^Ccommand terminated with exit code 130

# Wait a few seconds and the LB is recreated:
(shiftstack) [stack@undercloud-0 ~]$ oc rsh -n demo demo-56c97d6845-2jfkc curl 172.30.36.144
demo-56c97d6845-ggcrf: HELLO! I AM ALIVE!!!

$ oc get klb -n demo demo -o json | jq .status
{
  "listeners": [
    {
      "id": "d717a73e-43cb-4655-95f4-3f09d530062d",
      "loadbalancer_id": "2e778a64-bb0c-4cca-9424-6fd217414b23",
      "name": "demo/demo:TCP:80",
      "port": 80,
      "project_id": "b20e10e10b514fb8a196b7734776b991",
      "protocol": "TCP"
    }
  ],
  "loadbalancer": {
    "id": "2e778a64-bb0c-4cca-9424-6fd217414b23",
    "ip": "172.30.36.144",
    "name": "demo/demo",
    "port_id": "d2422c84-913d-4bbd-a947-2c865343a399",
    "project_id": "b20e10e10b514fb8a196b7734776b991",
    "provider": "ovn",
    "security_groups": [
      "aa9fc689-211e-460c-9635-fe7d0104aad2"
    ],
    "subnet_id": "9e439c38-6a46-410e-a4a5-ff892facd55a"
  },
  "members": [
    {
      "id": "d81fac86-a54d-4f4c-8fae-99ff484f357f",
      "ip": "10.128.124.136",
      "name": "demo/demo-56c97d6845-2jfkc:8080",
      "pool_id": "36497620-6e00-46c9-b7df-6ed3f83928f7",
      "port": 8080,
      "project_id": "b20e10e10b514fb8a196b7734776b991",
      "subnet_id": "49a98dc1-3ff2-4559-9058-353aee886986"
    },
    {
      "id": "511fbe93-f435-4c37-905a-f6265a49f2e7",
      "ip": "10.128.124.186",
      "name": "demo/demo-56c97d6845-ggcrf:8080",
      "pool_id": "36497620-6e00-46c9-b7df-6ed3f83928f7",
      "port": 8080,
      "project_id": "b20e10e10b514fb8a196b7734776b991",
      "subnet_id": "49a98dc1-3ff2-4559-9058-353aee886986"
    },
    {
      "id": "ababdff0-1cc1-4fea-aa82-d80f030ab1e7",
      "ip": "10.128.125.180",
      "name": "demo/demo-56c97d6845-7zwtp:8080",
      "pool_id": "36497620-6e00-46c9-b7df-6ed3f83928f7",
      "port": 8080,
      "project_id": "b20e10e10b514fb8a196b7734776b991",
      "subnet_id": "49a98dc1-3ff2-4559-9058-353aee886986"
    }
  ],
  "pools": [
    {
      "id": "36497620-6e00-46c9-b7df-6ed3f83928f7",
      "listener_id": "d717a73e-43cb-4655-95f4-3f09d530062d",
      "loadbalancer_id": "2e778a64-bb0c-4cca-9424-6fd217414b23",
      "name": "demo/demo:TCP:80",
      "project_id": "b20e10e10b514fb8a196b7734776b991",
      "protocol": "TCP"
    }
  ]
}

kuryr-controller remains stable during this process:

$ oc get pods -n openshift-kuryr
NAME                                READY   STATUS    RESTARTS   AGE
kuryr-cni-8j96g                     1/1     Running   0          99m
kuryr-cni-bt4xv                     1/1     Running   0          119m
kuryr-cni-gvg42                     1/1     Running   0          119m
kuryr-cni-k5m4v                     1/1     Running   0          104m
kuryr-cni-nvfxs                     1/1     Running   0          119m
kuryr-cni-zkjv9                     1/1     Running   0          105m
kuryr-controller-68b6cf9567-dbzq9   1/1     Running   0          5m56s
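The safe way to clear a KuryrLoadBalancer's status (the workaround noted in the comments: set it to {} rather than deleting the key) can be sketched as a small helper. This operates on a plain dict; actually pushing the change to the API server (e.g. with `oc edit` or `oc patch`) is outside the sketch:

```python
def clear_status(klb):
    """Reset a KuryrLoadBalancer dict's status to an empty object.

    Assigning {} (instead of deleting the 'status' key outright) is what
    prompts kuryr-controller to rebuild the Octavia load balancer without
    hitting the KeyError described in this bug.
    """
    klb['status'] = {}
    return klb


klb = {'spec': {'type': 'ClusterIP'},
       'status': {'loadbalancer': {'id': 'old-lb-id'}}}
clear_status(klb)
print(klb['status'])  # {} -- key still present, contents gone
```

The design point is simply that the key survives: handlers indexing `klb['status']` keep working, while the empty object signals "no load balancer exists yet, recreate it".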
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.16 security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2286