Bug 1949540

Summary: Kuryr-Controller crashes when it's missing the status object
Product: OpenShift Container Platform Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component: Networking    Assignee: sscavnic <sscavnic>
Networking sub component: kuryr QA Contact: GenadiC <gcheresh>
Status: CLOSED DUPLICATE Docs Contact:
Severity: medium    
Priority: medium CC: mdulko, pmannidi
Version: 4.7   
Target Milestone: ---   
Target Release: 4.8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-04-14 14:00:26 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1933880, 1949541, 1968418    
Bug Blocks:    

Description OpenShift BugZilla Robot 2021-04-14 13:58:50 UTC
+++ This bug was initially created as a clone of Bug #1933880 +++

Description of problem:
In some situations, Octavia LBs must be forcefully deleted from OpenStack. Kuryr is then prompted to recreate them by clearing the information in the status object of the kuryrloadbalancer CRD.

The status needs to be changed to an empty object, {}:
$ oc get kuryrloadbalancer -n openshift-monitoring grafana -o jsonpath='{.status}' | jq .
{}
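The same reset can be done non-interactively with `oc patch --type=json`. The following is a minimal sketch that only builds the JSON Patch (RFC 6902) document. It assumes the KuryrLoadBalancer CRD stores `status` as an ordinary field with no status subresource, which fits the fact that `oc edit` can delete it in step 1 below. The function name `reset_status_patch` is hypothetical.

```python
import json


def reset_status_patch():
    """Return a JSON Patch (RFC 6902) that sets .status to an empty object.

    "add" is used instead of "replace": per RFC 6902, "replace" fails when
    the target member does not exist, while "add" creates the member or
    overwrites it if it is already there.
    """
    return json.dumps([{"op": "add", "path": "/status", "value": {}}])


if __name__ == "__main__":
    # Hypothetical usage: pass the printed document to
    #   oc patch kuryrloadbalancer grafana -n openshift-monitoring --type=json -p '<patch>'
    print(reset_status_patch())
```

Because `add` works whether or not `status` is present, the same patch also restores an object whose status was removed entirely, as in the steps below.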

However, if the user inadvertently deletes the status object itself, kuryr-controller fails with a traceback and cannot recover until the status object is restored.

Version-Release number of selected component (if applicable):
bash-4.4$ rpm -qa | grep kuryr
python3-kuryr-lib-1.1.1-0.20190923160834.41e6964.el8ost.noarch
python3-kuryr-kubernetes-4.7.0-202101262230.p0.git.2494.cd95ce5.el8.noarch
openshift-kuryr-controller-4.7.0-202101262230.p0.git.2494.cd95ce5.el8.noarch
openshift-kuryr-common-4.7.0-202101262230.p0.git.2494.cd95ce5.el8.noarch

How reproducible:
100%

Steps to Reproduce:
1. Edit the kuryrloadbalancer CRD for one of the LBs:
oc edit kuryrloadbalancer -n openshift-monitoring grafana
Remove everything from status: downward, including the status: line itself.

For example, change this:
[...]
  - name: https
    port: 3000
    protocol: TCP
    targetPort: https
  project_id: e75466bcb2eb4cf590026be2d94d95ef
  provider: ovn
  security_groups_ids:
  - e9d30328-ea13-4434-9ed2-fe8f4ddb3173
  subnet_id: 0b048882-9b6c-4a5d-97eb-e613645c90fd
  type: ClusterIP
status:
  listeners:
  - id: ea42c50c-b86f-40d7-a98a-310b46f16b70
    loadbalancer_id: 88648171-6441-4e16-8bd8-7959b9a52fae
    name: openshift-monitoring/grafana:TCP:3000
[...]

To this:
[...]
  - name: https
    port: 3000
    protocol: TCP
    targetPort: https
  project_id: e75466bcb2eb4cf590026be2d94d95ef
  provider: ovn
  security_groups_ids:
  - e9d30328-ea13-4434-9ed2-fe8f4ddb3173
  subnet_id: 0b048882-9b6c-4a5d-97eb-e613645c90fd
  type: ClusterIP
[...]


2. Observe kuryr-controller starts failing with the following traceback:
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging [-] Failed to handle event {'type': 'MODIFIED', 'object': {'apiVersion': 'openstack.org/v1', 'kind': 'KuryrLoadBalancer', 'metadata': {'creationTimestamp': '2021-03-01T06:08:28Z', 'finalizers': ['kuryr.openstack.org/kuryrloadbalancer-finalizers'], 'generation': 34, 'managedFields': [{'apiVersion': 'openstack.org/v1', 'fieldsType': 'FieldsV1', 'fieldsV1': {'f:metadata': {'f:finalizers': {'.': {}, 'v:"kuryr.openstack.org/kuryrloadbalancer-finalizers"': {}}}, 'f:spec': {'.': {}, 'f:endpointSlices': {}, 'f:ip': {}, 'f:ports': {}, 'f:project_id': {}, 'f:provider': {}, 'f:security_groups_ids': {}, 'f:subnet_id': {}, 'f:type': {}}}, 'manager': 'python-requests', 'operation': 'Update', 'time': '2021-03-01T22:30:36Z'}], 'name': 'grafana', 'namespace': 'openshift-monitoring', 'resourceVersion': '2140553', 'selfLink': '/apis/openstack.org/v1/namespaces/openshift-monitoring/kuryrloadbalancers/grafana', 'uid': '1e8a70c2-350d-418c-b876-152cbb7d2f4b'}, 'spec': {'endpointSlices': [{'endpoints': [{'addresses': ['10.128.57.183'], 'conditions': {'ready': True}, 'targetRef': {'kind': 'Pod', 'name': 'grafana-6f4d96d7fd-vm8sv', 'namespace': 'openshift-monitoring', 'resourceVersion': '63165', 'uid': '04630764-2c7e-4e86-a4e8-f986f26931cd'}}], 'ports': [{'name': 'https', 'port': 3000, 'protocol': 'TCP'}]}], 'ip': '172.30.88.169', 'ports': [{'name': 'https', 'port': 3000, 'protocol': 'TCP', 'targetPort': 'https'}], 'project_id': 'e75466bcb2eb4cf590026be2d94d95ef', 'provider': 'ovn', 'security_groups_ids': ['e9d30328-ea13-4434-9ed2-fe8f4ddb3173'], 'subnet_id': '0b048882-9b6c-4a5d-97eb-e613645c90fd', 'type': 'ClusterIP'}}}: KeyError: 'status'
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging Traceback (most recent call last):
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/logging.py", line 37, in __call__
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging     self._handler(event, *args, **kwargs)
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/retry.py", line 80, in __call__
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging     self._handler(event, *args, **kwargs)
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/k8s_base.py", line 84, in __call__
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging     self.on_present(obj)
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/handlers/loadbalancer.py", line 65, in on_present
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging     crd_lb = loadbalancer_crd['status'].get('loadbalancer')
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging KeyError: 'status'
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging
2021-03-01 22:35:01.243 1 ERROR kuryr_kubernetes.controller.managers.health [-] Component KuryrLoadBalancerHandler is dead.

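The traceback shows why the controller cannot recover. Line 65 of `loadbalancer.py` indexes `loadbalancer_crd['status']` directly, so a deleted `status` key raises `KeyError` before `.get('loadbalancer')` is ever reached. The following is a minimal sketch of a tolerant lookup, not the change that was actually merged for bug 1933880. Both function names are hypothetical, and the trimmed-down object keeps only a few fields from the event logged above.

```python
def get_crd_loadbalancer_strict(loadbalancer_crd):
    # Mirrors the failing expression from loadbalancer.py line 65:
    # raises KeyError('status') when the status key has been deleted.
    return loadbalancer_crd['status'].get('loadbalancer')


def get_crd_loadbalancer_tolerant(loadbalancer_crd):
    # Hypothetical defensive variant: a missing or null status is treated
    # the same as an empty one (status: {}), so no KeyError is raised.
    status = loadbalancer_crd.get('status') or {}
    return status.get('loadbalancer')


if __name__ == '__main__':
    # Trimmed-down KuryrLoadBalancer object after step 1 (no status key).
    crd = {'apiVersion': 'openstack.org/v1', 'kind': 'KuryrLoadBalancer',
           'metadata': {'name': 'grafana', 'namespace': 'openshift-monitoring'},
           'spec': {'ip': '172.30.88.169', 'type': 'ClusterIP'}}
    try:
        get_crd_loadbalancer_strict(crd)
    except KeyError as exc:
        print('strict lookup failed with KeyError:', exc)
    print('tolerant lookup:', get_crd_loadbalancer_tolerant(crd))
```

Using `or {}` rather than `.get('status', {})` also covers an explicit `status: null`, for which `.get` with a default would still return `None`.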

Actual results:
kuryr-controller crashes without the status object

Expected results:
If the status object is required, it shouldn't be something that can be removed.

Additional info:
I only tested this on OCP 4.7, but I suspect 4.6 behaves the same.

--- Additional comment from mdulko on 2021-03-03 11:42:44 UTC ---

Putting this on medium sev/prio, as we have an easy workaround: just make sure to put {} as the status if you want to clear it.

Comment 2 Michał Dulko 2021-04-14 14:00:26 UTC

*** This bug has been marked as a duplicate of bug 1933880 ***