Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1949541

Summary: Kuryr-Controller crashes when it's missing the status object
Product: OpenShift Container Platform
Component: Networking
Sub component: kuryr
Version: 4.7
Target Release: 4.7.z
Hardware: Unspecified
OS: Unspecified
Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Assignee: Michał Dulko <mdulko>
QA Contact: rlobillo
CC: mdemaced, mdulko, pmannidi
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Keywords: Triaged
Doc Type: No Doc Update
Cloned As: 1968418
Last Closed: 2021-06-15 09:26:45 UTC
Bug Depends On: 1933880
Bug Blocks: 1949540, 1968418

Description OpenShift BugZilla Robot 2021-04-14 13:59:09 UTC
+++ This bug was initially created as a clone of Bug #1933880 +++

Description of problem:
In some situations it is necessary to forcefully delete Octavia LBs from OpenStack. When that happens, the way we prompt Kuryr to recreate them is by emptying the status object in the kuryrloadbalancer CRD, i.e. the status needs to be changed to {}:

$ oc get kuryrloadbalancer -n openshift-monitoring grafana -o jsonpath='{.status}' | jq .
{}

If the user inadvertently deletes the status object altogether, though, kuryr-controller raises a traceback that it cannot recover from until the status object is restored.
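
To make the distinction concrete, here is a minimal illustration with plain Python dicts, mirroring the lookup that appears in the traceback below (this is not kuryr code, just the dictionary semantics involved):

# Illustration only: an emptied status is fine, a deleted one is not.
crd_cleared = {"spec": {"type": "ClusterIP"}, "status": {}}   # status emptied to {}
crd_deleted = {"spec": {"type": "ClusterIP"}}                 # status key removed entirely

print(crd_cleared["status"].get("loadbalancer"))   # None: no LB recorded, so Kuryr recreates it
try:
    crd_deleted["status"].get("loadbalancer")      # the same lookup the handler performs
except KeyError as exc:
    print("handler would crash:", exc)             # KeyError: 'status'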

Version-Release number of selected component (if applicable):
bash-4.4$ rpm -qa | grep kuryr
python3-kuryr-lib-1.1.1-0.20190923160834.41e6964.el8ost.noarch
python3-kuryr-kubernetes-4.7.0-202101262230.p0.git.2494.cd95ce5.el8.noarch
openshift-kuryr-controller-4.7.0-202101262230.p0.git.2494.cd95ce5.el8.noarch
openshift-kuryr-common-4.7.0-202101262230.p0.git.2494.cd95ce5.el8.noarch

How reproducible:
100%

Steps to Reproduce:
1. Edit the kuryrloadbalancer CRD for one of the LBs:
oc edit kuryrloadbalancer -n openshift-monitoring grafana
Remove everything from "status:" down, including the "status:" line itself.

eg:
[...]
  - name: https
    port: 3000
    protocol: TCP
    targetPort: https
  project_id: e75466bcb2eb4cf590026be2d94d95ef
  provider: ovn
  security_groups_ids:
  - e9d30328-ea13-4434-9ed2-fe8f4ddb3173
  subnet_id: 0b048882-9b6c-4a5d-97eb-e613645c90fd
  type: ClusterIP
status:
  listeners:
  - id: ea42c50c-b86f-40d7-a98a-310b46f16b70
    loadbalancer_id: 88648171-6441-4e16-8bd8-7959b9a52fae
    name: openshift-monitoring/grafana:TCP:3000
[...]

To this:
[...]
  - name: https
    port: 3000
    protocol: TCP
    targetPort: https
  project_id: e75466bcb2eb4cf590026be2d94d95ef
  provider: ovn
  security_groups_ids:
  - e9d30328-ea13-4434-9ed2-fe8f4ddb3173
  subnet_id: 0b048882-9b6c-4a5d-97eb-e613645c90fd
  type: ClusterIP
[...]


2. Observe kuryr-controller starts failing with the following traceback:
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging [-] Failed to handle event {'type': 'MODIFIED', 'object': {'apiVersion': 'openstack.org/v1', 'kind': 'KuryrLoadBalancer', 'metadata': {'creationTimestamp': '2021-03-01T06:08:28Z', 'finalizers': ['kuryr.openstack.org/kuryrloadbalancer-finalizers'], 'generation': 34, 'managedFields': [{'apiVersion': 'openstack.org/v1', 'fieldsType': 'FieldsV1', 'fieldsV1': {'f:metadata': {'f:finalizers': {'.': {}, 'v:"kuryr.openstack.org/kuryrloadbalancer-finalizers"': {}}}, 'f:spec': {'.': {}, 'f:endpointSlices': {}, 'f:ip': {}, 'f:ports': {}, 'f:project_id': {}, 'f:provider': {}, 'f:security_groups_ids': {}, 'f:subnet_id': {}, 'f:type': {}}}, 'manager': 'python-requests', 'operation': 'Update', 'time': '2021-03-01T22:30:36Z'}], 'name': 'grafana', 'namespace': 'openshift-monitoring', 'resourceVersion': '2140553', 'selfLink': '/apis/openstack.org/v1/namespaces/openshift-monitoring/kuryrloadbalancers/grafana', 'uid': '1e8a70c2-350d-418c-b876-152cbb7d2f4b'}, 'spec': {'endpointSlices': [{'endpoints': [{'addresses': ['10.128.57.183'], 'conditions': {'ready': True}, 'targetRef': {'kind': 'Pod', 'name': 'grafana-6f4d96d7fd-vm8sv', 'namespace': 'openshift-monitoring', 'resourceVersion': '63165', 'uid': '04630764-2c7e-4e86-a4e8-f986f26931cd'}}], 'ports': [{'name': 'https', 'port': 3000, 'protocol': 'TCP'}]}], 'ip': '172.30.88.169', 'ports': [{'name': 'https', 'port': 3000, 'protocol': 'TCP', 'targetPort': 'https'}], 'project_id': 'e75466bcb2eb4cf590026be2d94d95ef', 'provider': 'ovn', 'security_groups_ids': ['e9d30328-ea13-4434-9ed2-fe8f4ddb3173'], 'subnet_id': '0b048882-9b6c-4a5d-97eb-e613645c90fd', 'type': 'ClusterIP'}}}: KeyError: 'status'
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging Traceback (most recent call last):
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/logging.py", line 37, in __call__
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging     self._handler(event, *args, **kwargs)
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/retry.py", line 80, in __call__
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging     self._handler(event, *args, **kwargs)
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/k8s_base.py", line 84, in __call__
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging     self.on_present(obj)
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/handlers/loadbalancer.py", line 65, in on_present
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging     crd_lb = loadbalancer_crd['status'].get('loadbalancer')
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging KeyError: 'status'
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging
2021-03-01 22:35:01.243 1 ERROR kuryr_kubernetes.controller.managers.health [-] Component KuryrLoadBalancerHandler is dead.
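
For reference, a guard along the following lines would let the handler tolerate a missing status instead of dying. This is only a sketch of the general idea (the helper name is made up); it is not necessarily the change that was shipped for this bug:

def _crd_loadbalancer(loadbalancer_crd):
    # Tolerant lookup: a missing status and an empty status are treated the
    # same way, i.e. "no load balancer provisioned yet", so the handler can
    # fall through to the creation path instead of raising KeyError.
    return loadbalancer_crd.get('status', {}).get('loadbalancer')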


Actual results:
kuryr-controller crashes when the status object is missing.

Expected results:
If the status object is required, it shouldn't be something that can be removed.

Additional info:
I only tested this on OCP 4.7, but I suspect it would be the same on 4.6.

--- Additional comment from mdulko on 2021-03-03 11:42:44 UTC ---

Putting this on medium severity/priority as we have an easy workaround: just make sure to set the status to {} (rather than deleting it) if you want to clear it.
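
If the status key has already been deleted and the controller is crash-looping, the key can also be re-added as an empty object directly through the API. Below is a hypothetical sketch using the requests library; the API server URL, token and CA path are placeholders, and it assumes the KuryrLoadBalancer CRD does not serve status through a separate subresource:

import requests

API_SERVER = "https://api.example.openshift.local:6443"   # placeholder
TOKEN = "REPLACE_ME"                                      # placeholder
PATH = ("/apis/openstack.org/v1/namespaces/openshift-monitoring"
        "/kuryrloadbalancers/grafana")

# JSON patch that (re)sets status to an empty object so kuryr-controller can
# pick the resource up again and rebuild the load balancer.  Per RFC 6902,
# "add" also replaces the value if the member already exists.
resp = requests.patch(
    API_SERVER + PATH,
    headers={
        "Authorization": "Bearer " + TOKEN,
        "Content-Type": "application/json-patch+json",
    },
    json=[{"op": "add", "path": "/status", "value": {}}],
    verify="/path/to/ca.crt",                             # placeholder
)
resp.raise_for_status()

In practice an oc edit, as used in the reproduction steps, is the simpler route; the sketch just expresses the same operation against the resource path visible in the event dump above.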

Comment 4 rlobillo 2021-06-09 13:22:06 UTC
Verified on 4.7.0-0.nightly-2021-06-07-203428 on OSP16.1 (RHOS-16.1-RHEL-8-20210323.n.0) with OVN-Octavia enabled.

The load balancer replacement procedure worked fine.

$ oc get pods
NAME                    READY   STATUS    RESTARTS   AGE
demo-56c97d6845-2jfkc   1/1     Running   0          34s
demo-56c97d6845-7zwtp   1/1     Running   0          34s
demo-56c97d6845-ggcrf   1/1     Running   0          34s

$ oc rsh -n demo demo-56c97d6845-2jfkc curl 172.30.36.144
demo-56c97d6845-2jfkc: HELLO! I AM ALIVE!!!

$ openstack loadbalancer delete demo/demo --cascade

$ oc edit -n demo klb/demo
kuryrloadbalancer.openstack.org/demo edited
# ^ remove everything from 'status:' to the end, including the 'status:' key itself.

$ oc rsh -n demo demo-56c97d6845-2jfkc curl 172.30.36.144
^Ccommand terminated with exit code 130

# Wait a few seconds and the LB is recreated:
(shiftstack) [stack@undercloud-0 ~]$ oc rsh -n demo demo-56c97d6845-2jfkc curl 172.30.36.144
demo-56c97d6845-ggcrf: HELLO! I AM ALIVE!!!

$ oc get klb -n demo demo -o json | jq .status
{
  "listeners": [
    {
      "id": "d717a73e-43cb-4655-95f4-3f09d530062d",
      "loadbalancer_id": "2e778a64-bb0c-4cca-9424-6fd217414b23",
      "name": "demo/demo:TCP:80",
      "port": 80,
      "project_id": "b20e10e10b514fb8a196b7734776b991",
      "protocol": "TCP"
    }
  ],
  "loadbalancer": {
    "id": "2e778a64-bb0c-4cca-9424-6fd217414b23",
    "ip": "172.30.36.144",
    "name": "demo/demo",
    "port_id": "d2422c84-913d-4bbd-a947-2c865343a399",
    "project_id": "b20e10e10b514fb8a196b7734776b991",
    "provider": "ovn",
    "security_groups": [
      "aa9fc689-211e-460c-9635-fe7d0104aad2"
    ],
    "subnet_id": "9e439c38-6a46-410e-a4a5-ff892facd55a"
  },
  "members": [
    {
      "id": "d81fac86-a54d-4f4c-8fae-99ff484f357f",
      "ip": "10.128.124.136",
      "name": "demo/demo-56c97d6845-2jfkc:8080",
      "pool_id": "36497620-6e00-46c9-b7df-6ed3f83928f7",
      "port": 8080,
      "project_id": "b20e10e10b514fb8a196b7734776b991",
      "subnet_id": "49a98dc1-3ff2-4559-9058-353aee886986"
    },
    {
      "id": "511fbe93-f435-4c37-905a-f6265a49f2e7",
      "ip": "10.128.124.186",
      "name": "demo/demo-56c97d6845-ggcrf:8080",
      "pool_id": "36497620-6e00-46c9-b7df-6ed3f83928f7",
      "port": 8080,
      "project_id": "b20e10e10b514fb8a196b7734776b991",
      "subnet_id": "49a98dc1-3ff2-4559-9058-353aee886986"
    },
    {
      "id": "ababdff0-1cc1-4fea-aa82-d80f030ab1e7",
      "ip": "10.128.125.180",
      "name": "demo/demo-56c97d6845-7zwtp:8080",
      "pool_id": "36497620-6e00-46c9-b7df-6ed3f83928f7",
      "port": 8080,
      "project_id": "b20e10e10b514fb8a196b7734776b991",
      "subnet_id": "49a98dc1-3ff2-4559-9058-353aee886986"
    }
  ],
  "pools": [
    {
      "id": "36497620-6e00-46c9-b7df-6ed3f83928f7",
      "listener_id": "d717a73e-43cb-4655-95f4-3f09d530062d",
      "loadbalancer_id": "2e778a64-bb0c-4cca-9424-6fd217414b23",
      "name": "demo/demo:TCP:80",
      "project_id": "b20e10e10b514fb8a196b7734776b991",
      "protocol": "TCP"
    }
  ]
}

kuryr-controller remains stable during this process:

$ oc get pods -n openshift-kuryr
NAME                                READY   STATUS    RESTARTS   AGE
kuryr-cni-8j96g                     1/1     Running   0          99m
kuryr-cni-bt4xv                     1/1     Running   0          119m
kuryr-cni-gvg42                     1/1     Running   0          119m
kuryr-cni-k5m4v                     1/1     Running   0          104m
kuryr-cni-nvfxs                     1/1     Running   0          119m
kuryr-cni-zkjv9                     1/1     Running   0          105m
kuryr-controller-68b6cf9567-dbzq9   1/1     Running   0          5m56s

Comment 6 errata-xmlrpc 2021-06-15 09:26:45 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.16 security and bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2286