Bug 1949541 - Kuryr-Controller crashes when it's missing the status object
Summary: Kuryr-Controller crashes when it's missing the status object
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.7.z
Assignee: Michał Dulko
QA Contact: rlobillo
URL:
Whiteboard:
Depends On: 1933880
Blocks: 1949540 1968418
 
Reported: 2021-04-14 13:59 UTC by OpenShift BugZilla Robot
Modified: 2021-06-15 09:27 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
: 1968418 (view as bug list)
Environment:
Last Closed: 2021-06-15 09:26:45 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift kuryr-kubernetes pull 501 0 None open [release-4.7] Bug 1949541: Fixing bug, Kuryr-Controller crashes when it's missing the status 2021-06-02 12:36:41 UTC
Red Hat Product Errata RHSA-2021:2286 0 None None None 2021-06-15 09:27:26 UTC

Description OpenShift BugZilla Robot 2021-04-14 13:59:09 UTC
+++ This bug was initially created as a clone of Bug #1933880 +++

Description of problem:
In some situations, it is necessary to forcefully delete Octavia load balancers from OpenStack. When that happens, we prompt Kuryr to recreate them by removing the information from the status object in the kuryrloadbalancer CRD.

The status needs to be changed to: {}
$ oc get kuryrloadbalancer -n openshift-monitoring grafana -o jsonpath='{.status}' | jq .
{}

If the user inadvertently deletes the status object itself, however, kuryr-controller raises a traceback and cannot recover until the status object is restored.

Version-Release number of selected component (if applicable):
bash-4.4$ rpm -qa | grep kuryr
python3-kuryr-lib-1.1.1-0.20190923160834.41e6964.el8ost.noarch
python3-kuryr-kubernetes-4.7.0-202101262230.p0.git.2494.cd95ce5.el8.noarch
openshift-kuryr-controller-4.7.0-202101262230.p0.git.2494.cd95ce5.el8.noarch
openshift-kuryr-common-4.7.0-202101262230.p0.git.2494.cd95ce5.el8.noarch

How reproducible:
100%

Steps to Reproduce:
1. Edit the kuryrloadbalancer CRD for one of the LBs:
oc edit kuryrloadbalancer -n openshift-monitoring grafana
Remove everything from the status: line down, including the status: line itself.

eg:
[...]
  - name: https
    port: 3000
    protocol: TCP
    targetPort: https
  project_id: e75466bcb2eb4cf590026be2d94d95ef
  provider: ovn
  security_groups_ids:
  - e9d30328-ea13-4434-9ed2-fe8f4ddb3173
  subnet_id: 0b048882-9b6c-4a5d-97eb-e613645c90fd
  type: ClusterIP
status:
  listeners:
  - id: ea42c50c-b86f-40d7-a98a-310b46f16b70
    loadbalancer_id: 88648171-6441-4e16-8bd8-7959b9a52fae
    name: openshift-monitoring/grafana:TCP:3000
[...]

To this:
[...]
  - name: https
    port: 3000
    protocol: TCP
    targetPort: https
  project_id: e75466bcb2eb4cf590026be2d94d95ef
  provider: ovn
  security_groups_ids:
  - e9d30328-ea13-4434-9ed2-fe8f4ddb3173
  subnet_id: 0b048882-9b6c-4a5d-97eb-e613645c90fd
  type: ClusterIP
[...]


2. Observe kuryr-controller starts failing with the following traceback:
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging [-] Failed to handle event {'type': 'MODIFIED', 'object': {'apiVersion': 'openstack.org/v1', 'kind': 'KuryrLoadBalancer', 'metadata': {'creationTimestamp': '2021-03-01T06:08:28Z', 'finalizers': ['kuryr.openstack.org/kuryrloadbalancer-finalizers'], 'generation': 34, 'managedFields': [{'apiVersion': 'openstack.org/v1', 'fieldsType': 'FieldsV1', 'fieldsV1': {'f:metadata': {'f:finalizers': {'.': {}, 'v:"kuryr.openstack.org/kuryrloadbalancer-finalizers"': {}}}, 'f:spec': {'.': {}, 'f:endpointSlices': {}, 'f:ip': {}, 'f:ports': {}, 'f:project_id': {}, 'f:provider': {}, 'f:security_groups_ids': {}, 'f:subnet_id': {}, 'f:type': {}}}, 'manager': 'python-requests', 'operation': 'Update', 'time': '2021-03-01T22:30:36Z'}], 'name': 'grafana', 'namespace': 'openshift-monitoring', 'resourceVersion': '2140553', 'selfLink': '/apis/openstack.org/v1/namespaces/openshift-monitoring/kuryrloadbalancers/grafana', 'uid': '1e8a70c2-350d-418c-b876-152cbb7d2f4b'}, 'spec': {'endpointSlices': [{'endpoints': [{'addresses': ['10.128.57.183'], 'conditions': {'ready': True}, 'targetRef': {'kind': 'Pod', 'name': 'grafana-6f4d96d7fd-vm8sv', 'namespace': 'openshift-monitoring', 'resourceVersion': '63165', 'uid': '04630764-2c7e-4e86-a4e8-f986f26931cd'}}], 'ports': [{'name': 'https', 'port': 3000, 'protocol': 'TCP'}]}], 'ip': '172.30.88.169', 'ports': [{'name': 'https', 'port': 3000, 'protocol': 'TCP', 'targetPort': 'https'}], 'project_id': 'e75466bcb2eb4cf590026be2d94d95ef', 'provider': 'ovn', 'security_groups_ids': ['e9d30328-ea13-4434-9ed2-fe8f4ddb3173'], 'subnet_id': '0b048882-9b6c-4a5d-97eb-e613645c90fd', 'type': 'ClusterIP'}}}: KeyError: 'status'
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging Traceback (most recent call last):
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/logging.py", line 37, in __call__
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging     self._handler(event, *args, **kwargs)
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/retry.py", line 80, in __call__
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging     self._handler(event, *args, **kwargs)
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/k8s_base.py", line 84, in __call__
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging     self.on_present(obj)
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/handlers/loadbalancer.py", line 65, in on_present
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging     crd_lb = loadbalancer_crd['status'].get('loadbalancer')
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging KeyError: 'status'
2021-03-01 22:35:00.876 1 ERROR kuryr_kubernetes.handlers.logging
2021-03-01 22:35:01.243 1 ERROR kuryr_kubernetes.controller.managers.health [-] Component KuryrLoadBalancerHandler is dead.
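
The traceback above fails on loadbalancer_crd['status'].get('loadbalancer') because the 'status' key is absent from the object. The following is a minimal sketch of that failure and of a more defensive lookup. The helper name get_crd_lb and the sample dictionaries are hypothetical, and this is not necessarily the change that was merged for this bug.

```python
def get_crd_lb(loadbalancer_crd):
    """Return status.loadbalancer, or None if status or the key is absent.

    Hypothetical helper for illustration only; not taken from kuryr-kubernetes.
    """
    # Treat a missing (or null) status the same as an empty one.
    status = loadbalancer_crd.get('status') or {}
    return status.get('loadbalancer')


# Trimmed, illustrative objects: one with status removed, one with status: {}.
crd_missing = {'kind': 'KuryrLoadBalancer', 'spec': {'type': 'ClusterIP'}}
crd_empty = dict(crd_missing, status={})

# The access pattern from the traceback raises KeyError when status is gone.
try:
    crd_missing['status'].get('loadbalancer')
    raised = False
except KeyError:
    raised = True

print(raised)
print(get_crd_lb(crd_missing), get_crd_lb(crd_empty))
```

With a lookup like this, a removed status and an empty status both read as "no load balancer recorded yet".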


Actual results:
kuryr-controller crashes without the status object

Expected results:
If the status object is required, it shouldn't be something that can be removed.

Additional info:
I only tested this on OCP 4.7, but I suspect it would be the same on 4.6.

--- Additional comment from mdulko on 2021-03-03 11:42:44 UTC ---

Putting this on medium sev/prio as we have an easy workaround - just make sure to put {} as status if you want to clear it.
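
The workaround amounts to keeping the status key and emptying its value instead of deleting the key. Here is a small sketch of that transformation on an object that has already been loaded as a Python dictionary. The function name clear_status and the sample object are assumptions made for illustration; how the result is written back to the cluster is out of scope here.

```python
import copy


def clear_status(crd):
    """Return a copy of the object with status reset to {} (key kept).

    Hypothetical helper; the input is left unmodified.
    """
    cleared = copy.deepcopy(crd)
    cleared['status'] = {}
    return cleared


# Illustrative object, trimmed from the example in the report.
crd = {
    'kind': 'KuryrLoadBalancer',
    'spec': {'provider': 'ovn', 'type': 'ClusterIP'},
    'status': {
        'listeners': [{'id': 'ea42c50c-b86f-40d7-a98a-310b46f16b70'}],
    },
}

cleared = clear_status(crd)
print('status' in cleared, cleared['status'])
```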

Comment 4 rlobillo 2021-06-09 13:22:06 UTC
Verified on 4.7.0-0.nightly-2021-06-07-203428 on OSP16.1 (RHOS-16.1-RHEL-8-20210323.n.0) with OVN-Octavia enabled.

The load balancer replacement procedure worked fine.

$ oc get pods
NAME                    READY   STATUS    RESTARTS   AGE
demo-56c97d6845-2jfkc   1/1     Running   0          34s
demo-56c97d6845-7zwtp   1/1     Running   0          34s
demo-56c97d6845-ggcrf   1/1     Running   0          34s

$ oc rsh -n demo demo-56c97d6845-2jfkc curl 172.30.36.144
demo-56c97d6845-2jfkc: HELLO! I AM ALIVE!!!

$ openstack loadbalancer delete demo/demo --cascade

$ oc edit -n demo klb/demo
kuryrloadbalancer.openstack.org/demo edited
# ^remove from status until the end, including key 'status'.

$ oc rsh -n demo demo-56c97d6845-2jfkc curl 172.30.36.144
^Ccommand terminated with exit code 130

# Wait a few seconds and the LB is recreated:
(shiftstack) [stack@undercloud-0 ~]$ oc rsh -n demo demo-56c97d6845-2jfkc curl 172.30.36.144
demo-56c97d6845-ggcrf: HELLO! I AM ALIVE!!!

$ oc get klb -n demo demo -o json | jq .status
{
  "listeners": [
    {
      "id": "d717a73e-43cb-4655-95f4-3f09d530062d",
      "loadbalancer_id": "2e778a64-bb0c-4cca-9424-6fd217414b23",
      "name": "demo/demo:TCP:80",
      "port": 80,
      "project_id": "b20e10e10b514fb8a196b7734776b991",
      "protocol": "TCP"
    }
  ],
  "loadbalancer": {
    "id": "2e778a64-bb0c-4cca-9424-6fd217414b23",
    "ip": "172.30.36.144",
    "name": "demo/demo",
    "port_id": "d2422c84-913d-4bbd-a947-2c865343a399",
    "project_id": "b20e10e10b514fb8a196b7734776b991",
    "provider": "ovn",
    "security_groups": [
      "aa9fc689-211e-460c-9635-fe7d0104aad2"
    ],
    "subnet_id": "9e439c38-6a46-410e-a4a5-ff892facd55a"
  },
  "members": [
    {
      "id": "d81fac86-a54d-4f4c-8fae-99ff484f357f",
      "ip": "10.128.124.136",
      "name": "demo/demo-56c97d6845-2jfkc:8080",
      "pool_id": "36497620-6e00-46c9-b7df-6ed3f83928f7",
      "port": 8080,
      "project_id": "b20e10e10b514fb8a196b7734776b991",
      "subnet_id": "49a98dc1-3ff2-4559-9058-353aee886986"
    },
    {
      "id": "511fbe93-f435-4c37-905a-f6265a49f2e7",
      "ip": "10.128.124.186",
      "name": "demo/demo-56c97d6845-ggcrf:8080",
      "pool_id": "36497620-6e00-46c9-b7df-6ed3f83928f7",
      "port": 8080,
      "project_id": "b20e10e10b514fb8a196b7734776b991",
      "subnet_id": "49a98dc1-3ff2-4559-9058-353aee886986"
    },
    {
      "id": "ababdff0-1cc1-4fea-aa82-d80f030ab1e7",
      "ip": "10.128.125.180",
      "name": "demo/demo-56c97d6845-7zwtp:8080",
      "pool_id": "36497620-6e00-46c9-b7df-6ed3f83928f7",
      "port": 8080,
      "project_id": "b20e10e10b514fb8a196b7734776b991",
      "subnet_id": "49a98dc1-3ff2-4559-9058-353aee886986"
    }
  ],
  "pools": [
    {
      "id": "36497620-6e00-46c9-b7df-6ed3f83928f7",
      "listener_id": "d717a73e-43cb-4655-95f4-3f09d530062d",
      "loadbalancer_id": "2e778a64-bb0c-4cca-9424-6fd217414b23",
      "name": "demo/demo:TCP:80",
      "project_id": "b20e10e10b514fb8a196b7734776b991",
      "protocol": "TCP"
    }
  ]
}
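
One way to sanity-check a recreated status like the one above is to confirm that its cross-references line up: listeners and pools point at the load balancer, pools point at a known listener, and members point at a known pool. The sketch below does that on a parsed status dictionary. The function name status_is_consistent is hypothetical, and the sample is trimmed from the output above; it is only an illustration and not part of the verification that was actually performed.

```python
def status_is_consistent(status):
    """Check that IDs inside a KuryrLoadBalancer status reference each other.

    Hypothetical helper for illustration only.
    """
    lb_id = status.get('loadbalancer', {}).get('id')
    if lb_id is None:
        return False
    listeners = status.get('listeners', [])
    pools = status.get('pools', [])
    members = status.get('members', [])
    listener_ids = {listener['id'] for listener in listeners}
    pool_ids = {pool['id'] for pool in pools}
    listeners_ok = all(l.get('loadbalancer_id') == lb_id for l in listeners)
    pools_ok = all(
        p.get('loadbalancer_id') == lb_id and p.get('listener_id') in listener_ids
        for p in pools
    )
    members_ok = all(m.get('pool_id') in pool_ids for m in members)
    return listeners_ok and pools_ok and members_ok


# Trimmed from the jq output above (one member kept for brevity).
status = {
    'loadbalancer': {'id': '2e778a64-bb0c-4cca-9424-6fd217414b23'},
    'listeners': [{
        'id': 'd717a73e-43cb-4655-95f4-3f09d530062d',
        'loadbalancer_id': '2e778a64-bb0c-4cca-9424-6fd217414b23',
    }],
    'pools': [{
        'id': '36497620-6e00-46c9-b7df-6ed3f83928f7',
        'listener_id': 'd717a73e-43cb-4655-95f4-3f09d530062d',
        'loadbalancer_id': '2e778a64-bb0c-4cca-9424-6fd217414b23',
    }],
    'members': [{
        'id': 'd81fac86-a54d-4f4c-8fae-99ff484f357f',
        'pool_id': '36497620-6e00-46c9-b7df-6ed3f83928f7',
    }],
}

print(status_is_consistent(status))
```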

kuryr-controller remains stable during this process:

$ oc get pods -n openshift-kuryr
NAME                                READY   STATUS    RESTARTS   AGE
kuryr-cni-8j96g                     1/1     Running   0          99m
kuryr-cni-bt4xv                     1/1     Running   0          119m
kuryr-cni-gvg42                     1/1     Running   0          119m
kuryr-cni-k5m4v                     1/1     Running   0          104m
kuryr-cni-nvfxs                     1/1     Running   0          119m
kuryr-cni-zkjv9                     1/1     Running   0          105m
kuryr-controller-68b6cf9567-dbzq9   1/1     Running   0          5m56s

Comment 6 errata-xmlrpc 2021-06-15 09:26:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.16 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2286

