Bug 1968418 - Kuryr-Controller crashes when it's missing the status object
Summary: Kuryr-Controller crashes when it's missing the status object
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6.z
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.6.z
Assignee: Michał Dulko
QA Contact: rlobillo
URL:
Whiteboard:
Depends On: 1949541
Blocks: 1949540
 
Reported: 2021-06-07 11:26 UTC by Michał Dulko
Modified: 2021-07-14 07:16 UTC
CC List: 7 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of: 1949541
Environment:
Last Closed: 2021-07-14 07:16:34 UTC
Target Upstream Version:
Embargoed:


Attachments: None


Links:
* Github openshift/kuryr-kubernetes pull 519 (open): "[release-4.6] Bug 1968418: Fixing bug, Kuryr-Controller crashes when it's missing the status" (last updated 2021-06-16 13:42:14 UTC)
* Red Hat Product Errata RHBA-2021:2641 (last updated 2021-07-14 07:16:53 UTC)

Comment 5 rlobillo 2021-06-29 07:41:13 UTC
Verified on 4.6.0-0.nightly-2021-06-24-080044 over OSP16.1 (RHOS-16.1-RHEL-8-20210323.n.0) with OVN-Octavia enabled.

The load balancer replacement procedure worked fine.

Given the following project:

$ oc get all -n demo
NAME                        READY   STATUS    RESTARTS   AGE
pod/demo-7897db69cc-58jrq   1/1     Running   0          4d14h
pod/demo-7897db69cc-lhr6z   1/1     Running   0          4d15h
pod/demo-7897db69cc-t9p4n   1/1     Running   0          4d14h

NAME           TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
service/demo   ClusterIP   172.30.159.203   <none>        80/TCP    4d15h

NAME                   READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/demo   3/3     3            3           4d15h

NAME                              DESIRED   CURRENT   READY   AGE
replicaset.apps/demo-7897db69cc   3         3         3       4d15h

$ oc rsh -n demo pod/demo-7897db69cc-58jrq curl 172.30.159.203
demo-7897db69cc-t9p4n: HELLO! I AM ALIVE!!!

$ oc get pods -n openshift-kuryr
NAME                                READY   STATUS    RESTARTS   AGE
kuryr-cni-2c8jl                     1/1     Running   5          4d16h
kuryr-cni-2nvmj                     1/1     Running   0          4d16h
kuryr-cni-96mwt                     1/1     Running   0          4d16h
kuryr-cni-kzk2h                     1/1     Running   5          4d16h
kuryr-cni-th8hn                     1/1     Running   0          4d16h
kuryr-controller-77649ffc44-cqtlh   1/1     Running   1          4d16h
             
$ oc get -n demo klb/demo -o json | jq .status
{
  "listeners": [
    {
      "id": "c224e532-4a6c-4c2e-9ad0-6fb5ad30e3b6",
      "loadbalancer_id": "a6944170-5a19-45c3-b136-203938cac1d6",
      "name": "demo/demo:TCP:80",
      "port": 80,
      "project_id": "f4c2d34e928f4b739da3380a8065d802",
      "protocol": "TCP"
    }
  ],
  "loadbalancer": {
    "id": "a6944170-5a19-45c3-b136-203938cac1d6",
    "ip": "172.30.159.203",
    "name": "demo/demo",
    "port_id": "b127a9cb-d3ab-4564-afa3-3b1e649003fd",
    "project_id": "f4c2d34e928f4b739da3380a8065d802",
    "provider": "ovn",
    "security_groups": [
      "b33a99a5-de77-477d-aee7-206bdcd375d8"
    ],
    "subnet_id": "5b624a28-00bd-4af4-8a72-2137d82bd710"
  },
  "members": [
    {
      "id": "c88f8075-eac9-4855-8cc3-2d426c971db2",
      "ip": "10.128.116.22",
      "name": "demo/demo-7897db69cc-58jrq:8080",
      "pool_id": "0c65b0b9-8e5c-44e4-85e3-b4c59420b792",
      "port": 8080,
      "project_id": "f4c2d34e928f4b739da3380a8065d802",
      "subnet_id": "318cf2cc-7213-421b-903f-e6acf9653b4d"
    },
    {
      "id": "80066fd9-21c5-4c5e-b6c6-73aa9edc7cc4",
      "ip": "10.128.116.228",
      "name": "demo/demo-7897db69cc-t9p4n:8080",
      "pool_id": "0c65b0b9-8e5c-44e4-85e3-b4c59420b792",
      "port": 8080,
      "project_id": "f4c2d34e928f4b739da3380a8065d802",
      "subnet_id": "318cf2cc-7213-421b-903f-e6acf9653b4d"
    },
    {
      "id": "f0b217bc-c327-4c11-a228-824b2b18c485",
      "ip": "10.128.117.199",
      "name": "demo/demo-7897db69cc-lhr6z:8080",
      "pool_id": "0c65b0b9-8e5c-44e4-85e3-b4c59420b792",
      "port": 8080,
      "project_id": "f4c2d34e928f4b739da3380a8065d802",
      "subnet_id": "318cf2cc-7213-421b-903f-e6acf9653b4d"
    }
  ],
  "pools": [
    {
      "id": "0c65b0b9-8e5c-44e4-85e3-b4c59420b792",
      "listener_id": "c224e532-4a6c-4c2e-9ad0-6fb5ad30e3b6",
      "loadbalancer_id": "a6944170-5a19-45c3-b136-203938cac1d6",
      "name": "demo/demo:TCP:80",
      "project_id": "f4c2d34e928f4b739da3380a8065d802",
      "protocol": "TCP"
    }
  ]
}
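
For later comparison, it can help to record the current load balancer ID first. This is a convenience step, not part of the original verification; jsonpath is standard oc functionality:

$ oc get -n demo klb/demo -o jsonpath='{.status.loadbalancer.id}'
a6944170-5a19-45c3-b136-203938cac1d6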
                                      

Destroy the load balancer and remove the status section from the klb resource:

$ openstack loadbalancer delete demo/demo --cascade
$ oc edit -n demo klb/demo
kuryrloadbalancer.openstack.org/demo edited
# ^ Delete everything from the 'status' key to the end of the resource, including the 'status' key itself.
$ oc rsh -n demo pod/demo-7897db69cc-58jrq curl 172.30.159.203
^Ccommand terminated with exit code 130
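
The curl hangs because the load balancer no longer exists, so it was interrupted with Ctrl-C (exit code 130). As a sketch of a scriptable alternative to the manual oc edit, the status section could also be dropped with a JSON patch, assuming the KuryrLoadBalancer CRD does not define a status subresource (if it does, a patch on the main resource would leave status untouched):

$ oc patch -n demo klb/demo --type=json -p '[{"op": "remove", "path": "/status"}]'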


This triggers the replacement of the load balancer after a few minutes:

$ oc rsh -n demo pod/demo-7897db69cc-58jrq curl 172.30.159.203
demo-7897db69cc-lhr6z: HELLO! I AM ALIVE!!!
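
Instead of retrying manually, a simple poll loop can detect when the replacement finishes. This is just a convenience sketch; the 5-second timeout and 30-second interval are arbitrary:

$ until oc rsh -n demo pod/demo-7897db69cc-58jrq curl -s --max-time 5 172.30.159.203; do sleep 30; done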

During this process, the kuryr-controller pod remains stable, with no additional restarts:

$ oc get pods -n openshift-kuryr
NAME                                READY   STATUS    RESTARTS   AGE
kuryr-cni-2c8jl                     1/1     Running   5          4d16h
kuryr-cni-2nvmj                     1/1     Running   0          4d16h
kuryr-cni-96mwt                     1/1     Running   0          4d16h
kuryr-cni-kzk2h                     1/1     Running   5          4d16h
kuryr-cni-th8hn                     1/1     Running   0          4d16h
kuryr-controller-77649ffc44-cqtlh   1/1     Running   1          4d16h
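
The restart count can also be read directly with jsonpath. The app=kuryr-controller label selector here is an assumption about how the deployment labels its pods:

$ oc get pods -n openshift-kuryr -l app=kuryr-controller \
    -o jsonpath='{.items[0].status.containerStatuses[0].restartCount}'
1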


The status section is then repopulated on the klb resource:

$ oc get klb -n demo demo -o json | jq .status
{
  "listeners": [
    {
      "id": "667d4e9f-a979-4bc5-bc16-7eff6489d4b4",
      "loadbalancer_id": "a4264c62-de03-4d36-a67f-f617f76c6c6a",
      "name": "demo/demo:TCP:80",
      "port": 80,
      "project_id": "f4c2d34e928f4b739da3380a8065d802",
      "protocol": "TCP"
    }
  ],
  "loadbalancer": {
    "id": "a4264c62-de03-4d36-a67f-f617f76c6c6a",
    "ip": "172.30.159.203",
    "name": "demo/demo",
    "port_id": "b7e037cc-10af-4b6b-afdc-27312303fcca",
    "project_id": "f4c2d34e928f4b739da3380a8065d802",
    "provider": "ovn",
    "security_groups": [
      "b33a99a5-de77-477d-aee7-206bdcd375d8"
    ],
    "subnet_id": "5b624a28-00bd-4af4-8a72-2137d82bd710"
  },
  "members": [
    {
      "id": "edac7cdf-daa2-4e88-be64-97fbfc8262b2",
      "ip": "10.128.116.22",
      "name": "demo/demo-7897db69cc-58jrq:8080",
      "pool_id": "77bfc3f3-b25e-4860-9334-1d44b6909e11",
      "port": 8080,
      "project_id": "f4c2d34e928f4b739da3380a8065d802",
      "subnet_id": "318cf2cc-7213-421b-903f-e6acf9653b4d"
    },
    {
      "id": "5b054fd1-3a3e-449d-9c2c-a8d1e8df3652",
      "ip": "10.128.116.228",
      "name": "demo/demo-7897db69cc-t9p4n:8080",
      "pool_id": "77bfc3f3-b25e-4860-9334-1d44b6909e11",
      "port": 8080,
      "project_id": "f4c2d34e928f4b739da3380a8065d802",
      "subnet_id": "318cf2cc-7213-421b-903f-e6acf9653b4d"
    },
    {
      "id": "a6af909f-7087-4f02-82e9-d61256f17310",
      "ip": "10.128.117.199",
      "name": "demo/demo-7897db69cc-lhr6z:8080",
      "pool_id": "77bfc3f3-b25e-4860-9334-1d44b6909e11",
      "port": 8080,
      "project_id": "f4c2d34e928f4b739da3380a8065d802",
      "subnet_id": "318cf2cc-7213-421b-903f-e6acf9653b4d"
    }
  ],
  "pools": [
    {
      "id": "77bfc3f3-b25e-4860-9334-1d44b6909e11",
      "listener_id": "667d4e9f-a979-4bc5-bc16-7eff6489d4b4",
      "loadbalancer_id": "a4264c62-de03-4d36-a67f-f617f76c6c6a",
      "name": "demo/demo:TCP:80",
      "project_id": "f4c2d34e928f4b739da3380a8065d802",
      "protocol": "TCP"
    }
  ]
}
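
Comparing the old and new status confirms that a new Octavia load balancer was created (new loadbalancer, listener, pool, and member IDs) while the ClusterIP stayed the same. A quick way to pull out just those fields, as a sanity check rather than part of the original procedure:

$ oc get -n demo klb/demo -o json | jq '.status.loadbalancer | {id, ip}'
{
  "id": "a4264c62-de03-4d36-a67f-f617f76c6c6a",
  "ip": "172.30.159.203"
}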

Comment 10 errata-xmlrpc 2021-07-14 07:16:34 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.38 bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2641

