Bug 1968418 - Kuryr-Controller crashes when it's missing the status object
Summary: Kuryr-Controller crashes when it's missing the status object
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6.z
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.6.z
Assignee: Michał Dulko
QA Contact: rlobillo
URL:
Whiteboard:
Depends On: 1949541
Blocks: 1949540
 
Reported: 2021-06-07 11:26 UTC by Michał Dulko
Modified: 2021-07-14 07:16 UTC
CC List: 7 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of: 1949541
Environment:
Last Closed: 2021-07-14 07:16:34 UTC
Target Upstream Version:
Embargoed:


Attachments: None


Links:
* Github openshift/kuryr-kubernetes pull 519 (open): "[release-4.6] Bug 1968418: Fixing bug, Kuryr-Controller crashes when it's missing the status" (last updated 2021-06-16 13:42:14 UTC)
* Red Hat Product Errata RHBA-2021:2641 (last updated 2021-07-14 07:16:53 UTC)

Comment 5 rlobillo 2021-06-29 07:41:13 UTC
Verified on 4.6.0-0.nightly-2021-06-24-080044 over OSP16.1 (RHOS-16.1-RHEL-8-20210323.n.0) with OVN-Octavia enabled.

The load balancer replacement procedure worked fine.

Given the following project:

$ oc get all -n demo
NAME                        READY   STATUS    RESTARTS   AGE
pod/demo-7897db69cc-58jrq   1/1     Running   0          4d14h
pod/demo-7897db69cc-lhr6z   1/1     Running   0          4d15h
pod/demo-7897db69cc-t9p4n   1/1     Running   0          4d14h

NAME           TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
service/demo   ClusterIP   172.30.159.203   <none>        80/TCP    4d15h

NAME                   READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/demo   3/3     3            3           4d15h

NAME                              DESIRED   CURRENT   READY   AGE
replicaset.apps/demo-7897db69cc   3         3         3       4d15h

$ oc rsh -n demo pod/demo-7897db69cc-58jrq curl 172.30.159.203
demo-7897db69cc-t9p4n: HELLO! I AM ALIVE!!!

$ oc get pods -n openshift-kuryr
NAME                                READY   STATUS    RESTARTS   AGE
kuryr-cni-2c8jl                     1/1     Running   5          4d16h
kuryr-cni-2nvmj                     1/1     Running   0          4d16h
kuryr-cni-96mwt                     1/1     Running   0          4d16h
kuryr-cni-kzk2h                     1/1     Running   5          4d16h
kuryr-cni-th8hn                     1/1     Running   0          4d16h
kuryr-controller-77649ffc44-cqtlh   1/1     Running   1          4d16h
             
$ oc get -n demo klb/demo -o json | jq .status
{
  "listeners": [
    {
      "id": "c224e532-4a6c-4c2e-9ad0-6fb5ad30e3b6",
      "loadbalancer_id": "a6944170-5a19-45c3-b136-203938cac1d6",
      "name": "demo/demo:TCP:80",
      "port": 80,
      "project_id": "f4c2d34e928f4b739da3380a8065d802",
      "protocol": "TCP"
    }
  ],
  "loadbalancer": {
    "id": "a6944170-5a19-45c3-b136-203938cac1d6",
    "ip": "172.30.159.203",
    "name": "demo/demo",
    "port_id": "b127a9cb-d3ab-4564-afa3-3b1e649003fd",
    "project_id": "f4c2d34e928f4b739da3380a8065d802",
    "provider": "ovn",
    "security_groups": [
      "b33a99a5-de77-477d-aee7-206bdcd375d8"
    ],
    "subnet_id": "5b624a28-00bd-4af4-8a72-2137d82bd710"
  },
  "members": [
    {
      "id": "c88f8075-eac9-4855-8cc3-2d426c971db2",
      "ip": "10.128.116.22",
      "name": "demo/demo-7897db69cc-58jrq:8080",
      "pool_id": "0c65b0b9-8e5c-44e4-85e3-b4c59420b792",
      "port": 8080,
      "project_id": "f4c2d34e928f4b739da3380a8065d802",
      "subnet_id": "318cf2cc-7213-421b-903f-e6acf9653b4d"
    },
    {
      "id": "80066fd9-21c5-4c5e-b6c6-73aa9edc7cc4",
      "ip": "10.128.116.228",
      "name": "demo/demo-7897db69cc-t9p4n:8080",
      "pool_id": "0c65b0b9-8e5c-44e4-85e3-b4c59420b792",
      "port": 8080,
      "project_id": "f4c2d34e928f4b739da3380a8065d802",
      "subnet_id": "318cf2cc-7213-421b-903f-e6acf9653b4d"
    },
    {
      "id": "f0b217bc-c327-4c11-a228-824b2b18c485",
      "ip": "10.128.117.199",
      "name": "demo/demo-7897db69cc-lhr6z:8080",
      "pool_id": "0c65b0b9-8e5c-44e4-85e3-b4c59420b792",
      "port": 8080,
      "project_id": "f4c2d34e928f4b739da3380a8065d802",
      "subnet_id": "318cf2cc-7213-421b-903f-e6acf9653b4d"
    }
  ],
  "pools": [
    {
      "id": "0c65b0b9-8e5c-44e4-85e3-b4c59420b792",
      "listener_id": "c224e532-4a6c-4c2e-9ad0-6fb5ad30e3b6",
      "loadbalancer_id": "a6944170-5a19-45c3-b136-203938cac1d6",
      "name": "demo/demo:TCP:80",
      "project_id": "f4c2d34e928f4b739da3380a8065d802",
      "protocol": "TCP"
    }
  ]
}
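
For later comparison, it can help to record the current load balancer ID first. This is a convenience step, not part of the original verification; jsonpath is standard oc functionality:

$ oc get -n demo klb/demo -o jsonpath='{.status.loadbalancer.id}'
a6944170-5a19-45c3-b136-203938cac1d6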
                                      

Destroy the load balancer and remove the status section from the klb resource:

$ openstack loadbalancer delete demo/demo --cascade
$ oc edit -n demo klb/demo
kuryrloadbalancer.openstack.org/demo edited
# ^ Delete everything from the 'status' key to the end of the resource, including the 'status' key itself.
$ oc rsh -n demo pod/demo-7897db69cc-58jrq curl 172.30.159.203
^Ccommand terminated with exit code 130
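
The curl hangs because the load balancer no longer exists, so it was interrupted with Ctrl-C (exit code 130). As a sketch of a scriptable alternative to the manual oc edit, the status section could also be dropped with a JSON patch, assuming the KuryrLoadBalancer CRD does not define a status subresource (if it does, a patch on the main resource would leave status untouched):

$ oc patch -n demo klb/demo --type=json -p '[{"op": "remove", "path": "/status"}]'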


This triggers the replacement of the load balancer after a few minutes:

$ oc rsh -n demo pod/demo-7897db69cc-58jrq curl 172.30.159.203
demo-7897db69cc-lhr6z: HELLO! I AM ALIVE!!!
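
Instead of retrying manually, a simple poll loop can detect when the replacement finishes. This is just a convenience sketch; the 5-second timeout and 30-second interval are arbitrary:

$ until oc rsh -n demo pod/demo-7897db69cc-58jrq curl -s --max-time 5 172.30.159.203; do sleep 30; done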

During this process, the kuryr-controller pod remains stable, with no additional restarts:

$ oc get pods -n openshift-kuryr
NAME                                READY   STATUS    RESTARTS   AGE
kuryr-cni-2c8jl                     1/1     Running   5          4d16h
kuryr-cni-2nvmj                     1/1     Running   0          4d16h
kuryr-cni-96mwt                     1/1     Running   0          4d16h
kuryr-cni-kzk2h                     1/1     Running   5          4d16h
kuryr-cni-th8hn                     1/1     Running   0          4d16h
kuryr-controller-77649ffc44-cqtlh   1/1     Running   1          4d16h
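
The restart count can also be read directly with jsonpath. The app=kuryr-controller label selector here is an assumption about how the deployment labels its pods:

$ oc get pods -n openshift-kuryr -l app=kuryr-controller \
    -o jsonpath='{.items[0].status.containerStatuses[0].restartCount}'
1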


The status section is then repopulated on the klb resource:

$ oc get klb -n demo demo -o json | jq .status
{
  "listeners": [
    {
      "id": "667d4e9f-a979-4bc5-bc16-7eff6489d4b4",
      "loadbalancer_id": "a4264c62-de03-4d36-a67f-f617f76c6c6a",
      "name": "demo/demo:TCP:80",
      "port": 80,
      "project_id": "f4c2d34e928f4b739da3380a8065d802",
      "protocol": "TCP"
    }
  ],
  "loadbalancer": {
    "id": "a4264c62-de03-4d36-a67f-f617f76c6c6a",
    "ip": "172.30.159.203",
    "name": "demo/demo",
    "port_id": "b7e037cc-10af-4b6b-afdc-27312303fcca",
    "project_id": "f4c2d34e928f4b739da3380a8065d802",
    "provider": "ovn",
    "security_groups": [
      "b33a99a5-de77-477d-aee7-206bdcd375d8"
    ],
    "subnet_id": "5b624a28-00bd-4af4-8a72-2137d82bd710"
  },
  "members": [
    {
      "id": "edac7cdf-daa2-4e88-be64-97fbfc8262b2",
      "ip": "10.128.116.22",
      "name": "demo/demo-7897db69cc-58jrq:8080",
      "pool_id": "77bfc3f3-b25e-4860-9334-1d44b6909e11",
      "port": 8080,
      "project_id": "f4c2d34e928f4b739da3380a8065d802",
      "subnet_id": "318cf2cc-7213-421b-903f-e6acf9653b4d"
    },
    {
      "id": "5b054fd1-3a3e-449d-9c2c-a8d1e8df3652",
      "ip": "10.128.116.228",
      "name": "demo/demo-7897db69cc-t9p4n:8080",
      "pool_id": "77bfc3f3-b25e-4860-9334-1d44b6909e11",
      "port": 8080,
      "project_id": "f4c2d34e928f4b739da3380a8065d802",
      "subnet_id": "318cf2cc-7213-421b-903f-e6acf9653b4d"
    },
    {
      "id": "a6af909f-7087-4f02-82e9-d61256f17310",
      "ip": "10.128.117.199",
      "name": "demo/demo-7897db69cc-lhr6z:8080",
      "pool_id": "77bfc3f3-b25e-4860-9334-1d44b6909e11",
      "port": 8080,
      "project_id": "f4c2d34e928f4b739da3380a8065d802",
      "subnet_id": "318cf2cc-7213-421b-903f-e6acf9653b4d"
    }
  ],
  "pools": [
    {
      "id": "77bfc3f3-b25e-4860-9334-1d44b6909e11",
      "listener_id": "667d4e9f-a979-4bc5-bc16-7eff6489d4b4",
      "loadbalancer_id": "a4264c62-de03-4d36-a67f-f617f76c6c6a",
      "name": "demo/demo:TCP:80",
      "project_id": "f4c2d34e928f4b739da3380a8065d802",
      "protocol": "TCP"
    }
  ]
}
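
Comparing the old and new status confirms that a new Octavia load balancer was created (new loadbalancer, listener, pool, and member IDs) while the ClusterIP stayed the same. A quick way to pull out just those fields, as a sanity check rather than part of the original procedure:

$ oc get -n demo klb/demo -o json | jq '.status.loadbalancer | {id, ip}'
{
  "id": "a4264c62-de03-4d36-a67f-f617f76c6c6a",
  "ip": "172.30.159.203"
}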

Comment 10 errata-xmlrpc 2021-07-14 07:16:34 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.38 bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2641

