Bug 1968418

Summary: Kuryr-Controller crashes when it's missing the status object
Product: OpenShift Container Platform
Component: Networking
Networking sub component: kuryr
Version: 4.6.z
Target Release: 4.6.z
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Keywords: Triaged
Reporter: Michał Dulko <mdulko>
Assignee: Michał Dulko <mdulko>
QA Contact: rlobillo
CC: juriarte, mdemaced, mdulko, openshift-bugzilla-robot, pmannidi, rlobillo, talessio
Hardware: Unspecified
OS: Unspecified
Doc Type: No Doc Update
Clone Of: 1949541
Bug Depends On: 1949541
Bug Blocks: 1949540
Last Closed: 2021-07-14 07:16:34 UTC

Comment 5 rlobillo 2021-06-29 07:41:13 UTC
Verified on 4.6.0-0.nightly-2021-06-24-080044 over OSP16.1 (RHOS-16.1-RHEL-8-20210323.n.0) with OVN-Octavia enabled.

The loadbalancer replacement procedure worked fine.

Given the project below:

$ oc get all -n demo
NAME                        READY   STATUS    RESTARTS   AGE
pod/demo-7897db69cc-58jrq   1/1     Running   0          4d14h
pod/demo-7897db69cc-lhr6z   1/1     Running   0          4d15h
pod/demo-7897db69cc-t9p4n   1/1     Running   0          4d14h

NAME           TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
service/demo   ClusterIP   172.30.159.203   <none>        80/TCP    4d15h

NAME                   READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/demo   3/3     3            3           4d15h

NAME                              DESIRED   CURRENT   READY   AGE
replicaset.apps/demo-7897db69cc   3         3         3       4d15h

$ oc rsh -n demo pod/demo-7897db69cc-58jrq curl 172.30.159.203
demo-7897db69cc-t9p4n: HELLO! I AM ALIVE!!!
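
To exercise the Service VIP from every pod (and confirm that different members answer), a quick loop works too; this is just a convenience sketch, assuming curl is present in the demo image as shown above:

$ for pod in $(oc get pods -n demo -o name); do oc rsh -n demo "$pod" curl -s 172.30.159.203; done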

$ oc get pods -n openshift-kuryr
NAME                                READY   STATUS    RESTARTS   AGE
kuryr-cni-2c8jl                     1/1     Running   5          4d16h
kuryr-cni-2nvmj                     1/1     Running   0          4d16h
kuryr-cni-96mwt                     1/1     Running   0          4d16h
kuryr-cni-kzk2h                     1/1     Running   5          4d16h
kuryr-cni-th8hn                     1/1     Running   0          4d16h
kuryr-controller-77649ffc44-cqtlh   1/1     Running   1          4d16h
             
$ oc get -n demo klb/demo -o json | jq .status
{
  "listeners": [
    {
      "id": "c224e532-4a6c-4c2e-9ad0-6fb5ad30e3b6",
      "loadbalancer_id": "a6944170-5a19-45c3-b136-203938cac1d6",
      "name": "demo/demo:TCP:80",
      "port": 80,
      "project_id": "f4c2d34e928f4b739da3380a8065d802",
      "protocol": "TCP"
    }
  ],
  "loadbalancer": {
    "id": "a6944170-5a19-45c3-b136-203938cac1d6",
    "ip": "172.30.159.203",
    "name": "demo/demo",
    "port_id": "b127a9cb-d3ab-4564-afa3-3b1e649003fd",
    "project_id": "f4c2d34e928f4b739da3380a8065d802",
    "provider": "ovn",
    "security_groups": [
      "b33a99a5-de77-477d-aee7-206bdcd375d8"
    ],
    "subnet_id": "5b624a28-00bd-4af4-8a72-2137d82bd710"
  },
  "members": [
    {
      "id": "c88f8075-eac9-4855-8cc3-2d426c971db2",
      "ip": "10.128.116.22",
      "name": "demo/demo-7897db69cc-58jrq:8080",
      "pool_id": "0c65b0b9-8e5c-44e4-85e3-b4c59420b792",
      "port": 8080,
      "project_id": "f4c2d34e928f4b739da3380a8065d802",
      "subnet_id": "318cf2cc-7213-421b-903f-e6acf9653b4d"
    },
    {
      "id": "80066fd9-21c5-4c5e-b6c6-73aa9edc7cc4",
      "ip": "10.128.116.228",
      "name": "demo/demo-7897db69cc-t9p4n:8080",
      "pool_id": "0c65b0b9-8e5c-44e4-85e3-b4c59420b792",
      "port": 8080,
      "project_id": "f4c2d34e928f4b739da3380a8065d802",
      "subnet_id": "318cf2cc-7213-421b-903f-e6acf9653b4d"
    },
    {
      "id": "f0b217bc-c327-4c11-a228-824b2b18c485",
      "ip": "10.128.117.199",
      "name": "demo/demo-7897db69cc-lhr6z:8080",
      "pool_id": "0c65b0b9-8e5c-44e4-85e3-b4c59420b792",
      "port": 8080,
      "project_id": "f4c2d34e928f4b739da3380a8065d802",
      "subnet_id": "318cf2cc-7213-421b-903f-e6acf9653b4d"
    }
  ],
  "pools": [
    {
      "id": "0c65b0b9-8e5c-44e4-85e3-b4c59420b792",
      "listener_id": "c224e532-4a6c-4c2e-9ad0-6fb5ad30e3b6",
      "loadbalancer_id": "a6944170-5a19-45c3-b136-203938cac1d6",
      "name": "demo/demo:TCP:80",
      "project_id": "f4c2d34e928f4b739da3380a8065d802",
      "protocol": "TCP"
    }
  ]
}
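
To make the later comparison easier, the current loadbalancer ID can be recorded first; a minimal sketch using jsonpath (the value matches the status shown above):

$ OLD_LB_ID=$(oc get klb -n demo demo -o jsonpath='{.status.loadbalancer.id}')
$ echo "$OLD_LB_ID"
a6944170-5a19-45c3-b136-203938cac1d6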
                                      

Destroy the loadbalancer and remove the status section from the klb resource:

$ openstack loadbalancer delete demo/demo --cascade
$ oc edit -n demo klb/demo
kuryrloadbalancer.openstack.org/demo edited
# ^ in the editor, delete everything from the 'status' key to the end of the resource, the 'status' key itself included.
$ oc rsh -n demo pod/demo-7897db69cc-58jrq curl 172.30.159.203
^Ccommand terminated with exit code 130
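
The status removal can also be done non-interactively with a JSON patch instead of oc edit; a hedged sketch, assuming this version of the KuryrLoadBalancer CRD does not expose status as a separate subresource (if it does, the interactive edit above is the safer route):

$ oc patch -n demo klb/demo --type=json -p '[{"op": "remove", "path": "/status"}]'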


This triggers the replacement of the loadbalancer, which completes after a few minutes:

$ oc rsh -n demo pod/demo-7897db69cc-58jrq curl 172.30.159.203
demo-7897db69cc-lhr6z: HELLO! I AM ALIVE!!!
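
Rather than retrying curl by hand, the recovery can be polled until the replacement loadbalancer answers; a minimal sketch, with the 5-second timeout and 30-second interval chosen arbitrarily:

$ until oc rsh -n demo pod/demo-7897db69cc-58jrq curl -s --max-time 5 172.30.159.203; do sleep 30; done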

During this process, the kuryr-controller pod remains stable (no crash, and the restart count does not increase):

$ oc get pods -n openshift-kuryr
NAME                                READY   STATUS    RESTARTS   AGE
kuryr-cni-2c8jl                     1/1     Running   5          4d16h
kuryr-cni-2nvmj                     1/1     Running   0          4d16h
kuryr-cni-96mwt                     1/1     Running   0          4d16h
kuryr-cni-kzk2h                     1/1     Running   5          4d16h
kuryr-cni-th8hn                     1/1     Running   0          4d16h
kuryr-controller-77649ffc44-cqtlh   1/1     Running   1          4d16h
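
"Stable" here means the restart counts are unchanged from the listing taken before the test; they can be extracted directly for comparison, a sketch using jsonpath:

$ oc get pods -n openshift-kuryr -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[0].restartCount}{"\n"}{end}'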


The status section is also repopulated on the klb resource:

$ oc get klb -n demo demo -o json | jq .status
{
  "listeners": [
    {
      "id": "667d4e9f-a979-4bc5-bc16-7eff6489d4b4",
      "loadbalancer_id": "a4264c62-de03-4d36-a67f-f617f76c6c6a",
      "name": "demo/demo:TCP:80",
      "port": 80,
      "project_id": "f4c2d34e928f4b739da3380a8065d802",
      "protocol": "TCP"
    }
  ],
  "loadbalancer": {
    "id": "a4264c62-de03-4d36-a67f-f617f76c6c6a",
    "ip": "172.30.159.203",
    "name": "demo/demo",
    "port_id": "b7e037cc-10af-4b6b-afdc-27312303fcca",
    "project_id": "f4c2d34e928f4b739da3380a8065d802",
    "provider": "ovn",
    "security_groups": [
      "b33a99a5-de77-477d-aee7-206bdcd375d8"
    ],
    "subnet_id": "5b624a28-00bd-4af4-8a72-2137d82bd710"
  },
  "members": [
    {
      "id": "edac7cdf-daa2-4e88-be64-97fbfc8262b2",
      "ip": "10.128.116.22",
      "name": "demo/demo-7897db69cc-58jrq:8080",
      "pool_id": "77bfc3f3-b25e-4860-9334-1d44b6909e11",
      "port": 8080,
      "project_id": "f4c2d34e928f4b739da3380a8065d802",
      "subnet_id": "318cf2cc-7213-421b-903f-e6acf9653b4d"
    },
    {
      "id": "5b054fd1-3a3e-449d-9c2c-a8d1e8df3652",
      "ip": "10.128.116.228",
      "name": "demo/demo-7897db69cc-t9p4n:8080",
      "pool_id": "77bfc3f3-b25e-4860-9334-1d44b6909e11",
      "port": 8080,
      "project_id": "f4c2d34e928f4b739da3380a8065d802",
      "subnet_id": "318cf2cc-7213-421b-903f-e6acf9653b4d"
    },
    {
      "id": "a6af909f-7087-4f02-82e9-d61256f17310",
      "ip": "10.128.117.199",
      "name": "demo/demo-7897db69cc-lhr6z:8080",
      "pool_id": "77bfc3f3-b25e-4860-9334-1d44b6909e11",
      "port": 8080,
      "project_id": "f4c2d34e928f4b739da3380a8065d802",
      "subnet_id": "318cf2cc-7213-421b-903f-e6acf9653b4d"
    }
  ],
  "pools": [
    {
      "id": "77bfc3f3-b25e-4860-9334-1d44b6909e11",
      "listener_id": "667d4e9f-a979-4bc5-bc16-7eff6489d4b4",
      "loadbalancer_id": "a4264c62-de03-4d36-a67f-f617f76c6c6a",
      "name": "demo/demo:TCP:80",
      "project_id": "f4c2d34e928f4b739da3380a8065d802",
      "protocol": "TCP"
    }
  ]
}
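
Comparing against the ID recorded earlier confirms that the loadbalancer, listener, pool and members were all recreated with new UUIDs while the VIP stayed equal to the Service ClusterIP; a sketch reusing the OLD_LB_ID variable captured above:

$ NEW_LB_ID=$(oc get klb -n demo demo -o jsonpath='{.status.loadbalancer.id}')
$ [ "$OLD_LB_ID" != "$NEW_LB_ID" ] && echo "recreated: $OLD_LB_ID -> $NEW_LB_ID"
recreated: a6944170-5a19-45c3-b136-203938cac1d6 -> a4264c62-de03-4d36-a67f-f617f76c6c6a
$ oc get klb -n demo demo -o jsonpath='{.status.loadbalancer.ip}'
172.30.159.203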

Comment 10 errata-xmlrpc 2021-07-14 07:16:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.38 bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2641