Bug 1843463

Summary: kuryrnetworks blocking namespace deletion and causing kuryr-controller to crash
Product: OpenShift Container Platform
Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component: Networking
Networking sub component: kuryr
Assignee: Luis Tomas Bolivar <ltomasbo>
QA Contact: GenadiC <gcheresh>
Status: CLOSED ERRATA
Severity: high
Priority: high
CC: rlobillo
Version: 4.5
Target Release: 4.5.0
Hardware: Unspecified
OS: Unspecified
Doc Type: No Doc Update
Last Closed: 2020-07-13 17:42:56 UTC
Bug Depends On: 1841846

Comment 3 rlobillo 2020-06-08 16:25:54 UTC
Verified on OSP16+Octavia-OVN (RHOS_TRUNK-16.0-RHEL-8-20200513.n.1) with OCP4.5.0-0.nightly-2020-06-08-053957

The NP and Conformance test runs completed without compromising kuryr-controller stability. No KeyError: 'status' was observed.

Manual reproduction also confirms the behaviour. Steps:

#1. Create a project with two pods and a service.
$ oc new-project test1 && oc run --image kuryr/demo demo && oc run --image kuryr/demo demo-caller && oc expose pod/demo --port 80 --target-port 8080
$ oc get all
NAME              READY   STATUS    RESTARTS   AGE
pod/demo          1/1     Running   0          2m27s
pod/demo-caller   1/1     Running   0          2m27s

NAME           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
service/demo   ClusterIP   172.30.92.184   <none>        80/TCP    2m26s

$  oc rsh demo-caller curl 172.30.92.184
demo: HELLO! I AM ALIVE!!!

$ oc get kuryrnetworks
NAME    SUBNET-CIDR       AGE
test1   10.128.114.0/23   2m23s


#3. Edit the kuryrnetwork resource and remove its status section. Delete the project immediately afterwards, before kuryr-controller repopulates the status:

$ oc edit kuryrnetworks/test1 && oc delete project test1 && oc edit kuryrnetworks/test1 && oc delete project test1 && oc edit kuryrnetworks/test1

>> Remove below:
status:
  netId: 0d3b19f5-e578-4f0b-9949-eaa3ca8afede
  nsLabels: {}
  populated: false
  routerId: 3a218a45-e158-4aad-90aa-ed67c97bc6c7
  subnetCIDR: 10.128.114.0/23
  subnetId: b9156704-b438-43cc-9baf-6d70ccf2ade0
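
For context, the KeyError: 'status' mentioned in this comment is the kind of failure a Python handler hits when it indexes a section that has been stripped from the resource. The following is a minimal sketch, not the actual kuryr-controller code: the function names and the dict layout are hypothetical, modelled only on the status fields listed above.

```python
# Hypothetical illustration; not taken from kuryr-kubernetes.

def subnet_id_unsafe(kuryrnetwork):
    # Direct indexing raises KeyError: 'status' when the section is missing.
    return kuryrnetwork['status']['subnetId']


def subnet_id_safe(kuryrnetwork):
    # Treat a missing or null status as "not populated yet" and return None.
    status = kuryrnetwork.get('status') or {}
    return status.get('subnetId')


# Resource as it looks right after the status section was removed by hand.
crd_without_status = {'metadata': {'name': 'test1'}, 'spec': {}}

# Resource once the status section is present.
crd_with_status = {
    'metadata': {'name': 'test1'},
    'status': {'subnetId': 'b9156704-b438-43cc-9baf-6d70ccf2ade0'},
}

try:
    subnet_id_unsafe(crd_without_status)
except KeyError as exc:
    print('unsafe lookup failed with KeyError:', exc)

print(subnet_id_safe(crd_without_status))
print(subnet_id_safe(crd_with_status))
```

With the tolerant lookup, a deletion event that arrives while the status is absent can be handled as "nothing to clean up yet" instead of crashing the handler.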

#4. Confirm that no KeyError exception was raised and that no pods restarted.

$ oc logs -n openshift-kuryr $(oc get pods -n openshift-kuryr -o jsonpath='{.items[6].metadata.name}')  | grep -i keyerror
$ oc get pods -n openshift-kuryr
NAME                                   READY   STATUS              RESTARTS   AGE
kuryr-cni-6xthm                        1/1     Running             0          49m
kuryr-cni-c6n82                        1/1     Running             0          49m
kuryr-cni-cttpr                        1/1     Running             0          49m
kuryr-cni-kw8bm                        1/1     Running             0          49m
kuryr-cni-m45nr                        1/1     Running             0          49m
kuryr-cni-r2ct2                        1/1     Running             0          49m
kuryr-controller-696c89958-l5qrc       1/1     Running             0          23m
[...]
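
The restart check above can also be done programmatically. The sketch below is an assumption-laden helper (the name restarted_pods is hypothetical) that parses the default column layout of `oc get pods` text output, as shown in this comment, and reports any pod whose RESTARTS value is non-zero.

```python
def restarted_pods(table):
    """Return {pod_name: restarts} for pods with a non-zero RESTARTS column.

    Assumes the default `oc get pods` layout with NAME and RESTARTS headers.
    """
    lines = [line for line in table.splitlines() if line.strip()]
    if not lines:
        return {}
    header = lines[0].split()
    name_idx = header.index('NAME')
    restarts_idx = header.index('RESTARTS')
    result = {}
    for line in lines[1:]:
        cols = line.split()
        # Skip truncated or non-tabular lines such as "[...]".
        if len(cols) <= restarts_idx or not cols[restarts_idx].isdigit():
            continue
        restarts = int(cols[restarts_idx])
        if restarts > 0:
            result[cols[name_idx]] = restarts
    return result


sample = """\
NAME                                   READY   STATUS              RESTARTS   AGE
kuryr-cni-6xthm                        1/1     Running             0          49m
kuryr-controller-696c89958-l5qrc       1/1     Running             0          23m
[...]
"""

# An empty dict means no pod in the listing has restarted.
print(restarted_pods(sample))
```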

Comment 4 errata-xmlrpc 2020-07-13 17:42:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409