Bug 1874439

Summary: [Kuryr] kuryr-controller restarts due to attempt to change namespaces that already terminated
Product: OpenShift Container Platform
Reporter: Jon Uriarte <juriarte>
Component: Networking
Assignee: Michał Dulko <mdulko>
Networking sub component: kuryr
QA Contact: GenadiC <gcheresh>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
CC: rlobillo
Version: 4.6
Target Milestone: ---
Target Release: 4.6.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2020-10-27 16:36:25 UTC
Type: Bug
Attachments:
Description Flags
Kuryr controller logs none

Description Jon Uriarte 2020-09-01 11:11:55 UTC
Created attachment 1713283
Kuryr controller logs

Description of problem:

Consecutive kuryr_kubernetes.exceptions.K8sNamespaceTerminating: Forbidden: 'Namespace already terminated' exceptions during conformance tests make the kuryr-controller pod restart.

2020-09-01 10:27:24.343 1 ERROR kuryr_kubernetes.controller.handlers.lbaas [-] Kubernetes Client Exception creating kuryrloadbalancer CRD. <class 'kuryr_kubernetes.exceptions.K8sClientException'>: kuryr_kubernetes.exceptions.K8sNamespaceTerminating: Forbidden: 'Namespace already terminated: \'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"kuryrloadbalancers.openstack.org \\\\"test-service\\\\" is forbidden: unable to create new content in namespace e2e-nsdeletetest-6977 because it is being terminated","reason":"Forbidden","details":{"name":"test-service","group":"openstack.org","kind":"kuryrloadbalancers","causes":[{"reason":"NamespaceTerminating","message":"namespace e2e-nsdeletetest-6977 is being terminated","field":"metadata.namespace"}]},"code":403}\\n\''
2020-09-01 10:27:24.343 1 ERROR kuryr_kubernetes.controller.handlers.lbaas Traceback (most recent call last):
2020-09-01 10:27:24.343 1 ERROR kuryr_kubernetes.controller.handlers.lbaas   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/handlers/lbaas.py", line 171, in create_crd_spec
2020-09-01 10:27:24.343 1 ERROR kuryr_kubernetes.controller.handlers.lbaas     loadbalancer_crd)
2020-09-01 10:27:24.343 1 ERROR kuryr_kubernetes.controller.handlers.lbaas   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/k8s_client.py", line 210, in post
2020-09-01 10:27:24.343 1 ERROR kuryr_kubernetes.controller.handlers.lbaas     self._raise_from_response(response)
2020-09-01 10:27:24.343 1 ERROR kuryr_kubernetes.controller.handlers.lbaas   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/k8s_client.py", line 88, in _raise_from_response
2020-09-01 10:27:24.343 1 ERROR kuryr_kubernetes.controller.handlers.lbaas     raise exc.K8sNamespaceTerminating(response.text)
2020-09-01 10:27:24.343 1 ERROR kuryr_kubernetes.controller.handlers.lbaas kuryr_kubernetes.exceptions.K8sNamespaceTerminating: Forbidden: 'Namespace already terminated: \'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"kuryrloadbalancers.openstack.org \\\\"test-service\\\\" is forbidden: unable to create new content in namespace e2e-nsdeletetest-6977 because it is being terminated","reason":"Forbidden","details":{"name":"test-service","group":"openstack.org","kind":"kuryrloadbalancers","causes":[{"reason":"NamespaceTerminating","message":"namespace e2e-nsdeletetest-6977 is being terminated","field":"metadata.namespace"}]},"code":403}\\n\''
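The traceback above shows the API server answering the CRD creation with a 403 Status object whose `details.causes` contains the reason `NamespaceTerminating`. As a rough illustration of how such a response can be told apart from other 403 errors, here is a minimal Python sketch. The function name `is_namespace_terminating` and the sample payload are assumptions made for this example (the payload is rebuilt from the log line above); this is not the actual code in `kuryr_kubernetes/k8s_client.py`.

```python
import json


def is_namespace_terminating(status_code: int, body: str) -> bool:
    """Return True if an API error is a 403 caused by a terminating namespace.

    Hypothetical helper for illustration; not part of kuryr-kubernetes.
    """
    if status_code != 403:
        return False
    try:
        status = json.loads(body)
    except ValueError:
        # The body is not JSON, so the cause cannot be inspected.
        return False
    if not isinstance(status, dict):
        return False
    causes = (status.get("details") or {}).get("causes") or []
    return any(
        isinstance(cause, dict) and cause.get("reason") == "NamespaceTerminating"
        for cause in causes
    )


# Status object rebuilt from the log excerpt in this report.
SAMPLE_BODY = json.dumps({
    "kind": "Status",
    "apiVersion": "v1",
    "metadata": {},
    "status": "Failure",
    "message": (
        'kuryrloadbalancers.openstack.org "test-service" is forbidden: '
        "unable to create new content in namespace e2e-nsdeletetest-6977 "
        "because it is being terminated"
    ),
    "reason": "Forbidden",
    "details": {
        "name": "test-service",
        "group": "openstack.org",
        "kind": "kuryrloadbalancers",
        "causes": [{
            "reason": "NamespaceTerminating",
            "message": "namespace e2e-nsdeletetest-6977 is being terminated",
            "field": "metadata.namespace",
        }],
    },
    "code": 403,
})

if __name__ == "__main__":
    print(is_namespace_terminating(403, SAMPLE_BODY))
```

Checking the structured `causes` list rather than matching on the message text keeps the check independent of the exact wording of the error message.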


Version-Release number of selected component (if applicable):

4.6.0-0.nightly-2020-08-31-220837
RHOS-16.1-RHEL-8-20200821.n.0


How reproducible: Always, when running conformance tests


Steps to Reproduce:
1. Install 4.6 on OSP 16.1 with OVN
2. Run conformance tests

Actual results:
kuryr_kubernetes.exceptions.K8sNamespaceTerminating exception is continuously raised and kuryr-controller is restarted


Expected results: no kuryr-controller restarts due to that exception


Additional info:

Conformance test results: error: 44 fail, 257 pass, 1 skip (1h31m10s)

$ oc -n openshift-kuryr get pods
NAME                                READY   STATUS    RESTARTS   AGE
kuryr-cni-5nj6l                     1/1     Running   8          3h58m
kuryr-cni-7xp7x                     1/1     Running   8          3h57m
kuryr-cni-9stvw                     1/1     Running   0          4h20m
kuryr-cni-mtxd4                     1/1     Running   0          4h20m
kuryr-cni-plcxk                     1/1     Running   4          3h58m
kuryr-cni-rv5ff                     1/1     Running   0          4h20m
kuryr-controller-66d4854f56-td7cc   1/1     Running   14         4h20m
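To pull the restart counts out of a listing like the one above, a small Python sketch follows. It assumes the five-column `oc get pods` layout shown in this report (NAME, READY, STATUS, RESTARTS, AGE) with a plain integer in the RESTARTS column; the function name `restart_counts` is made up for this example.

```python
def restart_counts(table: str) -> dict:
    """Map pod name to restart count from `oc get pods` text output.

    Assumes whitespace-separated columns and an integer RESTARTS value,
    as in the listing in this report.
    """
    lines = [line for line in table.strip().splitlines() if line.strip()]
    header = lines[0].split()
    name_idx = header.index("NAME")
    restarts_idx = header.index("RESTARTS")
    counts = {}
    for line in lines[1:]:
        cols = line.split()
        counts[cols[name_idx]] = int(cols[restarts_idx])
    return counts


# Listing copied from this report.
SAMPLE = """
NAME                                READY   STATUS    RESTARTS   AGE
kuryr-cni-5nj6l                     1/1     Running   8          3h58m
kuryr-cni-7xp7x                     1/1     Running   8          3h57m
kuryr-cni-9stvw                     1/1     Running   0          4h20m
kuryr-cni-mtxd4                     1/1     Running   0          4h20m
kuryr-cni-plcxk                     1/1     Running   4          3h58m
kuryr-cni-rv5ff                     1/1     Running   0          4h20m
kuryr-controller-66d4854f56-td7cc   1/1     Running   14         4h20m
"""

if __name__ == "__main__":
    for pod, restarts in restart_counts(SAMPLE).items():
        if restarts > 0:
            print(pod, restarts)
```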

Comment 3 rlobillo 2020-09-07 16:04:26 UTC
Verified on 4.6.0-0.nightly-2020-09-05-015624 over RHOS-16.1-RHEL-8-20200831.n.1

Installed OCP with IPI and ran the NP and conformance tests with the expected results.

The error still occurs but is now caught, so no kuryr-controller restarts are observed due to the 'Namespace already terminated' error:

2020-09-07 13:05:12.764 1 ERROR kuryr_kubernetes.controller.handlers.lbaas [-] Kubernetes Client Exception creating kuryrloadbalancer CRD. <class 'kuryr_kubernetes.exceptions.K8sClientException'>: kuryr_kubernetes.exceptions.K8sNamespaceTerminating: Forbidden: 'Namespace already terminated: \'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"kuryrloadbalancers.openstack.org \\\\"latency-svc-tcrdr\\\\" is forbidden: unable to create new content in namespace e2e-svc-latency-4916 because it is being terminated","reason":"Forbidden","details":{"name":"latency-svc-tcrdr","group":"openstack.org","kind":"kuryrloadbalancers","causes":[{"reason":"NamespaceTerminating","message":"namespace e2e-svc-latency-4916 is being terminated","field":"metadata.namespace"}]},"code":403}\\n\''
2020-09-07 13:05:12.764 1 ERROR kuryr_kubernetes.controller.handlers.lbaas Traceback (most recent call last):
2020-09-07 13:05:12.764 1 ERROR kuryr_kubernetes.controller.handlers.lbaas   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/handlers/lbaas.py", line 353, in _create_crd_spec
2020-09-07 13:05:12.764 1 ERROR kuryr_kubernetes.controller.handlers.lbaas     k_const.K8S_API_CRD_NAMESPACES, namespace), loadbalancer_crd)
2020-09-07 13:05:12.764 1 ERROR kuryr_kubernetes.controller.handlers.lbaas   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/k8s_client.py", line 210, in post
2020-09-07 13:05:12.764 1 ERROR kuryr_kubernetes.controller.handlers.lbaas     self._raise_from_response(response)
2020-09-07 13:05:12.764 1 ERROR kuryr_kubernetes.controller.handlers.lbaas   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/k8s_client.py", line 88, in _raise_from_response
2020-09-07 13:05:12.764 1 ERROR kuryr_kubernetes.controller.handlers.lbaas     raise exc.K8sNamespaceTerminating(response.text)
2020-09-07 13:05:12.764 1 ERROR kuryr_kubernetes.controller.handlers.lbaas kuryr_kubernetes.exceptions.K8sNamespaceTerminating: Forbidden: 'Namespace already terminated: \'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"kuryrloadbalancers.openstack.org \\\\"latency-svc-tcrdr\\\\" is forbidden: unable to create new content in namespace e2e-svc-latency-4916 because it is being terminated","reason":"Forbidden","details":{"name":"latency-svc-tcrdr","group":"openstack.org","kind":"kuryrloadbalancers","causes":[{"reason":"NamespaceTerminating","message":"namespace e2e-svc-latency-4916 is being terminated","field":"metadata.namespace"}]},"code":403}\\n\''
2020-09-07 13:05:12.764 1 ERROR kuryr_kubernetes.controller.handlers.lbaas
2020-09-07 13:05:12.765 1 WARNING kuryr_kubernetes.controller.handlers.lbaas [-] Namespace e2e-svc-latency-4916 is being terminated, ignoring Endpoints latency-svc-tcrdr in that namespace.: kuryr_kubernetes.exceptions.K8sNamespaceTerminating: Forbidden: 'Namespace already terminated: \'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"kuryrloadbalancers.openstack.org \\\\"latency-svc-tcrdr\\\\" is forbidden: unable to create new content in namespace e2e-svc-latency-4916 because it is being terminated","reason":"Forbidden","details":{"name":"latency-svc-tcrdr","group":"openstack.org","kind":"kuryrloadbalancers","causes":[{"reason":"NamespaceTerminating","message":"namespace e2e-svc-latency-4916 is being terminated","field":"metadata.namespace"}]},"code":403}\\n\''

A rework of the logging, so that only the warning message is shown, is mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1860030.

As the kuryr-controller no longer restarts due to this error and the functionality is OK, this BZ is verified.

Comment 5 errata-xmlrpc 2020-10-27 16:36:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196