Description of problem:
Some kuryr cni pods and kuryr controller pod end up crashlooping while running OCP conformance tests due to some pods not being found.

2020-08-25 08:45:00.727 1 ERROR kuryr_kubernetes.controller.handlers.kuryrport [-] Failed to get pod: Resource not found: '{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \\"pod-projected-configmaps-19f6cf54-cab1-404d-8f1f-13dd06da72e0\\" not found","reason":"NotFound","details":{"name":"pod-projected-configmaps-19f6cf54-cab1-404d-8f1f-13dd06da72e0","kind":"pods"},"code":404}\n': kuryr_kubernetes.exceptions.K8sResourceNotFound: Resource not found: '{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \\"pod-projected-configmaps-19f6cf54-cab1-404d-8f1f-13dd06da72e0\\" not found","reason":"NotFound","details":{"name":"pod-projected-configmaps-19f6cf54-cab1-404d-8f1f-13dd06da72e0","kind":"pods"},"code":404}\n'
2020-08-25 08:45:00.727 1 ERROR kuryr_kubernetes.controller.handlers.kuryrport Traceback (most recent call last):
2020-08-25 08:45:00.727 1 ERROR kuryr_kubernetes.controller.handlers.kuryrport   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/handlers/kuryrport.py", line 136, in on_finalize
2020-08-25 08:45:00.727 1 ERROR kuryr_kubernetes.controller.handlers.kuryrport     pod = self.k8s.get(f"{constants.K8S_API_NAMESPACES}"
2020-08-25 08:45:00.727 1 ERROR kuryr_kubernetes.controller.handlers.kuryrport   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/k8s_client.py", line 104, in get
2020-08-25 08:45:00.727 1 ERROR kuryr_kubernetes.controller.handlers.kuryrport     self._raise_from_response(response)
2020-08-25 08:45:00.727 1 ERROR kuryr_kubernetes.controller.handlers.kuryrport   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/k8s_client.py", line 83, in _raise_from_response
2020-08-25 08:45:00.727 1 ERROR kuryr_kubernetes.controller.handlers.kuryrport     raise exc.K8sResourceNotFound(response.text)
2020-08-25 08:45:00.727 1 ERROR kuryr_kubernetes.controller.handlers.kuryrport kuryr_kubernetes.exceptions.K8sResourceNotFound: Resource not found: '{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \\"pod-projected-configmaps-19f6cf54-cab1-404d-8f1f-13dd06da72e0\\" not found","reason":"NotFound","details":{"name":"pod-projected-configmaps-19f6cf54-cab1-404d-8f1f-13dd06da72e0","kind":"pods"},"code":404}\n'

2020-08-25 08:02:27.316 1 ERROR kuryr_kubernetes.controller.managers.health [-] Component KuryrPortHandler is dead.
2020-08-25 08:02:27.347 1 INFO kuryr_kubernetes.controller.service [-] Service 'KuryrK8sService' stopping
2020-08-25 08:02:27.348 1 INFO kuryr_kubernetes.watcher [-] Stopped watching '/apis/openstack.org/v1/kuryrnetworkpolicies'
2020-08-25 08:02:27.349 1 INFO kuryr_kubernetes.watcher [-] Stopped watching '/api/v1/namespaces'
2020-08-25 08:02:27.351 1 INFO kuryr_kubernetes.watcher [-] Stopped watching '/api/v1/services'
2020-08-25 08:02:27.352 1 INFO kuryr_kubernetes.watcher [-] Stopped watching '/api/v1/endpoints'
2020-08-25 08:02:27.353 1 INFO kuryr_kubernetes.watcher [-] Stopped watching '/apis/networking.k8s.io/v1/networkpolicies'
2020-08-25 08:02:27.354 1 INFO kuryr_kubernetes.watcher [-] Stopped watching '/api/v1/pods'
2020-08-25 08:02:27.356 1 INFO kuryr_kubernetes.watcher [-] Stopped watching '/apis/openstack.org/v1/kuryrloadbalancers'
2020-08-25 08:02:27.357 1 INFO kuryr_kubernetes.watcher [-] Stopped watching '/apis/openstack.org/v1/kuryrnetworks'
2020-08-25 08:02:27.361 1 WARNING urllib3.connectionpool [-] Connection pool is full, discarding connection: api-int.ostest.shiftstack.com: queue.Full
2020-08-25 08:02:27.364 1 INFO kuryr_kubernetes.watcher [-] Stopped watching '/apis/openstack.org/v1/kuryrports'
2020-08-25 08:02:27.364 1 INFO kuryr_kubernetes.watcher [-] No remaining active watchers, Exiting...
2020-08-25 08:02:27.366 1 INFO kuryr_kubernetes.controller.service [-] Service 'KuryrK8sService' stopping

Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-08-24-034934
RHOS-16.1-RHEL-8-20200813.n.0

How reproducible:
always when running conformance tests

Steps to Reproduce:
1. Install OCP 4.6 on OSP 16.1
2. Run conformance tests

Actual results:
Kuryr controller and cni pods in crashloop

Expected results:
no crashloop
Verified on 4.6.0-0.nightly-2020-09-05-015624 over RHOS-16.1-RHEL-8-20200831.n.1.

OCP was installed with IPI, and the NP and conformance tests ran with the expected results.

kuryr-controller encountered the target scenario and handled it successfully:

$ oc logs -n openshift-kuryr kuryr-controller-7b6cdb86dd-wpx2x -p | grep 'Manual'
2020-09-07 13:01:31.196 1 WARNING kuryr_kubernetes.controller.handlers.kuryrport [-] Manually triggered KuryrPort taint-eviction-4 removal. This action should be avoided, since KuryrPort CRDs are internal to Kuryr.

No crashloop was observed on either the controller or the cni pods. Test logs attached.
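
For readers who want to see the general shape of the behaviour verified here, below is a minimal sketch of tolerating a missing pod during KuryrPort finalization instead of letting the exception kill the handler. This is not the actual Kuryr patch. ResourceNotFound and FakeK8sClient are hypothetical stand-ins for kuryr_kubernetes.exceptions.K8sResourceNotFound and the Kuryr k8s client, the "/api/v1/namespaces" prefix is assumed from the watcher paths in the logs, and the warning text is illustrative only.

```python
import logging

LOG = logging.getLogger(__name__)


class ResourceNotFound(Exception):
    """Hypothetical stand-in for K8sResourceNotFound (a 404 from the API)."""


class FakeK8sClient:
    """Hypothetical in-memory client; get() raises when the path is unknown."""

    def __init__(self, resources):
        self._resources = dict(resources)

    def get(self, path):
        try:
            return self._resources[path]
        except KeyError:
            raise ResourceNotFound(path)


def on_finalize(k8s, kuryrport):
    """Return the pod owning this KuryrPort, or None if it is already gone."""
    meta = kuryrport["metadata"]
    # Assumed URL layout: /api/v1/namespaces/<namespace>/pods/<name>
    path = f"/api/v1/namespaces/{meta['namespace']}/pods/{meta['name']}"
    try:
        return k8s.get(path)
    except ResourceNotFound:
        # The pod was deleted before the KuryrPort; log and carry on so the
        # handler is not marked dead by the health manager.
        LOG.warning("Pod for KuryrPort %s not found, continuing cleanup.",
                    meta["name"])
        return None
```

With this shape, a 404 on the pod lookup results in a logged warning and a None return value, so the caller can proceed with releasing the port rather than raising out of the handler.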
Created attachment 1713987 [details] conformance test result
Created attachment 1713988 [details] NP test results
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196