Description of problem:

When a pod event is handled, a Network Policy and a Service may both be affected by that pod, in which case the LB security group (SG) of the selected services needs to be updated. However, the endpoints for the matched service may not yet be present, or may already have been deleted, resulting in a NotFound exception:

2020-04-30 15:03:46.792 1 ERROR kuryr_kubernetes.handlers.retry [-] Report handler unhealthy VIFHandler: kuryr_kubernetes.exceptions.K8sResourceNotFound: Resource not found: '{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"endpoints \\"svc-server\\" not found","reason":"NotFound","details":{"name":"svc-server","kind":"endpoints"},"code":404}\n'
2020-04-30 15:03:46.792 1 ERROR kuryr_kubernetes.handlers.retry Traceback (most recent call last):
2020-04-30 15:03:46.792 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/retry.py", line 78, in __call__
2020-04-30 15:03:46.792 1 ERROR kuryr_kubernetes.handlers.retry     self._handler(event)
2020-04-30 15:03:46.792 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/k8s_base.py", line 84, in __call__
2020-04-30 15:03:46.792 1 ERROR kuryr_kubernetes.handlers.retry     self.on_present(obj)
2020-04-30 15:03:46.792 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/handlers/vif.py", line 70, in on_present
2020-04-30 15:03:46.792 1 ERROR kuryr_kubernetes.handlers.retry     self.on_deleted(pod)
2020-04-30 15:03:46.792 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/handlers/vif.py", line 212, in on_deleted
2020-04-30 15:03:46.792 1 ERROR kuryr_kubernetes.handlers.retry     self._update_services(services, crd_pod_selectors, project_id)
2020-04-30 15:03:46.792 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/handlers/vif.py", line 280, in _update_services
2020-04-30 15:03:46.792 1 ERROR kuryr_kubernetes.handlers.retry     self._drv_lbaas.update_lbaas_sg(service, sgs)
2020-04-30 15:03:46.792 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/drivers/lbaasv2.py", line 793, in update_lbaas_sg
2020-04-30 15:03:46.792 1 ERROR kuryr_kubernetes.handlers.retry     endpoint = k8s.get(endpoints_link)
2020-04-30 15:03:46.792 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/k8s_client.py", line 96, in get
2020-04-30 15:03:46.792 1 ERROR kuryr_kubernetes.handlers.retry     self._raise_from_response(response)
2020-04-30 15:03:46.792 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/k8s_client.py", line 81, in _raise_from_response
2020-04-30 15:03:46.792 1 ERROR kuryr_kubernetes.handlers.retry     raise exc.K8sResourceNotFound(response.text)
2020-04-30 15:03:46.792 1 ERROR kuryr_kubernetes.handlers.retry kuryr_kubernetes.exceptions.K8sResourceNotFound: Resource not found: '{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"endpoints \\"svc-server\\" not found","reason":"NotFound","details":{"name":"svc-server","kind":"endpoints"},"code":404}\n'

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Run the Network Policy tests with a parallelism of 3 on environments with Amphoras deployed
2.
3.

Actual results:

Expected results:

Additional info:
Verified on OSP13 - 2020-05-19.2 and 4.5.0-0.nightly-2020-05-29-005153

NP tests run with parallelism set to 3. The controller is not restarted and the backtrace is not observed.

Furthermore, the issue was reproduced manually to confirm stability:

1. Enable debug logs:

    oc scale deployment -n openshift-cluster-version cluster-version-operator --replicas 0
    oc edit cm kuryr-config -n openshift-kuryr

2. Create a pod, service and network policy:

    oc run server --image=kuryr/demo
    oc expose pod/server --port 80
    oc apply -f np.yml

where np.yml contains:

    kind: NetworkPolicy
    apiVersion: networking.k8s.io/v1
    metadata:
      name: api-allow
    spec:
      podSelector:
        matchLabels:
          run: server
      ingress:
      - from:
        - podSelector:
            matchLabels:
              run: client

3. Delete the endpoints of the created svc:

    oc delete endpoints server

4. Create another pod, which triggers the LB update (oc run client --image=kuryr/demo), and check the controller logs while it is being created:

    oc logs -n openshift-kuryr kuryr-controller-5d8dc4f6c5-t82vk -f | grep 'Endpoint not Found. Skipping LB SG update for'

Result:
- No controller crashes.
- DEBUG log line is written:

    [stack@undercloud-0 ~]$ oc logs -n openshift-kuryr kuryr-controller-5d8dc4f6c5-t82vk -f | grep 'Endpoint not Found. Skipping LB SG update for'
    2020-06-01 10:34:31.931 1 DEBUG kuryr_kubernetes.controller.drivers.lbaasv2 [-] Endpoint not Found. Skipping LB SG update fortest/server as the LB resources are not present update_lbaas_sg /usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/drivers/lbaasv2.py:808
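
For readers who want to see the verified behaviour in code form, here is a minimal, self-contained sketch of the pattern: look up the endpoints, and if the lookup raises NotFound, log at DEBUG level and skip the SG update instead of letting the exception propagate to the handler. This is not the actual kuryr-kubernetes change. K8sResourceNotFound is a local stand-in for kuryr_kubernetes.exceptions.K8sResourceNotFound, and FakeK8sClient, update_lbaas_sg_sketch and apply_sgs are hypothetical names introduced only for illustration.

```python
import logging

LOG = logging.getLogger(__name__)


class K8sResourceNotFound(Exception):
    """Local stand-in for kuryr_kubernetes.exceptions.K8sResourceNotFound."""


class FakeK8sClient:
    """Hypothetical in-memory client: get() raises when the path is unknown."""

    def __init__(self, resources):
        self._resources = dict(resources)

    def get(self, path):
        try:
            return self._resources[path]
        except KeyError:
            raise K8sResourceNotFound(path) from None


def update_lbaas_sg_sketch(k8s, namespace, svc_name, apply_sgs):
    """Return True if the SG update ran, False if it was skipped."""
    endpoints_link = f"/api/v1/namespaces/{namespace}/endpoints/{svc_name}"
    try:
        endpoints = k8s.get(endpoints_link)
    except K8sResourceNotFound:
        # Endpoints not created yet, or already deleted: nothing to update.
        LOG.debug("Endpoints not found. Skipping LB SG update for %s/%s",
                  namespace, svc_name)
        return False
    # Hypothetical callback standing in for the real LB SG update logic.
    apply_sgs(endpoints)
    return True


client = FakeK8sClient(
    {"/api/v1/namespaces/test/endpoints/server": {"subsets": []}})
applied = []
ran = update_lbaas_sg_sketch(client, "test", "server", applied.append)
skipped = update_lbaas_sg_sketch(client, "test", "svc-server", applied.append)
```

In this sketch the first call finds the endpoints and runs the callback, while the second call hits the missing "svc-server" endpoints and returns without raising, which mirrors the "no crash, DEBUG line only" result reported above.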
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409