Bug 1838985
| Summary: | [Kuryr] LB sg update not skipped when no endpoint is found | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Maysa Macedo <mdemaced> | |
| Component: | Networking | Assignee: | Maysa Macedo <mdemaced> | |
| Networking sub component: | kuryr | QA Contact: | GenadiC <gcheresh> | |
| Status: | CLOSED ERRATA | Docs Contact: | ||
| Severity: | medium | |||
| Priority: | medium | CC: | ltomasbo, rlobillo | |
| Version: | 4.5 | |||
| Target Milestone: | --- | |||
| Target Release: | 4.5.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | | Doc Type: | No Doc Update | |
| Doc Text: | | Story Points: | --- | |
| Clone Of: | ||||
| : | 1839023 (view as bug list) | Environment: | ||
| Last Closed: | 2020-07-13 17:41:07 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1839023 | |||
Verified on OSP13 - 2020-05-19.2 and 4.5.0-0.nightly-2020-05-29-005153
NP tests were run with parallelism set to 3. The controller was not restarted and no backtrace was observed.
Furthermore, manual reproduction of the issue was performed to confirm stability:
1. Enable debug logs:
oc scale deployment -n openshift-cluster-version cluster-version-operator --replicas 0
oc edit cm kuryr-config -n openshift-kuryr
2. Create pod, service and network policy.
oc run server --image=kuryr/demo
oc expose pod/server --port 80
oc apply -f np.yml
where np.yml contains:
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: api-allow
spec:
  podSelector:
    matchLabels:
      run: server
  ingress:
  - from:
    - podSelector:
        matchLabels:
          run: client
3. Delete the endpoints of the created svc: oc delete endpoints server
4. Create another pod, which triggers the LB update, and check the controller logs while it starts:
oc run client --image=kuryr/demo
oc logs -n openshift-kuryr kuryr-controller-5d8dc4f6c5-t82vk -f | grep 'Endpoint not Found. Skipping LB SG update for'
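The reason the `client` pod in step 4 touches the policy can be sketched in a few lines of Python. This is only an illustration of equality-based `matchLabels` matching, assuming `oc run <name>` labels the pod with `run: <name>` (as the selectors in the policy suggest); the helper names `matches` and `policy_role` are hypothetical and are not part of Kuryr.

```python
# Sketch only: equality-based matchLabels, no matchExpressions support.
def matches(selector, labels):
    """True if every key/value in selector['matchLabels'] is in labels."""
    wanted = selector.get("matchLabels", {})
    return all(labels.get(key) == value for key, value in wanted.items())


# The spec of the api-allow policy from step 2, as a plain dict.
policy_spec = {
    "podSelector": {"matchLabels": {"run": "server"}},
    "ingress": [
        {"from": [{"podSelector": {"matchLabels": {"run": "client"}}}]},
    ],
}


def policy_role(spec, pod_labels):
    """Return 'target', 'allowed-source', or None for a pod's labels."""
    if matches(spec["podSelector"], pod_labels):
        return "target"
    for rule in spec.get("ingress", []):
        for peer in rule.get("from", []):
            selector = peer.get("podSelector")
            if selector is not None and matches(selector, pod_labels):
                return "allowed-source"
    return None


if __name__ == "__main__":
    for labels in ({"run": "server"}, {"run": "client"}, {"run": "other"}):
        print(labels, "->", policy_role(policy_spec, labels))
```

A pod labelled `run: client` is an allowed ingress source for the policy, so its creation is a pod event that affects the policy and the `server` service behind it.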
Result:
- No controller crashes.
- DEBUG log line is written:
[stack@undercloud-0 ~]$ oc logs -n openshift-kuryr kuryr-controller-5d8dc4f6c5-t82vk -f | grep 'Endpoint not Found. Skipping LB SG update for'
2020-06-01 10:34:31.931 1 DEBUG kuryr_kubernetes.controller.drivers.lbaasv2 [-] Endpoint not Found. Skipping LB SG update fortest/server as the LB resources are not present update_lbaas_sg /usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/drivers/lbaasv2.py:808
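For longer runs, the same check can be scripted instead of grepped by hand. The following is a minimal sketch, assuming the controller log has already been saved as text; the function name `summarize` and the result keys are hypothetical, and the two sample lines are copied from this report.

```python
SKIP_MSG = "Endpoint not Found. Skipping LB SG update for"
ERROR_MARK = "K8sResourceNotFound"


def summarize(log_text):
    """Count skip messages and NotFound error lines in a controller log."""
    lines = log_text.splitlines()
    return {
        "skipped": sum(1 for line in lines if SKIP_MSG in line),
        "not_found_errors": sum(1 for line in lines if ERROR_MARK in line),
    }


# Line seen after the fix (copied from the result above).
fixed_log = (
    "2020-06-01 10:34:31.931 1 DEBUG kuryr_kubernetes.controller.drivers.lbaasv2 "
    "[-] Endpoint not Found. Skipping LB SG update fortest/server as the LB "
    "resources are not present update_lbaas_sg "
    "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/drivers/lbaasv2.py:808"
)

# Line seen before the fix (copied from the traceback in the description).
broken_log = (
    "2020-04-30 15:03:46.792 1 ERROR kuryr_kubernetes.handlers.retry "
    "raise exc.K8sResourceNotFound(response.text)"
)

if __name__ == "__main__":
    print(summarize(fixed_log))
    print(summarize(broken_log))
```

A verified run would be expected to show at least one skip message and no NotFound error lines.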
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:2409
Description of problem:
When a pod event is handled, a Network Policy and a Service may be affected by that pod, in which case the LB sg of the selected services needs to be updated. However, the endpoints for the matched service may not yet be present, or may have been deleted, resulting in a NotFound exception.

2020-04-30 15:03:46.792 1 ERROR kuryr_kubernetes.handlers.retry [-] Report handler unhealthy VIFHandler: kuryr_kubernetes.exceptions.K8sResourceNotFound: Resource not found: '{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"endpoints \\"svc-server\\" not found","reason":"NotFound","details":{"name":"svc-server","kind":"endpoints"},"code":404}\n'
2020-04-30 15:03:46.792 1 ERROR kuryr_kubernetes.handlers.retry Traceback (most recent call last):
2020-04-30 15:03:46.792 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/retry.py", line 78, in __call__
2020-04-30 15:03:46.792 1 ERROR kuryr_kubernetes.handlers.retry     self._handler(event)
2020-04-30 15:03:46.792 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/k8s_base.py", line 84, in __call__
2020-04-30 15:03:46.792 1 ERROR kuryr_kubernetes.handlers.retry     self.on_present(obj)
2020-04-30 15:03:46.792 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/handlers/vif.py", line 70, in on_present
2020-04-30 15:03:46.792 1 ERROR kuryr_kubernetes.handlers.retry     self.on_deleted(pod)
2020-04-30 15:03:46.792 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/handlers/vif.py", line 212, in on_deleted
2020-04-30 15:03:46.792 1 ERROR kuryr_kubernetes.handlers.retry     self._update_services(services, crd_pod_selectors, project_id)
2020-04-30 15:03:46.792 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/handlers/vif.py", line 280, in _update_services
2020-04-30 15:03:46.792 1 ERROR kuryr_kubernetes.handlers.retry     self._drv_lbaas.update_lbaas_sg(service, sgs)
2020-04-30 15:03:46.792 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/drivers/lbaasv2.py", line 793, in update_lbaas_sg
2020-04-30 15:03:46.792 1 ERROR kuryr_kubernetes.handlers.retry     endpoint = k8s.get(endpoints_link)
2020-04-30 15:03:46.792 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/k8s_client.py", line 96, in get
2020-04-30 15:03:46.792 1 ERROR kuryr_kubernetes.handlers.retry     self._raise_from_response(response)
2020-04-30 15:03:46.792 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/k8s_client.py", line 81, in _raise_from_response
2020-04-30 15:03:46.792 1 ERROR kuryr_kubernetes.handlers.retry     raise exc.K8sResourceNotFound(response.text)
2020-04-30 15:03:46.792 1 ERROR kuryr_kubernetes.handlers.retry kuryr_kubernetes.exceptions.K8sResourceNotFound: Resource not found: '{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"endpoints \\"svc-server\\" not found","reason":"NotFound","details":{"name":"svc-server","kind":"endpoints"},"code":404}\n'

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Run Network Policy tests with a parallelism of 3 on environments with Amphoras deployed

Actual results:

Expected results:

Additional info:
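
The behaviour expected from the fix (skip the LB sg update when the endpoints object is missing, rather than letting the exception reach the retry handler) can be sketched as follows. This is not the actual Kuryr patch: `K8sResourceNotFound` here is a local stand-in for `kuryr_kubernetes.exceptions.K8sResourceNotFound`, `FakeK8sClient` is a hypothetical in-memory client, and the standalone `update_lbaas_sg` signature and return value are assumptions made for illustration.

```python
import logging

LOG = logging.getLogger(__name__)


class K8sResourceNotFound(Exception):
    """Local stand-in for kuryr_kubernetes.exceptions.K8sResourceNotFound."""


class FakeK8sClient:
    """Hypothetical in-memory client; get() raises for unknown paths."""

    def __init__(self, resources):
        self._resources = resources

    def get(self, path):
        try:
            return self._resources[path]
        except KeyError:
            raise K8sResourceNotFound(path)


def update_lbaas_sg(k8s, service, sgs):
    """Return True if the SG update would proceed, False if it is skipped."""
    meta = service["metadata"]
    lbaas_name = "%s/%s" % (meta["namespace"], meta["name"])
    endpoints_link = "/api/v1/namespaces/%s/endpoints/%s" % (
        meta["namespace"], meta["name"])
    try:
        k8s.get(endpoints_link)
    except K8sResourceNotFound:
        # Endpoints not created yet, or already deleted: nothing to update.
        LOG.debug("Endpoint not Found. Skipping LB SG update for %s as the "
                  "LB resources are not present", lbaas_name)
        return False
    # The real driver would apply `sgs` to the load balancer here.
    return True


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    service = {"metadata": {"namespace": "test", "name": "server"}}
    empty_client = FakeK8sClient({})
    print(update_lbaas_sg(empty_client, service, ["sg-1"]))
```

With the empty client the call returns `False` and logs the skip message at DEBUG level, which is the behaviour the verification above looks for.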