Bug 1875751

Summary: "Resource not found" if NP deleted during processing
Product: OpenShift Container Platform
Reporter: Michał Dulko <mdulko>
Component: Networking
Assignee: Michał Dulko <mdulko>
Networking sub component: kuryr
QA Contact: GenadiC <gcheresh>
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
CC: rlobillo
Version: 4.6
Target Release: 4.6.0
Hardware: All
OS: All
Doc Type: If docs needed, set a value
Last Closed: 2020-10-27 16:37:52 UTC
Type: Bug
Attachments:
  NP test results (flags: none)

Description Michał Dulko 2020-09-04 09:14:08 UTC
Description of problem:
  We have various issues in the network policy (NP) code when resources are
  listed and one of them gets removed while it is being processed. For example:

  2020-09-03 11:36:01.294 1 ERROR kuryr_kubernetes.handlers.retry [-] Report handler unhealthy KuryrPortHandler: kuryr_kubernetes.exceptions.K8sResourceNotFound: Resource not found: '{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"networkpolicies.networking.k8s.io \\"allow-from-client-a-pod-selector\\" not found","reason":"NotFound","details":{"name":"allow-from-client-a-pod-selector","group":"networking.k8s.io","kind":"networkpolicies"},"code":404}\n'
  2020-09-03 11:36:01.294 1 ERROR kuryr_kubernetes.handlers.retry Traceback (most recent call last):
  2020-09-03 11:36:01.294 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/retry.py", line 78, in __call__
  2020-09-03 11:36:01.294 1 ERROR kuryr_kubernetes.handlers.retry     self._handler(event, *args, **kwargs)
  2020-09-03 11:36:01.294 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/k8s_base.py", line 84, in __call__
  2020-09-03 11:36:01.294 1 ERROR kuryr_kubernetes.handlers.retry     self.on_present(obj)
  2020-09-03 11:36:01.294 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/handlers/kuryrport.py", line 126, in on_present
  2020-09-03 11:36:01.294 1 ERROR kuryr_kubernetes.handlers.retry     crd_pod_selectors = self._drv_sg.create_sg_rules(pod)
  2020-09-03 11:36:01.294 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/drivers/network_policy_security_groups.py", line 324, in create_sg_rules
  2020-09-03 11:36:01.294 1 ERROR kuryr_kubernetes.handlers.retry     _bump_networkpolicy(crd)
  2020-09-03 11:36:01.294 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/drivers/network_policy_security_groups.py", line 53, in _bump_networkpolicy
  2020-09-03 11:36:01.294 1 ERROR kuryr_kubernetes.handlers.retry     {constants.K8S_ANNOTATION_POLICY: str(uuid.uuid4())})
  2020-09-03 11:36:01.294 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/k8s_client.py", line 368, in annotate
  2020-09-03 11:36:01.294 1 ERROR kuryr_kubernetes.handlers.retry     self._raise_from_response(response)
  2020-09-03 11:36:01.294 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/k8s_client.py", line 83, in _raise_from_response
  2020-09-03 11:36:01.294 1 ERROR kuryr_kubernetes.handlers.retry     raise exc.K8sResourceNotFound(response.text)
  2020-09-03 11:36:01.294 1 ERROR kuryr_kubernetes.handlers.retry kuryr_kubernetes.exceptions.K8sResourceNotFound: Resource not found: '{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"networkpolicies.networking.k8s.io \\"allow-from-client-a-pod-selector\\" not found","reason":"NotFound","details":{"name":"allow-from-client-a-pod-selector","group":"networking.k8s.io","kind":"networkpolicies"},"code":404}\n'
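
  The failure above is a check-then-act race: the handler works from a list of
  policies, and a policy deleted after the list was taken makes the later
  annotate call return 404. The sketch below models that sequence in memory.
  FakeK8s, ResourceNotFound, bump_all and the annotation key are hypothetical
  stand-ins, not kuryr's real client or exception classes.

```python
import uuid

# Placeholder for constants.K8S_ANNOTATION_POLICY (hypothetical value).
POLICY_ANNOTATION = "example/policy"


class ResourceNotFound(Exception):
    """Hypothetical stand-in for kuryr_kubernetes.exceptions.K8sResourceNotFound."""


class FakeK8s:
    """Hypothetical in-memory API; not the real kuryr K8sClient."""

    def __init__(self, names):
        self.objects = {name: {} for name in names}

    def list_names(self):
        # Returns a snapshot; later deletions are not reflected in it.
        return list(self.objects)

    def delete(self, name):
        self.objects.pop(name, None)

    def annotate(self, name, annotations):
        # Mimics the API server answering 404 for an object that is gone.
        if name not in self.objects:
            raise ResourceNotFound(name)
        self.objects[name].update(annotations)


def bump_all(api, on_listed=None):
    """Annotate every listed policy, with no handling of concurrent deletion."""
    names = api.list_names()
    if on_listed:
        on_listed()  # simulates a deletion landing between list and annotate
    for name in names:
        api.annotate(name, {POLICY_ANNOTATION: str(uuid.uuid4())})


api = FakeK8s(["default-deny", "allow-from-client-a-pod-selector"])
try:
    bump_all(api, on_listed=lambda: api.delete("allow-from-client-a-pod-selector"))
    outcome = "ok"
except ResourceNotFound as exc:
    # In kuryr-controller this exception escapes the handler instead.
    outcome = f"not found: {exc}"
print(outcome)  # not found: allow-from-client-a-pod-selector
```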

Version-Release number of selected component (if applicable):


How reproducible:
Intermittently; this is a race condition.

Steps to Reproduce:
1. Run the network policy e2e tests. This is where the issue was observed; there is no more targeted reproducer.

Actual results:
The error causes kuryr-controller to restart.

Expected results:
The errors are ignored, no trace of them appears in the logs, and kuryr-controller is not restarted.
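
One plausible way to get that behaviour is to treat "not found" on a per-item basis: a policy that disappeared after being listed has nothing left to update, so it is skipped rather than failing the whole handler. This is a minimal sketch under that assumption, not the actual patch shipped in the advisory; ResourceNotFound, bump_all, the annotate callable and the annotation key are hypothetical.

```python
import uuid

# Placeholder for constants.K8S_ANNOTATION_POLICY (hypothetical value).
POLICY_ANNOTATION = "example/policy"


class ResourceNotFound(Exception):
    """Hypothetical stand-in for kuryr_kubernetes.exceptions.K8sResourceNotFound."""


def bump_all(annotate, names):
    """Annotate each listed policy, skipping ones deleted since listing.

    annotate is a callable(name, annotations) that raises ResourceNotFound
    when the object no longer exists (hypothetical signature).
    Returns the names that were actually annotated.
    """
    bumped = []
    for name in names:
        try:
            annotate(name, {POLICY_ANNOTATION: str(uuid.uuid4())})
        except ResourceNotFound:
            # Deleted after listing: nothing to update, so skip it silently
            # instead of letting the exception fail the handler.
            continue
        bumped.append(name)
    return bumped


# Usage: only "default-deny" still exists when the loop runs.
live = {"default-deny": {}}


def annotate(name, annotations):
    if name not in live:
        raise ResourceNotFound(name)
    live[name].update(annotations)


result = bump_all(annotate, ["default-deny", "allow-from-client-a-pod-selector"])
print(result)  # ['default-deny']
```

Catching only the "not found" case keeps other API errors (timeouts, 5xx, conflicts) visible to the retry logic, which is usually what you want.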

Additional info:

Comment 3 rlobillo 2020-09-10 11:39:19 UTC
Verified on OCP4.6.0-0.nightly-2020-09-10-011413 over OSP16.1 (RHOS-16.1-RHEL-8-20200831.n.1) with OVN-Octavia.

All NP tests PASSED. No kuryr-controller restarts.

Logs attached.

Comment 4 rlobillo 2020-09-10 11:51:17 UTC
Created attachment 1714420
NP test results

Comment 6 errata-xmlrpc 2020-10-27 16:37:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images) and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196