Bug 1872265

Summary: [Kuryr] KuryrPort handler may cause pod to be removed
Product: OpenShift Container Platform Reporter: Jon Uriarte <juriarte>
Component: Networking Assignee: rdobosz
Networking sub component: kuryr QA Contact: GenadiC <gcheresh>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: unspecified CC: ltomasbo, rdobosz, rlobillo
Version: 4.6   
Target Milestone: ---   
Target Release: 4.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-10-27 16:33:08 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description              Flags
conformance test result  none
NP test results          none

Description Jon Uriarte 2020-08-25 10:25:07 UTC
Description of problem:
Some Kuryr CNI pods and the Kuryr controller pod end up crash-looping while running the OCP conformance tests, because some pods cannot be found.

2020-08-25 08:45:00.727 1 ERROR kuryr_kubernetes.controller.handlers.kuryrport [-] Failed to get pod: Resource not found: '{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \\"pod-projected-configmaps-19f6cf54-cab1-404d-8f1f-13dd06da72e0\\" not found","reason":"NotFound","details":{"name":"pod-projected-configmaps-19f6cf54-cab1-404d-8f1f-13dd06da72e0","kind":"pods"},"code":404}\n': kuryr_kubernetes.exceptions.K8sResourceNotFound: Resource not found: '{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \\"pod-projected-configmaps-19f6cf54-cab1-404d-8f1f-13dd06da72e0\\" not found","reason":"NotFound","details":{"name":"pod-projected-configmaps-19f6cf54-cab1-404d-8f1f-13dd06da72e0","kind":"pods"},"code":404}\n'
2020-08-25 08:45:00.727 1 ERROR kuryr_kubernetes.controller.handlers.kuryrport Traceback (most recent call last):                                                                              
2020-08-25 08:45:00.727 1 ERROR kuryr_kubernetes.controller.handlers.kuryrport   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/handlers/kuryrport.py", line 136, in on_finalize                                                                                                                                                                                          
2020-08-25 08:45:00.727 1 ERROR kuryr_kubernetes.controller.handlers.kuryrport     pod = self.k8s.get(f"{constants.K8S_API_NAMESPACES}"                                                        
2020-08-25 08:45:00.727 1 ERROR kuryr_kubernetes.controller.handlers.kuryrport   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/k8s_client.py", line 104, in get                      
2020-08-25 08:45:00.727 1 ERROR kuryr_kubernetes.controller.handlers.kuryrport     self._raise_from_response(response)                                                                         
2020-08-25 08:45:00.727 1 ERROR kuryr_kubernetes.controller.handlers.kuryrport   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/k8s_client.py", line 83, in _raise_from_response      
2020-08-25 08:45:00.727 1 ERROR kuryr_kubernetes.controller.handlers.kuryrport     raise exc.K8sResourceNotFound(response.text)                                                                
2020-08-25 08:45:00.727 1 ERROR kuryr_kubernetes.controller.handlers.kuryrport kuryr_kubernetes.exceptions.K8sResourceNotFound: Resource not found: '{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \\"pod-projected-configmaps-19f6cf54-cab1-404d-8f1f-13dd06da72e0\\" not found","reason":"NotFound","details":{"name":"pod-projected-configmaps-19f6cf54-cab1-404d-8f1f-13dd06da72e0","kind":"pods"},"code":404}\n'


2020-08-25 08:02:27.316 1 ERROR kuryr_kubernetes.controller.managers.health [-] Component KuryrPortHandler is dead.                                                                                                
2020-08-25 08:02:27.347 1 INFO kuryr_kubernetes.controller.service [-] Service 'KuryrK8sService' stopping                                                                                                          
2020-08-25 08:02:27.348 1 INFO kuryr_kubernetes.watcher [-] Stopped watching '/apis/openstack.org/v1/kuryrnetworkpolicies'
2020-08-25 08:02:27.349 1 INFO kuryr_kubernetes.watcher [-] Stopped watching '/api/v1/namespaces'
2020-08-25 08:02:27.351 1 INFO kuryr_kubernetes.watcher [-] Stopped watching '/api/v1/services'
2020-08-25 08:02:27.352 1 INFO kuryr_kubernetes.watcher [-] Stopped watching '/api/v1/endpoints'
2020-08-25 08:02:27.353 1 INFO kuryr_kubernetes.watcher [-] Stopped watching '/apis/networking.k8s.io/v1/networkpolicies'
2020-08-25 08:02:27.354 1 INFO kuryr_kubernetes.watcher [-] Stopped watching '/api/v1/pods'
2020-08-25 08:02:27.356 1 INFO kuryr_kubernetes.watcher [-] Stopped watching '/apis/openstack.org/v1/kuryrloadbalancers'
2020-08-25 08:02:27.357 1 INFO kuryr_kubernetes.watcher [-] Stopped watching '/apis/openstack.org/v1/kuryrnetworks'
2020-08-25 08:02:27.361 1 WARNING urllib3.connectionpool [-] Connection pool is full, discarding connection: api-int.ostest.shiftstack.com: queue.Full
2020-08-25 08:02:27.364 1 INFO kuryr_kubernetes.watcher [-] Stopped watching '/apis/openstack.org/v1/kuryrports'
2020-08-25 08:02:27.364 1 INFO kuryr_kubernetes.watcher [-] No remaining active watchers, Exiting...
2020-08-25 08:02:27.366 1 INFO kuryr_kubernetes.controller.service [-] Service 'KuryrK8sService' stopping


Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-08-24-034934
RHOS-16.1-RHEL-8-20200813.n.0

How reproducible: always when running conformance tests


Steps to Reproduce:
1. Install OCP 4.6 on OSP 16.1
2. Run conformance tests

Actual results: Kuryr controller and cni pods in crashloop

Expected results: no crashloop
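
For illustration only: the traceback above shows on_finalize letting K8sResourceNotFound propagate when the pod behind a KuryrPort is already gone. The following is a minimal sketch of a finalizer that tolerates a missing pod. It is not Kuryr's actual code or the actual fix; ResourceNotFound, FakeK8sClient, and its get/remove_finalizer methods are hypothetical stand-ins, and the pod URL layout is an assumption.

```python
class ResourceNotFound(Exception):
    """Hypothetical stand-in for a 404 from the API server."""


class FakeK8sClient:
    """Hypothetical in-memory client, used only to make the sketch runnable."""

    def __init__(self, pods):
        self.pods = dict(pods)
        self.finalizers_removed = []

    def get(self, path):
        # Mimic a client that raises when the API answers 404.
        try:
            return self.pods[path]
        except KeyError:
            raise ResourceNotFound(path) from None

    def remove_finalizer(self, resource_name):
        self.finalizers_removed.append(resource_name)


def on_finalize(client, port):
    """Release a port resource even if its pod no longer exists."""
    name = port["metadata"]["name"]
    namespace = port["metadata"]["namespace"]
    # Assumed URL layout for a namespaced pod.
    path = f"/api/v1/namespaces/{namespace}/pods/{name}"
    try:
        pod = client.get(path)
    except ResourceNotFound:
        # The pod was deleted before the handler ran. Treat that as a
        # normal outcome instead of letting the exception escape.
        pod = None
    client.remove_finalizer(name)
    return pod
```

With this shape, a port whose pod has already been deleted is still released, and the handler returns None instead of raising.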

Comment 5 rlobillo 2020-09-07 16:22:42 UTC
Verified on 4.6.0-0.nightly-2020-09-05-015624 over RHOS-16.1-RHEL-8-20200831.n.1

OCP was installed with IPI; the NP and conformance tests ran with the expected results.

kuryr-controller handled the target scenario successfully:

$ oc logs -n openshift-kuryr kuryr-controller-7b6cdb86dd-wpx2x -p | grep 'Manual'
2020-09-07 13:01:31.196 1 WARNING kuryr_kubernetes.controller.handlers.kuryrport [-] Manually triggered KuryrPort taint-eviction-4 removal. This action should be avoided, since KuryrPort CRDs are internal to Kuryr.

No crash loop observed on either the controller or the CNI pods.

Test logs attached.

Comment 6 rlobillo 2020-09-07 16:23:10 UTC
Created attachment 1713987 [details]
conformance test result

Comment 7 rlobillo 2020-09-07 16:23:28 UTC
Created attachment 1713988 [details]
NP test results

Comment 9 errata-xmlrpc 2020-10-27 16:33:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196