Bug 1762788

Summary: Kuryr: Failures releasing port when pod doesn't have status.hostIP set
Product: OpenShift Container Platform Reporter: Michał Dulko <mdulko>
Component: NetworkingAssignee: Michał Dulko <mdulko>
Networking sub component: kuryr QA Contact: Jon Uriarte <juriarte>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: high CC: gcheresh, juriarte, ltomasbo
Version: 4.2.0   
Target Milestone: ---   
Target Release: 4.2.z   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1762787 Environment:
Last Closed: 2019-10-30 04:45:14 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On: 1762787    
Bug Blocks:    

Description Michał Dulko 2019-10-17 13:33:49 UTC
+++ This bug was initially created as a clone of Bug #1762787 +++

Description of problem:
We noticed that sometimes scheduled pods have no .status.hostIP field set on DELETED events. API reference does not say this may ever happen, but we consistently see it. This causes traceback below in kuryr-kubernetes:

2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging Traceback (most recent call last):
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/handlers/logging.py", line 37, in __call__
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging     self._handler(event)
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/handlers/retry.py", line 56, in __call__
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging     self._handler(event)
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/handlers/k8s_base.py", line 77, in __call__
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging     self.on_deleted(obj)
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/handlers/vif.py", line 204, in on_deleted
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging     security_groups)
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/drivers/vif_pool.py", line 1048, in release_vif
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging     self._vif_drvs[vif_drv_alias].release_vif(pod, vif, *argv)
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/drivers/vif_pool.py", line 249, in release_vif
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging     host_addr = self._get_host_addr(pod)
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/drivers/vif_pool.py", line 182, in _get_host_addr
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging     return pod['status']['hostIP']
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging KeyError: 'hostIP'
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging ESC[00m
  2019-10-11 12:15:42.077 1 DEBUG kuryr_kubernetes.handlers.dispatch [-] 1 handler(s) available for event DELETED Namespace:None/e2e-pods-8996 (uid: bb511827-ec20-11e9-8b0d-fa163e225f48) __call__ /usr/lib/python2.7/site-packages/kuryr_kubernetes/handlers/dispatch.py:68ESC[00m


Version-Release number of selected component (if applicable):
4.2

How reproducible:
Occasionally

Steps to Reproduce:
1. Deploy OpenShift with Kuryr
2. Run openshift-tests

Actual results:
Error in the logs

Expected results:
No error, Kuryr recovers

Additional info:

Comment 2 Jon Uriarte 2019-10-25 14:43:40 UTC
Verified on OCP 4.2.0-0.nightly-2019-10-25-021846 build on top of OSP 13 2019-10-01.1 puddle.

release image: registry.svc.ci.openshift.org/ocp/release@sha256:8f97aa21e1c0b2815ec7c86e4138362940a5dcbc292840ab4d6d5b67fedb173f

The next warning log has been caught during openshift-tests execution:

2019-10-25 13:57:07.948 1 WARNING kuryr_kubernetes.controller.drivers.vif_pool [-] Pod pod-update-activedeadlineseconds-286b28d2-f72f-11e9-b339-525400903b5b does not have status.hostIP field set when getting del│eted. This is unusual. Trying to determine the IP by calling Neutron.

And kuryr-controller still remains "alive" so it was not restarted:

2019-10-25 13:57:08.827 1 INFO werkzeug [-] 192.168.0.22 - - [25/Oct/2019 13:57:08] "GET /alive HTTP/1.1" 200

Comment 4 errata-xmlrpc 2019-10-30 04:45:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3151