Bug 1762787

Summary: Kuryr: Failures releasing port when pod doesn't have status.hostIP set
Product: OpenShift Container Platform Reporter: Michał Dulko <mdulko>
Component: NetworkingAssignee: Michał Dulko <mdulko>
Networking sub component: kuryr QA Contact: Jon Uriarte <juriarte>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: high CC: juriarte
Version: 4.2.0   
Target Milestone: ---   
Target Release: 4.3.0   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1762788 (view as bug list) Environment:
Last Closed: 2020-01-23 11:07:51 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1762788    

Description Michał Dulko 2019-10-17 13:31:42 UTC
Description of problem:
We noticed that sometimes scheduled pods have no .status.hostIP field set on DELETED events. API reference does not say this may ever happen, but we consistently see it. This causes traceback below in kuryr-kubernetes:

2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging Traceback (most recent call last):
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/handlers/logging.py", line 37, in __call__
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging     self._handler(event)
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/handlers/retry.py", line 56, in __call__
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging     self._handler(event)
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/handlers/k8s_base.py", line 77, in __call__
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging     self.on_deleted(obj)
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/handlers/vif.py", line 204, in on_deleted
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging     security_groups)
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/drivers/vif_pool.py", line 1048, in release_vif
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging     self._vif_drvs[vif_drv_alias].release_vif(pod, vif, *argv)
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/drivers/vif_pool.py", line 249, in release_vif
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging     host_addr = self._get_host_addr(pod)
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/drivers/vif_pool.py", line 182, in _get_host_addr
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging     return pod['status']['hostIP']
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging KeyError: 'hostIP'
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging ESC[00m
  2019-10-11 12:15:42.077 1 DEBUG kuryr_kubernetes.handlers.dispatch [-] 1 handler(s) available for event DELETED Namespace:None/e2e-pods-8996 (uid: bb511827-ec20-11e9-8b0d-fa163e225f48) __call__ /usr/lib/python2.7/site-packages/kuryr_kubernetes/handlers/dispatch.py:68ESC[00m


Version-Release number of selected component (if applicable):
4.2

How reproducible:
Occasionally

Steps to Reproduce:
1. Deploy OpenShift with Kuryr
2. Run openshift-tests

Actual results:
Error in the logs

Expected results:
No error, Kuryr recovers

Additional info:

Comment 2 Jon Uriarte 2019-10-18 15:54:02 UTC
Verified on OCP 4.3.0-0.nightly-2019-10-18-051534 build on top of OSP 13 2019-10-01.1 puddle.

release image: registry.svc.ci.openshift.org/ocp/release@sha256:2a8f99a817784b303bd76706e14b23cffd98fca1e96b672dfb0b534a79ec5a86

The next warning log has been caught during openshift-tests execution:

WARNING kuryr_kubernetes.controller.drivers.vif_pool [-] Pod pod-update-activedeadlineseconds-5f983277-f1bd-11e9-8a12-525400903b5b does not have status.hostIP field set when getting deleted. This is unusual. Trying to determine the IP by calling Neutron.

And kuryr-controller still remains "alive" so it was not restarted:

INFO werkzeug [-] 192.168.0.17 - - [18/Oct/2019 15:40:05] "GET /alive HTTP/1.1" 200

Comment 4 errata-xmlrpc 2020-01-23 11:07:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062