Bug 1762787 - Kuryr: Failures releasing port when pod doesn't have status.hostIP set
Summary: Kuryr: Failures releasing port when pod doesn't have status.hostIP set
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.2.0
Hardware: All
OS: All
high
medium
Target Milestone: ---
: 4.3.0
Assignee: Michał Dulko
QA Contact: Jon Uriarte
URL:
Whiteboard:
Depends On:
Blocks: 1762788
TreeView+ depends on / blocked
 
Reported: 2019-10-17 13:31 UTC by Michał Dulko
Modified: 2020-01-23 11:08 UTC (History)
1 user (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1762788 (view as bug list)
Environment:
Last Closed: 2020-01-23 11:07:51 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github openshift kuryr-kubernetes pull 65 0 None closed Bug 1762787: NestedVIFPool: React when status.hostIP is missing 2020-11-04 16:37:58 UTC
Red Hat Product Errata RHBA-2020:0062 0 None None None 2020-01-23 11:08:19 UTC

Description Michał Dulko 2019-10-17 13:31:42 UTC
Description of problem:
We noticed that sometimes scheduled pods have no .status.hostIP field set on DELETED events. API reference does not say this may ever happen, but we consistently see it. This causes traceback below in kuryr-kubernetes:

2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging Traceback (most recent call last):
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/handlers/logging.py", line 37, in __call__
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging     self._handler(event)
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/handlers/retry.py", line 56, in __call__
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging     self._handler(event)
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/handlers/k8s_base.py", line 77, in __call__
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging     self.on_deleted(obj)
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/handlers/vif.py", line 204, in on_deleted
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging     security_groups)
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/drivers/vif_pool.py", line 1048, in release_vif
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging     self._vif_drvs[vif_drv_alias].release_vif(pod, vif, *argv)
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/drivers/vif_pool.py", line 249, in release_vif
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging     host_addr = self._get_host_addr(pod)
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/drivers/vif_pool.py", line 182, in _get_host_addr
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging     return pod['status']['hostIP']
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging KeyError: 'hostIP'
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging ESC[00m
  2019-10-11 12:15:42.077 1 DEBUG kuryr_kubernetes.handlers.dispatch [-] 1 handler(s) available for event DELETED Namespace:None/e2e-pods-8996 (uid: bb511827-ec20-11e9-8b0d-fa163e225f48) __call__ /usr/lib/python2.7/site-packages/kuryr_kubernetes/handlers/dispatch.py:68ESC[00m


Version-Release number of selected component (if applicable):
4.2

How reproducible:
Occasionally

Steps to Reproduce:
1. Deploy OpenShift with Kuryr
2. Run openshift-tests

Actual results:
Error in the logs

Expected results:
No error, Kuryr recovers

Additional info:

Comment 2 Jon Uriarte 2019-10-18 15:54:02 UTC
Verified on OCP 4.3.0-0.nightly-2019-10-18-051534 build on top of OSP 13 2019-10-01.1 puddle.

release image: registry.svc.ci.openshift.org/ocp/release@sha256:2a8f99a817784b303bd76706e14b23cffd98fca1e96b672dfb0b534a79ec5a86

The next warning log has been caught during openshift-tests execution:

WARNING kuryr_kubernetes.controller.drivers.vif_pool [-] Pod pod-update-activedeadlineseconds-5f983277-f1bd-11e9-8a12-525400903b5b does not have status.hostIP field set when getting deleted. This is unusual. Trying to determine the IP by calling Neutron.

And kuryr-controller still remains "alive" so it was not restarted:

INFO werkzeug [-] 192.168.0.17 - - [18/Oct/2019 15:40:05] "GET /alive HTTP/1.1" 200

Comment 4 errata-xmlrpc 2020-01-23 11:07:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062


Note You need to log in before you can comment on or make changes to this bug.