Bug 1762788 - Kuryr: Failures releasing port when pod doesn't have status.hostIP set
Summary: Kuryr: Failures releasing port when pod doesn't have status.hostIP set
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.2.0
Hardware: All
OS: All
high
medium
Target Milestone: ---
: 4.2.z
Assignee: Michał Dulko
QA Contact: Jon Uriarte
URL:
Whiteboard:
Depends On: 1762787
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-10-17 13:33 UTC by Michał Dulko
Modified: 2019-10-30 04:45 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1762787
Environment:
Last Closed: 2019-10-30 04:45:14 UTC
Target Upstream Version:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github openshift kuryr-kubernetes pull 67 0 None None None 2019-10-18 15:56:10 UTC
Red Hat Product Errata RHBA-2019:3151 0 None None None 2019-10-30 04:45:19 UTC

Description Michał Dulko 2019-10-17 13:33:49 UTC
+++ This bug was initially created as a clone of Bug #1762787 +++

Description of problem:
We noticed that sometimes scheduled pods have no .status.hostIP field set on DELETED events. API reference does not say this may ever happen, but we consistently see it. This causes traceback below in kuryr-kubernetes:

2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging Traceback (most recent call last):
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/handlers/logging.py", line 37, in __call__
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging     self._handler(event)
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/handlers/retry.py", line 56, in __call__
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging     self._handler(event)
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/handlers/k8s_base.py", line 77, in __call__
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging     self.on_deleted(obj)
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/handlers/vif.py", line 204, in on_deleted
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging     security_groups)
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/drivers/vif_pool.py", line 1048, in release_vif
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging     self._vif_drvs[vif_drv_alias].release_vif(pod, vif, *argv)
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/drivers/vif_pool.py", line 249, in release_vif
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging     host_addr = self._get_host_addr(pod)
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/drivers/vif_pool.py", line 182, in _get_host_addr
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging     return pod['status']['hostIP']
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging KeyError: 'hostIP'
  2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging ESC[00m
  2019-10-11 12:15:42.077 1 DEBUG kuryr_kubernetes.handlers.dispatch [-] 1 handler(s) available for event DELETED Namespace:None/e2e-pods-8996 (uid: bb511827-ec20-11e9-8b0d-fa163e225f48) __call__ /usr/lib/python2.7/site-packages/kuryr_kubernetes/handlers/dispatch.py:68ESC[00m


Version-Release number of selected component (if applicable):
4.2

How reproducible:
Occasionally

Steps to Reproduce:
1. Deploy OpenShift with Kuryr
2. Run openshift-tests

Actual results:
Error in the logs

Expected results:
No error, Kuryr recovers

Additional info:

Comment 2 Jon Uriarte 2019-10-25 14:43:40 UTC
Verified on OCP 4.2.0-0.nightly-2019-10-25-021846 build on top of OSP 13 2019-10-01.1 puddle.

release image: registry.svc.ci.openshift.org/ocp/release@sha256:8f97aa21e1c0b2815ec7c86e4138362940a5dcbc292840ab4d6d5b67fedb173f

The next warning log has been caught during openshift-tests execution:

2019-10-25 13:57:07.948 1 WARNING kuryr_kubernetes.controller.drivers.vif_pool [-] Pod pod-update-activedeadlineseconds-286b28d2-f72f-11e9-b339-525400903b5b does not have status.hostIP field set when getting del│eted. This is unusual. Trying to determine the IP by calling Neutron.

And kuryr-controller still remains "alive" so it was not restarted:

2019-10-25 13:57:08.827 1 INFO werkzeug [-] 192.168.0.22 - - [25/Oct/2019 13:57:08] "GET /alive HTTP/1.1" 200

Comment 4 errata-xmlrpc 2019-10-30 04:45:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3151


Note You need to log in before you can comment on or make changes to this bug.