Description of problem: We noticed that sometimes scheduled pods have no .status.hostIP field set on DELETED events. API reference does not say this may ever happen, but we consistently see it. This causes traceback below in kuryr-kubernetes: 2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging Traceback (most recent call last): 2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/handlers/logging.py", line 37, in __call__ 2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging self._handler(event) 2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/handlers/retry.py", line 56, in __call__ 2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging self._handler(event) 2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/handlers/k8s_base.py", line 77, in __call__ 2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging self.on_deleted(obj) 2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/handlers/vif.py", line 204, in on_deleted 2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging security_groups) 2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/drivers/vif_pool.py", line 1048, in release_vif 2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging self._vif_drvs[vif_drv_alias].release_vif(pod, vif, *argv) 2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/drivers/vif_pool.py", line 249, in release_vif 2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging host_addr = self._get_host_addr(pod) 2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/drivers/vif_pool.py", line 182, in _get_host_addr 2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging return pod['status']['hostIP'] 2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging KeyError: 'hostIP' 2019-10-11 12:15:41.780 1 ERROR kuryr_kubernetes.handlers.logging ESC[00m 2019-10-11 12:15:42.077 1 DEBUG kuryr_kubernetes.handlers.dispatch [-] 1 handler(s) available for event DELETED Namespace:None/e2e-pods-8996 (uid: bb511827-ec20-11e9-8b0d-fa163e225f48) __call__ /usr/lib/python2.7/site-packages/kuryr_kubernetes/handlers/dispatch.py:68ESC[00m Version-Release number of selected component (if applicable): 4.2 How reproducible: Occasionally Steps to Reproduce: 1. Deploy OpenShift with Kuryr 2. Run openshift-tests Actual results: Error in the logs Expected results: No error, Kuryr recovers Additional info:
Verified on OCP 4.3.0-0.nightly-2019-10-18-051534 build on top of OSP 13 2019-10-01.1 puddle. release image: registry.svc.ci.openshift.org/ocp/release@sha256:2a8f99a817784b303bd76706e14b23cffd98fca1e96b672dfb0b534a79ec5a86 The next warning log has been caught during openshift-tests execution: WARNING kuryr_kubernetes.controller.drivers.vif_pool [-] Pod pod-update-activedeadlineseconds-5f983277-f1bd-11e9-8a12-525400903b5b does not have status.hostIP field set when getting deleted. This is unusual. Trying to determine the IP by calling Neutron. And kuryr-controller still remains "alive" so it was not restarted: INFO werkzeug [-] 192.168.0.17 - - [18/Oct/2019 15:40:05] "GET /alive HTTP/1.1" 200
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0062