Because both kuryr-controller and kuryr-cni run with host networking, they cannot listen on the same port for the healthcheck probes. The controller and cni probes therefore need different ports; otherwise the following error occurs:

2018-05-17 15:25:26.495 1 INFO kuryr_kubernetes.controller.managers.health [-] Starting health check server.
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health [-] Failed to start health check server.: error: [Errno 98] Address already in use
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health Traceback (most recent call last):
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/managers/health.py", line 106, in run
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health     self.application.run(address, CONF.health_server.port)
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health   File "/usr/lib/python2.7/site-packages/flask/app.py", line 841, in run
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health     run_simple(host, port, self, **options)
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health   File "/usr/lib/python2.7/site-packages/werkzeug/serving.py", line 814, in run_simple
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health     inner()
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health   File "/usr/lib/python2.7/site-packages/werkzeug/serving.py", line 774, in inner
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health     fd=fd)
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health   File "/usr/lib/python2.7/site-packages/werkzeug/serving.py", line 666, in make_server
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health     passthrough_errors, ssl_context, fd=fd)
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health   File "/usr/lib/python2.7/site-packages/werkzeug/serving.py", line 577, in __init__
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health     self.address_family), handler)
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health   File "/usr/lib64/python2.7/SocketServer.py", line 419, in __init__
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health     self.server_bind()
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health   File "/usr/lib64/python2.7/BaseHTTPServer.py", line 108, in server_bind
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health     SocketServer.TCPServer.server_bind(self)
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health   File "/usr/lib64/python2.7/SocketServer.py", line 430, in server_bind
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health     self.socket.bind(self.server_address)
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health   File "/usr/lib64/python2.7/socket.py", line 224, in meth
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health     return getattr(self._sock,name)(*args)
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health error: [Errno 98] Address already in use
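The collision above can be reproduced outside Kuryr with two plain TCP sockets in the same network namespace. This is only a minimal sketch: the `try_bind` helper is hypothetical and not part of kuryr-kubernetes, and the ports are picked by the OS rather than taken from the Kuryr configuration. On Linux the second bind should fail with errno 98 (EADDRINUSE), the same error seen in the traceback.

```python
import errno
import socket


def try_bind(port, host="127.0.0.1"):
    """Bind and listen on host:port.

    Returns (socket, None) on success or (None, errno_value) on failure.
    Hypothetical helper for illustration only.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
        return sock, None
    except OSError as exc:
        sock.close()
        return None, exc.errno


# First "health server": port 0 lets the OS choose a free port.
first, first_err = try_bind(0)
port = first.getsockname()[1]

# Second "health server" on the same port in the same namespace: refused.
second, second_err = try_bind(port)
collision = second_err == errno.EADDRINUSE

# Second "health server" on a different port: accepted.
third, third_err = try_bind(0)

print("same port collides:", collision)
print("different port binds:", third_err is None)

for sock in (first, second, third):
    if sock is not None:
        sock.close()
```

With host networking, both Kuryr pods share the node's network namespace, so they behave like the first two sockets here unless each is given its own port.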
The linked PR was merged quite some time ago and is included in openshift-ansible-3.10.27-1.
Verified in openshift-ansible-3.10.90-1.git.0.5a504fb.el7.noarch on top of OSP 13 2018-12-13.4 puddle.

Verification steps:

- Deploy OCP 3.10 on OSP 13 and enable the kuryr controller and cni healthchecks (kuryr-cni healthchecks do not work on OSP 13, but they are enabled in order to verify that there is no port collision between the two healthchecks).

- Check the Kuryr config (oc -n openshift-infra get cm kuryr-config -o yaml):

    kuryr-cni.conf: ...
      [cni_health_server]
      port = 8090
    kuryr.conf: ...
      [health_server]
      port = 8082

- Check the cni probes port in the kuryr cni daemonset (oc -n openshift-infra get ds kuryr-cni-ds -o yaml):

    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /alive
        port: 8090
        scheme: HTTP
      initialDelaySeconds: 15
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    name: kuryr-cni
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /ready
        port: 8090
        scheme: HTTP
      initialDelaySeconds: 15
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5

- Check the controller probes port in the kuryr controller deployment (oc -n openshift-infra get deployment kuryr-controller -o yaml):

    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /alive
        port: 8082
        scheme: HTTP
      initialDelaySeconds: 15
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    name: controller
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /ready
        port: 8082
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5

- Check that no port collision error is shown in the Kuryr logs (controller and cni pod logs on the same OpenShift node, the master node).
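The config check in the steps above could also be scripted. The following is a rough sketch, not part of the verification that was actually run: the two strings are hypothetical excerpts containing only the sections quoted above (the real files hold more options), and `health_port` is an illustrative helper name.

```python
import configparser

# Hypothetical excerpts: only the sections quoted in the verification steps.
KURYR_CONF = """
[health_server]
port = 8082
"""

KURYR_CNI_CONF = """
[cni_health_server]
port = 8090
"""


def health_port(conf_text, section):
    """Return the integer 'port' option of the given section."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return parser.getint(section, "port")


controller_port = health_port(KURYR_CONF, "health_server")
cni_port = health_port(KURYR_CNI_CONF, "cni_health_server")

# With host networking, both servers bind on the node, so the ports must differ.
ports_differ = controller_port != cni_port
print("controller:", controller_port, "cni:", cni_port, "ok:", ports_differ)
```

The same comparison could be applied to the `port` values of the probes in the daemonset and deployment output, which should match the respective config sections.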
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0206