Bug 1579441 - Healthcheck port collision between kuryr controller and cni
Summary: Healthcheck port collision between kuryr controller and cni
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.10.z
Assignee: Luis Tomas Bolivar
QA Contact: Jon Uriarte
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-05-17 16:22 UTC by Luis Tomas Bolivar
Modified: 2019-01-30 15:13 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Both kuryr-controller and kuryr-cni run with host networking.
Consequence: They cannot both listen on the same port for their healthcheck probes.
Fix: Use separate ports for the kuryr-controller and kuryr-cni probes to avoid the collision.
Clone Of:
Environment:
Last Closed: 2019-01-30 15:13:18 UTC
Target Upstream Version:
Embargoed:


Links
- GitHub openshift/openshift-ansible pull 8425 (closed): Avoid kuryr healthcheck ports collision. Last updated 2020-05-12 03:11:44 UTC.
- Red Hat Product Errata RHBA-2019:0206. Last updated 2019-01-30 15:13:25 UTC.

Description Luis Tomas Bolivar 2018-05-17 16:22:33 UTC
As both kuryr-controller and kuryr-cni run with host networking, they share the node's network namespace and cannot listen on the same port for their healthcheck probes. The controller and cni probes therefore need different ports. Otherwise, the following error occurs:

2018-05-17 15:25:26.495 1 INFO kuryr_kubernetes.controller.managers.health [-] Starting health check server.
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health [-] Failed to start health check server.: error: [Errno 98] Address already in use
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health Traceback (most recent call last):
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/managers/health.py", line 106, in run
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health     self.application.run(address, CONF.health_server.port)
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health   File "/usr/lib/python2.7/site-packages/flask/app.py", line 841, in run
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health     run_simple(host, port, self, **options)
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health   File "/usr/lib/python2.7/site-packages/werkzeug/serving.py", line 814, in run_simple
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health     inner()
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health   File "/usr/lib/python2.7/site-packages/werkzeug/serving.py", line 774, in inner
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health     fd=fd)
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health   File "/usr/lib/python2.7/site-packages/werkzeug/serving.py", line 666, in make_server
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health     passthrough_errors, ssl_context, fd=fd)
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health   File "/usr/lib/python2.7/site-packages/werkzeug/serving.py", line 577, in __init__
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health     self.address_family), handler)
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health   File "/usr/lib64/python2.7/SocketServer.py", line 419, in __init__
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health     self.server_bind()
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health   File "/usr/lib64/python2.7/BaseHTTPServer.py", line 108, in server_bind
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health     SocketServer.TCPServer.server_bind(self)
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health   File "/usr/lib64/python2.7/SocketServer.py", line 430, in server_bind
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health     self.socket.bind(self.server_address)
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health   File "/usr/lib64/python2.7/socket.py", line 224, in meth
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health     return getattr(self._sock,name)(*args)
2018-05-17 15:25:26.497 1 ERROR kuryr_kubernetes.controller.managers.health error: [Errno 98] Address already in use
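
The failure is not specific to Kuryr: with host networking, both pods bind in the node's network namespace, so two listeners on one port behave like two sockets in a single process. The following is a minimal sketch, assuming Python 3 on Linux and using only the standard library; the helper names are hypothetical and this is not Kuryr code. It reproduces the same "[Errno 98] Address already in use" failure:

```python
import errno
import socket


def bind_health_port(port, host="127.0.0.1"):
    """Bind and listen on a TCP port, like a health check HTTP server does."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
    except OSError:
        sock.close()  # do not leak the socket if the bind fails
        raise
    return sock


def second_bind_fails():
    """Return True if a second listener on an already-bound port gets EADDRINUSE."""
    # Port 0 lets the kernel pick a free port, standing in for a fixed probe port.
    first = bind_health_port(0)
    port = first.getsockname()[1]
    try:
        second = bind_health_port(port)
    except OSError as exc:
        # EADDRINUSE is errno 98 on Linux, matching "[Errno 98]" in the log.
        return exc.errno == errno.EADDRINUSE
    else:
        second.close()
        return False
    finally:
        first.close()


if __name__ == "__main__":
    print(second_bind_fails())
```

On Linux this prints True. Giving each component its own port, as the fix does, means neither health server attempts to bind a port the other already holds.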

Comment 1 Scott Dodson 2018-08-01 19:33:25 UTC
The linked PR merged some time ago and is included in openshift-ansible-3.10.27-1.

Comment 2 Jon Uriarte 2018-12-21 09:03:16 UTC
Verified in openshift-ansible-3.10.90-1.git.0.5a504fb.el7.noarch on top of
OSP 13 2018-12-13.4 puddle.

Verification steps:

- Deploy OCP 3.10 on OSP 13 and enable the kuryr controller and cni healthchecks
  (kuryr-cni healthchecks do not work on OSP 13, but they are enabled here to
   verify that the two healthchecks do not collide on the same port)

- Check Kuryr config (oc -n openshift-infra get cm kuryr-config -o yaml):

  kuryr-cni.conf:
    ...
    [cni_health_server]
    port = 8090

  kuryr.conf:
    ...
    [health_server]
    port = 8082

- Check cni probes port in kuryr cni daemonset (oc -n openshift-infra get ds kuryr-cni-ds -o yaml):

        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /alive
            port: 8090
            scheme: HTTP
          initialDelaySeconds: 15
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: kuryr-cni
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /ready
            port: 8090
            scheme: HTTP
          initialDelaySeconds: 15
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5

- Check controller probes port in kuryr controller deployment (oc -n openshift-infra get deployment kuryr-controller -o yaml):

        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /alive
            port: 8082
            scheme: HTTP
          initialDelaySeconds: 15
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: controller
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /ready
            port: 8082
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5


- Check that no port collision error appears in the Kuryr logs (controller and
  cni pod logs on the same OpenShift node, the master node).
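
The checks above can be scripted. The following is a minimal sketch, assuming Python 3. The INI excerpts copy only the health-server sections shown in the kuryr-config ConfigMap. The probe ports are transcribed by hand into a dict, which is a hypothetical structure and not parsed from the `oc ... -o yaml` output, so only the standard library is needed. The function names are illustrative. The sketch verifies that the two health servers use different ports and that each probe targets its own server's port. It also includes a simple log scan for the bind error from the original report:

```python
import configparser

# Trimmed excerpts of the kuryr-config ConfigMap entries shown above; only the
# sections that set the health check ports are kept.
KURYR_CNI_CONF = """
[cni_health_server]
port = 8090
"""

KURYR_CONF = """
[health_server]
port = 8082
"""

# Probe ports transcribed by hand from the kuryr-cni-ds DaemonSet and the
# kuryr-controller Deployment (hypothetical structure, not parsed YAML).
PROBES = {
    "kuryr-cni": {"livenessProbe": 8090, "readinessProbe": 8090},
    "controller": {"livenessProbe": 8082, "readinessProbe": 8082},
}


def configured_port(conf_text, section):
    """Read the integer 'port' option from one section of an INI snippet."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return parser.getint(section, "port")


def check_health_ports(cni_conf, controller_conf, probes):
    """Return a list of problems; an empty list means the setup is consistent."""
    cni_port = configured_port(cni_conf, "cni_health_server")
    controller_port = configured_port(controller_conf, "health_server")
    problems = []
    # Both components use host networking, so they share the node's port space.
    if cni_port == controller_port:
        problems.append("controller and cni share port %d" % cni_port)
    # Each probe must target the port its own health server listens on.
    expected = {"kuryr-cni": cni_port, "controller": controller_port}
    for container, probe_ports in probes.items():
        for probe, port in probe_ports.items():
            if port != expected[container]:
                problems.append("%s %s uses port %d, expected %d"
                                % (container, probe, port, expected[container]))
    return problems


def log_has_collision(log_text):
    """Return True if a log contains the bind failure from the original report."""
    return "Address already in use" in log_text


if __name__ == "__main__":
    print(check_health_ports(KURYR_CNI_CONF, KURYR_CONF, PROBES))  # → []
```

With the values from this verification, the check returns an empty list. Setting both `port` options to the same value reports the collision, and any probe left on the old port is flagged as well.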

Comment 4 errata-xmlrpc 2019-01-30 15:13:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0206

