Bug 1985366 - CCCMO using unregistered host ports
Summary: CCCMO using unregistered host ports
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.9.0
Assignee: Joel Speed
QA Contact: sunzhaohua
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-07-23 14:02 UTC by Zane Bitter
Modified: 2022-04-11 08:33 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-18 17:40:56 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-cloud-controller-manager-operator pull 101 | 0 | None | open | Bug 1985366: Use registered ports and ensure that ports are defined in pod specs | 2021-07-23 15:53:55 UTC
Github | openshift enhancements pull 847 | 0 | None | open | Add CCM related ports to port registry | 2021-07-23 15:23:50 UTC
Red Hat Product Errata | RHSA-2021:3759 | 0 | None | None | None | 2021-10-18 17:41:15 UTC

Description Zane Bitter 2021-07-23 14:02:23 UTC
The Cluster Cloud Controller Manager Operator (BINGO!) is using host networking, for reasons presumed to be legitimate, and also exposes health checks on ports 9440 and 9441, meaning that it binds these ports on the host network. These ports are not registered to CCCMO in https://github.com/openshift/enhancements/blob/master/enhancements/network/host-port-registry.md#host-port-registry

The baremetal operator is also using host networking, for reasons that were legitimate historically if not historically legitimate, and also exposes health checks on port 9440, that being the default generated by kubebuilder. This port is also not registered to BMO in https://github.com/openshift/enhancements/blob/master/enhancements/network/host-port-registry.md#host-port-registry

The result of this is that when both pods land on the same node (at least 1/3 of the time), one or the other will fail to start because both bind the same port (see bug 1983975). This is causing many, many CI failures, so to resolve this BMO is moving its health endpoint to port 9446 (https://github.com/openshift/cluster-baremetal-operator/pull/180) and registering that port (https://github.com/openshift/enhancements/pull/844).
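
As a minimal sketch of this failure mode (not taken from either operator's code), the Go program below binds a TCP port and then tries to bind the same address again in the same network namespace, which is effectively what two host-network pods do when they share a node. The function name secondBindFails is hypothetical, and an OS-chosen loopback port stands in for 9440.

```go
package main

import (
	"fmt"
	"net"
)

// secondBindFails binds a free loopback TCP port, then attempts to bind the
// same address a second time while the first listener is still open.
// It reports whether the second bind failed. The returned error is non-nil
// only if the initial bind itself could not be set up.
func secondBindFails() (bool, error) {
	// Port 0 asks the OS for any free port; it stands in for 9440 here.
	first, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, fmt.Errorf("first bind failed: %w", err)
	}
	defer first.Close()

	// Second process (simulated): same address, same network namespace.
	second, err := net.Listen("tcp", first.Addr().String())
	if err != nil {
		fmt.Println("second bind error:", err)
		return true, nil
	}
	second.Close()
	return false, nil
}

func main() {
	failed, err := secondBindFails()
	if err != nil {
		fmt.Println("setup error:", err)
		return
	}
	fmt.Println("second bind failed:", failed)
}
```

With host networking, whichever pod starts second is in the position of the second listener here, which is consistent with one of the two pods failing to start.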

It would probably be wise for CCCMO to move its endpoints away from the kubebuilder defaults so long as it is using host networking. In any event, the ports it is using on the host network must be registered in the Host Port Registry.

Comment 1 Joel Speed 2021-07-23 14:09:47 UTC
Thanks for the report, Zane. Agreed: we will register our own ports and move ours over to prevent future collisions.

Comment 3 sunzhaohua 2021-08-11 07:56:29 UTC
verified
clusterversion: 4.9.0-0.nightly-2021-08-07-175228

$ oc edit deploy azure-cloud-controller-manager -n openshift-cloud-controller-manager
        name: cloud-controller-manager
        ports:
        - containerPort: 10258
          hostPort: 10258
          name: https
$ oc edit deploy cluster-cloud-controller-manager-operator -n openshift-cloud-controller-manager-operator
        name: cloud-config-sync-controller
        ports:
        - containerPort: 9258
          hostPort: 9258
          name: metrics
          protocol: TCP
        - containerPort: 9259
          hostPort: 9259
          name: healthz
          protocol: TCP

$ oc edit ds azure-cloud-node-manager -n openshift-cloud-controller-manager
        name: cloud-node-manager
        ports:
        - containerPort: 10263
          hostPort: 10263
          name: https
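
The port assignments above can be cross-checked for overlaps with a short program. The following Go sketch is illustrative only: findConflicts is a hypothetical helper, not part of any OpenShift tooling, and the two claim tables are assembled by hand from the ports quoted in this bug (9440 and 9441 for CCCMO and 9440 for BMO before the fix; the verified ports above plus BMO's 9446 after it). It does not read the Host Port Registry itself.

```go
package main

import (
	"fmt"
	"sort"
)

// findConflicts returns every host port claimed by more than one component,
// mapped to the sorted names of the components that claim it.
func findConflicts(claims map[string][]int) map[int][]string {
	owners := make(map[int][]string)
	for component, ports := range claims {
		for _, port := range ports {
			owners[port] = append(owners[port], component)
		}
	}

	conflicts := make(map[int][]string)
	for port, names := range owners {
		if len(names) > 1 {
			sort.Strings(names)
			conflicts[port] = names
		}
	}
	return conflicts
}

func main() {
	// Ports as described in the original report (assumed from the text).
	before := map[string][]int{
		"cluster-cloud-controller-manager-operator": {9440, 9441},
		"cluster-baremetal-operator":                {9440},
	}
	// Ports as shown in the verification output, plus BMO's new port.
	after := map[string][]int{
		"cloud-controller-manager":     {10258},
		"cloud-config-sync-controller": {9258, 9259},
		"cloud-node-manager":           {10263},
		"cluster-baremetal-operator":   {9446},
	}

	fmt.Println("conflicts before:", findConflicts(before))
	fmt.Println("conflicts after: ", findConflicts(after))
}
```

Keying the result by port rather than by component makes it easy to see which components would need to move if a clash is found.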

Comment 6 errata-xmlrpc 2021-10-18 17:40:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

