Bug 1985366

Summary: CCCMO using unregistered host ports
Product: OpenShift Container Platform
Reporter: Zane Bitter <zbitter>
Component: Cloud Compute
Assignee: Joel Speed <jspeed>
Cloud Compute sub component: Cloud Controller Manager
QA Contact: sunzhaohua <zhsun>
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
CC: aos-bugs
Version: 4.9
Target Release: 4.9.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2021-10-18 17:40:56 UTC
Type: Bug

Description Zane Bitter 2021-07-23 14:02:23 UTC
The Cluster Cloud Controller Manager Operator (CCCMO; BINGO!) is using host networking, for reasons presumed to be legitimate, and also exposes health checks on ports 9440 and 9441, meaning that it binds these ports on the host network. These ports are not registered to CCCMO in the Host Port Registry: https://github.com/openshift/enhancements/blob/master/enhancements/network/host-port-registry.md#host-port-registry

The baremetal operator is also using host networking, for reasons that were legitimate historically if not historically legitimate, and also exposes health checks on port 9440, that being the default generated by kubebuilder. This port is also not registered to BMO in https://github.com/openshift/enhancements/blob/master/enhancements/network/host-port-registry.md#host-port-registry

The result is that when both pods land on the same node (at least 1/3 of the time), one or the other will fail to start because they both bind the same port (see bug 1983975). This is causing many, many CI failures, so to resolve this BMO is moving its health endpoint to port 9446 (https://github.com/openshift/cluster-baremetal-operator/pull/180) and registering that port (https://github.com/openshift/enhancements/pull/844).

It would probably be wise for CCCMO to move its endpoints away from the kubebuilder defaults for as long as it is using host networking. In any event, the ports it uses on the host network must be registered in the Host Port Registry.

Comment 1 Joel Speed 2021-07-23 14:09:47 UTC
Thanks for the report, Zane. Agreed: we will register our own ports and move ours over to prevent future collisions.

Comment 3 sunzhaohua 2021-08-11 07:56:29 UTC
verified
clusterversion: 4.9.0-0.nightly-2021-08-07-175228

$ oc edit deploy azure-cloud-controller-manager  -n openshift-cloud-controller-manager
        name: cloud-controller-manager
        ports:
        - containerPort: 10258
          hostPort: 10258
          name: https
$ oc edit deploy cluster-cloud-controller-manager-operator  -n openshift-cloud-controller-manager-operator
        name: cloud-config-sync-controller
        ports:
        - containerPort: 9258
          hostPort: 9258
          name: metrics
          protocol: TCP
        - containerPort: 9259
          hostPort: 9259
          name: healthz
          protocol: TCP

$ oc edit ds azure-cloud-node-manager -n openshift-cloud-controller-manager
        name: cloud-node-manager
        ports:
        - containerPort: 10263
          hostPort: 10263
          name: https

Comment 6 errata-xmlrpc 2021-10-18 17:40:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759