The Cluster Cloud Controller Manager Operator (BINGO!) is using host networking, for reasons presumed to be legitimate, and also exposes health checks on ports 9440 and 9441, meaning that it binds these ports on the host network. These ports are not registered to CCCMO in https://github.com/openshift/enhancements/blob/master/enhancements/network/host-port-registry.md#host-port-registry

The baremetal operator (BMO) is also using host networking, for reasons that were legitimate historically if not historically legitimate, and also exposes health checks on port 9440, that being the default generated by kubebuilder. This port is also not registered to BMO in https://github.com/openshift/enhancements/blob/master/enhancements/network/host-port-registry.md#host-port-registry

The result is that when both pods land on the same node (at least 1/3 of the time), one or the other will fail to start, because they both try to bind the same port (see bug 1983975).

This is causing many, many CI failures, so to resolve this BMO is moving its health endpoint to port 9446 (https://github.com/openshift/cluster-baremetal-operator/pull/180) and registering that port (https://github.com/openshift/enhancements/pull/844).

It would probably be wise for CCCMO to move its endpoints away from the kubebuilder defaults so long as it is using host networking. In any event, the ports it is using on the host network must be registered in the Host Port Registry.
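The failure mode is easy to reproduce outside a cluster. The following is a minimal Python sketch, not code from either operator: it binds a TCP listener on the loopback interface and then tries to bind a second listener to the same port, which is effectively what happens when two host-networked pods on one node both ask for 9440. The names `try_bind` and `demo_collision` are hypothetical, and an ephemeral port is used in place of 9440 so that the sketch runs on any machine.

```python
import socket


def try_bind(port, host="127.0.0.1"):
    """Bind and listen on (host, port); return the socket, or None if the bind fails."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
        return sock
    except OSError:
        # Typically EADDRINUSE: another listener already owns this port.
        sock.close()
        return None


def demo_collision():
    """Return True if a second listener is refused the port held by the first."""
    # First "pod": let the kernel pick a free port (a stand-in for 9440).
    first = try_bind(0)
    if first is None:
        raise RuntimeError("could not bind any port on the loopback interface")
    port = first.getsockname()[1]
    try:
        # Second "pod" on the same host asks for the same port.
        second = try_bind(port)
        if second is not None:
            second.close()
        return second is None
    finally:
        first.close()


if __name__ == "__main__":
    print("second bind refused:", demo_collision())
```

Whichever process binds first keeps the port, which matches the report that one or the other pod fails to start depending on scheduling and start order.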
Thanks for the report, Zane. Agreed: we will register our own ports and move our endpoints over to prevent future collisions.
Verified.
clusterversion: 4.9.0-0.nightly-2021-08-07-175228

$ oc edit deploy azure-cloud-controller-manager -n openshift-cloud-controller-manager
        name: cloud-controller-manager
        ports:
        - containerPort: 10258
          hostPort: 10258
          name: https

$ oc edit deploy cluster-cloud-controller-manager-operator -n openshift-cloud-controller-manager-operator
        name: cloud-config-sync-controller
        ports:
        - containerPort: 9258
          hostPort: 9258
          name: metrics
          protocol: TCP
        - containerPort: 9259
          hostPort: 9259
          name: healthz
          protocol: TCP

$ oc edit ds azure-cloud-node-manager -n openshift-cloud-controller-manager
        name: cloud-node-manager
        ports:
        - containerPort: 10263
          hostPort: 10263
          name: https
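A check of this kind could be automated. The sketch below assumes a hypothetical in-memory mapping of component names to host ports. It is not the format of the Host Port Registry and not part of any OpenShift tooling. It reports every port claimed by more than one component. The sample data uses the ports named in the original report and in the verification above.

```python
from collections import defaultdict


def find_collisions(host_ports):
    """Return {port: sorted component names} for ports claimed by more than one component."""
    owners = defaultdict(set)
    for component, ports in host_ports.items():
        for port in ports:
            owners[port].add(component)
    return {port: sorted(names) for port, names in owners.items() if len(names) > 1}


# Before the fix: both operators used the kubebuilder default health port 9440.
before = {
    "cluster-cloud-controller-manager-operator": [9440, 9441],
    "cluster-baremetal-operator": [9440],
}

# After the fix: host ports as listed in this bug.
after = {
    "cloud-controller-manager": [10258],
    "cloud-config-sync-controller": [9258, 9259],
    "cloud-node-manager": [10263],
    "cluster-baremetal-operator": [9446],
}

if __name__ == "__main__":
    print("before:", find_collisions(before))
    print("after:", find_collisions(after))
```

With this data, the first call flags 9440 as shared by the two operators and the second returns an empty mapping.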
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759