A kubelet running with --cloud-provider="openstack" or --cloud-provider="aws" may stop working when upgraded to use CCM with --cloud-provider="external".

Kubelet's notion of the node name changes during the upgrade. The in-tree cloud provider does not use the FQDN hostname as the node name, whereas kubelet defaults to using the FQDN hostname as the node name once the cloud provider is changed to external. If the two do not match, the upgrade breaks.

For OpenStack, the detail is as follows; the issue is analogous on AWS:

The OpenStack in-tree cloud provider returns the node name as the name returned by Nova metadata: openstack:CurrentNodeName().

When we switch to an external cloud provider, cloud is unset and kubelet defaults to using the hostname: kubelet:getNodeName().

The hostname is set by afterburn to the hostname returned by Nova metadata, which contains a domain suffix if one is defined: afterburn service for openstack.

After the upgrade, kubelet can no longer find its own Node, because name != hostname. The hostname contains a domain suffix, whereas the name does not.

AWS has worked round this issue by using afterburn to set the hostname to the unqualified hostname rather than the fully-qualified hostname. However, this change potentially has its own upgrade issues, especially when using third-party extensions which also rely on the hostname, e.g. Calico. I believe there is a safer solution that can work for both providers.

Steps to reproduce the issue:

1. Install OpenShift on an OpenStack cloud that returns a domain name in hostname. This is the default for non-OSP OpenStack installations. OSP does not set a domain name by default, but can be configured to do so.
2. Apply the ExternalCloudProvider feature gate.

Describe the results you received:

The first node to upgrade will fail. The kubelet logs are full of errors about being unable to find nodename. Static pods have not started. Heartbeats are not updated on the Node.
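The name mismatch described above can be illustrated with a small sketch. It is not kubelet or cloud-provider code: the function names are hypothetical, the sample metadata values are made up, and it assumes only what this report states, namely that the in-tree provider registers the Nova metadata name while an external-provider kubelet falls back to the hostname that afterburn copied from Nova metadata.

```python
# Illustrative model only; none of these functions exist in kubelet or afterburn.

def node_name_in_tree(nova_metadata: dict) -> str:
    """Node name before the upgrade: the Nova metadata 'name' field."""
    return nova_metadata["name"]


def node_name_external(nova_metadata: dict) -> str:
    """Node name after the upgrade: the OS hostname, which afterburn
    is assumed to have set from the Nova metadata 'hostname' field."""
    return nova_metadata["hostname"]


def upgrade_loses_node(nova_metadata: dict) -> bool:
    """True if kubelet would look for a Node under a different name
    after switching to --cloud-provider="external"."""
    return node_name_in_tree(nova_metadata) != node_name_external(nova_metadata)


# Hypothetical metadata for a cloud that appends a domain suffix.
with_domain = {"name": "worker-0", "hostname": "worker-0.example.com"}
# Hypothetical metadata for a cloud that does not set a domain.
without_domain = {"name": "worker-0", "hostname": "worker-0"}

print(upgrade_loses_node(with_domain))
print(upgrade_loses_node(without_domain))
```

Under these assumptions, only the cloud that returns a domain suffix in hostname is affected, which matches the reproduction steps: the existing Node object is named after the unqualified name, and the upgraded kubelet looks it up under the fully-qualified hostname.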
Note that this is somewhat similar to, but distinct from, https://github.com/kubernetes/kubernetes/issues/70897.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759