Description of problem:
After completing a deployment on Dell hardware that presents an "idrac" NIC to RHCOS, when the nodes reboot the Kubelet does not use the FQDN from /etc/hostname. Instead it tries to discover the hostname from the information provided by the "idrac" NIC and ends up with "localhost.localdomain". It then uses that name when interacting with the K8s API, but the Kubelet certificates are no longer valid because of the FQDN mismatch.

Version-Release number of selected component (if applicable):
OCP 4.3.0 - Kubelet

How reproducible:
When the Dell BMC is configured to present a NIC to the OS
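One way to confirm whether a node is in the state described above is to compare the name stored in /etc/hostname with the name the running system reports. The following is only a rough diagnostic sketch: the function names are hypothetical, and it assumes the Kubelet picks up the name the kernel reports (that is, no hostname override is configured).

```python
import socket
from pathlib import Path


def read_static_hostname(path: str = "/etc/hostname") -> str:
    """Return the hostname stored on disk, without the trailing newline."""
    return Path(path).read_text().strip()


def hostname_mismatch(static_name: str, runtime_name: str) -> bool:
    """True when the on-disk name and the runtime name differ.

    Hostnames are case-insensitive, so compare them lowercased.
    """
    return static_name.lower() != runtime_name.lower()


if __name__ == "__main__":
    static_name = read_static_hostname()
    runtime_name = socket.gethostname()  # name currently reported by the kernel
    if hostname_mismatch(static_name, runtime_name):
        print(f"mismatch: /etc/hostname={static_name!r} runtime={runtime_name!r}")
    else:
        print(f"ok: {runtime_name}")
```

On an affected node the runtime name would be expected to show up as "localhost.localdomain" while /etc/hostname still holds the FQDN.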
I think a best practice here is to turn off the default of DHCP on connected interfaces if using static addressing:

/etc/NetworkManager/conf.d/disabledhcp.conf

[main]
no-auto-default=*

Or, to target only the one interface:

/etc/NetworkManager/conf.d/unmanaged-idrac.conf

[keyfile]
unmanaged-devices=interface-name:idrac
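If these drop-ins need to be generated for several interface names, a small helper can render them. This is a minimal sketch and not part of any OpenShift tooling: the function names are hypothetical, and it only produces the file contents, leaving it to the caller to place them under /etc/NetworkManager/conf.d/.

```python
import configparser
import io


def render_dropin(section: str, key: str, value: str) -> str:
    """Render a one-key NetworkManager drop-in as INI text."""
    parser = configparser.RawConfigParser()  # Raw: no "%" interpolation
    parser.optionxform = str  # keep the key exactly as written
    parser[section] = {key: value}
    buf = io.StringIO()
    # Write "key=value" without spaces, matching the snippets above.
    parser.write(buf, space_around_delimiters=False)
    return buf.getvalue()


def no_auto_default_dropin() -> str:
    """Disable the automatic default (DHCP) connection on all devices."""
    return render_dropin("main", "no-auto-default", "*")


def unmanaged_device_dropin(interface: str) -> str:
    """Tell NetworkManager to leave a single interface unmanaged."""
    return render_dropin("keyfile", "unmanaged-devices", f"interface-name:{interface}")


if __name__ == "__main__":
    print(unmanaged_device_dropin("idrac"), end="")
```

Marking only the "idrac" interface as unmanaged is the narrower change; `no-auto-default=*` affects every device that has no explicit connection profile.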
One thing perhaps we could do is add a kernel cmdline to make this even easier, like `nm.no-auto-default=*` or something.
I am not sure we can do more here; we default to DHCP which will potentially change the hostname. We did debate avoiding hostname changes after kubelet has started, but that blurs the concepts of "source of truth":
https://github.com/coreos/ignition-dracut/pull/156

Disabling DHCP on interfaces that you don't want is the right thing to do. As mentioned above we can make this more ergonomic of course but I think that should be a separate bug.
I think we likely need a KCS on this.

For people who are in this situation where you're assigning static IP addresses; if you are doing so via the kernel cmdline, then in OpenShift 4.4 you can pass the hostname on the kernel cmdline since https://github.com/coreos/ignition-dracut/pull/156 merged.

If you are doing static IP addresses by injecting files into the pointer Ignition configuration, then you should also override `/etc/hostname` there.

If you are using DHCP, but you only want to do DHCP on one specific interface and may have other interfaces, then the technique in https://bugzilla.redhat.com/show_bug.cgi?id=1800900#c1 may help.
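For the case where files are injected into the pointer Ignition configuration, the `/etc/hostname` override could look roughly like the sketch below. It assumes the Ignition spec 2.2.0 file layout (with a `filesystem` field), which newer spec versions do not use; the function names and the example FQDN are hypothetical.

```python
import json
from urllib.parse import quote


def hostname_file_entry(fqdn: str) -> dict:
    """Build a storage.files entry that writes the FQDN to /etc/hostname."""
    return {
        "filesystem": "root",  # assumption: spec 2.x layout
        "path": "/etc/hostname",
        "mode": 0o644,  # serialized as decimal 420 in JSON
        "contents": {"source": "data:," + quote(fqdn + "\n")},
    }


def add_hostname(pointer_config: dict, fqdn: str) -> dict:
    """Return a copy of the pointer config with /etc/hostname set."""
    merged = json.loads(json.dumps(pointer_config))  # cheap deep copy
    files = merged.setdefault("storage", {}).setdefault("files", [])
    # Drop any earlier /etc/hostname entry so only one remains.
    files[:] = [f for f in files if f.get("path") != "/etc/hostname"]
    files.append(hostname_file_entry(fqdn))
    return merged


if __name__ == "__main__":
    base = {"ignition": {"version": "2.2.0"}}
    print(json.dumps(add_hostname(base, "worker-0.example.com"), indent=2))
```

The input config is left unmodified, so the same base pointer config can be reused to produce one variant per node.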
Today, `kubelet.service` is ordered `After=network-online.target`:
https://github.com/openshift/machine-config-operator/blob/master/templates/master/01-master-kubelet/_base/units/kubelet.yaml#L7

This situation will most often occur when something causes that target either to fail or to be reached before the expected IP address/hostname is assigned.
(In reply to Colin Walters from comment #7)
> I think we likely need a KCS on this.
>
> For people who are in this situation where you're assigning static IP
> addresses; if you are doing so via the kernel cmdline, then in OpenShift 4.4
> you can pass the hostname on the kernel cmdline since
> https://github.com/coreos/ignition-dracut/pull/156 merged.
>
> If you are doing static IP addresses by injecting files into the pointer
> Ignition configuration, then you should also override `/etc/hostname` there.
>
> If you are using DHCP, but you only want to do DHCP on one specific
> interface and may have other interfaces, then the technique in
> https://bugzilla.redhat.com/show_bug.cgi?id=1800900#c1
> may help.

I'll get something created in the next couple of weeks. I may bug you or Micah if I have questions.