Bug 1800900

Summary: After a reboot nodes get "localhost.localdomain" when "idrac" NIC is present
Product: OpenShift Container Platform
Reporter: William Caban <william.caban>
Component: RHCOS
Assignee: Colin Walters <walters>
Status: CLOSED WONTFIX
QA Contact: Michael Nguyen <mnguyen>
Severity: high
Priority: urgent
Version: 4.3.0
CC: aos-bugs, augol, bbreard, dmoessne, dornelas, dustymabe, fedoraproject, imcleod, jligon, jokerman, miabbott, nstielau, obockows, sgordon, walters
Target Release: 4.5.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2020-04-20 14:38:01 UTC
Type: Bug
Bug Blocks: 1186913, 1771572

Description William Caban 2020-02-08 22:30:45 UTC
Description of problem:

After completing a deployment on Dell hardware that presents an "idrac" NIC to RHCOS, when the nodes reboot the Kubelet does not use the FQDN from /etc/hostname. Instead it tries to discover the hostname using the information provided by the "idrac" NIC and ends up with "localhost.localdomain". It then uses this name when interacting with the K8s API, but the Kubelet certificates are no longer valid because of the FQDN mismatch.


Version-Release number of selected component (if applicable):

OCP 4.3.0 - Kubelet


How reproducible:

When the Dell BMC is configured to present a NIC to the OS


_W

Comment 1 Colin Walters 2020-03-05 19:11:37 UTC
I think a best practice here is to turn off the default of DHCP on connected interfaces if using static addressing:

/etc/NetworkManager/conf.d/disabledhcp.conf
[main]
no-auto-default=*

Or specifically just do:

/etc/NetworkManager/conf.d/unmanaged-idrac.conf
[keyfile]
unmanaged-devices=interface-name:idrac
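
If these drop-ins are generated by provisioning tooling rather than written by hand, a minimal sketch along the following lines could render them. The helper name `render_nm_dropin` is hypothetical, and the interface name "idrac" is taken from this report; it may differ on other hardware.

```python
import configparser
import io


def render_nm_dropin(section: str, key: str, value: str) -> str:
    """Render a one-setting NetworkManager drop-in as INI text."""
    # interpolation=None keeps values such as "*" exactly as given.
    parser = configparser.ConfigParser(interpolation=None)
    parser[section] = {key: value}
    buf = io.StringIO()
    # space_around_delimiters=False writes "key=value" rather than "key = value".
    parser.write(buf, space_around_delimiters=False)
    return buf.getvalue().strip() + "\n"


# Option 1: no automatic default (DHCP) connection on any device.
disable_auto = render_nm_dropin("main", "no-auto-default", "*")

# Option 2: leave only the iDRAC interface unmanaged.
unmanaged = render_nm_dropin("keyfile", "unmanaged-devices", "interface-name:idrac")

print(disable_auto)
print(unmanaged)
```

Only one of the two drop-ins is needed: the first changes the default for every interface, the second affects just the named device.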

Comment 2 Colin Walters 2020-03-05 20:22:28 UTC
One thing perhaps we could do is add a kernel cmdline to make this even easier, like
`nm.no-auto-default=*` or something.

Comment 6 Colin Walters 2020-04-20 14:38:01 UTC
I am not sure we can do more here; we default to DHCP which will potentially change the hostname.  

We did debate avoiding hostname changes after kubelet has started, but that blurs the concepts of "source of truth":
https://github.com/coreos/ignition-dracut/pull/156

Disabling DHCP on interfaces that you don't want is the right thing to do.  As mentioned above we can make this more ergonomic of course but I think that should be a separate bug.

Comment 7 Colin Walters 2020-06-01 19:45:32 UTC
I think we likely need a KCS on this.

If you are in this situation and are assigning static IP addresses via the kernel cmdline, then in OpenShift 4.4 you can also pass the hostname on the kernel cmdline, since https://github.com/coreos/ignition-dracut/pull/156 merged.
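
As a hedged illustration of the kernel cmdline case: the sketch below assumes the hostname is passed through the hostname field of the dracut-style `ip=` argument, which this comment does not spell out. The helper name `static_ip_karg` and all addresses and names are placeholders.

```python
def static_ip_karg(address: str, gateway: str, netmask: str,
                   hostname: str, interface: str) -> str:
    """Build a dracut-style ip= kernel argument with the hostname field set.

    Field order assumed here:
    ip=<client-IP>:[<peer>]:<gateway-IP>:<netmask>:<hostname>:<interface>:none
    """
    # The peer field is left empty; "none" turns off autoconfiguration (DHCP)
    # for this interface.
    return f"ip={address}::{gateway}:{netmask}:{hostname}:{interface}:none"


# Placeholder values: documentation address range and made-up names.
karg = static_ip_karg(
    address="192.0.2.10",
    gateway="192.0.2.1",
    netmask="255.255.255.0",
    hostname="worker-0.example.com",
    interface="eno1",
)
print(karg)
```

The resulting string would be appended to the kernel arguments at install time, alongside any `nameserver=` argument the environment needs.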

If you are doing static IP addresses by injecting files into the pointer Ignition configuration, then you should also override `/etc/hostname` there.
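
A minimal sketch of such an override, assuming the Ignition spec 2.x file layout (`filesystem`, `path`, `mode`, `contents.source`) and a placeholder FQDN. The helper name `hostname_file_entry` is hypothetical, and a real pointer config carries other stanzas that this fragment would have to be merged into.

```python
import base64
import json


def hostname_file_entry(fqdn: str) -> dict:
    """Return an Ignition (spec 2.x) storage.files entry that writes /etc/hostname."""
    # Ignition takes file contents as a URL; a base64 data URL embeds them inline.
    payload = base64.b64encode((fqdn + "\n").encode("utf-8")).decode("ascii")
    return {
        "filesystem": "root",
        "path": "/etc/hostname",
        "mode": 0o644,  # json serializes this as decimal 420
        "contents": {"source": f"data:text/plain;charset=utf-8;base64,{payload}"},
    }


# "worker-0.example.com" is a placeholder; the spec version is an assumption.
config = {
    "ignition": {"version": "2.2.0"},
    "storage": {"files": [hostname_file_entry("worker-0.example.com")]},
}
print(json.dumps(config, indent=2))
```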

If you are using DHCP, but you only want to do DHCP on one specific interface and may have other interfaces, then the technique in
https://bugzilla.redhat.com/show_bug.cgi?id=1800900#c1
may help.

Comment 8 Colin Walters 2020-06-01 19:50:16 UTC
Today, `kubelet.service` is `After=network-online.target`:
https://github.com/openshift/machine-config-operator/blob/master/templates/master/01-master-kubelet/_base/units/kubelet.yaml#L7

This situation will most often occur when something causes `network-online.target` to either fail, or to be reached before the expected IP address/hostname is assigned.
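
To make that ordering problem concrete, here is a rough sketch of a guard that waits until the node has a non-fallback hostname. It is not what kubelet or the machine-config-operator does; the function names and the set of fallback names are assumptions made for illustration.

```python
import time
from typing import Callable

# Names treated as "no real hostname assigned yet" (an assumption of this sketch).
FALLBACK_HOSTNAMES = {"", "localhost", "localhost.localdomain"}


def is_fallback_hostname(name: str) -> bool:
    """Return True if name looks like the default placeholder hostname."""
    return name.strip().lower() in FALLBACK_HOSTNAMES


def wait_for_real_hostname(get_hostname: Callable[[], str],
                           attempts: int = 30,
                           delay: float = 1.0,
                           sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_hostname() until it returns a non-fallback name.

    Raises TimeoutError if every attempt still yields a fallback name.
    """
    for attempt in range(attempts):
        name = get_hostname()
        if not is_fallback_hostname(name):
            return name
        # Do not sleep after the final attempt.
        if attempt < attempts - 1:
            sleep(delay)
    raise TimeoutError(f"hostname is still a fallback value after {attempts} attempts")
```

On a node this could be called as `wait_for_real_hostname(socket.gethostname)` before starting anything that registers itself under the node name.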

Comment 9 Derrick Ornelas 2020-06-01 20:17:24 UTC
(In reply to Colin Walters from comment #7)

I'll get something created in the next couple of weeks.  I may bug you or Micah if I have questions.
