Bug 1800900 - After a reboot nodes get "localhost.localdomain" when "idrac" NIC is present [NEEDINFO]
Summary: After a reboot nodes get "localhost.localdomain" when "idrac" NIC is present
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 4.5.0
Assignee: Colin Walters
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks: 1186913 1771572
 
Reported: 2020-02-08 22:30 UTC by William Caban
Modified: 2020-06-01 20:17 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-04-20 14:38:01 UTC
Target Upstream Version:
augol: needinfo? (william.caban)


Description William Caban 2020-02-08 22:30:45 UTC
Description of problem:

After completing the deployment on Dell hardware that presents an "idrac" NIC to RHCOS, when the nodes reboot, the Kubelet does not use the FQDN from /etc/hostname. Instead it tries to discover the hostname using the information provided by the "idrac" NIC and ends up with "localhost.localdomain". It then uses this hostname when interacting with the K8s API, but the Kubelet certificates are no longer valid because of the FQDN mismatch.


Version-Release number of selected component (if applicable):

OCP 4.3.0 - Kubelet


How reproducible:

When Dell BMC is configured to present a NIC to the OS


_W

Comment 1 Colin Walters 2020-03-05 19:11:37 UTC
I think a best practice here is to turn off the default of DHCP on connected interfaces if using static addressing:

/etc/NetworkManager/conf.d/disabledhcp.conf
[main]
no-auto-default=*

Or specifically just do:

/etc/NetworkManager/conf.d/unmanaged-idrac.conf
[keyfile]
unmanaged-devices=interface-name:idrac
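
For anyone who needs to generate that second drop-in for more than one interface, here is a minimal Python sketch. The function name `render_unmanaged_conf` is hypothetical, and it assumes NetworkManager's documented device list format, where entries are separated by `;`. It only renders the file content; getting the file onto the node is a separate step.

```python
import configparser
import io


def render_unmanaged_conf(interface_names):
    """Render a NetworkManager drop-in that marks the given interfaces unmanaged."""
    parser = configparser.ConfigParser(interpolation=None)
    # One "interface-name:<name>" spec per device, joined with ';'.
    parser["keyfile"] = {
        "unmanaged-devices": ";".join(
            f"interface-name:{name}" for name in interface_names
        )
    }
    buffer = io.StringIO()
    # Write "key=value" without spaces, matching the snippet in this comment.
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


print(render_unmanaged_conf(["idrac"]), end="")
```

Calling it with `["idrac"]` reproduces the `[keyfile]` snippet above.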

Comment 2 Colin Walters 2020-03-05 20:22:28 UTC
One thing we could perhaps do is add a kernel cmdline argument to make this even easier, like
`nm.no-auto-default=*` or something.

Comment 6 Colin Walters 2020-04-20 14:38:01 UTC
I am not sure we can do more here; we default to DHCP, which can potentially change the hostname.

We did debate avoiding hostname changes after kubelet has started, but that blurs the concepts of "source of truth":
https://github.com/coreos/ignition-dracut/pull/156

Disabling DHCP on interfaces that you don't want is the right thing to do.  As mentioned above, we can of course make this more ergonomic, but I think that should be a separate bug.

Comment 7 Colin Walters 2020-06-01 19:45:32 UTC
I think we likely need a KCS on this.

For people who are in this situation and are assigning static IP addresses: if you are doing so via the kernel cmdline, then in OpenShift 4.4 you can also pass the hostname on the kernel cmdline, since https://github.com/coreos/ignition-dracut/pull/156 merged.
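
As an illustration of the kernel cmdline case, the sketch below builds a static `ip=` argument that carries the hostname. It assumes the standard dracut `ip=` field order (client IP, peer, gateway, netmask, hostname, interface, autoconf). Whether this is exactly the form handled by the PR above is an assumption, and the function name, addresses, and interface name are placeholders.

```python
def dracut_static_ip_arg(address, gateway, netmask, hostname, interface):
    """Build a dracut-style `ip=` kernel argument for a static address."""
    # Field order: client-IP : peer (left empty) : gateway : netmask :
    # hostname : interface : autoconf method ("none" means static).
    return f"ip={address}::{gateway}:{netmask}:{hostname}:{interface}:none"


# Placeholder values from the documentation address range.
print(dracut_static_ip_arg(
    "192.0.2.10", "192.0.2.1", "255.255.255.0",
    "worker-0.example.com", "eno1",
))
```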

If you are doing static IP addresses by injecting files into the pointer Ignition configuration, then you should also override `/etc/hostname` there.
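
A rough sketch of what overriding `/etc/hostname` could look like, assuming the Ignition spec 2.2.0 file layout (`filesystem`, `path`, `mode`, `contents.source` as a `data:` URL) used by OpenShift releases of this era. The function name and FQDN are hypothetical, and the result is only a fragment to merge into an existing pointer config, not a complete one.

```python
import json
from urllib.parse import quote


def hostname_fragment(fqdn):
    """Return an Ignition (spec 2.2.0, assumed) fragment that writes /etc/hostname."""
    return {
        "ignition": {"version": "2.2.0"},
        "storage": {
            "files": [
                {
                    "filesystem": "root",
                    "path": "/etc/hostname",
                    "mode": 0o644,  # serialized as decimal 420 in JSON
                    # File content as a percent-encoded data URL.
                    "contents": {"source": "data:," + quote(fqdn + "\n")},
                }
            ]
        },
    }


print(json.dumps(hostname_fragment("worker-0.example.com"), indent=2))
```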

If you are using DHCP, but you only want to do DHCP on one specific interface and may have other interfaces, then the technique in
https://bugzilla.redhat.com/show_bug.cgi?id=1800900#c1
may help.

Comment 8 Colin Walters 2020-06-01 19:50:16 UTC
Today, `kubelet.service` is `After=network-online.target`:
https://github.com/openshift/machine-config-operator/blob/master/templates/master/01-master-kubelet/_base/units/kubelet.yaml#L7

This situation will most often occur when something causes `network-online.target` to either fail or be reached before the expected IP address/hostname is assigned.

Comment 9 Derrick Ornelas 2020-06-01 20:17:24 UTC
(In reply to Colin Walters from comment #7)
> I think we likely need a KCS on this.
> 
> For people who are in this situation where you're assigning static IP
> addresses; if you are doing so via the kernel cmdline, then in OpenShift 4.4
> you can pass the hostname on the kernel cmdline since
> https://github.com/coreos/ignition-dracut/pull/156 merged.
> 
> If you are doing static IP addresses by injecting files into the pointer
> Ignition configuration, then you should also override `/etc/hostname` there.
> 
> If you are using DHCP, but you only want to do DHCP on one specific
> interface and may have other interfaces, then the technique in
> https://bugzilla.redhat.com/show_bug.cgi?id=1800900#c1
> may help.

I'll get something created in the next couple of weeks.  I may bug you or Micah if I have questions.

