Bug 1829453 - kubelet attempts to register system using hostname of localhost.localdomain
Keywords:
Status: CLOSED DUPLICATE of bug 1853584
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.3.z
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Seth Jennings
QA Contact: Sunil Choudhary
URL:
Whiteboard: Telco
Depends On:
Blocks:
Reported: 2020-04-29 14:55 UTC by Dan Williams
Modified: 2020-08-10 20:30 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-08-10 20:30:52 UTC
Target Upstream Version:



Description Dan Williams 2020-04-29 14:55:39 UTC
See also https://bugzilla.redhat.com/show_bug.cgi?id=1828458 for the NM side fix to allow customization of the nm-online timeout.

But I can't think of any reason we would ever want kubelet in an OCP cluster to register itself with a default hostname like localhost.localdomain or localhost6.localdomain6.

Perhaps the systemd unit could just fail if that happens and let it get restarted, and eventually it would have the right hostname and succeed?
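
A minimal sketch of that fail-fast idea, in Python purely for illustration. The function names and the set of placeholder names are assumptions: only localhost.localdomain and localhost6.localdomain6 are named in this report, and the bare "localhost" forms are added as a guess. It is not taken from kubelet or from any shipped unit file.

```python
import socket

# Assumed placeholder names. Only the two ".localdomain" forms appear in
# this report; the bare forms are an illustrative addition.
PLACEHOLDER_HOSTNAMES = {
    "localhost",
    "localhost.localdomain",
    "localhost6",
    "localhost6.localdomain6",
}


def is_placeholder_hostname(name: str) -> bool:
    """Return True if name is empty or one of the default localhost names."""
    normalized = name.strip().rstrip(".").lower()
    return normalized == "" or normalized in PLACEHOLDER_HOSTNAMES


def exit_code_for(name: str) -> int:
    """Exit status a pre-start check could return: 1 means refuse to start."""
    return 1 if is_placeholder_hostname(name) else 0


def current_exit_code() -> int:
    """Apply the check to the hostname the system reports right now."""
    return exit_code_for(socket.gethostname())
```

A pre-start step could end with `raise SystemExit(current_exit_code())`, so that a non-zero status fails the unit and leaves the retry to systemd's restart policy.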

Comment 1 W. Trevor King 2020-04-29 15:22:12 UTC
Bug 1817774 is asking for alerting (or some other automated reporting) around localhost nodes.

Comment 3 Ryan Phillips 2020-06-01 19:46:16 UTC
Is the bug here that the hostname eventually gets set correctly? I think we could add an ExecStartPre script to check for a `localhost` fqdn, but does the fqdn eventually get set?

Comment 4 Dan Williams 2020-06-01 20:17:56 UTC
(In reply to Ryan Phillips from comment #3)
> Is the bug here that the hostname eventually gets set correctly? I think we
> could add an ExecStartPre script to check for a `localhost` fqdn, but does
> the fqdn eventually get set?

Yes, I believe the FQDN does get set eventually. But kubelet doesn't listen for hostname changes; it tries to register with whatever hostname is set when it starts. The sequence is something like:

1) machine starts
2) NM begins DHCP on interfaces
3) DHCP takes longer than the default 30-second nm-online timeout because Enterprise Hardware
4) systemd NetworkManager-wait-online unit times out, boot proceeds
5) kubelet systemd unit is now allowed to proceed
6) kubelet starts, sees hostname of localhost.localdomain
7) shortly thereafter, DHCP completes and NM sets the machine hostname to the correct FQDN
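
Since the correct FQDN only arrives in step 7, one way to close the gap is to poll the hostname before kubelet starts and give up after a bounded number of tries. The sketch below assumes that approach; `wait_for_hostname`, its parameters, and the default retry values are hypothetical, and it is not necessarily what the eventual fix does.

```python
import socket
import time
from typing import Callable, Optional

# Assumed placeholder names; only the ".localdomain" forms come from this report.
DEFAULT_NAMES = {
    "localhost",
    "localhost.localdomain",
    "localhost6",
    "localhost6.localdomain6",
}


def wait_for_hostname(
    get_hostname: Callable[[], str] = socket.gethostname,
    attempts: int = 60,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll until the hostname is no longer a placeholder.

    Returns the first non-placeholder hostname seen, or None if every
    attempt still reported a placeholder.
    """
    for attempt in range(attempts):
        name = get_hostname().strip().lower()
        if name and name not in DEFAULT_NAMES:
            return name
        # Do not sleep after the final attempt; just report failure.
        if attempt < attempts - 1:
            sleep(interval_s)
    return None
```

The `get_hostname` and `sleep` parameters exist only so the loop can be exercised without touching the real hostname or actually waiting.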

Comment 6 Seth Jennings 2020-08-10 16:06:29 UTC
Ryan is on leave

Comment 7 Seth Jennings 2020-08-10 19:31:56 UTC
Should be fixed by https://github.com/openshift/machine-config-operator/pull/1914

Comment 10 Scott Dodson 2020-08-10 20:27:51 UTC
If this is specific to getting this fixed in 4.3.z this should depend on https://bugzilla.redhat.com/show_bug.cgi?id=1855878, correct?
If so, this needs to be detached from the 4.6.0 errata, which was triggered by moving it to MODIFIED.

If this is not specific to getting this fixed in 4.3.z, shouldn't this just be closed as a dupe of 1855878?

Comment 11 Seth Jennings 2020-08-10 20:30:52 UTC

*** This bug has been marked as a duplicate of bug 1853584 ***

