Bug 1839900

Summary: Race condition between kubelet and hostname
Product: OpenShift Container Platform
Reporter: Odilon Sousa <osousa>
Component: RHCOS
Assignee: Micah Abbott <miabbott>
Status: CLOSED DUPLICATE
QA Contact: Michael Nguyen <mnguyen>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.3.z
CC: aos-bugs, bbreard, imcleod, jligon, jokerman, nstielau, walters
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-05-26 13:21:47 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Odilon Sousa 2020-05-25 22:06:37 UTC
Description of problem:

After an outage on the hypervisor, the node was rebooted and the kubelet could not start because the node's hostname was still localhost instead of its actual hostname.

Version-Release number of selected component (if applicable):

OpenShift Container Platform 4.3.18

How reproducible:

Intermittent. Shut down the node and start it again right away; sometimes the kubelet starts before the node has its correct hostname.

Steps to Reproduce:
1. Reboot the node and check whether the hostname is localhost.
2. If it is, the kubelet does not work as expected.
3. Set the correct hostname.
4. Restart the kubelet.
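Steps 1, 3 and 4 amount to a manual workaround. A minimal shell sketch of it follows, assuming it runs as root on the affected node and that the operator supplies the correct FQDN (for example server3.example.com from the logs in this report). The function names are hypothetical, and treating localhost.localdomain as a placeholder is an assumption; this report only shows localhost.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the manual workaround (steps 1, 3 and 4 above).
# Run as root on the affected node.

# Return 0 if the given name is a placeholder rather than the node's real
# hostname. Including "localhost.localdomain" is an assumption.
is_placeholder_hostname() {
  case "$1" in
    localhost|localhost.localdomain) return 0 ;;
    *) return 1 ;;
  esac
}

# If the current hostname is still a placeholder, set the hostname supplied
# by the operator and restart the kubelet so it registers under that name.
fix_hostname_and_restart_kubelet() {
  local wanted="$1"
  local current
  if [ -z "$wanted" ]; then
    echo "usage: fix_hostname_and_restart_kubelet <fqdn>" >&2
    return 2
  fi
  current="$(hostname)"
  if is_placeholder_hostname "$current"; then
    hostnamectl set-hostname "$wanted"
    systemctl restart kubelet
  else
    echo "hostname is already '$current'; nothing to do"
  fi
}
```

For example: fix_hostname_and_restart_kubelet server3.example.com. Note that hostnamectl set-hostname writes a static hostname that persists across reboots; whether that is desirable on a node whose hostname is normally set by NetworkManager from an address lookup is a decision for the cluster administrator.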

Actual results:

```
May 18 13:54:10 localhost crio[1340]: time="2020-05-18 13:54:10.358249819Z" level=error msg="CNI network \"\" not found"
May 18 13:54:10 localhost systemd[1]: Started Open Container Initiative Daemon.
May 18 13:54:10 localhost systemd[1]: Starting Kubernetes Kubelet...
May 18 13:54:11 localhost hyperkube[1895]: Flag --minimum-container-ttl-duration has been deprecated, Use --eviction-hard or --eviction-soft instead. Will be removed in a future version.
May 18 13:54:11 localhost hyperkube[1895]: I0518 13:54:11.204217    1895 flags.go:33] FLAG: --add-dir-header="false"
May 18 13:54:11 localhost hyperkube[1895]: I0518 13:54:11.204308    1895 flags.go:33] FLAG: --address="0.0.0.0"
May 18 13:54:11 localhost hyperkube[1895]: I0518 13:54:11.204314    1895 flags.go:33] FLAG: --allowed-unsafe-sysctls="[]"
May 18 13:54:11 localhost hyperkube[1895]: I0518 13:54:11.204319    1895 flags.go:33] FLAG: --alsologtostderr="false"
May 18 13:54:11 localhost hyperkube[1895]: I0518 13:54:11.204322    1895 flags.go:33] FLAG: --anonymous-auth="true"
```

Expected results:

```
May 18 13:54:13 localhost hyperkube[1895]: E0518 13:54:13.301465    1895 kubelet.go:2278] node "localhost" not found
May 18 13:54:13 localhost NetworkManager[1211]: <info>  [1589810053.3015] policy: set-hostname: set hostname to 'server3.example.com' (from address lookup)
May 18 13:54:13 server3.example.com systemd-hostnamed[1243]: Changed host name to 'server3.example.com'
```
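The timestamps above show the ordering problem: the kubelet was started at 13:54:10 and was already looking for node "localhost" when NetworkManager set the hostname from an address lookup at 13:54:13. A hypothetical shell helper for checking a boot's journal for that ordering is sketched below. It assumes the journal is read in chronological order (the journalctl default) and matches only on the two message strings quoted in this report.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read journal text on stdin and report whether the
# kubelet was started before NetworkManager set the hostname.
# Prints "race" (status 0), "ok" (status 1) or "kubelet-not-started" (status 2).
kubelet_started_before_hostname() {
  awk '
    # Remember the line number of the first kubelet start message.
    /Starting Kubernetes Kubelet/ && !k { k = NR }
    # Remember the line number of the first NetworkManager hostname change.
    /set-hostname: set hostname to/ && !h { h = NR }
    END {
      if (!k) { print "kubelet-not-started"; exit 2 }
      # No hostname change at all, or one that came after the kubelet start.
      if (!h || k < h) { print "race"; exit 0 }
      print "ok"; exit 1
    }'
}
```

For example: journalctl -b --no-pager | kubelet_started_before_hostname. A missing hostname change is reported as "race" as well, since the kubelet would then run with whatever placeholder name the node booted with.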


Additional info:

The node is running on VMware.

Comment 2 Odilon Sousa 2020-05-25 22:09:51 UTC
I don't know whether this is related to https://bugzilla.redhat.com/show_bug.cgi?id=1803962.

Comment 4 Colin Walters 2020-05-26 13:21:47 UTC

*** This bug has been marked as a duplicate of bug 1803962 ***

Comment 6 Colin Walters 2020-05-26 19:36:09 UTC
This should already be fixed in, for example, 4.3.22:

```
$ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.3.22-x86_64 | grep machine-config
  machine-config-operator                       https://github.com/openshift/machine-config-operator                       c6a1e9b3d022671cef735d55eb277c140556b301
$ cd ~/src/machine-config-operator
$ git shortlog --no-merges c6a1e9b3d022671cef735d55eb277c140556b301 --grep=network-online
Ryan Phillips (1):
      Bug 1763700: kubelet: add dependency on network-online.target

W. Trevor King (1):
      templates/_base/master/units/etcd-member: Block on network-online.target
```

In other words, please check that the kubelet unit on the affected node includes network-online.target:

```
[root@api ~]# grep network-online /etc/systemd/system/kubelet.service
Wants=rpc-statd.service network-online.target crio.service
After=network-online.target crio.service
[root@api ~]# 
```
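Both lines matter: in systemd, Wants= pulls network-online.target into the boot transaction, while After= only orders the kubelet behind it, and neither implies the other. A small shell sketch that automates the grep above is shown below; the function name is hypothetical, and it inspects only the single unit file it is given, not any drop-ins.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the given unit file both pulls in
# network-online.target (Wants= or Requires=) and is ordered after it (After=).
# Drop-in files (e.g. kubelet.service.d/*.conf) are not inspected.
unit_waits_for_network_online() {
  local unit="$1"
  grep -Eq '^[[:space:]]*(Wants|Requires)=.*network-online\.target' "$unit" &&
    grep -Eq '^[[:space:]]*After=.*network-online\.target' "$unit"
}
```

For example: unit_waits_for_network_online /etc/systemd/system/kubelet.service && echo ok. To see the effective dependencies including any drop-ins, systemctl show -p Wants -p After kubelet.service can be used instead.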