Bug 1763700 - NetworkManager & kubelet startup race condition
Summary: NetworkManager & kubelet startup race condition
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.3.0
Hardware: x86_64
OS: Linux
high
high
Target Milestone: ---
: 4.3.0
Assignee: Ryan Phillips
QA Contact: Michael Nguyen
URL:
Whiteboard:
: 1803962 (view as bug list)
Depends On:
Blocks: 1764719 1764720
TreeView+ depends on / blocked
 
Reported: 2019-10-21 11:39 UTC by Muhammad Aizuddin Zali
Modified: 2020-05-26 13:21 UTC (History)
16 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1764719 1764720 1811758 (view as bug list)
Environment:
Last Closed: 2020-01-23 11:08:26 UTC
Target Upstream Version:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github openshift machine-config-operator pull 1206 0 'None' closed Bug 1763700: kubelet: add dependency on network-online.target 2020-11-26 12:16:02 UTC
Red Hat Product Errata RHBA-2020:0062 0 None None None 2020-01-23 11:08:56 UTC

Internal Links: 1804426

Description Muhammad Aizuddin Zali 2019-10-21 11:39:30 UTC
Description of problem:
During server reboot, especially when updating by CVO. Server registered as localhost.localdomain by the kubelet.



1. Network manager started here:
Oct 19 14:49:03 localhost.localdomain NetworkManager[1672]: <info>  [1571496543.0594] NetworkManager (version 1.14.0-14.el8) is starting... (for the first time)



2. And set localhost.localdomain as pre-configuration:
Oct 19 14:49:03 localhost.localdomain NetworkManager[1672]: <info>  [1571496543.2152] hostname: hostname: using hostnamed
Oct 19 14:49:03 localhost.localdomain NetworkManager[1672]: <info>  [1571496543.2154] hostname: hostname changed from (none) to "localhost.localdomain"



3. Here pivot started the deployment and kubelet BUT NetworkManager didnt yet completed.
Oct 19 14:49:03 localhost.localdomain rpm-ostree[1668]: BootedDeployment:
Oct 19 14:49:03 localhost.localdomain rpm-ostree[1668]: ● pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:09d692ec1fdf35f8f1f780f4040ba60cbc61dcb7b1560dec22de3c53f3839f70


4. NetworkManager completed here, but kubelet already sent localhost.localdomain as hostname, hence unrecognized by control plane. 
Oct 19 14:49:03 localhost.localdomain NetworkManager[1672]: <info>  [1571496543.7144] manager: NetworkManager state is now CONNECTED_GLOBAL
Oct 19 14:49:03 localhost.localdomain NetworkManager[1672]: <info>  [1571496543.7150] manager: startup complete
Oct 19 14:49:03 localhost.localdomain NetworkManager[1672]: <info>  [1571496543.7158] policy: set-hostname: set hostname to 'worker3.ocp.tmrnd.com.my' (from address lookup)


Version-Release number of selected component (if applicable):


How reproducible:
Unable to reproduce manually, it happens random.


Steps to Reproduce:
1.
2.
3.

Actual results:
kubelet started with localhost.localadmin due to NetworkManager did not yet finished configuring the network environment (DHCP).

Expected results:
Pivot and kublet should wait until NetworkManager completed before attempt to start the service.


Additional info:

Comment 12 Jonathan Lebon 2019-10-22 16:47:50 UTC
> The kubelet systemd unit waits for rpc-statd.service to start, which depends on network-online, nss-lookup, and rpcbind.socket units being ready. The network-online and nss-lookup units will only go active after the network and hostname are ready. Perhaps something is prematurely making those units ready?

IIUC, the definition for the kubelet unit comes from:
https://github.com/openshift/machine-config-operator/blob/4ca19844e2dea0fa8c2accef542942c3225ac4c5/templates/worker/01-worker-kubelet/_base/units/kubelet.yaml#L6

There, I see:

>   Wants=rpc-statd.service

This only tells systemd to try to activate that service unit as part of the transaction if the kubelet is activated. It makes no guarantees about ordering.

If we want to wait until NetworkManager is done, we can add:

```
After=network-online.target
```

There's some interesting tidbits about this in https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/. In general, it's not encouraged to depend on this unit in favour of just having software that can come up earlier and deal with changing networking conditions. I guess it applies a bit less in the OCP case though than e.g. a service running on a developer's laptop.


Anyway, I think doing this would fix the kubelet transiently reporting as localhost.localdomain bit, though I'm not entirely sure if that's the actual issue here. It seems like the concrete problem is:

> E1021 11:49:28.674945    3271 daemon.go:1136] expected target osImageURL quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:56ac1ec564bcb821ce49ffc46da9d0936751e9fa9aa3c49725fc2a33ad83a9b2
> E1021 11:49:28.674977    3271 writer.go:132] Marking Degraded due to: during bootstrap: unexpected on-disk state validating against rendered-worker-4f6018a7efbce5973afff8312f9b9a70

From the description and https://bugzilla.redhat.com/show_bug.cgi?id=1763700#c2, this is not about an initial pivot on first boot, it's just trying to update an existing worker node, right? If that's the case, seems like an issue with the MCD/pivot to me?

Do you have more logs from the MCD? Try with `oc logs -p`; it should include the logs from pivot and rpm-ostreed that the MCD proxies.

Comment 13 Ryan Phillips 2019-10-22 17:14:03 UTC
@Jonathan +1 to adding the `After=network-online.target`. It needs to be added to the master kubelet config as well [1]. 

https://github.com/openshift/machine-config-operator/blob/4ca19844e2dea0fa8c2accef542942c3225ac4c5/templates/master/01-master-kubelet/_base/units/kubelet.yaml

Comment 14 Jonathan Lebon 2019-10-22 19:11:14 UTC
> @Jonathan +1 to adding the `After=network-online.target`. It needs to be added to the master kubelet config as well [1]. 

To be clear, I think the ordering issue is one part of this, though like I mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1763700#c12, I'm not convinced it would actually fix the root cause of the update failure in this case. Seems like we need more information to diagnose.

Comment 15 Jonathan Lebon 2019-10-22 19:14:13 UTC
Sorry, I cleared the flags you set by accident in my last comment, and I see now Prow relies on this. Resetting those.

Comment 19 Micah Abbott 2019-10-29 19:35:25 UTC
Verified in 4.3.0-0.nightly-2019-10-29-140935

The original BZ was filed against 4.1, but there is no report of the original problem happening in 4.3 version.  Due to the rules enforcing how fixes are delivered to various versions, we must verify the fix in the current 'master' version (4.3).  In this case, we are simply verifying that the fix to the `kubelet.service` made in https://github.com/openshift/machine-config-operator/pull/1206 is present in recent 4.3 versions.

```
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.0-0.nightly-2019-10-29-140935   True        False         5m46s   Cluster version is 4.3.0-0.nightly-2019-10-29-140935

$ oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-129-239.us-west-1.compute.internal   Ready    master   20m   v1.16.2
ip-10-0-130-63.us-west-1.compute.internal    Ready    worker   13m   v1.16.2
ip-10-0-136-165.us-west-1.compute.internal   Ready    master   20m   v1.16.2
ip-10-0-140-83.us-west-1.compute.internal    Ready    worker   13m   v1.16.2
ip-10-0-152-194.us-west-1.compute.internal   Ready    master   20m   v1.16.2                                                                                                                                                                                                                
ip-10-0-158-57.us-west-1.compute.internal    Ready    worker   13m   v1.16.2

### Check kublet on master

$ oc debug node/ip-10-0-129-239.us-west-1.compute.internal
Starting pod/ip-10-0-129-239us-west-1computeinternal-debug ...
To use host binaries, run `chroot /host`     
Pod IP: 10.0.129.239                                     
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host              
sh-4.4# systemctl cat kubelet.service                       
# /etc/systemd/system/kubelet.service                                                                                                         
[Unit]                                       
Description=Kubernetes Kubelet                                         
Wants=rpc-statd.service network-online.target crio.service
After=network-online.target crio.service
                                                                       
[Service]
Type=notify   
ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests
ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state
EnvironmentFile=/etc/os-release
EnvironmentFile=-/etc/kubernetes/kubelet-workaround
EnvironmentFile=-/etc/kubernetes/kubelet-env
                                                                       
ExecStart=/usr/bin/hyperkube \
    kubelet \
      --config=/etc/kubernetes/kubelet.conf \
      --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \
      --kubeconfig=/var/lib/kubelet/kubeconfig \
      --container-runtime=remote \
      --container-runtime-endpoint=/var/run/crio/crio.sock \
      --node-labels=node-role.kubernetes.io/master,node.openshift.io/os_id=${ID} \
      --minimum-container-ttl-duration=6m0s \
      --cloud-provider=aws \
      --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec \ 
       \
      --register-with-taints=node-role.kubernetes.io/master=:NoSchedule \
      --v=3

Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

sh-4.4# exit
exit
sh-4.2# exit
exit

Removing debug pod ...


### Check kublet on worker

$ oc debug node/ip-10-0-158-57.us-west-1.compute.internal
Starting pod/ip-10-0-158-57us-west-1computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.158.57
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# systemctl cat kubelet.service
# /etc/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
Wants=rpc-statd.service network-online.target crio.service
After=network-online.target crio.service

[Service]
Type=notify
ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests
ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state
EnvironmentFile=/etc/os-release
EnvironmentFile=-/etc/kubernetes/kubelet-workaround
EnvironmentFile=-/etc/kubernetes/kubelet-env

ExecStart=/usr/bin/hyperkube \
    kubelet \
      --config=/etc/kubernetes/kubelet.conf \
      --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \
      --kubeconfig=/var/lib/kubelet/kubeconfig \
      --container-runtime=remote \
      --container-runtime-endpoint=/var/run/crio/crio.sock \
      --node-labels=node-role.kubernetes.io/worker,node.openshift.io/os_id=${ID} \
      --minimum-container-ttl-duration=6m0s \
      --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec \
      --cloud-provider=aws \
       \
      --v=3

Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/kubelet.service.d/10-default-env.conf
sh-4.4# exit
exit
sh-4.2# exit
exit

Removing debug pod ...
```

Comment 20 Nick Curry 2019-10-30 14:12:58 UTC
I am seeing something like this happening in VMware UPI 4.1. Cluster hosts are randomly getting reassigned localhost.localdomain and rejoining the cluster under that name instead of their original hostname.

Support ticket for one:
https://access.redhat.com/support/cases/#/case/02501662

Comment 21 Micah Abbott 2019-10-30 14:54:49 UTC
(In reply to Nick Curry from comment #20)
> I am seeing something like this happening in VMware UPI 4.1. Cluster hosts
> are randomly getting reassigned localhost.localdomain and rejoining the
> cluster under that name instead of their original hostname.
> 
> Support ticket for one:
> https://access.redhat.com/support/cases/#/case/02501662

There is an existing BZ for 4.1.z here - https://bugzilla.redhat.com/show_bug.cgi?id=1764719

Comment 22 Nick Curry 2019-11-05 19:05:13 UTC
Setting the hostname in /etc/hostname by modifying the ignition files or directly on the hosts after boot seems to serve as a workaround for this issue.

Comment 23 Mark Dunnett 2019-12-10 18:27:14 UTC
Is this fixed in 4.3?

Comment 24 Micah Abbott 2019-12-10 19:53:05 UTC
(In reply to Mark Dunnett from comment #23)
> Is this fixed in 4.3?

From comment#19, this was never reported as a problem in 4.3.  

However, the changes that needed to be back-ported to 4.1 will be included in the 4.3 release.

Comment 26 errata-xmlrpc 2020-01-23 11:08:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062

Comment 27 Colin Walters 2020-03-05 18:14:04 UTC
*** Bug 1803962 has been marked as a duplicate of this bug. ***


Note You need to log in before you can comment on or make changes to this bug.