Bug 1764719 - NetworkManager & kubelet startup race condition
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.1.z
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.1.z
Assignee: Jonathan Lebon
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On: 1811758 1763700 1764720 1811821 1811827
Blocks:
 
Reported: 2019-10-23 15:49 UTC by Micah Abbott
Modified: 2020-03-09 20:54 UTC
CC: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1763700
Environment:
Last Closed: 2019-12-20 00:29:47 UTC
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
Github openshift machine-config-operator pull 1240 'None' closed [release-4.1] Bug 1764719: kubelet: add dependency on network-online.target 2020-06-10 10:43:56 UTC
Red Hat Product Errata RHBA-2019:4186 None None None 2019-12-20 00:29:49 UTC
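The linked machine-config-operator pull request addresses the race by making the kubelet wait for the network to come online before starting. On a node whose unit lacked that ordering, the same effect could be approximated with a systemd drop-in along these lines (a sketch only; the drop-in path and filename are illustrative and not taken from the PR, which ships the change through the MCO-rendered unit itself):

```ini
# /etc/systemd/system/kubelet.service.d/10-network-online.conf
# Illustrative drop-in: delay kubelet startup until NetworkManager
# reports the network online, avoiding the startup race.
[Unit]
Wants=network-online.target
After=network-online.target
```

A manually written drop-in would need `systemctl daemon-reload` to take effect; on OpenShift nodes the supported way to change unit files is via MachineConfig, not hand-edited drop-ins.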

Comment 4 Michael Nguyen 2019-12-11 19:58:21 UTC
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-12-10-215901   True        False         53m     Cluster version is 4.1.0-0.nightly-2019-12-10-215901

$ oc get nodes
NAME                           STATUS   ROLES    AGE   VERSION
ip-10-0-137-78.ec2.internal    Ready    worker   57m   v1.13.4+a40116faa
ip-10-0-141-27.ec2.internal    Ready    master   62m   v1.13.4+a40116faa
ip-10-0-146-133.ec2.internal   Ready    worker   57m   v1.13.4+a40116faa
ip-10-0-158-127.ec2.internal   Ready    master   62m   v1.13.4+a40116faa
ip-10-0-162-47.ec2.internal    Ready    worker   57m   v1.13.4+a40116faa
ip-10-0-169-253.ec2.internal   Ready    master   61m   v1.13.4+a40116faa
$ oc debug node/ip-10-0-137-78.ec2.internal
Starting pod/ip-10-0-137-78ec2internal-debug ...
To use host binaries, run `chroot /host`
chroot /host
If you don't see a command prompt, try pressing enter.
chroot /host
sh-4.4# systemctl cat kubelet.service
# /etc/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
Wants=rpc-statd.service network-online.target crio.service
After=network-online.target crio.service

[Service]
Type=notify
ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests
ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state
EnvironmentFile=/etc/os-release
EnvironmentFile=-/etc/kubernetes/kubelet-workaround
EnvironmentFile=-/etc/kubernetes/kubelet-env

ExecStart=/usr/bin/hyperkube \
    kubelet \
      --config=/etc/kubernetes/kubelet.conf \
      --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \
      --kubeconfig=/var/lib/kubelet/kubeconfig \
      --container-runtime=remote \
      --container-runtime-endpoint=/var/run/crio/crio.sock \
      --allow-privileged \
      --node-labels=node-role.kubernetes.io/worker,node.openshift.io/os_version=${VERSIO>
      --minimum-container-ttl-duration=6m0s \
      --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec \
      --client-ca-file=/etc/kubernetes/kubelet-ca.crt \
      --cloud-provider=aws \
       \
      --anonymous-auth=false \
      --v=3 \

Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target



$ oc debug node/ip-10-0-141-27.ec2.internal
Starting pod/ip-10-0-141-27ec2internal-debug ...
To use host binaries, run `chroot /host`


chroot /host
If you don't see a command prompt, try pressing enter.

sh-4.2# 
sh-4.2# chroot /host
sh-4.4# systemctl cat kubelet.service
# /etc/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
Wants=rpc-statd.service network-online.target crio.service
After=network-online.target crio.service

[Service]
Type=notify
ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests
ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state
EnvironmentFile=/etc/os-release
EnvironmentFile=-/etc/kubernetes/kubelet-workaround
EnvironmentFile=-/etc/kubernetes/kubelet-env

ExecStart=/usr/bin/hyperkube \
    kubelet \
      --config=/etc/kubernetes/kubelet.conf \
      --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \
      --rotate-certificates \
      --kubeconfig=/var/lib/kubelet/kubeconfig \
      --container-runtime=remote \
      --container-runtime-endpoint=/var/run/crio/crio.sock \
      --allow-privileged \
      --node-labels=node-role.kubernetes.io/master,node.openshift.io/os_version=${VERSIO>
      --minimum-container-ttl-duration=6m0s \
      --client-ca-file=/etc/kubernetes/kubelet-ca.crt \
      --cloud-provider=aws \
      --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec \
       \
      --anonymous-auth=false \
      --register-with-taints=node-role.kubernetes.io/master=:NoSchedule \
      --v=3 \

Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
sh-4.4#
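Both units above now carry the `network-online.target` ordering, which is what this verification is checking for. A rough spot-check of such output is to grep the `[Unit]` section for the directive; the heredoc below simply replays the section shown above (on a live node you would pipe `systemctl cat kubelet.service` in instead):

```shell
# Spot-check sketch: does the unit order itself after network-online.target?
# The heredoc stands in for `systemctl cat kubelet.service` output.
unit=$(cat <<'EOF'
[Unit]
Description=Kubernetes Kubelet
Wants=rpc-statd.service network-online.target crio.service
After=network-online.target crio.service
EOF
)
if printf '%s\n' "$unit" | grep -q '^After=.*network-online\.target'; then
  result="ordering present"
else
  result="ordering missing"
fi
echo "$result"
```

If the `After=` line were missing, the kubelet could start before NetworkManager finished bringing interfaces up, which is exactly the race this bug describes.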

Comment 6 errata-xmlrpc 2019-12-20 00:29:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:4186

