Bug 1922812 - Missing daemon reload between nodeip-configuration.service and kubelet.service
Summary: Missing daemon reload between nodeip-configuration.service and kubelet.service
Keywords:
Status: CLOSED DUPLICATE of bug 1944394
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Antonio Murdaca
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-01-31 17:25 UTC by Andreas Karis
Modified: 2021-07-15 14:35 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-15 14:35:34 UTC
Target Upstream Version:



Links
System ID Private Priority Status Summary Last Updated
Github openshift machine-config-operator issues 2375 0 None open Missing systemctl daemon-reload after nodeip-configuration.service 2021-02-15 21:56:14 UTC

Description Andreas Karis 2021-01-31 17:25:03 UTC
Tested with OCP 4.7 RC, but this certainly also happens in earlier versions. The environment is baremetal UPI with the IPv6/IPv4 dual-stack feature. I know the feature is experimental, but bear with me: the actual bug has nothing to do with the feature itself, but with a missing daemon-reload between nodeip-configuration.service and kubelet.service.

I found this in my environment because address-gen-mode (https://developer.gnome.org/NetworkManager/stable/settings-ipv6.html) is set to stable-privacy by default, and between two reboots my IPv6 address changed. See the logs in [0].

What's happening is that on first boot, the node has IPv6 address x, and nodeip-configuration.service writes that IP to /etc/systemd/system/kubelet.service.d/20-nodenet.conf.

On the next boot, the IPv6 address changes (for whatever reason related to stable-privacy). nodeip-configuration.service runs and writes the correct IP to /etc/systemd/system/kubelet.service.d/20-nodenet.conf, but kubelet still starts with the old IPv6 address. That's because systemd loaded the drop-in before the file was rewritten, and a daemon-reload is missing.
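
The mismatch described above can be checked mechanically. The following is a minimal sketch, not part of the original report: the function names (dropin_node_ips, cmdline_node_ip, node_ip_in_sync) are hypothetical helpers, and the sketch assumes the drop-in has the single-line Environment= format shown in this bug.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from the original report) to compare the node IPs
# written to the kubelet drop-in with the ones kubelet was actually started with.

# Print the KUBELET_NODE_IPS value found in a drop-in file (path in $1).
dropin_node_ips() {
    sed -n 's/.*"KUBELET_NODE_IPS=\([^"]*\)".*/\1/p' "$1"
}

# Print the --node-ip value found in a kubelet command line (string in $1).
cmdline_node_ip() {
    printf '%s\n' "$1" | sed -n 's/.*--node-ip=\([^ ]*\).*/\1/p'
}

# Return 0 if the drop-in ($1) and the command line ($2) agree,
# 1 if kubelet is running with a stale address.
node_ip_in_sync() {
    [ "$(dropin_node_ips "$1")" = "$(cmdline_node_ip "$2")" ]
}
```

For example, the kubelet command line could be captured with ps, as in the transcripts in this bug, and passed as the second argument together with /etc/systemd/system/kubelet.service.d/20-nodenet.conf as the first. A non-zero return then indicates that kubelet was started from a stale, not yet reloaded drop-in.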


----------------------------------------------

[0]

~~~
[root@openshift-master-0 ~]# journalctl | egrep 'fc00|reboot' -i
Jan 31 15:54:50 localhost kernel: BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
Jan 31 15:54:50 localhost kernel: BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
Jan 31 15:54:50 localhost kernel: BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
Jan 31 15:54:50 localhost kernel:   0 base 0000C0000000 mask 3FFFC0000000 uncachable
Jan 31 15:54:50 localhost kernel: PM: Registered nosave memory: [mem 0xfeffc000-0xfeffffff]
Jan 31 15:54:50 localhost kernel: PM: Registered nosave memory: [mem 0xfffc0000-0xffffffff]
Jan 31 15:54:50 localhost kernel: pci 0000:00:02.0: reg 0x10: [mem 0xfc000000-0xfdffffff pref]
Jan 31 15:54:50 localhost kernel: e820: reserve RAM buffer [mem 0x0009fc00-0x0009ffff]
Jan 31 15:55:24 openshift-master-0 bash[1737]: time="2021-01-31T15:55:24Z" level=debug msg="retrieved Address map map[0xc0002827e0:[127.0.0.1/8 lo ::1/128] 0xc000282900:[192.168.123.200/24 ens3 fc00::5c77:e4fe:8950:c79c/64]]"
Jan 31 15:55:24 openshift-master-0 bash[1737]: time="2021-01-31T15:55:24Z" level=debug msg="Ignoring filtered route {Ifindex: 2 Dst: fc00::/64 Src: <nil> Gw: <nil> Flags: [] Table: 254}"
Jan 31 15:55:24 openshift-master-0 bash[1737]: time="2021-01-31T15:55:24Z" level=debug msg="Address fc00::5c77:e4fe:8950:c79c/64 is on interface ens3 with default route"
Jan 31 15:55:24 openshift-master-0 bash[1737]: time="2021-01-31T15:55:24Z" level=info msg="Chosen Node IPs: [192.168.123.200 fc00::5c77:e4fe:8950:c79c]"
Jan 31 15:55:24 openshift-master-0 bash[1737]: time="2021-01-31T15:55:24Z" level=info msg="Writing Kubelet service override with content [Service]\nEnvironment=\"KUBELET_NODE_IP=192.168.123.200\" \"KUBELET_NODE_IPS=192.168.123.200,fc00::5c77:e4fe:8950:c79c\"\n"
Jan 31 15:55:45 openshift-master-0 machine-config-daemon[2156]: I0131 15:55:45.164296    2156 update.go:1902] Rebooting node
Jan 31 15:55:45 openshift-master-0 root[2274]: machine-config-daemon[2156]: Rebooting node
Jan 31 15:55:45 openshift-master-0 machine-config-daemon[2156]: I0131 15:55:45.168312    2156 update.go:1902] initiating reboot: Completing firstboot provisioning to rendered-master-456b4a39d3924f6f623a979a90c0e5a0
Jan 31 15:55:45 openshift-master-0 root[2275]: machine-config-daemon[2156]: initiating reboot: Completing firstboot provisioning to rendered-master-456b4a39d3924f6f623a979a90c0e5a0
Jan 31 15:55:45 openshift-master-0 systemd-logind[1539]: System is rebooting.
Jan 31 15:55:45 openshift-master-0 systemd[1]: machine-config-daemon-reboot.service: Succeeded.
Jan 31 15:55:45 openshift-master-0 systemd[1]: machine-config-daemon-reboot.service: Consumed 14ms CPU time
Jan 31 15:55:47 openshift-master-0 systemd[1]: Starting Reboot...
-- Reboot --
Jan 31 15:56:27 localhost kernel: BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
Jan 31 15:56:27 localhost kernel: BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
Jan 31 15:56:27 localhost kernel: BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
Jan 31 15:56:27 localhost kernel:   0 base 0000C0000000 mask 3FFFC0000000 uncachable
Jan 31 15:56:27 localhost kernel: PM: Registered nosave memory: [mem 0xfeffc000-0xfeffffff]
Jan 31 15:56:27 localhost kernel: PM: Registered nosave memory: [mem 0xfffc0000-0xffffffff]
Jan 31 15:56:27 localhost kernel: pci 0000:00:02.0: reg 0x10: [mem 0xfc000000-0xfdffffff pref]
Jan 31 15:56:27 localhost kernel: e820: reserve RAM buffer [mem 0x0009fc00-0x0009ffff]
Jan 31 15:56:46 openshift-master-0 bash[1652]: time="2021-01-31T15:56:46Z" level=debug msg="retrieved Address map map[0xc0002906c0:[127.0.0.1/8 lo ::1/128] 0xc000290b40:[192.168.123.200/24 br-ex fc00::fc1c:1e22:b052:ef48/64]]"
Jan 31 15:56:46 openshift-master-0 bash[1652]: time="2021-01-31T15:56:46Z" level=debug msg="Ignoring filtered route {Ifindex: 5 Dst: fc00::/64 Src: <nil> Gw: <nil> Flags: [] Table: 254}"
Jan 31 15:56:46 openshift-master-0 bash[1652]: time="2021-01-31T15:56:46Z" level=debug msg="Address fc00::fc1c:1e22:b052:ef48/64 is on interface br-ex with default route"
Jan 31 15:56:46 openshift-master-0 bash[1652]: time="2021-01-31T15:56:46Z" level=info msg="Chosen Node IPs: [192.168.123.200 fc00::fc1c:1e22:b052:ef48]"
Jan 31 15:56:46 openshift-master-0 bash[1652]: time="2021-01-31T15:56:46Z" level=info msg="Writing Kubelet service override with content [Service]\nEnvironment=\"KUBELET_NODE_IP=192.168.123.200\" \"KUBELET_NODE_IPS=192.168.123.200,fc00::fc1c:1e22:b052:ef48\"\n"
Jan 31 15:56:46 openshift-master-0 hyperkube[1838]: I0131 15:56:46.668044    1838 flags.go:59] FLAG: --node-ip="192.168.123.200,fc00::5c77:e4fe:8950:c79c"
Jan 31 15:56:57 openshift-master-0 hyperkube[1838]: E0131 15:56:57.845567    1838 kubelet_node_status.go:586] Failed to set some node status fields: failed to validate secondaryNodeIP: node IP: "fc00::5c77:e4fe:8950:c79c" not found in the host's network interfaces
Jan 31 15:56:58 openshift-master-0 hyperkube[1838]: E0131 15:56:58.054892    1838 kubelet_node_status.go:586] Failed to set some node status fields: failed to validate secondaryNodeIP: node IP: "fc00::5c77:e4fe:8950:c79c" not found in the host's network interfaces
Jan 31 15:56:58 openshift-master-0 hyperkube[1838]: E0131 15:56:58.463134    1838 kubelet_node_status.go:586] Failed to set some node status fields: failed to validate secondaryNodeIP: node IP: "fc00::5c77:e4fe:8950:c79c" not found in the host's network interfaces
~~~

Comment 1 Andreas Karis 2021-01-31 17:30:33 UTC
I can reproduce this easily: I set the wrong IP manually in the file, then run systemctl daemon-reload and restart kubelet. The IPv6 address on the interface is fc00::fc1c:1e22:b052:ef48, and the setting should be corrected by nodeip-configuration.service right before kubelet starts:
~~~
[root@openshift-master-0 ~]# cat /etc/systemd/system/kubelet.service.d/20-nodenet.conf
[Service]
Environment="KUBELET_NODE_IP=192.168.123.200" "KUBELET_NODE_IPS=192.168.123.200,fc00::fc1c:1e22:b052:ef49"
[root@openshift-master-0 ~]# systemctl daemon-reload
[root@openshift-master-0 ~]# systemctl restart kubelet
[root@openshift-master-0 ~]#  ps aux | grep kubelet | grep node-ip
root       79896 20.0  0.8 2014484 142008 ?      Ssl  17:08   0:02 kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --runtime-cgroups=/system.slice/crio.service --node-labels=node-role.kubernetes.io/master,node.openshift.io/os_id=rhcos --node-ip=192.168.123.200,fc00::fc1c:1e22:b052:ef49 --minimum-container-ttl-duration=6m0s --cloud-provider= --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec --register-with-taints=node-role.kubernetes.io/master=:NoSchedule --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9519ae9a0a3e262e311c7f12a08adb2568e29e1576d2c6c229fd5d355c551d4b --v=2
~~~

Before kubelet started, nodeip-configuration.service ran and set the correct address in the config file:
~~~
[root@openshift-master-0 ~]# cat /etc/systemd/system/kubelet.service.d/20-nodenet.conf
[Service]
Environment="KUBELET_NODE_IP=192.168.123.200" "KUBELET_NODE_IPS=192.168.123.200,fc00::fc1c:1e22:b052:ef48"
~~~

But as you can see above, kubelet did not pick it up. Indeed, I can restart kubelet as often as I want: the drop-in file changed on disk, but systemctl daemon-reload was not run, so kubelet never comes up with the correct IP:
~~~
[root@openshift-master-0 ~]# systemctl restart kubelet
Warning: The unit file, source configuration file or drop-ins of kubelet.service changed on disk. Run 'systemctl daemon-reload' to reload units.
[root@openshift-master-0 ~]#  ps aux | grep kubelet | grep node-ip
root       82501 14.2  0.9 2156380 148156 ?      Ssl  17:10   0:05 kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --runtime-cgroups=/system.slice/crio.service --node-labels=node-role.kubernetes.io/master,node.openshift.io/os_id=rhcos --node-ip=192.168.123.200,fc00::fc1c:1e22:b052:ef49 --minimum-container-ttl-duration=6m0s --cloud-provider= --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec --register-with-taints=node-role.kubernetes.io/master=:NoSchedule --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9519ae9a0a3e262e311c7f12a08adb2568e29e1576d2c6c229fd5d355c551d4b --v=2
~~~

And here's how to fix this manually:
~~~
[root@openshift-master-0 ~]# systemctl daemon-reload
[root@openshift-master-0 ~]# systemctl restart kubelet
[root@openshift-master-0 ~]#  ps aux | grep kubelet | grep node-ip
root       91904 18.6  0.8 1940752 140464 ?      Ssl  17:19   0:03 kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --runtime-cgroups=/system.slice/crio.service --node-labels=node-role.kubernetes.io/master,node.openshift.io/os_id=rhcos --node-ip=192.168.123.200,fc00::fc1c:1e22:b052:ef48 --minimum-container-ttl-duration=6m0s --cloud-provider= --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec --register-with-taints=node-role.kubernetes.io/master=:NoSchedule --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9519ae9a0a3e262e311c7f12a08adb2568e29e1576d2c6c229fd5d355c551d4b --v=2
~~~

Comment 2 Andreas Karis 2021-01-31 17:33:23 UTC
~~~
[root@openshift-master-0 ~]# cat /etc/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
Wants=rpc-statd.service network-online.target crio.service
After=network-online.target crio.service

[Service]
Type=notify
ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests
ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state
EnvironmentFile=/etc/os-release
EnvironmentFile=-/etc/kubernetes/kubelet-workaround
EnvironmentFile=-/etc/kubernetes/kubelet-env

ExecStart=/usr/bin/hyperkube \
    kubelet \
      --config=/etc/kubernetes/kubelet.conf \
      --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \
      --kubeconfig=/var/lib/kubelet/kubeconfig \
      --container-runtime=remote \
      --container-runtime-endpoint=/var/run/crio/crio.sock \
      --runtime-cgroups=/system.slice/crio.service \
      --node-labels=node-role.kubernetes.io/master,node.openshift.io/os_id=${ID} \
      --node-ip=${KUBELET_NODE_IPS} \
      --minimum-container-ttl-duration=6m0s \
      --cloud-provider= \
      --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec \
       \
      --register-with-taints=node-role.kubernetes.io/master=:NoSchedule \
      --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9519ae9a0a3e262e311c7f12a08adb2568e29e1576d2c6c229fd5d355c551d4b \
      --v=${KUBELET_LOG_LEVEL}

Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
~~~

~~~
[root@openshift-master-0 ~]# cat /etc/systemd/system/kubelet.service.d/20-nodenet.conf
[Service]
Environment="KUBELET_NODE_IP=192.168.123.200" "KUBELET_NODE_IPS=192.168.123.200,fc00::fc1c:1e22:b052:ef48"
[root@openshift-master-0 ~]# cat /etc/systemd/system/nodeip-configuration.service
[Unit]
Description=Writes IP address configuration so that kubelet and crio services select a valid node IP
Wants=network-online.target
After=network-online.target ignition-firstboot-complete.service
Before=kubelet.service crio.service

[Service]
# Need oneshot to delay kubelet
Type=oneshot
# Would prefer to do Restart=on-failure instead of this bash retry loop, but
# the version of systemd we have right now doesn't support it. It should be
# available in systemd v244 and higher.
ExecStart=/bin/bash -c " \
  until \
  /usr/bin/podman run --rm \
  --authfile /var/lib/kubelet/config.json \
  --net=host \
  --volume /etc/systemd/system:/etc/systemd/system:z \
  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b1e1542aa0934233fd1515872d2e1be4f1f1e5ce0c8d35860eb2847badd3c609 \
  node-ip \
  set --retry-on-failure; \
  do \
  sleep 5; \
  done"

[Install]
RequiredBy=kubelet.service
~~~

Comment 3 Andreas Karis 2021-01-31 23:39:23 UTC
Here's the fix:
~~~
[root@openshift-master-0 ~]# ps aux | grep kubelet
root        1856  1.7  0.6 2014036 103448 ?      Ssl  23:31   0:03 kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --runtime-cgroups=/system.slice/crio.service --node-labels=node-role.kubernetes.io/master,node.openshift.io/os_id=rhcos --node-ip=192.168.123.200,fc00::abff:51ff:cf1f:6d6c --minimum-container-ttl-duration=6m0s --cloud-provider= --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec --register-with-taints=node-role.kubernetes.io/master=:NoSchedule --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9519ae9a0a3e262e311c7f12a08adb2568e29e1576d2c6c229fd5d355c551d4b --v=2
root        2306  0.0  0.0  12792  1088 pts/0    S+   23:34   0:00 grep --color=auto kubelet
[root@openshift-master-0 ~]# ip -6 a ls dev br-ex
5: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN group default qlen 1000
    inet6 fc00::cb09:6043:4da9:239f/64 scope global dynamic noprefixroute 
       valid_lft 86349sec preferred_lft 14349sec
    inet6 fe80::de43:21c0:c08b:fbc7/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
[root@openshift-master-0 ~]# systemctl restart kubelet
Warning: The unit file, source configuration file or drop-ins of kubelet.service changed on disk. Run 'systemctl daemon-reload' to reload units.
[root@openshift-master-0 ~]# vi /etc/systemd/system/nodeip-configuration.service
[root@openshift-master-0 ~]# # if I daemo^C
[root@openshift-master-0 ~]# cat !$
cat /etc/systemd/system/nodeip-configuration.service
[Unit]
Description=Writes IP address configuration so that kubelet and crio services select a valid node IP
Wants=network-online.target
After=network-online.target ignition-firstboot-complete.service
Before=kubelet.service crio.service

[Service]
# Need oneshot to delay kubelet
Type=oneshot
# Would prefer to do Restart=on-failure instead of this bash retry loop, but
# the version of systemd we have right now doesn't support it. It should be
# available in systemd v244 and higher.
ExecStart=/bin/bash -c " \
  until \
  /usr/bin/podman run --rm \
  --authfile /var/lib/kubelet/config.json \
  --net=host \
  --volume /etc/systemd/system:/etc/systemd/system:z \
  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b1e1542aa0934233fd1515872d2e1be4f1f1e5ce0c8d35860eb2847badd3c609 \
  node-ip \
  set --retry-on-failure; \
  do \
  sleep 5; \
  done; \
  systemctl daemon-reload"

[Install]
RequiredBy=kubelet.service
[root@openshift-master-0 ~]# # if I daemon-reload now, I also reload kubelet, so let's reset kubelet to the earlier state
[root@openshift-master-0 ~]# vi /etc/systemd/system/kubelet.service
kubelet.service           kubelet.service.d/        kubelet.service.requires/ 
[root@openshift-master-0 ~]# vi /etc/systemd/system/kubelet.service.d/20-
20-logging.conf  20-nodenet.conf  
[root@openshift-master-0 ~]# vi /etc/systemd/system/kubelet.service.d/20-nodenet.conf 
[root@openshift-master-0 ~]# vi /etc/systemd/system/kubelet.service.d/20-nodenet.conf 
[root@openshift-master-0 ~]# cat !$
cat /etc/systemd/system/kubelet.service.d/20-nodenet.conf
[Service]
Environment="KUBELET_NODE_IP=192.168.123.200" "KUBELET_NODE_IPS=192.168.123.200,fc00::abff:51ff:cf1f:6d6c"
[root@openshift-master-0 ~]# systemctl daemon-reload
[root@openshift-master-0 ~]# cat /etc/systemd/system/kubelet.service.d/20-nodenet.conf
[Service]
Environment="KUBELET_NODE_IP=192.168.123.200" "KUBELET_NODE_IPS=192.168.123.200,fc00::abff:51ff:cf1f:6d6c"
[root@openshift-master-0 ~]# ip a | grep kubelet
[root@openshift-master-0 ~]# syspps ^C
[root@openshift-master-0 ~]# ps aux | grep kubelet
root        2460  1.5  0.6 1939792 106736 ?      Ssl  23:34   0:03 kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --runtime-cgroups=/system.slice/crio.service --node-labels=node-role.kubernetes.io/master,node.openshift.io/os_id=rhcos --node-ip=192.168.123.200,fc00::abff:51ff:cf1f:6d6c --minimum-container-ttl-duration=6m0s --cloud-provider= --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec --register-with-taints=node-role.kubernetes.io/master=:NoSchedule --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9519ae9a0a3e262e311c7f12a08adb2568e29e1576d2c6c229fd5d355c551d4b --v=2
root        2768  0.0  0.0  12792  1080 pts/0    S+   23:38   0:00 grep --color=auto kubelet
[root@openshift-master-0 ~]# systemctl restart kubelet
[root@openshift-master-0 ~]# ps aux | grep kubelet
root        2969  2.8  0.6 1939792 100432 ?      Ssl  23:38   0:00 kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --runtime-cgroups=/system.slice/crio.service --node-labels=node-role.kubernetes.io/master,node.openshift.io/os_id=rhcos --node-ip=192.168.123.200,fc00::cb09:6043:4da9:239f --minimum-container-ttl-duration=6m0s --cloud-provider= --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec --register-with-taints=node-role.kubernetes.io/master=:NoSchedule --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9519ae9a0a3e262e311c7f12a08adb2568e29e1576d2c6c229fd5d355c551d4b --v=2
root        3038  0.0  0.0  12792  1084 pts/0    S+   23:38   0:00 grep --color=auto kubelet
[root@openshift-master-0 ~]# ip -6 a ls dev br-ex
5: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN group default qlen 1000
    inet6 fc00::cb09:6043:4da9:239f/64 scope global dynamic noprefixroute 
       valid_lft 86354sec preferred_lft 14354sec
    inet6 fe80::de43:21c0:c08b:fbc7/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
~~~

Comment 4 Andreas Karis 2021-02-01 00:11:10 UTC
Workaround in my lab:
~~~
# workaround for https://bugzilla.redhat.com/show_bug.cgi?id=1922812
if $IPV6 ; then
mkdir -p /root/fake-root-master/etc/systemd/system/nodeip-configuration.service.d/
mkdir -p /root/fake-root-worker/etc/systemd/system/nodeip-configuration.service.d/
cat << 'EOF' > /root/fake-root-master/etc/systemd/system/nodeip-configuration.service.d/10-execstart.conf
[Service]
ExecStart=
ExecStart=/bin/bash -c " \
  until \
  /usr/bin/podman run --rm \
  --authfile /var/lib/kubelet/config.json \
  --net=host \
  --volume /etc/systemd/system:/etc/systemd/system:z \
  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b1e1542aa0934233fd1515872d2e1be4f1f1e5ce0c8d35860eb2847badd3c609 \
  node-ip \
  set --retry-on-failure; \
  do \
  sleep 5; \
  done; \
  systemctl daemon-reload"
EOF
cat << 'EOF' > /root/fake-root-worker/etc/systemd/system/nodeip-configuration.service.d/10-execstart.conf
[Service]
ExecStart=
ExecStart=/bin/bash -c " \
  until \
  /usr/bin/podman run --rm \
  --authfile /var/lib/kubelet/config.json \
  --net=host \
  --volume /etc/systemd/system:/etc/systemd/system:z \
  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b1e1542aa0934233fd1515872d2e1be4f1f1e5ce0c8d35860eb2847badd3c609 \
  node-ip \
  set --retry-on-failure; \
  do \
  sleep 5; \
  done; \
  systemctl daemon-reload"
EOF
fi

for type in bootstrap master worker ; do
        filetranspiler/filetranspile -f /root/fake-root-${type}/ -i openshift-install/${type}.ign > /root/openshift-install/${type}.transpiled.ign
        # cat /root/openshift-install/${type}.transpiled.ign | jq '.ignition.config.append[0].source = "https://192.168.123.10:22623/config/'${type}'"' | tee /root/openshift-install/${type}.transpiled.jq.ign
done

echo "Copying ignition config files"
for type in bootstrap master worker; do
        \cp /root/openshift-install/${type}.transpiled.ign /httpboot/openshift-${type}/${type}.ign
        chmod +r /httpboot/openshift-${type}/${type}.ign
done
~~~

Comment 5 Andreas Karis 2021-03-17 21:07:40 UTC
Just to illustrate this further:

Let's say I update /etc/systemd/system/kubelet.service.d/20-nodenet.conf manually, to simulate a value that for whatever reason was not set correctly by the initial configuration service (see my earlier comments, e.g. because of IPv6 stable-privacy).

In this environment, 192.168.123.221 is the correct IP. I set the wrong one on purpose with:

~~~
[root@openshift-worker-1 ~]# cat  /etc/systemd/system/kubelet.service.d/20-nodenet.conf
[Service]
Environment="KUBELET_NODE_IP=192.168.123.222" "KUBELET_NODE_IPS=192.168.123.222"
[root@openshift-worker-1 ~]# systemctl daemon-reload
[root@openshift-worker-1 ~]# systemctl restart kubelet
[root@openshift-worker-1 ~]# cat  /etc/systemd/system/kubelet.service.d/20-nodenet.conf
[Service]
Environment="KUBELET_NODE_IP=192.168.123.222" "KUBELET_NODE_IPS=192.168.123.222"
[root@openshift-worker-1 ~]# reboot
Connection to openshift-worker-1.example.com closed by remote host.
Connection to openshift-worker-1.example.com closed.
~~~


After reboot:
~~~
[root@openshift-jumpserver-0 ~]# ssh core@openshift-worker-1.example.com
Red Hat Enterprise Linux CoreOS 47.83.202103051045-0
  Part of OpenShift 4.7, RHCOS is a Kubernetes native operating system
  managed by the Machine Config Operator (`clusteroperator/machine-config`).

WARNING: Direct SSH access to machines is not recommended; instead,
make configuration changes via `machineconfig` objects:
  https://docs.openshift.com/container-platform/4.7/architecture/architecture-rhcos.html

---
Last login: Wed Mar 17 20:29:40 2021 from 192.168.123.1
[systemd]
Failed Units: 1
  NetworkManager-wait-online.service
[core@openshift-worker-1 ~]$ sudo -i
[systemd]
Failed Units: 1
  NetworkManager-wait-online.service
[root@openshift-worker-1 ~]# ps aux | grep kubel
root        4526  2.2  0.0 2752388 93952 ?       Ssl  21:01   0:00 kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --runtime-cgroups=/system.slice/crio.service --node-labels=node-role.kubernetes.io/worker,node.openshift.io/os_id=rhcos --node-ip=192.168.123.222 --address=192.168.123.222 --minimum-container-ttl-duration=6m0s --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec --cloud-provider= --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7b8e2e2857d8ac3499c9eb4e449cc3296409f1da21aa21d0140134d611e65b84 --v=2
root        4760  0.0  0.0  12792  1072 pts/0    S+   21:02   0:00 grep --color=auto kubel
[root@openshift-worker-1 ~]# cat  /etc/systemd/system/kubelet.service.d/20-nodenet.conf
[Service]
Environment="KUBELET_NODE_IP=192.168.123.221" "KUBELET_NODE_IPS=192.168.123.221"
~~~

Due to the missing daemon-reload, --node-ip is still wrong even though the drop-in now contains the correct address. So we can now either run systemctl daemon-reload ; systemctl restart kubelet, or simply reboot the node.
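
The manual remediation can be sketched as a small script. This is an illustration only, not the fix that shipped: reload_if_needed and the SYSTEMCTL override variable are hypothetical names, and the sketch assumes a systemd version whose systemctl show supports --value and reports the NeedDaemonReload property for the unit.

```shell
#!/usr/bin/env bash
# Hypothetical remediation helper (not from the original report).
# SYSTEMCTL can be overridden, e.g. to dry-run against a stub.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Reload systemd and restart the unit in $1, but only when systemd reports
# that the unit file or its drop-ins changed on disk since they were loaded.
reload_if_needed() {
    local unit="$1"
    if [ "$("$SYSTEMCTL" show --value -p NeedDaemonReload "$unit")" = "yes" ]; then
        "$SYSTEMCTL" daemon-reload
        "$SYSTEMCTL" restart "$unit"
        echo "reloaded"
    else
        echo "unchanged"
    fi
}
```

Calling reload_if_needed kubelet.service as root on an affected node would then perform the same two steps as the manual workaround, and do nothing when the loaded unit already matches what is on disk.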

Comment 6 Dan Winship 2021-07-15 14:35:34 UTC
Belatedly noticing this bug; this is fixed as of 4.7.8.

*** This bug has been marked as a duplicate of bug 1944394 ***

