Tested with OCP 4.7 RC, but this certainly also happens in earlier versions. Baremetal UPI with the IPv6/IPv4 dual-stack feature. I know the feature is experimental, but bear with me: the actual bug has nothing to do with the feature itself, but with a missing daemon-reload between nodeip-configuration.service and kubelet.service.

I found this in my environment because addr-gen-mode (https://developer.gnome.org/NetworkManager/stable/settings-ipv6.html) is set to stable-privacy by default, and between 2 reboots my IPv6 address changed. Look at the logs [0].

What's happening is that on first boot, the node has IPv6 address x, and that IP is set via nodeip-configuration.service in /etc/systemd/system/kubelet.service.d/20-nodenet.conf. On the next boot, the IPv6 address changes (for whatever reason related to stable-privacy). nodeip-configuration.service runs and sets the correct IP in /etc/systemd/system/kubelet.service.d/20-nodenet.conf, but kubelet still starts with the old IPv6 address. That's because a daemon-reload is missing.
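A quick way to see the mismatch on an affected node is to compare the node IPs recorded in the drop-in against the --node-ip flag kubelet is actually running with. Here is a minimal, self-contained sketch of the drop-in parsing step; the temp file stands in for the real /etc/systemd/system/kubelet.service.d/20-nodenet.conf, and its content is a sample modeled on what nodeip-configuration.service writes, not taken from a live node:

```shell
#!/usr/bin/env bash
# Sketch: extract KUBELET_NODE_IPS from a kubelet drop-in. On a real node you
# would point "dropin" at /etc/systemd/system/kubelet.service.d/20-nodenet.conf
# and compare the result against `ps aux | grep node-ip`.
set -euo pipefail

dropin=$(mktemp)
cat > "$dropin" <<'EOF'
[Service]
Environment="KUBELET_NODE_IP=192.168.123.200" "KUBELET_NODE_IPS=192.168.123.200,fc00::5c77:e4fe:8950:c79c"
EOF

# Pull the comma-separated IP list out of the Environment= line.
node_ips=$(sed -n 's/.*KUBELET_NODE_IPS=\([^"]*\)".*/\1/p' "$dropin")
echo "$node_ips"
rm -f "$dropin"
```

If the printed list differs from what `ps` shows for the running kubelet, you are in the situation described here.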
----------------------------------------------

[0]
~~~
[root@openshift-master-0 ~]# journalctl | egrep 'fc00|reboot' -i
Jan 31 15:54:50 localhost kernel: BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
Jan 31 15:54:50 localhost kernel: BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
Jan 31 15:54:50 localhost kernel: BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
Jan 31 15:54:50 localhost kernel: 0 base 0000C0000000 mask 3FFFC0000000 uncachable
Jan 31 15:54:50 localhost kernel: PM: Registered nosave memory: [mem 0xfeffc000-0xfeffffff]
Jan 31 15:54:50 localhost kernel: PM: Registered nosave memory: [mem 0xfffc0000-0xffffffff]
Jan 31 15:54:50 localhost kernel: pci 0000:00:02.0: reg 0x10: [mem 0xfc000000-0xfdffffff pref]
Jan 31 15:54:50 localhost kernel: e820: reserve RAM buffer [mem 0x0009fc00-0x0009ffff]
Jan 31 15:55:24 openshift-master-0 bash[1737]: time="2021-01-31T15:55:24Z" level=debug msg="retrieved Address map map[0xc0002827e0:[127.0.0.1/8 lo ::1/128] 0xc000282900:[192.168.123.200/24 ens3 fc00::5c77:e4fe:8950:c79c/64]]"
Jan 31 15:55:24 openshift-master-0 bash[1737]: time="2021-01-31T15:55:24Z" level=debug msg="Ignoring filtered route {Ifindex: 2 Dst: fc00::/64 Src: <nil> Gw: <nil> Flags: [] Table: 254}"
Jan 31 15:55:24 openshift-master-0 bash[1737]: time="2021-01-31T15:55:24Z" level=debug msg="Address fc00::5c77:e4fe:8950:c79c/64 is on interface ens3 with default route"
Jan 31 15:55:24 openshift-master-0 bash[1737]: time="2021-01-31T15:55:24Z" level=info msg="Chosen Node IPs: [192.168.123.200 fc00::5c77:e4fe:8950:c79c]"
Jan 31 15:55:24 openshift-master-0 bash[1737]: time="2021-01-31T15:55:24Z" level=info msg="Writing Kubelet service override with content [Service]\nEnvironment=\"KUBELET_NODE_IP=192.168.123.200\" \"KUBELET_NODE_IPS=192.168.123.200,fc00::5c77:e4fe:8950:c79c\"\n"
Jan 31 15:55:45 openshift-master-0 machine-config-daemon[2156]: I0131 15:55:45.164296 2156 update.go:1902] Rebooting node
Jan 31 15:55:45 openshift-master-0 root[2274]: machine-config-daemon[2156]: Rebooting node
Jan 31 15:55:45 openshift-master-0 machine-config-daemon[2156]: I0131 15:55:45.168312 2156 update.go:1902] initiating reboot: Completing firstboot provisioning to rendered-master-456b4a39d3924f6f623a979a90c0e5a0
Jan 31 15:55:45 openshift-master-0 root[2275]: machine-config-daemon[2156]: initiating reboot: Completing firstboot provisioning to rendered-master-456b4a39d3924f6f623a979a90c0e5a0
Jan 31 15:55:45 openshift-master-0 systemd-logind[1539]: System is rebooting.
Jan 31 15:55:45 openshift-master-0 systemd[1]: machine-config-daemon-reboot.service: Succeeded.
Jan 31 15:55:45 openshift-master-0 systemd[1]: machine-config-daemon-reboot.service: Consumed 14ms CPU time
Jan 31 15:55:47 openshift-master-0 systemd[1]: Starting Reboot...
-- Reboot --
Jan 31 15:56:27 localhost kernel: BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
Jan 31 15:56:27 localhost kernel: BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
Jan 31 15:56:27 localhost kernel: BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
Jan 31 15:56:27 localhost kernel: 0 base 0000C0000000 mask 3FFFC0000000 uncachable
Jan 31 15:56:27 localhost kernel: PM: Registered nosave memory: [mem 0xfeffc000-0xfeffffff]
Jan 31 15:56:27 localhost kernel: PM: Registered nosave memory: [mem 0xfffc0000-0xffffffff]
Jan 31 15:56:27 localhost kernel: pci 0000:00:02.0: reg 0x10: [mem 0xfc000000-0xfdffffff pref]
Jan 31 15:56:27 localhost kernel: e820: reserve RAM buffer [mem 0x0009fc00-0x0009ffff]
Jan 31 15:56:46 openshift-master-0 bash[1652]: time="2021-01-31T15:56:46Z" level=debug msg="retrieved Address map map[0xc0002906c0:[127.0.0.1/8 lo ::1/128] 0xc000290b40:[192.168.123.200/24 br-ex fc00::fc1c:1e22:b052:ef48/64]]"
Jan 31 15:56:46 openshift-master-0 bash[1652]: time="2021-01-31T15:56:46Z" level=debug msg="Ignoring filtered route {Ifindex: 5 Dst: fc00::/64 Src: <nil> Gw: <nil> Flags: [] Table: 254}"
Jan 31 15:56:46 openshift-master-0 bash[1652]: time="2021-01-31T15:56:46Z" level=debug msg="Address fc00::fc1c:1e22:b052:ef48/64 is on interface br-ex with default route"
Jan 31 15:56:46 openshift-master-0 bash[1652]: time="2021-01-31T15:56:46Z" level=info msg="Chosen Node IPs: [192.168.123.200 fc00::fc1c:1e22:b052:ef48]"
Jan 31 15:56:46 openshift-master-0 bash[1652]: time="2021-01-31T15:56:46Z" level=info msg="Writing Kubelet service override with content [Service]\nEnvironment=\"KUBELET_NODE_IP=192.168.123.200\" \"KUBELET_NODE_IPS=192.168.123.200,fc00::fc1c:1e22:b052:ef48\"\n"
Jan 31 15:56:46 openshift-master-0 hyperkube[1838]: I0131 15:56:46.668044 1838 flags.go:59] FLAG: --node-ip="192.168.123.200,fc00::5c77:e4fe:8950:c79c"
Jan 31 15:56:57 openshift-master-0 hyperkube[1838]: E0131 15:56:57.845567 1838 kubelet_node_status.go:586] Failed to set some node status fields: failed to validate secondaryNodeIP: node IP: "fc00::5c77:e4fe:8950:c79c" not found in the host's network interfaces
Jan 31 15:56:58 openshift-master-0 hyperkube[1838]: E0131 15:56:58.054892 1838 kubelet_node_status.go:586] Failed to set some node status fields: failed to validate secondaryNodeIP: node IP: "fc00::5c77:e4fe:8950:c79c" not found in the host's network interfaces
Jan 31 15:56:58 openshift-master-0 hyperkube[1838]: E0131 15:56:58.463134 1838 kubelet_node_status.go:586] Failed to set some node status fields: failed to validate secondaryNodeIP: node IP: "fc00::5c77:e4fe:8950:c79c" not found in the host's network interfaces
~~~
I can reproduce this easily. I set the wrong IP manually in the file, then run systemctl daemon-reload and restart kubelet. The IPv6 address on the interface is fc00::fc1c:1e22:b052:ef48, and the setting should be corrected by nodeip-configuration.service right before kubelet starts:

~~~
[root@openshift-master-0 ~]# cat /etc/systemd/system/kubelet.service.d/20-nodenet.conf
[Service]
Environment="KUBELET_NODE_IP=192.168.123.200" "KUBELET_NODE_IPS=192.168.123.200,fc00::fc1c:1e22:b052:ef49"
[root@openshift-master-0 ~]# systemctl daemon-reload
[root@openshift-master-0 ~]# systemctl restart kubelet
[root@openshift-master-0 ~]# ps aux | grep kubelet | grep node-ip
root 79896 20.0 0.8 2014484 142008 ? Ssl 17:08 0:02 kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --runtime-cgroups=/system.slice/crio.service --node-labels=node-role.kubernetes.io/master,node.openshift.io/os_id=rhcos --node-ip=192.168.123.200,fc00::fc1c:1e22:b052:ef49 --minimum-container-ttl-duration=6m0s --cloud-provider= --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec --register-with-taints=node-role.kubernetes.io/master=:NoSchedule --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9519ae9a0a3e262e311c7f12a08adb2568e29e1576d2c6c229fd5d355c551d4b --v=2
~~~

Before kubelet started, nodeip-configuration.service ran and set the correct address in the config file:

~~~
[root@openshift-master-0 ~]# cat /etc/systemd/system/kubelet.service.d/20-nodenet.conf
[Service]
Environment="KUBELET_NODE_IP=192.168.123.200" "KUBELET_NODE_IPS=192.168.123.200,fc00::fc1c:1e22:b052:ef48"
~~~

But as you can see above, kubelet did not take it.
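The validation failure kubelet logs in this state can also be checked for mechanically. A minimal sketch; the two lists below are hard-coded samples from this reproducer, whereas on a real node they would come from the drop-in and from `ip -o a`:

```shell
#!/usr/bin/env bash
# Sketch: flag entries in kubelet's --node-ip list that are not actually
# configured on the host. Sample values mirror the reproducer above.
set -euo pipefail

node_ips="192.168.123.200,fc00::fc1c:1e22:b052:ef49"  # what kubelet was started with
host_ips="192.168.123.200 fc00::fc1c:1e22:b052:ef48"  # what the interfaces actually carry

stale=""
IFS=',' read -ra ips <<< "$node_ips"
for ip in "${ips[@]}"; do
  case " $host_ips " in
    *" $ip "*) ;;                 # address is present on the host
    *) stale="$stale $ip" ;;      # kubelet holds an address the host no longer has
  esac
done
echo "stale:$stale"
```

Any address reported as stale will produce exactly the "not found in the host's network interfaces" errors shown in the journal above.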
Indeed, I can restart kubelet all I want; the service configuration file changed, but systemctl daemon-reload was not run, so kubelet will never come up with the correct IP:

~~~
[root@openshift-master-0 ~]# systemctl restart kubelet
Warning: The unit file, source configuration file or drop-ins of kubelet.service changed on disk. Run 'systemctl daemon-reload' to reload units.
[root@openshift-master-0 ~]# ps aux | grep kubelet | grep node-ip
root 82501 14.2 0.9 2156380 148156 ? Ssl 17:10 0:05 kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --runtime-cgroups=/system.slice/crio.service --node-labels=node-role.kubernetes.io/master,node.openshift.io/os_id=rhcos --node-ip=192.168.123.200,fc00::fc1c:1e22:b052:ef49 --minimum-container-ttl-duration=6m0s --cloud-provider= --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec --register-with-taints=node-role.kubernetes.io/master=:NoSchedule --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9519ae9a0a3e262e311c7f12a08adb2568e29e1576d2c6c229fd5d355c551d4b --v=2
~~~

And here's how to fix this manually:

~~~
[root@openshift-master-0 ~]# systemctl daemon-reload
[root@openshift-master-0 ~]# systemctl restart kubelet
[root@openshift-master-0 ~]# ps aux | grep kubelet | grep node-ip
root 91904 18.6 0.8 1940752 140464 ? Ssl 17:19 0:03 kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --runtime-cgroups=/system.slice/crio.service --node-labels=node-role.kubernetes.io/master,node.openshift.io/os_id=rhcos --node-ip=192.168.123.200,fc00::fc1c:1e22:b052:ef48 --minimum-container-ttl-duration=6m0s --cloud-provider= --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec --register-with-taints=node-role.kubernetes.io/master=:NoSchedule --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9519ae9a0a3e262e311c7f12a08adb2568e29e1576d2c6c229fd5d355c551d4b --v=2
~~~
~~~
[root@openshift-master-0 ~]# cat /etc/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
Wants=rpc-statd.service network-online.target crio.service
After=network-online.target crio.service

[Service]
Type=notify
ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests
ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state
EnvironmentFile=/etc/os-release
EnvironmentFile=-/etc/kubernetes/kubelet-workaround
EnvironmentFile=-/etc/kubernetes/kubelet-env

ExecStart=/usr/bin/hyperkube \
    kubelet \
      --config=/etc/kubernetes/kubelet.conf \
      --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \
      --kubeconfig=/var/lib/kubelet/kubeconfig \
      --container-runtime=remote \
      --container-runtime-endpoint=/var/run/crio/crio.sock \
      --runtime-cgroups=/system.slice/crio.service \
      --node-labels=node-role.kubernetes.io/master,node.openshift.io/os_id=${ID} \
      --node-ip=${KUBELET_NODE_IPS} \
      --minimum-container-ttl-duration=6m0s \
      --cloud-provider= \
      --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec \
      \
      --register-with-taints=node-role.kubernetes.io/master=:NoSchedule \
      --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9519ae9a0a3e262e311c7f12a08adb2568e29e1576d2c6c229fd5d355c551d4b \
      --v=${KUBELET_LOG_LEVEL}

Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
~~~

~~~
[root@openshift-master-0 ~]# cat /etc/systemd/system/kubelet.service.d/20-nodenet.conf
[Service]
Environment="KUBELET_NODE_IP=192.168.123.200" "KUBELET_NODE_IPS=192.168.123.200,fc00::fc1c:1e22:b052:ef48"
[root@openshift-master-0 ~]# cat /etc/systemd/system/nodeip-configuration.service
[Unit]
Description=Writes IP address configuration so that kubelet and crio services select a valid node IP
Wants=network-online.target
After=network-online.target ignition-firstboot-complete.service
Before=kubelet.service crio.service

[Service]
# Need oneshot to delay kubelet
Type=oneshot
# Would prefer to do Restart=on-failure instead of this bash retry loop, but
# the version of systemd we have right now doesn't support it. It should be
# available in systemd v244 and higher.
ExecStart=/bin/bash -c " \
  until \
  /usr/bin/podman run --rm \
  --authfile /var/lib/kubelet/config.json \
  --net=host \
  --volume /etc/systemd/system:/etc/systemd/system:z \
  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b1e1542aa0934233fd1515872d2e1be4f1f1e5ce0c8d35860eb2847badd3c609 \
  node-ip \
  set --retry-on-failure; \
  do \
  sleep 5; \
  done"

[Install]
RequiredBy=kubelet.service
~~~
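Given the units above, the gap is that nodeip-configuration.service rewrites a kubelet drop-in after systemd has already parsed the unit files, and nothing triggers a re-read before kubelet starts. One conceivable shape for closing that gap is a drop-in like the following; this is only an illustration of the idea, not the fix that eventually shipped (the workaround further below appends the daemon-reload to the ExecStart retry loop instead, which achieves the same effect):

```ini
# Hypothetical drop-in, e.g.
# /etc/systemd/system/nodeip-configuration.service.d/10-reload.conf
# (illustrative sketch only)
[Service]
ExecStartPost=/usr/bin/systemctl daemon-reload
```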
Here's the fix:

~~~
[root@openshift-master-0 ~]# ps aux | grep kubelet
root 1856 1.7 0.6 2014036 103448 ? Ssl 23:31 0:03 kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --runtime-cgroups=/system.slice/crio.service --node-labels=node-role.kubernetes.io/master,node.openshift.io/os_id=rhcos --node-ip=192.168.123.200,fc00::abff:51ff:cf1f:6d6c --minimum-container-ttl-duration=6m0s --cloud-provider= --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec --register-with-taints=node-role.kubernetes.io/master=:NoSchedule --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9519ae9a0a3e262e311c7f12a08adb2568e29e1576d2c6c229fd5d355c551d4b --v=2
root 2306 0.0 0.0 12792 1088 pts/0 S+ 23:34 0:00 grep --color=auto kubelet
[root@openshift-master-0 ~]# ip -6 a ls dev br-ex
5: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN group default qlen 1000
    inet6 fc00::cb09:6043:4da9:239f/64 scope global dynamic noprefixroute
       valid_lft 86349sec preferred_lft 14349sec
    inet6 fe80::de43:21c0:c08b:fbc7/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
[root@openshift-master-0 ~]# systemctl restart kubelet
Warning: The unit file, source configuration file or drop-ins of kubelet.service changed on disk. Run 'systemctl daemon-reload' to reload units.
[root@openshift-master-0 ~]# vi /etc/systemd/system/nodeip-configuration.service
[root@openshift-master-0 ~]# cat !$
cat /etc/systemd/system/nodeip-configuration.service
[Unit]
Description=Writes IP address configuration so that kubelet and crio services select a valid node IP
Wants=network-online.target
After=network-online.target ignition-firstboot-complete.service
Before=kubelet.service crio.service

[Service]
# Need oneshot to delay kubelet
Type=oneshot
# Would prefer to do Restart=on-failure instead of this bash retry loop, but
# the version of systemd we have right now doesn't support it. It should be
# available in systemd v244 and higher.
ExecStart=/bin/bash -c " \
  until \
  /usr/bin/podman run --rm \
  --authfile /var/lib/kubelet/config.json \
  --net=host \
  --volume /etc/systemd/system:/etc/systemd/system:z \
  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b1e1542aa0934233fd1515872d2e1be4f1f1e5ce0c8d35860eb2847badd3c609 \
  node-ip \
  set --retry-on-failure; \
  do \
  sleep 5; \
  done; \
  systemctl daemon-reload"

[Install]
RequiredBy=kubelet.service
[root@openshift-master-0 ~]# # if I daemon-reload now, I also reload kubelet, so let's reset kubelet to the earlier state
[root@openshift-master-0 ~]# vi /etc/systemd/system/kubelet.service.d/20-nodenet.conf
[root@openshift-master-0 ~]# cat !$
cat /etc/systemd/system/kubelet.service.d/20-nodenet.conf
[Service]
Environment="KUBELET_NODE_IP=192.168.123.200" "KUBELET_NODE_IPS=192.168.123.200,fc00::abff:51ff:cf1f:6d6c"
[root@openshift-master-0 ~]# systemctl daemon-reload
[root@openshift-master-0 ~]# cat /etc/systemd/system/kubelet.service.d/20-nodenet.conf
[Service]
Environment="KUBELET_NODE_IP=192.168.123.200" "KUBELET_NODE_IPS=192.168.123.200,fc00::abff:51ff:cf1f:6d6c"
[root@openshift-master-0 ~]# ps aux | grep kubelet
root 2460 1.5 0.6 1939792 106736 ? Ssl 23:34 0:03 kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --runtime-cgroups=/system.slice/crio.service --node-labels=node-role.kubernetes.io/master,node.openshift.io/os_id=rhcos --node-ip=192.168.123.200,fc00::abff:51ff:cf1f:6d6c --minimum-container-ttl-duration=6m0s --cloud-provider= --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec --register-with-taints=node-role.kubernetes.io/master=:NoSchedule --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9519ae9a0a3e262e311c7f12a08adb2568e29e1576d2c6c229fd5d355c551d4b --v=2
root 2768 0.0 0.0 12792 1080 pts/0 S+ 23:38 0:00 grep --color=auto kubelet
[root@openshift-master-0 ~]# systemctl restart kubelet
[root@openshift-master-0 ~]# ps aux | grep kubelet
root 2969 2.8 0.6 1939792 100432 ? Ssl 23:38 0:00 kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --runtime-cgroups=/system.slice/crio.service --node-labels=node-role.kubernetes.io/master,node.openshift.io/os_id=rhcos --node-ip=192.168.123.200,fc00::cb09:6043:4da9:239f --minimum-container-ttl-duration=6m0s --cloud-provider= --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec --register-with-taints=node-role.kubernetes.io/master=:NoSchedule --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9519ae9a0a3e262e311c7f12a08adb2568e29e1576d2c6c229fd5d355c551d4b --v=2
root 3038 0.0 0.0 12792 1084 pts/0 S+ 23:38 0:00 grep --color=auto kubelet
[root@openshift-master-0 ~]# ip -6 a ls dev br-ex
5: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN group default qlen 1000
    inet6 fc00::cb09:6043:4da9:239f/64 scope global dynamic noprefixroute
       valid_lft 86354sec preferred_lft 14354sec
    inet6 fe80::de43:21c0:c08b:fbc7/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
~~~
Workaround in my lab:

~~~
# workaround for https://bugzilla.redhat.com/show_bug.cgi?id=1922812
if $IPV6 ; then
  mkdir -p /root/fake-root-master/etc/systemd/system/nodeip-configuration.service.d/
  mkdir -p /root/fake-root-worker/etc/systemd/system/nodeip-configuration.service.d/
  cat << 'EOF' > /root/fake-root-master/etc/systemd/system/nodeip-configuration.service.d/10-execstart.conf
[Service]
ExecStart=
ExecStart=/bin/bash -c " \
  until \
  /usr/bin/podman run --rm \
  --authfile /var/lib/kubelet/config.json \
  --net=host \
  --volume /etc/systemd/system:/etc/systemd/system:z \
  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b1e1542aa0934233fd1515872d2e1be4f1f1e5ce0c8d35860eb2847badd3c609 \
  node-ip \
  set --retry-on-failure; \
  do \
  sleep 5; \
  done; \
  systemctl daemon-reload"
EOF
  cat << 'EOF' > /root/fake-root-worker/etc/systemd/system/nodeip-configuration.service.d/10-execstart.conf
[Service]
ExecStart=
ExecStart=/bin/bash -c " \
  until \
  /usr/bin/podman run --rm \
  --authfile /var/lib/kubelet/config.json \
  --net=host \
  --volume /etc/systemd/system:/etc/systemd/system:z \
  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b1e1542aa0934233fd1515872d2e1be4f1f1e5ce0c8d35860eb2847badd3c609 \
  node-ip \
  set --retry-on-failure; \
  do \
  sleep 5; \
  done; \
  systemctl daemon-reload"
EOF
fi

for type in bootstrap master worker ; do
  filetranspiler/filetranspile -f /root/fake-root-${type}/ -i openshift-install/${type}.ign > /root/openshift-install/${type}.transpiled.ign
  # cat /root/openshift-install/${type}.transpiled.ign | jq '.ignition.config.append[0].source = "https://192.168.123.10:22623/config/'${type}'"' | tee /root/openshift-install/${type}.transpiled.jq.ign
done

echo "Copying ignition config files"
for type in bootstrap master worker; do
  \cp /root/openshift-install/${type}.transpiled.ign /httpboot/openshift-${type}/${type}.ign
  chmod +r /httpboot/openshift-${type}/${type}.ign
done
~~~
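When injecting an override like this through filetranspiler, it's worth sanity-checking that the drop-in actually landed in the transpiled ignition before serving it. A stand-in for that check; the JSON below is a tiny hand-made sample, not a real transpiled ignition file:

```shell
#!/usr/bin/env bash
# Sketch: verify the nodeip-configuration override appears as a storage file
# entry in the transpiled ignition. The sample JSON is illustrative only.
set -euo pipefail

ign=$(mktemp)
cat > "$ign" <<'EOF'
{"storage":{"files":[{"path":"/etc/systemd/system/nodeip-configuration.service.d/10-execstart.conf"}]}}
EOF

if grep -q 'nodeip-configuration.service.d/10-execstart.conf' "$ign"; then
  present=yes
else
  present=no
fi
echo "override present: $present"
rm -f "$ign"
```

On the real files you would grep each /root/openshift-install/${type}.transpiled.ign the same way.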
Just to illustrate this further: let's say I update /etc/systemd/system/kubelet.service.d/20-nodenet.conf manually (because, for whatever reason, see my earlier comments, it has not been set correctly by the initial configuration service, e.g. because of IPv6 stable-privacy). In this environment, 192.168.123.221 is the correct IP. I set the wrong one on purpose with:

~~~
[root@openshift-worker-1 ~]# cat /etc/systemd/system/kubelet.service.d/20-nodenet.conf
[Service]
Environment="KUBELET_NODE_IP=192.168.123.222" "KUBELET_NODE_IPS=192.168.123.222"
[root@openshift-worker-1 ~]# systemctl daemon-reload
[root@openshift-worker-1 ~]# systemctl restart kubelet
[root@openshift-worker-1 ~]# cat /etc/systemd/system/kubelet.service.d/20-nodenet.conf
[Service]
Environment="KUBELET_NODE_IP=192.168.123.222" "KUBELET_NODE_IPS=192.168.123.222"
[root@openshift-worker-1 ~]# reboot
Connection to openshift-worker-1.example.com closed by remote host.
Connection to openshift-worker-1.example.com closed.
~~~

After reboot:

~~~
[root@openshift-jumpserver-0 ~]# ssh core.com
Red Hat Enterprise Linux CoreOS 47.83.202103051045-0
  Part of OpenShift 4.7, RHCOS is a Kubernetes native operating system
  managed by the Machine Config Operator (`clusteroperator/machine-config`).

WARNING: Direct SSH access to machines is not recommended; instead,
make configuration changes via `machineconfig` objects:
  https://docs.openshift.com/container-platform/4.7/architecture/architecture-rhcos.html

---
Last login: Wed Mar 17 20:29:40 2021 from 192.168.123.1
[systemd]
Failed Units: 1
  NetworkManager-wait-online.service
[core@openshift-worker-1 ~]$ sudo -i
[systemd]
Failed Units: 1
  NetworkManager-wait-online.service
[root@openshift-worker-1 ~]# ps aux | grep kubel
root 4526 2.2 0.0 2752388 93952 ? Ssl 21:01 0:00 kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --runtime-cgroups=/system.slice/crio.service --node-labels=node-role.kubernetes.io/worker,node.openshift.io/os_id=rhcos --node-ip=192.168.123.222 --address=192.168.123.222 --minimum-container-ttl-duration=6m0s --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec --cloud-provider= --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7b8e2e2857d8ac3499c9eb4e449cc3296409f1da21aa21d0140134d611e65b84 --v=2
root 4760 0.0 0.0 12792 1072 pts/0 S+ 21:02 0:00 grep --color=auto kubel
[root@openshift-worker-1 ~]# cat /etc/systemd/system/kubelet.service.d/20-nodenet.conf
[Service]
Environment="KUBELET_NODE_IP=192.168.123.221" "KUBELET_NODE_IPS=192.168.123.221"
~~~

Due to the missing daemon-reload, --node-ip is still not set right. So we can either run systemctl daemon-reload ; systemctl restart kubelet, or simply reboot the node once more.
Belatedly noticing this bug; this is fixed as of 4.7.8. *** This bug has been marked as a duplicate of bug 1944394 ***