Bug 1819484 - nodeip-configuration 'Failed to find suitable node ip'
Summary: nodeip-configuration 'Failed to find suitable node ip'
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.5.0
Assignee: Antoni Segura Puimedon
QA Contact: Victor Voronkov
URL:
Whiteboard:
Depends On:
Blocks: 1771572 1817594
Reported: 2020-04-01 00:07 UTC by Antoni Segura Puimedon
Modified: 2020-07-13 17:24 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The systemd service that detects the control plane IP and configures Kubelet and CRI-O with it could run before any control plane IP was configured.
Consequence: Kubelet and CRI-O were not configured, which caused problems connecting to them.
Fix: The service retries on failure.
Result: The service keeps retrying until the interface has a control plane IP configured.
Clone Of: 1817594
Environment:
Last Closed: 2020-07-13 17:24:27 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift machine-config-operator pull 1601 | 0 | None | closed | Bug 1819484: Nodeip retry on failure | 2021-02-19 08:52:59 UTC
Red Hat Product Errata | RHBA-2020:2409 | 0 | None | None | None | 2020-07-13 17:24:49 UTC

Comment 4 Victor Voronkov 2020-04-07 09:21:27 UTC
To verify the fix on OCP 4.5.0-0.nightly-2020-04-04-073956,
we modified the systemd unit (commenting out Wants=network-online.target) so that the nodeip service can start before network-online.target is reached:

sudo vi /etc/systemd/system/nodeip-configuration.service
[Unit]
Description=Writes IP address configuration so that kubelet and crio services select a valid node IP
# This only applies to VIP managing environments where the kubelet and crio IP
# address picking logic is flawed and may end up selecting an address from a
# different subnet or a deprecated address
#Wants=network-online.target
After=ignition-firstboot-complete.service
Before=kubelet.service crio.service

[Service]
# Need oneshot to delay kubelet
Type=oneshot
ExecStart=/usr/local/bin/nodeip-finder --retry-on-failure fd2e:6f44:5dd8::5

[Install]
WantedBy=multi-user.target
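
The ExecStart line above passes --retry-on-failure to nodeip-finder. The tool's source is not part of this bug, so the following is only a minimal Python sketch of what "retry on failure" could look like. The function name retry_until_found, the delay, and the max_attempts parameter are assumptions made for illustration, not the tool's actual interface.

```python
import time
from typing import Callable, Optional


def retry_until_found(
    find_ip: Callable[[], Optional[str]],
    delay_seconds: float = 1.0,          # assumed delay; the real tool may differ
    max_attempts: Optional[int] = None,  # None means retry indefinitely
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call find_ip() repeatedly until it returns an address.

    find_ip is a hypothetical callback that returns the node IP as a
    string, or None when no suitable address is configured yet.
    """
    attempt = 0
    while True:
        attempt += 1
        ip = find_ip()
        if ip is not None:
            return ip
        if max_attempts is not None and attempt >= max_attempts:
            raise RuntimeError("Failed to find suitable node ip")
        print("Failed to find suitable node ip. Retrying...")
        sleep(delay_seconds)
```

Because the unit is Type=oneshot and ordered Before=kubelet.service crio.service, a loop of this kind would delay both services until an address is found, rather than letting them start without a node IP.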



Then we rebooted the node and watched the log:

journalctl -u nodeip-configuration.service
-- Reboot --
Apr 07 08:33:16 localhost systemd[1]: Starting Writes IP address configuration so that kubelet and crio services select a valid node IP...
Apr 07 08:33:19 localhost nodeip-finder[1304]: Filtering out Address(127.0.0.1/8, dev=lo) due to it having host scope
Apr 07 08:33:19 localhost nodeip-finder[1304]: Filtering out Address(::1/128, dev=lo) due to it having host scope
Apr 07 08:33:19 localhost nodeip-finder[1304]: Failed to find suitable node ip. Retrying...
Apr 07 08:33:19 localhost nodeip-finder[1304]: Filtering out Address(127.0.0.1/8, dev=lo) due to it having host scope
Apr 07 08:33:19 localhost nodeip-finder[1304]: Filtering out Address(::1/128, dev=lo) due to it having host scope
Apr 07 08:33:19 localhost nodeip-finder[1304]: Is fd2e:6f44:5dd8::5 between fe80:: and fe80::ffff:ffff:ffff:ffff
Apr 07 08:33:19 localhost nodeip-finder[1304]: Failed to find suitable node ip. Retrying...
Apr 07 08:33:19 localhost nodeip-finder[1304]: Filtering out Address(127.0.0.1/8, dev=lo) due to it having host scope
Apr 07 08:33:19 localhost nodeip-finder[1304]: Filtering out Address(::1/128, dev=lo) due to it having host scope
Apr 07 08:33:19 localhost nodeip-finder[1304]: Is fd2e:6f44:5dd8::5 between fe80:: and fe80::ffff:ffff:ffff:ffff
Apr 07 08:33:19 localhost nodeip-finder[1304]: Failed to find suitable node ip. Retrying...
Apr 07 08:33:19 localhost nodeip-finder[1304]: Filtering out Address(127.0.0.1/8, dev=lo) due to it having host scope
Apr 07 08:33:19 localhost nodeip-finder[1304]: Filtering out Address(::1/128, dev=lo) due to it having host scope
Apr 07 08:33:19 localhost nodeip-finder[1304]: Checking V6Route(fd2e:6f44:5dd8::/64, dev=enp5s0) for Address(fd2e:6f44:5dd8::100/128, dev=enp5s0)
Apr 07 08:33:19 localhost nodeip-finder[1304]: Is fd2e:6f44:5dd8::5 between fe80:: and fe80::ffff:ffff:ffff:ffff
Apr 07 08:33:19 localhost nodeip-finder[1304]: Is fd2e:6f44:5dd8::5 between fd2e:6f44:5dd8:: and fd2e:6f44:5dd8:0:ffff:ffff:ffff:ffff
Apr 07 08:33:19 localhost nodeip-finder[1304]: Is fd2e:6f44:5dd8::5 between fd2e:6f44:5dd8::100 and fd2e:6f44:5dd8::100
Apr 07 08:33:19 localhost nodeip-finder[1304]: Is fd2e:6f44:5dd8::5 between fe80:: and fe80::ffff:ffff:ffff:ffff
Apr 07 08:33:19 localhost nodeip-finder[1304]: VIP Subnet fd2e:6f44:5dd8::/64
Apr 07 08:33:19 localhost nodeip-finder[1304]: Processing CustomAction for target
Apr 07 08:33:19 localhost nodeip-finder[1304]:   parser = 140099139248984
Apr 07 08:33:19 localhost nodeip-finder[1304]:   values = 'fd2e:6f44:5dd8::5'
Apr 07 08:33:19 localhost nodeip-finder[1304]:   option_string = None
Apr 07 08:33:19 localhost systemd[1]: Started Writes IP address configuration so that kubelet and crio services select a valid node IP.
Apr 07 08:33:19 localhost systemd[1]: nodeip-configuration.service: Consumed 162ms CPU time
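
The log above suggests the selection logic: loopback addresses are filtered out, and the VIP is then tested against each remaining subnet until one contains it. A rough sketch of that idea with Python's standard ipaddress module follows. The function pick_node_ip and its (address, subnet) input format are hypothetical, and the loopback check only approximates the "host scope" filter seen in the log.

```python
import ipaddress
from typing import Iterable, Optional, Tuple


def pick_node_ip(vip: str, candidates: Iterable[Tuple[str, str]]) -> Optional[str]:
    """Return the first candidate address whose subnet contains the VIP.

    candidates is a hypothetical list of (address, subnet CIDR) pairs,
    e.g. an interface address together with the route it belongs to.
    """
    vip_addr = ipaddress.ip_address(vip)
    for address, subnet in candidates:
        addr = ipaddress.ip_address(address)
        # Approximation of "Filtering out ... due to it having host scope".
        if addr.is_loopback:
            continue
        net = ipaddress.ip_network(subnet, strict=False)
        # Equivalent of "Is <vip> between <first> and <last>" in the log.
        if vip_addr in net and addr in net:
            return address
    return None


candidates = [
    ("127.0.0.1", "127.0.0.0/8"),
    ("::1", "::1/128"),
    ("fe80::1", "fe80::/64"),  # illustrative link-local address
    ("fd2e:6f44:5dd8::100", "fd2e:6f44:5dd8::/64"),
]
print(pick_node_ip("fd2e:6f44:5dd8::5", candidates))
```

With only the loopback and link-local entries present, the function returns None, which corresponds to the "Failed to find suitable node ip" case early in the boot.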



sudo systemctl status crio.service
Warning: The unit file, source configuration file or drop-ins of crio.service changed on disk. Run 'systemctl daemon-reload' to reload units.
● crio.service - Open Container Initiative Daemon
   Loaded: loaded (/usr/lib/systemd/system/crio.service; disabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/crio.service.d
           └─10-default-env.conf, 20-nodenet.conf, 20-stream-address.conf
   Active: active (running) since Tue 2020-04-07 08:33:49 UTC; 13min ago
     Docs: https://github.com/cri-o/cri-o
 Main PID: 1469 (crio)
    Tasks: 46
   Memory: 194.5M
      CPU: 1min 36.320s
   CGroup: /system.slice/crio.service
           └─1469 /usr/bin/crio --stream-address=fd2e:6f44:5dd8::100 --enable-metrics=true --metrics-port=9537


sudo systemctl status kubelet.service
Warning: The unit file, source configuration file or drop-ins of kubelet.service changed on disk. Run 'systemctl daemon-reload' to reload units.
● kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-default-env.conf, 20-nodenet.conf
   Active: active (running) since Tue 2020-04-07 08:33:49 UTC; 15min ago
  Process: 1909 ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state (code=exited, status=0/SUCCESS)
  Process: 1907 ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests (code=exited, status=0/SUCCESS)
 Main PID: 1911 (kubelet)
    Tasks: 75
   Memory: 255.6M
      CPU: 1min 48.590s
   CGroup: /system.slice/kubelet.service
           └─1911 kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfi> --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --node-labels=node-role.kubernetes.io/master,node.openshift.io/os_id=rhcos --node-ip=fd2e:6f44:5dd8::100 --address=fd2e:6f44:5dd8::100 --minimum-container-ttl-duration=6m0s --cloud-provider= --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec --v=3

Cluster state is good; all nodes are Ready.
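
The check done by eye on the status output above (crio's --stream-address and kubelet's --node-ip both sit in the VIP subnet fd2e:6f44:5dd8::/64) could also be scripted. This is a small sketch under the assumption that the process command line is available as a string; flag_value and in_subnet are hypothetical helper names.

```python
import ipaddress
import re
from typing import Optional


def flag_value(cmdline: str, flag: str) -> Optional[str]:
    """Return the value of --<flag>=<value> in a command line, or None."""
    match = re.search(r"(?:^|\s)--" + re.escape(flag) + r"=(\S+)", cmdline)
    return match.group(1) if match else None


def in_subnet(address: str, subnet: str) -> bool:
    """True if the address belongs to the given subnet (CIDR notation)."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(subnet)


# Shortened command lines taken from the status output above.
crio = "/usr/bin/crio --stream-address=fd2e:6f44:5dd8::100 --enable-metrics=true"
kubelet = "kubelet --node-ip=fd2e:6f44:5dd8::100 --address=fd2e:6f44:5dd8::100 --v=3"

vip_subnet = "fd2e:6f44:5dd8::/64"
for name, cmdline, flag in [("crio", crio, "stream-address"),
                            ("kubelet", kubelet, "node-ip")]:
    value = flag_value(cmdline, flag)
    ok = value is not None and in_subnet(value, vip_subnet)
    print(f"{name} --{flag}={value} in {vip_subnet}: {ok}")
```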

Comment 6 errata-xmlrpc 2020-07-13 17:24:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

