Bug 1819484

Summary: nodeip-configuration 'Failed to find suitable node ip'
Product: OpenShift Container Platform
Reporter: Antoni Segura Puimedon <asegurap>
Component: Machine Config Operator
Assignee: Antoni Segura Puimedon <asegurap>
Status: CLOSED ERRATA
QA Contact: Victor Voronkov <vvoronko>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.4
CC: asegurap, dtrainor, eminguez, jparrill, jsaucier, kboumedh, tschaibl, vlaad, vvoronko, ykashtan
Target Milestone: ---   
Target Release: 4.5.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The systemd service that detects the control plane IP and configures Kubelet and CRI-O could run before any control plane IP was configured.
Consequence: Kubelet and CRI-O were not configured, which caused issues connecting to them.
Fix: The service retries on failure.
Result: The service retries until the interface has a control plane IP configured.
Story Points: ---
Clone Of: 1817594
Environment:
Last Closed: 2020-07-13 17:24:27 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1771572, 1817594    

Comment 4 Victor Voronkov 2020-04-07 09:21:27 UTC
To verify the bug on OCP 4.5.0-0.nightly-2020-04-04-073956,
we modified the systemd unit (commented out Wants=network-online.target) so that the nodeip service can start before network-online.target:

sudo vi /etc/systemd/system/nodeip-configuration.service
[Unit]
Description=Writes IP address configuration so that kubelet and crio services select a valid node IP
# This only applies to VIP managing environments where the kubelet and crio IP
# address picking logic is flawed and may end up selecting an address from a
# different subnet or a deprecated address
#Wants=network-online.target
After=ignition-firstboot-complete.service
Before=kubelet.service crio.service

[Service]
# Need oneshot to delay kubelet
Type=oneshot
ExecStart=/usr/local/bin/nodeip-finder --retry-on-failure fd2e:6f44:5dd8::5

[Install]
WantedBy=multi-user.target
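
For readers unfamiliar with the fix, the behaviour behind the --retry-on-failure flag in the ExecStart line above can be sketched roughly as follows. This is a minimal illustration, not the nodeip-finder source: the function name, the find_node_ip callback, the delay between attempts and the optional attempt limit are all assumptions made for the example.

```python
import time
from typing import Callable, Optional


def retry_until_found(
    find_node_ip: Callable[[], Optional[str]],
    delay_seconds: float = 1.0,          # assumed value, not from the bug
    max_attempts: Optional[int] = None,  # None means retry forever
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call find_node_ip until it returns an address.

    All names here are hypothetical and used for illustration only.
    """
    attempt = 0
    while True:
        attempt += 1
        address = find_node_ip()
        if address is not None:
            return address
        if max_attempts is not None and attempt >= max_attempts:
            raise RuntimeError("Failed to find suitable node ip")
        # Same message as in the journal output of this comment.
        print("Failed to find suitable node ip. Retrying...")
        sleep(delay_seconds)
```

Because the unit is Type=oneshot and ordered Before=kubelet.service crio.service, a loop of this shape keeps kubelet and crio from starting until an address has been found.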



##### Then rebooted the node and watched the log:

journalctl -u nodeip-configuration.service
-- Reboot --
Apr 07 08:33:16 localhost systemd[1]: Starting Writes IP address configuration so that kubelet and crio services select a valid node IP...
Apr 07 08:33:19 localhost nodeip-finder[1304]: Filtering out Address(127.0.0.1/8, dev=lo) due to it having host scope
Apr 07 08:33:19 localhost nodeip-finder[1304]: Filtering out Address(::1/128, dev=lo) due to it having host scope
Apr 07 08:33:19 localhost nodeip-finder[1304]: Failed to find suitable node ip. Retrying...
Apr 07 08:33:19 localhost nodeip-finder[1304]: Filtering out Address(127.0.0.1/8, dev=lo) due to it having host scope
Apr 07 08:33:19 localhost nodeip-finder[1304]: Filtering out Address(::1/128, dev=lo) due to it having host scope
Apr 07 08:33:19 localhost nodeip-finder[1304]: Is fd2e:6f44:5dd8::5 between fe80:: and fe80::ffff:ffff:ffff:ffff
Apr 07 08:33:19 localhost nodeip-finder[1304]: Failed to find suitable node ip. Retrying...
Apr 07 08:33:19 localhost nodeip-finder[1304]: Filtering out Address(127.0.0.1/8, dev=lo) due to it having host scope
Apr 07 08:33:19 localhost nodeip-finder[1304]: Filtering out Address(::1/128, dev=lo) due to it having host scope
Apr 07 08:33:19 localhost nodeip-finder[1304]: Is fd2e:6f44:5dd8::5 between fe80:: and fe80::ffff:ffff:ffff:ffff
Apr 07 08:33:19 localhost nodeip-finder[1304]: Failed to find suitable node ip. Retrying...
Apr 07 08:33:19 localhost nodeip-finder[1304]: Filtering out Address(127.0.0.1/8, dev=lo) due to it having host scope
Apr 07 08:33:19 localhost nodeip-finder[1304]: Filtering out Address(::1/128, dev=lo) due to it having host scope
Apr 07 08:33:19 localhost nodeip-finder[1304]: Checking V6Route(fd2e:6f44:5dd8::/64, dev=enp5s0) for Address(fd2e:6f44:5dd8::100/128, dev=enp5s0)
Apr 07 08:33:19 localhost nodeip-finder[1304]: Is fd2e:6f44:5dd8::5 between fe80:: and fe80::ffff:ffff:ffff:ffff
Apr 07 08:33:19 localhost nodeip-finder[1304]: Is fd2e:6f44:5dd8::5 between fd2e:6f44:5dd8:: and fd2e:6f44:5dd8:0:ffff:ffff:ffff:ffff
Apr 07 08:33:19 localhost nodeip-finder[1304]: Is fd2e:6f44:5dd8::5 between fd2e:6f44:5dd8::100 and fd2e:6f44:5dd8::100
Apr 07 08:33:19 localhost nodeip-finder[1304]: Is fd2e:6f44:5dd8::5 between fe80:: and fe80::ffff:ffff:ffff:ffff
Apr 07 08:33:19 localhost nodeip-finder[1304]: VIP Subnet fd2e:6f44:5dd8::/64
Apr 07 08:33:19 localhost nodeip-finder[1304]: Processing CustomAction for target
Apr 07 08:33:19 localhost nodeip-finder[1304]:   parser = 140099139248984
Apr 07 08:33:19 localhost nodeip-finder[1304]:   values = 'fd2e:6f44:5dd8::5'
Apr 07 08:33:19 localhost nodeip-finder[1304]:   option_string = None
Apr 07 08:33:19 localhost systemd[1]: Started Writes IP address configuration so that kubelet and crio services select a valid node IP.
Apr 07 08:33:19 localhost systemd[1]: nodeip-configuration.service: Consumed 162ms CPU time
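
The "Is fd2e:6f44:5dd8::5 between ..." lines in the log above are range checks of the VIP against each candidate subnet. A rough sketch of that selection logic with Python's standard ipaddress module is shown below. It is an approximation under stated assumptions, not the actual nodeip-finder code: pick_node_ip is a hypothetical name, the loopback test stands in for the "host scope" filter seen in the log, and the fe80::1 address in the usage example is made up for illustration.

```python
import ipaddress
from typing import Iterable, Optional, Tuple


def pick_node_ip(vip: str, candidates: Iterable[Tuple[str, str]]) -> Optional[str]:
    """Return the first candidate address whose subnet contains the VIP.

    candidates holds (address, subnet) pairs, where the subnet would come
    from the route that covers the address. Hypothetical helper.
    """
    vip_addr = ipaddress.ip_address(vip)
    for address, subnet in candidates:
        # Approximates the "host scope" filtering shown in the log.
        if ipaddress.ip_address(address).is_loopback:
            continue
        # Equivalent to asking whether the VIP lies between the first
        # and last address of the subnet.
        if vip_addr in ipaddress.ip_network(subnet):
            return address
    return None


candidates = [
    ("::1", "::1/128"),                             # loopback, filtered out
    ("fe80::1", "fe80::/64"),                       # made-up link-local address
    ("fd2e:6f44:5dd8::100", "fd2e:6f44:5dd8::/64"),  # address and route from the log
]
print(pick_node_ip("fd2e:6f44:5dd8::5", candidates))
```

With no candidate in the VIP subnet, as in the first attempts in the log, the function returns None, which is the case the retry covers.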



sudo systemctl status crio.service
Warning: The unit file, source configuration file or drop-ins of crio.service changed on disk. Run 'systemctl daemon-reload' to reload units.
● crio.service - Open Container Initiative Daemon
   Loaded: loaded (/usr/lib/systemd/system/crio.service; disabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/crio.service.d
           └─10-default-env.conf, 20-nodenet.conf, 20-stream-address.conf
   Active: active (running) since Tue 2020-04-07 08:33:49 UTC; 13min ago
     Docs: https://github.com/cri-o/cri-o
 Main PID: 1469 (crio)
    Tasks: 46
   Memory: 194.5M
      CPU: 1min 36.320s
   CGroup: /system.slice/crio.service
           └─1469 /usr/bin/crio --stream-address=fd2e:6f44:5dd8::100 --enable-metrics=true --metrics-port=9537


sudo systemctl status kubelet.service
Warning: The unit file, source configuration file or drop-ins of kubelet.service changed on disk. Run 'systemctl daemon-reload' to reload units.
● kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-default-env.conf, 20-nodenet.conf
   Active: active (running) since Tue 2020-04-07 08:33:49 UTC; 15min ago
  Process: 1909 ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state (code=exited, status=0/SUCCESS)
  Process: 1907 ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests (code=exited, status=0/SUCCESS)
 Main PID: 1911 (kubelet)
    Tasks: 75
   Memory: 255.6M
      CPU: 1min 48.590s
   CGroup: /system.slice/kubelet.service
           └─1911 kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfi> --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --node-labels=node-role.kubernetes.io/master,node.openshift.io/os_id=rhcos --node-ip=fd2e:6f44:5dd8::100 --address=fd2e:6f44:5dd8::100 --minimum-container-ttl-duration=6m0s --cloud-provider= --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec --v=3

Cluster state is good; all nodes are ready.

Comment 6 errata-xmlrpc 2020-07-13 17:24:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409