Bug 1821950

Summary: nodeip-configuration.service fails when multiple IPv6 default routes are installed
Product: OpenShift Container Platform
Component: Installer
Installer sub component: OpenShift on Bare Metal IPI
Reporter: Marius Cornea <mcornea>
Assignee: Ben Nemec <bnemec>
QA Contact: Victor Voronkov <vvoronko>
Status: CLOSED DUPLICATE
Severity: urgent
Priority: urgent
CC: asegurap, sasha, stbenjam, vvoronko, yprokule
Version: 4.4
Keywords: TestBlocker
Target Release: 4.5.0
Last Closed: 2020-04-09 17:32:04 UTC
Type: Bug

Description Marius Cornea 2020-04-07 22:51:14 UTC
Description of problem:

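nodeip-configuration.service fails when multiple IPv6 default routes are installed. In the routing table below, the second default route (metric 102) is a multipath route with three nexthops.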

[root@openshift-worker-0 core]# ip -6 r
::1 dev lo proto kernel metric 256 pref medium
2620:52:0:2e1d::80/121 dev ens1f1 proto ra metric 102 pref medium
fd00:1101::1c dev ens1f0 proto kernel metric 101 pref medium
fd00:1101::/64 dev ens1f0 proto ra metric 101 pref medium
fe80::/64 dev enp1s0f4u4 proto kernel metric 100 pref medium
fe80::/64 dev ens1f0 proto kernel metric 101 pref medium
fe80::/64 dev ens1f1 proto kernel metric 102 pref medium
fe80::/64 dev ens1f2 proto kernel metric 103 pref medium
fe80::/64 dev ens1f3 proto kernel metric 104 pref medium
default via fe80::5000:8c34:2d2f:55a0 dev ens1f0 proto ra metric 101 pref medium
default proto ra metric 102 
	nexthop via fe80::200:5eff:fe00:201 dev ens1f1 weight 1 
	nexthop via fe80::d207:ca01:5521:2700 dev ens1f1 weight 1 
	nexthop via fe80::2e21:3101:55e3:8f00 dev ens1f1 weight 1 pref medium

[root@openshift-worker-0 core]# systemctl restart nodeip-configuration.service
Job for nodeip-configuration.service failed because the control process exited with error code.
See "systemctl status nodeip-configuration.service" and "journalctl -xe" for details.


[root@openshift-worker-0 core]# systemctl status nodeip-configuration.service
● nodeip-configuration.service - Writes IP address configuration so that kubelet and crio services select a valid node IP
   Loaded: loaded (/etc/systemd/system/nodeip-configuration.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2020-04-07 22:48:48 UTC; 10s ago
  Process: 8885 ExecStart=/usr/local/bin/nodeip-finder 10.46.29.199 (code=exited, status=1/FAILURE)
 Main PID: 8885 (code=exited, status=1/FAILURE)
      CPU: 65ms

Apr 07 22:48:48 openshift-worker-0 nodeip-finder[8885]:     for route in (V6Route.from_line(rline) for rline in route_out.splitlines()):
Apr 07 22:48:48 openshift-worker-0 nodeip-finder[8885]:   File "/var/usrlocal/bin/non_virtual_ip", line 163, in <genexpr>
Apr 07 22:48:48 openshift-worker-0 nodeip-finder[8885]:     for route in (V6Route.from_line(rline) for rline in route_out.splitlines()):
Apr 07 22:48:48 openshift-worker-0 nodeip-finder[8885]:   File "/var/usrlocal/bin/non_virtual_ip", line 81, in from_line
Apr 07 22:48:48 openshift-worker-0 nodeip-finder[8885]:     return cls(**attrs)
Apr 07 22:48:48 openshift-worker-0 nodeip-finder[8885]: TypeError: __init__() got an unexpected keyword argument '\'
Apr 07 22:48:48 openshift-worker-0 systemd[1]: nodeip-configuration.service: Main process exited, code=exited, status=1/FAILURE
Apr 07 22:48:48 openshift-worker-0 systemd[1]: nodeip-configuration.service: Failed with result 'exit-code'.
Apr 07 22:48:48 openshift-worker-0 systemd[1]: Failed to start Writes IP address configuration so that kubelet and crio services select a valid node IP.
Apr 07 22:48:48 openshift-worker-0 systemd[1]: nodeip-configuration.service: Consumed 65ms CPU time



[root@openshift-worker-0 core]# /usr/local/bin/nodeip-finder 10.46.29.199
Filtering out Address(127.0.0.1/8, dev=lo) due to it having host scope
Filtering out Address(::1/128, dev=lo) due to it having host scope
Checking V6Route(fd00:1101::/64, dev=ens1f0) for Address(fd00:1101::1c/128, dev=ens1f0)
Traceback (most recent call last):
  File "/usr/local/bin/nodeip-finder", line 73, in <module>
    main()
  File "/usr/local/bin/nodeip-finder", line 54, in main
    first: non_virtual_ip.Address = first_candidate_addr(api_vip)
  File "/usr/local/bin/nodeip-finder", line 31, in first_candidate_addr
    iface_addrs = list(non_virtual_ip.interface_addrs(filters))
  File "/var/usrlocal/bin/non_virtual_ip", line 163, in interface_addrs
    for route in (V6Route.from_line(rline) for rline in route_out.splitlines()):
  File "/var/usrlocal/bin/non_virtual_ip", line 163, in <genexpr>
    for route in (V6Route.from_line(rline) for rline in route_out.splitlines()):
  File "/var/usrlocal/bin/non_virtual_ip", line 81, in from_line
    return cls(**attrs)
TypeError: __init__() got an unexpected keyword argument '\'
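
The '\' keyword in the TypeError suggests that the route line reaching V6Route.from_line contains a backslash token, which would be the case if the routes are read in one-line mode and the multipath nexthops are joined with backslashes. The following is only a sketch under that assumption, not the actual non_virtual_ip code: naive_attrs, tolerant_attrs, ROUTE_KEYS, NEXTHOP_KEYS and the exact shape of the ONELINE sample are hypothetical. It shows how pairing tokens as key/value produces a '\' key, and one way a parser could tolerate nexthop entries.

```python
from typing import Any, Dict

# Assumed one-line rendering of the multipath default route from the report.
# The backslash-plus-tab separators are an assumption about the ip output.
ONELINE = (
    "default proto ra metric 102 "
    "\\\tnexthop via fe80::200:5eff:fe00:201 dev ens1f1 weight 1 "
    "\\\tnexthop via fe80::d207:ca01:5521:2700 dev ens1f1 weight 1 "
    "\\\tnexthop via fe80::2e21:3101:55e3:8f00 dev ens1f1 weight 1 pref medium"
)

# Hypothetical key sets: attributes of the route itself vs. of one nexthop.
ROUTE_KEYS = {"proto", "metric", "pref"}
NEXTHOP_KEYS = {"via", "dev", "weight"}


def naive_attrs(line: str) -> Dict[str, str]:
    """Pair every token after the destination as key/value (hypothetical)."""
    tokens = line.split()[1:]
    return dict(zip(tokens[0::2], tokens[1::2]))


def tolerant_attrs(line: str) -> Dict[str, Any]:
    """Parse a route line, collecting each nexthop into its own dict."""
    # Drop the backslash separators before looking at keys.
    tokens = [t for t in line.split() if t != "\\"]
    attrs: Dict[str, Any] = {"destination": tokens[0], "nexthops": []}
    target = attrs  # where via/dev/weight are stored
    i = 1
    while i < len(tokens):
        tok = tokens[i]
        if tok == "nexthop":
            # Start a new nexthop entry; later via/dev/weight go into it.
            target = {}
            attrs["nexthops"].append(target)
            i += 1
        elif tok in ROUTE_KEYS and i + 1 < len(tokens):
            attrs[tok] = tokens[i + 1]
            i += 2
        elif tok in NEXTHOP_KEYS and i + 1 < len(tokens):
            target[tok] = tokens[i + 1]
            i += 2
        else:
            # Unknown token: skip it instead of passing it on as a keyword.
            i += 1
    return attrs


if __name__ == "__main__":
    print(sorted(naive_attrs(ONELINE)))  # includes a '\\' key
    print(tolerant_attrs(ONELINE))
```

With the naive pairing, the resulting dict contains a '\' key, which would be rejected by a constructor that only accepts known keyword arguments. The tolerant variant keeps proto, metric and pref at the route level and returns the three nexthops as a list.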


If I remove the multipath default route (default proto ra metric 102), the script runs fine:

[root@openshift-worker-0 core]# ip -6 r del default proto ra metric 102

[root@openshift-worker-0 core]# /usr/local/bin/nodeip-finder 10.46.29.199
Filtering out Address(127.0.0.1/8, dev=lo) due to it having host scope
Filtering out Address(::1/128, dev=lo) due to it having host scope
Checking V6Route(fd00:1101::/64, dev=ens1f0) for Address(fd00:1101::1c/128, dev=ens1f0)
Is 10.46.29.199 between fd00:1101:: and fd00:1101::ffff:ffff:ffff:ffff
Is 10.46.29.199 between fd00:1101::1c and fd00:1101::1c
Is 10.46.29.199 between fe80:: and fe80::ffff:ffff:ffff:ffff
Is 10.46.29.199 between 10.46.29.128 and 10.46.29.255
Is 10.46.29.199 between fe80:: and fe80::ffff:ffff:ffff:ffff
Is 10.46.29.199 between fe80:: and fe80::ffff:ffff:ffff:ffff
Is 10.46.29.199 between fe80:: and fe80::ffff:ffff:ffff:ffff
Is 10.46.29.199 between 16.1.15.0 and 16.1.15.3
Is 10.46.29.199 between fe80:: and fe80::ffff:ffff:ffff:ffff
VIP Subnet 10.46.29.128/25
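
For reference, the "Is X between A and B" checks in the output above amount to a subnet membership test for the API VIP. A minimal sketch of that check with the standard ipaddress module follows. The function name vip_subnet and the SUBNETS list are hypothetical, and the prefixes are derived from the address ranges printed in the log. This is not the nodeip-finder implementation.

```python
import ipaddress
from typing import Iterable, Optional, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Prefixes inferred from the ranges printed in the log above.
SUBNETS = [
    "fd00:1101::/64",
    "fd00:1101::1c/128",
    "fe80::/64",
    "10.46.29.128/25",
    "16.1.15.0/30",
]


def vip_subnet(vip: str, candidates: Iterable[str]) -> Optional[Network]:
    """Return the first candidate network that contains the VIP, if any."""
    addr = ipaddress.ip_address(vip)
    for cidr in candidates:
        net = ipaddress.ip_network(cidr, strict=False)
        # Membership is simply False when the IP versions differ,
        # so IPv6 prefixes can be tested against an IPv4 VIP safely.
        if addr in net:
            return net
    return None


if __name__ == "__main__":
    print("VIP Subnet", vip_subnet("10.46.29.199", SUBNETS))
```

For the VIP from this report, the sketch selects 10.46.29.128/25, matching the last line of the script output.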


Version-Release number of selected component (if applicable):
4.4.0-0.nightly-2020-04-04-025830

How reproducible:
100%

Comment 1 Ben Nemec 2020-04-09 16:44:49 UTC
This should have been fixed in 4.4 by https://github.com/openshift/machine-config-operator/pull/1616 .

Comment 2 Ben Nemec 2020-04-09 17:32:04 UTC

*** This bug has been marked as a duplicate of bug 1817236 ***