Bug 1990625 - Ironic agent registers with SLAAC address with privacy-stable
Summary: Ironic agent registers with SLAAC address with privacy-stable
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.10.0
Assignee: Ben Nemec
QA Contact: Victor Voronkov
URL:
Whiteboard:
Depends On:
Blocks: 2008210
Reported: 2021-08-05 18:35 UTC by davis phillips
Modified: 2022-03-12 04:37 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: addr-gen-mode connection parameter not persisted to OVNKubernetes bridge. Consequence: IPv6 addresses may change when the bridge is created, which breaks the cluster because node ip changes are not supported. Fix: Maintain addr-gen-mode when creating bridge. Result: IP address is consistent throughout the deployment process.
Clone Of:
Clones: 2008210
Environment:
Last Closed: 2022-03-12 04:37:24 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift machine-config-operator pull 2770 | 0 | None | open | Bug 1990625: configure-ovs: Persist addr-gen-mode for ipv6 connections | 2021-09-16 15:24:20 UTC
Red Hat Product Errata | RHSA-2022:0056 | 0 | None | None | None | 2022-03-12 04:37:46 UTC

Description davis phillips 2021-08-05 18:35:23 UTC
Description of problem:
In an environment where SLAAC is enabled, the default ironic addr-gen-mode is stable-privacy. This causes inconsistencies in agent_url registration if the SLAAC address comes online before the DHCPv6 or static address.

After cleaning and reboot, the address changes and the agent_url points to a non-existent endpoint. After OCP installation, the following addr-gen-mode settings are applied:

# grep -rni eui64
system-connections/default_connection.nmconnection:19:addr-gen-mode=eui64
systemConnectionsMerged/default_connection.nmconnection:19:addr-gen-mode=eui64

Example baremetal node list:

[root@localhost /]# baremetal node list
+--------------------------------------+--------------------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name               | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------------+---------------+-------------+--------------------+-------------+
| 3d05c231-1b27-4702-8acf-1f53ae7f22e7 | openshift-master-0 | None          | power on    | clean wait         | True        |
| 79c92ad8-8004-46e5-8be3-1917a1123ff7 | openshift-master-2 | None          | power on    | clean wait         | True        |
| 9bb1d830-77ac-41b5-9c66-d3cd3dad379b | openshift-master-1 | None          | power on    | available          | False       |
+--------------------------------------+--------------------+---------------+-------------+--------------------+-------------+

9bb1d830-77ac-41b5-9c66-d3cd3dad379b 
'agent_url': 'https://[fd00:4888:2000:1099::13]:9999'

79c92ad8-8004-46e5-8be3-1917a1123ff7
'agent_url': 'https://[fd00:4888:2000:1099:9af2:b3ff:fe2c:e2e8]:9999'

3d05c231-1b27-4702-8acf-1f53ae7f22e7
'agent_url': 'https://[fd00:4888:2000:1099:9af2:b3ff:fe2c:f2a1]:9999'
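
For reference, the two longer agent_url addresses above carry the ff:fe pattern of a modified EUI-64 interface identifier, which is derived from the interface MAC address (RFC 4291, Appendix A). A stable-privacy address is derived from a hash instead and would not show this pattern. The following is a minimal sketch of the derivation; the function name mac_to_eui64_iid is hypothetical, and the MAC used in the example was obtained by reversing the derivation, so it is an assumption and does not come from this report.

```shell
# Derive the modified EUI-64 interface identifier from a MAC address.
# Sketch only: expects a lower-case, colon-separated MAC and does not
# strip leading zeros from the resulting 16-bit groups.
mac_to_eui64_iid() (
    IFS=:
    set -- $1          # split the MAC on ':' into $1..$6
    # Flip the universal/local bit (0x02) of the first octet.
    first=$(printf '%02x' $(( 0x$1 ^ 0x02 )))
    # Insert ff:fe between the third and fourth octets.
    printf '%s%s:%sff:fe%s:%s%s\n' "$first" "$2" "$3" "$4" "$5" "$6"
)

# Assumed MAC, inferred from the openshift-master-2 agent_url above.
mac_to_eui64_iid 98:f2:b3:2c:e2:e8
```

Appending the result to the fd00:4888:2000:1099::/64 prefix gives the eui64 form of the address. An address generated with stable-privacy on the same interface would have a different suffix, which is why the registered agent_url can stop matching after the mode changes.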

Version-Release number of selected component (if applicable):


Steps to Reproduce:
1. Enable SLAAC and DHCPv6 on network
2. Enable boot to redfish or virtual media image method in install-config.

Actual results:
Nodes register inconsistently with ironic

Expected results:
Nodes properly registering regardless of DHCPv6 address or SLAAC address

Additional info:

Comment 1 Bob Fournier 2021-08-06 01:11:22 UTC
Ironic isn't setting the interfaces to use a particular addr_gen_mode but on the master nodes some interfaces are set to 1 (stable-privacy) and some to 0 (eui64). On master-1 in a dev-scripts setup for example I see all interfaces using eui64 except for enp2s0, e.g:
[core@master-1 conf]$ cat ./enp1s0/addr_gen_mode
0
[core@master-1 conf]$ cat ./enp2s0/addr_gen_mode
1

Davis pointed out a similar bug - https://bugzilla.redhat.com/show_bug.cgi?id=1873021. According to that bug - "NetworkManager defaults to addr-gen-mode=stable-privacy (if you create a profile with nmcli/D-Bus/libnm that doesn't set it otherwise). If you want eui64, you need to explicitly set it (if you create a profile with nmcli/D-Bus/libnm)."

Comment 2 Bob Fournier 2021-08-06 15:22:04 UTC
Some additional observations focused on how we can make the change to set addr_gen_mode=eui64 for all interfaces. 

There is an RFE here - https://bugzilla.redhat.com/show_bug.cgi?id=1743161 to allow addr-gen-mode to be configured in the global NM config, so that it could be set similarly to how dhcp-duid is handled (e.g. https://github.com/openshift/installer/pull/5110), but it appears that RFE will not be implemented.

On my setup I see that the interface for which addr_gen_mode is set to stable-privacy does not have an nmcli connection:
[core@master-1 enp2s0]$ nmcli connection show enp2s0
Error: enp2s0 - no such connection profile.

The interface that is set to eui64 does have a profile, but the addr-gen-mode value in that profile differs from what is actually set:
$ nmcli connection show enp1s0 | grep gen
ipv6.addr-gen-mode:                     stable-privacy

Another difference is that during init, NetworkManager is using the 'Wired Connection' policy for enp2s0, so it's possible it's picking up stable-privacy there:
Aug 04 15:24:47 localhost NetworkManager[1297]: <info>  [1628090687.3835] policy: set 'Wired Connection' (enp2s0) as default for IPv4 routing and DNS
Aug 04 15:24:47 master-1 configure-ovs.sh[1359]: Wired Connection  17a36ef8-3dc3-40bf-bb4b-ca82e5c7a9e5  ethernet  enp2s0
Aug 04 15:24:52 master-1 configure-ovs.sh[1359]: Wired Connection  17a36ef8-3dc3-40bf-bb4b-ca82e5c7a9e5  ethernet  enp2s0

That policy isn't used for enp1s0:
Aug 04 15:29:48 master-1 NetworkManager[1297]: <warn>  [1628090988.7331] device (enp1s0): Activation: failed for connection 'Wired Connection'


I'd like to compare this to the NetworkManager settings in Davis' setup so we can figure out the best way to set addr_gen_mode to eui64 for all interfaces.

Comment 3 Bob Fournier 2021-08-10 02:07:57 UTC
Also, the only NM settings present use eui64:

$ sudo ls /etc/NetworkManager/system-connections
default_connection.nmconnection

$ sudo cat /etc/NetworkManager/system-connections/default_connection.nmconnection 
...
[ipv6]
addr-gen-mode=eui64
dhcp-timeout=90
dns-search=
method=auto
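
To check the stored mode across several .nmconnection files without reading each one by hand, a small helper along these lines could be used. This is a simplified sketch: keyfile_get is a hypothetical name, and the parser assumes plain "key=value" lines and exact "[section]" headers, ignoring keyfile escaping rules.

```shell
# Print the value of KEY inside [SECTION] of a NetworkManager keyfile.
# usage: keyfile_get SECTION KEY FILE
keyfile_get() {
    awk -F= -v section="[$1]" -v key="$2" '
        # Track whether the current line belongs to the wanted section.
        /^\[/ { in_section = ($0 == section); next }
        # Print everything after "key=" and stop at the first match.
        in_section && $1 == key { print substr($0, length(key) + 2); exit }
    ' "$3"
}

# Example (path taken from the listing above):
#   sudo sh -c '. ./keyfile_get.sh; keyfile_get ipv6 addr-gen-mode \
#       /etc/NetworkManager/system-connections/default_connection.nmconnection'
```

The helper prints nothing when the section or key is absent, which corresponds to a profile that leaves addr-gen-mode at the NetworkManager default.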

Comment 4 Bob Fournier 2021-08-18 14:44:02 UTC
It appears that this configuration is being done by configure-ovs.sh on the baremetal node. Moving the component to the team which has more knowledge of this script.

We can see the script has run on nodes:

[core@master-2 ~] $ sudo journalctl -l | grep configure-ovs
...
Aug 12 14:45:12 master-2 configure-ovs.sh[1558]: Wired Connection  a25499df-2308-4a60-bc87-6725991b22e8  ethernet  enp1s0
Aug 12 14:45:12 master-2 configure-ovs.sh[1558]: Wired Connection  a25499df-2308-4a60-bc87-6725991b22e8  ethernet  enp2s0
Aug 12 14:45:12 master-2 configure-ovs.sh[1558]: Wired Connection  a3bb9cbf-d6d8-41d5-b5a5-b1dbd453822e  ethernet  --

Comment 5 Ben Nemec 2021-09-15 17:14:18 UTC
Is this using OVN-Kubernetes or OpenShiftSDN? configure-ovs is essentially a noop on the latter and the log lines mentioned above are output regardless of whether the script does anything, so I want to make sure we're looking in the right place.

Comment 6 Bob Fournier 2021-09-15 17:51:39 UTC
Ben - I saw that script output on a standard dev-scripts run with IPv4; I believe that it's using OVN-Kubernetes.

I'd be interested in what Davis was using with the original problem in his setup, so changing needinfo.

Comment 7 davis phillips 2021-09-15 19:35:34 UTC
Hey guys - Yes, IPv6 requires OVN-Kubernetes. In this case it was an environment with SLAAC and DHCPv6 enabled. Unfortunately, I don't recall if the br-ex was created with the SLAAC address.

Comment 8 Ben Nemec 2021-09-16 15:22:37 UTC
Okay, in that case I think we just need to persist the mode when creating the bridge. We're already doing some similar things with the DHCP parameters.
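
As an illustration of what persisting the mode could look like, the sketch below reads ipv6.addr-gen-mode from an existing profile and turns it into arguments for the nmcli call that creates or modifies the bridge profile. It is not the patch from the linked pull request: addr_gen_mode_args is a hypothetical helper, and the profile names in the usage comment are placeholders.

```shell
# Emit "ipv6.addr-gen-mode <mode>" for the given existing profile, or
# nothing if the profile does not report a mode.
# usage: addr_gen_mode_args OLD_CONNECTION
addr_gen_mode_args() {
    old_conn=$1
    # --get-values prints only the requested field in terse form.
    mode=$(nmcli --get-values ipv6.addr-gen-mode connection show "$old_conn")
    if [ -n "$mode" ]; then
        printf 'ipv6.addr-gen-mode %s\n' "$mode"
    fi
}

# Usage sketch (placeholder profile names, word splitting is intended):
#   nmcli connection modify "$new_bridge_conn" $(addr_gen_mode_args "$old_conn")
```

Copying the value from the original profile, rather than hard-coding eui64, keeps the bridge consistent with whatever mode the node used when it first registered.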

I've pushed a patch that should get attached to this bz as soon as I make the bot happy. It would be good if you can test that before it merges and ensure it fixes your problem.

Comment 9 Ben Nemec 2021-09-22 20:03:36 UTC
Whoops, I changed the subcomponent and it reset the assignees. Changing them back.

Comment 15 errata-xmlrpc 2022-03-12 04:37:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

