Description of problem:
In an environment where SLAAC is enabled the default ironic addr-gen-mode is stable-privacy. This causes inconsistencies in agent_url registration if the SLAAC address comes online before the DHCPv6 or static address. After cleaning and reboot, the address will change and the agent_url will be pointing towards a non-existent endpoint.

After OCP installation the following addr-gen-mode settings are applied:

# grep -rni eui64
system-connections/default_connection.nmconnection:19:addr-gen-mode=eui64
systemConnectionsMerged/default_connection.nmconnection:19:addr-gen-mode=eui64

Example baremetal node list:

[root@localhost /]# baremetal node list
+--------------------------------------+--------------------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name               | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------------+---------------+-------------+--------------------+-------------+
| 3d05c231-1b27-4702-8acf-1f53ae7f22e7 | openshift-master-0 | None          | power on    | clean wait         | True        |
| 79c92ad8-8004-46e5-8be3-1917a1123ff7 | openshift-master-2 | None          | power on    | clean wait         | True        |
| 9bb1d830-77ac-41b5-9c66-d3cd3dad379b | openshift-master-1 | None          | power on    | available          | False       |
+--------------------------------------+--------------------+---------------+-------------+--------------------+-------------+

9bb1d830-77ac-41b5-9c66-d3cd3dad379b 'agent_url': 'https://[fd00:4888:2000:1099::13]:9999'
79c92ad8-8004-46e5-8be3-1917a1123ff7 'agent_url': 'https://[fd00:4888:2000:1099:9af2:b3ff:fe2c:e2e8]:9999'
3d05c231-1b27-4702-8acf-1f53ae7f22e7 'agent_url': 'https://[fd00:4888:2000:1099:9af2:b3ff:fe2c:f2a1]:9999'

Version-Release number of selected component (if applicable):

Steps to Reproduce:
1. Enable SLAAC and DHCPv6 on network
2. Enable boot to redfish or virtual media image method in install-config.
Actual results:
Nodes register with ironic inconsistently.

Expected results:
Nodes register properly regardless of whether they use the DHCPv6 address or the SLAAC address.

Additional info:
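As a rough illustration of the symptom, the agent_url values listed in the description can be checked for an EUI-64 style interface identifier (ff:fe in the middle of the low 64 bits). This is only a heuristic sketch: the function name is made up for this example, and a non-EUI-64 address could in principle contain the same bytes by chance. With the values from the description, the two nodes stuck in clean wait carry MAC-derived addresses, while the available node registered fd00:4888:2000:1099::13.

```python
import ipaddress
from urllib.parse import urlsplit


def eui64_mac_from_agent_url(agent_url):
    """Return the MAC embedded in an EUI-64 style IPv6 host, or None.

    Hypothetical helper for illustration; it is not part of ironic.
    """
    host = urlsplit(agent_url).hostname  # drops the brackets and the port
    iid = ipaddress.IPv6Address(host).packed[8:]  # low 64 bits of the address
    if iid[3:5] != b"\xff\xfe":
        return None  # no ff:fe filler, so not a modified EUI-64 identifier
    # Flip the universal/local bit back and remove the ff:fe filler.
    mac = bytes([iid[0] ^ 0x02]) + iid[1:3] + iid[5:8]
    return ":".join(f"{octet:02x}" for octet in mac)


# agent_url values copied from the description above.
for url in (
    "https://[fd00:4888:2000:1099::13]:9999",
    "https://[fd00:4888:2000:1099:9af2:b3ff:fe2c:e2e8]:9999",
    "https://[fd00:4888:2000:1099:9af2:b3ff:fe2c:f2a1]:9999",
):
    print(url, "->", eui64_mac_from_agent_url(url))
```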
Ironic isn't setting the interfaces to use a particular addr_gen_mode but on the master nodes some interfaces are set to 1 (stable-privacy) and some to 0 (eui64). On master-1 in a dev-scripts setup for example I see all interfaces using eui64 except for enp2s0, e.g.:

[core@master-1 conf]$ cat ./enp1s0/addr_gen_mode
0
[core@master-1 conf]$ cat ./enp2s0/addr_gen_mode
1

Davis pointed out a similar bug - https://bugzilla.redhat.com/show_bug.cgi?id=1873021. According to that bug:

"NetworkManager defaults to addr-gen-mode=stable-privacy (if you create a profile with nmcli/D-Bus/libnm that doesn't set it otherwise). If you want eui64, you need to explicitly set it (if you create a profile with nmcli/D-Bus/libnm)."
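To repeat the per-interface check across a whole node, something like the following sketch could be used. It assumes the conf directory in the prompt above is /proc/sys/net/ipv6/conf; the function and dictionary names are invented for this example. The labels follow the kernel's ip-sysctl documentation, where 1 means the kernel generates no link-local address (EUI-64 is still used for autoconf) and 2 is stable-privacy. A value of 1 may therefore indicate that NetworkManager is generating addresses itself rather than a kernel stable-privacy setting; that reading is an assumption and has not been verified on this setup.

```python
from pathlib import Path

# Labels per the kernel ip-sysctl documentation for addr_gen_mode.
KERNEL_ADDR_GEN_MODES = {
    0: "eui64",
    1: "none (no kernel link-local address)",
    2: "stable-privacy",
    3: "stable-privacy with random secret",
}


def read_addr_gen_modes(conf_dir="/proc/sys/net/ipv6/conf"):
    """Map interface name -> (raw value, label) for every addr_gen_mode file."""
    modes = {}
    for path in sorted(Path(conf_dir).glob("*/addr_gen_mode")):
        value = int(path.read_text().strip())
        modes[path.parent.name] = (value, KERNEL_ADDR_GEN_MODES.get(value, "unknown"))
    return modes


if __name__ == "__main__":
    for iface, (value, label) in read_addr_gen_modes().items():
        print(f"{iface}: {value} ({label})")
```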
Some additional observations focused on how we can make the change to set addr_gen_mode=eui64 for all interfaces. There is an RFE here - https://bugzilla.redhat.com/show_bug.cgi?id=1743161 - to allow addr-gen-mode to be configured in the global NM config so that it could be set similarly to how dhcp-duid is done (e.g. https://github.com/openshift/installer/pull/5110), but it appears that RFE will not be implemented.

On my setup I see that the interface for which addr_gen_mode is set to stable-privacy does not have an nmcli connection:

[core@master-1 enp2s0]$ nmcli connection show enp2s0
Error: enp2s0 - no such connection profile.

The interface that is set to eui64 does have a profile, but the value in that profile for addr_gen_mode is different from what is actually set:

$ nmcli connection show enp1s0 | grep gen
ipv6.addr-gen-mode: stable-privacy

Another difference is that during init, NetworkManager is using the 'Wired Connection' policy for enp2s0, so it's possible it's picking up stable-privacy there:

Aug 04 15:24:47 localhost NetworkManager[1297]: <info> [1628090687.3835] policy: set 'Wired Connection' (enp2s0) as default for IPv4 routing and DNS
Aug 04 15:24:47 master-1 configure-ovs.sh[1359]: Wired Connection 17a36ef8-3dc3-40bf-bb4b-ca82e5c7a9e5 ethernet enp2s0
Aug 04 15:24:52 master-1 configure-ovs.sh[1359]: Wired Connection 17a36ef8-3dc3-40bf-bb4b-ca82e5c7a9e5 ethernet enp2s0

That policy isn't used for enp1s0:

Aug 04 15:29:48 master-1 NetworkManager[1297]: <warn> [1628090988.7331] device (enp1s0): Activation: failed for connection 'Wired Connection'

I'd like to compare this to the NetworkManager settings in Davis' setup so we can figure out the best way to set addr_gen_mode to eui64 for all interfaces.
Also the only settings for NM use eui64:

$ sudo ls /etc/NetworkManager/system-connections
default_connection.nmconnection
$ sudo cat /etc/NetworkManager/system-connections/default_connection.nmconnection
...
[ipv6]
addr-gen-mode=eui64
dhcp-timeout=90
dns-search=
method=auto
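For checking several keyfiles at once, the [ipv6] section can be read with a small script. This is a sketch that assumes Python's configparser is close enough to the keyfile format for simple key=value profiles like this one; the function name is hypothetical, and the sample text contains only the [ipv6] section quoted above, not the elided part of the file.

```python
import configparser


def keyfile_addr_gen_mode(keyfile_text):
    """Return the ipv6 addr-gen-mode from keyfile text, or None if unset."""
    parser = configparser.ConfigParser(
        interpolation=None,  # keyfile values are literal, no % expansion
        delimiters=("=",),   # keyfiles use '=' only
        strict=False,
    )
    parser.read_string(keyfile_text)
    return parser.get("ipv6", "addr-gen-mode", fallback=None)


# Only the section shown in the comment above.
SAMPLE = """\
[ipv6]
addr-gen-mode=eui64
dhcp-timeout=90
dns-search=
method=auto
"""

print(keyfile_addr_gen_mode(SAMPLE))
```

A return value of None means the profile does not set the option, in which case NetworkManager's own default applies.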
It appears that this configuration is being done by configure-ovs.sh on the baremetal node. Moving the component to the team which has more knowledge of this script. We can see the script has run on nodes:

[core@master-2 ~] $ sudo journalctl -l | grep configure-ovs
...
Aug 12 14:45:12 master-2 configure-ovs.sh[1558]: Wired Connection a25499df-2308-4a60-bc87-6725991b22e8 ethernet enp1s0
Aug 12 14:45:12 master-2 configure-ovs.sh[1558]: Wired Connection a25499df-2308-4a60-bc87-6725991b22e8 ethernet enp2s0
Aug 12 14:45:12 master-2 configure-ovs.sh[1558]: Wired Connection a3bb9cbf-d6d8-41d5-b5a5-b1dbd453822e ethernet --
Is this using OVN-Kubernetes or OpenShiftSDN? configure-ovs is essentially a noop on the latter and the log lines mentioned above are output regardless of whether the script does anything, so I want to make sure we're looking in the right place.
Ben - I saw that script output on just a standard dev-scripts run with IPv4, and I believe that it's using OVN-Kubernetes. I'd be interested in what Davis was using with the original problem in his setup, so changing needinfo.
Hey guys - Yes, IPv6 requires OVN-Kubernetes. In this case it was an environment with SLAAC and DHCPv6 enabled. Unfortunately, I don't recall if the br-ex was created with the SLAAC address.
Okay, in that case I think we just need to persist the mode when creating the bridge. We're already doing some similar things with the DHCP parameters. I've pushed a patch that should get attached to this bz as soon as I make the bot happy. It would be good if you can test that before it merges and ensure it fixes your problem.
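The idea of persisting the mode can be sketched roughly as follows. This is not the patch mentioned above, only an illustration of the approach: read ipv6.addr-gen-mode from the original profile and set the same value on the bridge profile. The helper names and the bridge profile name ovs-if-br-ex are assumptions made for this example, and the accepted modes are limited to the two discussed in this bug. The functions only build nmcli argument lists and do not run anything.

```python
import shlex

# Only the two modes discussed in this bug; other values are rejected here.
KNOWN_MODES = {"eui64", "stable-privacy"}


def query_command(profile):
    """nmcli invocation that prints the profile's ipv6.addr-gen-mode."""
    return ["nmcli", "-g", "ipv6.addr-gen-mode", "connection", "show", profile]


def persist_command(bridge_profile, mode):
    """nmcli invocation that sets the same mode on the bridge profile."""
    if mode not in KNOWN_MODES:
        raise ValueError(f"unexpected addr-gen-mode: {mode!r}")
    return ["nmcli", "connection", "modify", bridge_profile,
            "ipv6.addr-gen-mode", mode]


# 'Wired Connection' appears in the logs above; 'ovs-if-br-ex' is hypothetical.
print(shlex.join(query_command("Wired Connection")))
print(shlex.join(persist_command("ovs-if-br-ex", "eui64")))
```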
Whoops, I changed the subcomponent and it reset the assignees. Changing them back.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056