Description of problem:
With the latest RHEL 9 Beta image, overcloud deployment fails because overcloud networking fails completely. The MAC addresses generated on the VLAN interfaces are identical across the overcloud nodes, which breaks all overcloud networks.

Here is an example from two compute nodes:

Compute-1:
18: vlan301: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
    inet 192.17.1.83/24 brd 192.17.1.255 scope global vlan301
       valid_lft forever preferred_lft forever
19: vlan302: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 5e:9c:9d:a1:8a:3b brd ff:ff:ff:ff:ff:ff
    inet 192.17.3.28/24 brd 192.17.3.255 scope global vlan302
       valid_lft forever preferred_lft forever
21: vlan304: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether da:8a:82:62:6f:e9 brd ff:ff:ff:ff:ff:ff
    inet 192.17.2.22/24 brd 192.17.2.255 scope global vlan304
       valid_lft forever preferred_lft forever

Compute-2:
11: vlan301: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
    inet 192.17.1.19/24 brd 192.17.1.255 scope global vlan301
       valid_lft forever preferred_lft forever
    inet6 fe80::3c9b:81ff:fe39:883/64 scope link dadfailed tentative
       valid_lft forever preferred_lft forever
12: vlan304: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether da:8a:82:62:6f:e9 brd ff:ff:ff:ff:ff:ff
    inet 192.17.2.145/24 brd 192.17.2.255 scope global vlan304
       valid_lft forever preferred_lft forever
    inet6 fe80::d88a:82ff:fe62:6fe9/64 scope link dadfailed tentative
       valid_lft forever preferred_lft forever
13: vlan302: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 5e:9c:9d:a1:8a:3b brd ff:ff:ff:ff:ff:ff
    inet 192.17.3.148/24 brd 192.17.3.255 scope global vlan302
       valid_lft forever preferred_lft forever
    inet6 fe80::5c9c:9dff:fea1:8a3b/64 scope link dadfailed tentative
       valid_lft forever preferred_lft forever

Let's grep for the MAC address of the internal API VLAN (vlan301) [3e:9b:81:39:08:83] across all nodes:

# for i in `metalsmith -c 'IP Addresses' -f value list | cut -d "=" -f2 ` ; do ssh heat-admin@$i "hostname && ip a |grep 3e:9b:81:39:08:83" ; done
compute-2
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
compute-3
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
compute-1
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
compute-0
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
computehci-2
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
computehci-4
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
computehci-0
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
computehci-1
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
computehci-3
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
cephstorage-0
cephstorage-2
cephstorage-1
controller-0
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
controller-1
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
controller-2
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff

I suspect this comes from NetworkManager at the RHEL layer, but it is worth your review first; if the bug is deemed legitimate, it can be moved to RHEL.

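The single-MAC grep above could be generalized into a check that reports every VLAN interface whose MAC address is shared by more than one node. The following is only a sketch: the helper names extract_vlan_macs and find_duplicate_macs are hypothetical, and it assumes `ip -o link show` output (one interface per line) and the same heat-admin SSH access used in the loop above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read `ip -o link show` output on stdin and print
# "<host> <iface> <mac>" for every interface whose name starts with "vlan".
# $1 is a label for the node (hostname or IP address).
extract_vlan_macs() {
    awk -v host="$1" '
        $2 ~ /^vlan/ {
            iface = $2
            sub(/:$/, "", iface)    # drop the trailing colon
            sub(/@.*/, "", iface)   # drop an "@parent" suffix, if present
            for (i = 3; i < NF; i++)
                if ($i == "link/ether") print host, iface, $(i + 1)
        }
    '
}

# Hypothetical helper: read "<host> <iface> <mac>" records on stdin and print
# each interface/MAC pair seen on more than one host, followed by those hosts.
find_duplicate_macs() {
    awk '
        { key = $2 " " $3; count[key]++; hosts[key] = hosts[key] " " $1 }
        END {
            for (key in count)
                if (count[key] > 1) print key ":" hosts[key]
        }
    ' | sort
}

# Possible usage from the undercloud (not executed here; assumes the same
# metalsmith listing and SSH access as in the report):
#
#   for i in $(metalsmith -c 'IP Addresses' -f value list | cut -d "=" -f2); do
#       ssh "heat-admin@$i" 'ip -o link show' | extract_vlan_macs "$i"
#   done | find_duplicate_macs
```

On a healthy deployment the pipeline should print nothing; any line it prints names an interface, the shared MAC, and the nodes that carry it.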
Version-Release number of selected component (if applicable):
# rpm -qa |grep -i Network
NetworkManager-libnm-1.36.0-0.10.el9.x86_64
glib-networking-2.68.3-1.el9.x86_64
NetworkManager-1.36.0-0.10.el9.x86_64
NetworkManager-team-1.36.0-0.10.el9.x86_64
dracut-network-055-30.git20220216.el9.x86_64
NetworkManager-tui-1.36.0-0.10.el9.x86_64
containernetworking-plugins-1.0.1-3.el9.x86_64
openstack-network-scripts-10.11-1.el9osttrunk.x86_64
network-scripts-openvswitch2.15-2.15.0-45.el9fdp.x86_64
rhosp-network-scripts-openvswitch-2.15-6.el9osttrunk.noarch

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
The VLAN interface MAC addresses are the same on every node.

Expected results:
The MAC addresses should differ between nodes so that the nodes can communicate.

Additional info:
Yeah, I believe this is a problem with virt-customize. We have seen this upstream as well and worked around it in CI:
https://review.opendev.org/c/openstack/tripleo-ci/+/835536/1#message-220a51b780f14ebe9373e3848c0b732ce503e656

I'll add dfg:hardprov for comment, but afaik it's probably a virt-customize issue.
*** Bug 2081591 has been marked as a duplicate of this bug. ***
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days.