Bug 2069309

Summary: Overcloud deployment failures due to failure in overcloud networking
Product: Red Hat OpenStack
Reporter: Ketan Mehta <kmehta>
Component: tripleo-ansible
Assignee: Terry Wilson <twilson>
Status: CLOSED NOTABUG
QA Contact: Joe H. Rahme <jhakimra>
Severity: urgent
Priority: urgent
Version: 17.0 (Wallaby)
CC: bshephar, cjeanner, egarciar, jkreger, lmartins, sandyada, schari
Target Milestone: ga
Keywords: Triaged
Target Release: 17.0
Hardware: x86_64
OS: Linux
Last Closed: 2022-06-10 08:33:11 UTC
Type: Bug
Bug Depends On: 1554546    

Description Ketan Mehta 2022-03-28 17:11:34 UTC
Description of problem:

With the latest RHEL 9 Beta image, overcloud deployment fails because overcloud networking fails completely.

The MAC addresses generated for the VLAN interfaces are identical across overcloud nodes, which breaks overcloud networking on all networks.

Here is an example from two compute nodes:

Compute-1:

18: vlan301: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
    inet 192.17.1.83/24 brd 192.17.1.255 scope global vlan301
       valid_lft forever preferred_lft forever
19: vlan302: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 5e:9c:9d:a1:8a:3b brd ff:ff:ff:ff:ff:ff
    inet 192.17.3.28/24 brd 192.17.3.255 scope global vlan302
       valid_lft forever preferred_lft forever
21: vlan304: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether da:8a:82:62:6f:e9 brd ff:ff:ff:ff:ff:ff
    inet 192.17.2.22/24 brd 192.17.2.255 scope global vlan304
       valid_lft forever preferred_lft forever

Compute-2:

11: vlan301: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
    inet 192.17.1.19/24 brd 192.17.1.255 scope global vlan301
       valid_lft forever preferred_lft forever
    inet6 fe80::3c9b:81ff:fe39:883/64 scope link dadfailed tentative 
       valid_lft forever preferred_lft forever
12: vlan304: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether da:8a:82:62:6f:e9 brd ff:ff:ff:ff:ff:ff
    inet 192.17.2.145/24 brd 192.17.2.255 scope global vlan304
       valid_lft forever preferred_lft forever
    inet6 fe80::d88a:82ff:fe62:6fe9/64 scope link dadfailed tentative 
       valid_lft forever preferred_lft forever
13: vlan302: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 5e:9c:9d:a1:8a:3b brd ff:ff:ff:ff:ff:ff
    inet 192.17.3.148/24 brd 192.17.3.255 scope global vlan302
       valid_lft forever preferred_lft forever
    inet6 fe80::5c9c:9dff:fea1:8a3b/64 scope link dadfailed tentative 
       valid_lft forever preferred_lft forever
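
The "dadfailed tentative" flags on Compute-2's link-local addresses are consistent with the duplicate MACs: these IPv6 link-local addresses are derived from the MAC using modified EUI-64 (RFC 4291, Appendix A), so identical MACs produce identical link-local addresses and IPv6 Duplicate Address Detection fails. A minimal bash sketch of that derivation follows; the function name mac_to_link_local is hypothetical and only illustrates the mapping.

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive the modified EUI-64 IPv6 link-local address
# from a MAC address (RFC 4291, Appendix A). Flip the universal/local bit
# (0x02) of the first octet and insert ff:fe between octets 3 and 4.
mac_to_link_local() {
  local IFS=:
  # Split the MAC into its six octets.
  local -a o=($1)
  # Flip the universal/local bit of the first octet.
  local first=$(( 0x${o[0]} ^ 0x02 ))
  # Print the four 16-bit groups of the interface identifier after fe80::.
  printf 'fe80::%x:%x:%x:%x\n' \
    $(( (first << 8) | 0x${o[1]} )) \
    $(( (0x${o[2]} << 8) | 0xff )) \
    $(( (0xfe << 8) | 0x${o[3]} )) \
    $(( (0x${o[4]} << 8) | 0x${o[5]} ))
}

mac_to_link_local 3e:9b:81:39:08:83   # prints fe80::3c9b:81ff:fe39:883
```

For the vlan301 MAC this yields exactly the link-local address shown on Compute-2, so every node that shares the MAC also tries to claim the same fe80:: address.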

Let's grep all nodes for the MAC address of the Internal API VLAN (vlan301), 3e:9b:81:39:08:83:

# for i in $(metalsmith -c 'IP Addresses' -f value list | cut -d "=" -f2); do ssh heat-admin@$i "hostname && ip a | grep 3e:9b:81:39:08:83"; done
compute-2
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
compute-3
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
compute-1
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
compute-0
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
computehci-2
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
computehci-4
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
computehci-0
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
computehci-1
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
computehci-3
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
cephstorage-0
cephstorage-2
cephstorage-1
controller-0
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
controller-1
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
controller-2
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
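
The loop above checks one known MAC. A broader check could collect every interface MAC from all nodes and report any MAC seen on more than one node. This is a minimal sketch assuming the same metalsmith output format and heat-admin SSH access as that loop; the function names collect_macs and find_duplicate_macs are hypothetical, and only find_duplicate_macs runs without SSH access.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "<node> <interface> <mac>" lines on stdin and
# print each MAC that appears on more than one distinct node, followed by
# those nodes. Repeats of a MAC on the same node are counted once.
find_duplicate_macs() {
  awk '!seen[$3, $1]++ { hosts[$3] = hosts[$3] " " $1; n[$3]++ }
       END { for (m in n) if (n[m] > 1) print m hosts[m] }'
}

# Hypothetical collector reusing the metalsmith/ssh pattern from the loop
# above; assumes heat-admin SSH access to every overcloud node. Nodes are
# labeled by their ctlplane IP.
collect_macs() {
  local ip
  for ip in $(metalsmith -c 'IP Addresses' -f value list | cut -d "=" -f2); do
    # -n stops ssh from reading the caller's stdin.
    # "ip -br link" prints: name, state, MAC, flags.
    ssh -n "heat-admin@$ip" ip -br link |
      awk -v h="$ip" '$1 != "lo" && $3 ~ /^[0-9a-f:]+$/ && $3 != "00:00:00:00:00:00" { print h, $1, $3 }'
  done
}

# Usage on the undercloud: collect_macs | find_duplicate_macs
```

Counting distinct nodes per MAC, rather than raw lines, avoids flagging a MAC that legitimately appears on several interfaces of the same node (for example, a VLAN that shares its parent interface's MAC).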

I suspect this comes from NetworkManager at the RHEL layer, but it is worth reviewing here first; if the issue is confirmed, the bug can be moved to RHEL.

Version-Release number of selected component (if applicable):

# rpm -qa |grep -i Network

NetworkManager-libnm-1.36.0-0.10.el9.x86_64
glib-networking-2.68.3-1.el9.x86_64
NetworkManager-1.36.0-0.10.el9.x86_64
NetworkManager-team-1.36.0-0.10.el9.x86_64
dracut-network-055-30.git20220216.el9.x86_64
NetworkManager-tui-1.36.0-0.10.el9.x86_64
containernetworking-plugins-1.0.1-3.el9.x86_64
openstack-network-scripts-10.11-1.el9osttrunk.x86_64
network-scripts-openvswitch2.15-2.15.0-45.el9fdp.x86_64
rhosp-network-scripts-openvswitch-2.15-6.el9osttrunk.noarch

Actual results:

The VLAN interfaces on different nodes have the same MAC addresses.

Expected results:

Each node's VLAN interfaces should have unique MAC addresses so that the nodes can communicate.

Comment 2 Brendan Shephard 2022-03-29 06:10:44 UTC
Yeah, I believe this is a problem with virt-customize. We have seen it upstream as well and worked around it in CI:
https://review.opendev.org/c/openstack/tripleo-ci/+/835536/1#message-220a51b780f14ebe9373e3848c0b732ce503e656 

I'll add dfg:hardprov for comment, but as far as I know it's probably a virt-customize issue.

Comment 5 schari 2022-05-17 06:12:11 UTC
*** Bug 2081591 has been marked as a duplicate of this bug. ***

Comment 10 Red Hat Bugzilla 2023-09-15 01:53:22 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days