Bug 2069309 - Overcloud deployment failures due to failure in overcloud networking
Summary: Overcloud deployment failures due to failure in overcloud networking
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: tripleo-ansible
Version: 17.0 (Wallaby)
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ga
Target Release: 17.0
Assignee: Terry Wilson
QA Contact: Joe H. Rahme
URL:
Whiteboard:
Duplicates: 2081591
Depends On: 1554546
Blocks:
 
Reported: 2022-03-28 17:11 UTC by Ketan Mehta
Modified: 2023-09-15 01:53 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-06-10 08:33:11 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
Red Hat Issue Tracker OSP-14345 (Private: 0, Priority: None, Status: None, Summary: None, Last Updated: 2022-03-28 17:24:50 UTC)

Description Ketan Mehta 2022-03-28 17:11:34 UTC
Description of problem:

With the latest RHEL 9 Beta image, overcloud deployment is failing because overcloud networking is completely broken.

The MAC addresses generated on the VLAN interfaces of the overcloud nodes are identical across nodes, which breaks overcloud networking on all networks.

Here is an example from two compute nodes:

Compute-1:

18: vlan301: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
    inet 192.17.1.83/24 brd 192.17.1.255 scope global vlan301
       valid_lft forever preferred_lft forever
19: vlan302: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 5e:9c:9d:a1:8a:3b brd ff:ff:ff:ff:ff:ff
    inet 192.17.3.28/24 brd 192.17.3.255 scope global vlan302
       valid_lft forever preferred_lft forever
21: vlan304: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether da:8a:82:62:6f:e9 brd ff:ff:ff:ff:ff:ff
    inet 192.17.2.22/24 brd 192.17.2.255 scope global vlan304
       valid_lft forever preferred_lft forever

Compute-2:

11: vlan301: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
    inet 192.17.1.19/24 brd 192.17.1.255 scope global vlan301
       valid_lft forever preferred_lft forever
    inet6 fe80::3c9b:81ff:fe39:883/64 scope link dadfailed tentative 
       valid_lft forever preferred_lft forever
12: vlan304: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether da:8a:82:62:6f:e9 brd ff:ff:ff:ff:ff:ff
    inet 192.17.2.145/24 brd 192.17.2.255 scope global vlan304
       valid_lft forever preferred_lft forever
    inet6 fe80::d88a:82ff:fe62:6fe9/64 scope link dadfailed tentative 
       valid_lft forever preferred_lft forever
13: vlan302: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 5e:9c:9d:a1:8a:3b brd ff:ff:ff:ff:ff:ff
    inet 192.17.3.148/24 brd 192.17.3.255 scope global vlan302
       valid_lft forever preferred_lft forever
    inet6 fe80::5c9c:9dff:fea1:8a3b/64 scope link dadfailed tentative 
       valid_lft forever preferred_lft forever

Let's grep for the Internal API VLAN (vlan301) MAC address [3e:9b:81:39:08:83] across all nodes (a more compact cross-node check follows the output below):

# for i in  `metalsmith -c 'IP Addresses' -f value list | cut -d "=" -f2 ` ; do ssh heat-admin@$i "hostname && ip a |grep 3e:9b:81:39:08:83"  ; done                               
compute-2
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
compute-3
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
compute-1
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
compute-0
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
computehci-2
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
computehci-4
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
computehci-0
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
computehci-1
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
computehci-3
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
cephstorage-0
cephstorage-2
cephstorage-1
controller-0
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
controller-1
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
controller-2
    link/ether 3e:9b:81:39:08:83 brd ff:ff:ff:ff:ff:ff
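
A more compact way to confirm the collision is to collect the vlan301 MAC from every node and count the distinct values; a healthy deployment should show one unique MAC per node. This is a hedged sketch that reuses the same metalsmith/ssh access as above and assumes the interface is named vlan301 on every node that carries that network:

# Hedged triage sketch: count distinct vlan301 MACs across all overcloud nodes.
# Nodes without a vlan301 interface simply print nothing.
for i in `metalsmith -c 'IP Addresses' -f value list | cut -d "=" -f2`; do
    ssh heat-admin@$i "cat /sys/class/net/vlan301/address 2>/dev/null"
done | sort | uniq -c

A single output line whose count equals the number of nodes carrying vlan301 confirms that every such node shares the same MAC, as the grep above already suggests.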

I suspect this may be coming from NetworkManager at the RHEL layer, but it is worth reviewing here first; if it is deemed legitimate, the bug can be moved to RHEL.
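
If the duplicate MACs do come from NetworkManager, one plausible explanation (an assumption, not confirmed in this bug) is that the image customization step bakes the same per-host identity into every node, so NetworkManager's stable-MAC generation is seeded identically everywhere. A quick hedged check is to compare the usual identity files across nodes; identical values would support that theory. The file paths below are the standard locations, not taken from this bug:

# Hedged sketch: compare per-host identity seeds across overcloud nodes.
# An identical /etc/machine-id or NetworkManager secret_key on every node
# would point at a cloned-image problem rather than a NetworkManager bug.
for i in `metalsmith -c 'IP Addresses' -f value list | cut -d "=" -f2`; do
    ssh heat-admin@$i "hostname; cat /etc/machine-id; sudo md5sum /var/lib/NetworkManager/secret_key 2>/dev/null"
done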

Version-Release number of selected component (if applicable):

# rpm -qa |grep -i Network

NetworkManager-libnm-1.36.0-0.10.el9.x86_64
glib-networking-2.68.3-1.el9.x86_64
NetworkManager-1.36.0-0.10.el9.x86_64
NetworkManager-team-1.36.0-0.10.el9.x86_64
dracut-network-055-30.git20220216.el9.x86_64
NetworkManager-tui-1.36.0-0.10.el9.x86_64
containernetworking-plugins-1.0.1-3.el9.x86_64
openstack-network-scripts-10.11-1.el9osttrunk.x86_64
network-scripts-openvswitch2.15-2.15.0-45.el9fdp.x86_64
rhosp-network-scripts-openvswitch-2.15-6.el9osttrunk.noarch

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:

The MAC addresses are identical across nodes.

Expected results:

Each node should have a unique MAC address; otherwise the nodes cannot communicate.

Additional info:

Comment 2 Brendan Shephard 2022-03-29 06:10:44 UTC
Yeah, I believe this is a problem with virt-customize. We have seen this upstream as well and worked around it in CI:
https://review.opendev.org/c/openstack/tripleo-ci/+/835536/1#message-220a51b780f14ebe9373e3848c0b732ce503e656 

I'll add dfg:hardprov for comment, but it's probably a virt-customize issue AFAIK.
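
For reference, a generic mitigation for images customized with virt-customize is to clear the baked-in machine ID so each node regenerates a unique one on first boot. This is a hedged sketch of that common practice, not necessarily what the linked tripleo-ci change does, and the image filename is illustrative:

# Hedged sketch: truncate /etc/machine-id in the overcloud image so systemd
# regenerates a unique ID per node on first boot. Image name is illustrative.
virt-customize -a overcloud-hardened-uefi-full.qcow2 --truncate /etc/machine-id

virt-sysprep also offers a machine-id operation that serves the same purpose when preparing cloned images.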

Comment 5 schari 2022-05-17 06:12:11 UTC
*** Bug 2081591 has been marked as a duplicate of this bug. ***

Comment 10 Red Hat Bugzilla 2023-09-15 01:53:22 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days.

