Bug 1902230
| Summary: | server addresses and interface_list inconsistent after mac change | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Maciej Relewicz <mrelewicz> |
| Component: | openstack-heat | Assignee: | Harald Jensås <hjensas> |
| Status: | CLOSED ERRATA | QA Contact: | David Rosenfeld <drosenfe> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 16.1 (Train) | CC: | bfournie, dasmith, eglynn, hbrock, hjensas, jhakimra, jschluet, jslagle, kchamart, mburns, ramishra, rurena, sbaker, sbauza, sgordon, smooney, tmurray, vkoul, vromanso |
| Target Milestone: | Upstream M1 | Keywords: | Reopened, Triaged |
| Target Release: | 16.1 (Train on RHEL 8.2) | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | openstack-heat-13.1.0-1.20220227033356.48b730a.el8ost | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 2059341 (view as bug list) | Environment: | |
| Last Closed: | 2022-12-07 20:24:45 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 2059341 | | |
| Attachments: | | | |
Could you please supply the following:
- the overcloud deploy command used
- the environment files used for deployment
- the generated config-download ansible playbook directory

Later on we may also need an sosreport. Are you maybe using IP-from-pool templates?

Hi, we are not using IP-from-pool. The lab was redeployed, so I can't send you the config-download directory. The situation occurred several times in various labs of ours; usually redeployment solved the problem. The lab configuration wasn't changed, so I can send templates. Do you need any specific templates?

Deployment command:

```
openstack overcloud deploy --timeout 240 --stack overcloud \
  --libvirt-type kvm --templates /home/stack/tripleo-heat-templates \
  -r /home/stack/tripleo-heat-templates/environments/contrail/roles_data.yaml \
  -n /home/stack/tripleo-heat-templates/environments/contrail/network_data.yaml \
  -e /home/stack/tripleo-heat-templates/environments/overcloud_containers.yaml \
  -e /home/stack/tripleo-heat-templates/environments/contrail/contrail-docker-registry.yaml \
  -e /home/stack/tripleo-heat-templates/environments/docker-ha.yaml \
  -e /home/stack/tripleo-heat-templates/environments/network-isolation.yaml \
  -e /home/stack/tripleo-heat-templates/environments/contrail/contrail-plugins.yaml \
  -e /home/stack/tripleo-heat-templates/environments/contrail/contrail-services.yaml \
  -e /home/stack/tripleo-heat-templates/environments/contrail/contrail-net.yaml \
  -e /home/stack/tripleo-heat-templates/environments/enable-tls.yaml \
  -e /home/stack/tripleo-heat-templates/environments/inject-trust-anchor-hiera.yaml \
  -e /home/stack/tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \
  -e /home/stack/tripleo-heat-templates/environments/contrail-tls.yaml \
  -e /home/stack/tripleo-heat-templates/environments/contrail/disable-telemetry.yaml \
  -e /home/stack/tripleo-heat-templates/environments/contrail/deployment-artifacts.yaml \
  -e /home/stack/tripleo-heat-templates/environments/contrail/storage-environment.yaml \
  -e /home/stack/tripleo-heat-templates/environments/contrail/environment-extra.yaml \
```

Maciej, please provide a copy of the templates being used.

Below is the template that was used to create the network config:
```
heat_template_version: queens
description: >
  Software Config to drive os-net-config to configure multiple interfaces.
parameters:
  ControlPlaneIp:
    default: '192.168.213.0/24'
    description: IP address/subnet on the ctlplane network
    type: string
  ControlPlaneSubnetCidr: # Override this via parameter_defaults
    default: '24'
    description: The subnet CIDR of the control plane network.
    type: string
  ControlPlaneDefaultRoute: # Override this via parameter_defaults
    default: '192.168.213.1'
    description: The default route of the control plane network.
    type: string
  ControlPlaneMtu:
    default: '1500'
    description: MTU of the Control Plane Network
    type: number
  ControlPlaneNetworkMtu:
    default: '1500'
    description: MTU of the Control Plane Network (rhosp13 comp)
    type: number
  ControlPlaneStaticRoutes:
    default: []
    description: >
      Routes for the ctlplane network traffic.
      JSON route e.g. [{'destination':'10.0.0.0/16', 'nexthop':'10.0.0.1'}]
      Unless the default is changed, the parameter is automatically resolved
      from the subnet host_routes attribute.
    type: json
  InternalApiIpSubnet:
    default: '172.16.0.0/24'
    description: IP address/subnet on the InternalApi network
    type: string
  InternalApiNetworkVlanID:
    default: 226
    description: Vlan ID for the InternalApi network traffic.
    type: number
  InternalApiInterfaceDefaultRoute: # Not used by default in this template
    default: '172.16.0.1'
    description: The default route of the InternalApi network.
    type: string
  InternalApiMtu:
    default: '1500'
    description: MTU of the InternalApi Network
    type: number
  InternalApiNetworkMtu:
    default: '1500'
    description: MTU of the InternalApi Network
    type: number
  InternalApiSupernet:
    default: ''
    description: Supernet on the InternalApi network
    type: string
  InternalApiInterfaceRoutes:
    default: []
    description: >
      Routes for the internal_api network traffic.
      JSON route e.g. [{'destination':'10.0.0.0/16', 'nexthop':'10.0.0.1'}]
      Unless the default is changed, the parameter is automatically resolved
      from the subnet host_routes attribute.
    type: json
  ManagementIpSubnet:
    default: '192.168.1.0/24'
    description: IP address/subnet on the Management network
    type: string
  ManagementNetworkVlanID:
    default: 225
    description: Vlan ID for the Management network traffic.
    type: number
  ManagementInterfaceDefaultRoute: # Not used by default in this template
    default: '192.168.1.1'
    description: The default route of the Management network.
    type: string
  ManagementMtu:
    default: '1500'
    description: MTU of the Management Network
    type: number
  ManagementNetworkMtu:
    default: '1500'
    description: MTU of the Management Network
    type: number
  ManagementSupernet:
    default: ''
    description: Supernet on the Management network
    type: string
  ManagementInterfaceRoutes:
    default: []
    description: >
      Routes for the management network traffic.
      JSON route e.g. [{'destination':'10.0.0.0/16', 'nexthop':'10.0.0.1'}]
      Unless the default is changed, the parameter is automatically resolved
      from the subnet host_routes attribute.
    type: json
  StorageIpSubnet:
    default: '172.16.1.0/24'
    description: IP address/subnet on the Storage network
    type: string
  StorageNetworkVlanID:
    default: 227
    description: Vlan ID for the Storage network traffic.
    type: number
  StorageInterfaceDefaultRoute: # Not used by default in this template
    default: '172.16.1.1'
    description: The default route of the Storage network.
    type: string
  StorageMtu:
    default: '1500'
    description: MTU of the Storage Network
    type: number
  StorageNetworkMtu:
    default: '1500'
    description: MTU of the Storage Network
    type: number
  StorageSupernet:
    default: ''
    description: Supernet on the Storage network
    type: string
  StorageInterfaceRoutes:
    default: []
    description: >
      Routes for the storage network traffic.
      JSON route e.g. [{'destination':'10.0.0.0/16', 'nexthop':'10.0.0.1'}]
      Unless the default is changed, the parameter is automatically resolved
      from the subnet host_routes attribute.
    type: json
  StorageMgmtIpSubnet:
    default: '172.16.3.0/24'
    description: IP address/subnet on the StorageMgmt network
    type: string
  StorageMgmtNetworkVlanID:
    default: 224
    description: Vlan ID for the StorageMgmt network traffic.
    type: number
  StorageMgmtInterfaceDefaultRoute: # Not used by default in this template
    default: '172.16.3.1'
    description: The default route of the StorageMgmt network.
    type: string
  StorageMgmtMtu:
    default: '1500'
    description: MTU of the StorageMgmt Network
    type: number
  StorageMgmtNetworkMtu:
    default: '1500'
    description: MTU of the StorageMgmt Network
    type: number
  StorageMgmtSupernet:
    default: ''
    description: Supernet on the StorageMgmt network
    type: string
  StorageMgmtInterfaceRoutes:
    default: []
    description: >
      Routes for the storage_mgmt network traffic.
      JSON route e.g. [{'destination':'10.0.0.0/16', 'nexthop':'10.0.0.1'}]
      Unless the default is changed, the parameter is automatically resolved
      from the subnet host_routes attribute.
    type: json
  TenantIpSubnet:
    default: '172.16.81.0/24'
    description: IP address/subnet on the Tenant network
    type: string
  TenantNetworkVlanID:
    default: 228
    description: Vlan ID for the Tenant network traffic.
    type: number
  TenantInterfaceDefaultRoute: # Not used by default in this template
    default: '172.16.81.1'
    description: The default route of the Tenant network.
    type: string
  TenantMtu:
    default: '1500'
    description: MTU of the Tenant Network
    type: number
  TenantNetworkMtu:
    default: '1500'
    description: MTU of the Tenant Network
    type: number
  TenantSupernet:
    default: ''
    description: Supernet on the Tenant network
    type: string
  TenantInterfaceRoutes:
    default: []
    description: >
      Routes for the tenant network traffic.
      JSON route e.g. [{'destination':'10.0.0.0/16', 'nexthop':'10.0.0.1'}]
      Unless the default is changed, the parameter is automatically resolved
      from the subnet host_routes attribute.
    type: json
  ExternalIpSubnet:
    default: '10.87.4.61/25'
    description: IP address/subnet on the External network
    type: string
  ExternalNetworkVlanID:
    default: 1008
    description: Vlan ID for the External network traffic.
    type: number
  ExternalInterfaceDefaultRoute: # Not used by default in this template
    default: '10.87.4.126'
    description: The default route of the External network.
    type: string
  ExternalMtu:
    default: '1500'
    description: MTU of the External Network
    type: number
  ExternalNetworkMtu:
    default: '1500'
    description: MTU of the External Network
    type: number
  ExternalSupernet:
    default: ''
    description: Supernet on the External network
    type: string
  ExternalInterfaceRoutes:
    default: []
    description: >
      Routes for the external network traffic.
      JSON route e.g. [{'destination':'10.0.0.0/16', 'nexthop':'10.0.0.1'}]
      Unless the default is changed, the parameter is automatically resolved
      from the subnet host_routes attribute.
    type: json
  DnsServers: # Override this via parameter_defaults
    default: ["172.29.131.50","172.29.143.50","172.29.143.60","172.29.139.60"]
    description: A list of DNS servers (2 max for some implementations) that will be added to resolv.conf.
    type: comma_delimited_list
  EC2MetadataIp: # Override this via parameter_defaults
    default: '192.168.213.1'
    description: The IP address of the EC2 metadata server.
    type: string
resources:
  OsNetConfigImpl:
    type: OS::Heat::SoftwareConfig
    properties:
      group: script
      inputs:
        - name: disable_configure_safe_defaults
          default: true
      config:
        str_replace:
          template:
            get_file: ../../network/scripts/run-os-net-config.sh
          params:
            $network_config:
              network_config:
                - addresses:
                    - ip_netmask:
                        list_join:
                          - /
                          - - get_param: ControlPlaneIp
                            - get_param: ControlPlaneSubnetCidr
                  dns_servers:
                    get_param: DnsServers
                  mtu:
                    get_param: ControlPlaneMtu
                  name: nic1
                  routes:
                    - default: true
                      next_hop:
                        get_param: ControlPlaneDefaultRoute
                  type: interface
                  use_dhcp: false
                - addresses:
                    - ip_netmask:
                        get_param: InternalApiIpSubnet
                  device: nic1
                  mtu:
                    get_param: InternalApiMtu
                  type: vlan
                  vlan_id:
                    get_param: InternalApiNetworkVlanID
                - name: nic2
                  type: interface
                  use_dhcp: false
                - addresses:
                    - ip_netmask:
                        get_param: TenantIpSubnet
                  mtu:
                    get_param: TenantMtu
                  name: nic3
                  type: interface

outputs:
  OS::stack_id:
    description: The OsNetConfigImpl resource.
    value:
      get_resource: OsNetConfigImpl
```
This is likely an issue with the yaql query that tries to figure out the ctlplane CIDR, or ControlPlaneSubnetCidr is set to None for some reason. https://github.com/openstack/tripleo-heat-templates/blob/stable/train/puppet/role.role.j2.yaml#L415-L421

I don't suppose you could provide a tar of the plan from swift that hit this?

I don't have the plan. It appears from time to time, but I think it's not environment-specific.

OK, the next time it happens, reopen this bug and please capture the plan. If you could provide all templates used, that would help. I'm going to close this for now because we've not seen this in any of our testing.

We hit the problem again. Different lab, different role (OpenStack controller), but the same situation: one node from the role got a missing netmask. Reported officially here: Case #02848223, files attached to the case.

ControlPlaneSubnetCidr does not appear to be specified anywhere in the templates as a parameter, so it's inheriting the default; however, we have '' and '24' used as defaults in different places. So my assumption is that the value used depends on which file gets loaded first: '' would result in None, while '24' would be correct. It should be noted that we provide '' as the default in all of our files, while the environment/contrail/* files have '24' as the default. This variable is defined in environments/contrail/contrail-net-single.yaml, but I don't believe that file is used. That being said, we've actually dropped usage of this parameter going forward, so I'm not certain if there's a different way to handle this. Perhaps Harald has some additional views on this.

So, if ControlPlaneSubnetCidr/ControlPlaneDefaultRoute have not been passed in parameter_defaults, they are fetched from the server attributes:
```
#cat contrailanalyticsdatabase-role.yaml
.....
  NetworkConfig:
    type: OS::TripleO::ContrailAnalyticsDatabase::Net::SoftwareConfig
    properties:
      ControlPlaneIp: "{{ ctlplane_ip }}"
      ControlPlaneSubnetCidr:
        if:
          - ctlplane_subnet_cidr_set
          - {get_param: ControlPlaneSubnetCidr}
          - yaql:
              expression: str("{0}".format($.data).split("/")[-1])
              data: {get_attr: [ContrailAnalyticsDatabase, addresses, ctlplane, 0, subnets, 0, cidr]}  # << here
      ControlPlaneDefaultRoute:
        if:
          - ctlplane_default_route_set
          - {get_param: ControlPlaneDefaultRoute}
          - {get_attr: [ContrailAnalyticsDatabase, addresses, ctlplane, 0, subnets, 0, gateway_ip]}  # << here
.....
```
What you specify in the nic config templates (e.g. contrail-nic-config-Contrail.yaml, used for this role) as the default value for the ControlPlaneSubnetCidr/ControlPlaneDefaultRoute parameters is irrelevant.

If the attributes come back as None after the server has been created (or at least Nova tells us that), then you get the inconsistencies that you noticed. I still don't know why that should be the case; it may be something to do with the Contrail Neutron plugin.

The easiest way to work around this is to set ControlPlaneSubnetCidr and ControlPlaneDefaultRoute in contrail-net.yaml (as is done in contrail-single-net.yaml).
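As a sketch of that workaround (the exact values here are assumptions taken from the nic-config defaults quoted earlier in this bug; adjust them to the actual ctlplane subnet), contrail-net.yaml would gain something like:

```yaml
# Hypothetical parameter_defaults addition to contrail-net.yaml;
# values assume the 192.168.213.0/24 ctlplane from the quoted template.
parameter_defaults:
  ControlPlaneSubnetCidr: '24'
  ControlPlaneDefaultRoute: '192.168.213.1'
```

With these set, the `ctlplane_subnet_cidr_set` / `ctlplane_default_route_set` conditions in the role template take the parameter branch and the flaky `get_attr` fallback is never evaluated.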
OK, we can implement this workaround. But please note that these templates work correctly with RHOSP 13; something changed that caused the problem.

This change is required due to the simplification work that has been done on the spine-and-leaf parameters.

I had another look at this; I want to investigate a bit further. There seems to be a race, as the following lookups return None for one server in the role, and the proper value for the other servers in the role:
{get_attr: [ContrailAnalyticsDatabase, addresses, ctlplane, 0, subnets, 0, cidr]}
{get_attr: [ContrailAnalyticsDatabase, addresses, ctlplane, 0, subnets, 0, gateway_ip]}
In the cidr case, the yaql returns "None" as a string.
In the gateway_ip case, the result for None is "" (an empty string) in the template.
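For illustration, a minimal Python sketch (hypothetical, mimicking the yaql expression quoted above) shows why a None attribute ends up as the literal string "None" in the cidr case:

```python
# Mimics the yaql fallback used for ControlPlaneSubnetCidr:
#   str("{0}".format($.data).split("/")[-1])
def resolve_cidr(data):
    # For a real subnet cidr such as "192.168.213.0/24" this returns
    # the prefix length; for None, str.format renders the string "None",
    # which then passes through split() unchanged.
    return "{0}".format(data).split("/")[-1]

print(resolve_cidr("192.168.213.0/24"))  # -> 24
print(resolve_cidr(None))                # -> None (the literal string "None")
```

That "None" string is what later gets joined into `ip_netmask`, producing addresses like `192.168.213.62/None` in the generated os-net-config JSON.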
We can actually see that Heat fails to fetch the resource attributes here:
sosreport-undercloud-2021-01-19-rowsoys/var/log/containers/heat/heat-engine.log.1:2021-01-19 14:37:06.816 23 WARNING heat.engine.resources.openstack.nova.server [req-a2ff4a7d-03f8-4a13-8a6b-a41a667e5081 - admin - default default] Failed to fetch resource attributes: Port None could not be found.
This logging is from here in the code: https://opendev.org/openstack/heat/src/branch/master/heat/engine/resources/openstack/nova/server.py#L1156
So it seems we are passing "None" to the neutron show_port call on L1152.
AFAICT from the Nova logs, the server.interface_list() call at L1127 succeeded.
[root@hjensas nova]# egrep -R "2021-01-19 14:3" | grep os-interface
nova-api.log.1:2021-01-19 14:30:25.795 22 INFO nova.api.openstack.requestlog [req-5d93c974-110d-449d-a08e-baedcd197fb8 d9720f950cfb46ae83ccd19991d11659 61c79c312700420b999a25bd0dceee62 - default default] 192.168.213.1 "GET /v2.1/servers/8e0b3798-5ff7-42a4-8fdc-6329bca39080/os-interface" status: 200 len: 300 microversion: 2.79 time: 0.242941
nova-api.log.1:2021-01-19 14:37:03.717 21 INFO nova.api.openstack.requestlog [req-c6050784-e617-4135-bbd8-0bfab0a9a454 d9720f950cfb46ae83ccd19991d11659 61c79c312700420b999a25bd0dceee62 - default default] 192.168.213.1 "GET /v2.1/servers/87c7fd6e-1189-4995-a359-5204031d0b56/os-interface" status: 200 len: 302 microversion: 2.79 time: 0.160901
nova-api.log.1:2021-01-19 14:37:04.187 21 INFO nova.api.openstack.requestlog [req-33f92513-de1c-4549-b432-e4805048d084 d9720f950cfb46ae83ccd19991d11659 61c79c312700420b999a25bd0dceee62 - default default] 192.168.213.1 "GET /v2.1/servers/87c7fd6e-1189-4995-a359-5204031d0b56/os-interface" status: 200 len: 302 microversion: 2.79 time: 0.182955
nova-api.log.1:2021-01-19 14:37:05.776 27 INFO nova.api.openstack.requestlog [req-61dcfe5b-f591-458f-b0f9-8b681e51bd8f d9720f950cfb46ae83ccd19991d11659 61c79c312700420b999a25bd0dceee62 - default default] 192.168.213.1 "GET /v2.1/servers/301a47f5-2619-4f36-8c28-d454546a5ca1/os-interface" status: 200 len: 301 microversion: 2.79 time: 0.192470
nova-api.log.1:2021-01-19 14:37:06.192 24 INFO nova.api.openstack.requestlog [req-62922e60-4e4c-407b-889a-32a81e129d45 d9720f950cfb46ae83ccd19991d11659 61c79c312700420b999a25bd0dceee62 - default default] 192.168.213.1 "GET /v2.1/servers/1aac7f3b-e9e2-4d0b-8c4d-c0054f1a2c84/os-interface" status: 200 len: 301 microversion: 2.79 time: 0.189072
nova-api.log.1:2021-01-19 14:37:06.752 21 INFO nova.api.openstack.requestlog [req-c74bc9ef-f741-46e4-9760-26916bd35304 d9720f950cfb46ae83ccd19991d11659 61c79c312700420b999a25bd0dceee62 - default default] 192.168.213.1 "GET /v2.1/servers/1aac7f3b-e9e2-4d0b-8c4d-c0054f1a2c84/os-interface" status: 200 len: 301 microversion: 2.79 time: 0.146652
nova-api.log.1:2021-01-19 14:37:06.753 26 INFO nova.api.openstack.requestlog [req-b3532a9c-4dbf-4c95-bd26-0205b0d7e086 d9720f950cfb46ae83ccd19991d11659 61c79c312700420b999a25bd0dceee62 - default default] 192.168.213.1 "GET /v2.1/servers/301a47f5-2619-4f36-8c28-d454546a5ca1/os-interface" status: 200 len: 301 microversion: 2.79 time: 0.169607
I think we would have to add some debug logging in Heat to try to figure this out.
@Maciej, since this seems to be an intermittent problem that we are not able to reproduce internally, could you please add the debug logging shown below to the Heat engine running on your undercloud and try to reproduce? (Add each line starting with LOG.debug('BZ1902230 ...).)
The steps to edit the file are:

```
(undercloud) [centos@undercloud ~]$ sudo su -
[root@undercloud ~]# podman mount heat_engine
/var/lib/containers/storage/overlay/9a5f2ccd8d7d79eb6ec688fca72bbe837d58d0b89775e6af5f6e8e8fc797228d/merged
[root@undercloud ~]# vim /var/lib/containers/storage/overlay/9a5f2ccd8d7d79eb6ec688fca72bbe837d58d0b89775e6af5f6e8e8fc797228d/merged/usr/lib/python3.6/site-packages/heat/engine/resources/openstack/nova/server.py
*** make the changes and save the file ***
[root@undercloud ~]# podman umount heat_engine
[root@undercloud ~]# systemctl restart tripleo_heat_engine
```

NOTE: The long random string is unique to your deployment, so you can't simply copy and paste the commands above.
```python
def _add_attrs_for_address(self, server, extend_networks=True):
    """Adds port id, subnets and network attributes to addresses list.

    This method is used only for resolving attributes.

    :param server: The server resource
    :param extend_networks: When False the network is not extended, i.e
                            the net is returned without replacing name on
                            id.
    """
    LOG.debug('BZ1902230 - type server: %s', type(server))
    nets = copy.deepcopy(server.addresses) or {}
    LOG.debug('BZ1902230 - nets: %s', nets)
    ifaces = server.interface_list()
    LOG.debug('BZ1902230 - ifaces: %s', ifaces)
    ip_mac_mapping_on_port_id = dict(((iface.fixed_ips[0]['ip_address'],
                                       iface.mac_addr), iface.port_id)
                                     for iface in ifaces)
    LOG.debug('BZ1902230 - ip_mac_mapping_on_port_id: %s',
              ip_mac_mapping_on_port_id)
    for net_name in nets:
        for addr in nets[net_name]:
            addr['port'] = ip_mac_mapping_on_port_id.get(
                (addr['addr'], addr['OS-EXT-IPS-MAC:mac_addr']))
            # _get_live_networks() uses this method to get reality_nets.
            # We don't need to get subnets and network in that case. Only
            # do the external calls if extend_networks is true, i.e called
            # from _resolve_attribute()
            if not extend_networks:
                continue
            try:
                port = self.client('neutron').show_port(
                    addr['port'])['port']
            except Exception as ex:
                addr['subnets'], addr['network'] = None, None
                LOG.warning("Failed to fetch resource attributes: %s", ex)
                continue
            addr['subnets'] = self._get_subnets_attr(port['fixed_ips'])
            addr['network'] = self._get_network_attr(port['network_id'])
    if extend_networks:
        return self._extend_networks(nets)
    else:
        return nets
```
Hi, the environment was redeployed with the workaround. Currently the problem doesn't exist. When we hit it again I will get back to you.

(In reply to Maciej Relewicz from comment #21)
> Hi,
>
> Environment was redeployed with WA. Currently problem doesnt exist. When we
> hit it agan I wll back to you.

Ok, I will do some internal testing to see if I can reproduce the issue.

I was not able to reproduce this issue using a Heat stack. I did, however, succeed in creating a reproducer using Python and just a snippet of Heat code. There is a race in Nova+Ironic+Neutron, as it only happens for a few instances when creating ~10 instances at the same time (less than 20% occurrence on my test system). I will attach the Heat-based reproducer as well as the Python reproducer; in theory the Heat template reproducer should trigger it as well, but it didn't on my test environment.
The problem is that there is a MAC address mismatch between what is returned by the Nova "server.addresses" call at L1136 [1] and the server.interface_list() call at L1137 [2].

Heat builds a dict [3] "ip_mac_mapping_on_port_id" keyed by tuples built from the result of the server.interface_list() call; the dict keys are (ip_address, mac_address). The dict is used to resolve the Neutron port id. Then the IP address and MAC address from the "server.addresses" call at L1136 [1] are used to do a lookup in the "ip_mac_mapping_on_port_id" dict at L1143-L1144 [4]. This lookup fails because the MAC address (OS-EXT-IPS-MAC:mac_addr) from "server.addresses" does not always match the MAC address on the Neutron port.
This issue is likely to only happen when Ironic is used, since Ironic does a MAC address update on the neutron port setting the neutron ports MAC address to match the physical network interface.
I.e :
1. Nova creates neutron port
2. Neutron creates a port and auto-generates a MAC address
3. Port info is passed from Nova to Ironic
4. Ironic changes the MAC address of the neutron port to match physical hardware
5. In some cases, Nova still returns the "original" auto-generated MAC address in "server.addresses"
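The failing lookup described in the steps above can be sketched in a few lines of Python (the port id, IP, and MAC values here are illustrative, copied from the reproducer output below; the dict shape follows the Heat code quoted earlier):

```python
# Port as returned by server.interface_list(): Ironic has already updated
# the Neutron port's MAC address to match the physical NIC.
ifaces = [{"port_id": "edd51ad8-bf58-456c-b134-773e41dcb64f",
           "ip": "192.168.24.11",
           "mac": "fa:16:3e:e3:83:ba"}]

# Entry from server.addresses: Nova still reports the originally
# auto-generated MAC for the same fixed IP.
addr = {"addr": "192.168.24.11",
        "OS-EXT-IPS-MAC:mac_addr": "fa:16:3e:75:06:86"}

# Heat keys the mapping on the (ip_address, mac_address) tuple ...
ip_mac_mapping_on_port_id = {(i["ip"], i["mac"]): i["port_id"] for i in ifaces}

# ... so the lookup misses because the MACs differ: the port id resolves
# to None, show_port(None) fails, and subnets/gateway_ip come back as None,
# eventually yielding "x.x.x.x/None" in os-net-config.
port = ip_mac_mapping_on_port_id.get(
    (addr["addr"], addr["OS-EXT-IPS-MAC:mac_addr"]))
print(port)  # -> None
```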
Output of the reproducer
------------------------
(undercloud) [centos@undercloud reproducer]$ python3 reproducer.py
2021-02-08 09:26:29.246874 :: Started building Server: test-server-0
2021-02-08 09:26:30.305732 :: Started building Server: test-server-4
2021-02-08 09:26:30.346258 :: Started building Server: test-server-3
2021-02-08 09:26:29.214739 :: Started building Server: test-server-1
2021-02-08 09:26:32.525734 :: Started building Server: test-server-2
2021-02-08 09:26:31.730258 :: Started building Server: test-server-5
2021-02-08 09:26:34.174303 :: Started building Server: test-server-6
2021-02-08 09:26:35.317740 :: Started building Server: test-server-8
2021-02-08 09:26:37.816171 :: Started building Server: test-server-7
2021-02-08 09:26:38.352557 :: Started building Server: test-server-9
2021-02-08 09:33:36.471098 :: Server: test-server-1 :: >>>> OK <<<<
addr['port'] == 43abfccc-3908-4aee-9d16-caf7216715c7
nets == {'ctlplane': [{'version': 4, 'addr': '192.168.24.29', 'OS-EXT-IPS:type': 'fixed', 'OS-EXT-IPS-MAC:mac_addr': 'fa:16:3e:81:f9:dc', 'port': '43abfccc-3908-4aee-9d16-caf7216715c7'}]}
ifaces == [<NetworkInterface: 43abfccc-3908-4aee-9d16-caf7216715c7>]
ip_mac_mapping_on_port_id == {('192.168.24.29', 'fa:16:3e:81:f9:dc'): '43abfccc-3908-4aee-9d16-caf7216715c7'}
2021-02-08 09:33:56.018447 :: Server: test-server-5 :: >>>> REPRODUCED <<<<
!!!! addr['port'] == None
nets == {'ctlplane': [{'version': 4, 'addr': '192.168.24.11', 'OS-EXT-IPS:type': 'fixed', 'OS-EXT-IPS-MAC:mac_addr': 'fa:16:3e:75:06:86', 'port': None}]}
ifaces == [<NetworkInterface: edd51ad8-bf58-456c-b134-773e41dcb64f>]
ip_mac_mapping_on_port_id == {('192.168.24.11', 'fa:16:3e:e3:83:ba'): 'edd51ad8-bf58-456c-b134-773e41dcb64f'}
2021-02-08 09:34:08.696165 :: Server: test-server-4 :: >>>> REPRODUCED <<<<
!!!! addr['port'] == None
nets == {'ctlplane': [{'version': 4, 'addr': '192.168.24.13', 'OS-EXT-IPS:type': 'fixed', 'OS-EXT-IPS-MAC:mac_addr': 'fa:16:3e:3d:88:72', 'port': None}]}
ifaces == [<NetworkInterface: 24dcbefb-6d73-4fb9-b553-12f4fd44e98e>]
ip_mac_mapping_on_port_id == {('192.168.24.13', 'fa:16:3e:d9:8c:f0'): '24dcbefb-6d73-4fb9-b553-12f4fd44e98e'}
2021-02-08 09:34:10.301490 :: Server: test-server-3 :: >>>> OK <<<<
addr['port'] == 13ea58c4-d149-4ea0-b1a4-36e96b378002
nets == {'ctlplane': [{'version': 4, 'addr': '192.168.24.14', 'OS-EXT-IPS:type': 'fixed', 'OS-EXT-IPS-MAC:mac_addr': 'fa:16:3e:0f:25:91', 'port': '13ea58c4-d149-4ea0-b1a4-36e96b378002'}]}
ifaces == [<NetworkInterface: 13ea58c4-d149-4ea0-b1a4-36e96b378002>]
ip_mac_mapping_on_port_id == {('192.168.24.14', 'fa:16:3e:0f:25:91'): '13ea58c4-d149-4ea0-b1a4-36e96b378002'}
2021-02-08 09:34:34.959875 :: Server: test-server-9 :: >>>> OK <<<<
addr['port'] == 9c2703be-b24a-4359-b690-62367427c665
nets == {'ctlplane': [{'version': 4, 'addr': '192.168.24.24', 'OS-EXT-IPS:type': 'fixed', 'OS-EXT-IPS-MAC:mac_addr': 'fa:16:3e:73:eb:0d', 'port': '9c2703be-b24a-4359-b690-62367427c665'}]}
ifaces == [<NetworkInterface: 9c2703be-b24a-4359-b690-62367427c665>]
ip_mac_mapping_on_port_id == {('192.168.24.24', 'fa:16:3e:73:eb:0d'): '9c2703be-b24a-4359-b690-62367427c665'}
2021-02-08 09:34:36.400174 :: Server: test-server-0 :: >>>> OK <<<<
addr['port'] == a42e342e-9b12-48ed-8917-b0c6e5931334
nets == {'ctlplane': [{'version': 4, 'addr': '192.168.24.21', 'OS-EXT-IPS:type': 'fixed', 'OS-EXT-IPS-MAC:mac_addr': 'fa:16:3e:71:9e:25', 'port': 'a42e342e-9b12-48ed-8917-b0c6e5931334'}]}
ifaces == [<NetworkInterface: a42e342e-9b12-48ed-8917-b0c6e5931334>]
ip_mac_mapping_on_port_id == {('192.168.24.21', 'fa:16:3e:71:9e:25'): 'a42e342e-9b12-48ed-8917-b0c6e5931334'}
2021-02-08 09:34:39.679962 :: Server: test-server-2 :: >>>> OK <<<<
addr['port'] == 55812703-9d06-4e34-8e2b-ed026c720217
nets == {'ctlplane': [{'version': 4, 'addr': '192.168.24.20', 'OS-EXT-IPS:type': 'fixed', 'OS-EXT-IPS-MAC:mac_addr': 'fa:16:3e:c9:b4:f9', 'port': '55812703-9d06-4e34-8e2b-ed026c720217'}]}
ifaces == [<NetworkInterface: 55812703-9d06-4e34-8e2b-ed026c720217>]
ip_mac_mapping_on_port_id == {('192.168.24.20', 'fa:16:3e:c9:b4:f9'): '55812703-9d06-4e34-8e2b-ed026c720217'}
2021-02-08 09:34:46.754812 :: Server: test-server-6 :: >>>> OK <<<<
addr['port'] == ed1281e8-3af6-4661-abd9-819e602eceb4
nets == {'ctlplane': [{'version': 4, 'addr': '192.168.24.17', 'OS-EXT-IPS:type': 'fixed', 'OS-EXT-IPS-MAC:mac_addr': 'fa:16:3e:d7:39:e8', 'port': 'ed1281e8-3af6-4661-abd9-819e602eceb4'}]}
ifaces == [<NetworkInterface: ed1281e8-3af6-4661-abd9-819e602eceb4>]
ip_mac_mapping_on_port_id == {('192.168.24.17', 'fa:16:3e:d7:39:e8'): 'ed1281e8-3af6-4661-abd9-819e602eceb4'}
2021-02-08 09:34:47.008518 :: Server: test-server-8 :: >>>> OK <<<<
addr['port'] == 79528a4b-1721-4725-a480-a04c41afc48e
nets == {'ctlplane': [{'version': 4, 'addr': '192.168.24.15', 'OS-EXT-IPS:type': 'fixed', 'OS-EXT-IPS-MAC:mac_addr': 'fa:16:3e:2a:ed:ff', 'port': '79528a4b-1721-4725-a480-a04c41afc48e'}]}
ifaces == [<NetworkInterface: 79528a4b-1721-4725-a480-a04c41afc48e>]
ip_mac_mapping_on_port_id == {('192.168.24.15', 'fa:16:3e:2a:ed:ff'): '79528a4b-1721-4725-a480-a04c41afc48e'}
2021-02-08 09:34:47.108197 :: Server: test-server-7 :: >>>> OK <<<<
addr['port'] == 99efe0ee-4321-40c9-953f-f898b67ea5bc
nets == {'ctlplane': [{'version': 4, 'addr': '192.168.24.30', 'OS-EXT-IPS:type': 'fixed', 'OS-EXT-IPS-MAC:mac_addr': 'fa:16:3e:1c:50:42', 'port': '99efe0ee-4321-40c9-953f-f898b67ea5bc'}]}
ifaces == [<NetworkInterface: 99efe0ee-4321-40c9-953f-f898b67ea5bc>]
ip_mac_mapping_on_port_id == {('192.168.24.30', 'fa:16:3e:1c:50:42'): '99efe0ee-4321-40c9-953f-f898b67ea5bc'}
[1] https://opendev.org/openstack/heat/src/branch/master/heat/engine/resources/openstack/nova/server.py#L1136
[2] https://opendev.org/openstack/heat/src/branch/master/heat/engine/resources/openstack/nova/server.py#L1137
[3] https://opendev.org/openstack/heat/src/branch/master/heat/engine/resources/openstack/nova/server.py#L1138-L1140
[4] https://opendev.org/openstack/heat/src/branch/master/heat/engine/resources/openstack/nova/server.py#L1143-L1144
Created attachment 1755673 [details]
Python reproducer script
Created attachment 1755675 [details]
Heat reproducer template
NOTE: I wasn't able to reproduce with this. But it should reproduce the issue; it seems on my lab the timings never added up to trigger the race.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1.9 bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:8795 |
Description of problem:

During overcloud deployment, the configuration for os-net-config is generated incorrectly. The problem usually occurs on one node of a role, while the remaining nodes of the same role are unaffected. In the example below, the two nodes overcloudlek-cadb-0 and overcloudlek-cadb-1 have the same role (ContrailAnalyticsDatabase) and the same network configuration in YAML, yet their generated os-net-config differs. The configuration on overcloudlek-cadb-0 is broken, and it was already broken when delivered by the undercloud.

On the nodes:

```
[root@overcloudlek-cadb-0 ~]# cat /etc/os-net-config/config.json
{"network_config": [{"addresses": [{"ip_netmask": "192.168.213.62/None"}], "dns_servers": ["10.10.11.50", "10.10.12.50", "10.10.11.60", "10.10.12.60"], "mtu": 1500, "name": "nic1", "routes": [{"default": true, "next_hop": ""}], "type": "interface", "use_dhcp": false}, {"addresses": [{"ip_netmask": "172.16.0.150/24"}], "device": "nic1", "mtu": 1500, "type": "vlan", "vlan_id": 226}, {"name": "nic2", "type": "interface", "use_dhcp": false}, {"addresses": [{"ip_netmask": "172.16.81.186/24"}], "mtu": 1500, "name": "nic3", "type": "interface"}]}

[root@overcloudlek-cadb-1 ~]# cat /etc/os-net-config/config.json
{"network_config": [{"addresses": [{"ip_netmask": "192.168.213.184/24"}], "dns_servers": ["10.10.11.50", "10.10.12.50", "10.10.11.60", "10.10.12.60"], "mtu": 1500, "name": "nic1", "routes": [{"default": true, "next_hop": "192.168.213.1"}], "type": "interface", "use_dhcp": false}, {"addresses": [{"ip_netmask": "172.16.0.132/24"}], "device": "nic1", "mtu": 1500, "type": "vlan", "vlan_id": 226}, {"name": "nic2", "type": "interface", "use_dhcp": false}, {"addresses": [{"ip_netmask": "172.16.81.156/24"}], "mtu": 1500, "name": "nic3", "type": "interface"}]}
```

On the undercloud:

```
(undercloud) [stack@undercloud ansible]$ diff /var/lib/mistral/overcloud/ContrailAnalyticsDatabase/overcloudlek-cadb-0/NetworkConfig /var/lib/mistral/overcloud/ContrailAnalyticsDatabase/overcloudlek-cadb-1/NetworkConfig
11c11
< # {"network_config": [{"addresses": [{"ip_netmask": "{{ ctlplane_ip }}/None"}], "dns_servers": ["10.10.11.50", "10.10.12.50", "10.10.11.60", "10.10.12.60"], "mtu": 1500, "name": "nic1", "routes": [{"default": true, "next_hop": ""}], "type": "interface", "use_dhcp": false}, {"addresses": [{"ip_netmask": "{{ internal_api_ip ~ '/' ~ internal_api_cidr }}"}], "device": "nic1", "mtu": 1500, "type": "vlan", "vlan_id": 226}, {"name": "nic2", "type": "interface", "use_dhcp": false}, {"addresses": [{"ip_netmask": "{{ tenant_ip ~ '/' ~ tenant_cidr }}"}], "mtu": 1500, "name": "nic3", "type": "interface"}]} : the json serialized os-net-config config to apply
---
> # {"network_config": [{"addresses": [{"ip_netmask": "{{ ctlplane_ip }}/24"}], "dns_servers": ["10.10.11.50", "10.10.12.50", "10.10.11.60", "10.10.12.60"], "mtu": 1500, "name": "nic1", "routes": [{"default": true, "next_hop": "192.168.213.1"}], "type": "interface", "use_dhcp": false}, {"addresses": [{"ip_netmask": "{{ internal_api_ip ~ '/' ~ internal_api_cidr }}"}], "device": "nic1", "mtu": 1500, "type": "vlan", "vlan_id": 226}, {"name": "nic2", "type": "interface", "use_dhcp": false}, {"addresses": [{"ip_netmask": "{{ tenant_ip ~ '/' ~ tenant_cidr }}"}], "mtu": 1500, "name": "nic3", "type": "interface"}]} : the json serialized os-net-config config to apply
67c67
< if [ -n '{"network_config": [{"addresses": [{"ip_netmask": "{{ ctlplane_ip }}/None"}], "dns_servers": ["10.10.11.50", "10.10.12.50", "10.10.11.60", "10.10.12.60"], "mtu": 1500, "name": "nic1", "routes": [{"default": true, "next_hop": ""}], "type": "interface", "use_dhcp": false}, {"addresses": [{"ip_netmask": "{{ internal_api_ip ~ '/' ~ internal_api_cidr }}"}], "device": "nic1", "mtu": 1500, "type": "vlan", "vlan_id": 226}, {"name": "nic2", "type": "interface", "use_dhcp": false}, {"addresses": [{"ip_netmask": "{{ tenant_ip ~ '/' ~ tenant_cidr }}"}], "mtu": 1500, "name": "nic3", "type": "interface"}]}' ]; then
---
> if [ -n '{"network_config": [{"addresses": [{"ip_netmask": "{{ ctlplane_ip }}/24"}], "dns_servers": ["10.10.11.50", "10.10.12.50", "10.10.11.60", "10.10.12.60"], "mtu": 1500, "name": "nic1", "routes": [{"default": true, "next_hop": "192.168.213.1"}], "type": "interface", "use_dhcp": false}, {"addresses": [{"ip_netmask": "{{ internal_api_ip ~ '/' ~ internal_api_cidr }}"}], "device": "nic1", "mtu": 1500, "type": "vlan", "vlan_id": 226}, {"name": "nic2", "type": "interface", "use_dhcp": false}, {"addresses": [{"ip_netmask": "{{ tenant_ip ~ '/' ~ tenant_cidr }}"}], "mtu": 1500, "name": "nic3", "type": "interface"}]}' ]; then
80c80
< echo '{"network_config": [{"addresses": [{"ip_netmask": "{{ ctlplane_ip }}/None"}], "dns_servers": ["10.10.11.50", "10.10.12.50", "10.10.11.60", "10.10.12.60"], "mtu": 1500, "name": "nic1", "routes": [{"default": true, "next_hop": ""}], "type": "interface", "use_dhcp": false}, {"addresses": [{"ip_netmask": "{{ internal_api_ip ~ '/' ~ internal_api_cidr }}"}], "device": "nic1", "mtu": 1500, "type": "vlan", "vlan_id": 226}, {"name": "nic2", "type": "interface", "use_dhcp": false}, {"addresses": [{"ip_netmask": "{{ tenant_ip ~ '/' ~ tenant_cidr }}"}], "mtu": 1500, "name": "nic3", "type": "interface"}]}' > /etc/os-net-config/config.json
---
> echo '{"network_config": [{"addresses": [{"ip_netmask": "{{ ctlplane_ip }}/24"}], "dns_servers": ["10.10.11.50", "10.10.12.50", "10.10.11.60", "10.10.12.60"], "mtu": 1500, "name": "nic1", "routes": [{"default": true, "next_hop": "192.168.213.1"}], "type": "interface", "use_dhcp": false}, {"addresses": [{"ip_netmask": "{{ internal_api_ip ~ '/' ~ internal_api_cidr }}"}], "device": "nic1", "mtu": 1500, "type": "vlan", "vlan_id": 226}, {"name": "nic2", "type": "interface", "use_dhcp": false}, {"addresses": [{"ip_netmask": "{{ tenant_ip ~ '/' ~ tenant_cidr }}"}], "mtu": 1500, "name": "nic3", "type": "interface"}]}' > /etc/os-net-config/config.json
```

Version-Release number of selected component (if applicable):
rhosp16.1

How reproducible:

Steps to Reproduce:
1.
2.
3.
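The two tell-tale symptoms on the broken node are an address ending in "/None" (consistent with a subnet prefix attribute resolving to a null value before string interpolation) and a default route with an empty next_hop. As a minimal sketch, a small script like the following could be run on a node to detect this kind of broken config before os-net-config applies it; the function name and error messages are illustrative, not part of any TripleO tooling:

```python
import json

def find_bad_entries(path="/etc/os-net-config/config.json"):
    """Scan an os-net-config JSON config for the symptoms seen on
    overcloudlek-cadb-0: a netmask rendered as "/None" and a default
    route whose next_hop is empty."""
    with open(path) as f:
        cfg = json.load(f)
    problems = []
    for iface in cfg.get("network_config", []):
        # VLAN entries use "device" instead of "name"
        name = iface.get("name") or iface.get("device", "?")
        for addr in iface.get("addresses", []):
            if addr.get("ip_netmask", "").endswith("/None"):
                problems.append("%s: bad netmask %s" % (name, addr["ip_netmask"]))
        for route in iface.get("routes", []):
            if route.get("default") and not route.get("next_hop"):
                problems.append("%s: default route with empty next_hop" % name)
    return problems
```

Run against the config.json from overcloudlek-cadb-0 above, this would report two problems on nic1; the config from overcloudlek-cadb-1 would come back clean.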
Actual results:


Expected results:
os-net-config generated correctly

Additional info: