Bug 2120156

Summary: Two ctlplane interfaces are created on the overcloud when using some custom NIC templates.
Product: Red Hat OpenStack
Reporter: yatanaka
Component: openstack-tripleo-common
Assignee: Harald Jensås <hjensas>
Status: CLOSED ERRATA
QA Contact: Joe H. Rahme <jhakimra>
Severity: medium
Priority: medium
Version: 17.0 (Wallaby)
CC: hjensas, igallagh, jschluet, mburns, ramishra, sbaker, slinaber, spower, tkajinam
Target Milestone: ga
Keywords: Triaged
Target Release: 17.0
Hardware: x86_64
OS: Linux
Fixed In Version: openstack-tripleo-common-15.4.1-0.20220705010409.51f6577.el9ost
Last Closed: 2022-09-21 12:24:47 UTC
Type: Bug

Description yatanaka 2022-08-22 02:10:09 UTC
Description of problem:

I executed `openstack overcloud node provision` with the following baremetal deployment file and custom NIC template. The template is quite similar to /usr/share/ansible/roles/tripleo_network_config/templates/bonds_vlans/bonds_vlans.j2:
~~~
(undercloud) [stack@undercloud ~]$ cat /home/stack/templates/overcloud-baremetal-deploy.yaml
- name: Controller
  count: 3
  defaults:
    networks:
      #    - network: ctlplane
      #      vif: true
    - network: external
      subnet: external_subnet
    - network: internal_api
      subnet: internal_api_subnet
    - network: storage
      subnet: storage_subnet
    - network: storage_mgmt
      subnet: storage_mgmt_subnet
    - network: tenant
      subnet: tenant_subnet
    network_config:
      template: /home/stack/templates/two_interfaces.j2
      default_route_network:
      - external
  instances:
  - hostname: overcloud-controller-0
    name: controller0
  - hostname: overcloud-controller-1
    name: controller1 
  - hostname: overcloud-controller-2
    name: controller2
- name: Compute
  count: 2
  defaults:
    networks:
      #    - network: ctlplane
      #      vif: true
    - network: internal_api
      subnet: internal_api_subnet
    - network: tenant
      subnet: tenant_subnet
    - network: storage
      subnet: storage_subnet
    network_config:
      template: /home/stack/templates/two_interfaces.j2 <============(*)
  instances:
  - hostname: overcloud-novacompute-0
    name: compute0
  - hostname: overcloud-novacompute-1
    name: compute1


[stack@undercloud ~]$ cat templates/two_interfaces.j2
---
{% set mtu_list = [ctlplane_mtu] %}
{% for network in role_networks %}
{{ mtu_list.append(lookup('vars', networks_lower[network] ~ '_mtu')) }}
{%- endfor %}
{% set min_viable_mtu = mtu_list | max %}
network_config:
- type: interface
  name: nic1
  mtu: {{ ctlplane_mtu }}
  use_dhcp: false
  addresses:
  - ip_netmask: {{ ctlplane_ip }}/{{ ctlplane_subnet_cidr }}
  routes: {{ ctlplane_host_routes }}
- type: ovs_bridge
  name: {{ neutron_physical_bridge_name }}
  dns_servers: {{ ctlplane_dns_nameservers }}
  domain: {{ dns_search_domains }}
  members:
  - type: interface
    name: nic2
    mtu: {{ min_viable_mtu }}
    primary: true
{% for network in role_networks %}
  - type: vlan
    mtu: {{ lookup('vars', networks_lower[network] ~ '_mtu') }}
    vlan_id: {{ lookup('vars', networks_lower[network] ~ '_vlan_id') }}
    addresses:
    - ip_netmask: {{ lookup('vars', networks_lower[network] ~ '_ip') }}/{{ lookup('vars', networks_lower[network] ~ '_cidr') }}
    routes: {{ lookup('vars', networks_lower[network] ~ '_host_routes') }}
{% endfor %}


(undercloud) [stack@undercloud ~]$ openstack overcloud node provision --stack overcloud --network-config --output /home/stack/templates/overcloud-baremetal-deployed.yaml /home/stack/templates/overcloud-baremetal-deploy.yaml
~~~


Two ctlplane interfaces were created on the overcloud nodes:
~~~
[root@overcloud-controller-0 ~]# cat /etc/os-net-config/config.yaml
---
network_config:
- type: interface
  name: nic1
  mtu: 1500
  use_dhcp: false
  addresses:
  - ip_netmask: 192.168.24.22/24 <===============================(*)
  routes: []
- type: ovs_bridge
  name: br-ex
  dns_servers: ['10.0.0.1']
  domain: []
  members:
  - type: interface
    name: nic2
    mtu: 1500
    primary: true
  - type: vlan
    mtu: 1500
    vlan_id: 50
    addresses:
    - ip_netmask: 172.16.0.127/24
    routes: []
  - type: vlan
    mtu: 1500
    vlan_id: 1
    addresses:
    - ip_netmask: 192.168.24.22/24 <===============================(*)
    routes: []
  - type: vlan
    mtu: 1500
    vlan_id: 40
    addresses:
    - ip_netmask: 172.16.3.150/24
    routes: []
  - type: vlan
    mtu: 1500
    vlan_id: 30
    addresses:
    - ip_netmask: 172.16.1.26/24
    routes: []
  - type: vlan
    mtu: 1500
    vlan_id: 10
    addresses:
    - ip_netmask: 10.0.0.182/24
    routes: [{'default': True, 'nexthop': '10.0.0.1'}]
  - type: vlan
    mtu: 1500
    vlan_id: 20
    addresses:
    - ip_netmask: 172.16.2.186/24
    routes: []

[root@overcloud-controller-0 ~]# ip -o -4 a 
1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
2: enp1s0    inet 192.168.24.22/24 brd 192.168.24.255 scope global enp1s0\       valid_lft forever preferred_lft forever <===============================(*)
14: vlan30    inet 172.16.1.26/24 brd 172.16.1.255 scope global vlan30\       valid_lft forever preferred_lft forever
15: vlan20    inet 172.16.2.186/24 brd 172.16.2.255 scope global vlan20\       valid_lft forever preferred_lft forever
16: vlan50    inet 172.16.0.127/24 brd 172.16.0.255 scope global vlan50\       valid_lft forever preferred_lft forever
17: vlan1    inet 192.168.24.22/24 brd 192.168.24.255 scope global vlan1\       valid_lft forever preferred_lft forever <===============================(*)
18: vlan40    inet 172.16.3.150/24 brd 172.16.3.255 scope global vlan40\       valid_lft forever preferred_lft forever
19: vlan10    inet 10.0.0.182/24 brd 10.0.0.255 scope global vlan10\       valid_lft forever preferred_lft forever
~~~
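The duplicated address (on nic1 and on vlan1) is easy to miss in a long config. A minimal Python sketch, assuming the `config.yaml` above has already been parsed into lists and dicts (for example with a YAML library, not shown), that lists every `ip_netmask` assigned to more than one entry, including entries nested under a bridge's `members`:

```python
# Minimal sketch: find IP addresses assigned more than once in an
# os-net-config "network_config" list. Assumes the YAML was already parsed
# into Python lists/dicts; the parsing step itself is not shown.
from collections import Counter


def iter_entries(entries):
    """Yield every interface/bridge/vlan entry, descending into 'members'."""
    for entry in entries:
        yield entry
        yield from iter_entries(entry.get("members", []))


def duplicate_addresses(network_config):
    """Return the ip_netmask values that appear on more than one entry."""
    counts = Counter(
        addr["ip_netmask"]
        for entry in iter_entries(network_config)
        for addr in entry.get("addresses", [])
    )
    return sorted(ip for ip, n in counts.items() if n > 1)


# Abbreviated version of the controller config shown above.
config = [
    {"type": "interface", "name": "nic1",
     "addresses": [{"ip_netmask": "192.168.24.22/24"}]},
    {"type": "ovs_bridge", "name": "br-ex", "members": [
        {"type": "interface", "name": "nic2", "primary": True},
        {"type": "vlan", "vlan_id": 50,
         "addresses": [{"ip_netmask": "172.16.0.127/24"}]},
        {"type": "vlan", "vlan_id": 1,
         "addresses": [{"ip_netmask": "192.168.24.22/24"}]},
    ]},
]

print(duplicate_addresses(config))  # ['192.168.24.22/24']
```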

The cause is that `role_networks` contains `ctlplane`, so the ctlplane address is configured twice: once on nic1 and once on a VLAN generated by the `role_networks` loop.
This occurs even when `ctlplane` is commented out in overcloud-baremetal-deploy.yaml.

I think the same issue can occur with the following example templates, because they also iterate over `role_networks`:

  - https://github.com/openstack/tripleo-ansible/blob/stable/wallaby/tripleo_ansible/roles/tripleo_network_config/templates/single_nic_vlans/single_nic_vlans.j2
  - https://github.com/openstack/tripleo-ansible/blob/stable/wallaby/tripleo_ansible/roles/tripleo_network_config/templates/single_nic_linux_bridge_vlans/single_nic_linux_bridge_vlans.j2
  - https://github.com/openstack/tripleo-ansible/blob/stable/wallaby/tripleo_ansible/roles/tripleo_network_config/templates/bonds_vlans/bonds_vlans.j2
  - https://github.com/openstack/tripleo-ansible/blob/stable/wallaby/tripleo_ansible/roles/tripleo_network_config/templates/2_linux_bonds_vlans/2_linux_bonds_vlans.j2

On the other hand, the following examples are not affected, because they iterate over `networks_all`, which does not contain `ctlplane`:

  - https://github.com/openstack/tripleo-ansible/blob/stable/wallaby/tripleo_ansible/roles/tripleo_network_config/templates/multiple_nics/multiple_nics.j2
  - https://github.com/openstack/tripleo-ansible/blob/stable/wallaby/tripleo_ansible/roles/tripleo_network_config/templates/multiple_nics_vlans/multiple_nics_vlans.j2


As a workaround, I changed the custom NIC template as shown below, so that `ctlplane` is excluded when looping over `role_networks`:
~~~
<Changed from>
{% for network in role_networks %}

<Changed to>
{% for network in role_networks if network not in ["ctlplane"] %}
~~~
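For readers less familiar with Jinja2 loop filtering (`{% for x in seq if cond %}`), this plain-Python sketch mirrors what the changed loop does: every network in `role_networks` yields one VLAN entry unless it is filtered out. The variable names follow the template (`role_networks`, `networks_lower`, `<net>_vlan_id`, `<net>_ip`, `<net>_cidr`), but the values are illustrative assumptions based on a subset of the controller output above, not facts read from a real deployment.

```python
# Minimal sketch: a plain-Python stand-in for the template's VLAN loop.
# Values below are hypothetical, loosely based on the controller output
# above; they are not real Ansible host variables.
host_vars = {
    "ctlplane_vlan_id": 1, "ctlplane_ip": "192.168.24.22", "ctlplane_cidr": 24,
    "tenant_vlan_id": 50, "tenant_ip": "172.16.0.127", "tenant_cidr": 24,
    "external_vlan_id": 10, "external_ip": "10.0.0.182", "external_cidr": 24,
}
networks_lower = {"ctlplane": "ctlplane", "Tenant": "tenant", "External": "external"}
role_networks = ["Tenant", "ctlplane", "External"]  # contains ctlplane, as in this bug


def render_vlans(networks, exclude=()):
    """Return one VLAN entry per network, skipping names in `exclude`
    (like `for network in role_networks if network not in [...]`)."""
    vlans = []
    for network in networks:
        if network in exclude:
            continue
        net = networks_lower[network]
        vlans.append({
            "type": "vlan",
            "vlan_id": host_vars[f"{net}_vlan_id"],
            "ip_netmask": f"{host_vars[net + '_ip']}/{host_vars[net + '_cidr']}",
        })
    return vlans


before = render_vlans(role_networks)
after = render_vlans(role_networks, exclude=["ctlplane"])
print([v["vlan_id"] for v in before])  # [50, 1, 10]
print([v["vlan_id"] for v in after])   # [50, 10]
```

Without the filter, the ctlplane address ends up on a VLAN in addition to nic1; with it, only the non-ctlplane networks produce VLAN members.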


Version-Release number of selected component (if applicable):
RHOSP 17.0 beta


How reproducible:
Execute `openstack overcloud node provision` using `bonds_vlans.j2`.


Actual results:
Two ctlplane interfaces are created.


Expected results:
Only one ctlplane interface is created.

Comment 17 errata-xmlrpc 2022-09-21 12:24:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:6543