
Bug 2219641

Summary:	metalsmith does not recognize subnet field under network in overcloud-baremetal-deploy.yaml as per documentation
Product:	Red Hat OpenStack	Reporter:	Jaison Raju <jraju>
Component:	python-metalsmith	Assignee:	Harald Jensås <hjensas>
Status:	CLOSED ERRATA	QA Contact:	James E. LaBarre <jlabarre>
Severity:	high	Docs Contact:
Priority:	high
Version:	17.1 (Wallaby)	CC:	eshames, hjensas, jraju, mariel, sbaker
Target Milestone:	z2	Keywords:	Triaged
Target Release:	17.1
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:	python-metalsmith-1.4.4-17.1.20230815101022.5e7461e.el9ost	Doc Type:	No Doc Update
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2024-01-16 14:32:47 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Jaison Raju 2023-07-04 17:06:45 UTC

Description of problem:
The example in section 5 of https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/17.0/html/director_installation_and_usage/assembly_provisioning-and-deploying-your-overcloud#proc_provisioning-bare-metal-nodes-for-the-overcloud_ironic_provisioning, which adds subnet information to overcloud-baremetal-deploy.yaml, fails on the current 17.1 package (python3-metalsmith-1.4.4-1.20230517141000.5e7461e.el9ost.noarch.rpm).

Our downstream package is missing this patch:
commit 264836d59ac741424c3fad4d47e51073722c848f
Author: Harald Jensås <hjensas>
Date:   Thu Dec 9 15:20:29 2021 +0100

    Allow both 'network' and 'subnet' in NIC


Version-Release number of selected component (if applicable):
17.1
python3-metalsmith-1.4.4-1.20230517141000.5e7461e.el9ost.noarch.rpm

How reproducible:
Always

Steps to Reproduce:
1.
2.
3.

Actual results:
The following command fails.
(undercloud) [stack@undercloud ~]$ openstack overcloud node provision --stack central --network-config -o /home/stack/templates/central/deployed_metal.yaml /home/stack/templates/central/overcloud-baremetal-deploy.yaml
The error seen (formatted in a more readable form) is:
Deploy attempt failed on node f18-h21-000-r640.tng.rdu2.scalelab.redhat.com (UUID 2fcd3a61-e212-48fc-9c4c-91b832a3ca9e), cleaning up
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/metalsmith/_provisioner.py", line 393, in provision_node
    nics.validate()
  File "/usr/lib/python3.9/site-packages/metalsmith/_nics.py", line 60, in validate
    result.append(('network', self._get_network(nic)))
  File "/usr/lib/python3.9/site-packages/metalsmith/_nics.py", line 136, in _get_network
    raise exceptions.InvalidNIC(
metalsmith.exceptions.InvalidNIC: Unexpected fields for a network: subnet
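
The failure mode in the traceback can be illustrated with a minimal sketch. This is not the metalsmith source: `check_network_nic`, the local `InvalidNIC` class, and both allow-lists are assumptions made for illustration. It only shows how a key allow-list check produces the message above, and how adding 'subnet' to the allow-list (the intent of the missing patch) would let the same NIC through.

```python
class InvalidNIC(ValueError):
    """Local stand-in for metalsmith.exceptions.InvalidNIC (illustration only)."""


def check_network_nic(nic, allowed_fields):
    """Raise InvalidNIC if the NIC dict has keys outside allowed_fields."""
    unexpected = sorted(set(nic) - set(allowed_fields))
    if unexpected:
        raise InvalidNIC(
            "Unexpected fields for a network: %s" % ", ".join(unexpected))
    return nic


# Assumed allow-lists; the real lists live in metalsmith/_nics.py.
FIELDS_WITHOUT_PATCH = {"network", "fixed_ip"}
FIELDS_WITH_PATCH = {"network", "fixed_ip", "subnet"}

# A NIC request that names both a network and a subnet.
nic = {"network": "ctlplane", "subnet": "leaf0"}

try:
    check_network_nic(nic, FIELDS_WITHOUT_PATCH)
except InvalidNIC as exc:
    error_message = str(exc)
else:
    error_message = None

# With 'subnet' in the allow-list, the same NIC passes validation.
accepted = check_network_nic(nic, FIELDS_WITH_PATCH)
```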

Expected results:


Additional info:
We are testing edge deployments with L3 routes on 17.1. This is a blocker for us.
I will try to see if I can manually patch just this fix.

Comment 1 Harald Jensås 2023-07-06 18:00:19 UTC

Hey, we have a job doing this in downstream CI:

- name: Compute1
  count: 2
  hostname_format: 'c1-compute-%index%'
  defaults:
    profile: compute
    network_config:
      template: /home/stack/virt/network/three-nics-vlans/compute.j2
    networks:
    - network: ctlplane
      vif: true
    - network: internal_api
      subnet: internal_api1_subnet
    - network: storage
      subnet: storage1_subnet
    - network: tenant
      subnet: tenant1_subnet
  instances:
  - name: compute-0
    hostname: compute-0
  - name: compute-1
    hostname: compute-1

Which example are you using? Is it the specific nodes example?

Can you attach your /home/stack/templates/central/overcloud-baremetal-deploy.yaml file, please?

Comment 2 Harald Jensås 2023-07-06 18:17:30 UTC

So this is happening on the "ctlplane" network?
In that case, the workaround is simply to omit the subnet.
The correct subnet will be used automatically, based on the physical_network bridge mappings in neutron.
The physical network property on the baremetal ports must be set, but this happens automatically when you introspect the nodes with OSP 17.x director.
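
As a rough sketch of that workaround, the snippet below removes the 'subnet' key from every "ctlplane" entry in a role list that has already been loaded into Python data (for example with a YAML parser). The helper name `drop_ctlplane_subnet` and the sample role are hypothetical, and only entries under `defaults.networks` are handled; per-instance network lists would need the same treatment.

```python
import copy


def drop_ctlplane_subnet(roles, network_name="ctlplane"):
    """Return a copy of roles with 'subnet' removed from ctlplane entries."""
    result = copy.deepcopy(roles)
    for role in result:
        for net in role.get("defaults", {}).get("networks", []):
            if net.get("network") == network_name:
                # Let neutron pick the subnet from the physical network.
                net.pop("subnet", None)
    return result


# Illustrative input; the names are sample values, not a recommended layout.
roles = [
    {
        "name": "Controller0",
        "count": 3,
        "defaults": {
            "networks": [
                {"network": "ctlplane", "subnet": "leaf0", "vif": True},
                {"network": "storage", "subnet": "storage_leaf0_subnet"},
            ],
        },
    },
]

patched = drop_ctlplane_subnet(roles)
```

The input list is left unmodified, so the original definition can be kept for use once a fixed package is installed.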

Comment 3 Jaison Raju 2023-07-18 02:05:33 UTC

Introspection succeeds. I got this example from the document.
I am not sure if it is happening for ctlplane. Here is the file I used. Since I am doing this on baremetal, I made a few changes, but this file should be similar:

cat templates/central/overcloud-baremetal-deploy.yaml-bak
- name: Controller0
  count: 3
  defaults:
    resource_class: baremetal.control
    profile: control
    network_config:
      default_route_network:
      - External
      template: /home/stack/templates/central/network/leaf0/controller0.j2
    networks:
    - network: ctlplane
      subnet: leaf0
      vif: true
    - network: storage
      subnet: storage_leaf0_subnet
    - network: storage_mgmt
      subnet: storage_mgmt_leaf0_subnet
    - network: internal_api
      subnet: internalapi_leaf0_subnet
    - network: tenant
      subnet: tenant_leaf0_subnet
    - network: external
      subnet: external_leaf0_subnet
  ansible_playbooks:
    - playbook: /usr/share/ansible/tripleo-playbooks/cli-overcloud-node-growvols.yaml
      extra_vars:
        growvols_args: >
          /=10GB
          /tmp=1GB
          /var/log=10GB
          /var/log/audit=1GB
          /home=10GB
          /srv=10GB
          /var=100%
- name: ComputeHCI-r640
  count: 4
  defaults:
    resource_class: baremetal.computel0
    profile: compute
    network_config:
      template: /home/stack/templates/central/network/leaf0/computehci-r640.j2
    networks:
    - network: ctlplane
      subnet: leaf0
      vif: true
    - network: storage
      subnet: storage_leaf0_subnet
    - network: internal_api
      subnet: internalapi_leaf0_subnet
    - network: tenant
      subnet: tenant_leaf0_subnet
  ansible_playbooks:
    - playbook: /usr/share/ansible/tripleo-playbooks/cli-overcloud-node-growvols.yaml
      extra_vars:
        growvols_args: >
          /=10GB
          /tmp=1GB
          /var/log=10GB
          /var/log/audit=1GB
          /home=10GB
          /srv=10GB
          /var=100%
I have set the physical network property as per the documentation. I tried your recommendation and it lets me proceed, but PXE boot then fails to find the MAC under pxelinux.cfg/<mac>. Is removing the subnet the intended solution for this DCN deployment? Should we remove it from the documentation example?

The error I am facing now is probably not caused by removing the subnet, since the ports created for the nodes have physical-network 'ctlplane'.
Any idea what the issue could be?
http://perf1.lab.bos.redhat.com/jaison/edge-l3/osp-edge-backup2.tar.xz

Comment 4 Jaison Raju 2023-07-18 08:00:27 UTC

The PXE boot issue has been resolved too.

Comment 10 James E. LaBarre 2023-12-04 22:42:51 UTC

Confirmed edits are in place in python3-metalsmith-1.4.4-17.1.20230815101022.5e7461e.el9ost.noarch.rpm from latest compose RHOS-17.1-RHEL-9-20231122.n.1

This compose ran phases 1, 2 & 3 with no errors in the package.

Comment 19 errata-xmlrpc 2024-01-16 14:32:47 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 17.1.2 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:0209