Description of problem:
TripleO Heat Templates (THT) can define VLANs per NIC for each role (controller, compute, etc.) on isolated networks. The problem is that the network switches the NICs are attached to may not have been set up correctly for these VLANs.

As of OSP-11, LLDP data for baremetal nodes is captured during Ironic inspection, and this data may include VLAN information for the attached switch ports. A pre-deployment validation can check this VLAN information to ensure that the VLANs configured in the THT nic config files are also configured on the switch.

The validation must convert NIC aliases to real NIC names, similar to what os-net-config does, in order to map the NICs configured in the nic config files to the actual NIC names in the introspected data.

Roles can use different VLANs; for example, the controller may use more networks than compute and therefore more VLANs. The challenge is to map the roles to Ironic nodes in the pre-deployment phase, and this mapping may not be available at that point. It may be necessary to check only that ALL configured VLANs per switch port are available on the switches. In other words, if a role in THT has eth0 with VLANs 10, 11, and 12, all Ironic nodes must have LLDP data indicating that the switch port attached to eth0 has VLANs 10, 11, and 12.

If it is possible to map the roles to Ironic nodes in this phase, then it will be possible to check, for example, that all controller nodes have a switch port mapped to eth0 with VLANs 10, 11, and 12.

Version-Release number of selected component (if applicable):
OSP-11

How reproducible:
Always

Steps to Reproduce:
1. Configure the network switch with VLANs that differ from those in the THT nic config files
2. Run deployment
3.

Actual results:
Deployment may eventually fail, depending on which VLANs are incorrect.

Expected results:
Pre-deployment validation detects that the switch is incorrectly configured and returns an error.

Additional info:
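The "all configured VLANs on every node" check described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the validation's actual code: the function name `find_missing_vlans`, the `alias_map` argument, and the shape of the input dictionaries are all hypothetical, and the alias-to-NIC mapping is assumed to have been computed already (the sketch does not reproduce os-net-config's ordering rules).

```python
from typing import Dict, Iterable, List, Optional, Tuple


def find_missing_vlans(
    configured: Dict[str, Iterable[int]],
    lldp_by_node: Dict[str, Dict[str, Iterable[int]]],
    alias_map: Optional[Dict[str, str]] = None,
) -> List[Tuple[str, str, int]]:
    """Return (node, nic, vlan) for every configured VLAN not seen via LLDP.

    configured:   NIC name or alias -> VLAN IDs from the nic config files
    lldp_by_node: node name -> real NIC name -> VLAN IDs reported by the switch
    alias_map:    hypothetical pre-computed alias -> real NIC name mapping
    """
    alias_map = alias_map or {}
    missing = []
    for nic, vlans in configured.items():
        # Fall back to the configured name when it is not an alias.
        real_nic = alias_map.get(nic, nic)
        # Without a role-to-node mapping, every node is held to every VLAN.
        for node, ports in sorted(lldp_by_node.items()):
            seen = set(ports.get(real_nic, []))
            for vlan in sorted(vlans):
                if vlan not in seen:
                    missing.append((node, real_nic, vlan))
    return missing


# Example from the description: eth0 with VLANs 10, 11, and 12.
configured = {"nic1": [10, 11, 12]}
alias_map = {"nic1": "eth0"}
lldp_by_node = {
    "node-a": {"eth0": [10, 11, 12]},
    "node-b": {"eth0": [10, 12]},  # VLAN 11 missing on this switch port
}
print(find_missing_vlans(configured, lldp_by_node, alias_map))
```

If a role-to-node mapping were available, the same function could be called once per role with only that role's nodes in `lldp_by_node`.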
Verification is pending a fix for https://bugzilla.redhat.com/show_bug.cgi?id=1554248
Using:
openstack-tripleo-validations-8.4.0-1.el7ost.noarch

Verified by including patch https://review.openstack.org/#/c/563969 which has merged to stable/queens.

$ openstack action execution run tripleo.plan.create_container '{"container":"my-templates"}'
$ swift upload my-templates /home/stack/templates/
$ openstack workflow execution create tripleo.plan_management.v1.create_deployment_plan '{"container":"my-templates"}'
$ export TRIPLEO_PLAN_NAME=my-templates

Using a network switch running lldp attached to 2 nodes with following vlans configured on switch:

$ openstack baremetal introspection interface list host2
+-----------+-------------------+------------------------------+-------------------+----------------+
| Interface | MAC Address       | Switch Port VLAN IDs         | Switch Chassis ID | Switch Port ID |
+-----------+-------------------+------------------------------+-------------------+----------------+
| em1       | b0:83:fe:c6:63:86 | [101, 102, 104, 2001, 2002]  | 64:64:9b:32:f3:00 | ge-0/0/25      |
| em2       | b0:83:fe:c6:63:87 | [101, 104, 2001, 2002, 2003] | 64:64:9b:32:f3:00 | ge-1/0/25      |
| p2p2      | a0:36:9f:52:7f:b3 | [101, 102, 104, 2001, 2002]  | 64:64:9b:32:f3:00 | ge-1/0/26      |
| p2p1      | a0:36:9f:52:7f:b2 | [101, 102, 104, 2001, 2002]  | 64:64:9b:32:f3:00 | ge-0/0/26      |
+-----------+-------------------+------------------------------+-------------------+----------------+

$ openstack baremetal introspection interface list host3
+-----------+-------------------+------------------------------+-------------------+----------------+
| Interface | MAC Address       | Switch Port VLAN IDs         | Switch Chassis ID | Switch Port ID |
+-----------+-------------------+------------------------------+-------------------+----------------+
| em1       | b0:83:fe:c6:53:21 | [101, 102, 104, 2001, 2002]  | 64:64:9b:32:f3:00 | ge-0/0/23      |
| em2       | b0:83:fe:c6:53:22 | [101, 104, 2001, 2002, 2003] | 64:64:9b:32:f3:00 | ge-1/0/23      |
| p2p2      | a0:36:9f:52:7e:d9 | [101, 102, 104, 2001, 2002]  | 64:64:9b:32:f3:00 | ge-1/0/24      |
| p2p1      | a0:36:9f:52:7e:d8 | [101, 102, 104, 2001, 2002]  | 64:64:9b:32:f3:00 | ge-0/0/24      |
+-----------+-------------------+------------------------------+-------------------+----------------+

Verified passing case:

$ ansible-playbook -i /usr/bin/tripleo-ansible-inventory /usr/share/openstack-tripleo-validations/validations/switch-vlans.yaml

PLAY [undercloud] **********************************************************************************************

TASK [Gathering Facts] *****************************************************************************************
ok: [localhost]

TASK [Get Ironic Inspector swift auth_url] *********************************************************************
ok: [localhost]

TASK [Get Ironic Inspector swift password] *********************************************************************
ok: [localhost]

TASK [Check that switch vlans are present if used in nic-config files] *****************************************
ok: [localhost]

PLAY RECAP *****************************************************************************************************
localhost : ok=4 changed=0 unreachable=0 failed=0

Verified failing case: templates changed to use vlans which aren't being reported via lldp on switch.
$ ansible-playbook -i /usr/bin/tripleo-ansible-inventory /usr/share/openstack-tripleo-validations/validations/switch-vlans.yaml

PLAY [undercloud] **********************************************************************************************

TASK [Gathering Facts] *****************************************************************************************
ok: [localhost]

TASK [Get Ironic Inspector swift auth_url] *********************************************************************
ok: [localhost]

TASK [Get Ironic Inspector swift password] *********************************************************************
ok: [localhost]

TASK [Check that switch vlans are present if used in nic-config files] *****************************************
fatal: [localhost]: FAILED! => {"changed": false, "msg": "VLAN ID 777 not on attached switch\nVLAN ID 888 not on attached switch\nVLAN ID 1010 not on attached switch\nVLAN ID 2020 not on attached switch\nVLAN ID 777 not on attached switch\nVLAN ID 888 not on attached switch\nVLAN ID 999 not on attached switch\nVLAN ID 1010 not on attached switch\nVLAN ID 2030 not on attached switch"}
[WARNING]: Could not create retry file '/usr/share/openstack-tripleo-validations/validations/switch-vlans.retry'. [Errno 13] Permission denied: u'/usr/share/openstack-tripleo-validations/validations/switch-vlans.retry'

PLAY RECAP *****************************************************************************************************
localhost : ok=3 changed=0 unreachable=0 failed=1
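The failure message above repeats several VLAN IDs. A small helper along the following lines can reduce the "msg" string to a de-duplicated list, which is easier to hand to whoever configures the switch. This is only a convenience sketch: the function name `missing_vlan_ids` is hypothetical, and the pattern assumes the message keeps the exact "VLAN ID N not on attached switch" wording shown in the output above.

```python
import re
from typing import List

# The "msg" value copied from the failing run above.
MSG = (
    "VLAN ID 777 not on attached switch\n"
    "VLAN ID 888 not on attached switch\n"
    "VLAN ID 1010 not on attached switch\n"
    "VLAN ID 2020 not on attached switch\n"
    "VLAN ID 777 not on attached switch\n"
    "VLAN ID 888 not on attached switch\n"
    "VLAN ID 999 not on attached switch\n"
    "VLAN ID 1010 not on attached switch\n"
    "VLAN ID 2030 not on attached switch"
)


def missing_vlan_ids(msg: str) -> List[int]:
    """Extract the unique VLAN IDs from the validation message, sorted."""
    found = re.findall(r"VLAN ID (\d+) not on attached switch", msg)
    return sorted({int(vlan) for vlan in found})


print(missing_vlan_ids(MSG))  # [777, 888, 999, 1010, 2020, 2030]
```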
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2086