Bug 1471531

Summary: [RFE] Add TripleO validation of VLANs using introspected LLDP data
Product: Red Hat OpenStack
Reporter: Bob Fournier <bfournie>
Component: openstack-tripleo-validations
Assignee: Bob Fournier <bfournie>
Status: CLOSED ERRATA
QA Contact: Omri Hochman <ohochman>
Severity: high
Priority: medium
Version: 13.0 (Queens)
CC: achernet, augol, gchamoul, jjoyce, jschluet, mlammon, racedoro, sclewis, slinaber, tvignaud
Target Milestone: Upstream M1
Keywords: FutureFeature, Triaged
Target Release: 13.0 (Queens)
Hardware: All
OS: Linux
Fixed In Version: openstack-tripleo-validations-8.1.1-0.20180119231917.2ff3c79.el7ost
Doc Type: If docs needed, set a value
Last Closed: 2018-06-27 13:32:14 UTC
Type: Bug
Bug Depends On: 1554248

Description Bob Fournier 2017-07-16 17:22:29 UTC
Description of problem:
TripleO Heat Templates (THT) can define VLANs per NIC for each role (controller, compute, etc.) on isolated networks. However, the network switches that the NICs are attached to may not have been configured for these VLANs. As of OSP-11, LLDP data for baremetal nodes is captured during the Ironic inspection process, and it may include VLAN information for the attached switch ports. A pre-deployment validation can check this VLAN information to ensure that the VLANs configured in the THT nic config files are also configured on the switch.

The validation must convert NIC aliases to real NIC names, similar to what os-net-config does, in order to map the NICs configured in the nic config files to the actual NIC names in the introspected data.
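
A minimal sketch of the alias conversion described above. The function names are hypothetical, and the ordering rule is an assumption made for illustration (embedded NICs such as em*, eth*, eno* first, then the rest, each group in natural sort order); it is a simplification and not os-net-config's actual code, which also considers only active NICs.

```python
import re


def natural_key(name):
    """Split 'p2p10' into ['p', 2, 'p', 10] so numbers sort numerically."""
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", name) if part]


def map_nic_aliases(nic_names):
    """Map aliases nic1, nic2, ... to real NIC names (hypothetical helper).

    Assumption: embedded NICs (em*, eth*, eno*) are numbered first,
    followed by all other NICs, each group in natural sort order.
    """
    embedded = sorted(
        (n for n in nic_names if re.match(r"(em|eth|eno)\d", n)),
        key=natural_key,
    )
    others = sorted(
        (n for n in nic_names if n not in embedded),
        key=natural_key,
    )
    ordered = embedded + others
    return {"nic%d" % i: name for i, name in enumerate(ordered, start=1)}


# NIC names taken from the introspection output later in this bug.
mapping = map_nic_aliases(["p2p2", "em2", "p2p1", "em1"])
print(mapping)
```

With a mapping like this, a VLAN defined on nic1 in a nic config file can be looked up under the real NIC name in the introspected data.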

Because roles can use different VLANs (for example, the controller may use more networks than compute and therefore more VLANs), the challenge is to map roles to Ironic nodes in the pre-deployment phase. This mapping may not be available at that point, so it may be necessary to check only that ALL configured VLANs per switch port are available on the switches. In other words, if a role in THT has eth0 with VLANs 10, 11, and 12, then all Ironic nodes must have LLDP data indicating that the switch port attached to eth0 carries VLANs 10, 11, and 12. If roles can be mapped to Ironic nodes in this phase, then it will be possible to check, for example, that all controller nodes have a switch port mapped to eth0 with VLANs 10, 11, and 12.
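
The role-agnostic check described above could be sketched as follows. This is only an illustration under stated assumptions: the function name, the dictionary layouts, and the node names node-a and node-b are hypothetical, and the VLAN IDs are the 10, 11, 12 example from the paragraph above.

```python
def find_missing_vlans(configured, introspected):
    """Return (node, nic, vlan_id) tuples for VLANs absent from a switch port.

    configured:   {nic_name: [vlan_id, ...]} merged across all roles
    introspected: {node_name: {nic_name: [vlan_id, ...]}} from LLDP data
    """
    missing = []
    for node, ports in sorted(introspected.items()):
        for nic, vlans in sorted(configured.items()):
            # VLANs that LLDP reported for the switch port attached to this NIC.
            seen = set(ports.get(nic, []))
            for vlan_id in vlans:
                if vlan_id not in seen:
                    missing.append((node, nic, vlan_id))
    return missing


# Every node is checked against every configured VLAN, since the
# role-to-node mapping is assumed to be unknown at this stage.
configured = {"eth0": [10, 11, 12]}
introspected = {
    "node-a": {"eth0": [10, 11, 12]},
    "node-b": {"eth0": [10, 12]},
}
missing = find_missing_vlans(configured, introspected)
print(missing)
```

An empty result means every node's switch port carries every configured VLAN. If roles could be mapped to nodes, the same function could be called once per role with only that role's nodes.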

Version-Release number of selected component (if applicable):
OSP-11


How reproducible:
Always


Steps to Reproduce:
1.  Configure the network switch VLANs so that they differ from the VLANs in the THT nic config files
2.  Run deployment

Actual results:
Deployment may eventually fail, depending on which VLANs are incorrect.


Expected results:
Pre-deployment validation detects that the switch is incorrectly configured and returns an error.


Additional info:

Comment 4 Bob Fournier 2018-04-13 16:42:31 UTC
Verification is pending fix for https://bugzilla.redhat.com/show_bug.cgi?id=1554248

Comment 5 Bob Fournier 2018-04-30 18:52:54 UTC
Using:
openstack-tripleo-validations-8.4.0-1.el7ost.noarch

Verified by including patch https://review.openstack.org/#/c/563969 which has merged to stable/queens.

$ openstack action execution run tripleo.plan.create_container '{"container":"my-templates"}'

$ swift upload my-templates /home/stack/templates/

$ openstack workflow execution create tripleo.plan_management.v1.create_deployment_plan '{"container":"my-templates"}'

$ export TRIPLEO_PLAN_NAME=my-templates

Using a network switch running LLDP, attached to 2 nodes, with the following VLANs configured on the switch:
$ openstack baremetal introspection interface list host2
+-----------+-------------------+------------------------------+-------------------+----------------+
| Interface | MAC Address       | Switch Port VLAN IDs         | Switch Chassis ID | Switch Port ID |
+-----------+-------------------+------------------------------+-------------------+----------------+
| em1       | b0:83:fe:c6:63:86 | [101, 102, 104, 2001, 2002]  | 64:64:9b:32:f3:00 | ge-0/0/25      |
| em2       | b0:83:fe:c6:63:87 | [101, 104, 2001, 2002, 2003] | 64:64:9b:32:f3:00 | ge-1/0/25      |
| p2p2      | a0:36:9f:52:7f:b3 | [101, 102, 104, 2001, 2002]  | 64:64:9b:32:f3:00 | ge-1/0/26      |
| p2p1      | a0:36:9f:52:7f:b2 | [101, 102, 104, 2001, 2002]  | 64:64:9b:32:f3:00 | ge-0/0/26      |
+-----------+-------------------+------------------------------+-------------------+----------------+
$ openstack baremetal introspection interface list host3
+-----------+-------------------+------------------------------+-------------------+----------------+
| Interface | MAC Address       | Switch Port VLAN IDs         | Switch Chassis ID | Switch Port ID |
+-----------+-------------------+------------------------------+-------------------+----------------+
| em1       | b0:83:fe:c6:53:21 | [101, 102, 104, 2001, 2002]  | 64:64:9b:32:f3:00 | ge-0/0/23      |
| em2       | b0:83:fe:c6:53:22 | [101, 104, 2001, 2002, 2003] | 64:64:9b:32:f3:00 | ge-1/0/23      |
| p2p2      | a0:36:9f:52:7e:d9 | [101, 102, 104, 2001, 2002]  | 64:64:9b:32:f3:00 | ge-1/0/24      |
| p2p1      | a0:36:9f:52:7e:d8 | [101, 102, 104, 2001, 2002]  | 64:64:9b:32:f3:00 | ge-0/0/24      |
+-----------+-------------------+------------------------------+-------------------+----------------+
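
For scripting against output like the tables above, a minimal sketch that turns the table text into a per-interface VLAN list. The function name is hypothetical, and the parser assumes the exact column layout shown here. A machine-readable formatter (for example the client's `-f json` option, if available in your version) would be a more robust input.

```python
import json


def parse_interface_table(text):
    """Return {interface: [vlan_id, ...]} from the ASCII table (hypothetical helper)."""
    vlans = {}
    for line in text.splitlines():
        if not line.startswith("|"):
            continue  # skip +----+ border lines and anything else
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if cells[0] == "Interface":
            continue  # skip the header row
        # Column 2 holds the VLAN list, which is formatted as a JSON array.
        vlans[cells[0]] = json.loads(cells[2])
    return vlans


# Two rows copied from the host2 output above.
sample = """\
+-----------+-------------------+------------------------------+-------------------+----------------+
| Interface | MAC Address       | Switch Port VLAN IDs         | Switch Chassis ID | Switch Port ID |
+-----------+-------------------+------------------------------+-------------------+----------------+
| em1       | b0:83:fe:c6:63:86 | [101, 102, 104, 2001, 2002]  | 64:64:9b:32:f3:00 | ge-0/0/25      |
| em2       | b0:83:fe:c6:63:87 | [101, 104, 2001, 2002, 2003] | 64:64:9b:32:f3:00 | ge-1/0/25      |
+-----------+-------------------+------------------------------+-------------------+----------------+
"""
host2_vlans = parse_interface_table(sample)
print(host2_vlans)
```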

Verified passing case:

$ ansible-playbook -i /usr/bin/tripleo-ansible-inventory /usr/share/openstack-tripleo-validations/validations/switch-vlans.yaml

PLAY [undercloud] **********************************************************************************************

TASK [Gathering Facts] *****************************************************************************************
ok: [localhost]

TASK [Get Ironic Inspector swift auth_url] *********************************************************************
ok: [localhost]

TASK [Get Ironic Inspector swift password] *********************************************************************
ok: [localhost]

TASK [Check that switch vlans are present if used in nic-config files] *****************************************
ok: [localhost]

PLAY RECAP *****************************************************************************************************
localhost                  : ok=4    changed=0    unreachable=0    failed=0   


Verified failing case:
Templates changed to use VLANs that the switch is not reporting via LLDP.

$ ansible-playbook -i /usr/bin/tripleo-ansible-inventory /usr/share/openstack-tripleo-validations/validations/switch-vlans.yaml

PLAY [undercloud] **********************************************************************************************

TASK [Gathering Facts] *****************************************************************************************
ok: [localhost]

TASK [Get Ironic Inspector swift auth_url] *********************************************************************
ok: [localhost]

TASK [Get Ironic Inspector swift password] *********************************************************************
ok: [localhost]

TASK [Check that switch vlans are present if used in nic-config files] *****************************************
fatal: [localhost]: FAILED! => {"changed": false, "msg": "VLAN ID 777 not on attached switch\nVLAN ID 888 not on attached switch\nVLAN ID 1010 not on attached switch\nVLAN ID 2020 not on attached switch\nVLAN ID 777 not on attached switch\nVLAN ID 888 not on attached switch\nVLAN ID 999 not on attached switch\nVLAN ID 1010 not on attached switch\nVLAN ID 2030 not on attached switch"}
 [WARNING]: Could not create retry file '/usr/share/openstack-tripleo-validations/validations/switch-vlans.retry'.         [Errno 13] Permission denied: u'/usr/share/openstack-tripleo-validations/validations/switch-vlans.retry'


PLAY RECAP *****************************************************************************************************
localhost                  : ok=3    changed=0    unreachable=0    failed=1
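
The failure message above repeats some VLAN IDs (presumably once per NIC that uses them; that is an assumption, since the message does not name the NIC). A small sketch, with a hypothetical helper name, that reduces the message to the distinct VLAN IDs that need to be added on the switch:

```python
import re


def missing_vlan_ids(msg):
    """Return the sorted, de-duplicated VLAN IDs named in the failure message."""
    found = re.findall(r"VLAN ID (\d+) not on attached switch", msg)
    return sorted({int(vlan_id) for vlan_id in found})


# The "msg" value copied from the failing run above.
msg = (
    "VLAN ID 777 not on attached switch\n"
    "VLAN ID 888 not on attached switch\n"
    "VLAN ID 1010 not on attached switch\n"
    "VLAN ID 2020 not on attached switch\n"
    "VLAN ID 777 not on attached switch\n"
    "VLAN ID 888 not on attached switch\n"
    "VLAN ID 999 not on attached switch\n"
    "VLAN ID 1010 not on attached switch\n"
    "VLAN ID 2030 not on attached switch"
)
ids = missing_vlan_ids(msg)
print(ids)
```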

Comment 7 errata-xmlrpc 2018-06-27 13:32:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086