Bug 2101409

Summary: resource_provider_hypervisors not included in the sriov agent configuration
Product: Red Hat OpenStack
Reporter: Eduardo Olivares <eolivare>
Component: puppet-neutron
Assignee: Miro Tomaska <mtomaska>
Status: CLOSED ERRATA
QA Contact: Fiorella Yanac <fyanac>
Severity: high
Priority: urgent
Version: 17.0 (Wallaby)
CC: averdagu, chrisw, ekuris, jjoyce, jschluet, mtomaska, ralonsoh, scohen, slinaber, spower, tkajinam, tvignaud
Target Milestone: beta
Keywords: AutomationBlocker, Regression, Triaged
Target Release: 17.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: puppet-neutron-18.5.1-0.20220428001500.3bdf311.el9ost
Doc Type: No Doc Update
Last Closed: 2022-09-21 12:23:00 UTC
Type: Bug
Bug Depends On: 2102466, 2103019

Description Eduardo Olivares 2022-06-27 12:10:55 UTC
Description of problem:
The following parameter is included in a THT file:
  ExtraConfig:
    neutron::agents::ml2::sriov::resource_provider_hypervisors: "enp7s0f3:%{hiera('fqdn_canonical')},enp5s0f0:%{hiera('fqdn_canonical')}"

Link to the THT file:
https://code.engineering.redhat.com/gerrit/plugins/gitiles/Neutron-QE/+/refs/heads/master/BM_heat_template/ospd-17-vlan-sriov-hybrid-ha-ovn-squad-titan09/network-environment.yaml#48

On OSP16.2, that value is added to the sriov_nic section within the sriov agent config file on the compute nodes:
http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-network-networking-ovn-16.2_director-rhel-virthost-3cont_2comp-ipv4-vlan-sriov/54/computesriov-0/var/lib/config-data/puppet-generated/neutron/etc/neutron/plugins/ml2/sriov_agent.ini.gz
[sriov_nic]
physical_device_mappings=datacentre:enp7s0f3,datacentre:enp5s0f0
resource_provider_bandwidths=enp7s0f3:10000000:10000000,enp5s0f0:10000000:10000000
resource_provider_hypervisors=enp7s0f3:computesriov-0.localdomain,enp5s0f0:computesriov-0.localdomain

On OSP17, that value is not present in the sriov agent configuration:
http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-network-networking-ovn-17.0_director-rhel-virthost-3cont_2comp-ipv4-vlan-sriov/3/compute-0/var/lib/config-data/puppet-generated/neutron/etc/neutron/plugins/ml2/sriov_agent.ini.gz
[sriov_nic]
physical_device_mappings=datacentre:enp7s0f3,datacentre:enp5s0f0
resource_provider_bandwidths=enp7s0f3:10000000:10000000,enp5s0f0:10000000:10000000
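
The difference between the two files can be checked mechanically. The following is only a minimal sketch, assuming the sriov_agent.ini file has already been decompressed and uses the [sriov_nic] layout shown above. The helper name missing_hypervisor_devices and the OSP17_SAMPLE constant are hypothetical and not part of any OpenStack tooling.

```python
import configparser

# Sample taken from the OSP17 [sriov_nic] section quoted in this report.
OSP17_SAMPLE = """\
[sriov_nic]
physical_device_mappings=datacentre:enp7s0f3,datacentre:enp5s0f0
resource_provider_bandwidths=enp7s0f3:10000000:10000000,enp5s0f0:10000000:10000000
"""


def missing_hypervisor_devices(ini_text):
    """Return devices listed in physical_device_mappings that have no
    entry in resource_provider_hypervisors (hypothetical helper)."""
    # Only "=" is treated as a delimiter because the values contain ":".
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(ini_text)
    section = parser["sriov_nic"]

    # physical_device_mappings has the form "physnet:device,physnet:device".
    mappings = section.get("physical_device_mappings", fallback="")
    devices = [m.split(":", 1)[1] for m in mappings.split(",") if m]

    # resource_provider_hypervisors has the form "device:host,device:host".
    hypervisors = section.get("resource_provider_hypervisors", fallback="")
    mapped = {h.split(":", 1)[0] for h in hypervisors.split(",") if h}

    return [d for d in devices if d not in mapped]


print(missing_hypervisor_devices(OSP17_SAMPLE))
```

For the OSP17 sample both devices are reported as missing, while the OSP16.2 file yields an empty list.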


Without that configuration, the SR-IOV tests covering minimum bandwidth placement enforcement fail when they try to obtain the maximum bandwidth values from the placement API:
https://code.engineering.redhat.com/gerrit/plugins/gitiles/rhos-qe-tests/tempest_neutron_plugin/+/master/neutron_plugin/tests/scenario/test_qos.py#1267




Version-Release number of selected component (if applicable):
RHOS-17.0-RHEL-9-20220615.n.2

How reproducible:
100%

Steps to Reproduce:
1. tempest run -r test_minbw_placement_enforcement_sriov_egress (or run "openstack resource provider list" and check that no entries include "NIC Switch agent")
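
The manual check on the resource provider list could be scripted. This is a rough sketch, assuming the list is exported with "openstack resource provider list -f json", that each entry carries a "name" key matching the table column, and that agent providers are named "<host>:NIC Switch agent". The function name hosts_with_sriov_agent_provider is hypothetical.

```python
import json


def hosts_with_sriov_agent_provider(provider_list_json):
    """Return the sorted host names that own at least one
    "NIC Switch agent" resource provider (hypothetical helper)."""
    providers = json.loads(provider_list_json)
    hosts = set()
    for provider in providers:
        name = provider["name"]
        # Assumed naming: "<host>:NIC Switch agent[:<device>]".
        if ":NIC Switch agent" in name:
            hosts.add(name.split(":", 1)[0])
    return sorted(hosts)
```

An empty result corresponds to the failing state described in this report: only the bare compute node providers exist.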

Comment 3 Takashi Kajinami 2022-06-30 02:27:18 UTC
I've checked the hieradata file on that node but could not find the key defined in ExtraConfig.

/etc/puppet/hieradata/extraconfig.json.gz
~~~
{
    "neutron::agents::l3::extensions": "fip_qos"
}
~~~

I think this is not a puppet issue but is caused by a change in how TripleO/Heat
merges template files (it no longer deep-merges by default).

What you can try is adding

parameter_merge_strategies:
  ExtraConfig: merge

to that template file.
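
To illustrate why the key could vanish from the hieradata, here is a simplified model of the two behaviours. It is not Heat's implementation. The function apply_environments is hypothetical, and the assumption that a later environment file sets its own ExtraConfig is inferred from the extraconfig.json content above.

```python
def apply_environments(env_files, strategy="overwrite"):
    """Combine the ExtraConfig maps of several environment files, in order.

    strategy="overwrite": a later ExtraConfig replaces the earlier one.
    strategy="merge": keys from all files are kept, later values win.
    """
    result = {}
    for env in env_files:
        if "ExtraConfig" not in env:
            continue  # this file does not touch ExtraConfig
        extra = env["ExtraConfig"]
        if strategy == "merge":
            result = {**result, **extra}
        else:
            result = dict(extra)
    return result


# Assumed ordering: the network environment is loaded first and another
# environment file (hypothetical) sets ExtraConfig afterwards.
network_env = {
    "ExtraConfig": {
        "neutron::agents::ml2::sriov::resource_provider_hypervisors":
            "enp7s0f3:%{hiera('fqdn_canonical')},enp5s0f0:%{hiera('fqdn_canonical')}",
    }
}
later_env = {"ExtraConfig": {"neutron::agents::l3::extensions": "fip_qos"}}

print(sorted(apply_environments([network_env, later_env])))
print(sorted(apply_environments([network_env, later_env], strategy="merge")))
```

With the overwrite behaviour only the fip_qos key survives, which matches the hieradata shown above. With merge, both keys are present.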

By the way, [sriov_nic] resource_provider_hypervisors was required to work around bz 1989820,
and my understanding is that it is no longer required in order to use the minimum bandwidth QoS rule.
If it is still required, we'd need to look into the problem from Neutron's PoV.

Comment 4 Takashi Kajinami 2022-06-30 02:50:36 UTC
As I mentioned in comment 3, the issue is supposed to be fixed in Neutron, and I'm not sure whether overriding the hypervisor is still required,
but you might want to backport
 https://review.opendev.org/c/openstack/tripleo-heat-templates/+/796402
which is merged in master, as a "safe-guard".

Comment 15 Fiorella Yanac 2022-07-13 09:30:03 UTC
Verified on an OSP17 environment with OVN+SRIOV configured,
using puddle RHOS-17.0-RHEL-9-20220711.n.1.

 [stack@undercloud-0 tempest-dir]$ openstack resource provider list
+--------------------------------------+------------------------------------------------------+------------+
| uuid                                 | name                                                 | generation |
+--------------------------------------+------------------------------------------------------+------------+
| ed237545-5a39-40ae-8983-9e27cd9afb1a | computesriov-0.localdomain                           |        465 |
| d9f94b62-15c7-44fc-9872-345f6a04fae1 | computesriov-1.localdomain                           |        419 |
| 376826fd-f904-58b4-a553-ba9001f5c537 | computesriov-1.localdomain:NIC Switch agent          |          0 |
| 35b1343b-08d8-5bcf-8fa6-2fba99cef411 | computesriov-1.localdomain:NIC Switch agent:enp7s0f3 |         31 |
| 57a35465-c0c7-595e-b103-2ba7d769285a | computesriov-1.localdomain:NIC Switch agent:enp5s0f0 |         31 |
| ac24419c-77b2-5951-b8ad-2dc0e5f9ca3e | computesriov-0.localdomain:NIC Switch agent          |          0 |
| 79db8911-b47c-5246-b55e-94da0c3688f5 | computesriov-0.localdomain:NIC Switch agent:enp7s0f3 |         32 |
| 95fc4865-db33-5f68-bc72-051086f7d6be | computesriov-0.localdomain:NIC Switch agent:enp5s0f0 |         32 |
+--------------------------------------+------------------------------------------------------+------------+

On each compute: /var/lib/config-data/puppet-generated/neutron/etc/neutron/plugins/ml2/sriov_agent.ini

compute-0
[sriov_nic]
physical_device_mappings=datacentre:enp7s0f3,datacentre:enp5s0f0
resource_provider_bandwidths=enp7s0f3:10000000:10000000,enp5s0f0:10000000:10000000
resource_provider_hypervisors=enp7s0f3:computesriov-0.localdomain,enp5s0f0:computesriov-0.localdomain


compute-1 
[sriov_nic]
physical_device_mappings=datacentre:enp7s0f3,datacentre:enp5s0f0
resource_provider_bandwidths=enp7s0f3:10000000:10000000,enp5s0f0:10000000:10000000
resource_provider_hypervisors=enp7s0f3:computesriov-1.localdomain,enp5s0f0:computesriov-1.localdomain


The test test_minbw_placement_enforcement_sriov_egress [1] passed.
[1] https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/job/DFG-network-networking-ovn-17.0_director-rhel-virthost-3cont_2comp-ipv4-vlan-sriov/10/testReport/neutron_plugin.tests.scenario.test_qos/QosTestSriovMinBwPlacementEnforcementTest/test_minbw_placement_enforcement_sriov_egress_id_ad4d9c2a_de45_4a05_a70e_78e953a8463d_/

Comment 20 errata-xmlrpc 2022-09-21 12:23:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:6543