Description of problem:
Previously (in OSP16 release) we were able to use bandwidth aware scheduling in order to schedule an instance with minimum bandwidth. It appears that this is no longer working in our 16.1 deployments. We have supplied the correct 'NeutronSriovResourceProviderBandwidths' TripleO parameter and ensured that the correct 'resource_provider_bandwidths' values were populated in SR-IOV NIC agent.

The error that we are receiving in the log is:

nova-scheduler.log:2021-04-21 10:04:24.296 23 ERROR nova.scheduler.client.report [req-91a26cb1-2823-433a-9ef7-22f5d997f789 42b9e8e4fc67453aaecb179e1417bda0 61f0d411b9c4486c9cec8144367f3b6e - default default] Failed to retrieve allocation candidates from placement API for filters: RequestGroup(aggregates=[],forbidden_aggregates=set([]),forbidden_traits=set(['COMPUTE_STATUS_DISABLED']),in_tree=None,provider_uuids=[],requester_id=None,required_traits=set(['COMPUTE_IMAGE_TYPE_QCOW2']),resources={DISK_GB=20,MEMORY_MB=8192,PCPU=6},use_same_provider=False), RequestGroup(aggregates=[],forbidden_aggregates=set([]),forbidden_traits=set([]),in_tree=None,provider_uuids=[],requester_id='a0f133d7-60fb-4477-8c69-21b70a9c4c57',required_traits=set(['CUSTOM_PHYSNET_MELLANOX_SRIOV_1','CUSTOM_VNIC_TYPE_DIRECT']),resources={NET_BW_EGR_KILOBIT_PER_SEC=25000000},use_same_provider=True)

Version-Release number of selected component (if applicable):
RHOS-16.1-RHEL-8-20210415.n.0

How reproducible:
In all of the setups we have tried (OVS/OVN backends and even 16.2 which is out of scope of this bug).

Steps to Reproduce:
1. Deploy an environment with the required bandwidth parameters
2. Attempt to spawn an instance with SR-IOV minimum bandwidth.

Actual results:
Instance fails to spawn.

Expected results:
Instance spawns successfully.

Additional info:
Will provide SOS report in comments.
I think this is a misconfiguration in my deployment. After looking at the documentation again (https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html/configuration_reference/neutron_2), I noticed that I appear to be missing the 'resource_provider_hypervisors' parameter, which is not yet exposed through TripleO (BZ#1936383). Regardless, I will try to add the parameter manually and will update this bug.
You might be hitting https://bugzilla.redhat.com/show_bug.cgi?id=1949385. I have not fully looked at your SOS reports to confirm, but the default hostnames should match. You should not need to update resource_provider_hypervisors, just resource_provider_bandwidths.
Looking at the SR-IOV agent config, I can see that the bandwidth is correctly listed:

[sriov_nic]
physical_device_mappings=sriov-1:enp6s0f2,sriov-2:enp6s0f3,mellanox-sriov-1:enp4s0f0,mellanox-sriov-2:enp4s0f1
resource_provider_bandwidths=enp6s0f2:10000000:10000000,enp6s0f3:10000000:10000000,enp4s0f0:40000000:40000000,enp4s0f1:40000000:40000000

Looking at /etc/hostname, it is "computeovsdpdksriov-tigon10-0", so we would expect the placement RP to be named "computeovsdpdksriov-tigon10-0". Can you list the resource providers in the environment so we can review?
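As a rough cross-check of the agent config against the failing request, the sketch below parses the two [sriov_nic] values quoted in this bug and compares the configured egress capacity with the NET_BW_EGR_KILOBIT_PER_SEC amount from the scheduler log. It assumes the device:egress:ingress ordering for resource_provider_bandwidths. The helper names are hypothetical and not part of Neutron, and physnet_trait only approximates how the physnet trait name appears to be derived (upper-cased, with non-alphanumeric characters replaced by underscores), based on the trait seen in the log.

```python
import re

# Values copied from the [sriov_nic] section quoted in this bug.
PHYSICAL_DEVICE_MAPPINGS = (
    "sriov-1:enp6s0f2,sriov-2:enp6s0f3,"
    "mellanox-sriov-1:enp4s0f0,mellanox-sriov-2:enp4s0f1"
)
RESOURCE_PROVIDER_BANDWIDTHS = (
    "enp6s0f2:10000000:10000000,enp6s0f3:10000000:10000000,"
    "enp4s0f0:40000000:40000000,enp4s0f1:40000000:40000000"
)


def parse_mappings(value):
    """Parse 'physnet:device,...' into {physnet: device}."""
    return dict(item.split(":", 1) for item in value.split(",") if item)


def parse_bandwidths(value):
    """Parse 'device:egress:ingress,...' into {device: (egress, ingress)}.

    Assumption: the field order is egress first, then ingress.
    """
    result = {}
    for item in value.split(","):
        device, egress, ingress = item.split(":")
        result[device] = (int(egress), int(ingress))
    return result


def physnet_trait(physnet):
    """Hypothetical helper: approximate the CUSTOM_PHYSNET_* trait name."""
    return "CUSTOM_PHYSNET_" + re.sub(r"[^A-Z0-9]+", "_", physnet.upper())


def egress_capacity_for_trait(trait):
    """Return (device, egress_kbps) for the physnet matching the trait."""
    bandwidths = parse_bandwidths(RESOURCE_PROVIDER_BANDWIDTHS)
    for physnet, device in parse_mappings(PHYSICAL_DEVICE_MAPPINGS).items():
        if physnet_trait(physnet) == trait:
            return device, bandwidths[device][0]
    return None


if __name__ == "__main__":
    requested = 25000000  # NET_BW_EGR_KILOBIT_PER_SEC from the scheduler log
    device, capacity = egress_capacity_for_trait("CUSTOM_PHYSNET_MELLANOX_SRIOV_1")
    print(device, capacity, requested <= capacity)
```

If the configured capacity covers the request, as it does with these values, the failure is less likely to be a plain capacity shortfall, which is why the resource provider listing is the next thing to review.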
Actually, this is likely a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1900500.
*** This bug has been marked as a duplicate of bug 1900500 ***