Description of problem:
When attempting to spawn a guest instance on an SR-IOV enabled Mellanox ConnectX-5 with a minimum bandwidth QoS policy, scheduling fails. Note that this does not occur if the QoS policy is attached after the instance is spawned: once attached post-spawn, the policy is applied to the VF and behaves correctly. The issue occurs only when scheduling a new guest whose port already has the QoS policy attached.

Create network:
openstack network create --provider-network-type vlan --provider-physical-network sriov-3 sriov-mx5-net

Create subnet:
openstack subnet create --subnet-range 60.0.20.0/24 --dhcp --network sriov-mx5-net sriov-mx5-net_subnet

Create QoS policy:
openstack network qos policy create sriov-qos

Create QoS minimum bandwidth rule:
openstack network qos rule create --type minimum-bandwidth --min-kbps 4000000 --egress sriov-qos

Create port:
openstack port create --network sriov-mx5-net --vnic-type direct --qos-policy sriov-qos sriov-mx5-direct-port

Create server:
openstack server create --flavor m1.medium.huge_pages_cpu_pinning_numa_node-0 --image rhel-guest-image-7-6-210-x86-64-qcow2 --nic port-id=sriov-mx5-direct-port TEST_SERVER

Wait for server to enter 'ERROR' state and grab the fault:
openstack server show -c status -c fault TEST_SERVER -c id -f shell
fault="{'code': 500, 'created': '2020-02-10T06:12:00Z', 'message': 'No valid host was found. 
', 'details': 'Traceback (most recent call last):\n File "/usr/lib/python3.6/site-packages/nova/conductor/manager.py", line 1333, in schedule_and_build_instances\n instance_uuids, return_alternates=True)\n File "/usr/lib/python3.6/site-packages/nova/conductor/manager.py", line 839, in _schedule_instances\n return_alternates=return_alternates)\n File "/usr/lib/python3.6/site-packages/nova/scheduler/client/query.py", line 42, in select_destinations\n instance_uuids, return_objects, return_alternates)\n File "/usr/lib/python3.6/site-packages/nova/scheduler/rpcapi.py", line 160, in select_destinations\n return cctxt.call(ctxt, \'select_destinations\', **msg_args)\n File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/client.py", line 181, in call\n transport_options=self.transport_options)\n File "/usr/lib/python3.6/site-packages/oslo_messaging/transport.py", line 129, in _send\n transport_options=transport_options)\n File "/usr/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 646, in send\n transport_options=transport_options)\n File "/usr/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 636, in _send\n raise result\nnova.exception_Remote.NoValidHost_Remote: No valid host was found. \nTraceback (most recent call last):\n\n File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 235, in inner\n return func(*args, **kwargs)\n\n File "/usr/lib/python3.6/site-packages/nova/scheduler/manager.py", line 199, in select_destinations\n raise exception.NoValidHost(reason="")\n\nnova.exception.NoValidHost: No valid host was found. 
\n\n'}"
id="36da87fb-2363-417f-bcda-b4f4b2e8d2bc"
status="ERROR"

The following error appears on the controller node:
2020-02-10 06:11:58.841 22 DEBUG nova.scheduler.request_filter [req-ea61557d-9ac1-4699-b83f-67ee88778162 c9f98d7a25e44fd58646792151e8c297 90af442de83f4404a9b87580b869dcb9 - default default] Request filter 'compute_status_filter' took 0.0 seconds wrapper /usr/lib/python3.6/site-packages/nova/scheduler/request_filter.py:44
2020-02-10 06:11:58.842 22 DEBUG nova.scheduler.utils [req-ea61557d-9ac1-4699-b83f-67ee88778162 c9f98d7a25e44fd58646792151e8c297 90af442de83f4404a9b87580b869dcb9 - default default] Translating request for VCPU=6 to VCPU=0,PCPU=6 _translate_pinning_policies /usr/lib/python3.6/site-packages/nova/scheduler/utils.py:246
2020-02-10 06:12:00.272 22 ERROR nova.scheduler.client.report [req-ea61557d-9ac1-4699-b83f-67ee88778162 c9f98d7a25e44fd58646792151e8c297 90af442de83f4404a9b87580b869dcb9 - default default] Failed to retrieve allocation candidates from placement API for filters: RequestGroup(aggregates=[],forbidden_aggregates=set([]),forbidden_traits=set(['COMPUTE_STATUS_DISABLED']),in_tree=None,provider_uuids=[],requester_id=None,required_traits=set(['COMPUTE_IMAGE_TYPE_QCOW2']),resources={DISK_GB=20,MEMORY_MB=8192,PCPU=6},use_same_provider=False), RequestGroup(aggregates=[],forbidden_aggregates=set([]),forbidden_traits=set([]),in_tree=None,provider_uuids=[],requester_id='9fcdf4dd-e774-4e8a-a38e-b941744a8c88',required_traits=set(['CUSTOM_VNIC_TYPE_DIRECT','CUSTOM_PHYSNET_SRIOV_6']),resources={NET_BW_EGR_KILOBIT_PER_SEC=4000000},use_same_provider=True) Got 400: {"errors": [{"status": 400, "title": "Bad Request", "detail": "The server could not comply with the request since it is either malformed or otherwise incorrect.\n\n No such trait(s): CUSTOM_PHYSNET_SRIOV_3. ", "code": "placement.undefined_code", "request_id": "req-6d8139cf-bcda-4df0-b86e-736c73f596af"}]}.
We can see that the port has some additional values attached in the 'resource_request' attribute:
openstack port create --network sriov-mx5-net --vnic-type direct --qos-policy sriov-qos sriov-mx5-direct-port -c resource_request -f shell
resource_request="{'required': ['CUSTOM_PHYSNET_SRIOV_3', 'CUSTOM_VNIC_TYPE_DIRECT'], 'resources': {'NET_BW_EGR_KILOBIT_PER_SEC': 4000000}}"

Version-Release number of selected component (if applicable):
RHOS_TRUNK-16.0-RHEL-8-20200204.n.1

How reproducible:
Always

Steps to Reproduce:
1. Deploy an overcloud with SR-IOV capabilities on hardware supporting minimum bandwidth QoS
2. Attempt to spawn a guest instance with a minimum bandwidth QoS policy attached to the port

Actual results:
Guest instance fails to spawn.

Expected results:
Guest instance is scheduled successfully.

Additional info:
Will attach SOS reports in comments.
Sorry, I posted the wrong command for viewing the port info. Here is the command I meant to post:
openstack port show sriov-mx5-direct-port -c resource_request -f shell
resource_request="{'required': ['CUSTOM_PHYSNET_SRIOV_3', 'CUSTOM_VNIC_TYPE_DIRECT'], 'resources': {'NET_BW_EGR_KILOBIT_PER_SEC': 4000000}}"
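Since Placement rejects the request with "No such trait(s)", it may help to compare the traits listed in the port's resource_request against the traits Placement actually knows about. The following is only a rough sketch: physnet_trait and missing_traits are hypothetical helper names, not part of any OpenStack tool, and the trait naming rule (CUSTOM_PHYSNET_ plus the upper-cased physnet name, with other characters replaced by underscores) is an assumption inferred from the sriov-3 / CUSTOM_PHYSNET_SRIOV_3 pair in this report. The commented usage at the end additionally assumes the osc-placement plugin is installed.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of any OpenStack tool.

# physnet_trait NAME
# Print the trait name expected for a physical network. Assumption: the trait
# is CUSTOM_PHYSNET_ followed by the upper-cased name, with every character
# outside A-Z, 0-9 and _ replaced by an underscore (sriov-3 -> SRIOV_3).
physnet_trait() {
    printf 'CUSTOM_PHYSNET_%s\n' \
        "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]' | tr -c 'A-Z0-9_' '_')"
}

# missing_traits FILE TRAIT...
# Print each TRAIT that does not appear as a whole line in FILE, where FILE
# holds one known trait name per line.
missing_traits() {
    mt_list=$1
    shift
    for mt_trait in "$@"; do
        grep -qxF -- "$mt_trait" "$mt_list" || printf '%s\n' "$mt_trait"
    done
}

# Possible usage against a live deployment (assumes osc-placement is present):
#   openstack --os-placement-api-version 1.6 trait list -f value -c name > traits.txt
#   missing_traits traits.txt "$(physnet_trait sriov-3)" CUSTOM_VNIC_TYPE_DIRECT
```

Any trait name printed by missing_traits is one that the port requests but Placement does not have, which would match the 400 response seen in the scheduler log.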
In the Train release (OSP16), the behaviour of minimum bandwidth QoS has changed. This was a misconfiguration on my side. After following the documentation at https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.0/html/networking_guide/QoS#proc-guaranteed-min-bw I was able to spawn a guest with a minimum bandwidth QoS policy attached to the port.