Bug 1344315
Summary: | SRIOV PF/VF allocation fails with NUMA aware flavor | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Ricardo Noriega <rnoriega> |
Component: | openstack-nova | Assignee: | Vladik Romanovsky <vromanso> |
Status: | CLOSED ERRATA | QA Contact: | Prasanth Anbalagan <panbalag> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 9.0 (Mitaka) | CC: | berrange, dasmith, eglynn, kchamart, sbauza, sferdjao, sgordon, srevivo, vromanso, yrachman |
Target Milestone: | async | Keywords: | ZStream |
Target Release: | 9.0 (Mitaka) | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Fixed In Version: | openstack-nova-13.1.1-3.el7ost | Doc Type: | If docs needed, set a value |
Last Closed: | 2016-09-21 14:08:55 UTC | Type: | Bug |
Description
Ricardo Noriega
2016-06-09 12:03:40 UTC
A Neutron port was created with the option "--binding:vnic-type direct". Booting an instance with --nic causes NUMATopologyFilter to fail. Without the --nic option, booting an instance works. Please check the logs below.

A few things to note:
* The error was observed with both flavors 200 and 300 below.
* The instance boot was done after deleting all existing instances.
* The pci_devices table shows that resources are still available.

*********
VERSION
*********
[root@serverX ~(keystone_admin)]# yum list installed | grep openstack-nova
openstack-nova-api.noarch         1:13.1.1-4.el7ost  @rhelosp-9.0-puddle
openstack-nova-cert.noarch        1:13.1.1-4.el7ost  @rhelosp-9.0-puddle
openstack-nova-common.noarch      1:13.1.1-4.el7ost  @rhelosp-9.0-puddle
openstack-nova-compute.noarch     1:13.1.1-4.el7ost  @rhelosp-9.0-puddle
openstack-nova-conductor.noarch   1:13.1.1-4.el7ost  @rhelosp-9.0-puddle
openstack-nova-console.noarch     1:13.1.1-4.el7ost  @rhelosp-9.0-puddle
openstack-nova-novncproxy.noarch  1:13.1.1-4.el7ost  @rhelosp-9.0-puddle
openstack-nova-scheduler.noarch   1:13.1.1-4.el7ost  @rhelosp-9.0-puddle

**********
LOGS
**********
[root@rhos-compute-node-02 ~(keystone_admin)]# nova flavor-show 200
+----------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Property                   | Value                                                                                                                                                             |
+----------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                                                                                                                                             |
| OS-FLV-EXT-DATA:ephemeral  | 0                                                                                                                                                                 |
| disk                       | 5                                                                                                                                                                 |
| extra_specs                | {"hw:cpu_policy": "dedicated", "hw:cpu_thread_policy": "prefer", "pci_passthrough:alias": "pci_pass_test:1", "hw:numa_nodes": "1", "hw:numa_mempolicy": "strict"} |
| id                         | 200                                                                                                                                                               |
| name                       | pci-pass                                                                                                                                                          |
| os-flavor-access:is_public | True                                                                                                                                                              |
| ram                        | 512                                                                                                                                                               |
| rxtx_factor                | 1.0                                                                                                                                                               |
| swap                       |                                                                                                                                                                   |
| vcpus                      | 1                                                                                                                                                                 |
+----------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[root@rhos-compute-node-02 ~(keystone_admin)]# nova flavor-show 300
+----------------------------+-----------------------------------------------------------------------------------------------------------------------+
| Property                   | Value                                                                                                                 |
+----------------------------+-----------------------------------------------------------------------------------------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                                                                                                 |
| OS-FLV-EXT-DATA:ephemeral  | 0                                                                                                                     |
| disk                       | 5                                                                                                                     |
| extra_specs                | {"hw:cpu_policy": "dedicated", "hw:cpu_thread_policy": "prefer", "hw:numa_nodes": "1", "hw:numa_mempolicy": "strict"} |
| id                         | 300                                                                                                                   |
| name                       | pci-pass1                                                                                                             |
| os-flavor-access:is_public | True                                                                                                                  |
| ram                        | 512                                                                                                                   |
| rxtx_factor                | 1.0                                                                                                                   |
| swap                       |                                                                                                                       |
| vcpus                      | 1                                                                                                                     |
+----------------------------+-----------------------------------------------------------------------------------------------------------------------+
[root@rhos-compute-node-02 ~(keystone_admin)]#

********************************************
WITHOUT --nic option in boot
*********************************************
[root@serverX ~(keystone_admin)]# nova show vm1
+--------------------------------------+----------------------------------------------------------+
| Property                             | Value                                                    |
+--------------------------------------+----------------------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                                   |
| OS-EXT-AZ:availability_zone          | nova                                                     |
| OS-EXT-SRV-ATTR:host                 | serverX.lab.eng.rdu2.redhat.com                          |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | serverX.lab.eng.rdu2.redhat.com                          |
| OS-EXT-SRV-ATTR:instance_name        | instance-0000000a                                        |
| OS-EXT-STS:power_state               | 1                                                        |
| OS-EXT-STS:task_state                | -                                                        |
| OS-EXT-STS:vm_state                  | active                                                   |
| OS-SRV-USG:launched_at               | 2016-09-14T18:42:27.000000                               |
| OS-SRV-USG:terminated_at             | -                                                        |
| accessIPv4                           |                                                          |
| accessIPv6                           |                                                          |
| config_drive                         |                                                          |
| created                              | 2016-09-14T18:42:19Z                                     |
| flavor                               | pci-pass (200)                                           |
| hostId                               | 715eec11b0869d3f063f023d3a53bcaf1357a62a4e596f9bcb986a08 |
| id                                   | 48d81369-b26b-46e8-94b8-ab35543e9506                     |
| image                                | cirros (e1819103-0254-4b2b-a323-38cd6143073d)            |
| key_name                             | -                                                        |
| metadata                             | {}                                                       |
| name                                 | vm1                                                      |
| os-extended-volumes:volumes_attached | []                                                       |
| progress                             | 0                                                        |
| public network                       | 172.24.4.231                                             |
| security_groups                      | default                                                  |
| status                               | ACTIVE                                                   |
| tenant_id                            | 0bd41cf0d4bd4eddacfcf5a51b2b13cf                         |
| updated                              | 2016-09-14T18:42:27Z                                     |
| user_id                              | 42fa6f918169480589dca471b5240457                         |
+--------------------------------------+----------------------------------------------------------+
[root@serverX ~(keystone_admin)]#

*****************************************************
WITH --nic option in boot
*****************************************************
NUMATopologyFilter returned 0 hosts
2016-09-14 22:13:40.707 15706 INFO nova.filters [req-52a796ad-aef4-4ab5-9d79-70af7bdae0e8 42fa6f918169480589dca471b5240457 0bd41cf0d4bd4eddacfcf5a51b2b13cf - - -] Filtering removed all hosts for the request with instance ID '32f0b53e-3390-4d50-b9b3-9ab21913377c'. Filter results: ['RetryFilter: (start: 1, end: 1)', 'AvailabilityZoneFilter: (start: 1, end: 1)', 'RamFilter: (start: 1, end: 1)', 'ComputeFilter: (start: 1, end: 1)', 'ComputeCapabilitiesFilter: (start: 1, end: 1)', 'ImagePropertiesFilter: (start: 1, end: 1)', 'CoreFilter: (start: 1, end: 1)', 'AggregateInstanceExtraSpecsFilter: (start: 1, end: 1)', 'NUMATopologyFilter: (start: 1, end: 0)']

==> nova-conductor.log <==
2016-09-14 22:13:40.710 15770 WARNING nova.scheduler.utils [req-52a796ad-aef4-4ab5-9d79-70af7bdae0e8 42fa6f918169480589dca471b5240457 0bd41cf0d4bd4eddacfcf5a51b2b13cf - - -] Failed to compute_task_build_instances: No valid host was found. There are not enough hosts available.
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 150, in inner
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/nova/scheduler/manager.py", line 104, in select_destinations
    dests = self.driver.select_destinations(ctxt, spec_obj)
  File "/usr/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py", line 74, in select_destinations
    raise exception.NoValidHost(reason=reason)
NoValidHost: No valid host was found. There are not enough hosts available.

******************
PCI_DEVICES TABLE
******************
MariaDB [nova]> select * from pci_devices;
+---------------------+---------------------+------------+---------+----+-----------------+--------------+------------+-----------+----------+------------------+-----------------+-----------+-----------------------------------+---------------+------------+-----------+--------------+
| created_at          | updated_at          | deleted_at | deleted | id | compute_node_id | address      | product_id | vendor_id | dev_type | dev_id           | label           | status    | extra_info                        | instance_uuid | request_id | numa_node | parent_addr  |
+---------------------+---------------------+------------+---------+----+-----------------+--------------+------------+-----------+----------+------------------+-----------------+-----------+-----------------------------------+---------------+------------+-----------+--------------+
| 2016-09-14 17:59:14 | 2016-09-14 19:53:03 | NULL       |       0 |  1 |               1 | 0000:07:10.0 | 1520       | 8086      | type-VF  | pci_0000_07_10_0 | label_8086_1520 | available | {"phys_function": "0000:06:00.0"} | NULL          | NULL       |         0 | 0000:06:00.0 |
| 2016-09-14 17:59:14 | 2016-09-14 19:53:03 | NULL       |       0 |  2 |               1 | 0000:07:10.1 | 1520       | 8086      | type-VF  | pci_0000_07_10_1 | label_8086_1520 | available | {"phys_function": "0000:06:00.1"} | NULL          | NULL       |         0 | 0000:06:00.1 |
| 2016-09-14 17:59:14 | 2016-09-14 19:53:03 | NULL       |       0 |  3 |               1 | 0000:07:10.2 | 1520       | 8086      | type-VF  | pci_0000_07_10_2 | label_8086_1520 | available | {"phys_function": "0000:06:00.2"} | NULL          | NULL       |         0 | 0000:06:00.2 |
| 2016-09-14 17:59:14 | 2016-09-14 19:53:03 | NULL       |       0 |  4 |               1 | 0000:07:10.3 | 1520       | 8086      | type-VF  | pci_0000_07_10_3 | label_8086_1520 | available | {"phys_function": "0000:06:00.3"} | NULL          | NULL       |         0 | 0000:06:00.3 |
| 2016-09-14 17:59:14 | 2016-09-14 19:53:03 | NULL       |       0 |  5 |               1 | 0000:07:10.4 | 1520       | 8086      | type-VF  | pci_0000_07_10_4 | label_8086_1520 | available | {"phys_function": "0000:06:00.0"} | NULL          | NULL       |         0 | 0000:06:00.0 |
| 2016-09-14 17:59:14 | 2016-09-14 19:53:03 | NULL       |       0 |  6 |               1 | 0000:07:10.5 | 1520       | 8086      | type-VF  | pci_0000_07_10_5 | label_8086_1520 | available | {"phys_function": "0000:06:00.1"} | NULL          | NULL       |         0 | 0000:06:00.1 |
| 2016-09-14 17:59:14 | 2016-09-14 19:53:03 | NULL       |       0 |  7 |               1 | 0000:07:10.6 | 1520       | 8086      | type-VF  | pci_0000_07_10_6 | label_8086_1520 | available | {"phys_function": "0000:06:00.2"} | NULL          | NULL       |         0 | 0000:06:00.2 |
| 2016-09-14 17:59:14 | 2016-09-14 19:53:03 | NULL       |       0 |  8 |               1 | 0000:07:10.7 | 1520       | 8086      | type-VF  | pci_0000_07_10_7 | label_8086_1520 | available | {"phys_function": "0000:06:00.3"} | NULL          | NULL       |         0 | 0000:06:00.3 |
+---------------------+---------------------+------------+---------+----+-----------------+--------------+------------+-----------+----------+------------------+-----------------+-----------+-----------------------------------+---------------+------------+-----------+--------------+

Hi,

This is happening because you are trying to allocate VFs that have physical_network: None, but all of the devices you have whitelisted have physical_network: physnet1:

[{'count': 8, 'product_id': u'1520', u'dev_type': u'type-VF', 'numa_node': 0, 'vendor_id': u'8086', u'physical_network': u'physnet1'}]

Removing this tag makes the filters pass. However, it then comes back to the binding error, which is coming from neutron. It might be because you didn't configure the SR-IOV agent (I'm not an expert here).
I would suggest following [1] and [2] to set it up. Using the PCI aliases (PCI passthrough without a neutron port) works fine.

Thank you!
Vladik

[1] https://docs.google.com/document/d/1qQbJlLI1hSlE4uwKpmVd0BoGSDBd8Z0lTzx5itQ6WL0/edit#heading=h.aj2vev1y0yj6
[2] http://docs.openstack.org/mitaka/networking-guide/config-sriov.html

Vladik,
Thanks for looking into it. It was a configuration issue (missing the SR-IOV agent part). It works fine now. Marking it as VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1916.html
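
For anyone hitting the same NUMATopologyFilter failure, the physical_network mismatch diagnosed in the comments can be illustrated with a small Python sketch. This is not Nova's actual code: pool_matches_spec and filter_pools are hypothetical helpers, and the sketch assumes that a PCI request spec matches a device pool only when every key in the spec equals the corresponding value in the pool. The pool is the one quoted in the comments; the two request specs are illustrative.

```python
# Simplified illustration only; not Nova's implementation.
# pool_matches_spec and filter_pools are hypothetical helper names.


def pool_matches_spec(pool, spec):
    """Return True when every key/value pair in spec is present in pool."""
    return all(pool.get(key) == value for key, value in spec.items())


def filter_pools(pools, request_specs):
    """Keep the pools that satisfy at least one of the request specs."""
    return [
        pool for pool in pools
        if any(pool_matches_spec(pool, spec) for spec in request_specs)
    ]


# Pool quoted in the bug comments: whitelisted VFs tagged with physnet1.
pools = [{
    'count': 8,
    'product_id': '1520',
    'dev_type': 'type-VF',
    'numa_node': 0,
    'vendor_id': '8086',
    'physical_network': 'physnet1',
}]

# Request as diagnosed in the comments: physical_network is None.
unmatched = filter_pools(pools, [{'physical_network': None}])

# Illustrative request whose physical network agrees with the whitelist tag.
matched = filter_pools(pools, [{'physical_network': 'physnet1'}])

print(len(unmatched), len(matched))
```

Under these assumptions the first request matches no pool even though all eight VFs are listed as available, which is consistent with the filter removing the only host, while the second request matches the single pool.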