Created attachment 1489893
nova-compute.log

Description of problem:
Booting an instance with an SR-IOV port fails. This was seen with a pre-created port, but it also applies to ports that neutron generates at instance boot.

(overcloud) [stack@undercloud-0 ~]$ openstack port show port1-sriov
+-----------------------+--------------------------------------------------------------------------+
| Field                 | Value                                                                    |
+-----------------------+--------------------------------------------------------------------------+
| admin_state_up        | UP                                                                       |
| allowed_address_pairs |                                                                          |
| binding_host_id       |                                                                          |
| binding_profile       | physical_network='sriov-1'                                               |
| binding_vif_details   |                                                                          |
| binding_vif_type      | unbound                                                                  |
| binding_vnic_type     | direct                                                                   |
| created_at            | 2018-10-02T11:52:54Z                                                     |
| data_plane_status     | None                                                                     |
| description           |                                                                          |
| device_id             |                                                                          |
| device_owner          |                                                                          |
| dns_assignment        | None                                                                     |
| dns_name              | None                                                                     |
| extra_dhcp_opts       |                                                                          |
| fixed_ips             | ip_address='60.0.30.5', subnet_id='1e90b136-aa0f-4329-87db-a25221f61122' |
| id                    | e63ab5af-fb5a-46e5-91f0-d5674def2289                                     |
| ip_address            | None                                                                     |
| mac_address           | fa:16:3e:18:b7:e2                                                        |
| name                  | port1-sriov                                                              |
| network_id            | 338edcda-37d1-4a5a-a553-72aa86ffde4d                                     |
| option_name           | None                                                                     |
| option_value          | None                                                                     |
| port_security_enabled | True                                                                     |
| project_id            | fbf526f3027747c298597ec43cc758bd                                         |
| qos_policy_id         | None                                                                     |
| revision_number       | 42                                                                       |
| security_group_ids    | 334ce6ef-6237-4241-99ee-857db0683fec                                     |
| status                | DOWN                                                                     |
| subnet_id             | None                                                                     |
| tags                  |                                                                          |
| trunk_details         | None                                                                     |
| updated_at            | 2018-10-02T12:35:23Z                                                     |
+-----------------------+--------------------------------------------------------------------------+

Server creation command:
openstack server create --image rhel-guest-image-7.5-192.x86_64.qcow2 --flavor m1.medium.huge_pages_cpu_pinning_numa_node-0 --nic port-id=e63ab5af-fb5a-46e5-91f0-d5674def2289 TestServer

Creation fails:
(overcloud) [stack@undercloud-0 ~]$ openstack server list
+--------------------------------------+------------+--------+----------+---------------------------------------+----------------------------------------------+
| ID                                   | Name       | Status | Networks | Image                                 | Flavor                                       |
+--------------------------------------+------------+--------+----------+---------------------------------------+----------------------------------------------+
| 0d8afc28-0fd9-4709-9952-97d29c6ac48f | TestServer | ERROR  |          | rhel-guest-image-7.5-192.x86_64.qcow2 | m1.medium.huge_pages_cpu_pinning_numa_node-0 |
+--------------------------------------+------------+--------+----------+---------------------------------------+----------------------------------------------+

(overcloud) [stack@undercloud-0 ~]$ openstack server show TestServer
+-------------------------------------+-------------------------------------------------------------------------------------+
| Field                               | Value                                                                               |
+-------------------------------------+-------------------------------------------------------------------------------------+
| OS-DCF:diskConfig                   | MANUAL                                                                              |
| OS-EXT-AZ:availability_zone         | nova                                                                                |
| OS-EXT-SRV-ATTR:host                | None                                                                                |
| OS-EXT-SRV-ATTR:hypervisor_hostname | None                                                                                |
| OS-EXT-SRV-ATTR:instance_name       | instance-00000030                                                                   |
| OS-EXT-STS:power_state              | NOSTATE                                                                             |
| OS-EXT-STS:task_state               | None                                                                                |
| OS-EXT-STS:vm_state                 | error                                                                               |
| OS-SRV-USG:launched_at              | None                                                                                |
| OS-SRV-USG:terminated_at            | None                                                                                |
| accessIPv4                          |                                                                                     |
| accessIPv6                          |                                                                                     |
| addresses                           |                                                                                     |
| config_drive                        |                                                                                     |
| created                             | 2018-10-03T09:03:05Z                                                                |
| fault                               | {u'message': u'Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance 0d8afc28-0fd9-4709-9952-97d29c6ac48f.', u'code': 500, u'details': u' File "/usr/lib/python2.7/site-packages/nova/conductor/manager.py", line 581, in build_instances\n raise exception.MaxRetriesExceeded(reason=msg)\n', u'created': u'2018-10-03T09:03:25Z'} |
| flavor                              | m1.medium.huge_pages_cpu_pinning_numa_node-0 (bb4a7ee6-dc7f-4260-b091-fbaac2ee9b64) |
| hostId                              |                                                                                     |
| id                                  | 0d8afc28-0fd9-4709-9952-97d29c6ac48f                                                |
| image                               | rhel-guest-image-7.5-192.x86_64.qcow2 (2c85629a-beb5-404e-9cbb-b2c0700759aa)        |
| key_name                            | None                                                                                |
| name                                | TestServer                                                                          |
| project_id                          | fbf526f3027747c298597ec43cc758bd                                                    |
| properties                          |                                                                                     |
| status                              | ERROR                                                                               |
| updated                             | 2018-10-03T09:03:24Z                                                                |
| user_id                             | 923c5299dc584d5280b09f2439bc0627                                                    |
| volumes_attached                    |                                                                                     |
+-------------------------------------+-------------------------------------------------------------------------------------+

Resulting in an error in nova-compute.log on compute node (attached in BZ with the relevant error).

Version-Release number of selected component (if applicable):
(overcloud) [stack@undercloud-0 ~]$ cat /etc/rhosp-release
Red Hat OpenStack Platform release 13.0 (Queens)
(overcloud) [stack@undercloud-0 ~]$ cat core_puddle_version
2018-09-28.1
[root@computeovsdpdksriov-0 nova]# rpm -qa | grep nova
openstack-nova-common-17.0.5-3.d7864fbgit.el7ost.noarch
openstack-nova-compute-17.0.5-3.d7864fbgit.el7ost.noarch
openstack-nova-api-17.0.5-3.d7864fbgit.el7ost.noarch
python2-novaclient-10.1.0-1.el7ost.noarch
openstack-nova-console-17.0.5-3.d7864fbgit.el7ost.noarch
openstack-nova-placement-api-17.0.5-3.d7864fbgit.el7ost.noarch
puppet-nova-12.4.0-6.el7ost.noarch
openstack-nova-conductor-17.0.5-3.d7864fbgit.el7ost.noarch
openstack-nova-scheduler-17.0.5-3.d7864fbgit.el7ost.noarch
openstack-nova-migration-17.0.5-3.d7864fbgit.el7ost.noarch
openstack-nova-novncproxy-17.0.5-3.d7864fbgit.el7ost.noarch
python-nova-17.0.5-3.d7864fbgit.el7ost.noarch

How reproducible:
Tested on 3 separate deployments; it occurred every time.

Steps to Reproduce:
1. Create a network attached to a physnet with SR-IOV NICs
2. Attempt to boot an instance on the SR-IOV network

Actual results:
Server creation fails with return code 500, and nova logs an error on the compute node regarding the libvirt XML definition.

Expected results:
Server boots successfully.

It's a known issue when the tx/rx queue sizes are set and SR-IOV ports are used: we previously misconfigured the SR-IOV interfaces by treating them as vhost-driver interfaces and applying the queue sizes to them. It has been fixed in bug 1620171. Please confirm so that we can close this one as a duplicate. Thanks
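
For anyone triaging a similar failure, a quick check of whether the queue sizes are set on the compute node can be sketched as below. This is only an illustration, not part of the fix: it assumes the sizes are configured through the `rx_queue_size` and `tx_queue_size` options in the `[libvirt]` section of nova.conf, and `libvirt_queue_sizes` is a hypothetical helper name. The sample values are made up and are not the deployment defaults.

```python
import configparser

# Options assumed to carry the virtio queue sizes in nova.conf's [libvirt] section.
QUEUE_OPTIONS = ("rx_queue_size", "tx_queue_size")


def libvirt_queue_sizes(conf_text):
    """Return the rx/tx queue size options found in [libvirt], as a dict of strings."""
    parser = configparser.ConfigParser(
        interpolation=None,          # nova.conf values may contain '%'
        strict=False,                # tolerate repeated options
        default_section="__none__",  # do not let [DEFAULT] leak into [libvirt]
    )
    parser.read_string(conf_text)
    found = {}
    for option in QUEUE_OPTIONS:
        if parser.has_option("libvirt", option):
            found[option] = parser.get("libvirt", option)
    return found


if __name__ == "__main__":
    # Illustrative nova.conf fragment; the values are hypothetical.
    sample = """
[DEFAULT]
debug = False

[libvirt]
virt_type = kvm
rx_queue_size = 1024
tx_queue_size = 1024
"""
    print(libvirt_queue_sizes(sample))
```

To use it against a real node, read the text of whichever nova.conf the nova-compute service actually loads and pass it in. A non-empty result together with an SR-IOV (`vnic_type` direct) port matches the condition described above.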

Hey, it's indeed a duplicate of bug 1620171. I haven't set the rx/tx queue sizes for libvirt in my deployment, but since the '2018-09-13.1' puddle they are defined by default during deployment, which is why I encountered it. Thanks for your help.

*** This bug has been marked as a duplicate of bug 1620171 ***