Description of problem:
Booted a VM with a direct-physical port (the entire PF is assigned to the instance). When I deleted the instance, I expected the PF to become available and online again. Instead, when I try to boot an instance with a direct (VF) port, the VM goes into ERROR state with this fault:

{"message": "Exceeded maximum number of retries. Exceeded max scheduling attempts 3 for instance 102fde1b-22d3-4b05-8246-0f1af520455a. Last exception: internal error: Unable to configure VF 4 of PF 'p1p1' because the PF is not online. Please change host network config", "code": 500, "details": "  File \"/usr/lib/python2.7/site-packages/nova/conductor/manager.py\", line 524, in build_instances
    filter_properties, instances[0].uuid)

[root@compute-0 ~]# ifconfig | grep p1p1
---> The PF is not online, so it is impossible to create an instance with a direct (VF) port.

sosreport: https://drive.google.com/drive/folders/0B_izhJVSkOTDdnV3SmtNWnUwYUk

Version-Release number of selected component (if applicable):
[root@controller-0 ~]# rpm -qa | grep neutron
openstack-neutron-10.0.0-11.el7ost.noarch
python-neutron-lib-1.1.0-1.el7ost.noarch
openstack-neutron-sriov-nic-agent-10.0.0-11.el7ost.noarch
openstack-neutron-ml2-10.0.0-11.el7ost.noarch
python-neutronclient-6.1.0-1.el7ost.noarch
openstack-neutron-common-10.0.0-11.el7ost.noarch
openstack-neutron-openvswitch-10.0.0-11.el7ost.noarch
python-neutron-10.0.0-11.el7ost.noarch
puppet-neutron-10.3.0-2.el7ost.noarch
[root@controller-0 ~]# rpm -qa | grep nova
openstack-nova-common-15.0.2-1.el7ost.noarch
openstack-nova-cert-15.0.2-1.el7ost.noarch
puppet-nova-10.4.0-3.el7ost.noarch
openstack-nova-compute-15.0.2-1.el7ost.noarch
openstack-nova-placement-api-15.0.2-1.el7ost.noarch
openstack-nova-console-15.0.2-1.el7ost.noarch
openstack-nova-novncproxy-15.0.2-1.el7ost.noarch
openstack-nova-conductor-15.0.2-1.el7ost.noarch
openstack-nova-scheduler-15.0.2-1.el7ost.noarch
python-nova-15.0.2-1.el7ost.noarch
openstack-nova-api-15.0.2-1.el7ost.noarch
python-novaclient-7.1.0-1.el7ost.noarch

How reproducible:
Always

Steps to Reproduce:
1. Deploy an SR-IOV setup with PF support.
2. Boot an instance with a direct-physical port.
3. Delete the VM that is associated with the PF.
4. Boot an instance with a direct (VF) port.

Expected results:
The VM with the direct port should boot, and the PF should be released.

Additional info:
Workaround: systemctl restart network
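The `ifconfig | grep p1p1` check can also be scripted by reading the interface flags the kernel exposes in sysfs. This is a minimal sketch, assuming that "not online" in the libvirt error corresponds to the PF lacking the IFF_UP flag, which is also what hides an interface from plain `ifconfig` output (without `-a`). `pf_is_up` is a hypothetical helper, not part of nova, neutron, or libvirt.

```python
import os

# Administrative "up" bit, as defined in <linux/if.h>.
IFF_UP = 0x1


def pf_is_up(iface, sysfs_root="/sys/class/net"):
    """Return True if iface has IFF_UP set; False if it is down or missing.

    Hypothetical helper: reads /sys/class/net/<iface>/flags, which the
    kernel exposes as a hex string such as "0x1003".
    """
    path = os.path.join(sysfs_root, iface, "flags")
    try:
        with open(path) as f:
            flags = int(f.read().strip(), 16)
    except FileNotFoundError:
        # No such interface (e.g. still detached or renamed).
        return False
    return bool(flags & IFF_UP)
```

In the failing state described above, `pf_is_up("p1p1")` on compute-0 would be expected to return False, and True again once the network service has been restarted.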
Created attachment 1268662: log
I will also need the os-net-config version used for this test to ensure you have the required patches. Also, note that you need to add the following parameters to the SR-IOV physical function configuration in your heat templates:

nm_controlled = true
hotplug = true
(In reply to Brent Eagles from comment #4)
> I will also need the os-net-config version used for this test to ensure you
> have the required patches. Also, note that you need to add the following
> parameters to the SR-IOV physical function configuration in your heat
> templates:
>
> nm_controlled = true
> hotplug = true

Is there any way to get the os-net-config version from the sosreport?

About the new parameters: I will deploy a setup with them and check. I will do my best to do it soon. Please provide the specific path of the config file to which I need to add those parameters, so I do not miss anything. Thanks.
@Eran, the parameters are applied to the interface configuration in the network templates. For example, in your version of tripleo-heat-templates/network/config/multiple-nics/compute.yaml, you need entries like this for the PF interfaces:

    - type: interface
      name: nic6
      use_dhcp: false
      nm_controlled: true
      hotplug: true

NetworkManager will take care of bringing the interface back up. The exact path of the relevant file depends on the test environment.

With respect to os-net-config: aren't the versions of all of the installed packages recorded somewhere?
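Before redeploying, the PF entries can be checked for both settings. This is a minimal sketch that works on the already-parsed network_config list (the list of dicts a YAML loader would return for the entries above), so it does not assume any particular YAML library. `missing_pf_settings` and its `pf_names` argument are hypothetical; which NICs are PFs has to be supplied by whoever runs the check.

```python
# Settings the PF interface entries are expected to carry (from the comment above).
REQUIRED_PF_KEYS = ("nm_controlled", "hotplug")


def missing_pf_settings(interfaces, pf_names):
    """Return {interface name: [missing keys]} for PF entries lacking the settings.

    interfaces: parsed network_config entries (list of dicts).
    pf_names:   set of interface names that are SR-IOV PFs (hypothetical input).
    A key counts as present only if its value is exactly True, which is what
    YAML "true" parses to.
    """
    problems = {}
    for entry in interfaces:
        if entry.get("type") != "interface" or entry.get("name") not in pf_names:
            continue
        missing = [key for key in REQUIRED_PF_KEYS if entry.get(key) is not True]
        if missing:
            problems[entry["name"]] = missing
    return problems


nics = [
    {"type": "interface", "name": "nic6", "use_dhcp": False,
     "nm_controlled": True, "hotplug": True},
    {"type": "interface", "name": "nic7", "use_dhcp": False},
]
print(missing_pf_settings(nics, {"nic6", "nic7"}))  # {'nic7': ['nm_controlled', 'hotplug']}
```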
(In reply to Brent Eagles from comment #6)
> @Eran, the parameters are applied to the interface configuration in the
> network templates. For example, in your version of
> tripleo-heat-templates/network/config/multiple-nics/compute.yaml, you need
> entries for the PF interfaces
>
>     - type: interface
>       name: nic6
>       use_dhcp: false
>       nm_controlled: true
>       hotplug: true
>
> NetworkManager will take care of bringing the interface back up. The exact
> path of the relevant file depends on the test environment.
>
> With respect to os-net-config: aren't the versions of all of the installed
> packages recorded somewhere?

Hmm, I don't know whether they are kept anywhere.
[root@compute-0 ~]# rpm -qa | grep os-net
os-net-config-6.0.0-3.el7ost.noarch

When I set up my environment with
> nm_controlled: true
> hotplug: true
I did not succeed in booting a VF instance. I got this error:

{"message": "Build of instance 594cacae-6bdd-45b7-ae1b-4102b1d86cce aborted: Failed to allocate the network(s), not rescheduling.", "code": 500, "details": "  File \"/usr/lib/python2.7/site-packages/nova/compute/manager.py\", line 1780, in _do_build_and_run_instance
    filter_properties)
  File \"/usr/lib/python2.7/site-packages/nova/compute/manager.py\", line 1990, in _build_and_run_instance
Created attachment 1274630: setup config files
This seems to be timing-dependent. On this system I was able to create VMs with PF ports, delete them, and create VMs with VF ports most of the time. It failed only when I deleted a VM with a PF port and created the VF-based one shortly thereafter (within 30 seconds?). I'll investigate further to see if I can find where the race(s) lie.
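To narrow down the roughly 30-second window, one option is to time how long the PF takes to come back up after the PF-port VM is deleted. The polling helper below is a hypothetical sketch, not part of nova or neutron. `probe` is assumed to be any zero-argument callable that reports whether the PF is up (for example, one that reads /sys/class/net/p1p1/flags), and the clock and sleep functions are injectable so the loop can be exercised without real waiting.

```python
import time


def wait_until_up(probe, timeout=60.0, interval=1.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll probe() until it returns True or `timeout` seconds elapse.

    Returns the number of seconds waited if the PF came up, or None if it
    was still down when the timeout expired. Hypothetical helper.
    """
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(interval)
```

Measuring with `time.monotonic` keeps the result unaffected by wall-clock adjustments on the compute node; a `None` result means the PF had not returned within the chosen timeout.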