Description of problem: When manage SR-IOV PFs as Neutron ports I can see that /sys/class/net/enp5s0f1/device/sriov_numvfs parameter gets "0" value . when I delete the PF port so I can switch to SRIOV - direct port (VF) I cant boot vm because sriov_numvfs parameter equal to "0" value Version-Release number of selected component (if applicable): rpm -qa |grep neutron python-neutron-lib-0.3.0-0.20160803002107.405f896.el7ost.noarch openstack-neutron-9.0.0-0.20160817153328.b9169e3.el7ost.noarch puppet-neutron-9.1.0-0.20160813031056.7cf5e07.el7ost.noarch python-neutron-9.0.0-0.20160817153328.b9169e3.el7ost.noarch openstack-neutron-lbaas-9.0.0-0.20160816191643.4e7301e.el7ost.noarch python-neutron-fwaas-9.0.0-0.20160817171450.e1ac68f.el7ost.noarch python-neutron-lbaas-9.0.0-0.20160816191643.4e7301e.el7ost.noarch openstack-neutron-ml2-9.0.0-0.20160817153328.b9169e3.el7ost.noarch openstack-neutron-metering-agent-9.0.0-0.20160817153328.b9169e3.el7ost.noarch openstack-neutron-openvswitch-9.0.0-0.20160817153328.b9169e3.el7ost.noarch python-neutronclient-5.0.0-0.20160812094704.ec20f7f.el7ost.noarch openstack-neutron-common-9.0.0-0.20160817153328.b9169e3.el7ost.noarch openstack-neutron-fwaas-9.0.0-0.20160817171450.e1ac68f.el7ost.noarch [root@controller1 ~(keystone_admin)]# rpm -qa |grep nova python-novaclient-5.0.1-0.20160724130722.6b11a1c.el7ost.noarch openstack-nova-api-14.0.0-0.20160817225441.04cef3b.el7ost.noarch puppet-nova-9.1.0-0.20160813014843.b94f0a0.el7ost.noarch openstack-nova-common-14.0.0-0.20160817225441.04cef3b.el7ost.noarch openstack-nova-novncproxy-14.0.0-0.20160817225441.04cef3b.el7ost.noarch openstack-nova-conductor-14.0.0-0.20160817225441.04cef3b.el7ost.noarch python-nova-14.0.0-0.20160817225441.04cef3b.el7ost.noarch openstack-nova-scheduler-14.0.0-0.20160817225441.04cef3b.el7ost.noarch openstack-nova-cert-14.0.0-0.20160817225441.04cef3b.el7ost.noarch openstack-nova-console-14.0.0-0.20160817225441.04cef3b.el7ost.noarch How reproducible: always Steps to Reproduce: 1.Set SRIOV ENV and PF support : https://docs.google.com/document/d/1qQbJlLI1hSlE4uwKpmVd0BoGSDBd8Z0lTzx5itQ6WL0/edit# 2. BOOT VM that assign to PF (neutron port- direct-physical) - should boot well 3. check cat /sys/class/net/enp5s0f1/device/sriov_numvfs (=0) 4. delete vm and check again sriov_numvfs (=0) 5. I expect that numvfs should return to the default value that was configured Actual results: Expected results: Additional info:
Eran was this on a system where you were modifying the VF count to 0, or is it VF >0 allocate PF deallocate PF VF == 0 ?
Brent I did not change the VF count to 0 manually . The scenario is : VF >0 allocate PF deallocate PF VF == 0
According to this bug my opinion is that RFE : https://bugzilla.redhat.com/show_bug.cgi?id=1233921 is block because its not looks like there is dynamic change between PF & VF
When the tripleo SR-IOV support is completed, this should be taken care of because there will be a script that runs when the interface is brought back up the VFs will get reset to the expected configured value. Of course this is contingent on there being and ifup on that PF. @Eran, can you check the up/down status of the PF once it's been "released" if you get the chance?
(In reply to Brent Eagles from comment #5) > When the tripleo SR-IOV support is completed, this should be taken care of > because there will be a script that runs when the interface is brought back > up the VFs will get reset to the expected configured value. Of course this > is contingent on there being and ifup on that PF. > > @Eran, can you check the up/down status of the PF once it's been "released" > if you get the chance? Yes Brent it's been released after I delete the VM that associate to PF
Actually, that's not what I meant to ask. I was referring to whether the interface was up or down. If it is not set to "up" then we cannot rely on the ifup-local hook that we install to resolve the VF count issue. If it is down, can you try bringing it up and seeing if the VFs come back or not.
(In reply to Brent Eagles from comment #7) > Actually, that's not what I meant to ask. I was referring to whether the > interface was up or down. If it is not set to "up" then we cannot rely on > the ifup-local hook that we install to resolve the VF count issue. If it is > down, can you try bringing it up and seeing if the VFs come back or not. It set to up after I release the PF .
Okay thanks. So this means that the persistent VF thing added by tripleo isn't going to help. Vladik, is this something that nova can do when the pci device is released? Alternatively, we'll have to get the SR-IOV agent involved on the compute node.
Created attachment 1197223 [details] pf_test from vladikr env.
Could we please reproduce the bug, but this time please enable the VFs using the max_vfs parameter and not via sysfs tunable. Unload the ixgbe (or other driver of the card ) driver and load it again modprobe ixgbe max_vfs=X I can't reproduce this problem with my card. I've added an output from my server. Regardless, it looks like this is a VFIO or a libvirt issue (hostdev managed=True), rather than nova. In Nova, we are relying on this behaviour as well. I think we should try reproducing with max_vfs and if it doesn't work we should try consulting Alex Williamson from the kvm team. Vladik
This is my take away from the testing environment for Telefonica. - sysfs: Using a command like this "echo 4 > /sys/class/net/em1/device/sriov_numvfs" does not persist the number of VFs per network adapter. So after allocating/deallocating a PF, the VFs configured previously are gone. - ixgbe max_vfs parameter works well. Doing the same kind of test as before, the VFs are back, however, the parent interface has got its link state as DOWN what makes its children (VFs) not to be available for allocation. Ricky
Latest news! As per Vladik instructions, with the NetworkManager enabled, the parent interface comes back in UP state. So combination of ixgbe max_vfs parameter + NetworkManager service makes the job. Ricky
it still exist from my last check
(In reply to Eran Kuris from comment #22) > it still exist from my last check I think we were really expecting you to be a little more expansive here as to the ask, as the upstream comment Nir referred to above was yes that's right and this is expected behavior: https://bugs.launchpad.net/nova/+bug/1616769/comments/1
This was fixed via changes to tripleo/director. The user needs to enable network management using "nm_controlled: true" on RHEL or "hotplug: true" on CentOS on the relevant interfaces in the network environment files. Please see: https://bugzilla.redhat.com/show_bug.cgi?id=1392584 https://bugzilla.redhat.com/show_bug.cgi?id=1392585