Description of problem:
When assigning a neutron port to a PF (neutron port type: direct-physical), the VM boots and becomes active, but there are errors in the SR-IOV agent log:

2016-08-24 13:47:11.702 1484 DEBUG neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [req-028806d4-c857-4956-9198-230cb0de5e18 - - - - -] Agent rpc_loop - iteration:391 started daemon_loop /usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py:367
2016-08-24 13:47:11.703 1484 INFO neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [req-028806d4-c857-4956-9198-230cb0de5e18 - - - - -] Agent out of sync with plugin!
2016-08-24 13:47:11.703 1484 DEBUG neutron.agent.linux.utils [req-028806d4-c857-4956-9198-230cb0de5e18 - - - - -] Running command (rootwrap daemon): ['ip', 'link', 'show', 'enp5s0f1'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:99
2016-08-24 13:47:11.715 1484 ERROR neutron.agent.linux.utils [req-028806d4-c857-4956-9198-230cb0de5e18 - - - - -] Exit code: 1; Stdin: ; Stdout: ; Stderr: Device "enp5s0f1" does not exist.
2016-08-24 13:47:11.715 1484 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.pci_lib [req-028806d4-c857-4956-9198-230cb0de5e18 - - - - -] Failed executing ip command
2016-08-24 13:47:11.715 1484 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.pci_lib Traceback (most recent call last):
2016-08-24 13:47:11.715 1484 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.pci_lib   File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/pci_lib.py", line 83, in get_assigned_macs
2016-08-24 13:47:11.715 1484 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.pci_lib     out = self._as_root([], "link", ("show", self.dev_name))
2016-08-24 13:47:11.715 1484 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.pci_lib   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 95, in _as_root
2016-08-24 13:47:11.715 1484 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.pci_lib     log_fail_as_error=self.log_fail_as_error)
2016-08-24 13:47:11.715 1484 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.pci_lib   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 104, in _execute
2016-08-24 13:47:11.715 1484 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.pci_lib     log_fail_as_error=log_fail_as_error)
2016-08-24 13:47:11.715 1484 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.pci_lib   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py", line 138, in execute
2016-08-24 13:47:11.715 1484 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.pci_lib     raise RuntimeError(msg)
2016-08-24 13:47:11.715 1484 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.pci_lib RuntimeError: Exit code: 1; Stdin: ; Stdout: ; Stderr: Device "enp5s0f1" does not exist.
2016-08-24 13:47:11.715 1484 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.pci_lib
2016-08-24 13:47:11.715 1484 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.pci_lib
2016-08-24 13:47:11.717 1484 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [req-028806d4-c857-4956-9198-230cb0de5e18 - - - - -] Error in agent loop. Devices info: {}
2016-08-24 13:47:11.717 1484 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent Traceback (most recent call last):
2016-08-24 13:47:11.717 1484 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py", line 380, in daemon_loop
2016-08-24 13:47:11.717 1484 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     device_info = self.scan_devices(devices, updated_devices_copy)
2016-08-24 13:47:11.717 1484 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 154, in wrapper
2016-08-24 13:47:11.717 1484 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     return f(*args, **kwargs)
2016-08-24 13:47:11.717 1484 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py", line 199, in scan_devices
2016-08-24 13:47:11.717 1484 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     curr_devices = self.eswitch_mgr.get_assigned_devices_info()
2016-08-24 13:47:11.717 1484 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 277, in get_assigned_devices_info
2016-08-24 13:47:11.717 1484 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     for device in embedded_switch.get_assigned_devices_info():
2016-08-24 13:47:11.717 1484 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 150, in get_assigned_devices_info
2016-08-24 13:47:11.717 1484 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     list(vf_to_pci_slot_mapping.keys()))
2016-08-24 13:47:11.717 1484 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/pci_lib.py", line 87, in get_assigned_macs
2016-08-24 13:47:11.717 1484 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     reason=e)
2016-08-24 13:47:11.717 1484 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent IpCommandDeviceError: ip command failed on device enp5s0f1: Exit code: 1; Stdin: ; Stdout: ; Stderr: Device "enp5s0f1" does not exist.
2016-08-24 13:47:11.717 1484 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent

Version-Release number of selected component (if applicable):
RHOS-10

[root@controller1 ~(keystone_admin)]# rpm -qa |grep neutron
python-neutron-lib-0.3.0-0.20160803002107.405f896.el7ost.noarch
openstack-neutron-9.0.0-0.20160817153328.b9169e3.el7ost.noarch
puppet-neutron-9.1.0-0.20160813031056.7cf5e07.el7ost.noarch
python-neutron-9.0.0-0.20160817153328.b9169e3.el7ost.noarch
openstack-neutron-lbaas-9.0.0-0.20160816191643.4e7301e.el7ost.noarch
python-neutron-fwaas-9.0.0-0.20160817171450.e1ac68f.el7ost.noarch
python-neutron-lbaas-9.0.0-0.20160816191643.4e7301e.el7ost.noarch
openstack-neutron-ml2-9.0.0-0.20160817153328.b9169e3.el7ost.noarch
openstack-neutron-metering-agent-9.0.0-0.20160817153328.b9169e3.el7ost.noarch
openstack-neutron-openvswitch-9.0.0-0.20160817153328.b9169e3.el7ost.noarch
python-neutronclient-5.0.0-0.20160812094704.ec20f7f.el7ost.noarch
openstack-neutron-common-9.0.0-0.20160817153328.b9169e3.el7ost.noarch
openstack-neutron-fwaas-9.0.0-0.20160817171450.e1ac68f.el7ost.noarch

[root@controller1 ~(keystone_admin)]# rpm -qa |grep nova
python-novaclient-5.0.1-0.20160724130722.6b11a1c.el7ost.noarch
openstack-nova-api-14.0.0-0.20160817225441.04cef3b.el7ost.noarch
puppet-nova-9.1.0-0.20160813014843.b94f0a0.el7ost.noarch
openstack-nova-common-14.0.0-0.20160817225441.04cef3b.el7ost.noarch
openstack-nova-novncproxy-14.0.0-0.20160817225441.04cef3b.el7ost.noarch
openstack-nova-conductor-14.0.0-0.20160817225441.04cef3b.el7ost.noarch
python-nova-14.0.0-0.20160817225441.04cef3b.el7ost.noarch
openstack-nova-scheduler-14.0.0-0.20160817225441.04cef3b.el7ost.noarch
openstack-nova-cert-14.0.0-0.20160817225441.04cef3b.el7ost.noarch
openstack-nova-console-14.0.0-0.20160817225441.04cef3b.el7ost.noarch

How reproducible:

Steps to Reproduce:
1. Deploy an SR-IOV setup and enable PF functionality; you can use this guide: https://docs.google.com/document/d/1qQbJlLI1hSlE4uwKpmVd0BoGSDBd8Z0lTzx5itQ6WL0/edit#
2. Boot a VM and assign it to the PF.
3. Check the SR-IOV agent log on the compute node.

Actual results:
Errors in the SR-IOV agent log.

Expected results:
No errors.

Additional info:
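The failing command in the log is `ip link show enp5s0f1` for a PF whose netdev has disappeared from the host (it is detached when passed through to a guest). A minimal diagnostic sketch of that existence check, without rootwrap or the neutron code; the assumption is a Linux host where every visible netdev appears under /sys/class/net, and "lo" is used only so the sketch runs anywhere (on a compute node you would pass the PF name from the log, e.g. "enp5s0f1"):

```python
import os

def pf_netdev_present(dev_name):
    """Return True if the kernel still exposes a network device by this name."""
    return os.path.isdir(os.path.join("/sys/class/net", dev_name))

print(pf_netdev_present("lo"))        # loopback, present on any Linux host
print(pf_netdev_present("enp5s0f1"))  # False once the PF is assigned to a VM
```

When this returns False for a PF the agent is configured to manage, every `ip link show` against it will exit 1 exactly as in the traceback above.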
Following the upstream Launchpad bug, this is only relevant if we intend to allow users to create both VFs and PFs on the same node. Is that functionality relevant and required? If so this RHBZ should indeed block the RFE, otherwise it shouldn't, I'll leave that decision up to you.
(In reply to Assaf Muller from comment #2)
> Following the upstream Launchpad bug, this is only relevant if we intend to
> allow users to create both VFs and PFs on the same node. Is that
> functionality relevant and required? If so this RHBZ should indeed block the
> RFE, otherwise it shouldn't, I'll leave that decision up to you.

Eran/Assaf: I am not sure I am following. Once you enable PF, no VFs will be available to do SR-IOV. And once you enable a single VF, the PF won’t be available anymore in the card. So what do you mean by "allow users to create both VFs and PFs on the same node"? Are you referring to using VFs and PFs on the same host but from separate network adapters?

Thanks,
Nir
Yes, Nir, this is exactly what I meant.
There is also a scenario where a network adapter is already configured with VFs, and after I disassociate all ports from the VFs I want to switch it back to working as a full PF.
(In reply to Eran Kuris from comment #4)
> Yes Nir this is exactly what I meant.
> Also there is scenario that Network adapter is already configured with VFs
> and after that I want to change it to work with all PF after I disassociate
> all ports from VFs.

And just to understand this bug - you are saying that this is working, but the agent log shows errors?

Thanks,
Nir
(In reply to Nir Yechiel from comment #5)
> (In reply to Eran Kuris from comment #4)
> > Yes Nir this is exactly what I meant.
> > Also there is scenario that Network adapter is already configured with VFs
> > and after that I want to change it to work with all PF after I disassociate
> > all ports from VFs.
>
> And just to understand this bug - you are saying that this is working, but
> the agent log shows errors?
>
> Thanks,
> Nir

You are correct.
Thanks for the clarifications, Eran.

@Assaf, the use case is valid and we should look at the upstream bug. But since the functionality is there, this should not be treated as a high-priority bug or block the RHOSP 10 RFE, IMO.

Thanks,
Nir
Hello guys,

I'm seeing the same stack trace from the SR-IOV agent, but I think there is indeed an issue.

Let's say a compute node has two network adapters (em1 and em2) with their corresponding VFs configured. If you start a VM with a direct-physical binding, it will take one of these NICs. At that moment, the SR-IOV agent starts to show those ERROR messages, including a completely empty device dictionary.

As a consequence, you cannot allocate VMs with VFs even though there is still another NIC available.

IMHO, until this issue is fixed, coexistence of VFs/PFs won't be usable.

@Nir, could you reconsider the priority of this issue? Thanks!
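The traceback in the description shows why one missing PF poisons the whole scan: the eswitch manager iterates over every embedded switch, and the first `IpCommandDeviceError` propagates up and aborts the loop, leaving the devices dictionary empty. A minimal illustration of that failure mode, using hypothetical stand-ins rather than the actual neutron classes (the device names, MACs, and `scan_tolerant` fix are all assumptions for the sketch):

```python
class DeviceMissingError(Exception):
    pass

def get_assigned_macs(dev_name):
    # Stand-in for pci_lib: 'ip link show <dev>' fails when the PF netdev
    # was detached from the host after being passed through to a guest.
    devices = {"em1": ["fa:16:3e:aa:bb:01"], "em2": ["fa:16:3e:aa:bb:02"]}
    if dev_name not in devices:
        raise DeviceMissingError('Device "%s" does not exist.' % dev_name)
    return devices[dev_name]

def scan_naive(dev_names):
    # Mirrors the behaviour in the log: the first failure aborts the scan,
    # so no device info at all is collected ("Devices info: {}").
    result = {}
    for name in dev_names:
        result[name] = get_assigned_macs(name)
    return result

def scan_tolerant(dev_names):
    # Shape of a per-device fix: skip PFs that are currently assigned
    # instead of aborting, so the remaining adapters keep working.
    result = {}
    for name in dev_names:
        try:
            result[name] = get_assigned_macs(name)
        except DeviceMissingError:
            continue
    return result

try:
    scan_naive(["enp5s0f1", "em1", "em2"])
except DeviceMissingError as e:
    print("naive scan aborted:", e)

print(scan_tolerant(["enp5s0f1", "em1", "em2"]))
# em1/em2 are still scanned even though enp5s0f1 is gone
```

This matches Ricardo's observation: the other NIC's VFs become unusable not because they are broken, but because the scan never gets to them.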
(In reply to Ricardo Noriega from comment #8)
> Hello guys,
>
> I'm having the same stacktrace from the SRIOV agent, but I think there is
> indeed an issue.
>
> Let's say a compute node has got 2 network adapters (em1 and em2) with its
> correspondent VFs configured. If you start a VM with a direct-physical
> binding it will take one of these NICs. At that moment, SRIOV agent starts
> to show those ERROR messages including the "device dictionary" completely
> empty.
>
> In consequence, you cannot allocate VMs with VFs eventhough there is still
> another NIC available.
>
> IMHO, until this issue is not fixed, coexistence of VFs/PFs won't be usable.
>
> @Nir, could you reconsider the priority of this issue?? Thanks!

You'll have to excuse my cheekiness, but I'm going to steal your comment and post it in the Launchpad bug.
(In reply to Assaf Muller from comment #9)
> (In reply to Ricardo Noriega from comment #8)
> > Hello guys,
> >
> > I'm having the same stacktrace from the SRIOV agent, but I think there is
> > indeed an issue.
> >
> > Let's say a compute node has got 2 network adapters (em1 and em2) with its
> > correspondent VFs configured. If you start a VM with a direct-physical
> > binding it will take one of these NICs. At that moment, SRIOV agent starts
> > to show those ERROR messages including the "device dictionary" completely
> > empty.
> >
> > In consequence, you cannot allocate VMs with VFs eventhough there is still
> > another NIC available.
> >
> > IMHO, until this issue is not fixed, coexistence of VFs/PFs won't be usable.
> >
> > @Nir, could you reconsider the priority of this issue?? Thanks!
>
> You'll have to excuse my cheekyness but I'm going to steal your comment and
> post it in the Launchpad bug.

Your cheekiness is excused. No prob.
With:
- https://review.openstack.org/#/c/360447
- a modified version of https://review.openstack.org/#/c/377781/ (as per the comments provided in review)
- a udev rule that calls a script to set the VF count

I got really close to co-existence. In the absence of NetworkManager, I needed to add a script, called from /proc/sys/kernel/hotplug, to set the SR-IOV PF's link status to "up" before I could allocate VFs.

At the moment, I can flip back and forth between using a device as a PF or as VFs. With both PFs enabled, I can also use one for VFs and the other as a PF. We'll have to examine whether this is the route to take. Obviously, adding the supporting scripting, etc. is outside the scope of this bug and beyond what we can expect from a user, but I mention it to outline a) how I tested the patches and b) what we might need to add to tripleo to support this functionality.

Now all we need to do is help these patches along to get merged upstream.
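For reference, the plumbing described above could look roughly like the following. This is a sketch, not the exact rule or script used in the test: the rule file path, script path, interface name, and VF count are all assumptions, and the sysfs write uses the standard kernel SR-IOV interface (`sriov_numvfs`). It is hardware-specific configuration, so treat it as illustrative only.

```shell
# /etc/udev/rules.d/70-sriov-vfs.rules  (hypothetical path and rule)
# When the PF netdev reappears on the host, re-create its VFs and bring it up:
#   ACTION=="add", SUBSYSTEM=="net", KERNEL=="enp5s0f1", RUN+="/usr/local/sbin/sriov-vfs.sh %k"

# /usr/local/sbin/sriov-vfs.sh  (hypothetical helper script)
#!/bin/sh
dev="$1"
num_vfs=4   # assumed VF count; site-specific
echo "$num_vfs" > "/sys/class/net/$dev/device/sriov_numvfs"
ip link set dev "$dev" up
```

The `ip link set ... up` step corresponds to the hotplug script mentioned above: without NetworkManager, nothing else brings the PF link up, and VFs cannot be allocated while it is down.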
I'm moving this back from 11 to 10.z. If we find that we cannot backport the fix to 10 for whatever reason, we'll think about this again.
A stable/newton patch has been proposed upstream: https://review.openstack.org/#/c/442088
Verified in puddle 2017-06-15.5. /var/log/neutron/sriov-nic-agent.log is clean, without any errors.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:1594