Bug 1656978

Summary: [Neutron] - VF link-state needs to be set to Auto
Product: Red Hat OpenStack Reporter: Marc Methot <mmethot>
Component: openstack-neutron Assignee: Rodolfo Alonso <ralonsoh>
Status: CLOSED ERRATA QA Contact: Eran Kuris <ekuris>
Severity: high Docs Contact:
Priority: urgent    
Version: 13.0 (Queens) CC: akaris, amuller, bcafarel, chrisw, coldford, cory.bannister, djuran, eelena, ekuris, fgadkano, jlibosva, jniu, jthomas, kfryklun, majopela, njohnston, pliu, ralonsoh, srevivo, tfreger, vcojot, weiyongjun
Target Milestone: z8 Keywords: Triaged, ZStream
Target Release: 13.0 (Queens)   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: openstack-neutron-12.0.6-9.el7ost Doc Type: Release Note
Doc Text:
Previously, the Neutron SR-IOV agent set two possible states for virtual functions (VFs), `enable` or `disable`, which forced the VF link state regardless of the physical function (PF) link state. With this update, the Neutron SR-IOV agent sets VFs to `auto` or `disable`. The `auto` state replicates the PF `up` or `down` automatically. As a result, if the PF is in the `down` state, the VF does not transmit or receive, even with other VFs in the same embedded switch (NIC). NOTE: This behavior is not standard and depends on the NIC vendor implementation. Check the driver manual for the actual behavior of a VF in the `auto` state when the PF is `down`.
Story Points: ---
Clone Of: 1476160
: 1734490 1735676 Environment:
Last Closed: 2019-09-03 16:53:02 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1476160    
Bug Blocks: 1476900, 1500557, 1734490, 1735676, 1905791    

Description Marc Methot 2018-12-06 19:11:01 UTC
+++ This bug was initially created as a clone of Bug #1476160 +++

Description

  In the SR-IOV scenario, when a VM is instantiated, a VF is associated with a port. The admin_state_up property of the port determines the link-state of the associated VF:
    admin_state_up : True   -->  VF link-state: Enable (always on)
    admin_state_up : False  -->  VF link-state: Disable (always off)
    -- No 3rd option --

  However, for high availability, the VF link-state needs to be set to Auto so that the VF link state follows that of the PF. If the PF link goes down, the VF link state then goes down as well, which triggers failover in the VM. A 3rd option for admin_state_up is therefore required, with which the VF link-state can be set to Auto.

Upstream RFE: https://bugs.launchpad.net/neutron/+bug/1722720
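
The mapping above, and the changed mapping that the Doc Text of this bug describes for the fix (`auto` or `disable`), can be sketched in a few lines of bash. This is only an illustration, not the SR-IOV agent's code: the function name `vf_state_for_admin_state` is hypothetical, and the device name and VF index are placeholders. The sketch prints the `ip link` command instead of running it.

```shell
#!/bin/bash
# Hypothetical helper (not Neutron code): map a port's admin_state_up value
# to the VF link-state keyword, following the post-fix behavior described
# in the Doc Text (auto or disable).
vf_state_for_admin_state() {
    case "$1" in
        True|true)   echo "auto" ;;      # VF follows the PF link state
        False|false) echo "disable" ;;   # VF link forced down
        *)           echo "unknown admin_state_up: $1" >&2; return 1 ;;
    esac
}

# Print (do not run) the command that would apply the state.
# "p1p1" and VF index 4 are placeholders.
echo "ip link set dev p1p1 vf 4 state $(vf_state_for_admin_state True)"
```

Before the fix, the first branch would have produced `enable`, which forces the VF link up regardless of the PF.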

Comment 9 Vincent S. Cojot 2019-07-29 19:42:48 UTC
Hi,
For the record, I've successfully tested the following workaround of using a libvirt hook on OSP13z6:

On each compute node, save the following script as "/etc/libvirt/hooks/qemu" and then run 'docker restart nova_libvirt':

$ cat libvirt_hook_qemu.sh 
#!/bin/bash
#
# Install:
# 1) mkdir /etc/libvirt/hooks
# 2) copy this script as /etc/libvirt/hooks/qemu
# 3) chmod 755 /etc/libvirt/hooks/qemu
# 4) docker restart nova_libvirt

# Logging
VLOG=/var/log/libvirt/libvirt_hooks_qemu.log ; exec &> >(tee -a "${VLOG}")

# Global vars
sriov_nics="p2p1 p2p2 p4p1 p4p2"

#
echo "(II) Time is: $(date)"

# Only kick in after VM has started (argv[2] == "started")
if [[ $2 = "started" ]]; then
	for mynic in ${sriov_nics}
	do
		if [ -d /sys/class/net/${mynic} ]; then
			enabled_vfs=$(/sbin/ip link show dev ${mynic} |awk '{ if (( $1 == "vf" ) && ($0 ~ /enable/)) print $2 }'|xargs)
			for myvf in ${enabled_vfs}
			do
				echo "(II) Will run: /sbin/ip link set dev ${mynic} vf ${myvf} state auto"
				/sbin/ip link set dev ${mynic} vf ${myvf} state auto
			done
		fi
	done
else
	echo "(II) Nothing to do (hook operation is '$2', not 'started')"
fi
exit 0

Comment 14 Vincent S. Cojot 2019-07-31 16:05:40 UTC
Hi,
We're currently using this new code:

#!/bin/bash
#
# Description: This script gets called by libvirt when a VM is 'started'.
# It will explore SRIOV VFs for the whole compute and reset the link-state
# of those VFs from 'enable' (force state) to 'auto' (VF state follows PF state).
#

# Explore NICs to find SRIOV active NICs (numvfs > 0)
sriov_nics=""
for that_nic in /sys/class/net/*
do
	if [[ -f ${that_nic}/device/sriov_numvfs ]]; then
		if [[ $(cat ${that_nic}/device/sriov_numvfs) -gt 0 ]]; then
			sriov_nics+="$(basename ${that_nic}) "
		fi
	fi
done

# Only kick in after VM has started (argv[2] == "started")
if [[ $2 = "started" ]]; then
	logger -p syslog.info "$0 found the following PFs: ${sriov_nics}"
	for mynic in ${sriov_nics}
	do
		if [ -d /sys/class/net/${mynic} ]; then
			enabled_vfs=$(/sbin/ip link show dev ${mynic} |awk '{ if (( $1 == "vf" ) && ($0 ~ /enable/)) print $2 }'|xargs)
			if [[ ${enabled_vfs} = "" ]]; then
				logger -p syslog.info "$0 Nothing to do!: No VFs in link-state 'enable'"
			else
				for myvf in ${enabled_vfs}
				do
					logger -p syslog.info "$0 running: /sbin/ip link set dev ${mynic} vf ${myvf} state auto"
					/sbin/ip link set dev ${mynic} vf ${myvf} state auto
				done
			fi
		fi
	done
fi
exit 0
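
The VF selection step of the hook can be tried without SR-IOV hardware by feeding the awk filter canned text. The following is a minimal sketch under stated assumptions: `list_enabled_vfs` is a hypothetical name, the sample input is made up in the shape of `ip link show` output, and the pattern is narrowed to `link-state enable` (the hook above matches the looser `/enable/`).

```shell
#!/bin/bash
# Hypothetical helper (not part of the hook above): read "ip link show dev PF"
# output on stdin and print the index of every VF in link-state 'enable'.
list_enabled_vfs() {
    awk '$1 == "vf" && /link-state enable/ { print $2 }'
}

# Made-up sample input; real output varies with the iproute2 version.
sample_output='63: p1p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
    vf 0 MAC f6:3a:15:da:fb:6d, spoof checking on, link-state enable, trust off
    vf 1 MAC f6:84:fc:e9:dd:fd, spoof checking on, link-state auto, trust off
    vf 2 MAC 9a:8f:6c:31:39:8b, spoof checking on, link-state enable, trust off
    vf 3 MAC de:8b:cb:cc:d4:82, spoof checking on, link-state disable, trust off'

# Join the indexes on one line, as the hook does with xargs.
printf '%s\n' "${sample_output}" | list_enabled_vfs | xargs
```

With this sample the last line prints `0 2`, the two VFs that the hook would switch to `auto`.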

Comment 32 Eran Kuris 2019-08-20 09:36:40 UTC
fix verified: 

63: p1p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether a0:36:9f:7f:28:b8 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC f6:3a:15:da:fb:6d, spoof checking on, link-state auto, trust off, query_rss off
    vf 1 MAC f6:84:fc:e9:dd:fd, spoof checking on, link-state auto, trust off, query_rss off
    vf 2 MAC 9a:8f:6c:31:39:8b, spoof checking on, link-state auto, trust off, query_rss off
    vf 3 MAC de:8b:cb:cc:d4:82, spoof checking on, link-state auto, trust off, query_rss off
    vf 4 MAC fa:16:3e:d1:db:c8, vlan 227, spoof checking on, link-state auto, trust off, query_rss off
[root@computesriov-0 ~]# rpm -qa | grep neutron 
python2-neutronclient-6.7.0-1.el7ost.noarch
puppet-neutron-12.4.1-7.el7ost.noarch
python-neutron-lbaas-12.0.1-0.20181019202917.b9b6b6a.el7ost.noarch
openstack-neutron-linuxbridge-12.0.6-9.el7ost.noarch
python2-neutron-lib-1.13.0-1.el7ost.noarch
python-neutron-12.0.6-9.el7ost.noarch
openstack-neutron-12.0.6-9.el7ost.noarch
openstack-neutron-l2gw-agent-12.0.2-0.20180412115803.a9f8009.el7ost.noarch
openstack-neutron-sriov-nic-agent-12.0.6-9.el7ost.noarch
openstack-neutron-ml2-12.0.6-9.el7ost.noarch
openstack-neutron-metering-agent-12.0.6-9.el7ost.noarch
openstack-neutron-openvswitch-12.0.6-9.el7ost.noarch
openstack-neutron-lbaas-ui-4.0.1-0.20181115043347.7f2010d.el7ost.noarch
openstack-neutron-common-12.0.6-9.el7ost.noarch
openstack-neutron-lbaas-12.0.1-0.20181019202917.b9b6b6a.el7ost.noarch
OpenStack/13.0-RHEL-7/2019-08-13.1
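
A check like the one above can be scripted by counting the link-state reported on each VF line. This is a rough sketch, not part of the verification that was actually run: `count_vf_link_states` is a hypothetical helper, and the input is the `ip link` output pasted in this comment.

```shell
#!/bin/bash
# Hypothetical helper: read "ip link show" output on stdin and print one
# "<link-state> <count>" line per state found on VF lines.
count_vf_link_states() {
    awk '$1 == "vf" {
             for (i = 1; i < NF; i++)
                 if ($i == "link-state") {
                     state = $(i + 1)
                     sub(/,$/, "", state)   # drop the trailing comma
                     seen[state]++
                 }
         }
         END { for (state in seen) print state, seen[state] }' | sort
}

# Input copied from the verification output in this comment.
count_vf_link_states <<'EOF'
63: p1p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether a0:36:9f:7f:28:b8 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC f6:3a:15:da:fb:6d, spoof checking on, link-state auto, trust off, query_rss off
    vf 1 MAC f6:84:fc:e9:dd:fd, spoof checking on, link-state auto, trust off, query_rss off
    vf 2 MAC 9a:8f:6c:31:39:8b, spoof checking on, link-state auto, trust off, query_rss off
    vf 3 MAC de:8b:cb:cc:d4:82, spoof checking on, link-state auto, trust off, query_rss off
    vf 4 MAC fa:16:3e:d1:db:c8, vlan 227, spoof checking on, link-state auto, trust off, query_rss off
EOF
```

For this input the sketch prints `auto 5`. Any `enable` line in the result would point to a VF that is still forced up.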

Comment 50 errata-xmlrpc 2019-09-03 16:53:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2629