Bug 1338795

Summary: fcoeadm doesn't remove VLAN when stopped/restarted
Product: Red Hat Enterprise Linux 7 Reporter: Pavel Zhukov <pzhukov>
Component: fcoe-utils Assignee: Chris Leech <cleech>
Status: CLOSED WONTFIX QA Contact: guazhang <guazhang>
Severity: high Docs Contact:
Priority: unspecified    
Version: 7.4 CC: cleech, guazhang, jcastillo, pzhukov, revers
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-12-15 07:41:35 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1319873    
Bug Blocks: 1334745, 1334748    

Description Pavel Zhukov 2016-05-23 12:39:05 UTC
Description of problem:
Once a NIC is unconfigured (its config file /etc/fcoe/cfg-<NIC> is removed) and the fcoe service is restarted, the corresponding NIC still has FCoE capabilities enabled.

Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Configure one interface to use FCoE
2. Restart the fcoe services
3. Remove the configuration file
4. Restart the services again

Actual results:
FCoE is still configured. LUNs are still visible.

Expected results:
FCoE should be unconfigured.

Additional info:
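The steps above can be sketched as a small script. This is only an illustration: the NIC name em1 is borrowed from the later comments, `systemctl restart fcoe` is assumed to be what "restart the services" means on RHEL 7, and the `DRY_RUN` switch and the `run`/`reproduce` helpers are hypothetical names added so the sequence can be printed without touching the host. It assumes steps 1 and 2 were already done, i.e. /etc/fcoe/cfg-<NIC> exists and FCoE is up.

```shell
#!/bin/bash
# Sketch of reproduction steps 3 and 4, followed by the two checks.
# DRY_RUN=1 (the default here) only prints the commands.
DRY_RUN=${DRY_RUN:-1}

run() {
	if [ "$DRY_RUN" = "1" ]; then
		echo "+ $*"
	else
		"$@"
	fi
}

reproduce() {
	local nic=$1
	run fcoeadm -i                      # FCoE instance present before the change
	run ip -o link show type vlan       # VLAN device present before the change
	run rm -f "/etc/fcoe/cfg-$nic"      # step 3: remove the configuration file
	run systemctl restart fcoe          # step 4: restart the service
	run fcoeadm -i                      # expected: no instance for $nic
	run ip -o link show type vlan       # expected: no <nic>.<vlan>-fcoe device
}

reproduce em1
```

With `DRY_RUN=0` the same function executes the commands as root instead of printing them.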

Comment 3 Pavel Zhukov 2016-05-24 07:08:53 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=639466

Comment 4 Jose Castillo 2016-05-24 10:27:17 UTC
(In reply to Pavel Zhukov from comment #3)
> https://bugzilla.redhat.com/show_bug.cgi?id=639466

The way it was solved in RHEL 6, as I understand it, was by the following patch:

	 {
	-       local force=$1
	-
	-       pid=$($FCOEADM -p 2> /dev/null)
	-       if [ "$force" == "force" ]
	-       then
	-               action "Destroying any active fcoe interface/s"
	-               [ "$pid" ] && kill -HUP $pid
	-               modprobe -r $SUPPORTED_DRIVERS libfc
	+       if have_fcoe_root; then
	+               echo $"Possible FCoE root detected, not stopping FCoE."
	+               exit 1
			else
	-               [ "$pid" ] && kill -TERM $pid
	+               local force=$1
	+
	+               pid=$($FCOEADM -p 2> /dev/null)
	+               if [ "$force" == "force" ]
	+               then
	+                       action "Destroying any active fcoe interface/s"
	+                       [ "$pid" ] && kill -HUP $pid
	+                       sleep 3
	+                       # Destroy vports first (rhbz#903099)
	+                       for vport in $(ls /sys/class/fc_vports); do
	+                               echo 1 > /sys/class/fc_vports/${vport}/vport_delete
	+                       done
	+                       for iface in $($FCOEADM -i | grep -F 'Symbolic Name:' | \
	+                               sed 's/^.*over \([^\s]*\)$/\1/'); do
	+                                       echo $iface >/sys/module/libfcoe/parameters/destroy
	+                               done
	+                       sleep 3
	+                       modprobe -r $SUPPORTED_DRIVERS libfc
	+               else
	+                       [ "$pid" ] && kill -TERM $pid
	+               fi
	+
	+               action $"Stopping FCoE initiator service: "
	+
	+               rm -f ${LOCKFILE}
			fi
	-
	-       action $"Stopping FCoE initiator service: "
	-
	-       rm -f ${LOCKFILE}
	 }


That code is executed when running "service fcoe stop force". In this case it uses the sysfs interface instead of 'fcoeadm -d <interface>'; I'm not sure why.

Now, for RHEL 7, when I run 'fcoeadm -d em1' and 'fcoeadm -d em2' as Chris mentioned, the instances are removed, but the vlan interfaces are still present in the output of 'ip l':

	[root@dell-per720-3 admin]# fcoeadm -i
		Description:      NetXtreme II BCM57800 1/10 Gigabit Ethernet
		Revision:         10
		Manufacturer:     Broadcom Corporation
		Serial Number:    C81F66F1C748
		Driver:           bnx2x 1.710.51-0
		Number of Ports:  1
	
			Symbolic Name:     bnx2fc (QLogic BCM57800) v2.4.2 over em1.200-fcoe
			OS Device Name:    host8
			Node Name:         0x2000C81F66F1C749
			Port Name:         0x2001C81F66F1C749
			FabricName:        0x100050EB1A2C8326
			Speed:             10 Gbit
			Supported Speed:   1 Gbit, 10 Gbit
			MaxFrameSize:      2048
			FC-ID (Port ID):   0x0102C0
			State:             Online
	
			Symbolic Name:     bnx2fc (QLogic BCM57800) v2.4.2 over em2.200-fcoe
			OS Device Name:    host9
			Node Name:         0x2000C81F66F1C74B
			Port Name:         0x2001C81F66F1C74B
			FabricName:        0x100050EB1A2C8326
			Speed:             10 Gbit
			Supported Speed:   1 Gbit, 10 Gbit
			MaxFrameSize:      2048
			FC-ID (Port ID):   0x010240
			State:             Online
	
	[root@dell-per720-3 admin]# fcoeadm -d em1
	
	[root@dell-per720-3 admin]# fcoeadm -d em2
	
	[root@dell-per720-3 admin]# fcoeadm -i
	No FCoE interfaces created.
	
	[root@dell-per720-3 admin]# ip l
	1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT 
		link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
	2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
		link/ether c8:1f:66:f1:c7:48 brd ff:ff:ff:ff:ff:ff
	3: em2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
		link/ether c8:1f:66:f1:c7:4a brd ff:ff:ff:ff:ff:ff
	4: em3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master rhevm state UP mode DEFAULT qlen 1000
		link/ether c8:1f:66:f1:c7:4c brd ff:ff:ff:ff:ff:ff
	5: em4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
		link/ether c8:1f:66:f1:c7:4e brd ff:ff:ff:ff:ff:ff
	6: em1.200-fcoe@em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT 
		link/ether c8:1f:66:f1:c7:48 brd ff:ff:ff:ff:ff:ff
	7: em2.200-fcoe@em2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT 
		link/ether c8:1f:66:f1:c7:4a brd ff:ff:ff:ff:ff:ff
	8: rhevm: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT 
		link/ether c8:1f:66:f1:c7:4c brd ff:ff:ff:ff:ff:ff
	9: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT 
		link/ether 62:9c:71:28:3c:0d brd ff:ff:ff:ff:ff:ff

As a side note, if I run the steps in the patch for RHEL 6 shown above, 'fcoeadm -i' shows no interfaces anymore and the modules are not loaded, but em1.200 and em2.200 still appear in 'ip l':

	[root@dell-per720-3 admin]# fcoeadm -i
	No FCoE interfaces created.
	
	[root@dell-per720-3 admin]# lsmod | grep bnx
	bnx2x                 730273  0 
	ptp                    19231  1 bnx2x
	mdio                   13807  1 bnx2x
	libcrc32c              12644  1 bnx2x

	[root@dell-per720-3 admin]# ip l |grep em
	2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
	3: em2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
	4: em3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master rhevm state UP mode DEFAULT qlen 1000
	5: em4: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode DEFAULT qlen 1000
	8: em1.200@em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT 
	9: em2.200@em2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT

So I'm not certain how we can ensure the interfaces are removed other than with 'ifconfig down', and that seems to me like a very horrible hack.
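
As an illustration of what a targeted cleanup could look like, the leftover devices can be picked out of 'ip -o link' output by parent NIC. This is a sketch under assumptions, not a tested fix: `leftover_vlans` is a hypothetical helper, the sample lines are copied from the output above, and the match is by name prefix only, so it also catches plain `em1.200` style names.

```shell
#!/bin/bash
# Read 'ip -o link' style records on stdin and print the names of VLAN
# devices stacked on the given parent NIC (e.g. em1.200-fcoe on em1).
leftover_vlans() {
	local parent=$1
	awk -v p="$parent" -F': ' '{
		split($2, n, "@")    # "em1.200-fcoe@em1" -> n[1]=device, n[2]=parent
		if (n[2] == p && index(n[1], p ".") == 1) print n[1]
	}'
}

# Sample records taken from the 'ip l' output in this comment.
sample='2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
6: em1.200-fcoe@em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT
7: em2.200-fcoe@em2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT'

printf '%s\n' "$sample" | leftover_vlans em1
```

On a live system the input would come from `ip -o link show type vlan`, and each printed name could then be passed to `ip link delete`.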


A couple of things, slightly off-topic (sorry!):

* I found that 'fcoeadm -d' succeeds when the interfaces were set up via /etc/fcoe/cfg-<iface> config files. If the files are not present and I run 'fipvlan -acds' to set the interfaces up, running 'fcoeadm -d' gives me errors:

	[root@dell-per720-3 admin]# fcoeadm -d em1
	fcoeadm: Command failed
	Try 'fcoeadm --help' for more information.
	
	[root@dell-per720-3 admin]# fcoeadm -d em2
	fcoeadm: Command failed
	Try 'fcoeadm --help' for more information.
	
And we get the following in syslog:
	
	[root@dell-per720-3 admin]# journalctl |tail
	May 24 08:50:51 dell-per720-3.gsslab.rdu2.redhat.com fcoemon[15611]: error 0 Success
	May 24 08:50:51 dell-per720-3.gsslab.rdu2.redhat.com fcoemon[15611]: em1 is not in port list.
	May 24 08:51:02 dell-per720-3.gsslab.rdu2.redhat.com fcoemon[15611]: error 0 Success
	May 24 08:51:02 dell-per720-3.gsslab.rdu2.redhat.com fcoemon[15611]: em2 is not in port list.

It is as if the 'fcoe_config.port' variable wasn't populated when the interfaces were set up with 'fipvlan'. Chris, is this expected, or a bug?

* When I stop the 'fcoe' service via 'systemctl stop fcoe', the interfaces are still shown in the output of 'fcoeadm -i'. Chris, do you think it may be worth adding a line in the systemd unit for ExecStop that calls 'fcoeadm -d', so when stopping the service it gets rid of the interfaces?

Comment 5 Rob Evers 2016-08-23 13:49:38 UTC
Chris, unless you have a fix ready, please move this to rhel7.4.  Thanks, Rob

Comment 6 Chris Leech 2016-08-24 16:34:40 UTC
deferring to 7.4

Comment 7 Chris Leech 2017-05-30 18:25:04 UTC
Clearing regression keyword, as this never changed in RHEL7 and is a difference from an optional forced functionality in RHEL6.

Comment 8 Chris Leech 2017-05-30 18:27:22 UTC
Is there any functional issue with FCoE shutdown as is, or is this purely "cosmetic" in that fabric logins and vlan devices are still visible?

It doesn't seem like the RHEL6 "force" shutdown would come into use by default, and I'd hate to replicate that as the new default behaviour and cause some other regression.

Comment 9 Pavel Zhukov 2017-05-31 04:45:39 UTC
(In reply to Chris Leech from comment #8)
> Is there any functional issue with FCoE shutdown as is, or is this purely
> "cosmetic" in that fabric logins and vlan devices are still visible?
Yes, there's a functional issue with the fcoe hook for vdsm: https://github.com/oVirt/vdsm/blob/master/vdsm_hooks/fcoe/fcoe_before_network_setup.py
Because of this bug, RHEL7 users have to reboot the hypervisor or delete the VLANs etc. manually, and some of them are not happy about this, especially since it was working in RHEL6.

Comment 10 Chris Leech 2017-06-23 20:45:41 UTC
I spent some more time looking at this.  The el6 "stop force" code is mostly a workaround that removes NPIV ports manually, which from what I can tell isn't needed for el7.  That just leaves the difference between the default SIGTERM and SIGHUP, the latter of which causes fcoemon to destroy active FCoE instances.

One option to switch to SIGHUP behavior would be to write a "/etc/systemd/system/fcoe.service.d/vdsm-override.conf" drop-in containing:

 [Service]
 KillSignal=SIGHUP

This should match el6 with "stop force".
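
Creating that drop-in could look roughly like the following. It is a minimal sketch: `write_fcoe_override` and its optional root-prefix argument are hypothetical, added only so the function can be tried against a scratch directory instead of the real /etc.

```shell
#!/bin/bash
# Write the KillSignal override for fcoe.service.
# $1 is an optional root prefix (hypothetical, for trying this out safely);
# leave it empty to write under the real /etc.
write_fcoe_override() {
	local root=${1:-}
	local dir="$root/etc/systemd/system/fcoe.service.d"
	mkdir -p "$dir" || return 1
	printf '[Service]\nKillSignal=SIGHUP\n' > "$dir/vdsm-override.conf"
}

# Demonstration against a scratch directory.
scratch=$(mktemp -d)
write_fcoe_override "$scratch"
cat "$scratch/etc/systemd/system/fcoe.service.d/vdsm-override.conf"
```

On a real host the function would be called without an argument as root, followed by `systemctl daemon-reload` so that systemd picks up the new drop-in.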

To remove VLAN interfaces created when AUTO_VLAN is set, we'd need to go further.
There's no equivalent to passing parameters like force to a stop command, so it would require setting a variable in the environment file used by fcoe.service, /etc/sysconfig/fcoe.

I could add a shutdown script to use as the ExecStop, sending SIGTERM to fcoemon like systemd does today by default.  But it could check an environment variable and handle the SIGHUP termination and VLAN cleanup if it was set.

Something like the following might have enough checks to be safe.  It would still be up to vdsm or some other component to set FORCE_CLEANUP=true in /etc/sysconfig/fcoe to trigger this non-default full cleanup.

---

#!/bin/bash

IP=/usr/sbin/ip
CFGDIR=/etc/fcoe

cleanup_vlans() {
	local link phys vlan 

	$IP -o link show type vlan | while IFS=':@ ' read -ra link; do
		vlan=${link[1]}
		phys=${link[2]}

		# does this look like the auto vlan format?
		[[ $vlan =~ ^"$phys"\.[[:digit:]]+-fcoe$ ]] || continue

		# auto vlans don't have config files
		[ ! -e "$CFGDIR/cfg-$vlan" ] || continue

		# but the physical interface should
		[ -e "$CFGDIR/cfg-$phys" ] || continue

		# check that the physical port is configured for auto vlans
		( . "$CFGDIR/cfg-$phys" && [ "$AUTO_VLAN" == "yes" ] ) || continue

		echo "removing FCoE VLAN device $vlan"
		$IP link delete "$vlan"
	done
}

pid=$(pidof fcoemon)

if [ -v FORCE_CLEANUP ]; then
	[ "$pid" ] && kill -HUP "$pid"
	cleanup_vlans
else
	[ "$pid" ] && kill -TERM "$pid"
fi
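
---

The field splitting and the name check used in cleanup_vlans can be tried in isolation, without any FCoE hardware. The sketch below feeds one record copied from the 'ip l' output earlier in this bug through the same `IFS=':@ ' read -ra` split. `is_auto_fcoe_vlan` is a hypothetical helper; it anchors the pattern and quotes the physical name so that it is matched literally, which is an assumption about the intended match rather than part of the proposal.

```shell
#!/bin/bash
# Succeed if $1 looks like an auto-created FCoE VLAN on top of $2,
# i.e. "<phys>.<vlan id>-fcoe". Quoting $phys makes bash treat it literally.
is_auto_fcoe_vlan() {
	local vlan=$1 phys=$2
	[[ $vlan =~ ^"$phys"\.[[:digit:]]+-fcoe$ ]]
}

# One 'ip -o link' style record, split on ':', '@' and spaces:
# link[0] is the index, link[1] the VLAN device, link[2] the parent NIC.
line='6: em1.200-fcoe@em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500'
IFS=':@ ' read -ra link <<< "$line"

if is_auto_fcoe_vlan "${link[1]}" "${link[2]}"; then
	echo "${link[1]} on ${link[2]}: auto FCoE VLAN"
fi
```

A name such as `em1.200` (no `-fcoe` suffix) or `em10.200-fcoe` paired with `em1` is rejected by this check, so only devices following the auto VLAN naming would be considered for deletion.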

Comment 12 RHEL Program Management 2020-12-15 07:41:35 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.