Bug 1338795 - fcoeadm doesn't remove VLAN when stopped/restarted
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: fcoe-utils
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Chris Leech
QA Contact: guazhang@redhat.com
URL:
Whiteboard:
Depends On: 1319873
Blocks: 1334745 1334748
Reported: 2016-05-23 12:39 UTC by Pavel Zhukov
Modified: 2021-09-03 13:46 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-12-15 07:41:35 UTC
Target Upstream Version:
Embargoed:


Description Pavel Zhukov 2016-05-23 12:39:05 UTC
Description of problem:
Once a NIC is unconfigured (its config file /etc/fcoe/cfg-<NIC> removed) and the fcoe service is restarted, the corresponding NIC still has FCoE capabilities enabled.

Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Configure one interface to use FCoE
2. Restart the fcoe services
3. Remove the configuration file
4. Restart the services again

Actual results:
FCoE is still configured. LUNs are still visible.

Expected results:
fcoe should be unconfigured 

Additional info:

Comment 3 Pavel Zhukov 2016-05-24 07:08:53 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=639466

Comment 4 Jose Castillo 2016-05-24 10:27:17 UTC
(In reply to Pavel Zhukov from comment #3)
> https://bugzilla.redhat.com/show_bug.cgi?id=639466

The way it was solved in RHEL 6, as I understand it, was by the following patch:

	 {
	-       local force=$1
	-
	-       pid=$($FCOEADM -p 2> /dev/null)
	-       if [ "$force" == "force" ]
	-       then
	-               action "Destroying any active fcoe interface/s"
	-               [ "$pid" ] && kill -HUP $pid
	-               modprobe -r $SUPPORTED_DRIVERS libfc
	+       if have_fcoe_root; then
	+               echo $"Possible FCoE root detected, not stopping FCoE."
	+               exit 1
			else
	-               [ "$pid" ] && kill -TERM $pid
	+               local force=$1
	+
	+               pid=$($FCOEADM -p 2> /dev/null)
	+               if [ "$force" == "force" ]
	+               then
	+                       action "Destroying any active fcoe interface/s"
	+                       [ "$pid" ] && kill -HUP $pid
	+                       sleep 3
	+                       # Destroy vports first (rhbz#903099)
	+                       for vport in $(ls /sys/class/fc_vports); do
	+                               echo 1 > /sys/class/fc_vports/${vport}/vport_delete
	+                       done
	+                       for iface in $($FCOEADM -i | grep -F 'Symbolic Name:' | \
	+                               sed 's/^.*over \([^\s]*\)$/\1/'); do
	+                                       echo $iface >/sys/module/libfcoe/parameters/destroy
	+                               done
	+                       sleep 3
	+                       modprobe -r $SUPPORTED_DRIVERS libfc
	+               else
	+                       [ "$pid" ] && kill -TERM $pid
	+               fi
	+
	+               action $"Stopping FCoE initiator service: "
	+
	+               rm -f ${LOCKFILE}
			fi
	-
	-       action $"Stopping FCoE initiator service: "
	-
	-       rm -f ${LOCKFILE}
	 }


And that code is executed when running "service fcoe stop force". In this case we are using the sysfs interface instead of 'fcoeadm -d <interface>', not sure why.
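
The interface-name extraction in that loop can be tried on its own. The following is only a sketch: `fcoe_ifaces` is a hypothetical helper name, it assumes the "Symbolic Name: ... over <iface>" line format that `fcoeadm -i` prints on this system, and it only prints the names rather than writing anything to sysfs.

```shell
#!/bin/bash
# Hypothetical helper: read `fcoeadm -i` output on stdin and print the
# interface named after " over " on each "Symbolic Name:" line.
fcoe_ifaces() {
	local line
	while IFS= read -r line; do
		case $line in
		*"Symbolic Name:"*" over "*)
			# strip everything up to and including the last " over "
			echo "${line##* over }"
			;;
		esac
	done
}
```

It would be used as `fcoeadm -i | fcoe_ifaces`. Fed the `fcoeadm -i` output quoted further down in this comment, it prints em1.200-fcoe and em2.200-fcoe.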

Now, for RHEL 7, when I run 'fcoeadm -d em1' and 'fcoeadm -d em2' as Chris mentioned, the instances are removed, but the vlan interfaces are still present in the output of 'ip l':

	[root@dell-per720-3 admin]# fcoeadm -i
		Description:      NetXtreme II BCM57800 1/10 Gigabit Ethernet
		Revision:         10
		Manufacturer:     Broadcom Corporation
		Serial Number:    C81F66F1C748
		Driver:           bnx2x 1.710.51-0
		Number of Ports:  1
	
			Symbolic Name:     bnx2fc (QLogic BCM57800) v2.4.2 over em1.200-fcoe
			OS Device Name:    host8
			Node Name:         0x2000C81F66F1C749
			Port Name:         0x2001C81F66F1C749
			FabricName:        0x100050EB1A2C8326
			Speed:             10 Gbit
			Supported Speed:   1 Gbit, 10 Gbit
			MaxFrameSize:      2048
			FC-ID (Port ID):   0x0102C0
			State:             Online
	
			Symbolic Name:     bnx2fc (QLogic BCM57800) v2.4.2 over em2.200-fcoe
			OS Device Name:    host9
			Node Name:         0x2000C81F66F1C74B
			Port Name:         0x2001C81F66F1C74B
			FabricName:        0x100050EB1A2C8326
			Speed:             10 Gbit
			Supported Speed:   1 Gbit, 10 Gbit
			MaxFrameSize:      2048
			FC-ID (Port ID):   0x010240
			State:             Online
	
	[root@dell-per720-3 admin]# fcoeadm -d em1
	
	[root@dell-per720-3 admin]# fcoeadm -d em2
	
	[root@dell-per720-3 admin]# fcoeadm -i
	No FCoE interfaces created.
	
	[root@dell-per720-3 admin]# ip l
	1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT 
		link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
	2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
		link/ether c8:1f:66:f1:c7:48 brd ff:ff:ff:ff:ff:ff
	3: em2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
		link/ether c8:1f:66:f1:c7:4a brd ff:ff:ff:ff:ff:ff
	4: em3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master rhevm state UP mode DEFAULT qlen 1000
		link/ether c8:1f:66:f1:c7:4c brd ff:ff:ff:ff:ff:ff
	5: em4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
		link/ether c8:1f:66:f1:c7:4e brd ff:ff:ff:ff:ff:ff
	6: em1.200-fcoe@em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT 
		link/ether c8:1f:66:f1:c7:48 brd ff:ff:ff:ff:ff:ff
	7: em2.200-fcoe@em2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT 
		link/ether c8:1f:66:f1:c7:4a brd ff:ff:ff:ff:ff:ff
	8: rhevm: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT 
		link/ether c8:1f:66:f1:c7:4c brd ff:ff:ff:ff:ff:ff
	9: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT 
		link/ether 62:9c:71:28:3c:0d brd ff:ff:ff:ff:ff:ff

As a side note, if I run the steps in the patch for RHEL 6 shown above, 'fcoeadm -i' shows no interfaces anymore and the modules are not loaded, but em1.200 and em2.200 still appear in 'ip l':

	[root@dell-per720-3 admin]# fcoeadm -i
	No FCoE interfaces created.
	
	[root@dell-per720-3 admin]# lsmod | grep bnx
	bnx2x                 730273  0 
	ptp                    19231  1 bnx2x
	mdio                   13807  1 bnx2x
	libcrc32c              12644  1 bnx2x

	[root@dell-per720-3 admin]# ip l |grep em
	2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
	3: em2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
	4: em3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master rhevm state UP mode DEFAULT qlen 1000
	5: em4: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode DEFAULT qlen 1000
	8: em1.200@em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT 
	9: em2.200@em2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT

So I'm not certain how we can ensure the interfaces are removed without 'ifconfig down', and that seems to me like a very horrible hack.


A couple of things, slightly off-topic (sorry!):

* I found that 'fcoeadm -d' succeeds when the interfaces were set up via /etc/fcoe/cfg-<iface> config files. If the files are not present and I run 'fipvlan -acds' to set the interfaces up, running 'fcoeadm -d' gives me errors:

	[root@dell-per720-3 admin]# fcoeadm -d em1
	fcoeadm: Command failed
	Try 'fcoeadm --help' for more information.
	
	[root@dell-per720-3 admin]# fcoeadm -d em2
	fcoeadm: Command failed
	Try 'fcoeadm --help' for more information.
	
	And we get the following in syslog:
	
	[root@dell-per720-3 admin]# journalctl |tail
	May 24 08:50:51 dell-per720-3.gsslab.rdu2.redhat.com fcoemon[15611]: error 0 Success
	May 24 08:50:51 dell-per720-3.gsslab.rdu2.redhat.com fcoemon[15611]: em1 is not in port list.
	May 24 08:51:02 dell-per720-3.gsslab.rdu2.redhat.com fcoemon[15611]: error 0 Success
	May 24 08:51:02 dell-per720-3.gsslab.rdu2.redhat.com fcoemon[15611]: em2 is not in port list.

It is as if the 'fcoe_config.port' variable wasn't populated when the interfaces were set up with 'fipvlan'. Chris, is this expected, or a bug?

* When I stop the 'fcoe' service via 'systemctl stop fcoe', the interfaces are still shown in the output of 'fcoeadm -i'. Chris, do you think it may be worth adding a line in the systemd unit for ExecStop that calls 'fcoeadm -d', so when stopping the service it gets rid of the interfaces?

Comment 5 Rob Evers 2016-08-23 13:49:38 UTC
Chris, unless you have a fix ready, please move this to rhel7.4.  Thanks, Rob

Comment 6 Chris Leech 2016-08-24 16:34:40 UTC
deferring to 7.4

Comment 7 Chris Leech 2017-05-30 18:25:04 UTC
Clearing regression keyword, as this never changed in RHEL7 and is a difference from an optional forced functionality in RHEL6.

Comment 8 Chris Leech 2017-05-30 18:27:22 UTC
Is there any functional issue with FCoE shutdown as is, or is this purely "cosmetic" in that fabric logins and vlan devices are still visible?

It doesn't seem like the RHEL6 "force" shutdown would come into use by default, and I'd hate to replicate that as the new default behaviour and cause some other regression.

Comment 9 Pavel Zhukov 2017-05-31 04:45:39 UTC
(In reply to Chris Leech from comment #8)
> Is there any functional issue with FCoE shutdown as is, or is this purely
> "cosmetic" in that fabric logins and vlan devices are still visible?
Yes, there's a functional issue with the fcoe hook for vdsm: https://github.com/oVirt/vdsm/blob/master/vdsm_hooks/fcoe/fcoe_before_network_setup.py
Because of this bug, RHEL7 users have to reboot the hypervisor or delete the VLANs etc. manually. Some of them are not happy with this, especially given that it was working in RHEL6.

Comment 10 Chris Leech 2017-06-23 20:45:41 UTC
I spent some more time looking at this.  The el6 "stop force" code is mostly a workaround for manually removing NPIV ports, which from what I can tell isn't needed for el7.  That just leaves the difference between the default SIGTERM and SIGHUP, the latter of which causes fcoemon to destroy active FCoE instances.

One option to switch to SIGHUP behavior would be to write a "/etc/systemd/system/fcoe.service.d/vdsm-override.conf" with 

 [Service]
 KillSignal=SIGHUP

Which should match el6 with "stop force"
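
As a minimal sketch of that option, the drop-in could be written with a small helper. The function name `write_killsignal_dropin` and its optional directory argument are made up for illustration (the argument only exists so it can be tried against a scratch directory); the file name and contents are the ones given above.

```shell
#!/bin/bash
# Hypothetical helper: write the KillSignal override into a drop-in directory.
# With no argument it targets the fcoe.service drop-in directory named above.
write_killsignal_dropin() {
	local dir=${1:-/etc/systemd/system/fcoe.service.d}
	mkdir -p "$dir" || return
	printf '[Service]\nKillSignal=SIGHUP\n' > "$dir/vdsm-override.conf"
}
```

Run as root, `write_killsignal_dropin && systemctl daemon-reload` would put the override in place; systemd only picks up a new drop-in after the daemon-reload.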

To remove VLAN interfaces created when AUTO_VLAN is set, we'd need to go further.
There's no equivalent to passing parameters like force to a stop command, so it would require setting a variable in the environment file used by fcoe.service, /etc/sysconfig/fcoe.

I could add a shutdown script to use as the ExecStop, sending SIGTERM to fcoemon like systemd does today by default.  But it could check an environment variable and handle the SIGHUP termination and VLAN cleanup if it was set.

Something like the following might have enough checks to be safe.  It would still be up to vdsm or some other component to set FORCE_CLEANUP=true in /etc/sysconfig/fcoe to trigger this non-default full cleanup.

---

#!/bin/bash

IP=/usr/sbin/ip
CFGDIR=/etc/fcoe

cleanup_vlans() {
	local link phys vlan 

	$IP -o link show type vlan | while IFS=':@ ' read -ra link; do
		vlan=${link[1]}
		phys=${link[2]}

		# does this look like the auto vlan format (<phys>.<vlan id>-fcoe)?
		# anchor the match and quote $phys so it is compared literally
		[[ $vlan =~ ^"$phys"\.[[:digit:]]+-fcoe$ ]] || continue

		# auto vlans don't have config files
		[ ! -e "$CFGDIR/cfg-$vlan" ] || continue

		# but the physical interface should
		[ -e "$CFGDIR/cfg-$phys" ] || continue

		# check that the physical port is configured for auto vlans
		( . "$CFGDIR/cfg-$phys" && [ "$AUTO_VLAN" == "yes" ] ) || continue

		echo "removing FCoE VLAN device $vlan"
		$IP link delete "$vlan"
	done
}

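# fcoemon may already be gone; pid is then empty and no signal is sent
pid=$(pidof fcoemon)

# -v only tests that FORCE_CLEANUP is set, so any value enables the full cleanup
if [ -v FORCE_CLEANUP ]; then
	# SIGHUP makes fcoemon destroy the active FCoE instances
	[ "$pid" ] && kill -HUP "$pid"
	cleanup_vlans
else
	# plain SIGTERM, as systemd sends by default
	[ "$pid" ] && kill -TERM "$pid"
fi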
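
The name matching in cleanup_vlans can be checked without touching any devices. The sketch below is a dry run under stated assumptions: `list_fcoe_vlans` is a hypothetical name, it reads `ip -o link show type vlan` style lines on stdin, it performs the split and the name check only (none of the config-file checks), and it uses plain parameter expansion instead of the regex so that it also runs under a POSIX sh that supports `local`.

```shell
#!/bin/bash
# Hypothetical dry run: print "<vlan> <phys>" for every input line whose
# device name looks like an auto-created FCoE VLAN. Nothing is deleted.
list_fcoe_vlans() {
	local idx vlan phys rest suffix id
	# "6: em1.200-fcoe@em1: <...>" splits into idx=6, vlan=em1.200-fcoe, phys=em1
	while IFS=':@ ' read -r idx vlan phys rest; do
		[ -n "$phys" ] || continue
		# the name must start with "<phys>."
		suffix=${vlan#"$phys".}
		[ "$suffix" != "$vlan" ] || continue
		# and end with "-fcoe"
		id=${suffix%-fcoe}
		[ "$id" != "$suffix" ] || continue
		# with only digits (the VLAN id) in between
		case $id in
		''|*[!0-9]*) continue ;;
		esac
		echo "$vlan $phys"
	done
}
```

Running `ip -o link show type vlan | list_fcoe_vlans` before enabling the cleanup shows which devices the name check would select. A device such as em1.200@em1, without the -fcoe suffix, is skipped.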

Comment 12 RHEL Program Management 2020-12-15 07:41:35 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

