Bug 1338795
| Summary: | fcoeadm doesn't remove VLAN when stopped/restarted | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Pavel Zhukov <pzhukov> |
| Component: | fcoe-utils | Assignee: | Chris Leech <cleech> |
| Status: | CLOSED WONTFIX | QA Contact: | guazhang <guazhang> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.4 | CC: | cleech, guazhang, jcastillo, pzhukov, revers |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-12-15 07:41:35 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1319873 | | |
| Bug Blocks: | 1334745, 1334748 | | |
Description
Pavel Zhukov
2016-05-23 12:39:05 UTC
(In reply to Pavel Zhukov from comment #3)
> https://bugzilla.redhat.com/show_bug.cgi?id=639466

The way it was solved in RHEL 6, as I understand it, was by the following patch:

 {
-	local force=$1
-
-	pid=$($FCOEADM -p 2> /dev/null)
-	if [ "$force" == "force" ]
-	then
-		action "Destroying any active fcoe interface/s"
-		[ "$pid" ] && kill -HUP $pid
-		modprobe -r $SUPPORTED_DRIVERS libfc
+	if have_fcoe_root; then
+		echo $"Possible FCoE root detected, not stopping FCoE."
+		exit 1
 	else
-		[ "$pid" ] && kill -TERM $pid
+		local force=$1
+
+		pid=$($FCOEADM -p 2> /dev/null)
+		if [ "$force" == "force" ]
+		then
+			action "Destroying any active fcoe interface/s"
+			[ "$pid" ] && kill -HUP $pid
+			sleep 3
+			# Destroy vports first (rhbz#903099)
+			for vport in $(ls /sys/class/fc_vports); do
+				echo 1 > /sys/class/fc_vports/${vport}/vport_delete
+			done
+			for iface in $($FCOEADM -i | grep -F 'Symbolic Name:' | \
+					sed 's/^.*over \([^\s]*\)$/\1/'); do
+				echo $iface >/sys/module/libfcoe/parameters/destroy
+			done
+			sleep 3
+			modprobe -r $SUPPORTED_DRIVERS libfc
+		else
+			[ "$pid" ] && kill -TERM $pid
+		fi
+
+		action $"Stopping FCoE initiator service: "
+
+		rm -f ${LOCKFILE}
 	fi
-
-	action $"Stopping FCoE initiator service: "
-
-	rm -f ${LOCKFILE}
 }

And that code is executed when running "service fcoe stop force". In this case we are using the sysfs interface instead of 'fcoeadm -d <interface>', not sure why.
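To illustrate how the patch above derives the interface names it writes to the libfcoe 'destroy' parameter, here is a minimal sketch of the same grep/sed extraction as a standalone filter. The function name fcoe_ifaces_from_listing is hypothetical, and the sketch uses the POSIX class [[:space:]] where the patch writes [^\s], which is an assumption about the patch's intent rather than a copy of it.

```shell
#!/bin/sh
# Hypothetical helper, not part of fcoe-utils: reads 'fcoeadm -i' style text
# on stdin and prints the name that follows "over" on each
# "Symbolic Name:" line (for example em1.200-fcoe).
fcoe_ifaces_from_listing() {
    # Keep only the lines that name an FCoE instance, then strip everything
    # up to and including the last "over ", plus any trailing whitespace.
    grep -F 'Symbolic Name:' |
        sed 's/^.*over \([^[:space:]]*\)[[:space:]]*$/\1/'
}

# Possible usage on a host with fcoe-utils installed:
#   fcoeadm -i | fcoe_ifaces_from_listing
```

The filter only reads text, so it can be tried against saved 'fcoeadm -i' output without touching any interface.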
Now, for RHEL 7, when I run 'fcoeadm -d em1' and 'fcoeadm -d em2' as Chris mentioned, the instances are removed, but the vlan interfaces are still present in the output of 'ip l':

[root@dell-per720-3 admin]# fcoeadm -i
    Description: NetXtreme II BCM57800 1/10 Gigabit Ethernet
    Revision: 10
    Manufacturer: Broadcom Corporation
    Serial Number: C81F66F1C748
    Driver: bnx2x 1.710.51-0
    Number of Ports: 1

        Symbolic Name: bnx2fc (QLogic BCM57800) v2.4.2 over em1.200-fcoe
        OS Device Name: host8
        Node Name: 0x2000C81F66F1C749
        Port Name: 0x2001C81F66F1C749
        FabricName: 0x100050EB1A2C8326
        Speed: 10 Gbit
        Supported Speed: 1 Gbit, 10 Gbit
        MaxFrameSize: 2048
        FC-ID (Port ID): 0x0102C0
        State: Online

        Symbolic Name: bnx2fc (QLogic BCM57800) v2.4.2 over em2.200-fcoe
        OS Device Name: host9
        Node Name: 0x2000C81F66F1C74B
        Port Name: 0x2001C81F66F1C74B
        FabricName: 0x100050EB1A2C8326
        Speed: 10 Gbit
        Supported Speed: 1 Gbit, 10 Gbit
        MaxFrameSize: 2048
        FC-ID (Port ID): 0x010240
        State: Online

[root@dell-per720-3 admin]# fcoeadm -d em1
[root@dell-per720-3 admin]# fcoeadm -d em2
[root@dell-per720-3 admin]# fcoeadm -i
No FCoE interfaces created.
[root@dell-per720-3 admin]# ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
    link/ether c8:1f:66:f1:c7:48 brd ff:ff:ff:ff:ff:ff
3: em2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
    link/ether c8:1f:66:f1:c7:4a brd ff:ff:ff:ff:ff:ff
4: em3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master rhevm state UP mode DEFAULT qlen 1000
    link/ether c8:1f:66:f1:c7:4c brd ff:ff:ff:ff:ff:ff
5: em4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
    link/ether c8:1f:66:f1:c7:4e brd ff:ff:ff:ff:ff:ff
6: em1.200-fcoe@em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT
    link/ether c8:1f:66:f1:c7:48 brd ff:ff:ff:ff:ff:ff
7: em2.200-fcoe@em2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT
    link/ether c8:1f:66:f1:c7:4a brd ff:ff:ff:ff:ff:ff
8: rhevm: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT
    link/ether c8:1f:66:f1:c7:4c brd ff:ff:ff:ff:ff:ff
9: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT
    link/ether 62:9c:71:28:3c:0d brd ff:ff:ff:ff:ff:ff

As a side note, if I run the steps in the patch for RHEL 6 shown above, 'fcoeadm -i' shows no interfaces anymore and the modules are not loaded, but em1.200 and em2.200 still appear in 'ip l':

[root@dell-per720-3 admin]# fcoeadm -i
No FCoE interfaces created.
[root@dell-per720-3 admin]# lsmod | grep bnx
bnx2x 730273 0
ptp 19231 1 bnx2x
mdio 13807 1 bnx2x
libcrc32c 12644 1 bnx2x
[root@dell-per720-3 admin]# ip l |grep em
2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
3: em2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
4: em3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master rhevm state UP mode DEFAULT qlen 1000
5: em4: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode DEFAULT qlen 1000
8: em1.200@em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT
9: em2.200@em2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT

So I'm not certain how we can ensure the interfaces are removed without 'ifconfig down', and that seems to me like a very horrible hack.

A couple of things, slightly off-topic (sorry!):

* I found that 'fcoeadm -d' succeeds when the interfaces were set up via /etc/fcoe/cfg-<iface> config files. If the files are not present and I run 'fipvlan -acds' to set the interfaces up, running 'fcoeadm -d' gives me errors:

[root@dell-per720-3 admin]# fcoeadm -d em1
fcoeadm: Command failed
Try 'fcoeadm --help' for more information.
[root@dell-per720-3 admin]# fcoeadm -d em2
fcoeadm: Command failed
Try 'fcoeadm --help' for more information.

And we get the following in syslog:

[root@dell-per720-3 admin]# journalctl |tail
May 24 08:50:51 dell-per720-3.gsslab.rdu2.redhat.com fcoemon[15611]: error 0 Success
May 24 08:50:51 dell-per720-3.gsslab.rdu2.redhat.com fcoemon[15611]: em1 is not in port list.
May 24 08:51:02 dell-per720-3.gsslab.rdu2.redhat.com fcoemon[15611]: error 0 Success
May 24 08:51:02 dell-per720-3.gsslab.rdu2.redhat.com fcoemon[15611]: em2 is not in port list.

It is as if the 'fcoe_config.port' variable wasn't populated when the interfaces were set up with 'fipvlan'. Chris, is this expected, or a bug?
* When I stop the 'fcoe' service via 'systemctl stop fcoe', the interfaces are still shown in the output of 'fcoeadm -i'. Chris, do you think it may be worth adding a line in the systemd unit for ExecStop that calls 'fcoeadm -d', so that stopping the service gets rid of the interfaces?

Chris, unless you have a fix ready, please move this to rhel7.4.

Thanks,
Rob

Deferring to 7.4.

Clearing regression keyword, as this never changed in RHEL7 and is a difference from an optional forced functionality in RHEL6.

Is there any functional issue with FCoE shutdown as is, or is this purely "cosmetic" in that fabric logins and vlan devices are still visible? It doesn't seem like the RHEL6 "force" shutdown would come into use by default, and I'd hate to replicate that as the new default behaviour and cause some other regression.

(In reply to Chris Leech from comment #8)
> Is there any functional issue with FCoE shutdown as is, or is this purely
> "cosmetic" in that fabric logins and vlan devices are still visible?

Yes, there's a functional issue with the fcoe hook for vdsm:
https://github.com/oVirt/vdsm/blob/master/vdsm_hooks/fcoe/fcoe_before_network_setup.py

Because of this bug, users of RHEL7 have to reboot the hypervisor or delete the VLANs etc. manually, and some of them are not happy with this, especially given that it was working in RHEL6.

I spent some more time looking at this. The el6 "stop force" code is mostly a workaround for manually removing NPIV ports, which from what I can tell isn't needed for el7. That just leaves the difference between the default SIGTERM and SIGHUP, the latter of which causes fcoemon to destroy active FCoE instances.

One option to switch to SIGHUP behavior would be to write a "/etc/systemd/system/fcoe.service.d/vdsm-override.conf" with

[Service]
KillSignal=SIGHUP

which should match el6 with "stop force".

To remove VLAN interfaces created when AUTO_VLAN is set, we'd need to go further.
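The drop-in described above could be created along these lines. This is only a sketch: the function name write_killsignal_override and its optional directory argument are hypothetical, added so the file can be written somewhere harmless for a dry run; the file name and contents are the ones given in the comment.

```shell
#!/bin/sh
# Hypothetical helper: writes the KillSignal drop-in for fcoe.service.
# $1 is an optional target directory (an assumption for testing); the
# default is the drop-in directory named in the comment above.
write_killsignal_override() {
    dir=${1:-/etc/systemd/system/fcoe.service.d}
    mkdir -p "$dir" || return 1
    # systemd merges [Service] settings from *.conf files in this directory
    # into the unit, so this only overrides the stop signal.
    printf '[Service]\nKillSignal=SIGHUP\n' > "$dir/vdsm-override.conf"
}

# Possible usage as root, followed by a reload so systemd reads the drop-in:
#   write_killsignal_override && systemctl daemon-reload
```

A drop-in leaves the packaged unit file untouched, so it can be reverted by deleting the file and reloading systemd again.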
There's no equivalent to passing parameters like force to a stop command, so it would require setting a variable in the environment file used for the fcoe.service /etc/sysconfig/fcoe.

I could add a shutdown script to use as the ExecStop, sending SIGTERM to fcoemon like systemd does today by default. But it could check an environment variable and handle the SIGHUP termination and VLAN cleanup if it was set.

Something like the following might have enough checks to be safe. It would still be up to vdsm or something to set the FORCE_CLEANUP=true in /etc/sysconfig/fcoe to trigger this non-default full cleanup.

---
#!/bin/bash

IP=/usr/sbin/ip
CFGDIR=/etc/fcoe

cleanup_vlans() {
    local link phys vlan
    $IP -o link show type vlan | while IFS=':@ ' read -ra link; do
        vlan=${link[1]}
        phys=${link[2]}
        # does this look like the auto vlan format?
        [[ $vlan =~ $phys.[[:digit:]]+-fcoe ]] || continue
        # auto vlans don't have config files
        [ ! -e "$CFGDIR/cfg-$vlan" ] || continue
        # but the physical interface should
        [ -e "$CFGDIR/cfg-$phys" ] || continue
        # check that the physical port is configured for auto vlans
        ( . "$CFGDIR/cfg-$phys" && [ "$AUTO_VLAN" == "yes" ] ) || continue
        echo "removing FCoE VLAN device $vlan"
        $IP link delete "$vlan"
    done
}

pid=$(pidof fcoemon)
if [ -v FORCE_CLEANUP ]; then
    [ "$pid" ] && kill -HUP "$pid"
    cleanup_vlans
else
    [ "$pid" ] && kill -TERM "$pid"
fi

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.
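For anyone adapting the proposed cleanup script, two of its checks can be sketched as stricter standalone helpers. Both function names are hypothetical, and neither is taken from the script or from fcoe-utils. The first accepts only names of the exact form <phys>.<digits>-fcoe, whereas the script's regular expression is unanchored and its dot matches any character. The second requires FORCE_CLEANUP to equal "true", whereas the script's -v test only checks that the variable is set.

```shell
#!/bin/sh
# Hypothetical helpers illustrating stricter variants of two checks.

# Succeeds only if $1 is exactly "<phys>.<digits>-fcoe" for physical port $2.
is_auto_fcoe_vlan() {
    vlan=$1
    phys=$2
    rest=${vlan#"$phys".}                  # strip the literal "<phys>." prefix
    [ "$rest" != "$vlan" ] || return 1     # prefix was not present
    id=${rest%-fcoe}                       # strip the "-fcoe" suffix
    [ "$id" != "$rest" ] || return 1       # suffix was not present
    case $id in
        ''|*[!0-9]*) return 1 ;;           # VLAN id must be digits only
    esac
    return 0
}

# Succeeds only if FORCE_CLEANUP is set to the literal string "true".
force_cleanup_enabled() {
    [ "${FORCE_CLEANUP:-}" = "true" ]
}

# Possible usage:
#   is_auto_fcoe_vlan em1.200-fcoe em1 && echo "auto VLAN"
```

Using prefix and suffix stripping instead of a regular expression means an interface name containing regex metacharacters is still compared literally.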