Description of problem:
When a vHBA is deleted, the multipath devices connected through it are not removed from /dev/mapper. This blocks a follow-up pvcreate.

Version-Release number of selected component (if applicable):
# rpm -qa | egrep "multipath|lvm2|kernel-3.*514"
lvm2-libs-2.02.166-1.el7.x86_64
device-mapper-multipath-libs-0.4.9-99.el7.x86_64
kernel-3.10.0-514.el7.x86_64
device-mapper-multipath-0.4.9-99.el7.x86_64
lvm2-2.02.166-1.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
0. I have a machine with an HBA card (scsi_host0), and scsi_host0 has backend storage presented as /dev/mpathc. A valid wwnn/wwpn pair is configured in the storage switch and can be used to create a vHBA.

1. Create a vHBA, using scsi_host0 as parent
# echo "1000000000000003:20000000c99e2b80" > /sys/class/fc_host/host0/vport_create
# ll /dev/mapper/
total 0
...
lrwxrwxrwx. 1 root root 8 Nov 28 16:56 mpathh -> ../dm-10
...
<==== mpathh is the vHBA-connected device

scsi_host43 is the new vHBA
# lsscsi
...
[43:0:0:0] disk IBM 2145 0000 /dev/sdk
[43:0:0:1] disk IBM 2145 0000 /dev/sdl
[43:0:1:0] disk IBM 2145 0000 /dev/sdm
[43:0:1:1] disk IBM 2145 0000 /dev/sdn

2. Remove the vHBA
# echo "1000000000000003:20000000c99e2b80" > /sys/class/fc_host/host0/vport_delete
# lsscsi
...
<=== scsi_host43 gone as expected

3. But device mpathh still exists under /dev/mapper
# ll /dev/mapper/ | grep mpathh
lrwxrwxrwx. 1 root root 8 Nov 28 16:57 mpathh -> ../dm-10

4. Now if I try to pvcreate on mpathc (**NOT** mpathh), the terminal hangs, as follows:
# /usr/sbin/pvcreate /dev/mapper/mpathc
File descriptor 3 (pipe:[155703]) leaked on pvcreate invocation. Parent PID 5055: -bash
File descriptor 4 (pipe:[154883]) leaked on pvcreate invocation. Parent PID 5055: -bash
^C^C^C^C

Actual results:
When a vHBA is removed, its multipath devices still exist in /dev/mapper and cause pvcreate on other multipath devices to hang.

Expected results:
When a vHBA is removed, its related multipath devices should be removed.
Additional info:
When I run "service multipathd restart" in another terminal, everything is OK again.
After you remove the vHBA, what does

# multipath -l

show? It's perfectly valid to have a multipath device with no paths. If the multipath device was in use when all the paths went away, it will either fail all IO or queue all IO, depending on how it was configured.
As follows: after removing the vHBA, mpathg still exists in multipath -l, but without LUNs. Anyway, even if this is valid, a hang of pvcreate on another mpath device is not expected behaviour. Please help to check, thanks.

# echo "1000000000000001:20000000c99e2b80" > /sys/class/fc_host/host0/vport_create
# multipath -l
mpathd (360050763008084e6e000000000000064) dm-5 IBM ,2145
size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 7:0:0:1 sdg 8:96 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 7:0:1:1 sdi 8:128 active undef running
mpathc (360050763008084e6e000000000000063) dm-3 IBM ,2145
size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 7:0:1:0 sdh 8:112 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 7:0:0:0 sdf 8:80 active undef running
mpathb (360050763008084e6e000000000000062) dm-4 IBM ,2145
size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 0:0:0:1 sdc 8:32 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 0:0:1:1 sde 8:64 active undef running
mpatha (360050763008084e6e000000000000061) dm-2 IBM ,2145
size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| |- 0:0:1:0 sdd 8:48 active undef running
| `- 202:0:0:1 sdk 8:160 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  |- 0:0:0:0 sdb 8:16 active undef running
  `- 202:0:1:1 sdm 8:192 active undef running
mpathg (360050763008084e6e000000000000065) dm-6 IBM ,2145
size=10G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 202:0:0:0 sdj 8:144 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 202:0:1:0 sdl 8:176 active undef running

# echo "1000000000000001:20000000c99e2b80" > /sys/class/fc_host/host0/vport_delete
# multipath -l
mpathd (360050763008084e6e000000000000064) dm-5 IBM ,2145
size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 7:0:0:1 sdg 8:96 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 7:0:1:1 sdi 8:128 active undef running
mpathc (360050763008084e6e000000000000063) dm-3 IBM ,2145
size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 7:0:1:0 sdh 8:112 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 7:0:0:0 sdf 8:80 active undef running
mpathb (360050763008084e6e000000000000062) dm-4 IBM ,2145
size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 0:0:0:1 sdc 8:32 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 0:0:1:1 sde 8:64 active undef running
mpatha (360050763008084e6e000000000000061) dm-2 IBM ,2145
size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 0:0:1:0 sdd 8:48 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 0:0:0:0 sdb 8:16 active undef running
mpathg (360050763008084e6e000000000000065) dm-6
size=10G features='1 queue_if_no_path' hwhandler='0' wp=rw
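The map left without paths in the listing above (mpathg, which has a header and a size line but no path lines) can be picked out mechanically. The following is only a sketch: the function name maps_without_paths is made up for illustration, and it assumes the "multipath -l" layout shown in this report (a header line containing "dm-N", path lines containing an H:C:T:L tuple followed by a device name).

```shell
#!/bin/bash
# Hypothetical helper (not part of multipath-tools): reads "multipath -l"
# output on stdin and prints the name of every map that lists no paths.
maps_without_paths() {
    awk '
        # A map header starts in column 0 and contains a " dm-N" token.
        /^[A-Za-z0-9_]/ && / dm-[0-9]+/ {
            if (name != "" && paths == 0) print name
            name = $1; paths = 0; next
        }
        # A path line contains an H:C:T:L tuple followed by a device name.
        /[0-9]+:[0-9]+:[0-9]+:[0-9]+ +[a-z]+/ { paths++ }
        # Flush the last map seen.
        END { if (name != "" && paths == 0) print name }
    '
}

# Usage on a live system (requires root):
#   multipath -l | maps_without_paths
```

A map reported by this helper that also has queue_if_no_path in its features is a candidate for the kind of hang described in this report, because IO sent to it is queued instead of failed.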
Can you post /var/log/messages from around when you do the delete?
# cat test.sh
#!/bin/bash
echo "" > /var/log/messages
echo "1000000000000003:20000000c99e2b80" > /sys/class/fc_host/host0/vport_create
sleep 5
echo "lsscsi:"
lsscsi | grep sdk
echo "1000000000000003:20000000c99e2b80" > /sys/class/fc_host/host0/vport_delete
echo "lsscsi:"
lsscsi | grep sdk
/usr/sbin/pvcreate /dev/mapper/mpathc

/var/log/messages as follows:

Dec 8 14:18:23 bootp-73-75-161 kernel: scsi host363: Emulex LPe12002-M8 8Gb 2-port PCIe Fibre Channel Adapter on PCI bus 20 device 00 irq 16 port 0 Logical Link Speed: 4000 Mbps
Dec 8 14:18:23 bootp-73-75-161 kernel: lpfc 0000:20:00.0: 0:(1):1825 Vport Created.
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi host0: vport-0:0-320 created via shost0 channel 0
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:0:0: Direct-Access IBM 2145 0000 PQ: 0 ANSI: 6
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:0:0: alua: supports implicit TPGS
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:0:0: alua: port group 01 rel port 881
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:0:0: alua: rtpg failed with 8000002
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:0:0: alua: port group 01 state N non-preferred supports tolusna
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:0:0: alua: Attached
Dec 8 14:18:23 bootp-73-75-161 kernel: sd 363:0:0:0: [sdj] 41943040 512-byte logical blocks: (21.4 GB/20.0 GiB)
Dec 8 14:18:23 bootp-73-75-161 kernel: sd 363:0:0:0: Attached scsi generic sg9 type 0
Dec 8 14:18:23 bootp-73-75-161 kernel: sd 363:0:0:0: [sdj] Write Protect is off
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:0:1: Direct-Access IBM 2145 0000 PQ: 0 ANSI: 6
Dec 8 14:18:23 bootp-73-75-161 kernel: sd 363:0:0:0: [sdj] Write cache: disabled, read cache: enabled, supports DPO and FUA
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:0:1: alua: supports implicit TPGS
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:0:1: alua: port group 01 rel port 881
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:0:1: alua: rtpg failed with 8000002
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:0:1: alua: port group 01 state N non-preferred supports tolusna
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:0:1: alua: Attached
Dec 8 14:18:23 bootp-73-75-161 kernel: sd 363:0:0:1: Attached scsi generic sg10 type 0
Dec 8 14:18:23 bootp-73-75-161 kernel: sd 363:0:0:1: [sdk] 41943040 512-byte logical blocks: (21.4 GB/20.0 GiB)
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:1:0: Direct-Access IBM 2145 0000 PQ: 0 ANSI: 6
Dec 8 14:18:23 bootp-73-75-161 kernel: sd 363:0:0:1: [sdk] Write Protect is off
Dec 8 14:18:23 bootp-73-75-161 kernel: sd 363:0:0:1: [sdk] Write cache: disabled, read cache: enabled, supports DPO and FUA
Dec 8 14:18:23 bootp-73-75-161 kernel: sd 363:0:0:1: [sdk] Attached SCSI disk
Dec 8 14:18:23 bootp-73-75-161 multipathd: sdk: add path (uevent)
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:1:0: alua: supports implicit TPGS
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:1:0: alua: port group 00 rel port 81
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:1:0: alua: rtpg failed with 8000002
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:1:0: alua: port group 00 state A non-preferred supports tolusna
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:1:0: alua: Attached
Dec 8 14:18:23 bootp-73-75-161 kernel: sd 363:0:1:0: [sdl] 41943040 512-byte logical blocks: (21.4 GB/20.0 GiB)
Dec 8 14:18:23 bootp-73-75-161 kernel: sd 363:0:1:0: Attached scsi generic sg11 type 0
Dec 8 14:18:23 bootp-73-75-161 kernel: sd 363:0:1:0: [sdl] Write Protect is off
Dec 8 14:18:23 bootp-73-75-161 kernel: sd 363:0:1:0: [sdl] Write cache: disabled, read cache: enabled, supports DPO and FUA
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:1:1: Direct-Access IBM 2145 0000 PQ: 0 ANSI: 6
Dec 8 14:18:23 bootp-73-75-161 multipathd: mpathc: load table [0 41943040 multipath 1 queue_if_no_path 0 2 1 service-time 0 1 1 8:112 1 service-time 0 2 1 8:80 1 8:160 1]
Dec 8 14:18:23 bootp-73-75-161 multipathd: sdk [8:160]: path added to devmap mpathc
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:1:1: alua: supports implicit TPGS
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:1:1: alua: port group 00 rel port 81
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:1:1: alua: rtpg failed with 8000002
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:1:1: alua: port group 00 state A non-preferred supports tolusna
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:1:1: alua: Attached
Dec 8 14:18:23 bootp-73-75-161 kernel: sd 363:0:1:1: Attached scsi generic sg12 type 0
Dec 8 14:18:23 bootp-73-75-161 kernel: sd 363:0:1:1: [sdm] 41943040 512-byte logical blocks: (21.4 GB/20.0 GiB)
Dec 8 14:18:23 bootp-73-75-161 kernel: sd 363:0:1:1: [sdm] Write Protect is off
Dec 8 14:18:23 bootp-73-75-161 kernel: sd 363:0:1:1: [sdm] Write cache: disabled, read cache: enabled, supports DPO and FUA
Dec 8 14:18:23 bootp-73-75-161 kernel: sd 363:0:1:1: [sdm] Attached SCSI disk
Dec 8 14:18:23 bootp-73-75-161 kernel: sd 363:0:0:0: [sdj] Attached SCSI disk
Dec 8 14:18:23 bootp-73-75-161 kernel: sd 363:0:1:0: [sdl] Attached SCSI disk
Dec 8 14:18:23 bootp-73-75-161 multipathd: sdm: add path (uevent)
Dec 8 14:18:23 bootp-73-75-161 multipathd: mpathc: load table [0 41943040 multipath 1 queue_if_no_path 0 2 1 service-time 0 2 1 8:112 1 8:192 1 service-time 0 2 1 8:80 1 8:160 1]
Dec 8 14:18:23 bootp-73-75-161 multipathd: sdm [8:192]: path added to devmap mpathc
Dec 8 14:18:23 bootp-73-75-161 multipathd: sdl: add path (uevent)
Dec 8 14:18:23 bootp-73-75-161 multipathd: mpathh: load table [0 41943040 multipath 1 queue_if_no_path 0 1 1 service-time 0 1 1 8:176 1]
Dec 8 14:18:23 bootp-73-75-161 multipathd: mpathh: event checker started
Dec 8 14:18:23 bootp-73-75-161 multipathd: sdl [8:176]: path added to devmap mpathh
Dec 8 14:18:23 bootp-73-75-161 multipathd: sdj: add path (uevent)
Dec 8 14:18:23 bootp-73-75-161 multipathd: mpathh: load table [0 41943040 multipath 1 queue_if_no_path 0 2 1 service-time 0 1 1 8:176 1 service-time 0 1 1 8:144 1]
Dec 8 14:18:27 bootp-73-75-161 dhclient[3848]: DHCPDISCOVER on virbr0 to 255.255.255.255 port 67 interval 12 (xid=0x5b8467e6)
Dec 8 14:18:28 bootp-73-75-161 kernel: sd 363:0:0:0: alua: Detached
Dec 8 14:18:28 bootp-73-75-161 multipathd: sdj: remove path (uevent)
Dec 8 14:18:28 bootp-73-75-161 multipathd: mpathh: load table [0 41943040 multipath 1 queue_if_no_path 0 1 1 service-time 0 1 1 8:176 1]
Dec 8 14:18:28 bootp-73-75-161 multipathd: sdj [8:144]: path removed from map mpathh
Dec 8 14:18:28 bootp-73-75-161 kernel: sd 363:0:1:0: [sdl] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Dec 8 14:18:28 bootp-73-75-161 kernel: sd 363:0:1:0: [sdl] tag#0 CDB: Read(10) 28 00 02 7f ff 80 00 00 08 00
Dec 8 14:18:28 bootp-73-75-161 kernel: blk_update_request: I/O error, dev sdl, sector 41942912
Dec 8 14:18:28 bootp-73-75-161 kernel: device-mapper: multipath: Failing path 8:176.
Dec 8 14:18:28 bootp-73-75-161 multipathd: sdl: mark as failed
Dec 8 14:18:28 bootp-73-75-161 multipathd: mpathh: remaining active paths: 0
Dec 8 14:18:28 bootp-73-75-161 kernel: sd 363:0:1:0: alua: Detached
Dec 8 14:18:28 bootp-73-75-161 multipathd: sdl: remove path (uevent)
Dec 8 14:18:28 bootp-73-75-161 multipathd: mpathh: map in use
Dec 8 14:18:28 bootp-73-75-161 multipathd: mpathh: can't flush
Dec 8 14:18:28 bootp-73-75-161 multipathd: mpathh: load table [0 41943040 multipath 1 queue_if_no_path 0 0 0]
Dec 8 14:18:28 bootp-73-75-161 multipathd: sdl [8:176]: path removed from map mpathh
Dec 8 14:18:28 bootp-73-75-161 kernel: sd 363:0:0:1: alua: Detached
Dec 8 14:18:28 bootp-73-75-161 multipathd: sdk: remove path (uevent)
Dec 8 14:18:28 bootp-73-75-161 multipathd: mpathc: load table [0 41943040 multipath 1 queue_if_no_path 0 2 1 service-time 0 2 1 8:112 1 8:192 1 service-time 0 1 1 8:80 1]
Dec 8 14:18:28 bootp-73-75-161 multipathd: sdk [8:160]: path removed from map mpathc
Dec 8 14:18:28 bootp-73-75-161 kernel: sd 363:0:1:1: [sdm] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Dec 8 14:18:28 bootp-73-75-161 kernel: sd 363:0:1:1: [sdm] tag#0 CDB: Read(10) 28 00 02 7f ff f0 00 00 08 00
Dec 8 14:18:28 bootp-73-75-161 kernel: blk_update_request: I/O error, dev sdm, sector 41943024
Dec 8 14:18:28 bootp-73-75-161 kernel: device-mapper: multipath: Failing path 8:192.
Dec 8 14:18:28 bootp-73-75-161 multipathd: sdm: mark as failed
Dec 8 14:18:28 bootp-73-75-161 multipathd: mpathc: remaining active paths: 2
Dec 8 14:18:28 bootp-73-75-161 kernel: sd 363:0:1:1: alua: Detached
Dec 8 14:18:28 bootp-73-75-161 multipathd: sdm: remove path (uevent)
Dec 8 14:18:28 bootp-73-75-161 multipathd: mpathc: load table [0 41943040 multipath 1 queue_if_no_path 0 2 1 service-time 0 1 1 8:112 1 service-time 0 1 1 8:80 1]
Dec 8 14:18:28 bootp-73-75-161 multipathd: sdm [8:192]: path removed from map mpathc
Dec 8 14:18:28 bootp-73-75-161 kernel: lpfc 0000:20:00.0: 0:(1):1828 Vport Deleted.
Dec 8 14:18:39 bootp-73-75-161 dhclient[3848]: DHCPDISCOVER on virbr0 to 255.255.255.255 port 67 interval 13 (xid=0x5b8467e6)
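The "load table [...]" strings that multipathd logs above follow the device-mapper multipath table layout: start, length, target type, a feature count and the features, a hardware-handler argument count and its arguments, then the number of path groups. The sketch below reads the path-group count out of such a string. The function names are hypothetical helpers made up for illustration, and the code assumes the bracketed string is passed in without the brackets.

```shell
#!/bin/bash
# Hypothetical helpers (not part of multipath-tools).

# Print the number of path groups in a dm-multipath table string, e.g.
# "0 41943040 multipath 1 queue_if_no_path 0 0 0" as logged by multipathd.
table_path_groups() {
    local -a f
    read -r -a f <<< "$1"
    local i=3                       # skip start, length, target type
    local nfeat=${f[i]}
    i=$(( i + 1 + nfeat ))          # skip the feature count and the features
    local nhw=${f[i]}
    i=$(( i + 1 + nhw ))            # skip the hwhandler count and its args
    printf '%s\n' "${f[i]}"
}

# Succeed when the table has no path groups left but still queues IO.
table_queues_with_no_paths() {
    [[ " $1 " == *" queue_if_no_path "* ]] &&
        [ "$(table_path_groups "$1")" -eq 0 ]
}
```

Applied to the last table loaded for mpathh in the log, table_path_groups reports 0 path groups while queue_if_no_path is still set, which is the combination that leaves IO queued with nowhere to go.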
So here is the important part of the messages:

Dec 8 14:18:28 bootp-73-75-161 multipathd: sdl: remove path (uevent)
Dec 8 14:18:28 bootp-73-75-161 multipathd: mpathh: map in use
Dec 8 14:18:28 bootp-73-75-161 multipathd: mpathh: can't flush
Dec 8 14:18:28 bootp-73-75-161 multipathd: mpathh: load table [0 41943040 multipath 1 queue_if_no_path 0 0 0]
Dec 8 14:18:28 bootp-73-75-161 multipathd: sdl [8:176]: path removed from map mpathh

Something clearly has mpathh open at this point in time. Because of this, it is not possible to delete it, so multipathd loads a table with no devices. The default configuration for your array is set to queue forever when all paths are down. This doesn't make tons of sense when the devices have actually disappeared. You can override this by setting "flush_on_last_del" in the defaults section of multipath.conf

defaults {
        flush_on_last_del yes
}

This will turn off queuing when the last path has been deleted. If whatever was holding the device open closes it upon receiving an error, this may cause the device to be deleted (it depends on whether whoever has it open closes it before multipathd tries to remove the device). If whatever is holding the device open doesn't close it, the multipath device will remain. It will immediately fail back IO sent to it, however.

multipathd also has the option to schedule deferred removal of the device. This means that whenever the last opener does close the device, it will get deleted, and nobody else will be able to open the device in the meantime. If a path is restored before the last opener closes the device, the deferred remove is canceled. You can set this by adding

defaults {
        deferred_remove yes
}

to /etc/multipath.conf.

These configuration changes should fix your issue. Otherwise, there is nothing that multipath can do here, since it can't remove an in-use device, and the device is configured to queue IO forever if there are no usable paths.
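To confirm that an option such as flush_on_last_del or deferred_remove is actually set in the defaults section, the file can be checked with a small parser. This is only a sketch: defaults_option is a made-up helper, and it assumes the common multi-line layout (section name and opening brace on one line, one option per line, closing brace on its own line). It does not handle a section written on a single line.

```shell
#!/bin/bash
# Hypothetical helper (not part of multipath-tools): print the value of an
# option inside the "defaults" section of multipath.conf text read on stdin.
# Prints nothing if the option is not set there.
defaults_option() {
    awk -v opt="$1" '
        /^[[:space:]]*#/ { next }                      # skip comment lines
        /^[[:space:]]*defaults[[:space:]]*\{/ { in_defaults = 1; next }
        in_defaults && /\}/ { in_defaults = 0; next }  # end of the section
        in_defaults && $1 == opt { val = $2; gsub(/"/, "", val) }
        END { if (val != "") print val }
    '
}

# Usage on a live system:
#   defaults_option flush_on_last_del < /etc/multipath.conf
```

The file only shows what was written; multipathd has to re-read its configuration (for example through a service restart, as mentioned earlier in this report) before a change takes effect.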
Another possibility is to change the device configuration to not queue IO forever. If you think I've closed this bug in error, please reopen it with your reasoning.
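For illustration, a device override along these lines might look like the fragment below. It is a sketch under assumptions, not a configuration confirmed in this report: the vendor and product strings are taken from the lsscsi output above, and "no_path_retry fail" is one way to stop queuing once no paths remain. Check it against the multipath.conf man page for the installed version before use.

```
devices {
        device {
                vendor "IBM"
                product "2145"
                # assumption: fail IO immediately instead of queuing
                # when no usable paths remain
                no_path_retry fail
        }
}
```

With a setting like this, IO to a map that has lost all of its paths returns an error instead of waiting, at the cost of also failing IO during a short outage that would otherwise have been ridden out.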