| Summary: | When delete a vHBA, its connected multipath devices not removed under /dev/mapper. And this will block follow-up pvcreate. | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | yisun |
| Component: | device-mapper-multipath | Assignee: | Ben Marzinski <bmarzins> |
| Status: | CLOSED NOTABUG | QA Contact: | Lin Li <lilin> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 7.4 | CC: | agk, bmarzins, heinzm, lilin, msnitzer, prajnoha, yisun |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Last Closed: | 2016-12-08 17:34:12 UTC | Type: | Bug |
Description
yisun
2016-11-28 09:02:35 UTC
After you remove the vHBA, what does

# multipath -l

show? It's perfectly valid to have a multipath device with no paths. If the multipath device was in use when all the paths went away, it will either fail all IO or queue all IO, depending on how it was configured.

As follows: after removing the vHBA, mpathg still exists in the multipath -l output, but without LUNs. Anyway, even if this is valid, a pvcreate hang on another mpath device is not expected behaviour. Please help to check, thanks.

# echo "1000000000000001:20000000c99e2b80" > /sys/class/fc_host/host0/vport_create
# multipath -l
mpathd (360050763008084e6e000000000000064) dm-5 IBM ,2145
size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 7:0:0:1 sdg 8:96 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 7:0:1:1 sdi 8:128 active undef running
mpathc (360050763008084e6e000000000000063) dm-3 IBM ,2145
size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 7:0:1:0 sdh 8:112 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 7:0:0:0 sdf 8:80 active undef running
mpathb (360050763008084e6e000000000000062) dm-4 IBM ,2145
size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 0:0:0:1 sdc 8:32 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 0:0:1:1 sde 8:64 active undef running
mpatha (360050763008084e6e000000000000061) dm-2 IBM ,2145
size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| |- 0:0:1:0 sdd 8:48 active undef running
| `- 202:0:0:1 sdk 8:160 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  |- 0:0:0:0 sdb 8:16 active undef running
  `- 202:0:1:1 sdm 8:192 active undef running
mpathg (360050763008084e6e000000000000065) dm-6 IBM ,2145
size=10G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 202:0:0:0 sdj 8:144 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 202:0:1:0 sdl 8:176 active undef running

# echo "1000000000000001:20000000c99e2b80" > /sys/class/fc_host/host0/vport_delete
# multipath -l
mpathd (360050763008084e6e000000000000064) dm-5 IBM ,2145
size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 7:0:0:1 sdg 8:96 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 7:0:1:1 sdi 8:128 active undef running
mpathc (360050763008084e6e000000000000063) dm-3 IBM ,2145
size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 7:0:1:0 sdh 8:112 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 7:0:0:0 sdf 8:80 active undef running
mpathb (360050763008084e6e000000000000062) dm-4 IBM ,2145
size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 0:0:0:1 sdc 8:32 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 0:0:1:1 sde 8:64 active undef running
mpatha (360050763008084e6e000000000000061) dm-2 IBM ,2145
size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 0:0:1:0 sdd 8:48 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 0:0:0:0 sdb 8:16 active undef running
mpathg (360050763008084e6e000000000000065) dm-6
size=10G features='1 queue_if_no_path' hwhandler='0' wp=rw

Can you post /var/log/messages from around when you do the delete?
# cat test.sh
#!/bin/bash
echo "" > /var/log/messages
echo "1000000000000003:20000000c99e2b80" > /sys/class/fc_host/host0/vport_create
sleep 5
echo "lsscsi:"
lsscsi | grep sdk
echo "1000000000000003:20000000c99e2b80" > /sys/class/fc_host/host0/vport_delete
echo "lsscsi:"
lsscsi | grep sdk
/usr/sbin/pvcreate /dev/mapper/mpathc

/var/log/messages as follows:

Dec 8 14:18:23 bootp-73-75-161 kernel: scsi host363: Emulex LPe12002-M8 8Gb 2-port PCIe Fibre Channel Adapter on PCI bus 20 device 00 irq 16 port 0 Logical Link Speed: 4000 Mbps
Dec 8 14:18:23 bootp-73-75-161 kernel: lpfc 0000:20:00.0: 0:(1):1825 Vport Created.
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi host0: vport-0:0-320 created via shost0 channel 0
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:0:0: Direct-Access IBM 2145 0000 PQ: 0 ANSI: 6
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:0:0: alua: supports implicit TPGS
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:0:0: alua: port group 01 rel port 881
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:0:0: alua: rtpg failed with 8000002
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:0:0: alua: port group 01 state N non-preferred supports tolusna
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:0:0: alua: Attached
Dec 8 14:18:23 bootp-73-75-161 kernel: sd 363:0:0:0: [sdj] 41943040 512-byte logical blocks: (21.4 GB/20.0 GiB)
Dec 8 14:18:23 bootp-73-75-161 kernel: sd 363:0:0:0: Attached scsi generic sg9 type 0
Dec 8 14:18:23 bootp-73-75-161 kernel: sd 363:0:0:0: [sdj] Write Protect is off
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:0:1: Direct-Access IBM 2145 0000 PQ: 0 ANSI: 6
Dec 8 14:18:23 bootp-73-75-161 kernel: sd 363:0:0:0: [sdj] Write cache: disabled, read cache: enabled, supports DPO and FUA
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:0:1: alua: supports implicit TPGS
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:0:1: alua: port group 01 rel port 881
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:0:1: alua: rtpg failed with 8000002
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:0:1: alua: port group 01 state N non-preferred supports tolusna
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:0:1: alua: Attached
Dec 8 14:18:23 bootp-73-75-161 kernel: sd 363:0:0:1: Attached scsi generic sg10 type 0
Dec 8 14:18:23 bootp-73-75-161 kernel: sd 363:0:0:1: [sdk] 41943040 512-byte logical blocks: (21.4 GB/20.0 GiB)
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:1:0: Direct-Access IBM 2145 0000 PQ: 0 ANSI: 6
Dec 8 14:18:23 bootp-73-75-161 kernel: sd 363:0:0:1: [sdk] Write Protect is off
Dec 8 14:18:23 bootp-73-75-161 kernel: sd 363:0:0:1: [sdk] Write cache: disabled, read cache: enabled, supports DPO and FUA
Dec 8 14:18:23 bootp-73-75-161 kernel: sd 363:0:0:1: [sdk] Attached SCSI disk
Dec 8 14:18:23 bootp-73-75-161 multipathd: sdk: add path (uevent)
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:1:0: alua: supports implicit TPGS
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:1:0: alua: port group 00 rel port 81
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:1:0: alua: rtpg failed with 8000002
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:1:0: alua: port group 00 state A non-preferred supports tolusna
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:1:0: alua: Attached
Dec 8 14:18:23 bootp-73-75-161 kernel: sd 363:0:1:0: [sdl] 41943040 512-byte logical blocks: (21.4 GB/20.0 GiB)
Dec 8 14:18:23 bootp-73-75-161 kernel: sd 363:0:1:0: Attached scsi generic sg11 type 0
Dec 8 14:18:23 bootp-73-75-161 kernel: sd 363:0:1:0: [sdl] Write Protect is off
Dec 8 14:18:23 bootp-73-75-161 kernel: sd 363:0:1:0: [sdl] Write cache: disabled, read cache: enabled, supports DPO and FUA
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:1:1: Direct-Access IBM 2145 0000 PQ: 0 ANSI: 6
Dec 8 14:18:23 bootp-73-75-161 multipathd: mpathc: load table [0 41943040 multipath 1 queue_if_no_path 0 2 1 service-time 0 1 1 8:112 1 service-time 0 2 1 8:80 1 8:160 1]
Dec 8 14:18:23 bootp-73-75-161 multipathd: sdk [8:160]: path added to devmap mpathc
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:1:1: alua: supports implicit TPGS
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:1:1: alua: port group 00 rel port 81
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:1:1: alua: rtpg failed with 8000002
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:1:1: alua: port group 00 state A non-preferred supports tolusna
Dec 8 14:18:23 bootp-73-75-161 kernel: scsi 363:0:1:1: alua: Attached
Dec 8 14:18:23 bootp-73-75-161 kernel: sd 363:0:1:1: Attached scsi generic sg12 type 0
Dec 8 14:18:23 bootp-73-75-161 kernel: sd 363:0:1:1: [sdm] 41943040 512-byte logical blocks: (21.4 GB/20.0 GiB)
Dec 8 14:18:23 bootp-73-75-161 kernel: sd 363:0:1:1: [sdm] Write Protect is off
Dec 8 14:18:23 bootp-73-75-161 kernel: sd 363:0:1:1: [sdm] Write cache: disabled, read cache: enabled, supports DPO and FUA
Dec 8 14:18:23 bootp-73-75-161 kernel: sd 363:0:1:1: [sdm] Attached SCSI disk
Dec 8 14:18:23 bootp-73-75-161 kernel: sd 363:0:0:0: [sdj] Attached SCSI disk
Dec 8 14:18:23 bootp-73-75-161 kernel: sd 363:0:1:0: [sdl] Attached SCSI disk
Dec 8 14:18:23 bootp-73-75-161 multipathd: sdm: add path (uevent)
Dec 8 14:18:23 bootp-73-75-161 multipathd: mpathc: load table [0 41943040 multipath 1 queue_if_no_path 0 2 1 service-time 0 2 1 8:112 1 8:192 1 service-time 0 2 1 8:80 1 8:160 1]
Dec 8 14:18:23 bootp-73-75-161 multipathd: sdm [8:192]: path added to devmap mpathc
Dec 8 14:18:23 bootp-73-75-161 multipathd: sdl: add path (uevent)
Dec 8 14:18:23 bootp-73-75-161 multipathd: mpathh: load table [0 41943040 multipath 1 queue_if_no_path 0 1 1 service-time 0 1 1 8:176 1]
Dec 8 14:18:23 bootp-73-75-161 multipathd: mpathh: event checker started
Dec 8 14:18:23 bootp-73-75-161 multipathd: sdl [8:176]: path added to devmap mpathh
Dec 8 14:18:23 bootp-73-75-161 multipathd: sdj: add path (uevent)
Dec 8 14:18:23 bootp-73-75-161 multipathd: mpathh: load table [0 41943040 multipath 1 queue_if_no_path 0 2 1 service-time 0 1 1 8:176 1 service-time 0 1 1 8:144 1]
Dec 8 14:18:27 bootp-73-75-161 dhclient[3848]: DHCPDISCOVER on virbr0 to 255.255.255.255 port 67 interval 12 (xid=0x5b8467e6)
Dec 8 14:18:28 bootp-73-75-161 kernel: sd 363:0:0:0: alua: Detached
Dec 8 14:18:28 bootp-73-75-161 multipathd: sdj: remove path (uevent)
Dec 8 14:18:28 bootp-73-75-161 multipathd: mpathh: load table [0 41943040 multipath 1 queue_if_no_path 0 1 1 service-time 0 1 1 8:176 1]
Dec 8 14:18:28 bootp-73-75-161 multipathd: sdj [8:144]: path removed from map mpathh
Dec 8 14:18:28 bootp-73-75-161 kernel: sd 363:0:1:0: [sdl] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Dec 8 14:18:28 bootp-73-75-161 kernel: sd 363:0:1:0: [sdl] tag#0 CDB: Read(10) 28 00 02 7f ff 80 00 00 08 00
Dec 8 14:18:28 bootp-73-75-161 kernel: blk_update_request: I/O error, dev sdl, sector 41942912
Dec 8 14:18:28 bootp-73-75-161 kernel: device-mapper: multipath: Failing path 8:176.
Dec 8 14:18:28 bootp-73-75-161 multipathd: sdl: mark as failed
Dec 8 14:18:28 bootp-73-75-161 multipathd: mpathh: remaining active paths: 0
Dec 8 14:18:28 bootp-73-75-161 kernel: sd 363:0:1:0: alua: Detached
Dec 8 14:18:28 bootp-73-75-161 multipathd: sdl: remove path (uevent)
Dec 8 14:18:28 bootp-73-75-161 multipathd: mpathh: map in use
Dec 8 14:18:28 bootp-73-75-161 multipathd: mpathh: can't flush
Dec 8 14:18:28 bootp-73-75-161 multipathd: mpathh: load table [0 41943040 multipath 1 queue_if_no_path 0 0 0]
Dec 8 14:18:28 bootp-73-75-161 multipathd: sdl [8:176]: path removed from map mpathh
Dec 8 14:18:28 bootp-73-75-161 kernel: sd 363:0:0:1: alua: Detached
Dec 8 14:18:28 bootp-73-75-161 multipathd: sdk: remove path (uevent)
Dec 8 14:18:28 bootp-73-75-161 multipathd: mpathc: load table [0 41943040 multipath 1 queue_if_no_path 0 2 1 service-time 0 2 1 8:112 1 8:192 1 service-time 0 1 1 8:80 1]
Dec 8 14:18:28 bootp-73-75-161 multipathd: sdk [8:160]: path removed from map mpathc
Dec 8 14:18:28 bootp-73-75-161 kernel: sd 363:0:1:1: [sdm] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Dec 8 14:18:28 bootp-73-75-161 kernel: sd 363:0:1:1: [sdm] tag#0 CDB: Read(10) 28 00 02 7f ff f0 00 00 08 00
Dec 8 14:18:28 bootp-73-75-161 kernel: blk_update_request: I/O error, dev sdm, sector 41943024
Dec 8 14:18:28 bootp-73-75-161 kernel: device-mapper: multipath: Failing path 8:192.
Dec 8 14:18:28 bootp-73-75-161 multipathd: sdm: mark as failed
Dec 8 14:18:28 bootp-73-75-161 multipathd: mpathc: remaining active paths: 2
Dec 8 14:18:28 bootp-73-75-161 kernel: sd 363:0:1:1: alua: Detached
Dec 8 14:18:28 bootp-73-75-161 multipathd: sdm: remove path (uevent)
Dec 8 14:18:28 bootp-73-75-161 multipathd: mpathc: load table [0 41943040 multipath 1 queue_if_no_path 0 2 1 service-time 0 1 1 8:112 1 service-time 0 1 1 8:80 1]
Dec 8 14:18:28 bootp-73-75-161 multipathd: sdm [8:192]: path removed from map mpathc
Dec 8 14:18:28 bootp-73-75-161 kernel: lpfc 0000:20:00.0: 0:(1):1828 Vport Deleted.
Dec 8 14:18:39 bootp-73-75-161 dhclient[3848]: DHCPDISCOVER on virbr0 to 255.255.255.255 port 67 interval 13 (xid=0x5b8467e6)

So here is the important part of the messages:
Dec 8 14:18:28 bootp-73-75-161 multipathd: sdl: remove path (uevent)
Dec 8 14:18:28 bootp-73-75-161 multipathd: mpathh: map in use
Dec 8 14:18:28 bootp-73-75-161 multipathd: mpathh: can't flush
Dec 8 14:18:28 bootp-73-75-161 multipathd: mpathh: load table [0 41943040 multipath 1 queue_if_no_path 0 0 0]
Dec 8 14:18:28 bootp-73-75-161 multipathd: sdl [8:176]: path removed from map mpathh
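As a rough sketch, messages like these can be picked out of a saved log with a small shell filter. The function name `maps_in_use` is hypothetical, and the pattern assumes the syslog message format shown in the excerpt above ("multipathd: <map>: map in use"); other multipathd versions may word it differently.

```shell
#!/bin/bash
# Hypothetical helper: read syslog-style text on stdin and print the name of
# every multipath map that multipathd reported as "map in use".
# Assumes lines of the form: "<date> <host> multipathd: <map>: map in use".
maps_in_use() {
    awk '/multipathd: [^ ]+: map in use$/ {
             name = $(NF - 3)        # the field just before "map in use"
             sub(/:$/, "", name)     # drop the trailing colon
             print name
         }' | sort -u
}
```

Run as `maps_in_use < /var/log/messages`, this should list each map that could not be flushed because something still had it open.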
Something clearly has mpathh open at this point in time. Because of this, it is not possible to delete it, so multipathd loads a table with no devices. The default configuration for your array is set to queue forever when all paths are down. This doesn't make much sense when the devices have actually disappeared.
You can override this by setting "flush_on_last_del" in the defaults section of multipath.conf:
defaults {
        flush_on_last_del yes
}
This will turn off queuing when the last path has been deleted. If whatever was holding the device open closes it upon receiving an error, this may cause the device to be deleted (it depends on whether whoever has it open closes it before multipathd tries to remove the device). If whatever is holding the device open doesn't close it, the multipath device will remain. It will immediately fail back IO sent to it, however.
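To get a first idea of what is holding a map open, one option is to look at the kernel's sysfs `holders` directory for the dm node. The following is a minimal sketch: `dm_holders` and its second argument are hypothetical names used for illustration, and it only reports block devices stacked on the map (for example LVM volumes or partitions), not processes that have the device node open.

```shell
#!/bin/bash
# Hypothetical helper: list block devices stacked on top of a device-mapper
# node by reading the sysfs "holders" directory.
#   $1  kernel name of the dm node (the dm-N shown by `multipath -l`)
#   $2  sysfs root; defaults to /sys (overridable only to ease testing)
dm_holders() {
    local dm=$1 sysfs=${2:-/sys}
    local dir="$sysfs/block/$dm/holders"
    if [ ! -d "$dir" ]; then
        echo "no such device: $dm" >&2
        return 1
    fi
    ls -1 "$dir"
}
```

If `dm_holders dm-N` prints nothing, the opener is probably a process rather than a stacked device. In that case `lsof` or `fuser` run against the /dev/mapper node may identify it, and `dmsetup info -c -o name,open` shows the current open count for each map.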
multipathd also has the option to schedule deferred removal of the device. This means that whenever the last opener does close the device, it will get deleted, and nobody else will be able to open the device in the mean time. If a path is restored before the last opener closes the device, the deferred remove is canceled.
You can set this by adding
defaults {
        deferred_remove yes
}
to /etc/multipath.conf. These configuration changes should fix your issue. Otherwise, there is nothing that multipath can do here, since it can't remove an in-use device, and the device is configured to queue IO forever if there are no usable paths. Another possibility is to change your device configuration to not queue IO forever.
If you think I've closed this bug in error, please reopen it with your reasoning.
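To spot maps that are in this state (queueing enabled, no paths left), the `multipath -l` output can be filtered. This is only a sketch under stated assumptions: the function name `queueing_maps_without_paths` is hypothetical, and the parsing assumes the output layout shown earlier in this report, with user-friendly names followed by the WWID in parentheses.

```shell
#!/bin/bash
# Hypothetical helper: read `multipath -l` output on stdin and print each map
# that has queue_if_no_path in its features but no path lines beneath it.
queueing_maps_without_paths() {
    awk '
        function report() {
            if (name != "" && queueing && paths == 0) print name
        }
        # Map header, e.g. "mpathg (36005...) dm-6 IBM ,2145"
        /^[^ |`]+ \([0-9a-f]+\) dm-[0-9]+/ {
            report(); name = $1; queueing = 0; paths = 0; next
        }
        # Features line of the current map
        /queue_if_no_path/ { queueing = 1 }
        # Path line: SCSI address (H:C:T:L) followed by an sd device name
        /[0-9]+:[0-9]+:[0-9]+:[0-9]+ sd[a-z]+ / { paths++ }
        END { report() }
    '
}
```

For the second listing in this report, `multipath -l | queueing_maps_without_paths` would be expected to print only mpathg, since every other map still has at least one path.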