Description of problem:
I've been running some tests using a Dell MD3000i connected to a server via iSCSI, using kernels provided by Don Zickus which include the MD3000i in the RDAC driver. A couple of times now I have deleted a virtual disk on the array, flushed the multipath map, created a new disk and then attempted to reload the device map. At this point, a kpartx process becomes stuck:

6626 ? S< 0:00 /sbin/dmsetup ls --target multipath --exec /sbin/kpartx -a -p p -j 253 -m 0
6627 ? D< 0:00 /sbin/kpartx -a -p p /dev/mapper/mpath3

and there are lots of messages like these:

Jul 2 16:16:36 marsantes multipathd: dm-0: add map (uevent)
Jul 2 16:16:36 marsantes multipathd: mpath3: event checker started
Jul 2 16:16:37 marsantes kernel: sd 7:0:0:1: queueing MODE_SELECT command.
Jul 2 16:16:37 marsantes kernel: sd 5:0:0:1: queueing MODE_SELECT command.
Jul 2 16:16:38 marsantes kernel: sd 5:0:0:1: retrying MODE_SELECT command.
Jul 2 16:16:38 marsantes kernel: sd 4:0:0:1: queueing MODE_SELECT command.
Jul 2 16:16:38 marsantes kernel: sd 4:0:0:1: retrying MODE_SELECT command.
Jul 2 16:16:38 marsantes kernel: sd 6:0:0:1: queueing MODE_SELECT command.
Jul 2 16:16:39 marsantes kernel: sd 6:0:0:1: retrying MODE_SELECT command.
Jul 2 16:16:39 marsantes kernel: end_request: I/O error, dev sdb, sector 0
Jul 2 16:16:39 marsantes kernel: device-mapper: multipath: Failing path 8:16.
Jul 2 16:16:39 marsantes multipathd: 8:16: mark as failed
Jul 2 16:16:39 marsantes multipathd: mpath3: remaining active paths: 3
Jul 2 16:16:39 marsantes multipathd: dm-0: add map (uevent)
Jul 2 16:16:39 marsantes multipathd: dm-0: devmap already registered
Jul 2 16:16:39 marsantes kernel: end_request: I/O error, dev sdc, sector 0
Jul 2 16:16:39 marsantes kernel: device-mapper: multipath: Failing path 8:32.
Jul 2 16:16:39 marsantes multipathd: dm-0: add map (uevent)
Jul 2 16:16:39 marsantes multipathd: dm-0: devmap already registered
Jul 2 16:16:39 marsantes multipathd: dm-0: add map (uevent)
Jul 2 16:16:39 marsantes kernel: end_request: I/O error, dev sdd, sector 0
Jul 2 16:16:39 marsantes kernel: device-mapper: multipath: Failing path 8:48.
Jul 2 16:16:39 marsantes multipathd: dm-0: devmap already registered
Jul 2 16:16:39 marsantes kernel: end_request: I/O error, dev sdf, sector 0
Jul 2 16:16:39 marsantes kernel: device-mapper: multipath: Failing path 8:80.
Jul 2 16:16:39 marsantes multipathd: dm-0: add map (uevent)
Jul 2 16:16:39 marsantes multipathd: dm-0: devmap already registered
Jul 2 16:16:40 marsantes multipathd: 8:32: mark as failed
Jul 2 16:16:40 marsantes multipathd: mpath3: remaining active paths: 2
Jul 2 16:16:40 marsantes multipathd: 8:48: mark as failed
Jul 2 16:16:40 marsantes multipathd: mpath3: remaining active paths: 1
Jul 2 16:16:40 marsantes multipathd: 8:80: mark as failed
Jul 2 16:16:40 marsantes multipathd: mpath3: remaining active paths: 0
Jul 2 16:16:43 marsantes multipathd: 8:16: reinstated
Jul 2 16:16:43 marsantes multipathd: mpath3: remaining active paths: 1
Jul 2 16:16:43 marsantes kernel: sd 7:0:0:1: queueing MODE_SELECT command.
Jul 2 16:16:43 marsantes multipathd: dm-0: add map (uevent)
Jul 2 16:16:43 marsantes multipathd: dm-0: devmap already registered
Jul 2 16:16:43 marsantes kernel: sd 5:0:0:1: queueing MODE_SELECT command.
Jul 2 16:16:43 marsantes kernel: sd 5:0:0:1: retrying MODE_SELECT command.
Jul 2 16:16:43 marsantes kernel: sd 4:0:0:1: queueing MODE_SELECT command.
Jul 2 16:16:44 marsantes multipathd: 8:32: reinstated
Jul 2 16:16:44 marsantes multipathd: mpath3: remaining active paths: 2
Jul 2 16:16:44 marsantes multipathd: 8:48: reinstated
Jul 2 16:16:44 marsantes multipathd: mpath3: remaining active paths: 3
Jul 2 16:16:44 marsantes multipathd: 8:80: reinstated
Jul 2 16:16:44 marsantes multipathd: mpath3: remaining active paths: 4
Jul 2 16:16:44 marsantes multipathd: dm-0: add map (uevent)
Jul 2 16:16:44 marsantes multipathd: dm-0: devmap already registered
Jul 2 16:16:44 marsantes multipathd: dm-0: add map (uevent)
Jul 2 16:16:44 marsantes multipathd: dm-0: devmap already registered
Jul 2 16:16:44 marsantes multipathd: dm-0: add map (uevent)
Jul 2 16:16:44 marsantes multipathd: dm-0: devmap already registered
Jul 2 16:16:44 marsantes kernel: sd 4:0:0:1: retrying MODE_SELECT command.
Jul 2 16:16:44 marsantes kernel: sd 6:0:0:1: queueing MODE_SELECT command.
Jul 2 16:16:44 marsantes kernel: sd 6:0:0:1: retrying MODE_SELECT command.
Jul 2 16:16:44 marsantes kernel: sd 5:0:0:1: queueing MODE_SELECT command.
Jul 2 16:16:45 marsantes kernel: sd 5:0:0:1: retrying MODE_SELECT command.
Jul 2 16:16:45 marsantes kernel: sd 4:0:0:1: queueing MODE_SELECT command.
Jul 2 16:16:46 marsantes kernel: sd 4:0:0:1: retrying MODE_SELECT command.
Jul 2 16:16:46 marsantes kernel: end_request: I/O error, dev sdb, sector 0
Jul 2 16:16:46 marsantes kernel: device-mapper: multipath: Failing path 8:16.
Jul 2 16:16:46 marsantes kernel: end_request: I/O error, dev sdb, sector 8
Jul 2 16:16:39 marsantes kernel: device-mapper: multipath: Failing path 8:16.
Jul 2 16:16:39 marsantes multipathd: 8:16: mark as failed
Jul 2 16:16:39 marsantes multipathd: mpath3: remaining active paths: 3
Jul 2 16:16:39 marsantes multipathd: dm-0: add map (uevent)
Jul 2 16:16:39 marsantes multipathd: dm-0: devmap already registered
Jul 2 16:16:39 marsantes kernel: end_request: I/O error, dev sdc, sector 0
Jul 2 16:16:39 marsantes kernel: device-mapper: multipath: Failing path 8:32.
Jul 2 16:16:39 marsantes multipathd: dm-0: add map (uevent)
Jul 2 16:16:39 marsantes multipathd: dm-0: devmap already registered
Jul 2 16:16:39 marsantes multipathd: dm-0: add map (uevent)
Jul 2 16:16:39 marsantes kernel: end_request: I/O error, dev sdd, sector 0
Jul 2 16:16:39 marsantes kernel: device-mapper: multipath: Failing path 8:48.
Jul 2 16:16:39 marsantes multipathd: dm-0: devmap already registered
Jul 2 16:16:39 marsantes kernel: end_request: I/O error, dev sdf, sector 0
Jul 2 16:16:39 marsantes kernel: device-mapper: multipath: Failing path 8:80.
Jul 2 16:16:39 marsantes multipathd: dm-0: add map (uevent)
Jul 2 16:16:39 marsantes multipathd: dm-0: devmap already registered
Jul 2 16:16:40 marsantes multipathd: 8:32: mark as failed
Jul 2 16:16:40 marsantes multipathd: mpath3: remaining active paths: 2
Jul 2 16:16:40 marsantes multipathd: 8:48: mark as failed
Jul 2 16:16:40 marsantes multipathd: mpath3: remaining active paths: 1
Jul 2 16:16:40 marsantes multipathd: 8:80: mark as failed
Jul 2 16:16:40 marsantes multipathd: mpath3: remaining active paths: 0
Jul 2 16:16:43 marsantes multipathd: 8:16: reinstated
Jul 2 16:16:43 marsantes multipathd: mpath3: remaining active paths: 1

running continuously and the only remedy is a reboot. I've tried commenting out the line in the udev rules as mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=497041 but that hasn't had any effect.
Version-Release number of selected component (if applicable):
device-mapper-1.02.28-2.el5
device-mapper-multipath-0.4.7-23.el5_3.4
kpartx-0.4.7-23.el5_3.4

How reproducible:

Steps to Reproduce:
1. delete a disk on the MD3000i
2. create a disk on the MD3000i
3.

Actual results:

Expected results:

Additional info:
Did you have iscsid rescan the array (from the console: iscsiadm -m node -R)? You have to make iscsid update its device nodes before you reload multipath. Finally, I must say multipath seems to go berserk when a device is removed while it has commands in the queue. Perhaps someone should look into that.
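For illustration, the order suggested above (rescan the iSCSI sessions before reloading multipath) could be scripted roughly as follows. This is only a sketch under assumptions: the map name mpath3 is taken from the report, the run and refresh_map helpers are hypothetical names, and it is meant to be run after the disk has been changed on the array. It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch: flush the stale map, rescan iSCSI, then rebuild the multipath maps.
# MAP and DRY_RUN are assumptions for illustration; set DRY_RUN=0 to execute.
MAP="${MAP:-mpath3}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

refresh_map() {
    # 1. Flush the multipath map that still points at the deleted disk.
    run multipath -f "$MAP" || return 1
    # 2. Rescan all logged-in iSCSI sessions so the device nodes are updated.
    run iscsiadm -m node -R || return 1
    # 3. Only now rebuild the multipath maps.
    run multipath -v2
}

# Usage: call refresh_map once the virtual disk has been deleted/created.
```

Stopping at the first failing step keeps multipath from being reloaded against device nodes that were never refreshed.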
Could you also run udevmonitor, to see if the kernel is really throwing out all those uevents?
Created attachment 358541
Udevmonitor dump

Udevmonitor dump while removing and re-adding a disk over iscsi.
Created attachment 358542
Message log

Partial message log while removing and re-adding a disk over iscsi.
It seems much happier now, running kernel 2.6.18-164.el5. I deleted a virtual disk yesterday, then added a new one. When I rescanned the array and then rebuilt the multipath device map, the new disk appeared and there was no kpartx hang. I can't play around with this particular device any more as it's going into production. However, another one will be installed fairly soon and I can run more testing on that.
Above is the requested udevmonitor dump. Please note that the addition and removal of the disk is done at the iscsi target side. I also noticed that iscsid did not remove the disk device nodes (in use by multipath?). Then again, I should check if iscsi still properly does that when not using multipath. Finally, toying around with the iscsi disks really screwed up multipath again:

36001c23000dd034d000009ca4795fe16 dm-6 DELL,MD3000i
[size=1.0G][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=100][enabled]
 \_ 2:0:0:10 sdj 8:144 [active][ready]
\_ round-robin 0 [prio=100][enabled]
 \_ 5:0:0:10 sdk 8:160 [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 3:0:0:10 sdr 65:16 [active][ghost]
\_ round-robin 0 [prio=0][enabled]
 \_ 4:0:0:10 sds 65:32 [active][ghost]
36001c23000dd030e000007534795b6af dm-4 DELL,MD3000i
[size=50G][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=0][enabled]
 \_ 2:0:0:7 sdf 8:80 [active][ghost]
\_ round-robin 0 [prio=0][enabled]
 \_ 5:0:0:7 sdg 8:96 [active][ghost]
\_ round-robin 0 [prio=100][active]
 \_ 3:0:0:7 sdn 8:208 [active][ready]
\_ round-robin 0 [prio=100][enabled]
 \_ 4:0:0:7 sdo 8:224 [active][ready]
1_ dm-19 DELL,MD3000i
[size=1.0G][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=0][enabled]
 \_ 2:0:0:1 sdb 8:16 [active][ghost]
\_ round-robin 0 [prio=0][enabled]
 \_ 5:0:0:1 sdc 8:32 [active][ghost]
\_ round-robin 0 [prio=0][enabled]
 \_ 3:0:0:1 sdd 8:48 [active][ghost]
\_ round-robin 0 [prio=0][enabled]
 \_ 4:0:0:1 sde 8:64 [active][ghost]
\_ round-robin 0 [prio=100][active]
 \_ 2:0:0:10 sdj 8:144 [active][ready]
\_ round-robin 0 [prio=100][enabled]
 \_ 5:0:0:10 sdk 8:160 [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 3:0:0:10 sdr 65:16 [active][ghost]
\_ round-robin 0 [prio=0][enabled]
 \_ 4:0:0:10 sds 65:32 [active][ghost]
36001c23000dd034d0000098a4795ecb8 dm-5 DELL,MD3000i
[size=50G][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=100][active]
 \_ 2:0:0:8 sdh 8:112 [active][ready]
\_ round-robin 0 [prio=100][enabled]
 \_ 5:0:0:8 sdi 8:128 [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 3:0:0:8 sdp 8:240 [active][ghost]
\_ round-robin 0 [prio=0][enabled]
 \_ 4:0:0:8 sdq 65:0 [active][ghost]

Note the designation "1_ dm-19 DELL,MD3000i". The paths of 2 disks that have been removed are grouped there. After re-adding one of those disks, "36001c23000dd034d000009ca4795fe16 dm-6 DELL,MD3000i" appears using the same paths as before. So now they're listed twice by multipath.

PS. Maybe worth another bug report or a manual change for the md3000i (from redhat): I don't like the fact that the kernel tries to read the iscsi disk partition tables, as the specific path might not be accessible. It causes a lot of read errors and, I assume, slows down boot dramatically.
Having a device change wwids can really mess with multipath. It's quite possible that some of the recent iscsi changes have fixed this. Are you able to reproduce this on a recent version?
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.
I've changed jobs since I reported this and no longer have access to this hardware.