Created attachment 1205384 [details]
rbd map worked and multipath -ll shows the devices

Description of problem:
In a multi-gw configuration, with device-mapper-multipath providing the path
layer to LIO - when a node is killed during active I/O, some devices report
as busy after the node restarts. Since the device is in a busy state, it
cannot be accessed by LIO and therefore the client is never able to recover
its path(s).

Version-Release number of selected component (if applicable):
RHEL 7.3 beta - 3.10.0-506.el7.x86_64
device-mapper-multipath-0.4.9-99.el7.x86_64

Also I'm using the rbd-target-gw script in the following hierarchy:
rbd-target-gw -> rbdmap -> target

How reproducible:
3 tests - all with the same outcome

Steps to Reproduce:
1. Create a 2-gateway configuration, with a LUN exported to a Windows 2012 client
2. Connect the Windows client to both gateways with the iSCSI initiator
3. Run iometer on the Windows box; a 100% random read workload is fine
4. Determine the gateway which is active for this LUN, then power off that gateway, forcing the alternate node to be accessed
5. Restart the gateway
6. Check that the device is added back to LIO and is available to the Windows client

Actual results:
systemctl status target shows the issue when attempting to add the LUN to LIO, i.e.

Sep 28 15:55:49 ceph-1.test.lab systemd[1]: Starting Restore LIO kernel target configuration...
Sep 28 15:55:49 ceph-1.test.lab target[2777]: Could not create StorageObject ansible3: Cannot configure StorageObject because device /dev/mapper/0-7a924515f007c is...e, skipped
Sep 28 15:55:49 ceph-1.test.lab target[2777]: Could not find matching StorageObject for LUN 2, skipped
Sep 28 15:55:49 ceph-1.test.lab target[2777]: Could not find matching StorageObject for LUN 2, skipped
Sep 28 15:55:49 ceph-1.test.lab target[2777]: Could not find matching TPG LUN 2 for MappedLUN 0, skipped

Expected results:
The restarted node should be able to access all prior LUNs, and LIO should be restored to its prior state.

Additional info:
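For anyone reproducing this, the busy state can be confirmed on the gateway with standard device-mapper tooling before retrying the restore. A sketch (the dm device name and rbd2 are taken from the log above):

# "Open count" > 0 in the output means LIO or some other user still holds the map
dmsetup info /dev/mapper/0-7a924515f007c

# list the kernel holders of the underlying rbd device
ls /sys/block/rbd2/holders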
I removed the device with multipath -f, unmapped it, then ran rbd map rbd/ansible3 -o noshare.

multipath shows the device:

[root@ceph-1 system]# multipath -ll
0-7a924515f007c dm-4 Ceph,RBD
size=30G features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  `- #:#:#:# rbd2 251:32 active ready running
0-7ab55515f007c dm-6 Ceph,RBD
size=50G features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  `- #:#:#:# rbd3 251:48 active ready running
0-7aafe79e2a9e3 dm-3 Ceph,RBD
size=15G features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  `- #:#:#:# rbd1 251:16 active ready running
0-937d7515f007c dm-2 Ceph,RBD
size=30G features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  `- #:#:#:# rbd0 251:0 active ready running

However, a targetctl restore still fails with the same message.

Checking osd state:

[root@ceph-1 system]# ceph osd blacklist ls
listed 0 entries
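Since the blacklist is empty, another thing worth checking is whether a stale watcher or advisory lock is still registered against the image. A sketch (rbd/ansible3 is the image named above; assumes a Jewel-era rbd CLI):

rbd status rbd/ansible3       # lists watchers registered on the image header
rbd lock list rbd/ansible3    # lists any advisory locks held on the image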
Created attachment 1205676 [details]
syslog from the node - ceph-1
Added syslog from the node encountering these issues. In this case the device that LIO wants to add is rbd2, which is rbd/ansible3.
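The device-to-image mapping can be double-checked via the rbd sysfs attributes. A sketch (device id 2 corresponds to /dev/rbd2):

rbd showmapped                    # columns: id pool image snap device
cat /sys/bus/rbd/devices/2/pool   # expect: rbd
cat /sys/bus/rbd/devices/2/name   # expect: ansible3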
More info, but no solution.

It appears that lvm is accepting the dm devices created for the rbds. I added this to lvm.conf:

global_filter = [ "r|^/dev/mapper/[0-255]-.*|" ]

Now lvmdiskscan does NOT show the /dev/mapper/0-<bla> devices.

I also noticed this in the boot log:

Sep 30 15:48:24 ceph-1.test.lab systemd-udevd[2623]: inotify_add_watch(7, /dev/rbd2p1, 10) failed: No such file or directory

So I updated the udev rules in /lib/udev/rules.d/60-persistent-storage.rules directly (just to test!) to exclude rbd devices, and now the inotify watch is resolved...

Sep 30 15:50:53 ceph-1.test.lab kernel: rbd2: p1
Sep 30 15:50:53 ceph-1.test.lab kernel: rbd: rbd2: capacity 32212254720 features 0x5
Sep 30 15:50:53 ceph-1.test.lab multipathd[513]: rbd2: add path (uevent)
Sep 30 15:50:53 ceph-1.test.lab multipathd[513]: rbd2: HDIO_GETGEO failed with 25
Sep 30 15:50:53 ceph-1.test.lab rbdmap[2502]: Mapped 'rbd/ansible3' to '/dev/rbd2'
Sep 30 15:50:53 ceph-1.test.lab multipathd[513]: 0-7a924515f007c: load table [0 62914560 multipath 0 0 1 1 service-time 0 1 1 251:32 1]
Sep 30 15:50:53 ceph-1.test.lab multipathd[513]: 0-7a924515f007c: event checker started
Sep 30 15:50:53 ceph-1.test.lab multipathd[513]: rbd2 [251:32]: path added to devmap 0-7a924515f007c

However, the problem remains: a device exported and initialised by a client is getting locked on the gateway, preventing it from being used by LIO.
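Two caveats on the workarounds above. First, in the lvm.conf filter, [0-255] is a regex character class matching only the single characters 0, 1, 2 and 5, not the range 0-255; a numeric quantifier expresses the intent better. A sketch:

global_filter = [ "r|^/dev/mapper/[0-9]+-|" ]

Second, rather than editing the shipped 60-persistent-storage.rules, the inotify watch can be disabled for rbd devices from a local rule file that survives package updates. A sketch (the file name is hypothetical):

# /etc/udev/rules.d/99-rbd-nowatch.rules (hypothetical)
# drop the udev inotify watch on rbd block devices
ACTION=="add|change", KERNEL=="rbd*", OPTIONS+="nowatch"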