Description of problem:
-----------------------
Configured an iSCSI gateway on one of the OSD nodes - basically a co-located iSCSI configuration. After a planned reboot of the OSD node, the OSD disks are not auto-mounted.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
10.2.3-10

How reproducible:
-----------------
Always

Steps to Reproduce:
1. Configure iSCSI on a 3:1 gateway setup (3 co-located and 1 dedicated).
2. Mount the created iSCSI LUNs on a KVM host and create a few VMs.
3. While performing multipath tests by failing a path (gateway node), reboot one of the co-located OSD nodes; the OSDs fail to come back after the reboot.

Actual results:
---------------
OSDs do not come up after the reboot.

Expected results:
-----------------
OSDs should be auto-mounted after the reboot.

Additional info:
-----------------
2016-10-27 15:45:51.248911 7fb4b5e31700 10 -- 10.8.128.5:6801/27438 reaper deleted pipe 0x7fb4cbd27400
2016-10-27 15:45:51.248915 7fb4b5e31700 10 -- 10.8.128.5:6801/27438 reaper done
2016-10-27 15:45:51.248918 7fb4b5e31700 10 -- 10.8.128.5:6801/27438 reaper_entry done
2016-10-27 15:45:51.248959 7fb4bf8c5800 20 -- 10.8.128.5:6801/27438 wait: stopped reaper thread
2016-10-27 15:45:51.248972 7fb4bf8c5800 10 -- 10.8.128.5:6801/27438 wait: closing pipes
2016-10-27 15:45:51.248977 7fb4bf8c5800 10 -- 10.8.128.5:6801/27438 reaper
2016-10-27 15:45:51.248983 7fb4bf8c5800 10 -- 10.8.128.5:6801/27438 reaper done
2016-10-27 15:45:51.248987 7fb4bf8c5800 10 -- 10.8.128.5:6801/27438 wait: waiting for pipes to close
2016-10-27 15:45:51.248992 7fb4bf8c5800 10 -- 10.8.128.5:6801/27438 wait: done.
2016-10-27 15:45:51.248999 7fb4bf8c5800  1 -- 10.8.128.5:6801/27438 shutdown complete.
2016-10-27 15:45:51.249004 7fb4bf8c5800 10 -- :/27438 wait: waiting for dispatch queue
2016-10-27 15:45:51.249037 7fb4bf8c5800 10 -- :/27438 wait: dispatch queue is stopped
2016-10-27 15:45:51.249046 7fb4bf8c5800 20 -- :/27438 wait: stopping reaper thread
2016-10-27 15:45:51.249062 7fb4b5630700 10 -- :/27438 reaper_entry done
2016-10-27 15:45:51.249131 7fb4bf8c5800 20 -- :/27438 wait: stopped reaper thread
2016-10-27 15:45:51.249140 7fb4bf8c5800 10 -- :/27438 wait: closing pipes
2016-10-27 15:45:51.249142 7fb4bf8c5800 10 -- :/27438 reaper
2016-10-27 15:45:51.249143 7fb4bf8c5800 10 -- :/27438 reaper done
2016-10-27 15:45:51.249144 7fb4bf8c5800 10 -- :/27438 wait: waiting for pipes to close
2016-10-27 15:45:51.249146 7fb4bf8c5800 10 -- :/27438 wait: done.
2016-10-27 15:45:51.249147 7fb4bf8c5800  1 -- :/27438 shutdown complete.
2016-10-27 15:52:38.373193 7f5ac1d70800  0 set uid:gid to 167:167 (ceph:ceph)
2016-10-27 15:52:38.373213 7f5ac1d70800  0 ceph version 10.2.3-10.el7cp (1829b6c4f0010d6aba2cd51cc6c23f74c2e189e0), process ceph-osd, pid 2539
2016-10-27 15:52:38.373478 7f5ac1d70800 -1 ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-1: (2) No such file or directory
2016-10-27 15:52:38.762645 7f6d4368c800  0 set uid:gid to 167:167 (ceph:ceph)
2016-10-27 15:52:38.762665 7f6d4368c800  0 ceph version 10.2.3-10.el7cp (1829b6c4f0010d6aba2cd51cc6c23f74c2e189e0), process ceph-osd, pid 2731
2016-10-27 15:52:38.762824 7f6d4368c800 -1 ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-1: (2) No such file or directory
2016-10-27 15:52:39.260364 7fc77af05800  0 set uid:gid to 167:167 (ceph:ceph)
2016-10-27 15:52:39.260381 7fc77af05800  0 ceph version 10.2.3-10.el7cp (1829b6c4f0010d6aba2cd51cc6c23f74c2e189e0), process ceph-osd, pid 2904
2016-10-27 15:52:39.260575 7fc77af05800 -1 ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-1: (2) No such file or directory
------------------------------------------------
The mount output does not show the OSD disks. Mounting manually fails: mount reports the device as busy or already mounted, although it is not.

[root@XXX005 ceph-1]# mount /dev/sdb /var/lib/ceph/osd/ceph-1/
mount: /dev/sdb is already mounted or /var/lib/ceph/osd/ceph-1 busy

[root@XXX005 ceph-1]# mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime,seclabel)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,seclabel,size=16363800k,nr_inodes=4090950,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,seclabel)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,seclabel,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,seclabel,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_prio,net_cls)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
configfs on /sys/kernel/config type configfs (rw,relatime)
/dev/sda1 on / type ext4 (rw,relatime,seclabel,data=ordered)
selinuxfs on /sys/fs/selinux type selinuxfs (rw,relatime)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=32,pgrp=1,timeout=300,minproto=5,maxproto=5,direct)
mqueue on /dev/mqueue type mqueue (rw,relatime,seclabel)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,seclabel)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
nfsd on /proc/fs/nfsd type nfsd (rw,relatime)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,seclabel,size=3274680k,mode=700)
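A quick way to confirm that something else is still holding the disk is to check for device-mapper holders (a hypothetical set of diagnostics, not from the original report; /dev/sdb is the OSD data disk in this setup):

   lsblk /dev/sdb                # shows whether a dm-* device sits on top of sdb
   ls /sys/block/sdb/holders/    # any dm-N listed here is a device-mapper map claiming sdb
   dmsetup ls                    # lists the device-mapper maps themselves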
multipathd claimed the sdb, sdc, and sdd devices and prevented them from being used directly:

# multipath -ll
Hitachi_HUA722010CLA330_JPW9M0N20D247E dm-1 ATA ,Hitachi HUA72201
size=932G features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  `- 1:0:0:0 sdb 8:16 active ready running
Hitachi_HUA722010CLA330_JPW9J0N20BMZHC dm-2 ATA ,Hitachi HUA72201
size=932G features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  `- 3:0:0:0 sdd 8:48 active ready running
Hitachi_HUA722010CLA330_JPW9M0N20D268E dm-0 ATA ,Hitachi HUA72201
size=932G features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  `- 2:0:0:0 sdc 8:32 active ready running

Either these disks need to be blacklisted from multipath, or the ceph-disk systemd activation needs to use the multipath device.
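For reference, the first option could look roughly like this on the affected node (a sketch only, using the map names from the multipath -ll output above; the devnode pattern is an assumption and must be adjusted so it matches only the OSD disks):

   # flush the existing maps so the underlying disks can be mounted again
   multipath -f Hitachi_HUA722010CLA330_JPW9M0N20D247E
   multipath -f Hitachi_HUA722010CLA330_JPW9J0N20BMZHC
   multipath -f Hitachi_HUA722010CLA330_JPW9M0N20D268E

   # keep multipathd from claiming them again, e.g. in /etc/multipath.conf:
   blacklist {
           devnode "^sd[b-d]$"
   }

   # then reload the daemon: systemctl reload multipathd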
To be completely safe with respect to user preferences, in case they are using dm-multipath for some system or OSD disks, we probably want find_multipaths = yes. Users seem to prefer this, and for RHEL we override the upstream default and set it to yes. The ceph iSCSI tools were setting it back to no because, unlike other settings, it cannot be set at the per-device level.

In the ceph iSCSI config modules we will want to add code (I made a patch for this) to run /sbin/multipath device_name for the specific rbd images, so it will not matter what the user has set for find_multipaths.

Assuming we cannot make any code changes, here are the manual instructions that we could add to the ceph iSCSI ansible doc to work around the bug; they will still work fine once we release a code fix later:

1. After ansible-playbook ceph-iscsi-gw.yml is run, log into each node running an iSCSI target and run

   multipath -ll

   If there are disks that the user did not intend to be used by dm-multipath, for example disks being used by OSDs, run

   multipath -w device_name
   multipath -f device_name

   Example:

   multipath -w mpatha
   multipath -f mpatha

2. Open /etc/multipath.conf on each node running an iSCSI target. In the defaults section, remove the global skip_kpartx and change the global user_friendly_names value to yes:

   defaults {
           user_friendly_names yes
           find_multipaths no
   }

3. By default, the ansible iSCSI modules unblacklisted everything. Unless you are using dm-multipath for specific devices, you can blacklist everything again by adding

   devnode ".*"

   to the uncommented blacklist {} section at the bottom of the file so it looks like this:

   blacklist {
           devnode ".*"
   }

4. We do want dm-multipath for rbd devices, so add an exception by adding the following to multipath.conf:

   blacklist_exceptions {
           devnode "^rbd[0-9]"
   }

5. For rbd devices, add the following to multipath.conf:

   devices {
           device {
                   vendor "Ceph"
                   product "RBD"
                   skip_kpartx yes
                   user_friendly_names no
           }
   }

6. Reload the new settings:

   systemctl reload multipathd

Hemanth, if you want I can run those commands on your system for you.
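Putting the fragments from steps 2-5 together, the resulting /etc/multipath.conf on a gateway node would look roughly like this (a consolidated sketch of the pieces above, not a tested configuration; any site-specific sections the user already has would be kept as well):

   defaults {
           user_friendly_names yes
           find_multipaths no
   }

   blacklist {
           devnode ".*"
   }

   blacklist_exceptions {
           devnode "^rbd[0-9]"
   }

   devices {
           device {
                   vendor "Ceph"
                   product "RBD"
                   skip_kpartx yes
                   user_friendly_names no
           }
   }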
(In reply to Mike Christie from comment #9)
> Hemanth, if you want I can run those commands on your system for you.

Mike,

The machines are no longer in that state, so the commands cannot be run there now. If these are the steps we are documenting as the workaround for customers, then I am okay with running them. If not, I will wait for the fix.
Since the BZ has been moved to 2.2, can comment #9 be documented in the Known Issues for 2.1?
(In reply to Hemanth Kumar from comment #12)
> Since the BZ has been moved to 2.2, can comment #9 be documented in the
> Known Issues for 2.1?

Yes, I am working on it. For the most part the instructions are what you will follow. I am just testing and adding more detail on how to handle the case where the user is using dm-multipath for the root disk or other disks on the OSD/gateway machine.
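As a rough illustration of that case (hypothetical, not the final documented text): the catch-all blacklist from comment #9 would stay in place, and the WWIDs of any disks the user does want under dm-multipath would be added alongside the rbd exception, for example:

   blacklist_exceptions {
           devnode "^rbd[0-9]"
           # placeholder WWID of a root/other disk that should stay multipathed
           wwid "3600508b400105e210000900000490000"
   }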
Created attachment 1219550 [details]
Failover on N/W Failure

Hi Paul,

Failed the primary GW node's network, and the failover happened within 15 seconds. See the attachment for the Performance Monitor stats on Windows.

Will update the same after the reboot.
(In reply to Hemanth Kumar from comment #14)
> Created attachment 1219550 [details]
> Failover on N/W Failure

Please ignore comment #14 - it was added to the wrong BZ.
Looks good. Thanks, Bara.
This issue affects only RHCS 2.1, since RHCS 2.2 will use a new approach for incorporating RBD-backed iSCSI.