Description of problem:
Currently we recommend a constant multipathing priority configuration:

# cat /etc/multipath.conf
[...]
# LIO iSCSI
devices {
        device {
                vendor "LIO-ORG"
                user_friendly_names "yes"          # names like mpatha
                path_grouping_policy "failover"    # one path per group
                path_selector "round-robin 0"
                failback immediate
                path_checker "tur"
                prio "const"
                no_path_retry 120
                rr_weight "uniform"
        }
}

Setting 'prio const' assigns every path a priority of 1. So if we create blocks with NODE1, NODE2 and NODE3 (HA 3), they will be assigned to the portals of tpg1, tpg2 and tpg3 respectively. Since the path priority is 1 for all of them, tpg1 is almost always picked, and hence the active path for all the blocks ends up on the target on NODE1 (see the toy sketch after the listing below). To distribute the IO load more or less equally, we need a way to set the priorities of the paths on the target side.

Currently:

[root@localhost ~]# targetcli ls
o- / ......................................................................... [...]
o- backstores .............................................................. [...]
| o- block .................................................. [Storage Objects: 0]
| o- fileio ................................................. [Storage Objects: 0]
| o- pscsi .................................................. [Storage Objects: 0]
| o- ramdisk ................................................ [Storage Objects: 0]
| o- user:glfs .............................................. [Storage Objects: 1]
| o- tcmublock [sample.124.162/block-store/d87e2981-51b2-4f82-acaa-0dc82037854e (1.0GiB) activated]
| o- alua ................................................... [ALUA Groups: 2]
| o- default_tg_pt_gp ....................... [ALUA state: Active/optimized]
| o- glfs_tg_pt_gp .......................... [ALUA state: Active/optimized]
o- iscsi ............................................................ [Targets: 1]
| o- iqn.2016-12.org.gluster-block:d87e2981-51b2-4f82-acaa-0dc82037854e [TPGs: 3]
| o- tpg1 .................................................. [gen-acls, no-auth]
| | o- acls .......................................................... [ACLs: 0]
| | o- luns .......................................................... [LUNs: 1]
| | | o- lun0 ................................. [user/tcmublock (glfs_tg_pt_gp)]
| | o- portals .................................................... [Portals: 1]
| | o- 192.168.124.162:3260 ............................................. [OK]
| o- tpg2 ........................................................... [disabled]
| | o- acls .......................................................... [ACLs: 0]
| | o- luns .......................................................... [LUNs: 1]
| | | o- lun0 ................................. [user/tcmublock (glfs_tg_pt_gp)]
| | o- portals .................................................... [Portals: 1]
| | o- 192.168.124.149:3260 ............................................. [OK]
| o- tpg3 ........................................................... [disabled]
| o- acls .......................................................... [ACLs: 0]
| o- luns .......................................................... [LUNs: 1]
| | o- lun0 ................................. [user/tcmublock (glfs_tg_pt_gp)]
| o- portals .................................................... [Portals: 1]
| o- 192.168.124.184:3260 ............................................. [OK]
o- loopback ......................................................... [Targets: 0]
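A toy Python model of that selection behaviour (illustrative only, not multipathd's actual code): with failover grouping every portal is its own path group, and with the const prioritizer all groups tie at priority 1, so the first-discovered portal wins for every block device.

# Toy model: constant priorities degenerate to discovery order.
paths = [("tpg1-portal", 1), ("tpg2-portal", 1), ("tpg3-portal", 1)]
active = max(paths, key=lambda p: p[1])  # max() keeps the first of equal entries
print(active[0])                         # -> tpg1-portal, for every device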
As part of specifying priorities, we will have to create two ALUA target port groups per storage object, i.e. glfs_tg_pt_gp_ao and glfs_tg_pt_gp_ano: active optimized (AO) and active non-optimized (ANO). AO paths will get priority 50 and ANO paths priority 10.

[root@localhost ~]# targetcli ls
o- / ......................................................................... [...]
o- backstores .............................................................. [...]
| o- block .................................................. [Storage Objects: 0]
| o- fileio ................................................. [Storage Objects: 0]
| o- pscsi .................................................. [Storage Objects: 0]
| o- ramdisk ................................................ [Storage Objects: 0]
| o- user:glfs .............................................. [Storage Objects: 1]
| o- tcmublock [sample.124.162/block-store/d87e2981-51b2-4f82-acaa-0dc82037854e (1.0GiB) activated]
| o- alua ................................................... [ALUA Groups: 3]
| o- default_tg_pt_gp ....................... [ALUA state: Active/optimized]
| o- glfs_tg_pt_gp_ano .................. [ALUA state: Active/non-optimized]
| o- glfs_tg_pt_gp_ao ....................... [ALUA state: Active/optimized]
o- iscsi ............................................................ [Targets: 1]
| o- iqn.2016-12.org.gluster-block:d87e2981-51b2-4f82-acaa-0dc82037854e [TPGs: 3]
| o- tpg1 .................................................. [gen-acls, no-auth]
| | o- acls .......................................................... [ACLs: 0]
| | o- luns .......................................................... [LUNs: 1]
| | | o- lun0 .............................. [user/tcmublock (glfs_tg_pt_gp_ao)]
| | o- portals .................................................... [Portals: 1]
| | o- 192.168.124.162:3260 ............................................. [OK]
| o- tpg2 ........................................................... [disabled]
| | o- acls .......................................................... [ACLs: 0]
| | o- luns .......................................................... [LUNs: 1]
| | | o- lun0 ............................. [user/tcmublock (glfs_tg_pt_gp_ano)]
| | o- portals .................................................... [Portals: 1]
| | o- 192.168.124.149:3260 ............................................. [OK]
| o- tpg3 ........................................................... [disabled]
| o- acls .......................................................... [ACLs: 0]
| o- luns .......................................................... [LUNs: 1]
| | o- lun0 ............................. [user/tcmublock (glfs_tg_pt_gp_ano)]
| o- portals .................................................... [Portals: 1]
| o- 192.168.124.184:3260 ............................................. [OK]
o- loopback ......................................................... [Targets: 0]

And to make use of this, on the initiator side we will have to set 'prio alua':

# cat /etc/multipath.conf
[...]
# LIO iSCSI
devices {
        device {
                vendor "LIO-ORG"
                user_friendly_names "yes"          # names like mpatha
                path_grouping_policy "failover"    # one path per group
                path_selector "round-robin 0"
                failback immediate
                path_checker "tur"
                prio "alua"
                no_path_retry 120
                rr_weight "uniform"
        }
}

Unfortunately there is no way to query, from the target side, which devices are AO on a given target node, so we may have to build the logic to distribute the AO tpgs equally ourselves.
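To make that concrete, here is a minimal rtslib sketch of creating the two groups, based on the alua.py API linked under "useful links" below. The group names match the proposal; the storage object name, the tag values and the "tpg1 gets AO" choice are assumptions for illustration, not gluster-block's actual implementation.

from rtslib import RTSRoot
from rtslib.alua import ALUATargetPortGroup

root = RTSRoot()
# Assume a single user:glfs storage object named "tcmublock", as above.
so = next(s for s in root.storage_objects if s.name == "tcmublock")

# Tags are illustrative; default_tg_pt_gp already occupies tag 1.
ao = ALUATargetPortGroup(so, "glfs_tg_pt_gp_ao", 2)
ao.alua_access_state = 0    # 0 == Active/optimized
ano = ALUATargetPortGroup(so, "glfs_tg_pt_gp_ano", 3)
ano.alua_access_state = 1   # 1 == Active/non-optimized

# Wire one TPG's LUN to the AO group and the rest to ANO; deciding
# which node gets AO is exactly the distribution problem below.
target = next(t for t in root.targets)
for tpg in target.tpgs:
    group = "glfs_tg_pt_gp_ao" if tpg.tag == 1 else "glfs_tg_pt_gp_ano"
    for lun in tpg.luns:
        lun.alua_tg_pt_gp_name = group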
We want to maintain a per-node AO counter in every volume. Say we have IP1, IP2 and IP3 (HA 3); we will keep a counter file in every volume where we maintain the AO count:

$ cat .counter
IP1: 70
IP2: 58
IP3: 63

In this example, the next block create command will place the AO path on IP2, since it has the least load. Any better solutions are welcome.

Changes needed:
gluster-block
tcmu-runner (dummy lock implementation in the glfs handler)

Useful links:
https://github.com/open-iscsi/rtslib-fb/blob/master/rtslib/alua.py#L312

Upstream discussion:
https://github.com/gluster/gluster-block/issues/82
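A minimal sketch of that counter-based placement, assuming the .counter format shown above; the real implementation (including the locking that the dummy glfs lock is meant to provide) will differ.

def pick_ao_node(counter_path=".counter"):
    # Parse "IP: count" lines into a dict.
    counts = {}
    with open(counter_path) as f:
        for line in f:
            if not line.strip():
                continue
            ip, _, count = line.partition(":")
            counts[ip.strip()] = int(count)
    # The least-loaded node hosts the next AO path (IP2 in the example).
    node = min(counts, key=counts.get)
    counts[node] += 1  # account for the block we are about to create
    with open(counter_path, "w") as f:
        for ip, count in sorted(counts.items()):
            f.write("%s: %d\n" % (ip, count))
    return node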
The changes suggested by this RFE are in place since gluster-block version gluster-block-0.2.1-19.el7rhgs.

Versions used for verification
==============================

# for i in `oc get pods -o wide | grep glusterfs | cut -d " " -f1`; do echo $i; echo +++++++++++++++++++++++; oc exec $i -- rpm -qa | grep targetcli; done
glusterfs-storage-krlwr
+++++++++++++++++++++++
targetcli-2.1.fb46-6.el7_5.noarch
glusterfs-storage-pngnm
+++++++++++++++++++++++
targetcli-2.1.fb46-6.el7_5.noarch
glusterfs-storage-v8z6s
+++++++++++++++++++++++
targetcli-2.1.fb46-6.el7_5.noarch

# for i in `oc get pods -o wide | grep glusterfs | cut -d " " -f1`; do echo $i; echo +++++++++++++++++++++++; oc exec $i -- rpm -qa | grep gluster-block; done
glusterfs-storage-krlwr
+++++++++++++++++++++++
gluster-block-0.2.1-20.el7rhgs.x86_64
glusterfs-storage-pngnm
+++++++++++++++++++++++
gluster-block-0.2.1-20.el7rhgs.x86_64
glusterfs-storage-v8z6s
+++++++++++++++++++++++
gluster-block-0.2.1-20.el7rhgs.x86_64

# for i in `oc get pods -o wide | grep glusterfs | cut -d " " -f1`; do echo $i; echo +++++++++++++++++++++++; oc exec $i -- rpm -qa | grep tcmu-runner; done
glusterfs-storage-krlwr
+++++++++++++++++++++++
tcmu-runner-1.2.0-20.el7rhgs.x86_64
glusterfs-storage-pngnm
+++++++++++++++++++++++
tcmu-runner-1.2.0-20.el7rhgs.x86_64
glusterfs-storage-v8z6s
+++++++++++++++++++++++
tcmu-runner-1.2.0-20.el7rhgs.x86_64

# for i in `oc get pods -o wide | grep glusterfs | cut -d " " -f1`; do echo $i; echo +++++++++++++++++++++++; oc exec $i -- rpm -qa | grep python-configshell; done
glusterfs-storage-krlwr
+++++++++++++++++++++++
python-configshell-1.1.fb23-4.el7_5.noarch
glusterfs-storage-pngnm
+++++++++++++++++++++++
python-configshell-1.1.fb23-4.el7_5.noarch
glusterfs-storage-v8z6s
+++++++++++++++++++++++
python-configshell-1.1.fb23-4.el7_5.noarch

# for i in `oc get pods -o wide | grep glusterfs | cut -d " " -f1`; do echo $i; echo +++++++++++++++++++++++; oc exec $i -- rpm -qa | grep python-rtslib; done
glusterfs-storage-krlwr
+++++++++++++++++++++++
python-rtslib-2.1.fb63-12.el7_5.noarch
glusterfs-storage-pngnm
+++++++++++++++++++++++
python-rtslib-2.1.fb63-12.el7_5.noarch
glusterfs-storage-v8z6s
+++++++++++++++++++++++
python-rtslib-2.1.fb63-12.el7_5.noarch

Verified the following:
+++++++++++++++++++++++++++++++
1. Created multiple block volumes mounted on app pods.
2. Confirmed in 'targetcli ls' that we now have two ALUA target port groups per storage object, i.e. glfs_tg_pt_gp_ao and glfs_tg_pt_gp_ano: active optimized (AO) and active non-optimized (ANO). AO has priority 50 and ANO has priority 10.
3. Load balancing works fine once /etc/multipath.conf is changed to use prio "alua".

Some snippets from the setup for one block volume
+++++++++++++++++++++++++++++

# multipath -ll
mpathd (3600140589bba72ef49445bf9501b7d9e) dm-34 LIO-ORG ,TCMU device
size=5.0G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 47:0:0:0 sdo 8:224  active ready running
|-+- policy='round-robin 0' prio=10 status=enabled
| `- 49:0:0:0 sdq 65:0   active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 48:0:0:0 sdp 8:240  active ready running

# ll /dev/disk/by-path/ip* | grep sdo
lrwxrwxrwx. 1 root root 9 Jul  4 10:04 /dev/disk/by-path/ip-10.70.43.230:3260-iscsi-iqn.2016-12.org.gluster-block:89bba72e-f494-45bf-9501-b7d9ec328213-lun-0 -> ../../sdo

| o- iqn.2016-12.org.gluster-block:89bba72e-f494-45bf-9501-b7d9ec328213 ................................................ [TPGs: 3]
| | o- tpg1 ........................................................................................................... [disabled]
| | | o- acls .......................................................................................................... [ACLs: 0]
| | | o- luns .......................................................................................................... [LUNs: 1]
| | | | o- lun0 ........................ [user/test-vol_glusterfs_claim14_7d6cfdf9-7f43-11e8-a6bb-0a580a800209 (glfs_tg_pt_gp_ao)]
| | | o- portals .................................................................................................... [Portals: 1]
| | | o- 10.70.43.230:3260 ................................................................................................ [OK]
| | o- tpg2 ..................................................................................... [gen-acls, tpg-auth, 1-way auth]
| | | o- acls .......................................................................................................... [ACLs: 0]
| | | o- luns .......................................................................................................... [LUNs: 1]
| | | | o- lun0 ....................... [user/test-vol_glusterfs_claim14_7d6cfdf9-7f43-11e8-a6bb-0a580a800209 (glfs_tg_pt_gp_ano)]
| | | o- portals .................................................................................................... [Portals: 1]
| | | o- 10.70.43.19:3260 ................................................................................................. [OK]
| | o- tpg3 ........................................................................................................... [disabled]
| | o- acls .......................................................................................................... [ACLs: 0]
| | o- luns .......................................................................................................... [LUNs: 1]
| | | o- lun0 ....................... [user/test-vol_glusterfs_claim14_7d6cfdf9-7f43-11e8-a6bb-0a580a800209 (glfs_tg_pt_gp_ano)]
| | o- portals .................................................................................................... [Portals: 1]
| | o- 10.70.43.53:3260 ................................................................................................. [OK]

4. The /etc/multipath.conf file:

# cat /etc/multipath.conf
# LIO iSCSI
# TODO: Add env variables for tweaking
devices {
        device {
                vendor "LIO-ORG"
                user_friendly_names "yes"
                path_grouping_policy "failover"
                path_selector "round-robin 0"
                failback immediate
                path_checker "tur"
                prio "alua"
                no_path_retry 120
                rr_weight "uniform"
        }
}
defaults {
        user_friendly_names yes
        find_multipaths yes
}
blacklist {
}
---------------------------
5. With around 35 block devices created, confirmed that a counter is maintained to keep track of distributing the AO tpgs equally:

# attr -l prio.info
Attribute "selinux" has a 30 byte value for prio.info
Attribute "block.10.70.43.53" has a 1024 byte value for prio.info
Attribute "block.10.70.43.19" has a 1024 byte value for prio.info
Attribute "block.10.70.43.230" has a 1024 byte value for prio.info
[root@dhcp43-29 block-meta]# for i in `attr -l prio.info | grep "block.10" | cut -d "\"" -f2`; do attr -g $i prio.info; done
Attribute "block.10.70.43.53" had a 1024 byte value for prio.info:
4
Attribute "block.10.70.43.19" had a 1024 byte value for prio.info:
5
Attribute "block.10.70.43.230" had a 1024 byte value for prio.info:
5

And again, after creating more block devices:

# cd block-meta
[root@dhcp43-29 block-meta]# attr -l prio.info
Attribute "selinux" has a 30 byte value for prio.info
Attribute "block.10.70.43.53" has a 1024 byte value for prio.info
Attribute "block.10.70.43.19" has a 1024 byte value for prio.info
Attribute "block.10.70.43.230" has a 1024 byte value for prio.info
[root@dhcp43-29 block-meta]# for i in `attr -l prio.info | grep "block.10" | cut -d "\"" -f2`; do attr -g $i prio.info; done
Attribute "block.10.70.43.53" had a 1024 byte value for prio.info:
7
Attribute "block.10.70.43.19" had a 1024 byte value for prio.info:
7
Attribute "block.10.70.43.230" had a 1024 byte value for prio.info:
8

Attaching the multipath and targetcli output from the setup for further confirmation.

This bug is now moved to verified.
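As an aside, the same counters can be read programmatically. A short Python sketch, assuming the counters are user-namespace xattrs on the prio.info file (as the attr(1) output above suggests); the mount path is hypothetical.

import os

PRIO_FILE = "/mnt/block-meta/prio.info"  # hypothetical path to the volume's block-meta
for name in os.listxattr(PRIO_FILE):
    if name.startswith("user.block."):   # attr(1) displays these without "user."
        raw = os.getxattr(PRIO_FILE, name)
        # The attributes are 1024 bytes; assume the count is padded and strip it.
        print(name, int(raw.strip(b"\x00 \t\n")))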
Have updated the doc text field, kindly review.
Made the required changes.
Updated that. Thank you.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2691