Bug 1575529 - [RFE] load balancing the IO connections (active paths) across HA nodes
Summary: [RFE] load balancing the IO connections (active paths) across HA nodes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: gluster-block
Version: cns-3.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: CNS 3.10
Assignee: Prasanna Kumar Kalever
QA Contact: Neha Berry
URL:
Whiteboard:
Depends On: 1585197
Blocks: 1568860
Reported: 2018-05-07 08:14 UTC by Prasanna Kumar Kalever
Modified: 2018-09-12 09:26 UTC (History)
12 users

Fixed In Version: gluster-block-0.2.1-19.el7rhgs
Doc Type: Enhancement
Doc Text:
Previously, the multipathing priority configuration was set to constant across all the HA paths, so the load might not be distributed uniformly across the available HA nodes. With this update, gluster-block introduces priority-based load balancing. The gluster-block management daemon reads the load-balance information from the block metadata and, when a block device is created, sets high priority on the path whose node is least used. When the initiator logs in to the device, the multipath tools on the initiator side pick the high-priority path and mark it active. This way the load is distributed across the HA nodes.
Clone Of:
Environment:
Last Closed: 2018-09-12 09:25:34 UTC
Target Upstream Version:


Attachments


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2018:2691 None None None 2018-09-12 09:26:41 UTC

Description Prasanna Kumar Kalever 2018-05-07 08:14:17 UTC
Description of problem:

Currently we recommend a constant multipathing priority configuration ('prio const'):

# cat /etc/multipath.conf
[...]
# LIO iSCSI
devices {
        device {
                vendor "LIO-ORG"
                user_friendly_names "yes" # names like mpatha
                path_grouping_policy "failover" # one path per group
                path_selector "round-robin 0"
                failback immediate
                path_checker "tur"
                prio "const"
                no_path_retry 120
                rr_weight "uniform"
        }
}

Setting 'prio const' assigns every path a priority of 1. Hence if we create blocks on NODE1, NODE2 and NODE3 (HA 3), their portals will be assigned to tpg1, tpg2 and tpg3 respectively.

Since the path priority is 1 for all paths, it is very likely that tpg1 is mostly picked, and hence the active path for all the blocks will be the target on NODE1.

In order to distribute the IO load more or less equally, we need a way to set path priorities on the target side.

Currently

[root@localhost ~]# targetcli ls                                                    
o- / ......................................................................... [...]
  o- backstores .............................................................. [...]
  | o- block .................................................. [Storage Objects: 0]
  | o- fileio ................................................. [Storage Objects: 0]
  | o- pscsi .................................................. [Storage Objects: 0]
  | o- ramdisk ................................................ [Storage Objects: 0]
  | o- user:glfs .............................................. [Storage Objects: 1]
  |   o- tcmublock  [sample@192.168.124.162/block-store/d87e2981-51b2-4f82-acaa-0dc8
2037854e (1.0GiB) activated]                                                        
  |     o- alua ................................................... [ALUA Groups: 2]
  |       o- default_tg_pt_gp ....................... [ALUA state: Active/optimized]
  |       o- glfs_tg_pt_gp .......................... [ALUA state: Active/optimized]
  o- iscsi ............................................................ [Targets: 1]
  | o- iqn.2016-12.org.gluster-block:d87e2981-51b2-4f82-acaa-0dc82037854e  [TPGs: 3]
  |   o- tpg1 .................................................. [gen-acls, no-auth]
  |   | o- acls .......................................................... [ACLs: 0]
  |   | o- luns .......................................................... [LUNs: 1]
  |   | | o- lun0 ................................. [user/tcmublock (glfs_tg_pt_gp)]
  |   | o- portals .................................................... [Portals: 1]
  |   |   o- 192.168.124.162:3260 ............................................. [OK]
  |   o- tpg2 ........................................................... [disabled]
  |   | o- acls .......................................................... [ACLs: 0]
  |   | o- luns .......................................................... [LUNs: 1]
  |   | | o- lun0 ................................. [user/tcmublock (glfs_tg_pt_gp)]
  |   | o- portals .................................................... [Portals: 1]
  |   |   o- 192.168.124.149:3260 ............................................. [OK]
  |   o- tpg3 ........................................................... [disabled]
  |     o- acls .......................................................... [ACLs: 0]
  |     o- luns .......................................................... [LUNs: 1]
  |     | o- lun0 ................................. [user/tcmublock (glfs_tg_pt_gp)]
  |     o- portals .................................................... [Portals: 1]
  |       o- 192.168.124.184:3260 ............................................. [OK]
  o- loopback ......................................................... [Targets: 0]
As part of specifying priorities, we will have to create two ALUA target port groups per storage object, i.e. glfs_tg_pt_gp_ao and glfs_tg_pt_gp_ano: active optimized (AO) and active non-optimized (ANO). AO will have priority 50 and ANO will have priority 10.

[root@localhost ~]# targetcli ls                                                    
o- / ......................................................................... [...]
  o- backstores .............................................................. [...]
  | o- block .................................................. [Storage Objects: 0]
  | o- fileio ................................................. [Storage Objects: 0]
  | o- pscsi .................................................. [Storage Objects: 0]
  | o- ramdisk ................................................ [Storage Objects: 0]
  | o- user:glfs .............................................. [Storage Objects: 1]
  |   o- tcmublock  [sample@192.168.124.162/block-store/d87e2981-51b2-4f82-acaa-0dc8
2037854e (1.0GiB) activated]                                                        
  |     o- alua ................................................... [ALUA Groups: 3]
  |       o- default_tg_pt_gp ....................... [ALUA state: Active/optimized]
  |       o- glfs_tg_pt_gp_ano .................. [ALUA state: Active/non-optimized]
  |       o- glfs_tg_pt_gp_ao ....................... [ALUA state: Active/optimized]
  o- iscsi ............................................................ [Targets: 1]
  | o- iqn.2016-12.org.gluster-block:d87e2981-51b2-4f82-acaa-0dc82037854e  [TPGs: 3]
  |   o- tpg1 .................................................. [gen-acls, no-auth]
  |   | o- acls .......................................................... [ACLs: 0]
  |   | o- luns .......................................................... [LUNs: 1]
  |   | | o- lun0 .............................. [user/tcmublock (glfs_tg_pt_gp_ao)]
  |   | o- portals .................................................... [Portals: 1]
  |   |   o- 192.168.124.162:3260 ............................................. [OK]
  |   o- tpg2 ........................................................... [disabled]
  |   | o- acls .......................................................... [ACLs: 0]
  |   | o- luns .......................................................... [LUNs: 1]
  |   | | o- lun0 ............................. [user/tcmublock (glfs_tg_pt_gp_ano)]
  |   | o- portals .................................................... [Portals: 1]
  |   |   o- 192.168.124.149:3260 ............................................. [OK]
  |   o- tpg3 ........................................................... [disabled]
  |     o- acls .......................................................... [ACLs: 0]
  |     o- luns .......................................................... [LUNs: 1]
  |     | o- lun0 ............................. [user/tcmublock (glfs_tg_pt_gp_ano)]
  |     o- portals .................................................... [Portals: 1]
  |       o- 192.168.124.184:3260 ............................................. [OK]
  o- loopback ......................................................... [Targets: 0]

And to get this benefit, on the initiator side we will have to set 'prio alua':

# cat /etc/multipath.conf
[...]
# LIO iSCSI
devices {
        device {
                vendor "LIO-ORG"
                user_friendly_names "yes" # names like mpatha
                path_grouping_policy "failover" # one path per group
                path_selector "round-robin 0"
                failback immediate
                path_checker "tur"
                prio "alua"
                no_path_retry 120
                rr_weight "uniform"
        }
}

Unfortunately, we don't have a way to get the active devices on a given node from the target side.
We may have to build logic to distribute the AO tpgs equally.
We want to maintain a per-node AO counter in every volume.

Say we have IP1, IP2, IP3 (HA 3).
We will have a counter file (in every volume) where we maintain the AO count:
$ cat .counter
IP1: 70
IP2: 58
IP3: 63

For example, the next block create command will pick AO on IP2, since it has the least load. Any better solutions are welcome.
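The selection step can be sketched in shell. This is only an illustration of the idea, not the actual gluster-block implementation; the .counter file name and "IP: count" format come from the example above, and least_loaded_node is a hypothetical helper name:

```shell
# Pick the node with the lowest AO count from a per-volume counter file.
# Expects lines of the form "IP: count", as in the example above.
least_loaded_node() {
    sort -t: -k2 -n "$1" | head -n1 | cut -d: -f1
}

# Recreate the counter file from the description above.
cat > /tmp/counter.example <<'EOF'
IP1: 70
IP2: 58
IP3: 63
EOF

least_loaded_node /tmp/counter.example   # prints IP2
```

After the create command assigns the AO group, the chosen node's count would then be incremented back into the counter file.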

Changes needed:
gluster-block
tcmu-runner (dummy lock implementation in the glfs handler)

useful links:
https://github.com/open-iscsi/rtslib-fb/blob/master/rtslib/alua.py#L312



Upstream discussion:
https://github.com/gluster/gluster-block/issues/82

Comment 6 Neha Berry 2018-07-04 11:23:20 UTC
The changes suggested by this RFE are in place since gluster-block version gluster-block-0.2.1-19.el7rhgs.


Versions used for verification
=============


#  for i in `oc get pods -o wide| grep glusterfs|cut -d " " -f1` ; do echo $i; echo +++++++++++++++++++++++; oc exec $i -- rpm -qa | grep targetcli ; done
glusterfs-storage-krlwr
+++++++++++++++++++++++
targetcli-2.1.fb46-6.el7_5.noarch
glusterfs-storage-pngnm
+++++++++++++++++++++++
targetcli-2.1.fb46-6.el7_5.noarch
glusterfs-storage-v8z6s
+++++++++++++++++++++++
targetcli-2.1.fb46-6.el7_5.noarch
#
# for i in `oc get pods -o wide| grep glusterfs|cut -d " " -f1` ; do echo $i; echo +++++++++++++++++++++++; oc exec $i -- rpm -qa|grep gluster-block ; done
glusterfs-storage-krlwr
+++++++++++++++++++++++
gluster-block-0.2.1-20.el7rhgs.x86_64
glusterfs-storage-pngnm
+++++++++++++++++++++++
gluster-block-0.2.1-20.el7rhgs.x86_64
glusterfs-storage-v8z6s
+++++++++++++++++++++++
gluster-block-0.2.1-20.el7rhgs.x86_64
#
# for i in `oc get pods -o wide| grep glusterfs|cut -d " " -f1` ; do echo $i; echo +++++++++++++++++++++++; oc exec $i -- rpm -qa|grep tcmu-runner ; done
glusterfs-storage-krlwr
+++++++++++++++++++++++
tcmu-runner-1.2.0-20.el7rhgs.x86_64
glusterfs-storage-pngnm
+++++++++++++++++++++++
tcmu-runner-1.2.0-20.el7rhgs.x86_64
glusterfs-storage-v8z6s
+++++++++++++++++++++++
tcmu-runner-1.2.0-20.el7rhgs.x86_64
#
# for i in `oc get pods -o wide| grep glusterfs|cut -d " " -f1` ; do echo $i; echo +++++++++++++++++++++++; oc exec $i -- rpm -qa | grep python-configshell ; done
glusterfs-storage-krlwr
+++++++++++++++++++++++
python-configshell-1.1.fb23-4.el7_5.noarch
glusterfs-storage-pngnm
+++++++++++++++++++++++
python-configshell-1.1.fb23-4.el7_5.noarch
glusterfs-storage-v8z6s
+++++++++++++++++++++++
python-configshell-1.1.fb23-4.el7_5.noarch
#
# for i in `oc get pods -o wide| grep glusterfs|cut -d " " -f1` ; do echo $i; echo +++++++++++++++++++++++; oc exec $i -- rpm -qa | grep python-rtslib ; done
glusterfs-storage-krlwr
+++++++++++++++++++++++
python-rtslib-2.1.fb63-12.el7_5.noarch
glusterfs-storage-pngnm
+++++++++++++++++++++++
python-rtslib-2.1.fb63-12.el7_5.noarch
glusterfs-storage-v8z6s
+++++++++++++++++++++++
python-rtslib-2.1.fb63-12.el7_5.noarch


Verified the following:
+++++++++++++++++++++++++++++++

1. created multiple block volumes mounted on app pods
2. Confirmed in targetcli ls that we now have two ALUA target port groups per storage object, i.e. glfs_tg_pt_gp_ao and glfs_tg_pt_gp_ano: active optimized (AO) and active non-optimized (ANO). AO has priority 50 and ANO has priority 10.

3. The load balancing works fine once /etc/multipath.conf is changed to use prio "alua".

Some snippet from the setup for one blockvolume
+++++++++++++++++++++++++++++

multipath -ll

mpathd (3600140589bba72ef49445bf9501b7d9e) dm-34 LIO-ORG ,TCMU device     
size=5.0G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 47:0:0:0 sdo  8:224  active ready running
|-+- policy='round-robin 0' prio=10 status=enabled
| `- 49:0:0:0 sdq  65:0   active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 48:0:0:0 sdp  8:240  active ready running
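In this output the path group with prio=50 (the AO group) is the one marked status=active, which is exactly the intended behaviour. As a quick sanity check, the active group's priority can be pulled out of saved multipath -ll output with awk (a sketch only; active_prio is a hypothetical helper, and the parsing assumes output of the shape shown above):

```shell
# Print the prio value of the path group marked status=active
# in saved `multipath -ll` output.
active_prio() {
    awk '/status=active/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^prio=/) { sub(/^prio=/, "", $i); print $i }
    }' "$1"
}

# Sample lines in the same shape as the output above.
cat > /tmp/mpath.example <<'EOF'
|-+- policy='round-robin 0' prio=50 status=active
| `- 47:0:0:0 sdo  8:224  active ready running
|-+- policy='round-robin 0' prio=10 status=enabled
EOF

active_prio /tmp/mpath.example   # prints 50
```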


#ll /dev/disk/by-path/ip*|grep sdo 
lrwxrwxrwx. 1 root root  9 Jul  4 10:04 /dev/disk/by-path/ip-10.70.43.230:3260-iscsi-iqn.2016-12.org.gluster-block:89bba72e-f494-45bf-9501-b7d9ec328213-lun-0 -> ../../sdo


| o- iqn.2016-12.org.gluster-block:89bba72e-f494-45bf-9501-b7d9ec328213 ................................................ [TPGs: 3]
  | | o- tpg1 ........................................................................................................... [disabled]
  | | | o- acls .......................................................................................................... [ACLs: 0]
  | | | o- luns .......................................................................................................... [LUNs: 1]
  | | | | o- lun0 ........................ [user/test-vol_glusterfs_claim14_7d6cfdf9-7f43-11e8-a6bb-0a580a800209 (glfs_tg_pt_gp_ao)]
  | | | o- portals .................................................................................................... [Portals: 1]
  | | |   o- 10.70.43.230:3260 ................................................................................................ [OK]
  | | o- tpg2 ..................................................................................... [gen-acls, tpg-auth, 1-way auth]
  | | | o- acls .......................................................................................................... [ACLs: 0]
  | | | o- luns .......................................................................................................... [LUNs: 1]
  | | | | o- lun0 ....................... [user/test-vol_glusterfs_claim14_7d6cfdf9-7f43-11e8-a6bb-0a580a800209 (glfs_tg_pt_gp_ano)]
  | | | o- portals .................................................................................................... [Portals: 1]
  | | |   o- 10.70.43.19:3260 ................................................................................................. [OK]
  | | o- tpg3 ........................................................................................................... [disabled]
  | |   o- acls .......................................................................................................... [ACLs: 0]
  | |   o- luns .......................................................................................................... [LUNs: 1]
  | |   | o- lun0 ....................... [user/test-vol_glusterfs_claim14_7d6cfdf9-7f43-11e8-a6bb-0a580a800209 (glfs_tg_pt_gp_ano)]
  | |   o- portals .................................................................................................... [Portals: 1]
  | |     o- 10.70.43.53:3260 ................................................................................................. [OK]


4. The /etc/multipath.conf file

# cat /etc/multipath.conf
# LIO iSCSI
# TODO: Add env variables for tweaking
devices {
        device {
                vendor "LIO-ORG"
                user_friendly_names "yes" 
                path_grouping_policy "failover"
                path_selector "round-robin 0"
                failback immediate
                path_checker "tur"
                prio "alua"
                no_path_retry 120
                rr_weight "uniform"
        }
}
defaults {
	user_friendly_names yes
	find_multipaths yes
}


blacklist {
}


---------------------------

5. We have around 35 block devices created, hence confirmed that a per-node counter is maintained to keep track of distributing the AO tpgs equally.

# attr -l prio.info 
Attribute "selinux" has a 30 byte value for prio.info
Attribute "block.10.70.43.53" has a 1024 byte value for prio.info
Attribute "block.10.70.43.19" has a 1024 byte value for prio.info
Attribute "block.10.70.43.230" has a 1024 byte value for prio.info
[root@dhcp43-29 block-meta]# for i in `attr -l prio.info|grep "block.10"|cut -d "\"" -f2`; do attr -g $i prio.info; done
Attribute "block.10.70.43.53" had a 1024 byte value for prio.info:
4
Attribute "block.10.70.43.19" had a 1024 byte value for prio.info:
5
Attribute "block.10.70.43.230" had a 1024 byte value for prio.info:
5


# cd block-meta
[root@dhcp43-29 block-meta]# attr -l prio.info 
Attribute "selinux" has a 30 byte value for prio.info
Attribute "block.10.70.43.53" has a 1024 byte value for prio.info
Attribute "block.10.70.43.19" has a 1024 byte value for prio.info
Attribute "block.10.70.43.230" has a 1024 byte value for prio.info
[root@dhcp43-29 block-meta]# for i in `attr -l prio.info|grep "block.10"|cut -d "\"" -f2`; do attr -g $i prio.info; done
Attribute "block.10.70.43.53" had a 1024 byte value for prio.info:
7
Attribute "block.10.70.43.19" had a 1024 byte value for prio.info:
7
Attribute "block.10.70.43.230" had a 1024 byte value for prio.info:
8
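The counts above (7, 7, 8) show the AO assignments staying balanced as devices are added. The spread between the most and least loaded node can be checked mechanically (a sketch parsing attr -g style output of the shape above; count_spread is a hypothetical helper name):

```shell
# Given saved "attr -g" output, print max-min spread of the per-node
# counts (the bare numbers on the value lines).
count_spread() {
    grep -E '^[0-9]+$' "$1" | sort -n |
        awk 'NR == 1 { min = $1 } { max = $1 } END { print max - min }'
}

# Sample output in the same shape as above.
cat > /tmp/prio.example <<'EOF'
Attribute "block.10.70.43.53" had a 1024 byte value for prio.info:
7
Attribute "block.10.70.43.19" had a 1024 byte value for prio.info:
7
Attribute "block.10.70.43.230" had a 1024 byte value for prio.info:
8
EOF

count_spread /tmp/prio.example   # prints 1
```

A spread of at most 1 across HA 3 means the AO groups are distributed as evenly as possible.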

Attaching the multipath and targetcli output from the setup for further confirmation.


This bug is now moved to verified.

Comment 9 Anjana KD 2018-09-07 10:08:44 UTC
Have updated the doc text field, kindly review.

Comment 11 Anjana KD 2018-09-07 13:05:34 UTC
Made the required changes.

Comment 13 Anjana KD 2018-09-07 14:16:21 UTC
Updated that, thank you.

Comment 15 errata-xmlrpc 2018-09-12 09:25:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2691

