Description of problem:

There is a use case in a CNS environment with a large number of small volumes. In such a case, the bricks corresponding to each volume end up landing on one or two drives rather than on all the backend drives. In our 3-node CNS environment, when we scale from 100 to 1000 5GB volumes, the corresponding 100 to 1000 bricks created on each CNS node do not distribute across all 12 drives in the backend. In the 100- and 200-volume cases only 2 out of 12 HDDs were utilized, whereas in the 500- and 1000-volume tests only 6 out of 12 HDDs were utilized. The fewer HDDs used in the backend, the lower the performance, so it is imperative to change the Heketi code to make sure bricks are distributed homogeneously.

The following link shows the utilization of the HDDs for the 100, 200, 500 and 1000 volume cases:
http://perf1.perf.lab.eng.bos.redhat.com/pub/shberry/disk_utilization/

The images in the link above show how many drives are actually doing work while write IO is performed on them, i.e. how many drives are actually hosting bricks.

Version-Release number of selected component (if applicable):

8 servers were used to set up OpenShift, one of them being the master server. The master node was schedulable.
3 of the 8 servers were dedicated to the CNS deployment. These 3 servers were non-schedulable, i.e. hosting only storage pods and no application pods.
All 8 servers had 48 GB RAM and 2 CPU sockets with 6 cores each, 12 processors in total.
Each of the 3 CNS nodes had 12 7200 RPM hard drives of 930 GB capacity.
All drives were part of the CNS topology, giving a total capacity of ~11 TB (replica 3 setup).

kernel: 3.10.0-693.el7.x86_64
OpenShift version: v3.6.173.0.7
Kubernetes: v1.6.1+5115d708d7
Docker: 1.12.6, docker-1.12.6-31.1.git97ba2c0.el7.x86_64
rhgs server image: rhgs3/rhgs-server-rhel7:3.3.0-24
volmanager: rhgs3/rhgs-volmanager-rhel7:3.3.0-27
heketi: 5.0.0-11.el7rhgs.x86_64 and heketi-client-5.0.0-11.el7rhgs.x86_64
cns-deploy: cns-deploy-5.0.0-41.el7rhgs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Scale to 100 small volumes in a 3-node CNS environment (make sure there is a significant number of HDDs in the configuration).
2. Check where the bricks land.

Actual results:
Bricks land on 1-2 HDDs out of 12.

Expected results:
Bricks should be homogeneously distributed across all the HDDs in the backend.

Additional info:
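For step 2, one way to check where the bricks land is to count bricks per device from the JSON that `heketi-cli topology info --json` prints. The sketch below parses only a minimal subset of that output; the field names (`clusters`, `nodes`, `devices`, `bricks`, `hostnames.manage`) are my recollection of the topology format, and the embedded sample JSON is purely hypothetical, so treat this as a sketch rather than a tested tool:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// topology models only the fields we need from `heketi-cli topology info --json`;
// field names assumed from the heketi topology format, not verified here.
type topology struct {
	Clusters []struct {
		Nodes []struct {
			Hostnames struct {
				Manage []string `json:"manage"`
			} `json:"hostnames"`
			Devices []struct {
				Name   string `json:"name"`
				Bricks []struct {
					Path string `json:"path"`
				} `json:"bricks"`
			} `json:"devices"`
		} `json:"nodes"`
	} `json:"clusters"`
}

// bricksPerDevice returns a map of "host:device" to the number of bricks
// currently placed on that device.
func bricksPerDevice(raw []byte) (map[string]int, error) {
	var t topology
	if err := json.Unmarshal(raw, &t); err != nil {
		return nil, err
	}
	counts := make(map[string]int)
	for _, c := range t.Clusters {
		for _, n := range c.Nodes {
			host := ""
			if len(n.Hostnames.Manage) > 0 {
				host = n.Hostnames.Manage[0] + ":"
			}
			for _, d := range n.Devices {
				counts[host+d.Name] = len(d.Bricks)
			}
		}
	}
	return counts, nil
}

func main() {
	// Hypothetical sample: one node, two devices, with all three
	// bricks piled onto /dev/sdb, as in the reported behaviour.
	sample := []byte(`{"clusters":[{"nodes":[{
		"hostnames":{"manage":["node1"]},
		"devices":[
			{"name":"/dev/sdb","bricks":[{"path":"/b1"},{"path":"/b2"},{"path":"/b3"}]},
			{"name":"/dev/sdc","bricks":[]}
		]}]}]}`)
	counts, err := bricksPerDevice(sample)
	if err != nil {
		panic(err)
	}
	for dev, n := range counts {
		fmt.Printf("%s %d bricks\n", dev, n)
	}
}
```

A heavily skewed histogram from such a count (most devices at zero) is exactly the symptom described above.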
Thanks for reporting! We will look into improving this soon. Besides the case of multiple disks per node on a 3-node cluster, there is also the case of clusters with more than three nodes, where inhomogeneity could occur as well.
John, IIRC you re-modelled the brick allocation logic. Do you see a chance to improve the distribution so that the devices host the same number of bricks? I am not sure how the algorithm is implemented now, but it may be rather random, and a round-robin approach could result in a more 'equal' distribution.
It can be done, and certainly should be done eventually. However, it's not a small job IMO. Recent refactoring should make this easier but all current "placers" still rely on the same basic code that can produce these uneven layouts.
Moving this out to cns-3.11.0, might become a glusterd2 enhancement later.