Bug 1375465

Summary: [RFE] Implement multi threaded self-heal for ec volumes
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Atin Mukherjee <amukherj>
Component: disperse
Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED ERRATA
QA Contact: Nag Pavan Chilakam <nchilaka>
Severity: unspecified
Priority: unspecified
Version: rhgs-3.1
CC: asoman, asrivast, bugs, pkarampu, rcyriac, rhinduja, rhs-bugs, storage-qa-internal
Keywords: FutureFeature, Triaged
Target Release: RHGS 3.2.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glusterfs-3.8.4-3
Doc Type: If docs needed, set a value
Clone Of: 1368451
Last Closed: 2017-03-23 05:47:10 UTC
Type: Bug
Bug Depends On: 1368451
Bug Blocks: 1351503

Description Atin Mukherjee 2016-09-13 08:27:49 UTC
+++ This bug was initially created as a clone of Bug #1368451 +++

Description of problem:
We need a way to increase the parallelism of self-heals in disperse volumes as well. This bug tracks that RFE.

--- Additional comment from Vijay Bellur on 2016-08-19 10:06:55 EDT ---

REVIEW: http://review.gluster.org/15083 (cluster/ec: Do multi-threaded self-heal) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Worker Ant on 2016-08-24 13:21:08 EDT ---

REVIEW: http://review.gluster.org/15083 (cluster/ec: Do multi-threaded self-heal) posted (#3) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Worker Ant on 2016-08-24 18:24:26 EDT ---

COMMIT: http://review.gluster.org/15083 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit 56a79b357e09d91305994fcc0b2d250cb9ac243d
Author: Pranith Kumar K <pkarampu>
Date:   Thu Aug 4 00:41:16 2016 +0530

    cluster/ec: Do multi-threaded self-heal
    
    BUG: 1368451
    Change-Id: I5d6b91d714ad6906dc478a401e614115c89a8fbb
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/15083
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Ashish Pandey <aspandey>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
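
The merged change exposes the heal parallelism as a volume option. A minimal
sketch of tuning and inspecting it, assuming the disperse.shd-max-threads
option name from the upstream patch (the exact name, default, and value range
may vary by release; the volume name is hypothetical):

    # Raise the number of parallel self-heal threads on an ec volume
    # (option name assumed from the upstream patch).
    gluster volume set testvol disperse.shd-max-threads 8

    # Confirm the effective value.
    gluster volume get testvol disperse.shd-max-threads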

Comment 5 Nag Pavan Chilakam 2016-11-15 11:27:07 UTC
QATP:
=====
1)
1) Create a 4+2 disperse volume.
2) Kill one brick process on a server.
3) Start IO using dd, creating small files in the thousands.
4) Open the log files on all the bricks.
5) Force-start the killed brick with: gluster volume start <vol-name> force
6) The logs should show healing happening in parallel.

Expected:
Check the self-heal threads in the top command.
Heals should run from the source bricks to the emptied brick.
Threads should work in parallel and heal files (see the sketch below).
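
A hedged sketch of test 1 as shell commands; the host names, brick paths,
and mount point are hypothetical:

    # Create and start a 4+2 disperse volume (6 bricks, redundancy 2).
    gluster volume create testvol disperse 6 redundancy 2 \
        server{1..6}:/bricks/testvol/brick
    gluster volume start testvol

    # Kill one brick process (find its PID via 'gluster volume status testvol').
    kill <brick-pid>

    # Create thousands of small files from a client mount point.
    for i in $(seq 1 5000); do
        dd if=/dev/urandom of=/mnt/testvol/file.$i bs=4k count=1 2>/dev/null
    done

    # Restart the killed brick and watch heal activity.
    gluster volume start testvol force
    top -H -p $(pgrep -f glustershd)   # per-thread view of the self-heal daemon
    gluster volume heal testvol info   # pending entries should drain in parallel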


2)
Summary: Healing with a single thread vs. multiple threads.
Note: turn off metadata, entry, and data heal and the self-heal daemon before creating files.
1) Create a single 4+2 volume on each of two setups.
2) Leave "cluster.shd-max-threads" at its default on the first setup and set it to 16 on the second.
3) Kill one brick process on each setup and create the same set of files and folders on both volumes.
4) Re-enable metadata, entry, and data heal and the self-heal daemon, and restart the brick process.
5) Measure the heal time on both volumes; the multi-threaded volume should heal faster.
6) Check the logs for heal progress.


Expected:
The multi-threaded heal should finish faster (a timing sketch follows below).
No heal should fail.
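
A hedged timing sketch for test 2, using the cluster.shd-max-threads name
from the plan above (upstream, the disperse-specific option added by this
patch is disperse.shd-max-threads; treat the exact option names as
release-dependent and the volume name as hypothetical):

    # On setup B only: raise the self-heal thread count; setup A keeps the default.
    gluster volume set testvol cluster.shd-max-threads 16

    # Disable the self-heal daemon while the test files are created
    # (the plan also disables client-side data/metadata/entry heal).
    gluster volume set testvol self-heal-daemon off

    # ... kill a brick, create identical file sets on both setups ...

    # Re-enable healing, restart the brick, and time the heal to completion.
    gluster volume set testvol self-heal-daemon on
    gluster volume start testvol force
    time bash -c '
      while gluster volume heal testvol info | grep -q "Number of entries: [1-9]"; do
        sleep 10
      done'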

Comment 6 Nag Pavan Chilakam 2016-11-15 11:28:16 UTC
Verified mtsh (multi-threaded self-heal) for an ec volume, and I see that parallel heals are happening, improving performance.

Moving to verified.

Will raise new bugs, if any, on this separately.

Verified version: 3.8.4-5

Comment 8 errata-xmlrpc 2017-03-23 05:47:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html