1562744 – [EC] slow heal speed on disperse volume after brick replacement


Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	disperse
Sub Component:
Version:	rhgs-3.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	RHGS 3.4.0
Assignee:	Pranith Kumar K
QA Contact:	Nag Pavan Chilakam
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1503137

Reported:	2018-04-02 10:46 UTC by Ashish Pandey
Modified:	2018-09-04 07:31 UTC
CC List:	8 users
Fixed In Version:	glusterfs-3.12.2-8
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-09-04 06:45:40 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2018:2607	0	None	None	None	2018-09-04 06:46:54 UTC

Description Ashish Pandey 2018-04-02 10:46:02 UTC

Description of problem:
We are running a drive replacement test and have noted that the heal speed for the disperse volume is very slow.

I replaced a 4TB drive with a different drive. Almost 26 hours later, only about 453 GB of data has been reconstructed.

/dev/sdc                                     3.7T  3.7T  7.5G 100% /ws/disk1
/dev/sdd                                     3.7T  3.7T  7.5G 100% /ws/disk2
/dev/sde                                     3.7T  3.7T  7.6G 100% /ws/disk3
/dev/sdg                                     3.7T  453G  3.2T  13% /ws/disk5
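
For a sense of scale, the figures above can be turned into a rough heal rate and a projected completion time. This is only a back-of-the-envelope sketch, not a measurement from this bug: it assumes the heal rate stays constant, that the replaced brick must end up holding roughly the same 3.7T as its peers, and that the df figures are in binary units (GiB/TiB). The function names are illustrative.

```python
# Rough heal-rate estimate from the numbers quoted in the description.
# Assumptions (not from the bug report): constant heal rate, binary units,
# and a target brick size equal to the 3.7T used on the healthy bricks.

def heal_rate_gib_per_hour(healed_gib, elapsed_hours):
    """Average amount of data healed per hour."""
    return healed_gib / elapsed_hours


def hours_remaining(target_gib, healed_gib, elapsed_hours):
    """Hours left to reach target_gib if the average rate holds."""
    rate = heal_rate_gib_per_hour(healed_gib, elapsed_hours)
    return (target_gib - healed_gib) / rate


healed_gib = 453.0           # data on /ws/disk5 after the heal so far
elapsed_hours = 26.0         # time since the drive was replaced
target_gib = 3.7 * 1024      # 3.7T used on each healthy brick

rate = heal_rate_gib_per_hour(healed_gib, elapsed_hours)
rate_mib_per_sec = rate * 1024 / 3600
remaining = hours_remaining(target_gib, healed_gib, elapsed_hours)

print(f"average heal rate: {rate:.1f} GiB/h ({rate_mib_per_sec:.1f} MiB/s)")
print(f"projected time to finish: {remaining:.0f} h ({remaining / 24:.1f} days)")
```

Under these assumptions the heal averages roughly 17 GiB per hour, which would put full reconstruction of the brick at about eight more days.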


Version-Release number of selected component (if applicable): glusterfs-3.8.4-18.el7rhgs.x86_64


Additional info: Volume is a distributed-disperse 2 x (4 + 2) = 12 volume, with the following options reconfigured:

Options Reconfigured:
cluster.shd-wait-qlength: 65536
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
nfs.export-dirs: off
nfs.export-volumes: off
performance.read-ahead: off
auth.allow: xxxxxxxxxxxxxxx
cluster.shd-max-threads: 64


Profile info and df output will be uploaded to the bug shortly. The directory structure will be provided as well.

Additional info:
https://github.com/gluster/glusterfs/issues/354

Comment 7 Nag Pavan Chilakam 2018-05-22 09:40:20 UTC

I tested replace-brick with the same data set, about 176 GB of data on the brick being replaced, on a 2 x (4 + 2) volume.
On 3.3.1-async (3.8.4.54-8), the total time for replace-brick to complete (i.e. heal completion) was 6 hours.
On 3.4 (3.12.2.9/11), it was 4.5 hours.

That is about a 25% reduction in heal time, hence moving the bug to verified.
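
The arithmetic behind the 25% figure can be checked with a short sketch. It uses only the numbers quoted in this comment (176 GB, 6 hours, 4.5 hours); the function names are illustrative, and the per-hour rates are simple averages rather than measured throughput.

```python
# Compare heal times reported for the two builds in this comment.
# The rates are plain averages (data / time), not profiled throughput.

def percent_time_reduction(before_hours, after_hours):
    """How much shorter the heal is, as a percentage of the old time."""
    return (before_hours - after_hours) / before_hours * 100.0


def average_rate_gb_per_hour(data_gb, hours):
    """Average amount of data healed per hour."""
    return data_gb / hours


data_gb = 176.0      # data on the brick being replaced
old_hours = 6.0      # 3.3.1-async (3.8.4.54-8)
new_hours = 4.5      # 3.4 (3.12.2.9/11)

reduction = percent_time_reduction(old_hours, new_hours)
old_rate = average_rate_gb_per_hour(data_gb, old_hours)
new_rate = average_rate_gb_per_hour(data_gb, new_hours)
rate_gain = (new_rate / old_rate - 1.0) * 100.0

print(f"heal time reduced by {reduction:.0f}%")
print(f"average rate: {old_rate:.1f} -> {new_rate:.1f} GB/h (+{rate_gain:.0f}%)")
```

Note that a 25% shorter heal time corresponds to roughly a 33% higher average heal rate, since the same data is healed in three quarters of the time.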

Comment 9 errata-xmlrpc 2018-09-04 06:45:40 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607
