Bug 1234884 - Selfheal on a volume stops at a particular point and does not resume for a long time
Summary: Selfheal on a volume stops at a particular point and does not resume for a long time
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: replicate
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHGS 3.4.0
Assignee: Ravishankar N
QA Contact: Vijay Avuthu
URL:
Whiteboard: rebase
Depends On:
Blocks: 1503134
 
Reported: 2015-06-23 12:51 UTC by Apeksha
Modified: 2018-09-17 14:09 UTC
CC List: 6 users

Fixed In Version: glusterfs-3.12.2-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-09-04 06:26:56 UTC
Embargoed:


Attachments
sosreports 1 (11.86 MB, application/x-xz), attached 2015-06-23 12:56 UTC by Shruti Sampat
sosreports 2 (10.10 MB, application/x-xz), attached 2015-06-23 12:57 UTC by Shruti Sampat


Links
Red Hat Product Errata RHSA-2018:2607, last updated 2018-09-04 06:28:39 UTC

Description Apeksha 2015-06-23 12:51:08 UTC
Description of problem:
Selfheal on a volume stops at a particular point and does not resume for a long time

Version-Release number of selected component (if applicable):
glusterfs-3.7.1-3.el6rhs.x86_64
nfs-ganesha-2.2.0-3.el6rhs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. create a 1X2 dist-rep volume and mount it using nfs-ganesha with vers=3 (a command sketch follows after step 6)
2. create directories and files
3. bring down 1 brick of the replica pair
4. rename all the files and directories
5. force start the volume
6. Self-heal process starts and then seems to hang
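
For reference, a minimal command sketch of steps 1, 3, 5 and 6 above (not part of the original report; the ganesha virtual IP and export path are assumptions, brick paths are taken from the heal info output below, and steps 2 and 4 use the scripts shown in comment 5):

# 1. Create and start a 1X2 replicate volume
gluster volume create testvol replica 2 nfs1:/rhs/brick1/brick1/testvol_brick0 nfs2:/rhs/brick1/brick1/testvol_brick1
gluster volume start testvol
# 1. Mount it over NFSv3 via nfs-ganesha (VIP and export path are assumptions)
mount -t nfs -o vers=3 <ganesha-vip>:/testvol /mnt/testvol
# 3. Bring down one brick of the replica pair: find its PID via volume status and kill it
gluster volume status testvol
kill <pid-of-testvol_brick1-process>
# 5. Force start the volume so the killed brick comes back and self-heal starts
gluster volume start testvol force
# 6. Check self-heal progress
gluster volume heal testvol info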

Actual results: The number of entries in self-heal info reaches a specific value and then stays there

Expected results: The number of entries in self-heal info must eventually drop to 0


Additional info:
[root@nfs2 ~]# gluster v heal testvol info
Brick nfs1:/rhs/brick1/brick1/testvol_brick0/
/x1/b1 
/x1/b2 
/x1/b3 
/x1/b4 
...
/x15/b19 
/x15/b20 
Number of entries: 300

Brick nfs2:/rhs/brick1/brick1/testvol_brick1/
Number of entries: 0
 
Self-heal eventually completes after a few hours
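
One way to quantify "stops at a particular point" (a monitoring sketch, not part of the original report) is to sample the pending-entry count periodically; during the hang the count stays parked at the same value (300 above) instead of dropping toward 0:

# Print the pending-entry count for testvol once a minute
while true; do date; gluster volume heal testvol info | grep 'Number of entries'; sleep 60; done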

Comment 2 Shruti Sampat 2015-06-23 12:56:30 UTC
Created attachment 1042308 [details]
sosreports 1

Comment 3 Shruti Sampat 2015-06-23 12:57:50 UTC
Created attachment 1042309 [details]
sosreports 2

Comment 5 Apeksha 2015-06-23 13:10:46 UTC
The following scripts were used on the client:

1. to create files/directories:

for i in {1..15}; do mkdir /mnt/testvol/a$i; mkdir /mnt/testvol/x$i; for j in {1..20}; do mkdir /mnt/testvol/a$i/b$j; mkdir /mnt/testvol/x$i/y$j; for k in {1..30}; do touch /mnt/testvol/a$i/b$j/c$k; done; done; done

2. to rename files/directories:

for i in {1..15}; do for j in {1..20}; do mv /mnt/testvol/a$i/b$j /mnt/testvol/x$i/b$j; for k in {1..30}; do mv /mnt/testvol/x$i/b$j/c$k /mnt/testvol/x$i/y$j/c$k; done; done; done
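
For readability, the same two loops with line breaks (functionally identical to the one-liners above); they create 15 x 20 x 30 = 9000 files under the a* tree plus empty y* directories under x*, then rename every b* directory and every file into the x* tree:

# create
for i in {1..15}; do
    mkdir /mnt/testvol/a$i /mnt/testvol/x$i
    for j in {1..20}; do
        mkdir /mnt/testvol/a$i/b$j /mnt/testvol/x$i/y$j
        for k in {1..30}; do
            touch /mnt/testvol/a$i/b$j/c$k
        done
    done
done

# rename
for i in {1..15}; do
    for j in {1..20}; do
        mv /mnt/testvol/a$i/b$j /mnt/testvol/x$i/b$j
        for k in {1..30}; do
            mv /mnt/testvol/x$i/b$j/c$k /mnt/testvol/x$i/y$j/c$k
        done
    done
done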

Comment 11 Vijay Avuthu 2018-03-28 14:37:53 UTC
Update:
==============

Build used : glusterfs-3.12.2-6.el7rhgs.x86_64

Verified the scenario below for both 1 * 2 and 2 * 3 volumes:

1. create a volume and mount it
2. create directories and files using the script below

for i in {1..15}; do mkdir /mnt/testvol/a$i; mkdir /mnt/testvol/x$i; for j in {1..20}; do mkdir /mnt/testvol/a$i/b$j; mkdir /mnt/testvol/x$i/y$j; for k in {1..30}; do touch /mnt/testvol/a$i/b$j/c$k; done; done; done


3. bring down 1 brick of the replica pair (for 2 * 3, bring down 1 brick of each replica set; a command sketch follows after step 5)
4. rename all the files and directories using the script below

for i in {1..15}; do for j in {1..20}; do mv /mnt/testvol/a$i/b$j /mnt/testvol/x$i/b$j; for k in {1..30}; do mv /mnt/testvol/x$i/b$j/c$k /mnt/testvol/x$i/y$j/c$k; done; done; done

5. force start the volume
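
A command sketch for steps 3 and 5 in the 2 * 3 case (the volume name and PIDs are placeholders; consecutive bricks form a replica set, so bricks 0-2 and 3-5 are the two sets in the heal info output below):

# 3. Find the brick process PIDs and kill one brick from each replica set
gluster volume status <volname>
kill <pid-of-brick0>   # e.g. 10.70.35.61:/bricks/brick0/testvol_distributed-replicated_brick0
kill <pid-of-brick3>   # e.g. 10.70.35.163:/bricks/brick0/testvol_distributed-replicated_brick3
# 5. After the renames, force start the volume to restart the killed bricks
gluster volume start <volname> force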

Healing is completed without any issues

[root@dhcp35-163 ~]# gluster vol heal 23 info 
Brick 10.70.35.61:/bricks/brick0/testvol_distributed-replicated_brick0
Status: Connected
Number of entries: 0

Brick 10.70.35.174:/bricks/brick0/testvol_distributed-replicated_brick1
Status: Connected
Number of entries: 0

Brick 10.70.35.17:/bricks/brick0/testvol_distributed-replicated_brick2
Status: Connected
Number of entries: 0

Brick 10.70.35.163:/bricks/brick0/testvol_distributed-replicated_brick3
Status: Connected
Number of entries: 0

Brick 10.70.35.136:/bricks/brick0/testvol_distributed-replicated_brick4
Status: Connected
Number of entries: 0

Brick 10.70.35.214:/bricks/brick0/testvol_distributed-replicated_brick5
Status: Connected
Number of entries: 0

[root@dhcp35-163 ~]# 
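
A quick aggregate check (an assumption, not part of the original verification) that every brick reports zero pending entries:

# Sum the per-brick entry counts from heal info; 0 means healing is complete
gluster volume heal <volname> info | awk '/Number of entries/ {sum += $NF} END {print sum+0}'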

> Also verified with the steps provided in comment 7

Changing status to Verified.

Comment 13 errata-xmlrpc 2018-09-04 06:26:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607

