Bug 1234884

Summary: Selfheal on a volume stops at a particular point and does not resume for a long time
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Apeksha <akhakhar>
Component: replicate
Assignee: Ravishankar N <ravishankar>
Status: CLOSED ERRATA
QA Contact: Vijay Avuthu <vavuthu>
Severity: high
Docs Contact:
Priority: high
Version: rhgs-3.1
CC: nchilaka, ravishankar, rhinduja, rhs-bugs, sheggodu, ssampat
Target Milestone: ---
Keywords: ZStream
Target Release: RHGS 3.4.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: rebase
Fixed In Version: glusterfs-3.12.2-1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-09-04 06:26:56 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1503134

Attachments:
  sosreports 1 (flags: none)
  sosreports 2 (flags: none)

Description Apeksha 2015-06-23 12:51:08 UTC
Description of problem:
Selfheal on a volume stops at a particular point and does not resume for a long time

Version-Release number of selected component (if applicable):
glusterfs-3.7.1-3.el6rhs.x86_64
nfs-ganesha-2.2.0-3.el6rhs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a 1x2 dist-rep volume and mount it over nfs-ganesha with vers=3
2. Create directories and files
3. Bring down 1 brick of the replica pair
4. Rename all the files and directories
5. Force start the volume
6. The self-heal process starts and then appears to hang (see the command sketch after these steps)
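
A minimal command sketch of these steps, reusing the testvol name, the nfs1/nfs2 hosts and the /rhs/brick1/brick1/testvol_brick* paths that appear in the heal info output below; the ganesha VIP and the brick PID are placeholders:

# step 1: create and start a 1x2 replicate volume, then mount it over NFSv3 via ganesha
gluster volume create testvol replica 2 \
    nfs1:/rhs/brick1/brick1/testvol_brick0 \
    nfs2:/rhs/brick1/brick1/testvol_brick1
gluster volume start testvol
mount -t nfs -o vers=3 <ganesha-vip>:/testvol /mnt/testvol

# step 3: bring down one brick of the replica pair (kill its PID from volume status)
gluster volume status testvol        # note the PID of one brick
kill -9 <brick-pid>

# step 5: force start so the downed brick comes back and self-heal is triggered
gluster volume start testvol force
gluster volume heal testvol info     # entry counts should drain to 0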

Actual results: The number of entries in heal info drops to a certain value and then stays there for a long time.

Expected results: The number of entries in self-heal info must eventually come down to 0.


Additional info:
[root@nfs2 ~]# gluster v heal testvol info
Brick nfs1:/rhs/brick1/brick1/testvol_brick0/
/x1/b1 
/x1/b2 
/x1/b3 
/x1/b4 
'
'
'
/x15/b19 
/x15/b20 
Number of entries: 300

Brick nfs2:/rhs/brick1/brick1/testvol_brick1/
Number of entries: 0
 
Self-heal eventually completes after a few hours
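
As a convenience (not part of the original report), a small polling loop can show whether the entry count is genuinely stuck or just draining slowly; it assumes the testvol name used above:

while true; do
    date
    gluster volume heal testvol info | grep 'Number of entries'
    sleep 60
done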

Comment 2 Shruti Sampat 2015-06-23 12:56:30 UTC
Created attachment 1042308 [details]
sosreports 1

Comment 3 Shruti Sampat 2015-06-23 12:57:50 UTC
Created attachment 1042309 [details]
sosreports 2

Comment 5 Apeksha 2015-06-23 13:10:46 UTC
The following scripts were used on the client.

1. To create files/directories:

for i in {1..15}; do mkdir /mnt/testvol/a$i; mkdir /mnt/testvol/x$i; for j in {1..20}; do mkdir /mnt/testvol/a$i/b$j; mkdir /mnt/testvol/x$i/y$j; for k in {1..30}; do touch /mnt/testvol/a$i/b$j/c$k; done; done; done

2. To rename files/directories:

for i in {1..15}; do for j in {1..20}; do mv /mnt/testvol/a$i/b$j /mnt/testvol/x$i/b$j; for k in {1..30}; do mv /mnt/testvol/x$i/b$j/c$k /mnt/testvol/x$i/y$j/c$k; done; done; done
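
For readability, the same two loops in multi-line form; this is functionally identical to the one-liners above and uses the same /mnt/testvol mount point:

# create 15 a$i/x$i directory pairs, 20 subdirs under each, and 30 files under each a$i/b$j
for i in {1..15}; do
    mkdir /mnt/testvol/a$i /mnt/testvol/x$i
    for j in {1..20}; do
        mkdir /mnt/testvol/a$i/b$j /mnt/testvol/x$i/y$j
        for k in {1..30}; do
            touch /mnt/testvol/a$i/b$j/c$k
        done
    done
done

# move every b$j under the matching x$i, then every file into the matching y$j
for i in {1..15}; do
    for j in {1..20}; do
        mv /mnt/testvol/a$i/b$j /mnt/testvol/x$i/b$j
        for k in {1..30}; do
            mv /mnt/testvol/x$i/b$j/c$k /mnt/testvol/x$i/y$j/c$k
        done
    done
done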

Comment 11 Vijay Avuthu 2018-03-28 14:37:53 UTC
Update:
==============

Build used: glusterfs-3.12.2-6.el7rhgs.x86_64

Verified the scenario below for both 1x2 and 2x3 volumes:

1. Create a volume and mount it
2. Create directories and files using the script below

for i in {1..15}; do mkdir /mnt/testvol/a$i; mkdir /mnt/testvol/x$i; for j in {1..20}; do mkdir /mnt/testvol/a$i/b$j; mkdir /mnt/testvol/x$i/y$j; for k in {1..30}; do touch /mnt/testvol/a$i/b$j/c$k; done; done; done


3. Bring down 1 brick of the replica pair (for 2x3, bring down 1 brick in each replica set; see the sketch after step 5)
4. Rename all the files and directories using the script below

for i in {1..15}; do for j in {1..20}; do mv /mnt/testvol/a$i/b$j /mnt/testvol/x$i/b$j; for k in {1..30}; do mv /mnt/testvol/x$i/b$j/c$k /mnt/testvol/x$i/y$j/c$k; done; done; done

5. Force start the volume
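
A short sketch of steps 3 and 5 for the 2x3 (distributed-replicate) case; the volume name and brick PIDs are placeholders, and which brick is taken down in each replica set is arbitrary:

# kill one brick process per 3-way replica set (PIDs come from gluster volume status)
gluster volume status <volname>        # <volname> is a placeholder for the volume under test
kill -9 <pid-of-a-brick-in-replica-set-1> <pid-of-a-brick-in-replica-set-2>

# after the renames, bring the bricks back and let self-heal drain to 0 entries
gluster volume start <volname> force
gluster volume heal <volname> info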

Healing completed without any issues:

[root@dhcp35-163 ~]# gluster vol heal 23 info 
Brick 10.70.35.61:/bricks/brick0/testvol_distributed-replicated_brick0
Status: Connected
Number of entries: 0

Brick 10.70.35.174:/bricks/brick0/testvol_distributed-replicated_brick1
Status: Connected
Number of entries: 0

Brick 10.70.35.17:/bricks/brick0/testvol_distributed-replicated_brick2
Status: Connected
Number of entries: 0

Brick 10.70.35.163:/bricks/brick0/testvol_distributed-replicated_brick3
Status: Connected
Number of entries: 0

Brick 10.70.35.136:/bricks/brick0/testvol_distributed-replicated_brick4
Status: Connected
Number of entries: 0

Brick 10.70.35.214:/bricks/brick0/testvol_distributed-replicated_brick5
Status: Connected
Number of entries: 0

[root@dhcp35-163 ~]# 

> Also verified with the steps provided in comment 7

Changing status to Verified.

Comment 13 errata-xmlrpc 2018-09-04 06:26:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607