1652598 – Gluster not healing files (

Summary: Gluster not healing files (

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	replicate
Sub Component:
Version:	3.12
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Assignee:	Ravishankar N
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-11-22 13:05 UTC by ryan
Modified:	2018-12-03 09:34 UTC
CC List:	4 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2018-12-03 09:34:00 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Attachments
Gluster heal log for volume with issue (59.89 KB, text/plain) 2018-11-22 13:05 UTC, ryan

Description ryan 2018-11-22 13:05:22 UTC

Created attachment 1507930
Gluster heal log for volume with issue

Description of problem:
Gluster not replicating/healing files

Version-Release number of selected component (if applicable):
Gluster 3.12.14


How reproducible:
Unknown

Steps to Reproduce:
1. Create a distributed-replicated (replica 2) volume
2. Add some files
3. Take one node offline
4. Add more files to the volume
5. Bring the offline node back online
6. Self-heal doesn't work

Actual results:
The self-heal daemon does not copy files to the other node; instead, the glfsheal log is full of messages like:
[2018-11-22 11:38:50.298813] W [dict.c:656:dict_ref] (-->/usr/lib64/glusterfs/3.12.14/xlator/cluster/replicate.so(+0x62423) [0x7f2cfec9a423] -->/lib64/libglusterfs.so.0(syncop_getxattr_cbk+0x34) [0x7f2d135398d4] -->/lib64/libglusterfs.so.0(dict_ref+0xbd) [0x7f2d134f7c7d] ) 0-dict: dict is NULL [Invalid argument]
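To gauge how often this warning recurs in a log, the entries can be tallied by severity and source location. The following is a minimal sketch in Python, assuming every entry follows the layout of the sample line above (timestamp, severity letter, then file:line:function in brackets); the name `count_entries` is hypothetical and not part of Gluster.

```python
import re
from collections import Counter

# Matches the start of an entry laid out like the sample above:
# [timestamp] LEVEL [file:line:function] ...
LOG_ENTRY = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<src>[^\]:]+):(?P<line>\d+):(?P<func>[^\]]+)\]"
)

def count_entries(lines):
    """Count log entries per (severity, source file, function)."""
    counts = Counter()
    for text in lines:
        match = LOG_ENTRY.match(text)
        if match:  # lines that do not fit the assumed layout are skipped
            counts[(match["level"], match["src"], match["func"])] += 1
    return counts

# The sample entry quoted in this report.
sample = (
    "[2018-11-22 11:38:50.298813] W [dict.c:656:dict_ref] "
    "(-->/usr/lib64/glusterfs/3.12.14/xlator/cluster/replicate.so(+0x62423) "
    "[0x7f2cfec9a423] -->/lib64/libglusterfs.so.0(syncop_getxattr_cbk+0x34) "
    "[0x7f2d135398d4] -->/lib64/libglusterfs.so.0(dict_ref+0xbd) "
    "[0x7f2d134f7c7d] ) 0-dict: dict is NULL [Invalid argument]"
)

counts = count_entries([sample])
```

Passing an open log file instead of the one-element list would tally the whole log, since the function only iterates over lines.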

Expected results:
Self-heal copies the files to the node that was offline

Additional info:
heal log attached

Comment 1 ryan 2018-11-29 09:59:35 UTC

We found that the Gluster self-heal daemons (SHDs) had died, potentially due to XFS filesystem issues.

The only way we found to restart the SHDs was to stop the volume and start it again. Is there a better way of doing this?

Comment 2 Ravishankar N 2018-12-03 04:16:18 UTC

(In reply to ryan from comment #1)
> We found that the gluster SHD daemons had died, potentially due to XFS
> filesystem issues.
> 

Okay, can the bug be closed?

> We found that the only way to restart these SHDs was to stop the volume and
> start it again. Is there a better way of doing this?

You can run 'gluster volume start <volname> force'. This restarts the shd without affecting the running brick processes.
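
For anyone scripting this, the command above could be wrapped as follows. This is only a sketch in Python: the helper names are hypothetical, the follow-up 'gluster volume heal <volname> info' call is an added assumption about what one might want to check afterwards, and the script must run on a node where the gluster CLI is available.

```python
import subprocess

def shd_restart_command(volname):
    """Build the force-start call suggested in this comment."""
    if not volname or volname.startswith("-"):
        raise ValueError("volume name must be non-empty and not an option")
    return ["gluster", "volume", "start", volname, "force"]

def heal_info_command(volname):
    """Build the call that lists entries still pending heal (an added step)."""
    return ["gluster", "volume", "heal", volname, "info"]

def restart_shd(volname, run=subprocess.run):
    """Force-start the volume, then return the heal info output as text.

    `run` is injectable so the function can be exercised without a cluster.
    """
    run(shd_restart_command(volname), check=True)
    result = run(
        heal_info_command(volname),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Calling `restart_shd("myvol")`, where `myvol` is a placeholder volume name, would raise `subprocess.CalledProcessError` if either command exits with a non-zero status.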

Comment 3 ryan 2018-12-03 09:34:00 UTC

Hi Ravishankar,

Thanks for the info, I'll try that next time.
Yes, ticket can be closed.

Many thanks,
Ryan
