
Bug 1652598

Summary: Gluster not healing files (
Product: [Community] GlusterFS Reporter: ryan
Component: replicate Assignee: Ravishankar N <ravishankar>
Status: CLOSED NOTABUG QA Contact:
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.12 CC: bugs, dominic, ravishankar, ryan
Target Milestone: --- Keywords: Triaged
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Last Closed: 2018-12-03 09:34:00 UTC Type: Bug
Attachments:
Description: Gluster heal log for volume with issue (Flags: none)

Description ryan 2018-11-22 13:05:22 UTC
Created attachment 1507930
Gluster heal log for volume with issue

Description of problem:
Gluster not replicating/healing files

Version-Release number of selected component (if applicable):
Gluster 3.12.14


How reproducible:
Unknown

Steps to Reproduce:
1. Create a distributed-replicated (replica 2) volume
2. Add some files
3. Take one node offline
4. Add more files to the volume
5. Bring the offline node back online
6. Self-heal doesn't work
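
The CLI side of the steps above could be sketched roughly as follows. This is only an illustration, not the reporter's actual setup: the volume name, host names, and brick paths are hypothetical placeholders, and the `run` helper prints commands unless DRY_RUN=0 is set. Depending on the Gluster version, creating a replica 2 volume may also ask for confirmation.

```shell
#!/bin/sh
# Hypothetical placeholders; none of these names come from the bug report.
VOL="${VOL:-testvol}"
NODE1="${NODE1:-node1}"
NODE2="${NODE2:-node2}"

# Print the command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: a 2x2 distributed-replicated volume (four bricks, replica 2).
create_volume() {
    run gluster volume create "$VOL" replica 2 \
        "$NODE1:/bricks/b1" "$NODE2:/bricks/b1" \
        "$NODE1:/bricks/b2" "$NODE2:/bricks/b2"
    run gluster volume start "$VOL"
}

# Step 6: list the entries still pending heal after the node is back online.
check_heal() {
    run gluster volume heal "$VOL" info
}
```

Steps 2 to 5 (writing files through a client mount and taking a node down and up) depend on the environment and are left out of the sketch.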

Actual results:
The self-heal daemon does not copy files to the other node; instead, the glfsheal log is full of messages like:
[2018-11-22 11:38:50.298813] W [dict.c:656:dict_ref] (-->/usr/lib64/glusterfs/3.12.14/xlator/cluster/replicate.so(+0x62423) [0x7f2cfec9a423] -->/lib64/libglusterfs.so.0(syncop_getxattr_cbk+0x34) [0x7f2d135398d4] -->/lib64/libglusterfs.so.0(dict_ref+0xbd) [0x7f2d134f7c7d] ) 0-dict: dict is NULL [Invalid argument]

Expected results:
Self-heal copies the files to the node that was offline

Additional info:
heal log attached

Comment 1 ryan 2018-11-29 09:59:35 UTC
We found that the gluster SHD daemons had died, potentially due to XFS filesystem issues.

We found that the only way to restart these SHDs was to stop the volume and start it again. Is there a better way of doing this?

Comment 2 Ravishankar N 2018-12-03 04:16:18 UTC
(In reply to ryan from comment #1)
> We found that the gluster SHD daemons had died, potentially due to XFS
> filesystem issues.
> 

Okay, can the bug be closed?

> We found that the only way to restart these SHDs was to stop the volume and
> start it again. Is there a better way of doing this?

You can do 'gluster volume start <volname> force'. This will restart the shd without affecting the running brick processes.
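
As a rough sketch of how that restart might be scripted and then checked: only the `start ... force` command comes from this comment. The follow-up status and heal commands, the volume name, and the `run` helper are assumptions added for illustration. Commands are printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical volume name; substitute the affected volume.
VOL="${VOL:-testvol}"

# Print the command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

restart_shd() {
    # Restart missing daemons (including shd) without stopping the bricks.
    run gluster volume start "$VOL" force
    # Assumed follow-up: confirm the self-heal daemon is listed as online.
    run gluster volume status "$VOL" shd
    # Assumed follow-up: trigger a heal, then list entries still pending.
    run gluster volume heal "$VOL"
    run gluster volume heal "$VOL" info
}
```

Calling `restart_shd` with DRY_RUN=0 on one of the cluster nodes would run the four commands in order.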

Comment 3 ryan 2018-12-03 09:34:00 UTC
Hi Ravishankar,

Thanks for the info, I'll try that next time.
Yes, ticket can be closed.

Many thanks,
Ryan