Bug 1652598 - Gluster not healing files
Summary: Gluster not healing files
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.12
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Ravishankar N
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-11-22 13:05 UTC by ryan
Modified: 2018-12-03 09:34 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-12-03 09:34:00 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments (Terms of Use)
Gluster heal log for volume with issue (59.89 KB, text/plain)
2018-11-22 13:05 UTC, ryan

Description ryan 2018-11-22 13:05:22 UTC
Created attachment 1507930 [details]
Gluster heal log for volume with issue

Description of problem:
Gluster not replicating/healing files

Version-Release number of selected component (if applicable):
Gluster 3.12.14


How reproducible:
Unknown

Steps to Reproduce:
1. Create a distributed-replicated (replica 2) volume
2. Add some files
3. Take one node offline
4. Add more files to the volume
5. Bring the offline node back online
6. Observe that self-heal does not copy the new files to it (see the sketch below)
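
For reference, a minimal sketch of these steps using the standard gluster CLI; the hostnames (node1, node2), volume name (testvol), and brick paths are placeholders, not taken from this report:

# 1-2. Create and mount a 2x2 distributed-replicated volume, add some files
gluster volume create testvol replica 2 \
    node1:/bricks/b1 node2:/bricks/b1 \
    node1:/bricks/b2 node2:/bricks/b2
gluster volume start testvol
mount -t glusterfs node1:/testvol /mnt/testvol
cp -r /some/data/* /mnt/testvol/
# 3-4. Stop all gluster processes on node2, then write more files via the mount
# 5-6. Restart gluster on node2; self-heal should copy the new files onto it
gluster volume heal testvol info    # lists entries still pending heal, per brick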

Actual results:
Self-heal daemon does not copy files to the other node; instead the glfsheal log is full of messages like:
[2018-11-22 11:38:50.298813] W [dict.c:656:dict_ref] (-->/usr/lib64/glusterfs/3.12.14/xlator/cluster/replicate.so(+0x62423) [0x7f2cfec9a423] -->/lib64/libglusterfs.so.0(syncop_getxattr_cbk+0x34) [0x7f2d135398d4] -->/lib64/libglusterfs.so.0(dict_ref+0xbd) [0x7f2d134f7c7d] ) 0-dict: dict is NULL [Invalid argument]

Expected results:
Self-heal copies the files back to the node that was offline

Additional info:
heal log attached
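
For reference, the pending-heal state and the self-heal daemon's health can be inspected with standard gluster commands (the volume name is a placeholder):

gluster volume heal testvol info                   # entries awaiting heal, per brick
gluster volume heal testvol statistics heal-count  # count of pending entries per brick
gluster volume status testvol                      # shows whether each Self-heal Daemon is online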

Comment 1 ryan 2018-11-29 09:59:35 UTC
We found that the gluster SHD daemons had died, potentially due to XFS filesystem issues.

We found that the only way to restart these SHDs was to stop the volume and start it again. Is there a better way of doing this?
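
(For reference, whether the self-heal daemons are alive can be checked without restarting anything; the volume name is a placeholder:)

gluster volume status testvol   # 'Self-heal Daemon' rows report Online Y/N and a PID
pgrep -af glustershd            # the shd runs as a glusterfs process with 'glustershd' in its volfile-id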

Comment 2 Ravishankar N 2018-12-03 04:16:18 UTC
(In reply to ryan from comment #1)
> We found that the gluster SHD daemons had died, potentially due to XFS
> filesystem issues.
> 

Okay, can the bug be closed?

> We found that the only way to restart these SHDs was to stop the volume and
> start it again. Is there a better way of doing this?

You can do `gluster volume start <volname> force`. This will restart the shd without affecting the running brick processes.
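
For reference, a sketch of that sequence (the volume name is a placeholder):

gluster volume start testvol force   # respawns dead shd/brick daemons; already-running bricks are untouched
gluster volume status testvol        # confirm the Self-heal Daemon shows online again
gluster volume heal testvol          # optionally trigger an index heal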

Comment 3 ryan 2018-12-03 09:34:00 UTC
Hi Ravishankar,

Thanks for the info; I'll try that next time.
Yes, ticket can be closed.

Many thanks,
Ryan

