Created attachment 1507930 [details]
Gluster heal log for volume with issue

Description of problem:
Gluster not replicating/healing files

Version-Release number of selected component (if applicable):
Gluster 3.12.14

How reproducible:
Unknown

Steps to Reproduce:
1. Create a distributed-replicated (replica 2) volume
2. Add some files
3. Take one node offline
4. Add more files to the volume
5. Bring the offline node back online
6. Self-heal does not run

Actual results:
The self-heal daemon does not copy the files to the other node; instead the glfsheal log is full of messages like:

[2018-11-22 11:38:50.298813] W [dict.c:656:dict_ref] (-->/usr/lib64/glusterfs/3.12.14/xlator/cluster/replicate.so(+0x62423) [0x7f2cfec9a423] -->/lib64/libglusterfs.so.0(syncop_getxattr_cbk+0x34) [0x7f2d135398d4] -->/lib64/libglusterfs.so.0(dict_ref+0xbd) [0x7f2d134f7c7d] ) 0-dict: dict is NULL [Invalid argument]

Expected results:
Self-heal copies the files to the node that was offline

Additional info:
Heal log attached
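For illustration, the reproduce steps above correspond roughly to the following gluster CLI commands (hostnames, brick paths, and the volume name "testvol" are placeholders, not taken from this report):

  # Create and start a 2x2 distributed-replicated (replica 2) volume
  gluster volume create testvol replica 2 \
      node1:/bricks/b1 node2:/bricks/b1 \
      node1:/bricks/b2 node2:/bricks/b2
  gluster volume start testvol

  # Write files via a client mount, take node2 offline, write more files,
  # bring node2 back online, then check what is pending heal
  gluster volume heal testvol info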
We found that the gluster SHD daemons had died, potentially due to XFS filesystem issues.

We found that the only way to restart these SHDs was to stop the volume and start it again. Is there a better way of doing this?
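One way to check whether the self-heal daemon is still running on each node (generic gluster commands, not specific to this report; the volume name is a placeholder):

  # The "Self-heal Daemon on <host>" rows should show Online: Y
  gluster volume status testvol

  # Look for the glustershd process directly on the node
  ps aux | grep glustershd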
(In reply to ryan from comment #1)
> We found that the gluster SHD daemons had died, potentially due to XFS
> filesystem issues.

Okay, can the bug be closed?

> We found that the only way to restart these SHDs was to stop the volume and
> start it again. Is there a better way of doing this?

You can do `gluster volume start <volname> force`. This will restart the shd without affecting the running brick processes.
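For reference, the suggested sequence looks like this (volume name is a placeholder):

  # Restart any processes for the volume that are not running (e.g. the shd)
  # without disturbing brick processes that are already up
  gluster volume start testvol force

  # Verify the Self-heal Daemon rows show Online: Y, then kick off a heal
  gluster volume status testvol
  gluster volume heal testvol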
Hi Ravishankar,

Thanks for the info, I'll try that next time. Yes, the ticket can be closed.

Many thanks,
Ryan