Bug 1652598
| Summary: | Gluster not healing files ( | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | ryan | ||||
| Component: | replicate | Assignee: | Ravishankar N <ravishankar> | ||||
| Status: | CLOSED NOTABUG | QA Contact: | |||||
| Severity: | high | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 3.12 | CC: | bugs, dominic, ravishankar, ryan | ||||
| Target Milestone: | --- | Keywords: | Triaged | ||||
| Target Release: | --- | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2018-12-03 09:34:00 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
We found that the gluster SHD daemons had died, potentially due to XFS filesystem issues.
We found that the only way to restart these SHDs was to stop the volume and start it again. Is there a better way of doing this?

(In reply to ryan from comment #1)
> We found that the gluster SHD daemons had died, potentially due to XFS
> filesystem issues.

Okay, can the bug be closed?

> We found that the only way to restart these SHDs was to stop the volume and
> start it again. Is there a better way of doing this?

You can do `gluster volume start <volname> force`. This will restart the shd without affecting the running brick processes.

Hi Ravishankar,
Thanks for the info, I'll try that next time.
Yes, ticket can be closed.

Many thanks,
Ryan
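The suggestion above can be wrapped in a small helper that checks the self-heal daemon, force-starts the volume, and then lists pending heals. This is a minimal sketch, not an official procedure: the function name `restart_shd` and the `GLUSTER` override variable are illustrative and not part of GlusterFS. It assumes the `gluster` CLI is on the PATH and the script runs on a node in the trusted pool.

```shell
#!/bin/sh
# Sketch: restart the self-heal daemon (shd) for one volume without
# stopping the volume. GLUSTER and restart_shd are illustrative names,
# not part of GlusterFS.
GLUSTER="${GLUSTER:-gluster}"

restart_shd() {
    vol="$1"
    if [ -z "$vol" ]; then
        echo "usage: restart_shd <volname>" >&2
        return 2
    fi

    # Show whether the self-heal daemon is currently online.
    $GLUSTER volume status "$vol" shd

    # Respawn any missing daemons, including shd; brick processes
    # that are already running are left alone.
    $GLUSTER volume start "$vol" force

    # List the entries that still need healing.
    $GLUSTER volume heal "$vol" info
}
```

Running `restart_shd <volname>` once shows the daemon state before the restart and the heal backlog after it. Setting `GLUSTER=echo` prints the three commands without executing them, which is a cheap way to review them first.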
Created attachment 1507930
Gluster heal log for volume with issue

Description of problem:
Gluster not replicating/healing files

Version-Release number of selected component (if applicable):
Gluster 3.12.14

How reproducible:
Unknown

Steps to Reproduce:
1. Create distributed-replicated (2 replica) volume
2. Add some files
3. Take one node offline
4. Add more files to volume
5. Bring other node back online
6. Self-heal doesn't work

Actual results:
Self-heal daemon does not copy file to other node, instead glfsheal log is full of messages like:

[2018-11-22 11:38:50.298813] W [dict.c:656:dict_ref] (-->/usr/lib64/glusterfs/3.12.14/xlator/cluster/replicate.so(+0x62423) [0x7f2cfec9a423] -->/lib64/libglusterfs.so.0(syncop_getxattr_cbk+0x34) [0x7f2d135398d4] -->/lib64/libglusterfs.so.0(dict_ref+0xbd) [0x7f2d134f7c7d] ) 0-dict: dict is NULL [Invalid argument]

Expected results:
Self-heal copies files back to failed volume

Additional info:
heal log attached