Bug 1716792
| Summary: | heal launch failing with "Glusterd Syncop Mgmt brick op 'Heal' failed" | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Nag Pavan Chilakam <nchilaka> |
| Component: | disperse | Assignee: | Ashish Pandey <aspandey> |
| Status: | CLOSED ERRATA | QA Contact: | Nag Pavan Chilakam <nchilaka> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | rhgs-3.5 | CC: | amukherj, aspandey, rhs-bugs, rkavunga, srakonde, storage-qa-internal, vdas |
| Target Milestone: | --- | ||
| Target Release: | RHGS 3.5.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | glusterfs-6.0-7 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-10-30 12:21:50 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1696809 | ||
|
Description
Nag Pavan Chilakam
2019-06-04 06:20:29 UTC
Volume info:
===========
[root@rhs-gp-srv1 glusterfs]# gluster v info

Volume Name: cvlt-ecv
Type: Distributed-Disperse
Volume ID: c500e86b-f505-48c8-8141-cb3de5956c24
Status: Started
Snapshot Count: 0
Number of Bricks: 12 x (4 + 2) = 72
Transport-type: tcp
Bricks:
Brick1: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick2/cvlt-ecv-sv1
Brick2: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick2/cvlt-ecv-sv1
Brick3: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick2/cvlt-ecv-sv1
Brick4: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick3/cvlt-ecv-sv1
Brick5: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick3/cvlt-ecv-sv1
Brick6: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick3/cvlt-ecv-sv1
Brick7: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick4/cvlt-ecv-sv2
Brick8: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick4/cvlt-ecv-sv2
Brick9: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick4/cvlt-ecv-sv2
Brick10: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick5/cvlt-ecv-sv2
Brick11: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick5/cvlt-ecv-sv2
Brick12: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick5/cvlt-ecv-sv2
Brick13: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick6/cvlt-ecv-sv3
Brick14: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick6/cvlt-ecv-sv3
Brick15: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick6/cvlt-ecv-sv3
Brick16: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick7/cvlt-ecv-sv3
Brick17: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick7/cvlt-ecv-sv3
Brick18: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick7/cvlt-ecv-sv3
Brick19: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick8/cvlt-ecv-sv4
Brick20: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick8/cvlt-ecv-sv4
Brick21: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick8/cvlt-ecv-sv4
Brick22: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick9/cvlt-ecv-sv4
Brick23: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick9/cvlt-ecv-sv4
Brick24: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick9/cvlt-ecv-sv4
Brick25: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick10/cvlt-ecv-sv5
Brick26: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick10/cvlt-ecv-sv5
Brick27: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick10/cvlt-ecv-sv5
Brick28: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick11/cvlt-ecv-sv5
Brick29: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick11/cvlt-ecv-sv5
Brick30: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick11/cvlt-ecv-sv5
Brick31: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick12/cvlt-ecv-sv6
Brick32: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick12/cvlt-ecv-sv6
Brick33: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick12/cvlt-ecv-sv6
Brick34: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick13/cvlt-ecv-sv6
Brick35: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick13/cvlt-ecv-sv6
Brick36: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick13/cvlt-ecv-sv6
Brick37: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick14/cvlt-ecv-sv7
Brick38: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick14/cvlt-ecv-sv7
Brick39: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick14/cvlt-ecv-sv7
Brick40: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick15/cvlt-ecv-sv7
Brick41: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick15/cvlt-ecv-sv7
Brick42: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick15/cvlt-ecv-sv7
Brick43: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick16/cvlt-ecv-sv8
Brick44: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick16/cvlt-ecv-sv8
Brick45: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick16/cvlt-ecv-sv8
Brick46: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick17/cvlt-ecv-sv8
Brick47: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick17/cvlt-ecv-sv8
Brick48: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick17/cvlt-ecv-sv8
Brick49: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick18/cvlt-ecv-sv9
Brick50: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick18/cvlt-ecv-sv9
Brick51: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick18/cvlt-ecv-sv9
Brick52: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick19/cvlt-ecv-sv9
Brick53: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick19/cvlt-ecv-sv9
Brick54: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick19/cvlt-ecv-sv9
Brick55: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick20/cvlt-ecv-sv10
Brick56: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick20/cvlt-ecv-sv10
Brick57: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick20/cvlt-ecv-sv10
Brick58: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick21/cvlt-ecv-sv10
Brick59: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick21/cvlt-ecv-sv10
Brick60: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick21/cvlt-ecv-sv10
Brick61: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick22/cvlt-ecv-sv11
Brick62: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick22/cvlt-ecv-sv11
Brick63: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick22/cvlt-ecv-sv11
Brick64: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick23/cvlt-ecv-sv11
Brick65: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick23/cvlt-ecv-sv11
Brick66: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick23/cvlt-ecv-sv11
Brick67: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick24/cvlt-ecv-sv12
Brick68: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick24/cvlt-ecv-sv12
Brick69: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick24/cvlt-ecv-sv12
Brick70: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick25/cvlt-ecv-sv12
Brick71: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick25/cvlt-ecv-sv12
Brick72: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick25/cvlt-ecv-sv12
Options Reconfigured:
disperse.other-eager-lock: off
transport.address-family: inet
nfs.disable: on
cluster.brick-multiplex: enable

Volume Name: logvol
Type: Replicate
Volume ID: dd1a76f5-6d16-40bd-90c1-215cf031c2ae
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick1/logvol
Brick2: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick1/logvol
Brick3: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick1/logvol
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
cluster.brick-multiplex: enable

Logs and sosreports @ http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/bug.1716792/

Rafi, I have provided all the logs, but do let me know if you need the setup; otherwise I would like to use it for my testing.

(In reply to nchilaka from comment #0)
> Description of problem:
> ========================
> In continuation with BZ#1716279 - fuse client log says "insufficient
> available children for this request ", even though all bricks are up
> I have a file which is not getting healed at all
> https://bugzilla.redhat.com/show_bug.cgi?id=1716279#c3
>
> Even after close to a day, the healing has not completed and hence I tried
> to trigger an index heal. However on triggering an index heal I am getting
> below error message
>
> [root@rhs-gp-srv1 ~]# gluster v heal cvlt-ecv
> Launching heal operation to perform index self heal on volume cvlt-ecv has
> been unsuccessful:
> Glusterd Syncop Mgmt brick op 'Heal' failed. Please check glustershd log
> file for details.
>
> Also to add to that I don't find any new logging due to above in the shd
> logs.
>
> Note: raising a new bug to track the issue separately, however feel free to
> dup it if need be.
>
> All details are available in BZ#1716279, the only additional steps are as
> below
> >> stopped all the file creates using dd
> >> only IO running is the top command logging " started capturing top o/p every 2 minutes of each node -->gets appended to file being hosted on the ec volume, one file each for every node"
>
> I do see in shd log below Invalid argument error
> [2019-06-04 06:15:25.016138] E [MSGID: 114031]
> [client-rpc-fops_v2.c:1345:client4_0_inodelk_cbk] 2-cvlt-ecv-client-7:
> remote operation failed [Invalid argument]
>
> Will be attaching the latest sosreports and logs
>
> Version-Release number of selected component (if applicable):
> ===================================
> 6.0.3 on rhel7.7 beta
>
>
> How reproducible:
> consistent on my setup
>
> Steps:
> post BZ#1716279, trigger an index heal, and you will see below issue

One more step I forgot to mention: I also created another 1x3 volume; however, the heal was triggered long after this (after about 12 hrs).

> [root@rhs-gp-srv1 ~]# gluster v heal cvlt-ecv
> Launching heal operation to perform index self heal on volume cvlt-ecv has
> been unsuccessful:
> Glusterd Syncop Mgmt brick op 'Heal' failed. Please check glustershd log
> file for details.
>
>
> Actual results:
> ================
> 1)heal fails to trigger

For now, marking the ON_QA validation of this bug as blocked by BZ#1716279, as 1716279 is FailedQA.

I did trigger a heal and didn't see the above issue. But for now I am keeping this bug as is.

Based on https://bugzilla.redhat.com/show_bug.cgi?id=1716279#c18, and also given that I tried heal and heal info a couple of times and didn't hit a crash, I can move this bug to verified.
If I hit an issue again, I can raise a separate bug.

version: 6.0.7

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:3249
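
For anyone re-checking this on their own setup, the following is a minimal sketch of how the failure reported above could be detected in a script. It is not part of gluster: the helper name `classify_heal_output` is hypothetical, it only matches the error string quoted in this report, and the log path in the trailing comment assumes the default glusterfs log location. The sample text is copied from the report so the sketch runs without a cluster.

```shell
#!/bin/sh
# Hypothetical helper (not a gluster command): reads the text printed by
# `gluster volume heal <VOLNAME>` on stdin and prints "failed" when it
# contains the error string quoted in this bug report, otherwise "ok".
classify_heal_output() {
    if grep -q "brick op 'Heal' failed"; then
        echo "failed"
    else
        echo "ok"
    fi
}

# Sample output copied from this report, so no cluster is needed to run this.
sample="Launching heal operation to perform index self heal on volume cvlt-ecv has been unsuccessful:
Glusterd Syncop Mgmt brick op 'Heal' failed. Please check glustershd log file for details."

result=$(printf '%s\n' "$sample" | classify_heal_output)
echo "$result"

# On a real node one might instead run (log path is an assumption):
#   gluster volume heal cvlt-ecv 2>&1 | classify_heal_output
#   tail -n 50 /var/log/glusterfs/glustershd.log
```

Matching on the message text rather than the exit status is a deliberate simplification here, since the report only documents the message.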