Description of problem:
========================
In continuation of BZ#1716279 ("fuse client log says 'insufficient available children for this request', even though all bricks are up"), I have a file which is not getting healed at all: https://bugzilla.redhat.com/show_bug.cgi?id=1716279#c3

Even after close to a day the healing had not completed, so I tried to trigger an index heal. However, on triggering an index heal I get the error message below:

[root@rhs-gp-srv1 ~]# gluster v heal cvlt-ecv
Launching heal operation to perform index self heal on volume cvlt-ecv has been unsuccessful:
Glusterd Syncop Mgmt brick op 'Heal' failed. Please check glustershd log file for details.

Also, I don't find any new logging from the above failure in the shd logs.

Note: raising a new bug to track the issue separately; feel free to dup it if need be.

All details are available in BZ#1716279; the only additional steps are as below:
>> stopped all the file creates using dd
>> the only IO running is the top command logging: capturing top o/p every 2 minutes on each node, appended to a file hosted on the ec volume, one file per node

I do see the Invalid argument error below in the shd log:
[2019-06-04 06:15:25.016138] E [MSGID: 114031] [client-rpc-fops_v2.c:1345:client4_0_inodelk_cbk] 2-cvlt-ecv-client-7: remote operation failed [Invalid argument]

Will be attaching the latest sosreports and logs.

Version-Release number of selected component (if applicable):
=============================================================
6.0.3 on rhel7.7 beta

How reproducible:
consistent on my setup

Steps:
Post BZ#1716279, trigger an index heal, and you will see the issue below:

[root@rhs-gp-srv1 ~]# gluster v heal cvlt-ecv
Launching heal operation to perform index self heal on volume cvlt-ecv has been unsuccessful:
Glusterd Syncop Mgmt brick op 'Heal' failed. Please check glustershd log file for details.

Actual results:
================
1) heal fails to trigger
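For reference, a minimal sketch of commands one could use to cross-check the heal state and the shd log around such a failure (standard gluster CLI; the log path assumes the default /var/log/glusterfs location):

# list entries still pending heal on the EC volume
gluster volume heal cvlt-ecv info

# per-brick summary counts of pending entries
gluster volume heal cvlt-ecv info summary

# confirm all bricks and the self-heal daemon are online
gluster volume status cvlt-ecv

# look for inodelk failures around the heal attempt in the shd log
grep -E 'inodelk|MSGID: 114031' /var/log/glusterfs/glustershd.log | tail -n 20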
Volume info:
===========
[root@rhs-gp-srv1 glusterfs]# gluster v info

Volume Name: cvlt-ecv
Type: Distributed-Disperse
Volume ID: c500e86b-f505-48c8-8141-cb3de5956c24
Status: Started
Snapshot Count: 0
Number of Bricks: 12 x (4 + 2) = 72
Transport-type: tcp
Bricks:
Brick1: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick2/cvlt-ecv-sv1
Brick2: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick2/cvlt-ecv-sv1
Brick3: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick2/cvlt-ecv-sv1
Brick4: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick3/cvlt-ecv-sv1
Brick5: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick3/cvlt-ecv-sv1
Brick6: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick3/cvlt-ecv-sv1
Brick7: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick4/cvlt-ecv-sv2
Brick8: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick4/cvlt-ecv-sv2
Brick9: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick4/cvlt-ecv-sv2
Brick10: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick5/cvlt-ecv-sv2
Brick11: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick5/cvlt-ecv-sv2
Brick12: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick5/cvlt-ecv-sv2
Brick13: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick6/cvlt-ecv-sv3
Brick14: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick6/cvlt-ecv-sv3
Brick15: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick6/cvlt-ecv-sv3
Brick16: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick7/cvlt-ecv-sv3
Brick17: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick7/cvlt-ecv-sv3
Brick18: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick7/cvlt-ecv-sv3
Brick19: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick8/cvlt-ecv-sv4
Brick20: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick8/cvlt-ecv-sv4
Brick21: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick8/cvlt-ecv-sv4
Brick22: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick9/cvlt-ecv-sv4
Brick23: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick9/cvlt-ecv-sv4
Brick24: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick9/cvlt-ecv-sv4
Brick25: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick10/cvlt-ecv-sv5
Brick26: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick10/cvlt-ecv-sv5
Brick27: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick10/cvlt-ecv-sv5
Brick28: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick11/cvlt-ecv-sv5
Brick29: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick11/cvlt-ecv-sv5
Brick30: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick11/cvlt-ecv-sv5
Brick31: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick12/cvlt-ecv-sv6
Brick32: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick12/cvlt-ecv-sv6
Brick33: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick12/cvlt-ecv-sv6
Brick34: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick13/cvlt-ecv-sv6
Brick35: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick13/cvlt-ecv-sv6
Brick36: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick13/cvlt-ecv-sv6
Brick37: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick14/cvlt-ecv-sv7
Brick38: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick14/cvlt-ecv-sv7
Brick39: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick14/cvlt-ecv-sv7
Brick40: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick15/cvlt-ecv-sv7
Brick41: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick15/cvlt-ecv-sv7
Brick42: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick15/cvlt-ecv-sv7
Brick43: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick16/cvlt-ecv-sv8
Brick44: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick16/cvlt-ecv-sv8
Brick45: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick16/cvlt-ecv-sv8
Brick46: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick17/cvlt-ecv-sv8
Brick47: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick17/cvlt-ecv-sv8
Brick48: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick17/cvlt-ecv-sv8
Brick49: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick18/cvlt-ecv-sv9
Brick50: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick18/cvlt-ecv-sv9
Brick51: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick18/cvlt-ecv-sv9
Brick52: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick19/cvlt-ecv-sv9
Brick53: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick19/cvlt-ecv-sv9
Brick54: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick19/cvlt-ecv-sv9
Brick55: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick20/cvlt-ecv-sv10
Brick56: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick20/cvlt-ecv-sv10
Brick57: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick20/cvlt-ecv-sv10
Brick58: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick21/cvlt-ecv-sv10
Brick59: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick21/cvlt-ecv-sv10
Brick60: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick21/cvlt-ecv-sv10
Brick61: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick22/cvlt-ecv-sv11
Brick62: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick22/cvlt-ecv-sv11
Brick63: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick22/cvlt-ecv-sv11
Brick64: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick23/cvlt-ecv-sv11
Brick65: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick23/cvlt-ecv-sv11
Brick66: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick23/cvlt-ecv-sv11
Brick67: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick24/cvlt-ecv-sv12
Brick68: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick24/cvlt-ecv-sv12
Brick69: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick24/cvlt-ecv-sv12
Brick70: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick25/cvlt-ecv-sv12
Brick71: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick25/cvlt-ecv-sv12
Brick72: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick25/cvlt-ecv-sv12
Options Reconfigured:
disperse.other-eager-lock: off
transport.address-family: inet
nfs.disable: on
cluster.brick-multiplex: enable

Volume Name: logvol
Type: Replicate
Volume ID: dd1a76f5-6d16-40bd-90c1-215cf031c2ae
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick1/logvol
Brick2: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick1/logvol
Brick3: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick1/logvol
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
cluster.brick-multiplex: enable
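Since the core symptom is a single file that never heals, one way to confirm which fragments are inconsistent is to compare the EC xattrs of that file across its bricks. A hedged sketch (getfattr is standard; the file path under the brick root is hypothetical and must be run on each brick host):

# dump the trusted.ec.* xattrs of the unhealed file on a brick;
# mismatching trusted.ec.version or a non-zero trusted.ec.dirty
# across bricks indicates fragments still pending heal
getfattr -d -m . -e hex /gluster/brick2/cvlt-ecv-sv1/path/to/unhealed-file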
logs and sosreports @ http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/bug.1716792/
Rafi, I have provided all the logs, but do let me know if you need the setup as well; otherwise I would like to consume it for my own testing.
(In reply to nchilaka from comment #0)
> Steps:
> post BZ#1716279, trigger an index heal, and you will see below issue

One more step I forgot to mention: I had also created another 1x3 volume; however, the heal was triggered long after that (after about 12 hours).

> [root@rhs-gp-srv1 ~]# gluster v heal cvlt-ecv
> Launching heal operation to perform index self heal on volume cvlt-ecv has
> been unsuccessful:
> Glusterd Syncop Mgmt brick op 'Heal' failed. Please check glustershd log
> file for details.
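Worth noting that cluster.brick-multiplex is enabled on this setup (see the volume options above), so the bricks of the newly created 1x3 volume would have been attached to already-running brick processes. A hedged sketch of how one might confirm the multiplexing state and process count on each node:

# confirm the cluster-wide brick-multiplex setting
gluster volume get all cluster.brick-multiplex

# with multiplexing enabled, the glusterfsd process count per node
# stays well below the number of bricks hosted on that node
pgrep -c glusterfsd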
For now, marking this ON_QA validation as blocked by BZ#1716279, since 1716279 has failed QA. I did trigger a heal and did not see the above issue, but for now I am keeping this bug as is.
Based on https://bugzilla.redhat.com/show_bug.cgi?id=1716279#c18, and given that I tried heal and heal info a couple of times and did not hit the crash, I can move this bug to verified. If I hit an issue again, I can raise a separate bug.

Version: 6.0.7
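For completeness, a sketch of the re-verification steps on the fixed build (standard commands; the glusterfs package name is as shipped on RHGS nodes):

# confirm the build under test
rpm -q glusterfs

# re-trigger the index heal; it should now launch successfully
gluster volume heal cvlt-ecv

# confirm nothing remains pending after the heal completes
gluster volume heal cvlt-ecv info summary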
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:3249