Bug 1716792 - heal launch failing with "Glusterd Syncop Mgmt brick op 'Heal' failed"
Summary: heal launch failing with "Glusterd Syncop Mgmt brick op 'Heal' failed"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: disperse
Version: rhgs-3.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.5.0
Assignee: Ashish Pandey
QA Contact: Nag Pavan Chilakam
URL:
Whiteboard:
Depends On:
Blocks: 1696809
 
Reported: 2019-06-04 06:20 UTC by Nag Pavan Chilakam
Modified: 2019-10-30 12:22 UTC
CC List: 7 users

Fixed In Version: glusterfs-6.0-7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-30 12:21:50 UTC
Embargoed:


Links:
Red Hat Product Errata RHEA-2019:3249 (last updated 2019-10-30 12:22:07 UTC)

Description Nag Pavan Chilakam 2019-06-04 06:20:29 UTC
Description of problem:
========================
In continuation of BZ#1716279 (fuse client log says "insufficient available children for this request", even though all bricks are up),
I have a file which is not getting healed at all:
https://bugzilla.redhat.com/show_bug.cgi?id=1716279#c3

Even after close to a day the healing has not completed, so I tried to trigger an index heal. However, on triggering the index heal I get the error message below:

[root@rhs-gp-srv1 ~]# gluster v heal cvlt-ecv 
Launching heal operation to perform index self heal on volume cvlt-ecv has been unsuccessful:
Glusterd Syncop Mgmt brick op 'Heal' failed. Please check glustershd log file for details.

Also, I don't find any new logging related to the above in the shd logs.

Note: raising a new bug to track the issue separately; however, feel free to mark it as a duplicate if need be.

All details are available in BZ#1716279; the only additional steps are as below:
>> stopped all the file creates using dd
>> the only IO still running is the top-output logging: top output is captured every 2 minutes on each node and appended to a file hosted on the EC volume, one file per node (see the sketch below)
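
For reference, a minimal sketch of that per-node top-capture loop; the mount point, file name, and exact top flags are assumptions for illustration, not taken from this setup:

# Hypothetical sketch of the per-node top logging (mount path and file name assumed)
while true; do
    top -b -n 1 >> /mnt/cvlt-ecv/top_$(hostname).log   # append one batch-mode snapshot to the EC volume
    sleep 120                                          # every 2 minutes
done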

I do see the below Invalid argument error in the shd log:
[2019-06-04 06:15:25.016138] E [MSGID: 114031] [client-rpc-fops_v2.c:1345:client4_0_inodelk_cbk] 2-cvlt-ecv-client-7: remote operation failed [Invalid argument]

I will be attaching the latest sosreports and logs.

Version-Release number of selected component (if applicable):
===================================
glusterfs 6.0.3 on RHEL 7.7 beta


How reproducible:
consistent on my setup

Steps:
After hitting BZ#1716279, trigger an index heal and you will see the issue below:
[root@rhs-gp-srv1 ~]# gluster v heal cvlt-ecv 
Launching heal operation to perform index self heal on volume cvlt-ecv has been unsuccessful:
Glusterd Syncop Mgmt brick op 'Heal' failed. Please check glustershd log file for details.
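
Since the error message points at the glustershd log, a minimal sketch of how one might check the daemon and the log on each node (the grep pattern is only illustrative; the log path is the stock glusterfs location):

# Confirm the self-heal daemon is up, list pending heals, and scan the shd log for recent errors
gluster volume status cvlt-ecv | grep -i "self-heal"
gluster volume heal cvlt-ecv info | head -40
grep -iE "\] E \[|Invalid argument" /var/log/glusterfs/glustershd.log | tail -n 20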


Actual results:
================
1) heal fails to trigger

Comment 2 Nag Pavan Chilakam 2019-06-04 06:32:56 UTC
Volume info:
===========
[root@rhs-gp-srv1 glusterfs]# gluster v info
 
Volume Name: cvlt-ecv
Type: Distributed-Disperse
Volume ID: c500e86b-f505-48c8-8141-cb3de5956c24
Status: Started
Snapshot Count: 0
Number of Bricks: 12 x (4 + 2) = 72
Transport-type: tcp
Bricks:
Brick1: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick2/cvlt-ecv-sv1
Brick2: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick2/cvlt-ecv-sv1
Brick3: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick2/cvlt-ecv-sv1
Brick4: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick3/cvlt-ecv-sv1
Brick5: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick3/cvlt-ecv-sv1
Brick6: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick3/cvlt-ecv-sv1
Brick7: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick4/cvlt-ecv-sv2
Brick8: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick4/cvlt-ecv-sv2
Brick9: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick4/cvlt-ecv-sv2
Brick10: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick5/cvlt-ecv-sv2
Brick11: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick5/cvlt-ecv-sv2
Brick12: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick5/cvlt-ecv-sv2
Brick13: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick6/cvlt-ecv-sv3
Brick14: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick6/cvlt-ecv-sv3
Brick15: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick6/cvlt-ecv-sv3
Brick16: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick7/cvlt-ecv-sv3
Brick17: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick7/cvlt-ecv-sv3
Brick18: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick7/cvlt-ecv-sv3
Brick19: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick8/cvlt-ecv-sv4
Brick20: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick8/cvlt-ecv-sv4
Brick21: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick8/cvlt-ecv-sv4
Brick22: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick9/cvlt-ecv-sv4
Brick23: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick9/cvlt-ecv-sv4
Brick24: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick9/cvlt-ecv-sv4
Brick25: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick10/cvlt-ecv-sv5
Brick26: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick10/cvlt-ecv-sv5
Brick27: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick10/cvlt-ecv-sv5
Brick28: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick11/cvlt-ecv-sv5
Brick29: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick11/cvlt-ecv-sv5
Brick30: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick11/cvlt-ecv-sv5
Brick31: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick12/cvlt-ecv-sv6
Brick32: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick12/cvlt-ecv-sv6
Brick33: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick12/cvlt-ecv-sv6
Brick34: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick13/cvlt-ecv-sv6
Brick35: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick13/cvlt-ecv-sv6
Brick36: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick13/cvlt-ecv-sv6
Brick37: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick14/cvlt-ecv-sv7
Brick38: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick14/cvlt-ecv-sv7
Brick39: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick14/cvlt-ecv-sv7
Brick40: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick15/cvlt-ecv-sv7
Brick41: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick15/cvlt-ecv-sv7
Brick42: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick15/cvlt-ecv-sv7
Brick43: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick16/cvlt-ecv-sv8
Brick44: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick16/cvlt-ecv-sv8
Brick45: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick16/cvlt-ecv-sv8
Brick46: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick17/cvlt-ecv-sv8
Brick47: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick17/cvlt-ecv-sv8
Brick48: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick17/cvlt-ecv-sv8
Brick49: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick18/cvlt-ecv-sv9
Brick50: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick18/cvlt-ecv-sv9
Brick51: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick18/cvlt-ecv-sv9
Brick52: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick19/cvlt-ecv-sv9
Brick53: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick19/cvlt-ecv-sv9
Brick54: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick19/cvlt-ecv-sv9
Brick55: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick20/cvlt-ecv-sv10
Brick56: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick20/cvlt-ecv-sv10
Brick57: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick20/cvlt-ecv-sv10
Brick58: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick21/cvlt-ecv-sv10
Brick59: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick21/cvlt-ecv-sv10
Brick60: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick21/cvlt-ecv-sv10
Brick61: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick22/cvlt-ecv-sv11
Brick62: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick22/cvlt-ecv-sv11
Brick63: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick22/cvlt-ecv-sv11
Brick64: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick23/cvlt-ecv-sv11
Brick65: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick23/cvlt-ecv-sv11
Brick66: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick23/cvlt-ecv-sv11
Brick67: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick24/cvlt-ecv-sv12
Brick68: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick24/cvlt-ecv-sv12
Brick69: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick24/cvlt-ecv-sv12
Brick70: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick25/cvlt-ecv-sv12
Brick71: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick25/cvlt-ecv-sv12
Brick72: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick25/cvlt-ecv-sv12
Options Reconfigured:
disperse.other-eager-lock: off
transport.address-family: inet
nfs.disable: on
cluster.brick-multiplex: enable
 
Volume Name: logvol
Type: Replicate
Volume ID: dd1a76f5-6d16-40bd-90c1-215cf031c2ae
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick1/logvol
Brick2: rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick1/logvol
Brick3: rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick1/logvol
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
cluster.brick-multiplex: enable
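
For reference, the cvlt-ecv layout above (12 x (4 + 2) disperse subvolumes with brick multiplexing on) corresponds roughly to a create sequence like the sketch below; the brick list is abbreviated and the commands are an assumption, not copied from the setup:

# Rough sketch only - bricks abbreviated, commands assumed rather than recorded from this cluster
gluster volume create cvlt-ecv disperse-data 4 redundancy 2 transport tcp \
    rhs-gp-srv1.lab.eng.blr.redhat.com:/gluster/brick2/cvlt-ecv-sv1 \
    rhs-gp-srv2.lab.eng.blr.redhat.com:/gluster/brick2/cvlt-ecv-sv1 \
    rhs-gp-srv4.lab.eng.blr.redhat.com:/gluster/brick2/cvlt-ecv-sv1 \
    ...                                    # 72 bricks in total -> 12 x (4 + 2) subvolumes
gluster volume set cvlt-ecv disperse.other-eager-lock off
gluster volume set all cluster.brick-multiplex on    # brick multiplexing is a cluster-wide option
gluster volume start cvlt-ecv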

Comment 3 Nag Pavan Chilakam 2019-06-04 06:35:45 UTC
logs and sosreports @ http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/bug.1716792/

Comment 6 Nag Pavan Chilakam 2019-06-04 14:24:14 UTC
Rafi, while I have provided all the logs, do let me know if you need the setup; otherwise I would like to reuse it for my testing.

Comment 7 Nag Pavan Chilakam 2019-06-04 14:27:28 UTC
(In reply to nchilaka from comment #0)
> Steps:
> After hitting BZ#1716279, trigger an index heal and you will see the issue below:
One more step I forgot to mention: I had also created another 1x3 volume; however, the heal was triggered long after this (after about 12 hours).

Comment 29 Nag Pavan Chilakam 2019-07-05 11:39:00 UTC
For now, marking this on_qa validation as blocked by BZ#1716279, since 1716279 has failed QA.
I did trigger a heal and didn't see the above issue.
But for now keeping this bug as is.

Comment 30 Nag Pavan Chilakam 2019-07-08 11:09:39 UTC
Based on https://bugzilla.redhat.com/show_bug.cgi?id=1716279#c18, and given that I tried heal and heal info a couple of times (sketch below) and didn't hit a crash, I can move this bug to verified.
If I hit the issue again, I can raise a separate bug.

Version: 6.0.7
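
A sketch of the verification commands referred to above; the exact invocations are not recorded in this comment, so this is only an approximation:

# Verification sketch: trigger an index heal and list pending heal entries
gluster volume heal cvlt-ecv
gluster volume heal cvlt-ecv info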

Comment 32 errata-xmlrpc 2019-10-30 12:21:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:3249

