Bug 1603082 - Manual Index heal throws error which is misguiding when heal is triggered to heal a brick if another brick is down
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterd
Version: rhgs-3.4
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Sanju
QA Contact: Bala Konda Reddy M
URL:
Whiteboard:
Depends On:
Blocks: 1676812
Reported: 2018-07-19 06:25 UTC by Upasana
Modified: 2019-07-10 06:46 UTC (History)
CC List: 13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1676812 (view as bug list)
Environment:
Last Closed: 2019-03-25 08:24:27 UTC
Embargoed:


Comment 2 Upasana 2018-07-19 06:36:23 UTC
I was verifying Bug 1475789 when I hit this problem.
sos report - http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/ubansal/1603082/

Comment 24 Nag Pavan Chilakam 2018-08-08 07:00:31 UTC
I discussed this with Upasana, and based on further analysis, below is the summary:
The heal was happening when I checked on my setup.
However, the error message is misleading.
The error message has also regressed.

Hence changing the title.
However, if Upasana does see that the file is not healing (she cannot recollect at this point, given that this bug was raised about 20 days back), she will raise a new bug, including the reason for concluding that the heal is not happening.

Also, one very important note: the error message differs between 3.3.1 (latest live, 3.8.4-54.15) and the latest 3.4 (3.12.2-15).

For the steps mentioned by Upasana, below are the error messages (bricks killed with pkill) on 3.3.1 and 3.4.0:
3.3.1:
------
Launching heal operation to perform index self heal on volume ecv has been unsuccessful on bricks that are down. Please check if all brick processes are running.

3.4.0
-------
 Launching heal operation to perform index self heal on volume ecv has been unsuccessful:
Commit failed on rhs-client19.lab.eng.blr.redhat.com. Please check log file for details.


Also, a simple test case that does not even need any I/O running:
create an EC volume, kill a brick on one node, then kill another brick on another node, and issue a heal command.


3.3.1
-----------
Launching heal operation to perform index self heal on volume ecv has been unsuccessful on bricks that are down. Please check if all brick processes are running.

Note: I checked with kill -9/-15, and even with brick multiplexing on, and saw the same error message.

3.4.0
--------
Launching heal operation to perform index self heal on volume dispersed has been unsuccessful:
Commit failed on 10.70.35.3. Please check log file for details.


With pkill glusterfsd or kill -15 <glusterfsd-pid>:
Launching heal operation to perform index self heal on volume ecv has been unsuccessful:
Glusterd Syncop Mgmt brick op 'Heal' failed. Please check glustershd log file for details.

Comment 36 Sanju 2019-02-15 09:13:32 UTC
Upasana/Nag,

Please take a look at https://review.gluster.org/#/c/glusterfs/+/22209/1//COMMIT_MSG and provide your comments.

Comment 37 Atin Mukherjee 2019-02-19 10:16:25 UTC
Looking at the patch, it doesn't look like we can emit a detailed message claiming "brick may be down" on a commit failure, because the commit can fail for other reasons as well, as Ravi explained. Some of those scenarios follow:

Without this patch, here are some meaningful errors:
=====================================================
[root@ravi2 glusterfs]# gluster v heal testvol
Launching heal operation to perform index self heal on volume testvol has been unsuccessful:
Volume testvol is not started.

[root@ravi2 glusterfs]# gluster v heal testvol
Launching heal operation to perform index self heal on volume testvol has been unsuccessful:
Self-heal-daemon is disabled. Heal will not be triggered on volume testvol

[root@ravi2 glusterfs]# gluster v heal testvol
Launching heal operation to perform index self heal on volume testvol has been unsuccessful:
Glusterd Syncop Mgmt brick op 'Heal' failed. Please check glustershd log file for details.
=====================================================
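
To illustrate how the messages quoted in this bug differ, here is a minimal sketch of a shell helper that maps heal output to a coarse category. The function name classify_heal_output and the category labels are hypothetical and not part of gluster; the match strings are copied from the outputs quoted in this bug, so they would need updating if the wording changes between releases.

```shell
# Hypothetical helper (not part of gluster): classify the text printed by a
# failed `gluster volume heal <vol>` run. The patterns are the message
# fragments quoted in this bug; the category names are made up.
classify_heal_output() {
    case "$1" in
        *"is not started"*)
            echo "volume-not-started" ;;
        *"Self-heal-daemon is disabled"*)
            echo "shd-disabled" ;;
        *"bricks that are down"*)
            # 3.3.1-style message that names the cause directly
            echo "bricks-down" ;;
        *"brick op 'Heal' failed"*)
            echo "brick-op-failed" ;;
        *"Commit failed on"*)
            # 3.4.0-style message: the cause is not stated
            echo "commit-failed-unspecified" ;;
        *)
            echo "unknown" ;;
    esac
}
```

A caller could capture the CLI output in a variable, for example out=$(gluster v heal testvol 2>&1), and pass "$out" to the function. A generic "Commit failed on <host>" message lands in the unspecified category because the text alone does not say why the commit failed.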

My take is that we close this bug as can't fix.

Ravi - do you agree?

Comment 38 Ravishankar N 2019-02-19 11:14:15 UTC
Makes sense to me, Atin.

Comment 39 Atin Mukherjee 2019-02-19 14:57:08 UTC
Upasana - Please go through the above two comments. We're going to close this bug with the justification given in comment 37. If you disagree, please speak up now (with a counter-justification); otherwise this bug will be closed by the end of this week.

