Bug 1483977 - [afr]: info split-brain takes longer time about (1m 15secs) to show the output with 0 entries
Status: CLOSED DUPLICATE of bug 1721355
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: replicate
Version: rhgs-3.3
Hardware: x86_64
OS: All
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Ravishankar N
QA Contact: Nag Pavan Chilakam
Reported: 2017-08-22 12:13 UTC by Rahul Hinduja
Modified: 2019-12-03 08:33 UTC (History)

Last Closed: 2019-12-03 08:33:27 UTC

Description Rahul Hinduja 2017-08-22 12:13:11 UTC
Description of problem:
=======================

In one of my scenarios, `info split-brain` takes much longer than expected, as shown below:

[root@dhcp37-64 glusterfs]# time gluster volume heal master info split-brain
Brick 10.70.37.64:/rhs/brick1/b1
Status: Connected
Number of entries in split-brain: 0

Brick 10.70.37.60:/rhs/brick1/b2
Status: Connected
Number of entries in split-brain: 0

Brick 10.70.37.64:/rhs/brick2/b3
Status: Connected
Number of entries in split-brain: 0

Brick 10.70.37.60:/rhs/brick2/b4
Status: Connected
Number of entries in split-brain: 0


real        1m25.705s
user        0m13.429s
sys        0m23.291s
[root@dhcp37-64 glusterfs]#


I was able to reproduce this twice with the following steps:

1. Create a 2x2 volume.
2. Turn off shd (the self-heal daemon).
3. Bring down 1 brick.
4. Write data from the mount. For the record, I ran:
 for i in {create,chmod,hardlink,chgrp,symlink,hardlink,truncate,hardlink}; do crefi --multi -n 5 -b 10 -d 10 --max=10K --min=500 --random -T 10 -t text --fop=$i /mnt/master/ ; sleep 10 ; done
5. Bring the brick back up.
6. Perform client-side healing.


Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.8.4-41.el7rhgs.x86_64


How reproducible:
=================
2/2


Actual results:
===============
`info split-brain` succeeds but takes a long time (about 1m 25s) even though it reports 0 entries.

Comment 3 Ravishankar N 2017-08-22 12:32:38 UTC
What was the output of heal info at this time?  `info split-brain` goes through all files that need heal, performs lookups, examines xattrs etc and only prints the ones in split-brain. So if there are a million files that need heal but zero in split-brain, `info split-brain` would still take a long time. Unless heal-info also had zero entries, I don't think this is a bug.
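
To make the point above concrete: the cost of `info split-brain` tracks the number of entries pending heal, not the number in split-brain, so it can help to total the pending entries first. The following is a minimal sketch, not part of gluster itself. It assumes `gluster volume heal <volname> info` prints one `Number of entries: N` line per brick; the function name `count_pending_heals` is hypothetical.

```shell
# Hypothetical helper: read `heal info` output on stdin and print the
# total of the per-brick "Number of entries: N" counts.
# Assumption: each brick section contains a line of exactly that form.
# Lines such as "Number of entries in split-brain: N" do not match.
count_pending_heals() {
    awk -F': ' '/^Number of entries:/ { total += $2 } END { print total + 0 }'
}

# Example (volume name "master" is taken from the report above):
# gluster volume heal master info | count_pending_heals
```

A large total here would suggest that `info split-brain` has a lot of entries to examine, even if it ends up printing none.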

Comment 5 Rahul Hinduja 2017-10-09 06:38:24 UTC
(In reply to Ravishankar N from comment #3)
> What was the output of heal info at this time?  `info split-brain` goes
> through all files that need heal, performs lookups, examines xattrs etc and
> only prints the ones in split-brain. So if there are a million files that
> need heal but zero in split-brain, `info split-brain` would still take a
> long time. Unless heal-info also had zero entries, I don't think this is a
> bug.

I did have some files pending heal, so the time taken could be explained by this. However, if there are millions of files to be healed (as in a recent customer case), this delay could be perceived as a hang.

In that case this is a usability bug and calls for a design enhancement. A warning or info message is definitely needed to let the user know that the command might take a while and that it can be run in the background.
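
Until the CLI itself prints such a notice, the same effect can be approximated from the shell. This is only a sketch of a workaround, not the proposed fix: the wrapper `run_in_background` and the variable `BG_PID` are hypothetical names, and the log path in the example is arbitrary.

```shell
# Hypothetical wrapper: print a notice on stderr, then run the given command
# in the background with its stdout and stderr redirected to a log file.
run_in_background() {
    log_file=$1
    shift
    echo "Note: this can take a while when many entries are pending heal; output goes to $log_file" >&2
    "$@" >"$log_file" 2>&1 &
    BG_PID=$!   # PID of the background job, so the caller can wait on it
}

# Example (volume name "master" is taken from the report above):
# run_in_background /tmp/split-brain.log gluster volume heal master info split-brain
# wait "$BG_PID" && cat /tmp/split-brain.log
```

The wrapper records the PID in a variable rather than printing it, so the caller can `wait` on the job from the same shell.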

Comment 6 Ravishankar N 2017-10-09 08:31:18 UTC
(In reply to Rahul Hinduja from comment #5)
> (In reply to Ravishankar N from comment #3)
> > What was the output of heal info at this time?  `info split-brain` goes
> > through all files that need heal, performs lookups, examines xattrs etc and
> > only prints the ones in split-brain. So if there are a million files that
> > need heal but zero in split-brain, `info split-brain` would still take a
> > long time. Unless heal-info also had zero entries, I don't think this is a
> > bug.
> 
> I had some files to be healed, hence the time taken could be because of the
> this explanation. However, if there are millions of files to be healed
> (which was the case in one of recent customer case), this delay could be
> perceived hang. 
> 
> This is in that case a usability bug and requires an enhancement in the
> design. A warning or info is definitely required to let user know that it
> might take a while and it could be run in background.

Thanks Rahul. I think we can fix this upstream first and not target it for 3.4.0. We could do this as part of https://bugzilla.redhat.com/show_bug.cgi?id=1349352#c12, which calls for more changes from a usability point of view. Does that sound OK?

Comment 7 Rahul Hinduja 2017-10-09 08:37:44 UTC
> Thanks Rahul, I think we can fix this upstream first and not target this for
> 3.4.0.  We could do this as part of 
> https://bugzilla.redhat.com/show_bug.cgi?id=1349352#c12 which calls for more
> changes from usability point of view. Does that sound ok?

Agreed. I am OK with deferring this from 3.4.0 and fixing it as part of 1349352.

Comment 9 Pranith Kumar K 2019-12-03 08:33:27 UTC
This bug is being fixed as part of https://bugzilla.redhat.com/show_bug.cgi?id=1721355. If this issue is seen even after the fix, please feel free to re-open this bug.

*** This bug has been marked as a duplicate of bug 1721355 ***

