Bug 1336098 - heal info command takes tens of minutes when in split-brain situation.
Summary: heal info command takes tens of minutes when in split-brain situation.
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.7.8
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-05-14 07:46 UTC by René Pavlík
Modified: 2017-03-08 10:48 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-08 10:48:24 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description René Pavlík 2016-05-14 07:46:38 UTC
Description of problem:
When we encounter a gfid split brain, the `heal info` command takes about 40 minutes to process.

We usually have about 50 files in gfid split-brain, which puts about 1500-2500 files into the `undergoing heal` state. The heal does not succeed because the split-brain files must be resolved first. Listing those files takes ages.

The `heal info split-brain` command hangs entirely; after 2 hours it had not written a single line.

Version-Release number of selected component (if applicable):
3.7.6, 3.7.8 - tested in our environment.

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:
The info commands should report the state immediately.

Additional info:

Comment 1 Pranith Kumar K 2016-06-23 00:37:59 UTC
hi,
     Ideally, heal-info and heal-info split-brain should take the same amount of time. Gfid split-brains are not shown in the 'heal info split-brain' output. Do you have any steps to recreate this issue?

Pranith

Comment 2 René Pavlík 2016-06-27 12:19:14 UTC
Hi, Pranith,

not exactly the steps, but I can give you a description of our setup and the triggers of that split-brain. The main point of this bug report is to have a reliable tool that always returns something, so that a split-brain can be detected, for example, by an external monitoring system invoking the command every minute or so. I have seen the reported behavior every time our setup had connection issues and a gfid split-brain occurred. How long the command takes depends on the extent of the damage.
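To illustrate the monitoring use case, here is a minimal sketch of a wrapper that an external check could call so that it always gets an answer within a fixed time, even if `heal info` itself does not return. The function names and the status strings are made up for this example; only the `gluster volume heal <VOLNAME> info` and `info split-brain` command forms are taken from the CLI. It works around the symptom and does not address the slowness itself.

```python
import subprocess


def heal_info_cmd(volume, split_brain=False):
    """Build the heal info command line for a volume (name supplied by the caller)."""
    cmd = ["gluster", "volume", "heal", volume, "info"]
    if split_brain:
        cmd.append("split-brain")
    return cmd


def run_with_deadline(cmd, timeout_s):
    """Run cmd and return (status, output).

    status is "ok" (exit code 0), "failed" (non-zero exit or binary not found)
    or "timeout" (the command did not finish within timeout_s seconds).
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=timeout_s,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return "timeout", ""
    except FileNotFoundError:
        return "failed", ""
    return ("ok" if proc.returncode == 0 else "failed"), proc.stdout
```

A monitoring check might call `run_with_deadline(heal_info_cmd("myvol"), timeout_s=30)` (the volume name is a placeholder) and treat a `"timeout"` status as an alert in its own right, since in the situation described here a hanging command is itself a sign of trouble.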

Our setup:
- 3 replicated nodes with client quorum
- 15 servers having the cluster mounted locally, rsyncing their data to the glusterfs, to their own directories (sharing the same, common parent dir)
- the files are only being appended with new data or new files are being added, no deletion
- when there is a connection issue, the gfid split-brain occurs: on each brick, the latest data file has a different size and gfid, or is missing entirely on some bricks.
- the total amount of the files in real split-brain is about 50
- sometimes also the containing directory has this issue
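
The condition described in the list above (same path, different gfid per brick, or the file missing on some bricks) can be sketched as a small comparison. This is only an illustration under assumptions: the function name and the input mapping are hypothetical, and the gfid values are assumed to have been collected separately on each brick, for example with `getfattr -n trusted.gfid -e hex <file on brick>`.

```python
def classify_gfids(gfid_by_brick):
    """Compare the gfid of one path across bricks.

    gfid_by_brick maps a brick name to the hex gfid string read on that brick,
    or to None if the file does not exist there (hypothetical input format).
    """
    # Normalise case so "0xAB" and "0xab" count as the same gfid.
    present = {b: g.lower() for b, g in gfid_by_brick.items() if g is not None}
    missing = sorted(b for b, g in gfid_by_brick.items() if g is None)
    distinct = sorted(set(present.values()))
    return {
        "mismatch": len(distinct) > 1,  # more than one gfid for the same path
        "missing_on": missing,          # bricks where the file is absent
        "distinct_gfids": distinct,
    }
```

A path for which `mismatch` is true would correspond to the gfid split-brain case described here; a non-empty `missing_on` alone may simply mean a pending heal.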

In such a situation we would like to detect the split-brain, but the reported issue occurs.

I'm sorry that I cannot give you an exact scenario in which you would directly see the issue. I hope this helps.

If you need additional info, please ask.

Thanks.

Rene

Comment 3 Kaushal 2017-03-08 10:48:24 UTC
This bug is getting closed because GlusterFS-3.7 has reached its end-of-life.

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS.
If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.

