Bug 1467822

Summary: replace-brick command should fail if brick to be replaced is source brick for data to be healed
Product: [Community] GlusterFS
Reporter: Ravishankar N <ravishankar>
Component: replicate
Assignee: Ravishankar N <ravishankar>
Status: CLOSED UPSTREAM
QA Contact:
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: mainline
CC: bugs
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1432969
Environment:
Last Closed: 2020-03-12 12:29:38 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1432969
Bug Blocks: 1432004

Description Ravishankar N 2017-07-05 08:54:29 UTC
+++ This bug was initially created as a clone of Bug #1432969 +++

Description of problem:
On a replica 3 volume, if replace-brick is run on all three bricks before self-heal has completed, we can end up with data loss. This is currently documented in the admin guide as a precaution the admin should take. However, it would be better if Gluster prevented the admin from making such a mistake.
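
The dangerous sequence being described looks roughly like this (volume and brick names are placeholders):

    gluster volume replace-brick testvol \
        server1:/bricks/b1 server1:/bricks/b1-new commit force
    # self-heal to b1-new is still copying data when the next commands run
    gluster volume replace-brick testvol \
        server2:/bricks/b2 server2:/bricks/b2-new commit force
    gluster volume replace-brick testvol \
        server3:/bricks/b3 server3:/bricks/b3-new commit force
    # all three original bricks are now gone; any file not yet healed to
    # the new bricks has no surviving copy, i.e. data loss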

--- Additional comment from Ravishankar N on 2017-03-21 00:53:48 EDT ---

Pranith, is this something we want to fix in Gluster rather than Heketi? To prevent subsequent replace-bricks while one is already in progress, I think glusterd must perform the check an admin would normally do by running heal-info, i.e. ensure that the pending heal count is zero. If that is the way to go, then Ashiq's patch https://review.gluster.org/#/c/12154/ to get a heal-info summary would be helpful. glusterd would then need to run it, parse the XML for the number of entries, and allow or deny each replace-brick operation accordingly.
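
A minimal sketch of that check from the shell ("testvol" is a placeholder volume name; this parses the plain-text heal-info output, while glusterd would use the XML equivalent):

    # Sum the per-brick "Number of entries" counts from heal-info and
    # refuse the operation if any entries are still pending heal.
    pending=$(gluster volume heal testvol info |
        awk -F: '/^Number of entries/ {sum += $2} END {print sum + 0}')
    if [ "$pending" -ne 0 ]; then
        echo "replace-brick denied: $pending entries still need heal" >&2
        exit 1
    fi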

--- Additional comment from Pranith Kumar K on 2017-03-22 21:57:26 EDT ---

(In reply to Ravishankar N from comment #2)
> Pranith, is this something we want to fix in Gluster rather than Heketi? To
> prevent subsequent replace-bricks while one is already in progress, I think
> glusterd must perform the check an admin would normally do by running
> heal-info, i.e. ensure that the pending heal count is zero. If that is the
> way to go, then Ashiq's patch https://review.gluster.org/#/c/12154/ to get
> a heal-info summary would be helpful. glusterd would then need to run it,
> parse the XML for the number of entries, and allow or deny each
> replace-brick operation accordingly.

That would lead to handling this in every management daemon. One of the recurring themes I have been hearing all this week is that the CLIs shouldn't do anything dangerous. So it would be better to increase the intelligence of the replace-brick CLI itself, IMO.

--- Additional comment from Ravishankar N on 2017-03-22 23:03:08 EDT ---

(In reply to Pranith Kumar K from comment #3)
> That would lead to handling this in every management daemon. One of the
> recurring themes I have been hearing all this week is that the CLIs
> shouldn't do anything dangerous. So it would be better to increase the
> intelligence of the replace-brick CLI itself, IMO.

`replace-brick` already works only with commit force, so that in itself is a good warning, I think. Also, if we disallow replace-brick when pending heals are non-zero, a user cannot replace a faulty brick, because the faulty brick itself might be the reason heal-info is non-zero in the first place (i.e. it can be the only 'sink' brick).

--- Additional comment from Pranith Kumar K on 2017-03-22 23:23:05 EDT ---

(In reply to Ravishankar N from comment #4)
> (In reply to Pranith Kumar K from comment #3)
> > That would lead to handling this in every management daemon. One of the
> > recurring themes I have been hearing all this week is that the CLIs
> > shouldn't do anything dangerous. So it would be better to increase the
> > intelligence of the replace-brick CLI itself, IMO.
> 
> `replace-brick` already works only with commit force, so that in itself is
> a good warning, I think.

That is interesting; in that case we need to make the command itself user-friendly, so that it does not need a warning.

> Also, if we disallow replace-brick when pending heals are non-zero, a user
> cannot replace a faulty brick, because the faulty brick itself might be the
> reason heal-info is non-zero in the first place (i.e. it can be the only
> 'sink' brick).

Yes, that is correct. The question is: how do you propose to increase the intelligence of the replace-brick CLI so that replacing a brick is safer?

--- Additional comment from Ravishankar N on 2017-03-23 06:49:07 EDT ---

(In reply to Pranith Kumar K from comment #5)
> Yes, that is correct. The question is: how do you propose to increase the
> intelligence of the replace-brick CLI so that replacing a brick is safer?

For each entry needing heal, deciding which bricks are the sources/sinks can only be done by examining the xattrs on all bricks. Doing this in glusterd and responding to the CLI within the CLI timeout can be an issue, as you said. Here is what I can think of: if a brick is definitely not a source for any file, then its .glusterfs/indices/xattrop directory will contain only the xattrop base file and no other entries (entries undergoing I/O will still be under indices/dirty). So when a replace-brick command is issued for a brick, the glusterd on that node can check whether the link count of the indices/xattrop base file is 1 and, if so, permit the operation.
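
A rough sketch of that check from the shell (the brick path is a placeholder; pending entries are hardlinks to the base file, so its link count is 1 + the number of pending entries):

    # Permit replace-brick only if the xattrop base file has no extra
    # hardlinks, i.e. this brick is not a source for any pending heal.
    brick=/bricks/brick1    # placeholder brick path
    base=$(ls "$brick"/.glusterfs/indices/xattrop/xattrop-* 2>/dev/null | head -n 1)
    links=$(stat -c %h "$base")
    if [ "$links" -eq 1 ]; then
        echo "OK to replace this brick"
    else
        echo "refusing: $((links - 1)) entries pending heal" >&2
        exit 1
    fi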

--- Additional comment from Pranith Kumar K on 2017-03-24 09:46:46 EDT ---

(In reply to Ravishankar N from comment #6)
> (In reply to Pranith Kumar K from comment #5)
> > Yes, that is correct. The question is: how do you propose to increase the
> > intelligence of the replace-brick CLI so that replacing a brick is safer?
> 
> For each entry needing heal, deciding which bricks are the sources/sinks
> can only be done by examining the xattrs on all bricks. Doing this in
> glusterd and responding to the CLI within the CLI timeout can be an issue,
> as you said. Here is what I can think of: if a brick is definitely not a
> source for any file, then its .glusterfs/indices/xattrop directory will
> contain only the xattrop base file and no other entries (entries undergoing
> I/O will still be under indices/dirty). So when a replace-brick command is
> issued for a brick, the glusterd on that node can check whether the link
> count of the indices/xattrop base file is 1 and, if so, permit the
> operation.

That is too much AFR-specific knowledge for glusterd to have, IMO. Can we do it in a generic fashion?

Comment 1 Amar Tumballi 2019-06-14 08:47:54 UTC
Any update? This looks important... at least to warn the user.

Comment 2 Worker Ant 2020-03-12 12:29:38 UTC
This bug has been moved to https://github.com/gluster/glusterfs/issues/895 and will be tracked there from now on. Visit the GitHub issue URL for further details.

Comment 3 Red Hat Bugzilla 2023-09-14 04:00:37 UTC
The needinfo request[s] on this closed bug have been removed, as they have been unresolved for 1000 days.