Bug 1467822 - replace-brick command should fail if brick to be replaced is source brick for data to be healed
Status: NEW
Product: GlusterFS
Classification: Community
Component: replicate
Version: mainline
Hardware: Unspecified
OS: Unspecified
Severity: unspecified
Assigned To: Ravishankar N
Keywords: Triaged
Depends On: 1432969
Blocks: 1432004
Reported: 2017-07-05 04:54 EDT by Ravishankar N
Modified: 2017-07-17 00:22 EDT
CC List: 1 user

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1432969
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Description Ravishankar N 2017-07-05 04:54:29 EDT
+++ This bug was initially created as a clone of Bug #1432969 +++

Description of problem:
On a replica 3 volume, if replace-brick is run on all three bricks (before self-heal is completed), we might end up with data loss. This is currently documented in the admin guide as a precaution the admin should take. However, it would be good if Gluster stopped the admin from making such a mistake.

Additional comment from Ravishankar N on 2017-03-21 00:53:48 EDT ---

Pranith, is this something we want to fix in gluster rather than Heketi? To prevent subsequent replace-bricks while one is already in progress, I think glusterd must be made aware of what an admin would normally verify by running heal-info, i.e. that the pending heal count is zero. If that is the way to go, then Ashiq's patch https://review.gluster.org/#/c/12154/ to get a heal-info summary would be helpful. glusterd would then need to run it, parse the XML for the number of entries, and then allow or deny every replace-brick operation.
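As a rough sketch, the glusterd-side check described above could parse the XML emitted by `gluster volume heal <vol> info --xml` and sum the per-brick entry counts. The exact XML layout below (bricks carrying `numberOfEntries` fields) is an assumption for illustration, not a guaranteed schema:

```python
import xml.etree.ElementTree as ET

def pending_heal_count(xml_text):
    """Sum the per-brick numberOfEntries fields from heal-info XML output."""
    root = ET.fromstring(xml_text)
    return sum(int(e.text) for e in root.iter("numberOfEntries"))

# Abridged, hypothetical heal-info --xml output:
sample = """<cliOutput>
  <healInfo><bricks>
    <brick><name>host1:/bricks/b1</name><numberOfEntries>0</numberOfEntries></brick>
    <brick><name>host2:/bricks/b2</name><numberOfEntries>3</numberOfEntries></brick>
  </bricks></healInfo>
</cliOutput>"""

# glusterd would allow the replace-brick only when this count is zero.
print(pending_heal_count(sample))  # -> 3
```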

--- Additional comment from Pranith Kumar K on 2017-03-22 21:57:26 EDT ---

(In reply to Ravishankar N from comment #2)
> Pranith, is this something we want to fix in gluster rather than Heketi? To
> prevent subsequent replace-bricks while one is already in progress, I think
> glusterd must be made aware of what an admin would normally run heal-info
> and verify, i.e. ensure that pending heal count is zero.  If that is the way
> to go, then Ashiq's patch https://review.gluster.org/#/c/12154/ to get
> healinfo summary would be helpful. glusterd then would need to run it, parse
> the xml for no. of entries and then allow/deny every replace brick operation.

That will lead to handling this part in all the management daemons. One of the recurring themes I have been hearing all this week is that the CLIs shouldn't do anything dangerous. So it would be better to increase the intelligence of the replace-brick CLI, IMO.

--- Additional comment from Ravishankar N on 2017-03-22 23:03:08 EDT ---

(In reply to Pranith Kumar K from comment #3)
> That will lead to handling this part in all the management daemons. One of
> the recurring theme I have been hearing all this week is that the CLIs
> shouldn't do anything dangerous. So it would be better to increase
> intelligence of replace-brick CLI IMO.

`replace-brick` already works only with `commit force`, so that itself is a good warning, I think. Also, if we disallow replace-brick when pending heals are not zero, a user cannot replace a faulty brick, because the faulty brick itself might be the reason why heal-info is non-zero in the first place (i.e. it can be the only 'sink' brick).

--- Additional comment from Pranith Kumar K on 2017-03-22 23:23:05 EDT ---

(In reply to Ravishankar N from comment #4)
> (In reply to Pranith Kumar K from comment #3)
> > That will lead to handling this part in all the management daemons. One of
> > the recurring theme I have been hearing all this week is that the CLIs
> > shouldn't do anything dangerous. So it would be better to increase
> > intelligence of replace-brick CLI IMO.
> 
> `replace-brick` already works only with commit force, so that itself is a
> good warning I think.

That is interesting. In that case we need to make the command itself user-friendly, so that it does not need a warning.

> Also, if we disallow replace-brick if pending heals
> are not zero, a user cannot replace a faulty brick because the faulty brick
> itself might be the reason why heal-info is non zero in the first place
> (i.e. it can be the only 'sink' brick).

Yes, that is correct. The question is: how do you propose to increase the intelligence of the replace-brick CLI so that replace-brick is safer to run?

--- Additional comment from Ravishankar N on 2017-03-23 06:49:07 EDT ---

(In reply to Pranith Kumar K from comment #5)
> Yes that is correct, question is how do you propose to increase intelligence
> of replace-brick CLI so that it is safer to replace-brick?

For each of the entries needing heal, deciding which bricks are the sources/sinks can only be done by examining the xattrs of all bricks. Doing this in glusterd and responding to the CLI within the CLI timeout can be an issue, like you said. What I can think of is this: if a brick is definitely not a source for any of the files, then its .glusterfs/indices/xattrop directory will contain only the xattrop base file and no other entries (entries undergoing I/O will still be under indices/dirty). So if a replace-brick command is given for a brick, the glusterd on that node can check whether the link count of the indices/xattrop base file is 1, and if so, permit the operation.
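The link-count check proposed above relies on pending-heal index entries being hard links to the xattrop base file. A minimal sketch of the check, using a throwaway directory in place of a real brick (the brick path and base-file name are hypothetical; on a real brick the base file is named xattrop-<uuid>):

```shell
# Simulate a brick's xattrop index directory.
brick=$(mktemp -d)
mkdir -p "$brick/.glusterfs/indices/xattrop"
base="$brick/.glusterfs/indices/xattrop/xattrop-0000"
touch "$base"

# Link count 1 means no pending heal entries: the brick cannot be a source.
links=$(stat -c %h "$base")
if [ "$links" -eq 1 ]; then
    echo "no pending heals: permit replace-brick"
else
    echo "pending heals: deny replace-brick"
fi

# Simulate one entry pending heal (index entries are hard links to the base file):
ln "$base" "$brick/.glusterfs/indices/xattrop/entry-gfid-1"
stat -c %h "$base"   # now 2: the brick may be a heal source, so deny
```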

--- Additional comment from Pranith Kumar K on 2017-03-24 09:46:46 EDT ---

(In reply to Ravishankar N from comment #6)
> (In reply to Pranith Kumar K from comment #5)
> > Yes that is correct, question is how do you propose to increase intelligence
> > of replace-brick CLI so that it is safer to replace-brick?
> 
> For each of the entries needing heal, deciding which bricks are the
> sources/sinks can only be done by examining xattrs of all bricks. Doing this
> in glusterd and responding back to the CLI within the CLI time out can be an
> issue like you said.  What I can think of is this: If a brick is definitely
> not a source for any of the files, then its .glusterfs/indices/xattrop will
> only have the xattrop-base-file and not any more entries (entries undergoing
> I/O will still be under indices/dirty). So if a replace-brick command is
> given for a brick, the glusterd on that node can check if link count of
> indices/xattrop/base-file is 1 and if yes, then permit the operation.

That is too much AFR-specific knowledge for glusterd to have, IMO. Can we do it in a generic fashion?
