Bug 1406411 - Fail add-brick command if replica count changes
Summary: Fail add-brick command if replica count changes
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: mainline
Hardware: All
OS: All
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Mohit Agrawal
QA Contact:
URL:
Whiteboard:
Depends On: 1404989
Blocks:
 
Reported: 2016-12-20 13:09 UTC by Mohit Agrawal
Modified: 2017-03-06 17:40 UTC
CC: 9 users

Fixed In Version: glusterfs-3.10.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1404989
Environment:
Last Closed: 2017-03-06 17:40:20 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Comment 1 Worker Ant 2016-12-20 13:10:56 UTC
REVIEW: http://review.gluster.org/16214 (cluster/dht: Add-brick command fails when one of the replica brick is down and layout is not set) posted (#1) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 2 Worker Ant 2016-12-21 10:30:54 UTC
REVIEW: http://review.gluster.org/16214 (cluster/dht: Add-brick command fails when one of the replica brick is down and layout is not set) posted (#2) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 3 Worker Ant 2016-12-22 16:13:44 UTC
REVIEW: http://review.gluster.org/16214 (cluster/dht: Add-brick command fails when one of the replica brick is down and layout is not set) posted (#3) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 4 Mohit Agrawal 2016-12-26 06:21:07 UTC
Hi,

It seems this is expected behavior. As per the current DHT code, on the first attempt the layout is set
only when all subvolumes are up; otherwise the layout is not set and an error is thrown.

Below is the case of a plain distributed environment: I killed one brick after starting the volume, and the mount then fails as per the current DHT behavior.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

[root@dhcp10-210 ~]# systemctl restart glusterd.service
[root@dhcp10-210 ~]# gluster v create test 10.65.7.254:/dist1/brick1 10.65.7.254:/dist2/brick2
volume create: test: success: please start the volume to access data
[root@dhcp10-210 ~]# gluster v start test
volume start: test: success
[root@dhcp10-210 ~]# gluster v status
Status of volume: test
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.65.7.254:/dist1/brick1             49152     0          Y       11117
Brick 10.65.7.254:/dist2/brick2             49153     0          Y       11136
 
Task Status of Volume test
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@dhcp10-210 ~]# kill 11136
[root@dhcp10-210 ~]# mount -t glusterfs 10.65.7.254:/test /mnt
Mount failed. Please check the log file for more detail

[2016-12-26 06:11:14.871167] W [MSGID: 109005] [dht-selfheal.c:2102:dht_selfheal_directory] 0-test-dht: Directory selfheal failed: 1 subvolumes down.Not fixing. path = /, gfid = 
[2016-12-26 06:11:14.880232] W [fuse-bridge.c:767:fuse_attr_cbk] 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Stale file handle)

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

As of now we think it is a corner case; it would be difficult to provide a fix unless there is data loss in this case.


Regards
Mohit Agrawal

Comment 5 Atin Mukherjee 2016-12-26 09:48:23 UTC
(In reply to Mohit Agrawal from comment #4)
> Hi,
> 
> It seems this is expected behavior. As per the current DHT code, on the
> first attempt the layout is set only when all subvolumes are up;
> otherwise the layout is not set and an error is thrown.

At worst, we'd need a validation in GlusterD to block users from ending up in this situation; otherwise GlusterD ends up in an inconsistent state where the commit fails on one of the nodes but goes through on the others, and the transaction is not rolled back due to a limitation of GlusterD's design.
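
A rough CLI sketch of the check being proposed here, from the user's side (host names, brick paths, and the exact error text are illustrative, not taken from the actual implementation):

# replica 2 volume with one brick down
gluster volume create test replica 2 host1:/bricks/b1 host2:/bricks/b2
gluster volume start test
kill <pid-of-brick-on-host2>          # simulate a brick going down

# increasing the replica count should now be rejected in the staging phase
# on all nodes, instead of failing the commit on only some of them
gluster volume add-brick test replica 3 host3:/bricks/b3
# expected: add-brick is refused with an error asking for all bricks to be up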

Comment 6 Worker Ant 2017-01-05 10:50:55 UTC
REVIEW: http://review.gluster.org/16330 (glusterd: Fail add-brick on replica count change, if brick is down) posted (#1) for review on master by Karthik U S (ksubrahm)

Comment 7 Worker Ant 2017-01-05 14:56:40 UTC
REVIEW: http://review.gluster.org/16330 (glusterd: Fail add-brick on replica count change, if brick is down) posted (#2) for review on master by Karthik U S (ksubrahm)

Comment 8 Worker Ant 2017-01-05 15:00:30 UTC
REVIEW: http://review.gluster.org/16330 (glusterd: Fail add-brick on replica count change, if brick is down) posted (#3) for review on master by Karthik U S (ksubrahm)

Comment 9 Worker Ant 2017-01-05 15:44:49 UTC
REVIEW: http://review.gluster.org/16330 (glusterd: Fail add-brick on replica count change, if brick is down) posted (#4) for review on master by Atin Mukherjee (amukherj)

Comment 10 Worker Ant 2017-01-06 04:22:55 UTC
REVIEW: http://review.gluster.org/16330 (glusterd: Fail add-brick on replica count change, if brick is down) posted (#5) for review on master by Ravishankar N (ravishankar)

Comment 11 Worker Ant 2017-01-06 09:19:43 UTC
COMMIT: http://review.gluster.org/16330 committed in master by Atin Mukherjee (amukherj) 
------
commit c916a2ffc257b0cfa493410e31b6af28f428c53a
Author: karthik-us <ksubrahm>
Date:   Thu Jan 5 14:06:21 2017 +0530

    glusterd: Fail add-brick on replica count change, if brick is down
    
    Problem:
    1. Have a replica 2 volume with bricks b1 and b2
    2. Before setting the layout, b1 goes down
    3. Set the layout and write some data, which gets populated on b2
    4. b2 goes down then b1 comes up
    5. Add another brick b3; heal will take place from b1 to b3, which
       basically has no data
    6. Write some data. Both b1 and b3 will mark b2 for pending writes
    7. b1 goes down, and b2 comes up
    8. b2 gets healed from b1. During heal it removes the data that is already
       in b2, considering it stale data. This leads to data loss.
    
    Solution:
    1. In glusterd stage-op, while adding bricks, check whether the replica
       count is being increased
    2. If yes, then check whether any of the bricks are down at that time
    3. If yes, then fail the add-brick to avoid such data loss
    4. Else continue the normal operation.
    
    This check will work even when we convert a plain distribute volume to replicate.
    
    Test:
    1. Create a replica 2 volume
    2. Kill one brick from the volume
    3. Try adding a brick to the volume
    4. It should fail with an error saying that all bricks are not up
    5. Create a distribute volume and kill one of the bricks
    6. Try to convert it to a replicate volume by adding bricks.
    7. This should also fail.
    
    Change-Id: I9c8d2ab104263e4206814c94c19212ab914ed07c
    BUG: 1406411
    Signed-off-by: karthik-us <ksubrahm>
    Reviewed-on: http://review.gluster.org/16330
    Tested-by: Ravishankar N <ravishankar>
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: N Balachandran <nbalacha>
    Reviewed-by: Atin Mukherjee <amukherj>
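
Steps 1-4 of the Test section above match the session sketched under comment 5; steps 5-7 (converting a plain distribute volume to replicate) would look roughly like the following (hosts, paths, and error text are illustrative):

# plain distribute volume with one brick down
gluster volume create dtest host1:/bricks/d1 host2:/bricks/d2
gluster volume start dtest
kill <pid-of-brick-on-host2>

# converting to replica 2 pairs each new brick with an existing one,
# so this should also be rejected while a brick is down
gluster volume add-brick dtest replica 2 host1:/bricks/r1 host2:/bricks/r2
# expected: add-brick is refused with an error asking for all bricks to be up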

Comment 12 Worker Ant 2017-01-09 05:30:13 UTC
REVIEW: http://review.gluster.org/16358 (glusterd: bypass add-brick validation with force) posted (#1) for review on master by Atin Mukherjee (amukherj)

Comment 13 Worker Ant 2017-01-09 07:35:19 UTC
REVIEW: http://review.gluster.org/16358 (glusterd: bypass add-brick validation with force) posted (#2) for review on master by Atin Mukherjee (amukherj)

Comment 14 Worker Ant 2017-01-09 16:23:00 UTC
REVIEW: http://review.gluster.org/16358 (glusterd: bypass add-brick validation with force) posted (#3) for review on master by Atin Mukherjee (amukherj)

Comment 15 Worker Ant 2017-01-17 13:24:29 UTC
REVIEW: http://review.gluster.org/16358 (glusterd: bypass add-brick validation with force) posted (#4) for review on master by Atin Mukherjee (amukherj)

Comment 16 Worker Ant 2017-01-19 03:50:37 UTC
COMMIT: http://review.gluster.org/16358 committed in master by Atin Mukherjee (amukherj) 
------
commit e8669dc707ffd60fea34c4b8b04f545a9169d5ee
Author: Atin Mukherjee <amukherj>
Date:   Mon Jan 9 10:56:13 2017 +0530

    glusterd: bypass add-brick validation with force
    
    Commit c916a2f added a validation to restrict the add-brick operation if the
    replica configuration is changed and any of the bricks belonging to the
    volume is down. However, we should bypass this validation with a force
    option if users really want the add-brick to go through, at the risk of
    the corner-case data loss issue.
    
    The original problem of add-brick failing when the layout is not set
    will remain even with the force option, as the issue has to be taken
    care of in the DHT layer.
    
    Change-Id: I0ed3df91ea712f77674eb8afc6fdfa577f25a7bb
    BUG: 1406411
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: http://review.gluster.org/16358
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Ravishankar N <ravishankar>
    CentOS-regression: Gluster Build System <jenkins.org>
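
Assuming the same illustrative replica 2 volume as in the earlier sketches, the bypass described above would look roughly like this (the behavior of the force keyword is only sketched from the commit message, not verified output):

# with a brick down, an add-brick that changes the replica count is refused
gluster volume add-brick test replica 3 host3:/bricks/b3

# appending force skips the new validation and lets the operation through,
# accepting the corner-case data loss risk described in comment 11
gluster volume add-brick test replica 3 host3:/bricks/b3 force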

Comment 17 Shyamsundar 2017-03-06 17:40:20 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.10.0, please open a new bug report.

glusterfs-3.10.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-February/030119.html
[2] https://www.gluster.org/pipermail/gluster-users/

