Bug 1002556
Summary | running add-brick then remove-brick, then restarting gluster leads to broken volume brick counts
---|---
Product | [Community] GlusterFS
Reporter | Justin Randell <justin.randell>
Component | glusterd
Assignee | krishnan parthasarathi <kparthas>
Status | CLOSED CURRENTRELEASE
QA Contact |
Severity | high
Docs Contact |
Priority | unspecified
Version | 3.4.0
CC | gluster-bugs, marc.seeger, nsathyan, ravishankar
Target Milestone | ---
Target Release | ---
Hardware | All
OS | Linux
Whiteboard |
Fixed In Version | glusterfs-3.4.3
Doc Type | Bug Fix
Doc Text |
Story Points | ---
Clone Of |
 | 1019683 (view as bug list)
Environment |
Last Closed | 2014-04-17 13:14:06 UTC
Type | Bug
Regression | ---
Mount Type | ---
Documentation | ---
CRM |
Verified Versions |
Category | ---
oVirt Team | ---
RHEL 7.3 requirements from Atomic Host |
Cloudforms Team | ---
Target Upstream Version |
Embargoed |
Bug Depends On |
Bug Blocks | 1019683
Description (Justin Randell, 2013-08-29 12:52:40 UTC)
*** Bug 1000779 has been marked as a duplicate of this bug. ***

Additional info: Re-adding a brick results in an "operation failed", but the operation does indeed succeed and it seems to fix it.

[13:12:53] root:~# gluster volume info
Volume Name: test-fs-cluster-1
Type: Replicate
Volume ID: f3117deb-f5f5-40ff-94b5-98b2095239b2
Status: Started
Number of Bricks: 0 x 3 = 2
Transport-type: tcp
Bricks:
Brick1: fs-15.mseeger.example.dev:/mnt/brick22
Brick2: fs-14.mseeger.example.dev:/mnt/brick23
[13:12:55] root:~# rm -rf /mnt/bla/
[13:13:00] root:~# mkdir /mnt/bla
[13:13:02] root:~# gluster volume add-brick test-fs-cluster-1 replica 3 fs-15:/mnt/bla/
Operation failed on fs-14.mseeger.example.dev
[13:13:08] root:~# gluster volume info
Volume Name: test-fs-cluster-1
Type: Replicate
Volume ID: f3117deb-f5f5-40ff-94b5-98b2095239b2
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: fs-15.mseeger.example.dev:/mnt/brick22
Brick2: fs-14.mseeger.example.dev:/mnt/brick23
Brick3: fs-15:/mnt/bla

Adding it a second time will for some reason remove that brick:

[13:15:03] root:~# gluster volume add-brick test-fs-cluster-1 replica 3 fs-15:/mnt/bla/
Operation failed
[13:15:04] root:~# gluster volume info
Volume Name: test-fs-cluster-1
Type: Replicate
Volume ID: f3117deb-f5f5-40ff-94b5-98b2095239b2
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: fs-15.mseeger.example.dev:/mnt/brick22
Brick2: fs-14.mseeger.example.dev:/mnt/brick23

I'm not quite sure what's up with the volume geometry, but it's certainly corrupted.
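For quick reference, the failure mode named in the summary (add-brick, then remove-brick, then a glusterd restart) can be condensed into the sequence below. This is a minimal sketch assembled from the transcripts in this report; the volume name, host, and brick path (test-fs-cluster-1, fs-21, /mnt/bla) are only the example names used above and need adapting to your own cluster.

# Condensed reproduction sketch, assuming an existing "1 x 2" replicate volume
# named test-fs-cluster-1 and a spare brick directory on host fs-21
# (placeholder names taken from the transcripts above).
mkdir /mnt/bla
gluster volume add-brick test-fs-cluster-1 replica 3 fs-21:/mnt/bla/
gluster volume info test-fs-cluster-1     # expect "Number of Bricks: 1 x 3 = 3"
echo y | gluster volume remove-brick test-fs-cluster-1 replica 2 fs-21:/mnt/bla/
gluster volume info test-fs-cluster-1     # expect "Number of Bricks: 1 x 2 = 2"
service glusterfs-server stop
service glusterfs-server start
gluster volume info test-fs-cluster-1     # on affected builds the counts come back
                                          # inconsistent (e.g. "0 x 3 = 2" as above);
                                          # with the fix they remain "1 x 2 = 2"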
REVIEW: http://review.gluster.org/5893 (mgmt/glusterd: Update sub_count on remove brick) posted (#1) for review on master by Vijay Bellur (vbellur)

REVIEW: http://review.gluster.org/5893 (mgmt/glusterd: Update sub_count on remove brick) posted (#2) for review on master by Vijay Bellur (vbellur)

This seems to have fixed it. Will this be backported to 3.3 / 3.4? This is what it looks like after the fix:

[13:14:20] root:~# gluster volume info
Volume Name: test-fs-cluster-1
Type: Replicate
Volume ID: a25ac752-57c9-4496-92ca-bfdcb964edd4
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: fs-21.dev:/mnt/brick37
Brick2: fs-22.dev:/mnt/brick36
[13:14:47] root:~# mkdir /mnt/bla
[13:15:08] root:~# gluster volume add-brick test-fs-cluster-1 replica 3 fs-21:/mnt/bla/
Add Brick successful
[13:15:42] root:~# gluster volume info
Volume Name: test-fs-cluster-1
Type: Replicate
Volume ID: a25ac752-57c9-4496-92ca-bfdcb964edd4
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: fs-21.dev:/mnt/brick37
Brick2: fs-22.dev:/mnt/brick36
Brick3: fs-21:/mnt/bla
[13:15:49] root:~# echo y | gluster volume remove-brick test-fs-cluster-1 replica 2 fs-21:/mnt/bla/
Removing brick(s) can result in data loss. Do you want to Continue? (y/n)
Remove Brick commit force successful
[13:16:17] root:~# gluster volume info
Volume Name: test-fs-cluster-1
Type: Replicate
Volume ID: a25ac752-57c9-4496-92ca-bfdcb964edd4
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: fs-21.dev:/mnt/brick37
Brick2: fs-22.dev:/mnt/brick36
[13:16:23] root:~# service glusterfs-server stop
glusterfs-server stop/waiting
[13:16:34] root:~# service glusterfs-server start
glusterfs-server start/running, process 29760
[13:16:37] root:~# gluster volume info
Volume Name: test-fs-cluster-1
Type: Replicate
Volume ID: a25ac752-57c9-4496-92ca-bfdcb964edd4
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: fs-21.dev:/mnt/brick37
Brick2: fs-22.dev:/mnt/brick36

COMMIT: http://review.gluster.org/5893 committed in master by Anand Avati (avati)
------
commit 643533c77fd49316b7d16015fa1a008391d14bb2
Author: Vijay Bellur <vbellur>
Date: Wed Sep 11 01:26:13 2013 +0530

    mgmt/glusterd: Update sub_count on remove brick

    Change-Id: I7c17de39da03c6b2764790581e097936da406695
    BUG: 1002556
    Signed-off-by: Vijay Bellur <vbellur>
    Reviewed-on: http://review.gluster.org/5893
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Krishnan Parthasarathi <kparthas>
    Reviewed-by: Anand Avati <avati>

REVIEW: http://review.gluster.org/5902 (mgmt/glusterd: Update sub_count on remove brick) posted (#1) for review on release-3.4 by Vijay Bellur (vbellur)

COMMIT: http://review.gluster.org/5902 committed in release-3.4 by Vijay Bellur (vbellur)
------
commit d9dde294cfd7bb83bccbe777dfd58b925a6f2f7b
Author: Vijay Bellur <vbellur>
Date: Wed Sep 11 01:26:13 2013 +0530

    mgmt/glusterd: Update sub_count on remove brick

    Change-Id: I7c17de39da03c6b2764790581e097936da406695
    BUG: 1002556
    Signed-off-by: Vijay Bellur <vbellur>
    Reviewed-on: http://review.gluster.org/5902
    Tested-by: Gluster Build System <jenkins.com>

This is also failing in 3.3. Will there be a backport? (I tested the fix on 3.3; it worked fine.)

This bug is being closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.4.3, please reopen this bug report. glusterfs-3.4.3 has been announced on the Gluster Developers mailing list [1]; packages for several distributions should already be available or will become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

The fix for this bug is likely to be included in all future GlusterFS releases, i.e. releases after 3.4.3. Along the same lines, the recent glusterfs-3.5.0 release [3] is likely to include the fix; you can verify this by reading the comments in this bug report and checking for comments mentioning "committed in release-3.5". A short verification sketch follows the reference links below.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/5978
[2] http://news.gmane.org/gmane.comp.file-systems.gluster.user
[3] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/6137
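As a practical follow-up (not part of the original report, just a suggestion): to check whether a given installation carries this fix, you can look at the installed version, optionally check a source checkout for the backport commit quoted above, and re-run the reproduction sequence. The vX.Y.Z tag naming is an assumption about the repository's conventions; the volume name is the placeholder used throughout this report.

# Check the installed GlusterFS version; it should be 3.4.3 or later.
glusterfs --version

# From a glusterfs source checkout: list the release tags that contain the
# release-3.4 backport commit quoted in this report (tag names such as
# v3.4.3 are assumed to follow the project's usual vX.Y.Z convention).
git tag --contains d9dde294cfd7bb83bccbe777dfd58b925a6f2f7b

# Finally, re-run the add-brick / remove-brick / restart sequence sketched
# earlier and confirm the reported brick count stays consistent after the
# glusterd restart:
gluster volume info test-fs-cluster-1 | grep "Number of Bricks"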