Bug 1000779
| Summary: | running add-brick then remove-brick, then restarting gluster leads to broken volume brick counts | ||
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Justin Randell <justin.randell> |
| Component: | cli | Assignee: | Kaushal <kaushal> |
| Status: | CLOSED DUPLICATE | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 3.3.2 | CC: | gluster-bugs |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2013-08-29 12:53:35 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
this bug is worse than my initial description. I can reproduce this, on 3.3 and 3.4, with just these steps (sketched as commands below):

1. create a simple replicated volume across two nodes, one brick on each node
2. add a third brick to the volume from one of the existing nodes
3. remove the brick
4. restart gluster

*** This bug has been marked as a duplicate of bug 1002556 ***
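A command-level sketch of those four steps; the volume name, hostnames, and brick paths are borrowed from the description below for illustration, and the exact create/restart invocations are my assumption rather than copied from the report:

{code}
# 1. create a simple replicated volume across two nodes, one brick on each node (assumed commands)
root@gluster1:~# gluster volume create hosting-test replica 2 \
    gluster1.justindev:/export/brick1/sdb1 gluster2.justindev:/export/brick1/sdb1
root@gluster1:~# gluster volume start hosting-test

# 2. add a third brick to the volume from one of the existing nodes
root@gluster1:~# gluster volume add-brick hosting-test replica 3 gluster1.justindev:/export/brick2/sdc1

# 3. remove the brick again
root@gluster1:~# gluster volume remove-brick hosting-test replica 2 gluster1.justindev:/export/brick2/sdc1

# 4. restart gluster and check the brick count
root@gluster1:~# service glusterfs-server restart
root@gluster1:~# gluster volume info hosting-test
{code}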
Description of problem:
simultaneous remove-brick commands corrupt volumes.

Steps to Reproduce:

1. set up a simple replicated volume with two nodes

{code}
root@gluster1:~# gluster volume info
Volume Name: hosting-test
Type: Replicate
Volume ID: 0dcadde0-b981-472d-851a-08fbfff40ae3
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gluster2.justindev:/export/brick1/sdb1
Brick2: gluster1.justindev:/export/brick1/sdb1
{code}

2. add a third brick to the replica

{code}
root@gluster2:~# gluster volume add-brick hosting-test replica 3 gluster1.justindev:/export/brick2/sdc1
Add Brick successful
root@gluster2:~# gluster volume info
Volume Name: hosting-test
Type: Replicate
Volume ID: 0dcadde0-b981-472d-851a-08fbfff40ae3
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gluster2.justindev:/export/brick1/sdb1
Brick2: gluster1.justindev:/export/brick1/sdb1
Brick3: gluster1.justindev:/export/brick2/sdc1
{code}

3. aaaand now for the fun bit. remove the brick at the same time from both nodes: one will fail, but both will report a healthy volume. Here's the node that wins:

{code}
root@gluster1:~# echo y | gluster volume remove-brick hosting-test replica 2 gluster1.justindev:/export/brick2/sdc1
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) Remove Brick commit force successful
root@gluster1:~# gluster volume info
Volume Name: hosting-test
Type: Replicate
Volume ID: 0dcadde0-b981-472d-851a-08fbfff40ae3
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gluster2.justindev:/export/brick1/sdb1
Brick2: gluster1.justindev:/export/brick1/sdb1
{code}

and the node that fails:

{code}
root@gluster2:~# echo y | gluster volume remove-brick hosting-test replica 2 gluster1.justindev:/export/brick2/sdc1
Operation failed
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) root@gluster2:~#
root@gluster2:~# gluster volume info
Volume Name: hosting-test
Type: Replicate
Volume ID: 0dcadde0-b981-472d-851a-08fbfff40ae3
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gluster2.justindev:/export/brick1/sdb1
Brick2: gluster1.justindev:/export/brick1/sdb1
{code}

4. stop and start gluster on either node, and we get funky maths:

{code}
root@gluster2:~# service glusterfs-server stop
glusterfs-server stop/waiting
root@gluster2:~# service glusterfs-server start
glusterfs-server start/running, process 11739
root@gluster2:~# gluster volume info
Volume Name: hosting-test
Type: Replicate
Volume ID: f8d7132b-6bb1-40d4-8414-b2168cdf2cd7
Status: Started
Number of Bricks: 0 x 3 = 2
Transport-type: tcp
Bricks:
Brick1: gluster2.justindev:/export/brick1/sdb1
Brick2: gluster1.justindev:/export/brick1/sdb1
{code}

Actual results:
the volume ends up with funky maths for its bricks ("Number of Bricks: 0 x 3 = 2").

Expected results:
volumes continue operating normally.

Additional info:
Ubuntu 13.04, using the 3.3 packages from http://download.gluster.org/pub/gluster/glusterfs/3.3/3.3.2/Ubuntu.README
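One way to see where the funky maths comes from: glusterd persists per-volume counters in its working directory, and the CLI derives the "Number of Bricks" line from them. A minimal check, assuming the default /var/lib/glusterd path and the count/sub_count/replica_count key names (an assumption about the 3.3/3.4 on-disk layout, not confirmed in this report):

{code}
# ASSUMPTION: default glusterd working directory and 3.3/3.4 key names in the volume store
root@gluster2:~# grep -E '^(type|count|sub_count|replica_count)=' /var/lib/glusterd/vols/hosting-test/info
{code}

If those counters disagree with the two bricks actually listed for the volume, that would explain the "0 x 3 = 2" arithmetic shown after the restart in step 4.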