Bug 1000779 - running add-brick then remove-brick, then restarting gluster leads to broken volume brick counts
Summary: running add-brick then remove-brick, then restarting gluster leads to broken volume brick counts
Keywords:
Status: CLOSED DUPLICATE of bug 1002556
Alias: None
Product: GlusterFS
Classification: Community
Component: cli
Version: 3.3.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kaushal
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-08-25 07:26 UTC by Justin Randell
Modified: 2013-08-29 12:53 UTC
CC List: 1 user

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-08-29 12:53:35 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Justin Randell 2013-08-25 07:26:19 UTC
Description of problem:

running the same remove-brick command simultaneously from two nodes leaves the volume in a state that breaks after a glusterd restart.

Steps to Reproduce:

1. set up a simple replicated volume with two nodes

{code}
root@gluster1:~# gluster volume info
 
Volume Name: hosting-test
Type: Replicate
Volume ID: 0dcadde0-b981-472d-851a-08fbfff40ae3
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gluster2.justindev:/export/brick1/sdb1
Brick2: gluster1.justindev:/export/brick1/sdb1
{code}
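
The create commands aren't shown in the report; here's a minimal sketch of how a volume like this might have been set up, assuming the two nodes are already peer-probed and the brick directories exist:

{code}
# run on either node once peers are probed (gluster peer probe <host>)
gluster volume create hosting-test replica 2 transport tcp \
    gluster2.justindev:/export/brick1/sdb1 gluster1.justindev:/export/brick1/sdb1
gluster volume start hosting-test
{code}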

2. add a third brick to the replica

{code}
root@gluster2:~# gluster volume add-brick hosting-test replica 3 gluster1.justindev:/export/brick2/sdc1
Add Brick successful
root@gluster2:~# gluster volume info
 
Volume Name: hosting-test
Type: Replicate
Volume ID: 0dcadde0-b981-472d-851a-08fbfff40ae3
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gluster2.justindev:/export/brick1/sdb1
Brick2: gluster1.justindev:/export/brick1/sdb1
Brick3: gluster1.justindev:/export/brick2/sdc1
{code}

3. and now for the fun bit: remove the brick from both nodes at the same time. One command fails, but both nodes report a healthy-looking volume.

here's the node that wins:

{code}
root@gluster1:~# echo y | gluster volume remove-brick hosting-test replica 2 gluster1.justindev:/export/brick2/sdc1
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) Remove Brick commit force successful
root@gluster1:~# gluster volume info
 
Volume Name: hosting-test
Type: Replicate
Volume ID: 0dcadde0-b981-472d-851a-08fbfff40ae3
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gluster2.justindev:/export/brick1/sdb1
Brick2: gluster1.justindev:/export/brick1/sdb1
{code}

and the node that fails:

{code}
root@gluster2:~# echo y | gluster volume remove-brick hosting-test replica 2 gluster1.justindev:/export/brick2/sdc1
Operation failed
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) root@gluster2:~#
root@gluster2:~# gluster volume info
 
Volume Name: hosting-test
Type: Replicate
Volume ID: 0dcadde0-b981-472d-851a-08fbfff40ae3
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gluster2.justindev:/export/brick1/sdb1
Brick2: gluster1.justindev:/export/brick1/sdb1
{code}

4. stop and start gluster on either node, and the reported brick count no longer adds up:

{code}
root@gluster2:~# service glusterfs-server stop
glusterfs-server stop/waiting
root@gluster2:~# service glusterfs-server start
glusterfs-server start/running, process 11739
root@gluster2:~# gluster volume info
 
Volume Name: hosting-test
Type: Replicate
Volume ID: f8d7132b-6bb1-40d4-8414-b2168cdf2cd7
Status: Started
Number of Bricks: 0 x 3 = 2
Transport-type: tcp
Bricks:
Brick1: gluster2.justindev:/export/brick1/sdb1
Brick2: gluster1.justindev:/export/brick1/sdb1
{code}
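
The broken count survives the restart because it comes from the volume metadata glusterd persists on disk, not from the running daemons. A hedged way to look at the stored values (the path is glusterd's standard working directory; the exact field names vary by version and are an assumption here):

{code}
# glusterd keeps volume metadata in /var/lib/glusterd/vols/<volname>/info;
# the count / sub_count / replica_count fields there feed the
# "Number of Bricks: X x Y = Z" line printed by `gluster volume info`
root@gluster2:~# grep -i count /var/lib/glusterd/vols/hosting-test/info
{code}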

Actual results:

the volume ends up with a broken brick count (Number of Bricks: 0 x 3 = 2).

Expected results:

the volume continues operating normally, with a correct brick count, after the restart.

Additional info:

Ubuntu 13.04, using the 3.3 packages from http://download.gluster.org/pub/gluster/glusterfs/3.3/3.3.2/Ubuntu.README

Comment 1 Justin Randell 2013-08-29 12:34:44 UTC
this bug is worse than my initial description suggests.

I can reproduce it, on both 3.3 and 3.4, with just these steps (a consolidated sketch follows the list):

1. create a simple replicated volume across two nodes, one brick on each node

2. add a third brick to the volume from one of the existing nodes

3. remove the brick

4. restart gluster
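
A consolidated sketch of the shorter reproduction above, reusing the hostnames and brick paths from the original description (the volume create command and the restart step are assumptions, not copied from the report):

{code}
# step 1: simple 1x2 replicated volume, one brick per node
gluster volume create hosting-test replica 2 \
    gluster1.justindev:/export/brick1/sdb1 gluster2.justindev:/export/brick1/sdb1
gluster volume start hosting-test

# step 2: add a third brick from one of the existing nodes
gluster volume add-brick hosting-test replica 3 gluster1.justindev:/export/brick2/sdc1

# step 3: remove the same brick again
echo y | gluster volume remove-brick hosting-test replica 2 gluster1.justindev:/export/brick2/sdc1

# step 4: restart gluster and check the reported brick count
service glusterfs-server stop
service glusterfs-server start
gluster volume info hosting-test
{code}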

Comment 2 Justin Randell 2013-08-29 12:53:35 UTC

*** This bug has been marked as a duplicate of bug 1002556 ***

