Bug 764238 (GLUSTER-2506)

Summary: In a 3-brick replica system, it's impossible to replace a failed brick
Product: [Community] GlusterFS
Reporter: raf <milanraf>
Component: replicate
Assignee: kaushik <kbudiger>
Status: CLOSED WORKSFORME
Severity: medium
Priority: medium
Version: 3.1.2
CC: amarts, gluster-bugs, glusterfs
Hardware: i386
OS: Linux
Doc Type: Bug Fix
Mount Type: nfs
Attachments:
  logs (flags: none)
  complete logs (flags: none)

Description raf 2011-03-10 00:49:49 UTC
Given a 3-brick replica volume, if a brick fails, trying to replace the failed brick with another brick from a new peer results in an unresponsive volume (unable to write to or read from the volume).

It would be VERY useful to be able to replace a failed brick on the fly with a new working one, without waiting for the failed brick to be repaired, restarted, reinstalled or whatever.
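
For context, the replacement being attempted looks like the following; the addresses and brick paths are the ones from the reproduction steps in comment 2 below (the 'start' sub-command is the form used later in comment 4):

[192.168.0.1]# gluster peer probe 192.168.0.4
[192.168.0.1]# gluster volume replace-brick test 192.168.0.3:/var/gluster 192.168.0.4:/var/gluster start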

Comment 1 Pranith Kumar K 2011-03-10 08:40:49 UTC
(In reply to comment #0)
> Given a 3-brick replica volume, if a brick fails, trying to replace the
> failed brick with another brick from a new peer results in an unresponsive
> volume (unable to write to or read from the volume).
> 
> It would be VERY useful to be able to replace a failed brick on the fly with
> a new working one, without waiting for the failed brick to be repaired,
> restarted, reinstalled or whatever.

To better understand the issue, could you please let us know what you mean by the brick failing?

Comment 2 raf 2011-03-10 14:13:02 UTC
I'll write out the complete sequence:

[192.168.0.1]# gluster volume create test replica 3 transport tcp 192.168.0.1:/var/gluster 192.168.0.2:/var/gluster 192.168.0.3:/var/gluster

[192.168.0.1]# mount -t glusterfs localhost:/test /mnt/gluster

on host 192.168.0.1, share the directory /mnt/gluster via SAMBA and start copying a bunch of data from a Window$ client

during the data copy, hard-shutdown 192.168.0.3 (unplug from the mains)

then issue the command:

[192.168.0.1]# gluster peer probe 192.168.0.4

then, issuing the command:

[192.168.0.1]# gluster volume replace-brick test 192.168.0.3:/var/gluster 192.168.0.4:/var/gluster

results in an unresponsive cluster (the data copy stops and times out).

Raf

Comment 3 Pranith Kumar K 2011-03-10 14:28:35 UTC
(In reply to comment #2)
> I'll write out the complete sequence:
> 
> [192.168.0.1]# gluster volume create test replica 3 transport tcp
> 192.168.0.1:/var/gluster 192.168.0.2:/var/gluster 192.168.0.3:/var/gluster
> 
> [192.168.0.1]# mount -t glusterfs localhost:/test /mnt/gluster
> 
> on host 192.168.0.1, share the directory /mnt/gluster via SAMBA and start
> copying a bunch of data from a Window$ client
> 
> during the data copy, hard-shutdown 192.168.0.3 (unplug from the mains)
> 
> then issue the command:
> 
> [192.168.0.1]# gluster peer probe 192.168.0.4
> 
> then, issuing the command:
> 
> [192.168.0.1]# gluster volume replace-brick test 192.168.0.3:/var/gluster
> 192.168.0.4:/var/gluster
> 
> results in an unresponsive cluster (the data copy stops and times out).
> 
> Raf

hi Raf,
    Replace-brick is used when you want the data on the old brick to be copied onto the new brick. If you don't need the contents, you can always do a remove-brick followed by an add-brick.
Example:
pranith @ /etc/glusterd
22:54:16 :) $ sudo gluster volume remove-brick vol `hostname`:/tmp/3
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
Remove Brick successful

pranith @ /etc/glusterd
22:54:33 :) $ sudo gluster volume info

Volume Name: vol
Type: Replicate
Status: Created
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: pranith-laptop:/tmp/1
Brick2: pranith-laptop:/tmp/2

pranith @ /etc/glusterd
22:54:45 :) $ sudo gluster volume add-brick vol `hostname`:/tmp/4
Add Brick successful

pranith @ /etc/glusterd
22:55:12 :) $ sudo gluster volume info

Volume Name: vol
Type: Replicate
Status: Created
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: pranith-laptop:/tmp/1
Brick2: pranith-laptop:/tmp/2
Brick3: pranith-laptop:/tmp/4

We shall use this bug to track preventing the cluster from becoming unresponsive.

Thanks
Pranith.

Comment 4 Pranith Kumar K 2011-03-22 06:35:11 UTC
hi Raf,
      The test case you mention works fine for me. Is it possible this is also due to the network.ping-timeout issue?
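
For reference, this timeout is tunable per volume, and lowering it makes clients give up on an unreachable brick sooner; a minimal example, assuming a volume named vol as in the transcripts above:

sudo gluster volume set vol network.ping-timeout 5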

Here is the output from my machine for the test case:

pranith @ /mnt
15:02:06 :( $ sudo gluster volume replace-brick vol 192.168.1.44:/tmp/1 192.168.1.121:/tmp/11 start
replace-brick started successfully

pranith @ /mnt
15:02:13 :) $ sudo gluster volume info

Volume Name: vol
Type: Replicate
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: 192.168.1.121:/tmp/4
Brick2: 192.168.1.121:/tmp/5
Brick3: 192.168.1.44:/tmp/1

pranith @ /mnt
15:02:42 :) $ sudo gluster peer status
Number of Peers: 1

Hostname: 192.168.1.44
Uuid: 5d050e36-db97-42a9-9f5c-9836e49c93b3
State: Peer in Cluster (Disconnected)

pranith @ /mnt
15:02:47 :) $ sudo gluster volume replace-brick vol 192.168.1.44:/tmp/1 192.168.1.121:/tmp/11 commit force
replace-brick commit successful

pranith @ /mnt
15:03:03 :) $ sudo gluster volume info

Volume Name: vol
Type: Replicate
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: 192.168.1.121:/tmp/4
Brick2: 192.168.1.121:/tmp/5
Brick3: 192.168.1.121:/tmp/11

pranith @ /mnt
15:03:07 :) $ 
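
To summarize the recovery path shown above: the peer hosting the source brick is disconnected, so the data-migration phase of replace-brick cannot complete, and the replacement is instead forced through in two steps (addresses and paths as in the transcript):

sudo gluster volume replace-brick vol 192.168.1.44:/tmp/1 192.168.1.121:/tmp/11 start
# the source peer is down, so skip waiting for migration and force the commit
sudo gluster volume replace-brick vol 192.168.1.44:/tmp/1 192.168.1.121:/tmp/11 commit force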

Pranith.

Comment 5 raf 2011-03-24 16:05:39 UTC
Now I've upgraded to 3.1.3

new test procedure:

[192.168.0.1]# gluster volume create san replica 3 transport tcp 192.168.0.1:/mnt/gluster 192.168.0.2:/mnt/gluster 192.168.0.3:/mnt/gluster
[192.168.0.1]# gluster volume start san
[192.168.0.1]# gluster volume set san network.ping-timeout 5
[192.168.0.1]# mount -t glusterfs localhost:/san /mnt/nfs
share /mnt/nfs via SAMBA
start copying a lot of data from an SMB client (WinXP)
during the copy, kill 192.168.0.3 (unplug from the mains)
the data copy hangs for a few seconds and then continues smoothly
[192.168.0.1]# gluster peer probe 192.168.0.4
[192.168.0.1]# gluster volume replace-brick san 192.168.0.3:/mnt/gluster 192.168.0.4:/mnt/gluster start
after 10-15 seconds, the prompt returns without any message (the data copy is still running, though)
[192.168.0.1]# gluster volume info
Volume Name: san
Type: Replicate
Status: Created
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: 192.168.0.1:/mnt/gluster
Brick2: 192.168.0.2:/mnt/gluster
Brick3: 192.168.0.3:/mnt/gluster

issuing the same command again
[192.168.0.1]# gluster volume replace-brick san 192.168.0.3:/mnt/gluster 192.168.0.4:/mnt/gluster start
results in the prompt returning without any message

[192.168.0.1]# gluster volume remove-brick san 192.168.0.3:/mnt/gluster
Operation failed

[192.168.0.1]# gluster peer status
192.168.0.2 connected
192.168.0.3 disconnected
192.168.0.4 connected

Raf

Comment 6 Pranith Kumar K 2011-03-24 23:27:56 UTC
(In reply to comment #5)
> Now I've upgraded to 3.1.3
> 
> new test procedure:
> 
> [192.168.0.1]# gluster volume create san replica 3 transport tcp
> 192.168.0.1:/mnt/gluster 192.168.0.2:/mnt/gluster 192.168.0.3:/mnt/gluster
> [192.168.0.1]# gluster volume start san
> [192.168.0.1]# gluster volume set san network.ping-timeout 5
> [192.168.0.1]# mount -t glusterfs localhost:/san /mnt/nfs
> share /mnt/nfs via SAMBA
> start copying a lot of data from an SMB client (WinXP)
> during the copy, kill 192.168.0.3 (unplug from the mains)
> the data copy hangs for a few seconds and then continues smoothly
> [192.168.0.1]# gluster peer probe 192.168.0.4
> [192.168.0.1]# gluster volume replace-brick san 192.168.0.3:/mnt/gluster
> 192.168.0.4:/mnt/gluster start
> after 10-15 seconds, the prompt returns without any message (the data copy is
> still running, though)
> [192.168.0.1]# gluster volume info
> Volume Name: san
> Type: Replicate
> Status: Created
> Number of Bricks: 3
> Transport-type: tcp
> Bricks:
> Brick1: 192.168.0.1:/mnt/gluster
> Brick2: 192.168.0.2:/mnt/gluster
> Brick3: 192.168.0.3:/mnt/gluster
> 
> issuing the same command again
> [192.168.0.1]# gluster volume replace-brick san 192.168.0.3:/mnt/gluster
> 192.168.0.4:/mnt/gluster start
> results in the prompt returning without any message
> 
> [192.168.0.1]# gluster volume remove-brick san 192.168.0.3:/mnt/gluster
> Operation failed
> 
> [192.168.0.1]# gluster peer status
> 192.168.0.2 connected
> 192.168.0.3 disconnected
> 192.168.0.4 connected
> 
> Raf

hi Raf,
   Do you have the logs for this run? Could you please post them?

Pranith

Comment 7 raf 2011-03-25 16:41:43 UTC
Created attachment 462

Comment 8 raf 2011-03-25 16:43:18 UTC
Created attachment 463

Comment 9 raf 2011-03-25 16:45:20 UTC
OK, I started from 4 clean systems (rm -rf /etc/gluster/*, rm -rf /usr/local/var/log/gluster/*).

[192.168.0.202]# gluster peer probe 192.168.0.203
[192.168.0.202]# gluster peer probe 192.168.0.204
[192.168.0.202]# gluster volume create san replica 3 transport tcp 192.168.0.202:/mnt/gluster 192.168.0.203:/mnt/gluster 192.168.0.204:/mnt/gluster
[192.168.0.202]# gluster volume set san network.ping-timeout 5
[192.168.0.202]# gluster volume start san
[192.168.0.202]# mount -t glusterfs localhost:/san /mnt/nfs
share /mnt/nfs via SAMBA
start copying a lot of data from a WinXP client
while copying, unplug 192.168.0.204 from the mains
and then issue the following commands:
[192.168.0.202]# gluster peer probe 192.168.0.205
[192.168.0.202]# gluster volume replace-brick san 192.168.0.204:/mnt/gluster 192.168.0.205:/mnt/gluster start
the command returned "operation started successfully" but nothing happened
so I tried:
[192.168.0.202]# gluster volume replace-brick san 192.168.0.204:/mnt/gluster 192.168.0.205:/mnt/gluster status
returned "status unknown"
[192.168.0.202]# gluster volume replace-brick san 192.168.0.204:/mnt/gluster 192.168.0.205:/mnt/gluster abort
returned "operation failed"
then I tried:
[192.168.0.202]# gluster volume remove-brick san 192.168.0.204:/mnt/gluster
[192.168.0.202]# gluster volume add-brick san 192.168.0.205:/mnt/gluster
and this did the trick
then I restarted the 192.168.0.204 machine and tried:
[192.168.0.202]# gluster volume replace-brick san 192.168.0.205:/mnt/gluster 192.168.0.204:/mnt/gluster start
returned "unsuccessful"
[192.168.0.202]# gluster volume remove-brick san 192.168.0.205:/mnt/gluster
the operation returned "successful"
[192.168.0.202]# gluster volume add-brick san 192.168.0.204:/mnt/gluster
returned "operation successful", but the 192.168.0.204 peer didn't copy any data (the data copy was running the whole time).
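
Note: on releases of this vintage, self-heal is triggered by a lookup on each file from the mount point, so a full walk of the mount is the usual way to force a newly added brick to be populated; a sketch, using the /mnt/nfs mount from the steps above:

[192.168.0.202]# find /mnt/nfs -print0 | xargs -0 stat > /dev/null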

I've posted the logs before this comment, sorry.

Raf

Comment 10 Amar Tumballi 2011-09-28 05:53:35 UTC
Raf,

In later versions of glusterfs, a few 'replace-brick'-related bugs have been fixed. Can you please check and verify that the bugs are fixed for you? If they still exist, please 're-open' the bug.

We are currently unable to reproduce the issue.
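
For anyone re-testing on a newer release, the same replacement can simply be attempted again; a minimal sketch reusing the volume and brick names from comment 5 (exact output will vary between versions):

[192.168.0.1]# gluster volume replace-brick san 192.168.0.3:/mnt/gluster 192.168.0.4:/mnt/gluster start
[192.168.0.1]# gluster volume replace-brick san 192.168.0.3:/mnt/gluster 192.168.0.4:/mnt/gluster status
# if the old brick is permanently gone, force the commit as in comment 4
[192.168.0.1]# gluster volume replace-brick san 192.168.0.3:/mnt/gluster 192.168.0.4:/mnt/gluster commit force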