| Summary: | In a 3-brick replica system, it is impossible to replace a failed brick | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | raf <milanraf> |
| Component: | replicate | Assignee: | kaushik <kbudiger> |
| Status: | CLOSED WORKSFORME | QA Contact: | |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.1.2 | CC: | amarts, gluster-bugs, glusterfs |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | i386 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | --- |
| Regression: | --- | Mount Type: | nfs |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Attachments: | | | |
Description (raf, 2011-03-10 00:49:49 UTC)
Given a 3-brick replica volume, if a brick fails, trying to replace the failed brick with another brick from a new peer results in a non-responding volume (unable to write to or read from the volume).

It would be VERY useful to be able to replace a non-working brick on the fly with a new working one, without waiting for the failed brick to be repaired, restarted, reinstalled, or whatever.

Comment:
(In reply to comment #0)
For a better understanding of the issue, could you please let us know what you mean by the brick failing?

Comment by raf:
I'll write out the complete sequence:

```
[192.168.0.1]# gluster volume create test replica 3 transport tcp 192.168.0.1:/var/gluster 192.168.0.2:/var/gluster 192.168.0.3:/var/gluster
[192.168.0.1]# mount -t glusterfs localhost:/test /mnt/gluster
```

On host 192.168.0.1, share the directory /mnt/gluster via Samba and start copying a bunch of data from a Windows client. During the data copy, hard-shutdown 192.168.0.3 (unplug it from mains power), then issue the command:

```
[192.168.0.1]# gluster peer probe 192.168.0.4
```

Then issuing the command:

```
[192.168.0.1]# gluster volume replace-brick test 192.168.0.3:/var/gluster 192.168.0.4:/var/gluster
```

results in an unresponsive cluster (the data copy stops and times out).

Raf

Comment by Pranith:
(In reply to comment #2)
Hi Raf,
Replace-brick is used when you want the data on that brick to be copied onto the new brick. If you don't want the contents, you can always do remove-brick and add-brick. Example:

```
pranith @ /etc/glusterd
22:54:16 :) $ sudo gluster volume remove-brick vol `hostname`:/tmp/3
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
Remove Brick successful

pranith @ /etc/glusterd
22:54:33 :) $ sudo gluster volume info
Volume Name: vol
Type: Replicate
Status: Created
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: pranith-laptop:/tmp/1
Brick2: pranith-laptop:/tmp/2

pranith @ /etc/glusterd
22:54:45 :) $ sudo gluster volume add-brick vol `hostname`:/tmp/4
Add Brick successful

pranith @ /etc/glusterd
22:55:12 :) $ sudo gluster volume info
Volume Name: vol
Type: Replicate
Status: Created
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: pranith-laptop:/tmp/1
Brick2: pranith-laptop:/tmp/2
Brick3: pranith-laptop:/tmp/4
```

We shall use this bug to prevent the cluster from going unresponsive.

Thanks,
Pranith.
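Condensed, the remove-brick/add-brick workaround above looks like this for the reporter's setup. This is a sketch only: the hosts, volume name, and brick paths are reused from the reproduction steps earlier in the thread, and the syntax is the 3.1.x CLI form shown above (later releases also expect a replica count on remove-brick/add-brick for replicated volumes).

```sh
# Sketch: swap out a dead brick without data migration (3.1.x-era syntax).
# Hosts, volume name, and paths come from the reproduction steps above,
# not from a tested setup.
gluster peer probe 192.168.0.4                             # bring the replacement peer into the pool
gluster volume remove-brick test 192.168.0.3:/var/gluster  # drop the failed brick (prompts y/n)
gluster volume add-brick test 192.168.0.4:/var/gluster     # attach a fresh, empty brick
gluster volume info test                                   # confirm the new brick list
```

The trade-off Pranith describes: the new brick comes in empty, so this sequence migrates nothing to it; on a replica volume, self-heal is what later repopulates it.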
Comment by Pranith:
Hi Raf,
The test case mentioned works fine for me. Is it possible this is also due to the network.ping-timeout issue? Here is the output from my machine for the test case:

```
pranith @ /mnt
15:02:06 :( $ sudo gluster volume replace-brick vol 192.168.1.44:/tmp/1 192.168.1.121:/tmp/11 start
replace-brick started successfully

pranith @ /mnt
15:02:13 :) $ sudo gluster volume info
Volume Name: vol
Type: Replicate
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: 192.168.1.121:/tmp/4
Brick2: 192.168.1.121:/tmp/5
Brick3: 192.168.1.44:/tmp/1

pranith @ /mnt
15:02:42 :) $ sudo gluster peer status
Number of Peers: 1

Hostname: 192.168.1.44
Uuid: 5d050e36-db97-42a9-9f5c-9836e49c93b3
State: Peer in Cluster (Disconnected)

pranith @ /mnt
15:02:47 :) $ sudo gluster volume replace-brick vol 192.168.1.44:/tmp/1 192.168.1.121:/tmp/11 commit force
replace-brick commit successful

pranith @ /mnt
15:03:03 :) $ sudo gluster volume info
Volume Name: vol
Type: Replicate
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: 192.168.1.121:/tmp/4
Brick2: 192.168.1.121:/tmp/5
Brick3: 192.168.1.121:/tmp/11

pranith @ /mnt
15:03:07 :) $
```
Pranith.
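A note on the network.ping-timeout option raised above, since it shapes what "unresponsive" looks like here: GlusterFS clients wait out this timeout (42 seconds by default) before declaring a brick dead, and until it fires, operations touching the dead brick simply hang. A minimal illustration, using the volume name from the next comment:

```sh
# Illustration: shrink the ping timeout so a dead brick is detected sooner.
# The default is 42 seconds; a very low value trades faster failover for a
# higher risk of false disconnects on a busy network.
gluster volume set san network.ping-timeout 5
gluster volume info san   # the changed option appears under "Options Reconfigured"
```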
Comment by raf:
Now I've upgraded to 3.1.3. New test procedure:

```
[192.168.0.1]# gluster volume create san replica 3 transport tcp 192.168.0.1:/mnt/gluster 192.168.0.2:/mnt/gluster 192.168.0.3:/mnt/gluster
[192.168.0.1]# gluster volume start san
[192.168.0.1]# gluster volume set san network.ping-timeout 5
[192.168.0.1]# mount -t glusterfs localhost:/san /mnt/nfs
```

Share /mnt/nfs through Samba and start copying a lot of data from an SMB client (WinXP). During the copy, kill 192.168.0.3 (unplug it from mains power). The data copy hangs for a few seconds and then continues smoothly.

```
[192.168.0.1]# gluster peer probe 192.168.0.4
[192.168.0.1]# gluster volume replace-brick san 192.168.0.3:/mnt/gluster 192.168.0.4:/mnt/gluster start
```

After 10-15 seconds the prompt returns without messages (the data copy is still running, though).

```
[192.168.0.1]# gluster volume info
Volume Name: san
Type: Replicate
Status: Created
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: 192.168.0.1:/mnt/gluster
Brick2: 192.168.0.2:/mnt/gluster
Brick3: 192.168.0.3:/mnt/gluster
```

Trying again to issue:

```
[192.168.0.1]# gluster volume replace-brick san 192.168.0.3:/mnt/gluster 192.168.0.4:/mnt/gluster start
```

results in the prompt returning without messages.

```
[192.168.0.1]# gluster volume remove-brick san 192.168.0.3:/mnt/gluster
Operation failed

[192.168.0.1]# gluster peer status
192.168.0.2 connected
192.168.0.3 disconnected
192.168.0.4 connected
```

Raf

Comment by Pranith:
(In reply to comment #5)
Hi Raf,
Do you have the logs for this run? Could you please post them?

Pranith

Created attachment 462

Created attachment 463
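For anyone reproducing this, a brief note on where such logs typically live. The paths below assume a source build with the default /usr/local prefix, as the cleanup commands in the next comment suggest; packaged installs log under /var/log/glusterfs instead.

```sh
# Sketch: collect the logs a developer would want attached to this bug.
# The prefix is an assumption based on the reporter's cleanup paths.
ls /usr/local/var/log/glusterfs/                          # glusterd, brick, and mount logs
tar czf gluster-logs.tar.gz /usr/local/var/log/glusterfs/ # bundle for attachment
```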
Comment by raf:
OK, started from 4 clean systems (rm -rf /etc/gluster/*, rm -rf /usr/local/var/log/gluster/*).

```
[192.168.0.202]# gluster peer probe 192.168.0.203
[192.168.0.202]# gluster peer probe 192.168.0.204
[192.168.0.202]# gluster volume create san replica 3 transport tcp 192.168.0.202:/mnt/gluster 192.168.0.203:/mnt/gluster 192.168.0.204:/mnt/gluster
[192.168.0.202]# gluster volume set san network.ping-timeout 5
[192.168.0.202]# gluster volume start san
[192.168.0.202]# mount -t glusterfs localhost:/san /mnt/nfs
```

Share /mnt/nfs through Samba and start copying a lot of data from the WinXP client. While copying, unplug 192.168.0.204 from mains power, and then run the following commands:

```
[192.168.0.202]# gluster peer probe 192.168.0.205
[192.168.0.202]# gluster volume replace-brick san 192.168.0.204:/mnt/gluster 192.168.0.205:/mnt/gluster start
```

The command returned "operation started successfully" but nothing happened, so I tried:

```
[192.168.0.202]# gluster volume replace-brick san 192.168.0.204:/mnt/gluster 192.168.0.205:/mnt/gluster status
```

which returned "status unknown", and:

```
[192.168.0.202]# gluster volume replace-brick san 192.168.0.204:/mnt/gluster 192.168.0.205:/mnt/gluster abort
```

which returned "operation failed". Then I tried:

```
[192.168.0.202]# gluster volume remove-brick san 192.168.0.204:/mnt/gluster
[192.168.0.202]# gluster volume add-brick san 192.168.0.205:/mnt/gluster
```

and this did the trick. Then I restarted the 192.168.0.204 machine and tried:

```
[192.168.0.202]# gluster volume replace-brick san 192.168.0.205:/mnt/gluster 192.168.0.204:/mnt/gluster start
```

which returned "unsuccessful".

```
[192.168.0.202]# gluster volume remove-brick san 192.168.0.205:/mnt/gluster
```

Operation "successful".

```
[192.168.0.202]# gluster volume add-brick san 192.168.0.204:/mnt/gluster
```

returned "operation successful", but the 192.168.0.204 peer didn't copy any data (the data copy was running the whole time).

I posted the logs just before this comment, sorry.

Raf

Comment:
Raf,
In later versions of GlusterFS, a few 'replace-brick' related bugs have been fixed. Can you please check and verify that the bugs are fixed for you? If they still exist, please re-open the bug. We are currently unable to reproduce the issue.
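For reference when re-testing on a newer release, here is the replace-brick lifecycle exercised throughout this report, using the hosts from the last reproduction. As the transcripts above suggest, "start" migrates data from the source brick and so needs it reachable, while "commit force" is the path that succeeded against a disconnected peer; this is a sketch in the 3.1.x syntax used in this thread, not a verified procedure for current releases.

```sh
# The replace-brick lifecycle from this thread (3.1.x syntax; sketch only).
gluster volume replace-brick san 192.168.0.204:/mnt/gluster 192.168.0.205:/mnt/gluster start
gluster volume replace-brick san 192.168.0.204:/mnt/gluster 192.168.0.205:/mnt/gluster status
gluster volume replace-brick san 192.168.0.204:/mnt/gluster 192.168.0.205:/mnt/gluster commit force
```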