Bug 818519

Summary: [170a3a411c88f6ce1662c55440a372f512e901d1]: replace-brick commands (abort/commit) fail if all the gluster processes are killed and restarted
Product: [Community] GlusterFS Reporter: Raghavendra Bhat <rabhat>
Component: glusterd Assignee: krishnan parthasarathi <kparthas>
Status: CLOSED DUPLICATE QA Contact:
Severity: unspecified Docs Contact:
Priority: medium    
Version: mainline CC: amarts, gluster-bugs, nsathyan
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-07-11 07:11:18 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
Embargoed:

Description Raghavendra Bhat 2012-05-03 09:05:57 UTC
Description of problem:

Start a replace-brick operation on a volume. Once it is done (the replace-brick status command reports that migration is complete), kill all the gluster processes (glusterfsd, glusterfs, and glusterd), or reboot the machine.

Then restart the gluster processes by starting glusterd.

If a replace-brick abort or replace-brick commit command is now issued, it blocks for 2 minutes and then returns without printing any output.

Version-Release number of selected component (if applicable):


How reproducible:

Always

Steps to Reproduce:
1. Start replace-brick on a running volume and wait until the migration is complete.
2. Kill all the gluster processes (glusterfsd, glusterfs, glusterd), or reboot the machine.
3. Restart the gluster processes and issue replace-brick commit or abort.
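
The steps above can be sketched as a small Python driver. This is only an illustration under assumptions: the volume name "mirror" is inferred from the socket path in the log under "Additional info", the source brick is the one named in the same log, and the destination brick is a purely hypothetical placeholder because the report does not state it. The replace-brick command form used here (volume, source brick, new brick, action) is the CLI syntax as I understand it for this era of GlusterFS; check it against your installed version. The script prints the commands by default and runs nothing.

```python
import subprocess

# Assumptions, not stated in the report:
VOLUME = "mirror"                            # inferred from /etc/glusterd/vols/mirror/ in the log
SRC_BRICK = "hyperspace:/mnt/sda6/export4"   # brick named in the log
DST_BRICK = "hyperspace:/mnt/new/export"     # hypothetical placeholder


def replace_brick_cmd(action):
    """Build one replace-brick command for the given action."""
    return ["gluster", "volume", "replace-brick",
            VOLUME, SRC_BRICK, DST_BRICK, action]


def reproduction_steps(final_action="commit"):
    """Return the command sequence matching steps 1 to 3 of the report."""
    return [
        replace_brick_cmd("start"),
        # In practice, repeat "status" until it reports migration complete.
        replace_brick_cmd("status"),
        ["killall", "glusterfsd", "glusterfs", "glusterd"],
        ["glusterd"],                        # restart the management daemon
        replace_brick_cmd(final_action),     # reported to block for ~2 minutes
    ]


def run(steps, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for cmd in steps:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=False)


if __name__ == "__main__":
    run(reproduction_steps("abort"))
```

Calling run(reproduction_steps("abort"), dry_run=False) on a disposable test machine would execute the sequence; the dry-run default is deliberate because the killall step stops every gluster process on the host.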
  
Actual results:

The replace-brick abort/commit commands block for 2 minutes and then return without any output.

Expected results:

Replace-brick commands should work properly.

Additional info:


  5:     option transport.socket.keepalive-time 10
  6:     option transport.socket.keepalive-interval 2
  7:     option transport.socket.read-fail-log off
  8: end-volume

+------------------------------------------------------------------------------+
[2012-05-03 14:19:20.284691] I [socket.c:1807:socket_event_handler] 0-transport: disconnecting now
[2012-05-03 14:19:20.284774] I [socket.c:1807:socket_event_handler] 0-transport: disconnecting now
[2012-05-03 14:19:20.284816] I [socket.c:1807:socket_event_handler] 0-transport: disconnecting now
[2012-05-03 14:19:20.311447] I [glusterd-handler.c:860:glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2012-05-03 14:19:20.313057] I [glusterd-pmap.c:238:pmap_registry_bind] 0-pmap: adding brick /mnt/sda6/export4 on port 24017
[2012-05-03 14:19:20.313672] I [glusterd-handler.c:860:glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2012-05-03 14:19:20.317597] I [glusterd-handler.c:860:glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2012-05-03 14:19:20.318349] I [glusterd-handler.c:860:glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2012-05-03 14:19:20.336120] I [glusterd-handshake.c:255:server_event_notify] 0-: recieved defrag status updated
[2012-05-03 14:19:20.341376] W [socket.c:1521:__socket_proto_state_machine] 0-management: reading from socket failed. Error (Transport endpoint is not connected), peer (/etc/glusterd/vols/mirror/rebalance/c759c363-3988-4a75-aa41-6eee929e825c.sock)
[2012-05-03 14:19:20.374055] I [mem-pool.c:585:mem_pool_destroy] 0-management: size=2236 max=0 total=0
[2012-05-03 14:19:20.374084] I [mem-pool.c:585:mem_pool_destroy] 0-management: size=124 max=0 total=0
[2012-05-03 14:19:20.818158] I [glusterd-pmap.c:238:pmap_registry_bind] 0-pmap: adding brick /mnt/sda10/export3 on port 24015
[2012-05-03 14:21:18.258088] I [glusterd-replace-brick.c:98:glusterd_handle_replace_brick] 0-glusterd: Received replace brick req
[2012-05-03 14:21:18.258156] I [glusterd-replace-brick.c:147:glusterd_handle_replace_brick] 0-glusterd: Received replace brick status request
[2012-05-03 14:21:18.283347] I [glusterd-utils.c:283:glusterd_lock] 0-glusterd: Cluster lock held by c759c363-3988-4a75-aa41-6eee929e825c
[2012-05-03 14:21:18.283477] I [glusterd-handler.c:458:glusterd_op_txn_begin] 0-management: Acquired local lock
[2012-05-03 14:21:18.283844] I [glusterd-utils.c:855:glusterd_volume_brickinfo_get_by_brick] 0-: brick: hyperspace:/mnt/sda6/export4
[2012-05-03 14:21:18.285059] I [glusterd-utils.c:812:glusterd_volume_brickinfo_get] 0-management: Found brick
[2012-05-03 14:21:18.286375] I [glusterd-op-sm.c:2039:glusterd_op_ac_send_stage_op] 0-glusterd: Sent op req to 0 peers
[2012-05-03 14:21:18.286474] I [glusterd-utils.c:855:glusterd_volume_brickinfo_get_by_brick] 0-: brick: hyperspace:/mnt/sda6/export4
[2012-05-03 14:21:18.286743] I [glusterd-utils.c:812:glusterd_volume_brickinfo_get] 0-management: Found brick
[2012-05-03 14:21:18.286989] I [glusterd-replace-brick.c:1229:rb_update_srcbrick_port] 0-: adding src-brick port no
[2012-05-03 14:21:18.287061] I [glusterd-replace-brick.c:1286:rb_update_dstbrick_port] 0-: adding dst-brick port no
[2012-05-03 14:21:18.453493] I [glusterd-op-sm.c:2358:glusterd_op_ac_send_commit_op] 0-management: Sent op req to 0 peers
[2012-05-03 14:21:18.453577] I [glusterd-op-sm.c:2627:glusterd_op_txn_complete] 0-glusterd: Cleared local lock
[2012-05-03 14:21:20.754563] I [glusterd-replace-brick.c:98:glusterd_handle_replace_brick] 0-glusterd: Received replace brick req
[2012-05-03 14:21:20.754614] I [glusterd-replace-brick.c:147:glusterd_handle_replace_brick] 0-glusterd: Received replace brick abort request
[2012-05-03 14:21:20.754647] I [glusterd-utils.c:283:glusterd_lock] 0-glusterd: Cluster lock held by c759c363-3988-4a75-aa41-6eee929e825c
[2012-05-03 14:21:20.754673] I [glusterd-handler.c:458:glusterd_op_txn_begin] 0-management: Acquired local lock
[2012-05-03 14:21:20.754748] I [glusterd-utils.c:855:glusterd_volume_brickinfo_get_by_brick] 0-: brick: hyperspace:/mnt/sda6/export4
[2012-05-03 14:21:20.755016] I [glusterd-utils.c:812:glusterd_volume_brickinfo_get] 0-management: Found brick
[2012-05-03 14:21:20.755711] I [glusterd-op-sm.c:2039:glusterd_op_ac_send_stage_op] 0-glusterd: Sent op req to 0 peers
[2012-05-03 14:21:20.755754] I [glusterd-utils.c:855:glusterd_volume_brickinfo_get_by_brick] 0-: brick: hyperspace:/mnt/sda6/export4
[2012-05-03 14:21:20.755962] I [glusterd-utils.c:812:glusterd_volume_brickinfo_get] 0-management: Found brick
[2012-05-03 14:21:20.756195] I [glusterd-replace-brick.c:1229:rb_update_srcbrick_port] 0-: adding src-brick port no
[2012-05-03 14:21:20.756260] I [glusterd-replace-brick.c:1286:rb_update_dstbrick_port] 0-: adding dst-brick port no
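
One thing the log excerpt shows: the status request at 14:21:18 acquires the local lock and later logs "Cleared local lock", while the abort request at 14:21:20 acquires the lock and the excerpt ends without a matching "Cleared local lock". A minimal Python sketch for spotting that pattern in a glusterd log follows. It is a reading aid, not part of any GlusterFS tooling; the function name is hypothetical, and it assumes the two message strings exactly as they appear in the excerpt above.

```python
import re
from datetime import datetime

# Leading timestamp of a glusterd log line, e.g. [2012-05-03 14:21:20.754673]
TS = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6})\]")


def unfinished_transactions(lines):
    """Return timestamps of 'Acquired local lock' entries that are never
    followed by a 'Cleared local lock' entry."""
    pending = None   # timestamp of the lock acquisition still waiting for a clear
    stuck = []
    for line in lines:
        match = TS.match(line)
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if "Acquired local lock" in line:
            if pending is not None:
                stuck.append(pending)
            pending = stamp
        elif "Cleared local lock" in line:
            pending = None
    if pending is not None:
        stuck.append(pending)
    return stuck


# Four lines copied from the excerpt above.
SAMPLE = """\
[2012-05-03 14:21:18.283477] I [glusterd-handler.c:458:glusterd_op_txn_begin] 0-management: Acquired local lock
[2012-05-03 14:21:18.453577] I [glusterd-op-sm.c:2627:glusterd_op_txn_complete] 0-glusterd: Cleared local lock
[2012-05-03 14:21:20.754673] I [glusterd-handler.c:458:glusterd_op_txn_begin] 0-management: Acquired local lock
[2012-05-03 14:21:20.756260] I [glusterd-replace-brick.c:1286:rb_update_dstbrick_port] 0-: adding dst-brick port no
"""

if __name__ == "__main__":
    for stamp in unfinished_transactions(SAMPLE.splitlines()):
        print("lock acquired but never cleared:", stamp)
```

On the sample, the only transaction flagged is the abort request, which is consistent with the reported hang. Since the excerpt may simply stop before the transaction finished, this is a hint about where to look rather than a diagnosis.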

Comment 1 Amar Tumballi 2012-07-11 06:22:21 UTC
patch sent @ http://review.gluster.com/3264

Comment 2 krishnan parthasarathi 2012-07-11 07:11:18 UTC

*** This bug has been marked as a duplicate of bug 816915 ***