Bug 797729

Summary: [glusterfs-3.3.0qa24]: if replace-brick fails (due to crash), then replace-brick cannot be aborted
Product: [Community] GlusterFS
Reporter: Raghavendra Bhat <rabhat>
Component: glusterd
Assignee: krishnan parthasarathi <kparthas>
Status: CLOSED DUPLICATE
Severity: high
Priority: medium
Version: mainline
CC: amarts, gluster-bugs, nsathyan, vbellur
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2012-07-11 07:19:27 UTC
Bug Blocks: 996047    

Description Raghavendra Bhat 2012-02-27 07:01:10 UTC
Description of problem:
If the source brick crashes during a replace-brick operation, replace-brick abort cannot be issued (the command hangs and then returns without printing any message), and subsequent volume operations such as volume stop fail, reporting that replace-brick is in progress.

 gluster volume stop test
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
Replace brick is in progress on volume test. Please retry after replace-brick operation is committed or aborted
[root@node130 src]# gluster volume stop mirror
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
Replace brick is in progress on volume mirror. Please retry after replace-brick operation is committed or aborted
[root@node130 src]# 


Version-Release number of selected component (if applicable):


How reproducible:

Whenever the source brick crashes

Steps to Reproduce:
1. Create and start a volume.
2. Mount the volume and run some I/O on the mount point.
3. Start a replace-brick operation.
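
The scenario can be sketched as an ordered list of CLI invocations. This is only an illustrative dry run, not the reporter's actual commands: the brick paths, mount point, and the helper name build_repro_steps are hypothetical, and the volume is created with a single brick purely for brevity. The script prints the commands and does not execute them.

```python
import shlex


def build_repro_steps(volume, src_brick, dst_brick, server, mount_point):
    """Return (note, argv) pairs for the scenario; argv is None for manual steps."""
    gluster = ["gluster", "volume"]
    replace = gluster + ["replace-brick", volume, src_brick, dst_brick]
    return [
        ("create the volume", gluster + ["create", volume, src_brick]),
        ("start the volume", gluster + ["start", volume]),
        ("mount it, then run some I/O on the mount point",
         ["mount", "-t", "glusterfs", f"{server}:/{volume}", mount_point]),
        ("start replace-brick", replace + ["start"]),
        ("the source brick process crashes here (no CLI step)", None),
        ("try to abort; reported to hang and return silently",
         replace + ["abort"]),
        ("try to stop; reported to fail with 'replace brick is in progress'",
         gluster + ["stop", volume]),
    ]


if __name__ == "__main__":
    # All argument values below are assumptions chosen for illustration.
    steps = build_repro_steps(
        volume="test",
        src_brick="node130:/export/brick-src",
        dst_brick="node130:/export/brick-dst",
        server="node130",
        mount_point="/mnt/test",
    )
    for note, argv in steps:
        command = " ".join(shlex.quote(a) for a in argv) if argv else "(manual)"
        print(f"# {note}\n{command}")
```

Keeping the commands as argument lists makes it easy to hand them to a runner later, or to swap in the real brick paths of a test cluster.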
  
Actual results:

After the source brick crashes, replace-brick cannot be aborted (even after volume start force), and volume stop commands do not work.

Expected results:

After the source brick crashes, it should be possible to abort replace-brick.

Additional info:

glusterd logs.

[2012-02-27 01:58:01.306839] E [socket.c:2121:socket_connect] 0-management: connection attempt failed (Connection refused)
[2012-02-27 01:58:04.318727] E [socket.c:2121:socket_connect] 0-management: connection attempt failed (Connection refused)
[2012-02-27 01:58:07.332177] E [socket.c:2121:socket_connect] 0-management: connection attempt failed (Connection refused)
[2012-02-27 01:58:10.344481] E [socket.c:2121:socket_connect] 0-management: connection attempt failed (Connection refused)
[2012-02-27 01:58:13.355091] E [socket.c:2121:socket_connect] 0-management: connection attempt failed (Connection refused)
[2012-02-27 01:58:16.365892] E [socket.c:2121:socket_connect] 0-management: connection attempt failed (Connection refused)
[2012-02-27 01:58:19.376598] E [socket.c:2121:socket_connect] 0-management: connection attempt failed (Connection refused)
[2012-02-27 01:58:22.387863] E [socket.c:2121:socket_connect] 0-management: connection attempt failed (Connection refused)
[2012-02-27 01:58:25.399657] E [socket.c:2121:socket_connect] 0-management: connection attempt failed (Connection refused)
[2012-02-27 01:58:28.411238] E [socket.c:2121:socket_connect] 0-management: connection attempt failed (Connection refused)
[2012-02-27 01:58:31.422155] E [socket.c:2121:socket_connect] 0-management: connection attempt failed (Connection refused)
[2012-02-27 01:58:34.434952] E [socket.c:2121:socket_connect] 0-management: connection attempt failed (Connection refused)
[2012-02-27 01:58:37.445701] E [socket.c:2121:socket_connect] 0-management: connection attempt failed (Connection refused)
[2012-02-27 01:58:40.456460] E [socket.c:2121:socket_connect] 0-management: connection attempt failed (Connection refused)
[2012-02-27 01:58:43.467230] E [socket.c:2121:socket_connect] 0-management: connection attempt failed (Connection refused)
[2012-02-27 01:58:46.478046] E [socket.c:2121:socket_connect] 0-management: connection attempt failed (Connection refused)
[2012-02-27 01:58:49.489197] E [socket.c:2121:socket_connect] 0-management: connection attempt failed (Connection refused)
[2012-02-27 01:58:52.500438] E [socket.c:2121:socket_connect] 0-management: connection attempt failed (Connection refused)
[2012-02-27 01:58:55.511128] E [socket.c:2121:socket_connect] 0-management: connection attempt failed (Connection refused)
[2012-02-27 01:58:58.521764] E [socket.c:2121:socket_connect] 0-management: connection attempt failed (Connection refused)
[2012-02-27 01:59:01.532598] E [socket.c:2121:socket_connect] 0-management: connection attempt failed (Connection refused)
[2012-02-27 01:59:04.545361] E [socket.c:2121:socket_connect] 0-management: connection attempt failed (Connection refused)
[2012-02-27 01:59:05.926555] I [glusterd-volume-ops.c:353:glusterd_handle_cli_stop_volume] 0-glusterd: Received stop vol reqfor volume test
[2012-02-27 01:59:05.926622] I [glusterd-utils.c:267:glusterd_lock] 0-glusterd: Cluster lock held by 771bfee1-3f30-477f-8f7c-3c9b91f70307
[2012-02-27 01:59:05.926668] I [glusterd-handler.c:453:glusterd_op_txn_begin] 0-management: Acquired local lock
[2012-02-27 01:59:05.926750] E [glusterd-volume-ops.c:910:glusterd_op_stage_stop_volume] 0-management: Replace brick is in progress on volume test. Please retry after replace-brick operation is committed or aborted
[2012-02-27 01:59:05.926771] E [glusterd-op-sm.c:1685:glusterd_op_ac_send_stage_op] 0-: Staging failed
[2012-02-27 01:59:05.926786] I [glusterd-op-sm.c:1725:glusterd_op_ac_send_stage_op] 0-glusterd: Sent op req to 0 peers
[2012-02-27 01:59:05.926807] I [glusterd-op-sm.c:2107:glusterd_op_txn_complete] 0-glusterd: Cleared local lock
[2012-02-27 01:59:07.556972] E [socket.c:2121:socket_connect] 0-management: connection attempt failed (Connection refused)
[2012-02-27 01:59:10.570813] E [socket.c:2121:socket_connect] 0-management: connection attempt failed (Connection refused)
[2012-02-27 01:59:12.572551] I [glusterd-volume-ops.c:353:glusterd_handle_cli_stop_volume] 0-glusterd: Received stop vol reqfor volume mirror
[2012-02-27 01:59:12.572600] I [glusterd-utils.c:267:glusterd_lock] 0-glusterd: Cluster lock held by 771bfee1-3f30-477f-8f7c-3c9b91f70307
[2012-02-27 01:59:12.572612] I [glusterd-handler.c:453:glusterd_op_txn_begin] 0-management: Acquired local lock
[2012-02-27 01:59:12.572659] E [glusterd-volume-ops.c:910:glusterd_op_stage_stop_volume] 0-management: Replace brick is in progress on volume mirror. Please retry after replace-brick operation is committed or aborted
[2012-02-27 01:59:12.572683] E [glusterd-op-sm.c:1685:glusterd_op_ac_send_stage_op] 0-: Staging failed
[2012-02-27 01:59:12.572695] I [glusterd-op-sm.c:1725:glusterd_op_ac_send_stage_op] 0-glusterd: Sent op req to 0 peers
[2012-02-27 01:59:12.572709] I [glusterd-op-sm.c:2107:glusterd_op_txn_complete] 0-glusterd: Cleared local lock
[2012-02-27 01:59:13.581461] E [socket.c:2121:socket_connect] 0-management: connection attempt failed (Connection refused)
[2012-02-27 01:59:16.594401] E [socket.c:2121:socket_connect] 0-management: connection attempt failed (Connection refused)
[2012-02-27 01:59:19.605427] E [socket.c:2121:socket_connect] 0-management: connection attempt failed (Connection refused)
[2012-02-27 01:59:22.617023] E [socket.c:2121:socket_connect] 0-management: connection attempt failed (Connection refused)
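
To separate the repeated connection failures from the staging errors in a log like the one above, a small parser can help. The following is a minimal sketch that assumes every line follows the layout seen in this excerpt (timestamp, level letter, file:line:function, domain, message); the names LOG_LINE, parse_line, and summarize are made up for this example and are not part of GlusterFS.

```python
import re
from collections import Counter

# Layout assumed from the excerpt:
# [timestamp] LEVEL [file:line:function] domain: message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<domain>[^\s:]*):\s*(?P<msg>.*)$"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def summarize(lines):
    """Count log entries per (level, function), skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None:
            counts[(record["level"], record["func"])] += 1
    return counts


# Three lines copied from the log in this report.
SAMPLE = [
    "[2012-02-27 01:58:01.306839] E [socket.c:2121:socket_connect] "
    "0-management: connection attempt failed (Connection refused)",
    "[2012-02-27 01:59:05.926750] E [glusterd-volume-ops.c:910:"
    "glusterd_op_stage_stop_volume] 0-management: Replace brick is in "
    "progress on volume test. Please retry after replace-brick operation "
    "is committed or aborted",
    "[2012-02-27 01:59:05.926771] E [glusterd-op-sm.c:1685:"
    "glusterd_op_ac_send_stage_op] 0-: Staging failed",
]

if __name__ == "__main__":
    for (level, func), count in summarize(SAMPLE).items():
        print(level, func, count)
```

Run against the full log, the same summary would make it easy to see that the connection failures from socket_connect dominate, while the stop-volume staging errors appear once per attempted command.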

Comment 1 Amar Tumballi 2012-04-27 08:55:39 UTC
Sent a patch to fix this issue at http://review.gluster.com/3072; it is under discussion.

Comment 2 Vijay Bellur 2012-05-18 13:07:10 UTC
Pushing this to post 3.3.0

Comment 3 krishnan parthasarathi 2012-07-11 07:19:27 UTC

*** This bug has been marked as a duplicate of bug 816915 ***