Bug 763816 (GLUSTER-2084)

Summary: [3.1.1qa5] : replace-brick fails to migrate data when migration from same hostname
Product: [Community] GlusterFS Reporter: Harshavardhana <fharshav>
Component: glusterd    Assignee: Vijay Bellur <vbellur>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: low Docs Contact:
Priority: low    
Version: mainline    CC: cww, gluster-bugs, rabhat, vijay
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: ---
Regression: RTP Mount Type: ---
Documentation: DNR CRM:
Verified Versions: Category: ---

Description Harshavardhana 2010-11-11 01:16:15 UTC
[root@compel1 ~]# gluster volume info

Volume Name: dist
Type: Distribute
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: compel1:/export1
Brick2: compel4:/export1


[root@compel4 ~]# gluster volume replace-brick dist compel4:/export1 compel4:/export2 start
replace-brick started successfully

/export2 is on a different block device on the same node.

Even 10 minutes after starting the operation, no files have been migrated:

[root@compel4 ~]# gluster volume replace-brick dist compel4:/export1 compel4:/export2 status
Number of files migrated = 0       Current file=  
[root@compel4 ~]#

[root@compel4 ~]# find /export1/ | wc -l
2543

[root@compel4 ~]# find /export2/ | wc -l
1
[root@compel4 ~]# ls /export2/ 
[root@compel4 ~]#

[root@compel4 ~]# gluster volume replace-brick dist compel4:/export1 compel4:/export2 abort
replace-brick aborted successfully
[root@compel4 ~]# 

Nothing unusual in the log files.
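For reference, `find DIR | wc -l` counts DIR itself plus every entry beneath it, so the value 1 for /export2 above means the destination brick is still empty. A minimal C sketch of the same count using POSIX nftw(3), e.g. for checking migration progress from a program. `count_entries` is a hypothetical helper name, and the equivalence assumes no file name contains a newline (which would inflate the `wc -l` result):

```c
#define _XOPEN_SOURCE 700 /* nftw() and FTW_PHYS are XSI interfaces */
#include <ftw.h>
#include <sys/stat.h>

/* Running total updated by the nftw() callback. */
static long entry_count;

/* Called once per visited path, including the root directory itself. */
static int count_cb(const char *path, const struct stat *sb,
                    int typeflag, struct FTW *ftwbuf)
{
    (void)path; (void)sb; (void)typeflag; (void)ftwbuf;
    entry_count++;
    return 0; /* keep walking */
}

/* Hypothetical helper: returns the same number as `find root | wc -l`,
 * or -1 if the walk fails (e.g. root does not exist).
 * FTW_PHYS: do not follow symlinks, matching find's default behaviour. */
long count_entries(const char *root)
{
    entry_count = 0;
    if (nftw(root, count_cb, 16, FTW_PHYS) != 0)
        return -1;
    return entry_count;
}
```

Counting on source and destination before and after a replace-brick gives the same comparison as the two `find | wc -l` runs in the description.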

Comment 1 Raghavendra Bhat 2010-11-11 01:49:21 UTC
[2010-11-11 10:08:01.959440] D [glusterd-op-sm.c:5337:glusterd_op_set_cli_op] : Returning 0
[2010-11-11 10:08:01.959641] I [glusterd-handler.c:1225:glusterd_handle_replace_brick] glusterd: Received replace brick req
[2010-11-11 10:08:01.959694] D [glusterd-handler.c:1258:glusterd_handle_replace_brick] : src brick=bigbang:/d/glusterfs/export/export1
[2010-11-11 10:08:01.959733] D [glusterd-handler.c:1268:glusterd_handle_replace_brick] : dst brick=bigbang:/e/glusterfs/export/export1
[2010-11-11 10:08:01.959819] I [glusterd-utils.c:232:glusterd_lock] glusterd: Cluster lock held by 92782297-9b10-4a6d-8aca-a866b764bdf1
[2010-11-11 10:08:01.959858] I [glusterd-handler.c:2788:glusterd_op_txn_begin] glusterd: Acquired local lock
[2010-11-11 10:08:01.959896] D [glusterd-op-sm.c:5194:glusterd_op_sm_inject_event] glusterd: Enqueuing event: 'GD_OP_EVENT_START_LOCK'
[2010-11-11 10:08:01.959933] D [glusterd-handler.c:2792:glusterd_op_txn_begin] glusterd: Returning 0
[2010-11-11 10:08:01.960000] D [glusterd-op-sm.c:5242:glusterd_op_sm] : Dequeued event of type: 'GD_OP_EVENT_START_LOCK'
[2010-11-11 10:08:01.960038] I [glusterd3_1-mops.c:1105:glusterd3_1_cluster_lock] glusterd: Sent lock req to 0 peers
[2010-11-11 10:08:01.960073] D [glusterd3_1-mops.c:1108:glusterd3_1_cluster_lock] glusterd: Returning 0
[2010-11-11 10:08:01.960109] D [glusterd-op-sm.c:5194:glusterd_op_sm_inject_event] glusterd: Enqueuing event: 'GD_OP_EVENT_ALL_ACC'
[2010-11-11 10:08:01.960144] D [glusterd-op-sm.c:221:glusterd_op_sm_inject_all_acc] : Returning 0
[2010-11-11 10:08:01.960179] D [glusterd-op-sm.c:4023:glusterd_op_ac_send_lock] : Returning with 0
[2010-11-11 10:08:01.960216] D [glusterd-utils.c:2605:glusterd_sm_tr_log_transition_add] glusterd: Transitioning from 'Default' to 'Lock sent' due to event 'GD_OP_EVENT_START_LOCK'
[2010-11-11 10:08:01.960254] D [glusterd-utils.c:2607:glusterd_sm_tr_log_transition_add] : returning 0
[2010-11-11 10:08:01.960290] D [glusterd-op-sm.c:5242:glusterd_op_sm] : Dequeued event of type: 'GD_OP_EVENT_ALL_ACC'
[2010-11-11 10:08:01.960347] D [glusterd-op-sm.c:914:glusterd_op_stage_replace_brick] : src brick=bigbang:/d/glusterfs/export/export1
[2010-11-11 10:08:01.960389] D [glusterd-op-sm.c:924:glusterd_op_stage_replace_brick] : dst brick=bigbang:/e/glusterfs/export/export1
[2010-11-11 10:08:01.960483] D [glusterd-utils.c:795:glusterd_volinfo_find] : Volume vol found
[2010-11-11 10:08:01.960522] D [glusterd-utils.c:803:glusterd_volinfo_find] : Returning 0
[2010-11-11 10:08:01.960557] D [glusterd-utils.c:2358:glusterd_is_rb_started] : is_rb_started:status=0
[2010-11-11 10:08:01.960583] I [glusterd-utils.c:726:glusterd_volume_brickinfo_get_by_brick] : brick: bigbang:/d/glusterfs/export/export1
[2010-11-11 10:08:01.960771] D [glusterd-utils.c:199:glusterd_is_local_addr] glusterd: bigbang is local
[2010-11-11 10:08:01.960789] D [glusterd-utils.c:2188:glusterd_hostname_to_uuid] : returning 0
[2010-11-11 10:08:01.960803] I [glusterd-utils.c:697:glusterd_volume_brickinfo_get] : Found brick
[2010-11-11 10:08:01.960816] D [glusterd-utils.c:708:glusterd_volume_brickinfo_get] : Returning 0
[2010-11-11 10:08:01.960829] D [glusterd-utils.c:755:glusterd_volume_brickinfo_get_by_brick] : Returning 0
[2010-11-11 10:08:01.960880] D [glusterd-utils.c:199:glusterd_is_local_addr] glusterd: bigbang is local
[2010-11-11 10:08:01.960896] D [glusterd-op-sm.c:1026:glusterd_op_stage_replace_brick] : I AM THE SOURCE HOST
[2010-11-11 10:08:01.960937] E [dict.c:308:dict_set] dict: @this=(nil) @value=0x21ee110, key=src-brick-port
[2010-11-11 10:08:01.960953] D [glusterd-op-sm.c:1033:glusterd_op_stage_replace_brick] : Could not set src-brick-port=24010
[2010-11-11 10:08:01.961093] D [glusterd-utils.c:199:glusterd_is_local_addr] glusterd: bigbang is local
[2010-11-11 10:08:01.961110] D [glusterd-utils.c:2188:glusterd_hostname_to_uuid] : returning 0
[2010-11-11 10:08:01.961124] D [glusterd-utils.c:708:glusterd_volume_brickinfo_get] : Returning -1
[2010-11-11 10:08:01.961147] D [glusterd-utils.c:610:glusterd_brickinfo_new] : Returning 0
[2010-11-11 10:08:01.961163] D [glusterd-utils.c:667:glusterd_brickinfo_from_brick] : Returning 0
[2010-11-11 10:08:01.961217] D [glusterd-utils.c:199:glusterd_is_local_addr] glusterd: bigbang is local
[2010-11-11 10:08:01.961262] D [glusterd-utils.c:2436:glusterd_brick_create_path] : returning 0
[2010-11-11 10:08:01.961278] D [glusterd-op-sm.c:1112:glusterd_op_stage_replace_brick] : Returning 0
[2010-11-11 10:08:01.961291] D [glusterd-op-sm.c:4941:glusterd_op_stage_validate] : Returning 0


I have observed it too. I think the cause is in glusterd_op_stage_replace_brick: we set the source brick's port number in the response dict (rsp_dict), but that dict has somehow become NULL, so the set fails:

[2010-11-11 10:08:01.960896] D [glusterd-op-sm.c:1026:glusterd_op_stage_replace_brick] : I AM THE SOURCE HOST
[2010-11-11 10:08:01.960937] E [dict.c:308:dict_set] dict: @this=(nil) @value=0x21ee110, key=src-brick-port
[2010-11-11 10:08:01.960953] D [glusterd-op-sm.c:1033:glusterd_op_stage_replace_brick] : Could not set src-brick-port=24010
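In these lines dict_set reports `@this=(nil)`, the caller logs "Could not set src-brick-port=24010" only at debug level, and staging still returns 0 (see `glusterd_op_stage_replace_brick : Returning 0` in the full log). In other words the NULL dict is rejected rather than dereferenced, and the port value is dropped with nothing more than a debug message. The self-contained sketch below models that failure mode; `struct toy_dict`, `toy_dict_set_int` and `toy_dict_get_int` are hypothetical stand-ins, not the libglusterfs `dict_t` API:

```c
#include <stdio.h>
#include <string.h>

#define TOY_DICT_MAX 8

/* Hypothetical fixed-size key -> int map standing in for dict_t. */
struct toy_dict {
    const char *keys[TOY_DICT_MAX];
    int values[TOY_DICT_MAX];
    int count;
};

/* Models the guard behind "dict_set ... @this=(nil)": a NULL dict is
 * reported and rejected instead of being dereferenced.
 * Returns 0 on success, -1 on failure. */
int toy_dict_set_int(struct toy_dict *this, const char *key, int value)
{
    if (this == NULL || key == NULL) {
        fprintf(stderr, "E dict_set: @this=%p key=%s\n",
                (void *)this, key ? key : "(null)");
        return -1;
    }
    for (int i = 0; i < this->count; i++) {
        if (strcmp(this->keys[i], key) == 0) { /* overwrite existing key */
            this->values[i] = value;
            return 0;
        }
    }
    if (this->count == TOY_DICT_MAX)
        return -1;
    this->keys[this->count] = key; /* caller must keep key alive */
    this->values[this->count] = value;
    this->count++;
    return 0;
}

/* Returns 0 and stores the value in *out, or -1 if dict/key is absent. */
int toy_dict_get_int(const struct toy_dict *this, const char *key, int *out)
{
    if (this == NULL)
        return -1;
    for (int i = 0; i < this->count; i++) {
        if (strcmp(this->keys[i], key) == 0) {
            *out = this->values[i];
            return 0;
        }
    }
    return -1;
}
```

With a NULL dict the set fails and there is later nothing to read back, which matches the theory that the port never reaches the rest of the replace-brick flow.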

Comment 2 Raghavendra Bhat 2010-11-11 03:40:40 UTC
[2010-11-11 12:08:03.848210] D [glusterd-utils.c:2607:glusterd_sm_tr_log_transition_add] : returning 0
[2010-11-11 12:08:03.848224] D [glusterd-op-sm.c:5244:glusterd_op_sm] : Dequeued event of type: 'GD_OP_EVENT_ALL_ACC'
[2010-11-11 12:08:03.848241] M [glusterd3_1-mops.c:1220:glusterd3_1_stage_op] : IT IS COMING HERE, and calling glusterd_op_stage_validate
[2010-11-11 12:08:03.848261] D [glusterd-op-sm.c:914:glusterd_op_stage_replace_brick] : src brick=bigbang:/d/glusterfs/export/export1
[2010-11-11 12:08:03.848277] D [glusterd-op-sm.c:924:glusterd_op_stage_replace_brick] : dst brick=bigbang:/e/glusterfs/export/export1
[2010-11-11 12:08:03.848291] D [glusterd-utils.c:795:glusterd_volinfo_find] : Volume vol found
[2010-11-11 12:08:03.848306] D [glusterd-utils.c:803:glusterd_volinfo_find] : Returning 0
[2010-11-11 12:08:03.848320] I [glusterd-utils.c:726:glusterd_volume_brickinfo_get_by_brick] : brick: bigbang:/d/glusterfs/export/export1


The log above shows that glusterd_op_stage_validate is called from glusterd3_1_stage_op, and that call passes NULL in place of rsp_dict.

Comment 3 Anand Avati 2010-11-14 15:26:47 UTC
PATCH: http://patches.gluster.com/patch/5690 in master (cluster/pump: Reset saved path upon pump completion)

Comment 4 Anand Avati 2010-11-14 15:26:51 UTC
PATCH: http://patches.gluster.com/patch/5691 in master (mgmt/glusterd: fixes for uninterrupted replace-brick with nfs)

Comment 5 Anand Avati 2010-11-14 15:26:59 UTC
PATCH: http://patches.gluster.com/patch/5688 in master (check for dict also while setting the port for source brick while doing replace brick)
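Going by its title only, patch 5688 makes the source-port assignment conditional on the dictionary being present. A hedged sketch of that guard, assuming this shape: `struct rsp` and `stage_record_src_port` are hypothetical simplifications, and the real change may differ in detail. The NULL check is done before the setter is called, so on the glusterd3_1_stage_op path described in Comment 2 the setter is never handed a NULL dictionary:

```c
#include <stddef.h>

/* Hypothetical stand-in for the staging response dictionary. */
struct rsp {
    int has_src_port;
    int src_port;
};

/* Assumed shape of the fix: record the source brick's port only when
 * this host is the source AND the caller supplied a response object.
 * A NULL rsp_dict is skipped instead of being passed to the setter.
 * Returns 1 if the port was recorded, 0 if skipped. */
int stage_record_src_port(int is_source_host, struct rsp *rsp_dict,
                          int src_port)
{
    if (!is_source_host || rsp_dict == NULL)
        return 0;
    rsp_dict->src_port = src_port;
    rsp_dict->has_src_port = 1;
    return 1;
}
```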

Comment 6 Anand Avati 2010-11-18 10:55:58 UTC
PATCH: http://patches.gluster.com/patch/5744 in master (mgmt/glusterd: Avoid creating multiple destination brickinfo during replace-brick)
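The staging log in Comment 1 shows glusterd_volume_brickinfo_get returning -1, apparently for the destination brick, immediately followed by glusterd_brickinfo_new: a fresh brickinfo is allocated whenever the lookup misses. Patch 5744's title suggests reusing an existing record instead of creating duplicates. A find-or-create sketch of that idea, using a hypothetical, much-simplified `struct brick` list in place of glusterd's real brickinfo structures:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, much-simplified brick record kept in a linked list. */
struct brick {
    char host[64];
    char path[256];
    struct brick *next;
};

/* Look up host:path in the list; returns NULL if absent. */
static struct brick *brick_find(struct brick *head, const char *host,
                                const char *path)
{
    for (struct brick *b = head; b != NULL; b = b->next)
        if (strcmp(b->host, host) == 0 && strcmp(b->path, path) == 0)
            return b;
    return NULL;
}

/* Find-or-create: reuse an existing record for host:path instead of
 * allocating a new one on every pass. New records are pushed onto the
 * head of the list. Returns NULL on allocation failure or over-long
 * names. */
struct brick *brick_find_or_create(struct brick **head, const char *host,
                                   const char *path)
{
    struct brick *b = brick_find(*head, host, path);
    if (b != NULL)
        return b;
    if (strlen(host) >= sizeof b->host || strlen(path) >= sizeof b->path)
        return NULL;
    b = calloc(1, sizeof *b);
    if (b == NULL)
        return NULL;
    strcpy(b->host, host);
    strcpy(b->path, path);
    b->next = *head;
    *head = b;
    return b;
}
```

Calling it twice with the same host:path returns the same record, so repeated staging passes leave exactly one destination entry.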

Comment 7 Anand Avati 2010-11-25 05:34:02 UTC
PATCH: http://patches.gluster.com/patch/5778 in master (mgmt/glusterd: Temporary fix for a crash seen in replace-brick)

Comment 8 Raghavendra Bhat 2011-02-21 03:24:38 UTC
Checked with git HEAD (26cedae57d5b7cb8d50ed077ce29c92e30d6e260): migration where the source and destination bricks are on the same machine works fine.