Bug 763816 - (GLUSTER-2084) [3.1.1qa5] : replace-brick fails to migrate data when migration from same hostname
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: glusterd
Version: mainline
Hardware: All Linux
Priority: low, Severity: low
Assigned To: Vijay Bellur
Reported: 2010-11-10 20:16 EST by Harshavardhana
Modified: 2015-03-22 21:03 EDT
Doc Type: Bug Fix
Regression: RTP
Documentation: DNR

Description Harshavardhana 2010-11-10 20:16:15 EST
[root@compel1 ~]# gluster volume info

Volume Name: dist
Type: Distribute
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: compel1:/export1
Brick2: compel4:/export1


[root@compel4 ~]# gluster volume replace-brick dist compel4:/export1 compel4:/export2 start
replace-brick started successfully

"export2" is a different block device on the same node. 

Even 10 minutes after starting the operation, no files have been migrated:

[root@compel4 ~]# gluster volume replace-brick dist compel4:/export1 compel4:/export2 status
Number of files migrated = 0       Current file=  
[root@compel4 ~]#

[root@compel4 ~]# find /export1/ | wc -l
2543

[root@compel4 ~]# find /export2/ | wc -l
1
[root@compel4 ~]# ls /export2/ 
[root@compel4 ~]#

[root@compel4 ~]# gluster volume replace-brick dist compel4:/export1 compel4:/export2 abort
replace-brick aborted successfully
[root@compel4 ~]# 
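
The status check above can be scripted when watching for a stalled migration. The following is a minimal sketch in Python that parses the status line exactly as it is quoted in this report; the line format is assumed from that single sample, and the function names and the idea of collecting several samples over time are hypothetical, not part of the gluster CLI.

```python
import re

# Format assumed from the one status line quoted in this report.
_STATUS_RE = re.compile(
    r"Number of files migrated\s*=\s*(\d+)\s+Current file\s*=\s*(.*)$"
)


def parse_replace_brick_status(line):
    """Return (files_migrated, current_file), or None if the line does not match."""
    m = _STATUS_RE.match(line.strip())
    if m is None:
        return None
    return int(m.group(1)), m.group(2).strip()


def migration_stalled(samples):
    """Return True if every sampled status line reports zero migrated files.

    `samples` is a hypothetical list of status lines collected over time,
    for example one per minute.
    """
    parsed = [parse_replace_brick_status(s) for s in samples]
    return bool(parsed) and all(p is not None and p[0] == 0 for p in parsed)
```

Feeding it the status line from this report yields a count of zero and an empty current file, which matches the empty /export2 directory seen above.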

Nothing unusual shows up in the log files.
Comment 1 Raghavendra Bhat 2010-11-10 20:49:21 EST
[2010-11-11 10:08:01.959440] D [glusterd-op-sm.c:5337:glusterd_op_set_cli_op] : Returning 0
[2010-11-11 10:08:01.959641] I [glusterd-handler.c:1225:glusterd_handle_replace_brick] glusterd: Received replace brick req
[2010-11-11 10:08:01.959694] D [glusterd-handler.c:1258:glusterd_handle_replace_brick] : src brick=bigbang:/d/glusterfs/export/export1
[2010-11-11 10:08:01.959733] D [glusterd-handler.c:1268:glusterd_handle_replace_brick] : dst brick=bigbang:/e/glusterfs/export/export1
[2010-11-11 10:08:01.959819] I [glusterd-utils.c:232:glusterd_lock] glusterd: Cluster lock held by 92782297-9b10-4a6d-8aca-a866b764bdf1
[2010-11-11 10:08:01.959858] I [glusterd-handler.c:2788:glusterd_op_txn_begin] glusterd: Acquired local lock
[2010-11-11 10:08:01.959896] D [glusterd-op-sm.c:5194:glusterd_op_sm_inject_event] glusterd: Enqueuing event: 'GD_OP_EVENT_START_LOCK'
[2010-11-11 10:08:01.959933] D [glusterd-handler.c:2792:glusterd_op_txn_begin] glusterd: Returning 0
[2010-11-11 10:08:01.960000] D [glusterd-op-sm.c:5242:glusterd_op_sm] : Dequeued event of type: 'GD_OP_EVENT_START_LOCK'
[2010-11-11 10:08:01.960038] I [glusterd3_1-mops.c:1105:glusterd3_1_cluster_lock] glusterd: Sent lock req to 0 peers
[2010-11-11 10:08:01.960073] D [glusterd3_1-mops.c:1108:glusterd3_1_cluster_lock] glusterd: Returning 0
[2010-11-11 10:08:01.960109] D [glusterd-op-sm.c:5194:glusterd_op_sm_inject_event] glusterd: Enqueuing event: 'GD_OP_EVENT_ALL_ACC'
[2010-11-11 10:08:01.960144] D [glusterd-op-sm.c:221:glusterd_op_sm_inject_all_acc] : Returning 0
[2010-11-11 10:08:01.960179] D [glusterd-op-sm.c:4023:glusterd_op_ac_send_lock] : Returning with 0
[2010-11-11 10:08:01.960216] D [glusterd-utils.c:2605:glusterd_sm_tr_log_transition_add] glusterd: Transitioning from 'Default' to 'Lock sent' due to event 'GD_OP_EVENT_START_LOCK'
[2010-11-11 10:08:01.960254] D [glusterd-utils.c:2607:glusterd_sm_tr_log_transition_add] : returning 0
[2010-11-11 10:08:01.960290] D [glusterd-op-sm.c:5242:glusterd_op_sm] : Dequeued event of type: 'GD_OP_EVENT_ALL_ACC'
[2010-11-11 10:08:01.960347] D [glusterd-op-sm.c:914:glusterd_op_stage_replace_brick] : src brick=bigbang:/d/glusterfs/export/export1
[2010-11-11 10:08:01.960389] D [glusterd-op-sm.c:924:glusterd_op_stage_replace_brick] : dst brick=bigbang:/e/glusterfs/export/export1
[2010-11-11 10:08:01.960483] D [glusterd-utils.c:795:glusterd_volinfo_find] : Volume vol found
[2010-11-11 10:08:01.960522] D [glusterd-utils.c:803:glusterd_volinfo_find] : Returning 0
[2010-11-11 10:08:01.960557] D [glusterd-utils.c:2358:glusterd_is_rb_started] : is_rb_started:status=0
[2010-11-11 10:08:01.960583] I [glusterd-utils.c:726:glusterd_volume_brickinfo_get_by_brick] : brick: bigbang:/d/glusterfs/export/export1
[2010-11-11 10:08:01.960771] D [glusterd-utils.c:199:glusterd_is_local_addr] glusterd: bigbang is local
[2010-11-11 10:08:01.960789] D [glusterd-utils.c:2188:glusterd_hostname_to_uuid] : returning 0
[2010-11-11 10:08:01.960803] I [glusterd-utils.c:697:glusterd_volume_brickinfo_get] : Found brick
[2010-11-11 10:08:01.960816] D [glusterd-utils.c:708:glusterd_volume_brickinfo_get] : Returning 0
[2010-11-11 10:08:01.960829] D [glusterd-utils.c:755:glusterd_volume_brickinfo_get_by_brick] : Returning 0
[2010-11-11 10:08:01.960880] D [glusterd-utils.c:199:glusterd_is_local_addr] glusterd: bigbang is local
[2010-11-11 10:08:01.960896] D [glusterd-op-sm.c:1026:glusterd_op_stage_replace_brick] : I AM THE SOURCE HOST
[2010-11-11 10:08:01.960937] E [dict.c:308:dict_set] dict: @this=(nil) @value=0x21ee110, key=src-brick-port
[2010-11-11 10:08:01.960953] D [glusterd-op-sm.c:1033:glusterd_op_stage_replace_brick] : Could not set src-brick-port=24010
[2010-11-11 10:08:01.961093] D [glusterd-utils.c:199:glusterd_is_local_addr] glusterd: bigbang is local
[2010-11-11 10:08:01.961110] D [glusterd-utils.c:2188:glusterd_hostname_to_uuid] : returning 0
[2010-11-11 10:08:01.961124] D [glusterd-utils.c:708:glusterd_volume_brickinfo_get] : Returning -1
[2010-11-11 10:08:01.961147] D [glusterd-utils.c:610:glusterd_brickinfo_new] : Returning 0
[2010-11-11 10:08:01.961163] D [glusterd-utils.c:667:glusterd_brickinfo_from_brick] : Returning 0
[2010-11-11 10:08:01.961217] D [glusterd-utils.c:199:glusterd_is_local_addr] glusterd: bigbang is local
[2010-11-11 10:08:01.961262] D [glusterd-utils.c:2436:glusterd_brick_create_path] : returning 0
[2010-11-11 10:08:01.961278] D [glusterd-op-sm.c:1112:glusterd_op_stage_replace_brick] : Returning 0
[2010-11-11 10:08:01.961291] D [glusterd-op-sm.c:4941:glusterd_op_stage_validate] : Returning 0


I have observed this too. I think it happens because glusterd_op_stage_replace_brick sets the port number in the dict (rsp_dict), and that dict has somehow become NULL:

[2010-11-11 10:08:01.960896] D [glusterd-op-sm.c:1026:glusterd_op_stage_replace_brick] : I AM THE SOURCE HOST
[2010-11-11 10:08:01.960937] E [dict.c:308:dict_set] dict: @this=(nil) @value=0x21ee110, key=src-brick-port
[2010-11-11 10:08:01.960953] D [glusterd-op-sm.c:1033:glusterd_op_stage_replace_brick] : Could not set src-brick-port=24010
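
With debug logging enabled, the single E-level entry is easy to miss among the D and I lines. Here is a minimal sketch in Python that picks it out; the regular expression is an assumption derived only from the log lines quoted in this report, and the helper names are hypothetical.

```python
import re

# Line layout assumed from the samples in this report:
# [timestamp] LEVEL [file:line:function] message
_LOG_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] (?P<msg>.*)$"
)


def parse_log_line(line):
    """Return a dict of the log fields, or None if the line does not match."""
    m = _LOG_RE.match(line.strip())
    return m.groupdict() if m else None


def error_entries(lines):
    """Return only the entries logged at level E."""
    entries = []
    for line in lines:
        entry = parse_log_line(line)
        if entry is not None and entry["level"] == "E":
            entries.append(entry)
    return entries


# Three lines copied from the log above.
SAMPLE = [
    "[2010-11-11 10:08:01.960896] D [glusterd-op-sm.c:1026:glusterd_op_stage_replace_brick] : I AM THE SOURCE HOST",
    "[2010-11-11 10:08:01.960937] E [dict.c:308:dict_set] dict: @this=(nil) @value=0x21ee110, key=src-brick-port",
    "[2010-11-11 10:08:01.960953] D [glusterd-op-sm.c:1033:glusterd_op_stage_replace_brick] : Could not set src-brick-port=24010",
]
```

Calling error_entries(SAMPLE) leaves only the dict_set entry from dict.c, the one that reports @this=(nil).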
Comment 2 Raghavendra Bhat 2010-11-10 22:40:40 EST
[2010-11-11 12:08:03.848210] D [glusterd-utils.c:2607:glusterd_sm_tr_log_transition_add] : returning 0
[2010-11-11 12:08:03.848224] D [glusterd-op-sm.c:5244:glusterd_op_sm] : Dequeued event of type: 'GD_OP_EVENT_ALL_ACC'
[2010-11-11 12:08:03.848241] M [glusterd3_1-mops.c:1220:glusterd3_1_stage_op] : IT IS COMING HERE, and calling glusterd_op_stage_validate
[2010-11-11 12:08:03.848261] D [glusterd-op-sm.c:914:glusterd_op_stage_replace_brick] : src brick=bigbang:/d/glusterfs/export/export1
[2010-11-11 12:08:03.848277] D [glusterd-op-sm.c:924:glusterd_op_stage_replace_brick] : dst brick=bigbang:/e/glusterfs/export/export1
[2010-11-11 12:08:03.848291] D [glusterd-utils.c:795:glusterd_volinfo_find] : Volume vol found
[2010-11-11 12:08:03.848306] D [glusterd-utils.c:803:glusterd_volinfo_find] : Returning 0
[2010-11-11 12:08:03.848320] I [glusterd-utils.c:726:glusterd_volume_brickinfo_get_by_brick] : brick: bigbang:/d/glusterfs/export/export1


The log above shows that glusterd_op_stage_validate is called from glusterd3_1_stage_op, which passes NULL in place of rsp_dict.
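
To illustrate the failure mode, here is a small Python model of a setter that rejects a missing dict and a staging step that checks for the dict before setting the port. It is only a sketch: it is not glusterd's C code and not the actual patch. The function names and return conventions are hypothetical; only the key name and port value come from the log above.

```python
def dict_set(this, key, value):
    """Model of a setter that refuses a missing dict or value.

    Returns -1 on failure and 0 on success (return convention assumed).
    """
    if this is None or value is None:
        return -1
    this[key] = value
    return 0


def stage_replace_brick(rsp_dict, src_port):
    """Hypothetical staging step that records the source brick port.

    When the caller supplies no response dict, the port is simply not
    recorded, instead of failing inside dict_set.
    """
    if rsp_dict is None:
        return 0
    return dict_set(rsp_dict, "src-brick-port", src_port)
```

In this model, calling dict_set with None reproduces the failure, while the guarded staging step succeeds whether or not a response dict is passed.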
Comment 3 Anand Avati 2010-11-14 10:26:47 EST
PATCH: http://patches.gluster.com/patch/5690 in master (cluster/pump: Reset saved path upon pump completion)
Comment 4 Anand Avati 2010-11-14 10:26:51 EST
PATCH: http://patches.gluster.com/patch/5691 in master (mgmt/glusterd: fixes for uninterrupted replace-brick with nfs)
Comment 5 Anand Avati 2010-11-14 10:26:59 EST
PATCH: http://patches.gluster.com/patch/5688 in master (check for dict also while setting the port for source brick while doing replace brick)
Comment 6 Anand Avati 2010-11-18 05:55:58 EST
PATCH: http://patches.gluster.com/patch/5744 in master (mgmt/glusterd: Avoid creating multiple destination brickinfo during replace-brick)
Comment 7 Anand Avati 2010-11-25 00:34:02 EST
PATCH: http://patches.gluster.com/patch/5778 in master (mgmt/glusterd: Temporary fix for a crash seen in replace-brick)
Comment 8 Raghavendra Bhat 2011-02-20 22:24:38 EST
Checked with the git head (26cedae57d5b7cb8d50ed077ce29c92e30d6e260). Migration where the source and the destination are on the same machine worked fine.