Bug 1244527
| Summary: | DHT-rebalance: Rebalance hangs on distribute volume when glusterd is stopped on peer node | | |
| --- | --- | --- | --- |
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Triveni Rao <trao> |
| Component: | distribute | Assignee: | Anand Nekkunti <anekkunt> |
| Status: | CLOSED ERRATA | QA Contact: | Byreddy <bsrirama> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | rhgs-3.1 | CC: | amukherj, anekkunt, asriram, asrivast, divya, nsathyan, rcyriac, sashinde, sasundar |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | RHGS 3.1.1 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | glusterd | | |
| Fixed In Version: | glusterfs-3.7.1-12 | Doc Type: | Bug Fix |
| Doc Text: | Previously, the "gluster vol rebalance <vol_name> start" command might hang if any node in the cluster went down at the same time. With this fix, this issue is resolved. | | |
| Story Points: | --- | | |
| Clone Of: | | | |
| : | 1245142 (view as bug list) | Environment: | |
| Last Closed: | 2015-10-05 07:12:15 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1216951, 1245142, 1249925, 1251815 | | |
Description
Triveni Rao
2015-07-19 17:25:47 UTC
```
[root@casino-vm1 ~]# rpm -qa | grep gluster
gluster-nagios-addons-0.2.3-1.el6rhs.x86_64
glusterfs-api-3.7.1-10.el6rhs.x86_64
glusterfs-geo-replication-3.7.1-10.el6rhs.x86_64
gluster-nagios-common-0.2.0-1.el6rhs.noarch
glusterfs-libs-3.7.1-10.el6rhs.x86_64
glusterfs-client-xlators-3.7.1-10.el6rhs.x86_64
glusterfs-fuse-3.7.1-10.el6rhs.x86_64
glusterfs-server-3.7.1-10.el6rhs.x86_64
glusterfs-rdma-3.7.1-10.el6rhs.x86_64
vdsm-gluster-4.16.20-1.1.el6rhs.noarch
glusterfs-3.7.1-10.el6rhs.x86_64
glusterfs-cli-3.7.1-10.el6rhs.x86_64
```

sosreport uploaded:
http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1244527/sosreport-casino-vm1.lab.eng.blr.redhat.com.003-20150719063653.tar

glusterd log excerpt (the first line is truncated in the original report):

```
…replication options
[2015-07-19 10:35:55.178461] E [MSGID: 106119] [glusterd-syncop.c:1819:gd_sync_task_begin] 0-management: Unable to acquire lock for great
The message "I [MSGID: 106488] [glusterd-handler.c:1463:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req" repeated 2 times between [2015-07-19 10:36:55.325248] and [2015-07-19 10:36:55.328233]
[2015-07-19 10:37:45.643073] I [MSGID: 106487] [glusterd-handler.c:1402:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2015-07-19 10:37:45.766638] I [MSGID: 106062] [glusterd-geo-rep.c:309:__glusterd_handle_gsync_set] 0-management: slave not found, while handling geo-replication options
[2015-07-19 10:37:45.767872] W [glusterd-locks.c:575:glusterd_mgmt_v3_lock] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7f6736153580] (--> /usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_lock+0x1d4)[0x7f672abf5d04] (--> /usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(gd_sync_task_begin+0x915)[0x7f672abf1ec5] (--> /usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(glusterd_op_begin_synctask+0x3b)[0x7f672abf1f8b] (--> /usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(__glusterd_handle_gsync_set+0x16f)[0x7f672abd083f] ))))) 0-management: Lock for great held by abbb8155-7e01-4fea-983e-2e2c929ebd7c
[2015-07-19 10:37:45.767895] E [MSGID: 106119] [glusterd-syncop.c:1819:gd_sync_task_begin] 0-management: Unable to acquire lock for great
[2015-07-19 10:37:46.020277] I [MSGID: 106499] [glusterd-handler.c:4258:__glusterd_handle_status_volume] 0-management: Received status volume req for volume great
[2015-07-19 10:37:46.020553] W [glusterd-locks.c:575:glusterd_mgmt_v3_lock] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7f6736153580] (--> /usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_lock+0x1d4)[0x7f672abf5d04] (--> /usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(gd_sync_task_begin+0x915)[0x7f672abf1ec5] (--> /usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(glusterd_op_begin_synctask+0x3b)[0x7f672abf1f8b] (--> /usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(__glusterd_handle_status_volume+0x1b2)[0x7f672ab3dd42] ))))) 0-management: Lock for great held by abbb8155-7e01-4fea-983e-2e2c929ebd7c
[2015-07-19 10:37:46.022241] I [MSGID: 106499] [glusterd-handler.c:4258:__glusterd_handle_status_volume] 0-management: Received status volume req for volume test
[2015-07-19 10:37:45.881620] I [MSGID: 106062] [glusterd-geo-rep.c:309:__glusterd_handle_gsync_set] 0-management: slave not found, while handling geo-replication options
[2015-07-19 10:37:46.020575] E [MSGID: 106119] [glusterd-syncop.c:1819:gd_sync_task_begin] 0-management: Unable to acquire lock for great
[2015-07-19 10:48:38.245354] W [glusterd-locks.c:575:glusterd_mgmt_v3_lock] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7f6736153580] (--> /usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_lock+0x1d4)[0x7f672abf5d04] (--> /usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(glusterd_op_txn_begin+0x51a)[0x7f672ab5792a] (--> /usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(__glusterd_handle_defrag_volume+0x27e)[0x7f672abbb47e] (--> /usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(glusterd_big_locked_handler+0x3f)[0x7f672ab3e81f] ))))) 0-management: Lock for great held by abbb8155-7e01-4fea-983e-2e2c929ebd7c
[2015-07-19 10:48:38.245561] E [MSGID: 106119] [glusterd-handler.c:719:glusterd_op_txn_begin] 0-management: Unable to acquire lock for great
```

RCA: When one of the glusterd instances goes down during rebalance start, the callback function (_glusterd_commit_op_cbk) is invoked with rpc_status set to -1. On RPC success the txn_id is taken from the response, but on RPC failure the code falls back to global_txn_id, which is always zero; this drives the op state machine (op_sm) into an inconsistent state.

upstream patch: http://review.gluster.org/#/c/11728/
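For context, the scenario described in the RCA can be reproduced with ordinary gluster CLI commands. Below is a minimal sketch; the volume name, hostnames, and brick paths are illustrative assumptions, not values taken from this report. The same flow appears in the verification steps further down.

```sh
# On node1: create and start a plain distribute volume across two peers.
gluster volume create distvol node1:/bricks/b1 node2:/bricks/b2
gluster volume start distvol

# Mount the volume via FUSE and create some files so rebalance has work to do.
mkdir -p /mnt/distvol
mount -t glusterfs node1:/distvol /mnt/distvol
for i in $(seq 1 100); do echo data > /mnt/distvol/file$i; done

# Kick off rebalance on node1 ...
gluster volume rebalance distvol start

# ... and, on node2, stop glusterd while the rebalance transaction is in flight.
service glusterd stop

# On the unfixed build, the stale transaction keeps holding the volume lock,
# so subsequent volume operations fail with "Unable to acquire lock".
gluster volume rebalance distvol status
```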
The downstream patch https://code.engineering.redhat.com/gerrit/#/c/55080/ has been merged now, moving the state to 'Modified'.

Verified this bug with the version "glusterfs-3.7.1-12".

Issue verified steps:
=====================
1. Created a distributed volume using a two-node cluster.
2. Mounted the volume as FUSE.
3. Created some files at the mount point.
4. Started rebalance on one node and killed glusterd on the other node.
5. Rebalance started successfully, and after that other volume-related commands could be issued.

The fix is working well; moving this bug to the verified state.

Please review and sign off on the edited doc text.

Divya, glusterd was not always hanging; we might end up with glusterd hung. I would like to change the text to: Previously, the "gluster vol rebalance <vol_name> start" command might hang if any node in the cluster went down at the same time. With this fix, this issue is resolved.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1845.html