Bug 1303325 - Glusterd not releasing cli lock for RHGS 3.0.4 and op-version 2
Status: CLOSED NOTABUG
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: core
Version: 3.0
Hardware: x86_64 Linux
Priority: unspecified
Severity: urgent
Assigned To: Gaurav Kumar Garg
QA Contact: Anoop
Keywords: ZStream
Depends On:
Blocks:
Reported: 2016-01-30 14:54 EST by Chris Blum
Modified: 2016-09-17 10:36 EDT
CC: 5 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-02-01 04:21:05 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Chris Blum 2016-01-30 14:54:38 EST
Description of problem:
When RHGS 3.0.4 runs with op-version 2, it is easy to get into a situation where the CLI lock is not released.
No further CLI commands can then be executed anywhere on the storage pool until the glusterd process that allegedly holds the lock is restarted.

Version-Release number of selected component (if applicable):
RHGS 3.0.4
gluster --version reports 3.6.0.53 (includes customer-specific hotfixes)

How reproducible:
always

Steps to Reproduce:
1) Execute 'gluster vol status' on all nodes in parallel
  --> All commands will fail
2) The lock is then held by one glusterd process, but even that glusterd process can no longer execute any CLI commands
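The contention in step 1 can be mimicked locally with flock(1): several contenders race for one non-blocking lock and all but one lose, which is essentially what the parallel 'gluster vol status' runs do to the single cluster-wide lock. A local sketch only, with made-up paths — this is not the actual gluster lock:

```shell
#!/bin/sh
# Local sketch of the step-1 contention (NOT the gluster lock itself):
# four contenders race for one non-blocking lock, as four parallel
# 'gluster vol status' runs race for the single cluster-wide lock.
LOCK=/tmp/cli-demo.lock        # made-up stand-in for glusterd's lock
OUT=/tmp/cli-demo.out
rm -f "$OUT"
for i in 1 2 3 4; do
    (
        if flock -n 9; then
            sleep 1                                # hold it, like a running transaction
            echo "node$i: ok" >> "$OUT"
        else
            echo "node$i: cluster lock is held" >> "$OUT"
        fi
    ) 9>"$LOCK" &
done
wait
cat "$OUT"
```

Exactly one contender wins; the other three fail immediately, matching the "All commands will fail" (modulo one) pattern seen from the CLI.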

Actual results:
The lock is allegedly held by one glusterd process, but that process itself does not know that it holds it

Expected results:
The lock should be released once the command completes or fails

Additional info:

glusterd log file with debug logging enabled:
[2016-01-30 18:54:34.475812] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: b3e6c376-c288-4c5e-88e1-7698a2d87e7b
[2016-01-30 18:54:34.478629] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2016-01-30 18:54:34.482365] I [MSGID: 106490] [glusterd-handler.c:2530:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: e0240641-8246-41a8-818d-96dcd0b33684
[2016-01-30 18:54:34.482699] I [MSGID: 106493] [glusterd-handler.c:3771:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to rhgs2 (0), ret: 0
[2016-01-30 18:54:34.489565] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: e0240641-8246-41a8-818d-96dcd0b33684
[2016-01-30 18:54:34.489634] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2016-01-30 18:54:34.496387] I [MSGID: 106493] [glusterd-rpc-ops.c:692:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: f00f9da2-14ce-43da-b55f-16c0d1efc594
[2016-01-30 18:54:34.496541] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: e0240641-8246-41a8-818d-96dcd0b33684
[2016-01-30 18:54:34.496583] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2016-01-30 18:54:34.503255] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: e0240641-8246-41a8-818d-96dcd0b33684
[2016-01-30 18:54:34.503315] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2016-01-30 18:54:34.509304] I [MSGID: 106493] [glusterd-rpc-ops.c:692:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: b3e6c376-c288-4c5e-88e1-7698a2d87e7b
[2016-01-30 18:54:34.526280] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: e0240641-8246-41a8-818d-96dcd0b33684
[2016-01-30 18:54:34.526507] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2016-01-30 18:54:34.534227] I [MSGID: 106493] [glusterd-rpc-ops.c:478:__glusterd_friend_add_cbk] 0-glusterd: Received ACC from uuid: e0240641-8246-41a8-818d-96dcd0b33684, host: rhgs2, port: 0
[2016-01-30 18:54:34.537575] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: e0240641-8246-41a8-818d-96dcd0b33684
[2016-01-30 18:54:34.537636] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2016-01-30 18:54:34.544401] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: f00f9da2-14ce-43da-b55f-16c0d1efc594
[2016-01-30 18:54:34.547102] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2016-01-30 18:54:34.550798] I [MSGID: 106493] [glusterd-rpc-ops.c:692:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: f00f9da2-14ce-43da-b55f-16c0d1efc594
[2016-01-30 18:54:34.551713] I [MSGID: 106493] [glusterd-rpc-ops.c:692:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: e0240641-8246-41a8-818d-96dcd0b33684
[2016-01-30 18:54:34.570773] I [MSGID: 106493] [glusterd-rpc-ops.c:692:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: b3e6c376-c288-4c5e-88e1-7698a2d87e7b
[2016-01-30 18:54:34.584747] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: e0240641-8246-41a8-818d-96dcd0b33684
[2016-01-30 18:54:34.584858] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2016-01-30 18:54:34.601728] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: b3e6c376-c288-4c5e-88e1-7698a2d87e7b
[2016-01-30 18:54:34.604335] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2016-01-30 18:56:38.066038] E [MSGID: 106155] [glusterd-utils.c:165:glusterd_lock] 0-management: Unable to get lock for uuid: 04dce505-1c50-41e9-8a64-e4eff53796ee, lock held by: 04dce505-1c50-41e9-8a64-e4eff53796ee
[2016-01-30 18:56:38.066062] E [MSGID: 106155] [glusterd-syncop.c:1783:gd_sync_task_begin] 0-management: Unable to acquire lock
[2016-01-30 18:58:02.134248] E [MSGID: 106155] [glusterd-utils.c:165:glusterd_lock] 0-management: Unable to get lock for uuid: 04dce505-1c50-41e9-8a64-e4eff53796ee, lock held by: 04dce505-1c50-41e9-8a64-e4eff53796ee
[2016-01-30 18:58:02.134250] E [MSGID: 106155] [glusterd-syncop.c:1783:gd_sync_task_begin] 0-management: Unable to acquire lock
[2016-01-30 19:04:45.571516] E [rpc-clnt.c:201:call_bail] 0-management: bailing out frame type(glusterd mgmt) op(--(4)) xid = 0xb sent = 2016-01-30 18:54:37.976364. timeout = 600 for 192.168.37.200:24007
[2016-01-30 19:04:45.571627] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Commit failed on RHGS1. Please check log file for details.
[2016-01-30 19:14:46.634224] E [rpc-clnt.c:201:call_bail] 0-management: bailing out frame type(glusterd mgmt) op(--(2)) xid = 0xc sent = 2016-01-30 19:04:45.571845. timeout = 600 for 192.168.37.102:24007
[2016-01-30 19:14:46.634304] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Unlocking failed on rhgs4. Please check log file for details.
[2016-01-30 19:14:46.634360] E [rpc-clnt.c:201:call_bail] 0-management: bailing out frame type(glusterd mgmt) op(--(2)) xid = 0xc sent = 2016-01-30 19:04:45.571878. timeout = 600 for 192.168.37.200:24007
[2016-01-30 19:14:46.634387] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Unlocking failed on RHGS1. Please check log file for details.
[2016-01-30 19:14:46.634423] E [MSGID: 106152] [glusterd-syncop.c:1562:gd_unlock_op_phase] 0-management: Failed to unlock on some peer(s)
[2016-01-30 19:14:46.634547] E [glusterd-op-sm.c:349:glusterd_clear_txn_opinfo] (-->/usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(glusterd_op_begin_synctask+0x3b) [0x7fe5e696da5b] -->/usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(gd_sync_task_begin+0x18a) [0x7fe5e696d20a] -->/usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(glusterd_clear_txn_opinfo+0xd7) [0x7fe5e68dfab7] ) 0-management: Unable to get transaction opinfo for transaction ID : 00000000-0000-0000-0000-000000000000
[2016-01-30 19:14:46.634566] E [MSGID: 106154] [glusterd-syncop.c:1908:gd_sync_task_begin] 0-management: Unable to clear transaction's opinfo for transaction ID : 00000000-0000-0000-0000-000000000000
[2016-01-30 19:14:46.634605] I [socket.c:3436:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2016-01-30 19:14:46.634615] E [rpcsvc.c:1312:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1, Program: GlusterD svc cli, ProgVers: 2, Proc: 27) to rpc-transport (socket.management)
[2016-01-30 19:14:46.634630] E [MSGID: 106430] [glusterd-utils.c:400:glusterd_submit_reply] 0-glusterd: Reply submission failed
[2016-01-30 19:15:17.385249] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Locking failed on rhgs4. Please check log file for details.
[2016-01-30 19:15:17.385607] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Locking failed on RHGS1. Please check log file for details.
[2016-01-30 19:15:17.385984] E [MSGID: 106151] [glusterd-syncop.c:1858:gd_sync_task_begin] 0-management: Locking Peers Failed.
[2016-01-30 19:15:27.267040] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Locking failed on rhgs4. Please check log file for details.
[2016-01-30 19:15:27.267212] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Locking failed on RHGS1. Please check log file for details.
[2016-01-30 19:15:27.267840] E [MSGID: 106151] [glusterd-syncop.c:1858:gd_sync_task_begin] 0-management: Locking Peers Failed.
[2016-01-30 19:15:43.484237] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Locking failed on rhgs4. Please check log file for details.
[2016-01-30 19:15:43.484334] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Locking failed on RHGS1. Please check log file for details.
[2016-01-30 19:15:43.484398] E [MSGID: 106151] [glusterd-syncop.c:1858:gd_sync_task_begin] 0-management: Locking Peers Failed.
Comment 1 Gaurav Kumar Garg 2016-01-31 01:50:41 EST
In the RHS 2.x releases we had a cluster-wide lock, meaning only one transaction could run at a time. In the RHGS 3.x releases we introduced volume-wide locks; see bug https://bugzilla.redhat.com/show_bug.cgi?id=1011470, commit 97ce783de326b51fcba65737f07db2c314d1e218 in the RHGS 3.x downstream, and the upstream patch http://review.gluster.org/#/c/5994/.
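The error signature in the log above ("Unable to get lock for uuid: X, lock held by: X") is what a non-reentrant, owner-tracked try-lock produces: once an owner is recorded, every further attempt fails, including one by the recorded owner itself. A rough shell illustration (glusterd implements this in C, in memory; the file-based lock, its non-atomic check, and the shortened uuid are stand-ins):

```shell
#!/bin/sh
# Non-reentrant owner-tracked try-lock, mirroring the failure mode in the
# log. File-based and non-atomic: a sketch only, not the real mechanism.
LOCKFILE=/tmp/mgmt-demo.lock   # stand-in for glusterd's in-memory lock

try_lock() {                   # $1 = caller's uuid
    if [ -e "$LOCKFILE" ]; then
        echo "Unable to get lock for uuid: $1, lock held by: $(cat "$LOCKFILE")"
        return 1
    fi
    echo "$1" > "$LOCKFILE"
    echo "uuid $1 acquired the lock"
}

rm -f "$LOCKFILE"
try_lock 04dce505                               # first attempt succeeds
try_lock 04dce505 | tee /tmp/mgmt-demo.retry    # same owner retries: fails anyway
```

The second call fails even though the caller is the current owner — the same shape as the glusterd_lock errors at 18:56:38 and 18:58:02 in the log.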

Will update further.
Comment 3 Chris Blum 2016-01-31 18:30:49 EST
We can avoid further lock-ups for now by avoiding any parallel CLI command executions. This has worked for ~22h.
Comment 4 Gaurav Kumar Garg 2016-02-01 04:21:05 EST
This is expected behavior, not a bug, so I am closing this report. Feel free to reopen it if anybody still hits the issue.
