Bug 981661 - quota + core: Another transaction is in progress. Please try again after sometime.
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterd
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Assigned To: Krutika Dhananjay
QA Contact: SATHEESARAN
Reported: 2013-07-05 08:30 EDT by Saurabh
Modified: 2016-01-19 01:12 EST
CC List: 9 users

Fixed In Version: glusterfs-3.4.0.12rhs.beta6-1
Doc Type: Bug Fix
Last Closed: 2013-09-23 18:24:55 EDT
Type: Bug


Description Saurabh 2013-07-05 08:30:38 EDT
Description of problem:

After hitting BZ 981653, I am finding that "gluster volume status"
also fails on the other nodes of the cluster.


Version-Release number of selected component (if applicable):
[root@quota2 ~]# rpm -qa | grep glusterfs
glusterfs-3.4.0.12rhs.beta2-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.12rhs.beta2-1.el6rhs.x86_64
glusterfs-server-3.4.0.12rhs.beta2-1.el6rhs.x86_64

How reproducible:
found after BZ 981653

Steps to Reproduce:
After BZ 981653, execute "gluster volume status" on any node of the cluster.

Actual results:
[root@quota2 ~]# gluster volume status
Another transaction is in progress. Please try again after sometime.
 
glusterd logs say,
[2013-07-05 05:13:59.931692] E [glusterd-utils.c:333:glusterd_lock] 0-management: Unable to get lock for uuid: cc7bc8ba-fa3a-43d9-a899-114e34d27eb4, lock held by: 236e161a-fc82-4964-8e6d-bb0d9160990d
[2013-07-05 05:14:02.238117] E [socket.c:2158:socket_connect_finish] 0-management: connection to 10.70.37.98:24007 failed (Connection refused)
[2013-07-05 05:22:13.211163] E [glusterd-utils.c:333:glusterd_lock] 0-management: Unable to get lock for uuid: cc7bc8ba-fa3a-43d9-a899-114e34d27eb4, lock held by: 236e161a-fc82-4964-8e6d-bb0d9160990d
[2013-07-05 05:22:13.211243] E [glusterd-syncop.c:1128:gd_sync_task_begin] 0-management: Unable to acquire lock
[2013-07-05 05:22:13.211373] E [glusterd-utils.c:375:glusterd_unlock] 0-management: Cluster lock held by 236e161a-fc82-4964-8e6d-bb0d9160990d ,unlock req from cc7bc8ba-fa3a-43d9-a899-114e34d27eb4!
[2013-07-05 05:22:13.211404] E [glusterd-utils.c:333:glusterd_lock] 0-management: Unable to get lock for uuid: cc7bc8ba-fa3a-43d9-a899-114e34d27eb4, lock held by: 236e161a-fc82-4964-8e6d-bb0d9160990d
[2013-07-05 05:31:08.545951] E [glusterd-utils.c:333:glusterd_lock] 0-management: Unable to get lock for uuid: cc7bc8ba-fa3a-43d9-a899-114e34d27eb4, lock held by: 236e161a-fc82-4964-8e6d-bb0d9160990d
[2013-07-05 05:31:08.546016] E [glusterd-syncop.c:1128:gd_sync_task_begin] 0-management: Unable to acquire lock
[2013-07-05 05:31:08.546121] E [glusterd-utils.c:375:glusterd_unlock] 0-management: Cluster lock held by 236e161a-fc82-4964-8e6d-bb0d9160990d ,unlock req from cc7bc8ba-fa3a-43d9-a899-114e34d27eb4!
[2013-07-05 05:31:08.546142] E [glusterd-utils.c:333:glusterd_lock] 0-management: Unable to get lock for uuid: cc7bc8ba-fa3a-43d9-a899-114e34d27eb4, lock held by: 236e161a-fc82-4964-8e6d-bb0d9160990d
[2013-07-05 05:31:13.491554] I [glusterd-handler.c:966:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2013-07-05 05:38:01.968142] I [glusterd-handler.c:966:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2013-07-05 05:38:02.187306] E [glusterd-utils.c:333:glusterd_lock] 0-management: Unable to get lock for uuid: cc7bc8ba-fa3a-43d9-a899-114e34d27eb4, lock held by: 236e161a-fc82-4964-8e6d-bb0d9160990d
[2013-07-05 05:38:02.187355] E [glusterd-syncop.c:1128:gd_sync_task_begin] 0-management: Unable to acquire lock
[2013-07-05 05:38:02.187453] E [glusterd-utils.c:375:glusterd_unlock] 0-management: Cluster lock held by 236e161a-fc82-4964-8e6d-bb0d9160990d ,unlock req from cc7bc8ba-fa3a-43d9-a899-114e34d27eb4!
[2013-07-05 05:38:02.187490] E [glusterd-utils.c:333:glusterd_lock] 0-management: Unable to get lock for uuid: cc7bc8ba-fa3a-43d9-a899-114e34d27eb4, lock held by: 236e161a-fc82-4964-8e6d-bb0d9160990d
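
The repeated "Unable to get lock" errors come from glusterd's per-node cluster lock: each node records the UUID of the transaction originator that currently holds its lock and rejects requests from any other UUID until that lock is released. A minimal toy model of that check follows; all names are illustrative and this is not the actual glusterd source.

/* Toy model of the per-node cluster lock check behind "Another transaction
 * is in progress": each glusterd remembers the UUID of the originator that
 * holds its lock and rejects any other originator until it is released. */
#include <stdio.h>
#include <string.h>

static char lock_owner[64] = "";    /* empty string means the lock is free */

/* Try to take this node's cluster lock on behalf of originator 'uuid'. */
static int try_lock(const char *uuid)
{
    if (lock_owner[0] != '\0' && strcmp(lock_owner, uuid) != 0) {
        fprintf(stderr, "Unable to get lock for uuid: %s, lock held by: %s\n",
                uuid, lock_owner);
        return -1;  /* the CLI then reports "Another transaction is in progress." */
    }
    snprintf(lock_owner, sizeof(lock_owner), "%s", uuid);
    return 0;
}

int main(void)
{
    /* The originator takes the lock, then crashes without unlocking ... */
    try_lock("236e161a-fc82-4964-8e6d-bb0d9160990d");
    /* ... so every later transaction from any other node keeps failing. */
    return try_lock("cc7bc8ba-fa3a-43d9-a899-114e34d27eb4") == 0 ? 0 : 1;
}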


Expected results:
If a node crashes for some reason, some other node should still be able to provide this information without fail. Otherwise the whole cluster becomes unusable until some manual "workaround" is applied.

Additional info:
Comment 4 Krutika Dhananjay 2013-07-17 04:55:58 EDT
https://code.engineering.redhat.com/gerrit/#/c/10364/ <-- Posted for review.

PROBLEM:

When the originator of a volume transaction goes down while it is still
holding the lock, volume ops issued from the other nodes also fail
with the message that the lock is still held by the node that went down.

FIX:

Upon receiving a DISCONNECT from the originator of a transaction, the rest
of the nodes perform the following actions:

a. Release the lock; and
b. reset the state of the node to GD_OP_STATE_DEFAULT.
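
A minimal sketch of that handler logic, continuing the toy model from the description above: only GD_OP_STATE_DEFAULT is taken from this bug, every other name is illustrative, and this is not the actual glusterd change.

/* Sketch of the fix: on DISCONNECT from the originator, each surviving node
 * releases the stale lock and resets its op state to GD_OP_STATE_DEFAULT. */
#include <stdio.h>
#include <string.h>

typedef enum { GD_OP_STATE_DEFAULT, GD_OP_STATE_LOCKED } op_state_t;

static char       lock_owner[64] = "236e161a-fc82-4964-8e6d-bb0d9160990d";
static op_state_t op_state       = GD_OP_STATE_LOCKED;

/* Invoked on each surviving node when a peer connection goes away. */
static void handle_peer_disconnect(const char *peer_uuid)
{
    if (lock_owner[0] != '\0' && strcmp(lock_owner, peer_uuid) == 0) {
        lock_owner[0] = '\0';               /* a. release the stale lock    */
        op_state = GD_OP_STATE_DEFAULT;     /* b. reset the node's op state */
    }
}

int main(void)
{
    /* The originator that held the lock goes down ... */
    handle_peer_disconnect("236e161a-fc82-4964-8e6d-bb0d9160990d");
    /* ... and this node is ready to accept new transactions again. */
    printf("lock free: %s, state default: %s\n",
           lock_owner[0] == '\0' ? "yes" : "no",
           op_state == GD_OP_STATE_DEFAULT ? "yes" : "no");
    return 0;
}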


Note:
This bug is not confined to the 'volume quota' command. This state may be reached for any volume command when the originator goes down while in possession of the lock.
Comment 5 Krutika Dhananjay 2013-07-19 01:15:47 EDT
The change has been merged downstream. Hence, moving the bug to MODIFIED.
Comment 7 Krutika Dhananjay 2013-08-02 05:28:32 EDT
https://code.engineering.redhat.com/gerrit/#/c/10364/ <-- Same as in comment #4
Comment 8 SATHEESARAN 2013-08-08 07:04:33 EDT
Tested this with glusterfs-3.4.0.17rhs-1

Steps
=====
1. Created a trusted storage pool of 3 nodes
2. Created a replica volume with 2 bricks (one brick on node1 and another on node2)
3. Started the volume
4. Abruptly powered down node1
5. Issued "gluster volume heal <vol-name>" from node2
6. The 'heal' command waits [BZ 866758] for the frame-timeout, which is 600 secs
7. Issued "gluster volume status" from node3.
It fails with the following error:

[Thu Aug  8 10:50:50 UTC 2013 root@10.70.37.61:~ ] # gluster volume status
Another transaction is in progress. Please try again after sometime.

NOTE: the above command was executed on node3, which does not actually host any bricks

8. Abruptly powered down node2 as well.
9. Checked "gluster volume status" again.

"gluster volume status" succeeded, hence moving the bug to VERIFIED.
Comment 9 SATHEESARAN 2013-08-08 07:32:19 EDT
Correction to comment #8:

Verified with glusterfs-3.4.0.18rhs-1
Comment 10 Scott Haines 2013-09-23 18:24:55 EDT
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html
