Description of problem:
After BZ 981653, I am finding that "gluster volume status" fails on the other nodes of the cluster as well.

Version-Release number of selected component (if applicable):
[root@quota2 ~]# rpm -qa | grep glusterfs
glusterfs-3.4.0.12rhs.beta2-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.12rhs.beta2-1.el6rhs.x86_64
glusterfs-server-3.4.0.12rhs.beta2-1.el6rhs.x86_64

How reproducible:
Found after BZ 981653.

Steps to Reproduce:
1. After BZ 981653, execute "gluster volume status" on any of the nodes of the cluster.

Actual results:
[root@quota2 ~]# gluster volume status
Another transaction is in progress. Please try again after sometime.

glusterd logs say:
[2013-07-05 05:13:59.931692] E [glusterd-utils.c:333:glusterd_lock] 0-management: Unable to get lock for uuid: cc7bc8ba-fa3a-43d9-a899-114e34d27eb4, lock held by: 236e161a-fc82-4964-8e6d-bb0d9160990d
[2013-07-05 05:14:02.238117] E [socket.c:2158:socket_connect_finish] 0-management: connection to 10.70.37.98:24007 failed (Connection refused)
[2013-07-05 05:22:13.211163] E [glusterd-utils.c:333:glusterd_lock] 0-management: Unable to get lock for uuid: cc7bc8ba-fa3a-43d9-a899-114e34d27eb4, lock held by: 236e161a-fc82-4964-8e6d-bb0d9160990d
[2013-07-05 05:22:13.211243] E [glusterd-syncop.c:1128:gd_sync_task_begin] 0-management: Unable to acquire lock
[2013-07-05 05:22:13.211373] E [glusterd-utils.c:375:glusterd_unlock] 0-management: Cluster lock held by 236e161a-fc82-4964-8e6d-bb0d9160990d ,unlock req from cc7bc8ba-fa3a-43d9-a899-114e34d27eb4!
[2013-07-05 05:22:13.211404] E [glusterd-utils.c:333:glusterd_lock] 0-management: Unable to get lock for uuid: cc7bc8ba-fa3a-43d9-a899-114e34d27eb4, lock held by: 236e161a-fc82-4964-8e6d-bb0d9160990d
[2013-07-05 05:31:08.545951] E [glusterd-utils.c:333:glusterd_lock] 0-management: Unable to get lock for uuid: cc7bc8ba-fa3a-43d9-a899-114e34d27eb4, lock held by: 236e161a-fc82-4964-8e6d-bb0d9160990d
[2013-07-05 05:31:08.546016] E [glusterd-syncop.c:1128:gd_sync_task_begin] 0-management: Unable to acquire lock
[2013-07-05 05:31:08.546121] E [glusterd-utils.c:375:glusterd_unlock] 0-management: Cluster lock held by 236e161a-fc82-4964-8e6d-bb0d9160990d ,unlock req from cc7bc8ba-fa3a-43d9-a899-114e34d27eb4!
[2013-07-05 05:31:08.546142] E [glusterd-utils.c:333:glusterd_lock] 0-management: Unable to get lock for uuid: cc7bc8ba-fa3a-43d9-a899-114e34d27eb4, lock held by: 236e161a-fc82-4964-8e6d-bb0d9160990d
[2013-07-05 05:31:13.491554] I [glusterd-handler.c:966:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2013-07-05 05:38:01.968142] I [glusterd-handler.c:966:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2013-07-05 05:38:02.187306] E [glusterd-utils.c:333:glusterd_lock] 0-management: Unable to get lock for uuid: cc7bc8ba-fa3a-43d9-a899-114e34d27eb4, lock held by: 236e161a-fc82-4964-8e6d-bb0d9160990d
[2013-07-05 05:38:02.187355] E [glusterd-syncop.c:1128:gd_sync_task_begin] 0-management: Unable to acquire lock
[2013-07-05 05:38:02.187453] E [glusterd-utils.c:375:glusterd_unlock] 0-management: Cluster lock held by 236e161a-fc82-4964-8e6d-bb0d9160990d ,unlock req from cc7bc8ba-fa3a-43d9-a899-114e34d27eb4!
[2013-07-05 05:38:02.187490] E [glusterd-utils.c:333:glusterd_lock] 0-management: Unable to get lock for uuid: cc7bc8ba-fa3a-43d9-a899-114e34d27eb4, lock held by: 236e161a-fc82-4964-8e6d-bb0d9160990d

Expected results:
If a node crashed for some reason, some other node should provide the information without fail.
Otherwise the whole cluster becomes unusable without some "workaround".

Additional info:
https://code.engineering.redhat.com/gerrit/#/c/10364/ <-- Posted for review.

PROBLEM:
When the originator of a volume transaction goes down while it is still holding the lock, volume ops issued from the other nodes also fail with the message that the lock is still held by the node that went down.

FIX:
Upon receiving DISCONNECT from the originator of a transaction, the rest of the nodes perform the following actions:
a. release the lock; and
b. reset the state of the node to GD_OP_STATE_DEFAULT.

Note: This bug is not confined to the 'volume quota' command. This state may be reached for any volume command when the originator goes down while in possession of the lock.
The change has been merged downstream. Hence moving the state of the bug to MODIFIED.
https://code.engineering.redhat.com/gerrit/#/c/10364/ <-- Same as in comment #4
Tested this with glusterfs-3.4.0.17rhs-1.

Steps
=====
1. Created a trusted storage pool of 3 nodes.
2. Created a replica volume with 2 bricks (1 brick on node1 and another on node2).
3. Started the volume.
4. Abruptly powered down node1.
5. Issued "gluster volume heal <vol-name>" from node2.
6. The 'heal' command waits [BZ 866758] for frame-timeout, which is 600 secs.
7. Issued "gluster volume status" from node3. You will get the following error:

[Thu Aug 8 10:50:50 UTC 2013 root.37.61:~ ] # gluster volume status
Another transaction is in progress. Please try again after sometime.

NOTE: the above command was executed on node3, which doesn't actually have any bricks on it.

8. Abruptly powered down node2 also.
9. Checked "gluster volume status".

"gluster volume status" succeeded, and thus moving this to VERIFIED state.
Correction to comment #8: verified with glusterfs-3.4.0.18rhs-1.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html