Bug 1442983
| Summary: | Unable to acquire lock for gluster volume leading to 'another transaction in progress' error | | |
| --- | --- | --- | --- |
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | SATHEESARAN <sasundar> |
| Component: | glusterd | Assignee: | Atin Mukherjee <amukherj> |
| Status: | CLOSED ERRATA | QA Contact: | SATHEESARAN <sasundar> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | rhgs-3.2 | CC: | amukherj, bkunal, ccalhoun, rhinduja, rhs-bugs, sasundar, sheggodu, storage-qa-internal, timo.kramer_ext, vbellur |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | RHGS 3.4.0 | | |
| Hardware: | x86_64 | OS: | Linux |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.12.2-1 | Doc Type: | If docs needed, set a value |
| Doc Text: | Cause: TBD; Consequence:; Workaround (if any):; Result: | | |
| Story Points: | --- | | |
| Clone Of: | | | |
| : | 1526372 (view as bug list) | Environment: | |
| Last Closed: | 2018-09-04 06:32:03 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1503134, 1526372 | | |
| Attachments: | | | |
Description SATHEESARAN 2017-04-18 08:44:47 UTC
This is the exact error message in the glusterd logs:

<snip>
[2017-04-15 22:14:52.186079] W [glusterd-locks.c:572:glusterd_mgmt_v3_lock] (-->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0xcfb30) [0x7ff4ccf23b30] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0xcfa60) [0x7ff4ccf23a60] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0xd4d6f) [0x7ff4ccf28d6f] ) 0-management: Lock for data held by e45f76c0-89e4-4601-bb41-ba3110a15681
[2017-04-15 22:14:52.186111] E [MSGID: 106119] [glusterd-syncop.c:1851:gd_sync_task_begin] 0-management: Unable to acquire lock for data
</snip>

The volume name is 'data' and it is of type 'replica'.
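(Side note: the lock-holder UUID from the log, e45f76c0-89e4-4601-bb41-ba3110a15681, can be mapped back to a node to see where the stale lock is held. 'gluster pool list' prints the UUID, hostname and state of every node in the trusted pool, including the local one:)

# gluster pool list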
Created attachment 1272257 [details]
gluster logs from one of the nodes
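(For reference, the glusterd log and command history requested later in this bug live under /var/log/glusterfs on each node; a minimal collection sketch, with the archive name purely illustrative:)

# tar czf /tmp/glusterd-logs-$(hostname -s).tar.gz /var/log/glusterfs/glusterd.log /var/log/glusterfs/cmd_history.log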
Created attachment 1272258 [details]
glusterd statedump from one of the nodes
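(A statedump like the one attached can typically be generated by sending SIGUSR1 to the glusterd process; the dump file is written under the run directory, assumed here to be the default /var/run/gluster:)

# kill -SIGUSR1 $(pidof glusterd)
# ls -lt /var/run/gluster/ | head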
From the statedump:

glusterd.mgmt_v3_lock=
debug.last-success-bt-data-vol:(--> /usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0xd496c)[0x7ff4ccf2896c] (--> /usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x2e195)[0x7ff4cce82195] (--> /usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x3cd1f)[0x7ff4cce90d1f] (--> /usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0xf801d)[0x7ff4ccf4c01d] (--> /usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x20540)[0x7ff4cce74540] )))))
data_vol:e45f76c0-89e4-4601-bb41-ba3110a15681

The stale lock is on volume "data".

From the backtrace of the lock:

(gdb) info symbol 0x7ff4ccf2896c
glusterd_mgmt_v3_lock + 492 in section .text of /usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so
(gdb) info symbol 0x7ff4cce82195
glusterd_op_ac_lock + 149 in section .text of /usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so
(gdb) info symbol 0x7ff4cce90d1f
glusterd_op_sm + 671 in section .text of /usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so
(gdb) info symbol 0x7ff4ccf4c01d
glusterd_handle_mgmt_v3_lock_fn + 1245 in section .text of /usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so
(gdb) info symbol 0x7ff4cce74540
glusterd_big_locked_handler + 48 in section .text of /usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so

So consecutive 'gluster volume profile' and 'gluster volume status' transactions collided on one node, resulting in two op-sm transactions running through the same state machine, where we can end up with a stale lock. This is explained in detail at https://bugzilla.redhat.com/show_bug.cgi?id=1425681#c4. The only way to fix this is to port the volume profile command to mgmt_v3, but whether that is worth the effort at this stage, with GD2 under active development, is something we need to assess.

I have tried the workaround suggested by Atin and the stale lock was released.

1. Reset server quorum on all the volumes:
   # gluster volume set <vol> server-quorum-type none

2. Restart glusterd on all the nodes using gdeploy:

   [hosts]
   host1
   host2
   host3
   host4
   host5
   host6

   [service]
   action=restart
   service=glusterd

   Note: Restarting glusterd on all the nodes is required.

3. Set server quorum back on all the volumes:
   # gluster volume set <vol> server-quorum-type server
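A scripted sketch of the same workaround, assuming 'gluster volume list' covers every volume that has server quorum enabled and that glusterd is restarted on every node between the two loops (for example via the gdeploy snippet above):

for vol in $(gluster volume list); do
    gluster volume set "$vol" server-quorum-type none
done

(restart glusterd on ALL nodes here, e.g. 'systemctl restart glusterd' on each node or the gdeploy configuration above)

for vol in $(gluster volume list); do
    gluster volume set "$vol" server-quorum-type server
done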
@atin, I have a case, 01874385, which seems to be presenting with very similar errors.

glusterd.log:
[2017-06-28 20:56:22.549828] E [MSGID: 106119] [glusterd-syncop.c:1851:gd_sync_task_begin] 0-management: Unable to acquire lock for ACL_VEEAM_BCK_VOL1

and the associated cmd_history.log:
[2017-06-28 20:56:22.549842] : volume status all tasks : FAILED : Another transaction is in progress for ACL_VEEAM_BCK_VOL1. Please try again after sometime.

These occur at almost exactly a 1:1 ratio. Can I get an opinion about this being the same issue? What additional information can I provide to help make that determination?

Get me the cmd_history and glusterd logs from all the nodes, along with the glusterd statedump taken on the node where the locking has failed.

@Atin, I've requested new logs and the statedump info from the customer. I will attach them when they come in.

Hi, I have a similar problem with glusterfs 3.4.5 on Red Hat 6. If I should provide some logs, please tell me.

gluster volume status all
Another transaction could be in progress. Please try again after sometime.

[2017-08-08 11:35:57.139221] E [glusterd-utils.c:332:glusterd_lock] 0-management: Unable to get lock for uuid: a813ad42-bf64-4b3b-ae24-59883671a8e8, lock held by: a813ad42-bf64-4b3b-ae24-59883671a8e8
[2017-08-08 11:35:57.139272] E [glusterd-op-sm.c:5445:glusterd_op_sm] 0-management: handler returned: -1
[2017-08-08 11:35:57.139920] E [glusterd-syncop.c:715:gd_lock_op_phase] 0-management: Failed to acquire lock
[2017-08-08 11:35:57.140762] E [glusterd-utils.c:365:glusterd_unlock] 0-management: Cluster lock not held!

On all the servers in the cluster I had the server itself in the peers file; this was the problem in my system. A simple mistake... it took me quite a long time to figure out.

Verified with RHV 4.2 and glusterfs-3.12:
1. Added RHGS nodes to the cluster.
2. Ran repeated 'gluster volume status' queries.
There are no 'Another transaction in progress' errors.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607
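(Follow-up on the self-peering misconfiguration described in the comment above: the local node's UUID should never appear in its own peers directory. A quick check, assuming the default glusterd paths:)

UUID of the local node:
# grep '^UUID' /var/lib/glusterd/glusterd.info

Peer files, named after the remote peers' UUIDs (the local UUID must not be listed here):
# ls /var/lib/glusterd/peers/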