Bug 841764 - 3.1 - NFS datacenter cannot be created due to lock acquire failure
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: vdsm
Version: 6.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: rc
Assigned To: Federico Simoncelli
QA Contact: Haim
Whiteboard: storage
Reported: 2012-07-20 04:21 EDT by Tomas Dosek
Modified: 2014-01-12 19:53 EST

Doc Type: Bug Fix
Last Closed: 2012-07-20 16:13:51 EDT
Type: Bug


Attachments
VDSM log (6.05 MB, text/x-log)
2012-07-20 04:21 EDT, Tomas Dosek

Description Tomas Dosek 2012-07-20 04:21:12 EDT
Created attachment 599325
VDSM log

Description of problem:
NFS datacenter cannot be created due to lock acquire failure
(I'm not sure whether this is a vdsm or a sanlock issue.)

Version-Release number of selected component (if applicable):
RHEVM - si11
vdsm-4.9.6-23.0.el6_3.x86_64
sanlock-2.3-2.1.el6.x86_64
libvirt-0.9.10-21.el6.x86_64

How reproducible:
100 %

Steps to Reproduce:
1. Install rhev-m
2. Try to initialize a new NFS datacenter
  
Actual results:
Fails, with many vdsBroker error messages in the engine log

Expected results:
The NFS datacenter should be created successfully

Additional info:
Log from rhevm backend:
2012-07-20 10:13:43,599 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand] (pool-3-thread-50) [4831e5ad] START, CreateStoragePoolVDSCommand(vdsId = e0f017ea-d23a-11e1-8a40-001a4a013f58, storagePoolId=494d549e-d239-11e1-bba2-001a4a013f58, storageType=NFS, storagePoolName=Default, masterDomainId=ae494c12-678d-4196-a5fc-5ded24388484, domainsIdList=[ae494c12-678d-4196-a5fc-5ded24388484], masterVersion=12), log id: 767f4fcd
2012-07-20 10:14:07,677 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-3-thread-50) [4831e5ad] Failed in CreateStoragePoolVDS method
2012-07-20 10:14:07,677 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-3-thread-50) [4831e5ad] Error code AcquireLockFailure and error message VDSGenericException: VDSErrorException: Failed to CreateStoragePoolVDS, error = Cannot obtain lock: "id=ae494c12-678d-4196-a5fc-5ded24388484, rc=90, out=Cannot acquire cluster lock, err=(90, 'Sanlock resource not acquired', 'Message too long')"
2012-07-20 10:14:07,679 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-3-thread-50) [4831e5ad] Command org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand return value 
 Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusOnlyReturnForXmlRpc
mStatus                       Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc
mCode                         651
mMessage                      Cannot obtain lock: "id=ae494c12-678d-4196-a5fc-5ded24388484, rc=90, out=Cannot acquire cluster lock, err=(90, 'Sanlock resource not acquired', 'Message too long')"


2012-07-20 10:14:07,681 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-3-thread-50) [4831e5ad] Vds: 10.34.63.217
2012-07-20 10:14:07,682 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (pool-3-thread-50) [4831e5ad] Command CreateStoragePoolVDS execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to CreateStoragePoolVDS, error = Cannot obtain lock: "id=ae494c12-678d-4196-a5fc-5ded24388484, rc=90, out=Cannot acquire cluster lock, err=(90, 'Sanlock resource not acquired', 'Message too long')"
2012-07-20 10:14:07,683 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand] (pool-3-thread-50) [4831e5ad] FINISH, CreateStoragePoolVDSCommand, log id: 767f4fcd
2012-07-20 10:14:07,685 ERROR [org.ovirt.engine.core.bll.storage.ConnectStorageToVdsCommand] (pool-3-thread-50) [4831e5ad] Transaction rolled-back for command: org.ovirt.engine.core.bll.storage.ConnectStorageToVdsCommand.
2012-07-20 10:14:07,685 INFO  [org.ovirt.engine.core.utils.transaction.TransactionSupport] (pool-3-thread-50) [4831e5ad] transaction rolled back
2012-07-20 10:14:07,686 ERROR [org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand] (pool-3-thread-50) [4831e5ad] Command org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand throw Vdc Bll exception. With error message VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to CreateStoragePoolVDS, error = Cannot obtain lock: "id=ae494c12-678d-4196-a5fc-5ded24388484, rc=90, out=Cannot acquire cluster lock, err=(90, 'Sanlock resource not acquired', 'Message too long')"
2012-07-20 10:14:07,694 INFO  [org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand] (pool-3-thread-50) [4831e5ad] Command [id=6f9e468d-1258-402e-9972-bcafae2a4ad8]: Compensating CHANGED_ENTITY of org.ovirt.engine.core.common.businessentities.storage_pool; snapshot: id=494d549e-d239-11e1-bba2-001a4a013f58.
2012-07-20 10:14:07,698 INFO  [org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand] (pool-3-thread-50) [4831e5ad] Command [id=6f9e468d-1258-402e-9972-bcafae2a4ad8]: Compensating NEW_ENTITY_ID of org.ovirt.engine.core.common.businessentities.storage_pool_iso_map; snapshot: storagePoolId = 494d549e-d239-11e1-bba2-001a4a013f58, storageId = ae494c12-678d-4196-a5fc-5ded24388484.
2012-07-20 10:14:07,707 INFO  [org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand] (pool-3-thread-50) [4831e5ad] Command [id=6f9e468d-1258-402e-9972-bcafae2a4ad8]: Compensating CHANGED_ENTITY of org.ovirt.engine.core.common.businessentities.storage_domain_static; snapshot: id=ae494c12-678d-4196-a5fc-5ded24388484.
2012-07-20 10:14:07,711 ERROR [org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand] (pool-3-thread-50) [4831e5ad] Transaction rolled-back for command: org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand.
2012-07-20 10:14:07,732 INFO  [org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand] (pool-3-thread-50) [4831e5ad] Lock freed to object EngineLock [exclusiveLocks= key: 494d549e-d239-11e1-bba2-001a4a013f58 value: POOL
, sharedLocks= ]
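
A quick aside on the err tuple above: the numeric code 90 is the Linux errno EMSGSIZE, whose strerror text is 'Message too long', which is why that string shows up in the engine error. A minimal, purely illustrative Python check:

import errno
import os

print(errno.errorcode[90])   # 'EMSGSIZE'
print(os.strerror(90))       # 'Message too long'
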
Comment 2 Federico Simoncelli 2012-07-20 10:31:29 EDT
Adding the relevant part from sanlock log:

2012-07-20 10:11:20+0200 46 [7957]: sanlock daemon started 2.3 aio 1 10 renew 20 80 host 26d992df-51e2-4652-890c-a6d97e2c866d.srh-12.rhe time 1342771880
2012-07-20 10:12:08+0200 94 [7962]: s1 lockspace ae494c12-678d-4196-a5fc-5ded24388484:250:/rhev/data-center/mnt/10.34.63.204:_mnt_export_nfs_20_nfs03/ae494c12-678d-4196-a5fc-5ded24388484/dom_md/ids:0
2012-07-20 10:12:29+0200 115 [7964]: s1:r1 resource ae494c12-678d-4196-a5fc-5ded24388484:SDM:/rhev/data-center/mnt/10.34.63.204:_mnt_export_nfs_20_nfs03/ae494c12-678d-4196-a5fc-5ded24388484/dom_md/leases:1048576 for 3,14,8419
2012-07-20 10:12:29+0200 115 [7964]: worker1 aio collect 0x7f01e80008c0:0x7f01e80008d0:0x7f01f41f9000 result 0:0 match len 1048576
2012-07-20 10:12:29+0200 115 [7964]: r1 cmd_acquire 3,14,8419 acquire_token -90
Comment 3 Federico Simoncelli 2012-07-20 13:38:49 EDT
The problem is that the domain ae494c12-678d-4196-a5fc-5ded24388484 was created with an older vdsm release, vdsm-4.9-16.0:

MainThread::INFO::2012-07-20 09:19:28,220::vdsm::70::vds::(run) I am the actual vdsm 4.9-16.0
[...]
Thread-150::INFO::2012-07-20 09:25:15,196::logUtils::37::dispatcher::(wrapper) Run and protect: createStorageDomain(storageType=1, sdUUID='ae494c12-678d-4196-a5fc-5ded24388484', domainName='nfs03-2', typeSpecificArg='10.34.63.204:/mnt/export/nfs/20/nfs03', domClass=1, domVersion='3', options=None)

In that version of VDSM we used offset 0 for the SPM resource in the leases file. We recently changed it to an offset of 1M so that safelease and sanlock can coexist during the short window of a live upgrade.
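
For reference, the newer code path amounts to acquiring the SDM resource at the 1M offset inside the domain's leases file. A simplified sketch using the sanlock Python binding (illustrative only, not vdsm's exact code; the path and UUID are taken from the sanlock log above):

import sanlock

# The old vdsm-4.9-16.0 wrote this lease at offset 0; the newer code uses 1M.
LEASES = '/rhev/data-center/mnt/10.34.63.204:_mnt_export_nfs_20_nfs03/ae494c12-678d-4196-a5fc-5ded24388484/dom_md/leases'
LOCKSPACE = 'ae494c12-678d-4196-a5fc-5ded24388484'
SDM_OFFSET = 1024 * 1024

fd = sanlock.register()  # register this process with the sanlock daemon
sanlock.acquire(LOCKSPACE, 'SDM', [(LEASES, SDM_OFFSET)], slkfd=fd)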

As a result, the sanlock aio read failed because it tried to read beyond the end of the leases file.
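
On an affected domain the mismatch can be spotted by comparing the size of the leases file with the offset plus read length from the sanlock log above ("match len 1048576"). A hypothetical diagnostic, not part of vdsm:

import os

leases = '/rhev/data-center/mnt/10.34.63.204:_mnt_export_nfs_20_nfs03/ae494c12-678d-4196-a5fc-5ded24388484/dom_md/leases'
sdm_offset = 1024 * 1024   # offset the new vdsm asks sanlock to use
read_len = 1024 * 1024     # lease area sanlock tries to read at that offset

size = os.stat(leases).st_size
if size < sdm_offset + read_len:
    print('leases file is only %d bytes; reading %d bytes at offset %d runs past EOF'
          % (size, read_len, sdm_offset))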

I also saw that, when you recreated the domain with the new vdsm version, everything worked as expected.

I'd close this as NOTABUG (or possibly CURRENTRELEASE).
Comment 4 Ayal Baron 2012-07-20 16:13:51 EDT
Considering comment #7, and seeing as vdsm-4.9-16.0 was never released to customers, I'm closing this bug.
