Bug 841764
Summary: | 3.1 - NFS datacenter cannot be created due to lock acquire failure | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Tomas Dosek <tdosek> | ||||
Component: | vdsm | Assignee: | Federico Simoncelli <fsimonce> | ||||
Status: | CLOSED NOTABUG | QA Contact: | Haim <hateya> | ||||
Severity: | urgent | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | 6.3 | CC: | abaron, bazulay, iheim, lpeer, yeylon, ykaul | ||||
Target Milestone: | rc | ||||||
Target Release: | --- | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | storage | ||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2012-07-20 20:13:51 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
|
Adding the relevant part from sanlock log: 2012-07-20 10:11:20+0200 46 [7957]: sanlock daemon started 2.3 aio 1 10 renew 20 80 host 26d992df-51e2-4652-890c-a6d97e2c866d.srh-12.rhe time 1342771880 2012-07-20 10:12:08+0200 94 [7962]: s1 lockspace ae494c12-678d-4196-a5fc-5ded24388484:250:/rhev/data-center/mnt/10.34.63.204:_mnt_export_nfs_20_nfs03/ae494c12-678d-4196-a5fc-5ded24388484/dom_md/ids:0 2012-07-20 10:12:29+0200 115 [7964]: s1:r1 resource ae494c12-678d-4196-a5fc-5ded24388484:SDM:/rhev/data-center/mnt/10.34.63.204:_mnt_export_nfs_20_nfs03/ae494c12-678d-4196-a5fc-5ded24388484/dom_md/leases:1048576 for 3,14,8419 2012-07-20 10:12:29+0200 115 [7964]: worker1 aio collect 0x7f01e80008c0:0x7f01e80008d0:0x7f01f41f9000 result 0:0 match len 1048576 2012-07-20 10:12:29+0200 115 [7964]: r1 cmd_acquire 3,14,8419 acquire_token -90 The problem is that the domain ae494c12-678d-4196-a5fc-5ded24388484 was created with a quite old vdsm release, vdsm-4.9-16.0: MainThread::INFO::2012-07-20 09:19:28,220::vdsm::70::vds::(run) I am the actual vdsm 4.9-16.0 [...] Thread-150::INFO::2012-07-20 09:25:15,196::logUtils::37::dispatcher::(wrapper) Run and protect: createStorageDomain(storageType=1, sdUUID='ae494c12-678d-4196-a5fc-5ded24388484', domainName='nfs03-2', typeSpecificArg='10.34.63.204:/mnt/export/nfs/20/nfs03', domClass=1, domVersion='3', options=None) In that version of VDSM we used the offset 0 for the SPM resource in the leases file. We recently changed that to an offset of 1M in order to make safelease and sanlock live together for the short time of the live upgrade. That said the sanlock aio was failing because it was trying to read beyond the end of the leases file. I also saw that when you recreated the domain with the new vdsm version everything worked as expected. I'd close this as NOTABUG (or eventually CURRENTRELEASE). considering comment #7 and seeing as vdsm-4.9.16.0 was never released to customers I'm closing. |
Created attachment 599325 [details] VDSM log Description of problem: NFS datacenter cannot be created due to lock acquire failure (I'm not sure if this is vdsm or sanlock issue) Version-Release number of selected component (if applicable): RHEVM - si11 vdsm-4.9.6-23.0.el6_3.x86_64 sanlock-2.3-2.1.el6.x86_64 libvirt-0.9.10-21.el6.x86_64 How reproducible: 100 % Steps to Reproduce: 1. Install rhev-m 2. Try to initialize new NFS datacenter 3. Actual results: Fails with a lot of vdsBroker messages Expected results: Should work fine Additional info: Log from rhevm backend: 2012-07-20 10:13:43,599 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand] (pool-3-thread-50) [4831e5ad] START, CreateStoragePoolVDSCommand(vdsId = e0f017ea-d23a-11e1-8a40-001a4a013f58, storagePoolId=494d549e-d239-11e1-bba2-001a4a013f58, storageType=NFS, storagePoolName=Default, masterDomainId=ae494c12-678d-4196-a5fc-5ded24388484, domainsIdList=[ae494c12-678d-4196-a5fc-5ded24388484], masterVersion=12), log id: 767f4fcd 2012-07-20 10:14:07,677 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-3-thread-50) [4831e5ad] Failed in CreateStoragePoolVDS method 2012-07-20 10:14:07,677 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-3-thread-50) [4831e5ad] Error code AcquireLockFailure and error message VDSGenericException: VDSErrorException: Failed to CreateStoragePoolVDS, error = Cannot obtain lock: "id=ae494c12-678d-4196-a5fc-5ded24388484, rc=90, out=Cannot acquire cluster lock, err=(90, 'Sanlock resource not acquired', 'Message too long')" 2012-07-20 10:14:07,679 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-3-thread-50) [4831e5ad] Command org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand return value Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusOnlyReturnForXmlRpc mStatus Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc mCode 651 mMessage Cannot obtain lock: "id=ae494c12-678d-4196-a5fc-5ded24388484, rc=90, out=Cannot acquire cluster lock, err=(90, 'Sanlock resource not acquired', 'Message too long')" 2012-07-20 10:14:07,681 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-3-thread-50) [4831e5ad] Vds: 10.34.63.217 2012-07-20 10:14:07,682 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (pool-3-thread-50) [4831e5ad] Command CreateStoragePoolVDS execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to CreateStoragePoolVDS, error = Cannot obtain lock: "id=ae494c12-678d-4196-a5fc-5ded24388484, rc=90, out=Cannot acquire cluster lock, err=(90, 'Sanlock resource not acquired', 'Message too long')" 2012-07-20 10:14:07,683 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand] (pool-3-thread-50) [4831e5ad] FINISH, CreateStoragePoolVDSCommand, log id: 767f4fcd 2012-07-20 10:14:07,685 ERROR [org.ovirt.engine.core.bll.storage.ConnectStorageToVdsCommand] (pool-3-thread-50) [4831e5ad] Transaction rolled-back for command: org.ovirt.engine.core.bll.storage.ConnectStorageToVdsCommand. 2012-07-20 10:14:07,685 INFO [org.ovirt.engine.core.utils.transaction.TransactionSupport] (pool-3-thread-50) [4831e5ad] transaction rolled back 2012-07-20 10:14:07,686 ERROR [org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand] (pool-3-thread-50) [4831e5ad] Command org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand throw Vdc Bll exception. With error message VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to CreateStoragePoolVDS, error = Cannot obtain lock: "id=ae494c12-678d-4196-a5fc-5ded24388484, rc=90, out=Cannot acquire cluster lock, err=(90, 'Sanlock resource not acquired', 'Message too long')" 2012-07-20 10:14:07,694 INFO [org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand] (pool-3-thread-50) [4831e5ad] Command [id=6f9e468d-1258-402e-9972-bcafae2a4ad8]: Compensating CHANGED_ENTITY of org.ovirt.engine.core.common.businessentities.storage_pool; snapshot: id=494d549e-d239-11e1-bba2-001a4a013f58. 2012-07-20 10:14:07,698 INFO [org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand] (pool-3-thread-50) [4831e5ad] Command [id=6f9e468d-1258-402e-9972-bcafae2a4ad8]: Compensating NEW_ENTITY_ID of org.ovirt.engine.core.common.businessentities.storage_pool_iso_map; snapshot: storagePoolId = 494d549e-d239-11e1-bba2-001a4a013f58, storageId = ae494c12-678d-4196-a5fc-5ded24388484. 2012-07-20 10:14:07,707 INFO [org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand] (pool-3-thread-50) [4831e5ad] Command [id=6f9e468d-1258-402e-9972-bcafae2a4ad8]: Compensating CHANGED_ENTITY of org.ovirt.engine.core.common.businessentities.storage_domain_static; snapshot: id=ae494c12-678d-4196-a5fc-5ded24388484. 2012-07-20 10:14:07,711 ERROR [org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand] (pool-3-thread-50) [4831e5ad] Transaction rolled-back for command: org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand. 2012-07-20 10:14:07,732 INFO [org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand] (pool-3-thread-50) [4831e5ad] Lock freed to object EngineLock [exclusiveLocks= key: 494d549e-d239-11e1-bba2-001a4a013f58 value: POOL , sharedLocks= ]