Bug 841764

Summary: 3.1 - NFS datacenter cannot be created due to lock acquire failure
Product: Red Hat Enterprise Linux 6
Component: vdsm
Version: 6.3
Hardware: Unspecified
OS: Unspecified
Status: CLOSED NOTABUG
Severity: urgent
Priority: unspecified
Reporter: Tomas Dosek <tdosek>
Assignee: Federico Simoncelli <fsimonce>
QA Contact: Haim <hateya>
CC: abaron, bazulay, iheim, lpeer, yeylon, ykaul
Target Milestone: rc
Target Release: ---
Whiteboard: storage
Doc Type: Bug Fix
Story Points: ---
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Last Closed: 2012-07-20 20:13:51 UTC

Attachments: VDSM log

Description Tomas Dosek 2012-07-20 08:21:12 UTC
Created attachment 599325
VDSM log

Description of problem:
NFS datacenter cannot be created due to lock acquire failure
(I'm not sure whether this is a vdsm or a sanlock issue)

Version-Release number of selected component (if applicable):
RHEVM - si11
vdsm-4.9.6-23.0.el6_3.x86_64
sanlock-2.3-2.1.el6.x86_64
libvirt-0.9.10-21.el6.x86_64

How reproducible:
100 %

Steps to Reproduce:
1. Install rhev-m
2. Try to initialize a new NFS datacenter
  
Actual results:
The operation fails, with numerous vdsBroker error messages in the engine log.

Expected results:
The NFS datacenter should be created successfully.

Additional info:
Log from the RHEV-M backend:
2012-07-20 10:13:43,599 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand] (pool-3-thread-50) [4831e5ad] START, CreateStoragePoolVDSCommand(vdsId = e0f017ea-d23a-11e1-8a40-001a4a013f58, storagePoolId=494d549e-d239-11e1-bba2-001a4a013f58, storageType=NFS, storagePoolName=Default, masterDomainId=ae494c12-678d-4196-a5fc-5ded24388484, domainsIdList=[ae494c12-678d-4196-a5fc-5ded24388484], masterVersion=12), log id: 767f4fcd
2012-07-20 10:14:07,677 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-3-thread-50) [4831e5ad] Failed in CreateStoragePoolVDS method
2012-07-20 10:14:07,677 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-3-thread-50) [4831e5ad] Error code AcquireLockFailure and error message VDSGenericException: VDSErrorException: Failed to CreateStoragePoolVDS, error = Cannot obtain lock: "id=ae494c12-678d-4196-a5fc-5ded24388484, rc=90, out=Cannot acquire cluster lock, err=(90, 'Sanlock resource not acquired', 'Message too long')"
2012-07-20 10:14:07,679 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-3-thread-50) [4831e5ad] Command org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand return value 
 Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusOnlyReturnForXmlRpc
mStatus                       Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc
mCode                         651
mMessage                      Cannot obtain lock: "id=ae494c12-678d-4196-a5fc-5ded24388484, rc=90, out=Cannot acquire cluster lock, err=(90, 'Sanlock resource not acquired', 'Message too long')"


2012-07-20 10:14:07,681 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-3-thread-50) [4831e5ad] Vds: 10.34.63.217
2012-07-20 10:14:07,682 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (pool-3-thread-50) [4831e5ad] Command CreateStoragePoolVDS execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to CreateStoragePoolVDS, error = Cannot obtain lock: "id=ae494c12-678d-4196-a5fc-5ded24388484, rc=90, out=Cannot acquire cluster lock, err=(90, 'Sanlock resource not acquired', 'Message too long')"
2012-07-20 10:14:07,683 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand] (pool-3-thread-50) [4831e5ad] FINISH, CreateStoragePoolVDSCommand, log id: 767f4fcd
2012-07-20 10:14:07,685 ERROR [org.ovirt.engine.core.bll.storage.ConnectStorageToVdsCommand] (pool-3-thread-50) [4831e5ad] Transaction rolled-back for command: org.ovirt.engine.core.bll.storage.ConnectStorageToVdsCommand.
2012-07-20 10:14:07,685 INFO  [org.ovirt.engine.core.utils.transaction.TransactionSupport] (pool-3-thread-50) [4831e5ad] transaction rolled back
2012-07-20 10:14:07,686 ERROR [org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand] (pool-3-thread-50) [4831e5ad] Command org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand throw Vdc Bll exception. With error message VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to CreateStoragePoolVDS, error = Cannot obtain lock: "id=ae494c12-678d-4196-a5fc-5ded24388484, rc=90, out=Cannot acquire cluster lock, err=(90, 'Sanlock resource not acquired', 'Message too long')"
2012-07-20 10:14:07,694 INFO  [org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand] (pool-3-thread-50) [4831e5ad] Command [id=6f9e468d-1258-402e-9972-bcafae2a4ad8]: Compensating CHANGED_ENTITY of org.ovirt.engine.core.common.businessentities.storage_pool; snapshot: id=494d549e-d239-11e1-bba2-001a4a013f58.
2012-07-20 10:14:07,698 INFO  [org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand] (pool-3-thread-50) [4831e5ad] Command [id=6f9e468d-1258-402e-9972-bcafae2a4ad8]: Compensating NEW_ENTITY_ID of org.ovirt.engine.core.common.businessentities.storage_pool_iso_map; snapshot: storagePoolId = 494d549e-d239-11e1-bba2-001a4a013f58, storageId = ae494c12-678d-4196-a5fc-5ded24388484.
2012-07-20 10:14:07,707 INFO  [org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand] (pool-3-thread-50) [4831e5ad] Command [id=6f9e468d-1258-402e-9972-bcafae2a4ad8]: Compensating CHANGED_ENTITY of org.ovirt.engine.core.common.businessentities.storage_domain_static; snapshot: id=ae494c12-678d-4196-a5fc-5ded24388484.
2012-07-20 10:14:07,711 ERROR [org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand] (pool-3-thread-50) [4831e5ad] Transaction rolled-back for command: org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand.
2012-07-20 10:14:07,732 INFO  [org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand] (pool-3-thread-50) [4831e5ad] Lock freed to object EngineLock [exclusiveLocks= key: 494d549e-d239-11e1-bba2-001a4a013f58 value: POOL
, sharedLocks= ]
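
For reference: the numeric code in err=(90, 'Sanlock resource not acquired', 'Message too long') is the standard Linux errno EMSGSIZE, which is where the otherwise confusing "Message too long" text comes from. A quick illustrative check (not part of the original output):

import errno
import os

# On Linux, errno 90 is EMSGSIZE; sanlock returns it as -90 and the engine
# surfaces the matching strerror text seen above.
print(errno.errorcode.get(90), "->", os.strerror(90))
# Expected on Linux: EMSGSIZE -> Message too long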

Comment 2 Federico Simoncelli 2012-07-20 14:31:29 UTC
Adding the relevant part of the sanlock log:

2012-07-20 10:11:20+0200 46 [7957]: sanlock daemon started 2.3 aio 1 10 renew 20 80 host 26d992df-51e2-4652-890c-a6d97e2c866d.srh-12.rhe time 1342771880
2012-07-20 10:12:08+0200 94 [7962]: s1 lockspace ae494c12-678d-4196-a5fc-5ded24388484:250:/rhev/data-center/mnt/10.34.63.204:_mnt_export_nfs_20_nfs03/ae494c12-678d-4196-a5fc-5ded24388484/dom_md/ids:0
2012-07-20 10:12:29+0200 115 [7964]: s1:r1 resource ae494c12-678d-4196-a5fc-5ded24388484:SDM:/rhev/data-center/mnt/10.34.63.204:_mnt_export_nfs_20_nfs03/ae494c12-678d-4196-a5fc-5ded24388484/dom_md/leases:1048576 for 3,14,8419
2012-07-20 10:12:29+0200 115 [7964]: worker1 aio collect 0x7f01e80008c0:0x7f01e80008d0:0x7f01f41f9000 result 0:0 match len 1048576
2012-07-20 10:12:29+0200 115 [7964]: r1 cmd_acquire 3,14,8419 acquire_token -90

Comment 3 Federico Simoncelli 2012-07-20 17:38:49 UTC
The problem is that the domain ae494c12-678d-4196-a5fc-5ded24388484 was created with a fairly old vdsm release, vdsm-4.9-16.0:

MainThread::INFO::2012-07-20 09:19:28,220::vdsm::70::vds::(run) I am the actual vdsm 4.9-16.0
[...]
Thread-150::INFO::2012-07-20 09:25:15,196::logUtils::37::dispatcher::(wrapper) Run and protect: createStorageDomain(storageType=1, sdUUID='ae494c12-678d-4196-a5fc-5ded24388484', domainName='nfs03-2', typeSpecificArg='10.34.63.204:/mnt/export/nfs/20/nfs03', domClass=1, domVersion='3', options=None)

In that version of VDSM we used offset 0 for the SPM resource in the leases file. We recently changed that to an offset of 1 MiB (1048576) so that safelease and sanlock can coexist for the short window of the live upgrade.
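
To make the offset change concrete, here is a minimal sketch (not vdsm's actual code) of how the SDM resource offset ends up in the sanlock acquire call, using the sanlock Python bindings; the UUID, paths and host id are copied from the logs above, while the constant names and structure are mine:

import sanlock

SD_UUID = "ae494c12-678d-4196-a5fc-5ded24388484"
DOM_MD = ("/rhev/data-center/mnt/10.34.63.204:_mnt_export_nfs_20_nfs03/"
          + SD_UUID + "/dom_md")
HOST_ID = 250                  # from the "s1 lockspace ...:250:..." line above

OLD_SDM_OFFSET = 0             # layout written by vdsm-4.9-16.0
NEW_SDM_OFFSET = 1024 * 1024   # 1 MiB, current layout (":.../leases:1048576")

# The host must already hold the lockspace on the "ids" file (the "s1
# lockspace" line above); this requires a running sanlock daemon.
sanlock.add_lockspace(SD_UUID, HOST_ID, DOM_MD + "/ids")

fd = sanlock.register()
# Acquiring "SDM" at the new 1 MiB offset against a domain formatted by the
# old vdsm (SDM at offset 0, smaller leases file) makes sanlock read past the
# end of the leases file and fail.
sanlock.acquire(SD_UUID, "SDM", [(DOM_MD + "/leases", NEW_SDM_OFFSET)], slkfd=fd)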

As a result, the sanlock aio read was failing because it was trying to read beyond the end of the leases file.
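
As an illustration of that failure mode (the file size and offsets below are assumptions for the demo, not values taken from this environment): reading a full 1 MiB lease area at offset 1 MiB from a file that is only 1 MiB long comes back short, which is consistent with the -90/EMSGSIZE ("Message too long") result in the sanlock log:

import os
import tempfile

LEASE_AREA = 1024 * 1024      # 1 MiB lease slot, matching ".../leases:1048576"

with tempfile.NamedTemporaryFile() as f:
    os.truncate(f.name, LEASE_AREA)     # stand-in for an old-layout leases file
    fd = os.open(f.name, os.O_RDONLY)
    full = os.pread(fd, LEASE_AREA, 0)             # offset 0: full 1 MiB read
    short = os.pread(fd, LEASE_AREA, LEASE_AREA)   # offset 1 MiB: past EOF
    os.close(fd)
    print(len(full), len(short))        # 1048576 0 -> sanlock sees a short read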

I also saw that when you recreated the domain with the new vdsm version, everything worked as expected.

I'd close this as NOTABUG (or possibly CURRENTRELEASE).

Comment 4 Ayal Baron 2012-07-20 20:13:51 UTC
Considering comment #7, and seeing as vdsm-4.9-16.0 was never released to customers, I'm closing this bug.