Description of problem: VDSM StorageDomain.create internally ignores a ClusterLockInitError (due to other reasons) and reports a success also if it failed. Seen on gluster in bug rhbz#1386443, not sure if gluster specific. jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:31,843::__init__::530::jsonrpc.JsonRpcServer::(_handle_request) Calling 'StorageDomain.create' in bridge with {u'name': u'hosted_storage', u'domainType': 7, u'domainClass': 1, u'typeArgs': u'10.1.200.84:/engine', u'version': 3, u'storagedomainID': u'ee2bf992-ec7d-4e32-97b7-aa214aca587b'} jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:31,844::task::597::Storage.TaskManager.Task::(_updateState) Task=`8d24a62c-c1c0-462c-9ccc-57552cae39ec`::moving from state init -> state preparing jsonrpc.Executor/4::INFO::2016-10-18 20:47:31,844::logUtils::49::dispatcher::(wrapper) Run and protect: createStorageDomain(storageType=7, sdUUID=u'ee2bf992-ec7d-4e32-97b7-aa214aca587b', domainName=u'hosted_storage', typeSpecificArg=u'10.1.200.84:/engine', domClass=1, domVersion=3, options=None) jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:31,844::misc::556::Storage.SamplingMethod::(__call__) Trying to enter sampling method (storage.sdc.refreshStorage) jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:31,845::misc::559::Storage.SamplingMethod::(__call__) Got in to sampling method jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:31,845::misc::556::Storage.SamplingMethod::(__call__) Trying to enter sampling method (storage.iscsi.rescan) jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:31,845::misc::559::Storage.SamplingMethod::(__call__) Got in to sampling method jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:31,845::iscsi::434::Storage.ISCSI::(rescan) Performing SCSI scan, this will take up to 30 seconds jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:31,845::iscsiadm::119::Storage.Misc.excCmd::(_runCmd) /usr/bin/taskset --cpu-list 0-7 /usr/bin/sudo -n /sbin/iscsiadm -m session -R (cwd None) jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:31,873::misc::566::Storage.SamplingMethod::(__call__) Returning last result jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:31,873::misc::556::Storage.SamplingMethod::(__call__) Trying to enter sampling method (vdsm.storage.hba.rescan) jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:31,873::misc::559::Storage.SamplingMethod::(__call__) Got in to sampling method jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:31,874::hba::58::Storage.HBA::(rescan) Starting scan jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:31,976::hba::64::Storage.HBA::(rescan) Scan finished jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:31,976::misc::566::Storage.SamplingMethod::(__call__) Returning last result jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:31,976::multipath::78::Storage.Misc.excCmd::(rescan) /usr/bin/taskset --cpu-list 0-7 /usr/bin/sudo -n /usr/sbin/multipath (cwd None) jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,075::multipath::78::Storage.Misc.excCmd::(rescan) SUCCESS: <err> = ''; <rc> = 0 jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,075::commands::68::root::(execCmd) /usr/bin/taskset --cpu-list 0-7 /sbin/udevadm settle --timeout=5 (cwd None) jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,104::commands::86::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0 jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,107::misc::566::Storage.SamplingMethod::(__call__) Returning last result jsonrpc.Executor/4::ERROR::2016-10-18 20:47:32,107::sdc::140::Storage.StorageDomainCache::(_findDomain) looking for unfetched domain ee2bf992-ec7d-4e32-97b7-aa214aca587b jsonrpc.Executor/4::ERROR::2016-10-18 20:47:32,107::sdc::157::Storage.StorageDomainCache::(_findUnfetchedDomain) looking for domain ee2bf992-ec7d-4e32-97b7-aa214aca587b jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,110::lvm::288::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-7 /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/36001e4f025c17a001f98457d0501591b|'\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name ee2bf992-ec7d-4e32-97b7-aa214aca587b (cwd None) jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,186::lvm::288::Storage.Misc.excCmd::(cmd) FAILED: <err> = ' WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n Volume group "ee2bf992-ec7d-4e32-97b7-aa214aca587b" not found\n Cannot process volume group ee2bf992-ec7d-4e32-97b7-aa214aca587b\n'; <rc> = 5 jsonrpc.Executor/4::WARNING::2016-10-18 20:47:32,188::lvm::376::Storage.LVM::(_reloadvgs) lvm vgs failed: 5 [] [' WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!', ' Volume group "ee2bf992-ec7d-4e32-97b7-aa214aca587b" not found', ' Cannot process volume group ee2bf992-ec7d-4e32-97b7-aa214aca587b'] Thread-15::DEBUG::2016-10-18 20:47:32,193::outOfProcess::87::Storage.oop::(getProcessPool) Creating ioprocess /rhev/data-center/mnt/_var_lib_ovirt-hosted-engine-setup_tmpwu0SKR Thread-15::INFO::2016-10-18 20:47:32,193::__init__::325::IOProcessClient::(__init__) Starting client ioprocess-2 Thread-15::DEBUG::2016-10-18 20:47:32,194::__init__::334::IOProcessClient::(_run) Starting ioprocess for client ioprocess-2 Thread-15::DEBUG::2016-10-18 20:47:32,207::__init__::386::IOProcessClient::(_startCommunication) Starting communication thread for client ioprocess-2 Thread-15::DEBUG::2016-10-18 20:47:32,208::__init__::398::IOProcessClient::(_startCommunication) Communication thread for client ioprocess-2 started ioprocess communication (5190)::INFO::2016-10-18 20:47:32,209::__init__::447::IOProcess::(_processLogs) Starting ioprocess ioprocess communication (5190)::INFO::2016-10-18 20:47:32,209::__init__::447::IOProcess::(_processLogs) Starting ioprocess jsonrpc.Executor/4::ERROR::2016-10-18 20:47:32,214::sdc::146::Storage.StorageDomainCache::(_findDomain) domain ee2bf992-ec7d-4e32-97b7-aa214aca587b not found Traceback (most recent call last): File "/usr/share/vdsm/storage/sdc.py", line 144, in _findDomain dom = findMethod(sdUUID) File "/usr/share/vdsm/storage/sdc.py", line 174, in _findUnfetchedDomain raise se.StorageDomainDoesNotExist(sdUUID) StorageDomainDoesNotExist: Storage domain does not exist: (u'ee2bf992-ec7d-4e32-97b7-aa214aca587b',) jsonrpc.Executor/4::INFO::2016-10-18 20:47:32,215::nfsSD::70::Storage.StorageDomain::(create) sdUUID=ee2bf992-ec7d-4e32-97b7-aa214aca587b domainName=hosted_storage remotePath=10.1.200.84:/engine domClass=1 jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,238::outOfProcess::87::Storage.oop::(getProcessPool) Creating ioprocess ee2bf992-ec7d-4e32-97b7-aa214aca587b jsonrpc.Executor/4::INFO::2016-10-18 20:47:32,238::__init__::325::IOProcessClient::(__init__) Starting client ioprocess-3 jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,238::__init__::334::IOProcessClient::(_run) Starting ioprocess for client ioprocess-3 jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,252::__init__::386::IOProcessClient::(_startCommunication) Starting communication thread for client ioprocess-3 jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,254::__init__::398::IOProcessClient::(_startCommunication) Communication thread for client ioprocess-3 started ioprocess communication (5202)::INFO::2016-10-18 20:47:32,255::__init__::447::IOProcess::(_processLogs) Starting ioprocess ioprocess communication (5202)::INFO::2016-10-18 20:47:32,255::__init__::447::IOProcess::(_processLogs) Starting ioprocess jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,371::persistent::194::Storage.PersistentDict::(__init__) Created a persistent dict with FileMetadataRW backend jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,372::persistent::236::Storage.PersistentDict::(refresh) read lines (FileMetadataRW)=[] jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,373::persistent::254::Storage.PersistentDict::(refresh) Empty metadata jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,373::persistent::169::Storage.PersistentDict::(transaction) Starting transaction jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,373::persistent::175::Storage.PersistentDict::(transaction) Flushing changes jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,373::persistent::307::Storage.PersistentDict::(flush) about to write lines (FileMetadataRW)=['CLASS=Data', u'DESCRIPTION=hosted_storage', 'IOOPTIMEOUTSEC=10', 'LEASERETRIES=3', 'LEASETIMESEC=60', 'LOCKPOLICY=', 'LOCKRENEWALINTERVALSEC=5', 'POOL_UUID=', 'REMOTE_PATH=10.1.200.84:/engine', 'ROLE=Regular', 'SDUUID=ee2bf992-ec7d-4e32-97b7-aa214aca587b', 'TYPE=GLUSTERFS', 'VERSION=3', '_SHA_CKSUM=71ad1d42b694777ba8a536e70d8beb6b6f178b8b'] jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,387::persistent::177::Storage.PersistentDict::(transaction) Finished transaction jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,398::fileSD::159::Storage.StorageDomainManifest::(__init__) Reading domain in path /rhev/data-center/mnt/glusterSD/10.1.200.84:_engine/ee2bf992-ec7d-4e32-97b7-aa214aca587b jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,399::persistent::194::Storage.PersistentDict::(__init__) Created a persistent dict with FileMetadataRW backend jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,402::persistent::236::Storage.PersistentDict::(refresh) read lines (FileMetadataRW)=['CLASS=Data', 'DESCRIPTION=hosted_storage', 'IOOPTIMEOUTSEC=10', 'LEASERETRIES=3', 'LEASETIMESEC=60', 'LOCKPOLICY=', 'LOCKRENEWALINTERVALSEC=5', 'POOL_UUID=', 'REMOTE_PATH=10.1.200.84:/engine', 'ROLE=Regular', 'SDUUID=ee2bf992-ec7d-4e32-97b7-aa214aca587b', 'TYPE=GLUSTERFS', 'VERSION=3', '_SHA_CKSUM=71ad1d42b694777ba8a536e70d8beb6b6f178b8b'] jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,408::fileSD::679::Storage.StorageDomain::(imageGarbageCollector) Removing remnants of deleted images [] jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,408::resourceManager::426::Storage.ResourceManager::(registerNamespace) Registering namespace 'ee2bf992-ec7d-4e32-97b7-aa214aca587b_imageNS' jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,408::resourceManager::426::Storage.ResourceManager::(registerNamespace) Registering namespace 'ee2bf992-ec7d-4e32-97b7-aa214aca587b_volumeNS' jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,408::clusterlock::183::Storage.initSANLock::(initSANLock) Initializing SANLock for domain ee2bf992-ec7d-4e32-97b7-aa214aca587b Reactor thread::INFO::2016-10-18 20:47:33,343::protocoldetector::72::ProtocolDetector.AcceptorImpl::(handle_accept) Accepting connection from ::ffff:127.0.0.1:53818 Reactor thread::DEBUG::2016-10-18 20:47:33,354::protocoldetector::85::ProtocolDetector.Detector::(__init__) Using required_size=11 Reactor thread::INFO::2016-10-18 20:47:33,355::protocoldetector::121::ProtocolDetector.Detector::(handle_read) Detected protocol xml from ::ffff:127.0.0.1:53818 Reactor thread::DEBUG::2016-10-18 20:47:33,355::bindingxmlrpc::1302::XmlDetector::(handle_socket) xml over http detected from ('::ffff:127.0.0.1', 53818) BindingXMLRPC::INFO::2016-10-18 20:47:33,355::xmlrpc::73::vds.XMLRPCServer::(handle_request) Starting request handler for ::ffff:127.0.0.1:53818 Thread-18::INFO::2016-10-18 20:47:33,355::xmlrpc::83::vds.XMLRPCServer::(_process_requests) Request handler for ::ffff:127.0.0.1:53818 started Thread-18::DEBUG::2016-10-18 20:47:33,356::bindingxmlrpc::1247::vds::(wrapper) client [::ffff:127.0.0.1]::call getAllVmStats with () {} Thread-18::INFO::2016-10-18 20:47:33,357::throttledlog::105::throttled::(log) Current getAllVmStats: {} Thread-18::DEBUG::2016-10-18 20:47:33,357::bindingxmlrpc::1255::vds::(wrapper) return getAllVmStats with {'status': {'message': 'Done', 'code': 0}, 'statsList': (suppressed)} Thread-18::INFO::2016-10-18 20:47:33,357::bindingxmlrpc::1282::vds::(wrapper) RPC call getAllVmStats finished (code=0) in 0.00 seconds Thread-18::INFO::2016-10-18 20:47:33,358::xmlrpc::91::vds.XMLRPCServer::(_process_requests) Request handler for ::ffff:127.0.0.1:53818 stopped jsonrpc.Executor/4::ERROR::2016-10-18 20:47:33,623::clusterlock::191::Storage.initSANLock::(initSANLock) Cannot initialize SANLock for domain ee2bf992-ec7d-4e32-97b7-aa214aca587b Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 186, in initSANLock sanlock.init_lockspace(sdUUID, idsPath) SanlockException: (22, 'Sanlock lockspace init failure', 'Invalid argument') jsonrpc.Executor/4::WARNING::2016-10-18 20:47:33,624::sd::467::Storage.StorageDomainManifest::(initDomainLock) lease did not initialize successfully Traceback (most recent call last): File "/usr/share/vdsm/storage/sd.py", line 462, in initDomainLock self._domainLock.initLock() File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 222, in initLock initSANLock(self._sdUUID, self._idsPath, self._leasesPath) File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 192, in initSANLock raise se.ClusterLockInitError() ClusterLockInitError: Could not initialize cluster lock: () jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:33,625::hsm::2598::Storage.HSM::(createStorageDomain) knownSDs: {ee2bf992-ec7d-4e32-97b7-aa214aca587b: storage.glusterSD.findDomain} jsonrpc.Executor/4::INFO::2016-10-18 20:47:33,625::logUtils::52::dispatcher::(wrapper) Run and protect: createStorageDomain, Return response: None jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:33,625::task::1193::Storage.TaskManager.Task::(prepare) Task=`8d24a62c-c1c0-462c-9ccc-57552cae39ec`::finished: None jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:33,625::task::597::Storage.TaskManager.Task::(_updateState) Task=`8d24a62c-c1c0-462c-9ccc-57552cae39ec`::moving from state preparing -> state finished jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:33,626::resourceManager::952::Storage.ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {} jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:33,626::resourceManager::989::Storage.ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {} jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:33,626::task::995::Storage.TaskManager.Task::(_decref) Task=`8d24a62c-c1c0-462c-9ccc-57552cae39ec`::ref 0 aborting False jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:33,626::__init__::555::jsonrpc.JsonRpcServer::(_handle_request) Return 'StorageDomain.create' in bridge with True jsonrpc.Executor/4::INFO::2016-10-18 20:47:33,626::__init__::513::jsonrpc.JsonRpcServer::(_serveRequest) RPC call StorageDomain.create succeeded in 1.78 seconds Version-Release number of selected component (if applicable): How reproducible: Seen on gluster in bug rhbz#1386443 Steps to Reproduce: 1. tbd 2. 3. Actual results: VDSM StorageDomain.create internally ignores a ClusterLockInitError and reports a success also if it failed. Expected results: VDSM should notify the failure creating the storage domain. Additional info: Seen on gluster, not sure how to trigger it again.
Moving out all non blocker\exceptions.
The underlying issue in the Gluster bug 1386443 was a storage configuration problem relating to alignment. However, we should fail to create the storage domain if we cannot initialize the lease. Otherwise, we will fail when trying to use that domain.
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason: [No relevant external trackers attached] For more info please contact: infra
Adam, this bug is targetted to 4.1.3. You should either backport the patch to the 4.1 stable branch, or we should discuss retargetting it if you don't think this is zstream material.
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason: [Tag 'v4.19.17' doesn't contain patch 'https://gerrit.ovirt.org/77566'] gitweb: https://gerrit.ovirt.org/gitweb?p=vdsm.git;a=shortlog;h=refs/tags/v4.19.17 For more info please contact: infra
Backported. Clearing NEEDINFO.
Hi Simone, Can you please provide steps to reproduce?
Hi, We need steps to reproduce this bz in order to verify. Please advise
This is difficult to reproduce without modifying vdsm code. You can either modify vdsm (1) OR try to create a setup (2) 1. To modify code: - Open /usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py - Edit the initSANLock function and insert a 'raise Exception()' statement inside the existing try/except block there. - Restart vdsm 2. To create a setup: - In a brand new DC with no domains login to the host on which you plan to use for creating an initial domain. - Run the following looping code in the terminal: leases=/rhev/data-center/*/*/dom_md/leases while true; do rm -f $leases; done - Attempt to create a domain. It should fail with ClusterLockInitError. Note that I used method 1 (editing code) to test this flow.
Moving to Verified