Bug 1386830 - When creating a storage domain fails due to ClusterLockInitError, it is ignored and action completes
Summary: When creating a storage domain fails due to ClusterLockInitError, it is ignored and action completes
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.18.14
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ovirt-4.1.3
Target Release: 4.19.18
Assignee: Adam Litke
QA Contact: Kevin Alon Goldblatt
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-10-19 16:53 UTC by Simone Tiraboschi
Modified: 2017-07-06 13:31 UTC (History)
6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-07-06 13:31:07 UTC
oVirt Team: Storage
Embargoed:
rule-engine: ovirt-4.1+




Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1386443 0 unspecified CLOSED Installation of gluster gives Failed to execute stage 'Misc configuration': Cannot acquire host id: ... SanlockException... 2021-02-22 00:41:40 UTC
oVirt gerrit 77021 0 'None' MERGED storage: Don't swallow domain lock init failures 2021-01-29 10:24:25 UTC
oVirt gerrit 77566 0 'None' MERGED storage: Don't swallow domain lock init failures 2021-01-29 10:24:25 UTC

Internal Links: 1386443

Description Simone Tiraboschi 2016-10-19 16:53:08 UTC
Description of problem:
VDSM StorageDomain.create internally swallows a ClusterLockInitError (raised for unrelated reasons) and reports success even though the operation failed.
Seen on gluster in bug rhbz#1386443; not sure if it is gluster-specific.

jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:31,843::__init__::530::jsonrpc.JsonRpcServer::(_handle_request) Calling 'StorageDomain.create' in bridge with {u'name': u'hosted_storage', u'domainType': 7, u'domainClass': 1, u'typeArgs': u'10.1.200.84:/engine', u'version': 3, u'storagedomainID': u'ee2bf992-ec7d-4e32-97b7-aa214aca587b'}
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:31,844::task::597::Storage.TaskManager.Task::(_updateState) Task=`8d24a62c-c1c0-462c-9ccc-57552cae39ec`::moving from state init -> state preparing
jsonrpc.Executor/4::INFO::2016-10-18 20:47:31,844::logUtils::49::dispatcher::(wrapper) Run and protect: createStorageDomain(storageType=7, sdUUID=u'ee2bf992-ec7d-4e32-97b7-aa214aca587b', domainName=u'hosted_storage', typeSpecificArg=u'10.1.200.84:/engine', domClass=1, domVersion=3, options=None)
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:31,844::misc::556::Storage.SamplingMethod::(__call__) Trying to enter sampling method (storage.sdc.refreshStorage)
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:31,845::misc::559::Storage.SamplingMethod::(__call__) Got in to sampling method
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:31,845::misc::556::Storage.SamplingMethod::(__call__) Trying to enter sampling method (storage.iscsi.rescan)
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:31,845::misc::559::Storage.SamplingMethod::(__call__) Got in to sampling method
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:31,845::iscsi::434::Storage.ISCSI::(rescan) Performing SCSI scan, this will take up to 30 seconds
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:31,845::iscsiadm::119::Storage.Misc.excCmd::(_runCmd) /usr/bin/taskset --cpu-list 0-7 /usr/bin/sudo -n /sbin/iscsiadm -m session -R (cwd None)
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:31,873::misc::566::Storage.SamplingMethod::(__call__) Returning last result
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:31,873::misc::556::Storage.SamplingMethod::(__call__) Trying to enter sampling method (vdsm.storage.hba.rescan)
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:31,873::misc::559::Storage.SamplingMethod::(__call__) Got in to sampling method
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:31,874::hba::58::Storage.HBA::(rescan) Starting scan
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:31,976::hba::64::Storage.HBA::(rescan) Scan finished
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:31,976::misc::566::Storage.SamplingMethod::(__call__) Returning last result
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:31,976::multipath::78::Storage.Misc.excCmd::(rescan) /usr/bin/taskset --cpu-list 0-7 /usr/bin/sudo -n /usr/sbin/multipath (cwd None)
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,075::multipath::78::Storage.Misc.excCmd::(rescan) SUCCESS: <err> = ''; <rc> = 0
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,075::commands::68::root::(execCmd) /usr/bin/taskset --cpu-list 0-7 /sbin/udevadm settle --timeout=5 (cwd None)
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,104::commands::86::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,107::misc::566::Storage.SamplingMethod::(__call__) Returning last result
jsonrpc.Executor/4::ERROR::2016-10-18 20:47:32,107::sdc::140::Storage.StorageDomainCache::(_findDomain) looking for unfetched domain ee2bf992-ec7d-4e32-97b7-aa214aca587b
jsonrpc.Executor/4::ERROR::2016-10-18 20:47:32,107::sdc::157::Storage.StorageDomainCache::(_findUnfetchedDomain) looking for domain ee2bf992-ec7d-4e32-97b7-aa214aca587b
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,110::lvm::288::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-7 /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/36001e4f025c17a001f98457d0501591b|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name ee2bf992-ec7d-4e32-97b7-aa214aca587b (cwd None)
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,186::lvm::288::Storage.Misc.excCmd::(cmd) FAILED: <err> = '  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n  Volume group "ee2bf992-ec7d-4e32-97b7-aa214aca587b" not found\n  Cannot process volume group ee2bf992-ec7d-4e32-97b7-aa214aca587b\n'; <rc> = 5
jsonrpc.Executor/4::WARNING::2016-10-18 20:47:32,188::lvm::376::Storage.LVM::(_reloadvgs) lvm vgs failed: 5 [] ['  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!', '  Volume group "ee2bf992-ec7d-4e32-97b7-aa214aca587b" not found', '  Cannot process volume group ee2bf992-ec7d-4e32-97b7-aa214aca587b']
Thread-15::DEBUG::2016-10-18 20:47:32,193::outOfProcess::87::Storage.oop::(getProcessPool) Creating ioprocess /rhev/data-center/mnt/_var_lib_ovirt-hosted-engine-setup_tmpwu0SKR
Thread-15::INFO::2016-10-18 20:47:32,193::__init__::325::IOProcessClient::(__init__) Starting client ioprocess-2
Thread-15::DEBUG::2016-10-18 20:47:32,194::__init__::334::IOProcessClient::(_run) Starting ioprocess for client ioprocess-2
Thread-15::DEBUG::2016-10-18 20:47:32,207::__init__::386::IOProcessClient::(_startCommunication) Starting communication thread for client ioprocess-2
Thread-15::DEBUG::2016-10-18 20:47:32,208::__init__::398::IOProcessClient::(_startCommunication) Communication thread for client ioprocess-2 started
ioprocess communication (5190)::INFO::2016-10-18 20:47:32,209::__init__::447::IOProcess::(_processLogs) Starting ioprocess
ioprocess communication (5190)::INFO::2016-10-18 20:47:32,209::__init__::447::IOProcess::(_processLogs) Starting ioprocess
jsonrpc.Executor/4::ERROR::2016-10-18 20:47:32,214::sdc::146::Storage.StorageDomainCache::(_findDomain) domain ee2bf992-ec7d-4e32-97b7-aa214aca587b not found
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 144, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 174, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u'ee2bf992-ec7d-4e32-97b7-aa214aca587b',)
jsonrpc.Executor/4::INFO::2016-10-18 20:47:32,215::nfsSD::70::Storage.StorageDomain::(create) sdUUID=ee2bf992-ec7d-4e32-97b7-aa214aca587b domainName=hosted_storage remotePath=10.1.200.84:/engine domClass=1
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,238::outOfProcess::87::Storage.oop::(getProcessPool) Creating ioprocess ee2bf992-ec7d-4e32-97b7-aa214aca587b
jsonrpc.Executor/4::INFO::2016-10-18 20:47:32,238::__init__::325::IOProcessClient::(__init__) Starting client ioprocess-3
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,238::__init__::334::IOProcessClient::(_run) Starting ioprocess for client ioprocess-3
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,252::__init__::386::IOProcessClient::(_startCommunication) Starting communication thread for client ioprocess-3
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,254::__init__::398::IOProcessClient::(_startCommunication) Communication thread for client ioprocess-3 started
ioprocess communication (5202)::INFO::2016-10-18 20:47:32,255::__init__::447::IOProcess::(_processLogs) Starting ioprocess
ioprocess communication (5202)::INFO::2016-10-18 20:47:32,255::__init__::447::IOProcess::(_processLogs) Starting ioprocess
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,371::persistent::194::Storage.PersistentDict::(__init__) Created a persistent dict with FileMetadataRW backend
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,372::persistent::236::Storage.PersistentDict::(refresh) read lines (FileMetadataRW)=[]
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,373::persistent::254::Storage.PersistentDict::(refresh) Empty metadata
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,373::persistent::169::Storage.PersistentDict::(transaction) Starting transaction
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,373::persistent::175::Storage.PersistentDict::(transaction) Flushing changes
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,373::persistent::307::Storage.PersistentDict::(flush) about to write lines (FileMetadataRW)=['CLASS=Data', u'DESCRIPTION=hosted_storage', 'IOOPTIMEOUTSEC=10', 'LEASERETRIES=3', 'LEASETIMESEC=60', 'LOCKPOLICY=', 'LOCKRENEWALINTERVALSEC=5', 'POOL_UUID=', 'REMOTE_PATH=10.1.200.84:/engine', 'ROLE=Regular', 'SDUUID=ee2bf992-ec7d-4e32-97b7-aa214aca587b', 'TYPE=GLUSTERFS', 'VERSION=3', '_SHA_CKSUM=71ad1d42b694777ba8a536e70d8beb6b6f178b8b']
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,387::persistent::177::Storage.PersistentDict::(transaction) Finished transaction
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,398::fileSD::159::Storage.StorageDomainManifest::(__init__) Reading domain in path /rhev/data-center/mnt/glusterSD/10.1.200.84:_engine/ee2bf992-ec7d-4e32-97b7-aa214aca587b
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,399::persistent::194::Storage.PersistentDict::(__init__) Created a persistent dict with FileMetadataRW backend
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,402::persistent::236::Storage.PersistentDict::(refresh) read lines (FileMetadataRW)=['CLASS=Data', 'DESCRIPTION=hosted_storage', 'IOOPTIMEOUTSEC=10', 'LEASERETRIES=3', 'LEASETIMESEC=60', 'LOCKPOLICY=', 'LOCKRENEWALINTERVALSEC=5', 'POOL_UUID=', 'REMOTE_PATH=10.1.200.84:/engine', 'ROLE=Regular', 'SDUUID=ee2bf992-ec7d-4e32-97b7-aa214aca587b', 'TYPE=GLUSTERFS', 'VERSION=3', '_SHA_CKSUM=71ad1d42b694777ba8a536e70d8beb6b6f178b8b']
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,408::fileSD::679::Storage.StorageDomain::(imageGarbageCollector) Removing remnants of deleted images []
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,408::resourceManager::426::Storage.ResourceManager::(registerNamespace) Registering namespace 'ee2bf992-ec7d-4e32-97b7-aa214aca587b_imageNS'
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,408::resourceManager::426::Storage.ResourceManager::(registerNamespace) Registering namespace 'ee2bf992-ec7d-4e32-97b7-aa214aca587b_volumeNS'
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:32,408::clusterlock::183::Storage.initSANLock::(initSANLock) Initializing SANLock for domain ee2bf992-ec7d-4e32-97b7-aa214aca587b
Reactor thread::INFO::2016-10-18 20:47:33,343::protocoldetector::72::ProtocolDetector.AcceptorImpl::(handle_accept) Accepting connection from ::ffff:127.0.0.1:53818
Reactor thread::DEBUG::2016-10-18 20:47:33,354::protocoldetector::85::ProtocolDetector.Detector::(__init__) Using required_size=11
Reactor thread::INFO::2016-10-18 20:47:33,355::protocoldetector::121::ProtocolDetector.Detector::(handle_read) Detected protocol xml from ::ffff:127.0.0.1:53818
Reactor thread::DEBUG::2016-10-18 20:47:33,355::bindingxmlrpc::1302::XmlDetector::(handle_socket) xml over http detected from ('::ffff:127.0.0.1', 53818)
BindingXMLRPC::INFO::2016-10-18 20:47:33,355::xmlrpc::73::vds.XMLRPCServer::(handle_request) Starting request handler for ::ffff:127.0.0.1:53818
Thread-18::INFO::2016-10-18 20:47:33,355::xmlrpc::83::vds.XMLRPCServer::(_process_requests) Request handler for ::ffff:127.0.0.1:53818 started
Thread-18::DEBUG::2016-10-18 20:47:33,356::bindingxmlrpc::1247::vds::(wrapper) client [::ffff:127.0.0.1]::call getAllVmStats with () {}
Thread-18::INFO::2016-10-18 20:47:33,357::throttledlog::105::throttled::(log) Current getAllVmStats: {}
Thread-18::DEBUG::2016-10-18 20:47:33,357::bindingxmlrpc::1255::vds::(wrapper) return getAllVmStats with {'status': {'message': 'Done', 'code': 0}, 'statsList': (suppressed)}
Thread-18::INFO::2016-10-18 20:47:33,357::bindingxmlrpc::1282::vds::(wrapper) RPC call getAllVmStats finished (code=0) in 0.00 seconds
Thread-18::INFO::2016-10-18 20:47:33,358::xmlrpc::91::vds.XMLRPCServer::(_process_requests) Request handler for ::ffff:127.0.0.1:53818 stopped
jsonrpc.Executor/4::ERROR::2016-10-18 20:47:33,623::clusterlock::191::Storage.initSANLock::(initSANLock) Cannot initialize SANLock for domain ee2bf992-ec7d-4e32-97b7-aa214aca587b
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 186, in initSANLock
    sanlock.init_lockspace(sdUUID, idsPath)
SanlockException: (22, 'Sanlock lockspace init failure', 'Invalid argument')
jsonrpc.Executor/4::WARNING::2016-10-18 20:47:33,624::sd::467::Storage.StorageDomainManifest::(initDomainLock) lease did not initialize successfully
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sd.py", line 462, in initDomainLock
    self._domainLock.initLock()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 222, in initLock
    initSANLock(self._sdUUID, self._idsPath, self._leasesPath)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 192, in initSANLock
    raise se.ClusterLockInitError()
ClusterLockInitError: Could not initialize cluster lock: ()
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:33,625::hsm::2598::Storage.HSM::(createStorageDomain) knownSDs: {ee2bf992-ec7d-4e32-97b7-aa214aca587b: storage.glusterSD.findDomain}
jsonrpc.Executor/4::INFO::2016-10-18 20:47:33,625::logUtils::52::dispatcher::(wrapper) Run and protect: createStorageDomain, Return response: None
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:33,625::task::1193::Storage.TaskManager.Task::(prepare) Task=`8d24a62c-c1c0-462c-9ccc-57552cae39ec`::finished: None
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:33,625::task::597::Storage.TaskManager.Task::(_updateState) Task=`8d24a62c-c1c0-462c-9ccc-57552cae39ec`::moving from state preparing -> state finished
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:33,626::resourceManager::952::Storage.ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:33,626::resourceManager::989::Storage.ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:33,626::task::995::Storage.TaskManager.Task::(_decref) Task=`8d24a62c-c1c0-462c-9ccc-57552cae39ec`::ref 0 aborting False
jsonrpc.Executor/4::DEBUG::2016-10-18 20:47:33,626::__init__::555::jsonrpc.JsonRpcServer::(_handle_request) Return 'StorageDomain.create' in bridge with True
jsonrpc.Executor/4::INFO::2016-10-18 20:47:33,626::__init__::513::jsonrpc.JsonRpcServer::(_serveRequest) RPC call StorageDomain.create succeeded in 1.78 seconds
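The sequence in the log above reduces to the following anti-pattern. This is a simplified sketch with hypothetical names, not the actual vdsm code:

```python
import logging

class ClusterLockInitError(Exception):
    """Stand-in for vdsm's se.ClusterLockInitError."""

def init_domain_lock():
    # Stand-in for the sanlock lockspace initialization that fails above.
    raise ClusterLockInitError("Could not initialize cluster lock")

def create_storage_domain():
    # The bug: the lock-init failure is caught, logged only as a warning,
    # and domain creation still reports success to the caller.
    try:
        init_domain_lock()
    except ClusterLockInitError:
        logging.warning("lease did not initialize successfully")
    return True
```

With this structure, create_storage_domain() returns True exactly as the log shows ("RPC call StorageDomain.create succeeded"), even though the lease is unusable.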

Version-Release number of selected component (if applicable):


How reproducible:
Seen on gluster in bug rhbz#1386443

Steps to Reproduce:
1. tbd
2.
3.

Actual results:
VDSM StorageDomain.create internally ignores a ClusterLockInitError and reports success even though the operation failed.

Expected results:
VDSM should report the failure to create the storage domain.

Additional info:
Seen on gluster; not sure how to trigger it again.

Comment 1 Yaniv Lavi 2017-02-23 11:26:23 UTC
Moving out all non-blockers/exceptions.

Comment 2 Yaniv Lavi 2017-02-23 11:26:37 UTC
Moving out all non-blockers/exceptions.

Comment 3 Adam Litke 2017-05-16 16:11:06 UTC
The underlying issue in the Gluster bug 1386443 was a storage configuration problem relating to alignment. However, we should fail to create the storage domain if we cannot initialize the lease; otherwise we will fail later when trying to use that domain.
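The merged gerrit patches ("storage: Don't swallow domain lock init failures") amount to letting the error propagate instead of downgrading it to a warning. A minimal sketch with hypothetical names, assuming the fix simply removes the try/except around lock initialization:

```python
class ClusterLockInitError(Exception):
    """Stand-in for vdsm's se.ClusterLockInitError."""

def init_domain_lock(lock_ok):
    # Stand-in for sanlock lockspace initialization.
    if not lock_ok:
        raise ClusterLockInitError("Could not initialize cluster lock")

def create_storage_domain(lock_ok=True):
    # After the fix: no try/except around lock initialization, so a
    # ClusterLockInitError aborts creation and reaches the caller.
    init_domain_lock(lock_ok)
    return True
```

The caller (engine or hosted-engine setup) then sees the failure immediately instead of discovering a broken lease when first using the domain.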

Comment 4 rhev-integ 2017-05-28 14:46:25 UTC
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[No relevant external trackers attached]

For more info please contact: infra

Comment 5 Allon Mureinik 2017-05-28 15:56:33 UTC
Adam, this bug is targeted to 4.1.3.
You should either backport the patch to the 4.1 stable branch, or we should discuss retargeting it if you don't think this is zstream material.

Comment 6 rhev-integ 2017-06-01 16:13:37 UTC
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Tag 'v4.19.17' doesn't contain patch 'https://gerrit.ovirt.org/77566']
gitweb: https://gerrit.ovirt.org/gitweb?p=vdsm.git;a=shortlog;h=refs/tags/v4.19.17

For more info please contact: infra

Comment 7 Adam Litke 2017-06-05 14:02:50 UTC
Backported.  Clearing NEEDINFO.

Comment 8 Raz Tamir 2017-06-12 17:17:13 UTC
Hi Simone,
Can you please provide steps to reproduce?

Comment 9 Kevin Alon Goldblatt 2017-06-20 15:36:18 UTC
Hi,

We need steps to reproduce this bz in order to verify. Please advise

Comment 10 Kevin Alon Goldblatt 2017-06-21 11:52:48 UTC
Hi,

We need steps to reproduce this bz in order to verify. Please advise

Comment 11 Adam Litke 2017-06-21 19:24:39 UTC
This is difficult to reproduce without modifying the vdsm code. You can either modify vdsm (method 1) or try to create a setup that triggers the failure (method 2):

1. To modify code:

- Open /usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py
- Edit the initSANLock function and insert a 'raise Exception()' statement inside the existing try/except block there.
- Restart vdsm
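Concretely, the edit in step 1 would look something like the following. This is a simplified sketch of initSANLock, not the actual 4.19 source:

```python
class ClusterLockInitError(Exception):
    """Stand-in for vdsm's se.ClusterLockInitError."""

def initSANLock(sdUUID, idsPath, leasesPath):
    # Simplified: the real function calls sanlock.init_lockspace() inside
    # this try block and logs the failure before re-raising.
    try:
        raise Exception("injected failure for testing")  # <-- the added line
        # sanlock.init_lockspace(sdUUID, idsPath)        # original call
    except Exception:
        raise ClusterLockInitError()
```

After restarting vdsm, any domain-creation attempt should hit the injected failure and (with the fix applied) fail with ClusterLockInitError instead of reporting success.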


2. To create a setup:

- In a brand-new DC with no domains, log in to the host you plan to use for creating the initial domain.
- Run the following looping code in the terminal:

    leases=/rhev/data-center/*/*/dom_md/leases
    while true; do rm -f $leases; done

- Attempt to create a domain.  It should fail with ClusterLockInitError.


Note that I used method 1 (editing code) to test this flow.

Comment 12 Kevin Alon Goldblatt 2017-06-22 10:24:53 UTC
Moving to Verified

