Bug 1021557
Summary: | Failed to create StoragePool to FCP Data Center
---|---
Product: | Red Hat Enterprise Virtualization Manager
Reporter: | vvyazmin <vvyazmin>
Component: | vdsm
Assignee: | Federico Simoncelli <fsimonce>
Status: | CLOSED ERRATA
QA Contact: | Aharon Canan <acanan>
Severity: | medium
Docs Contact: |
Priority: | unspecified
Version: | 3.3.0
CC: | abaron, acanan, amureini, bazulay, iheim, lpeer, scohen, yeylon
Target Milestone: | ---
Keywords: | Regression
Target Release: | 3.3.0
Flags: | amureini: Triaged+
Hardware: | x86_64
OS: | Linux
Whiteboard: | storage
Fixed In Version: | is24
Doc Type: | Bug Fix
Doc Text: | The domain monitoring thread assumed that the list of monitored domains was the same as the list of attached domains; however, this was not valid when a storage domain was detached. In such cases, the thread was not stopped, and the host ID remained locked and in use. Consequently, users could not attach a previously detached storage domain to a data center. This update adds a tag mechanism for pool-monitored domains, so host IDs are now correctly released and unattached storage domains can be attached to data centers.
Story Points: | ---
Clone Of: |
Environment: |
Last Closed: | 2014-01-21 16:18:54 UTC
Type: | Bug
Regression: | ---
Mount Type: | ---
Documentation: | ---
CRM: |
Verified Versions: |
Category: | ---
oVirt Team: | Storage
RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | ---
Target Upstream Version: |
Embargoed: |
Bug Depends On: |
Bug Blocks: | 1021374, 1038284
Attachments: |
Description vvyazmin@redhat.com 2013-10-21 13:57:47 UTC
Created attachment 814617 [details]
## Logs rhevm, vdsm, libvirt, thread dump, superVdsm

Vlad, from the logs it seems as though the failure occurs when attaching the FIRST storage domain to a DC without any domains. Is this true? Does this reproduce when adding the second/third/Nth domain to an existing DC?

(In reply to Allon Mureinik from comment #3)
> Vlad, from the logs it seems as though the failure occurs when attaching the
> FIRST storage domain to a DC without any domains.
> Is this true?
> Does this reproduce when adding the second/third/Nth domain to an existing
> DC?

Vlad, from the logs it seems as though the failure occurs when attaching the FIRST storage domain to a DC without any domains. Is this true?
- Yes, it's true.

Does this reproduce when adding the second/third/Nth domain to an existing DC?
- No, I succeeded in attaching a second "Unattached" Storage Domain to an existing DC.

Lowering this to medium, as after speaking with vvyazmin he wasn't able to reproduce the issue. I don't want to close this yet, though, since I saw a couple of glitches in the logs that I want to investigate.

This error means that the host id was already acquired in the past:

    AcquireHostIdFailure: Cannot acquire host id: ('c7812c6a-5959-44ff-a990-df1713f2aef7', SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument'))

After investigating it and successfully reproducing the issue, it seems that this is related to destroyStoragePool not stopping the relevant domain monitoring threads.

From the attached logs we can see the preparation to destroy the storage pool:

    Thread-4199::INFO::2013-10-21 13:23:47,241::logUtils::44::dispatcher::(wrapper) Run and protect: deactivateStorageDomain(sdUUID='c7812c6a-5959-44ff-a990-df1713f2aef7', spUUID='445dea2d-511a-4529-a66a-670720e587de', msdUUID='00000000-0000-0000-0000-000000000000', masterVersion=1, options=None)
    ...
    Thread-4201::INFO::2013-10-21 13:23:47,275::logUtils::44::dispatcher::(wrapper) Run and protect: spmStop(spUUID='445dea2d-511a-4529-a66a-670720e587de', options=None)
    ...
    Thread-4204::INFO::2013-10-21 13:23:54,687::logUtils::44::dispatcher::(wrapper) Run and protect: disconnectStoragePool(spUUID='445dea2d-511a-4529-a66a-670720e587de', hostID=1, scsiKey='445dea2d-511a-4529-a66a-670720e587de', remove=False, options=None)

and the destroy storage pool flow:

    Thread-4256::INFO::2013-10-21 13:26:01,937::logUtils::44::dispatcher::(wrapper) Run and protect: connectStoragePool(spUUID='445dea2d-511a-4529-a66a-670720e587de', hostID=1, scsiKey='445dea2d-511a-4529-a66a-670720e587de', msdUUID='c7812c6a-5959-44ff-a990-df1713f2aef7', masterVersion=1, options=None)
    ...
    Thread-4346::INFO::2013-10-21 13:26:07,967::logUtils::44::dispatcher::(wrapper) Run and protect: spmStart(spUUID='445dea2d-511a-4529-a66a-670720e587de', prevID=-1, prevLVER='0', recoveryMode=None, scsiFencing='false', maxHostID=250, domVersion='3', options=None)
    ...
    f302e86a-7954-433a-adbd-328aed14f849::INFO::2013-10-21 13:26:28,282::clusterlock::225::SANLock::(acquire) Acquiring cluster lock for domain c7812c6a-5959-44ff-a990-df1713f2aef7 (id: 1)
    f302e86a-7954-433a-adbd-328aed14f849::DEBUG::2013-10-21 13:26:28,292::clusterlock::251::SANLock::(acquire) Cluster lock for domain c7812c6a-5959-44ff-a990-df1713f2aef7 successfully acquired (id: 1)
    ...
    Thread-4387::INFO::2013-10-21 13:26:31,515::logUtils::44::dispatcher::(wrapper) Run and protect: destroyStoragePool(spUUID='445dea2d-511a-4529-a66a-670720e587de', hostID=1, scsiKey='445dea2d-511a-4529-a66a-670720e587de', options=None)
    ...

and, as we see, host id 1 is never released.
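
To make this failure mode concrete, the following is a small, self-contained Python sketch. It is a toy model of the lockspace bookkeeping only, not the real sanlock API or the vdsm clusterlock code; `FakeSanlock` and its methods are invented for illustration. The point it demonstrates is the one described above: once the domain's lockspace has been joined with host id 1 and never released, a later attempt to join it with a different id fails with errno 22 (EINVAL).

```python
# Toy model of the behaviour described above. This is NOT the real sanlock
# daemon or the vdsm code; it only illustrates why an unreleased host id
# makes a later acquisition with a different id fail with EINVAL (errno 22).
import errno


class FakeSanlockError(Exception):
    def __init__(self, err, message):
        super().__init__(err, message)
        self.errno = err


class FakeSanlock:
    """Tracks one host id per lockspace, like the per-domain 'ids' volume."""

    def __init__(self):
        self._lockspaces = {}  # lockspace (domain UUID) -> host id

    def add_lockspace(self, lockspace, host_id):
        current = self._lockspaces.get(lockspace)
        if current is not None and current != host_id:
            # Already joined with another id; re-joining with a different
            # one is rejected.
            raise FakeSanlockError(errno.EINVAL, "Sanlock lockspace add failure")
        self._lockspaces[lockspace] = host_id

    def rem_lockspace(self, lockspace):
        self._lockspaces.pop(lockspace, None)


sanlock = FakeSanlock()
DOMAIN = "c7812c6a-5959-44ff-a990-df1713f2aef7"

# connectStoragePool/spmStart path: host id 1 is acquired for the domain.
sanlock.add_lockspace(DOMAIN, 1)

# destroyStoragePool does not stop the domain monitor, so rem_lockspace
# is never called here (the bug).

# createStoragePool later tries the temporary cluster lock with id 250...
try:
    sanlock.add_lockspace(DOMAIN, 250)
except FakeSanlockError as e:
    print("AcquireHostIdFailure:", e)  # errno 22, as in the traceback below
```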

Later on, when we try to use the same storage domain for a new pool, we get the EINVAL error because the host id was left acquired (with a different id) from destroyStoragePool:

    Thread-4660::INFO::2013-10-21 13:38:47,355::logUtils::44::dispatcher::(wrapper) Run and protect: createStoragePool(poolType=None, spUUID='f6b336b6-08a2-4c6f-a063-a16ad7571bd1', poolName='DC-aaa', masterDom='c7812c6a-5959-44ff-a990-df1713f2aef7', domList=['c7812c6a-5959-44ff-a990-df1713f2aef7'], masterVersion=1, lockPolicy=None, lockRenewalIntervalSec=5, leaseTimeSec=60, ioOpTimeoutSec=10, leaseRetries=3, options=None)
    ...
    Thread-4660::INFO::2013-10-21 13:38:49,323::clusterlock::174::SANLock::(acquireHostId) Acquiring host id for domain c7812c6a-5959-44ff-a990-df1713f2aef7 (id: 250)
    Thread-4660::ERROR::2013-10-21 13:38:49,324::task::850::TaskManager.Task::(_setError) Task=`36a06b9c-b911-47f9-bc7f-ca964f62eb23`::Unexpected error
    Traceback (most recent call last):
      File "/usr/share/vdsm/storage/task.py", line 857, in _run
        return fn(*args, **kargs)
      File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
        res = f(*args, **kwargs)
      File "/usr/share/vdsm/storage/hsm.py", line 981, in createStoragePool
        poolName, masterDom, domList, masterVersion, leaseParams)
      File "/usr/share/vdsm/storage/sp.py", line 615, in create
        self._acquireTemporaryClusterLock(msdUUID, leaseParams)
      File "/usr/share/vdsm/storage/sp.py", line 557, in _acquireTemporaryClusterLock
        msd.acquireHostId(self.id)
      File "/usr/share/vdsm/storage/sd.py", line 458, in acquireHostId
        self._clusterLock.acquireHostId(hostId, async)
      File "/usr/share/vdsm/storage/clusterlock.py", line 189, in acquireHostId
        raise se.AcquireHostIdFailure(self._sdUUID, e)
    AcquireHostIdFailure: Cannot acquire host id: ('c7812c6a-5959-44ff-a990-df1713f2aef7', SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument'))

I've been able to identify that this issue was introduced as part of the monitoring implementation:

    7b1cc6a Adding [start|stop]MonitoringDomain()

In fact, updateMonitoringThreads has been modified to assume that the list of the monitored domains is the same as the list of the attached domains:

    +        poolDoms = self.getDomains()
    ...
    -        for sdUUID in monitoredDomains:
    +        for sdUUID in poolDoms:
                 if sdUUID not in activeDomains:
                     try:
                         self.domainMonitor.stopMonitoring(sdUUID)

This assumption is not valid when we are detaching a storage domain (as it won't be listed by getDomains). The resulting issue is that the domain monitoring thread is not stopped and the host id is left acquired (the ids file/device is kept open by sanlock); a sketch of the corrected bookkeeping is included at the end of this report.

The steps to reproduce are:
1. Create a storage domain.
2. Attach the storage domain to an empty data center.
3. Once the SPM is acquired, deactivate the storage domain.
4. Remove the data center.
5. Create a new data center.
6. Attach the storage domain to the newly created data center.

Actual results: Attaching the storage domain fails with SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument').

Expected results: Attaching the storage domain should succeed.

Verified using is24.1 with FC, according to the steps in comment #9.
LIBVIRT Version: libvirt-0.10.2-29.el6
VDSM Version: vdsm-4.13.0-0.9.beta1.el6ev

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0040.html
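
For context on the fix described in the Doc Text (a tag mechanism for pool-monitored domains), here is a minimal sketch of the underlying idea, assuming simplified and partly hypothetical names (the stripped-down DomainMonitor, the `_poolMonitoredDomains` set, and this updateMonitoringThreads body are illustrations, not the actual vdsm patch). Instead of re-deriving the set of domains to stop from getDomains(), which no longer lists a detached domain, the pool records which domains it asked the monitor to watch, so every one of them is eventually stopped and its host id released.

```python
# Sketch only: illustrates tracking pool-monitored domains explicitly so
# that detached domains still get their monitor stopped and host id freed.
# Names are simplified/hypothetical; this is not the actual vdsm change.


class DomainMonitor:
    def __init__(self):
        self._threads = {}  # sdUUID -> host id held by the monitor

    def startMonitoring(self, sdUUID, hostId):
        # In vdsm this would start a thread and acquire the host id for the
        # domain; here we only record it.
        self._threads[sdUUID] = hostId

    def stopMonitoring(self, sdUUID):
        # In vdsm this would stop the thread and release the host id.
        self._threads.pop(sdUUID, None)


class StoragePool:
    def __init__(self, monitor, hostId):
        self.domainMonitor = monitor
        self.id = hostId
        # The "tag": domains monitored on behalf of this pool, recorded when
        # monitoring starts rather than re-derived from getDomains().
        self._poolMonitoredDomains = set()

    def updateMonitoringThreads(self, activeDomains):
        # Stop every monitor this pool started for a domain that is no longer
        # active, even if the domain has been detached in the meantime.
        for sdUUID in self._poolMonitoredDomains - set(activeDomains):
            self.domainMonitor.stopMonitoring(sdUUID)  # releases the host id
            self._poolMonitoredDomains.discard(sdUUID)

        # Start monitors for newly active domains.
        for sdUUID in activeDomains:
            if sdUUID not in self._poolMonitoredDomains:
                self.domainMonitor.startMonitoring(sdUUID, self.id)
                self._poolMonitoredDomains.add(sdUUID)
```

With this bookkeeping, the reproduction steps above no longer leave host id 1 acquired: deactivating the domain removes it from activeDomains, stopMonitoring runs for it, and the later createStoragePool can acquire its temporary host id.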