Bug 1032925
Summary: | Domain monitoring stops after disconnecting and reconnecting the storage pool, leading to a bogus "No space left" error
---|---
Product: | Red Hat Enterprise Virtualization Manager
Reporter: | Ilanit Stein <istein>
Component: | vdsm
Assignee: | Liron Aravot <laravot>
Status: | CLOSED ERRATA
QA Contact: | Aharon Canan <acanan>
Severity: | high
Priority: | unspecified
Version: | 3.3.0
Target Release: | 3.3.0
Keywords: | AutomationBlocker, Regression, TestBlocker
Hardware: | Unspecified
OS: | Unspecified
Whiteboard: | storage
oVirt Team: | Storage
Fixed In Version: | is27
Doc Type: | Bug Fix
Type: | Bug
Last Closed: | 2014-01-21 16:22:02 UTC
Bug Blocks: | 1049022
CC: | abaron, acanan, acathrow, amureini, bazulay, eedri, fsimonce, iheim, istein, jentrena, knesenko, lpeer, nsoffer, obasan, Rhev-m-bugs, rmcswain, scohen, tnisan, yeylon

Doc Text: Domain monitoring stopped after disconnecting and reconnecting the storage pool, which yielded misleading "No space left" errors. This was caused by an anonymous thread running at unexpected times and silently stopping all domain monitors. Now the monitoring is stopped only when appropriate, so users do not receive incorrect storage space limitation errors.
Description
Ilanit Stein
2013-11-21 09:17:51 UTC
Created attachment 827086 [details]
logs
The given message is shown when the engine encounters various issues related to the storage domain of the template disks. Regardless of this specific issue, that is bad and should be fixed so we know explicitly what the error is, rather than having one generic message for several different errors. In this case the error appears to be free-space related; I suspect the issue was caused by a vdsm bug that was handled here: http://gerrit.ovirt.org/#/c/21490/

Ilanit, please check what disk sizes appear in the engine in that case, so we can tell whether that is the issue. Regardless, the error-message propagation issue should be handled, but that is possibly a 3.4 bug (Allon?).

Additional info: http://gerrit.ovirt.org/#/c/21490/ shouldn't be relevant, as that issue was present only upstream.

Ilanit, if you can, please attach a pg_dump of the db when this scenario occurs so I can look at the exact current state. Thanks.

Created attachment 829320 [details]
pg_dump
Ilanit, the pg_dump file is corrupted, so I can't see the creation of the template, the disk sizes, etc. I tested the described scenario manually on 3.3 and it seems to work fine. Please try to reproduce it and attach the dump again; clear reproduction steps would be even better. Thanks.

Created attachment 831130 [details]
pg dump unzipped
*** Bug 1035822 has been marked as a duplicate of this bug. ***

The "create vmpool" failure might have the same root cause as another failure in this same test, "clone VM from template". Both failures occur in this run: http://jenkins-ci.eng.lab.tlv.redhat.com/job/rhevm_3.3_automation_coretools_two_hosts_restapi_vms_nfs_rest_factory/72/

amureini is on that.

I've looked into the db dump and the vdsm/engine logs. Here is a short analysis, followed by the relevant log parts:

The host cinteg14 becomes the SPM, and getStoragePoolInfo returns the pool info, including free space on each domain. Then a copy-image operation is started on that host (copying the image within the same domain). We stop the domain monitoring on all the domains (why needs to be inspected) and later fail to stop it on the domain the image is copied within. From then on, GetStoragePoolInfo returns without the diskfree key indicating free space, and repoStats has no entry for any of the domains. On the engine side, the response of GetStoragePoolInfo is written to the engine db; since available_disk_size is now 0/null, we get CDA (CanDoAction) messages about insufficient space (until the domains move to inactive). I'm therefore moving the bug to vdsm, as that is the main issue here; on the engine side, logging could be added for the case where free space isn't retrieved from vdsm, but that is a separate matter.

Some related log parts:

(1) Note this response of getStoragePoolInfo; it contains the "diskfree" key for each of the domains:
--------------------------------------------------------------------------
Thread-1496::INFO::2013-11-20 13:11:05,641::logUtils::47::dispatcher::(wrapper) Run and protect: getStoragePoolInfo, Return response: {'info': {'spm_id': 1, 'master_uuid': '9502a003-6e7e-496e-97c0-225d45b476c1', 'name': 'DefaultRest', 'version': '3', 'domains': '8981cbaa-732a-43cd-b143-00c71734458b:Active,9502a003-6e7e-496e-97c0-225d45b476c1:Active,7df91534-49bd-499a-bb59-9c26b87b7f2b:Active', 'pool_status': 'connected', 'isoprefix': '', 'type': 'NFS', 'master_ver': 1, 'lver': 3}, 'dominfo': {'8981cbaa-732a-43cd-b143-00c71734458b': {'status': 'Active', 'diskfree': '1325691109376', 'isoprefix': '', 'alerts': [], 'disktotal': '1460288880640', 'version': 0}, '9502a003-6e7e-496e-97c0-225d45b476c1': {'status': 'Active', 'diskfree': '1325691109376', 'isoprefix': '', 'alerts': [], 'disktotal': '1460288880640', 'version': 3}, '7df91534-49bd-499a-bb59-9c26b87b7f2b': {'status': 'Active', 'diskfree': '1325691109376', 'isoprefix': '', 'alerts': [], 'disktotal': '1460288880640', 'version': 3}}}
--------------------------------------------------------------------------
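A note on where these values come from: the per-domain "diskfree"/"disktotal" keys are filled in from the domain monitors' most recent samples, so they can only appear while a monitor thread is running for that domain. A simplified sketch of the idea (hypothetical names, not the actual vdsm code):

```python
def getDomainsInfo(pool, monitors):
    # Hypothetical sketch: free/total space comes from each domain
    # monitor's last sample, so the keys are present only while that
    # domain's monitor thread is alive (compare (1) above with (6) below).
    dominfo = {}
    for sdUUID, status in pool.getDomains().items():
        info = {'status': status, 'isoprefix': '', 'alerts': []}
        sample = monitors.getLastResult(sdUUID)  # None once stopped
        if sample is not None:
            info['diskfree'] = str(sample.free)
            info['disktotal'] = str(sample.total)
        dominfo[sdUUID] = info
    return dominfo
```

This is why the analysis keys on the monitors being stopped: once they are, the same Active domains are reported without any space information.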
(2) Then a copy-image operation is started:
--------------------------------------------------------------------------
Thread-1498::INFO::2013-11-20 13:11:09,385::logUtils::44::dispatcher::(wrapper) Run and protect: copyImage(sdUUID='9502a003-6e7e-496e-97c0-225d45b476c1', spUUID='1d695907-3924-4eb7-9484-812d82023213', vmUUID='', srcImgUUID='cf92af3b-317b-433a-a374-eef1c2ed4307', srcVolUUID='bba48cad-35d1-4f79-8e8b-5034ca6949ce', dstImgUUID='06a2c384-a49c-4c67-9110-6df59ab3b658', dstVolUUID='897b2c2d-680d-4122-bc53-d40c84b28e18', description='Active VM', dstSdUUID='9502a003-6e7e-496e-97c0-225d45b476c1', volType=6, volFormat=4, preallocate=2, postZero='false', force='false')
Thread-1498::DEBUG::2013-11-20 13:11:09,448::taskManager::68::TaskManager::(scheduleJob) scheduled job copyImage_06a2c384-a49c-4c67-9110-6df59ab3b658 for task 6828283e-958c-41a5-8dd0-b215bfab904d
Thread-1498::INFO::2013-11-20 13:11:09,448::logUtils::47::dispatcher::(wrapper) Run and protect: copyImage, Return response: None
--------------------------------------------------------------------------

(3) The domain monitors are stopped for the domains, for some reason (need to see whether that is related or not):
--------------------------------------------------------------------------
Thread-833::DEBUG::2013-11-20 13:11:09,466::domainMonitor::169::Storage.DomainMonitorThread::(_monitorLoop) Stopping domain monitor for 8981cbaa-732a-43cd-b143-00c71734458b
Thread-833::DEBUG::2013-11-20 13:11:09,467::clusterlock::86::SafeLease::(releaseHostId) Host id for domain 8981cbaa-732a-43cd-b143-00c71734458b released successfully (id: 1)
6828283e-958c-41a5-8dd0-b215bfab904d::DEBUG::2013-11-20 13:11:09,467::task::736::TaskManager.Task::(_save) Task=`6828283e-958c-41a5-8dd0-b215bfab904d`::_save: orig /rhev/data-center/1d695907-3924-4eb7-9484-812d82023213/mastersd/master/tasks/6828283e-958c-41a5-8dd0-b215bfab904d temp /rhev/data-center/1d695907-3924-4eb7-9484-812d82023213/mastersd/master/tasks/6828283e-958c-41a5-8dd0-b215bfab904d.temp
Thread-834::DEBUG::2013-11-20 13:11:09,468::domainMonitor::169::Storage.DomainMonitorThread::(_monitorLoop) Stopping domain monitor for 7df91534-49bd-499a-bb59-9c26b87b7f2b
Thread-834::INFO::2013-11-20 13:11:09,468::clusterlock::197::SANLock::(releaseHostId) Releasing host id for domain 7df91534-49bd-499a-bb59-9c26b87b7f2b (id: 1)
--------------------------------------------------------------------------

(4) The image is copied:
--------------------------------------------------------------------------
6828283e-958c-41a5-8dd0-b215bfab904d::INFO::2013-11-20 13:11:09,510::image::645::Storage.Image::(copy) sdUUID=9502a003-6e7e-496e-97c0-225d45b476c1 vmUUID= srcImgUUID=cf92af3b-317b-433a-a374-eef1c2ed4307 srcVolUUID=bba48cad-35d1-4f79-8e8b-5034ca6949ce dstImgUUID=06a2c384-a49c-4c67-9110-6df59ab3b658 dstVolUUID=897b2c2d-680d-4122-bc53-d40c84b28e18 dstSdUUID=9502a003-6e7e-496e-97c0-225d45b476c1 volType=6 volFormat=COW preallocate=SPARSE force=False postZero=False
--------------------------------------------------------------------------
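A note on the thread names above: copyImage is asynchronous. In (2) the dispatcher only schedules a job and immediately returns None; the copy itself then runs in (4) under the task's own thread (the lines prefixed with the task UUID 6828283e-...). Roughly, and with hypothetical names:

```python
def copyImage(self, sdUUID, spUUID, srcImgUUID, dstImgUUID, **kwargs):
    # Hypothetical sketch of the schedule-and-return pattern seen in (2):
    # the verb queues the work and answers immediately, so its response is
    # None and progress is reported through the task, not through the call.
    self.taskManager.scheduleJob(
        "copyImage_%s" % dstImgUUID,
        lambda: self.image.copy(sdUUID, srcImgUUID, dstImgUUID, **kwargs))
    return None
```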
(5) We attempt to stop the domain monitoring thread of the domain the image is copied within, and fail:
--------------------------------------------------------------------------
Thread-838::DEBUG::2013-11-20 13:11:11,470::domainMonitor::169::Storage.DomainMonitorThread::(_monitorLoop) Stopping domain monitor for 9502a003-6e7e-496e-97c0-225d45b476c1
Thread-838::INFO::2013-11-20 13:11:11,470::clusterlock::197::SANLock::(releaseHostId) Releasing host id for domain 9502a003-6e7e-496e-97c0-225d45b476c1 (id: 1)
Thread-838::DEBUG::2013-11-20 13:11:11,470::domainMonitor::178::Storage.DomainMonitorThread::(_monitorLoop) Unable to release the host id 1 for domain 9502a003-6e7e-496e-97c0-225d45b476c1
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 175, in _monitorLoop
    self.domain.releaseHostId(self.hostId, unused=True)
  File "/usr/share/vdsm/storage/sd.py", line 461, in releaseHostId
    self._clusterLock.releaseHostId(hostId, async, unused)
  File "/usr/share/vdsm/storage/clusterlock.py", line 204, in releaseHostId
    raise se.ReleaseHostIdFailure(self._sdUUID, e)
ReleaseHostIdFailure: Cannot release host id: ('9502a003-6e7e-496e-97c0-225d45b476c1', SanlockException(16, 'Sanlock lockspace remove failure', 'Device or resource busy'))
Thread-1503::DEBUG::2013-11-20 13:11:15,662::BindingXMLRPC::177::vds::(wrapper) client [10.35.19.10]
--------------------------------------------------------------------------

(6) Later on, GetStoragePoolInfo requests return without free-space info; compare with (1), the diskfree key is missing from the response:
--------------------------------------------------------------------------
Run and protect: getStoragePoolInfo, Return response: {'info': {'spm_id': 1, 'master_uuid': '9502a003-6e7e-496e-97c0-225d45b476c1', 'name': 'DefaultRest', 'version': '3', 'domains': '8981cbaa-732a-43cd-b143-00c71734458b:Active,9502a003-6e7e-496e-97c0-225d45b476c1:Active,7df91534-49bd-499a-bb59-9c26b87b7f2b:Active', 'pool_status': 'connected', 'isoprefix': '', 'type': 'NFS', 'master_ver': 1, 'lver': 3}, 'dominfo': {'8981cbaa-732a-43cd-b143-00c71734458b': {'status': 'Active', 'isoprefix': '', 'alerts': []}, '9502a003-6e7e-496e-97c0-225d45b476c1': {'status': 'Active', 'isoprefix': '', 'alerts': []}, '7df91534-49bd-499a-bb59-9c26b87b7f2b': {'status': 'Active', 'isoprefix': '', 'alerts': []}}}
--------------------------------------------------------------------------

Furthermore, we no longer have repoStats, which will cause all the domains to go inactive:
--------------------------------------------------------------------------
Thread-1557::INFO::2013-11-20 13:12:10,656::logUtils::44::dispatcher::(wrapper) Run and protect: repoStats(options=None)
Thread-1557::INFO::2013-11-20 13:12:10,656::logUtils::47::dispatcher::(wrapper) Run and protect: repoStats, Return response: {}
--------------------------------------------------------------------------

Nir, can you take a look? It's possibly a regression/duplicate.
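For context on the failure in (5): the stop path raising this traceback looks roughly like the sketch below (paraphrased from the frames above; simplified, not the actual vdsm source). sanlock fails lockspace removal with errno 16 (EBUSY, "Device or resource busy") while the lockspace still has an acquired resource on it, which would fit a lease held for the in-flight copy on that domain.

```python
def _monitorLoop(self):
    # Paraphrased from the traceback in (5); a simplified sketch.
    while not self.stopEvent.is_set():
        self._monitorDomain()             # feeds repoStats and diskfree
        self.stopEvent.wait(self.interval)
    self.log.debug("Stopping domain monitor for %s", self.sdUUID)
    try:
        # sanlock refuses to remove a lockspace that still hosts an
        # acquired resource, and fails with errno 16 (EBUSY) as seen above.
        self.domain.releaseHostId(self.hostId, unused=True)
    except Exception:
        self.log.debug("Unable to release the host id %s for domain %s",
                       self.hostId, self.sdUUID, exc_info=True)
```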
Federico, do you have any idea why releaseHostId is failing here?

13:11:11,470::domainMonitor::178::Storage.DomainMonitorThread::(_monitorLoop) Unable to release the host id 1 for domain 9502a003-6e7e-496e-97c0-225d45b476c1
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 175, in _monitorLoop
    self.domain.releaseHostId(self.hostId, unused=True)
  File "/usr/share/vdsm/storage/sd.py", line 461, in releaseHostId
    self._clusterLock.releaseHostId(hostId, async, unused)
  File "/usr/share/vdsm/storage/clusterlock.py", line 204, in releaseHostId
    raise se.ReleaseHostIdFailure(self._sdUUID, e)
ReleaseHostIdFailure: Cannot release host id: ('9502a003-6e7e-496e-97c0-225d45b476c1', SanlockException(16, 'Sanlock lockspace remove failure', 'Device or resource busy'))

(In reply to Liron Aravot from comment #12)
> 3. the domain monitors are being stopped for the domains for some reason
> (need to see if that's related or not)

I could not find why they were stopped; please apply http://gerrit.ovirt.org/21904 when you run the test again. The gerrit patch is not a bug fix but a debugging patch.
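The contents of the debugging patch are not quoted here, but the usual trick for answering "who stopped this monitor?" is to log the caller's stack inside stopMonitoring. A hypothetical sketch of that kind of instrumentation (not necessarily what gerrit 21904 does):

```python
import traceback

def stopMonitoring(self, sdUUID):
    # Hypothetical debug aid: record the call stack so the log shows
    # exactly who requested the stop.
    self.log.info("Stop monitoring %s, called from:\n%s",
                  sdUUID, "".join(traceback.format_stack()))
    self._monitors.pop(sdUUID).stop()
```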
After adding more logging, we understand the problem.

1. The storage pool is disconnected:

Thread-762::INFO::2013-12-04 18:49:25,262::logUtils::44::dispatcher::(wrapper) Run and protect: disconnectStoragePool(spUUID='7b7cf016-bb4c-44d4-aceb-7454d9680064', hostID=1, scsiKey='7b7cf016-bb4c-44d4-aceb-7454d9680064', remove=False, options=None)

2. The domain monitors are stopped:

Thread-762::INFO::2013-12-04 18:49:25,265::sp::726::Storage.StoragePool::(disconnect) Disconnect from the storage pool 7b7cf016-bb4c-44d4-aceb-7454d9680064
Thread-762::INFO::2013-12-04 18:49:25,266::domainMonitor::114::Storage.DomainMonitor::(stopMonitoring) Stop monitoring ea0101ca-8a0e-4ff9-9032-98caa3a02825
Thread-68::DEBUG::2013-12-04 18:49:25,267::domainMonitor::174::Storage.DomainMonitorThread::(_monitorLoop) Stopping domain monitor for ea0101ca-8a0e-4ff9-9032-98caa3a02825
Thread-68::INFO::2013-12-04 18:49:25,267::clusterlock::197::SANLock::(releaseHostId) Releasing host id for domain ea0101ca-8a0e-4ff9-9032-98caa3a02825 (id: 1)
Thread-68::DEBUG::2013-12-04 18:49:27,267::clusterlock::207::SANLock::(releaseHostId) Host id for domain ea0101ca-8a0e-4ff9-9032-98caa3a02825 released successfully (id: 1)
Thread-762::INFO::2013-12-04 18:49:27,268::domainMonitor::114::Storage.DomainMonitor::(stopMonitoring) Stop monitoring 61ac6bae-79ba-4e79-a05a-dde38e0aa1c6
Thread-149::DEBUG::2013-12-04 18:49:27,269::domainMonitor::174::Storage.DomainMonitorThread::(_monitorLoop) Stopping domain monitor for 61ac6bae-79ba-4e79-a05a-dde38e0aa1c6
Thread-149::INFO::2013-12-04 18:49:27,269::clusterlock::197::SANLock::(releaseHostId) Releasing host id for domain 61ac6bae-79ba-4e79-a05a-dde38e0aa1c6 (id: 1)

3. The storage pool is connected again:

Thread-771::INFO::2013-12-04 18:51:21,136::logUtils::44::dispatcher::(wrapper) Run and protect: connectStoragePool(spUUID='7b7cf016-bb4c-44d4-aceb-7454d9680064', hostID=1, scsiKey='7b7cf016-bb4c-44d4-aceb-7454d9680064', msdUUID='ea0101ca-8a0e-4ff9-9032-98caa3a02825', masterVersion=1, options=None)

4. The storage pool starts monitoring the domains:

Thread-771::INFO::2013-12-04 18:51:25,305::domainMonitor::100::Storage.DomainMonitor::(startMonitoring) Start monitoring ea0101ca-8a0e-4ff9-9032-98caa3a02825
Thread-775::DEBUG::2013-12-04 18:51:25,305::domainMonitor::164::Storage.DomainMonitorThread::(_monitorLoop) Starting domain monitor for ea0101ca-8a0e-4ff9-9032-98caa3a02825
Thread-771::DEBUG::2013-12-04 18:51:25,305::sp::1577::Storage.StoragePool::(updateMonitoringThreads) Storage Pool `7b7cf016-bb4c-44d4-aceb-7454d9680064` started monitoring domain `ea0101ca-8a0e-4ff9-9032-98caa3a02825`
Thread-771::INFO::2013-12-04 18:51:25,306::domainMonitor::100::Storage.DomainMonitor::(startMonitoring) Start monitoring 61ac6bae-79ba-4e79-a05a-dde38e0aa1c6
Thread-771::DEBUG::2013-12-04 18:51:25,308::sp::1577::Storage.StoragePool::(updateMonitoringThreads) Storage Pool `7b7cf016-bb4c-44d4-aceb-7454d9680064` started monitoring domain `61ac6bae-79ba-4e79-a05a-dde38e0aa1c6`
Thread-776::DEBUG::2013-12-04 18:51:25,308::domainMonitor::164::Storage.DomainMonitorThread::(_monitorLoop) Starting domain monitor for 61ac6bae-79ba-4e79-a05a-dde38e0aa1c6
Thread-771::INFO::2013-12-04 18:51:25,310::domainMonitor::100::Storage.DomainMonitor::(startMonitoring) Start monitoring 29f87f6b-e39b-4215-a66d-bdc4a8ec5afe
Thread-771::DEBUG::2013-12-04 18:51:25,313::sp::1577::Storage.StoragePool::(updateMonitoringThreads) Storage Pool `7b7cf016-bb4c-44d4-aceb-7454d9680064` started monitoring domain `29f87f6b-e39b-4215-a66d-bdc4a8ec5afe`

5. The storage pool destructor is invoked:

3b41b49b-1ae4-4109-beef-97d700ea94f1::INFO::2013-12-04 19:03:10,363::sp::156::Storage.StoragePool::(__del__) Storage Pool 7b7cf016-bb4c-44d4-aceb-7454d9680064 starting cleanup thread

6. The storage pool cleanup thread runs, stopping all the domain monitors:

Thread-1426::INFO::2013-12-04 19:03:10,365::sp::150::Storage.StoragePool::(run) Storage Pool 7b7cf016-bb4c-44d4-aceb-7454d9680064 cleanup thread started
3b41b49b-1ae4-4109-beef-97d700ea94f1::DEBUG::2013-12-04 19:03:10,366::task::579::TaskManager.Task::(_updateState) Task=`3b41b49b-1ae4-4109-beef-97d700ea94f1`::moving from state queued -> state running
Thread-1426::INFO::2013-12-04 19:03:10,366::domainMonitor::114::Storage.DomainMonitor::(stopMonitoring) Stop monitoring ea0101ca-8a0e-4ff9-9032-98caa3a02825
Thread-775::DEBUG::2013-12-04 19:03:10,367::domainMonitor::174::Storage.DomainMonitorThread::(_monitorLoop) Stopping domain monitor for ea0101ca-8a0e-4ff9-9032-98caa3a02825
Thread-775::INFO::2013-12-04 19:03:10,368::clusterlock::197::SANLock::(releaseHostId) Releasing host id for domain ea0101ca-8a0e-4ff9-9032-98caa3a02825 (id: 1)
3b41b49b-1ae4-4109-beef-97d700ea94f1::DEBUG::2013-12-04 19:03:10,369::task::736::TaskManager.Task::(_save) Task=`3b41b49b-1ae4-4109-beef-97d700ea94f1`::_save: orig /rhev/data-center/7b7cf016-bb4c-44d4-aceb-7454d9680064/mastersd/master/tasks/3b41b49b-1ae4-4109-beef-97d700ea94f1 temp /rhev/data-center/7b7cf016-bb4c-44d4-aceb-7454d9680064/mastersd/master/tasks/3b41b49b-1ae4-4109-beef-97d700ea94f1.temp
Thread-775::DEBUG::2013-12-04 19:03:10,370::domainMonitor::183::Storage.DomainMonitorThread::(_monitorLoop) Unable to release the host id 1 for domain ea0101ca-8a0e-4ff9-9032-98caa3a02825
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 180, in _monitorLoop
    self.domain.releaseHostId(self.hostId, unused=True)
  File "/usr/share/vdsm/storage/sd.py", line 461, in releaseHostId
    self._clusterLock.releaseHostId(hostId, async, unused)
  File "/usr/share/vdsm/storage/clusterlock.py", line 204, in releaseHostId
    raise se.ReleaseHostIdFailure(self._sdUUID, e)
ReleaseHostIdFailure: Cannot release host id: ('ea0101ca-8a0e-4ff9-9032-98caa3a02825', SanlockException(16, 'Sanlock lockspace remove failure', 'Device or resource busy'))
Thread-1426::INFO::2013-12-04 19:03:10,372::domainMonitor::114::Storage.DomainMonitor::(stopMonitoring) Stop monitoring 61ac6bae-79ba-4e79-a05a-dde38e0aa1c6
Thread-776::DEBUG::2013-12-04 19:03:10,372::domainMonitor::174::Storage.DomainMonitorThread::(_monitorLoop) Stopping domain monitor for 61ac6bae-79ba-4e79-a05a-dde38e0aa1c6
Thread-776::INFO::2013-12-04 19:03:10,373::clusterlock::197::SANLock::(releaseHostId) Releasing host id for domain 61ac6bae-79ba-4e79-a05a-dde38e0aa1c6 (id: 1)

7. Since domain monitoring stopped, getStoragePoolInfo returns no diskfree info, leading to the "No space" error:

Thread-1435::INFO::2013-12-04 19:03:19,962::logUtils::47::dispatcher::(wrapper) Run and protect: getStoragePoolInfo, Return response: {'info': {'spm_id': 1, 'master_uuid': 'ea0101ca-8a0e-4ff9-9032-98caa3a02825', 'name': 'DefaultRest', 'version': '3', 'domains': 'ea0101ca-8a0e-4ff9-9032-98caa3a02825:Active,61ac6bae-79ba-4e79-a05a-dde38e0aa1c6:Active,29f87f6b-e39b-4215-a66d-bdc4a8ec5afe:Active', 'pool_status': 'connected', 'isoprefix': '', 'type': 'NFS', 'master_ver': 1, 'lver': 4}, 'dominfo': {'ea0101ca-8a0e-4ff9-9032-98caa3a02825': {'status': 'Active', 'isoprefix': '', 'alerts': []}, '61ac6bae-79ba-4e79-a05a-dde38e0aa1c6': {'status': 'Active', 'isoprefix': '', 'alerts': []}, '29f87f6b-e39b-4215-a66d-bdc4a8ec5afe': {'status': 'Active', 'isoprefix': '', 'alerts': []}}}

Verified on is27 (vdsm-4.13.2-0.2.rc.el6ev.x86_64). The problem did not occur in 5 runs on the QE jenkins and 20 runs on the CI jenkins.

*** Bug 1024269 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0040.html

*** Bug 1024018 has been marked as a duplicate of this bug. ***