Bug 870024
Summary: 3.1.z - vdsm: Unexpected exception when upgrading local/NFS domain from 3.0 to 3.1
Product: Red Hat Enterprise Linux 6
Reporter: Dafna Ron <dron>
Component: vdsm
Assignee: Haim <hateya>
Status: CLOSED ERRATA
QA Contact: Dafna Ron <dron>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 6.3
CC: abaron, achan, amureini, bazulay, cpelland, dpaikov, hateya, iheim, ilvovsky, lpeer, thildred, yeylon, ykaul
Target Milestone: rc
Keywords: TestBlocker, ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard: storage
Fixed In Version: vdsm-4.9.6-41.0
Doc Type: Bug Fix
Doc Text:
As part of the upgrade from Red Hat Enterprise Virtualization 3.0 to 3.1, storage domain metadata is also upgraded. Previously, the storage lease metadata files had incorrect permissions, which prevented file-based storage domains from being recovered after an upgrade. This fix corrects the permissions on the storage lease metadata files during the upgrade procedure. The local domain can now be upgraded without issue and comes online after the upgrade.
Story Points: ---
Clone Of:
: 883327 (view as bug list)
Environment:
Last Closed: 2012-12-04 19:13:22 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 883327, 887900
Attachments:
I tested the upgrade with NFS type storage and I got the same issue:

Thread-963::ERROR::2012-10-25 15:32:51,766::task::853::TaskManager.Task::(_setError) Task=`dbdf6dd4-2f42-4274-aba3-f31104e40548`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 3068, in upgradeStoragePool
    pool._upgradePool(targetDomVersion)
  File "/usr/share/vdsm/storage/securable.py", line 63, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 413, in _upgradePool
    self._convertDomain(self.masterDomain, str(targetDomVersion))
  File "/usr/share/vdsm/storage/sp.py", line 1031, in _convertDomain
    domain.getRealDomain(), isMsd, targetFormat)
  File "/usr/share/vdsm/storage/imageRepository/formatConverter.py", line 252, in convert
    converter(repoPath, hostId, imageRepo, isMsd)
  File "/usr/share/vdsm/storage/imageRepository/formatConverter.py", line 99, in v3DomainConverter
    newClusterLock.acquireHostId(hostId, async=False)
  File "/usr/share/vdsm/storage/safelease.py", line 175, in acquireHostId
    raise se.AcquireHostIdFailure(self._sdUUID, e)
AcquireHostIdFailure: Cannot acquire host id: ('63c20d84-f712-4f69-b44a-0c87a33ba2e5', SanlockException(19, 'Sanlock lockspace add failure', 'No such device'))
Thread-963::DEBUG::2012-10-25 15:32:51,769::task::872::TaskManager.Task::(_run) Task=`dbdf6dd4-2f42-4274-aba3-f31104e40548`::Task._run: dbdf6dd4-2f42-4274-aba3-f31104e40548 ('c58fc8ee-0b57-46d9-8a7e-b7ce7416c56d', '3') {} failed - stopping task
Thread-963::DEBUG::2012-10-25 15:32:51,770::task::1199::TaskManager.Task::(stop) Task=`dbdf6dd4-2f42-4274-aba3-f31104e40548`::stopping in state preparing (force False)
Thread-963::DEBUG::2012-10-25 15:32:51,770::task::978::TaskManager.Task::(_decref) Task=`dbdf6dd4-2f42-4274-aba3-f31104e40548`::ref 1 aborting True
Thread-963::INFO::2012-10-25 15:32:51,771::task::1157::TaskManager.Task::(prepare) Task=`dbdf6dd4-2f42-4274-aba3-f31104e40548`::aborting: Task is aborted: 'Cannot acquire host id' - code 661
Thread-963::DEBUG::2012-10-25 15:32:51,771::task::1162::TaskMan

[root@gold-vdsc ~]# ls -lZd /rhev/
drwxr-xr-x. root root system_u:object_r:root_t:s0 /rhev/
[root@gold-vdsc ~]# ls -lZd /rhev/*
drwxrwxr-x. vdsm kvm system_u:object_r:mnt_t:s0 /rhev/data-center
[root@gold-vdsc ~]#

Created attachment 633340 [details]
nfs logs
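For reference, the two error numbers in play above map to different failures: sanlock's own log (quoted in the bug description below) reports "open error -13", i.e. EACCES, a permission problem, while the SanlockException wraps 19, i.e. ENODEV ('No such device'). A minimal sketch to confirm the mapping, using only Python's standard errno module:

import errno
import os

# sanlock logs negative errno values ("open error -13"), while the
# SanlockException above carries a positive errno (19).
for code in (13, 19):
    # errno.errorcode gives the symbolic name, os.strerror the message.
    print(code, errno.errorcode[code], os.strerror(code))

# Output:
# 13 EACCES Permission denied
# 19 ENODEV No such device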
After this error the NFS pool cannot be recovered.

Created attachment 633343 [details]
pool is not recovered logs
Once the pool is marked as 3.1 in the engine, the upgrade process is started automatically when the spmStart is issued (expected domainVersion). If such a process keeps failing (on the master domain) it could prevent you from re-activating the pool. Since the V3 upgrade is stuck on the acquireHostId command, I fear that we might start experiencing an SELinux issue on RHEL 6 too. Dafna, could you try to temporarily disable SELinux (setenforce 0) and activate the pool again? Thanks.

(In reply to comment #7)
> Once the pool is marked as 3.1 in the engine, the upgrade process is started
> automatically when the spmStart is issued (expected domainVersion). If such
> a process keeps failing (on the master domain) it could prevent you from
> re-activating the pool. Since the V3 upgrade is stuck on the acquireHostId
> command, I fear that we might start experiencing an SELinux issue on RHEL 6
> too. Dafna, could you try to temporarily disable SELinux (setenforce 0) and
> activate the pool again? Thanks.

Federico, would you please elaborate on the reason you need this scenario reproduced again? To me it's obvious that it's related to SELinux and that disabling it will work around the problem; furthermore, we have an AVC for the wdmd service in audit.log.

type=AVC msg=audit(1351160857.306:122085): avc: denied { read } for pid=12435 comm="setfiles" path=2F746D702F73682D7468642D31333531313833363630202864656C6574656429 dev=dm-0 ino=915187 scontext=unconfined_u:system_r:setfiles_t:s0 tcontext=unconfined_u:object_r:initrc_tmp_t:s0 tclass=file
type=AVC msg=audit(1351160858.196:122086): avc: denied { write } for pid=12443 comm="shutdown" path="/tmp/vds_installer.805233.log" dev=dm-0 ino=915855 scontext=unconfined_u:unconfined_r:shutdown_t:s0-s0:c0.c1023 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file
type=AVC msg=audit(1351160858.196:122086): avc: denied { write } for pid=12443 comm="shutdown" path="/tmp/vds_bootstrap_complete.20121025_122705.log" dev=dm-0 ino=915858 scontext=unconfined_u:unconfined_r:shutdown_t:s0-s0:c0.c1023 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file
type=AVC msg=audit(1351160860.863:122103): avc: denied { search } for pid=6519 comm="wdmd" name="/" dev=tmpfs ino=5538 scontext=system_u:system_r:wdmd_t:s0 tcontext=system_u:object_r:tmpfs_t:s0 tclass=dir
type=AVC msg=audit(1351160860.863:122103): avc: denied { write } for pid=6519 comm="wdmd" name="/" dev=tmpfs ino=5538 scontext=system_u:system_r:wdmd_t:s0 tcontext=system_u:object_r:tmpfs_t:s0 tclass=dir
type=AVC msg=audit(1351160860.863:122103): avc: denied { remove_name } for pid=6519 comm="wdmd" name="wdmd" dev=tmpfs ino=18610 scontext=system_u:system_r:wdmd_t:s0 tcontext=system_u:object_r:tmpfs_t:s0 tclass=dir
type=AVC msg=audit(1351160860.863:122103): avc: denied { unlink } for pid=6519 comm="wdmd" name="wdmd" dev=tmpfs ino=18610 scontext=system_u:system_r:wdmd_t:s0 tcontext=system_u:object_r:tmpfs_t:s0 tclass=file
type=AVC msg=audit(1351171127.900:61645): avc: denied { getattr } for pid=3534 comm="qemu-kvm" name="/" dev=0:20 ino=2 scontext=system_u:system_r:svirt_t:s0:c600,c657 tcontext=system_u:object_r:nfs_t:s0 tclass=filesystem
type=AVC msg=audit(1351171128.967:61651): avc: denied { getattr } for pid=3534 comm="qemu-kvm" name="/" dev=0:20 ino=2 scontext=system_u:system_r:svirt_t:s0:c600,c657 tcontext=system_u:object_r:nfs_t:s0 tclass=filesystem

(In reply to comment #8)
> Federico, would you please elaborate on the reason you need this scenario
> reproduced again? To me it's obvious that it's related to SELinux and that
> disabling it will work around the problem; furthermore, we have an AVC for
> the wdmd service in audit.log.

It is not obvious; in fact, the AVC warnings that you reported are related to the bootstrap and to wdmd (not to sanlock). In any case, it is not related to SELinux at all. I logged in to the host and did some tests. Probably one of my (bugzilla) comments got lost (or was not submitted), but what I saw was that the metadata files (ids/leases) had permissions 600 (vdsm:kvm) and sanlock wasn't able to access the files. What I also wrote (and lost) was to change such permissions to 660 and activate the pool again. Now we need to understand if we need to make this automatic (are those the standard permissions in 3.0?).

Just to be completely clear - does this issue fix itself in VDSM if I chmod to 660, or do we corrupt the dc/sd?

(In reply to comment #10)
> Just to be completely clear - does this issue fix itself in VDSM if I chmod
> to 660, or do we corrupt the dc/sd?

Looking at the exceptions and at the code it should fix itself, but I wanted QA to verify it.

(In reply to comment #11)
> Looking at the exceptions and at the code it should fix itself, but I wanted
> QA to verify it.

Check what? Do you have a fix for us?

(In reply to comment #9)
> It is not obvious; in fact, the AVC warnings that you reported are related
> to the bootstrap and to wdmd (not to sanlock). In any case, it is not
> related to SELinux at all. I logged in to the host and did some tests.
> Probably one of my (bugzilla) comments got lost (or was not submitted), but
> what I saw was that the metadata files (ids/leases) had permissions 600
> (vdsm:kvm) and sanlock wasn't able to access the files. What I also wrote
> (and lost) was to change such permissions to 660 and activate the pool
> again. Now we need to understand if we need to make this automatic (are
> those the standard permissions in 3.0?).

It's impossible to create domains under 3.0 in directories with permissions of 600 OR of 660.
It fails like this:

Thread-6655::ERROR::2012-11-01 15:31:01,794::storage_connection::302::Storage.ServerConnection::(__validateLocalConnection) Error during storage connection validation: Permission settings on the specified path do not allow access to the storage. Verify permission settings on the specified storage path.: 'path = /blah'
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/storage_connection.py", line 296, in __validateLocalConnection
    validateDirAccess(con['rp'])
  File "/usr/share/vdsm/storage/storage_connection.py", line 27, in validateDirAccess
    getProcPool().fileUtils.validateAccess(dirPath)
  File "/usr/share/vdsm/storage/processPool.py", line 33, in wrapper
    return self.runExternally(func, *args, **kwds)
  File "/usr/share/vdsm/storage/processPool.py", line 44, in runExternally
    return self._procPool.runExternally(*args, **kwargs)
  File "/usr/share/vdsm/storage/processPool.py", line 127, in runExternally
    raise err
StorageServerAccessPermissionError: Permission settings on the specified path do not allow access to the storage. Verify permission settings on the specified storage path.: 'path = /blah'

Checked on 4.9.6-42.

Moving to re-open - the issue was found on vdsm 4.9.6-44 while trying to upgrade a storage pool of type NFS. We hit the following issue:

Thread-499::ERROR::2012-11-27 15:30:10,778::task::853::TaskManager.Task::(_setError) Task=`2411593b-52a1-4a8c-82ea-33455722b390`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 3177, in upgradeStoragePool
    pool._upgradePool(targetDomVersion)
  File "/usr/share/vdsm/storage/securable.py", line 63, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 414, in _upgradePool
    self._convertDomain(self.masterDomain, str(targetDomVersion))
  File "/usr/share/vdsm/storage/sp.py", line 1032, in _convertDomain
    domain.getRealDomain(), isMsd, targetFormat)
  File "/usr/share/vdsm/storage/imageRepository/formatConverter.py", line 282, in convert
    converter(repoPath, hostId, imageRepo, isMsd)
  File "/usr/share/vdsm/storage/imageRepository/formatConverter.py", line 99, in v3DomainConverter
    newClusterLock.acquireHostId(hostId, async=False)
  File "/usr/share/vdsm/storage/safelease.py", line 175, in acquireHostId
    raise se.AcquireHostIdFailure(self._sdUUID, e)
AcquireHostIdFailure: Cannot acquire host id: ('6713fb2c-38d8-4468-8b34-293d76d034ef', SanlockException(19, 'Sanlock lockspace add failure', 'No such device'))

I have added the following printouts in vdsm code:

if domain.getStorageType() in sd.FILE_DOMAIN_TYPES:
    log.debug('### storageType=%s', domain.getStorageType())
    domain.setMetadataPermissions()

and it appears that we are not entering the if.

Created attachment 652770 [details]
vdsm log
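The 600 -> 660 change suggested in the comments above amounts to relaxing the sanlock lease files under the domain's dom_md directory. A minimal sketch of that manual workaround, assuming the standard /rhev/data-center mount layout; the domain path below is illustrative, not taken from this bug:

import os
import stat

# Illustrative path -- substitute the actual master domain mount point.
DOM_MD = "/rhev/data-center/mnt/_data_local/<sdUUID>/dom_md"

for name in ("ids", "leases"):
    path = os.path.join(DOM_MD, name)
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # sanlock needs group read/write (vdsm:kvm, 0660); owner-only 0600
    # produces the "open error -13" (EACCES) seen in sanlock's log.
    if mode != 0o660:
        print("fixing %s: %o -> 660" % (path, mode))
        os.chmod(path, 0o660)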
Proposed upstream: http://gerrit.ovirt.org/#/c/9517/1

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-1508.html
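The gerrit change itself is not quoted in this bug, but the debug printout in the comments above suggests its shape: ensure the v3 converter actually calls setMetadataPermissions() for file-based (local/NFS) domains before acquireHostId is attempted. An illustrative sketch only -- the stub class and type constants below are assumptions, not vdsm code:

# Illustrative reconstruction, not the actual patch. The names
# FILE_DOMAIN_TYPES and setMetadataPermissions() appear in the comments
# above; everything else is a stand-in to make the sketch runnable.
import os

NFS_DOMAIN, LOCALFS_DOMAIN = 1, 4  # assumed storage type constants
FILE_DOMAIN_TYPES = (NFS_DOMAIN, LOCALFS_DOMAIN)

class FileDomainStub(object):
    """Stands in for a vdsm file storage domain object."""
    def __init__(self, storage_type, dom_md):
        self._type = storage_type
        self._dom_md = dom_md

    def getStorageType(self):
        return self._type

    def setMetadataPermissions(self):
        # Give group read/write (0660) on the sanlock lease files.
        for name in ("ids", "leases"):
            os.chmod(os.path.join(self._dom_md, name), 0o660)

def convert_domain(domain):
    # The guard the reporter instrumented: for the failing NFS pool it
    # was never entered, leaving the 0600 lease files unreadable by
    # sanlock, so acquireHostId failed.
    if domain.getStorageType() in FILE_DOMAIN_TYPES:
        domain.setMetadataPermissions()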
Created attachment 633292 [details]
logs

Description of problem:
I upgraded from a 3.0 setup to 3.1 (si22.1) and tried to upgrade my local domain. The UI shows an unexpected exception when we try to upgrade the DC.

Backend:
2012-10-25 13:53:56,990 WARN [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (http-/0.0.0.0:8443-5) [399bd68b] Weird return value: Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc
mCode 661
mMessage Cannot acquire host id: ('508b3e1a-7533-4bec-b27b-8196dc5be596', SanlockException(19, 'Sanlock lockspace add failure', 'No such device'))

vdsm:
Thread-121::ERROR::2012-10-25 13:51:39,415::task::853::TaskManager.Task::(_setError) Task=`e480d5a0-3ca0-48d8-9234-f8a3cbd8e051`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 3068, in upgradeStoragePool
    pool._upgradePool(targetDomVersion)
  File "/usr/share/vdsm/storage/securable.py", line 63, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 413, in _upgradePool
    self._convertDomain(self.masterDomain, str(targetDomVersion))
  File "/usr/share/vdsm/storage/sp.py", line 1031, in _convertDomain
    domain.getRealDomain(), isMsd, targetFormat)
  File "/usr/share/vdsm/storage/imageRepository/formatConverter.py", line 252, in convert
    converter(repoPath, hostId, imageRepo, isMsd)
  File "/usr/share/vdsm/storage/imageRepository/formatConverter.py", line 99, in v3DomainConverter
    newClusterLock.acquireHostId(hostId, async=False)
  File "/usr/share/vdsm/storage/safelease.py", line 175, in acquireHostId
    raise se.AcquireHostIdFailure(self._sdUUID, e)
AcquireHostIdFailure: Cannot acquire host id: ('508b3e1a-7533-4bec-b27b-8196dc5be596', SanlockException(19, 'Sanlock lockspace add failure', 'No such device'))

sanlock:
2012-10-25 13:51:38+0200 4914 [1929]: s1 lockspace 508b3e1a-7533-4bec-b27b-8196dc5be596:1:/rhev/data-center/mnt/_data_local/508b3e1a-7533-4bec-b27b-8196dc5be596/dom_md/ids:0
2012-10-25 13:51:38+0200 4914 [9395]: open error -13 /rhev/data-center/mnt/_data_local/508b3e1a-7533-4bec-b27b-8196dc5be596/dom_md/ids
2012-10-25 13:51:38+0200 4914 [9395]: s1 open_disk /rhev/data-center/mnt/_data_local/508b3e1a-7533-4bec-b27b-8196dc5be596/dom_md/ids error -13
2012-10-25 13:51:39+0200 4915 [1929]: s1 add_lockspace fail result -19

Version-Release number of selected component (if applicable):
3.0 to 22.1
vdsm-4.9.6-39.0.el6_3.x86_64
sanlock-2.3-4.el6_3.x86_64

How reproducible:
100%

Steps to Reproduce:
1. create a local domain with 1 vm on 3.0 rhevm
2. upgrade rhevm to 3.1
3. yum update vdsm on host -> restart vdsm service (my vm was up) -> try to upgrade DC level to 3.1

Actual results:
We get an unexpected exception. We also cannot remove the DC once this happens.

Expected results:
We should succeed to upgrade, or be blocked before the operation starts.

Additional info:
full logs (including audit log)