Bug 960952
Summary: cannot run VM when vdsm not connected to pool ("Image path does not exist or cannot be accessed/created")

Product: Red Hat Enterprise Virtualization Manager
Component: vdsm
Version: 3.2.0
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Reporter: Dafna Ron <dron>
Assignee: Eduardo Warszawski <ewarszaw>
QA Contact: Leonid Natapov <lnatapov>
CC: abaron, acanan, acathrow, amureini, bazulay, cboyle, eedri, ewarszaw, hchiramm, iheim, jkt, jturner, lpeer, mkalinin, pablo.iranzo, scohen, yeylon
Target Milestone: ---
Target Release: 3.3.0
Keywords: Reopened
Flags: scohen: Triaged+
Whiteboard: storage
oVirt Team: Storage
Fixed In Version: is14
Doc Type: Bug Fix
Doc Text: When the connectivity to the storage was restored and the host was recovered, no virtual machine could be run. All of them failed to run with the error "Image path does not exist or cannot be accessed/created". This was caused by the run VM command being issued before the connect storage pool had completed. It was fixed by allowing the virtual machine to run without being connected to the pool.
Last Closed: 2014-01-21 16:06:59 UTC
Type: Bug
Bug Blocks: 961038
This is not a caching issue. connectStoragePool was issued before connectStorageServer, so the pool is not connected; refreshStoragePool then fails, and prepareImage fails too.

Thread-1127::INFO::2013-05-08 13:51:07,311::logUtils::42::dispatcher::(wrapper) Run and protect: connectStorageServer, Return response: {'statuslist': [{'status': 0, 'id': '900d1c89-04ea-4c30-820e-ca3a86efb4e2'}]}
Thread-1128::INFO::2013-05-08 13:51:07,349::logUtils::40::dispatcher::(wrapper) Run and protect: connectStoragePool(spUUID='7fd33b43-a9f4-4eb7-a885-e9583a929ceb', hostID=2, scsiKey='7fd33b43-a9f4-4eb7-a885-e9583a929ceb', msdUUID='81ef11d0-4c0c-47b4-8953-d61a6af442d8', masterVersion=410, options=None)
Thread-1140::INFO::2013-05-08 13:53:32,870::logUtils::40::dispatcher::(wrapper) Run and protect: connectStorageServer(domType=3, spUUID='7fd33b43-a9f4-4eb7-a885-e9583a929ceb', conList=[{'connection': '10.35.64.10', 'iqn': 'Dafna-32-02', 'portal': '1', 'user': '', 'password': '******', 'id': '7b34aee1-1f7f-4932-949c-247a406a0a02', 'port': '3260'}], options=None)
Thread-1140::INFO::2013-05-08 13:53:32,950::logUtils::42::dispatcher::(wrapper) Run and protect: connectStorageServer, Return response: {'statuslist': [{'status': 0, 'id': '7b34aee1-1f7f-4932-949c-247a406a0a02'}]}
Thread-1141::INFO::2013-05-08 13:53:33,023::logUtils::40::dispatcher::(wrapper) Run and protect: refreshStoragePool(spUUID='7fd33b43-a9f4-4eb7-a885-e9583a929ceb', msdUUID='38755249-4bb3-4841-bf5b-05f4a521514d', masterVersion=411, options=None)
Thread-1169::INFO::2013-05-08 13:54:29,940::logUtils::40::dispatcher::(wrapper) Run and protect: getVolumeSize(sdUUID='d326916a-89b6-41c8-9c14-4b9cf8a1c979', spUUID='7fd33b43-a9f4-4eb7-a885-e9583a929ceb', imgUUID='7770aad5-3488-4cf5-b8ca-9451b5b12767', volUUID='163ee5e3-5dae-4312-9157-0125f6f1074d', options=None)
Thread-1169::INFO::2013-05-08 13:54:36,821::logUtils::42::dispatcher::(wrapper) Run and protect: getVolumeSize, Return response: {'truesize': '1073741824', 'apparentsize': '1073741824'}
Thread-1169::INFO::2013-05-08 13:54:36,858::logUtils::40::dispatcher::(wrapper) Run and protect: prepareImage(sdUUID='d326916a-89b6-41c8-9c14-4b9cf8a1c979', spUUID='7fd33b43-a9f4-4eb7-a885-e9583a929ceb', imgUUID='7770aad5-3488-4cf5-b8ca-9451b5b12767', volUUID='163ee5e3-5dae-4312-9157-0125f6f1074d')
Thread-1169::ERROR::2013-05-08 13:54:37,321::blockVolume::403::Storage.Volume::(validateImagePath) Unexpected error
Thread-1169::ERROR::2013-05-08 13:54:37,324::task::850::TaskManager.Task::(_setError) Task=`5f2d2e48-e79b-4aa4-98bc-8f00d4667e26`::Unexpected error
Thread-1169::ERROR::2013-05-08 13:54:37,409::dispatcher::66::Storage.Dispatcher.Protect::(run) {'status': {'message': "Image path does not exist or cannot be accessed/created: ('/rhev/data-center/7fd33b43-a9f4-4eb7-a885-e9583a929ceb/d326916a-89b6-41c8-9c14-4b9cf8a1c979/images/7770aad5-3488-4cf5-b8ca-9451b5b12767',)", 'code': 254}}
Thread-1169::ERROR::2013-05-08 13:54:37,417::vm::700::vm.Vm::(_startUnderlyingVm) vmId=`205497a5-33cc-4f4a-b46e-7cd831615f9b`::The vm start process failed

We can see that it is a transient state: after a few more attempts the host is connected to the pool, without intervention, and the image is prepared:

Thread-1227::INFO::2013-05-08 13:56:07,149::logUtils::40::dispatcher::(wrapper) Run and protect: connectStoragePool(spUUID='7fd33b43-a9f4-4eb7-a885-e9583a929ceb', hostID=2, scsiKey='7fd33b43-a9f4-4eb7-a885-e9583a929ceb', msdUUID='38755249-4bb3-4841-bf5b-05f4a521514d', masterVersion=411, options=None)
Thread-1227::ERROR::2013-05-08 13:56:14,345::sdc::137::Storage.StorageDomainCache::(_findDomain) looking for unfetched domain 38755249-4bb3-4841-bf5b-05f4a521514d
Thread-1227::ERROR::2013-05-08 13:56:14,345::sdc::153::Storage.StorageDomainCache::(_findUnfetchedDomain) looking for domain 38755249-4bb3-4841-bf5b-05f4a521514d
Thread-1227::DEBUG::2013-05-08 13:56:14,346::lvm::373::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' got the operation mutex
Thread-1227::DEBUG::2013-05-08 13:56:14,350::misc::83::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /sbin/lvm vgs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ \'a%1Dafna-32-011366808|1Dafna-32-021366808|1Dafna-32-031366808%\', \'r%.*%\' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 } backup { retain_min = 50 retain_days = 0 } " --noheadings --units b --nosuffix --separator | -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free 38755249-4bb3-4841-bf5b-05f4a521514d' (cwd None)
Thread-1227::DEBUG::2013-05-08 13:56:14,674::misc::83::Storage.Misc.excCmd::(<lambda>) SUCCESS: <err> = ''; <rc> = 0
Thread-1227::DEBUG::2013-05-08 13:56:15,408::persistentDict::234::Storage.PersistentDict::(refresh) read lines (VGTagMetadataRW)=['LOCKPOLICY=', 'VERSION=3', 'TYPE=ISCSI', 'DESCRIPTION=Dafna-32-03', 'LOGBLKSIZE=512', 'VGUUID=4Ceo5k-vMud-sVKL-blqX-JwKo-ETN6-8ao4L9', 'LEASERETRIES=3', 'LEASETIMESEC=5', 'LOCKRENEWALINTERVALSEC=5', u'PV0=pv:1Dafna-32-031366808,uuid:0AYd8Y-R9mE-rLT6-QTOe-q3zo-OqCi-Edx4Uu,pestart:0,pecount:797,mapoffset:0', 'PHYBLKSIZE=512', 'SDUUID=38755249-4bb3-4841-bf5b-05f4a521514d', 'IOOPTIMEOUTSEC=1', 'CLASS=Data', 'POOL_UUID=7fd33b43-a9f4-4eb7-a885-e9583a929ceb', 'ROLE=Master', 'POOL_DESCRIPTION=iSCSI', 'MASTER_VERSION=411', u'POOL_DOMAINS=d326916a-89b6-41c8-9c14-4b9cf8a1c979:Active,81ef11d0-4c0c-47b4-8953-d61a6af442d8:Active,38755249-4bb3-4841-bf5b-05f4a521514d:Active,72ec1321-a114-451f-bee1-6790cbca1bc6:Active', 'POOL_SPM_ID=1', 'POOL_SPM_LVER=0', '_SHA_CKSUM=16f8ee2cdf42df2d4a605f0b5d2b8dd6eb6ab322']
Thread-1227::DEBUG::2013-05-08 13:56:15,410::sp::1577::Storage.StoragePool::(updateMonitoringThreads) Storage Pool `7fd33b43-a9f4-4eb7-a885-e9583a929ceb` started monitoring domain `d326916a-89b6-41c8-9c14-4b9cf8a1c979`
Thread-1227::INFO::2013-05-08 13:56:16,870::logUtils::42::dispatcher::(wrapper) Run and protect: connectStoragePool, Return response: True
Thread-1252::INFO::2013-05-08 13:56:43,986::logUtils::40::dispatcher::(wrapper) Run and protect: prepareImage(sdUUID='d326916a-89b6-41c8-9c14-4b9cf8a1c979', spUUID='7fd33b43-a9f4-4eb7-a885-e9583a929ceb', imgUUID='a49d10a6-38f3-40b5-aea3-5628357b1f2e', volUUID='0dec5c38-e585-4565-bde4-37842a741595')
Thread-1252::INFO::2013-05-08 13:56:44,413::logUtils::42::dispatcher::(wrapper) Run and protect: prepareImage, Return response: {'path': '/rhev/data-center/7fd33b43-a9f4-4eb7-a885-e9583a929ceb/d326916a-89b6-41c8-9c14-4b9cf8a1c979/images/a49d10a6-38f3-40b5-aea3-5628357b1f2e/0dec5c38-e585-4565-bde4-37842a741595', 'chain': [{'path': '/rhev/data-center/7fd33b43-a9f4-4eb7-a885-e9583a929ceb/d326916a-89b6-41c8-9c14-4b9cf8a1c979/images/a49d10a6-38f3-40b5-aea3-5628357b1f2e/e886fb6e-f91f-46ae-a488-0f24e47f5731', 'domainID': 'd326916a-89b6-41c8-9c14-4b9cf8a1c979', 'volumeID': 'e886fb6e-f91f-46ae-a488-0f24e47f5731', 'imageID': 'a49d10a6-38f3-40b5-aea3-5628357b1f2e'}, {'path': '/rhev/data-center/7fd33b43-a9f4-4eb7-a885-e9583a929ceb/d326916a-89b6-41c8-9c14-4b9cf8a1c979/images/a49d10a6-38f3-40b5-aea3-5628357b1f2e/0dec5c38-e585-4565-bde4-37842a741595', 'domainID': 'd326916a-89b6-41c8-9c14-4b9cf8a1c979', 'volumeID': '0dec5c38-e585-4565-bde4-37842a741595', 'imageID': 'a49d10a6-38f3-40b5-aea3-5628357b1f2e'}]}

The VM was started before the pool connectivity was restored. In this case vdsm fails to prepare the images (known, i.e. not a regression); this will be fixed in a future version. After the connectivity was restored, the images were prepared.

Reported by customer; we need to understand the root cause. If a vdsmd restart solves the problem, it means something is broken in the links. I suggest reconsidering solving this issue.

*** Bug 967676 has been marked as a duplicate of this bug. ***
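The ordering problem described above can be illustrated with a minimal sketch (hypothetical names, not the actual vdsm API): before the fix, the image path was resolved through the storage pool, so prepareImage failed whenever connectStoragePool had not yet completed; the fix resolves the volume through the storage domain directly, so a VM can start while the pool is still reconnecting.

```python
class PoolNotConnected(Exception):
    """Raised when an operation requires a connected storage pool."""


class Host:
    """Toy model of a host's storage state (illustrative only)."""

    def __init__(self):
        self.pool_connected = False  # connectStoragePool still in flight

    def prepare_image_old(self, image_path):
        # Pre-fix behavior: the image is reached through the pool, so
        # preparing it fails with "Image path does not exist or cannot
        # be accessed/created" until the pool connection completes.
        if not self.pool_connected:
            raise PoolNotConnected(
                "Image path does not exist or cannot be accessed/created")
        return image_path

    def prepare_image_fixed(self, image_path):
        # Post-fix behavior: the volume is resolved via the storage
        # domain, independent of pool membership.
        return image_path


if __name__ == "__main__":
    host = Host()
    try:
        host.prepare_image_old("/rhev/data-center/pool/domain/images/img")
    except PoolNotConnected as err:
        print("old code:", err)
    print("fixed code:",
          host.prepare_image_fixed("/rhev/data-center/pool/domain/images/img"))
```

This is only a sketch of the control flow; the real change lives in vdsm's storage dispatcher and volume-path resolution.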
(In reply to Haim from comment #7)
> reported by customer, need to understand root cause, if vdsmd restart solves
> the problem, it means something is broken in the links.
> suggest to reconsider solving issue.

As stated before, the run VM command was issued before the connect storage pool had completed. Since the image in the current code is reached through the pool, the prepare image call fails. The restart is unrelated, other than that the pool is reconnected afterwards.

Verified on vdsm-4.13.0-0.8.beta1.el6ev.x86_64: VMs were able to start after blocking connectivity from the host to the storage for ~30 minutes.

This bug is currently attached to errata RHBA-2013:15291. If this change is not to be documented in the text for this errata, please either remove it from the errata, set the requires_doc_text flag to minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag. Otherwise, to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.
* Consequence: What happens when the bug presents.
* Fix: What was done to fix the bug.
* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as "the bug doesn't present anymore.")

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug. For further details on the Cause, Consequence, Fix, Result format, please refer to: https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes Thanks in advance.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0040.html
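The log analysis earlier shows the failure is transient: once connectStoragePool completes, the same prepareImage call succeeds without intervention. A caller-side mitigation (not the shipped fix, which removed the pool dependency entirely) would be a bounded retry around the prepare step; a minimal sketch with a hypothetical helper:

```python
import time


def prepare_image_with_retry(prepare, attempts=5, delay=1.0):
    """Retry a prepare-image callable while the pool connection is still
    being re-established; re-raise the last error after `attempts` tries.
    `prepare` is any zero-argument callable (hypothetical interface)."""
    last_err = None
    for _ in range(attempts):
        try:
            return prepare()
        except Exception as err:
            last_err = err
            time.sleep(delay)
    raise last_err


if __name__ == "__main__":
    # Simulated prepare that fails twice (pool not yet connected),
    # then succeeds, mirroring the transient state seen in the logs.
    calls = {"n": 0}

    def fake_prepare():
        calls["n"] += 1
        if calls["n"] < 3:
            raise RuntimeError("Image path does not exist")
        return "/rhev/data-center/pool/domain/images/img/vol"

    print(prepare_image_with_retry(fake_prepare, delay=0.01))
```

A retry only papers over the race; the actual fix in is14 lets the VM start without the host being connected to the pool.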
Created attachment 745209 [details]
logs

Description of problem:
I blocked connectivity to the storage on my HSM host and left it for 30 minutes. After 30 minutes I restored the connectivity to the storage from the host; autorecovery restored the host, and then I tried running VMs on it. All VMs failed to run with the error "Image path does not exist or cannot be accessed/created", and the host changed status to Error since it failed to run the VMs. After I activated the host again, I was able to run the VMs on it.

This issue also happened before, when I manually removed orphan LVs from one host: VMs that I ran on the second host failed to run with the same error on one host but were able to run on the other, until I ran lvs on the troubled host. Since both hosts are using the same storage, we can only assume that this is a cache issue in vdsm.

Version-Release number of selected component (if applicable):
sf15
vdsm-4.10.2-17.0.el6ev.x86_64

How reproducible:
100%

Steps to Reproduce:
1. On iSCSI storage with a 2-host cluster, create 3 domains.
2. Block connectivity to the storage from the HSM host and leave it for 30 minutes.
3. Remove the block from the host and wait for autoRecovery to activate it.
4. Run VMs on the HSM host.

Actual results:
VMs fail to run with "Image path does not exist or cannot be accessed/created" and are re-run on the second host. Since the storage was available from both hosts, and I was able to manually migrate the VMs to the host after I changed its status to Up from Error, we can assume that this is a cache issue.

Expected results:
We should be able to run the VMs once the connectivity to the storage is restored and the host is recovered.
Additional info: full logs

Thread-1169::DEBUG::2013-05-08 13:54:37,413::vm::676::vm.Vm::(_startUnderlyingVm) vmId=`205497a5-33cc-4f4a-b46e-7cd831615f9b`::_ongoingCreations released
Thread-1169::ERROR::2013-05-08 13:54:37,417::vm::700::vm.Vm::(_startUnderlyingVm) vmId=`205497a5-33cc-4f4a-b46e-7cd831615f9b`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 662, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/libvirtvm.py", line 1447, in _run
    self.preparePaths(devices[vm.DISK_DEVICES])
  File "/usr/share/vdsm/vm.py", line 721, in preparePaths
    drive['path'] = self.cif.prepareVolumePath(drive, self.id)
  File "/usr/share/vdsm/clientIF.py", line 275, in prepareVolumePath
    raise vm.VolumeError(drive)
VolumeError: Bad volume specification {'index': 0, 'iface': 'ide', 'reqsize': '0', 'format': 'cow', 'bootOrder': '1', 'poolID': '7fd33b43-a9f4-4eb7-a885-e9583a929ceb', 'volumeID': '163ee5e3-5dae-4312-9157-0125f6f1074d', 'apparentsize': '1073741824', 'imageID': '7770aad5-3488-4cf5-b8ca-9451b5b12767', 'specParams': {}, 'readonly': 'false', 'domainID': 'd326916a-89b6-41c8-9c14-4b9cf8a1c979', 'optional': 'false', 'deviceId': '7770aad5-3488-4cf5-b8ca-9451b5b12767', 'truesize': '1073741824', 'address': {'controller': '0', 'target': '0', 'unit': '0', 'bus': '0', 'type': 'drive'}, 'device': 'disk', 'shared': 'false', 'propagateErrors': 'off', 'type': 'disk'}
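Step 2 of the reproduction above (blocking connectivity from the HSM host to the storage) is commonly done by dropping traffic to the iSCSI portal; the portal address 10.35.64.10 and port 3260 come from the connectStorageServer log earlier. A small sketch that only builds the iptables command lines (they are printed, not executed, and would need root on the host; the helper names are hypothetical):

```python
PORTAL_IP = "10.35.64.10"   # iSCSI portal seen in the vdsm log above
ISCSI_PORT = 3260           # standard iSCSI target port


def block_cmd(ip=PORTAL_IP, port=ISCSI_PORT):
    # Append a rule dropping outbound traffic to the portal.
    return ["iptables", "-A", "OUTPUT", "-d", ip,
            "-p", "tcp", "--dport", str(port), "-j", "DROP"]


def unblock_cmd(ip=PORTAL_IP, port=ISCSI_PORT):
    # Delete the same rule to restore connectivity (step 3).
    return ["iptables", "-D", "OUTPUT", "-d", ip,
            "-p", "tcp", "--dport", str(port), "-j", "DROP"]


if __name__ == "__main__":
    print(" ".join(block_cmd()))
    print(" ".join(unblock_cmd()))
```

After restoring connectivity, autoRecovery should reactivate the host before VMs are started on it, per the steps above.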