We're trying to delete an auto-generated live snapshot that was created after migrating an online VM's storage to a different domain. oVirt version is 3.6.6 and VDSM version is 4.17.28.

The most relevant engine log lines are:

2016-06-24 07:50:36,252 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-7-thread-10) [799a22e3] Failed in 'MergeVDS' method
2016-06-24 07:50:36,256 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-7-thread-10) [799a22e3] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM host2.domain.com command failed: Merge failed
2016-06-24 07:50:36,256 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-7-thread-10) [799a22e3] Command 'MergeVDSCommand(HostName = host2.domain.com, MergeVDSCommandParameters:{runAsync='true', hostId='c31dca1a-e5bc-43f6-940f-6397e3ddbee4', vmId='7083832a-a1a2-42b7-961f-2e9c0dcd7e18', storagePoolId='fa155d43-4e68-486f-9f9d-ae3e3916cc4f', storageDomainId='9339780c-3667-4fef-aa13-9bec08957c5f', imageGroupId='65a0b0d4-5c96-4dd9-a31b-4d08e40a46a5', imageId='9eec9e8f-38db-4abf-b1c4-92fa9383f8b1', baseImageId='568b2f77-0ddf-4349-a45c-36fcb0edecb6', topImageId='9eec9e8f-38db-4abf-b1c4-92fa9383f8b1', bandwidth='0'})' execution failed: VDSGenericException: VDSErrorException: Failed to MergeVDS, error = Merge failed, code = 52
2016-06-24 07:50:36,256 ERROR [org.ovirt.engine.core.bll.MergeCommand] (pool-7-thread-10) [799a22e3] Engine exception thrown while sending merge command: org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to MergeVDS, error = Merge failed, code = 52 (Failed with error mergeErr and code 52)

This part of the vdsm log also seems relevant:

jsonrpc.Executor/3::ERROR::2016-06-24 07:50:36,216::vm::4955::virt.vm::(merge) vmId=`7083832a-a1a2-42b7-961f-2e9c0dcd7e18`::Live merge failed (job: 3ea68e36-6d99-4af9-a54e-4c5b06df6a0f)
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 4951, in merge
    flags)
  File "/usr/share/vdsm/virt/virdomain.py", line 68, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 124, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 1313, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 668, in blockCommit
    if ret == -1: raise libvirtError ('virDomainBlockCommit() failed', dom=self)
libvirtError: block copy still active: disk 'vda' already in active block job

I'm attaching the relevant logs (both ovirt-engine's and the SPM's vdsm). As per Nir Soffer on the 'users' list:

---
This means there was a previous attempt to merge, and the block job did not complete yet. Engine should detect that there is an active block job and wait until it completes. This is probably engine bug, please file a bug and attach complete vdsm and engine logs.
---

If you need more info or the full logs, don't hesitate to ask.
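Nir's point — that the engine should detect the still-active block job and wait for it before retrying the merge — can be sketched as a small polling loop. This is a hypothetical illustration, not the actual engine code: the `poll` callable stands in for a status query in the style of libvirt's virDomainGetBlockJobInfo, where an empty result means no active job.

```python
import time

def wait_for_block_job(poll, timeout=300, interval=1.0):
    """Wait until no block job is active on a disk.

    `poll` is a callable returning the current job status as a dict
    (empty/falsy result == no active job). Returns True once the job
    is gone, False if it is still running after `timeout` seconds.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        if not poll():
            return True  # no active job: safe to retry the merge
        time.sleep(interval)
    return False  # job still active after the timeout
```

With such a guard, a retry of blockCommit would only be issued after the previous job on 'vda' has drained, avoiding the "already in active block job" failure above.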
Created attachment 1171987 [details] Engine + SPM VDSM logs
Nir, is that infra or storage?
dup of Bug 1344479 - Live Merge completes but the 'job' table entry in the database is not marked as finished ?
(In reply to Moran Goldboim from comment #3) > dup of Bug 1344479 - Live Merge completes but the 'job' table entry in the > database is not marked as finished ? No.
This BZ is related to the work done in the live merge recovery area in 3.6.6: enabling the user to retry a live merge after failures; see BZ 1323629 for documentation. I am moving this BZ to ON_QA so that QA can keep me honest here. I would like to verify the following:

1. Prepare a long live merge, i.e. copy a large amount of data
2. Create a snapshot
3. Delete that snapshot (while the VM is up)
4. Stop vdsm while the merge is running
5. Wait for the merge to complete (with an error) on the engine
6. Start vdsm
7. Retry the merge
8. Assuming the job is still running, the merge will fail
9. Wait for the job to complete
10. Retry the merge; this time it should succeed
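The recovery flow in steps 7–10 can be sketched as follows. This is a hypothetical illustration of the expected behaviour, not engine code: `merge` and `job_active` are stand-in callables, and the sketch assumes a failed retry surfaces as a RuntimeError.

```python
def retry_live_merge(merge, job_active):
    """Sketch of the retry flow: while the previous block job is still
    active, a retry fails; once the job finishes, a fresh retry succeeds.

    merge:      callable performing the merge; raises RuntimeError while
                the old block job is still running (step 8).
    job_active: callable reporting whether that block job still runs.
    """
    if job_active():
        try:
            merge()            # step 8: fails while the old job runs
        except RuntimeError:
            pass
        while job_active():    # step 9: wait for the job to complete
            pass               # in reality: poll with a sleep/timeout
    return merge()             # step 10: retry; should now succeed
```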
Hi Ala,

I followed the steps you provided and got an ERROR in the engine log:

2016-06-27 11:43:31,057 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-5-thread-2) [5c46b46f] Failed in 'MergeVDS' method
2016-06-27 11:43:31,066 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-5-thread-2) [5c46b46f] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM host_mixed_3 command failed: Drive image file could not be found
2016-06-27 11:43:31,066 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-5-thread-2) [5c46b46f] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand' return value 'StatusOnlyReturnForXmlRpc [status=StatusForXmlRpc [code=13, message=Drive image file could not be found]]'
2016-06-27 11:43:31,067 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-5-thread-2) [5c46b46f] HostName = host_mixed_3
2016-06-27 11:43:31,068 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-5-thread-2) [5c46b46f] Command 'MergeVDSCommand(HostName = host_mixed_3, MergeVDSCommandParameters:{runAsync='true', hostId='f5b87851-6c15-4552-b349-cc4c297bc30b', vmId='8ee7b94f-b1a3-4f54-9794-37e8259371fe', storagePoolId='7a2d027d-f58b-4645-8339-99c324fcfcd6', storageDomainId='a3f7bd63-2ffd-40e5-be72-d44ea1534cb1', imageGroupId='4b97a780-5529-4187-aaa7-564b14c00cb6', imageId='0cb90c2e-3a2f-4000-9432-d82a3aab97cb', baseImageId='8c63eae4-43c5-4cb9-9242-cd6634571c6c', topImageId='0cb90c2e-3a2f-4000-9432-d82a3aab97cb', bandwidth='0'})' execution failed: VDSGenericException: VDSErrorException: Failed to MergeVDS, error = Drive image file could not be found, code = 13
2016-06-27 11:43:31,070 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-5-thread-2) [5c46b46f] FINISH, MergeVDSCommand, log id: 450cd3bb
2016-06-27 11:43:31,071 ERROR [org.ovirt.engine.core.bll.MergeCommand] (pool-5-thread-2) [5c46b46f] Engine exception thrown while sending merge command: org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to MergeVDS, error = Drive image file could not be found, code = 13 (Failed with error imageErr and code 13)

And from vdsm.log:

jsonrpc.Executor/5::ERROR::2016-06-27 11:45:34,199::task::866::Storage.TaskManager.Task::(_setError) Task=`f445fe59-5476-48db-87e8-130497810da9`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 3162, in getVolumeInfo
    volUUID=volUUID).getInfo()
  File "/usr/share/vdsm/storage/sd.py", line 457, in produceVolume
    volUUID)
  File "/usr/share/vdsm/storage/blockVolume.py", line 80, in __init__
    volume.Volume.__init__(self, repoPath, sdUUID, imgUUID, volUUID)
  File "/usr/share/vdsm/storage/volume.py", line 187, in __init__
    self.validate()
  File "/usr/share/vdsm/storage/blockVolume.py", line 88, in validate
    raise se.VolumeDoesNotExist(self.volUUID)
VolumeDoesNotExist: Volume does not exist: ('0cb90c2e-3a2f-4000-9432-d82a3aab97cb',)

New logs are attached.
Target release should be set once a package build is known to fix an issue. Since this bug is not in the MODIFIED state, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.
Created attachment 1172802 [details] new logs
Hi Raz,

This exception is expected. This is what happened: we started a merge (deletion of snapshot 'snap' for VM 'test_vm') that failed. Then we retried the merge, but meanwhile the merge job had *completed* (on the storage/vdsm side), so we got that imageErr (indicating the base image no longer exists). The engine in 3.6.6 handles that error by continuing the flow, and as can be seen in the log, the snapshot was removed:

2016-06-27 11:45:49,192 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-1) [41b38968] Correlation ID: 41b38968, Job ID: ca381f80-90f9-414f-9701-843c31e5bb1d, Call Stack: null, Custom Event ID: -1, Message: Snapshot 'snap' deletion for VM 'test_vm' has been completed.

Can you please confirm that the snapshot was indeed removed after retrying the live merge?
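That "continue the flow" handling can be sketched as follows. This is a hypothetical illustration of the behaviour described above, not the engine's actual code: if the retry reports that the volume is already gone (the imageErr / VolumeDoesNotExist case, vdsm code 13), the previous merge must have completed, so the flow proceeds to snapshot removal instead of failing.

```python
class VolumeDoesNotExist(Exception):
    """Stand-in for vdsm's storage error (imageErr, code 13)."""

def finish_merge(do_merge):
    """Retry a merge; treat a missing volume as 'already merged'.

    A VolumeDoesNotExist from the retry means the earlier merge job
    completed and removed the top volume, so the snapshot-deletion
    flow can continue rather than abort.
    """
    try:
        do_merge()
    except VolumeDoesNotExist:
        return "already merged; proceed to remove snapshot"
    return "merge finished; proceed to remove snapshot"
```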
Yes, the snapshot was eventually removed.
According to comment #9, moving to VERIFIED.