Description of problem: Live merge fails on vdsm with NotConnectedError. Version-Release number of selected component (if applicable): hosted engine How reproducible: Not always Steps to Reproduce: 1. Create a VM from template as thin with disk resides on iSCSI 2. Create and attach a preallocated disk to the VM 3. Start the VM 4. Create a snapshot to the VM 5. Delete the snapshot Actual results: Live merge fails with: 2018-05-05 07:53:44,889+0300 INFO (jsonrpc/7) [api.virt] START merge(drive={'imageID': '8f5ca85f-3432-43c7-9562-ad7a1eddfff1', 'volumeID': '99b0352b-9b27-470d-989f-a7d31a75dee4', 'domainID': 'e8a8e491-2487-46fb -afe9-00781198e966', 'poolID': 'f23d137e-ae43-4b1b-8a8b-f6aed029365f'}, baseVolUUID='ff21f234-72ac-49c5-a5fe-9a9965e0cb82', topVolUUID='99b0352b-9b27-470d-989f-a7d31a75dee4', bandwidth='0', jobUUID='18eaa245-0af a-48f5-bef0-3ad8be90f50a') from=::ffff:10.46.16.85,46024, flow_id=snapshots_delete_c9564133-b9b8-40dc, vmId=dc7149c8-abf9-44cc-bb7e-cf806715f201 (api:46) 2018-05-05 07:53:44,889+0300 ERROR (jsonrpc/7) [api] FINISH merge error=VM 'dc7149c8-abf9-44cc-bb7e-cf806715f201' was not defined yet or was undefined (api:132) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 122, in method ret = func(*args, **kwargs) File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 736, in merge drive, baseVolUUID, topVolUUID, bandwidth, jobUUID) File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 5837, in merge chains = self._driveGetActualVolumeChain([drive]) File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 5976, in _driveGetActualVolumeChain self._updateDomainDescriptor() File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 5123, in _updateDomainDescriptor domainXML = self._dom.XMLDesc(0) File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 47, in __getattr__ % self.vmid) NotConnectedError: VM 'dc7149c8-abf9-44cc-bb7e-cf806715f201' was not defined yet or was undefined Expected results: Live merge should succeed Additional info: engine.log: 2018-05-05 07:53:44,892+03 INFO [org.ovirt.engine.core.bll.MergeCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-9) [snapshots_delete_c9564133-b9b8-40dc] Running command: MergeCommand internal: true. Entities affected : ID: e8a8e491-2487-46fb-afe9-00781198e966 Type: Storage 2018-05-05 07:53:44,893+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-9) [snapshots_delete_c9564133-b9b8-40dc] START, MergeVDSCommand(Hos tName = host_mixed_3, MergeVDSCommandParameters:{hostId='ef0a4bf8-9ca9-4cc8-9611-585296de6e47', vmId='dc7149c8-abf9-44cc-bb7e-cf806715f201', storagePoolId='f23d137e-ae43-4b1b-8a8b-f6aed029365f', storageDomainId= 'e8a8e491-2487-46fb-afe9-00781198e966', imageGroupId='63f78780-2063-448c-8b91-a0f6e74f725f', imageId='ede71ac0-2d4e-4fb8-be4b-5b083e88a19b', baseImageId='57a16a9b-5747-40b6-b62a-a177e2006c5e', topImageId='ede71a c0-2d4e-4fb8-be4b-5b083e88a19b', bandwidth='0'}), log id: 3a7f192 2018-05-05 07:53:44,890+03 ERROR [org.ovirt.engine.core.bll.MergeCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-7) [snapshots_delete_c9564133-b9b8-40dc] Engine exception thrown while sending merge c ommand: org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to MergeVDS, error = Gene ral Exception: ("VM 'dc7149c8-abf9-44cc-bb7e-cf806715f201' was not defined yet or was undefined",), code = 100 (Failed with error GeneralException and code 100) at org.ovirt.engine.core.bll.VdsHandler.handleVdsResult(VdsHandler.java:118) [bll.jar:] at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.runVdsCommand(VDSBrokerFrontendImpl.java:33) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.runVdsCommand(CommandBase.java:2045) [bll.jar:] at org.ovirt.engine.core.bll.MergeCommand.executeCommand(MergeCommand.java:45) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1133) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1285) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1934) [bll.jar:] at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInNewTransaction(TransactionSupport.java:202) [utils.jar:] at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInRequired(TransactionSupport.java:137) [utils.jar:] at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:105) [utils.jar:] at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1345) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:400) [bll.jar:]
Version-Release number of selected component (if applicable): vdsm-4.20.27.1-1.el7ev.x86_64 libvirt-3.9.0-14.el7_5.3.x86_64 qemu-img-rhev-2.10.0-21.el7_5.2.x86_64 sanlock-3.6.0-1.el7.x86_64 Red Hat Enterprise Linux Server 7.5 kernel 3.10.0-862.el7.x86_64 ovirt-engine-4.2.3.4-0.1.el7.noarch
I couldn't reproduce the issue on a "regular", i.e. not-hosted engine environment. I am not sure whether the is related to a hosted engine environment. Would appreciate if you can try the same scenario on a regular environment and update us with the results. Could you please upload the logs?
Couldn't reproduce on a non hosted engine environment so far.
Thanks Elad. Can you please upload the logs?
Created attachment 1432595 [details] logs
It seems that VM was stopped after live merge started, hence the VM wasn't found. From the logs: 1. Live merge started at 2018-05-05 07:53:42,366+03: 2018-05-05 07:53:42,366+03 INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand] (default task-8) [snapshots_delete_c9564133-b9b8-40dc] Lock Acquired to object 'EngineLock:{exclusiveLocks='[dc7149c8- abf9-44cc-bb7e-cf806715f201=VM]', sharedLocks=''}' 2. The VM was stopped at 2018-05-05 07:53:42,941+03: 2018-05-05 07:53:42,941+03 INFO [org.ovirt.engine.core.bll.StopVmCommand] (default task-9) [vms_syncAction_7b4e35b4-08c8-4189] Running command: StopVmCommand internal: false. Entities affected : ID: dc7149c8-abf9-44cc-bb7e-cf806715f201 Type: VMAction group STOP_VM with role type USER Do you remember shutting down the VM? Or was the VM shutdown as a side effect of something else?
This occurred as part of an automated test. The test doesn't shut down the VM intentionally so I guess the shut down is a side effect. The test includes live merge while there is not enough free space on the storage domain for it to be done.
In such cases, there are two options: 1. Perform cold merge 2. Start the VM again and retry live merge I think there is no bug here and suggest to close the bug. What do you think Elad?
Agreed with Elad that this isn't a bug.