Created attachment 1227035 (spm and engine logs)

Description of problem:
When trying to perform a live merge (with the VM running on either the SPM or an HSM host), the operation fails:

2016-12-01 23:28:24,616+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-5-thread-6) [68218035] Command 'MergeVDSCommand(HostName = host_mixed_3, MergeVDSCommandParameters:{runAsync='true', hostId='88d0d698-e962-4d4c-b333-3667a678c580', vmId='ea659a41-088f-4521-a09d-abe4a9802f73', storagePoolId='5ef2e0f0-1bba-45b0-ab2f-6c51ba0692f9', storageDomainId='e7826af8-fe1c-44af-8cef-7e7c7af67d5e', imageGroupId='30ee327a-e5e7-44be-b9aa-a0ee11916eab', imageId='bbb0f647-ebc0-4a2c-9b4e-340a799322e0', baseImageId='8672013b-a877-43b0-9d95-9379b53ae1dd', topImageId='bbb0f647-ebc0-4a2c-9b4e-340a799322e0', bandwidth='0'})' execution failed: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues

Followed by:

2016-12-01 23:28:24,616+02 WARN [org.ovirt.engine.core.vdsbroker.VdsManager] (org.ovirt.thread.pool-6-thread-48) [68218035] Host 'host_mixed_3' is not responding.
2016-12-01 23:28:24,616+02 ERROR [org.ovirt.engine.core.bll.MergeCommand] (pool-5-thread-6) [68218035] Engine exception thrown while sending merge command: org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues (Failed with error VDS_NETWORK_ERROR and code 5022)

There is no error in the VDSM log, and the host never actually became non-responsive.

Version-Release number of selected component (if applicable):
ovirt-engine-4.1.0-0.0.master.20161126211319.gitae69c34.el7.centos.noarch
vdsm-4.18.999-1020.git1ff41b1.el7.centos.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Start a VM with an existing snapshot.
2. Remove the snapshot.
Actual results:
Explained above.

Expected results:
The live merge flow should finish successfully.
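The reproduction steps can be scripted. The following is a minimal sketch, assuming the oVirt Python SDK v4 (`ovirtsdk4`) is installed, the VM is already running, and the snapshot to remove is identified by its description. The engine URL, credentials, VM name (`test_vm`), snapshot description (`snap1`), and the helper `remove_snapshot_live` are all hypothetical placeholders.

```python
import time


def remove_snapshot_live(vm_service, snapshot_description, timeout=600, poll=5):
    """Hypothetical helper: remove a snapshot of a running VM (triggering a
    live merge) and wait until the snapshot disappears from the VM.

    Returns True if the snapshot is gone within `timeout` seconds, else False.
    """
    snapshots_service = vm_service.snapshots_service()
    # Look the snapshot up by description (an assumption of this sketch).
    target = next(
        (s for s in snapshots_service.list() if s.description == snapshot_description),
        None,
    )
    if target is None:
        raise LookupError(f"no snapshot described as {snapshot_description!r}")

    # Removing a snapshot of a running VM makes the engine perform a live merge.
    snapshots_service.snapshot_service(target.id).remove()

    # Poll the snapshot list until the removed snapshot is no longer present.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if all(s.id != target.id for s in snapshots_service.list()):
            return True
        time.sleep(poll)
    return False


if __name__ == "__main__":
    import ovirtsdk4 as sdk  # imported lazily so the helper can be used without it

    connection = sdk.Connection(
        url="https://engine.example.com/ovirt-engine/api",  # hypothetical engine URL
        username="admin@internal",
        password="password",  # placeholder credentials
        insecure=True,
    )
    try:
        vms_service = connection.system_service().vms_service()
        vm = vms_service.list(search="name=test_vm")[0]  # hypothetical VM name
        vm_service = vms_service.vm_service(vm.id)
        finished = remove_snapshot_live(vm_service, "snap1")  # hypothetical snapshot
        print("live merge finished" if finished else "live merge did not finish in time")
    finally:
        connection.close()
```

If the merge fails as described above, the snapshot would presumably remain listed and the helper returns False once the timeout expires; the engine log should then show the `MergeVDSCommand` timeout.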
Created attachment 1227036 (hsm and engine logs)
Tentatively targeting 4.1. Raz, does this reproduce in 4.0.z too?
Allon, in 4.0.z we have a different bug, bug 1400137. Before opening this bug against 4.1, I checked that the results are not the same.
Reproduced by Ala; it is a duplicate of bug 1400137.

*** This bug has been marked as a duplicate of bug 1400137 ***
Correction: while the attached patch fixes part of bug 1400137, this bug is not a duplicate, since bug 1400137 was also affected by another bug in the z-stream. Reopening this bug to track the issue.
This bug was caused by internal refactoring and only affected unreleased software (no official release included it); it is fixed in 4.1.0 beta, so it does not require doc_text.
Verified using automation: tier 1 and tier 2 tests passed on all storage types (NFS, iSCSI, GlusterFS).