| Field | Value |
|---|---|
| Summary: | Live merge failed on "timeout which can be caused by communication issues" |
| Product: | [oVirt] ovirt-engine |
| Reporter: | Raz Tamir <ratamir> |
| Component: | BLL.Storage |
| Assignee: | Francesco Romani <fromani> |
| Status: | CLOSED CURRENTRELEASE |
| QA Contact: | Raz Tamir <ratamir> |
| Severity: | high |
| Docs Contact: | |
| Priority: | unspecified |
| Version: | 4.1.0 |
| CC: | bugs, gklein, ratamir, tnisan, ylavi |
| Target Milestone: | ovirt-4.1.0-beta |
| Keywords: | Automation, Regression, Reopened |
| Target Release: | 4.1.0.2 |
| Flags: | rule-engine: ovirt-4.1+, rule-engine: blocker+, tnisan: devel_ack+ |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Whiteboard: | |
| Fixed In Version: | |
| Doc Type: | If docs needed, set a value |
| Doc Text: | |
| Story Points: | --- |
| Clone Of: | |
| Environment: | |
| Last Closed: | 2017-02-01 14:37:26 UTC |
| Type: | Bug |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| CRM: | |
| Verified Versions: | |
| Category: | --- |
| oVirt Team: | Storage |
| RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- |
| Target Upstream Version: | |
| Attachments: | |

Created attachment 1227036 [details]
hsm and engine logs
Tentatively targeting to 4.1. Raz, does this reproduce in 4.0.z too?

Allon, in 4.0.z we have a different bug, bug #1400137. I checked that the results are not the same before opening this bug against 4.1.

Reproduced by Ala, and it is a duplicate of bug 1400137.

*** This bug has been marked as a duplicate of bug 1400137 ***

Correction: while the attached patch fixes a part of bug 1400137, this bug is not a duplicate, since bug 1400137 was affected by another bug in zstream. Reopening this bug to track the issue.

This bug was caused by internal refactoring and affects unreleased (meaning no official release) software; it is fixed in 4.1.0 beta, so it doesn't deserve doc_text.

Verified using automation: tier 1 and tier 2 passed on all storage types (NFS, iSCSI, GlusterFS).

Created attachment 1227035 [details]
spm and engine logs

Description of problem:
When trying to perform a live merge (VM running on either SPM or HSM), the operation fails:

2016-12-01 23:28:24,616+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-5-thread-6) [68218035] Command 'MergeVDSCommand(HostName = host_mixed_3, MergeVDSCommandParameters:{runAsync='true', hostId='88d0d698-e962-4d4c-b333-3667a678c580', vmId='ea659a41-088f-4521-a09d-abe4a9802f73', storagePoolId='5ef2e0f0-1bba-45b0-ab2f-6c51ba0692f9', storageDomainId='e7826af8-fe1c-44af-8cef-7e7c7af67d5e', imageGroupId='30ee327a-e5e7-44be-b9aa-a0ee11916eab', imageId='bbb0f647-ebc0-4a2c-9b4e-340a799322e0', baseImageId='8672013b-a877-43b0-9d95-9379b53ae1dd', topImageId='bbb0f647-ebc0-4a2c-9b4e-340a799322e0', bandwidth='0'})' execution failed: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues

Followed by:

2016-12-01 23:28:24,616+02 WARN [org.ovirt.engine.core.vdsbroker.VdsManager] (org.ovirt.thread.pool-6-thread-48) [68218035] Host 'host_mixed_3' is not responding.
2016-12-01 23:28:24,616+02 ERROR [org.ovirt.engine.core.bll.MergeCommand] (pool-5-thread-6) [68218035] Engine exception thrown while sending merge command: org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues (Failed with error VDS_NETWORK_ERROR and code 5022)

There is no error in vdsm, and it never becomes non-responsive.

Version-Release number of selected component (if applicable):
ovirt-engine-4.1.0-0.0.master.20161126211319.gitae69c34.el7.centos.noarch
vdsm-4.18.999-1020.git1ff41b1.el7.centos.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Start a VM with existing snapshot
2. Remove the snapshot
3.

Actual results:
Explained above

Expected results:
The live merge flow should finish successfully

Additional info:
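
For anyone triaging similar failures, here is a minimal sketch of how the engine.log lines quoted above could be grouped by the bracketed ID they share (68218035 in this report) so that the host "not responding" warning and the merge error show up together. The regular expression is an assumption inferred only from the lines in this report, and the helper names (`parse_line`, `group_by_correlation`, `network_timeouts`) are hypothetical; none of this is part of ovirt-engine or vdsm.

```python
import re
from collections import defaultdict

# Pattern inferred from the engine.log lines quoted in this report (assumption);
# other engine versions or logging configurations may format lines differently.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]+)\)\s+"
    r"\[(?P<corr>[^\]]*)\]\s+"
    r"(?P<msg>.*)$"
)

# Two of the lines from the description, copied verbatim.
SAMPLE = [
    "2016-12-01 23:28:24,616+02 WARN [org.ovirt.engine.core.vdsbroker.VdsManager] "
    "(org.ovirt.thread.pool-6-thread-48) [68218035] Host 'host_mixed_3' is not responding.",
    "2016-12-01 23:28:24,616+02 ERROR [org.ovirt.engine.core.bll.MergeCommand] "
    "(pool-5-thread-6) [68218035] Engine exception thrown while sending merge command: "
    "org.ovirt.engine.core.common.errors.EngineException: EngineException: "
    "org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: "
    "VDSNetworkException: Message timeout which can be caused by communication issues "
    "(Failed with error VDS_NETWORK_ERROR and code 5022)",
]


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def group_by_correlation(lines):
    """Group parsed records by the bracketed ID that follows the thread name."""
    groups = defaultdict(list)
    for line in lines:
        record = parse_line(line)
        if record is not None:
            groups[record["corr"]].append(record)
    return dict(groups)


def network_timeouts(groups):
    """Keep only the groups in which some message mentions VDSNetworkException."""
    return {
        corr: records
        for corr, records in groups.items()
        if any("VDSNetworkException" in r["msg"] for r in records)
    }


if __name__ == "__main__":
    for corr, records in network_timeouts(group_by_correlation(SAMPLE)).items():
        print(corr, [(r["level"], r["logger"].rsplit(".", 1)[-1]) for r in records])
```

Run against a full engine.log (for example by passing an open file object to `group_by_correlation`), this would list every flow that hit the network timeout; the matching vdsm.log would still need to be checked separately, since the report notes that vdsm logged no error.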