Bug 1400707 - Live merge failed on "timeout which can be caused by communication issues"
Summary: Live merge failed on "timeout which can be caused by communication issues"
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.1.0-beta
Target Release: 4.1.0.2
Assignee: Francesco Romani
QA Contact: Raz Tamir
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-12-01 21:47 UTC by Raz Tamir
Modified: 2017-02-01 14:37 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-02-01 14:37:26 UTC
oVirt Team: Storage
rule-engine: ovirt-4.1+
rule-engine: blocker+
tnisan: devel_ack+


Attachments
spm and engine logs (27.55 KB, application/x-gzip)
2016-12-01 21:47 UTC, Raz Tamir
hsm and engine logs (40.02 KB, application/x-gzip)
2016-12-01 21:48 UTC, Raz Tamir


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 67902 0 master MERGED vm: storage: fix _diskXMLGetVolumeChainInfo 2020-05-26 19:33:01 UTC
oVirt gerrit 68126 0 master MERGED vmxml: test surprising behaviour of find_first 2020-05-26 19:33:01 UTC

Description Raz Tamir 2016-12-01 21:47:57 UTC
Created attachment 1227035
spm and engine logs

Description of problem:
When trying to perform a live merge (VM running on either SPM or HSM), the operation fails:
2016-12-01 23:28:24,616+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-5-thread-6) [68218035] Command 'MergeVDSCommand(HostName = host_mixed_3, MergeVDSCommandParameters:{runAsync='true', hostId='88d0d698-e962-4d4c-b333-3667a678c580', vmId='ea659a41-088f-4521-a09d-abe4a9802f73', storagePoolId='5ef2e0f0-1bba-45b0-ab2f-6c51ba0692f9', storageDomainId='e7826af8-fe1c-44af-8cef-7e7c7af67d5e', imageGroupId='30ee327a-e5e7-44be-b9aa-a0ee11916eab', imageId='bbb0f647-ebc0-4a2c-9b4e-340a799322e0', baseImageId='8672013b-a877-43b0-9d95-9379b53ae1dd', topImageId='bbb0f647-ebc0-4a2c-9b4e-340a799322e0', bandwidth='0'})' execution failed: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues

Followed by:

2016-12-01 23:28:24,616+02 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] (org.ovirt.thread.pool-6-thread-48) [68218035] Host 'host_mixed_3' is not responding.
2016-12-01 23:28:24,616+02 ERROR [org.ovirt.engine.core.bll.MergeCommand] (pool-5-thread-6) [68218035] Engine exception thrown while sending merge command: org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues (Failed with error VDS_NETWORK_ERROR and code 5022)

There is no error in the vdsm log, and the host never actually becomes non-responsive.
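When comparing the engine log against the vdsm log for a failure like this, it can help to pull out every engine entry that belongs to the same flow. The following is a minimal sketch, assuming the line layout visible in the entries quoted above and assuming the second bracketed value (here 68218035) is the engine's correlation ID. The function names are hypothetical helpers for illustration, not part of oVirt.

```python
import re

# Layout assumed from the entries quoted in this report:
# "<date> <time>,<ms><tz> <LEVEL> [<logger>] (<thread>) [<id>] <message>"
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]+)\)\s+"
    r"\[(?P<correlation_id>[^\]]*)\]\s+"
    r"(?P<message>.*)$"
)


def parse_engine_line(line):
    """Return a dict of the named fields, or None if the line does not match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


def lines_for_correlation(lines, correlation_id):
    """Yield parsed entries whose bracketed ID equals correlation_id."""
    for line in lines:
        entry = parse_engine_line(line)
        if entry and entry["correlation_id"] == correlation_id:
            yield entry
```

For example, `lines_for_correlation(open("engine.log"), "68218035")` would yield the MergeVDSCommand, VdsManager and MergeCommand entries shown above, whose timestamps can then be searched for in the vdsm log of the same host.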




Version-Release number of selected component (if applicable):
ovirt-engine-4.1.0-0.0.master.20161126211319.gitae69c34.el7.centos.noarch
vdsm-4.18.999-1020.git1ff41b1.el7.centos.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Start a VM that has an existing snapshot
2. Remove the snapshot

Actual results:
Explained above


Expected results:
The live merge flow should finish successfully.

Additional info:

Comment 1 Raz Tamir 2016-12-01 21:48:30 UTC
Created attachment 1227036
hsm and engine logs

Comment 2 Allon Mureinik 2016-12-02 00:56:44 UTC
Tentatively targeting to 4.1.
Raz - does this reproduce in 4.0.z too?

Comment 3 Raz Tamir 2016-12-02 09:28:29 UTC
Allon,
In 4.0.z we have a different bug, bug #1400137.
I checked that the results are not the same before opening this bug against 4.1.

Comment 5 Tal Nisan 2016-12-08 15:01:39 UTC
Reproduced by Ala and it is a duplicate of bug 1400137

*** This bug has been marked as a duplicate of bug 1400137 ***

Comment 6 Tal Nisan 2016-12-08 15:09:33 UTC
Correction: while the attached patch fixes part of bug 1400137, this bug is not a duplicate, since bug 1400137 was affected by another bug in the z-stream.
Reopening this bug to track the issue.

Comment 7 Francesco Romani 2016-12-14 16:41:46 UTC
This bug was caused by an internal refactoring and affects only unreleased software (it never appeared in an official release); it is fixed in 4.1.0 beta.
So it doesn't deserve doc_text.

Comment 8 Raz Tamir 2017-01-02 11:51:15 UTC
Verified using automation - tier 1 and tier 2 passed on all storage types (nfs, iscsi, glusterfs)

