Bug 1128631

Summary: MergeVDSCommand fails when performing live snapshot deletion
Product: [Retired] oVirt
Reporter: Raz Tamir <ratamir>
Component: ovirt-engine-core
Assignee: Adam Litke <alitke>
Status: CLOSED CURRENTRELEASE
QA Contact: Ori Gofen <ogofen>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 3.5
CC: acanan, alitke, amureini, ecohen, gklein, gpadgett, iheim, ogofen, ratamir, rbalakri, yeylon
Target Milestone: ---
Target Release: 3.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: storage
Fixed In Version: ovirt-engine-3.5.0_rc1.1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-10-17 12:45:01 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description    Flags
engine log     none
vdsm log       none

Description Raz Tamir 2014-08-11 09:03:12 UTC
Created attachment 925650 [details]
engine log

Description of problem:
When performing live snapshot deletion, ERROR messages appear in the engine log (log attached).
The snapshot remains in a locked state, and the disks in that snapshot are left in illegal status.

** 2014-08-11 11:14:04,248 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-15) [cff2e7] Merging of snapshot 095be0b6-d82d-4138-b91b-a1e49abe40be images f30d23dc-4ac5-45ec-b39d-64fb35f4772c..fd890dce-4d33-48f4-bb51-4f05ece61825 failed. Images have been marked illegal and can no longer be previewed or reverted to. Please retry Live Merge on the snapshot to complete the operation.


Version-Release number of selected component (if applicable):
ovirt-engine-3.5.0-0.0.master.20140804172041.git23b558e.el6.noarch

How reproducible:
100%

Steps to Reproduce:
1. Delete a snapshot while the VM is running (live snapshot deletion)

Actual results:
explained above

Expected results:


Additional info:

Comment 1 Allon Mureinik 2014-08-11 11:22:26 UTC
Please attach VDSM's log too

Comment 2 Raz Tamir 2014-08-11 12:03:59 UTC
Created attachment 925712 [details]
vdsm log

Comment 3 Adam Litke 2014-08-11 19:34:42 UTC
From the attached log I see the following important entries.  Live merge was called twice in rapid succession by engine.  Once called, vdsm properly handled both calls.  The host does not have a capable version of libvirt so vdsm returns an error code.  This smells like some sort of engine side race condition to me.  Please attach the engine.log as well.


Thread-65::DEBUG::2014-08-11 11:13:44,187::BindingXMLRPC::1127::vds::(wrapper) client [10.35.161.54]::call merge with ('42e19c61-73f6-4b8b-8478-07c0d0b6265a', {'domainID': 'ab1a30d2-6b93-4302-a028-a1f506f3b1da', 'volumeID': 'fd306b46-52b4-4715-a73f-a946d996adbf', 'poolID': 'f603339e-c4aa-474c-bb83-df768af662c8', 'imageID': '969ec9dc-a95e-4129-97f1-8465a1b79804'}, 'df6bc58a-105c-4964-9a56-caa268ebce5d', 'fd306b46-52b4-4715-a73f-a946d996adbf', '0', '0a93e33a-a4cc-4a0f-884f-233e85050d4b') {} flowID [80d09bd]


Thread-58::DEBUG::2014-08-11 11:13:44,212::BindingXMLRPC::1127::vds::(wrapper) client [10.35.161.54]::call merge with ('42e19c61-73f6-4b8b-8478-07c0d0b6265a', {'domainID': '80f91f5c-f5cc-4ae2-953e-b2ef2f812a7e', 'volumeID': 'fd890dce-4d33-48f4-bb51-4f05ece61825', 'poolID': 'f603339e-c4aa-474c-bb83-df768af662c8', 'imageID': '8773e3b2-fe5a-4388-a8e0-a8f9d5217d67'}, 'f30d23dc-4ac5-45ec-b39d-64fb35f4772c', 'fd890dce-4d33-48f4-bb51-4f05ece61825', '0', 'ea9ef3dd-ce79-4cc1-9396-2d6ba3c92f32') {} flowID [37537b98]


Thread-65::DEBUG::2014-08-11 11:13:44,342::vm::5566::vm.Vm::(merge) vmId=`42e19c61-73f6-4b8b-8478-07c0d0b6265a`::Starting merge with jobUUID='0a93e33a-a4cc-4a0f-884f-233e85050d4b'
Thread-65::ERROR::2014-08-11 11:13:44,342::vm::5575::vm.Vm::(merge) vmId=`42e19c61-73f6-4b8b-8478-07c0d0b6265a`::Libvirt missing VIR_DOMAIN_BLOCK_COMMIT_RELATIVE. Unable to perform live merge.
Thread-65::DEBUG::2014-08-11 11:13:44,343::BindingXMLRPC::1134::vds::(wrapper) return merge with {'status': {'message': 'Merge failed', 'code': 52}}


Thread-58::DEBUG::2014-08-11 11:13:44,398::vm::5566::vm.Vm::(merge) vmId=`42e19c61-73f6-4b8b-8478-07c0d0b6265a`::Starting merge with jobUUID='ea9ef3dd-ce79-4cc1-9396-2d6ba3c92f32'
Thread-58::ERROR::2014-08-11 11:13:44,399::vm::5575::vm.Vm::(merge) vmId=`42e19c61-73f6-4b8b-8478-07c0d0b6265a`::Libvirt missing VIR_DOMAIN_BLOCK_COMMIT_RELATIVE. Unable to perform live merge.
Thread-58::DEBUG::2014-08-11 11:13:44,399::BindingXMLRPC::1134::vds::(wrapper) return merge with {'status': {'message': 'Merge failed', 'code': 52}}
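
For anyone triaging a similar vdsm log, the pairing of each "Starting merge" entry with its "return merge" entry can be done mechanically. The following is only a minimal sketch: the line layout is inferred from the excerpt above rather than from any vdsm specification, and the names LINE_RE, JOB_RE, RETURN_RE and summarize_merges are hypothetical helpers, not part of vdsm.

```python
import ast
import re

# Line layout ASSUMED from the vdsm excerpt above:
#   <thread>::<level>::<timestamp>::<module>::<lineno>::<logger>::(<func>) <message>
LINE_RE = re.compile(
    r"^(?P<thread>Thread-\d+)::(?P<level>[A-Z]+)::"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
    r".*?::\((?P<func>\w+)\) (?P<msg>.*)$"
)
JOB_RE = re.compile(r"Starting merge with jobUUID='([0-9a-f-]+)'")
RETURN_RE = re.compile(r"^return merge with (\{.*\})$")


def summarize_merges(lines):
    """Map each thread to the merge job it started and the status it returned."""
    merges = {}
    for raw in lines:
        match = LINE_RE.match(raw.strip())
        if match is None:
            continue  # not a log line in the assumed layout
        thread, msg = match.group("thread"), match.group("msg")
        started = JOB_RE.search(msg)
        if started:
            merges[thread] = {
                "job": started.group(1),
                "started": match.group("ts"),
                "status": None,
            }
            continue
        returned = RETURN_RE.match(msg)
        if returned and thread in merges:
            # The return value is logged as a Python dict literal.
            merges[thread]["status"] = ast.literal_eval(returned.group(1))["status"]
    return merges
```

Fed the entries quoted above, this should report two jobs (one per thread), each with the 'Merge failed' status and code 52, which matches the reading that vdsm handled both calls and rejected both for the same reason.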

Comment 4 Greg Padgett 2014-08-11 20:09:07 UTC
This looks just like what I experienced with builds not containing a commit [1] merged on August 7, which fixes the locked snapshot issue.  Without it, Live Merge attempts won't converge, and will leave the disks illegal and snapshots locked.  For an error case like this (libvirt not capable), it would leave things in an inoperable state.

Can you retry with a build containing this commit?

[1] 209ec823a03dd5838eed3d711fd821d2a1aba9dd core: Live Merge command hangs

Comment 5 Raz Tamir 2014-08-12 07:03:09 UTC
Hi Adam,
engine.log attached

Comment 6 Allon Mureinik 2014-08-12 07:26:18 UTC
(In reply to Greg Padgett from comment #4)
> This looks just like what I experienced with builds not containing a commit
> [1] merged on August 7, which fixes the locked snapshot issue.  Without it,
> Live Merge attempts won't converge, and will leave the disks illegal and
> snapshots locked.  For an error case like this (libvirt not capable), it
> would leave things in an inoperable state.
> 
> Can you retry with a build containing this commit?
> 
> [1] 209ec823a03dd5838eed3d711fd821d2a1aba9dd core: Live Merge command hangs
Moving to MODIFIED based on this comment - there is no such build publicly available yet.

Raz - do you have the resources to give this a quick pre-integ run?

Comment 7 Raz Tamir 2014-08-24 14:39:59 UTC
Hi Allon,
This operation is blocked right now.
Is this the correct behaviour?

Comment 8 Allon Mureinik 2014-08-24 19:47:37 UTC
(In reply to ratamir from comment #7)
> Hi Allon,
> This operation is blocked right now.
> Is this the correct behaviour?
Depends on the host - can you share the versions of qemu/libvirt/vdsm please?

Comment 9 Raz Tamir 2014-08-25 06:34:19 UTC
- vdsm-4.16.2-1.gite8cba75.el6.x86_64

- qemu-img-rhev-0.12.1.2-2.415.el6_5.14.x86_64
- qemu-kvm-rhev-0.12.1.2-2.415.el6_5.14.x86_64
- qemu-kvm-rhev-tools-0.12.1.2-2.415.el6_5.14.x86_64

- libvirt-0.10.2-29.el6_5.10.x86_64

Comment 10 Adam Litke 2014-08-25 15:32:24 UTC
These versions of libvirt and qemu lack support for live merge, so in this case the operation is blocked by design. See http://www.ovirt.org/Features/Live_Merge#IMPORTANT:_Special_environment_setup for information on testing the feature with Fedora 20 hosts.
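
A quick way to see whether a host's libvirt Python bindings expose the flag named in the vdsm error from comment 3 is sketched below. This is an illustration under assumptions, not vdsm's actual check: the helper missing_live_merge_flags and the REQUIRED_FLAGS tuple are hypothetical, and only VIR_DOMAIN_BLOCK_COMMIT_RELATIVE is taken from the log. Any other prerequisites (for example on the qemu side) are not covered.

```python
# Flag name taken from the vdsm error message in comment 3; the tuple itself
# is an assumption and may not cover every prerequisite for live merge.
REQUIRED_FLAGS = ("VIR_DOMAIN_BLOCK_COMMIT_RELATIVE",)


def missing_live_merge_flags(libvirt_module, required=REQUIRED_FLAGS):
    """Return the required constant names that the given module does not define."""
    return [name for name in required if not hasattr(libvirt_module, name)]


if __name__ == "__main__":
    try:
        import libvirt  # libvirt Python bindings, if installed on this host
    except ImportError:
        print("libvirt Python bindings are not installed")
    else:
        missing = missing_live_merge_flags(libvirt)
        if missing:
            print("Live merge prerequisites missing: " + ", ".join(missing))
        else:
            print("libvirt bindings expose the flag needed for live merge")
```

The module is passed in as an argument so the check can be exercised without libvirt installed.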

Comment 11 Sandro Bonazzola 2014-10-17 12:45:01 UTC
oVirt 3.5 has been released and should include the fix for this issue.