Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1438850

Summary: LiveMerge fails with libvirtError: Block copy still active. Disk not ready for pivot
Product: [oVirt] vdsm Reporter: Ala Hino <ahino>
Component: General Assignee: Ala Hino <ahino>
Status: CLOSED CURRENTRELEASE QA Contact: Elad <ebenahar>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.17.28 CC: ahino, alitke, amureini, bugs, creatmbox, eedri, jspanko, kgoldbla, mkalinin, mst, nsoffer, rabraham, stirabos, tnisan, ylavi
Target Milestone: ovirt-4.1.3 Keywords: ZStream
Target Release: 4.19.16 Flags: rule-engine: ovirt-4.1+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: Completion of a live merge operation was detected incorrectly, by checking the live merge progress value available via the libvirt API, which does not provide the status of a live merge operation.
Consequence: Live merge was detected as completed before the operation had actually completed. Attempts to finalize the merge operation failed repeatedly until the operation had actually completed, logging multiple errors in the process.
Fix: Detect live merge completion using the libvirt XML.
Result: The live merge operation completes successfully without logging errors.
Story Points: ---
Clone Of: 1376580 Environment:
Last Closed: 2017-07-06 13:31:50 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1376580    
Bug Blocks: 1427184, 1447437    
Attachments:
Description                     Flags
Engine Log                      none
io during failing live merge    none
VDSM Log                        none
vdsm log 2nd error              none
Task State                      none

Comment 1 Ala Hino 2017-04-04 14:40:54 UTC
Created attachment 1268683 [details]
Engine Log

Comment 2 Ala Hino 2017-04-04 14:42:05 UTC
Created attachment 1268684 [details]
io during failing live merge

Comment 3 Ala Hino 2017-04-04 14:43:15 UTC
Created attachment 1268685 [details]
VDSM Log

Comment 4 Ala Hino 2017-04-04 14:43:52 UTC
Created attachment 1268690 [details]
vdsm log 2nd error

Comment 5 Ala Hino 2017-04-04 14:44:20 UTC
Created attachment 1268700 [details]
Task State

Comment 6 Ala Hino 2017-04-04 14:48:57 UTC
The fix for this BZ will be based on handling libvirt events that report when the disk is ready for pivot.
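
For illustration only, an event-based approach could be sketched as below. This is not the vdsm patch: `PivotReadyTracker` is a hypothetical helper, and the two constants are local copies of the libvirt enum values so the sketch runs without libvirt-python installed. The callback signature follows libvirt-python's block job event callbacks.

```python
import threading

# Local copies of libvirt enum values (assumption: with libvirt-python
# installed you would use libvirt.VIR_DOMAIN_BLOCK_JOB_TYPE_ACTIVE_COMMIT
# and libvirt.VIR_DOMAIN_BLOCK_JOB_READY instead).
BLOCK_JOB_TYPE_ACTIVE_COMMIT = 4
BLOCK_JOB_READY = 3


class PivotReadyTracker:
    """Hypothetical helper that remembers which drives reported READY."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ready = set()

    def on_block_job(self, conn, dom, disk, job_type, status, opaque):
        # Only an active commit that reached READY can be pivoted.
        if (job_type == BLOCK_JOB_TYPE_ACTIVE_COMMIT
                and status == BLOCK_JOB_READY):
            with self._lock:
                self._ready.add(disk)

    def is_ready(self, disk):
        with self._lock:
            return disk in self._ready


# Registration, assuming an open libvirt connection `conn` and a running
# libvirt event loop (not executed in this sketch):
#   conn.domainEventRegisterAny(
#       None, libvirt.VIR_DOMAIN_EVENT_ID_BLOCK_JOB_2,
#       tracker.on_block_job, None)
```

With `VIR_DOMAIN_EVENT_ID_BLOCK_JOB_2`, the `disk` argument is the target name (for example `vda`), which is what the tracker stores.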

Comment 7 Ala Hino 2017-04-19 09:45:10 UTC
*** Bug 1441941 has been marked as a duplicate of this bug. ***

Comment 8 Nir Soffer 2017-05-09 10:10:44 UTC
The attached patch (http://gerrit.ovirt.org/75954) does not fix the issue of slow
merge. I don't think we can fix the case where the VM is doing a lot of I/O, so the
merge never converges. Maybe this issue should be handled in qemu.

The patch does fix the issue of detecting when a block job is ready. Previously we
thought that the only way to detect this was using libvirt events. With this
patch, using libvirt events becomes an optimization that we should consider for a
future version; for 4.1 we can use XML detection.

With this patch we can resolve this bug in 4.1.3.
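
As a rough illustration of the XML detection discussed above (a minimal sketch, not the code from the patch): while a block copy or active commit is running, libvirt adds a `<mirror>` element to the disk in the domain XML, and marks it `ready='yes'` once the job can be pivoted. The function name `is_ready_for_pivot` and the sample XML, including its paths, are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed domain XML for illustration only.
SAMPLE_DOMAIN_XML = """
<domain type='kvm'>
  <devices>
    <disk type='file' device='disk'>
      <source file='/path/to/top.img'/>
      <target dev='vda' bus='virtio'/>
      <mirror type='file' job='active-commit' ready='yes'>
        <source file='/path/to/base.img'/>
      </mirror>
    </disk>
    <disk type='file' device='disk'>
      <source file='/path/to/other.img'/>
      <target dev='vdb' bus='virtio'/>
      <mirror type='file' job='active-commit'>
        <source file='/path/to/other-base.img'/>
      </mirror>
    </disk>
  </devices>
</domain>
"""


def is_ready_for_pivot(domain_xml, drive_name):
    """Return True if the named disk has a <mirror> with ready='yes'."""
    root = ET.fromstring(domain_xml)
    for disk in root.findall("./devices/disk"):
        target = disk.find("target")
        if target is None or target.get("dev") != drive_name:
            continue
        mirror = disk.find("mirror")
        # No <mirror> means no job; a missing "ready" means still copying.
        return mirror is not None and mirror.get("ready") == "yes"
    return False  # drive not found in the XML
```

With libvirt-python, the XML string would typically come from `dom.XMLDesc(0)` on the running domain.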

Comment 9 Nir Soffer 2017-05-09 22:35:13 UTC
Removed the patches that were copied when the patch was cloned; they are not
relevant to this bug.

Comment 10 Ala Hino 2017-05-16 08:19:59 UTC
(In reply to Nir Soffer from comment #8)

Ack

Comment 13 Elad 2017-06-05 10:51:46 UTC
Nir, Ala, are the reproduction steps similar to https://bugzilla.redhat.com/show_bug.cgi?id=1376580#c30 ?

Comment 14 Ala Hino 2017-06-05 11:12:17 UTC
Both bugs are related to pivot behavior; however, they are different.
This one fixes the logic used to determine when the disk is ready for pivot.
The other one gracefully handles the case where we try to pivot while the disk isn't ready for pivot.
Actually, with this fix, we shouldn't encounter the previous bug.

Reproducing the "disk is not ready for pivot" error isn't trivial and, somehow, we were never able to reproduce it.

I believe that if you simply run live merge, the merge will successfully complete.
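
To make the distinction concrete, here is a hedged sketch of pivoting only after a readiness check succeeds, rather than attempting the pivot and failing. `pivot_when_ready` and its parameters are hypothetical; `is_ready` and `pivot` are injected callables, so the sketch does not depend on libvirt. In libvirt-python the pivot itself would be `dom.blockJobAbort(drive, libvirt.VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)`.

```python
import time


def pivot_when_ready(is_ready, pivot, timeout=60.0, interval=1.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Call pivot() once is_ready() returns True.

    Returns True if the pivot was attempted, False if the disk did not
    become ready before the timeout (for example, a merge that never
    converges under heavy guest I/O).
    """
    deadline = clock() + timeout
    while True:
        if is_ready():
            pivot()
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The timeout matters because, as noted in comment 8, a merge on a VM doing a lot of I/O may never converge, so the wait must not be unbounded.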

Comment 15 Elad 2017-06-05 11:49:50 UTC
Thanks Ala.

We're constantly executing the live merge test plan [1]. The latest execution with [2] ended with 100% success.


[1]
https://polarion.engineering.redhat.com/polarion/#/project/RHEVM3/wiki/Storage/3_5_Storage_Live_Merge

[2]
vdsm-4.19.17-1.el7ev.x86_64
libvirt-daemon-2.0.0-10.el7_3.9.x86_64
rhevm-4.1.3.1-0.1.el7.noarch