Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1461101

Summary: [downstream clone - 4.1.3] [downstream clone] LiveMerge fails with libvirtError: Block copy still active. Disk not ready for pivot
Product: Red Hat Enterprise Virtualization Manager
Reporter: rhev-integ
Component: vdsm
Assignee: Ala Hino <ahino>
Status: CLOSED ERRATA
QA Contact: Kevin Alon Goldblatt <kgoldbla>
Severity: high
Docs Contact:
Priority: unspecified
Version: unspecified
CC: ahino, alitke, amureini, bazulay, bugs, creatmbox, eedri, gshinar, jspanko, lsurette, mkalinin, ratamir, srevivo, tnisan, trichard, ycui, ykaul, ylavi
Target Milestone: ovirt-4.1.3
Keywords: ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Previously, the method used to test the completion of a live merge operation was incorrect; it checked the live merge progress value available from the libvirt API, which does not provide the status of a live merge operation. As a result, the live merge was detected as completed before the operation was actually completed. Trying to finalize the merge operation failed repeatedly until the operation was actually completed, logging multiple errors during the process. Now, live merge completion is detected using the libvirt XML, so the operation should complete successfully without logging errors.
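The Doc Text describes the fix only in outline: readiness is taken from the libvirt domain XML instead of the block job progress counters. The sketch below illustrates that idea and is not vdsm's actual code. It assumes libvirt's documented behavior of adding a `<mirror>` child to a `<disk>` element while a block copy or active commit job runs, with `ready='yes'` once the job can be pivoted. The function `mirror_ready_for_pivot` and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET


def mirror_ready_for_pivot(domain_xml, target_dev):
    """Return True if the disk with target `target_dev` has a block job
    mirror that libvirt reports as ready to pivot (hypothetical helper).

    Rather than trusting the job progress counters (cur == end) alone,
    it inspects the <mirror> element libvirt adds under <disk> while a
    block copy or active commit job is running.
    """
    root = ET.fromstring(domain_xml)
    for disk in root.findall("./devices/disk"):
        target = disk.find("target")
        if target is None or target.get("dev") != target_dev:
            continue
        mirror = disk.find("mirror")
        # No <mirror> element: no copy/commit job is mirroring this disk.
        if mirror is None:
            return False
        # ready='yes' means the mirror has caught up and a pivot may
        # succeed; a missing attribute means the job is still copying.
        return mirror.get("ready") == "yes"
    # Unknown disk target: treat as not ready.
    return False


# Hypothetical, trimmed domain XML for a disk in an active commit job.
SAMPLE_XML = """<domain type='kvm'>
  <devices>
    <disk type='file' device='disk'>
      <source file='/images/top.qcow2'/>
      <mirror type='file' job='active-commit' ready='yes'>
        <source file='/images/base.qcow2'/>
      </mirror>
      <target dev='vda' bus='virtio'/>
    </disk>
  </devices>
</domain>"""

if __name__ == "__main__":
    print(mirror_ready_for_pivot(SAMPLE_XML, "vda"))  # True
```

In a live system the XML would come from the domain object (for example `dom.XMLDesc(0)` in libvirt-python) and would be re-read on each check, since the `ready` attribute changes as the job progresses.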
Story Points: ---
Clone Of: 1427184
Environment:
Last Closed: 2017-07-27 18:03:45 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1427184
Bug Blocks:

Description rhev-integ 2017-06-13 13:59:34 UTC
+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1427184 +++
======================================================================

+++ This bug is an upstream to downstream clone. The original bug is: +++
+++   bug 1376580 +++
======================================================================

Description of problem:

A live merge of a disk snapshot is initiated on a running VM. The task aborts, but the engine still sees the job as running.

Version-Release number of selected component (if applicable):

oVirt 3.6.7
VDSM 4.17.28

How reproducible:

Don't know

Steps to Reproduce:
1. Start a live merge of a snapshot on a running VM

Actual results:

The job fails, but the task stays active.

Thread-15824378::ERROR::2016-09-15 11:17:36,599::utils::739::root::(wrapper) Unhandled exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 736, in wrapper
    return f(*a, **kw)
  File "/usr/share/vdsm/virt/vm.py", line 5266, in run
    self.tryPivot()
  File "/usr/share/vdsm/virt/vm.py", line 5235, in tryPivot
    ret = self.vm._dom.blockJobAbort(self.drive.name, flags)
  File "/usr/share/vdsm/virt/virdomain.py", line 68, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 124, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 1313, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 733, in blockJobAbort
    if ret == -1: raise libvirtError ('virDomainBlockJobAbort() failed', dom=self)
libvirtError: block copy still active: disk 'vda' not ready for pivot yet
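The traceback shows `tryPivot` calling `blockJobAbort` with a pivot flag while libvirt still reports the copy as active; that message corresponds to libvirt's `VIR_ERR_BLOCK_COPY_ACTIVE` error code. A minimal sketch of treating the error as transient, retrying the pivot a bounded number of times instead of leaving the merge half-finished, might look like this. `NotReadyForPivot`, `try_pivot_with_retry`, and the injected `abort_pivot` callable are hypothetical stand-ins so the example runs without libvirt; real code would catch `libvirt.libvirtError` and compare `e.get_error_code()` with `libvirt.VIR_ERR_BLOCK_COPY_ACTIVE`.

```python
import time


class NotReadyForPivot(Exception):
    """Hypothetical stand-in for a libvirtError carrying
    VIR_ERR_BLOCK_COPY_ACTIVE ("block copy still active")."""


def try_pivot_with_retry(abort_pivot, attempts=5, delay=1.0, sleep=time.sleep):
    """Call abort_pivot() until it succeeds or the attempts run out.

    abort_pivot is a hypothetical callable that would wrap something like
    dom.blockJobAbort(disk, libvirt.VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT).
    Returns the attempt number that succeeded; re-raises the last
    NotReadyForPivot if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            abort_pivot()
            return attempt
        except NotReadyForPivot:
            if attempt == attempts:
                raise
            # The mirror is still catching up (e.g. under heavy guest I/O);
            # wait and retry instead of failing the whole merge.
            sleep(delay)


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_pivot():
        # Simulates a disk that becomes ready on the third attempt.
        calls["n"] += 1
        if calls["n"] < 3:
            raise NotReadyForPivot("disk 'vda' not ready for pivot yet")

    print(try_pivot_with_retry(flaky_pivot, sleep=lambda s: None))  # 3
```

Injecting `sleep` keeps the loop testable. Blind polling like this only papers over the race; waiting for an explicit readiness signal before pivoting is the more robust design.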

Expected results:

The snapshot merge should finish successfully. If the error is legitimate, the engine should correctly detect the error state and cancel the running task.

Additional info:

Log files and screenshots attached

(Originally by mst)

(Originally by rhev-integ)

Comment 1 rhev-integ 2017-06-13 13:59:46 UTC
Created attachment 1201372
Engine Log

(Originally by mst)

(Originally by rhev-integ)

Comment 4 rhev-integ 2017-06-13 14:00:01 UTC
Created attachment 1201373
VDSM Log

(Originally by mst)

(Originally by rhev-integ)

Comment 5 rhev-integ 2017-06-13 14:00:10 UTC
Created attachment 1201374
Task State

(Originally by mst)

(Originally by rhev-integ)

Comment 6 rhev-integ 2017-06-13 14:00:18 UTC
Yesterday the error occurred again, this time on a VM with 6 disks and a disk snapshot that included 4 of those disks.

Result: the task is still running, but vdsm shows the abort traceback.

Of particular interest to us: after the live merge was initiated, qemu generated sustained high disk I/O for more than 6 hours. We assume that some merge operation was running, but we do not know what was really going on. See the attached screenshot.

(Originally by mst)

(Originally by rhev-integ)

Comment 7 rhev-integ 2017-06-13 14:00:26 UTC
Created attachment 1210227
io during failing live merge

(Originally by mst)

(Originally by rhev-integ)

Comment 8 rhev-integ 2017-06-13 14:00:34 UTC
VDSM log of second error attached

(Originally by mst)

(Originally by rhev-integ)

Comment 9 rhev-integ 2017-06-13 14:00:41 UTC
Created attachment 1210247
vdsm log 2nd error

(Originally by mst)

(Originally by rhev-integ)

Comment 10 rhev-integ 2017-06-13 14:00:49 UTC
I see the same issue at a customer site:

  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 372, in wrapper
    return f(*a, **kw)
  File "/usr/share/vdsm/virt/vm.py", line 4924, in run
    self.tryPivot()
  File "/usr/share/vdsm/virt/vm.py", line 4893, in tryPivot
    ret = self.vm._dom.blockJobAbort(self.drive.name, flags)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 69, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 123, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 917, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 733, in blockJobAbort
    if ret == -1: raise libvirtError ('virDomainBlockJobAbort() failed', dom=self)
libvirtError: block copy still active: disk 'vda' not ready for pivot yet

A VM with high disk I/O (hosting MySQL and Nagios) is not able to live-migrate its disks; the job fails.

(Originally by Jaroslav Spanko)

(Originally by rhev-integ)

Comment 12 rhev-integ 2017-06-13 14:01:05 UTC
*** Bug 1406851 has been marked as a duplicate of this bug. ***

(Originally by Ala Hino)

(Originally by rhev-integ)

Comment 13 rhev-integ 2017-06-13 14:01:13 UTC
4.0.6 was the last oVirt 4.0 release; please re-target this bug.

(Originally by Sandro Bonazzola)

(Originally by rhev-integ)

Comment 14 rhev-integ 2017-06-13 14:01:22 UTC
*** Bug 1419767 has been marked as a duplicate of this bug. ***

(Originally by Tal Nisan)

(Originally by rhev-integ)

Comment 17 rhev-integ 2017-06-13 14:01:43 UTC
Ala, the attached patch has been merged to the 4.1 branch, but it doesn't seem to solve anything (it just handles the erroneous situation better). Is the "real" fix on track for 4.1.2?

(Originally by Allon Mureinik)

Comment 18 rhev-integ 2017-06-13 14:01:51 UTC
Indeed, the merged patch doesn't fix the issue but better handles the expected exception.

The "real" fix is based on handling libvirt events. Actually, this BZ is identical to BZ 1438850 that is targeted to 4.2.

Close duplicate this BZ?
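Comment 18 states that the real fix is based on handling libvirt events. As an illustrative sketch only, assuming the `VIR_DOMAIN_EVENT_ID_BLOCK_JOB_2` event (whose callback receives the disk target name such as `vda`, the job type, and the job status), a tracker could record when libvirt reports a job as `VIR_DOMAIN_BLOCK_JOB_READY`, so the pivot is attempted only after that point. `BlockJobTracker` and its methods are hypothetical; the fallback constants mirror libvirt's `virConnectDomainEventBlockJobStatus` values so the code runs without libvirt installed.

```python
import threading

try:
    import libvirt
    BLOCK_JOB_COMPLETED = libvirt.VIR_DOMAIN_BLOCK_JOB_COMPLETED
    BLOCK_JOB_FAILED = libvirt.VIR_DOMAIN_BLOCK_JOB_FAILED
    BLOCK_JOB_CANCELED = libvirt.VIR_DOMAIN_BLOCK_JOB_CANCELED
    BLOCK_JOB_READY = libvirt.VIR_DOMAIN_BLOCK_JOB_READY
except ImportError:
    # Fallback: values of libvirt's virConnectDomainEventBlockJobStatus enum.
    BLOCK_JOB_COMPLETED, BLOCK_JOB_FAILED = 0, 1
    BLOCK_JOB_CANCELED, BLOCK_JOB_READY = 2, 3


class BlockJobTracker:
    """Hypothetical helper that remembers which disks libvirt reported ready."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ready = {}  # disk target name -> threading.Event

    def _event(self, disk):
        with self._lock:
            return self._ready.setdefault(disk, threading.Event())

    def on_block_job(self, conn, dom, disk, job_type, status, opaque):
        """Callback using the argument order of a BLOCK_JOB_2 handler."""
        if status == BLOCK_JOB_READY:
            # The mirror has caught up; a pivot may now succeed.
            self._event(disk).set()
        elif status in (BLOCK_JOB_COMPLETED, BLOCK_JOB_FAILED, BLOCK_JOB_CANCELED):
            # The job has ended; the disk is no longer waiting to pivot.
            self._event(disk).clear()

    def wait_ready(self, disk, timeout=None):
        """Return True once `disk` was reported ready, False on timeout."""
        return self._event(disk).wait(timeout)
```

With a real connection, the callback would be registered along the lines of `conn.domainEventRegisterAny(dom, libvirt.VIR_DOMAIN_EVENT_ID_BLOCK_JOB_2, tracker.on_block_job, None)`, which requires an event loop (for example `libvirt.virEventRegisterDefaultImpl()` plus a thread repeatedly calling `libvirt.virEventRunDefaultImpl()`).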

(Originally by Ala Hino)

Comment 19 rhev-integ 2017-06-13 14:01:58 UTC
(In reply to Ala Hino from comment #17)
> Close duplicate this BZ?
No.
This is a downstream clone used to hold the customer ticket.

(Originally by Allon Mureinik)

Comment 20 rhev-integ 2017-06-13 14:02:06 UTC
(In reply to Ala Hino from comment #17)
> Indeed, the merged patch doesn't fix the issue but better handles the
> expected exception.
> 
> The "real" fix is based on handling libvirt events. Actually, this BZ is
> identical to BZ 1438850 that is targeted to 4.2.
So, this was eventually solved for 4.1.3. Fixing target release to 4.1.3 so it can be included in ET.

(Originally by Allon Mureinik)

Comment 21 Allon Mureinik 2017-06-13 15:44:31 UTC
(In reply to rhev-integ from comment #20)
> (In reply to Ala Hino from comment #17)
> > Indeed, the merged patch doesn't fix the issue but better handles the
> > expected exception.
> > 
> > The "real" fix is based on handling libvirt events. Actually, this BZ is
> > identical to BZ 1438850 that is targeted to 4.2.
> So, this was eventually solved for 4.1.3. Fixing target release to 4.1.3 so
> it can be included in ET.
> 
> (Originally by Allon Mureinik)

Setting to ON_QA based on this.
Note that the work Kevin did on bug 1376580 should also verify this one, but it is up to the QA stakeholders whether they want to re-verify this bug or just mark it as VERIFIED based on that one.

Comment 22 Kevin Alon Goldblatt 2017-06-21 13:28:09 UTC
Moving to VERIFIED based on Comment 21

Comment 28 errata-xmlrpc 2017-07-27 18:03:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1815