+++ This bug is an upstream to downstream clone. The original bug is: +++
+++   bug 1376580 +++
======================================================================

Description of problem:

A live merge of a disk snapshot is initiated on a running VM. The task aborts, but the engine still sees the job as running.

Version-Release number of selected component (if applicable):

oVirt 3.6.7
VDSM 4.17.28

How reproducible:

Don't know

Steps to Reproduce:
1. Start a snapshot live merge on a running VM

Actual results:

The job fails, but the task stays active:

Thread-15824378::ERROR::2016-09-15 11:17:36,599::utils::739::root::(wrapper) Unhandled exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 736, in wrapper
    return f(*a, **kw)
  File "/usr/share/vdsm/virt/vm.py", line 5266, in run
    self.tryPivot()
  File "/usr/share/vdsm/virt/vm.py", line 5235, in tryPivot
    ret = self.vm._dom.blockJobAbort(self.drive.name, flags)
  File "/usr/share/vdsm/virt/virdomain.py", line 68, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 124, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 1313, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 733, in blockJobAbort
    if ret == -1: raise libvirtError ('virDomainBlockJobAbort() failed', dom=self)
libvirtError: block copy still active: disk 'vda' not ready for pivot yet
(logged in German in the original: "Block Kopieren ist immer noch aktiv: Disk 'vda' noch nicht bereit für Pivot")

Expected results:

The snapshot merge should finish successfully. If the error is correct, the engine should detect the error state and cancel the running task.

Additional info:

Log files and screenshots attached

(Originally by mst)
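For readers unfamiliar with the flow: VDSM completes a live block job by calling blockJobAbort() with the pivot flag, and libvirt rejects the pivot if the job has not yet reached its synchronized phase, which is the error above. Below is a minimal sketch of that step using the public libvirt-python API. It is illustrative only, not VDSM's actual code, and the cur == end polling heuristic it shows is exactly the kind of guesswork that the fix discussed later replaces with events:

    # Sketch only, not VDSM code: poll the block job and pivot once it converges.
    import time
    import libvirt

    def pivot_when_ready(dom, disk='vda', timeout=60):
        """Pivot the block job on `disk` once libvirt reports cur == end.

        Calling blockJobAbort() with VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT before
        the job is synchronized raises the "not ready for pivot yet" error
        seen in this bug.
        """
        deadline = time.time() + timeout
        while time.time() < deadline:
            info = dom.blockJobInfo(disk, 0)  # empty dict when no job is running
            if info and info['cur'] == info['end']:
                # The job has converged; pivot switches the VM to the new image.
                dom.blockJobAbort(disk, libvirt.VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)
                return True
            time.sleep(1)
        return False

Note that cur == end is inherently racy under sustained guest writes: the job can fall out of the ready state between the check and the pivot, which matches the high-I/O reports in the comments below.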
Created attachment 1201372 [details] Engine Log (Originally by mst)
Created attachment 1201373 [details] VDSM Log (Originally by mst)
Created attachment 1201374 [details] Task State (Originally by mst)
Yesterday the error occurred once again, this time on a VM with 6 disks, with a disk snapshot that contained 4 disks. Result: the task keeps running, but VDSM shows the abort traceback. Very interesting for us: after the live merge was initiated, qemu started some kind of heavy disk I/O for more than 6 hours. So we assume that something was doing merge operations, but we do not know what was really going on. See screenshot attached. (Originally by mst)
Created attachment 1210227 [details] io during failing live merge (Originally by mst)
VDSM log of second error attached (Originally by mst)
Created attachment 1210247 [details] vdsm log 2nd error (Originally by mst)
I can confirm the same traceback with a customer:

  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 372, in wrapper
    return f(*a, **kw)
  File "/usr/share/vdsm/virt/vm.py", line 4924, in run
    self.tryPivot()
  File "/usr/share/vdsm/virt/vm.py", line 4893, in tryPivot
    ret = self.vm._dom.blockJobAbort(self.drive.name, flags)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 69, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 123, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 917, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 733, in blockJobAbort
    if ret == -1: raise libvirtError ('virDomainBlockJobAbort() failed', dom=self)
libvirtError: block copy still active: disk 'vda' not ready for pivot yet

A VM with high disk I/O (hosting MySQL and Nagios) is not able to live-migrate disks; the job fails. (Originally by Jaroslav Spanko)
*** Bug 1406851 has been marked as a duplicate of this bug. *** (Originally by Ala Hino)
4.0.6 has been the last oVirt 4.0 release, please re-target this bug. (Originally by Sandro Bonazzola)
*** Bug 1419767 has been marked as a duplicate of this bug. *** (Originally by Tal Nisan)
Ala, the attached patch is merged to the 4.1 branch, but it doesn't seem to solve anything (it just handles the erroneous situation better). Is the "real" fix on track for 4.1.2?
Indeed, the merged patch doesn't fix the issue but better handles the expected exception.

The "real" fix is based on handling libvirt events. Actually, this BZ is identical to BZ 1438850 that is targeted to 4.2.

Close this BZ as a duplicate?
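For reference, a hedged sketch of the event-based approach the "real" fix refers to (illustrative only, not the actual VDSM patch): instead of polling and guessing when the job is ready, register for libvirt's block-job events and pivot only when the READY event arrives.

    # Sketch only: react to libvirt's BLOCK_JOB READY event instead of polling.
    import libvirt

    def on_block_job(conn, dom, disk, job_type, status, opaque):
        # Fires whenever a block job changes state; READY means pivot is safe.
        if status == libvirt.VIR_DOMAIN_BLOCK_JOB_READY:
            dom.blockJobAbort(disk, libvirt.VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)

    libvirt.virEventRegisterDefaultImpl()
    conn = libvirt.open('qemu:///system')
    conn.domainEventRegisterAny(
        None,                                     # all domains
        libvirt.VIR_DOMAIN_EVENT_ID_BLOCK_JOB_2,  # reports the disk target name
        on_block_job,
        None,
    )
    while True:
        libvirt.virEventRunDefaultImpl()          # dispatch pending events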
(In reply to Ala Hino from comment #17)
> Close this BZ as a duplicate?

No. This is a downstream clone used to hold the customer ticket.
(In reply to Ala Hino from comment #17)
> Indeed, the merged patch doesn't fix the issue but better handles the
> expected exception.
>
> The "real" fix is based on handling libvirt events. Actually, this BZ is
> identical to BZ 1438850 that is targeted to 4.2.

So, this was eventually solved for 4.1.3. Fixing target release to 4.1.3 so it can be included in ET.
Verified with the following code:
-------------------------------------------
ovirt-engine-4.2.0-0.0.master.20170621095718.git8901d14.el7.centos.noarch
vdsm-4.20.1-66.git228c7be.el7.centos.x86_64

Verified with the following scenario:
-------------------------------------------
1. Create a VM with a disk, install an OS, and write data
2. Create snap1
3. Write 500M of new data
4. Create snap2
5. Write 500M of new data
6. Create snap3
7. Write 2G of new data and delete snap2 during the write operation

>>>>> The snapshot is deleted successfully, the live merge is successful, and the writes complete successfully.

Moving to VERIFIED!
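For anyone automating this verification, here is a minimal sketch of driving step 7's live merge through the oVirt Python SDK (ovirtsdk4). The engine URL, credentials, and the VM/snapshot names are placeholder assumptions, and the in-guest data writes happen separately; deleting a snapshot of a running VM is what triggers the live merge on the host.

    # Sketch only; URL, credentials, and names below are placeholders.
    import ovirtsdk4 as sdk

    connection = sdk.Connection(
        url='https://engine.example.com/ovirt-engine/api',
        username='admin@internal',
        password='secret',
        insecure=True,  # use ca_file=... in real deployments
    )
    vms_service = connection.system_service().vms_service()
    vm = vms_service.list(search='name=testvm')[0]
    snapshots_service = vms_service.vm_service(vm.id).snapshots_service()

    # Removing snap2 while the VM runs starts the live merge.
    snap2 = next(s for s in snapshots_service.list()
                 if s.description == 'snap2')
    snapshots_service.snapshot_service(snap2.id).remove()

    connection.close()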
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:1489