+++ This bug is an upstream to downstream clone. The original bug is: +++
+++   bug 1376580 +++
======================================================================

Description of problem:

A live merge of a disk snapshot is initiated on a running VM. The task aborts, but the engine still sees the job as running.

Version-Release number of selected component (if applicable):

oVirt 3.6.7
VDSM 4.17.28

How reproducible:

Don't know

Steps to Reproduce:
1. Start a snapshot live merge on a running VM

Actual results:

The job fails, but the task stays active:

Thread-15824378::ERROR::2016-09-15 11:17:36,599::utils::739::root::(wrapper) Unhandled exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 736, in wrapper
    return f(*a, **kw)
  File "/usr/share/vdsm/virt/vm.py", line 5266, in run
    self.tryPivot()
  File "/usr/share/vdsm/virt/vm.py", line 5235, in tryPivot
    ret = self.vm._dom.blockJobAbort(self.drive.name, flags)
  File "/usr/share/vdsm/virt/virdomain.py", line 68, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 124, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 1313, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 733, in blockJobAbort
    if ret == -1: raise libvirtError ('virDomainBlockJobAbort() failed', dom=self)
libvirtError: block copy still active: disk 'vda' not ready for pivot yet
(logged in German in the original: "Block Kopieren ist immer noch aktiv: Disk 'vda' noch nicht bereit für Pivot")

Expected results:

The snapshot merge should finish successfully. If the error is correct, the engine should detect the error state and cancel the running task.

Additional info:

Log files and screenshots attached

(Originally by mst)
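For readers unfamiliar with the flow: VDSM completes a live block job by calling blockJobAbort() with the pivot flag, and libvirt rejects the pivot if the job has not yet reached its synchronized phase, which is the error above. Below is a minimal sketch of that step using the public libvirt-python API. It is illustrative only, not VDSM's actual code, and the cur == end polling heuristic it shows is exactly the kind of guesswork that the fix discussed later replaces with events:

    # Sketch only, not VDSM code: poll the block job and pivot once it converges.
    import time
    import libvirt

    def pivot_when_ready(dom, disk='vda', timeout=60):
        """Pivot the block job on `disk` once libvirt reports cur == end.

        Calling blockJobAbort() with VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT before
        the job is synchronized raises the "not ready for pivot yet" error
        seen in this bug.
        """
        deadline = time.time() + timeout
        while time.time() < deadline:
            info = dom.blockJobInfo(disk, 0)  # empty dict when no job is running
            if info and info['cur'] == info['end']:
                # The job has converged; pivot switches the VM to the new image.
                dom.blockJobAbort(disk, libvirt.VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)
                return True
            time.sleep(1)
        return False

Note that cur == end is inherently racy under sustained guest writes: the job can fall out of the ready state between the check and the pivot, which matches the high-I/O reports in the comments below.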
Created attachment 1201372 [details] Engine Log (Originally by mst)
Created attachment 1201373 [details] VDSM Log (Originally by mst)
Created attachment 1201374 [details] Task State (Originally by mst)
Yesterday the error occurred once again, this time on a VM with 6 disks, with a disk snapshot that contained 4 disks. Result: the task keeps running, but VDSM shows the abort traceback. Very interesting for us: after the live merge was initiated, qemu started some kind of heavy disk I/O for more than 6 hours. So we assume that something was doing merge operations, but we do not know what was really going on. See screenshot attached. (Originally by mst)
Created attachment 1210227 [details] io during failing live merge (Originally by mst)
VDSM log of second error attached (Originally by mst)
Created attachment 1210247 [details] vdsm log 2nd error (Originally by mst)
I can confirm the same traceback with a customer:

  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 372, in wrapper
    return f(*a, **kw)
  File "/usr/share/vdsm/virt/vm.py", line 4924, in run
    self.tryPivot()
  File "/usr/share/vdsm/virt/vm.py", line 4893, in tryPivot
    ret = self.vm._dom.blockJobAbort(self.drive.name, flags)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 69, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 123, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 917, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 733, in blockJobAbort
    if ret == -1: raise libvirtError ('virDomainBlockJobAbort() failed', dom=self)
libvirtError: block copy still active: disk 'vda' not ready for pivot yet

A VM with high disk I/O (hosting MySQL and Nagios) is not able to live-migrate disks; the job fails. (Originally by Jaroslav Spanko)
*** Bug 1406851 has been marked as a duplicate of this bug. *** (Originally by Ala Hino)
4.0.6 has been the last oVirt 4.0 release, please re-target this bug. (Originally by Sandro Bonazzola)
*** Bug 1419767 has been marked as a duplicate of this bug. *** (Originally by Tal Nisan)
Ala, the attached patch is merged to the 4.1 branch, but it doesn't seem to solve anything (it just handles the erroneous situation better). Is the "real" fix on track for 4.1.2?
Indeed, the merged patch doesn't fix the issue but better handles the expected exception.

The "real" fix is based on handling libvirt events. Actually, this BZ is identical to BZ 1438850 that is targeted to 4.2.

Close this BZ as a duplicate?
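For reference, a hedged sketch of the event-based approach the "real" fix refers to (illustrative only, not the actual VDSM patch): instead of polling and guessing when the job is ready, register for libvirt's block-job events and pivot only when the READY event arrives.

    # Sketch only: react to libvirt's BLOCK_JOB READY event instead of polling.
    import libvirt

    def on_block_job(conn, dom, disk, job_type, status, opaque):
        # Fires whenever a block job changes state; READY means pivot is safe.
        if status == libvirt.VIR_DOMAIN_BLOCK_JOB_READY:
            dom.blockJobAbort(disk, libvirt.VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)

    libvirt.virEventRegisterDefaultImpl()
    conn = libvirt.open('qemu:///system')
    conn.domainEventRegisterAny(
        None,                                     # all domains
        libvirt.VIR_DOMAIN_EVENT_ID_BLOCK_JOB_2,  # reports the disk target name
        on_block_job,
        None,
    )
    while True:
        libvirt.virEventRunDefaultImpl()          # dispatch pending events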
(In reply to Ala Hino from comment #17)
> Close this BZ as a duplicate?

No. This is a downstream clone used to hold the customer ticket.
(In reply to Ala Hino from comment #17)
> Indeed, the merged patch doesn't fix the issue but better handles the
> expected exception.
>
> The "real" fix is based on handling libvirt events. Actually, this BZ is
> identical to BZ 1438850 that is targeted to 4.2.

So, this was eventually solved for 4.1.3. Fixing target release to 4.1.3 so it can be included in ET.
Verified with the following code:
-------------------------------------------
ovirt-engine-4.2.0-0.0.master.20170621095718.git8901d14.el7.centos.noarch
vdsm-4.20.1-66.git228c7be.el7.centos.x86_64

Verified with the following scenario:
-------------------------------------------
1. Create a VM with a disk, install an OS, and write data
2. Create snap1
3. Write 500M of new data
4. Create snap2
5. Write 500M of new data
6. Create snap3
7. Write 2G of new data and delete snap2 during the write operation

>>>>> The snapshot is deleted successfully, the live merge is successful, and the writes complete successfully.

Moving to VERIFIED!
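For anyone automating this verification, here is a minimal sketch of driving step 7's live merge through the oVirt Python SDK (ovirtsdk4). The engine URL, credentials, and the VM/snapshot names are placeholder assumptions, and the in-guest data writes happen separately; deleting a snapshot of a running VM is what triggers the live merge on the host.

    # Sketch only; URL, credentials, and names below are placeholders.
    import ovirtsdk4 as sdk

    connection = sdk.Connection(
        url='https://engine.example.com/ovirt-engine/api',
        username='admin@internal',
        password='secret',
        insecure=True,  # use ca_file=... in real deployments
    )
    vms_service = connection.system_service().vms_service()
    vm = vms_service.list(search='name=testvm')[0]
    snapshots_service = vms_service.vm_service(vm.id).snapshots_service()

    # Removing snap2 while the VM runs starts the live merge.
    snap2 = next(s for s in snapshots_service.list()
                 if s.description == 'snap2')
    snapshots_service.snapshot_service(snap2.id).remove()

    connection.close()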
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:1489