1456437 – [downstream clone - 4.1.3] "No such drive" error after pivot for one disk causes the engine to fail live merge for other disks

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Enterprise Virtualization Manager
Classification:	Red Hat
Component:	vdsm
Sub Component:
Version:	4.0.7
Hardware:	Unspecified
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	ovirt-4.1.3
Target Release:	---
Assignee:	Ala Hino
QA Contact:	Kevin Alon Goldblatt
Docs Contact:
URL:
Whiteboard:
Depends On:	1447437
Blocks:

Reported:	2017-05-29 12:04 UTC by rhev-integ
Modified:	2021-06-10 12:25 UTC
CC List:	15 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:	1447437
Environment:
Last Closed:	2017-06-25 09:00:30 UTC
oVirt Team:	Storage
Target Upstream Version:
Embargoed:

Attachments
server, engine, vdsm logs (540.71 KB, application/x-gzip) 2017-06-20 12:33 UTC, Kevin Alon Goldblatt

Links
System	ID	Priority	Status	Summary	Last Updated
Red Hat Knowledge Base (Solution)	3021651	None	None	None	2017-05-29 12:05:42 UTC
Red Hat Product Errata	RHEA-2017:1696	normal	SHIPPED_LIVE	VDSM bug fix and enhancement update 4.1.3	2017-07-06 11:25:09 UTC
oVirt gerrit	76645	None	MERGED	vm: Detect when a block job is ready	2020-08-04 01:09:42 UTC

Description rhev-integ 2017-05-29 12:04:29 UTC

+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1447437 +++
======================================================================

Description of problem:

During a live merge of four disks, after the pivot had completed for one disk, a LookupError ("No such drive") occurred while checking that disk's block job; the merge for that disk nevertheless completed successfully. However, the engine failed the merges for two of the other disks with "No such drive", even though the error referred to a different disk. Both of these merges completed successfully on the host. As a result, the original leaf volumes for these two disks were not removed.


Version-Release number of selected component (if applicable):

RHV 4.0.7
RHVH 4.0-7.1
   vdsm-4.18.24-3.el7
   libvirt-2.0.0-10.el7_3.5 


How reproducible:

Not reproducible.


Steps to Reproduce:
1.
2.
3.

Actual results:

Two "No such drive" LookupErrors occurred while checking a block job after the pivot had completed, causing the engine to "fail" merges for two other disks.


Expected results:

Even if LookupError occurs while checking the block job for one disk, the engine shouldn't fail live merges of other disks.
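
The expected behaviour can be illustrated with a minimal Python sketch. It is not vdsm or engine code: `NoSuchDrive`, `check_block_jobs`, `fake_query` and the job ids are hypothetical names made up for the example. It only shows the idea that a lookup failure for one drive is recorded against that drive's job and does not affect the status reported for the other jobs.

```python
class NoSuchDrive(LookupError):
    """Hypothetical stand-in for the "No such drive" lookup failure."""


def check_block_jobs(jobs, query_job):
    """Check every tracked block job independently.

    jobs: dict mapping a job id to the drive name it belongs to.
    query_job: callable(drive) -> status string; may raise LookupError.
    Returns a dict of job id -> status. A failed lookup is recorded only
    for the job that triggered it.
    """
    results = {}
    for job_id, drive in jobs.items():
        try:
            results[job_id] = query_job(drive)
        except LookupError as e:
            # The drive may already have been pivoted away. Report this for
            # the affected job only and keep checking the remaining jobs.
            results[job_id] = "lookup-failed: %s" % e
    return results


def fake_query(drive):
    # Assumption for the example: only 'vda' can no longer be found.
    if drive == "vda":
        raise NoSuchDrive("No such drive: 'vda'")
    return "running"


statuses = check_block_jobs(
    {"job-a": "vda", "job-b": "vdb", "job-c": "vdc"}, fake_query)
```

With this structure, `statuses` carries a failure entry for `job-a` only, while `job-b` and `job-c` are still reported normally.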

Additional info:

(Originally by Gordon Watson)

Comment 21 Kevin Alon Goldblatt 2017-06-15 11:31:27 UTC

Verified with the following code:
-------------------------------------------------------------
ovirt-engine-4.1.3-0.1.el7.noarch
rhevm-4.1.3-0.1.el7.noarch
vdsm-4.19.16-1.el7ev.x86_64

Verified with the following scenario:
------------------------------------------------------------
1. Create VM_14 with 5 disks: thin and preallocated, block and nfs
2. Create 3 snapshots: snap14_1, snap14_2, snap14_3
3. Start VM_14
4. Delete snap14_2 and, a few seconds later, restart vdsm on the host. The delete fails
5. Try deleting the same snapshot, snap14_2 >>>>> fails with "General command validation fail"

Moving to AMEND

Comment 22 Kevin Alon Goldblatt 2017-06-15 11:32:57 UTC

Verified with the following code:
-------------------------------------------------------------
ovirt-engine-4.1.3-0.1.el7.noarch
rhevm-4.1.3-0.1.el7.noarch
vdsm-4.19.16-1.el7ev.x86_64

Verified with the following scenario:
------------------------------------------------------------
1. Create VM_14 with 5 disks: thin and preallocated, block and nfs
2. Create 3 snapshots: snap14_1, snap14_2, snap14_3
3. Start VM_14
4. Delete snap14_2 and, a few seconds later, restart vdsm on the host. The delete fails
5. Try deleting the same snapshot, snap14_2 >>>>> fails with "General command validation fail"

Moving to ASSIGNED

Comment 24 Allon Mureinik 2017-06-15 12:20:20 UTC

(In reply to Kevin Alon Goldblatt from comment #22)
> Verified with the following code:
> -------------------------------------------------------------
> ovirt-engine-4.1.3-0.1.el7.noarch
> rhevm-4.1.3-0.1.el7.noarch
> vdsm-4.19.16-1.el7ev.x86_64
> 
> Verified with the the following scenario:
> ------------------------------------------------------------
> 1. Create VM_14 with 5 disks , thin and preallocated, block and nfs
> 2. Create 3 snapshots, snap14_1, snap14_2, snap14_3
> 3. Start the VM_14
> 4. Delete snap14_2 a few seconds later restart the vdsm on the host . The
> delete fails
> 5. Try deleting same snapshot snap14_2 >>>>> fails with "General command
> validation fail"
> 
> Moving to ASSIGNED

Logs?

Comment 25 Kevin Alon Goldblatt 2017-06-20 12:33:31 UTC

Created attachment 1289638 [details]
server, engine, vdsm logs

Logs added

Comment 26 Ala Hino 2017-06-21 13:44:33 UTC

Kevin,

The original bug describes a use case where in the engine log we see:

2017-04-25 19:58:21,515 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-5-thread-2) [53366d34] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand' return value 'StatusOnlyReturnForXmlRpc [status=StatusForXmlRpc [code=-32603, message=No such drive: '{'domainID': u'02a218cc-de64-4911-8d98-4128f6937c02', 'imageID': u'e6582210-11d1-4ddd-b9d2-

That might be caused by the following error seen in the Vdsm log:

libvirtError: block copy still active: disk 'vda' not ready for pivot yet

In the logs you attached, I see neither of these errors.

We may be hitting a different issue. If so, let's open a dedicated bug for it and provide a detailed, minimal set of steps to reproduce.
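
For context on the "not ready for pivot yet" error: libvirt reports the state of an active block copy or active commit in the domain XML, where the disk's <mirror> element carries ready='yes' once the job can be pivoted. The sketch below is not the merged Vdsm patch. It assumes the domain XML is already available as a string, and `SAMPLE_DOMAIN_XML` and `is_ready_for_pivot` are hypothetical names used only to illustrate a readiness check before requesting the pivot.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed domain XML: vda is ready to pivot, vdb is not yet.
SAMPLE_DOMAIN_XML = """
<domain type='kvm'>
  <devices>
    <disk type='file' device='disk'>
      <source file='/path/to/top.img'/>
      <mirror type='file' job='active-commit' ready='yes'>
        <source file='/path/to/base.img'/>
      </mirror>
      <target dev='vda' bus='virtio'/>
    </disk>
    <disk type='file' device='disk'>
      <source file='/path/to/top2.img'/>
      <mirror type='file' job='active-commit'>
        <source file='/path/to/base2.img'/>
      </mirror>
      <target dev='vdb' bus='virtio'/>
    </disk>
  </devices>
</domain>
"""


def is_ready_for_pivot(domain_xml, dev):
    """Return True if the disk with target name `dev` has a ready mirror.

    Raises LookupError if no disk with that target name exists.
    """
    root = ET.fromstring(domain_xml)
    for disk in root.findall("./devices/disk"):
        target = disk.find("target")
        if target is None or target.get("dev") != dev:
            continue
        mirror = disk.find("mirror")
        # No <mirror> element, or one without ready='yes', means the job
        # cannot be pivoted yet.
        return mirror is not None and mirror.get("ready") == "yes"
    raise LookupError("No such drive: %r" % dev)
```

A caller would request the pivot only when `is_ready_for_pivot(xml, "vda")` returns True, and otherwise check again on the next monitoring cycle.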

Comment 27 Kevin Alon Goldblatt 2017-06-22 16:55:53 UTC

(In reply to Ala Hino from comment #26)
> Kevin,
> 
> The original bug describes a use case where in the engine log we see:
> 
> 2017-04-25 19:58:21,515 INFO 
> [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand]
> (pool-5-thread-2) [53366d34] Command
> 'org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand' return value
> 'StatusOnlyReturnForXmlRpc [status=StatusForXmlRpc [code=-32603, message=No
> such drive: '{'domainID': u'02a218cc-de64-4911-8d98-4128f6937c02',
> 'imageID': u'e6582210-11d1-4ddd-b9d2-
> 
> That might be caused due to the following error seen at Vdsm log:
> 
> libvirtError: block copy still active: disk 'vda' not ready for pivot yet
> 
> In the logs you attached, I see neither of these errors.
> 
> We may be hitting a different issue and if so, let's open a dedicated bug
> for it and provide a detailed and minimal set of steps to reproduce.

Submitted new bug https://bugzilla.redhat.com/show_bug.cgi?id=1464214

Comment 28 Ala Hino 2017-06-25 09:00:30 UTC

This bug depends on BZ 1438850, which has already been modified, and we believe the root cause of the two BZs is the same (LiveMerge fails with libvirtError: Block copy still active. Disk not ready for pivot), so I am closing this bug.
