Bug 1456437

Summary: [downstream clone - 4.1.3] "No such drive" error after pivot for one disk causes the engine to fail live merge for other disks
Product: Red Hat Enterprise Virtualization Manager Reporter: rhev-integ
Component: vdsm Assignee: Ala Hino <ahino>
Status: CLOSED CURRENTRELEASE QA Contact: Kevin Alon Goldblatt <kgoldbla>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.0.7 CC: ahino, amureini, bazulay, danken, gwatson, kgoldbla, lsurette, ratamir, rbalakri, Rhev-m-bugs, srevivo, tnisan, ycui, ykaul, ylavi
Target Milestone: ovirt-4.1.3 Keywords: ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1447437 Environment:
Last Closed: 2017-06-25 09:00:30 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1447437    
Bug Blocks:    
Attachments:
Description Flags
server, engine, vdsm logs none

Description rhev-integ 2017-05-29 12:04:29 UTC
+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1447437 +++
======================================================================

Description of problem:

During a live merge of four disks, after the pivot had completed for one disk, a LookupError ("No such drive") occurred while checking that disk's block job; the merge for that disk nevertheless completed successfully. However, the engine failed the merges for two of the other disks with "No such drive", even though the error referred to a different disk. Both of these merges completed successfully on the host. As a result, the original leaf volumes for these two disks were not removed.


Version-Release number of selected component (if applicable):

RHV 4.0.7
RHVH 4.0-7.1
   vdsm-4.18.24-3.el7
   libvirt-2.0.0-10.el7_3.5 


How reproducible:

Not.


Steps to Reproduce:
1.
2.
3.

Actual results:

Two "No such drive" LookupErrors occurred while checking a block job after the pivot had completed, causing the engine to "fail" merges for two other disks.


Expected results:

Even if LookupError occurs while checking the block job for one disk, the engine shouldn't fail live merges of other disks.
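
The expected behavior above can be illustrated with a minimal Python sketch. This is not vdsm or ovirt-engine code: check_block_jobs, fake_query, and the disk names are hypothetical and exist only to show a LookupError for one disk being recorded against that disk alone.

```python
# Hypothetical sketch: none of these names come from vdsm or ovirt-engine.

def check_block_jobs(disks, query_job):
    """Query each disk's block job independently.

    A LookupError raised for one disk is recorded against that disk only,
    so it cannot mark the merges of the other disks as failed.
    """
    results = {}
    for disk in disks:
        try:
            results[disk] = ("ok", query_job(disk))
        except LookupError as e:
            results[disk] = ("error", str(e))
    return results


def fake_query(disk):
    # Stand-in for a real block-job query (assumption). 'vda' simulates
    # a drive that can no longer be looked up after its pivot completed.
    if disk == "vda":
        raise LookupError("No such drive: %s" % disk)
    return "merge in progress"


statuses = check_block_jobs(["vda", "vdb", "vdc"], fake_query)
for disk, status in sorted(statuses.items()):
    print(disk, status)
```

In this sketch only vda carries the error; vdb and vdc keep their own status.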

Additional info:

(Originally by Gordon Watson)

Comment 21 Kevin Alon Goldblatt 2017-06-15 11:31:27 UTC
Verified with the following code:
-------------------------------------------------------------
ovirt-engine-4.1.3-0.1.el7.noarch
rhevm-4.1.3-0.1.el7.noarch
vdsm-4.19.16-1.el7ev.x86_64

Verified with the following scenario:
------------------------------------------------------------
1. Create VM_14 with 5 disks: thin and preallocated, block and NFS
2. Create 3 snapshots: snap14_1, snap14_2, snap14_3
3. Start VM_14
4. Delete snap14_2; a few seconds later, restart vdsm on the host. The delete fails
5. Try deleting the same snapshot snap14_2 >>>>> fails with "General command validation fail"

Moving to AMEND

Comment 22 Kevin Alon Goldblatt 2017-06-15 11:32:57 UTC
Verified with the following code:
-------------------------------------------------------------
ovirt-engine-4.1.3-0.1.el7.noarch
rhevm-4.1.3-0.1.el7.noarch
vdsm-4.19.16-1.el7ev.x86_64

Verified with the following scenario:
------------------------------------------------------------
1. Create VM_14 with 5 disks: thin and preallocated, block and NFS
2. Create 3 snapshots: snap14_1, snap14_2, snap14_3
3. Start VM_14
4. Delete snap14_2; a few seconds later, restart vdsm on the host. The delete fails
5. Try deleting the same snapshot snap14_2 >>>>> fails with "General command validation fail"

Moving to ASSIGNED

Comment 24 Allon Mureinik 2017-06-15 12:20:20 UTC
(In reply to Kevin Alon Goldblatt from comment #22)
> Verified with the following code:
> -------------------------------------------------------------
> ovirt-engine-4.1.3-0.1.el7.noarch
> rhevm-4.1.3-0.1.el7.noarch
> vdsm-4.19.16-1.el7ev.x86_64
> 
> Verified with the the following scenario:
> ------------------------------------------------------------
> 1. Create VM_14 with 5 disks , thin and preallocated, block and nfs
> 2. Create 3 snapshots, snap14_1, snap14_2, snap14_3
> 3. Start the VM_14
> 4. Delete snap14_2 a few seconds later restart the vdsm on the host . The
> delete fails
> 5. Try deleting same snapshot snap14_2 >>>>> fails with "General command
> validation fail"
> 
> Moving to ASSIGNED

Logs?

Comment 25 Kevin Alon Goldblatt 2017-06-20 12:33:31 UTC
Created attachment 1289638
server, engine, vdsm logs

Logs added

Comment 26 Ala Hino 2017-06-21 13:44:33 UTC
Kevin,

The original bug describes a use case where in the engine log we see:

2017-04-25 19:58:21,515 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-5-thread-2) [53366d34] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand' return value 'StatusOnlyReturnForXmlRpc [status=StatusForXmlRpc [code=-32603, message=No such drive: '{'domainID': u'02a218cc-de64-4911-8d98-4128f6937c02', 'imageID': u'e6582210-11d1-4ddd-b9d2-

That might be caused by the following error seen in the Vdsm log:

libvirtError: block copy still active: disk 'vda' not ready for pivot yet

In the logs you attached, I see neither of these errors.

We may be hitting a different issue. If so, let's open a dedicated bug for it and provide a detailed, minimal set of steps to reproduce.
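
The log check described in this comment (looking for either error signature) can be sketched in Python as follows. This is only an illustration: find_signatures and the sample lines are hypothetical, and the two search strings are substrings taken from the errors quoted above.

```python
# Hypothetical helper, not part of vdsm or ovirt-engine.
SIGNATURES = (
    "No such drive",            # from the engine log message quoted above
    "not ready for pivot yet",  # from the libvirtError quoted above
)


def find_signatures(lines, signatures=SIGNATURES):
    """Return {signature: [matching lines]} for the given log lines."""
    hits = {sig: [] for sig in signatures}
    for line in lines:
        for sig in signatures:
            if sig in line:
                hits[sig].append(line.rstrip("\n"))
    return hits


# Sample input for illustration only. For real logs, pass an open file
# object instead; the paths depend on the installation.
sample = [
    "libvirtError: block copy still active: disk 'vda' not ready for pivot yet\n",
    "INFO unrelated line\n",
]
hits = find_signatures(sample)
for sig, matched in hits.items():
    print("%s: %d match(es)" % (sig, len(matched)))
```

If both lists come back empty for the attached logs, the run most likely hit a different issue from the one in the original report.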

Comment 27 Kevin Alon Goldblatt 2017-06-22 16:55:53 UTC
(In reply to Ala Hino from comment #26)
> Kevin,
> 
> The original bug describes a use case where in the engine log we see:
> 
> 2017-04-25 19:58:21,515 INFO 
> [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand]
> (pool-5-thread-2) [53366d34] Command
> 'org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand' return value
> 'StatusOnlyReturnForXmlRpc [status=StatusForXmlRpc [code=-32603, message=No
> such drive: '{'domainID': u'02a218cc-de64-4911-8d98-4128f6937c02',
> 'imageID': u'e6582210-11d1-4ddd-b9d2-
> 
> That might be caused due to the following error seen at Vdsm log:
> 
> libvirtError: block copy still active: disk 'vda' not ready for pivot yet
> 
> In the logs you attached, I see neither of these errors.
> 
> We may be hitting a different issue and if so, let's open a dedicated bug
> for it and provide a detailed and minimal set of steps to reproduce.

Submitted new bug https://bugzilla.redhat.com/show_bug.cgi?id=1464214

Comment 28 Ala Hino 2017-06-25 09:00:30 UTC
This bug depends on BZ 1438850, which was already modified, and we believe the root cause of the two BZs is the same (LiveMerge fails with libvirtError: Block copy still active. Disk not ready for pivot). I am therefore closing this bug.