Bug 1447437

Summary:

"No such drive" error after pivot for one disk causes the engine to fail live merge for other disks

Product:

Red Hat Enterprise Virtualization Manager

Reporter:

Gordon Watson <gwatson>

Component:

ovirt-engine

Assignee:

Ala Hino <ahino>

Status:

CLOSED CURRENTRELEASE

QA Contact:

Kevin Alon Goldblatt <kgoldbla>

Severity:

high

Docs Contact:

Priority:

unspecified

Version:

4.0.7

CC:

ahino, amureini, gwatson, kgoldbla, kshukla, lsurette, ratamir, rbalakri, Rhev-m-bugs, srevivo, tnisan, ykaul, ylavi

Target Milestone:

ovirt-4.2.0

Keywords:

ZStream

Target Release:

---

Hardware:

Unspecified

OS:

Linux

Whiteboard:

Fixed In Version:

Doc Type:

If docs needed, set a value

Doc Text:

Story Points:

---

Clone Of:

Clones:

1456437 (view as bug list)

Environment:

Last Closed:

2017-06-25 09:05:08 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

Storage

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

1438850

Bug Blocks:

1456437

Attachments:

Description	Flags
server, engine, vdsm logs	none

Description Gordon Watson 2017-05-02 19:21:21 UTC

Description of problem:

During a live merge, after the pivot had completed for one of four disks, a LookupError ("No such drive") occurred while checking the block job, yet the merge for that disk completed successfully. However, the engine failed the merges for two of the other disks with "No such drive", even though the error referred to a different disk. These two merges completed successfully on the host. As a result, the original leaf volumes for these two disks were not removed.


Version-Release number of selected component (if applicable):

RHV 4.0.7
RHVH 4.0-7.1
   vdsm-4.18.24-3.el7
   libvirt-2.0.0-10.el7_3.5 


How reproducible:

Not reproducible.


Steps to Reproduce:
1.
2.
3.

Actual results:

Two "No such drive" LookupErrors occurred while checking a block job after the pivot had completed, causing the engine to "fail" merges for two other disks.


Expected results:

Even if LookupError occurs while checking the block job for one disk, the engine shouldn't fail live merges of other disks.
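
As a rough illustration of the expected behaviour, the sketch below attributes a merge error only to the disk that reported it. This is not the engine's code: `MergeResult`, `settle_merges`, the status strings, and the disk names other than 'vda' are hypothetical, assumed only for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MergeResult:
    """Hypothetical per-disk outcome as reported by the host."""
    disk: str
    error: Optional[str] = None  # e.g. "No such drive"


def settle_merges(results: List[MergeResult]) -> Dict[str, str]:
    """Return one status per disk, failing only disks that reported an error."""
    statuses: Dict[str, str] = {}
    for result in results:
        # An error is attributed to the disk that reported it and to no other.
        statuses[result.disk] = "FAILED" if result.error else "SUCCEEDED"
    return statuses


# Illustrative input: one disk hits the lookup error, the others do not.
results = [
    MergeResult("vda", error="No such drive"),
    MergeResult("vdb"),
    MergeResult("vdc"),
]
print(settle_merges(results))
```

The point of the sketch is only that the outcome for each disk is derived from that disk's own result, so an error on one disk cannot mark the others as failed.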

Additional info:

Comment 18 Kevin Alon Goldblatt 2017-06-20 12:58:12 UTC

Verified with the following code:
-------------------------------------------------------------
ovirt-engine-4.2.0-0.0.master.20170612192318.gitf773263.el7.centos.noarch
vdsm-4.20.0-1049.gite64d80e.el7.centos.x86_64

Verified with the following scenario:
------------------------------------------------------------
1. Create VM2 with 5 disks, thin and preallocated, block and nfs
2. Create 3 snapshots: snapper1, snapper2, snapper3
3. Start the VM2
4. Delete snap14_2; a few seconds later, restart vdsm on the host. The delete fails
5. Try deleting the same snapshot snap14_2 >>>>> Live Merge fails and snapshot remains locked after vdsm restarts

Actual Result:
----------------------
Live Merge fails and snapshot remains locked after vdsm restarts

Expected Result:
----------------------
Once VDSM restarts, the snapshot should not be locked and should allow another delete operation

Moving to ASSIGNED

Comment 19 Kevin Alon Goldblatt 2017-06-20 13:00:40 UTC

(In reply to Kevin Alon Goldblatt from comment #18)
> Verified with the following code:
Correction! The scenario was as follows:

> -------------------------------------------------------------
> ovirt-engine-4.2.0-0.0.master.20170612192318.gitf773263.el7.centos.noarch
> vdsm-4.20.0-1049.gite64d80e.el7.centos.x86_64
> 
> Verified with the following scenario:
> ------------------------------------------------------------
> 1. Create VM2 with 5 disks, thin and preallocated, block and nfs
> 2. Create 3 snapshots: snapper1, snapper2, snapper3
> 3. Start the VM2
> 4. Delete snapper2; a few seconds later, restart vdsm on the host. The
> delete fails
> 5. Try deleting the same snapshot snapper2 >>>>> Live Merge fails and snapshot
> remains locked after vdsm restarts
> 
> Actual Result:
> ----------------------
> Live Merge fails and snapshot remains locked after vdsm restarts
> 
> Expected Result:
> ----------------------
> Once VDSM restarts, the snapshot should not be locked and should allow another
> delete operation
> 
> Moving to ASSIGNED

Comment 20 Kevin Alon Goldblatt 2017-06-20 13:01:38 UTC

Created attachment 1289641 [details]
server, engine, vdsm logs

adding logs

Comment 21 Kevin Alon Goldblatt 2017-06-20 13:27:23 UTC

Engine.log:
-----------------
2017-06-20 15:28:24,321+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-14) [] EVENT_ID: USER_REMOVE_SNAPSHOT(342), Snapshot 'snapper2' deletion for VM 'vm2' was initiated by admin@internal-authz.
2017-06-20 15:28:24,424+03 INFO  [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler7) [194425b9-62e1-4e8f-8283-61b991d62b7d] Executing Live Merge command step 'EXTEND'
2017-06-20 15:28:24,544+03 INFO  [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler7) [194425b9-62e1-4e8f-8283-61b991d62b7d] Executing Live Merge command step 'EXTEND'
2017-06-20 15:28:24,594+03 INFO  [org.ovirt.engine.core.bll.MergeExtendCommand] (pool-5-thread-6) [194425b9-62e1-4e8f-8283-61b991d62b7d] Running command: MergeExtendCommand internal: true. Entities affected :  ID: ab217695-7e38-4684-90c7-9c356b941caa Type: Storage
2017-06-20 15:28:24,595+03 INFO  [org.ovirt.engine.core.bll.MergeExtendCommand] (pool-5-thread-6) [194425b9-62e1-4e8f-8283-61b991d62b7d] Base and top image sizes are the same; no image size update required
2017-06-20 15:28:24,683+03 INFO  [org.ovirt.engine.core.bll.MergeExtendCommand] (pool-5-thread-7) [194425b9-62e1-4e8f-8283-61b991d62b7d] Running command: MergeExtendCommand internal: true. Entities affected :  ID: ab217695-7e38-4684-90c7-9c356b941caa Type: Storage
2017-06-20 15:28:24,683+03 INFO  [org.ovirt.engine.core.bll.MergeExtendCommand] (pool-5-thread-7) [194425b9-62e1-4e8f-8283-61b991d62b7d] Base and top image sizes are the same; no image size update required
2017-06-20 15:28:24,687+03 INFO  [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler7) [194425b9-62e1-4e8f-8283-61b991d62b7d] Executing Live Merge command step 'EXTEND'
2017-06-20 15:28:24,773+03 INFO  [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler7) [194425b9-62e1-4e8f-8283-61b991d62b7d] Executing Live Merge command step 'EXTEND'
.
RESTARTED VDSM HERE!!!
.
2017-06-20 15:28:56,738+03 INFO  [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback] (DefaultQuartzScheduler3) [194425b9-62e1-4e8f-8283-61b991d62b7d] Command 'RemoveSnapshotSingleDiskLive' (id: 'a85586e7-7c19-4b58-b49b-dd01d9750f0f') waiting on child command id: '0621d23d-3727-4342-a93a-7ad877d976a6' type:'MergeStatus' to complete


VDSM.LOG:
--------------------------
2017-06-20 15:28:26,734+0300 INFO  (jsonrpc/7) [virt.vm] (vmId='f6737775-556c-41e1-a28d-57bf4aa68870') Starting merge with jobUUID=u'fbe8a053-d8dc-4513-bef1-c772572daf3d', original chain=f19841f6-54ba-4cfd-9a20-af63194f4f70 < e4825cca-0ad1-4f5f-9eb3-401ccce36259 < b0980700-ecc2-456d-989e-6f6c24c949f9 < d1fb4a1e-57e6-4088-b1ff-40fab92f1209 < e98beb7e-211d-4e02-b97b-7a7c4c62d2a9 (top), disk='sda', base='sda[3]', top='sda[2]', bandwidth=0, flags=8 (vm:5086)
.
.
.

Comment 22 Ala Hino 2017-06-20 14:43:36 UTC

Kevin,

The original bug describes a use case where in the engine log we see:

2017-04-25 19:58:21,515 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-5-thread-2) [53366d34] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand' return value 'StatusOnlyReturnForXmlRpc [status=StatusForXmlRpc [code=-32603, message=No such drive: '{'domainID': u'02a218cc-de64-4911-8d98-4128f6937c02', 'imageID': u'e6582210-11d1-4ddd-b9d2-

That might be caused by the following error seen in the Vdsm log:

libvirtError: block copy still active: disk 'vda' not ready for pivot yet

In the logs you attached, I see neither of these errors.

We may be hitting a different issue and if so, let's open a dedicated bug for it and provide a detailed and minimal set of steps to reproduce.
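
For anyone re-checking attached logs against the two error signatures quoted above, a small helper along these lines may be enough. It is only a sketch: `SIGNATURES` and `find_signatures` are hypothetical names, and the sample lines are copied from this bug rather than read from a real log file.

```python
# The two error strings quoted in this comment; matching is a plain substring test.
SIGNATURES = ("No such drive", "not ready for pivot yet")


def find_signatures(lines):
    """Map each signature to the 1-based numbers of the lines that contain it."""
    hits = {signature: [] for signature in SIGNATURES}
    for number, line in enumerate(lines, start=1):
        for signature in SIGNATURES:
            if signature in line:
                hits[signature].append(number)
    return hits


# Sample lines taken from this bug, used here in place of a log file.
sample = [
    "libvirtError: block copy still active: disk 'vda' not ready for pivot yet",
    "Base and top image sizes are the same; no image size update required",
]
print(find_signatures(sample))
```

To scan a real file, the lines of an opened engine.log or vdsm.log can be passed in place of `sample`.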

Comment 23 Ala Hino 2017-06-25 09:05:08 UTC

As this bug depends on BZ 1438850, which was already modified, and as we believe the root cause of the two BZs is the same (LiveMerge fails with libvirtError: Block copy still active. Disk not ready for pivot), I am closing this bug.

Comment 27 Ala Hino 2017-09-12 10:39:16 UTC

No additional info is needed here