Bug 1215845

Summary: NPE when cloning a VM from snapshot WITHOUT "VirtIO-SCSI Enabled"
Product: Red Hat Enterprise Virtualization Manager
Reporter: Anand Nande <anande>
Component: ovirt-engine
Assignee: Amit Aviram <aaviram>
Status: CLOSED ERRATA
QA Contact: lkuchlan <lkuchlan>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 3.5.0
CC: aaviram, acanan, aefrat, amureini, anande, lsurette, michal.skrivanek, rbalakri, Rhev-m-bugs, slitmano, yeylon, ykaul, ylavi
Target Milestone: ovirt-3.6.0-rc
Keywords: ZStream
Target Release: 3.6.0
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1220282
Environment:
Last Closed: 2016-03-09 21:05:46 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1226622
Bug Blocks: 1177156, 1220282
Attachments:
Description Flags
engine log none

Comment 1 Omer Frenkel 2015-04-28 07:13:53 UTC
Allon, can someone from your team take a look?
Although it might be a result of the DB restoration, the failure is in relatively recently changed code around image processing.

Comment 2 Allon Mureinik 2015-04-28 07:29:18 UTC
Omer, yes, I agree, it definitely looks like this. Taking to storage to research. If we conclude this is not the issue, we'll move it to the appropriate team.

Amit - this seems like the code you recently changed:
2015-04-22 09:12:22,933 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (DefaultQuartzScheduler_Worker-46) Cleared all tasks of pool 283b8890-6387-4df9-b76d-65b07a22b74c.
2015-04-22 09:12:25,800 INFO  [org.ovirt.engine.core.bll.AddVmFromSnapshotCommand] (ajp-/127.0.0.1:8702-6) [73df0964] Lock Acquired to object EngineLock [exclusiveLocks= key: a21d6102-1e90-4735-88fa-4f7b0a158356 value: VM
key: R6_cportaldevl value: VM_NAME
, sharedLocks= ]
2015-04-22 09:12:26,015 ERROR [org.ovirt.engine.core.bll.AddVmFromSnapshotCommand] (ajp-/127.0.0.1:8702-6) [73df0964] Error during CanDoActionFailure.: java.lang.NullPointerException
        at org.ovirt.engine.core.bll.AddVmFromSnapshotCommand.getDestintationDomainTypeFromDisk(AddVmFromSnapshotCommand.java:113) [bll.jar:]
        at org.ovirt.engine.core.bll.AddVmFromSnapshotCommand.adjustDisksImageConfiguration(AddVmFromSnapshotCommand.java:105) [bll.jar:]

Can you take a look please?
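
For anyone triaging similar engine.log excerpts, here is a minimal sketch of splitting a log line like the ones above into its fields. The class name and the regular expression are assumptions made for illustration, inferred only from the lines quoted in this comment; this is not an official description of the engine's log format.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ovirt-engine.
final class EngineLogLineSketch {
    // Assumed layout: "<date> <time>,<millis> LEVEL [logger] (thread) message"
    private static final Pattern LINE = Pattern.compile(
        "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})\\s+(\\w+)\\s+"
        + "\\[([^\\]]+)\\]\\s+\\(([^)]+)\\)\\s+(.*)$");

    // Returns {timestamp, level, logger, thread, message}, or null when the
    // line does not match (for example, a stack-trace continuation line).
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] {
            m.group(1), m.group(2), m.group(3), m.group(4), m.group(5)
        };
    }

    public static void main(String[] args) {
        String[] fields = parse("2015-04-22 09:12:26,015 ERROR "
            + "[org.ovirt.engine.core.bll.AddVmFromSnapshotCommand] "
            + "(ajp-/127.0.0.1:8702-6) [73df0964] Error during "
            + "CanDoActionFailure.: java.lang.NullPointerException");
        System.out.println(fields[1] + " from " + fields[2]);
    }
}
```

Filtering on level ERROR and then reading the unmatched lines that follow is one way to pull out the stack trace that belongs to a failure.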

Comment 3 Eyal Edri 2015-04-28 11:23:39 UTC
Moving to 3.5.4 due to capacity planning for 3.5.3.
If you believe this should remain in 3.5.3, please sync with PM/dev/QE and get a full triple ack for it. Also, ensure the priority is set according to the bug status.

Comment 4 Amit Aviram 2015-04-28 11:29:29 UTC
Hi, the scenario does not reproduce in our environment; cloning from the snapshot works for us.
Can you please supply the pgdump of your environment so we can check whether the DB is corrupted?

Thanks.

Comment 5 Amit Aviram 2015-04-28 11:33:57 UTC
Sorry, changed the target release by mistake.

Comment 6 sefi litmanovich 2015-05-03 11:09:47 UTC
Hey guys,

I hit the same bug in my environment (vt 13.4), which did not involve any DB restore at all.
Furthermore, the bug reproduces only when trying to clone a VM from a snapshot without saved memory. When I created a live snapshot with memory, I was able to clone from that snapshot.
I can always reproduce this, so let me know if you want to see it live.
I'll attach engine.log with both scenarios.

Comment 7 sefi litmanovich 2015-05-03 11:11:36 UTC
Created attachment 1021308 [details]
engine log

Comment 8 Aharon Canan 2015-05-04 11:38:43 UTC
Is this a dup of https://bugzilla.redhat.com/show_bug.cgi?id=1201268 ?

Comment 9 Allon Mureinik 2015-05-04 15:06:33 UTC
(In reply to Aharon Canan from comment #8)
> Is this a dup of https://bugzilla.redhat.com/show_bug.cgi?id=1201268 ?

No.
In bug 1201268 VDSM attempts to execute a qemu-img convert operation, and fails (possibly a dup of bug 1209034, pending confirmation).

This bug is about an NPE in the engine's logic of calculating what disk should be copied where, before reaching VDSM.

Comment 11 Amit Aviram 2015-05-10 07:31:55 UTC
After probing the issue a bit, we found a way to reproduce it:

In the "Clone" dialog, in the "Resource Allocation" tab, under "Disks Allocation:",
"VirtIO-SCSI Enabled" should NOT be checked. Currently this causes an NPE in the master branch as well.

Aharon, verification should use this scenario, please ack.
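
This report does not state the root cause of the NPE. Purely as an illustration of the class of failure discussed here (a lookup that returns null while the engine works out which disk goes where, followed by a dereference), here is a minimal sketch. Every name in it is hypothetical and none of it is taken from ovirt-engine.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only; these types do not exist in ovirt-engine.
final class DestinationLookupSketch {
    enum StorageType { NFS, ISCSI }

    static final class Domain {
        final StorageType type;

        Domain(StorageType type) {
            this.type = type;
        }
    }

    // Unsafe: Map.get returns null for an unmapped disk, so reading .type
    // throws NullPointerException.
    static StorageType unsafeType(Map<String, Domain> destinationByDisk, String diskId) {
        return destinationByDisk.get(diskId).type;
    }

    // Defensive variant: a missing mapping becomes an empty Optional that the
    // caller has to handle explicitly.
    static Optional<StorageType> safeType(Map<String, Domain> destinationByDisk, String diskId) {
        return Optional.ofNullable(destinationByDisk.get(diskId)).map(d -> d.type);
    }

    public static void main(String[] args) {
        Map<String, Domain> destinationByDisk = new HashMap<>();
        destinationByDisk.put("disk-a", new Domain(StorageType.NFS));

        System.out.println(safeType(destinationByDisk, "disk-a").isPresent());
        System.out.println(safeType(destinationByDisk, "disk-b").isPresent());
    }
}
```

With a variant like safeType, a validation step could turn the missing mapping into a normal validation failure message instead of an exception.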

Comment 13 Allon Mureinik 2015-05-19 12:55:35 UTC
*** Bug 1222717 has been marked as a duplicate of this bug. ***

Comment 15 lkuchlan 2015-05-31 08:35:49 UTC
Blocked from testing
https://bugzilla.redhat.com/show_bug.cgi?id=1226622

Comment 16 Michal Skrivanek 2015-06-05 13:44:36 UTC
lkuchlan, "Depends On X" means that in order to test *this* bug you need the bug X fixed first. "Blocks" is the opposite
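
To make the relationship concrete, here is a small sketch that stores only "Depends On" edges and derives "Blocks" as the inverse. The class and method names are hypothetical and this is not how Bugzilla itself is implemented; the bug numbers are the ones listed in this report's header.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the two fields, for illustration only.
final class BugDependencySketch {
    // bug -> bugs that must be fixed before it can be tested
    private final Map<Integer, Set<Integer>> dependsOn = new HashMap<>();

    void addDependsOn(int bug, int prerequisite) {
        dependsOn.computeIfAbsent(bug, k -> new TreeSet<>()).add(prerequisite);
    }

    Set<Integer> dependsOn(int bug) {
        return dependsOn.getOrDefault(bug, Collections.emptySet());
    }

    // "Blocks" is the inverse relation: every bug that lists this one as a
    // prerequisite.
    Set<Integer> blocks(int bug) {
        Set<Integer> blocked = new TreeSet<>();
        for (Map.Entry<Integer, Set<Integer>> entry : dependsOn.entrySet()) {
            if (entry.getValue().contains(bug)) {
                blocked.add(entry.getKey());
            }
        }
        return blocked;
    }

    public static void main(String[] args) {
        BugDependencySketch bugs = new BugDependencySketch();
        bugs.addDependsOn(1215845, 1226622); // this bug waits on 1226622
        bugs.addDependsOn(1177156, 1215845); // these two wait on this bug
        bugs.addDependsOn(1220282, 1215845);

        System.out.println("Depends On: " + bugs.dependsOn(1215845));
        System.out.println("Blocks: " + bugs.blocks(1215845));
    }
}
```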

Comment 17 lkuchlan 2015-06-07 11:39:44 UTC
Tested using:
ovirt-engine-3.6.0-0.0.master.20150519172219.git9a2e2b3.el6.noarch
vdsm-4.17.0-822.git9b11a18.el7.noarch

Verification instructions:
1. Clone a VM from a snapshot

Results:
Clone VM from a snapshot only works while the VM is NOT running

Comment 18 Amit Aviram 2015-06-15 11:45:40 UTC
Still looking into Anand's remark; apparently the bug occurs in other cases as well. Maybe it is worth opening a new bug for it.

Comment 19 Allon Mureinik 2015-06-15 12:16:23 UTC
(In reply to Amit Aviram from comment #18)
> Still looking into Anand's remark; apparently the bug occurs in other cases as
> well. Maybe it is worth opening a new bug for it.
We had one buggy flow that we know is fixed (you fixed it and Liron K verified it).
If there's an additional issue, please open a separate bug for it.

Comment 22 errata-xmlrpc 2016-03-09 21:05:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0376.html