Bug 1236061 - Vm becomes unusable (NPE) when restarting vdsm during snapshot creation
Summary: Vm becomes unusable (NPE) when restarting vdsm during snapshot creation
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.5.3
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-3.6.0-rc3
Target Release: 3.6.0
Assignee: Shmuel Melamud
QA Contact: Nisim Simsolo
URL:
Whiteboard:
Depends On:
Blocks: 1274717
 
Reported: 2015-06-26 12:55 UTC by Carlos Mestre González
Modified: 2016-04-20 01:36 UTC (History)
16 users

Fixed In Version: 3.6.0-11
Doc Type: Bug Fix
Doc Text:
Previously, if VDSM was restarted during VM snapshot creation, the VM could be corrupted and left unusable. This issue has been resolved: the VM is now correctly rolled back to its previous state if snapshot creation is interrupted for any reason.
Clone Of:
: 1274717 (view as bug list)
Environment:
Last Closed: 2016-04-20 01:36:44 UTC
oVirt Team: Virt
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
engine.log (141.67 KB, text/plain)
2015-06-26 12:56 UTC, Carlos Mestre González
no flags Details
vdsm.log (568.38 KB, text/plain)
2015-06-26 12:56 UTC, Carlos Mestre González
no flags Details


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 43863 0 master MERGED core: Compensation of active snapshot ID change Never
oVirt gerrit 45479 0 ovirt-engine-3.6 MERGED core: Compensation of active snapshot ID change Never
oVirt gerrit 47750 0 ovirt-engine-3.5 MERGED core: Compensation of active snapshot ID change Never

Description Carlos Mestre González 2015-06-26 12:55:49 UTC
Description of problem:
I've been testing this scenario by manually restarting VDSM at different points during snapshot creation, expecting a rollback. *Sometimes* the rollback doesn't work and leaves the VM unusable (corrupted?). Trying to start the VM returns an NPE.

Version-Release number of selected component (if applicable):
rhevm-3.5.3.1-1.4.el6ev.noarch

How reproducible:
25%

Steps to Reproduce:
1. create a vm with multiple disks of different types
2. add a snapshot to the vm (all disks)
3. when the engine.log shows the CreateAllSnapshotsFromVmCommand, restart the vdsm in the spm host

Actual results:
- Creation of the snapshots fails as expected, but the VM becomes unusable (it cannot be started and new snapshots cannot be created), for example:

2015-06-26 15:25:13,585 ERROR [org.ovirt.engine.core.bll.RunVmCommand] (ajp-/127.0.0.1:8702-1) [vms_syncAction_5a9b28e3-c382-4271] Command org.ovirt.engine.core.bll.RunVmCommand throw exception: java.lang.NullPointerException
    at org.ovirt.engine.core.bll.RunVmCommand.getMemoryFromSnapshot(RunVmCommand.java:154) [bll.jar:]


Expected results:
- Creation of the snapshots fails and the VM remains usable.

Additional info:
- Tested using NFS storage domains
- hypervisors RHEL 7.1 with:
vdsm-4.16.20-1.el7ev.x86_64
libvirt-1.2.8-16.el7_1.3.x86_64
qemu-img-rhev-2.1.2-23.el7_1.3.x86_64

Comment 1 Carlos Mestre González 2015-06-26 12:56:33 UTC
Created attachment 1043502 [details]
engine.log

Comment 2 Carlos Mestre González 2015-06-26 12:56:59 UTC
Created attachment 1043503 [details]
vdsm.log

Comment 3 Tal Nisan 2015-06-28 13:35:28 UTC
Seems that the NPE is in this line: 
cachedMemoryVolumeFromSnapshot = archSupportSnapshot && FeatureSupported.memorySnapshot(getVm().getVdsGroupCompatibilityVersion()) ?
getActiveSnapshot().getMemoryVolume() : StringUtils.EMPTY;

Thus I reckon that it's more of a virt-ish issue, Michal, can one of your guys have a look?
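[Editor's note] A minimal sketch (hypothetical names, not the actual ovirt-engine code) of the failure mode Comment 3 points at: if the interrupted snapshot operation leaves the VM without a valid active-snapshot record, `getActiveSnapshot()` returns null and the chained `.getMemoryVolume()` call throws the NPE seen in `RunVmCommand.getMemoryFromSnapshot`. A defensive variant would treat a missing active snapshot as "no memory volume":

```java
import java.util.Optional;

public class SnapshotExample {
    // Hypothetical stand-in for the engine's snapshot entity.
    static class Snapshot {
        private final String memoryVolume;
        Snapshot(String memoryVolume) { this.memoryVolume = memoryVolume; }
        String getMemoryVolume() { return memoryVolume; }
    }

    // Simulates getActiveSnapshot(): returns null when the active snapshot
    // record was lost by the interrupted snapshot creation.
    static Snapshot getActiveSnapshot(boolean corrupted) {
        return corrupted ? null : new Snapshot("mem-vol-id");
    }

    // Original pattern was effectively:
    //   memorySupported ? getActiveSnapshot().getMemoryVolume() : ""
    // which dereferences null when the record is missing.
    // Defensive variant: fall back to an empty memory volume.
    static String getMemoryFromSnapshot(boolean memorySupported, boolean corrupted) {
        if (!memorySupported) {
            return "";
        }
        return Optional.ofNullable(getActiveSnapshot(corrupted))
                .map(Snapshot::getMemoryVolume)
                .orElse("");
    }

    public static void main(String[] args) {
        System.out.println(getMemoryFromSnapshot(true, false));
        System.out.println(getMemoryFromSnapshot(true, true).isEmpty());
    }
}
```

Note that a null-check alone only suppresses the symptom; the merged fix (per the gerrit links above, "Compensation of active snapshot ID change") addresses the root cause by rolling the VM state back when the command is interrupted.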

Comment 4 Michal Skrivanek 2015-06-28 13:44:53 UTC
Tal, we'll take a look, but it seems to me the actual snapshot is not aborted/reverted correctly. We can surely fix the NPE, but it looks like the state of the VM is not correct, and that's more in your area.

Comment 5 Tal Nisan 2015-07-06 11:12:39 UTC
Any insights Michal?

Comment 6 Nisim Simsolo 2015-09-22 07:57:17 UTC
Verified: rhevm-3.6.0-0.13.master.el6
vdsm-4.17.6-1.el7ev.noarch
qemu-kvm-rhev-2.3.0-22.el7.x86_64
sanlock-3.2.4-1.el7.x86_64
libvirt-client-1.2.17-5.el7.x86_64

Scenario:
1. create a vm with multiple disks of different types
2. add a snapshot to the vm (all disks)
3. when the engine.log shows the CreateAllSnapshotsFromVmCommand, restart the vdsm in the spm host

Actual result:
VM remains locked until VDSM is running again

4. Wait until the VM is available again and preview the created snapshot.
5. Commit the snapshot and verify the VM is running properly.

