Bug 1135775 - Failed to run VM in preview mode
Summary: Failed to run VM in preview mode
Keywords:
Status: CLOSED DUPLICATE of bug 1056949
Alias: None
Product: oVirt
Classification: Retired
Component: ovirt-engine-core
Version: 3.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.6.0
Assignee: Arik
QA Contact: meital avital
URL:
Whiteboard: virt
Depends On:
Blocks:
 
Reported: 2014-08-31 12:39 UTC by Kevin Alon Goldblatt
Modified: 2016-02-10 19:49 UTC (History)
CC List: 10 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-12-28 17:43:19 UTC
oVirt Team: Virt
Embargoed:


Attachments
engine vdsm and server logs (453.91 KB, application/octet-stream), 2014-08-31 12:39 UTC, Kevin Alon Goldblatt

Description Kevin Alon Goldblatt 2014-08-31 12:39:52 UTC
Created attachment 933122 [details]
engine vdsm and server logs

Description of problem:
When previewing a snapshot from which some disks were permanently removed, the VM fails to come up.

Version-Release number of selected component (if applicable):
ovirt-engine-3.5.0-0.0.master.20140821064931.gitb794d66.el6.noarch
vdsm-4.16.1-6.gita4a4614.el6.x86_64

How reproducible:
All the time

Steps to Reproduce:
1. Create a VM with one 7 GB disk, install an OS and take a snapshot.
2. Add 2 new thin-provisioned file disks, 3 GB each, create file systems on them and write some data.
3. Create a second snapshot including all 3 disks.
4. Permanently delete the 2 small 3 GB thin disks from the VM >> Disks tab.
5. Select the second snapshot from the VM >> Snapshots tab and press the PREVIEW option (this snapshot should include only the 1 disk with the OS, as the other disks were permanently removed).
6. When the preview is ready, start the VM >> FAILS with "Wake up from hibernation failed".

Actual results:
The VM does not come up

Expected results:
The VM should come up with only the one remaining disk.

Additional info:
From the engine log:

2014-08-31 14:42:35,849 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-8-thread-33) [494b267] Correlation ID: 494b267, Job ID: c8d15472-3d9a-471b-95bc-15eee4d32f29, Call Stack: null, Custom Event ID: -1, Message: VM vm12 was started by admin (Host: nott-vds1).
2014-08-31 14:42:37,490 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (DefaultQuartzScheduler_Worker-3) [6d3d84f7] START, DestroyVDSCommand(HostName = nott-vds1, HostId = 5238d90a-b9b2-4d96-a6f7-f5ab1b6afaee, vmId=9d69d3a7-a175-4acc-b375-768274f38dd7, force=false, secondsToWait=0, gracefully=false, reason=), log id: 58e916e7
2014-08-31 14:42:37,557 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (DefaultQuartzScheduler_Worker-3) [6d3d84f7] FINISH, DestroyVDSCommand, log id: 58e916e7
2014-08-31 14:42:37,575 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-3) [6d3d84f7] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vm12 is down with error. Exit message: Wake up from hibernation failed.
2014-08-31 14:42:37,576 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-3) [6d3d84f7] Running on vds during rerun failed vm: null
2014-08-31 14:42:37,577 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-3) [6d3d84f7] VM vm12 (9d69d3a7-a175-4acc-b375-768274f38dd7) is running in db and not running in VDS nott-vds1
2014-08-31 14:42:37,583 ERROR [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-3) [6d3d84f7] Rerun vm 9d69d3a7-a175-4acc-b375-768274f38dd7. Called from vds nott-vds1
2014-08-31 14:42:37,600 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-8-thread-46) [6d3d84f7] Correlation ID: 494b267, Job ID: c8d15472-3d9a-471b-95bc-15eee4d32f29, Call Stack: null, Custom Event ID: -1, Message: Failed to run VM vm12 on Host nott-vds1.
2014-08-31 14:42:37,606 INFO  [org.ovirt.engine.core.bll.RunVmCommand] (org.ovirt.thread.pool-8-thread-46) [6d3d84f7] Lock Acquired to object EngineLock [exclusiveLocks= key: 9d69d3a7-a175-4acc-b375-768274f38dd7 value: VM, sharedLocks= ]
2014-08-31 14:42:37,700 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (org.ovirt.thread.pool-8-thread-46) [6d3d84f7] START, IsVmDuringInitiatingVDSCommand( vmId = 9d69d3a7-a175-4acc-b375-768274f38dd7), log id: 11388b70
2014-08-31 14:42:37,701 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (org.ovirt.thread.pool-8-thread-46) [6d3d84f7] FINISH, IsVmDuringInitiatingVDSCommand, return: false, log id: 11388b70
2014-08-31 14:42:37,711 WARN  [org.ovirt.engine.core.bll.RunVmCommand] (org.ovirt.thread.pool-8-thread-46) [6d3d84f7] CanDoAction of action RunVm failed. Reasons:VAR__ACTION__RUN,VAR__TYPE__VM,VAR__ACTION__RUN,VAR__TYPE__VM,VAR__ACTION__RUN,VAR__TYPE__VM,SCHEDULING_NO_HOSTS
2014-08-31 14:42:37,712 INFO  [org.ovirt.engine.core.bll.RunVmCommand] (org.ovirt.thread.pool-8-thread-46) [6d3d84f7] Lock freed to object EngineLock [exclusiveLocks= key: 9d69d3a7-a175-4acc-b375-768274f38dd7 value: VM, sharedLocks= ]
2014-08-31 14:42:37,725 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-8-thread-46) [6d3d84f7] Correlation ID: 494b267, Job ID: c8d15472-3d9a-471b-95bc-15eee4d32f29, Call Stack: null, Custom Event ID: -1, Message: Failed to run VM vm12 (User: admin).

Comment 1 Oved Ourfali 2014-09-01 08:12:24 UTC
Seems like you don't have any available hosts to run this VM.
Can you verify that?

Comment 2 Omer Frenkel 2014-09-01 08:19:22 UTC
Well, I'm not sure it's a bug: you took a live snapshot with memory, then you changed the VM hardware and tried to resume to the same state; I wouldn't expect that to work.

Just to be sure: if you try to run it again (after it fails once), it will start, right?

Comment 3 Omer Frenkel 2014-09-01 13:02:19 UTC
Thinking more about this: the engine should identify that the configuration has changed and is different from the snapshot, and not use the memory volumes; a fresh start is needed instead.
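
A minimal sketch, in Java, of the check described above; all class and method names are hypothetical illustrations, not the actual ovirt-engine code. Before resuming from a previewed snapshot's memory image, the engine would compare the disk set recorded in the snapshot with the disks the VM currently has, and fall back to a cold start when they differ.

import java.util.Set;

/**
 * Hypothetical sketch -- not the actual ovirt-engine code.
 * Decides whether a previewed snapshot's memory image should be restored,
 * or whether a fresh (cold) start is needed instead.
 */
public final class MemoryRestorePolicy {

    private MemoryRestorePolicy() {
    }

    /**
     * @param snapshotHasMemory whether the previewed snapshot includes a memory image
     * @param snapshotDiskIds   disk IDs recorded in the snapshot configuration
     * @param currentDiskIds    disk IDs currently attached to the VM
     */
    public static boolean shouldRestoreMemory(boolean snapshotHasMemory,
                                              Set<String> snapshotDiskIds,
                                              Set<String> currentDiskIds) {
        if (!snapshotHasMemory) {
            return false;
        }
        // If the device set changed since the snapshot was taken, the
        // hibernation image references devices that no longer exist, so
        // skip it and cold-boot instead of failing with
        // "Wake up from hibernation failed".
        return snapshotDiskIds.equals(currentDiskIds);
    }
}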

Comment 4 Kevin Alon Goldblatt 2014-09-18 14:53:23 UTC
The bug here is that in Preview mode the VM failed to come up.
The disk that remained was the OS disk, so the VM should have come up. The fact that 2 other disks were deleted should not prevent the VM from coming up. The VM would not come up in the Preview state at all.

Comment 6 Arik 2014-12-25 14:21:19 UTC
(In reply to Omer Frenkel from comment #3)
I think there is a broader problem which is not related only to the memory:
Say you created a live snapshot for a VM with 1 disk, no memory, and boot options that include only that disk. Then you remove this disk and add a new disk. You will be able to run the VM, but you won't be able to run the VM after restoring the snapshot.

IMO it will be easier to solve by handling it when a device is removed from a VM that has snapshots. We have a couple of options for what to do when the user removes a device (disk, network or other):
1. Warn the user that previous snapshots might not work, and continue as today.
2. Tell the user that if he chooses to remove the device, previous snapshots will be removed.
3. Apply more advanced logic, for example: if the user removes a non-bootable disk, remove the memory from all the snapshots with memory that include this disk.

#3 is possible for disks because of the relation between disks and (VM) snapshots; it will be more complex with networks, for example.

I can solve this particular case by removing the memory from all snapshots which are related to one of the volumes of a deleted (detached?) disk, but it won't be general for network removal or for the case I described above.
Allon, what do you think?
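
A minimal sketch of option #3 above, with hypothetical stand-in types rather than the real ovirt-engine entities: when a disk is permanently removed, clear the memory image from every snapshot whose configuration contains that disk, so that previewing those snapshots later results in a cold start instead of a failed wake-up from hibernation.

import java.util.List;
import java.util.Set;

/** Hypothetical sketch -- not the actual ovirt-engine code. */
public final class SnapshotMemoryCleaner {

    /** Minimal stand-in for a VM snapshot record. */
    public static final class SnapshotRecord {
        final Set<String> diskIds;
        String memoryVolume; // empty string means "no memory image"

        SnapshotRecord(Set<String> diskIds, String memoryVolume) {
            this.diskIds = diskIds;
            this.memoryVolume = memoryVolume;
        }
    }

    /**
     * Clears the memory-image reference from every snapshot that contains the
     * removed disk and returns the number of snapshots changed. Deleting the
     * freed memory volumes from storage is left out of this sketch.
     */
    public static int dropMemoryForRemovedDisk(List<SnapshotRecord> snapshots,
                                               String removedDiskId) {
        int changed = 0;
        for (SnapshotRecord snapshot : snapshots) {
            if (!snapshot.memoryVolume.isEmpty()
                    && snapshot.diskIds.contains(removedDiskId)) {
                snapshot.memoryVolume = "";
                changed++;
            }
        }
        return changed;
    }
}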

Comment 7 Allon Mureinik 2014-12-28 14:43:16 UTC
#3, IMHO, is a waste of effort - we need to completely rework the concept of "boot disk/device" anyway, so I wouldn't put any effort into over-complicating it.

Frankly, I don't think anything should be done in this BZ - the fact that we allow modifying devices means that memory snapshots are invalidated. Shouldn't we just snapshot the devices too?

Comment 8 Arik 2014-12-28 15:52:22 UTC
(In reply to Allon Mureinik from comment #7)
> Shouldn't we just snapshot the devices too?
but what will it mean for disks? we are taking a snapshot for each disk, but once we remove the disk all those disk-snapshots are removed as well and all the previous snapshots (all snapshots, in particular snapshots with memory) might become invalid.

Comment 9 Allon Mureinik 2014-12-28 16:04:43 UTC
(In reply to Arik from comment #8)
> (In reply to Allon Mureinik from comment #7)
> > Shouldn't we just snapshot the devices too?
> but what will it mean for disks? we are taking a snapshot for each disk, but
> once we remove the disk all those disk-snapshots are removed as well and all
> the previous snapshots (all snapshots, in particular snapshots with memory)
> might become invalid.
Oh, I agree completely - this is a stupid behavior, created by necessity in 3.1.0.
In other words, you're saying we need to implement the RFE in bug 1056949 in order to solve this one, right?
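
A minimal sketch of the direction discussed in comments 7-9, with hypothetical types that are neither the oVirt data model nor the design of bug 1056949: if the snapshot recorded the full device configuration alongside the memory image, restoring the snapshot would also restore the device set that the memory image expects.

import java.util.Set;

/** Hypothetical sketch -- a snapshot that also captures the device configuration. */
public final class FullConfigSnapshot {

    /** Device configuration captured at snapshot time. */
    public static final class DeviceConfig {
        final Set<String> diskIds;
        final Set<String> nicIds;

        DeviceConfig(Set<String> diskIds, Set<String> nicIds) {
            this.diskIds = diskIds;
            this.nicIds = nicIds;
        }
    }

    final String snapshotId;
    final DeviceConfig devices;   // restored together with the disks and memory
    final String memoryVolume;    // empty if no memory was saved

    FullConfigSnapshot(String snapshotId, DeviceConfig devices, String memoryVolume) {
        this.snapshotId = snapshotId;
        this.devices = devices;
        this.memoryVolume = memoryVolume;
    }

    /**
     * Because the device set travels with the snapshot, a memory image can be
     * resumed whenever it exists; later device removals no longer invalidate it.
     */
    boolean canResumeFromMemory() {
        return !memoryVolume.isEmpty();
    }
}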

Comment 10 Arik 2014-12-28 17:43:19 UTC
(In reply to Allon Mureinik from comment #9)
> (In reply to Arik from comment #8)
> > (In reply to Allon Mureinik from comment #7)
> > > Shouldn't we just snapshot the devices too?
> > but what will it mean for disks? we are taking a snapshot for each disk, but
> > once we remove the disk all those disk-snapshots are removed as well and all
> > the previous snapshots (all snapshots, in particular snapshots with memory)
> > might become invalid.
> Oh, I agree completely - this is a stupid behavior, created by necessity in
> 3.1.0.
> In other words, you're saying we need to implement the RFE in bug 1056949 in
> order to solve this one, right?

Exactly, I was not aware of bz 1056949.
Thanks

*** This bug has been marked as a duplicate of bug 1056949 ***

