Bug 1007427

Summary: vm disappeared after reboot
Product: [Retired] oVirt
Reporter: Hans-Joachim <dd8ne>
Component: ovirt-engine-core
Assignee: Roy Golan <rgolan>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: urgent
Docs Contact:
Priority: urgent
Version: 3.3
CC: acathrow, iheim, jbw, michal.skrivanek, mst, oschreib, sbonazzo, yeylon, yzaslavs
Target Milestone: ---
Target Release: 3.3.1
Hardware: x86_64
OS: Linux
Whiteboard: infra
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-11-25 11:49:25 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 918494

Description Hans-Joachim 2013-09-12 13:20:14 UTC
Description of problem:
After a reboot of both Engine and Host, all VMs disappeared (even from the database).

Version-Release number of selected component (if applicable):
oVirt Engine Version: 3.3.0-2.el6 vdsm-4.12.1-2.el6
Storage Domain: iSCSI

How reproducible:
so far each time....

Steps to Reproduce:
1. Import VMs from EXPORT Domain
2. Power down Host
3. Power down Engine
4. Power up Engine


Actual results:
All VM definitions are lost.

Expected results:
VM definitions should survive a reboot.

Additional info:
Disks are not affected

This happens even if the Host is up. In this case, there is a message:
Failed to import Vm VM1 to Data Center Default, Cluster Default

Comment 1 James Wilson 2013-09-13 10:21:02 UTC
I can confirm this.  This exact thing is happening to me.  I'm trying to import oVirt 3.2 VMs from an export domain.  The import is successful, to the point that I can fire up the VMs.

Restart ovirt-engine and the VM is gone.  Prior to the latest ovirt-engine update, only the VM container was removed and the disk remained.  Since updating to ovirt-engine-3.3.0-3.el6.noarch a few moments ago, the container still gets removed, but now the disk is gone too!

CentOS 6.4 x86_64 / oVirt 3.3

Steps to Reproduce:

1) Import VM from an export domain into the cluster. 
2) Wait for Import successful message
3) View the machine in the VM overview of the cluster
(Machine can even be started successfully)
4) Stop and start ovirt-engine
5) Machine(s) are missing
6) Cluster displays failure message

On restarting ovirt-engine, the following is logged:

2013-09-13 11:17:51,886 INFO  [org.ovirt.engine.core.bll.ImportVmCommand] (pool-6-thread-3) Lock freed to object EngineLock [exclusiveLocks= key: lb.test.lan value: VM_NAME
, sharedLocks= key: 18f3477b-0b65-4255-a56c-379f2c1b326a value: REMOTE_VM
]
2013-09-13 11:17:52,065 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-6-thread-3) Correlation ID: 32298b0a, Call Stack: null, Custom Event ID: -1, Message: Failed to import Vm lb.test.lan to Data Center local_datacenter, Cluster local_cluster
2013-09-13 1

Comment 2 James Wilson 2013-09-13 10:39:33 UTC
More information on a duplicate bug:

https://bugzilla.redhat.com/show_bug.cgi?id=1007674

Comment 3 Yair Zaslavsky 2013-09-13 10:45:50 UTC
Already fixed.
I guess the fix did not make it into 3.3.0-2.el6.

Explanation -

We need to revisit transaction management in some of our commands.

What happened is that, due to a transaction-handling issue, async tasks for the import VM command were created with a vdsm_task_id equal to the empty GUID. There is a part in the code that suspends the transaction and then queries for the async task. Since the task was not yet committed, the query returns no rows, and a new async task is inserted.

After the commands finished successfully, these leftovers remained in the database.

The user restarted the engine. The Async Task manager detected that there were tasks for the import VM command. Because the vdsm task id was the empty GUID (which is invalid), it ran the end-with-failure treatment, which in turn erased the imported VMs from the database (as it should for "proper failures").
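
The sequence described above can be illustrated with a small toy model. This is only a sketch under assumptions, not the engine's actual code: `ToyDb`, `import_vm`, and `on_engine_restart` are hypothetical names, and the "database" is a pair of Python lists that merely imitates the rule that uncommitted rows are invisible while the transaction is suspended.

```python
from dataclasses import dataclass
from uuid import UUID, uuid4

EMPTY_GUID = UUID(int=0)  # all-zero GUID, standing in for the placeholder task id


@dataclass
class AsyncTask:
    command: str
    vm_name: str
    vdsm_task_id: UUID


class ToyDb:
    """Hypothetical store: rows become visible to other readers only after commit."""

    def __init__(self):
        self.vms = set()
        self.tasks = []        # committed task rows
        self.uncommitted = []  # rows written inside the still-open transaction

    def commit(self):
        self.tasks.extend(self.uncommitted)
        self.uncommitted = []


def import_vm(db, vm_name):
    # 1. Inside the transaction: a placeholder task row with the empty GUID.
    db.uncommitted.append(AsyncTask("ImportVm", vm_name, EMPTY_GUID))
    # 2. Transaction suspended: the lookup sees committed rows only, so it finds nothing.
    found = [t for t in db.tasks if t.vm_name == vm_name]
    real_id = uuid4()
    if found:
        found[0].vdsm_task_id = real_id
    else:
        # A second row is inserted instead of updating the placeholder.
        db.tasks.append(AsyncTask("ImportVm", vm_name, real_id))
    # 3. The transaction resumes and commits; the placeholder is now persisted too.
    db.commit()
    db.vms.add(vm_name)
    # 4. The command ends successfully and clears only the task with the real id.
    db.tasks = [t for t in db.tasks if t.vdsm_task_id != real_id]


def on_engine_restart(db):
    # Leftover import tasks with the empty GUID get the end-with-failure
    # treatment, which removes the imported VM.
    for task in list(db.tasks):
        if task.command == "ImportVm" and task.vdsm_task_id == EMPTY_GUID:
            db.vms.discard(task.vm_name)
            db.tasks.remove(task)


db = ToyDb()
import_vm(db, "VM1")
vm_present_before = "VM1" in db.vms  # the import looked successful
leftovers = len(db.tasks)            # the placeholder row is still there
on_engine_restart(db)
vm_present_after = "VM1" in db.vms   # the VM is gone after the restart
```

In this model the import appears to succeed, one leftover task row survives, and the restart removes the VM, which matches the behaviour reported in this bug.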

Comment 4 Yair Zaslavsky 2013-09-13 10:47:46 UTC
*** Bug 1007674 has been marked as a duplicate of this bug. ***

Comment 5 James Wilson 2013-09-13 11:01:38 UTC
Looking at http://gerrit.ovirt.org/#/c/17582/ - I imported a VM, manually modified the vdsmTaskIds entry, which was set to null.  Restarted ovirt-engine, and the VM remains.

This also works if, after the import and before restarting ovirt-engine, the task is deleted completely from the async_tasks table.

The VM remains.
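
For illustration, the cleanup described above could look roughly like the sketch below. It runs against an in-memory SQLite table, not a real engine database. Only the table name `async_tasks` and the `vdsm_task_id` field come from this thread; the `task_id` column, the sample rows, and the `delete_leftover_tasks` helper are assumptions made for the example, and the real schema may differ.

```python
import sqlite3
from uuid import uuid4

EMPTY_GUID = "00000000-0000-0000-0000-000000000000"


def delete_leftover_tasks(conn):
    """Delete rows whose vdsm_task_id is the empty GUID; return how many were removed."""
    cur = conn.execute(
        "DELETE FROM async_tasks WHERE vdsm_task_id = ?", (EMPTY_GUID,)
    )
    conn.commit()
    return cur.rowcount


# Toy schema (assumed): one leftover row and one task with a normal id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE async_tasks (task_id TEXT PRIMARY KEY, vdsm_task_id TEXT)")
conn.executemany(
    "INSERT INTO async_tasks VALUES (?, ?)",
    [("task-a", EMPTY_GUID), ("task-b", str(uuid4()))],
)

removed = delete_leftover_tasks(conn)
remaining = [row[0] for row in conn.execute("SELECT task_id FROM async_tasks")]
```

Here only the row carrying the empty GUID is removed, so a later restart check would find no leftover import task to fail.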

Will this patch make it into Beta?

Comment 6 Itamar Heim 2013-09-15 07:42:01 UTC
GA is due. This should make GA.
Yair - was it cloned to the 3.3 branch?
Ofer - FYI

Comment 7 Yair Zaslavsky 2013-09-15 07:44:17 UTC
Itamar, 
This was cloned to the 3.3 branch -

http://gerrit.ovirt.org/#/c/18959/

Comment 8 Sandro Bonazzola 2013-11-25 11:49:25 UTC
oVirt 3.3.1 has been released