Bug 1143042

Summary: Repeated error "Failed to create VM external-test" when starting new VM
Product: [Retired] oVirt
Reporter: Adam Litke <alitke>
Component: ovirt-engine-core
Assignee: Piotr Kliczewski <pkliczew>
Status: CLOSED CURRENTRELEASE
QA Contact: Pavel Stehlik <pstehlik>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.5
CC: alitke, amureini, ecohen, fromani, gklein, iheim, lsurette, michal.skrivanek, nsoffer, ofrenkel, oourfali, pkliczew, rbalakri, yeylon
Target Milestone: ---
Target Release: 3.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: infra
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-10-17 12:23:57 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:
Bug Blocks: 1073943

Attachments:
- engine log during failure (Flags: none)
- vdsm log during failure (Flags: none)

Description Adam Litke 2014-09-17 19:32:29 UTC
Description of problem:

When starting a newly created VM, the webadmin repeatedly displays an error, "Failed to create VM external-test".  The VM appears to run normally.


Version-Release number of selected component (if applicable):
ovirt-engine-3.5.0-0.0.master.20140911091402.gite1c5ffd.fc20.noarch
vdsm-4.16.4-0.fc20.x86_64

How reproducible: Always on one system in my setup


Steps to Reproduce:
1. Create VM
2. Start VM

Actual results:
Error message appears repeatedly in webadmin


Expected results:
VM starts normally with no errors.


Additional info:

Comment 1 Adam Litke 2014-09-17 19:33:34 UTC
Created attachment 938617 [details]
engine log during failure

Comment 2 Adam Litke 2014-09-17 19:34:21 UTC
Created attachment 938618 [details]
vdsm log during failure

Comment 3 Adam Litke 2014-09-17 19:37:41 UTC
After this VM is started, it seems that the engine is not getting updates about the VM's status. I manually killed the VM, and the engine insists that it is still running even though vdsm log messages indicate that the VM does not exist.

Comment 4 Nir Soffer 2014-09-17 21:57:12 UTC
(In reply to Adam Litke from comment #3)
> After this VM is started it seems that engine is not getting updates about
> the vm status.  I manually killed the VM and engine insists that it is still
> running even though vdsm log messages are indicating that the VM does not
> exist.

This may also be a refresh issue in the ui. Did you try to close the browser window and reconnect? Does it change the displayed status?

Comment 5 Oved Ourfali 2014-09-18 10:19:11 UTC
Isn't that a virt issue?
Omer - can you have a look?

Comment 6 Francesco Romani 2014-09-18 11:09:40 UTC
I believe there is some correlation between this issue and another one I'm facing: https://bugzilla.redhat.com/show_bug.cgi?id=1143968

It seems to me that the problem lies somewhere in the VDSM->Engine communication after the VM starts up.

Comment 7 Oved Ourfali 2014-09-21 07:05:11 UTC
Okay, I see two different issues here:
1. The VM called HostedEngine seems to exist already in the engine, yet the engine tries to import it again. The IDs seem to be different, so, Adam, is it possible that you had a dirty engine when testing? The engine doesn't find the hosted-engine VM's ID in its database, even though the VM is running in VDSM.
2. You then created and ran another VM. The request to get the HostedEngine VM's details is still running, as it hasn't been imported yet, but it looks like we keep returning ALL the running VMs instead of only the hosted-engine one. So we get your new VM as well and try to insert it into the database, which causes these errors.

So, the first issue seems to be caused by a dirty environment, or at least an engine that already has the HostedEngine VM in its database.
The second is the real issue here: we request the details of a specific list of VMs (in this case just one), yet we get them all back (I didn't find any request that specifies a VM ID)... all requests look like:

Thread-34597::DEBUG::2014-09-17 15:20:44,689::__init__::467::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'Host.getVMFullList' in bridge with {}

Piotr - any serialization issue here?
Adam - anything to say about the environment?
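To make the suspected failure mode concrete, here is a minimal sketch (not VDSM's actual implementation; the function and variable names are hypothetical) of how a params object that arrives empty turns a per-VM query into a query for every running VM:

```python
# Illustrative sketch of the suspected bug: the engine means to ask for one
# VM's details, but the request is serialized with empty params ({}), so the
# server-side handler falls back to returning every running VM.

def get_vm_full_list(running_vms, vm_ids=None):
    """Return full details for the requested VMs.

    If vm_ids is missing or empty -- which is what an empty params object
    deserializes to -- every running VM is returned.
    """
    if not vm_ids:
        return list(running_vms.values())
    return [vm for vm_id, vm in running_vms.items() if vm_id in vm_ids]

running = {
    "uuid-hosted-engine": {"vmId": "uuid-hosted-engine", "name": "HostedEngine"},
    "uuid-external-test": {"vmId": "uuid-external-test", "name": "external-test"},
}

# Buggy call: the vmList filter was lost in serialization, so both VMs
# come back and the engine tries to import external-test too.
all_vms = get_vm_full_list(running)

# Intended call: only the hosted-engine VM is returned.
only_he = get_vm_full_list(running, vm_ids=["uuid-hosted-engine"])
```

Under this reading, the fix is on the client side: make sure the VM-ID filter actually survives serialization into the `Host.getVMFullList` request instead of being sent as `{}`.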

Comment 8 Oved Ourfali 2014-09-21 07:57:43 UTC
Found the issue and posted the patch.
I still think there was an issue with the environment, but it helped us reveal another issue.

Comment 9 Adam Litke 2014-09-22 13:54:06 UTC
Just to provide the information about the environment that Oved requested in comment #7...

I ran hosted-engine-setup two times. On the first attempt, it failed near the end of the process with a DNS name-resolution issue. Since I didn't want to have to reinstall the engine VM, I copied the volume from storage and reran hosted-engine-setup a second time. On this second run I overwrote the new VM disk with the volume from the first run.

So I think you are right that we had a dirty environment.  Since the entire hosted-engine setup process takes so long to complete, it'd be nice if we had some robust resume logic where we could retry with a previously installed HostedEngine VM.

Comment 10 Sandro Bonazzola 2014-10-17 12:23:57 UTC
oVirt 3.5 has been released and should include the fix for this issue.