Bug 1143042 - Repeated error "Failed to create VM external-test" when starting new VM
Summary: Repeated error "Failed to create VM external-test" when starting new VM
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: oVirt
Classification: Retired
Component: ovirt-engine-core
Version: 3.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.5.0
Assignee: Piotr Kliczewski
QA Contact: Pavel Stehlik
URL:
Whiteboard: infra
Depends On:
Blocks: 1073943
 
Reported: 2014-09-17 19:32 UTC by Adam Litke
Modified: 2016-02-10 19:30 UTC (History)
CC: 14 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-10-17 12:23:57 UTC
oVirt Team: Infra


Attachments (Terms of Use)
engine log during failure (12.85 MB, text/plain)
2014-09-17 19:33 UTC, Adam Litke
vdsm log during failure (9.74 MB, text/plain)
2014-09-17 19:34 UTC, Adam Litke


Links
System ID Priority Status Summary Last Updated
oVirt gerrit 33042 master MERGED Ignored not empty collections Never
oVirt gerrit 33134 master ABANDONED core: fix withOptionalParameterAsList logic Never

Description Adam Litke 2014-09-17 19:32:29 UTC
Description of problem:

When starting a newly created VM, the webadmin repeatedly displays an error, "Failed to create VM external-test".  The VM appears to run normally.


Version-Release number of selected component (if applicable):
ovirt-engine-3.5.0-0.0.master.20140911091402.gite1c5ffd.fc20.noarch
vdsm-4.16.4-0.fc20.x86_64

How reproducible: Always on one system in my setup


Steps to Reproduce:
1. Create VM
2. Start VM

Actual results:
Error message appears repeatedly in webadmin


Expected results:
VM starts normally with no errors.


Additional info:

Comment 1 Adam Litke 2014-09-17 19:33:34 UTC
Created attachment 938617 [details]
engine log during failure

Comment 2 Adam Litke 2014-09-17 19:34:21 UTC
Created attachment 938618 [details]
vdsm log during failure

Comment 3 Adam Litke 2014-09-17 19:37:41 UTC
After this VM is started it seems that engine is not getting updates about the vm status.  I manually killed the VM and engine insists that it is still running even though vdsm log messages are indicating that the VM does not exist.

Comment 4 Nir Soffer 2014-09-17 21:57:12 UTC
(In reply to Adam Litke from comment #3)
> After this VM is started it seems that engine is not getting updates about
> the vm status.  I manually killed the VM and engine insists that it is still
> running even though vdsm log messages are indicating that the VM does not
> exist.

This may also be a refresh issue in the ui. Did you try to close the browser window and reconnect? Does it change the displayed status?

Comment 5 Oved Ourfali 2014-09-18 10:19:11 UTC
Isn't that a virt issue?
Omer  - can you have a look?

Comment 6 Francesco Romani 2014-09-18 11:09:40 UTC
I believe this issue is correlated with another one I'm facing: https://bugzilla.redhat.com/show_bug.cgi?id=1143968

It seems to me that the problem lies somewhere in the VDSM->Engine communication after the VM starts up.

Comment 7 Oved Ourfali 2014-09-21 07:05:11 UTC
Okay, I see two different issues here:
1. The VM called HostedEngine seems to exist already in the engine, yet the engine tries to import it again. The IDs seem to be different, so, Adam, is it possible that you had a dirty engine when testing? The engine doesn't find the hosted engine VM's ID in its database, even though that VM is running in VDSM.
2. Now, you created and ran another VM. The request to get the HostedEngine VM details is still running, since it wasn't imported yet, but it looks like we keep returning ALL the running VMs instead of only the hosted engine one. So we get your new VM as well, and we try to insert it into the database, which causes these errors.

The first issue seems to me to be caused by a dirty environment, or at least an engine that already has a HostedEngine VM in the database.
The second is the real issue here: for some reason we request the details of a specific list of VMs (in this case, one), yet we get them all. I didn't find any request that specifies a VM ID; all requests look like:

Thread-34597::DEBUG::2014-09-17 15:20:44,689::__init__::467::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'Host.getVMFullList' in bridge with {}

Piotr - any serialization issue here?
Adam - anything to say about the environment?
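The failure mode Oved describes can be sketched in a few lines. This is an illustrative model only, not the actual oVirt engine or VDSM code: it assumes a hypothetical `get_vm_full_list` helper standing in for the `Host.getVMFullList` verb, whose optional VM-ID filter is lost somewhere in the JSON-RPC bridge (the `{}` params in the log above), so every running VM is returned instead of only the HostedEngine VM.

```python
# Illustrative sketch only -- not the actual engine/VDSM code.
# Models the suspected bug: the optional vmList filter is dropped,
# so Host.getVMFullList returns every running VM.

def get_vm_full_list(running_vms, vm_list=None):
    """Return details for the VMs named in vm_list, or all VMs if omitted."""
    if vm_list:
        return [vm for vm in running_vms if vm["vmId"] in vm_list]
    return list(running_vms)

running = [
    {"vmId": "he-uuid", "vmName": "HostedEngine"},
    {"vmId": "ext-uuid", "vmName": "external-test"},
]

# Intended call: ask only for the hosted engine VM.
only_he = get_vm_full_list(running, vm_list=["he-uuid"])

# Buggy call path: the filter was lost ('{}' in the bridge log), so
# the new external-test VM comes back too, and the engine tries to
# re-import it, producing the repeated "Failed to create VM" error.
everything = get_vm_full_list(running)
```

Under this reading, the fix on the engine side is simply to stop discarding the non-empty VM-ID collection before it reaches the bridge, which matches the merged gerrit patch title above.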

Comment 8 Oved Ourfali 2014-09-21 07:57:43 UTC
Found the issue and posted the patch.
I still think there was an issue with the environment, but it helped us reveal another issue.

Comment 9 Adam Litke 2014-09-22 13:54:06 UTC
Just to provide the environment information Oved requested in comment #7...

I ran hosted-engine-setup twice. On the first attempt, it failed near the end of the process with a DNS name resolution issue. Since I didn't want to reinstall the engine VM again, I copied the volume from storage and reran hosted-engine-setup a second time, overwriting the new VM disk with the volume from the first run.

So I think you are right that we had a dirty environment.  Since the entire hosted-engine setup process takes so long to complete, it'd be nice if we had some robust resume logic where we could retry with a previously installed HostedEngine VM.

Comment 10 Sandro Bonazzola 2014-10-17 12:23:57 UTC
oVirt 3.5 has been released and should include the fix for this issue.

