Bug 1143042

Summary: Repeated error "Failed to create VM external-test" when starting new VM
Product: [Retired] oVirt
Reporter: Adam Litke <alitke>
Component: ovirt-engine-core
Assignee: Piotr Kliczewski <pkliczew>
Status: CLOSED CURRENTRELEASE
QA Contact: Pavel Stehlik <pstehlik>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.5
CC: alitke, amureini, ecohen, fromani, gklein, iheim, lsurette, michal.skrivanek, nsoffer, ofrenkel, oourfali, pkliczew, rbalakri, yeylon
Target Milestone: ---
Target Release: 3.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: infra
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-10-17 12:23:57 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:
Bug Blocks: 1073943

Attachments:
- engine log during failure (Flags: none)
- vdsm log during failure (Flags: none)

Description Adam Litke 2014-09-17 19:32:29 UTC
Description of problem:

When starting a newly created VM, the webadmin repeatedly displays an error, "Failed to create VM external-test".  The VM appears to run normally.


Version-Release number of selected component (if applicable):
ovirt-engine-3.5.0-0.0.master.20140911091402.gite1c5ffd.fc20.noarch
vdsm-4.16.4-0.fc20.x86_64

How reproducible: Always on one system in my setup


Steps to Reproduce:
1. Create VM
2. Start VM

Actual results:
Error message appears repeatedly in webadmin


Expected results:
VM starts normally with no errors.


Additional info:

Comment 1 Adam Litke 2014-09-17 19:33:34 UTC
Created attachment 938617 [details]
engine log during failure

Comment 2 Adam Litke 2014-09-17 19:34:21 UTC
Created attachment 938618 [details]
vdsm log during failure

Comment 3 Adam Litke 2014-09-17 19:37:41 UTC
After this VM is started, it seems that the engine is not getting updates about the VM's status. I manually killed the VM, and the engine insists that it is still running even though vdsm log messages indicate that the VM does not exist.

Comment 4 Nir Soffer 2014-09-17 21:57:12 UTC
(In reply to Adam Litke from comment #3)
> After this VM is started it seems that engine is not getting updates about
> the vm status.  I manually killed the VM and engine insists that it is still
> running even though vdsm log messages are indicating that the VM does not
> exist.

This may also be a refresh issue in the ui. Did you try to close the browser window and reconnect? Does it change the displayed status?

Comment 5 Oved Ourfali 2014-09-18 10:19:11 UTC
Isn't that a virt issue?
Omer - can you have a look?

Comment 6 Francesco Romani 2014-09-18 11:09:40 UTC
I believe there is some correlation between this issue and another one I'm facing: https://bugzilla.redhat.com/show_bug.cgi?id=1143968

It seems to me that the problem lies somewhere in the VDSM->Engine communication after the VM starts up.

Comment 7 Oved Ourfali 2014-09-21 07:05:11 UTC
Okay, I see two different issues here:
1. The VM called HostedEngine seems to exist already in the engine, yet the engine tries to import it again. The IDs seem to be different, so, Adam, is it possible that you had a dirty engine when testing? The engine doesn't find the hosted-engine VM's ID in its database, even though the VM is running in VDSM.
2. You then created and ran another VM. The request to get the HostedEngine VM's details is still running, as it hasn't been imported yet, but it looks like we keep returning ALL the running VMs instead of only the hosted-engine one. So we get your new VM as well and try to insert it into the database, which causes these errors.

So, the first issue seems to be caused by a dirty environment, or at least an engine that already has the HostedEngine VM in its database.
The second is the real issue here: we request the details of a specific list of VMs (in this case just one), yet we get them all back (I didn't find any request that specifies a VM ID)... all requests look like:

Thread-34597::DEBUG::2014-09-17 15:20:44,689::__init__::467::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'Host.getVMFullList' in bridge with {}

Piotr - any serialization issue here?
Adam - anything to say about the environment?
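To make the suspected failure mode concrete, here is a minimal sketch (not VDSM's actual implementation; the function and variable names are hypothetical) of how a params object that arrives empty turns a per-VM query into a query for every running VM:

```python
# Illustrative sketch of the suspected bug: the engine means to ask for one
# VM's details, but the request is serialized with empty params ({}), so the
# server-side handler falls back to returning every running VM.

def get_vm_full_list(running_vms, vm_ids=None):
    """Return full details for the requested VMs.

    If vm_ids is missing or empty -- which is what an empty params object
    deserializes to -- every running VM is returned.
    """
    if not vm_ids:
        return list(running_vms.values())
    return [vm for vm_id, vm in running_vms.items() if vm_id in vm_ids]

running = {
    "uuid-hosted-engine": {"vmId": "uuid-hosted-engine", "name": "HostedEngine"},
    "uuid-external-test": {"vmId": "uuid-external-test", "name": "external-test"},
}

# Buggy call: the vmList filter was lost in serialization, so both VMs
# come back and the engine tries to import external-test too.
all_vms = get_vm_full_list(running)

# Intended call: only the hosted-engine VM is returned.
only_he = get_vm_full_list(running, vm_ids=["uuid-hosted-engine"])
```

Under this reading, the fix is on the client side: make sure the VM-ID filter actually survives serialization into the `Host.getVMFullList` request instead of being sent as `{}`.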

Comment 8 Oved Ourfali 2014-09-21 07:57:43 UTC
Found the issue and posted the patch.
I still think there was an issue with the environment, but it helped us reveal another issue.

Comment 9 Adam Litke 2014-09-22 13:54:06 UTC
Just to provide the information about the environment that Oved requested in comment #7...

I ran hosted-engine-setup two times. On the first attempt, it failed near the end of the process with a DNS name-resolution issue. Since I didn't want to have to reinstall the engine VM, I copied the volume from storage and reran hosted-engine-setup a second time. On this second run I overwrote the new VM disk with the volume from the first run.

So I think you are right that we had a dirty environment.  Since the entire hosted-engine setup process takes so long to complete, it'd be nice if we had some robust resume logic where we could retry with a previously installed HostedEngine VM.

Comment 10 Sandro Bonazzola 2014-10-17 12:23:57 UTC
oVirt 3.5 has been released and should include the fix for this issue.