Description of problem:
When starting a newly created VM, the webadmin repeatedly displays an error: "Failed to create VM external-test". The VM appears to run normally.

Version-Release number of selected component (if applicable):
ovirt-engine-3.5.0-0.0.master.20140911091402.gite1c5ffd.fc20.noarch
vdsm-4.16.4-0.fc20.x86_64

How reproducible:
Always on one system in my setup

Steps to Reproduce:
1. Create a VM
2. Start the VM

Actual results:
The error message appears repeatedly in webadmin.

Expected results:
The VM starts normally with no errors.

Additional info:
Created attachment 938617 [details] engine log during failure
Created attachment 938618 [details] vdsm log during failure
After this VM is started, it seems that the engine is not receiving updates about the VM's status. I manually killed the VM, but the engine insists that it is still running even though the vdsm log messages indicate that the VM does not exist.
(In reply to Adam Litke from comment #3)
> After this VM is started it seems that engine is not getting updates about
> the vm status. I manually killed the VM and engine insists that it is still
> running even though vdsm log messages are indicating that the VM does not
> exist.

This may also be a refresh issue in the UI. Did you try to close the browser window and reconnect? Does that change the displayed status?
Isn't that a virt issue? Omer - can you have a look?
I believe there is some correlation between this issue and another one I'm facing: https://bugzilla.redhat.com/show_bug.cgi?id=1143968. It seems to me that the problem lies somewhere in the VDSM->Engine communication after the VM starts up.
Okay, I see two different issues here:

1. The VM called HostedEngine seems to exist already in the engine, yet it tries to import it again. The IDs seem to be different, so, Adam, is it possible that you had a dirty engine when testing? The engine doesn't find the hosted engine VM's ID in its database, even though the VM is running in VDSM.

2. Now, you created and ran another VM. The request to get the HostedEngine VM details is still running, as it wasn't imported yet, but it looks like we keep returning ALL the running VMs instead of only the hosted engine one. So we get your new VM as well, and we try to insert it into the database, which causes these errors.

The first issue seems to me to be caused by a dirty environment, or at least an engine that already has the HostedEngine VM in its database. The second is the real issue here: for some reason we request the details of a specific list of VMs (in this case just one), yet we get them all. I didn't find any request that specifies a VM ID; all requests look like:

Thread-34597::DEBUG::2014-09-17 15:20:44,689::__init__::467::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'Host.getVMFullList' in bridge with {}

Piotr - any serialization issue here?
Adam - anything to say about the environment?
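To illustrate the suspected failure mode described above: if the VM-ID filter is lost when the request is serialized (so the bridge receives empty params, `{}`), a "get details for this one VM" call degrades into "get details for every VM". This is a minimal Python sketch, not VDSM's actual code; the names (get_vm_full_list, running_vms, the "vmList" key) are illustrative assumptions.

```python
# Hypothetical model of the bug: the engine intends to ask about one VM,
# but the serialized JSON-RPC request arrives with empty params, so the
# verb falls back to returning every running VM.

running_vms = {
    "hosted-engine-id": {"vmName": "HostedEngine", "status": "Up"},
    "new-vm-id": {"vmName": "external-test", "status": "Up"},
}

def get_vm_full_list(params):
    """Return details for the VMs named in params['vmList'];
    if the filter is missing or empty, return ALL running VMs."""
    vm_ids = params.get("vmList")
    if not vm_ids:  # empty params {} -> no filter -> everything comes back
        return list(running_vms.values())
    return [running_vms[vm_id] for vm_id in vm_ids if vm_id in running_vms]

# What the engine meant to ask:
intended = get_vm_full_list({"vmList": ["hosted-engine-id"]})
# What actually reached the bridge (per the log line above):
actual = get_vm_full_list({})
```

Under this model, `intended` holds only the HostedEngine VM, while `actual` also contains the newly created VM, which the engine then tries to re-insert into its database, producing the repeated errors.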
Found the issue and posted the patch. I still think there was an issue with the environment, but it helped us reveal another issue.
Just to provide the information about the environment that Oved requested in comment #7: I ran hosted-engine-setup twice. On the first attempt, it failed near the end of the process with a DNS name resolution issue. Since I didn't want to reinstall the engine VM again, I copied the volume from storage and reran hosted-engine-setup. On the second run I overwrote the new VM disk with the volume from the first run. So I think you are right that we had a dirty environment. Since the entire hosted-engine setup process takes so long to complete, it would be nice to have robust resume logic so we could retry with a previously installed HostedEngine VM.
oVirt 3.5 has been released and should include the fix for this issue.