Bug 1148583 - [engine-backend] Create command is reported as failed although the VM was started successfully
Summary: [engine-backend] Create command is reported as failed although the VM was sta...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.5.0
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.5.0
Assignee: Francesco Romani
QA Contact: Gonza
URL:
Whiteboard: virt
Duplicates: 1154393 1156049 1163073
Depends On: 1143968
Blocks: rhev35rcblocker
 
Reported: 2014-10-01 19:14 UTC by Elad
Modified: 2015-02-16 13:41 UTC
CC List: 21 users

Fixed In Version: vdsm-4.16.7.6-1.el6ev
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-02-16 13:41:14 UTC
oVirt Team: ---
Target Upstream Version:
Embargoed:


Attachments
engine, vdsm and qemu logs (966.00 KB, application/x-gzip)
2014-10-01 19:14 UTC, Elad
logs-23.10 (812.62 KB, application/x-gzip)
2014-10-23 14:25 UTC, Elad


Links
oVirt gerrit 34147 (master, MERGED): vm: return full copy of the conf data on status()
oVirt gerrit 34982 (ovirt-3.5, MERGED): vm: return full copy of the conf data on status()

Description Elad 2014-10-01 19:14:43 UTC
Created attachment 943187 [details]
engine, vdsm and qemu logs

Description of problem:
The start VM operation is reported as failed although the VM was started successfully. Took 3 minutes for the VM to actually start on the host

Version-Release number of selected component (if applicable):
rhev3.5 vt4

Host: RHEL6.5
vdsm-4.16.5-2.el6ev.x86_64
libvirt-0.10.2-29.el6_5.12.x86_64
qemu-kvm-rhev-0.12.1.2-2.415.el6_5.14.x86_64

rhevm-3.5.0-0.13.beta.el6ev.noarch


How reproducible:
Not always

Steps to Reproduce:
1. On a local DC, with 1 host
2. Create a VM and start it.

Actual results:
Create command is called:

2014-10-01 21:48:52,463 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateVDSCommand]

VDSM gets the request:
Thread-11759::DEBUG::2014-10-01 21:48:54,789::__init__::467::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'VM.create'

The qemu process starts on the host
From qemu log:
2014-10-01 18:48:54.888+0000: starting up

After 3 minutes, the operation is reported as failed by engine:

2014-10-01 21:51:52,700 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateVDSCommand] (org.ovirt.thread.pool-7-thread-37) [388a73f2] HostName = green-vdsc
2014-10-01 21:51:52,700 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateVDSCommand] (org.ovirt.thread.pool-7-thread-37) [388a73f2] Command CreateVDSCommand(HostName = green-vdsc, HostId = c14677f4-ffb6-4f08-b3b2-112f0c60e918, vmId=2cd9533d-c322-429e-acad-9d59fe34f76f, vm=VM [lo1]) execution failed. Exception: VDSNetworkException: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues'
2014-10-01 21:51:52,700 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateVDSCommand] (org.ovirt.thread.pool-7-thread-37) [388a73f2] FINISH, CreateVDSCommand, log id: 530b7000
2014-10-01 21:51:52,700 ERROR [org.ovirt.engine.core.vdsbroker.CreateVmVDSCommand] (org.ovirt.thread.pool-7-thread-37) [388a73f2] VDS::create Failed creating vm lo1 in vds = c14677f4-ffb6-4f08-b3b2-112f0c60e918 : green-vdsc error = org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues'

Expected results:
Engine shouldn't report the operation as failed if it succeeded.

Additional info: engine, vdsm and qemu logs

Comment 1 Elad 2014-10-07 05:47:56 UTC
CORRECTION:
Instead of -  "Took 3 minutes for the VM to actually start on the host"
It should be - "The VM was started immediately on the host"

Comment 2 Arik 2014-10-19 10:55:32 UTC
Martin, can you please take a look? I think it happens because of the following errors in VDSM:

Thread-11762::DEBUG::2014-10-01 21:48:55,465::libvirtconnection::143::root::(wrapper) Unknown libvirterror: ecode: 74 edom: 10 level: 2 message: argument unsupported: QEMU driver does not support <metadata> element
Thread-11762::ERROR::2014-10-01 21:48:55,466::vm::373::vm.Vm::(_sampleCpuTune) vmId=`2cd9533d-c322-429e-acad-9d59fe34f76f`::libvirt does not support metadata

Comment 3 Martin Sivák 2014-10-20 09:19:08 UTC
No, it isn't. This is just an annoying message that will go away in CentOS 6.6. We handle it gracefully (and this comes from the stats thread, btw).
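
For reference, a minimal sketch of how such a stats-thread call can tolerate this libvirt error ("argument unsupported", ecode 74 in the log above) instead of treating it as fatal. This is illustrative only, not the actual vdsm code; the helper name and the metadata URI are assumptions:

    import libvirt

    # Hypothetical namespace URI for the tune metadata; the real constant lives in vdsm.
    _VM_TUNE_URI = "http://ovirt.org/vm/tune/1.0"

    def sample_cpu_tune(dom):
        """Read the <metadata> element for a domain, tolerating old QEMU drivers
        (like the one on this RHEL 6.5 host) that do not support it."""
        try:
            return dom.metadata(libvirt.VIR_DOMAIN_METADATA_ELEMENT, _VM_TUNE_URI, 0)
        except libvirt.libvirtError as e:
            if e.get_error_code() == libvirt.VIR_ERR_ARGUMENT_UNSUPPORTED:
                # Matches the "argument unsupported" message seen above:
                # skip this sample rather than failing the whole stats thread.
                return None
            raise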

Comment 4 Michal Skrivanek 2014-10-21 08:06:21 UTC
It seems like the response from vdsm was lost.
How come there is no connection error, at least on the vdsm side? It looks like UDP… is this proper json-rpc behavior?

Comment 5 Piotr Kliczewski 2014-10-21 08:55:58 UTC
The communication is based on TCP. Looking at the logs I can see that there is weird behavior on the engine side. The engine did nothing for 3 mins.

Can you please provide logs with engine debug enabled? Please set debug for the org.ovirt logger.

Comment 6 Elad 2014-10-23 12:18:37 UTC
(In reply to Piotr Kliczewski from comment #5)
> The communication is based on TCP. Looking at the logs I can see that there
> is weird behavior on the engine side. The engine did nothing for 3 mins.
> 
> Can you please provide logs with engine debug enabled? Please set debug for
> the org.ovirt logger.

I'm unable to reproduce the issue; therefore, I can't provide the requested information.

Comment 7 Piotr Kliczewski 2014-10-23 13:02:00 UTC
From the provided information I am not able to understand what the issue is. Are you OK with closing this bug and reopening it if we have more information?

Comment 8 Michal Skrivanek 2014-10-23 14:21:41 UTC
Makes sense. Please reopen if it happens again. Check the engine's behavior: maybe it is short on disk space, maybe there are other stress conditions, maybe the network is flaky...

Comment 9 Elad 2014-10-23 14:25:34 UTC
Created attachment 949941 [details]
logs-23.10

We've encountered the bug again on another setup. Uploading the logs.
I don't have an upstream engine build from sources, so I can't check it in debug mode.

Anyway, this is certainly a serious bug which has to be fixed.

2014-10-23 17:18:10,551 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateVDSCommand] (ajp-/127.0.0.1:8702-11) [3d1695e6] Command CreateVDSCommand(HostName = camel-vdsb, HostId = 0707087c-d7b0-4b12-8f73-fc8b83cd42e8, vmId=32e8a529-6e21-4b73-a803-e21dfca56cfd, vm=VM [vm_snap]) execution failed. Exception: VDSNetworkException: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues

Comment 10 Piotr Kliczewski 2014-10-24 11:53:09 UTC
I can see that it is failing, but I am not really sure what caused the failure.
Today we merged a jsonrpc version update [1]. Thanks to this change we have a number of fixes and improvements. Now, when it fails, we will see what caused the failure thanks to [2]. Can you please test with the latest 3.5?


[1] http://gerrit.ovirt.org/#/c/34433/
[2] http://gerrit.ovirt.org/#/c/33888/

Comment 11 Omer Frenkel 2014-10-26 09:14:39 UTC
*** Bug 1156049 has been marked as a duplicate of this bug. ***

Comment 12 Omer Frenkel 2014-10-26 09:32:12 UTC
*** Bug 1154393 has been marked as a duplicate of this bug. ***

Comment 13 Omer Frenkel 2014-10-26 09:34:27 UTC
Moving to infra, as this seems like a general communication/infra issue.
Please move back to virt if you think otherwise.

Comment 14 Piotr Kliczewski 2014-10-29 15:35:22 UTC
We noticed that the response from vm.create is updated after it is returned, which causes a RuntimeError ('dictionary changed size during iteration') during marshalling to JSON.
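
To make the failure mode concrete: if the dict behind the vm.create response keeps being updated while the jsonrpc layer iterates it for serialization, Python raises exactly this error. Below is a single-threaded distillation in plain Python (not the vdsm code; in the real bug the mutation came from the concurrent VM creation flow, and the key names here are made up):

    # Minimal stand-in for the conf dict returned by VM.create.
    conf = {"vmId": "2cd9533d-c322-429e-acad-9d59fe34f76f", "status": "WaitForLaunch"}

    try:
        # The jsonrpc layer iterates the dict to serialize it; meanwhile the
        # creation flow keeps adding keys to the very same object. Collapsed
        # into one thread for determinism, the outcome is the same exception:
        for key in conf:
            conf["clientIp"] = ""   # resizing the dict mid-iteration
    except RuntimeError as err:
        print(err)                   # dictionary changed size during iteration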

Comment 15 Nir Soffer 2014-11-06 12:30:43 UTC
I think the correct fix is: http://gerrit.ovirt.org/34147
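
Judging from the patch title in the Links section ("vm: return full copy of the conf data on status()"), the fix boils down to handing callers a snapshot instead of the live dict. A simplified sketch of that pattern, not the actual vdsm change:

    import copy

    class Vm(object):
        """Toy model of the affected object; the real class lives in vdsm's vm.py."""

        def __init__(self, conf):
            self.conf = conf

        def status(self):
            # Before the fix (conceptually): `return self.conf` exposed the live
            # dict, so later updates by the creation flow raced with JSON
            # marshalling of the vm.create response.
            # After the fix: return an independent copy, so concurrent updates
            # to self.conf can no longer change the response while it is serialized.
            return copy.deepcopy(self.conf)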

Comment 16 Piotr Kliczewski 2014-11-17 13:17:13 UTC
*** Bug 1163073 has been marked as a duplicate of this bug. ***

Comment 17 Francesco Romani 2014-11-24 08:19:28 UTC
Patches merged on the relevant branches.

Comment 18 Nisim Simsolo 2014-12-07 11:59:42 UTC
Verified many times using the following builds:
rhevm-3.5.0-0.21.el6ev.noarch

qemu-kvm-rhev-0.12.1.2-2.448.el6.x86_64
libvirt-0.10.2-46.el6_6.2.x86_64
sanlock-2.8-1.el6.x86_64
vdsm-4.16.8.1-2.el6ev.x86_64

