Created attachment 938834 [details]
scripts and logs of engine and vdsm

Description of problem:

As an addendum to the oVirt 3.5 test day (http://www.ovirt.org/OVirt_3.5_TestDay) I was running benchmarks against JSON-RPC, using the oVirt SDK, which in turn uses the REST API. One test involves starting 100 VMs. The creation fails with a timeout:

2014-09-18 05:08:05,922 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateVDSCommand] (ajp--127.0.0.1-8702-2) [4bf1c540] START, CreateVDSCommand(HostName = toji, HostId = 99cdddd1-b34c-45ef-966a-f1459e104284, vmId=5ecc37cf-386f-4363-a9fd-7ed4312c60b4, vm=VM [XS_C000]), log id: 3d0d1f2
2014-09-18 05:08:05,987 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateVDSCommand] (ajp--127.0.0.1-8702-2) [4bf1c540] org.ovirt.engine.core.vdsbroker.vdsbroker.CreateVDSCommand kvmEnable=true,keyboardLayout=en-us,nice=0,displayNetwork=ovirtmgmt,copyPasteEnable=true,timeOffset=0,transparentHugePages=true,vmId=5ecc37cf-386f-4363-a9fd-7ed4312c60b4,acpiEnable=true,custom={device_0f11a461-7a40-45d7-a056-19a05c5dfe1edevice_abfbc445-8c08-4c24-b459-749ab7f64411=VmDevice {vmId=5ecc37cf-386f-4363-a9fd-7ed4312c60b4, deviceId=abfbc445-8c08-4c24-b459-749ab7f64411, device=unix, type=CHANNEL, bootOrder=0, specParams={}, address={bus=0, controller=0, type=virtio-serial, port=1}, managed=false, plugged=true, readOnly=false, deviceAlias=channel0, customProperties={}, snapshotId=null}, device_0f11a461-7a40-45d7-a056-19a05c5dfe1edevice_abfbc445-8c08-4c24-b459-749ab7f64411device_8d8ee7e2-cc0e-44ac-8102-cc53717fc740=VmDevice {vmId=5ecc37cf-386f-4363-a9fd-7ed4312c60b4, deviceId=8d8ee7e2-cc0e-44ac-8102-cc53717fc740, device=unix, type=CHANNEL, bootOrder=0, specParams={}, address={bus=0, controller=0, type=virtio-serial, port=2}, managed=false, plugged=true, readOnly=false, deviceAlias=channel1, customProperties={}, snapshotId=null}, device_0f11a461-7a40-45d7-a056-19a05c5dfe1edevice_abfbc445-8c08-4c24-b459-749ab7f64411device_8d8ee7e2-cc0e-44ac-8102-cc53717fc740device_223bdfb0-9c7a-4a06-80af-681c42bc7c64=VmDevice {vmId=5ecc37cf-386f-4363-a9fd-7ed4312c60b4, deviceId=223bdfb0-9c7a-4a06-80af-681c42bc7c64, device=spicevmc, type=CHANNEL, bootOrder=0, specParams={}, address={bus=0, controller=0, type=virtio-serial, port=3}, managed=false, plugged=true, readOnly=false, deviceAlias=channel2, customProperties={}, snapshotId=null}, device_0f11a461-7a40-45d7-a056-19a05c5dfe1e=VmDevice {vmId=5ecc37cf-386f-4363-a9fd-7ed4312c60b4, deviceId=0f11a461-7a40-45d7-a056-19a05c5dfe1e, device=ide, type=CONTROLLER, bootOrder=0, specParams={}, address={slot=0x01, bus=0x00, domain=0x0000, type=pci, function=0x1}, managed=false, plugged=true, readOnly=false, deviceAlias=ide0, customProperties={}, snapshotId=null}},spiceSslCipherSuite=DEFAULT,memSize=16,smp=1,emulatedMachine=rhel6.5.0,vmType=kvm,memGuaranteedSize=16,display=qxl,smartcardEnable=false,bootMenuEnable=false,spiceSecureChannels=smain,sinputs,scursor,splayback,srecord,sdisplay,susbredir,ssmartcard,smpCoresPerSocket=1,numaTune={mode=preferred},maxVCpus=16,devices=[{address={bus=0x00, domain=0x0000, type=pci, slot=0x02, function=0x0}, specParams={vram=32768, heads=1}, device=qxl, type=video, deviceId=938a44fb-3af7-4a35-adba-d490bdda8f54}, {shared=false, iface=ide, index=2, address={unit=0, bus=1, target=0, controller=0, type=drive}, specParams={path=}, path=, device=cdrom, type=disk, readonly=true, deviceId=0d35fe8f-98d2-4c4a-8315-51b75fbba576}, {shared=false, index=0, volumeID=851aeb5b-83fd-4135-94e4-5891a8269b9f, propagateErrors=off, format=cow, type=disk, iface=virtio, bootOrder=1, address={bus=0x00, domain=0x0000, type=pci, slot=0x05, function=0x0}, domainID=81d8571c-0871-471f-930b-08685a502d3c, imageID=6a204b39-bebc-48ce-8683-cea8681addc9, specParams={}, optional=false, device=disk, poolID=00000002-0002-0002-0002-000000000064, readonly=false, deviceId=6a204b39-bebc-48ce-8683-cea8681addc9}, {address={bus=0x00, domain=0x0000, type=pci, slot=0x06, function=0x0}, specParams={model=virtio}, device=memballoon, type=balloon, deviceId=f25d3601-3289-4c4e-aa0c-eb2816ca1901}, {index=0, model=virtio-scsi, address={bus=0x00, domain=0x0000, type=pci, slot=0x03, function=0x0}, specParams={}, device=scsi, type=controller, deviceId=9e330005-350e-48fa-b4b6-a6ee18188f4c}, {address={bus=0x00, domain=0x0000, type=pci, slot=0x04, function=0x0}, specParams={}, device=virtio-serial, type=controller, deviceId=29714f6f-4172-4900-aa12-64249611db12}],vmName=XS_C000,cpuType=SandyBridge,fileTransferEnable=true
2014-09-18 05:11:06,446 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateVDSCommand] (ajp--127.0.0.1-8702-2) [4bf1c540] Command org.ovirt.engine.core.vdsbroker.vdsbroker.CreateVDSCommand return value org.ovirt.engine.core.vdsbroker.vdsbroker.OneVmReturnForXmlRpc@7435de93
2014-09-18 05:11:06,447 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateVDSCommand] (ajp--127.0.0.1-8702-2) [4bf1c540] HostName = toji
2014-09-18 05:11:06,460 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateVDSCommand] (ajp--127.0.0.1-8702-2) [4bf1c540] Command CreateVDSCommand(HostName = toji, HostId = 99cdddd1-b34c-45ef-966a-f1459e104284, vmId=5ecc37cf-386f-4363-a9fd-7ed4312c60b4, vm=VM [XS_C000]) execution failed. Exception: VDSNetworkException: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues'
2014-09-18 05:11:06,462 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateVDSCommand] (ajp--127.0.0.1-8702-2) [4bf1c540] FINISH, CreateVDSCommand, log id: 3d0d1f2
2014-09-18 05:11:06,463 ERROR [org.ovirt.engine.core.vdsbroker.CreateVmVDSCommand] (ajp--127.0.0.1-8702-2) [4bf1c540] VDS::create Failed creating vm XS_C000 in vds = 99cdddd1-b34c-45ef-966a-f1459e104284 : toji error = org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues'
2014-09-18 05:11:06,471 INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) Connecting to /192.168.1.48
2014-09-18 05:11:06,472 INFO [org.ovirt.engine.core.vdsbroker.CreateVmVDSCommand] (ajp--127.0.0.1-8702-2) [4bf1c540] FINISH, CreateVmVDSCommand, return: Down, log id: 3220ef08
2014-09-18 05:11:06,473 INFO [org.ovirt.engine.core.bll.RunVmCommand] (ajp--127.0.0.1-8702-2) [4bf1c540] Lock freed to object EngineLock [exclusiveLocks= key: 5ecc37cf-386f-4363-a9fd-7ed4312c60b4 value: VM

On VDSM, however, creation succeeds (see the attached logs).
If I switch the host to XML-RPC, everything works as expected.

Version-Release number of selected component (if applicable):

# rpm -qa | grep ovirt-engine
ovirt-engine-setup-plugin-ovirt-engine-common-3.5.0-0.0.master.20140911085446.gite1c5ffd.fc19.noarch
ovirt-engine-backend-3.5.0-0.0.master.20140911085446.gite1c5ffd.fc19.noarch
ovirt-engine-websocket-proxy-3.5.0-0.0.master.20140911085446.gite1c5ffd.fc19.noarch
ovirt-engine-webadmin-portal-3.5.0-0.0.master.20140911085446.gite1c5ffd.fc19.noarch
ovirt-engine-lib-3.5.0-0.0.master.20140911085446.gite1c5ffd.fc19.noarch
ovirt-engine-extensions-api-impl-3.5.0-0.0.master.20140911085446.gite1c5ffd.fc19.noarch
ovirt-engine-tools-3.5.0-0.0.master.20140911085446.gite1c5ffd.fc19.noarch
ovirt-engine-setup-base-3.5.0-0.0.master.20140911085446.gite1c5ffd.fc19.noarch
ovirt-engine-setup-plugin-websocket-proxy-3.5.0-0.0.master.20140911085446.gite1c5ffd.fc19.noarch
ovirt-engine-dbscripts-3.5.0-0.0.master.20140911085446.gite1c5ffd.fc19.noarch
ovirt-engine-setup-plugin-ovirt-engine-3.5.0-0.0.master.20140911085446.gite1c5ffd.fc19.noarch
ovirt-engine-restapi-3.5.0-0.0.master.20140911085446.gite1c5ffd.fc19.noarch
ovirt-engine-userportal-3.5.0-0.0.master.20140911085446.gite1c5ffd.fc19.noarch
ovirt-engine-3.5.0-0.0.master.20140911085446.gite1c5ffd.fc19.noarch
ovirt-engine-sdk-python-3.5.0.7-1.fc19.noarch
ovirt-engine-setup-3.5.0-0.0.master.20140911085446.gite1c5ffd.fc19.noarch
ovirt-engine-cli-3.5.0.5-0.1.20140908.git284eb0d.fc19.noarch

TOji> 12:06:53 root [~]$ rpm -qa | grep vdsm
vdsm-yajsonrpc-4.16.4-0.el6.noarch
vdsm-python-4.16.4-0.el6.noarch
vdsm-4.16.4-0.el6.x86_64
vdsm-python-zombiereaper-4.16.4-0.el6.noarch
vdsm-cli-4.16.4-0.el6.noarch
vdsm-jsonrpc-4.16.4-0.el6.noarch
vdsm-debuginfo-4.16.4-0.el6.x86_64
vdsm-xmlrpc-4.16.4-0.el6.noarch

How reproducible:
Initially this worked as expected; now it happens constantly in my environment, although at random points in the startup sequence. I can't fully rule out a misconfiguration of my setup.

Steps to Reproduce:
1. create a VM with 16 MB RAM, 1 CPU, 1 GB HDD
2. make it a template
3. create 100 VMs from it
4. start them sequentially through the REST API

See the attached scripts for reference; observe.py could be used as a template. A minimal sketch of these steps follows this comment.

Actual results:
At random points, the sequence hangs with a timeout error.

Expected results:
All the VMs start correctly.

Additional info:
VDSM runs on RHEL 6.5 on bare metal. The Engine runs on an F19 virtual machine with bridged networking, through libvirt.
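For steps 1-4, a minimal sketch using the Python SDK could look like the following; this assumes the ovirtsdk 3.x API, and the 'Default' cluster and 'XS' template names are hypothetical (the attached scripts remain the authoritative version):

=== cut here ===
#!/usr/bin/python
import time

from ovirtsdk.api import API
from ovirtsdk.xml import params

api = API(url='http://192.168.1.64:8080/api',
          username='admin@internal', password='admin')

for i in range(100):
    name = 'XS_C%03i' % i
    # step 3: create each VM from the template
    api.vms.add(params.VM(name=name,
                          cluster=params.Cluster(name='Default'),
                          template=params.Template(name='XS')))
    # wait for the VM to leave the image-locked state before starting it
    while api.vms.get(name).status.state != 'down':
        time.sleep(1)
    # step 4: start the VMs sequentially
    api.vms.get(name).start()
=== cut here ===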
Created attachment 938835 [details]
simpler script to mass-start VMs sequentially
VDSM logs are on Rome time (UTC+1), while the Engine's are on NY time (UTC-5?). I apologize for the inconvenience.
I checked the logs and here is my observation: VDSM replies to VM.create within a few milliseconds, but the engine times out after not receiving an answer for 3 minutes. It seems we have a bottleneck in the engine code. My first guess is ResponseWorker. Francesco, can you run JProfiler and take a snapshot (or several) during your test?
(In reply to Piotr Kliczewski from comment #3)
> Francesco, can you run JProfiler and take a snapshot (or several) during
> your test?

Sure thing! Are there instructions somewhere I can follow? I failed to find relevant references on the wiki.
Created attachment 938860 [details]
more logs for the same scenario/testcase, but with the engine host's timezone fixed

I fixed the engine timezone to match the hypervisor's: both are now on UTC+1 (Europe/Rome).

Around 13:09 I switched communication from JSON-RPC to XML-RPC; it worked as expected after that.
More observations:
- using the REST API, stop and status seem to work as well as with XML-RPC
- using webadmin, stop seems to work fine

More important: the same timeout behaviour also happened randomly when using webadmin in my environment.
I am not aware of any instructions on the wiki but you can take a look here [1]. I am happy to assist. [1] https://www.ej-technologies.com/products/jprofiler/overview.html
Just to make sure I understand the timeouts:
- on the client (the one running the API calls), did you get a timeout?
- on the engine, I see the timeout in the logs above.

I wonder at what stage the API call to run a VM is returned to the client (not sure this is related ... just making sure I understand the flow). Juan, I assume the client gets a response after the engine gets the response from VDSM? Or before that?
Francesco,

Just to be on the safe side ....
1 - on what env did you run this test (how many hypervisors)?
2 - what was the actual time it took VDSM to spawn the VM (the synchronous part of execution)?
The RESTAPI invokes the RunVm command, and then it may or may not wait depending on what the command returns and what the user requested. If the result of the command contains async tasks and the user requested async behavior, then the RESTAPI will wait for the async tasks to complete, polling them once per second an unlimited number of times.

If the result doesn't return async tasks or the user didn't request async behavior, then the RESTAPI will return the result to the user immediately after the RunVm command finishes.
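For illustration, a minimal sketch of the polling behaviour described above; get_tasks_statuses is a hypothetical stand-in for the engine's GetTasksStatusesByTasksIDs query, and the 'finished' status value is an assumption:

=== cut here ===
import time

def wait_for_async_tasks(get_tasks_statuses, task_ids, poll_interval=1.0):
    # poll once per second, with no upper bound on the number of
    # attempts, mirroring the RESTAPI behaviour described above
    while True:
        statuses = get_tasks_statuses(task_ids)
        if all(status == 'finished' for status in statuses):
            return statuses
        time.sleep(poll_interval)
=== cut here ===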
(In reply to Juan Hernández from comment #10)
> The RESTAPI invokes the RunVm command, and then it may or may not wait
> depending on what the command returns and what the user requested. If the
> result of the command contains async tasks and the user requested async
> behavior, then the RESTAPI will wait for the async tasks to complete,
> polling them once per second an unlimited number of times.
> 
> If the result doesn't return async tasks or the user didn't request async
> behavior, then the RESTAPI will return the result to the user immediately
> after the RunVm command finishes.

IIUC, does this mean that the engine will communicate with VDSM while the client waits for a response, and will wait until the synchronous part of run VM has been executed and returned a response to the engine?
It means that the RESTAPI will repeatedly call the GetTasksStatusesByTasksIDs query, while the client is waiting for a response. I don't know if that query involves communication with VDSM.
(In reply to Juan Hernández from comment #12)
> It means that the RESTAPI will repeatedly call the
> GetTasksStatusesByTasksIDs query, while the client is waiting for a
> response. I don't know if that query involves communication with VDSM.

Running a VM is not a defined async task ... but it is async by nature. Omer, can you please answer the question in comment #11.
Yes, the client waits during the execute, which is the sync part of run VM (not stateless), where the engine calls the create verb in VDSM.
Francesco - please answer the questions in comment #9
Sorry, I'm on PTO the whole week and I check mail sporadically...

The issue I'm facing should be reproducible with this simple script https://bugzilla.redhat.com/attachment.cgi?id=938835 or maybe even with this one:

=== cut here ===
#!/usr/bin/python
import sys
import time

from ovirtsdk.xml import params
from ovirtsdk.api import API

api = API(url='http://192.168.1.64:8080/api',
          username='admin@internal',
          password='admin')

if len(sys.argv) != 4:
    sys.stderr.write('usage: %s tmpl start stop\n' % (
        sys.argv[0]))
    sys.exit(1)

tmpl = sys.argv[1]
start = int(sys.argv[2])
stop = int(sys.argv[3])

for i in range(start, stop):
    name = '%s_C%03i' % (tmpl, i)
    print 'Starting VM: %s' % name
    try:
        api.vms.get(name).start()
        time.sleep(1)
    except Exception as exc:
        print(exc)
=== cut here ===

(both will require minimal editing to get the right API parameters), plus the information I'm adding here.

(In reply to Barak from comment #9)
> Francesco,
> 
> Just to be on the safe side ....
> 1 - on what env did you run this test (how many hypervisors)?

The environment is:

* Hypervisor host: CentOS 7 running on an AMD Athlon 5350 (http://www.cpu-world.com/CPUs/Jaguar/AMD-Athlon%205350%20-%20AD5350JAH44HM.html) with 8 GB RAM, running the latest VDSM from 4.16.x (RPM from the last test day); the rest of the virt stack is the QEMU stack.
* Engine: last 3.5.0 RC (RPM from the last test day) running on a virtual machine with 2 VCPUs and 8 GB RAM. The real HW is a Core i7 3770 with 16 GB RAM.

Both the guest and the real host are up-to-date F19.

Tests were run against one hypervisor at a time (of two configured). I had 200 VMs, each one with 16 MB RAM, 1 VCPU, and a 1 GB HDD on NFS, with no OS installed. The test was run with the first 100 VMs. Please note that I sometimes saw failures even after starting just the first couple of VMs.

> 2 - what was the actual time it took VDSM to spawn the VM (the synchronous
> part of execution)?

Considering the log I attached here: https://bugzilla.redhat.com/show_bug.cgi?id=1143968#c5 we can see that I received a timeout after a few calls. I'm attaching one example below, but all of these look the same, returning - if I read the logs correctly - in ~15 ms.
Thread-163::DEBUG::2014-09-18 13:00:25,714::__init__::467::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'VM.create' in bridge with {u'vmParams': {u'acpiEnable': u'true', u'emulatedMachine': u'rhel6.5.0', u'vmId': u'5ecc37cf-386f-4363-a9fd-7ed4312c60b4', u'memGuaranteedSize': 16, u'transparentHugePages': u'true', u'spiceSslCipherSuite': u'DEFAULT', u'cpuType': u'SandyBridge', u'smp': u'1', u'numaTune': {u'mode': u'preferred'}, u'custom': {u'device_23a9f0c4-0b62-4ba6-98be-eeeb5162762ddevice_f53382bb-09ed-40c6-bf8f-9744df943ba1': u'VmDevice {vmId=5ecc37cf-386f-4363-a9fd-7ed4312c60b4, deviceId=f53382bb-09ed-40c6-bf8f-9744df943ba1, device=unix, type=CHANNEL, bootOrder=0, specParams={}, address={bus=0, controller=0, type=virtio-serial, port=1}, managed=false, plugged=true, readOnly=false, deviceAlias=channel0, customProperties={}, snapshotId=null}', u'device_23a9f0c4-0b62-4ba6-98be-eeeb5162762ddevice_f53382bb-09ed-40c6-bf8f-9744df943ba1device_835cd3a8-6002-4590-9d18-9797bdd1afaf': u'VmDevice {vmId=5ecc37cf-386f-4363-a9fd-7ed4312c60b4, deviceId=835cd3a8-6002-4590-9d18-9797bdd1afaf, device=unix, type=CHANNEL, bootOrder=0, specParams={}, address={bus=0, controller=0, type=virtio-serial, port=2}, managed=false, plugged=true, readOnly=false, deviceAlias=channel1, customProperties={}, snapshotId=null}', u'device_23a9f0c4-0b62-4ba6-98be-eeeb5162762ddevice_f53382bb-09ed-40c6-bf8f-9744df943ba1device_835cd3a8-6002-4590-9d18-9797bdd1afafdevice_addb1082-34f7-4c9f-ad55-863e33f26408': u'VmDevice {vmId=5ecc37cf-386f-4363-a9fd-7ed4312c60b4, deviceId=addb1082-34f7-4c9f-ad55-863e33f26408, device=spicevmc, type=CHANNEL, bootOrder=0, specParams={}, address={bus=0, controller=0, type=virtio-serial, port=3}, managed=false, plugged=true, readOnly=false, deviceAlias=channel2, customProperties={}, snapshotId=null}', u'device_23a9f0c4-0b62-4ba6-98be-eeeb5162762d': u'VmDevice {vmId=5ecc37cf-386f-4363-a9fd-7ed4312c60b4, deviceId=23a9f0c4-0b62-4ba6-98be-eeeb5162762d, device=ide, type=CONTROLLER, bootOrder=0, specParams={}, address={slot=0x01, bus=0x00, domain=0x0000, type=pci, function=0x1}, managed=false, plugged=true, readOnly=false, deviceAlias=ide0, customProperties={}, snapshotId=null}'}, u'vmType': u'kvm', u'memSize': 16, u'smpCoresPerSocket': u'1', u'vmName': u'XS_C000', u'nice': u'0', u'bootMenuEnable': u'false', u'copyPasteEnable': u'true', u'smartcardEnable': u'false', u'fileTransferEnable': u'true', u'keyboardLayout': u'en-us', u'kvmEnable': u'true', u'displayNetwork': u'ovirtmgmt', u'devices': [{u'device': u'qxl', u'specParams': {u'vram': u'32768', u'heads': u'1'}, u'type': u'video', u'deviceId': u'938a44fb-3af7-4a35-adba-d490bdda8f54', u'address': {u'slot': u'0x02', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}}, {u'index': u'2', u'iface': u'ide', u'address': {u'bus': u'1', u'controller': u'0', u'type': u'drive', u'target': u'0', u'unit': u'0'}, u'specParams': {u'path': u''}, u'readonly': u'true', u'deviceId': u'0d35fe8f-98d2-4c4a-8315-51b75fbba576', u'path': u'', u'device': u'cdrom', u'shared': u'false', u'type': u'disk'}, {u'index': 0, u'iface': u'virtio', u'format': u'cow', u'bootOrder': u'1', u'poolID': u'00000002-0002-0002-0002-000000000064', u'volumeID': u'851aeb5b-83fd-4135-94e4-5891a8269b9f', u'imageID': u'6a204b39-bebc-48ce-8683-cea8681addc9', u'specParams': {}, u'readonly': u'false', u'domainID': u'81d8571c-0871-471f-930b-08685a502d3c', u'optional': u'false', u'deviceId': u'6a204b39-bebc-48ce-8683-cea8681addc9', u'address': {u'slot': u'0x05', u'bus': u'0x00', 
u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}, u'device': u'disk', u'shared': u'false', u'propagateErrors': u'off', u'type': u'disk'}, {u'device': u'memballoon', u'specParams': {u'model': u'virtio'}, u'type': u'balloon', u'deviceId': u'f25d3601-3289-4c4e-aa0c-eb2816ca1901', u'address': {u'slot': u'0x06', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}}, {u'index': u'0', u'specParams': {}, u'deviceId': u'9e330005-350e-48fa-b4b6-a6ee18188f4c', u'address': {u'slot': u'0x03', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}, u'device': u'scsi', u'model': u'virtio-scsi', u'type': u'controller'}, {u'device': u'virtio-serial', u'specParams': {}, u'type': u'controller', u'deviceId': u'29714f6f-4172-4900-aa12-64249611db12', u'address': {u'slot': u'0x04', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}}], u'timeOffset': u'0', u'maxVCpus': u'16', u'spiceSecureChannels': u'smain,sinputs,scursor,splayback,srecord,sdisplay,susbredir,ssmartcard', u'display': u'qxl'}, u'vmID': u'5ecc37cf-386f-4363-a9fd-7ed4312c60b4'} Thread-163::DEBUG::2014-09-18 13:00:25,751::__init__::498::jsonrpc.JsonRpcServer::(_serveRequest) Return 'VM.create' in bridge with {u'acpiEnable': u'true', u'emulatedMachine': u'rhel6.5.0', 'pid': '0', u'memGuaranteedSize': 16, u'transparentHugePages': u'true', 'displaySecurePort': '-1', u'spiceSslCipherSuite': u'DEFAULT', u'cpuType': u'SandyBridge', u'smp': u'1', u'numaTune': {u'mode': u'preferred'}, u'custom': {u'device_23a9f0c4-0b62-4ba6-98be-eeeb5162762ddevice_f53382bb-09ed-40c6-bf8f-9744df943ba1': u'VmDevice {vmId=5ecc37cf-386f-4363-a9fd-7ed4312c60b4, deviceId=f53382bb-09ed-40c6-bf8f-9744df943ba1, device=unix, type=CHANNEL, bootOrder=0, specParams={}, address={bus=0, controller=0, type=virtio-serial, port=1}, managed=false, plugged=true, readOnly=false, deviceAlias=channel0, customProperties={}, snapshotId=null}', u'device_23a9f0c4-0b62-4ba6-98be-eeeb5162762ddevice_f53382bb-09ed-40c6-bf8f-9744df943ba1device_835cd3a8-6002-4590-9d18-9797bdd1afaf': u'VmDevice {vmId=5ecc37cf-386f-4363-a9fd-7ed4312c60b4, deviceId=835cd3a8-6002-4590-9d18-9797bdd1afaf, device=unix, type=CHANNEL, bootOrder=0, specParams={}, address={bus=0, controller=0, type=virtio-serial, port=2}, managed=false, plugged=true, readOnly=false, deviceAlias=channel1, customProperties={}, snapshotId=null}', u'device_23a9f0c4-0b62-4ba6-98be-eeeb5162762ddevice_f53382bb-09ed-40c6-bf8f-9744df943ba1device_835cd3a8-6002-4590-9d18-9797bdd1afafdevice_addb1082-34f7-4c9f-ad55-863e33f26408': u'VmDevice {vmId=5ecc37cf-386f-4363-a9fd-7ed4312c60b4, deviceId=addb1082-34f7-4c9f-ad55-863e33f26408, device=spicevmc, type=CHANNEL, bootOrder=0, specParams={}, address={bus=0, controller=0, type=virtio-serial, port=3}, managed=false, plugged=true, readOnly=false, deviceAlias=channel2, customProperties={}, snapshotId=null}', u'device_23a9f0c4-0b62-4ba6-98be-eeeb5162762d': u'VmDevice {vmId=5ecc37cf-386f-4363-a9fd-7ed4312c60b4, deviceId=23a9f0c4-0b62-4ba6-98be-eeeb5162762d, device=ide, type=CONTROLLER, bootOrder=0, specParams={}, address={slot=0x01, bus=0x00, domain=0x0000, type=pci, function=0x1}, managed=false, plugged=true, readOnly=false, deviceAlias=ide0, customProperties={}, snapshotId=null}'}, u'vmType': u'kvm', u'memSize': 16, u'smpCoresPerSocket': u'1', u'vmName': u'XS_C000', u'nice': u'0', 'status': 'WaitForLaunch', u'bootMenuEnable': u'false', u'vmId': u'5ecc37cf-386f-4363-a9fd-7ed4312c60b4', u'copyPasteEnable': u'true', 'displayIp': 
'192.168.1.48', 'displayPort': '-1', u'smartcardEnable': u'false', u'spiceSecureChannels': u'smain,sinputs,scursor,splayback,srecord,sdisplay,susbredir,ssmartcard', u'fileTransferEnable': u'true', 'nicModel': 'rtl8139,pv', u'keyboardLayout': u'en-us', u'kvmEnable': u'true', u'displayNetwork': u'ovirtmgmt', u'devices': [{u'specParams': {u'vram': u'32768', u'heads': u'1'}, 'deviceType': u'video', u'deviceId': u'938a44fb-3af7-4a35-adba-d490bdda8f54', u'address': {u'slot': u'0x02', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}, u'device': u'qxl', u'type': u'video'}, {u'index': u'2', u'iface': u'ide', u'address': {u'bus': u'1', u'controller': u'0', u'type': u'drive', u'target': u'0', u'unit': u'0'}, u'specParams': {u'path': u''}, u'readonly': u'true', 'deviceType': u'disk', u'deviceId': u'0d35fe8f-98d2-4c4a-8315-51b75fbba576', u'path': u'', u'device': u'cdrom', u'shared': u'false', u'type': u'disk'}, {u'index': 0, u'iface': u'virtio', 'reqsize': '0', u'format': u'cow', 'deviceType': u'disk', u'bootOrder': u'1', u'poolID': u'00000002-0002-0002-0002-000000000064', u'volumeID': u'851aeb5b-83fd-4135-94e4-5891a8269b9f', u'imageID': u'6a204b39-bebc-48ce-8683-cea8681addc9', u'specParams': {}, u'readonly': u'false', u'domainID': u'81d8571c-0871-471f-930b-08685a502d3c', u'optional': u'false', u'deviceId': u'6a204b39-bebc-48ce-8683-cea8681addc9', u'address': {u'slot': u'0x05', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}, u'device': u'disk', u'shared': u'false', u'propagateErrors': u'off', u'type': u'disk'}, {u'specParams': {u'model': u'virtio'}, 'deviceType': u'balloon', u'deviceId': u'f25d3601-3289-4c4e-aa0c-eb2816ca1901', u'address': {u'slot': u'0x06', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}, u'device': u'memballoon', u'type': u'balloon'}, {u'index': u'0', u'specParams': {}, 'deviceType': u'controller', u'deviceId': u'9e330005-350e-48fa-b4b6-a6ee18188f4c', u'address': {u'slot': u'0x03', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}, u'device': u'scsi', u'model': u'virtio-scsi', u'type': u'controller'}, {u'specParams': {}, 'deviceType': u'controller', u'deviceId': u'29714f6f-4172-4900-aa12-64249611db12', u'address': {u'slot': u'0x04', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}, u'device': u'virtio-serial', u'type': u'controller'}], u'timeOffset': u'0', u'maxVCpus': u'16', 'clientIp': '', u'display': u'qxl'}
We're not sure the issue still happens. Piotr - please try to reproduce, and sync with Francesco. If needed, either provide a fix, or close it.
Will retry once I'm sure all the relevant patches for the 3.5.x branch are in, both on the Engine and on the VDSM side.
I tried on my local env and do not see this issue. Can you please check with the latest master?
We noticed that the response from VM.create is updated after it is returned, which causes a RuntimeError ('dictionary changed size during iteration') during marshalling to JSON.
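For illustration, the failure mode reduces to mutating a dict while it is being iterated; a synthetic sketch (not VDSM code) that deterministically raises the same RuntimeError:

=== cut here ===
response = {'vmId': '5ecc37cf-386f-4363-a9fd-7ed4312c60b4',
            'status': 'WaitForLaunch'}

# json marshalling walks the response dict with an iterator; if another
# thread adds a key mid-walk (simulated inline here), Python raises
# "RuntimeError: dictionary changed size during iteration"
try:
    for key in response:
        response['displayIp'] = '192.168.1.48'  # late update to the response
except RuntimeError as exc:
    print(exc)
=== cut here ===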
Turns out that json-rpc (and possibly recent changes needed for live merge) just exposed an old flow in the Vm creation path. The core issue is that we update Vm.conf in an unsafe manner, without properly locking it. Eventually, this was spotted in the VDSM logs:

Thread-281::ERROR::2014-10-28 16:31:20,971::__init__::244::root::(sendReply)
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/yajsonrpc/__init__.py", line 242, in sendReply
    encodedObjects.append(response.encode())
  File "/usr/lib/python2.6/site-packages/yajsonrpc/__init__.py", line 140, in encode
    return json.dumps(res, 'utf-8')
  File "/usr/lib64/python2.6/json/__init__.py", line 237, in dumps
    **kw).encode(obj)
  File "/usr/lib64/python2.6/json/encoder.py", line 367, in encode
    chunks = list(self.iterencode(o))
  File "/usr/lib64/python2.6/json/encoder.py", line 309, in _iterencode
    for chunk in self._iterencode_dict(o, markers):
  File "/usr/lib64/python2.6/json/encoder.py", line 275, in _iterencode_dict
    for chunk in self._iterencode(value, markers):
  File "/usr/lib64/python2.6/json/encoder.py", line 309, in _iterencode
    for chunk in self._iterencode_dict(o, markers):
  File "/usr/lib64/python2.6/json/encoder.py", line 275, in _iterencode_dict
    for chunk in self._iterencode(value, markers):
  File "/usr/lib64/python2.6/json/encoder.py", line 306, in _iterencode
    for chunk in self._iterencode_list(o, markers):
  File "/usr/lib64/python2.6/json/encoder.py", line 204, in _iterencode_list
    for chunk in self._iterencode(value, markers):
  File "/usr/lib64/python2.6/json/encoder.py", line 309, in _iterencode
    for chunk in self._iterencode_dict(o, markers):
  File "/usr/lib64/python2.6/json/encoder.py", line 247, in _iterencode_dict
    for key, value in items:
RuntimeError: dictionary changed size during iteration

We don't know yet which part of the creation flow is changing Vm.conf. The nesting of the calls suggests the device handling is indeed the root cause.
(In reply to Francesco Romani from comment #21)
> Turns out that json-rpc (and possibly recent changes needed for live merge)
> just exposed an old flow in the Vm creation path.

s/flow/flaw/

Preliminary verification suggests that http://gerrit.ovirt.org/#/c/34599 solves the issue - or at the very least makes it much less frequent. However, 34599 is not really (yet) suitable for merging, as it moves potentially expensive storage calls into the sync part.
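For reference, the general shape of the fix is sketched below (a minimal sketch of the locking idea, not the actual content of change 34599; all names are hypothetical): guard the shared conf dict with a lock and serialize a snapshot, so the JSON encoder never sees a dict that is still being mutated.

=== cut here ===
import copy
import threading

conf_lock = threading.Lock()

def update_conf(conf, key, value):
    # writers take the lock before touching the shared dict
    with conf_lock:
        conf[key] = value

def snapshot_for_reply(conf):
    # the reply path serializes a deep copy taken under the same lock,
    # so json encoding cannot race with a concurrent update
    with conf_lock:
        return copy.deepcopy(conf)
=== cut here ===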
Is bug 1154389 a duplicate of this one?
(In reply to Omer Frenkel from comment #23)
> Is bug 1154389 a duplicate of this one?

It seems related, but I'm not sure it is the same issue. It needs separate investigation.
Just to update this: I've seen this issue in one of our automation storage tests creating only 5 VMs in vt12; the test hasn't run in vt13.
This is an automated message: This bug should be fixed in oVirt 3.5.1 RC1, moving to QA
oVirt 3.5.1 has been released. If problems still persist, please make note of it in this bug report.