Description of problem:
On a scale deployment, the engine and VDSM do not agree on the VM state. The engine reports that VM test_0093 was started by admin@internal (Host: host01-rack06.scale.openstack.engineering.redhat.com):

2014-05-29 11:50:08,357 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateVDSCommand] (pool-4-thread-4) [551fce46] START, CreateVDSCommand(HostName = host01-rack06.scale.openstack.engineering.redhat.com, HostId = c9f64077-87fe-45b5-80d9-1f133d38ce7b, vmId=a284c1f1-36f9-4e26-94bc-f0b464b75053, vm=VM [test_0093]), log id: d4befe7

2014-05-29 11:50:12,463 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateVDSCommand] (pool-4-thread-4) [551fce46] org.ovirt.engine.core.vdsbroker.vdsbroker.CreateVDSCommand spiceSslCipherSuite=DEFAULT,memSize=256,kvmEnable=true,smp=1,vmType=kvm,emulatedMachine=rhel6.5.0,keyboardLayout=en-us,memGuaranteedSize=1024,pitReinjection=false,nice=0,display=qxl,smartcardEnable=false,smpCoresPerSocket=1,spiceSecureChannels=smain,sinputs,scursor,splayback,srecord,sdisplay,susbredir,ssmartcard,timeOffset=0,transparentHugePages=true,vmId=a284c1f1-36f9-4e26-94bc-f0b464b75053,devices=[Ljava.util.HashMap;@258e5204,acpiEnable=true,vmName=test_0093,cpuType=SandyBridge,custom={device_164795dd-1798-42bc-96b8-54b1d1e9e8dcdevice_78974fc0-0bbe-4e6b-a483-f4ac2271daa9device_7b881cce-b72d-4d3c-a1df-61dab04ebe99device_7fb377d4-0a53-43c7-9a4a-f794d5c10a9adevice_8007dc3d-30ee-4391-b544-27e8992f3d77=VmDevice {vmId=a284c1f1-36f9-4e26-94bc-f0b464b75053, deviceId=8007dc3d-30ee-4391-b544-27e8992f3d77, device=virtio-serial, type=CONTROLLER, bootOrder=0, specParams={}, address={bus=0x00, domain=0x0000, type=pci, slot=0x05, function=0x0}, managed=false, plugged=true, readOnly=false, deviceAlias=virtio-serial0, customProperties={}, snapshotId=null}, device_164795dd-1798-42bc-96b8-54b1d1e9e8dc=VmDevice {vmId=a284c1f1-36f9-4e26-94bc-f0b464b75053, deviceId=164795dd-1798-42bc-96b8-54b1d1e9e8dc, device=spicevmc, type=CHANNEL, bootOrder=0, specParams={}, address={port=3, bus=0, controller=0, type=virtio-serial}, managed=false, plugged=true, readOnly=false, deviceAlias=channel2, customProperties={}, snapshotId=null}, device_164795dd-1798-42bc-96b8-54b1d1e9e8dcdevice_78974fc0-0bbe-4e6b-a483-f4ac2271daa9device_7b881cce-b72d-4d3c-a1df-61dab04ebe99=VmDevice {vmId=a284c1f1-36f9-4e26-94bc-f0b464b75053, deviceId=7b881cce-b72d-4d3c-a1df-61dab04ebe99, device=ide, type=CONTROLLER, bootOrder=0, specParams={}, address={bus=0x00, domain=0x0000, type=pci, slot=0x01, function=0x1}, managed=false, plugged=true, readOnly=false, deviceAlias=ide0, customProperties={}, snapshotId=null}, device_164795dd-1798-42bc-96b8-54b1d1e9e8dcdevice_78974fc0-0bbe-4e6b-a483-f4ac2271daa9device_7b881cce-b72d-4d3c-a1df-61dab04ebe99device_7fb377d4-0a53-43c7-9a4a-f794d5c10a9a=VmDevice {vmId=a284c1f1-36f9-4e26-94bc-f0b464b75053, deviceId=7fb377d4-0a53-43c7-9a4a-f794d5c10a9a, device=unix, type=CHANNEL, bootOrder=0, specParams={}, address={port=2, bus=0, controller=0, type=virtio-serial}, managed=false, plugged=true, readOnly=false, deviceAlias=channel1, customProperties={}, snapshotId=null}, device_164795dd-1798-42bc-96b8-54b1d1e9e8dcdevice_78974fc0-0bbe-4e6b-a483-f4ac2271daa9=VmDevice {vmId=a284c1f1-36f9-4e26-94bc-f0b464b75053, deviceId=78974fc0-0bbe-4e6b-a483-f4ac2271daa9, device=unix, type=CHANNEL, bootOrder=0, specParams={}, address={port=1, bus=0, controller=0, type=virtio-serial}, managed=false, plugged=true, readOnly=false, deviceAlias=channel0, customProperties={}, snapshotId=null}}

2014-05-29 11:50:12,464 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateVDSCommand] (pool-4-thread-4) [551fce46] FINISH, CreateVDSCommand, log id: d4befe7

2014-05-29 11:50:12,472 INFO [org.ovirt.engine.core.vdsbroker.CreateVmVDSCommand] (pool-4-thread-4) [551fce46] FINISH, CreateVmVDSCommand, return: WaitForLaunch, log id: 3e1c4f87

2014-05-29 11:50:12,472 INFO [org.ovirt.engine.core.bll.RunVmCommand] (pool-4-thread-4) [551fce46] Lock freed to object EngineLock [exclusiveLocks= key: a284c1f1-36f9-4e26-94bc-f0b464b75053 value: VM , sharedLocks= ]

2014-05-29 11:50:12,480 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-4-thread-4) [551fce46] Correlation ID: 551fce46, Job ID: efde582c-bae3-443c-aa35-44197541d219, Call Stack: null, Custom Event ID: -1, Message: VM test_0093 was started by admin@internal (Host: host01-rack06.scale.openstack.engineering.redhat.com).

On the VDSM side there is no kvm process with that VM name (ps -ef | grep -i 'test_0093'), and vdsm.log only reports the XML-RPC create call for that VM. I have also checked the UI, which displays the VM as 'Waiting to launch' for over 33 minutes.

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Create 100 VMs and start them (see the reproduction sketch below).

Actual results:
After ~50 VMs are running, the remaining VMs do not start and stay in 'Waiting to launch' for over 33 minutes.

Expected results:
All VMs start within the required time.

Additional info:
The host also displays the VM as running (as can be seen in the engine log, the engine thinks the VM is active).
See also bug https://bugzilla.redhat.com/show_bug.cgi?id=1102664.
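For reference, a minimal reproduction sketch using the oVirt Python SDK (v3 API, ovirt-engine-sdk-python). The engine URL, credentials, cluster name 'Default', template 'Blank' and the VM naming scheme are placeholders assumed for illustration, not values taken from this setup:

# Sketch: create and start 100 small VMs, matching step 1 above.
from ovirtsdk.api import API
from ovirtsdk.xml import params

api = API(url='https://engine.example.com/api',        # hypothetical engine URL
          username='admin@internal', password='password',
          insecure=True)

cluster = api.clusters.get(name='Default')              # assumed cluster name
template = api.templates.get(name='Blank')              # assumed blank template

for i in range(100):
    name = 'test_%04d' % i
    api.vms.add(params.VM(name=name, cluster=cluster, template=template,
                          memory=256 * 1024 * 1024))    # 256 MB, as in the log above
    api.vms.get(name=name).start()

In the reported environment, after roughly 50 VMs are up, the remaining ones stay in 'Waiting to launch'.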
WaitForLaunch means that VDSM was asked to start the VM, but no report about the VM has come back yet. So it is expected that the numbers do not agree.
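To see which VMs are stuck in that state, one can poll their status from the engine. A hedged sketch reusing the SDK session from the sketch above (the status string 'wait_for_launch' is my assumption for how the v3 REST API reports the WaitForLaunch state):

# Sketch: list VMs the engine still reports as waiting to launch.
stuck = []
for i in range(100):
    vm = api.vms.get(name='test_%04d' % i)
    if vm is not None and vm.get_status().get_state() == 'wait_for_launch':
        stuck.append(vm.get_name())
print('%d VMs still in WaitForLaunch' % len(stuck))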
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 3085, in getVolumeSize
    apparentsize = str(volClass.getVSize(dom, imgUUID, volUUID, bs=1))
  File "/usr/share/vdsm/storage/fileVolume.py", line 418, in getVSize
    return int(sdobj.oop.os.stat(volPath).st_size / bs)
  File "/usr/share/vdsm/storage/remoteFileHandler.py", line 312, in callCrabRPCFunction
    raise Exception("No free file handlers in pool")
Exception: No free file handlers in pool

I very much suspect this (repeated) exception to be the reason for the VMs to stay in WaitForLaunch for such a long time.
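The failure mode is a bounded pool whose acquire does not wait: when every handler is busy, the caller fails immediately instead of queueing. A minimal illustration of that shape (not VDSM's actual remoteFileHandler code, just a sketch with hypothetical names):

# Illustration: a fixed-size handler pool with a non-blocking acquire,
# so callers beyond the pool size fail right away.
import threading

class HandlerPool(object):
    def __init__(self, size):
        self._slots = threading.Semaphore(size)   # one slot per handler

    def call(self, func, *args, **kwargs):
        if not self._slots.acquire(False):         # do not wait for a free handler
            raise Exception("No free file handlers in pool")
        try:
            return func(*args, **kwargs)
        finally:
            self._slots.release()

With a pool of 10 handlers and dozens of concurrent getVolumeSize calls against the same domain, most callers hit the exception rather than waiting for a free slot.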
I'm CCing storage people, as this is outside the scope of the SLA team.
Fede/Nir, weren't we working on removing this?
100% reproduced while putting a host with 100+ VMs into maintenance. Still under investigation; adding logs and findings ASAP.
Created attachment 901478 [details]
vdsm.log grep for 'No free file handlers in pool'
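A quick way to see how often and when this error occurs in the attached log (a sketch; the default log path and the standard vdsm timestamp format are assumptions):

# Sketch: bucket 'No free file handlers in pool' hits in vdsm.log by minute.
import collections
import re

counts = collections.Counter()
ts = re.compile(r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2})')     # date plus hour:minute

with open('/var/log/vdsm/vdsm.log') as f:                # assumed default log path
    for line in f:
        if 'No free file handlers in pool' in line:
            m = ts.search(line)
            if m:
                counts[m.group(1)] += 1

for minute, hits in sorted(counts.items()):
    print('%s  %d' % (minute, hits))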
(In reply to Allon Mureinik from comment #4)
> Fede/Nir, weren't we working on removing this?

I was not working on removing this.
This is a limit of the current implementation: there are up to 10 file handlers per domain, and if you try to run too many operations concurrently you will get this failure. It will be fixed by bug 1083771.
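Until that fix lands, a client-side workaround is to throttle how many VMs are started at once so that concurrent storage operations per domain stay under the handler limit. A hedged sketch, reusing the api handle from the SDK sketch in comment 0 (the batch size of 10 simply mirrors the per-domain handler count and is not a tuned value; 'wait_for_launch' is again my assumption for the status string):

# Sketch: start VMs in batches and wait for each batch to leave
# WaitForLaunch before starting the next one.
import time

BATCH = 10
names = ['test_%04d' % i for i in range(100)]

for start in range(0, len(names), BATCH):
    batch = names[start:start + BATCH]
    for name in batch:
        api.vms.get(name=name).start()
    # wait until no VM in this batch is still waiting to launch
    while any(api.vms.get(name=n).get_status().get_state() == 'wait_for_launch'
              for n in batch):
        time.sleep(5)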
Bug verified on:
- RHEV-M 3.5.0-0.22.el6ev
- RHEL 6Server - 6.6.0.2.el6
- libvirt-0.10.2-46.el6_6.1
- vdsm-4.16.7.6-1.el6ev

It did not reproduce. Moving to VERIFIED.