Bug 1102701
| Summary: | [Scale] - no correlation in VM status between the engine and VDSM | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Eldad Marciano <emarcian> |
| Component: | vdsm | Assignee: | Nir Soffer <nsoffer> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Yuri Obshansky <yobshans> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.3.0 | CC: | amureini, bazulay, fsimonce, gklein, iheim, lpeer, michal.skrivanek, nsoffer, scohen, tnisan, ybronhei, yeylon |
| Target Milestone: | --- | | |
| Target Release: | 3.5.0 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | storage | | |
| Fixed In Version: | vt1.3, 4.16.0-1.el6_5 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-02-16 13:41:11 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1083771 | | |
| Bug Blocks: | 1142923, 1156165 | | |
| Attachments: | | | |
WaitForLaunch means that VDSM was asked to start the VM, but no report about the VM has come back yet, so it is expected that the numbers do not agree.

Traceback (most recent call last):
File "/usr/share/vdsm/storage/task.py", line 857, in _run
return fn(*args, **kargs)
File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
res = f(*args, **kwargs)
File "/usr/share/vdsm/storage/hsm.py", line 3085, in getVolumeSize
apparentsize = str(volClass.getVSize(dom, imgUUID, volUUID, bs=1))
File "/usr/share/vdsm/storage/fileVolume.py", line 418, in getVSize
return int(sdobj.oop.os.stat(volPath).st_size / bs)
File "/usr/share/vdsm/storage/remoteFileHandler.py", line 312, in callCrabRPCFunction
raise Exception("No free file handlers in pool")
Exception: No free file handlers in pool
I very much suspect this (repeated) exception is the reason the VMs stay in WaitForLaunch for such a long time. I'm CCing storage people, as this is outside the scope of the SLA team.

Fede/Nir, weren't we working on removing this?

100% reproduced while putting a host with 100+ VMs into maintenance. Still under investigation; adding logs and findings ASAP.

Created attachment 901478 [details]
vdsm.log
grep for 'No free file handlers in pool'
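For reference, the occurrences in the attached log can be counted with grep. A self-contained sketch (the two sample log lines and the file path are made up, standing in for the real vdsm.log):

```shell
# Count occurrences of the pool-exhaustion error in a vdsm log.
# The sample lines below are illustrative, not from the real attachment.
cat > /tmp/vdsm-sample.log <<'EOF'
Thread-13::ERROR::task.py Exception: No free file handlers in pool
Thread-14::ERROR::task.py Exception: No free file handlers in pool
EOF
grep -c 'No free file handlers in pool' /tmp/vdsm-sample.log
```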
(In reply to Allon Mureinik from comment #4)
> Fede/Nir, weren't we working on removing this?

I was not working on removing this. This is a limit of the current implementation: there are up to 10 file handlers per domain, and if you try to run too many operations concurrently you will get this failure. It will be fixed by bug 1083771.

Bug verified on:
- RHEV-M 3.5.0-0.22.el6ev
- RHEL 6Server - 6.6.0.2.el6
- libvirt-0.10.2-46.el6_6.1
- vdsm-4.16.7.6-1.el6ev

It did not reproduce. Moved to VERIFIED.
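The limit described above (a fixed number of file handlers per domain, with no queueing) can be sketched as follows. The class and method names are illustrative, not vdsm's actual remoteFileHandler code:

```python
import threading

class FileHandlerPool:
    """Illustrative fixed-size pool: acquiring a handler when all
    slots are busy fails immediately instead of queueing.
    (Names are made up; this is not vdsm's implementation.)"""

    def __init__(self, max_handlers=10):
        # A non-blocking semaphore models the per-domain handler limit.
        self._slots = threading.Semaphore(max_handlers)

    def acquire(self):
        # blocking=False makes callers beyond the limit fail fast,
        # which is what surfaces as the exception in the traceback above.
        if not self._slots.acquire(blocking=False):
            raise Exception("No free file handlers in pool")

    def release(self):
        self._slots.release()

pool = FileHandlerPool(max_handlers=10)
for _ in range(10):
    pool.acquire()      # the first 10 concurrent operations get a handler
try:
    pool.acquire()      # the 11th fails immediately
except Exception as e:
    print(e)            # -> No free file handlers in pool
```

The fail-fast design means bursts of concurrent storage calls surface errors instead of waiting, which matches the repeated exceptions seen in the log.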
Description of problem:

On a scale deployment, the engine and VDSM do not agree on VM status. The engine reports that VM test_0093 was started by admin@internal (Host: host01-rack06.scale.openstack.engineering.redhat.com):

2014-05-29 11:50:08,357 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateVDSCommand] (pool-4-thread-4) [551fce46] START, CreateVDSCommand(HostName = host01-rack06.scale.openstack.engineering.redhat.com, HostId = c9f64077-87fe-45b5-80d9-1f133d38ce7b, vmId=a284c1f1-36f9-4e26-94bc-f0b464b75053, vm=VM [test_0093]), log id: d4befe7

2014-05-29 11:50:12,463 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateVDSCommand] (pool-4-thread-4) [551fce46] org.ovirt.engine.core.vdsbroker.vdsbroker.CreateVDSCommand spiceSslCipherSuite=DEFAULT,memSize=256,kvmEnable=true,smp=1,vmType=kvm,emulatedMachine=rhel6.5.0,keyboardLayout=en-us,memGuaranteedSize=1024,pitReinjection=false,nice=0,display=qxl,smartcardEnable=false,smpCoresPerSocket=1,spiceSecureChannels=smain,sinputs,scursor,splayback,srecord,sdisplay,susbredir,ssmartcard,timeOffset=0,transparentHugePages=true,vmId=a284c1f1-36f9-4e26-94bc-f0b464b75053,devices=[Ljava.util.HashMap;@258e5204,acpiEnable=true,vmName=test_0093,cpuType=SandyBridge,custom={device_164795dd-1798-42bc-96b8-54b1d1e9e8dcdevice_78974fc0-0bbe-4e6b-a483-f4ac2271daa9device_7b881cce-b72d-4d3c-a1df-61dab04ebe99device_7fb377d4-0a53-43c7-9a4a-f794d5c10a9adevice_8007dc3d-30ee-4391-b544-27e8992f3d77=VmDevice {vmId=a284c1f1-36f9-4e26-94bc-f0b464b75053, deviceId=8007dc3d-30ee-4391-b544-27e8992f3d77, device=virtio-serial, type=CONTROLLER, bootOrder=0, specParams={}, address={bus=0x00, domain=0x0000, type=pci, slot=0x05, function=0x0}, managed=false, plugged=true, readOnly=false, deviceAlias=virtio-serial0, customProperties={}, snapshotId=null}, device_164795dd-1798-42bc-96b8-54b1d1e9e8dc=VmDevice {vmId=a284c1f1-36f9-4e26-94bc-f0b464b75053, deviceId=164795dd-1798-42bc-96b8-54b1d1e9e8dc, device=spicevmc, type=CHANNEL, bootOrder=0, specParams={}, address={port=3, bus=0, controller=0, type=virtio-serial}, managed=false, plugged=true, readOnly=false, deviceAlias=channel2, customProperties={}, snapshotId=null}, device_164795dd-1798-42bc-96b8-54b1d1e9e8dcdevice_78974fc0-0bbe-4e6b-a483-f4ac2271daa9device_7b881cce-b72d-4d3c-a1df-61dab04ebe99=VmDevice {vmId=a284c1f1-36f9-4e26-94bc-f0b464b75053, deviceId=7b881cce-b72d-4d3c-a1df-61dab04ebe99, device=ide, type=CONTROLLER, bootOrder=0, specParams={}, address={bus=0x00, domain=0x0000, type=pci, slot=0x01, function=0x1}, managed=false, plugged=true, readOnly=false, deviceAlias=ide0, customProperties={}, snapshotId=null}, device_164795dd-1798-42bc-96b8-54b1d1e9e8dcdevice_78974fc0-0bbe-4e6b-a483-f4ac2271daa9device_7b881cce-b72d-4d3c-a1df-61dab04ebe99device_7fb377d4-0a53-43c7-9a4a-f794d5c10a9a=VmDevice {vmId=a284c1f1-36f9-4e26-94bc-f0b464b75053, deviceId=7fb377d4-0a53-43c7-9a4a-f794d5c10a9a, device=unix, type=CHANNEL, bootOrder=0, specParams={}, address={port=2, bus=0, controller=0, type=virtio-serial}, managed=false, plugged=true, readOnly=false, deviceAlias=channel1, customProperties={}, snapshotId=null}, device_164795dd-1798-42bc-96b8-54b1d1e9e8dcdevice_78974fc0-0bbe-4e6b-a483-f4ac2271daa9=VmDevice {vmId=a284c1f1-36f9-4e26-94bc-f0b464b75053, deviceId=78974fc0-0bbe-4e6b-a483-f4ac2271daa9, device=unix, type=CHANNEL, bootOrder=0, specParams={}, address={port=1, bus=0, controller=0, type=virtio-serial}, managed=false, plugged=true, readOnly=false, deviceAlias=channel0, customProperties={}, snapshotId=null}}

2014-05-29 11:50:12,464 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateVDSCommand] (pool-4-thread-4) [551fce46] FINISH, CreateVDSCommand, log id: d4befe7

2014-05-29 11:50:12,472 INFO [org.ovirt.engine.core.vdsbroker.CreateVmVDSCommand] (pool-4-thread-4) [551fce46] FINISH, CreateVmVDSCommand, return: WaitForLaunch, log id: 3e1c4f87

2014-05-29 11:50:12,472 INFO [org.ovirt.engine.core.bll.RunVmCommand] (pool-4-thread-4) [551fce46] Lock freed to object EngineLock [exclusiveLocks= key: a284c1f1-36f9-4e26-94bc-f0b464b75053 value: VM , sharedLocks= ]

2014-05-29 11:50:12,480 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-4-thread-4) [551fce46] Correlation ID: 551fce46, Job ID: efde582c-bae3-443c-aa35-44197541d219, Call Stack: null, Custom Event ID: -1, Message: VM test_0093 was started by admin@internal (Host: host01-rack06.scale.openstack.engineering.redhat.com).

On the VDSM side there is no KVM process with that VM name (ps -ef | grep -i 'test_0093'); vdsm.log only reports the XML-RPC call for that VM. The UI also shows the VM as 'Waiting to launch' for over 33 minutes.

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Build 100 VMs and run them.

Actual results:
After ~50 VMs are running, the remaining VMs do not run and stay waiting for launch for over 33 minutes.

Expected results:
All VMs start within the required time.

Additional info:
The host also displays the VM as running (as seen in the engine log, the engine thinks the VM is active). Refer to bug https://bugzilla.redhat.com/show_bug.cgi?id=1102664
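The numbers in the description (100 VMs started, only ~50 ever running) are consistent with many concurrent volume operations racing for a fail-fast pool of 10 handlers. A minimal simulation of that contention (all names and counts are illustrative, not vdsm code):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_HANDLERS = 10    # the per-domain handler limit mentioned in the comments
NUM_OPS = 100        # e.g. one volume-size check per starting VM (made up)

slots = threading.Semaphore(MAX_HANDLERS)
barrier = threading.Barrier(NUM_OPS)  # force all operations to overlap

def volume_op(_):
    barrier.wait()                         # all 100 ops hit the pool at once
    if not slots.acquire(blocking=False):  # fail fast, no queueing
        return "No free file handlers in pool"
    return "ok"

with ThreadPoolExecutor(max_workers=NUM_OPS) as ex:
    results = list(ex.map(volume_op, range(NUM_OPS)))

print(results.count("ok"))  # exactly MAX_HANDLERS operations succeed
print(results.count("No free file handlers in pool"))
```

Under this model only a handler-sized slice of operations makes progress per burst, while the rest repeatedly fail their storage calls, which would keep the corresponding VMs in WaitForLaunch.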