Description of problem:
Editing the number of prestarted VMs and saving the pool configuration does not behave as expected.

Version-Release number of selected component (if applicable):
av2

How reproducible:

Steps to Reproduce:
1. Create a pool from a template with 50 VMs.
2. Edit the pool and set 50 VMs prestarted.
3. Wait.
4. Edit the pool again and set 0 VMs prestarted.

Actual results:
Only 18 of the 50 VMs get started, although neither the reported host CPU nor memory usage exceeds 20%. For two VMs the error "Failed to complete starting of " is reported, without giving any reason why the engine was unable to start the VM. The engine makes no further attempt to start VMs for the next 30 minutes. After step 4, all 18 running VMs are still running.

Expected results:
The engine attempts to start all 50 VMs. If starting fails due to insufficient host resources, this is reported. When the number of prestarted VMs is set to zero, all the prestarted running VMs are stopped.

Additional info:
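For reference, steps 2 and 4 can also be scripted against the REST API. The sketch below is a rough illustration only; the API root, credentials, and the prestarted_vms element name are assumptions that may differ between versions, and the pool id is the one from the attached engine.log.

# Hypothetical reproduction script (paths, element names and credentials are
# assumptions -- adjust to the environment under test).
import requests

ENGINE = "https://engine.example.com/api"          # assumed API root
AUTH = ("admin@internal", "password")              # assumed credentials
HEADERS = {"Content-Type": "application/xml"}
POOL_ID = "e21c20c0-4b0d-4d90-bac2-9b6d16933eed"   # pool id from engine.log

def set_prestarted(count):
    """Steps 2 and 4: edit the pool and set the number of prestarted VMs."""
    body = "<vmpool><prestarted_vms>%d</prestarted_vms></vmpool>" % count
    resp = requests.put("%s/vmpools/%s" % (ENGINE, POOL_ID),
                        data=body, headers=HEADERS, auth=AUTH, verify=False)
    resp.raise_for_status()

set_prestarted(50)   # step 2
# ... wait (step 3) and watch how many pool VMs actually come up ...
set_prestarted(0)    # step 4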
logs?
Created attachment 872593: engine.log, related parts at the end
Petr, the existing VMs should not be stopped. How long did you wait when you changed prestarted to 0? Likely the original VMs were not yet started, hence you got just a few of them prestarted…
I waited about 30 minutes. VMs were being started during the first 10 minutes, then nothing happened for the next 20 minutes.
OK, it's clear from the log that after 18 VMs started, there were no more resources to start any more VMs, although the engine tried to start more every 5 minutes:

2014-03-06 17:01:54,164 INFO [org.ovirt.engine.core.bll.VmPoolMonitor] (DefaultQuartzScheduler_Worker-3) VmPool e21c20c0-4b0d-4d90-bac2-9b6d16933eed is missing 50 prestarted Vms, attempting to prestart 5 Vms
..
2014-03-06 17:19:58,405 INFO [org.ovirt.engine.core.bll.VmPoolMonitor] (DefaultQuartzScheduler_Worker-3) VmPool e21c20c0-4b0d-4d90-bac2-9b6d16933eed is missing 32 prestarted Vms, attempting to prestart 5 Vms
2014-03-06 17:19:58,456 INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (DefaultQuartzScheduler_Worker-3) Candidate host rhel6 (35845df9-45db-4eab-b9e4-270217cd52b2) was filtered out by VAR__FILTERTYPE__INTERNAL filter Memory
..
2014-03-06 17:24:58,614 INFO [org.ovirt.engine.core.bll.VmPoolMonitor] (DefaultQuartzScheduler_Worker-29) VmPool e21c20c0-4b0d-4d90-bac2-9b6d16933eed is missing 32 prestarted Vms, attempting to prestart 5 Vms

This continued until the prestarted count decreased. As mentioned in comment 4, we should not stop any running VMs.

Do you think your setup should have been able to run more than these VMs?
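For what it's worth, the throttling visible in those log lines (a batch of 5 prestart attempts per roughly 5-minute monitoring cycle, each attempt still subject to the scheduling filters) behaves roughly like this simplified sketch; the batch size and interval come from the log above, everything else (the capacity model, the names) is made up for illustration:

# Simplified model of the prestart loop seen in the log: each monitoring
# cycle tries to start at most 5 of the missing prestarted VMs, and every
# start can still be rejected by a scheduling filter (Memory in this case).
BATCH_PER_CYCLE = 5      # "attempting to prestart 5 Vms"
CYCLE_MINUTES = 5        # interval between VmPoolMonitor runs in the log

def run_cycles(missing, host_capacity, cycles):
    started = 0
    for cycle in range(cycles):
        to_start = min(BATCH_PER_CYCLE, missing - started)
        for _ in range(to_start):
            if started < host_capacity:   # the Memory filter would pass
                started += 1
            else:                         # host filtered out, nothing visible in the UI
                print("cycle %d: host filtered out by Memory filter" % cycle)
    return started

# 50 prestarted VMs requested, host memory fits only 18 of them,
# 30 minutes of monitoring (6 cycles): only 18 ever come up.
print(run_cycles(missing=50, host_capacity=18, cycles=30 // CYCLE_MINUTES))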
OK, if we don't stop VMs when the prestarted VM count is decreased, that's fine. I have no idea how many VMs the setup should handle, but I'd expect an alert message letting me know I cannot have more VMs. Ideally, it would give a reason (CPU, memory, ...).
you didn't see anything in the event log? (stating insufficient memory as the reason)
No, I've rechecked this in av4. I set the prestarted VM count to 30 on the same setup and got two "Failed to complete starting of VM" errors, with no reason given. Now I've got 12 VMs of the pool running, plus one which was already running before this test. The last VM was started about an hour ago; current host usage is 20% memory and 2% CPU, but I expect the CPU usage was much higher while many VMs were booting. I see in the engine.log that VMs could not be started because the host was filtered out due to memory, but there is no such message in the WebUI event log.
Smells like SLA; Martin, please check.
SLA indeed. Currently, when there is a scheduling failure, you only get this line in the logs indicating there was some problem:

2014-03-06 17:29:59,005 INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (DefaultQuartzScheduler_Worker-7) Candidate host rhel6 (35845df9-45db-4eab-b9e4-270217cd52b2) was filtered out by VAR__FILTERTYPE__INTERNAL filter Memory

It would be better if the SchedulingManager emitted standard audit log events, so the administrator would directly see what the problem is.
This is definitely not SLA. SLA correctly says that there are not enough resources; VM pool management is 100% virt :) SchedulingManager is separated from AuditLog by design, and the consumer of the scheduling results is responsible for logging them. But I wrote the VmPoolMonitor patch for you.
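To illustrate the separation described above (all names below are hypothetical stand-ins, not the engine's actual classes): the scheduler only returns a result together with the reasons a host was filtered out, and the caller, e.g. the pool monitor, is the one that turns a failure into a user-visible event.

# Sketch of the "consumer logs the scheduling outcome" design; the classes
# and functions here are illustrative stand-ins, not engine code.
class SchedulingResult(object):
    def __init__(self, host=None, filter_reasons=()):
        self.host = host
        self.filter_reasons = list(filter_reasons)

def schedule(vm, hosts):
    # scheduler side: pick a host or report why none qualified; it does not
    # write audit-log events itself
    for host in hosts:
        if host["free_memory_mb"] >= vm["memory_mb"]:
            return SchedulingResult(host=host)
    return SchedulingResult(filter_reasons=["Memory"])

def audit_log(message):
    # stand-in for the engine's event/audit log shown in the WebUI
    print("EVENT: %s" % message)

def prestart_vm(vm, hosts):
    # consumer side (e.g. the pool monitor): decides how to surface a failure
    result = schedule(vm, hosts)
    if result.host is None:
        audit_log("Failed to prestart VM %s: host filtered out by %s"
                  % (vm["name"], ", ".join(result.filter_reasons)))
        return False
    return True

prestart_vm({"name": "pool-19", "memory_mb": 2048},
            [{"name": "rhel6", "free_memory_mb": 512}])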
Verified: when a pre-started VM fails to start, an indication appears in the Events/Tasks tabs.

Verified using:
rhevm-3.5.0-0.21.el6ev.noarch
sanlock-2.8-1.el6.x86_64
libvirt-0.10.2-46.el6_6.2.x86_64
qemu-kvm-rhev-0.12.1.2-2.448.el6.x86_64
vdsm-4.16.7.5-1.el6ev.x86_64
RHEV-M 3.5.0 has been released