Bug 1073551 - No indication if not all pre-started VMs in pool get started
Summary: No indication if not all pre-started VMs in pool get started
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.4.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 3.5.0
Assignee: Martin Sivák
QA Contact: Nisim Simsolo
URL:
Whiteboard: virt
Depends On:
Blocks:
 
Reported: 2014-03-06 16:52 UTC by Petr Beňas
Modified: 2015-02-17 08:29 UTC
CC List: 12 users

Fixed In Version: ovirt-engine-3.5
Doc Type: Bug Fix
Doc Text:
There was no indication when prestarted VMs failed to start. Now an indication appears in the Events/Tasks tabs.
Clone Of:
Environment:
Last Closed: 2015-02-17 08:29:07 UTC
oVirt Team: ---
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 32029 0 master MERGED engine: Add an AuditLog message when VM cannot be prestarted Never
oVirt gerrit 32687 0 ovirt-engine-3.5 MERGED engine: Add an AuditLog message when VM cannot be prestarted Never

Description Petr Beňas 2014-03-06 16:52:49 UTC
Description of problem:
Editing the prestarted VMs setting and saving the pool configuration does not behave as expected

Version-Release number of selected component (if applicable):
av2

How reproducible:


Steps to Reproduce:
1. create a pool from template with 50 vms. 
2. edit the pool and set 50 vms prestarted. 
3. wait
4. edit the pool again and set 0 vms prestarted

Actual results:
Only 18 of 50 VMs get started, although neither the reported host CPU usage nor memory usage exceeds 20%. Then for two VMs the error "Failed to complete starting of " is reported, without providing any reason why the engine was not able to start the VM. No attempt to start more VMs is made by the engine for the next 30 minutes.
After step 4, all 18 running VMs are still running.

Expected results:
The engine attempts to start all 50 VMs. If starting fails due to insufficient host resources, it is reported. When the number of prestarted VMs is set to zero, all the prestarted running VMs are stopped.

Additional info:

Comment 1 Michal Skrivanek 2014-03-09 12:08:39 UTC
logs?

Comment 2 Petr Beňas 2014-03-10 08:05:41 UTC
Created attachment 872593 [details]
engine.log, related parts at the end

Comment 4 Michal Skrivanek 2014-03-25 09:06:30 UTC
Petr, the existing VMs should not be stopped.
How long did you wait when you changed prestarted to 0? Likely the original VMs were not yet started, hence you got just a few of them prestarted…

Comment 5 Petr Beňas 2014-03-25 11:09:55 UTC
I waited about 30 minutes; VMs were being started during the first 10 minutes, then nothing was happening for the next 20 minutes.

Comment 6 Omer Frenkel 2014-03-25 15:29:59 UTC
OK, it's clear from the log that after 18 VMs started, there were no more resources to start any more VMs, although the engine tries to start more every 5 minutes:

2014-03-06 17:01:54,164 INFO  [org.ovirt.engine.core.bll.VmPoolMonitor] (DefaultQuartzScheduler_Worker-3) VmPool e21c20c0-4b0d-4d90-bac2-9b6d16933eed is missing 50 prestarted Vms, attempting to prestart 5 Vms
..
2014-03-06 17:19:58,405 INFO  [org.ovirt.engine.core.bll.VmPoolMonitor] (DefaultQuartzScheduler_Worker-3) VmPool e21c20c0-4b0d-4d90-bac2-9b6d16933eed is missing 32 prestarted Vms, attempting to prestart 5 Vms
2014-03-06 17:19:58,456 INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (DefaultQuartzScheduler_Worker-3) Candidate host rhel6 (35845df9-45db-4eab-b9e4-270217cd52b2) was filtered out by VAR__FILTERTYPE__INTERNAL filter Memory
..
2014-03-06 17:24:58,614 INFO  [org.ovirt.engine.core.bll.VmPoolMonitor] (DefaultQuartzScheduler_Worker-29) VmPool e21c20c0-4b0d-4d90-bac2-9b6d16933eed is missing 32 prestarted Vms, attempting to prestart 5 Vms

until the pre-started count decreased.
And as mentioned in comment 4, we should not stop any running VMs.

Do you think your setup should have been able to run more than these VMs?

Comment 7 Petr Beňas 2014-03-25 15:57:29 UTC
OK, if we don't stop VMs when the prestarted VM count is decreased, that's fine.

No idea how many VMs the setup should handle, but I'd expect an alert message letting me know I cannot have more VMs. Ideally, it would give a reason (CPU, mem, ...).

Comment 8 Michal Skrivanek 2014-03-26 08:34:53 UTC
you didn't see anything in the event log? (stating insufficient memory as the reason)

Comment 9 Petr Beňas 2014-03-26 11:09:40 UTC
No, I've rechecked this in av4. I set the prestarted VM count to 30 on the same setup and got two "Failed to complete starting of VM" errors. No reason given. Now I've got 12 VMs of the pool running, plus one which was already running before this test. The last VM was started about an hour ago; current host usage is 20% memory and 2% CPU, but I expect the CPU usage was much higher while many VMs were booting.

I see in engine.log that VMs could not be started because the host was filtered out due to memory, but there is no such message in the WebUI event log.

Comment 10 Michal Skrivanek 2014-07-24 12:15:14 UTC
Smells like SLA; Martin, please check.

Comment 11 Martin Betak 2014-07-31 10:53:57 UTC
SLA indeed. Currently, when there is a scheduling failure, you get only this line in the logs indicating there was some problem:

2014-03-06 17:29:59,005 INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (DefaultQuartzScheduler_Worker-7) Candidate host rhel6 (35845df9-45db-4eab-b9e4-270217cd52b2) was filtered out by VAR__FILTERTYPE__INTERNAL filter Memory

It would be better if the SchedulingManager emitted standard audit log events, so the administrator could see directly what the problem is.

Comment 12 Martin Sivák 2014-08-27 09:22:41 UTC
This is definitely not SLA. SLA correctly says that there are not enough resources. VM Pool management is 100% virt :)

SchedulingManager is separated from AuditLog by design. The consumer of scheduling results is responsible for logging stuff.

But I wrote the VmPoolMonitor patch for you.

Comment 14 Nisim Simsolo 2014-11-26 15:03:54 UTC
Verified: when a pre-started VM fails to start, an indication appears in the Events/Tasks tabs.
Verified using:
rhevm-3.5.0-0.21.el6ev.noarch
sanlock-2.8-1.el6.x86_64
libvirt-0.10.2-46.el6_6.2.x86_64
qemu-kvm-rhev-0.12.1.2-2.448.el6.x86_64
vdsm-4.16.7.5-1.el6ev.x86_64

Comment 16 Omer Frenkel 2015-02-17 08:29:07 UTC
RHEV-M 3.5.0 has been released

