Bug 1073551
| Summary: | No indication if not all pre-started VMs in pool get started | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Petr Beňas <pbenas> |
| Component: | ovirt-engine | Assignee: | Martin Sivák <msivak> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Nisim Simsolo <nsimsolo> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.4.0 | CC: | dfediuck, gklein, iheim, lpeer, mavital, michal.skrivanek, ofrenkel, pstehlik, rbalakri, Rhev-m-bugs, sherold, yeylon |
| Target Milestone: | --- | Keywords: | Performance |
| Target Release: | 3.5.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | virt | | |
| Fixed In Version: | ovirt-engine-3.5 | Doc Type: | Bug Fix |
| Doc Text: | There was no indication when pre-started VMs failed to start. Now an indication appears in the Events/Tasks tabs. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-02-17 08:29:07 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description (Petr Beňas, 2014-03-06 16:52:49 UTC)
logs?

Created attachment 872593 [details]
engine.log, related parts at the end

Petr, the existing VMs should not be stopped. How long did you wait after you changed prestarted to 0? Likely the original VMs were not yet started, hence you got just a few of them prestarted…

I waited about 30 minutes. All VMs were started in the first 10 minutes, then nothing happened for the next 20 minutes.

OK, it's clear from the log that after 18 VMs started, there were no more resources to start any more VMs, although the engine tries to start more every 5 minutes:

```
2014-03-06 17:01:54,164 INFO [org.ovirt.engine.core.bll.VmPoolMonitor] (DefaultQuartzScheduler_Worker-3) VmPool e21c20c0-4b0d-4d90-bac2-9b6d16933eed is missing 50 prestarted Vms, attempting to prestart 5 Vms
..
2014-03-06 17:19:58,405 INFO [org.ovirt.engine.core.bll.VmPoolMonitor] (DefaultQuartzScheduler_Worker-3) VmPool e21c20c0-4b0d-4d90-bac2-9b6d16933eed is missing 32 prestarted Vms, attempting to prestart 5 Vms
2014-03-06 17:19:58,456 INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (DefaultQuartzScheduler_Worker-3) Candidate host rhel6 (35845df9-45db-4eab-b9e4-270217cd52b2) was filtered out by VAR__FILTERTYPE__INTERNAL filter Memory
..
2014-03-06 17:24:58,614 INFO [org.ovirt.engine.core.bll.VmPoolMonitor] (DefaultQuartzScheduler_Worker-29) VmPool e21c20c0-4b0d-4d90-bac2-9b6d16933eed is missing 32 prestarted Vms, attempting to prestart 5 Vms
```

until the pre-started count decreased. And, as mentioned in comment 4, we should not stop any running VMs. Do you think your setup should have been able to run more than these VMs?

OK, if we don't stop VMs when the prestarted VM count is decreased, that's fine. I have no idea how many VMs the setup should handle, but I'd expect an alert message letting me know I cannot have more VMs. Ideally, it would give a reason (CPU, memory, ...).

You didn't see anything in the event log? (stating insufficient memory as the reason)

No. I've rechecked this in av4: I set the prestarted VM count to 30 on the same setup and got two "Failed to complete starting of VM" messages, with no reason given. Now I've got 12 VMs of the pool running, plus one which was already running before this test. The last VM was started about an hour ago; current host usage is 20% memory and 2% CPU, but I expect CPU usage was much higher while many VMs were booting. I see in engine.log that VMs could not be started because the host was filtered out due to memory, but there is no such message in the WebUI event log.

Smells like SLA. Martin, please check.

SLA indeed. Currently, when there is a scheduling failure, you only get this line in the logs indicating there was some problem:

```
2014-03-06 17:29:59,005 INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (DefaultQuartzScheduler_Worker-7) Candidate host rhel6 (35845df9-45db-4eab-b9e4-270217cd52b2) was filtered out by VAR__FILTERTYPE__INTERNAL filter Memory
```

It would be better if the SchedulingManager emitted standard audit log events, so the administrator would directly see what the problem is.

This is definitely not SLA. SLA correctly says that there are not enough resources. VM pool management is 100% virt :)

SchedulingManager is separated from AuditLog by design. The consumer of scheduling results is responsible for logging. But I wrote the VmPoolMonitor patch for you.

Verified: when a pre-started VM fails to start, an indication appears in the Events/Tasks tabs. Verified using:
- rhevm-3.5.0-0.21.el6ev.noarch
- sanlock-2.8-1.el6.x86_64
- libvirt-0.10.2-46.el6_6.2.x86_64
- qemu-kvm-rhev-0.12.1.2-2.448.el6.x86_64
- vdsm-4.16.7.5-1.el6ev.x86_64

RHEV-M 3.5.0 has been released.
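The fix discussed above (the consumer of scheduling results, VmPoolMonitor, raises a user-visible audit event when a prestart attempt fails, instead of leaving only a SchedulingManager line in engine.log) can be illustrated with a minimal, self-contained sketch. All class and method names below are hypothetical stand-ins, not the actual ovirt-engine API.

```java
import java.util.ArrayList;
import java.util.List;

public class VmPoolMonitorSketch {
    /** Minimal stand-in for the engine's audit log (the Events tab). */
    static class AuditLog {
        final List<String> events = new ArrayList<>();
        void log(String severity, String message) {
            events.add(severity + ": " + message);
        }
    }

    private final AuditLog auditLog;

    VmPoolMonitorSketch(AuditLog auditLog) {
        this.auditLog = auditLog;
    }

    /**
     * Attempt to prestart {@code toStart} VMs. The {@code startVm}
     * predicate simulates scheduling: false means the attempt was
     * filtered out (e.g. no host with enough memory).
     */
    int prestartVms(String poolName, int toStart,
                    java.util.function.IntPredicate startVm) {
        int started = 0;
        for (int i = 0; i < toStart; i++) {
            if (startVm.test(i)) {
                started++;
            } else {
                // The change this bug asked for: surface the failure
                // as an audit event, not only as an engine.log line.
                auditLog.log("ERROR",
                    "Failed to prestart VM in pool " + poolName
                    + " (no host satisfied the scheduling filters)");
            }
        }
        return started;
    }

    public static void main(String[] args) {
        AuditLog log = new AuditLog();
        VmPoolMonitorSketch monitor = new VmPoolMonitorSketch(log);
        // Simulate: only the first 2 of 5 attempts find a host with memory.
        int started = monitor.prestartVms("pool-1", 5, i -> i < 2);
        System.out.println("started=" + started
            + " events=" + log.events.size());
    }
}
```

Run as-is, the sketch starts 2 of 5 VMs and records 3 failure events, mirroring the verified behavior: each failed prestart now leaves a visible entry rather than failing silently.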