Bug 1430285
| Summary: | [scale] core.bll.RunVmCommandBase.delay locked the cluster for a long time | | |
|---|---|---|---|
| Product: | [oVirt] ovirt-engine | Reporter: | Eldad Marciano <emarcian> |
| Component: | Backend.Core | Assignee: | Andrej Krejcir <akrejcir> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | eberman |
| Severity: | low | Priority: | medium |
| Version: | 4.1.1.2 | CC: | akrejcir, apinnick, bugs, dfediuck, eberman, michal.skrivanek, msivak, rgolan, stirabos |
| Target Milestone: | ovirt-4.1.4 | Keywords: | Performance |
| Target Release: | --- | Flags: | rule-engine: ovirt-4.1+ |
| Hardware: | x86_64 | OS: | Linux |
| Doc Type: | Release Note | Type: | Bug |
| Doc Text: | In order to run VMs on hosts with limited memory resources, the cluster scheduling policy should be set to "evenly_distributed" with "maxFreeMemoryForOverUtilized = 99". This configuration enables the engine to schedule VMs on hosts with more free memory, for better distribution. | | |
| Last Closed: | 2017-07-28 14:20:22 UTC | oVirt Team: | SLA |
Description
Eldad Marciano, 2017-03-08 09:41:14 UTC

This is inside the scheduler. I discussed this with msivak recently; there are some proposals for how to improve it.

Reducing severity, as the scenario is quite a corner case.

First, a workaround: set the config value SchedulerAllowOverBooking to 'True', and optionally tweak SchedulerOverBookingThreshold to an N close to the size of the bulk you want to run.

When *over-committing* we can hit a serial delay of delay * H * V, where H is the number of UP hosts that are overcommitted and V is the number of VMs we want to run. Here it summed up to an hour, but even at smaller scale it could add up to minutes.

There is a design decision that takes a lock at the cluster level, to ensure we have consistency while making a scheduling decision. This is by design and will not change. For emergencies or temporary high demand, we included an override option, which should only be used for a short amount of time (Monday morning effect): Edit cluster -> Scheduling policy -> Optimize for Speed. We've seen this in the past with Bug 1149701. Is there something new now?

(In reply to Doron Fediuck from comment #6)
> There is a design decision that takes a lock at the cluster level, to ensure
> we have consistency while making a scheduling decision.
> This is by design and will not change. For emergencies or temporary high
> demand, we included an override option, which should only be used for a
> short amount of time (Monday morning effect):
> Edit cluster -> Scheduling policy -> Optimize for Speed
>
> We've seen this in the past with Bug 1149701. Is there something new now?

Maybe someone from SLA or virt can tell more precisely. It looks like the same bug, or at least, as you mention, the same cluster lock. But this case is different now, and is more related to the RunVmCommandBase.delay code. The delay method is called in some cases; as I discussed with Roy, in our case, when a host suffers from memory starvation, RunVmCommand will call the delay method.
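The serial stall described above can be sketched with a back-of-the-envelope calculation. The numbers below (per-attempt delay, host count, VM count) are illustrative assumptions, not measured values from this bug:

```java
// Illustrative worst-case stall estimate for serial scheduling retries:
// each attempt waits `delaySeconds` while holding the cluster lock,
// retried across H overcommitted hosts for each of V VMs in the bulk.
public class StallEstimate {
    public static void main(String[] args) {
        int delaySeconds = 5;  // assumed per-attempt delay (not a measured value)
        int hosts = 8;         // H: UP hosts that are overcommitted
        int vms = 90;          // V: VMs started in one bulk
        int totalSeconds = delaySeconds * hosts * vms;
        // 5 * 8 * 90 = 3600 seconds
        System.out.println(totalSeconds / 60 + " minutes");
    }
}
```

With these assumed inputs the serial delay already reaches an hour, consistent with the scale reported above.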
Andrej, can we please refactor this so we wait only once per scheduling run, and only when no host was left after the memory policy unit? So if there is at least one usable host returned from that policy unit, we won't wait. Otherwise we wait, refresh host memory information, and try again, but only once.

Created attachment 1299689 [details]
50 bulks of "create vms" results

Created attachment 1299718 [details]
logs
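The proposed "wait at most once per scheduling run" behavior could look roughly like the following sketch. All names here (Host, memoryFilter, waitAndRefresh, schedule) are hypothetical and do not reflect the actual oVirt engine API:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Sketch of the proposed change: wait and refresh host stats at most once
// per scheduling run, and only when the memory policy unit filtered out
// every host. Names and behavior are illustrative, not real oVirt code.
public class SchedulerRetrySketch {

    static class Host {
        final String name;
        long freeMemMb;
        Host(String name, long freeMemMb) { this.name = name; this.freeMemMb = freeMemMb; }
    }

    static int waits = 0; // counts how often we actually waited

    // Hosts with enough free memory for the VM (stand-in for the memory policy unit).
    static List<Host> memoryFilter(List<Host> hosts, long vmMemMb) {
        return hosts.stream()
                .filter(h -> h.freeMemMb >= vmMemMb)
                .collect(Collectors.toList());
    }

    // Simulated "wait and refresh host stats" step; the real engine would
    // sleep briefly and re-read memory statistics from the hosts.
    static void waitAndRefresh(List<Host> hosts) {
        waits++;
        for (Host h : hosts) h.freeMemMb += 512; // pretend some memory freed up
    }

    static List<Host> schedule(List<Host> hosts, long vmMemMb) {
        List<Host> candidates = memoryFilter(hosts, vmMemMb);
        if (!candidates.isEmpty()) {
            return candidates;           // at least one usable host: no wait at all
        }
        waitAndRefresh(hosts);           // no host fits: wait/refresh exactly once
        return memoryFilter(hosts, vmMemMb); // may still be empty; caller fails fast
    }

    public static void main(String[] args) {
        List<Host> hosts = Arrays.asList(new Host("h1", 1024), new Host("h2", 2048));
        System.out.println(schedule(hosts, 2048).size()); // h2 fits, no wait needed
        System.out.println(waits);
        System.out.println(schedule(hosts, 4096).size()); // no host fits, even after one refresh
        System.out.println(waits);
    }
}
```

The key property is that the wait happens at most once per run and never when a usable host already exists, instead of once per host per VM.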