Red Hat Bugzilla – Bug 1430285
[scale] - core.bll.RunVmCommandBase.delay locked the cluster for long time.
Last modified: 2017-07-28 10:20:22 EDT
Description of problem:
In some of the scale-out nested scenarios we found the engine locked for a long time, especially when running VMs. We identified the issue by the very long response time of starting VMs (around 1 hour for a start-VM API call). The following code, core.bll.RunVmCommandBase.delay, takes the cluster lock for at least 3 seconds, and up to 10 seconds. When more VMs are waiting in the queue to start, the lock time grows accordingly. See the following thread dump:

[root@master-vds9 ~]# cat /tmp/jstck.log | grep -c 'SchedulingManager.lockCluster'
94

We have 94 threads waiting for the lock in order to start a VM. The delay code is meant for the extreme case where the host suffers memory starvation. That is exactly our case: since we are dealing with nested hosts, each host has 1.4 GB, and we are trying to run 3 VMs of 300 MB each. This triggers RunVmCommandBase.delay.

Version-Release number of selected component (if applicable):
rhevm-4.1-1.02

How reproducible:
100%

Steps to Reproduce:
1. Run VMs in bulks of 50, or sequentially add more VMs to run.
2. Make sure those VMs will run on top of a memory-poor host.

Actual results:
Very long response time for VM start.

Expected results:
Stable and reasonable response time for VM start.

Additional info:
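The contention pattern described above can be illustrated with a minimal sketch. This is not the actual oVirt engine code; the class and method names are simplified, and the delay value is a parameter for illustration. The point is that the 3-10 second sleep happens while the cluster-wide lock is held, so every queued RunVm thread serializes behind it:

```java
import java.util.concurrent.locks.ReentrantLock;

class ClusterLockSketch {
    // One lock per cluster, taken for every scheduling decision.
    private static final ReentrantLock clusterLock = new ReentrantLock();

    // Simplified stand-in for the RunVm path: if the host is memory-starved,
    // the thread sleeps *while still holding the cluster lock*, so with ~94
    // waiters the delays add up to roughly an hour of queueing.
    static void runVm(boolean hostMemoryStarved, long delayMs) {
        clusterLock.lock();
        try {
            if (hostMemoryStarved) {
                try {
                    Thread.sleep(delayMs);  // the RunVmCommandBase.delay step
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
            // ... pick a host and start the VM ...
        } finally {
            clusterLock.unlock();
        }
    }
}
```

Because the sleep sits inside the critical section, the total wait grows linearly with the number of queued starts, which matches the observed behavior.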
This is inside the scheduler. Discussed that with msivak recently, there are some proposals how to improve that
Reducing severity, as the scenario is quite a corner case.
First, a workaround:
- Set the config value SchedulerAllowOverBooking to 'true', and optionally tweak SchedulerOverBookingThreshold to an N close to the size of the bulk you want to run.

When *over-committing* we can hit a serial delay of delay * H * V, where H is the number of UP hosts that are overcommitted and V is the number of VMs we want to run. Here it summed up to an hour, but even at smaller scale it can add up to minutes.
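A sketch of applying the workaround with engine-config, plus a back-of-the-envelope check of the delay * H * V formula. The threshold value below is illustrative; verify the exact option names against `engine-config --list` on your version before applying:

```shell
# Enable over-booking so bulk starts don't serialize behind the lock,
# and raise the threshold toward the bulk size (50 here is illustrative):
engine-config -s SchedulerAllowOverBooking=true
engine-config -s SchedulerOverBookingThreshold=50
systemctl restart ovirt-engine

# Back-of-the-envelope for the serial delay (delay * H * V):
# a 10 s delay with H=3 overcommitted UP hosts and V=120 queued VMs
# gives 10 * 3 * 120 = 3600 s, i.e. about an hour -- the order of
# magnitude reported in this bug.
```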
There is a design decision that takes a lock at the cluster level, to ensure we have consistency while making a scheduling decision. This is by design and will not change. For emergencies or temporary high demand, we included an override option, which should only be used for a short amount of time (Monday morning effect): Edit cluster -> Scheduling policy -> Optimize for Speed We've seen this in the past with Bug 1149701. Is there something new now?
(In reply to Doron Fediuck from comment #6)
> There is a design decision that takes a lock at the cluster level, to ensure
> we have consistency while making a scheduling decision.
> This is by design and will not change. For emergencies or temporary high
> demand, we included an override option, which should only be used for a
> short amount of time (Monday morning effect):
> Edit cluster -> Scheduling policy -> Optimize for Speed
>
> We've seen this in the past with Bug 1149701. Is there something new now?

Well, maybe someone from SLA or virt can tell more precisely. It looks like the same bug, or at least, as you mention, the same cluster lock. But this case is different, and more related to the RunVmCommandBase.delay code. The delay method is called in certain cases; as I discussed with Roy, in our case, when a host suffers from memory starvation, RunVmCommand will call the delay method.
Andrej, can we please refactor this so we wait only once per scheduling run, and only when no host was left by the memory policy unit? So if there is at least one usable host returned from that policy unit, we won't wait. Otherwise we wait, refresh host memory information, and try again, but only once.
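The proposed refactoring can be sketched as follows. This is an illustrative outline, not the actual engine code: the `MemoryFilter` interface and `schedule` method are hypothetical names, and the delay is a parameter. The key property is that the filter is consulted at most twice per scheduling run, with a single wait in between, and only on the no-usable-host path:

```java
import java.util.Collections;
import java.util.List;

class SingleRetrySketch {
    // Hypothetical stand-in for the memory policy unit.
    interface MemoryFilter {
        List<String> filterByMemory(List<String> hosts);
    }

    // Wait at most once per scheduling run, and only when the memory
    // policy unit returned no usable host at all.
    static List<String> schedule(List<String> hosts, MemoryFilter filter, long delayMs) {
        List<String> usable = filter.filterByMemory(hosts);
        if (!usable.isEmpty()) {
            return usable;  // fast path: at least one host fits, no waiting
        }
        try {
            Thread.sleep(delayMs);  // slow path: wait once, then refresh and retry once
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Collections.emptyList();
        }
        return filter.filterByMemory(hosts);
    }
}
```

Compared to the current behavior, queued starts that find a usable host never pay the delay, and the worst case is a single delay per run rather than one per pending VM.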
Created attachment 1299689 [details]
50 bulks of "create vms" results
Created attachment 1299718 [details] logs