clone job doesn't work, so doing that manually
+++ This bug was initially created as a clone of Bug #1361028 +++
Description of problem:
When VMs become non-responsive, the executor task queue can in some cases fill up, and the TooManyTasks exception is raised. As a result, the operation is no longer scheduled.
Version-Release number of selected component (if applicable):
Under heavy load and issues with QEMU responsiveness
Steps to Reproduce:
Not completely clear
All VMs are marked as non-responding
The task is scheduled as soon as there is some space in the tasks queue
vdsm.Scheduler::ERROR::2016-07-16 16:15:28,745::schedule::213::Scheduler::(_execute) Unhandled exception in <bound method Operation._try_to_dispatch of <virt.periodic.Operation object at 0x45ea390>>
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/schedule.py", line 211, in _execute
File "/usr/share/vdsm/virt/periodic.py", line 190, in _try_to_dispatch
File "/usr/share/vdsm/virt/periodic.py", line 197, in _dispatch
File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 101, in dispatch
File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 256, in put
Send `func' to Executor to be run as soon as possible.
self._call = None
The exception comes from
So self._step() is not executed and the operation is not scheduled
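The failure mode described above can be sketched as follows. This is a minimal illustration, not the vdsm code and not necessarily the patch that was merged: `FullExecutor`, `FakeScheduler`, `BuggyOperation` and `FixedOperation` are hypothetical stand-ins, and only the names `TooManyTasks`, `_try_to_dispatch`, `_step` and the warning text are taken from this report. The assumed fix is to catch the exception, log it, and re-arm the operation anyway.

```python
import logging

log = logging.getLogger("virt.periodic.Operation")


class TooManyTasks(Exception):
    """Raised by the stand-in executor when its queue is full."""


class FullExecutor(object):
    """Hypothetical executor whose queue is always full."""

    def dispatch(self, func):
        raise TooManyTasks()


class FakeScheduler(object):
    """Hypothetical scheduler that only records what was scheduled."""

    def __init__(self):
        self.scheduled = []

    def schedule(self, delay, func):
        self.scheduled.append((delay, func))


class BuggyOperation(object):
    def __init__(self, func, period, scheduler, executor):
        self._func = func
        self._period = period
        self._scheduler = scheduler
        self._executor = executor

    def _step(self):
        # Re-arm the operation for the next period.
        self._scheduler.schedule(self._period, self._try_to_dispatch)

    def _try_to_dispatch(self):
        # If dispatch() raises, _step() below never runs and the
        # operation drops out of the schedule for good.
        self._executor.dispatch(self._func)
        self._step()


class FixedOperation(BuggyOperation):
    def _try_to_dispatch(self):
        try:
            self._executor.dispatch(self._func)
        except TooManyTasks:
            log.warning("could not run %s, executor queue full", self._func)
        finally:
            # Always re-arm, even when this cycle had to be skipped.
            self._step()
```

With a full queue, `BuggyOperation._try_to_dispatch()` propagates the exception and schedules nothing, while `FixedOperation._try_to_dispatch()` logs a warning and schedules the next attempt.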
cloned to zstream bug 1364925
Verified with 184.108.40.206-0.1
OS Version: RHEL - 7.2 - 9.el7_2.1
OS Description: Red Hat Enterprise Linux Server 7.2 (Maipo)
Kernel Version: 3.10.0 - 327.30.1.el7.x86_64
KVM Version: 2.3.0 - 31.el7_2.21
SPICE Version: 0.12.4 - 15.el7_2.2
1. Modified /etc/vdsm/vdsm.conf and added the lines:
periodic_workers = 1
periodic_task_per_worker = 1
2. Restarted vdsm.
3. Created a VM pool with 8 VMs.
4. Started all 8 VMs.
vdsm.Scheduler::WARNING::2016-09-11 11:22:02,721::periodic::211::virt.periodic.Operation::(_dispatch) could not run <vdsm.virt.sampling.VMBulkSampler object at 0x21ed290>, executor queue full
In the engine, some of the VMs are set to the 'not responding' state for a short time. Once the previous tasks are done, those VMs start as well, until eventually all are up as expected.
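The verified behaviour (tasks rejected while the queue is full, then retried until every VM is served) can be pictured with a toy model. This is only a sketch under loud assumptions: the queue capacity is assumed to be `periodic_workers * periodic_task_per_worker`, each VM is assumed to need exactly one task, and each worker is assumed to finish one task per cycle. `simulate_startup` is a hypothetical helper, not part of vdsm.

```python
from collections import deque


def simulate_startup(vm_count, periodic_workers, periodic_task_per_worker):
    """Toy model: return (cycles, rejected, done) for a bounded task queue."""
    # Assumption: total queue capacity is workers * tasks-per-worker.
    capacity = periodic_workers * periodic_task_per_worker
    queue = deque()
    waiting = list(range(vm_count))
    done = []
    rejected = 0
    cycles = 0
    while waiting or queue:
        cycles += 1
        still_waiting = []
        for vm in waiting:
            if len(queue) < capacity:
                queue.append(vm)
            else:
                # Models the "executor queue full" warning: the task is
                # skipped this cycle and retried on the next one.
                rejected += 1
                still_waiting.append(vm)
        waiting = still_waiting
        # Assumption: each worker completes one queued task per cycle.
        for _ in range(min(periodic_workers, len(queue))):
            done.append(queue.popleft())
    return cycles, rejected, done


cycles, rejected, done = simulate_startup(
    vm_count=8, periodic_workers=1, periodic_task_per_worker=1)
```

In this model no task is lost: every rejected task is retried on a later cycle, which mirrors the observation that all 8 VMs eventually come up.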
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.