Bug 1490810 - print task list to log in TooManyTasks issue
Summary: print task list to log in TooManyTasks issue
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.20.4
Hardware: Unspecified
OS: Unspecified
unspecified
medium
Target Milestone: ovirt-4.1.7
: 4.19.33
Assignee: Francesco Romani
QA Contact: Israel Pinto
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-09-12 09:35 UTC by Oved Ourfali
Modified: 2017-11-13 12:25 UTC (History)
5 users (show)

Fixed In Version: v4.19.33
Doc Type: Enhancement
Doc Text:
Previously, while VDSM performed periodic monitoring and maintenance tasks, operations would occasionally become too slow or even become blocked. In this case, the internal queue became full and the periodic operations were no longer performed. A "TooManyTasks" warning appeared in the log files at a maximum rate of once every 10 seconds. In this release, if VDSM cannot perform its periodic operations, in addition to issuing a warning in the log files, VDSM also dumps the contents of the queue into the logs.
Clone Of:
Environment:
Last Closed: 2017-11-13 12:25:58 UTC
oVirt Team: Virt
Embargoed:
tjelinek: ovirt-4.1?
tjelinek: planning_ack?
tjelinek: devel_ack+
rule-engine: testing_ack+


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 81624 0 master MERGED virt: dump executor state when ResourceExhausted 2021-02-09 23:04:34 UTC
oVirt gerrit 81788 0 ovirt-4.1 MERGED virt: dump executor state when ResourceExhausted 2021-02-09 23:04:38 UTC

Description Oved Ourfali 2017-09-12 09:35:56 UTC
We saw several occurrences of TooManyTasks issue on VDSM.
It can be useful to have a list of tasks in case of such an error.

Comment 1 Tomas Jelinek 2017-09-13 10:47:18 UTC
it seems both useful and easy enough for 4.1.7, targeting accordingly.

Comment 2 Francesco Romani 2017-10-17 07:35:11 UTC
To reproduce:
First, let's make more likely that Vdsm will see TooManyTasks. We can achieve this artificially reducing the resources available to the Vdsm executor, for example using those crazy settings in vdsm configuration:

[sampling]
periodic_workers = 1
periodic_task_per_worker = 1
max_workers = 2

Again, those settings are legal but absolutely crazy out of testing/development environment; then restart Vdsm.

Now Vdsm should see TooManyTasks error very frequently. One needs to run any workload, watch out for timeouts (much more likely).

After some time just open the Vdsm logs and check for lines like

2017-09-14 15:03:46,225+0200 WARN (vdsm.Scheduler) [virt.periodic.Operation] could not run <VmDispatcher operation=<class 'vdsm.virt.periodic.BlockjobMonitor'> at 0x35b6f90>, executor queue full (periodic:241) 2017-09-14 15:03:46,225+0200 WARN (vdsm.Scheduler) [virt.periodic.Operation] executor state: <Executor periodic workers=1 max_workers=2 <TaskQueue periodic max_tasks=1 tasks(1)=deque([<Task discardable <Operation action=<function <lambda> at 0x7f801cded140> at 0x34b5210> timeout=7.5, duration=0 at 0x3b15290>]) at 0x371ac90> at 0x371add0> 

the "deque=..." shows the current task list.

+++

[1] Usually the values are:

periodic_workers = N*M -> N numbers of cores in the box, M = multiplication factor usually 1 or 2. Values M > 1 are overhead, but sometimes needed


periodic_task_per_worker = 100

max_workers = P * K -> P = value of "periodic_workers", K = multiplication factor much like M above.

The value of "M" and "K" is eurhystic, depends on the setup, there is no "good" value here.

Comment 3 Israel Pinto 2017-10-17 07:46:24 UTC
Verify with:
vdsm-4.19.34-1.el7ev.x86_64

Did the steps at comment 2:
vdsm log:
[root@puma43 vdsm]# grep  virt.periodic.Operation vdsm.log 
2017-10-17 10:43:13,304+0300 WARN  (MainThread) [virt.periodic.Operation] could not run <VmDispatcher operation=<class 'vdsm.virt.periodic.NumaInfoMonitor'> at 0x3552c10>, executor queue full (periodic:239)
2017-10-17 10:43:13,304+0300 WARN  (MainThread) [virt.periodic.Operation] could not run <VmDispatcher operation=<class 'vdsm.virt.periodic.BlockjobMonitor'> at 0x3552c90>, executor queue full (periodic:239)
2017-10-17 10:43:13,305+0300 WARN  (MainThread) [virt.periodic.Operation] could not run <vdsm.virt.sampling.VMBulkSampler object at 0x3552d10>, executor queue full (periodic:239)
2017-10-17 10:43:13,305+0300 WARN  (MainThread) [virt.periodic.Operation] could not run <VmDispatcher operation=<class 'vdsm.virt.periodic.DriveWatermarkMonitor'> at 0x3552fd0>, executor queue full (periodic:239)
2017-10-17 10:43:13,306+0300 WARN  (MainThread) [virt.periodic.Operation] could not run <vdsm.virt.sampling.HostMonitor object at 0x3558090>, executor queue full (periodic:239)
2017-10-17 10:43:13,306+0300 WARN  (MainThread) [virt.periodic.Operation] could not run <function <lambda> at 0x20c6668>, executor queue full (periodic:239)
2017-10-17 10:43:28,308+0300 WARN  (vdsm.Scheduler) [virt.periodic.Operation] could not run <function <lambda> at 0x20c6668>, executor queue full (periodic:239)
2017-10-17 10:43:28,309+0300 WARN  (vdsm.Scheduler) [virt.periodic.Operation] could not run <VmDispatcher operation=<class 'vdsm.virt.periodic.BlockjobMonitor'> at 0x3552c90>, executor queue full (periodic:239)
2017-10-17 10:43:43,302+0300 WARN  (vdsm.Scheduler) [virt.periodic.Operation] could not run <VmDispatcher operation=<class 'vdsm.virt.periodic.DriveWatermarkMonitor'> at 0x3552fd0>, executor queue full (periodic:239)
2017-10-17 10:43:43,313+0300 WARN  (vdsm.Scheduler) [virt.periodic.Operation] could not run <VmDispatcher operation=<class 'vdsm.virt.periodic.BlockjobMonitor'> at 0x3552c90>, executor queue full (periodic:239)
2017-10-17 10:43:58,316+0300 WARN  (vdsm.Scheduler) [virt.periodic.Operation] could not run <function <lambda> at 0x20c6668>, executor queue full (periodic:239)
2017-10-17 10:44:13,300+0300 WARN  (vdsm.Scheduler) [virt.periodic.Operation] could not run <VmDispatcher operation=<class 'vdsm.virt.periodic.DriveWatermarkMonitor'> at 0x3552fd0>, executor queue full (periodic:239)
2017-10-17 10:44:13,300+0300 WARN  (vdsm.Scheduler) [virt.periodic.Operation] could not run <vdsm.virt.sampling.VMBulkSampler object at 0x3552d10>, executor queue full (periodic:239)
2017-10-17 10:44:13,311+0300 WARN  (vdsm.Scheduler) [virt.periodic.Operation] could not run <function <lambda> at 0x20c6668>, executor queue full (periodic:239)
2017-10-17 10:44:28,315+0300 WARN  (vdsm.Scheduler) [virt.periodic.Operation] could not run <function <lambda> at 0x20c6668>, executor queue full (periodic:239)


Note You need to log in before you can comment on or make changes to this bug.