Bug 1490810

Summary: print task list to log in TooManyTasks issue
Product: [oVirt] vdsm Reporter: Oved Ourfali <oourfali>
Component: CoreAssignee: Francesco Romani <fromani>
Status: CLOSED CURRENTRELEASE QA Contact: Israel Pinto <ipinto>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 4.20.4CC: bugs, eheftman, ipinto, lveyde, tjelinek
Target Milestone: ovirt-4.1.7Flags: tjelinek: ovirt-4.1?
tjelinek: planning_ack?
tjelinek: devel_ack+
rule-engine: testing_ack+
Target Release: 4.19.33   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: v4.19.33 Doc Type: Enhancement
Doc Text:
Previously, while VDSM performed periodic monitoring and maintenance tasks, operations would occasionally become too slow or even become blocked. In this case, the internal queue became full and the periodic operations were no longer performed. A "TooManyTasks" warning appeared in the log files at a maximum rate of once every 10 seconds. In this release, if VDSM cannot perform its periodic operations, in addition to issuing a warning in the log files, VDSM also dumps the contents of the queue into the logs.
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-11-13 12:25:58 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Virt RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Oved Ourfali 2017-09-12 09:35:56 UTC
We saw several occurrences of TooManyTasks issue on VDSM.
It can be useful to have a list of tasks in case of such an error.

Comment 1 Tomas Jelinek 2017-09-13 10:47:18 UTC
it seems both useful and easy enough for 4.1.7, targeting accordingly.

Comment 2 Francesco Romani 2017-10-17 07:35:11 UTC
To reproduce:
First, let's make more likely that Vdsm will see TooManyTasks. We can achieve this artificially reducing the resources available to the Vdsm executor, for example using those crazy settings in vdsm configuration:

[sampling]
periodic_workers = 1
periodic_task_per_worker = 1
max_workers = 2

Again, those settings are legal but absolutely crazy out of testing/development environment; then restart Vdsm.

Now Vdsm should see TooManyTasks error very frequently. One needs to run any workload, watch out for timeouts (much more likely).

After some time just open the Vdsm logs and check for lines like

2017-09-14 15:03:46,225+0200 WARN (vdsm.Scheduler) [virt.periodic.Operation] could not run <VmDispatcher operation=<class 'vdsm.virt.periodic.BlockjobMonitor'> at 0x35b6f90>, executor queue full (periodic:241) 2017-09-14 15:03:46,225+0200 WARN (vdsm.Scheduler) [virt.periodic.Operation] executor state: <Executor periodic workers=1 max_workers=2 <TaskQueue periodic max_tasks=1 tasks(1)=deque([<Task discardable <Operation action=<function <lambda> at 0x7f801cded140> at 0x34b5210> timeout=7.5, duration=0 at 0x3b15290>]) at 0x371ac90> at 0x371add0> 

the "deque=..." shows the current task list.

+++

[1] Usually the values are:

periodic_workers = N*M -> N numbers of cores in the box, M = multiplication factor usually 1 or 2. Values M > 1 are overhead, but sometimes needed


periodic_task_per_worker = 100

max_workers = P * K -> P = value of "periodic_workers", K = multiplication factor much like M above.

The value of "M" and "K" is eurhystic, depends on the setup, there is no "good" value here.

Comment 3 Israel Pinto 2017-10-17 07:46:24 UTC
Verify with:
vdsm-4.19.34-1.el7ev.x86_64

Did the steps at comment 2:
vdsm log:
[root@puma43 vdsm]# grep  virt.periodic.Operation vdsm.log 
2017-10-17 10:43:13,304+0300 WARN  (MainThread) [virt.periodic.Operation] could not run <VmDispatcher operation=<class 'vdsm.virt.periodic.NumaInfoMonitor'> at 0x3552c10>, executor queue full (periodic:239)
2017-10-17 10:43:13,304+0300 WARN  (MainThread) [virt.periodic.Operation] could not run <VmDispatcher operation=<class 'vdsm.virt.periodic.BlockjobMonitor'> at 0x3552c90>, executor queue full (periodic:239)
2017-10-17 10:43:13,305+0300 WARN  (MainThread) [virt.periodic.Operation] could not run <vdsm.virt.sampling.VMBulkSampler object at 0x3552d10>, executor queue full (periodic:239)
2017-10-17 10:43:13,305+0300 WARN  (MainThread) [virt.periodic.Operation] could not run <VmDispatcher operation=<class 'vdsm.virt.periodic.DriveWatermarkMonitor'> at 0x3552fd0>, executor queue full (periodic:239)
2017-10-17 10:43:13,306+0300 WARN  (MainThread) [virt.periodic.Operation] could not run <vdsm.virt.sampling.HostMonitor object at 0x3558090>, executor queue full (periodic:239)
2017-10-17 10:43:13,306+0300 WARN  (MainThread) [virt.periodic.Operation] could not run <function <lambda> at 0x20c6668>, executor queue full (periodic:239)
2017-10-17 10:43:28,308+0300 WARN  (vdsm.Scheduler) [virt.periodic.Operation] could not run <function <lambda> at 0x20c6668>, executor queue full (periodic:239)
2017-10-17 10:43:28,309+0300 WARN  (vdsm.Scheduler) [virt.periodic.Operation] could not run <VmDispatcher operation=<class 'vdsm.virt.periodic.BlockjobMonitor'> at 0x3552c90>, executor queue full (periodic:239)
2017-10-17 10:43:43,302+0300 WARN  (vdsm.Scheduler) [virt.periodic.Operation] could not run <VmDispatcher operation=<class 'vdsm.virt.periodic.DriveWatermarkMonitor'> at 0x3552fd0>, executor queue full (periodic:239)
2017-10-17 10:43:43,313+0300 WARN  (vdsm.Scheduler) [virt.periodic.Operation] could not run <VmDispatcher operation=<class 'vdsm.virt.periodic.BlockjobMonitor'> at 0x3552c90>, executor queue full (periodic:239)
2017-10-17 10:43:58,316+0300 WARN  (vdsm.Scheduler) [virt.periodic.Operation] could not run <function <lambda> at 0x20c6668>, executor queue full (periodic:239)
2017-10-17 10:44:13,300+0300 WARN  (vdsm.Scheduler) [virt.periodic.Operation] could not run <VmDispatcher operation=<class 'vdsm.virt.periodic.DriveWatermarkMonitor'> at 0x3552fd0>, executor queue full (periodic:239)
2017-10-17 10:44:13,300+0300 WARN  (vdsm.Scheduler) [virt.periodic.Operation] could not run <vdsm.virt.sampling.VMBulkSampler object at 0x3552d10>, executor queue full (periodic:239)
2017-10-17 10:44:13,311+0300 WARN  (vdsm.Scheduler) [virt.periodic.Operation] could not run <function <lambda> at 0x20c6668>, executor queue full (periodic:239)
2017-10-17 10:44:28,315+0300 WARN  (vdsm.Scheduler) [virt.periodic.Operation] could not run <function <lambda> at 0x20c6668>, executor queue full (periodic:239)