Bug 1512547 - HostMonitor stops collecting data when TooManyTasks occurs
Summary: HostMonitor stops collecting data when TooManyTasks occurs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 4.1.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.2.1
Assignee: Francesco Romani
QA Contact: mlehrer
URL:
Whiteboard:
Depends On:
Blocks: 1544681
 
Reported: 2017-11-13 13:27 UTC by Roman Hodain
Modified: 2021-06-10 13:36 UTC (History)
CC List: 15 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
The host monitoring task will now correctly report host statistics after the vdsm task execution queue becomes full under extreme load.
Clone Of:
: 1544681 (view as bug list)
Environment:
Last Closed: 2018-05-15 17:52:46 UTC
oVirt Team: Virt
Target Upstream Version:
Embargoed:
lsvaty: testing_plan_complete-


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1419856 0 high CLOSED Ovirt 4.0.6 guests 'Not Responding' 2021-02-22 00:41:40 UTC
Red Hat Knowledge Base (Solution) 3239511 0 None None None 2017-11-14 05:14:20 UTC
Red Hat Product Errata RHEA-2018:1489 0 None None None 2018-05-15 17:53:52 UTC
oVirt gerrit 86440 0 master MERGED tests: periodic: add test for exclusive operation 2021-02-09 23:06:18 UTC
oVirt gerrit 86441 0 master MERGED virt: periodic: reschedule exclusive operations 2021-02-09 23:06:18 UTC
oVirt gerrit 86531 0 ovirt-4.1 MERGED tests: periodic: add test for exclusive operation 2021-02-09 23:06:18 UTC
oVirt gerrit 86532 0 ovirt-4.1 MERGED virt: periodic: reschedule exclusive operations 2021-02-09 23:06:18 UTC
oVirt gerrit 86533 0 ovirt-4.1 MERGED lib: executor: max_workers may be None 2021-02-09 23:06:18 UTC

Internal Links: 1419856

Description Roman Hodain 2017-11-13 13:27:18 UTC
Description of problem:
sampling.HostMonitor is an exclusive operation, which prevents it from being rescheduled while a previous run is stuck. But in case the operation fails not because of a timeout but because the executor queue is full, the operation is not rescheduled and the host stats are no longer being collected. Rescheduling the operation in this case is a legitimate approach, even though it is an exclusive operation, because it is clear that the operation is not actually running.

Version-Release number of selected component (if applicable):
vdsm-4.19.31-1

How reproducible:
100%

Steps to Reproduce:
1. Make the executor queue full

Actual results:
HostMonitor is not rescheduled 

Expected results:
HostMonitor is rescheduled 

Additional info:

Error message:
WARN  (vdsm.Scheduler) [virt.periodic.Operation] could not run <vdsm.virt.sampling.HostMonitor object at 0x215f450>, executor queue full (periodic:235)

code:
lib/vdsm/virt/periodic.py:
...
    def _dispatch(self):                                                         
        """                                                                      
        Send `func' to Executor to be run as soon as possible.                   
        """                                                                      
        self._call = None                                                        
        try:                                                                     
            self._executor.dispatch(self, self._timeout, discard=self._discard)  
        except executor.TooManyTasks:                                            
            self._log.warning('could not run %s, executor queue full',           
                              self._func)                                        
        finally:                                                                 
            if not self._exclusive:                                              
                self._reschedule()

The operation should get rescheduled when executor.TooManyTasks is caught.
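
For illustration, a self-contained toy sketch of the logic (simplified names, not the actual vdsm classes) showing how an exclusive operation silently drops out of the schedule once dispatch raises TooManyTasks, while a non-exclusive one is rescheduled:

class TooManyTasks(Exception):
    pass

class FakeExecutor:
    # Pretends its queue is always full, like an overloaded vdsm executor.
    def dispatch(self, func):
        raise TooManyTasks()

class Operation:
    def __init__(self, executor, exclusive):
        self._executor = executor
        self._exclusive = exclusive
        self.rescheduled = False

    def _reschedule(self):
        # In vdsm this would hand the operation back to the Scheduler.
        self.rescheduled = True

    def _dispatch(self):
        try:
            self._executor.dispatch(self)
        except TooManyTasks:
            print('could not run %r, executor queue full' % self)
        finally:
            if not self._exclusive:
                self._reschedule()

if __name__ == '__main__':
    ex = FakeExecutor()
    for exclusive in (False, True):
        op = Operation(ex, exclusive)
        op._dispatch()
        print('exclusive=%s -> rescheduled=%s' % (exclusive, op.rescheduled))
    # exclusive=False -> rescheduled=True
    # exclusive=True  -> rescheduled=False  (the bug: HostMonitor is lost)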

Comment 2 Michal Skrivanek 2017-11-14 05:45:55 UTC
logs? Why is it stuck?

Comment 3 Germano Veit Michel 2017-11-14 05:52:59 UTC
(In reply to Michal Skrivanek from comment #2)
> logs? Why is it stuck?

Let me answer because Roman is not online yet.

It's not stuck. We think it fails to run (TooManyTasks) and is not rescheduled again, because this operation is set with 'exclusive=True', which only reschedules in __call__ and not in _dispatch: when dispatch raises an exception, the 'if not self._exclusive' check in the finally block skips the reschedule.
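
For reference, the __call__ side looks roughly like this (paraphrased, not the exact vdsm source), which is why an exclusive operation only ever reschedules after it has actually run:

    def __call__(self):
        try:
            self._func()
        except Exception:
            self._log.exception('%s operation failed', self._func)
        finally:
            if self._exclusive:
                self._reschedule()

So if _dispatch never manages to queue the call, __call__ never runs and the exclusive operation is never rescheduled.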

We don't have many logs except:

WARN  (vdsm.Scheduler) [virt.periodic.Operation] could not run <vdsm.virt.sampling.HostMonitor object at 0x215f450>, executor queue full (periodic:235)

And the fact that all host stats are outdated in vdsm-client Host getStats.

Comment 4 Germano Veit Michel 2017-11-14 06:03:24 UTC
 Operation( 
     sampling.HostMonitor(cif=cif),
     [...]
     exclusive=True <--- HostMonitor uses exclusive=true since BZ 1419856
 )

Something like this:

    def _dispatch(self):
        """                                                                      
        Send `func' to Executor to be run as soon as possible.                   
        """                                                                      
        self._call = None                                                        
        try:                                                                     
            self._executor.dispatch(self, self._timeout, discard=self._discard)  
        except executor.TooManyTasks:                                            
            self._log.warning('could not run %s, executor queue full',           
                              self._func)                                        
+           if self._exclusive:                                              
+               self._reschedule()
        finally:                                                                 
            if not self._exclusive:                                              
                self._reschedule()

Or change the finally... Does it make sense?
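
For completeness, the "change the finally" variant could look roughly like this (an illustrative sketch only, not necessarily what the final patch does):

    def _dispatch(self):
        """
        Send `func' to Executor to be run as soon as possible.
        """
        self._call = None
        dispatched = False
        try:
            self._executor.dispatch(self, self._timeout, discard=self._discard)
            dispatched = True
        except executor.TooManyTasks:
            self._log.warning('could not run %s, executor queue full',
                              self._func)
        finally:
            # Reschedule non-exclusive operations as before, and also
            # exclusive ones whose dispatch never made it into the queue.
            if not self._exclusive or not dispatched:
                self._reschedule()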

Comment 5 Gajanan 2017-11-22 12:44:13 UTC
The customer wants this bug to be fixed in RHV 4.1. Can this be released in the Z stream?

Comment 12 Francesco Romani 2018-01-15 11:22:30 UTC
(In reply to Germano Veit Michel from comment #3)
> (In reply to Michal Skrivanek from comment #2)
> > logs? Why is it stuck?
> 
> Let me answer because Roman is not online yet.
> 
> It's not stuck. We think it fails to run (TooManyTasks) and is not
> rescheduled again, because this operation is set with 'exclusive=True',
> which only reschedules in __call__ and not in _dispatch: when dispatch
> raises an exception, the 'if not self._exclusive' check in the finally
> block skips the reschedule.
> 
> We don't have many logs except:
> 
> WARN  (vdsm.Scheduler) [virt.periodic.Operation] could not run
> <vdsm.virt.sampling.HostMonitor object at 0x215f450>, executor queue full
> (periodic:235)
> 
> And the fact that all host stats are outdated in vdsm-client Host getStats.

Hi,

It's strange that you don't have more information, because we backported a fix for this:
https://bugzilla.redhat.com/show_bug.cgi?id=1490810
with this patch:
https://gerrit.ovirt.org/#/c/81788/
so you should be able to see which operations depleted the executor pool.

Comment 13 Francesco Romani 2018-01-15 11:27:59 UTC
(In reply to Germano Veit Michel from comment #4)
>  Operation( 
>      sampling.HostMonitor(cif=cif),
>      [...]
>      exclusive=True <--- HostMonitor uses exclusive=true since BZ 1419856
>  )
> 
> Something like this:
> 
>     def _dispatch(self):
>         """
>         Send `func' to Executor to be run as soon as possible.
>         """
>         self._call = None
>         try:
>             self._executor.dispatch(self, self._timeout, discard=self._discard)
>         except executor.TooManyTasks:
>             self._log.warning('could not run %s, executor queue full',
>                               self._func)
> +           if self._exclusive:
> +               self._reschedule()
>         finally:
>             if not self._exclusive:
>                 self._reschedule()
> 
> Or change the finally... Does it make sense?

We made the HostMonitor exclusive for a reason: https://gerrit.ovirt.org/#/c/73133/ so we need a change that accommodates both cases.
One thing that bothers me is why the executor got depleted in the first place; there is something wrong there.

Both the VM periodic operations and the host monitoring share the same executor, so I'm afraid we could have some interference here.

Besides, there are knobs we can use to tune the executor behaviour without code changes, in vdsm.conf:

[sampling]
periodic_workers: Number of worker threads to serve the periodic tasks (default 4)
periodic_task_per_worker: Max number of tasks which can be queued on workers. (default 100)
max_workers: Maximum number of worker threads to serve the periodic tasks at the same time (default 30)
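
As a rough sketch of how those defaults interact (assuming the periodic executor's queue capacity works out to periodic_workers * periodic_task_per_worker; that formula is an assumption here, not a documented guarantee):

    # Assumed relationship between the [sampling] defaults and the point
    # at which dispatch() starts raising TooManyTasks.
    periodic_workers = 4
    periodic_task_per_worker = 100
    queue_capacity = periodic_workers * periodic_task_per_worker
    print(queue_capacity)  # 400 queued periodic tasks before the queue is full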


Have you tried tuning those parameters? Could you share the results?

Comment 14 Francesco Romani 2018-01-15 11:29:12 UTC
(In reply to Francesco Romani from comment #12)
> (In reply to Germano Veit Michel from comment #3)
> > (In reply to Michal Skrivanek from comment #2)
> > > logs? Why is it stuck?
> > 
> > Let me answer because Roman is not online yet.
> > 
> > It's not stuck. We think it fails to run (TooManyTasks) and is not
> > rescheduled again, because this operation is set with 'exclusive=True',
> > which only reschedules in __call__ and not in _dispatch: when dispatch
> > raises an exception, the 'if not self._exclusive' check in the finally
> > block skips the reschedule.
> > 
> > We don't have many logs except:
> > 
> > WARN  (vdsm.Scheduler) [virt.periodic.Operation] could not run
> > <vdsm.virt.sampling.HostMonitor object at 0x215f450>, executor queue full
> > (periodic:235)
> > 
> > And the fact that all host stats are outdated in vdsm-client Host getStats.
> 
> Hi,
> 
> It's strange that you don't have more information, because we backported a
> fix for this:
> https://bugzilla.redhat.com/show_bug.cgi?id=1490810
> with this patch:
> https://gerrit.ovirt.org/#/c/81788/
> so you should be able to see which operations depleted the executor pool.

Never mind, it was fixed in 4.1.7, but this bug was reported against 4.1.6 :(
If you update to a more recent version of Vdsm, however, we will have a better chance of understanding the issue. Is this possible, maybe on a canary box?

Comment 16 Germano Veit Michel 2018-01-16 04:37:27 UTC
Tried to reproduce this in our lab. Even overloading the host, I can't get the executor flooded. Tons of VMBulkSampler and DriveWatermarkMonitor operations are being discarded though, like the customer saw.

I will ask the customer to upgrade so we can have a look at the state of his executor too.

Comment 18 Francesco Romani 2018-01-16 09:11:38 UTC
(In reply to Germano Veit Michel from comment #16)
> Tried to reproduce this in our lab. Even overloading the host, I can't get
> the executor flooded. Tons of VMBulkSampler and DriveWatermarkMonitor
> operations are being discarded though, like the customer saw.
> 
> I will ask the customer to upgrade so we can have a look at the state of his
> executor too.

Thanks, this will help.

However, HostMonitor - even with the exclusive flag - is supposed to be self-healing, so it should get back to normal automatically once the host is no longer overloaded. This is worth investigating; I'll have a look myself.

Comment 19 Francesco Romani 2018-01-16 16:25:55 UTC
(In reply to Germano Veit Michel from comment #4)
>  Operation( 
>      sampling.HostMonitor(cif=cif),
>      [...]
>      exclusive=True <--- HostMonitor uses exclusive=true since BZ 1419856
>  )
> 
> Something like this:
> 
>     def _dispatch(self):
>         """
>         Send `func' to Executor to be run as soon as possible.
>         """
>         self._call = None
>         try:
>             self._executor.dispatch(self, self._timeout, discard=self._discard)
>         except executor.TooManyTasks:
>             self._log.warning('could not run %s, executor queue full',
>                               self._func)
> +           if self._exclusive:
> +               self._reschedule()
>         finally:
>             if not self._exclusive:
>                 self._reschedule()
> 
> Or change the finally... Does it make sense?

OK, after a careful inspection of the code, this is just a bug. Exclusive operations are not correctly rescheduled if they happen to find the executor queue full.
The fix is in this area, but I'm afraid it will need more work. I will work on something.

Comment 20 Germano Veit Michel 2018-01-16 22:58:23 UTC
Thanks Francesco, that's what I thought: it needs to be rescheduled.

Clearing the needinfo, let me know if you need anything else. I'll open a different BZ if something fishy shows up on the executor dump (if we receive it).

Comment 26 Germano Veit Michel 2018-02-02 03:16:19 UTC
Hi Francesco and Michal,

Just for completeness of this BZ.

What fills the executor on these mass VM migrations/startups is DriveWatermarkMonitor. 400 out of 400 tasks ;)

Once the load stabilizes (CPU not at 100% anymore), things go back to normal and DriveWatermarkMonitor seems to complete in time, no longer filling the executor.

So I understand:
1) Mass migrations/startups -> Host overloaded
2) DriveWatermarkMonitor tasks pile up (it is quite a frequent operation, the host is overloaded, and they can't complete).

The only thing that worries me is that we have 2 DriveWatermarkMonitor operations queued for each VM in the executor. I understand this is scheduled every 2s, so the timeout should be 2s/2 = 1s. So how do we get 2 stuck operations for each VM if, 1s before running a new one, the previous one should have expired? If it weren't for this, instead of 400 tasks we would have half of that, because all VMs have 2 DriveWatermarkMonitor operations stuck.

Or is this just a sign that the executor workers are starving too? Let me know if you want a new BZ for this. I'm not sure.

Thanks

Comment 27 Michal Skrivanek 2018-02-02 07:49:41 UTC
> 2) DriveWatermarkMonitor tasks pile up (it is quite a frequent operation, the host is overloaded, and they can't complete).

Yeah, that's not surprising.
Perhaps we do not really need a new BZ; we moved to high watermark events in 4.2 and no longer poll that heavily. If you have such an overloaded system, could you try the 4.2 beta?

Comment 28 Francesco Romani 2018-02-02 15:36:39 UTC
(In reply to Germano Veit Michel from comment #26)
> Hi Francesco and Michal,
> 
> Just for completeness of this BZ.
> 
> What fills the executor on these mass VM migrations/startups is
> DriveWatermarkMonitor. 400 out of 400 tasks ;)

On migrations we should suspend that.
On startup it may indeed be caused by an overloaded host. Well, that also applies to migrations, but it should be less severe.

> Once the load stabilizes (CPU not at 100% anymore), things go back to
> normal and DriveWatermarkMonitor seems to complete in time, no longer
> filling the executor.
> 
> So I understand:
> 1) Mass migrations/startups -> Host overloaded
> 2) DriveWatermarkMonitor tasks pile up (it is quite a frequent operation,
> the host is overloaded, and they can't complete).

Yes, every 2s per VM per disk. No wonder the load is high. However, as Michal said, we have greatly improved this in 4.2.

> The only thing that worries me is that we have 2 DriveWatermarkMonitor
> operations queued for each VM in the executor. I understand this is
> scheduled every 2s, so the timeout should be 2s/2 = 1s. So how do we get 2
> stuck operations for each VM if, 1s before running a new one, the previous
> one should have expired? If it weren't for this, instead of 400 tasks we
> would have half of that, because all VMs have 2 DriveWatermarkMonitor
> operations stuck.

This is indeed interesting. The most likely explanation is that libvirtd is responding (i.e. not stuck), but slow. This fits with the hypothesis of a severely overloaded host.

Comment 29 Germano Veit Michel 2018-02-04 22:57:27 UTC
This is a production env, so I'll update them regarding upgrading to 4.2 as soon as GA is out.

Thanks for your comments!

Comment 31 Daniel Gur 2018-02-25 12:39:36 UTC
Hello Michael
Could you please provide Validation instructions for this bug
We were thinking:
Having Host1 and Host2.
Start VMs on Host1
Same time start migration of VMs from Host1 to Host2
or
Same time start migration of VMs from Host2 to Host1

Comment 32 Francesco Romani 2018-03-06 10:31:36 UTC
(In reply to Daniel Gur from comment #31)
> Hello Michael
> Could you please provide Validation instructions for this bug
> We were thinking:
> Having Host1 and Host2.
> Start VMs on Host1
> Same time start migration of VMs from Host1 to Host2
> or
> Same time start migration of VMs from Host2 to Host1

Migration is not really critical for this bug. I'd start by reducing the number of periodic workers:

/etc/vdsm/vdsm.conf (or configlet under vdsm.conf.d)

[sampling]
periodic_workers = 1

(restart vdsmd)

This will make it more likely to fill the executor queue.
Then run some VMs, the more the better, and make sure the periodic operations are slow; blocking access to storage should help with this.

Comment 33 guy chen 2018-03-20 09:31:46 UTC
The bug was verified with the following:

Lab topology :
1 host
1 engine
4.2 build : 4.2.2-0.1
200 VMs on the host

Scenario :

Restarted 200 VMs on the host; the host was overloaded with 100% CPU and 'executor queue full' errors in the VDSM log.
After the VMs started to come back up and the CPU usage on the server dropped, the errors stopped and monitoring of the server resumed.

Comment 38 errata-xmlrpc 2018-05-15 17:52:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:1489

Comment 39 Franta Kust 2019-05-16 13:09:38 UTC
BZ<2>Jira Resync

Comment 40 Daniel Gur 2019-08-28 13:15:21 UTC
sync2jira

Comment 41 Daniel Gur 2019-08-28 13:20:24 UTC
sync2jira

