Bug 1544681 - [downstream clone - 4.1.10] HostMonitor stops collecting data when TooManyTasks occurs
Summary: [downstream clone - 4.1.10] HostMonitor stops collecting data when TooManyTasks occurs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 4.1.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.1.10
Target Release: ---
Assignee: Francesco Romani
QA Contact: Daniel Gur
URL:
Whiteboard:
Depends On: 1512547
Blocks:
 
Reported: 2018-02-13 09:38 UTC by rhev-integ
Modified: 2021-06-10 14:37 UTC
CC List: 16 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Previously, if the VDSM task execution queue became full under extreme load, the host monitoring task was never rescheduled, so host statistics were no longer updated. This issue has been fixed in this update.
Clone Of: 1512547
Environment:
Last Closed: 2018-03-20 16:38:36 UTC
oVirt Team: Virt
Target Upstream Version:
Embargoed:
lsvaty: testing_plan_complete-


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 3239511 0 None None None 2018-02-13 09:41:45 UTC
Red Hat Product Errata RHEA-2018:0563 0 None None None 2018-03-20 16:39:43 UTC
oVirt gerrit 86440 0 master MERGED tests: periodic: add test for exclusive operation 2020-12-03 14:21:13 UTC
oVirt gerrit 86441 0 master MERGED virt: periodic: reschedule exclusive operations 2020-12-03 14:21:11 UTC
oVirt gerrit 86531 0 ovirt-4.1 MERGED tests: periodic: add test for exclusive operation 2020-12-03 14:21:11 UTC
oVirt gerrit 86532 0 ovirt-4.1 MERGED virt: periodic: reschedule exclusive operations 2020-12-03 14:21:38 UTC
oVirt gerrit 86533 0 ovirt-4.1 MERGED lib: executor: max_workers may be None 2020-12-03 14:21:39 UTC

Description rhev-integ 2018-02-13 09:38:55 UTC
+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1512547 +++
======================================================================

Description of problem:
sampling.HostMonitor is an exclusive operation, which prevents it from being rescheduled while a previous run is still pending (e.g. stuck). But when the operation fails to run not because of a timeout, but because the executor queue is full, it is not rescheduled either, and host statistics are no longer collected. Rescheduling the operation in this case is a legitimate approach even though it is exclusive, since it is clear that the operation is not actually running.

Version-Release number of selected component (if applicable):
vdsm-4.19.31-1

How reproducible:
100%

Steps to Reproduce:
1. Make the executor queue full

Actual results:
HostMonitor is not rescheduled 

Expected results:
HostMonitor is rescheduled 

Additional info:

Error message:
WARN  (vdsm.Scheduler) [virt.periodic.Operation] could not run <vdsm.virt.sampling.HostMonitor object at 0x215f450>, executor queue full (periodic:235)

code:
lib/vdsm/virt/periodic.py:
...
    def _dispatch(self):                                                         
        """                                                                      
        Send `func' to Executor to be run as soon as possible.                   
        """                                                                      
        self._call = None                                                        
        try:                                                                     
            self._executor.dispatch(self, self._timeout, discard=self._discard)  
        except executor.TooManyTasks:                                            
            self._log.warning('could not run %s, executor queue full',           
                              self._func)                                        
        finally:                                                                 
            if not self._exclusive:                                              
                self._reschedule()

The operation should be rescheduled when executor.TooManyTasks is caught.

(Originally by Roman Hodain)

Comment 3 rhev-integ 2018-02-13 09:39:07 UTC
logs? Why is it stuck?

(Originally by michal.skrivanek)

Comment 4 rhev-integ 2018-02-13 09:39:13 UTC
(In reply to Michal Skrivanek from comment #2)
> logs? Why is it stuck?

Let me answer because Roman is not online yet.

It's not stuck. We think it fails to run (TooManyTasks) and is not rescheduled again, because this operation is set up with 'exclusive=True', which only reschedules in __call__ and not in _dispatch: when _dispatch gets an exception, the 'if not self._exclusive' check in the finally block skips the reschedule.

We don't have many logs, except:

WARN  (vdsm.Scheduler) [virt.periodic.Operation] could not run <vdsm.virt.sampling.HostMonitor object at 0x215f450>, executor queue full (periodic:235)

And the fact that all host stats are outdated in vdsm-client Host getStats.

(Originally by Germano Veit Michel)
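
For illustration, a minimal, self-contained toy follows (not vdsm code; every name in it is invented) that mimics the mechanism described in the comment above: an exclusive operation reschedules itself only from __call__, so a dispatch rejected with a full queue is never retried, while a non-exclusive operation keeps being rescheduled.

class TooManyTasks(Exception):
    pass


class ToyExecutor(object):
    """A queue with fixed capacity, standing in for vdsm's executor."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._tasks = []

    def dispatch(self, task):
        if len(self._tasks) >= self._capacity:
            raise TooManyTasks()
        self._tasks.append(task)

    def run_pending(self):
        tasks, self._tasks = self._tasks, []
        for task in tasks:
            task()


class ToyOperation(object):
    """Mimics the pre-fix rescheduling behaviour of periodic.Operation."""

    def __init__(self, name, executor, scheduler, exclusive):
        self._name = name
        self._executor = executor
        self._scheduler = scheduler  # plain list standing in for vdsm.Scheduler
        self._exclusive = exclusive

    def start(self):
        self._reschedule()

    def _reschedule(self):
        self._scheduler.append(self._dispatch)

    def _dispatch(self):
        try:
            self._executor.dispatch(self)
        except TooManyTasks:
            print('could not run %s, executor queue full' % self._name)
        finally:
            # The behaviour under discussion: only non-exclusive operations
            # are rescheduled here; exclusive ones rely on __call__ below.
            if not self._exclusive:
                self._reschedule()

    def __call__(self):
        print('running %s' % self._name)
        if self._exclusive:
            # Exclusive operations reschedule only after actually running,
            # so a dispatch that raised TooManyTasks leaves them unscheduled.
            self._reschedule()


scheduler = []
executor = ToyExecutor(capacity=1)
normal = ToyOperation('OtherMonitor', executor, scheduler, exclusive=False)
exclusive = ToyOperation('HostMonitor', executor, scheduler, exclusive=True)
normal.start()
exclusive.start()

for _ in range(3):
    pending, scheduler[:] = list(scheduler), []
    for dispatch in pending:
        dispatch()
    executor.run_pending()

Running it prints the "executor queue full" warning once for HostMonitor; from then on only OtherMonitor keeps running, because HostMonitor was never rescheduled.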

Comment 5 rhev-integ 2018-02-13 09:39:19 UTC
 Operation( 
     sampling.HostMonitor(cif=cif),
     [...]
     exclusive=True <--- HostMonitor uses exclusive=true since BZ 1419856
 )

Something like this:

    def _dispatch(self):
        """                                                                      
        Send `func' to Executor to be run as soon as possible.                   
        """                                                                      
        self._call = None                                                        
        try:                                                                     
            self._executor.dispatch(self, self._timeout, discard=self._discard)  
        except executor.TooManyTasks:                                            
            self._log.warning('could not run %s, executor queue full',           
                              self._func)                                        
+           if self._exclusive:                                              
+               self._reschedule()
        finally:                                                                 
            if not self._exclusive:                                              
                self._reschedule()

Or change the finally (a sketch of that variant follows after this comment)... Does it make sense?

(Originally by Germano Veit Michel)
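
For reference, an illustrative sketch of the "change the finally" variant mentioned above (an example of the idea only, not the patch that was eventually merged):

    def _dispatch(self):
        """
        Send `func' to Executor to be run as soon as possible.
        """
        self._call = None
+       dispatched = False
        try:
            self._executor.dispatch(self, self._timeout, discard=self._discard)
+           dispatched = True
        except executor.TooManyTasks:
            self._log.warning('could not run %s, executor queue full',
                              self._func)
        finally:
-           if not self._exclusive:
+           # An exclusive operation whose dispatch failed never reaches
+           # __call__, so it must be rescheduled from here as well.
+           if not self._exclusive or not dispatched:
                self._reschedule()

Either way, a successfully dispatched exclusive operation would still be rescheduled only from __call__, preserving the reason it was made exclusive in the first place.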

Comment 6 rhev-integ 2018-02-13 09:39:24 UTC
The customer wants this bug to be fixed in RHV 4.1. Can this be released in a z-stream?

(Originally by Gajanan Chakkarwar)

Comment 13 rhev-integ 2018-02-13 09:40:02 UTC
(In reply to Germano Veit Michel from comment #3)
> (In reply to Michal Skrivanek from comment #2)
> > logs? Why is it stuck?
> 
> Let me answer because Roman is not online yet.
> 
> It's not stuck. We think it fails to run (TooManyTasks) and is not
> rescheduled again because this one is set with 'exclusive=True' which only
> reschedules on __call__ , and not in _dispatch as it fails the if on the
> finally in case it gets an exception.
> 
> We dont have much logs except:
> 
> WARN  (vdsm.Scheduler) [virt.periodic.Operation] could not run
> <vdsm.virt.sampling.HostMonitor object at 0x215f450>, executor queue full
> (periodic:235)
> 
> And the fact that all host stats are outdated in vdsm-client Host getStats.

Hi,

It's strange that you don't have more information, because we backported a fix for this:
https://bugzilla.redhat.com/show_bug.cgi?id=1490810
with this patch:
https://gerrit.ovirt.org/#/c/81788/
so you should be able to see which operations depleted the executor pool.

(Originally by Francesco Romani)

Comment 14 rhev-integ 2018-02-13 09:40:08 UTC
(In reply to Germano Veit Michel from comment #4)
>  Operation( 
>      sampling.HostMonitor(cif=cif),
>      [...]
>      exclusive=True <--- HostMonitor uses exclusive=true since BZ 1419856
>  )
> 
> Something like this:
> 
>     def _dispatch(self):
>         """
>         Send `func' to Executor to be run as soon as possible.
>         """
>         self._call = None
>         try:
>             self._executor.dispatch(self, self._timeout, discard=self._discard)
>         except executor.TooManyTasks:
>             self._log.warning('could not run %s, executor queue full',
>                               self._func)
> +           if self._exclusive:
> +               self._reschedule()
>         finally:
>             if not self._exclusive:
>                 self._reschedule()
> 
> Or change the finally... Does it make sense?

We made the HostMonitor exclusive for a reason: https://gerrit.ovirt.org/#/c/73133/, so we need a change that accommodates both cases.
One thing that bothers me is why the executor got depleted in the first place; there is something wrong there.

Both the VM periodic operations and the host monitoring rely on the same executor, so I'm afraid we could have some interference here.

Besides, there are knobs we can use to tune the executor behaviour without code changes, in vdsm.conf:

[sampling]
periodic_workers: Number of worker threads to serve the periodic tasks (default 4)
periodic_task_per_worker: Max number of tasks which can be queued on workers. (default 100)
max_workers: Maximum number of worker threads to serve the periodic tasks at the same time. (default 30)


Have you tried tuning those parameters? Could you share the results?

(Originally by Francesco Romani)
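
For illustration only, a drop-in configlet adjusting those knobs could look like the following (the file name and values below are made-up examples, not recommendations):

# /etc/vdsm/vdsm.conf.d/99-periodic-tuning.conf  (example file name)
[sampling]
# Example values only; the defaults are 4 / 100 / 30 as listed above.
periodic_workers = 4
periodic_task_per_worker = 200
max_workers = 30

(restart vdsmd afterwards for the changes to take effect)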

Comment 15 rhev-integ 2018-02-13 09:40:13 UTC
(In reply to Francesco Romani from comment #12)
> (In reply to Germano Veit Michel from comment #3)
> > (In reply to Michal Skrivanek from comment #2)
> > > logs? Why is it stuck?
> > 
> > Let me answer because Roman is not online yet.
> > 
> > It's not stuck. We think it fails to run (TooManyTasks) and is not
> > rescheduled again because this one is set with 'exclusive=True' which only
> > reschedules on __call__ , and not in _dispatch as it fails the if on the
> > finally in case it gets an exception.
> > 
> > We dont have much logs except:
> > 
> > WARN  (vdsm.Scheduler) [virt.periodic.Operation] could not run
> > <vdsm.virt.sampling.HostMonitor object at 0x215f450>, executor queue full
> > (periodic:235)
> > 
> > And the fact that all host stats are outdated in vdsm-client Host getStats.
> 
> Hi,
> 
> strange you don't have more information, because we backported a fix for
> this:
> https://bugzilla.redhat.com/show_bug.cgi?id=1490810
> with this patch:
> https://gerrit.ovirt.org/#/c/81788/
> so you should be able to see which operations depleted the executor pool.

Never mind, that fix went into 4.1.7, but this bug is reported against 4.1.6 :(
If you update to a more recent version of Vdsm, however, we would have a better chance of understanding the issue. Is this possible, maybe on a canary box?

(Originally by Francesco Romani)

Comment 17 rhev-integ 2018-02-13 09:40:24 UTC
Tried to reproduce this in our lab. Even when overloading the host, I can't get the executor flooded. Tons of VMBulkSampler and DriveWatermarkMonitor tasks are being discarded though, like the customer saw.

I will ask the customer to upgrade so we can have a look at the state of his executor too.

(Originally by Germano Veit Michel)

Comment 19 rhev-integ 2018-02-13 09:40:35 UTC
(In reply to Germano Veit Michel from comment #16)
> Tried to reproduce this on our labs. Even overloading the host I cant get
> the executor flooded. Tons of VMBulkSampler and DriveWatermarkMonitor being
> discarded though, like the customer saw.
> 
> I will ask the customer to upgrade so we can have a look at the state of his
> executor too.

Thanks, this will help.

However, HostMonitor - even with the exclusive flag - is supposed to be self-healing, so it should get back to normal automatically once the host is no longer overloaded. This is worth investigating; I'll have a look myself.

(Originally by Francesco Romani)

Comment 20 rhev-integ 2018-02-13 09:40:40 UTC
(In reply to Germano Veit Michel from comment #4)
>  Operation( 
>      sampling.HostMonitor(cif=cif),
>      [...]
>      exclusive=True <--- HostMonitor uses exclusive=true since BZ 1419856
>  )
> 
> Something like this:
> 
>     def _dispatch(self):
>         """
>         Send `func' to Executor to be run as soon as possible.
>         """
>         self._call = None
>         try:
>             self._executor.dispatch(self, self._timeout, discard=self._discard)
>         except executor.TooManyTasks:
>             self._log.warning('could not run %s, executor queue full',
>                               self._func)
> +           if self._exclusive:
> +               self._reschedule()
>         finally:
>             if not self._exclusive:
>                 self._reschedule()
> 
> Or change the finally... Does it make sense?

OK, after a careful inspection of the code, this is just a bug. Exclusive operations are not correctly rescheduled if they happen to find the executor queue full.
The fix is in this area, but I'm afraid it will need more work. I will work on something.

(Originally by Francesco Romani)

Comment 21 rhev-integ 2018-02-13 09:40:46 UTC
Thanks Francesco, that's what I thought, it needs to be rescheduled.

Clearing the needinfo, let me know if you need anything else. I'll open a different BZ if something fishy shows up on the executor dump (if we receive it).

(Originally by Germano Veit Michel)

Comment 27 rhev-integ 2018-02-13 09:41:18 UTC
Hi Francesco and Michal,

Just for completeness of this BZ.

What fills the executor on these mass VM migrations/startups is DriveWatermarkMonitor. 400 out of 400 tasks ;)

Once the load stabilizes (CPU no longer at 100%), things go back to normal and DriveWatermarkMonitor seems to complete in time, no longer filling the executor.

So I understand:
1) Mass migrations/startups -> Host overloaded
2) DriveWatermarkMonitor tasks pile up (it is quite a frequent operation and, with the host overloaded, they can't complete).

The only thing that worries me is that we have 2 DriveWatermarkMonitor tasks queued for each VM in the executor. I understand this is scheduled every 2s, so the timeout should be 2s/2 = 1s. So how do we get 2 stuck operations for each VM, if the previous one should have expired 1s before a new one runs? If it weren't for this, we would have half of the 400 tasks, because every VM has 2 DriveWatermarkMonitor tasks stuck.

Or is this just a sign that the executor workers are starving too? Let me know if you want a new BZ for this. I'm not sure.

Thanks

(Originally by Germano Veit Michel)

Comment 28 rhev-integ 2018-02-13 09:41:23 UTC
> 2) Pile up DriveWatermarkMonitor (which is a quite frequent operation, host is overloaded, they cant complete).

Yeah, that's not surprising.
Perhaps we do not really need a new BZ; we moved to high-watermark events in 4.2 and no longer poll that heavily. If you have such an overloaded system, could you try the 4.2 beta?

(Originally by michal.skrivanek)

Comment 29 rhev-integ 2018-02-13 09:41:28 UTC
(In reply to Germano Veit Michel from comment #26)
> Hi Francesco and Michal,
> 
> Just for completeness of this BZ.
> 
> What fills the executor on these mass VM migrations/startups is
> DriveWatermarkMonitor. 400 out of 400 tasks ;)

On migrations we should suspend that.
On startup it may be indeed caused by overloaded host. Well, that also applies for migrations, but should be less severe.

> Once the load stabilizes (CPU not a 100% anymore), things go back to normal
> and DriveWatermarkMonitor seems to complete in time, not filling the
> executor anymore.
> 
> So I understand:
> 1) Mass migrations/startups -> Host overloaded
> 2) Pile up DriveWatermarkMonitor (which is a quite frequent operation, host
> is overloaded, they cant complete).

Yes, every 2s per VM per disk. No wonder the load is high. As Michal said, however, we have greatly improved this in 4.2.

> The only thing that worries me is that we have 2 DriveWatermarkMonitor
> queued for each VM in the executor. I understand this is scheduled every 2s,
> so timeout should be 2s/2 = 1s. So how do we get 2 stuck operations for each
> VM if 1s before running a new one the previous one should have expired? If
> it werent for this, instead of 400 tasks we would have half of that. Because
> all VMs have 2 DriveWatermarkMonitor stuck. 

This is indeed interesting; the most likely explanation is that libvirtd is responding (i.e. not stuck), but slowly. This fits with the hypothesis of a severely overloaded host.

(Originally by Francesco Romani)

Comment 30 rhev-integ 2018-02-13 09:41:34 UTC
This is a production environment, so I'll update them regarding upgrading to 4.2 as soon as the GA is out.

Thanks for your comments!

(Originally by Germano Veit Michel)

Comment 32 Daniel Gur 2018-03-14 07:37:35 UTC
(In reply to Daniel Gur from comment #31, quoted from the cloned 4.2 bug 1512547)
> Hello Michael
> Could you please provide Validation instructions for this bug
> We were thinking:
> Having Host1 and Host2.
> Start VMs on Host1
> Same time start migration of VMs from Host1 to Host2
> or
> Same time start migration of VMs from Host2 to Host1

Migration is not really critical for this bug. I'd start by reducing the periodic workers:

/etc/vdsm/vdsm.conf (or configlet under vdsm.conf.d)

[sampling]
periodic_workers = 1

(restart vdsmd)

This will make it more likely for the executor queue to fill up.
Then run some VMs, the more the better, and make sure the periodic operations are slow; blocking access to storage should help with that.

Comment 34 guy chen 2018-03-18 12:08:38 UTC
Bug was verified with the following :

Lab topology :
1 host
1 engine
Latest 4.1 build : RHV BUILD REPORT: [BUILD VERSION: 4.1.10-6] [TARGET MILESTONE: ovirt-4.1.10] [14-03-2018]
150 VMs on the host

Scenario :

Restarted 150 VMs on the host; the host was overloaded with 100% CPU, and "executor queue full" errors appeared in the VDSM log.
After the VMs started to come back up and the CPU load on the host dropped, the errors stopped and monitoring of the host resumed.

Comment 37 errata-xmlrpc 2018-03-20 16:38:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0563

Comment 38 Franta Kust 2019-05-16 13:06:49 UTC
BZ<2>Jira Resync

Comment 39 Daniel Gur 2019-08-28 13:13:40 UTC
sync2jira

Comment 40 Daniel Gur 2019-08-28 13:17:54 UTC
sync2jira

