Bug 1475971 - getAllVmIoTunePolicies can get blocked making executor queue full and host non responsive
Summary: getAllVmIoTunePolicies can get blocked making executor queue full and host non responsive
Keywords:
Status: CLOSED DUPLICATE of bug 1443654
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 4.1.3
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Dan Kenigsberg
QA Contact: Raz Tamir
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-07-27 16:24 UTC by nijin ashok
Modified: 2020-09-10 11:03 UTC (History)
CC: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-07-28 08:31:27 UTC
oVirt Team: Virt
Target Upstream Version:
Embargoed:
lsvaty: testing_plan_complete-



Description nijin ashok 2017-07-27 16:24:52 UTC
Description of problem:

Currently, getAllVmIoTunePolicies is not executed as a periodic task. If any storage domain backing these VMs goes down, these calls can block for a long time. Because all JSON-RPC worker threads end up occupied by getAllVmIoTunePolicies, subsequent requests from the engine are no longer served, and the host becomes non-responsive. This happens even when only an ISO storage domain goes away, if VMs have CDs attached from that domain.

I was able to reproduce this in a 4.1 environment by starting 30 VMs with CDs attached on one host and then blocking the connection between the NFS server and the host. I edited the code to print the JsonRpcServer executor state, just as we do for the periodic threads, and I can see that all 8 workers are blocked in the getAllVmIoTunePolicies task.
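The failure mode above can be sketched in plain Python, independent of vdsm's actual executor implementation: a bounded thread pool whose workers are all stuck on a blocking I/O-style call cannot serve even a trivial request that arrives afterwards. Names such as stuck_io_call and quick_call are illustrative stand-ins, not vdsm APIs.

```python
import concurrent.futures as cf
import threading
import time

POOL_SIZE = 8  # matches the 8 jsonrpc workers observed in this report

blocker = threading.Event()

def stuck_io_call():
    # Stands in for getAllVmIoTunePolicies blocking on unreachable storage.
    blocker.wait()
    return "io-tune-policies"

def quick_call():
    # Stands in for any other engine request, e.g. a heartbeat.
    return "pong"

pool = cf.ThreadPoolExecutor(max_workers=POOL_SIZE)
stuck = [pool.submit(stuck_io_call) for _ in range(POOL_SIZE)]
ping = pool.submit(quick_call)  # queued behind the 8 stuck workers

time.sleep(0.2)
queued_while_blocked = not ping.done()  # True: the host looks non-responsive

blocker.set()  # "storage comes back"
result = ping.result(timeout=5)
pool.shutdown()
```

With the pool saturated, `queued_while_blocked` is True until the blocking calls return, which mirrors the engine seeing the host as non-responsive.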


====
2017-07-27 21:04:10,721+0530 DEBUG (jsonrpc/3) [jsonrpc.JsonRpcServer] Calling 'Host.getAllVmIoTunePolicies' in bridge with {} (__init__:532)
2017-07-27 21:04:40,739+0530 DEBUG (jsonrpc/7) [jsonrpc.JsonRpcServer] Calling 'Host.getAllVmIoTunePolicies' in bridge with {} (__init__:532)
2017-07-27 21:05:10,746+0530 DEBUG (jsonrpc/5) [jsonrpc.JsonRpcServer] Calling 'Host.getAllVmIoTunePolicies' in bridge with {} (__init__:532)
2017-07-27 21:05:40,764+0530 DEBUG (jsonrpc/1) [jsonrpc.JsonRpcServer] Calling 'Host.getAllVmIoTunePolicies' in bridge with {} (__init__:532)
2017-07-27 21:06:10,771+0530 DEBUG (jsonrpc/6) [jsonrpc.JsonRpcServer] Calling 'Host.getAllVmIoTunePolicies' in bridge with {} (__init__:532)
2017-07-27 21:06:40,795+0530 DEBUG (jsonrpc/0) [jsonrpc.JsonRpcServer] Calling 'Host.getAllVmIoTunePolicies' in bridge with {} (__init__:532)
2017-07-27 21:07:10,820+0530 DEBUG (jsonrpc/4) [jsonrpc.JsonRpcServer] Calling 'Host.getAllVmIoTunePolicies' in bridge with {} (__init__:532)
2017-07-27 21:07:40,832+0530 DEBUG (jsonrpc/2) [jsonrpc.JsonRpcServer] Calling 'Host.getAllVmIoTunePolicies' in bridge with {} (__init__:532)


2017-07-27 21:09:24,879+0530 DEBUG (JsonRpcServer) [Executor] custom:executor state: count=8 workers=set([<Worker name=jsonrpc/5 running Task(callable=<functools.partial object at 0x7fd8bc6b7100>, timeout=None) task#=78 at 0x3ae4450>, <Worker name=jsonrpc/0 running Task(callable=<functools.partial object at 0x41ed7e0>, timeout=None) task#=66 at 0x3a5a290>, <Worker name=jsonrpc/4 running Task(callable=<functools.partial object at 0x41edd08>, timeout=None) task#=68 at 0x3ae40d0>, <Worker name=jsonrpc/6 running Task(callable=<functools.partial object at 0x3ee8c00>, timeout=None) task#=64 at 0x3ace7d0>, <Worker name=jsonrpc/3 running Task(callable=<functools.partial object at 0x41edf18>, timeout=None) task#=60 at 0x3aced10>, <Worker name=jsonrpc/1 running Task(callable=<functools.partial object at 0x3ee8c58>, timeout=None) task#=71 at 0x3a5a550>, <Worker name=jsonrpc/2 running Task(callable=<functools.partial object at 0x7fd8bc6b9418>, timeout=None) task#=77 at 0x3ac17d0>, <Worker name=jsonrpc/7 running Task(callable=<functools.partial object at 0x7fd8bc25b7e0>, timeout=None) task#=50 at 0x3ae4950>]) (executor:150)
====

Even a plain virsh command hangs for a long time:

==
time virsh -r blkdeviotune test2e hdc --live
^C

real	3m50.048s
user	0m0.009s
sys	0m0.010s
==



Version-Release number of selected component (if applicable):
vdsm-4.19.10.1-1.el7ev.x86_64


How reproducible:
100%

Steps to Reproduce:
1. Start around 30 VMs with CDs attached on a host and block the NFS connection between the host and the storage.

2. Monitor the JsonRpc executor. All the worker threads will be blocked in getAllVmIoTunePolicies.

Actual results:

The JsonRpc executor is blocked for a long time by getAllVmIoTunePolicies. The call may have to be moved to the periodic executor, which can discard stuck tasks.
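The "discard ability" suggested above can be sketched as follows. This is not vdsm's periodic-executor API, just a minimal illustration of the idea: run the possibly-blocking call off to the side with a deadline, and if it misses the deadline, abandon the result so the caller's worker is freed even though the underlying call may still be stuck.

```python
import concurrent.futures as cf

def call_with_discard(fn, timeout):
    """Run fn in a side pool; give up (discard) if it exceeds timeout.

    Hypothetical sketch -- vdsm's periodic executor implements discard
    differently, but the effect is the same: the calling worker is not
    tied up by a call that blocks on dead storage.
    """
    pool = cf.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return future.result(timeout=timeout)
    except cf.TimeoutError:
        future.cancel()   # no effect on a running call, but marks intent
        return None       # discarded; the caller moves on
    finally:
        pool.shutdown(wait=False)
```

A fast call returns its value; a call that outlives the deadline yields None instead of blocking the caller indefinitely.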

Expected results:

The JsonRpc executor should not be blocked for a long time.


Additional info:

Comment 2 Roman Hodain 2017-07-28 06:03:06 UTC
Duplicate of Bug 1443654
Keeping it open for verification by the bugzilla owner.

Comment 3 nijin ashok 2017-07-28 08:31:27 UTC
Indeed, this is fixed by Bug 1443654, and I can't reproduce it with vdsm-4.19.24-1.el7ev.x86_64.

Closing this.

*** This bug has been marked as a duplicate of bug 1443654 ***

