Bug 1475971 - getAllVmIoTunePolicies can get blocked making executor queue full and host non responsive
Status: CLOSED DUPLICATE of bug 1443654
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
: ---
: ---
Assigned To: Dan Kenigsberg
Raz Tamir
Depends On:
Reported: 2017-07-27 12:24 EDT by nijin ashok
Modified: 2017-07-28 04:31 EDT (History)
8 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2017-07-28 04:31:27 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments

None
Description nijin ashok 2017-07-27 12:24:52 EDT
Description of problem:

Currently getAllVmIoTunePolicies is not executed as a periodic task, but if any of the storage domains backing these VMs goes down, the call can stay blocked for a long time. Because all of the JsonRpc worker threads end up occupied by getAllVmIoTunePolicies, tasks from the engine are no longer served and the host cannot process requests from the manager, which makes the host non-responsive. This happens even when only the ISO storage domain goes away, if VMs have CDs attached from that domain.

I was able to replicate this in a 4.1 environment by starting 30 VMs with CDs attached on a host and then blocking the connection between the NFS server and the host. I edited the code to print the JsonRpcServer executor state, just as we do for the periodic threads, and I can see that all 8 workers are blocked in the getAllVmIoTunePolicies task.

2017-07-27 21:04:10,721+0530 DEBUG (jsonrpc/3) [jsonrpc.JsonRpcServer] Calling 'Host.getAllVmIoTunePolicies' in bridge with {} (__init__:532)
2017-07-27 21:04:40,739+0530 DEBUG (jsonrpc/7) [jsonrpc.JsonRpcServer] Calling 'Host.getAllVmIoTunePolicies' in bridge with {} (__init__:532)
2017-07-27 21:05:10,746+0530 DEBUG (jsonrpc/5) [jsonrpc.JsonRpcServer] Calling 'Host.getAllVmIoTunePolicies' in bridge with {} (__init__:532)
2017-07-27 21:05:40,764+0530 DEBUG (jsonrpc/1) [jsonrpc.JsonRpcServer] Calling 'Host.getAllVmIoTunePolicies' in bridge with {} (__init__:532)
2017-07-27 21:06:10,771+0530 DEBUG (jsonrpc/6) [jsonrpc.JsonRpcServer] Calling 'Host.getAllVmIoTunePolicies' in bridge with {} (__init__:532)
2017-07-27 21:06:40,795+0530 DEBUG (jsonrpc/0) [jsonrpc.JsonRpcServer] Calling 'Host.getAllVmIoTunePolicies' in bridge with {} (__init__:532)
2017-07-27 21:07:10,820+0530 DEBUG (jsonrpc/4) [jsonrpc.JsonRpcServer] Calling 'Host.getAllVmIoTunePolicies' in bridge with {} (__init__:532)
2017-07-27 21:07:40,832+0530 DEBUG (jsonrpc/2) [jsonrpc.JsonRpcServer] Calling 'Host.getAllVmIoTunePolicies' in bridge with {} (__init__:532)

2017-07-27 21:09:24,879+0530 DEBUG (JsonRpcServer) [Executor] custom:executor state: count=8 workers=set([<Worker name=jsonrpc/5 running Task(callable=<functools.partial object at 0x7fd8bc6b7100>, timeout=None) task#=78 at 0x3ae4450>, <Worker name=jsonrpc/0 running Task(callable=<functools.partial object at 0x41ed7e0>, timeout=None) task#=66 at 0x3a5a290>, <Worker name=jsonrpc/4 running Task(callable=<functools.partial object at 0x41edd08>, timeout=None) task#=68 at 0x3ae40d0>, <Worker name=jsonrpc/6 running Task(callable=<functools.partial object at 0x3ee8c00>, timeout=None) task#=64 at 0x3ace7d0>, <Worker name=jsonrpc/3 running Task(callable=<functools.partial object at 0x41edf18>, timeout=None) task#=60 at 0x3aced10>, <Worker name=jsonrpc/1 running Task(callable=<functools.partial object at 0x3ee8c58>, timeout=None) task#=71 at 0x3a5a550>, <Worker name=jsonrpc/2 running Task(callable=<functools.partial object at 0x7fd8bc6b9418>, timeout=None) task#=77 at 0x3ac17d0>, <Worker name=jsonrpc/7 running Task(callable=<functools.partial object at 0x7fd8bc25b7e0>, timeout=None) task#=50 at 0x3ae4950>]) (executor:150)
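The saturation shown in the executor state above can be sketched in a few lines. This is not vdsm code, just a minimal illustration (all names hypothetical, and a 2-worker pool instead of vdsm's 8): once every worker is parked on a call that blocks on storage, an unrelated request from the engine sits in the queue unserved.

```python
import queue
import threading
import time

POOL_SIZE = 2          # vdsm's jsonrpc pool has 8 workers; 2 keeps the demo small
tasks = queue.Queue()
served = []

def worker():
    # Each worker pulls one task at a time; a blocking task pins the worker.
    while True:
        name, fn = tasks.get()
        fn()
        served.append(name)
        tasks.task_done()

for _ in range(POOL_SIZE):
    threading.Thread(target=worker, daemon=True).start()

blocker = threading.Event()

# Fill every worker with a call that blocks on "storage", like
# getAllVmIoTunePolicies blocking on an unreachable NFS domain.
for i in range(POOL_SIZE):
    tasks.put(("iotune-%d" % i, blocker.wait))

tasks.put(("ping", lambda: None))   # an unrelated request from the engine
time.sleep(0.5)
print(served)        # [] -- "ping" never ran; the host looks non-responsive
blocker.set()        # "storage" comes back
tasks.join()
print(sorted(served))
```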

Even when I use the virsh command directly, it hangs for a long time:

time virsh -r blkdeviotune test2e hdc --live

real	3m50.048s
user	0m0.009s
sys	0m0.010s
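While diagnosing, the hang can at least be bounded with coreutils `timeout` (exit status 124 means the command was killed). A sketch of the pattern, with `sleep 30` standing in for the blocked virsh call:

```shell
# timeout(1) bounds a potentially blocking command, e.g.:
#   timeout 10 virsh -r blkdeviotune test2e hdc --live
# Here "sleep 30" stands in for the blocked call.
timeout 1 sleep 30
status=$?
echo "exit status: $status"   # prints "exit status: 124"
```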

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Start around 30 VMs on a host and block the NFS connection between the host and the storage.

2. Monitor the JsonRpc executor. All the worker threads will be blocked in getAllVmIoTunePolicies.

Actual results:

The JsonRpc executor is blocked for a long time because of getAllVmIoTunePolicies. We may have to call this from the periodic executor, which has the ability to discard blocked workers.
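The "discard ability" idea can be sketched as follows. This is a hypothetical illustration, not vdsm's actual executor code: the executor stops waiting for any task that overruns its timeout and moves on, so a blocked getAllVmIoTunePolicies cannot pin a worker forever.

```python
import threading
import time

class DiscardingExecutor:
    """Hypothetical sketch: run each task with a deadline and abandon
    (discard) it if the deadline passes, freeing the executor to serve
    the next request instead of staying blocked."""

    def dispatch(self, fn, timeout):
        done = threading.Event()

        def run():
            fn()
            done.set()

        worker = threading.Thread(target=run, daemon=True)
        worker.start()
        if not done.wait(timeout):
            # The blocked worker thread is abandoned; the executor
            # itself is free again.
            return "discarded"
        return "completed"

ex = DiscardingExecutor()
r1 = ex.dispatch(lambda: None, timeout=1.0)           # fast task
r2 = ex.dispatch(lambda: time.sleep(30), timeout=0.1)  # blocked task
print(r1, r2)   # prints "completed discarded"
```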

Expected results:

The JsonRpc executor should not be blocked for a long time.

Additional info:
Comment 2 Roman Hodain 2017-07-28 02:03:06 EDT
Duplicate of Bug 1443654
Keeping it open for verification by the bugzilla owner.
Comment 3 nijin ashok 2017-07-28 04:31:27 EDT
Indeed, this is fixed as per Bug 1443654 and I can't reproduce it with vdsm-4.19.24-1.el7ev.x86_64.

Closing this.

*** This bug has been marked as a duplicate of bug 1443654 ***
