Bug 1381899 - The python jsonrpc client is parsing the API schema on each connect eating a lot of CPU cycles
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Bindings-API
Version: 4.18.14
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.0.5
: 4.18.15.2
Assignee: Piotr Kliczewski
QA Contact: guy chen
URL:
Whiteboard:
Depends On:
Blocks: 1384786 1466461
 
Reported: 2016-10-05 10:32 UTC by Simone Tiraboschi
Modified: 2019-04-28 14:25 UTC (History)
6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1384786 (view as bug list)
Environment:
Last Closed: 2017-01-18 07:37:29 UTC
oVirt Team: Infra
Embargoed:
rule-engine: ovirt-4.0.z+
rule-engine: planning_ack+
sbonazzo: devel_ack+
pstehlik: testing_ack+


Attachments
profile.txt (25.04 KB, text/plain)
2016-10-05 16:36 UTC, Simone Tiraboschi
no flags Details
benchmark script comparing yaml, json and pickle for loading vdsm schema (1.07 KB, text/plain)
2016-10-05 18:00 UTC, Nir Soffer
no flags Details
benchmark script comparing yaml cyaml, json and pickle for loading vdsm schema (1.61 KB, text/plain)
2016-10-07 12:03 UTC, Nir Soffer
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1349829 0 low CLOSED ovirt-ha-agent should reuse json-rpc connections 2021-02-22 00:41:40 UTC
oVirt gerrit 65230 0 master MERGED api: Speed up schema loading 2020-07-31 22:07:52 UTC
oVirt gerrit 65274 0 ovirt-4.0 MERGED api: Speed up schema loading 2020-07-31 22:07:52 UTC
oVirt gerrit 65452 0 ovirt-4.0.5 MERGED api: Speed up schema loading 2020-07-31 22:07:52 UTC

Internal Links: 1349829

Description Simone Tiraboschi 2016-10-05 10:32:41 UTC
Description of problem:
The python jsonrpc client is parsing the API schema on each connect.
The library does *not* need to load and parse the schema at all for sending requests to vdsm.

The schema is only needed if we want to verify request parameters,
or to provide online help, but neither is needed by default in a client library.

Version-Release number of selected component (if applicable):
4.18.14

How reproducible:
100%

Steps to Reproduce:
1. instantiate and connect the python jsonrpc client
2.
3.

Actual results:
We spend a lot of CPU time parsing the API schema.

Expected results:
It parses the API schema only if required.

Additional info:

Comment 1 Simone Tiraboschi 2016-10-05 12:22:17 UTC
The severity is quite high since, due to https://bugzilla.redhat.com/1349829, it strongly affects ovirt-ha-agent performance.

Comment 2 Piotr Kliczewski 2016-10-05 12:37:07 UTC
Simone, what do you propose to do to fix this issue? If you create a new client and do not use it, we could avoid parsing the schema, but if you do use it, the schema needs to be parsed. I think the reasonable fix is to fix BZ #1349829.

Comment 3 Simone Tiraboschi 2016-10-05 13:28:39 UTC
Fixing BZ #1349829 (which requires BZ #1376843) will solve the issue for ovirt-ha-agent, but I think it is still worth solving this on its own, since a simple getVdsStats call rises from 0.088s to 3.263s when moving from vdsClient to the python jsonrpc client.
Of course, once it's connected I don't expect that overhead, but for a single command it's really too much.

[root@c72he20161003h1 ~]# time vdsClient -s 0 getVdsStats > /dev/null

real	0m0.088s
user	0m0.070s
sys	0m0.012s
[root@c72he20161003h1 ~]# time python test.py > /dev/null

real	0m3.263s
user	0m1.217s
sys	0m0.031s
[root@c72he20161003h1 ~]# cat test.py 
from ovirt_hosted_engine_ha.lib import util as ohautil

cli = ohautil.connect_vdsm_json_rpc(
    logger=None,
    timeout=60,
)
print(cli.getVdsStats())

Comment 4 Piotr Kliczewski 2016-10-05 13:35:24 UTC
I understand the issue, but I have no good solution to it other than proper usage of the client. Do you have any suggestions on how to improve it?

Comment 5 Nir Soffer 2016-10-05 13:44:21 UTC
(In reply to Piotr Kliczewski from comment #4)
> I understand the issue but I have no good solution to the issue except of
> proper usage of it. Do you have any suggestion how to  improve it?

You cannot use this client "properly" in a command line tool like vdsClient,
hosted-engine and vdsm-tool. On each run, you will have to pay for schema
loading.

The solution is very simple, do not use the schema in the client library :-)

The new client library Irit is working on will solve this issue:
https://gerrit.ovirt.org/64502

If we want to add client side verification based on the schema, it should be
disabled by default so a command line tool can enable it when the user asks for it,
maybe in --debug mode.

The other use case for the schema is generating online help; this is also needed
only for a command line client, not for the library.

If we must have verification on the client, the solution is to generate the client
code from the schema. But I don't think we need this.
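The opt-in approach described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual vdsm client API: the class and attribute names are invented, and the schema dict is a stand-in for the parsed YAML.

```python
# Hypothetical sketch of a client that defers schema parsing until
# verification is actually requested (e.g. via a --debug flag).
class Client(object):

    def __init__(self, verify=False):
        self._verify = verify  # opt-in verification, disabled by default
        self._schema = None    # parsed lazily, never on connect

    def _load_schema(self):
        # Stand-in for the expensive YAML parse; only runs on first use.
        if self._schema is None:
            self._schema = {"getVdsStats": {"params": []}}
        return self._schema

    def call(self, method, **params):
        if self._verify:
            schema = self._load_schema()
            if method not in schema:
                raise ValueError("unknown method: %s" % method)
        # Plain requests never touch the schema at all.
        return {"method": method, "params": params}
```

With this shape, a one-shot command line invocation pays nothing for the schema unless verification is explicitly enabled.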

Comment 6 Piotr Kliczewski 2016-10-05 13:51:23 UTC
(In reply to Nir Soffer from comment #5)
> The solution is very simple, do not use the schema in the client library :-)
> 

I do not see any value in this approach. We had this before and hopefully all of us learned that it is a very bad idea.

Comment 7 Nir Soffer 2016-10-05 13:59:16 UTC
(In reply to Piotr Kliczewski from comment #6)
> (In reply to Nir Soffer from comment #5)
> > The solution is very simple, do not use the schema in the client library :-)
> > 
> 
> I do not see any value in this approach. We had this before and hopefully
> all of us learned that it is very bad idea.

No, we had this before and it worked fine for years, nobody complained about
missing schema. Now with the schema, we cannot move to jsonrpc because of
unacceptable performance.

The verification is done on the server side anyway, so we don't need to do it on 
the client.

Comment 8 Piotr Kliczewski 2016-10-05 14:13:53 UTC
(In reply to Nir Soffer from comment #7)
> (In reply to Piotr Kliczewski from comment #6)
> > (In reply to Nir Soffer from comment #5)
> > > The solution is very simple, do not use the schema in the client library :-)
> > > 
> > 
> > I do not see any value in this approach. We had this before and hopefully
> > all of us learned that it is very bad idea.
> 
> No, we had this before and it worked fine for years, nobody complained about
> missing schema. Now with the schema, we cannot move to jsonrpc because of
> unacceptable performance.
> 

We just handled issues as they occurred during runtime. We had a schema which was not used, out of date, and duplicated in our client. I am surprised that we like the duplication and the burden of maintaining it.

> The verification is done on the server side anyway, so we don't need to do
> it on 
> the client.

What is the value of verification that we have made irrelevant? Anyway, the schema is not in good shape, mostly because the storage part is not updated.

Comment 9 Simone Tiraboschi 2016-10-05 16:36:53 UTC
Created attachment 1207646 [details]
profile.txt

Comment 10 Simone Tiraboschi 2016-10-05 16:37:30 UTC
Attaching profile output for my simple test as a reference.

Comment 11 Nir Soffer 2016-10-05 17:59:14 UTC
Looks like yaml is 18 times slower than json, and 70 times slower than pickle.

$ python bench.py 
loading yaml schema...
writing json schema...
writing pickle schema...
testing yaml...
testing json...
testing pickle...
      yaml: 1.067940 seconds per load
      json: 0.058074 seconds per load
    pickle: 0.016276 seconds per load

$ grep 'model name' /proc/cpuinfo | head -n1
model name	: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz

$ grep 'model name' /proc/cpuinfo | wc -l
8

$ python bench.py 
loading yaml schema...
writing json schema...
writing pickle schema...
testing yaml...
testing json...
testing pickle...
      yaml: 1.397966 seconds per load
      json: 0.063834 seconds per load
    pickle: 0.021624 seconds per load

$ grep 'model name' /proc/cpuinfo | head -n1
model name	: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz

$ grep 'model name' /proc/cpuinfo | wc -l
8

So to fix the load time, we can ship the schema as pickle.

I don't think that anyone will complain about 20 milliseconds
delay in a command line tool.

See the attached benchmark script.
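The attached script is not inlined in this bug, but a minimal benchmark of the same shape would look like the sketch below. The data here is a synthetic stand-in; the real script loads the actual vdsm YAML schema.

```python
# Rough sketch of the attached benchmark: serialize one schema-like
# structure with each library and time repeated loads.
import json
import pickle
import time

try:
    import yaml  # PyYAML, third-party; skipped if not installed
except ImportError:
    yaml = None

# Synthetic stand-in for the vdsm schema.
data = {"VM.getStats": {"params": [{"name": "vmID", "type": "UUID"}]}}

def bench(name, dumps, loads, runs=100):
    """Time repeated deserialization and return seconds per load."""
    blob = dumps(data)
    start = time.time()
    for _ in range(runs):
        loads(blob)
    elapsed = (time.time() - start) / runs
    print("%10s: %f seconds per load" % (name, elapsed))
    return elapsed

bench("json", json.dumps, json.loads)
bench("pickle", pickle.dumps, pickle.loads)
if yaml is not None:
    bench("yaml", yaml.safe_dump, yaml.safe_load)
```

On a tiny synthetic document the absolute numbers are meaningless; the point is the relative cost per loader, which grows with schema size.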

Comment 12 Nir Soffer 2016-10-05 18:00:22 UTC
Created attachment 1207662 [details]
benchmark script comparing yaml, json and pickle for loading vdsm schema

Comment 13 Piotr Kliczewski 2016-10-06 07:50:58 UTC
I do not think that we should change the schema format. We could use yaml.CLoader to parse it and we would see a performance improvement, but as far as I know we do not want native code in vdsm.

In my opinion we should fix the usage of the client: instead of creating a new one each time we need to make a call to vdsm, we should share one instance. This would save not only yaml parsing time but reconnect time as well.
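The yaml.CLoader suggestion above is roughly a one-line change: prefer the libyaml-backed loader when PyYAML was built with it, and fall back to the pure-python loader otherwise. A small sketch (the guarded import is only so the snippet degrades gracefully without PyYAML installed):

```python
try:
    import yaml  # PyYAML; third-party, may be absent
except ImportError:
    yaml = None

if yaml is not None:
    try:
        Loader = yaml.CLoader  # libyaml-backed, much faster
    except AttributeError:
        Loader = yaml.Loader   # pure-python fallback when libyaml is missing
    doc = yaml.load("a: 1\nb: [2, 3]", Loader=Loader)
```

PyYAML only defines the C* loader classes when compiled against libyaml, which is why the AttributeError fallback is needed.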

Comment 14 Nir Soffer 2016-10-06 10:31:53 UTC
(In reply to Piotr Kliczewski from comment #13)
> I do not think that we should change schema format. We could use
> yaml.CLoader to parse it and we should performance improvement but as far as
> I know w do not want native code in vdsm.

We want native code when it is available, pure python code is too slow for such
tasks. We do use native code when we read json or pickle or anything interesting
in Python.

PyYaml uses libyaml by default, so I'm not sure you can speed it up, but if you
can please send a patch.

> In my opinion we should fix usage of the client and instead of creating new
> one each time we need to make a call to vdsm we should share one instance.
> This would save not only yaml parsing time but reconnect time as well.

This is relevant only for a daemon using the vdsm client library (e.g. the hosted
engine agent), which can pay for loading the schema once.

For command line tools like hosted-engine, vdsm-tool, and vdsClient, introducing
an operation that takes more than a second on each run is not acceptable.

Comment 15 Nir Soffer 2016-10-07 11:57:52 UTC
In the previous test (comment 11) I used the pure python version of pickle by mistake, and I was not testing the libyaml-based CLoader in yaml.

I reran the tests, this time also with yaml.CLoader, on both python 2 and 3.

$ python bench.py
loading yaml schema...
writing json schema...
writing pickle schema...
testing yaml...
testing cyaml...
testing json...
testing pickle...
      yaml: 1.270148 seconds per load
     cyaml: 0.100760 seconds per load
      json: 0.059857 seconds per load
    pickle: 0.001730 seconds per load

$ python3 bench.py
loading yaml schema...
writing json schema...
writing pickle schema...
testing yaml...
testing cyaml...
testing json...
testing pickle...
      yaml: 1.360985 seconds per load
     cyaml: 0.109421 seconds per load
      json: 0.032055 seconds per load
    pickle: 0.002821 seconds per load

$ grep 'model name' /proc/cpuinfo | head -n1
model name	: Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz

Using yaml.CLoader is a one line change in the current code, yielding a 12X
improvement.

Using a pickled schema is 785X faster in python 2, and 485X faster in python 3, so this
is obviously the best way.

Note that both pickle and yaml are *not* safe, see:
http://pyyaml.org/wiki/PyYAMLDocumentation#LoadingYAML
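One plausible shape for the "ship the schema as pickle" idea is an mtime-checked cache next to the YAML source, regenerated whenever the YAML file is newer. This is a hedged sketch with illustrative file names, not the actual merged patch:

```python
# Sketch: cache the parsed schema as a pickle beside the YAML file and
# reuse it while it is at least as new as the YAML source.
import os
import pickle

def load_schema(yaml_path, parse_yaml):
    """Load the schema from a pickle cache, rebuilding via parse_yaml()."""
    cache_path = yaml_path + ".pickle"
    try:
        if os.path.getmtime(cache_path) >= os.path.getmtime(yaml_path):
            with open(cache_path, "rb") as f:
                return pickle.load(f)
    except (OSError, pickle.PickleError):
        pass  # missing, unreadable, or stale cache: fall through and rebuild
    schema = parse_yaml(yaml_path)  # the slow YAML parse, paid only once
    with open(cache_path, "wb") as f:
        pickle.dump(schema, f, pickle.HIGHEST_PROTOCOL)
    return schema
```

Note the caveat above still applies: pickle is only safe here because the cache file is generated locally from a trusted source, never from untrusted input.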

Comment 16 Nir Soffer 2016-10-07 12:03:17 UTC
Created attachment 1208141 [details]
benchmark script comparing yaml cyaml, json and pickle for loading vdsm schema

Comment 17 Nir Soffer 2016-10-07 12:15:12 UTC
Previous test was using tmpfs, speeding up pickle a little. Here is another
run on a faster machine with a hard disk. The results are similar.

$ python bench.py 
loading yaml schema...
writing json schema...
writing pickle schema...
testing yaml...
testing cyaml...
testing json...
testing pickle...
      yaml: 1.053720 seconds per load
     cyaml: 0.086472 seconds per load
      json: 0.049521 seconds per load
    pickle: 0.001516 seconds per load

$ python3 bench.py 
loading yaml schema...
writing json schema...
writing pickle schema...
testing yaml...
testing cyaml...
testing json...
testing pickle...
      yaml: 1.340607 seconds per load
     cyaml: 0.100010 seconds per load
      json: 0.026627 seconds per load
    pickle: 0.002153 seconds per load

$ grep 'model name' /proc/cpuinfo | head -n1
model name	: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz

Comment 18 Nir Soffer 2016-10-07 14:50:26 UTC
This was reported for ovirt 3.6.6:
http://lists.ovirt.org/pipermail/users/2016-August/041764.html

And 4.0.3:
http://lists.ovirt.org/pipermail/users/2016-October/043341.html

Since the fix is trivial, I think we should backport down to 3.6.10.

Comment 19 Pavel Stehlik 2016-10-14 07:15:14 UTC
QE: see comment #14, HostedEngine only.

Comment 22 guy chen 2016-11-06 10:11:53 UTC
Tested the CPU utilization again with the 4.0.5 repos.
With 3 VMs, ovirt-ha-agent CPU occasionally spikes (every 20-30 seconds) up to ~20% utilization, vs. up to 66% documented in the bug, thus the bug is verified.

