Bug 1381899 - The python jsonrpc client is parsing the API schema on each connect eating a lot of CPU cycles
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Bindings-API
Version: 4.18.14
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.0.5
: 4.18.15.2
Assignee: Piotr Kliczewski
QA Contact: guy chen
URL:
Whiteboard:
Depends On:
Blocks: 1384786 1466461
 
Reported: 2016-10-05 10:32 UTC by Simone Tiraboschi
Modified: 2019-04-28 14:25 UTC (History)
6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1384786 (view as bug list)
Environment:
Last Closed: 2017-01-18 07:37:29 UTC
oVirt Team: Infra
Embargoed:
rule-engine: ovirt-4.0.z+
rule-engine: planning_ack+
sbonazzo: devel_ack+
pstehlik: testing_ack+


Attachments
profile.txt (25.04 KB, text/plain)
2016-10-05 16:36 UTC, Simone Tiraboschi
no flags Details
benchmark script comparing yaml, json and pickle for loading vdsm schema (1.07 KB, text/plain)
2016-10-05 18:00 UTC, Nir Soffer
no flags Details
benchmark script comparing yaml cyaml, json and pickle for loading vdsm schema (1.61 KB, text/plain)
2016-10-07 12:03 UTC, Nir Soffer
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1349829 0 low CLOSED ovirt-ha-agent should reuse json-rpc connections 2021-02-22 00:41:40 UTC
oVirt gerrit 65230 0 master MERGED api: Speed up schema loading 2020-07-31 22:07:52 UTC
oVirt gerrit 65274 0 ovirt-4.0 MERGED api: Speed up schema loading 2020-07-31 22:07:52 UTC
oVirt gerrit 65452 0 ovirt-4.0.5 MERGED api: Speed up schema loading 2020-07-31 22:07:52 UTC

Internal Links: 1349829

Description Simone Tiraboschi 2016-10-05 10:32:41 UTC
Description of problem:
The python jsonrpc client is parsing the API schema on each connect.
The library does *not* need to load and parse the schema at all for sending requests to vdsm.

The schema is only needed if we want to verify request parameters,
or to provide online help, but neither is needed by default in a client library.

Version-Release number of selected component (if applicable):
4.18.14

How reproducible:
100%

Steps to Reproduce:
1. instantiate and connect the python jsonrpc client
2.
3.

Actual results:
We spend a lot of CPU time parsing the API schema.

Expected results:
It parses the API schema only if required.

Additional info:

Comment 1 Simone Tiraboschi 2016-10-05 12:22:17 UTC
The severity is quite high since, due to https://bugzilla.redhat.com/1349829, it strongly affects ovirt-ha-agent performance.

Comment 2 Piotr Kliczewski 2016-10-05 12:37:07 UTC
Simone, what do you propose to do to fix this issue? If you create a new client and do not use it, we could avoid parsing the schema, but if you do use it, the schema needs to be parsed. I think the reasonable fix is to fix BZ #1349829.

Comment 3 Simone Tiraboschi 2016-10-05 13:28:39 UTC
Fixing BZ #1349829 (which requires BZ #1376843) will solve the issue for ovirt-ha-agent, but I think it is still worth solving this on its own, since a simple getVdsStats call rises from 0.088s to 3.263s when moving from vdsClient to the python jsonrpc client.
Of course, once it's connected I don't expect that overhead, but for a single command it's really too much.

[root@c72he20161003h1 ~]# time vdsClient -s 0 getVdsStats > /dev/null

real	0m0.088s
user	0m0.070s
sys	0m0.012s
[root@c72he20161003h1 ~]# time python test.py > /dev/null

real	0m3.263s
user	0m1.217s
sys	0m0.031s
[root@c72he20161003h1 ~]# cat test.py 
from ovirt_hosted_engine_ha.lib import util as ohautil

cli = ohautil.connect_vdsm_json_rpc(
    logger=None,
    timeout=60,
)
print(cli.getVdsStats())

Comment 4 Piotr Kliczewski 2016-10-05 13:35:24 UTC
I understand the issue, but I have no good solution to it other than proper usage of the client. Do you have any suggestions on how to improve it?

Comment 5 Nir Soffer 2016-10-05 13:44:21 UTC
(In reply to Piotr Kliczewski from comment #4)
> I understand the issue but I have no good solution to the issue except of
> proper usage of it. Do you have any suggestion how to  improve it?

You cannot use this client "properly" in a command line tool like vdsClient,
hosted-engine and vdsm-tool. On each run, you will have to pay for schema
loading.

The solution is very simple, do not use the schema in the client library :-)

The new client library Irit is working on will solve this issue:
https://gerrit.ovirt.org/64502

If we want to add client side verification based on the schema, it should be
disabled by default so a command line tool can enable it when the user asks for it,
maybe in --debug mode.

The other use case for the schema is generating online help; this is also needed
only for a command line client, not for the library.

If we must have verification on the client, the solution is to generate the client
code from the schema. But I don't think we need this.
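The opt-in approach described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual vdsm client API: the class and attribute names are invented, and the schema dict is a stand-in for the parsed YAML.

```python
# Hypothetical sketch of a client that defers schema parsing until
# verification is actually requested (e.g. via a --debug flag).
class Client(object):

    def __init__(self, verify=False):
        self._verify = verify  # opt-in verification, disabled by default
        self._schema = None    # parsed lazily, never on connect

    def _load_schema(self):
        # Stand-in for the expensive YAML parse; only runs on first use.
        if self._schema is None:
            self._schema = {"getVdsStats": {"params": []}}
        return self._schema

    def call(self, method, **params):
        if self._verify:
            schema = self._load_schema()
            if method not in schema:
                raise ValueError("unknown method: %s" % method)
        # Plain requests never touch the schema at all.
        return {"method": method, "params": params}
```

With this shape, a one-shot command line invocation pays nothing for the schema unless verification is explicitly enabled.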

Comment 6 Piotr Kliczewski 2016-10-05 13:51:23 UTC
(In reply to Nir Soffer from comment #5)
> The solution is very simple, do not use the schema in the client library :-)
> 

I do not see any value in this approach. We had this before and hopefully all of us learned that it is a very bad idea.

Comment 7 Nir Soffer 2016-10-05 13:59:16 UTC
(In reply to Piotr Kliczewski from comment #6)
> (In reply to Nir Soffer from comment #5)
> > The solution is very simple, do not use the schema in the client library :-)
> > 
> 
> I do not see any value in this approach. We had this before and hopefully
> all of us learned that it is very bad idea.

No, we had this before and it worked fine for years, nobody complained about
missing schema. Now with the schema, we cannot move to jsonrpc because of
unacceptable performance.

The verification is done on the server side anyway, so we don't need to do it on 
the client.

Comment 8 Piotr Kliczewski 2016-10-05 14:13:53 UTC
(In reply to Nir Soffer from comment #7)
> (In reply to Piotr Kliczewski from comment #6)
> > (In reply to Nir Soffer from comment #5)
> > > The solution is very simple, do not use the schema in the client library :-)
> > > 
> > 
> > I do not see any value in this approach. We had this before and hopefully
> > all of us learned that it is very bad idea.
> 
> No, we had this before and it worked fine for years, nobody complained about
> missing schema. Now with the schema, we cannot move to jsonrpc because of
> unacceptable performance.
> 

We just handled issues as they occurred during runtime. We had a schema which was not used, out of date, and duplicated in our client. I am surprised that we like the duplication and the burden of maintaining it.

> The verification is done on the server side anyway, so we don't need to do
> it on 
> the client.

What is the value of verification that we have made irrelevant? Anyway, the schema is not in good shape, mostly because the storage part is not updated.

Comment 9 Simone Tiraboschi 2016-10-05 16:36:53 UTC
Created attachment 1207646 [details]
profile.txt

Comment 10 Simone Tiraboschi 2016-10-05 16:37:30 UTC
Attaching profile output for my simple test as a reference.

Comment 11 Nir Soffer 2016-10-05 17:59:14 UTC
Looks like yaml is 18 times slower than json, and 70 times slower than pickle.

$ python bench.py 
loading yaml schema...
writing json schema...
writing pickle schema...
testing yaml...
testing json...
testing pickle...
      yaml: 1.067940 seconds per load
      json: 0.058074 seconds per load
    pickle: 0.016276 seconds per load

$ grep 'model name' /proc/cpuinfo | head -n1
model name	: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz

$ grep 'model name' /proc/cpuinfo | wc -l
8

$ python bench.py 
loading yaml schema...
writing json schema...
writing pickle schema...
testing yaml...
testing json...
testing pickle...
      yaml: 1.397966 seconds per load
      json: 0.063834 seconds per load
    pickle: 0.021624 seconds per load

$ grep 'model name' /proc/cpuinfo | head -n1
model name	: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz

$ grep 'model name' /proc/cpuinfo | wc -l
8

So to fix the load time, we can ship the schema as pickle.

I don't think that anyone will complain about 20 milliseconds
delay in a command line tool.

See the attached benchmark script.
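The attached script is not inlined in this bug, but a minimal benchmark of the same shape would look like the sketch below. The data here is a synthetic stand-in; the real script loads the actual vdsm YAML schema.

```python
# Rough sketch of the attached benchmark: serialize one schema-like
# structure with each library and time repeated loads.
import json
import pickle
import time

try:
    import yaml  # PyYAML, third-party; skipped if not installed
except ImportError:
    yaml = None

# Synthetic stand-in for the vdsm schema.
data = {"VM.getStats": {"params": [{"name": "vmID", "type": "UUID"}]}}

def bench(name, dumps, loads, runs=100):
    """Time repeated deserialization and return seconds per load."""
    blob = dumps(data)
    start = time.time()
    for _ in range(runs):
        loads(blob)
    elapsed = (time.time() - start) / runs
    print("%10s: %f seconds per load" % (name, elapsed))
    return elapsed

bench("json", json.dumps, json.loads)
bench("pickle", pickle.dumps, pickle.loads)
if yaml is not None:
    bench("yaml", yaml.safe_dump, yaml.safe_load)
```

On a tiny synthetic document the absolute numbers are meaningless; the point is the relative cost per loader, which grows with schema size.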

Comment 12 Nir Soffer 2016-10-05 18:00:22 UTC
Created attachment 1207662 [details]
benchmark script comparing yaml, json and pickle for loading vdsm schema

Comment 13 Piotr Kliczewski 2016-10-06 07:50:58 UTC
I do not think that we should change the schema format. We could use yaml.CLoader to parse it and we would see a performance improvement, but as far as I know we do not want native code in vdsm.

In my opinion we should fix the usage of the client: instead of creating a new one each time we need to make a call to vdsm, we should share one instance. This would save not only yaml parsing time but reconnect time as well.
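The yaml.CLoader suggestion above is roughly a one-line change: prefer the libyaml-backed loader when PyYAML was built with it, and fall back to the pure-python loader otherwise. A small sketch (the guarded import is only so the snippet degrades gracefully without PyYAML installed):

```python
try:
    import yaml  # PyYAML; third-party, may be absent
except ImportError:
    yaml = None

if yaml is not None:
    try:
        Loader = yaml.CLoader  # libyaml-backed, much faster
    except AttributeError:
        Loader = yaml.Loader   # pure-python fallback when libyaml is missing
    doc = yaml.load("a: 1\nb: [2, 3]", Loader=Loader)
```

PyYAML only defines the C* loader classes when compiled against libyaml, which is why the AttributeError fallback is needed.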

Comment 14 Nir Soffer 2016-10-06 10:31:53 UTC
(In reply to Piotr Kliczewski from comment #13)
> I do not think that we should change schema format. We could use
> yaml.CLoader to parse it and we should performance improvement but as far as
> I know w do not want native code in vdsm.

We want native code when it is available, pure python code is too slow for such
tasks. We do use native code when we read json or pickle or anything interesting
in Python.

PyYaml uses libyaml by default, so I'm not sure you can speed it up, but if you
can please send a patch.

> In my opinion we should fix usage of the client and instead of creating new
> one each time we need to make a call to vdsm we should share one instance.
> This would save not only yaml parsing time but reconnect time as well.

This is relevant only for a daemon using the vdsm client library (e.g. the hosted
engine agent), which can pay for loading the schema once.

For command line tools like hosted-engine, vdsm-tool, and vdsClient, introducing
an operation that takes more than a second on each run is not acceptable.

Comment 15 Nir Soffer 2016-10-07 11:57:52 UTC
In the previous test (comment 11) I used the pure python version of pickle by mistake, and I was not testing the libyaml-based CLoader in yaml.

I reran the tests, this time also with yaml.CLoader, on both python 2 and 3.

$ python bench.py
loading yaml schema...
writing json schema...
writing pickle schema...
testing yaml...
testing cyaml...
testing json...
testing pickle...
      yaml: 1.270148 seconds per load
     cyaml: 0.100760 seconds per load
      json: 0.059857 seconds per load
    pickle: 0.001730 seconds per load

$ python3 bench.py
loading yaml schema...
writing json schema...
writing pickle schema...
testing yaml...
testing cyaml...
testing json...
testing pickle...
      yaml: 1.360985 seconds per load
     cyaml: 0.109421 seconds per load
      json: 0.032055 seconds per load
    pickle: 0.002821 seconds per load

$ grep 'model name' /proc/cpuinfo | head -n1
model name	: Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz

Using yaml.CLoader is a one line change in the current code, yielding a 12X
improvement.

Using a pickled schema is 785X faster in python 2, and 485X faster in python 3, so this
is obviously the best way.

Note that both pickle and yaml are *not* safe, see:
http://pyyaml.org/wiki/PyYAMLDocumentation#LoadingYAML
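One plausible shape for the "ship the schema as pickle" idea is an mtime-checked cache next to the YAML source, regenerated whenever the YAML file is newer. This is a hedged sketch with illustrative file names, not the actual merged patch:

```python
# Sketch: cache the parsed schema as a pickle beside the YAML file and
# reuse it while it is at least as new as the YAML source.
import os
import pickle

def load_schema(yaml_path, parse_yaml):
    """Load the schema from a pickle cache, rebuilding via parse_yaml()."""
    cache_path = yaml_path + ".pickle"
    try:
        if os.path.getmtime(cache_path) >= os.path.getmtime(yaml_path):
            with open(cache_path, "rb") as f:
                return pickle.load(f)
    except (OSError, pickle.PickleError):
        pass  # missing, unreadable, or stale cache: fall through and rebuild
    schema = parse_yaml(yaml_path)  # the slow YAML parse, paid only once
    with open(cache_path, "wb") as f:
        pickle.dump(schema, f, pickle.HIGHEST_PROTOCOL)
    return schema
```

Note the caveat above still applies: pickle is only safe here because the cache file is generated locally from a trusted source, never from untrusted input.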

Comment 16 Nir Soffer 2016-10-07 12:03:17 UTC
Created attachment 1208141 [details]
benchmark script comparing yaml cyaml, json and pickle for loading vdsm schema

Comment 17 Nir Soffer 2016-10-07 12:15:12 UTC
Previous test was using tmpfs, speeding up pickle a little. Here is another
run on a faster machine with a hard disk. The results are similar.

$ python bench.py 
loading yaml schema...
writing json schema...
writing pickle schema...
testing yaml...
testing cyaml...
testing json...
testing pickle...
      yaml: 1.053720 seconds per load
     cyaml: 0.086472 seconds per load
      json: 0.049521 seconds per load
    pickle: 0.001516 seconds per load

$ python3 bench.py 
loading yaml schema...
writing json schema...
writing pickle schema...
testing yaml...
testing cyaml...
testing json...
testing pickle...
      yaml: 1.340607 seconds per load
     cyaml: 0.100010 seconds per load
      json: 0.026627 seconds per load
    pickle: 0.002153 seconds per load

$ grep 'model name' /proc/cpuinfo | head -n1
model name	: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz

Comment 18 Nir Soffer 2016-10-07 14:50:26 UTC
This was reported for ovirt 3.6.6:
http://lists.ovirt.org/pipermail/users/2016-August/041764.html

And 4.0.3:
http://lists.ovirt.org/pipermail/users/2016-October/043341.html

Since the fix is trivial, I think we should backport down to 3.6.10.

Comment 19 Pavel Stehlik 2016-10-14 07:15:14 UTC
QE: see comment #14, HostedEngine only.

Comment 22 guy chen 2016-11-06 10:11:53 UTC
Tested the CPU utilization again with the 4.0.5 repos.
With 3 VMs, ovirt-ha-agent CPU occasionally spikes (every 20-30 seconds) up to ~20% utilization, vs. up to 66% documented in the bug, thus the bug is verified.

