Bug 1283725 - json decode: wrong declaration drives memory usage
json decode: wrong declaration drives memory usage
Status: CLOSED NOTABUG
Product: vdsm
Classification: oVirt
Component: Core
4.17.11
x86_64 Linux
unspecified Severity medium
: ---
: ---
Assigned To: Dan Kenigsberg
Eldad Marciano
:
Depends On:
Blocks:
Reported: 2015-11-19 11:47 EST by Eldad Marciano
Modified: 2016-06-19 05:33 EDT (History)
8 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-06-19 05:33:31 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?


Description Eldad Marciano 2015-11-19 11:47:40 EST
Description of problem:

While investigating BZ 1269424 we found a memory leak around json/decoder.py:381
at a time when we use xmlrpc as the channel type.

Most of the decode calls come from:
guestagent.py:482 -------- uniline = line.decode('utf8', 'replace')
vm.py:1617\1615
misc.py:953

It seems we have a wrong declaration.

Version-Release number of selected component (if applicable):
master branch 4.17.999

How reproducible:
100%

Steps to Reproduce:
1. Run 57 VMs + guest agent + 12 SDs.


Actual results:
json/decoder.py has been called and drives the leak.

Expected results:
json/decoder.py should not be called when using xmlrpc.

Additional info:
Comment 1 Dan Kenigsberg 2015-11-19 17:01:42 EST
Eldad, I do not understand this bug report. How did you measure the memory leak? How big is it? Why do you think that it is caused by json/decoder.py ?

We use json in the code for things other than jsonrpc.
Comment 2 Francesco Romani 2015-11-20 05:15:44 EST
(In reply to Eldad Marciano from comment #0)
> Actual results:
> json/decoder.py has been called and drives the leak.
> 
> Expected results:
> json/decoder.py should not be called when using xmlrpc.


This is a bit too simplistic.

We didn't expect these numbers in this flow, and this report is a hint that we should also look elsewhere, *perhaps* in the Guest Agent area, because the traffic between VDSM and the GA is in json rpc regardless of what Engine wants to use.

This doesn't mean we should use json at all in this flow.

Furthermore, I'm not sure json "drives" the leak here. Surely it dominates the consumption, but that can be for a few reasons.

For example, if the VDSM thread that handles GA messages is too slow, decoded data can pile up waiting to be processed. But if we give it more time, it can drain the queue, actually solving "the leak".

A proper leak is when we forget the reference to some data, so we can never release its memory, or when we fail to release it, like https://gerrit.ovirt.org/#/c/48616/
Comment 3 Piotr Kliczewski 2015-11-20 09:04:08 EST
(In reply to Eldad Marciano from comment #0)
> 
> most of the decode calls comes from:
> guestagent.py:482 -------- uniline = line.decode('utf8', 'replace') 
> vm.py:1617\1615
> misc.py:953
> 

Are you sure that the places you pointed to are not str.decode?

> 
> 
> Expected results:
> json/decoder.py should not be called when using xmlrpc.
> 

Your expectation is wrong, because the json library is not used only by jsonrpc.
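Piotr's distinction can be shown with a small sketch (hypothetical payload; Python 3 bytes shown for clarity, while the original code is Python 2): the `decode` in guestagent.py is the codec machinery on a byte string and never touches json/decoder.py; only an explicit json call does.

```python
import json

# A raw line from the guest agent channel: bytes, not yet parsed.
line = b'{"__name__": "heartbeat", "free-ram": "1024"}'

# guestagent.py-style call: this is bytes/str.decode (codec machinery);
# it does not go through json/decoder.py at all.
uniline = line.decode('utf8', 'replace')

# Only an explicit json parse goes through json/decoder.py.
message = json.loads(uniline)
```

So a grep for `.decode(` will match both kinds of call sites, which is why the "find usages" list alone cannot attribute the allocations to the json module.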
Comment 4 Eldad Marciano 2015-11-23 17:06:15 EST
(In reply to Dan Kenigsberg from comment #1)
> Eldad, I do not understand this bug report. How did you measure the memory
> leak? How big is it? Why do you think that it is caused by json/decoder.py ?
> 
> We use json in the code for things other than jsonrpc.

We took a memory dump every hour for 12 hours after vdsm was restarted (12 snapshots).

We compared the first and the last tracemalloc snapshots; see the results:
* Top 10 lines:
----------------------------------------------------
#  1: json/decoder.py:381             : 6069.27 KiB (3.85%)
#  2: site-packages/libvirt.py:5149   : 0.00 KiB (0.00%)
#  3: site-packages/libvirt.py:5102   : 2953.84 KiB (1.87%)
#  4: python2.7/subprocess.py:1407    : 1000.04 KiB (0.63%)
#  5: python2.7/xmlrpclib.py:737      : 715.58 KiB (0.45%)
#  6: python2.7/xmlrpclib.py:735      : 410.95 KiB (0.26%)
#  7: profiling/memory.py:80          : 1523.45 KiB (0.97%)
#  8: python2.7/xmlrpclib.py:735      : 230.03 KiB (0.15%)
#  9: ioprocess/__init__.py:351       : 47.30 KiB (0.03%)
# 10: python2.7/genericpath.py:102    : 181.77 KiB (0.12%)
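The comparison described above can be sketched as follows (the one-hour cadence is from the report; the workload here is a stand-in, and grouping by `'lineno'` is an assumption about how the per-line table was produced):

```python
import tracemalloc

tracemalloc.start(25)  # keep up to 25 frames per allocation

first = tracemalloc.take_snapshot()

# ... let the process run (the report used a one-hour cadence for 12 hours);
# here a throwaway workload stands in for real allocations.
data = [bytearray(1024) for _ in range(100)]

last = tracemalloc.take_snapshot()

# Group allocations by source line and diff against the first snapshot.
top = last.compare_to(first, 'lineno')
for stat in top[:10]:
    print(stat)
```

Each printed statistic attributes net allocation growth to a file and line, which is how a single hot spot like json/decoder.py:381 shows up in such a table.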
Comment 5 Eldad Marciano 2015-11-23 17:15:39 EST
(In reply to Piotr Kliczewski from comment #3)
> (In reply to Eldad Marciano from comment #0)
> > 
> > most of the decode calls comes from:
> > guestagent.py:482 -------- uniline = line.decode('utf8', 'replace') 
> > vm.py:1617\1615
> > misc.py:953
> > 
> 
> Are you sure that the places you pointed to are not sting.decode?
> 
> > 
> > 
> > Expected results:
> > json/decoder.py should not be called when using xmlrpc.
> > 
> 
> Your expected assumption is wrong because json library is not only used by
> jsonrpc.

Based on the tracemalloc results, I looked for who is using the decode method (json/decoder.py:381) in the vdsm code (a simple Ctrl+G); the list above is the 'find usages' result.

Actually this is very strange, because I tried to exercise the json module with the following simple code:
json.load(open(jsonfile, 'r'))
json.JSONDecoder().decode(open(jsonfile, 'r').read())
full code here - http://pastebin.test.redhat.com/330223

and I found stable memory across the different json libs.
Still investigating.
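The pastebin above is not reachable, but a minimal stand-in for that kind of stability check (hypothetical payload) could look like this: decode the same document many times and verify that allocations do not keep growing.

```python
import json
import tracemalloc

payload = '{"__name__": "heartbeat", "free-ram": "1024"}'

tracemalloc.start()
before = tracemalloc.take_snapshot()

# Decode the same document repeatedly; if the json module itself were
# leaking, allocations attributed to json/decoder.py would keep growing.
for _ in range(10000):
    json.loads(payload)

after = tracemalloc.take_snapshot()
growth = sum(s.size_diff for s in after.compare_to(before, 'lineno'))
print("net growth: %d bytes" % growth)
```

A near-zero net growth here is consistent with Eldad's finding that the json library alone shows stable memory, pointing the suspicion back at whoever retains the decoded results.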
Comment 6 Yaniv Kaul 2016-01-10 08:49:21 EST
I'm not sure what the status of this bug is...
Comment 7 Piotr Kliczewski 2016-01-11 03:26:22 EST
In comment #5 you stated that you are still investigating. Any updates for this BZ?
Comment 8 Michal Skrivanek 2016-02-15 03:54:10 EST
Eldad, ping?
Comment 9 Eldad Marciano 2016-02-18 11:00:57 EST
It was pending for a while; we have not yet investigated it more deeply.
We should re-prioritize this bug; will update ASAP.
Gil what do you think?
Comment 10 Tomas Jelinek 2016-03-01 03:24:59 EST
Eldad, Gil, any updates?
Comment 11 Tomas Jelinek 2016-04-06 03:19:16 EDT
@Eldad, Gil: is this still relevant?
Comment 12 Eldad Marciano 2016-05-03 07:28:04 EDT
We should re-investigate it.
Meanwhile, reducing the priority.
Comment 13 Eldad Marciano 2016-06-19 05:33:31 EDT
It seems it no longer reproduces: in the last scale testing on an idle vdsm with ~70 VMs and ~12 SDs active, no leaks were found.
vdsm-4.18.0-0.el7ev.x86_64
