Bug 1283725 - json decode wrong declaration increases memory usage
Summary: json decode wrong declaration increases memory usage
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.17.11
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Dan Kenigsberg
QA Contact: Eldad Marciano
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-11-19 16:47 UTC by Eldad Marciano
Modified: 2016-06-19 09:33 UTC
CC List: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-06-19 09:33:31 UTC
oVirt Team: Virt
Embargoed:
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?



Description Eldad Marciano 2015-11-19 16:47:40 UTC
Description of problem:

While investigating BZ 1269424 we found a leak around json/decoder.py:381
when xmlrpc is used as the channel type.

Most of the decode calls come from:
guestagent.py:482 -------- uniline = line.decode('utf8', 'replace') 
vm.py:1617\1615
misc.py:953

It seems we have a wrong declaration.

Version-Release number of selected component (if applicable):
master branch 4.17.999

How reproducible:
100%

Steps to Reproduce:
1. Run 57 VMs with guest agent and 12 SDs.


Actual results:
json/decoder.py has been called and drives the leak.

Expected results:
json/decoder.py should not be called when using xmlrpc.

Additional info:

Comment 1 Dan Kenigsberg 2015-11-19 22:01:42 UTC
Eldad, I do not understand this bug report. How did you measure the memory leak? How big is it? Why do you think that it is caused by json/decoder.py?

We use json in the code for things other than jsonrpc.

Comment 2 Francesco Romani 2015-11-20 10:15:44 UTC
(In reply to Eldad Marciano from comment #0)
> Actual results:
> json/decoder.py has been called and drives the leak.
> 
> Expected results:
> json/decoder.py should not be called when using xmlrpc.


This is a bit too simplistic.

We didn't expect these numbers in this flow, and this report is a hint that we should also look elsewhere, *perhaps* in the Guest Agent area, because the traffic between VDSM and the GA is JSON, regardless of what Engine wants to use.

This doesn't mean we should use json at all in this flow.

Furthermore, I'm not sure json "drives" the leak here. Surely it dominates the consumption, but that can be for a few reasons.

For example, if the VDSM thread that handles GA messages is too slow, decoded data can pile up waiting to be processed. But if we give it more time, it can drain the queue, actually solving "the leak".

A proper leak is when we keep a forgotten reference to some data, so its memory can never be released, or when we forget to release that data, like https://gerrit.ovirt.org/#/c/48616/
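
To illustrate the distinction, a minimal sketch in plain Python (made-up names, not vdsm code):

from collections import deque

# Backlog, not a leak: a slow consumer lets decoded messages pile up,
# but the memory becomes reclaimable once the consumer catches up.
backlog = deque()

def on_message(decoded):
    backlog.append(decoded)    # grows while the consumer lags behind

def drain():
    while backlog:
        backlog.popleft()      # items become collectable again here

# Proper leak: a long-lived container keeps a reference to every message,
# so the garbage collector can never free them.
forgotten = []

def handle(decoded):
    forgotten.append(decoded)  # reference is never dropped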

Comment 3 Piotr Kliczewski 2015-11-20 14:04:08 UTC
(In reply to Eldad Marciano from comment #0)
> 
> Most of the decode calls come from:
> guestagent.py:482 -------- uniline = line.decode('utf8', 'replace') 
> vm.py:1617\1615
> misc.py:953
> 

Are you sure that the places you pointed to are not string.decode?

> 
> 
> Expected results:
> json/decoder.py should not be called when using xmlrpc.
> 

Your expectation is wrong because the json library is not used only by jsonrpc.
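
To make the distinction concrete, a small sketch (the message content below is made up, not real guest agent traffic):

import json

raw = b'{"example": "made-up message"}\n'

# bytes.decode(): plain charset conversion to unicode; this never touches
# the json module, so it cannot show up under json/decoder.py.
uniline = raw.decode('utf8', 'replace')

# json decoding: this is what gets attributed to json/decoder.py:381 in the
# report above, and it runs whenever json.loads() is called, regardless of
# which channel type Engine uses to talk to VDSM.
message = json.loads(uniline)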

Comment 4 Eldad Marciano 2015-11-23 22:06:15 UTC
(In reply to Dan Kenigsberg from comment #1)
> Eldad, I do not understand this bug report. How did you measure the memory
> leak? How big is it? Why do you think that it is caused by json/decoder.py ?
> 
> We use json in the code for things other than jsonrpc.

We took a memory dump every hour for 12 hours, starting once vdsm was restarted (12 snapshots).

We compared the first and the last tracemalloc snapshots; see the results:
* Top 10 lines:
----------------------------------------------------
#  1: json/decoder.py:381             : 6069.27 KiB (3.85%)
#  2: site-packages/libvirt.py:5149   : 0.00 KiB (0.00%)
#  3: site-packages/libvirt.py:5102   : 2953.84 KiB (1.87%)
#  4: python2.7/subprocess.py:1407    : 1000.04 KiB (0.63%)
#  5: python2.7/xmlrpclib.py:737      : 715.58 KiB (0.45%)
#  6: python2.7/xmlrpclib.py:735      : 410.95 KiB (0.26%)
#  7: profiling/memory.py:80          : 1523.45 KiB (0.97%)
#  8: python2.7/xmlrpclib.py:735      : 230.03 KiB (0.15%)
#  9: ioprocess/__init__.py:351       : 47.30 KiB (0.03%)
# 10: python2.7/genericpath.py:102    : 181.77 KiB (0.12%)
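
For reference, a diff like the one above can be produced roughly as follows; this is only a sketch of the approach (the snapshot file names are placeholders), not the actual profiling code used here. The pytracemalloc backport exposes the same API on Python 2.7:

import tracemalloc

# Snapshots are assumed to have been dumped periodically while vdsm ran,
# e.g. with tracemalloc.take_snapshot().dump(path).
first = tracemalloc.Snapshot.load('vdsm-mem-00.pickle')
last = tracemalloc.Snapshot.load('vdsm-mem-12.pickle')

# Group allocations by source line and print the biggest growth.
for stat in last.compare_to(first, 'lineno')[:10]:
    frame = stat.traceback[0]
    print('%s:%s: %+.2f KiB' % (frame.filename, frame.lineno,
                                stat.size_diff / 1024.0))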

Comment 5 Eldad Marciano 2015-11-23 22:15:39 UTC
(In reply to Piotr Kliczewski from comment #3)
> (In reply to Eldad Marciano from comment #0)
> > 
> > Most of the decode calls come from:
> > guestagent.py:482 -------- uniline = line.decode('utf8', 'replace') 
> > vm.py:1617\1615
> > misc.py:953
> > 
> 
> Are you sure that the places you pointed to are not string.decode?
> 
> > 
> > 
> > Expected results:
> > json/decoder.py should not be called when using xmlrpc.
> > 
> 
> Your expectation is wrong because the json library is not used only by
> jsonrpc.

Based on the tracemalloc results I looked for callers of the decode method (json/decoder.py:381) in the vdsm code (a simple Ctrl+G).
The list above is the 'find usages' result.

Actually this is very strange, because I tried to exercise the json module with the following simple code:
json.load(open(jsonfile, 'r'))
json.JSONDecoder().decode(open(jsonfile).read())
full code here - http://pastebin.test.redhat.com/330223

and I found stable memory for the different libs (json libs).
Still investigating.
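
The pastebin link above is internal; purely to illustrate the kind of check described, a minimal stand-alone version could look like this (the file name and iteration count are placeholders):

import json
import resource

def rss_kib():
    # Peak resident set size of this process, in KiB on Linux.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

jsonfile = 'sample.json'      # placeholder path
with open(jsonfile) as f:
    payload = f.read()

before = rss_kib()
decoder = json.JSONDecoder()
for _ in range(100000):
    decoder.decode(payload)   # decode and immediately drop the result
after = rss_kib()

print('peak RSS before: %d KiB, after: %d KiB' % (before, after))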

Comment 6 Yaniv Kaul 2016-01-10 13:49:21 UTC
I'm not sure what the status of this bug is...

Comment 7 Piotr Kliczewski 2016-01-11 08:26:22 UTC
In comment #5 you stated that you are still investigating. Any updates for this BZ?

Comment 8 Michal Skrivanek 2016-02-15 08:54:10 UTC
Eldad, ping?

Comment 9 Eldad Marciano 2016-02-18 16:00:57 UTC
It has been pending for a while; we have not yet investigated it more deeply.
We should re-prioritize this bug; will update ASAP.
Gil, what do you think?

Comment 10 Tomas Jelinek 2016-03-01 08:24:59 UTC
Eldad, Gil, any updates?

Comment 11 Tomas Jelinek 2016-04-06 07:19:16 UTC
@Eldad, Gil: is this still relevant?

Comment 12 Eldad Marciano 2016-05-03 11:28:04 UTC
We should re-investigate it.
Meanwhile, reducing priority.

Comment 13 Eldad Marciano 2016-06-19 09:33:31 UTC
It seems this is no longer reproducible. In our last scale testing on an idle vdsm with ~70 VMs and ~12 SDs active, no leaks were found.
vdsm-4.18.0-0.el7ev.x86_64

