Bug 1269424

Summary: VDSM memory leak
Product: Red Hat Enterprise Virtualization Manager
Reporter: Pavel Zhukov <pzhukov>
Component: vdsm
Assignee: Francesco Romani <fromani>
Status: CLOSED ERRATA
QA Contact: Jiri Belka <jbelka>
Severity: urgent
Priority: urgent
Docs Contact:
Version: 3.5.4
CC: ahoness, bazulay, bmcclain, cwu, emarcian, fromani, gklein, jentrena, lsurette, michal.skrivanek, mkalinin, nsoffer, oourfali, pkliczew, pzhukov, rhev-integ, rhodain, sbonazzo, sherold, s.kieske, ycui, yeylon, ykaul
Target Milestone: ovirt-3.6.3
Keywords: ZStream
Target Release: 3.6.3
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, VDSM memory consumption continually increased on some environments, caused by a memory leak in VDSM. The code has been updated to eliminate the VDSM memory leak, and there is no longer a memory usage increase when running VDSM.
Story Points: ---
Clone Of:
Clones: 1279950, 1299491
Environment:
Last Closed: 2016-03-09 19:45:52 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1279740
Bug Blocks: 1279950, 1299491

Description Pavel Zhukov 2015-10-07 10:04:11 UTC
Description of problem:
vdsm memory consumption grows in *some* environments.

Version-Release number of selected component (if applicable):
vdsm-python-4.16.13.1-1.el7ev.noarch

How reproducible:
100% in some installations

Actual results:
This is vdsm memory consumption *before* the host was restarted:
vdsm     22366 25.9  3.7 13_697_756 9_890_156 ?    S<sl Sep24 4659:33 /usr/bin/python /usr/share/vdsm/vdsm

10:53:08 up 57 days,  2:34,  1 user,  load average: 0.42, 0.27, 0.24

Expected results:
This is vdsm memory consumption *after* the host was restarted:
vdsm     24448  7.1  0.7 5_612_236 1_881_976 ?     S<sl Sep21 212:56 /usr/bin/python /usr/share/vdsm/vdsm

13:23:40 up 12 days, 13:14,  1 user,  load average: 1.27, 1.25, 1.26


Additional info:
In this case the host was rebooted, but restarting vdsmd reduces memory consumption as well.

See also:
https://bugzilla.redhat.com/show_bug.cgi?id=1158108#c31
http://lists.ovirt.org/pipermail/users/2015-March/032047.html
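
The two process lines above look like ps aux output with the header row stripped, so the fifth and sixth columns are VSZ and RSS in KiB: roughly 9.4 GiB resident before the restart versus roughly 1.8 GiB after. As a hypothetical way to capture the growth trend rather than two one-off snapshots, a small sampling script along the following lines could be left running on the host (the command-line pattern is taken from the ps output above; the five-minute interval is an arbitrary placeholder, not something from this bug):

# Hypothetical sampling helper, not part of this bug report: record the vdsm
# resident set size every few minutes so the growth trend can be compared
# over days, instead of relying on two one-off ps snapshots.
# Linux-only, standard library only.
import subprocess
import time

VDSM_CMDLINE = "/usr/share/vdsm/vdsm"

def vdsm_rss_kib():
    # pgrep -f matches the full command line, -o picks the oldest (main) PID.
    pid = subprocess.check_output(
        ["pgrep", "-o", "-f", VDSM_CMDLINE]).decode().strip()
    with open("/proc/%s/status" % pid) as status:
        for line in status:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # value is reported in KiB
    return 0

if __name__ == "__main__":
    while True:
        print("%s vdsm RSS: %d KiB"
              % (time.strftime("%Y-%m-%d %H:%M:%S"), vdsm_rss_kib()))
        time.sleep(300)  # one sample every five minutes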

Comment 15 Nir Soffer 2015-10-08 14:04:29 UTC
Please check if using xmlrpc instead of jsonrpc eliminates the memory leak.
xmlrpc is fully supported in 3.5.

You can check this on a specific host - edit the host, open the advanced
options, and uncheck the "use json rpc" checkbox. No need to restart vdsm
(but you may need to put the host into maintenance).

Comment 18 Pavel Zhukov 2015-10-08 14:49:54 UTC
(In reply to Nir Soffer from comment #15)
> Please check if using xmlrpc instead of jsonrpc eliminates the memory leak.
> xmlrpc is fully supported in 3.5.
I suggested that they disable SSL and switch to XMLRPC. Waiting for test results.

Comment 19 Oved Ourfali 2015-10-08 15:21:57 UTC
(In reply to Pavel Zhukov from comment #18)
> (In reply to Nir Soffer from comment #15)
> > Please check if using xmlrpc instead of jsonrpc eliminates the memory leak.
> > xmlrpc is fully supported in 3.5.
> I suggested that they disable SSL and switch to XMLRPC. Waiting for test
> results.

I'd start with what Nir suggested... and not disable SSL.

Comment 20 Piotr Kliczewski 2015-10-08 20:03:21 UTC
Looking at the comments above, I can see that memory grows during migration. In 3.5 we use xmlrpc for this operation, so I am not sure that switching
protocols will help. I suggest disabling SSL and seeing whether the memory
consumption trend stays the same or changes.

Comment 43 Yaniv Kaul 2015-11-10 15:11:11 UTC
Eldad - any luck reproducing this, now that we have a suspect?

Comment 50 Nir Soffer 2015-12-01 11:00:31 UTC
I could not find anything in this bug that suggests a leak in vdsm.
The only info found here is something that looks like the output
of ps without the headers, so we have to guess the meaning of the
numbers.

It seems that the customer expects vdsm memory after reboot or restart
to be the same as vdsm memory after running for many days.

We do not expect this behavior. We expect vdsm memory to remain constant
when the workload is stable.

Python may keep allocated memory, so Python process memory usage typically
reflects the highest usage it needed in the past.
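
(A minimal sketch to illustrate this point, not taken from the bug: it prints the resident set size before a large allocation, at the peak, and after the objects are freed. Depending on the allocator and the allocation pattern, the after-free RSS can stay close to the peak.)

# Minimal sketch, not from this bug: observe whether RSS drops back after a
# large allocation is freed. Linux-only; reads VmRSS from /proc/self/status.
import gc

def rss_kib():
    # Resident set size of the current process, in KiB.
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0

print("baseline RSS:   %d KiB" % rss_kib())

blob = [bytearray(1024) for _ in range(300000)]  # roughly 300 MiB of buffers
print("peak RSS:       %d KiB" % rss_kib())

del blob
gc.collect()

# Depending on the allocator, much of the peak may stay mapped to the process
# even though the objects themselves are gone.
print("after free RSS: %d KiB" % rss_kib())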

Is the fix for bug 115810 included in the vdsm version used here?

We must have more precise information in this bug.

Comment 51 Pavel Zhukov 2015-12-01 13:01:22 UTC
(In reply to Nir Soffer from comment #50)
> Python may keep allocated memory, so Python process memory usage typically
> reflects the highest usage it needed in the past.
Vdsm consumes twice as much memory after 1 month as after 1 week. I don't think that's normal, even for Python.
> 
> Is the fix for bug 115810 included in the vdsm version used here?
That's an 11-year-old bug against anaconda. Typo?
> 
> We must have more precise information in this bug.

CU uses
vdsm-python-4.16.13.1-1.el7ev.noarch

Comment 53 Oved Ourfali 2016-01-03 07:39:04 UTC
*** Bug 1279950 has been marked as a duplicate of this bug. ***

Comment 56 Michal Skrivanek 2016-01-15 10:26:01 UTC
Eldad, were you able to validate the benefit of https://gerrit.ovirt.org/#/c/51630 ?

Comment 57 Michal Skrivanek 2016-01-18 13:51:51 UTC
we want at least a partial backport to 3.5.z

Comment 61 Francesco Romani 2016-01-21 13:52:02 UTC
bug targeted for 3.6.3, patches merged to 3.6 branch -> MODIFIED

Comment 62 Francesco Romani 2016-01-21 13:55:34 UTC
Back to POST; we need to backport https://gerrit.ovirt.org/#/c/51917/ as well.

Comment 63 Francesco Romani 2016-01-22 14:28:00 UTC
51917 merged in ovirt-3.6 branch -> MODIFIED

Comment 67 Jiri Belka 2016-02-04 12:23:49 UTC
According to fromani@ via IRC, BZ 1283725 is not really connected with this BZ.

Comment 68 Jiri Belka 2016-02-04 14:08:52 UTC
ok,  vdsm-4.17.19-0.el7ev.noarch

This BZ depends on BZ 1279740, which is CodeChange only. No visible memory usage increase seen for vdsm.

Comment 71 errata-xmlrpc 2016-03-09 19:45:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0362.html