Bug 1269424 - VDSM memory leak
Status: CLOSED ERRATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.5.4
Hardware: Unspecified OS: Unspecified
Priority: urgent Severity: urgent
Target Milestone: ovirt-3.6.3
Target Release: 3.6.3
Assigned To: Francesco Romani
QA Contact: Jiri Belka
Keywords: ZStream
Duplicates: 1279950
Depends On: 1279740
Blocks: 1279950 1299491
Reported: 2015-10-07 06:04 EDT by Pavel Zhukov
Modified: 2017-04-03 06:39 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, VDSM memory consumption continually increased on some environments, caused by a memory leak in VDSM. The code has been updated to eliminate the VDSM memory leak, and there is no longer a memory usage increase when running VDSM.
Story Points: ---
Clone Of:
: 1279950 1299491
Environment:
Last Closed: 2016-03-09 14:45:52 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
oVirt gerrit | 48287 | ovirt-3.6 | MERGED | scheduler: use single instance | Never
oVirt gerrit | 51630 | None | None | None | 2016-01-18 05:35 EST
oVirt gerrit | 51868 | master | ABANDONED | misc: Fix RWLock leak due to reference cycle | 2016-01-15 10:18 EST
oVirt gerrit | 51917 | master | MERGED | resourceManager: Fix ResourceRef leak | 2016-01-19 10:00 EST
oVirt gerrit | 52422 | None | None | None | 2016-01-21 08:53 EST
oVirt gerrit | 52423 | None | None | None | 2016-01-20 07:48 EST
oVirt gerrit | 52563 | ovirt-3.6 | MERGED | resourceManager: Fix ResourceRef leak | 2016-01-22 09:16 EST
oVirt gerrit | 52624 | ovirt-3.6 | MERGED | concurrent: Introduce concurrent.thread() utility | 2016-01-25 11:06 EST
oVirt gerrit | 52625 | ovirt-3.6 | MERGED | health: Introduce Vdsm health monitoring | 2016-01-25 11:06 EST
oVirt gerrit | 52626 | ovirt-3.6 | MERGED | health: Report resource usage | 2016-01-25 11:06 EST
oVirt gerrit | 52629 | ovirt-3.5 | MERGED | resourceManager: Fix ResourceRef leak | 2016-01-25 05:45 EST
OpenStack gerrit | 48293 | None | None | None | Never
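
One of the tracked patches (gerrit 51868) is titled "Fix RWLock leak due to reference cycle". As a generic illustration of how a reference cycle can keep an object alive in CPython, here is a minimal sketch. The `Lock` class is a hypothetical stand-in and is not the vdsm code; the actual cause and fix are in the patches listed above.

```python
import gc
import weakref


class Lock(object):
    """Hypothetical stand-in; not the vdsm RWLock."""

    def __init__(self):
        # Storing a bound method on the instance creates a cycle:
        # instance -> bound method -> instance.
        self.callback = self.release

    def release(self):
        pass


# Disable the cycle collector so only reference counting is at work.
gc.disable()
try:
    lock = Lock()
    probe = weakref.ref(lock)

    del lock
    # The cycle keeps the reference count above zero, so the object survives.
    alive_after_del = probe() is not None

    # Only the cycle collector can reclaim it.
    gc.collect()
    alive_after_gc = probe() is not None
finally:
    gc.enable()

print("alive after del:", alive_after_del)
print("alive after gc.collect():", alive_after_gc)
```

Objects caught in such cycles are freed only when the cycle collector runs, so memory can accumulate between collections.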

Description Pavel Zhukov 2015-10-07 06:04:11 EDT
Description of problem:
vdsm memory consumption grows on *some* environments.

Version-Release number of selected component (if applicable):
vdsm-python-4.16.13.1-1.el7ev.noarch

How reproducible:
100% in some installations

Actual results:
This is vdsm memory consumption *before* the host was restarted:
vdsm     22366 25.9  3.7 13697756 9890156 ?    S<sl Sep24 4659:33 /usr/bin/python /usr/share/vdsm/vdsm

10:53:08 up 57 days,  2:34,  1 user,  load average: 0.42, 0.27, 0.24

Expected results:
This is vdsm memory consumption *after* the host was restarted:
vdsm     24448  7.1  0.7 5612236 1881976 ?     S<sl Sep21 212:56 /usr/bin/python /usr/share/vdsm/vdsm

13:23:40 up 12 days, 13:14,  1 user,  load average: 1.27, 1.25, 1.26


Additional info:
In this case the host was rebooted, but restarting vdsmd reduces memory consumption as well.

See also:
https://bugzilla.redhat.com/show_bug.cgi?id=1158108#c31
http://lists.ovirt.org/pipermail/users/2015-March/032047.html
Comment 15 Nir Soffer 2015-10-08 10:04:29 EDT
Please check if using xmlrpc instead of jsonrpc eliminates the memory leak.
xmlrpc is fully supported in 3.5.

You can check this on specific host - edit the host, open the advanced
options and uncheck the "use json rpc" checkbox. No need to restart vdsm
(but you may need to put the host into maintenance).
Comment 18 Pavel Zhukov 2015-10-08 10:49:54 EDT
(In reply to Nir Soffer from comment #15)
> Please check if using xmlrpc instead of jsonrpc eliminates the memory leak.
> xmlrpc is fully supported in 3.5.
I suggested that they disable SSL and switch to XMLRPC. Waiting for test results.
Comment 19 Oved Ourfali 2015-10-08 11:21:57 EDT
(In reply to Pavel Zhukov from comment #18)
> (In reply to Nir Soffer from comment #15)
> > Please check if using xmlrpc instead of jsonrpc eliminates the memory leak.
> > xmlrpc is fully supported in 3.5.
> I suggested that they disable SSL and switch to XMLRPC. Waiting for test
> results.

I'd start with what Nir suggested, and not disable SSL.
Comment 20 Piotr Kliczewski 2015-10-08 16:03:21 EDT
Looking at the comments above, I can see that memory grows during migration.
In 3.5 we use xmlrpc for this operation, so I am not sure that switching
protocols will change anything. I suggest disabling SSL and checking whether
the memory consumption trend stays the same or changes.
Comment 43 Yaniv Kaul 2015-11-10 10:11:11 EST
Eldad - any luck reproducing this, now that we have a suspect?
Comment 50 Nir Soffer 2015-12-01 06:00:31 EST
I could not find anything in this bug that suggests a leak in vdsm.
The only info found here is something that looks like the output
of ps without the headers, so we have to guess the meaning of the
numbers.
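
To make the guess explicit: assuming the rows in the description came from `ps aux` (an assumption, since the command was not given), the fields line up with the standard procps header, where VSZ and RSS are reported in KiB. A minimal sketch that labels the "before" row; the function name `label_ps_line` is made up for illustration:

```python
# Header row of `ps aux` (procps); the bug description omitted it.
PS_AUX_COLUMNS = ["USER", "PID", "%CPU", "%MEM", "VSZ", "RSS",
                  "TTY", "STAT", "START", "TIME", "COMMAND"]


def label_ps_line(line):
    """Pair each field of one `ps aux` row with its column name."""
    # COMMAND may contain spaces, so split only on the first 10 gaps.
    fields = line.split(None, len(PS_AUX_COLUMNS) - 1)
    if len(fields) != len(PS_AUX_COLUMNS):
        raise ValueError("not a full `ps aux` row: %r" % line)
    return dict(zip(PS_AUX_COLUMNS, fields))


# The "before restart" row from the description.
before = label_ps_line(
    "vdsm     22366 25.9  3.7 13697756 9890156 ?    S<sl Sep24 4659:33 "
    "/usr/bin/python /usr/share/vdsm/vdsm")

# Convert RSS from KiB to GiB for readability.
rss_gib = int(before["RSS"]) / (1024.0 * 1024.0)
print("PID %s RSS: %.2f GiB" % (before["PID"], rss_gib))
```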

It seems that the customer expects vdsm memory after reboot or restart
to be the same as vdsm memory after running for many days.

We do not expect this behavior. We expect vdsm memory to remain constant
when the workload is stable.

Python may keep allocated memory, so Python process memory usage typically
reflects the highest usage it needed in the past.
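
One way to tell the two cases apart is to sample RSS periodically and check whether it plateaus or keeps climbing. The following is a rough sketch only: `parse_vmrss_kib` reads the `VmRSS` line from the text of `/proc/<pid>/status` on Linux, `keeps_growing` is a hypothetical heuristic with an arbitrary 5% tolerance, and the sample values are invented for illustration, not measured on any host.

```python
def parse_vmrss_kib(status_text):
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    raise ValueError("no VmRSS line found")


def keeps_growing(samples_kib, tolerance=0.05):
    """Hypothetical heuristic: True if the latest sample exceeds the peak
    of the first half of the samples by more than `tolerance`."""
    if len(samples_kib) < 4:
        raise ValueError("need at least 4 samples")
    half = len(samples_kib) // 2
    early_peak = max(samples_kib[:half])
    return samples_kib[-1] > early_peak * (1 + tolerance)


# Invented sample series (KiB), for illustration only.
plateau = [500000, 900000, 880000, 890000, 895000, 900000]
growth = [500000, 600000, 700000, 800000, 900000, 1000000]

print("plateau keeps growing:", keeps_growing(plateau))
print("growth keeps growing:", keeps_growing(growth))
```

A process that merely holds on to its past peak should look like the first series under a stable workload; a leak should look like the second.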

Is the fix for bug 115810 included in the vdsm version used here?

We must have more precise information in this bug.
Comment 51 Pavel Zhukov 2015-12-01 08:01:22 EST
(In reply to Nir Soffer from comment #50)
> Python may keep allocated memory, so Python process memory usage typically
> reflects the highest usage it needed in the past.
Vdsm consumes twice as much memory after 1 month as after 1 week. I don't think that's normal, even for Python.
> 
> Is the fix for bug 115810 included in the vdsm version used here?
That's an 11-year-old bug in anaconda. Typo?
> 
> We must have more precise information in this bug.

The customer uses
vdsm-python-4.16.13.1-1.el7ev.noarch
Comment 53 Oved Ourfali 2016-01-03 02:39:04 EST
*** Bug 1279950 has been marked as a duplicate of this bug. ***
Comment 56 Michal Skrivanek 2016-01-15 05:26:01 EST
Eldad, were you able to validate the benefit of https://gerrit.ovirt.org/#/c/51630 ?
Comment 57 Michal Skrivanek 2016-01-18 08:51:51 EST
we want at least a partial backport to 3.5.z
Comment 61 Francesco Romani 2016-01-21 08:52:02 EST
bug targeted for 3.6.3, patches merged to 3.6 branch -> MODIFIED
Comment 62 Francesco Romani 2016-01-21 08:55:34 EST
back to POST, we need to backport https://gerrit.ovirt.org/#/c/51917/ as well
Comment 63 Francesco Romani 2016-01-22 09:28:00 EST
51917 merged in ovirt-3.6 branch -> MODIFIED
Comment 67 Jiri Belka 2016-02-04 07:23:49 EST
According to fromani@ via IRC, BZ1283725 is not really connected with this BZ.
Comment 68 Jiri Belka 2016-02-04 09:08:52 EST
OK, vdsm-4.17.19-0.el7ev.noarch

This BZ depends on BZ1279740, which is CodeChange only. No visible memory usage increase seen for vdsm.
Comment 71 errata-xmlrpc 2016-03-09 14:45:52 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0362.html
