Bug 1331618 - Possible memory leak
Summary: Possible memory leak
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: NetworkManager
Version: 23
Hardware: x86_64
OS: Linux
medium
high
Target Milestone: ---
Assignee: Lubomir Rintel
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2016-04-29 03:26 UTC by Marcelo Ricardo Leitner
Modified: 2016-12-20 20:11 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-12-20 20:11:25 UTC
Type: Bug


Attachments (Terms of Use)
valgrind log so far (643.43 KB, text/plain)
2016-05-11 13:26 UTC, Marcelo Ricardo Leitner
no flags Details
pmap -X output with high memory usage (46.54 KB, text/plain)
2016-06-08 13:31 UTC, Marcelo Ricardo Leitner
no flags Details
New valgrind check (801.74 KB, text/plain)
2016-06-09 18:28 UTC, Marcelo Ricardo Leitner
no flags Details

Description Marcelo Ricardo Leitner 2016-04-29 03:26:21 UTC
Description of problem:
root      1256  0.0  1.1 905860 135332 ?       Ssl  Abr17   3:47 /usr/sbin/NetworkManager --no-daemon

 23:55:37 up 11 days,  1:24, 19 users,  load average: 0,02, 0,10, 0,07

It's using 135MB resident, it just seems to much? It's after 11 days uptime.

Then I restarted it and:
root      6143  2.5  0.1 552476 16952 ?        Ssl  00:00   0:00 /usr/sbin/NetworkManager --no-daemon
nearly just 17mb.

Connected to the vpn:
root      6143  0.2  0.1 626644 17452 ?        Ssl  00:00   0:00 /usr/sbin/NetworkManager --no-daemon

Disconnected:
root      6143  0.2  0.1 626644 17460 ?        Ssl  00:00   0:00 /usr/sbin/NetworkManager --no-daemon

Connected again:
root      6143  0.2  0.1 626644 17472 ?        Ssl  00:00   0:00 /usr/sbin/NetworkManager --no-daemon

Disconnected:
root      6143  0.2  0.1 626644 17512 ?        Ssl  00:00   0:00 /usr/sbin/NetworkManager --no-daemon

Connected again:
root      6143  0.2  0.1 626644 17516 ?        Ssl  00:00   0:00 /usr/sbin/NetworkManager --no-daemon

But I connect most of the times only once per day, or twice. Or 5 if I'm messing with something.

My usage is basically connect and disconnect from vpn using vpnc plugin via a wireless interface. I hardly use cable, I think I didn't during these last 11 days. I don't suspend the notebook, it's always on.

Maybe it's related to the way the vpn goes down. Normally my adsl link is restarted during the night and NM just reacts to it, instead of I disconnecting it like I did above. 

Left it idle a bit and:
root      6143  0.1  0.1 626644 17532 ?        Ssl  00:00   0:00 /usr/sbin/NetworkManager --no-daemon
grew a bit more already.

And a bit more, in a matter of minutes.
root      6143  0.1  0.1 626644 17536 ?        Ssl  00:00   0:00 /usr/sbin/NetworkManager --no-daemon

It may be related to:
https://bugzilla.redhat.com/show_bug.cgi?id=1268950

Restarted it again:
root      8224 18.0  0.1 476552 16388 ?        Ssl  00:18   0:00 /usr/sbin/NetworkManager --no-daemon

2 minutes after and it hit:
root      8224  0.7  0.1 552608 16980 ?        Ssl  00:18   0:00 /usr/sbin/NetworkManager --no-daemon

A minute more and:
root      8224  0.2  0.1 552608 17028 ?        Ssl  00:18   0:00 /usr/sbin/NetworkManager --no-daemon

6 minutes from restart and:
root      8224  0.1  0.1 552608 17040 ?        Ssl  00:18   0:00 /usr/sbin/NetworkManager --no-daemon

So I'm afraid I really don't know what caused that high used in the beginning.
I'll just leave it running now and check for a re-occurrence. 

Version-Release number of selected component (if applicable):
I have all these installed:
NetworkManager-bluetooth-1.0.10-3.fc23.x86_64
NetworkManager-wwan-1.0.10-3.fc23.x86_64
NetworkManager-openswan-gnome-1.0.8-3.fc23.x86_64
NetworkManager-vpnc-1.0.8-1.fc23.x86_64
NetworkManager-wifi-1.0.10-3.fc23.x86_64
NetworkManager-libnm-1.0.10-3.fc23.x86_64
NetworkManager-l2tp-0.9.8.7-4.fc23.x86_64
NetworkManager-config-connectivity-fedora-1.0.10-3.fc23.x86_64
NetworkManager-openvpn-1.0.8-2.fc23.x86_64
NetworkManager-glib-1.0.10-3.fc23.x86_64
NetworkManager-pptp-1.1.0-2.20150428git695d4f2.fc23.x86_64
NetworkManager-1.0.10-3.fc23.x86_64
NetworkManager-vpnc-gnome-1.0.8-1.fc23.x86_64
NetworkManager-openvpn-gnome-1.0.8-2.fc23.x86_64
NetworkManager-team-1.0.10-3.fc23.x86_64
NetworkManager-adsl-1.0.10-3.fc23.x86_64
NetworkManager-iodine-0.0.5-2.fc23.x86_64
NetworkManager-pptp-gnome-1.1.0-2.20150428git695d4f2.fc23.x86_64
NetworkManager-openconnect-1.0.8-1.fc23.x86_64
NetworkManager-openswan-1.0.8-3.fc23.x86_64
NetworkManager-iodine-gnome-0.0.5-2.fc23.x86_64
NetworkManager-debuginfo-1.0.6-5.fc22.x86_64

Comment 1 Marcelo Ricardo Leitner 2016-04-29 18:58:10 UTC
Lubomir told me that there were some fixes for this on 1.2.
I'm testing NetworkManager-1.2.1-14786.9ecead6081.fc23.x86_64 now.

root      8224  0.0  0.1 700564 22284 ?        Ssl  00:18   0:07 /usr/sbin/NetworkManager --no-daemon

Let's see how it goes.

Comment 2 Marcelo Ricardo Leitner 2016-05-03 21:15:29 UTC
root      8224  0.0  0.5 1032176 71732 ?       Ssl  Abr29   1:15 /usr/sbin/NetworkManager --no-daemon

I didn't restart it since comment #1

Comment 3 Marcelo Ricardo Leitner 2016-05-04 17:24:17 UTC
Had to reboot today:
root      3308  0.0  0.1 764948 21396 ?        Ssl  09:31   0:04 /usr/sbin/NetworkManager --no-daemon

NetworkManager-1.2.1-14786.9ecead6081.fc23.x86_64

Comment 4 Thomas Lake 2016-05-05 16:33:44 UTC
Before restart:
root      1005  0.1 11.0 2881800 903048 ?      Ssl  Apr22  21:13 /usr/sbin/NetworkManager --no-daemon

After 'systemctl restart NetworkManager.service':
root       735  5.2  0.2 614948 16480 ?        Ssl  17:05   0:00 /usr/sbin/NetworkManager --no-daemon

I've noticed this a few times recently. PC is connected to a (reliable) wired network. WiFi is available, but not used.

This is with NetworkManager-1.0.12-2.fc23.x86_64 - will try a 1.2 version and see if the situation improves.

Comment 5 Marcelo Ricardo Leitner 2016-05-11 12:30:01 UTC
I've been running NM under valgrind for two days now. I couldn't spot anything much obvious, but one thing caught my eye.

==11353== 168 bytes in 1 blocks are possibly lost in loss record 9,193 of 10,405
==11353==    at 0x4C2AA98: calloc (vg_replace_malloc.c:711)
==11353==    by 0x5E2D92C: PR_NewLock (in /usr/lib64/libnspr4.so)
==11353==    by 0x55806A3: ??? (in /usr/lib64/libnss3.so)
==11353==    by 0x557DCD6: ??? (in /usr/lib64/libnss3.so)
==11353==    by 0x55435BC: ??? (in /usr/lib64/libnss3.so)
==11353==    by 0x54C6289: ??? (in /usr/lib64/libnss3.so)
==11353==    by 0x54C69D7: NSS_NoDB_Init (in /usr/lib64/libnss3.so)
==11353==    by 0x2E997A: crypto_init (crypto_nss.c:50)
==11353==    by 0x2E869A: crypto_load_and_verify_certificate (crypto.c:601)
==11353==    by 0x22F6B2: load_and_verify_certificate (nm-setting-8021x.c:501)
==11353==    by 0x22FA05: nm_setting_802_1x_set_ca_cert (nm-setting-8021x.c:663)
==11353==    by 0x12F15D2E: eap_ttls_reader (reader.c:2761)

I know it says "possibly" but crypto_init() is protected to only call NSS_NoDb_Init() only once, and yet valgrind is accusing it several times. 

# grep NSS_NoDB_Init nm.log | wc -l
412

If this module is re-loaded this could be a reason, as the global variable would be refreshed and the lib re-initialized all the time. But I don't know NM code that much to check this out.

Comment 6 Marcelo Ricardo Leitner 2016-05-11 13:26:08 UTC
Created attachment 1156150 [details]
valgrind log so far

Terminal output from the run, FWIW:
[root@localhost ~]# valgrind --log-file=nm.log --leak-check=full /usr/sbin/NetworkManager --no-daemon
** Message: vpnc started with pid 21687
VPNC started in foreground...
vpnc: connection terminated by dead peer detection
** Message: vpnc started with pid 11370
VPNC started in foreground...
^C** Message: Terminated vpnc daemon with PID 11370.
vpnc: select: Interrupted system call
vpnc: terminated by signal: 15
/usr/sbin/vpnc: can't send packet: Invalid argument
[root@localhost ~]# 

Again NSS in the spotlight:

==21360== 72 bytes in 1 blocks are definitely lost in loss record 6,453 of 10,21
8
==21360==    at 0x4C2AA98: calloc (vg_replace_malloc.c:711)
==21360==    by 0x5520136: ??? (in /usr/lib64/libnss3.so)
==21360==    by 0x55201F8: ??? (in /usr/lib64/libnss3.so)
==21360==    by 0x551FD98: ??? (in /usr/lib64/libnss3.so)
==21360==    by 0x5514FEA: ??? (in /usr/lib64/libnss3.so)
==21360==    by 0x5519645: ??? (in /usr/lib64/libnss3.so)
==21360==    by 0x54C6220: ??? (in /usr/lib64/libnss3.so)
==21360==    by 0x54C69D7: NSS_NoDB_Init (in /usr/lib64/libnss3.so)
==21360==    by 0x2E997A: crypto_init (crypto_nss.c:50)
==21360==    by 0x2E869A: crypto_load_and_verify_certificate (crypto.c:601)
==21360==    by 0x22F6B2: load_and_verify_certificate (nm-setting-8021x.c:501)
==21360==    by 0x22FA05: nm_setting_802_1x_set_ca_cert (nm-setting-8021x.c:663)

amongst others. I tried installing the relevant debuginfos, but some symbols missed. Either me or optimizations.

Comment 7 Marcelo Ricardo Leitner 2016-05-16 17:03:21 UTC
FWIW, I do not see leaks on NetworkManager-1.0.12-2.fc23.i686 running on a laptop with WiFi interface off. This laptop terminates my adsl connection, so it has a ppp0 interface comes and goes every day and there is no leak.

From this other laptop, no leak:
root       876  0.0  0.4  48688  4712 ?        Ssl  Abr23   1:08 /usr/sbin/NetworkManager --no-daemon

Comment 8 Beniamino Galvani 2016-06-06 14:13:39 UTC
(In reply to Marcelo Ricardo Leitner from comment #5)
>
> # grep NSS_NoDB_Init nm.log | wc -l
> 412
>
> If this module is re-loaded this could be a reason, as the global variable
> would be refreshed and the lib re-initialized all the time. But I don't know
> NM code that much to check this out.

The global variable is in an object file statically compiled into the
application, so this should not happen. Perhaps all the warnings
including NSS_NoDB_Init are generated in the single, initial invocation?

Apart from these, all other "definitely lost" warnings seems to be
related to glib/gio (and hopefully false positives).

Comment 9 Marcelo Ricardo Leitner 2016-06-06 16:54:35 UTC
(In reply to Beniamino Galvani from comment #8)
> (In reply to Marcelo Ricardo Leitner from comment #5)
> >
> > # grep NSS_NoDB_Init nm.log | wc -l
> > 412
> >
> > If this module is re-loaded this could be a reason, as the global variable
> > would be refreshed and the lib re-initialized all the time. But I don't know
> > NM code that much to check this out.
> 
> The global variable is in an object file statically compiled into the
> application, so this should not happen. Perhaps all the warnings

You mean the re-initialization of the global variable wouldn't happen?

Code is like follows:

static gboolean initialized = FALSE;

gboolean
crypto_init (GError **error)
{
	SECStatus ret;

	if (initialized)
		return TRUE;

	PR_Init(PR_USER_THREAD, PR_PRIORITY_NORMAL, 1);
	ret = NSS_NoDB_Init (NULL);
...
}

I am not sure but I would expect that if a library is unloaded and then loaded again that that "initialized" would become FALSE again, no?

> including NSS_NoDB_Init are generated in the single, initial invocation?

Too bad valgrind doesn't put a timestamp on them, but AFAICT they were distributed in time, with exception of the last ones which were batched (not sure how many).

> 
> Apart from these, all other "definitely lost" warnings seems to be
> related to glib/gio (and hopefully false positives).

I'm open to suggestions. I can use an instrumented version for some time if needed to catch this, no problem.

Comment 10 Beniamino Galvani 2016-06-08 13:10:01 UTC
(In reply to Marcelo Ricardo Leitner from comment #9)
>
> You mean the re-initialization of the global variable wouldn't happen?

Exactly.

> I am not sure but I would expect that if a library is unloaded and then
> loaded again that that "initialized" would become FALSE again, no?

Here 'initialized' is not in a library (the file is part of
libnm-core, which is statically compiled into the final application),
so I expect that the variable can never return back to FALSE once set
to TRUE.

> > including NSS_NoDB_Init are generated in the single, initial invocation?
>
> Too bad valgrind doesn't put a timestamp on them, but AFAICT they were
> distributed in time, with exception of the last ones which were batched (not
> sure how many).

Maybe next time we can use the "--time-stamp=yes" valgrind option.

> > Apart from these, all other "definitely lost" warnings seems to be
> > related to glib/gio (and hopefully false positives).
>
> I'm open to suggestions. I can use an instrumented version for some time if
> needed to catch this, no problem.

Out of curiosity, what's the output of:

  pmap -X `pidof NetworkManager`

when there is a high memory usage?

And do you still see leaks if you run NM with "G_SLICE=always-malloc"
in the environment?

Comment 11 Marcelo Ricardo Leitner 2016-06-08 13:30:10 UTC
(In reply to Beniamino Galvani from comment #10)
> (In reply to Marcelo Ricardo Leitner from comment #9)
> >
> > You mean the re-initialization of the global variable wouldn't happen?
> 
> Exactly.
> 
> > I am not sure but I would expect that if a library is unloaded and then
> > loaded again that that "initialized" would become FALSE again, no?
> 
> Here 'initialized' is not in a library (the file is part of
> libnm-core, which is statically compiled into the final application),
> so I expect that the variable can never return back to FALSE once set
> to TRUE.

Ahh sure, okay.

> > > including NSS_NoDB_Init are generated in the single, initial invocation?
> >
> > Too bad valgrind doesn't put a timestamp on them, but AFAICT they were
> > distributed in time, with exception of the last ones which were batched (not
> > sure how many).
> 
> Maybe next time we can use the "--time-stamp=yes" valgrind option.

Will do

> > > Apart from these, all other "definitely lost" warnings seems to be
> > > related to glib/gio (and hopefully false positives).
> >
> > I'm open to suggestions. I can use an instrumented version for some time if
> > needed to catch this, no problem.
> 
> Out of curiosity, what's the output of:
> 
>   pmap -X `pidof NetworkManager`
> 
> when there is a high memory usage?

Will attach it in the next comment, it's kind of big.

> 
> And do you still see leaks if you run NM with "G_SLICE=always-malloc"
> in the environment?

[root@localhost ~]# ps uax | grep NetworkManag
root      1288  0.0  0.7 783788 86304 ?        Ssl  Mai28   2:41 /usr/sbin/NetworkManager --no-daemon

I think I still have them. I'll try using this together with the next valgrind run. Thanks

Comment 12 Marcelo Ricardo Leitner 2016-06-08 13:31:33 UTC
Created attachment 1166028 [details]
pmap -X  output with high memory usage

[root@localhost ~]# ps uax | grep NetworkManag
root      1288  0.0  0.7 783788 86304 ?        Ssl  Mai28   2:41 /usr/sbin/NetworkManager --no-daemon

[root@localhost ~]# uptime
 10:31:12 up 10 days, 13:29, 20 users,  load average: 0,04, 0,09, 0,17

Comment 13 Marcelo Ricardo Leitner 2016-06-09 18:28:33 UTC
Created attachment 1166399 [details]
New valgrind check

This is the output generated by:
G_SLICE=always-malloc valgrind --time-stamp=yes --log-file=nm.log --leak-check=full /usr/sbin/NetworkManager --no-daemon

Comment 14 Marcelo Ricardo Leitner 2016-06-09 18:29:07 UTC
(In reply to Marcelo Ricardo Leitner from comment #13)

Still on NetworkManager-1.2.1-14786.9ecead6081.fc23.x86_64 , fwiw.

Comment 15 Blueowl 2016-08-10 17:18:48 UTC
(In reply to Marcelo Ricardo Leitner from comment #6)
> Created attachment 1156150 [details]
> valgrind log so far
> 
==21360== 5,094,359 (33,840 direct, 5,060,519 indirect) bytes in 846 blocks are definitely lost in loss record 10,218 of 10,218
==21360==    at 0x4C28D06: malloc (vg_replace_malloc.c:299)
==21360==    by 0x7CE84D8: g_malloc (in /usr/lib64/libglib-2.0.so.0.4600.2)
==21360==    by 0x7CFF622: g_slice_alloc (in /usr/lib64/libglib-2.0.so.0.4600.2)
==21360==    by 0x7D1DB0D: ??? (in /usr/lib64/libglib-2.0.so.0.4600.2)
==21360==    by 0x7D1A83B: g_variant_builder_end (in /usr/lib64/libglib-2.0.so.0.4600.2)
==21360==    by 0x779D2A5: ??? (in /usr/lib64/libgio-2.0.so.0.4600.2)
==21360==    by 0x779F2EB: g_dbus_message_new_from_blob (in /usr/lib64/libgio-2.0.so.0.4600.2)
==21360==    by 0x77A94FC: ??? (in /usr/lib64/libgio-2.0.so.0.4600.2)
==21360==    by 0x774C5A2: ??? (in /usr/lib64/libgio-2.0.so.0.4600.2)
==21360==    by 0x774C5D8: ??? (in /usr/lib64/libgio-2.0.so.0.4600.2)
==21360==    by 0x7CE2E39: g_main_context_dispatch (in /usr/lib64/libglib-2.0.so.0.4600.2)
==21360==    by 0x7CE31CF: ??? (in /usr/lib64/libglib-2.0.so.0.4600.2)
==21360== 

It actually looks like a bug in glib2 that is now fixed in glib2-2.46.2-2.fc23.
See bug 1342253

Comment 16 Marcelo Ricardo Leitner 2016-08-10 19:53:52 UTC
I've updated to F24, now I have glib2-2.48.1-1.fc24.x86_64 and a different NM, so I'm afraid I cannot test it anymore.
Currently using: NetworkManager-1.2.3-14857.a97ba456fe.fc24.x86_64
But please let me know if I can be of any help.

Comment 17 Fedora End Of Life 2016-11-25 07:30:31 UTC
This message is a reminder that Fedora 23 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 23. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '23'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 23 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 18 Fedora End Of Life 2016-12-20 20:11:25 UTC
Fedora 23 changed to end-of-life (EOL) status on 2016-12-20. Fedora 23 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.


Note You need to log in before you can comment on or make changes to this bug.