| Summary: | Possible memory leak | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Marcelo Ricardo Leitner <mleitner> |
| Component: | NetworkManager | Assignee: | Lubomir Rintel <lkundrak> |
| Status: | CLOSED EOL | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | 23 | CC: | bgalvani, blueowl, dcbw, fgiudici, lkundrak, psimerda, tom+f |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-12-20 20:11:25 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description
Marcelo Ricardo Leitner
2016-04-29 03:26:21 UTC
Lubomir told me that there were some fixes for this on 1.2. I'm testing NetworkManager-1.2.1-14786.9ecead6081.fc23.x86_64 now.

root 8224 0.0 0.1 700564 22284 ? Ssl 00:18 0:07 /usr/sbin/NetworkManager --no-daemon

Let's see how it goes.

root 8224 0.0 0.5 1032176 71732 ? Ssl Abr29 1:15 /usr/sbin/NetworkManager --no-daemon

I haven't restarted it since comment #1.

Had to reboot today:

root 3308 0.0 0.1 764948 21396 ? Ssl 09:31 0:04 /usr/sbin/NetworkManager --no-daemon

NetworkManager-1.2.1-14786.9ecead6081.fc23.x86_64

Before restart:

root 1005 0.1 11.0 2881800 903048 ? Ssl Apr22 21:13 /usr/sbin/NetworkManager --no-daemon

After 'systemctl restart NetworkManager.service':

root 735 5.2 0.2 614948 16480 ? Ssl 17:05 0:00 /usr/sbin/NetworkManager --no-daemon

I've noticed this a few times recently. The PC is connected to a (reliable) wired network. WiFi is available, but not used.

This is with NetworkManager-1.0.12-2.fc23.x86_64. I will try a 1.2 version and see if the situation improves.

I've been running NM under valgrind for two days now. I couldn't spot anything obvious, but one thing caught my eye.

==11353== 168 bytes in 1 blocks are possibly lost in loss record 9,193 of 10,405
==11353== at 0x4C2AA98: calloc (vg_replace_malloc.c:711)
==11353== by 0x5E2D92C: PR_NewLock (in /usr/lib64/libnspr4.so)
==11353== by 0x55806A3: ??? (in /usr/lib64/libnss3.so)
==11353== by 0x557DCD6: ??? (in /usr/lib64/libnss3.so)
==11353== by 0x55435BC: ??? (in /usr/lib64/libnss3.so)
==11353== by 0x54C6289: ??? (in /usr/lib64/libnss3.so)
==11353== by 0x54C69D7: NSS_NoDB_Init (in /usr/lib64/libnss3.so)
==11353== by 0x2E997A: crypto_init (crypto_nss.c:50)
==11353== by 0x2E869A: crypto_load_and_verify_certificate (crypto.c:601)
==11353== by 0x22F6B2: load_and_verify_certificate (nm-setting-8021x.c:501)
==11353== by 0x22FA05: nm_setting_802_1x_set_ca_cert (nm-setting-8021x.c:663)
==11353== by 0x12F15D2E: eap_ttls_reader (reader.c:2761)

I know it says "possibly", but crypto_init() is guarded so that it calls NSS_NoDB_Init() only once, and yet valgrind reports it several times.

# grep NSS_NoDB_Init nm.log | wc -l
412

If this module is re-loaded this could be a reason, as the global variable would be refreshed and the lib re-initialized all the time. But I don't know NM code that much to check this out.

Created attachment 1156150 [details]
valgrind log so far
Terminal output from the run, FWIW:
[root@localhost ~]# valgrind --log-file=nm.log --leak-check=full /usr/sbin/NetworkManager --no-daemon
** Message: vpnc started with pid 21687
VPNC started in foreground...
vpnc: connection terminated by dead peer detection
** Message: vpnc started with pid 11370
VPNC started in foreground...
^C** Message: Terminated vpnc daemon with PID 11370.
vpnc: select: Interrupted system call
vpnc: terminated by signal: 15
/usr/sbin/vpnc: can't send packet: Invalid argument
[root@localhost ~]#
Again NSS in the spotlight:
==21360== 72 bytes in 1 blocks are definitely lost in loss record 6,453 of 10,218
==21360== at 0x4C2AA98: calloc (vg_replace_malloc.c:711)
==21360== by 0x5520136: ??? (in /usr/lib64/libnss3.so)
==21360== by 0x55201F8: ??? (in /usr/lib64/libnss3.so)
==21360== by 0x551FD98: ??? (in /usr/lib64/libnss3.so)
==21360== by 0x5514FEA: ??? (in /usr/lib64/libnss3.so)
==21360== by 0x5519645: ??? (in /usr/lib64/libnss3.so)
==21360== by 0x54C6220: ??? (in /usr/lib64/libnss3.so)
==21360== by 0x54C69D7: NSS_NoDB_Init (in /usr/lib64/libnss3.so)
==21360== by 0x2E997A: crypto_init (crypto_nss.c:50)
==21360== by 0x2E869A: crypto_load_and_verify_certificate (crypto.c:601)
==21360== by 0x22F6B2: load_and_verify_certificate (nm-setting-8021x.c:501)
==21360== by 0x22FA05: nm_setting_802_1x_set_ca_cert (nm-setting-8021x.c:663)
among others. I tried installing the relevant debuginfo packages, but some symbols are still missing, either because of something I did or because of compiler optimizations.
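The `grep NSS_NoDB_Init nm.log | wc -l` count quoted in this thread tallies matching stack lines, that is, loss records whose stack passes through NSS_NoDB_Init. It does not by itself count calls to the function, because a single initialization can allocate many blocks and so produce many records. A minimal Python sketch, not part of NetworkManager or valgrind, groups such records by leak kind and sums their bytes. `summarize_frame` is a hypothetical helper name, the header pattern assumes the memcheck wording seen in this log, and the sample lines are abbreviated from the traces in this report:

```python
import re
from collections import defaultdict

# Matches a memcheck loss-record header such as:
#   ==21360== 72 bytes in 1 blocks are definitely lost in loss record 6,453 of 10,218
# The prefix pattern is deliberately loose so a longer "==...==" prefix still matches.
HEADER = re.compile(
    r"^==[^=]+==\s+([\d,]+) (?:\([^)]*\) )?bytes in [\d,]+ blocks are "
    r"(definitely lost|indirectly lost|possibly lost|still reachable) "
    r"in loss record"
)
PREFIX = re.compile(r"^==[^=]+==\s*")


def summarize_frame(lines, frame):
    """Per leak kind, count loss records whose stack contains `frame` and sum their bytes."""
    totals = defaultdict(lambda: {"records": 0, "bytes": 0})
    kind, size, seen = None, 0, False
    for line in lines:
        header = HEADER.match(line)
        if header:
            size = int(header.group(1).replace(",", ""))
            kind, seen = header.group(2), False
        elif not PREFIX.sub("", line).strip():
            kind = None  # an empty valgrind line ends the current record
        elif kind and not seen and frame in line:
            totals[kind]["records"] += 1
            totals[kind]["bytes"] += size
            seen = True  # count each record once, even if the frame repeats
    return {k: dict(v) for k, v in totals.items()}


# Abbreviated sample: two NSS records and one glib record that must be ignored.
SAMPLE = """\
==11353== 168 bytes in 1 blocks are possibly lost in loss record 9,193 of 10,405
==11353== at 0x4C2AA98: calloc (vg_replace_malloc.c:711)
==11353== by 0x54C69D7: NSS_NoDB_Init (in /usr/lib64/libnss3.so)
==11353== by 0x2E997A: crypto_init (crypto_nss.c:50)
==11353==
==21360== 72 bytes in 1 blocks are definitely lost in loss record 6,453 of 10,218
==21360== at 0x4C2AA98: calloc (vg_replace_malloc.c:711)
==21360== by 0x54C69D7: NSS_NoDB_Init (in /usr/lib64/libnss3.so)
==21360==
==21360== 5,094,359 (33,840 direct, 5,060,519 indirect) bytes in 846 blocks are definitely lost in loss record 10,218 of 10,218
==21360== at 0x4C28D06: malloc (vg_replace_malloc.c:299)
==21360== by 0x7D1A83B: g_variant_builder_end (in /usr/lib64/libglib-2.0.so.0.4600.2)
==21360==
"""

result = summarize_frame(SAMPLE.splitlines(), "NSS_NoDB_Init")
print(result)
```

Running the same function over the full nm.log with a glib frame such as "g_variant_builder_end" would show how many bytes sit under each suspect, which is more telling than a line count.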
FWIW, I do not see leaks on NetworkManager-1.0.12-2.fc23.i686 running on a laptop with the WiFi interface off. This laptop terminates my ADSL connection, so it has a ppp0 interface that comes and goes every day, and there is no leak.

From this other laptop, no leak:

root 876 0.0 0.4 48688 4712 ? Ssl Abr23 1:08 /usr/sbin/NetworkManager --no-daemon

(In reply to Marcelo Ricardo Leitner from comment #5)
>
> # grep NSS_NoDB_Init nm.log | wc -l
> 412
>
> If this module is re-loaded this could be a reason, as the global variable
> would be refreshed and the lib re-initialized all the time. But I don't know
> NM code that much to check this out.

The global variable is in an object file statically compiled into the application, so this should not happen. Perhaps all the warnings including NSS_NoDB_Init are generated in the single, initial invocation?

Apart from these, all other "definitely lost" warnings seem to be related to glib/gio (and hopefully false positives).

(In reply to Beniamino Galvani from comment #8)
> (In reply to Marcelo Ricardo Leitner from comment #5)
> >
> > # grep NSS_NoDB_Init nm.log | wc -l
> > 412
> >
> > If this module is re-loaded this could be a reason, as the global variable
> > would be refreshed and the lib re-initialized all the time. But I don't know
> > NM code that much to check this out.
>
> The global variable is in an object file statically compiled into the
> application, so this should not happen. Perhaps all the warnings

You mean the re-initialization of the global variable wouldn't happen? The code is as follows:

    static gboolean initialized = FALSE;

    gboolean
    crypto_init (GError **error)
    {
        SECStatus ret;

        if (initialized)
            return TRUE;

        PR_Init(PR_USER_THREAD, PR_PRIORITY_NORMAL, 1);
        ret = NSS_NoDB_Init (NULL);
        ...
    }

I am not sure but I would expect that if a library is unloaded and then loaded again that "initialized" would become FALSE again, no?

> including NSS_NoDB_Init are generated in the single, initial invocation?

Too bad valgrind doesn't put a timestamp on them, but AFAICT they were distributed in time, with exception of the last ones which were batched (not sure how many).

>
> Apart from these, all other "definitely lost" warnings seem to be
> related to glib/gio (and hopefully false positives).

I'm open to suggestions. I can use an instrumented version for some time if needed to catch this, no problem.

(In reply to Marcelo Ricardo Leitner from comment #9)
>
> You mean the re-initialization of the global variable wouldn't happen?

Exactly.

> I am not sure but I would expect that if a library is unloaded and then
> loaded again that "initialized" would become FALSE again, no?

Here 'initialized' is not in a library (the file is part of libnm-core, which is statically compiled into the final application), so I expect that the variable can never return back to FALSE once set to TRUE.

> > including NSS_NoDB_Init are generated in the single, initial invocation?
>
> Too bad valgrind doesn't put a timestamp on them, but AFAICT they were
> distributed in time, with exception of the last ones which were batched (not
> sure how many).

Maybe next time we can use the "--time-stamp=yes" valgrind option.

> > Apart from these, all other "definitely lost" warnings seem to be
> > related to glib/gio (and hopefully false positives).
>
> I'm open to suggestions. I can use an instrumented version for some time if
> needed to catch this, no problem.

Out of curiosity, what's the output of:

pmap -X `pidof NetworkManager`

when there is a high memory usage?

And do you still see leaks if you run NM with "G_SLICE=always-malloc" in the environment?

(In reply to Beniamino Galvani from comment #10)
> (In reply to Marcelo Ricardo Leitner from comment #9)
> >
> > You mean the re-initialization of the global variable wouldn't happen?
>
> Exactly.
>
> > I am not sure but I would expect that if a library is unloaded and then
> > loaded again that "initialized" would become FALSE again, no?
>
> Here 'initialized' is not in a library (the file is part of
> libnm-core, which is statically compiled into the final application),
> so I expect that the variable can never return back to FALSE once set
> to TRUE.

Ahh sure, okay.

> > > including NSS_NoDB_Init are generated in the single, initial invocation?
> >
> > Too bad valgrind doesn't put a timestamp on them, but AFAICT they were
> > distributed in time, with exception of the last ones which were batched (not
> > sure how many).
>
> Maybe next time we can use the "--time-stamp=yes" valgrind option.

Will do

> > > Apart from these, all other "definitely lost" warnings seem to be
> > > related to glib/gio (and hopefully false positives).
> >
> > I'm open to suggestions. I can use an instrumented version for some time if
> > needed to catch this, no problem.
>
> Out of curiosity, what's the output of:
>
> pmap -X `pidof NetworkManager`
>
> when there is a high memory usage?

Will attach it in the next comment, it's kind of big.

>
> And do you still see leaks if you run NM with "G_SLICE=always-malloc"
> in the environment?

[root@localhost ~]# ps uax | grep NetworkManag
root 1288 0.0 0.7 783788 86304 ? Ssl Mai28 2:41 /usr/sbin/NetworkManager --no-daemon

I think I still have them. I'll try using this together with the next valgrind run.

Thanks

Created attachment 1166028 [details]
pmap -X output with high memory usage
[root@localhost ~]# ps uax | grep NetworkManag
root 1288 0.0 0.7 783788 86304 ? Ssl Mai28 2:41 /usr/sbin/NetworkManager --no-daemon
[root@localhost ~]# uptime
10:31:12 up 10 days, 13:29, 20 users, load average: 0,04, 0,09, 0,17
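To compare the `ps` samples quoted in this thread, a small Python sketch can pull VSZ and RSS out of each line and report the difference. It assumes the usual `ps aux` column order (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND) with VSZ and RSS in KiB; `parse_ps_line` and `rss_growth_kib` are hypothetical names used only for illustration, and the two sample lines are the PID 8224 readings reported earlier:

```python
def parse_ps_line(line):
    """Extract PID, VSZ and RSS (both in KiB) from one `ps aux` output line."""
    # USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    fields = line.split(None, 10)
    if len(fields) < 11:
        raise ValueError("not a full `ps aux` line: %r" % line)
    return {
        "pid": int(fields[1]),
        "vsz_kib": int(fields[4]),
        "rss_kib": int(fields[5]),
    }


def rss_growth_kib(before, after):
    """Resident-memory growth between two samples of the same process."""
    if before["pid"] != after["pid"]:
        # A different PID means the daemon was restarted, so the
        # samples are not comparable.
        raise ValueError("samples come from different processes")
    return after["rss_kib"] - before["rss_kib"]


early = parse_ps_line(
    "root 8224 0.0 0.1 700564 22284 ? Ssl 00:18 0:07 "
    "/usr/sbin/NetworkManager --no-daemon"
)
later = parse_ps_line(
    "root 8224 0.0 0.5 1032176 71732 ? Ssl Abr29 1:15 "
    "/usr/sbin/NetworkManager --no-daemon"
)

growth = rss_growth_kib(early, later)
print("RSS grew by %d KiB (%.1f MiB)" % (growth, growth / 1024.0))
```

Sampling the same PID at regular intervals and feeding consecutive pairs through `rss_growth_kib` would show whether the growth is steady or arrives in bursts, which the ad hoc readings in this report cannot tell.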
Created attachment 1166399 [details]
New valgrind check
This is the output generated by:
G_SLICE=always-malloc valgrind --time-stamp=yes --log-file=nm.log --leak-check=full /usr/sbin/NetworkManager --no-daemon
(In reply to Marcelo Ricardo Leitner from comment #13)

Still on NetworkManager-1.2.1-14786.9ecead6081.fc23.x86_64, fwiw.

(In reply to Marcelo Ricardo Leitner from comment #6)
> Created attachment 1156150 [details]
> valgrind log so far
>

==21360== 5,094,359 (33,840 direct, 5,060,519 indirect) bytes in 846 blocks are definitely lost in loss record 10,218 of 10,218
==21360== at 0x4C28D06: malloc (vg_replace_malloc.c:299)
==21360== by 0x7CE84D8: g_malloc (in /usr/lib64/libglib-2.0.so.0.4600.2)
==21360== by 0x7CFF622: g_slice_alloc (in /usr/lib64/libglib-2.0.so.0.4600.2)
==21360== by 0x7D1DB0D: ??? (in /usr/lib64/libglib-2.0.so.0.4600.2)
==21360== by 0x7D1A83B: g_variant_builder_end (in /usr/lib64/libglib-2.0.so.0.4600.2)
==21360== by 0x779D2A5: ??? (in /usr/lib64/libgio-2.0.so.0.4600.2)
==21360== by 0x779F2EB: g_dbus_message_new_from_blob (in /usr/lib64/libgio-2.0.so.0.4600.2)
==21360== by 0x77A94FC: ??? (in /usr/lib64/libgio-2.0.so.0.4600.2)
==21360== by 0x774C5A2: ??? (in /usr/lib64/libgio-2.0.so.0.4600.2)
==21360== by 0x774C5D8: ??? (in /usr/lib64/libgio-2.0.so.0.4600.2)
==21360== by 0x7CE2E39: g_main_context_dispatch (in /usr/lib64/libglib-2.0.so.0.4600.2)
==21360== by 0x7CE31CF: ??? (in /usr/lib64/libglib-2.0.so.0.4600.2)
==21360==

It actually looks like a bug in glib2 that is now fixed in glib2-2.46.2-2.fc23. See bug 1342253

I've updated to F24, now I have glib2-2.48.1-1.fc24.x86_64 and a different NM, so I'm afraid I cannot test it anymore.

Currently using: NetworkManager-1.2.3-14857.a97ba456fe.fc24.x86_64

But please let me know if I can be of any help.

This message is a reminder that Fedora 23 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 23. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '23'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 23 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

Fedora 23 changed to end-of-life (EOL) status on 2016-12-20. Fedora 23 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug.

Thank you for reporting this bug and we are sorry it could not be fixed.