Bug 1496204 - NetworkManager consuming lots of memory when running docker containers
Summary: NetworkManager consuming lots of memory when running docker containers
Keywords:
Status: CLOSED DUPLICATE of bug 1461643
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: NetworkManager
Version: 7.3
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Beniamino Galvani
QA Contact: Desktop QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-09-26 16:14 UTC by boo kheng khoo
Modified: 2017-11-02 14:42 UTC
CC List: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-11-02 14:42:01 UTC
Target Upstream Version:
Embargoed:


Attachments
additional info (11.51 KB, text/plain), 2017-09-26 16:14 UTC, boo kheng khoo
valgrind log (757.26 KB, text/plain), 2017-09-26 19:51 UTC, boo kheng khoo
additional info when running valgrind (2.78 KB, text/plain), 2017-09-26 19:52 UTC, boo kheng khoo
more valgrind log (12.72 MB, application/x-gzip), 2017-09-27 20:11 UTC, boo kheng khoo

Description boo kheng khoo 2017-09-26 16:14:09 UTC
Created attachment 1331189 [details]
additional info

Description of problem:
On an OpenShift node where containers are repeatedly started and stopped, NetworkManager consumes a large amount of memory after the node has been running for some time.

The issue is also seen with just Docker.

Version-Release number of selected component (if applicable):
RHEL 7.3 (Maipo)
NetworkManager 1.4.0-14.el7_3

How reproducible:
Always.

Steps to Reproduce:
1. install and start docker
2. run the following:
  while true; do docker run --name test -d httpd; docker rm -f test; done 
3. let the above run for some time, then check the memory usage of NetworkManager (see also the sampling sketch below):
  top -b -n 1 -p $(pgrep -o NetworkManager)
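
As a sampling sketch only (the log path and 10-second interval are arbitrary choices, not part of the original report), the growth can be recorded over time like this:

  # append NetworkManager's resident set size (in KB) to a log every 10 seconds
  # while the docker loop from step 2 is running
  while true; do
      echo "$(date +%T) $(ps -o rss= -p $(pgrep -o NetworkManager))" >> /tmp/nm-rss.log
      sleep 10
  done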

Actual results:
The amount of memory consumed by NetworkManager increases with each container start/stop iteration and never decreases, even after all containers have been stopped and removed.

If the while loop is left running for a long period of time, NetworkManager ends up consuming 1 GB of memory or more.

Expected results:
Memory usage of NetworkManager should not increase monotonically with each container, because the network devices are disconnected after a container stops.

Additional info:
https://github.com/moby/moby/issues/32460

Comment 2 Josep 'Pep' Turro Mauri 2017-09-26 17:34:46 UTC
(In reply to boo kheng khoo from comment #0)
> Version-Release number of selected component (if applicable):
> RHEL 7.3 (Maipo)
> NetworkManager 1.4.0-14.el7_3

A bit of a blind shot (haven't really looked in detail) but judging only by the version number this might be a duplicate: Bug 1436650 fixed a leak in 7.3's NM.

Can you try with NetworkManager-1.4.0-18.el7_3 or later and see if the problem persists?

If so, there's also bug 1461643 with an ongoing investigation of memory consumption by NM.

Comment 3 Thomas Haller 2017-09-26 17:42:56 UTC
To me this sounds like a duplicate of bug 1461643.

Can you provide the requested information from https://bugzilla.redhat.com/show_bug.cgi?id=1461643#c27?

Thanks.

Comment 4 boo kheng khoo 2017-09-26 19:47:44 UTC
(In reply to Josep 'Pep' Turro Mauri from comment #2)
> (In reply to boo kheng khoo from comment #0)
> > Version-Release number of selected component (if applicable):
> > RHEL 7.3 (Maipo)
> > NetworkManager 1.4.0-14.el7_3
> 
> A bit of a blind shot (haven't really looked in detail) but judging only by
> the version number this might be a duplicate: Bug 1436650 fixed a leak in
> 7.3's NM.
> 
> Can you try with NetworkManager-1.4.0-18.el7_3 or later and see if the
> problem persists?
> 
> If so, there's also bug 1461643 with an ongoing investigation of memory
> consumption by NM.

I could not find NetworkManager version 1.4.0-18.el7_3 anywhere. The yum repo is a private mirror; version 1.8.0-9.el7 is available, and it has the same issue as version 1.4.0-14.el7_3.

Comment 5 boo kheng khoo 2017-09-26 19:49:45 UTC
(In reply to Thomas Haller from comment #3)
> To me this sounds like a duplicate of bug 1461643.
> 
> Can you provide the requested information from 
> https://bugzilla.redhat.com/show_bug.cgi?id=1461643#c27
> 
> Thanks.

This bug does seem like bug 1461643.

Attaching the requested valgrind log; it was captured while I was starting and stopping a Docker container in a loop.

two files:
* log.txt
* proc-status.txt

Comment 6 boo kheng khoo 2017-09-26 19:51:38 UTC
Created attachment 1331225 [details]
valgrind log

Comment 7 boo kheng khoo 2017-09-26 19:52:23 UTC
Created attachment 1331226 [details]
additional info when running valgrind

Comment 8 Beniamino Galvani 2017-09-27 10:07:26 UTC
Hi,

the latest NM package for RHEL 7.3 is 1.4.0-20, which is available in
the following repositories:

https://access.redhat.com/downloads/content/NetworkManager/1.4.0-20.el7_3/x86_64/fd431d51/package

and it resolves the memory leak reported in bug 1436650. Please try
upgrading to that version and check if the issue still happens.
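
For reference, a minimal sketch of checking the installed build and pulling the update (this assumes the host is subscribed to a repository that carries the 1.4.0-20.el7_3 build; adjust for a private mirror):

  # check which NetworkManager build is currently installed
  rpm -q NetworkManager

  # update to the latest available build and restart the service
  yum update NetworkManager
  systemctl restart NetworkManager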

If it does, then this is probably a duplicate of bug 1461643, which is
under investigation. It would be useful if you could run the valgrind
massif tool following the steps reported here:

https://bugzilla.redhat.com/show_bug.cgi?id=1461643#c20

(not the steps from previous comments) and attach the output, thanks!
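
For illustration only, a generic massif invocation looks roughly like the one below; the exact options requested in bug 1461643#c20 may differ, and the output path is an arbitrary choice:

  # stop the service, then run NetworkManager in the foreground under massif
  systemctl stop NetworkManager
  valgrind --tool=massif --massif-out-file=/tmp/nm-massif.out \
      /sbin/NetworkManager -d --log-level=TRACE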

Comment 9 boo kheng khoo 2017-09-27 20:09:59 UTC
Loaded NM version 1.4.0-20 and tested starting/stopping Docker containers for about 30 minutes.

When running NM as a service, I am not seeing NM's memory consumption increase continuously; memory usage reached 8340 KB and stayed there during the 30-minute test:
[root@ip-1-70-141-110 ~]# top -p $(pgrep NetworkManager -o) -b -n 1
top - 14:31:32 up  3:01,  3 users,  load average: 1.93, 3.00, 3.24
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s): 12.7 us, 18.6 sy,  0.0 ni, 54.7 id, 13.9 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem :  3881936 total,   158132 free,   206632 used,  3517172 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  3118388 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 5847 root      20   0  429264   8340   6088 S   0.0  0.2   0:43.54 NetworkManager

----

When running under valgrind, memory usage is much higher, starting at ~90 MB and slowly increasing to ~124 MB by the end of the test run. If I had left the test running, memory usage would probably have continued to increase slowly. The rate of increase is much lower than with NM version 1.4.0-14; the increase is probably due to valgrind, not NM.

Attaching the valgrind log.

Comment 10 boo kheng khoo 2017-09-27 20:11:38 UTC
Created attachment 1331553 [details]
more valgrind log

Comment 11 Beniamino Galvani 2017-09-28 09:50:30 UTC
(In reply to boo kheng khoo from comment #9)
> Loaded NM version 1.4.0-20 and tested starting/stopping Docker containers
> for about 30 minutes.
>
> When running NM as a service, I am not seeing NM's memory consumption
> increase continuously; memory usage reached 8340 KB and stayed there
> during the 30-minute test:

If NetworkManager-1.4.0-20 does not show the leak, I think we can
close this bug as a duplicate of bug 1436650.

> When running under valgrind, memory usage is much higher, starting at
> ~90 MB and slowly increasing to ~124 MB by the end of the test run. If I
> had left the test running, memory usage would probably have continued to
> increase slowly. The rate of increase is much lower than with NM version
> 1.4.0-14; the increase is probably due to valgrind, not NM.

Yes, valgrind needs its own memory for internal bookkeeping, which can
change over time.

> Attaching the valgrind log.
>
> ==12199== Memcheck, a memory error detector
> ==12199== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
> ==12199== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
> ==12199== Command: /sbin/NetworkManager -d --log-level=TRACE

Thanks. This is the output of valgrind-memcheck, which does memory
access checking and tries to detect leaks, but unfortunately it
doesn't detect all possible leaks. For this reason I requested in
comment 8 to use another command, which launches valgrind-massif.

valgrind-massif simply takes snapshots of the heap allocations at
regular intervals and saves them to a file that can be analyzed
later. Anyway, since the memory consumption is stable with the latest
NM package, there is no need for more logs.

Comment 14 Beniamino Galvani 2017-11-02 14:42:01 UTC
Hi,

I believe this is a duplicate of bug 1461643 and the patch there should fix this leak.

*** This bug has been marked as a duplicate of bug 1461643 ***

