Bug 1433303 - NetworkManager leaks NMDevice objects for enslaved veth devices
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: NetworkManager
Version: 7.3
Hardware: All
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Beniamino Galvani
QA Contact: Desktop QE
URL:
Whiteboard:
Depends On:
Blocks: 1436650
Reported: 2017-03-17 11:01 UTC by Sergio Lopez
Modified: 2017-08-01 09:24 UTC (History)

Fixed In Version: NetworkManager-1.8.0-0.4.rc1.el7
Doc Text:
Clone Of:
: 1436650 (view as bug list)
Environment:
Last Closed: 2017-08-01 09:24:38 UTC


Attachments
[PATCH] manager: ensure proper disposal of unrealized devices (1.06 KB, patch)
2017-03-17 23:10 UTC, Beniamino Galvani
[PATCH v2] manager: ensure proper disposal of unrealized devices (1.04 KB, patch)
2017-03-17 23:15 UTC, Beniamino Galvani


Links
System: Red Hat Product Errata
ID: RHSA-2017:2299
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: NetworkManager and libnl3 security, bug fix and enhancement update
Last Updated: 2017-08-01 12:40:28 UTC

Description Sergio Lopez 2017-03-17 11:01:50 UTC
Description of problem:

For containers that use the "bridge" driver, Docker creates veth devices, one of which is enslaved to an existing network bridge.

When the container is stopped, those devices are removed, but a reference to the enslaved one is leaked, causing NetworkManager's VSZ and RSS to increase slowly but steadily over time.


Version-Release number of selected component (if applicable):

Tested with NetworkManager-1.4.0-17.el7_3.x86_64.
Upstream (1e4f1892e052c69983245b14e17a88dec6e5d138 2017-03-17) is also affected.


How reproducible:

Always.


Steps to Reproduce:

1. Run containers in a loop (e.g. "n=0; while echo $((++n)); docker run --rm busybox /bin/true; do :; done").
2. Wait for a few iterations.
3. See NetworkManager's VSZ and RSS increase over time.
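
One possible way to watch step 3 is to sample the process's VSZ and RSS with ps while the containers run. This is only a sketch and not part of the original report: sample_mem is a hypothetical helper name, the 10-second interval in the usage comment is arbitrary, and it assumes the procps version of ps and pidof are installed.

```shell
#!/bin/sh
# Print one "epoch-seconds VSZ-KiB RSS-KiB" line for the given PID.
# Returns non-zero if the process does not exist.
# sample_mem is a hypothetical helper, not a NetworkManager tool.
sample_mem() {
    pid=$1
    # "vsz=" and "rss=" suppress the header line; procps reports both in KiB.
    stats=$(ps -o vsz= -o rss= -p "$pid") || return 1
    # Unquoted on purpose: word splitting trims the padding that ps adds.
    set -- $stats
    [ $# -eq 2 ] || return 1
    echo "$(date +%s) $1 $2"
}

# Usage (assumed setup): sample NetworkManager every 10 seconds
# until the process goes away.
#   pid=$(pidof NetworkManager)
#   while sample_mem "$pid"; do sleep 10; done
```

On an affected version, the second and third columns should keep growing for as long as the reproducer loop runs.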


Actual results:

NetworkManager keeps allocating and using more and more memory.


Expected results:

NetworkManager's memory usage should remain reasonably stable over time.


Additional info:

At nm-manager.c:2252, when the link of a software device is being removed, nm_device_unrealize is called instead of remove_device (which is used for hardware devices).

As a consequence, the device fails the condition at nm-manager.c:977, because nm_device_unrealize sets NMDevicePriv->real to FALSE, so nm_device_removed is not called. That is the function that would eventually remove the slave from its master, releasing the otherwise pseudo-leaked reference.

Unconditionally calling nm_device_removed, even for real == FALSE devices, seems to fix the problem, but I'm not sure if that's the proper solution.

Comment 2 Beniamino Galvani 2017-03-17 23:10:00 UTC
Created attachment 1264326 [details]
[PATCH] manager: ensure proper disposal of unrealized devices

Thank you for the detailed analysis. I can reproduce the leak on 1.4
and git master with this script:

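        # Bridge that each veth device will be enslaved to.
        ip l add br1 type bridge
        for i in $(seq 1 1000); do
                echo $i
                ip l add veth$i type veth peer name vethp$i
                ip l set veth$i up
                ip a a dev veth$i 9.9.9.9
                # Enslave the veth to the bridge, then delete it.
                ip l set veth$i master br1
                ip l del veth$i
        done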

The attached patch against git master fixes the problem. It works for
1.4 too, but has to be applied manually there (a trivial change).

Comment 3 Beniamino Galvani 2017-03-17 23:15:14 UTC
Created attachment 1264327 [details]
[PATCH v2] manager: ensure proper disposal of unrealized devices

Ignore the previous patch please.

Comment 4 Lubomir Rintel 2017-03-21 11:32:40 UTC
Looks good to me

Comment 5 Thomas Haller 2017-03-21 11:41:47 UTC
lgtm

Comment 15 errata-xmlrpc 2017-08-01 09:24:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2299

