Description of problem:
For containers with the "bridge" driver, Docker creates various veth devices, and one of them is enslaved to an existing network bridge.
When the container is stopped, those devices are removed, but a reference to the enslaved one is leaked, causing NetworkManager's VSZ and RSS to increase slowly but steadily over time.
Version-Release number of selected component (if applicable):
Tested with NetworkManager-1.4.0-17.el7_3.x86_64.
Upstream (1e4f1892e052c69983245b14e17a88dec6e5d138 2017-03-17) is also affected.
Steps to Reproduce:
1. Run a series of containers (e.g. "n=0; while echo $((++n)); docker run --rm busybox /bin/true; do :; done")
2. Wait for a few iterations.
3. See NetworkManager's VSZ and RSS increase over time.
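One way to observe step 3 is to sample the process's memory counters at a fixed interval. The following is only a minimal sketch, assuming a Linux host with /proc mounted; sample_mem is a hypothetical helper name, not part of NetworkManager or Docker. It prints the current Unix time followed by VmSize and VmRSS (both in kB) for a given PID.

```shell
#!/bin/sh
# sample_mem PID
# Hypothetical helper: prints "<unix time> <VmSize kB> <VmRSS kB>" for PID,
# read from /proc/PID/status. Returns non-zero if the process does not
# exist or does not expose both fields (e.g. kernel threads).
sample_mem() {
    status="/proc/$1/status"
    [ -r "$status" ] || return 1
    awk -v now="$(date +%s)" '
        /^VmSize:/ { vsz = $2 }
        /^VmRSS:/  { rss = $2 }
        END {
            if (vsz == "" || rss == "") exit 1
            print now, vsz, rss
        }
    ' "$status"
}

# Usage sketch (left commented out because it loops until the process exits):
# while sample_mem "$(pidof NetworkManager)"; do sleep 10; done
```

Running the commented loop alongside the container loop from step 1 should show both values growing on an affected version.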
Actual results:
NetworkManager keeps allocating and using more and more memory.
Expected results:
NetworkManager memory usage should remain reasonably stable over time.
Additional info:
At nm-manager.c:2252, when a link is being removed from a software device, nm_device_unrealize is called instead of remove_device (which is used for hardware devices).
As a consequence, the device fails the condition at nm-manager.c:977 (nm_device_unrealize sets NMDevicePriv->real to FALSE), so nm_device_removed is not called. That is the function that would eventually remove the slave from its master, releasing the otherwise pseudo-leaked reference.
Unconditionally calling nm_device_removed, even for real == FALSE devices, seems to fix the problem, but I'm not sure if that's the proper solution.
Created attachment 1264326
[PATCH] manager: ensure proper disposal of unrealized devices
Thank you for the detailed analysis. I can reproduce the leak on 1.4
and git master with this script:
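# Create the bridge that the veth devices will be enslaved to
ip l add br1 type bridge
for i in $(seq 1 1000); do
    ip l add veth$i type veth peer name vethp$i
    ip l set veth$i up
    ip a a dev veth$i 188.8.131.52
    # Enslave the device to the bridge, then delete it
    ip l set veth$i master br1
    ip l del veth$i
done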
The attached patch against git master fixes the problem. It works for
1.4 too, but has to be applied manually there (a trivial change).
Created attachment 1264327
[PATCH v2] manager: ensure proper disposal of unrealized devices
Ignore the previous patch please.
Looks good to me
Applied to master:
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.