Red Hat Bugzilla – Bug 1433303
NetworkManager leaks NMDevice objects for enslaved veth devices
Last modified: 2017-08-01 05:24:38 EDT
Description of problem:
For containers using the "bridge" driver, Docker creates several veth devices, one of which is enslaved to an existing network bridge. When the container is stopped, those devices are removed, but a reference to the enslaved one is leaked, causing NetworkManager's VSZ and RSS to increase slowly but steadily over time.

Version-Release number of selected component (if applicable):
Tested with NetworkManager-1.4.0-17.el7_3.x86_64. Upstream (1e4f1892e052c69983245b14e17a88dec6e5d138, 2017-03-17) is also affected.

How reproducible:
Always.

Steps to Reproduce:
1. Run a series of containers (e.g. "n=0; while echo $((++n)); docker run --rm busybox /bin/true; do :; done").
2. Wait for a few iterations.
3. Watch NetworkManager's VSZ and RSS increase over time.

Actual results:
NetworkManager keeps allocating and using more and more memory.

Expected results:
NetworkManager's memory usage should stay reasonably stable over time.

Additional info:
At nm-manager.c:2252, when a link is removed from a software device, nm_device_unrealize is called instead of remove_device (which is used for hardware devices). Because nm_device_unrealize sets NMDevicePriv->real to FALSE, the device then fails the condition at nm-manager.c:977, and nm_device_removed is never called. That is the function that would eventually remove the slave from its master, releasing the otherwise pseudo-leaked reference.

Unconditionally calling nm_device_removed, even for devices with real == FALSE, seems to fix the problem, but I'm not sure that is the proper solution.
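For step 3 of the reproducer, the growth can be recorded as numbers instead of eyeballed. The following is only a minimal sketch, assuming a Linux /proc filesystem; the helper names mem_of_pid and watch_mem are made up for illustration and are not part of NetworkManager or any existing tool.

```shell
#!/bin/sh
# Hypothetical helpers (not from NetworkManager): sample a process's
# VSZ and RSS by reading /proc/<pid>/status. Linux only.

# Print "VSZ RSS" (both in kB) for the given PID.
# Returns non-zero if the process does not exist or has no memory map.
mem_of_pid() {
    vsz=
    rss=
    while read -r key value rest; do
        case $key in
            VmSize:) vsz=$value ;;
            VmRSS:)  rss=$value ;;
        esac
    done < "/proc/$1/status" || return 1
    [ -n "$vsz" ] && [ -n "$rss" ] || return 1
    printf '%s %s\n' "$vsz" "$rss"
}

# Print COUNT samples for PID, INTERVAL seconds apart,
# each prefixed with the current time.
# Usage: watch_mem PID [INTERVAL] [COUNT]
watch_mem() {
    pid=$1
    interval=${2:-5}
    count=${3:-12}
    i=0
    while [ "$i" -lt "$count" ]; do
        sample=$(mem_of_pid "$pid") || return 1
        printf '%s %s\n' "$(date +%T)" "$sample"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

After sourcing the file in a second terminal while the container loop runs, something like `watch_mem "$(pidof NetworkManager)" 10 30` (assuming pidof is available and returns a single PID) prints one line every 10 seconds. On an affected build both columns should keep climbing; on a fixed build they should level off.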
Created attachment 1264326
[PATCH] manager: ensure proper disposal of unrealized devices

Thank you for the detailed analysis. I can reproduce the leak on 1.4 and git master with this script:

    ip l add br1 type bridge
    for i in $(seq 1 1000); do
        echo $i
        ip l add veth$i type veth peer name vethp$i
        ip l set veth$i up
        ip a a dev veth$i 9.9.9.9
        ip l set veth$i master br1
        ip l del veth$i
    done

The attached patch against git master fixes the problem. It also works for 1.4, but has to be applied manually there (the change is trivial).
Created attachment 1264327
[PATCH v2] manager: ensure proper disposal of unrealized devices

Please ignore the previous patch.
Looks good to me
lgtm
Applied to the following upstream branches:

master: https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=2e0c3d1dacfa06fad0062d272fc77ecc34ba4576
nm-1-6: https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?h=nm-1-6&id=427a3e5cff1bf852c17ef2b359676d037bd58f67
nm-1-4: https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?h=nm-1-4&id=f0eb192d8c8fcb64b49476edf79f8769cfa225a7
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:2299