* NetworkManager could not receive notifications from the kernel when large changes to the network configuration happened in quick succession, such as changes to bridges that affect a large number of ports. The configuration is now synchronized properly if the kernel indicates that events have been missed. (BZ#1141256)
Sep 4 09:00:51 rose11 NetworkManager[682]: <error> [1409810451.326147] [platform/nm-linux-platform.c:3161] event_handler(): Failed to retrieve incoming events: Out of memory (-5)
which corresponds to:
        int nle;

        nle = nl_recvmsgs_default (priv->nlh_event);
        if (nle < 0)
                switch (nle) {
                case -NLE_DUMP_INTR:
                        /* this most likely happens due to our request (RTM_GETADDR, AF_INET6, NLM_F_DUMP)
                         * to detect support for support_kernel_extended_ifa_flags. This is not critical
                         * and can happen easily. */
                        debug ("Uncritical failure to retrieve incoming events: %s (%d)", nl_geterror (nle), nle);
                        break;
                default:
---->                   error ("Failed to retrieve incoming events: %s (%d)", nl_geterror (nle), nle);
                        break;
                }
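For reference, the -5 here is -NLE_NOMEM ("Out of memory"), which libnl raises when recvmsg() on the netlink socket fails with ENOBUFS, i.e. the kernel dropped notifications because the socket's receive buffer overflowed. The recovery the release note describes (resynchronize once the kernel indicates events were missed) can be sketched roughly as below; this is a toy illustration in Python against a raw rtnetlink socket, not NM's actual libnl code, with the constants transcribed by hand from <linux/netlink.h> and <linux/rtnetlink.h>:

```python
import errno
import socket
import struct

# Constants from <linux/netlink.h> and <linux/rtnetlink.h>
NETLINK_ROUTE = 0
NLM_F_REQUEST = 0x01
NLM_F_DUMP = 0x300          # NLM_F_ROOT | NLM_F_MATCH
NLMSG_ERROR = 0x02
NLMSG_DONE = 0x03
RTM_NEWLINK = 16
RTM_GETLINK = 18

def dump_links(sock):
    """Ask the kernel for a full interface dump; return the ifindexes."""
    # Request = nlmsghdr (16 bytes) followed by ifinfomsg (16 bytes).
    hdr = struct.pack("=LHHLL", 32, RTM_GETLINK,
                      NLM_F_REQUEST | NLM_F_DUMP, 1, 0)
    ifi = struct.pack("=BxHiII", socket.AF_UNSPEC, 0, 0, 0, 0)
    sock.sendto(hdr + ifi, (0, 0))
    indexes = []
    while True:
        data = sock.recv(65536)
        off = 0
        while off < len(data):
            msg_len, msg_type = struct.unpack_from("=LH", data, off)
            if msg_type == NLMSG_DONE:
                return indexes
            if msg_type == NLMSG_ERROR:
                err = struct.unpack_from("=i", data, off + 16)[0]
                raise OSError(-err, "netlink error")
            if msg_type == RTM_NEWLINK:
                _fam, _type, index, _flags, _change = \
                    struct.unpack_from("=BxHiII", data, off + 16)
                indexes.append(index)
            off += (msg_len + 3) & ~3  # NLMSG_ALIGN

def recv_events(sock):
    """Receive one batch of events; on ENOBUFS fall back to a full resync."""
    try:
        return sock.recv(65536)
    except OSError as e:
        if e.errno == errno.ENOBUFS:
            # The kernel dropped notifications because our receive buffer
            # overflowed. The event stream now has a hole in it, so the only
            # safe recovery is to re-dump the whole state and rebuild the cache.
            return dump_links(sock)
        raise

sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, NETLINK_ROUTE)
sock.bind((0, 0))
links = dump_links(sock)
print("kernel reports %d interfaces" % len(links))
sock.close()
```

Presumably the real fix does the equivalent through libnl inside nm-linux-platform.c: on NLE_NOMEM, schedule a full re-read of the platform state instead of trusting the now-incomplete event stream.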
This system has ~180 network interfaces (it's an OVS system) so I can only assume there are a lot of messages going around. However, since:
MemTotal: 16238772 kB
MemFree: 322568 kB
MemAvailable: 4009156 kB
Buffers: 165368 kB
Cached: 4414100 kB
SwapCached: 6728 kB
there is apparently still a ton of free/cached memory, so my assumption right now is that libnl has some upper bound on internal buffers that it's using. NM is setting up the libnl socket buffer with 128K, which perhaps is not enough:
/* The default buffer size wasn't enough for the testsuites. It might just
* as well happen with NetworkManager itself. For now let's hope 128KB is
* good enough.
*/
nle = nl_socket_set_buffer_size (priv->nlh_event, 131072, 0);
Perhaps NM should adjust the libnl3 buffer size based on the amount of memory in the system, or, perhaps better, increase the buffer size when it notices that there are more than 50 interfaces on the system.
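One caveat worth noting when bumping the requested size: on Linux, setsockopt(SO_RCVBUF), which is what nl_socket_set_buffer_size() boils down to, doubles the requested value and caps it at net.core.rmem_max, so the effective buffer can silently differ from what was asked for. A minimal sketch, using a plain UDP socket as a stand-in for the netlink socket since the SO_RCVBUF semantics are the same:

```python
import socket

# Request a 64 KiB receive buffer and read back what the kernel actually
# granted. Linux doubles the requested value (to account for bookkeeping
# overhead) and caps it at net.core.rmem_max, so the effective size is
# rarely exactly what was requested.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
requested = 65536
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print("requested %d bytes, kernel granted %d" % (requested, granted))
sock.close()
```

If a much larger request (say 4 MiB) comes back smaller than expected, net.core.rmem_max is the limit to raise; SO_RCVBUFFORCE can exceed the cap but requires CAP_NET_ADMIN.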
> Seems like 128k is not enough for systems with many interfaces. This adds 4096k
4k, not 4096k :)
>+ g_assert (!nle);
I wouldn't assert here; we don't know why that might fail. g_warning() or nm_log_warn() instead.
Dan,
Partner Stratus would like visibility on this bug, they are seeing this in their lab. They would like to follow the bug and contribute any relevant reproduction information. Let me know if you approve.
Thank you,
Travis
(In reply to Travis Gummels from comment #4)
> Dan,
>
> Partner Stratus would like visibility on this bug, they are seeing this in
> their lab. They would like to follow the bug and contribute any relevant
> reproduction information. Let me know if you approve.
>
> Thank you,
>
> Travis
This bug is now public.
Different fix via bug #1141266.
QA, here's how to test it:
0.) Create a bridge
# ip link add bridge0 type bridge
1.) Create a large number of interfaces and enslave them
# for i in $(seq 0 1000); do ip link add port$i type dummy; ip link set port$i master bridge0; done
2.) Delete a bridge (generates link change event for each port)
# ip link del bridge0
Now you should see <error> messages about out of memory conditions. You should check that NM recovered from them; this tool should generate empty output:
http://people.freedesktop.org/~lkundrak/nm-rtnl-diff.py
It would be awesome if you could integrate this into automated testing.
Thank you!
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHBA-2015-0311.html