Bug 1029213 - NetworkManager consumes 100% CPU printing 'add_object(): Netlink error: Invalid address for specified address family'
Summary: NetworkManager consumes 100% CPU printing 'add_object(): Netlink error: Inval...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: NetworkManager
Version: 20
Hardware: x86_64
OS: Linux
unspecified
unspecified
Target Milestone: ---
Assignee: Dan Williams
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 1042402
TreeView+ depends on / blocked
 
Reported: 2013-11-11 22:10 UTC by Julian Stecklina
Modified: 2014-03-06 16:14 UTC (History)
13 users (show)

Fixed In Version: NetworkManager-0.9.9.0-22.git20131003.fc20
Clone Of:
: 1042402 (view as bug list)
Environment:
Last Closed: 2013-12-22 05:36:26 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
Backtrace of networkmanager (13.79 KB, text/plain)
2013-11-11 22:10 UTC, Julian Stecklina
no flags Details
Log file (125.15 KB, text/plain)
2013-11-11 22:11 UTC, Julian Stecklina
no flags Details
Debug NM log (170.16 KB, application/x-xz)
2013-11-15 10:40 UTC, Simon Gerhards
no flags Details
Packet trace (25.79 KB, application/x-bzip)
2013-11-20 12:17 UTC, Julian Stecklina
no flags Details
Last 2000 log lines (354.39 KB, text/plain)
2013-12-15 21:55 UTC, Tore Anderson
no flags Details
same bug with OpenWrt trunk39114 (264.78 KB, text/x-log)
2013-12-17 16:29 UTC, dirkneukirchen
no flags Details
Diff of -20 vs -31 connection. (11.38 KB, patch)
2014-03-06 16:12 UTC, George Greer
no flags Details | Diff

Description Julian Stecklina 2013-11-11 22:10:38 UTC
Created attachment 822637 [details]
Backtrace of networkmanager

Description of problem:

NetworkManager starts to use 100% cpu and print
NetworkManager[959]: <error> [1384207571.670130] [platform/nm-linux-platform.c:1109] add_object(): Netlink error: Invalid address for specified address family
over and over into the system log.

I have attached backtraces of all threads in network manager and parts of the system log.

Might be related to: https://bugzilla.redhat.com/show_bug.cgi?id=1020068

Version-Release number of selected component (if applicable):
kernel-3.11.7-300.fc20.x86_64
NetworkManager-0.9.9.0-15.git20131003.fc20.x86_64
NetworkManager-openvpn-0.9.8.2-3.fc20.x86_64
NetworkManager-vpnc-0.9.8.2-2.fc20.x86_64
NetworkManager-l2tp-0.9.8-4.fc20.x86_64
NetworkManager-glib-0.9.9.0-15.git20131003.fc20.i686
NetworkManager-glib-0.9.9.0-15.git20131003.fc20.x86_64
NetworkManager-pptp-0.9.8.2-3.fc20.x86_64
NetworkManager-openvpn-gnome-0.9.8.2-3.fc20.x86_64
NetworkManager-debuginfo-0.9.9.0-15.git20131003.fc20.x86_64
NetworkManager-openconnect-0.9.8.0-2.fc20.x86_64
NetworkManager-vpnc-gnome-0.9.8.2-2.fc20.x86_64
NetworkManager-pptp-gnome-0.9.8.2-3.fc20.x86_64


How reproducible:
Hard to say. It happens during normal use of the system.

Comment 1 Julian Stecklina 2013-11-11 22:11:20 UTC
Created attachment 822638 [details]
Log file

Comment 2 Julian Stecklina 2013-11-11 22:12:38 UTC
This bug has accumulated about 4G of log data during the last couple of days.

Comment 3 Simon Gerhards 2013-11-13 16:49:01 UTC
I'm hitting this bug too. For me it sometimes only repeats the log entry that Julian posted. But most of the time I get a constant repetition of the following lines:
"""
Nov 13 17:20:32 simon-desktop.local NetworkManager[407]: <info> Activation (em1) Stage 5 of 5 (IPv6 Commit) scheduled...
Nov 13 17:20:32 simon-desktop.local NetworkManager[407]: <info> Activation (em1) Stage 5 of 5 (IPv6 Commit) started...
Nov 13 17:20:32 simon-desktop.local NetworkManager[407]: <error> [1384359632.637356] [platform/nm-linux-platform.c:1109] add_object(): Netlink error: Invalid address for specified address family
Nov 13 17:20:32 simon-desktop.local NetworkManager[407]: <info> Activation (em1) Stage 5 of 5 (IPv6 Commit) complete.
"""

Comment 4 Dan Williams 2013-11-14 02:30:43 UTC
Can people experiencing this problem grab the following RPMs from:

http://people.redhat.com/dcbw/NetworkManager/fedora/netlink-error/

install them, and then see if you can reproduce the problem, and if so, what else gets printed to the logs.  These builds have additional debugging in them which should print out the address/route which cannot be added to the kernel.  This will help us debug the problem further.  Thanks!

Comment 5 Simon Gerhards 2013-11-14 09:24:16 UTC
After installing NetworkManager-0.9.9.0-16.git20131003.fc20.x86_64.rpm and NetworkManager-glib-0.9.9.0-16.git20131003.fc20.x86_64.rpm the log now repeats the following lines:
"""
Nov 14 10:15:46 simon-desktop.local NetworkManager[418]: <info> Activation (em1) Stage 5 of 5 (IPv6 Commit) scheduled...
Nov 14 10:15:46 simon-desktop.local NetworkManager[418]: <info> Activation (em1) Stage 5 of 5 (IPv6 Commit) started...
Nov 14 10:15:46 simon-desktop.local NetworkManager[418]: <error> [1384420546.28308] [platform/nm-linux-platform.c:1113] add_object(): Netlink error: Invalid address for specified address family
Nov 14 10:15:46 simon-desktop.local NetworkManager[418]: :: inet6 dev 3 scope nowhere
Nov 14 10:15:46 simon-desktop.local NetworkManager[418]: valid-lifetime forever preferred-lifetime forever
Nov 14 10:15:46 simon-desktop.local NetworkManager[418]: [54B blob data]
Nov 14 10:15:46 simon-desktop.local NetworkManager[418]: <info> Activation (em1) Stage 5 of 5 (IPv6 Commit) complete.
"""

Comment 6 Björn Esser (besser82) 2013-11-14 09:43:33 UTC
This happenes for me, too, when I'm using a dual-stack setup.  The logs show the exact output as pasted above.  When the repeated "Netlink error" appears in logs, there's no IPv6 connection anymore, but NM-applet shows a correctly assigned global IPv6-Adress.  IPv4 connections are not affected by this.

After toggling my connection with the hw-switch (or re-plugging the LAN-Cable) everything runs smooth again for some uncertain amount of time (30 mins to 2 h) and "the game" starts again...

Comment 7 Björn Esser (besser82) 2013-11-14 09:46:36 UTC
I forgot to mention: At the moment global IPv6-connection fails, NM starts to use some address from link-local/ULA fe80::-scope as default route, which actually blocks every WAN / LAN v6-traffic.

Comment 8 Björn Esser (besser82) 2013-11-14 10:48:24 UTC
This is an additional output on failure of the debugging pkgs previously mentioned:

Nov 14 11:45:51 p170hm NetworkManager[950]: nm_platform_ip6_address_add: assertion 'lifetime > 0' failed
Nov 14 11:45:51 p170hm rsyslogd-2177: imjournal: begin to drop messages due to rate-limiting

Comment 9 Dan Williams 2013-11-14 16:22:16 UTC
(In reply to Björn "besser82" Esser from comment #7)
> I forgot to mention: At the moment global IPv6-connection fails, NM starts
> to use some address from link-local/ULA fe80::-scope as default route, which
> actually blocks every WAN / LAN v6-traffic.

This may be your default IPv6 router.  Since the IPv6 router is on the same link, it's normal that your system would use the link-local address of the router as the default route.  Is there a chance you can check with wireshark and see where the IPv6 Router Advertisements come from, and if the source address of the Router Advertisements corresponds to the IPv6 address that NetworkManager uses as the default IPv6 gateway?

Comment 10 Björn Esser (besser82) 2013-11-14 17:25:07 UTC
No, the fe80::-scope isn't advertised by my IPv6-router.  My router advertises ULAs / link-local routes to fdbe:dead:beef:197::/64

But this fe80::-scope is applied by NM even if there is no IPv6-router present in the network...

Comment 11 Dan Williams 2013-11-14 20:19:41 UTC
Björn, would you mind running NM with more debugging?

nmcli g logging level debug domains device,ip4,ip6,dhcp4,dhcp6

and then lets try reconnecting the connection that causes the problem with:

nmcli con show active
<find the connection uuid or name>
nmcli dev disconnect em1
nmcli con up <uuid or name>

And you should get a lot more journal spew, including IPv6 stuff that's going to be very interesting to us.

Comment 12 Simon Gerhards 2013-11-15 10:40:18 UTC
Created attachment 824466 [details]
Debug NM log

I did the following (with output):
"""
# nmcli g logging level debug domains device,ip4,ip6,dhcp4,dhcp6
** (process:25329): CRITICAL **: nm_ip6_route_set_prefix: assertion 'prefix > 0' failed

# nmcli con show active
** (process:25483): CRITICAL **: nm_ip6_route_set_prefix: assertion 'prefix > 0' failed
NAME  UUID                                  DEVICES  DEFAULT  VPN  MASTER-PATH 
Auto  c90bccd1-3de6-41a7-823c-e03508f4221c  em1      yes      no   --          

# nmcli dev disconnect em1
** (process:25490): CRITICAL **: nm_ip6_route_set_prefix: assertion 'prefix > 0' failed

# nmcli con up c90bccd1-3de6-41a7-823c-e03508f4221c
** (process:25534): CRITICAL **: nm_ip6_route_set_prefix: assertion 'prefix > 0' failed
Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/1)
"""

I waited until the bug got triggered and NM used 100 % CPU. Then I disconnected em1 and extracted the log approximately since i brought the connection up.

Comment 13 Dan Williams 2013-11-20 05:31:40 UTC
(In reply to Simon Gerhards from comment #12)
> Created attachment 824466 [details]
> Debug NM log

Bizarre, your IPv6 Router is sending you a default route:

Nov 15 11:29:09 simon-desktop.local NetworkManager[401]: <debug> [1384511349.912493] [rdisc/nm-rdisc.c:123] config_changed():   route ::/0 pref 3 exp 9360

which NetworkManager should completely ignore, but isn't currently doing.  Can anyone else confirm that their IPv6 router is also doing this?

Comment 14 Simon Gerhards 2013-11-20 10:06:56 UTC
For documentation puroses:
My router is an AVM "FRITZ!Box 6320 v2 Cable (um)" on firmware version "110.05.51". The "(um)" in the name indicates that it is running a custom firmware from my ISP (Unitymedia, Germany).

Comment 15 Julian Stecklina 2013-11-20 12:11:47 UTC
I have a Fritz!Box 7312 running firmware version 117.05.51. I'll try to reproduce the problem with tcpdump enabled.

Comment 16 Julian Stecklina 2013-11-20 12:17:44 UTC
Created attachment 826596 [details]
Packet trace

This is tcpdump output for icmpv6 traffic when network manager goes crazy. Network manager sends router solicitations in a loop!

Comment 17 Julian Stecklina 2013-11-20 12:23:44 UTC
And yes, my router also announces a ::/0 route.

Comment 18 Dan Williams 2013-11-20 20:25:08 UTC
commit 8586353b09460ec0a619058421743dd7d424a75d upstream now ignores the default route in router advertisements, like the kernel does.

Comment 19 Julian Stecklina 2013-11-27 14:08:49 UTC
Any idea whether this will be fixed until the Fedora 20 release?

Comment 20 Tore Anderson 2013-12-15 11:26:37 UTC
Got the same problems with NetworkManager-0.9.9.0-20.git20131003.fc20.x86_64. I applied commit 799477872394a5e473cb868e8c3ae2ba5fff7f50 ("core: ignore RA-provided default routes (rh #1029213)") from upstream (I couldn't find commit 8586353b09460ec0a619058421743dd7d424a75d), which appears to have solved the busy-looping and resulting CPU usage, but there are still some error messages printed to the syslog:

 Dec 15 12:22:43 sloth NetworkManager[663]: <info> Activation (wlo1) Stage 5 of 5 (IPv6 Commit) scheduled...
Dec 15 12:22:43 sloth NetworkManager[663]: <info> Activation (wlo1) Stage 5 of 5 (IPv6 Commit) started...
Dec 15 12:22:43 sloth NetworkManager[663]: <error> [1387106563.947091] [platform/nm-linux-platform.c:1127] add_object(): Netlink error: Invalid address for specified address family
Dec 15 12:22:43 sloth NetworkManager[663]: <info> Activation (wlo1) Stage 5 of 5 (IPv6 Commit) complete.
Dec 15 12:22:43 sloth NetworkManager: :: inet6 dev 3 scope nowhere
Dec 15 12:22:43 sloth NetworkManager: valid-lifetime forever preferred-lifetime forever
Dec 15 12:22:43 sloth NetworkManager: created boot-time+p<AC>'^^D^? last-updated boot-time+p<AC>'^^D^?

My router is an AVM FritzBox, so it does advertise a bogus default route information option in RA just like is the case for the other commenters.

Tore

Comment 21 Tore Anderson 2013-12-15 21:55:58 UTC
Created attachment 837049 [details]
Last 2000 log lines

I was a bit quick. 799477872394a5e473cb868e8c3ae2ba5fff7f50 does not help, at least not fully. My computer is now in a tight loop sending out RSes and receiving RAs as quickly as it can.

I've attached the output from "journalctl -n 2000" after doing "nmcli g logging level debug". This is from NetworkManager-0.9.9.0-20.git20131003.fc20.x86_64 patched with 799477872394a5e473cb868e8c3ae2ba5fff7f50 from upstream git.

Tore

Comment 22 dirkneukirchen 2013-12-17 16:29:23 UTC
Created attachment 837769 [details]
same bug with OpenWrt trunk39114

I am probably observing the same bug here.

Fedora 20 runs inside a Virtualbox 4.3.x on Windows 7
Router is running OpenWrt (Barrier Breaker) trunk39114

attached a logfile

Comment 23 Julian Stecklina 2013-12-18 15:08:36 UTC
This also happens in our university wifi.

Comment 24 Dan Williams 2013-12-18 19:17:23 UTC
(In reply to Tore Anderson from comment #21)
> Created attachment 837049 [details]
> Last 2000 log lines
> 
> I was a bit quick. 799477872394a5e473cb868e8c3ae2ba5fff7f50 does not help,
> at least not fully. My computer is now in a tight loop sending out RSes and
> receiving RAs as quickly as it can.
> 
> I've attached the output from "journalctl -n 2000" after doing "nmcli g
> logging level debug". This is from
> NetworkManager-0.9.9.0-20.git20131003.fc20.x86_64 patched with
> 799477872394a5e473cb868e8c3ae2ba5fff7f50 from upstream git.

It looks like this is a different bug actually; the NM RA code is soliciting an RA even though it just processed one.  Perhaps the router doesn't set a minimum time between router advertisements it's sending out?

In any case, lets file this one separately since it's not related to the default route thing.

Comment 25 Fedora Update System 2013-12-18 20:16:33 UTC
NetworkManager-0.9.9.0-21.git20131003.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/NetworkManager-0.9.9.0-21.git20131003.fc20

Comment 26 Dan Williams 2013-12-18 22:40:58 UTC
(In reply to Tore Anderson from comment #21)
> Created attachment 837049 [details]
> Last 2000 log lines
> 
> I was a bit quick. 799477872394a5e473cb868e8c3ae2ba5fff7f50 does not help,
> at least not fully. My computer is now in a tight loop sending out RSes and
> receiving RAs as quickly as it can.
> 
> I've attached the output from "journalctl -n 2000" after doing "nmcli g
> logging level debug". This is from
> NetworkManager-0.9.9.0-20.git20131003.fc20.x86_64 patched with
> 799477872394a5e473cb868e8c3ae2ba5fff7f50 from upstream git.
> 
> Tore

Tore, I've filed https://bugzilla.redhat.com/show_bug.cgi?id=1044757 to track this issue.

Comment 27 Fedora Update System 2013-12-20 01:35:02 UTC
Package NetworkManager-0.9.9.0-21.git20131003.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing NetworkManager-0.9.9.0-21.git20131003.fc20'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-23593/NetworkManager-0.9.9.0-21.git20131003.fc20
then log in and leave karma (feedback).

Comment 28 Fedora Update System 2013-12-20 03:03:16 UTC
NetworkManager-0.9.9.0-22.git20131003.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/NetworkManager-0.9.9.0-22.git20131003.fc20

Comment 29 Tore Anderson 2013-12-20 08:42:28 UTC
(In reply to Fedora Update System from comment #27)
> Package NetworkManager-0.9.9.0-21.git20131003.fc20:

I installed this one, but I still get the "Invalid address for specified address family" error message described in the bug title (not the 100% CPU usage though, but I haven't waited long enough to trigger the RS/RA loop due to RDNSS timer expiry bug anyway).

Every time an (unsolicited) RA is received, NM logs the following:

des. 20 09:26:58 sloth.fud.no NetworkManager[4523]: <info> Activation (wlo1) Stage 5 of 5 (IPv6 Commit) scheduled...
des. 20 09:26:58 sloth.fud.no NetworkManager[4523]: <info> Activation (wlo1) Stage 5 of 5 (IPv6 Commit) started...
des. 20 09:26:58 sloth.fud.no NetworkManager[4523]: <error> [1387528018.173284] [platform/nm-linux-platform.c:1127] add_object(): Netlink error: Invalid address for specified address family
des. 20 09:26:58 sloth.fud.no NetworkManager[4523]: <info> Activation (wlo1) Stage 5 of 5 (IPv6 Commit) complete.
des. 20 09:26:58 sloth.fud.no NetworkManager[4523]: :: inet6 dev 3 scope nowhere
des. 20 09:26:58 sloth.fud.no NetworkManager[4523]: valid-lifetime forever preferred-lifetime forever
des. 20 09:26:58 sloth.fud.no NetworkManager[4523]: [54B blob data]

As an aside, I question the wisdom of printing this information to the log (any information really) every time an RA is received, at least if it was identical to the previous one and the only thing NM needs to do to act on it is to reset lifetime counters. In some environments RAs may be sent very frequently. For example, Juniper suggest a max advertisement interval of 4 seconds when configuring RAs in conjunction with VRRP (see http://www.juniper.net/techpubs/en_US/junos/topics/example/vrrp-for-ipv6-configuring-example.html). Over time this will cause the system log to contain an enormous amount of identical log lines relating to IPv6 activation, which will probably not be appreciated by the sysadmins.

Tore

Comment 30 Fedora Update System 2013-12-22 05:36:26 UTC
NetworkManager-0.9.9.0-22.git20131003.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 31 Geert Jansen 2013-12-24 09:12:05 UTC
The problem is not fixed with NetworkManager-0.9.9.0-22.git20131003.fc20

For me it is still spewing at 100% CPU to the log. Excerpt below:

Dec 24 10:10:12 barentz NetworkManager[29936]: <error> [1387876212.130833] [platform/nm-linux-platform.c:1127] add_object(): Netlink error: Invalid address for specified address family
Dec 24 10:10:12 barentz NetworkManager[29936]: <error> [1387876212.167090] [platform/nm-linux-platform.c:1127] add_object(): Netlink error: Invalid address for specified address family
Dec 24 10:10:12 barentz NetworkManager[29936]: <error> [1387876212.202990] [platform/nm-linux-platform.c:1127] add_object(): Netlink error: Invalid address for specified address family

Comment 32 Geert Jansen 2013-12-24 09:13:33 UTC
Disabling IPv6 and restarting NetworkManager works around the issue.

Comment 33 Jonathan Abbey 2014-01-05 02:36:02 UTC
Going without IPv6 isn't much of a fix.  Disabling NetworkManager and leaving it disabled seems okay, though I have to manually ifup my interface on reboot now.

Comment 34 Jonathan Abbey 2014-01-05 03:27:59 UTC
Rather, I have to take the effort to configure my /etc/sysconfig/network-scripts/ifcfg-p20p1 config file better.

Comment 35 Dan Williams 2014-01-23 18:26:32 UTC
I have found one issue that prints this message, but only when DHCPv6 is used in information-only mode and the DHCP server does not provide a lease.  However, it does not result in 100% CPU usage, only the log mesage.

Geert, any chance you can run NetworkManager in debug mode for me?

sudo nmcli g logging level debug domains core,hw,ip6,dhcp6

Comment 36 George Greer 2014-03-06 16:12:50 UTC
Created attachment 871547 [details]
Diff of -20 vs -31 connection.

I downgraded back to NetworkManager-0.9.9.0-20.git20131003.fc20.x86_64 (which works for me) and ran a diff of their debugging output.

Oddly, both logs have the "Netlink error: Unspecific failure" message, but -20 doesn't error out the connection like -31 does.

The logs were normalized with this command so I could compare them:
perl -pe 's/\[\d+\]//; s/\[\d{10}\.\d{6}\]//; s/\.c:\d+\]/.c:~]/; s/0x[0-9a-f]{8,12}/0xdeadbeef/;' ...

Comment 37 George Greer 2014-03-06 16:14:58 UTC
Oops. I apologize; that attachment was meant for bug #1056053.


Note You need to log in before you can comment on or make changes to this bug.