Bug 1063885 - NetworkManager does not assign an IP address after last F20 update
Summary: NetworkManager does not assign an IP address after last F20 update
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: NetworkManager
Version: 20
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Dan Williams
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-02-11 15:21 UTC by Russ
Modified: 2014-04-22 03:59 UTC
CC List: 13 users

Fixed In Version: NetworkManager-0.9.9.0-38.git20131003.fc20
Clone Of:
Environment:
Last Closed: 2014-04-22 03:59:39 UTC
Type: Bug
Embargoed:


Attachments
NetworkManager logfile with DEBUG enabled (331.37 KB, text/x-vhdl)
2014-02-16 19:41 UTC, Frank Danapfel
New NetWorkManager log file with DEBUG enabled (149.54 KB, application/zip)
2014-02-16 22:30 UTC, Frank Danapfel
NetworkManager log file from manual run (621.07 KB, text/plain)
2014-02-17 19:13 UTC, Frank Danapfel
NM log from armv7hl (kernel-3.4.79 + NM-0.9.9.0-29.git20140131.fc20 + libnl3-3.2.24) (7.13 KB, text/plain)
2014-02-19 11:06 UTC, J. Sastre
dist-git patch for libnl3 scratch-build, based on 53601fc4dd142bf39ffa529cb839fad94174e59b (2.84 KB, patch)
2014-02-21 19:05 UTC, Thomas Haller

Description Russ 2014-02-11 15:21:34 UTC
Description of problem:
The last F20 update has somehow broken NetworkManager/dhclient on our laptop systems using WLAN.

Version-Release number of selected component (if applicable):
0.9.9.0-28

How reproducible:
Always.

Steps to Reproduce:
1. Upgrade F20 to latest packages.
2. Attempt to connect to wifi network.
3. Check for assigned IPv4 address with ifconfig.

Actual results:
No IPv4 address is configured.

Expected results:
An IPv4 address should be assigned.

Additional info:
IPv6 is disabled in NM. Running dhclient as root obtains an IP address and allows the systems to work normally. It appears to be a communication problem between NetworkManager/dhclient and the kernel. A downgrade of the NetworkManager and dhcp packages did not resolve the issue. A netlink error message appears in the logs. I will attach a log and additional information in a subsequent post.

Comment 1 Russ 2014-02-11 16:26:23 UTC
Bug was actually caused by update of libnl3-3.2.21-2.fc20 to libnl3-3.2.24-1.fc20.

Downgrading to the previously installed libnl3 (3.2.21) library resolved the issue.

The offending package was identified on an i686 system. An earlier libnl3 downgrade on our x86_64 systems did not resolve the issue there, but the NetworkManager/dhclient packages had also been downgraded beforehand on those systems. I will try upgrading those systems and then downgrading libnl3, and will report whether that succeeds.

Comment 2 Russ 2014-02-11 16:45:48 UTC
A downgrade to libnl3-3.2.21-2.fc20 on the x86_64 systems resolved the issue on them also. Please change bug report to reflect a libnl3 issue.

The pertinent lines in the system logs are as follows :

NetworkManager[2058]: <error> [1392136334.898755] [platform/nm-linux-platform.c:1127] add_object(): Netlink error: Invalid input data or parameter

NetworkManager[2058]: <error> [1392136334.948568] [platform/nm-linux-platform.c:1127] add_object(): Netlink error: Unspecific failure

The package downgrade of libnl3 removes those lines from the logs.

libnl3-3.2.24-1.fc20 needs to be removed from the repos ASAP!

Comment 3 Thomas Haller 2014-02-12 15:53:25 UTC
Could you please provide more logs? With debug logging enabled?

You enable debug logging in the config file, see `man NetworkManager.conf`:

[logging]
    level=DEBUG
    domains=ALL

Comment 4 Dr. Tilmann Bubeck 2014-02-15 11:31:55 UTC
I can confirm this bug on arm, too (Cubietruck). Downgrading libnl3-3.2.24-1.fc20 fixed the issue.

Comment 5 Dr. Tilmann Bubeck 2014-02-15 11:45:39 UTC
(In reply to Thomas Haller from comment #3)
> Could you please provide more logs? With debug logging enabled?
> 
> You enable debug logging in the config file, see `man NetworkManager.conf`:
> 
> [logging]
>     level=DEBUG
>     domains=ALL

I read the man page and entered this into /etc/NetworkManager/NetworkManager.conf, but I am unable to find the log. What should I send?

Comment 6 Thomas Haller 2014-02-15 11:54:06 UTC
(In reply to Dr. Tilmann Bubeck from comment #5)
> (In reply to Thomas Haller from comment #3)
> > Could you please provide more logs? With debug logging enabled?
> > 
> > You enable debug logging in the config file, see `man NetworkManager.conf`:
> > 
> > [logging]
> >     level=DEBUG
> >     domains=ALL
> 
> I read the man page and entered this into
> /etc/NetworkManager/NetworkManager.conf but I am unable to find the log.
> What should I send?

After you change the config, you have to restart NetworkManager.

On Fedora you do this with `systemctl restart NetworkManager.service`


Then reproduce the issue. You can find the logfile (on Fedora) in the journal.

Try

journalctl _SYSTEMD_UNIT=NetworkManager.service -b



You can redirect output to a file:

journalctl _SYSTEMD_UNIT=NetworkManager.service -b > ~/nm-logfile

and attach the file ~/nm-logfile


Downgrading libnl3 might help, but the bug is probably in NetworkManager.


Thank you!

Comment 7 Frank Danapfel 2014-02-16 19:41:37 UTC
Created attachment 863819 [details]
NetworkManager logfile with DEBUG  enabled

I can confirm this bug on ARM as well (Odroid X2; I know it is not an officially supported platform).

The NM logfile with DEBUG enabled is attached.

Comment 8 Thomas Haller 2014-02-16 20:08:00 UTC
(In reply to Frank Danapfel from comment #7)

Thank you Frank... though I still don't understand what the problem is :(

Could somebody with this problem please test the new version NetworkManager-0.9.9.0-30.git20131003.fc20?

https://admin.fedoraproject.org/updates/NetworkManager-0.9.9.0-30.git20131003.fc20

It has some added debug logging that might help to identify the problem.


Thank you!

Comment 9 Frank Danapfel 2014-02-16 22:30:52 UTC
Created attachment 863893 [details]
New NetWorkManager log file with DEBUG enabled

Thomas, here is the full NetworkManager log file with DEBUG enabled, covering the time before and after I upgraded NM to NetworkManager-glib-0.9.9.0-30.git20131003 (since the log had grown to 1.8MB, I zipped it). I've also kept libnl3 at version 3.2.24-1:

$ rpm -qa|grep -i libnl3
libnl3-cli-3.2.24-1.fc20.armv7hl
libnl3-3.2.24-1.fc20.armv7hl

$ rpm -qa|grep -i networkmanager
NetworkManager-glib-0.9.9.0-30.git20131003.fc20.armv7hl
NetworkManager-0.9.9.0-30.git20131003.fc20.armv7hl

$ cat /etc/NetworkManager/NetworkManager.conf
[main]
plugins=ifcfg-rh
[logging]
    level=DEBUG
    domains=ALL

The log with the NetworkManager-glib-0.9.9.0-30.git20131003 package starts at:
Feb 16 23:10:43 odroid NetworkManager[23057]: <info> NetworkManager (version 0.9.9.0-30.git20131003.fc20) is starting.

Unfortunately I'm still seeing the same issue with the new NM package.

Comment 10 Thomas Haller 2014-02-17 16:24:21 UTC
Hi Frank,

I still cannot reproduce this issue (or know what the problem is).


Could you please provide another logfile? But this time run NetworkManager in the terminal. Please do the following steps:



systemctl mask NetworkManager.service
systemctl stop NetworkManager.service

# enable debugging output for libnl
export NLDBG=10

NetworkManager --debug --log-level=DEBUG --log-domains=ALL 2>&1 | tee /tmp/nm-log.txt

#>> reproduce the error

# Afterwards, kill NetworkManager (CTRL+C) and undo your changes:

systemctl unmask NetworkManager.service
systemctl restart NetworkManager


Attach the logfile /tmp/nm-log.txt

Thank you!!

Comment 11 Frank Danapfel 2014-02-17 19:13:39 UTC
Created attachment 864249 [details]
NetworkManager log file from manual run

Thomas, as requested here is the log file from manually running NetworkManager.

I used the same versions of both NetworkManager and libnl3 as before.

Comment 12 J. Sastre 2014-02-18 15:24:20 UTC
(In reply to Thomas Haller from comment #10)
> Hi Frank,
> 
> I still cannot reproduce this issue (or know what the problem is).
> 
> 
> Could you please provide another logfile? But this time run NetworkManager
> in the terminal. Please do the following steps:
> 
> 
> 
> systemctl mask NetworkManager.service
> systemctl stop NetworkManager.service
> 
> # enable debugging output for libnl
> export NLDBG=10
> 
> NetworkManager --debug --log-level=DEBUG --log-domains=ALL 2>&1 | tee
> /tmp/nm-log.txt
> 
> #>> reproduce the error
> 
> # Afterwards, kill NetworkManager (CTRL+C) and undo your changes:
> 
> systemctl unmask NetworkManager.service
> systemctl restart NetworkManager
> 
> 
> Attach the logfile /tmp/nm-log.txt
> 
> Thank you!!

Same issue here. To reproduce it, the IP assignment has to be configured as static.

Comment 13 Thomas Haller 2014-02-18 16:19:52 UTC
(In reply to Jonatan Sastre Hernández from comment #12)

What is your system? Which kernel?

So, it also happens for you with static IP addresses? Can I assume that it is unrelated to DHCP, and that it happens basically always when NM tries to configure an IP address (be it static or DHCP)?

Does it also happen with IPv6 autoconf? Or IPv6 static?

Comment 14 J. Sastre 2014-02-18 18:10:04 UTC
(In reply to Thomas Haller from comment #13)
> (In reply to Jonatan Sastre Hernández from comment #12)
> 
> What is your system? Which kernel?
> 
> So, it also happens for you with static IP addresses? Can I assume, that it
> is unrelated to DHCP, and it happens basically always when NM tries to
> configure an IP address (be it static or DHCP)?
> 
> Does it also happen with IPv6 autoconf? Or IPv6 static?

Unable to reproduce it again on an x86_64 machine. This must be the Heisenbug thing ;)

A recent kernel 3.13 update has been pushed to stable, and libnl3-3.2.24-1 and the new kernel are likely working together fine now.

My ARM machine is using a 3.4 kernel and has not been updated. The problem still persists there, so it may be related to the combination of kernel version and libnl3, not to NetworkManager (the problem appeared no matter which version of NetworkManager was running).

By the way 0.9.9.0-29.git20140131.fc20 produces no warning nor error in the logs, as opposed to git20140131.

Comment 15 J. Sastre 2014-02-18 18:16:38 UTC
amendment to my previous message:

20c20
< By the way 0.9.9.0-29.git20140131.fc20 produces no warning nor error in the logs, as opposed to git20140131.
---
> By the way 0.9.9.0-29.git20140131.fc20 produces no warning nor error in the logs, as opposed to git20131003.

Comment 16 Russ 2014-02-19 00:30:48 UTC
Thomas,

Sorry I could not reply earlier. I am currently travelling and often do not have access to the net. I also do not currently have access to the machines in question. So I can't do much to help you with logs.

I would like to point out, however, that all three machines in question were running older kernels at the time. Therefore, considering the comments above, I would assume that the issue is caused by a communication problem between the older kernels and libnl3, as Jonatan seems to have surmised.

Since the problem seems to be an issue with the new libnl3 and older kernels, I can imagine some people would prefer to simply dismiss it with WILL NOT FIX and advise upgrading to a newer kernel. In the past I would have tended to agree, but Fedora now has armhfp as a primary arch. That changes things. I work a lot with ARM systems, and I can say from a lot of ARM experience that it is very often not possible to update the kernel to the latest release. Therefore any updated packages should be compatible with an older kernel, or a workaround/fix for the issue needs to be VERY prominently noted so people do not inadvertently break their systems, as happened here. Changes that break things when using an older kernel should only occur in the next Fedora release, where they can be duly noted in the release notes.

In any case, as I said before, the problem is with libnl3, not NetworkManager.

Comment 17 Russ 2014-02-19 00:59:15 UTC
I checked with our guys back at the office and one of the laptops in question has recently had the kernel updated to 3.13. I asked them to also do a package update of libnl3 and reboot. The issue does not occur with newer kernel. So it definitely is an issue between libnl3 and older kernels. No logs needed for that.

Previous kernel on that machine was 3.7.1. So the Netlink communication issue definitely occurs with kernels <=3.7.1. Many ARM platforms are still stuck on 3.4 kernels. Therefore an issue like this can cause a lot of problems.

I'll try to get more info, including logs, if possible. In the meantime, to reproduce the issue, just downgrade the kernel to <=3.7.1. Then slowly upgrade to see which kernel resolves the issue. A diff of the libnl3 changes would also help to determine where the problem is located.

Comment 18 Thomas Haller 2014-02-19 08:27:52 UTC
(In reply to Russ from comment #17)

As you said, it is very much intended that NM <-> libnl3 <-> kernel can work together with arbitrary versions (within reasonable limits). So, this problem should be fixed in any case regardless who is the culprit.

I could not reproduce it until now (also tried on a VM with fc20-armv7hl). Will now try with an older kernel there...

The error in any case seems to be that NM cannot add the IPv4 address using libnl3. The later error, about not being able to add the route, is probably just a follow-up error of the previous one, because you cannot add gateway routes if you don't have the proper addresses configured.

So, if you guys have this problem, does it happen ~always~ or just sometimes?
And does it only affect IPv4 or IPv6 too? And I assume, it happens regardless of DHCP or static configuration?

And if I see correctly, it only happened on 32-bit systems (are any x86_64 systems affected?)

In general, please state also the kernel+architecture, libnl3 version and NM version. Thanks

Comment 19 J. Sastre 2014-02-19 11:06:44 UTC
Created attachment 865026 [details]
NM log from armv7hl (kernel-3.4.79 + NM-0.9.9.0-29.git20140131.fc20 + libnl3-3.2.24)

armv7hl
-------

kernel-3.4.79 + NetworkManager-0.9.9.0-{28,29,30} + libnl3-3.2.21 : Functional, better with NM-*-git20140131


kernel-3.4.79 + NetworkManager-0.9.9.0-{28,29,30} + libnl3-3.2.24 : Persistent issue, IPv4 not assigned. DHCP works well. (An IPv6 address is assigned; not tested, but seems OK.)


x86_64
------

kernel-3.12-?? + NetworkManager-0.9.9.0-{28,29,30} + libnl3-3.2.21 : Functional (some warnings in the logs), better with NM-*-git20140131


kernel-3.12-?? + NetworkManager-0.9.9.0-{28,29,30} + libnl3-3.2.24 : The problem has been reproduced here too, apparently persistent but not well tested (libnl3 was downgraded as a workaround)

kernel-3.13-3 + NetworkManager-0.9.9.0-{28,30} + libnl3-3.2.24 : Functional, still some errors/warnings (NM-*-git20140131 may resolve this)

Comment 20 Frank Danapfel 2014-02-19 20:17:36 UTC
The ARM system I've noticed this issue on is also running an older kernel version (3.8.13.16), and I can reproduce this issue any time libnl3-3.2.24-1.fc20 is installed, but not when libnl3-3.2.21-2.fc20 is on the system. Unfortunately I'm stuck on this older kernel on this system, since the kernel patches to support this platform (ODROID X2) have not made it into the upstream kernel yet.

Comment 21 Thomas Haller 2014-02-21 12:17:39 UTC
I'm still on this. Now I seem to be able to reproduce it, with kernel 3.6.10-8.fc18.armv7hl, libnl3-upstream and NM-upstream... now bisecting libnl3... (gosh, this VM is so slow//)

Comment 22 Thomas Haller 2014-02-21 17:18:33 UTC
The offending line in libnl3 is
https://github.com/thom311/libnl/blob/master/lib/route/addr.c#L601
from commit
https://github.com/thom311/libnl/commit/42c41336000e1ff781a91c6ec397fd787aae3124

In that case, rtnl_addr_add() returns -7 (Invalid input data or parameter), which would be NLE_INVAL; that code can result from any of ENOPROTOOPT, EFAULT, or EINVAL.


It fails for me with kernel "kernel-3.6.10-8.fc18.armv7hl.rpm"
... *why* that happens is unclear... it does not seem to happen with 3.11.0-300.fc20.armv7hl or on my fc20.x86_64
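To illustrate why the -7 above is hard to interpret, here is a minimal C sketch of an errno-to-library-code mapping of the kind described: three distinct kernel errors collapse into one code. The names sketch_syserr_to_nlerr and SKETCH_NLE_* are hypothetical stand-ins, not libnl3's actual API; only the value 7 for "Invalid input data or parameter" comes from this report, and the fallback value 1 for "Unspecific failure" is an assumption made for the example.

```c
#include <errno.h>

/* Hypothetical stand-ins for the two libnl3 error codes seen in this bug's
 * logs. Only the value 7 is taken from the report; 1 is assumed. */
enum {
    SKETCH_NLE_FAILURE = 1, /* "Unspecific failure" (simplified fallback) */
    SKETCH_NLE_INVAL   = 7  /* "Invalid input data or parameter" */
};

/* Collapse a kernel errno into a negative library-style error code.
 * Several distinct errno values map to the same code, so the caller
 * (here NetworkManager) only ever sees -7 and cannot tell which of
 * them the kernel actually returned. */
static int sketch_syserr_to_nlerr(int err)
{
    if (err < 0)
        err = -err;   /* accept both -EINVAL and EINVAL */

    switch (err) {
    case EFAULT:
    case ENOPROTOOPT:
    case EINVAL:
        return -SKETCH_NLE_INVAL;
    default:
        return -SKETCH_NLE_FAILURE;
    }
}
```

For example, sketch_syserr_to_nlerr(-EINVAL) and sketch_syserr_to_nlerr(EFAULT) both return -7, which is why the NetworkManager log line alone cannot say which kernel error occurred.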

Comment 23 Thomas Haller 2014-02-21 19:05:09 UTC
Created attachment 866200 [details]
dist-git patch for libnl3 scratch-build, based on 53601fc4dd142bf39ffa529cb839fad94174e59b

I made a scratch-build of libnl, which removes the offending line.

http://koji.fedoraproject.org/koji/taskinfo?taskID=6557421

Could the affected people please verify that with libnl3-3.2.24-2.test01 this problem no longer happens? Thanks.



(Note that with libnl3-3.2.24-2.test01, NetworkManager bugs #1047139 and #1045118 are unresolved again, but they are also unresolved if you run pre-libnl3-3.2.24 versions.)

Comment 24 J. Sastre 2014-02-21 20:00:52 UTC
It works.


There are still some warnings from NetworkManager, but they are not related to libnl3. Any chance that NM-0.9.9.0-29.git20140131 could be pushed to stable again?

Comment 25 Frank Danapfel 2014-02-23 12:13:19 UTC
libnl3-3.2.24-2.test01 works for me as well:

[odroid]$ uname -r
3.8.13.16

[odroid]$ rpm -qa|grep -i libnl3
libnl3-3.2.24-2.test01.fc20.armv7hl
libnl3-cli-3.2.24-2.test01.fc20.armv7hl

[odroid]$ rpm -qa|grep -i NetworkManager
NetworkManager-glib-0.9.9.0-28.git20131003.fc20.armv7hl
NetworkManager-0.9.9.0-28.git20131003.fc20.armv7hl

[odroid]$ systemctl status NetworkManager.service
NetworkManager.service - Network Manager
   Loaded: loaded (/usr/lib/systemd/system/NetworkManager.service; enabled)
   Active: active (running) since Sun 2014-02-23 13:04:57 CET; 6min ago
 Main PID: 3953 (NetworkManager)
   CGroup: /system.slice/NetworkManager.service
           ├─3953 /usr/sbin/NetworkManager --no-daemon
           └─4008 /sbin/dhclient -d -sf /usr/libexec/nm-dhcp-helper -pf /var/run/dhclient-wlan0.pid -lf /var/lib/NetworkManager/dhclient-43a6b2ef-d206-4e9f-941d-6c74d35a6424-wlan0.lease -cf /var/lib/NetworkManager/dhclient-wlan0.conf...

Feb 23 13:05:19 odroid NetworkManager[3953]: <info> Activation (wlan0) Stage 5 of 5 (IPv4 Configure Commit) scheduled...
Feb 23 13:05:19 odroid NetworkManager[3953]: <info> Activation (wlan0) Stage 5 of 5 (IPv4 Commit) started...
Feb 23 13:05:19 odroid NetworkManager[3953]: <info> (wlan0): device state change: ip-config -> ip-check (reason 'none') [70 80 0]
Feb 23 13:05:19 odroid NetworkManager[3953]: <info> Activation (wlan0) Stage 5 of 5 (IPv4 Commit) complete.
Feb 23 13:05:19 odroid NetworkManager[3953]: <info> (wlan0): device state change: ip-check -> secondaries (reason 'none') [80 90 0]
Feb 23 13:05:19 odroid NetworkManager[3953]: <info> (wlan0): device state change: secondaries -> activated (reason 'none') [90 100 0]
Feb 23 13:05:19 odroid NetworkManager[3953]: bound to 192.168.1.102 -- renewal in 369734 seconds.
Feb 23 13:05:19 odroid NetworkManager[3953]: <info> NetworkManager state is now CONNECTED_GLOBAL
Feb 23 13:05:19 odroid NetworkManager[3953]: <info> Policy set 'WLAN' (wlan0) as default for IPv4 routing and DNS.
Feb 23 13:05:19 odroid NetworkManager[3953]: <info> Activation (wlan0) successful, device activated.

[odroid]$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
    link/ether 32:2d:8d:51:a0:85 brd ff:ff:ff:ff:ff:ff
3: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether e8:4e:06:0a:99:be brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.102/24 brd 192.168.1.255 scope global wlan0
    inet6 fe80::ea4e:6ff:fe0a:99be/64 scope link 
       valid_lft forever preferred_lft forever

Comment 26 Hugh Sutherland 2014-02-23 17:15:21 UTC
I can confirm the results Frank Danapfel has reported for OdroidX2; same hardware, same kernel and library versions, same everything. libnl3-3.2.24-2.test01.fc20.armv7hl and libnl3-cli-3.2.24-2.test01.fc20.armv7hl fix the problem for me as well.

[root@localhost odroid]# uname -a
Linux localhost 3.8.13.16 #1 SMP PREEMPT Sat Feb 8 17:52:39 BRST 2014 armv7l armv7l armv7l GNU/Linux

The only thing I noticed that hasn't been mentioned is router solicitation failures. I assume these are not related. My router is an old 2Wire 1701HG DSL modem (Software 3.17.5).

Feb 23 11:03:16 localhost NetworkManager[2782]: <debug> [1393174996.263058] [rdisc/nm-lndp-rdisc.c:226] send_rs(): (eth0): sending router solicitation
Feb 23 11:03:16 localhost NetworkManager[2782]: <debug> [1393174996.263212] [rdisc/nm-lndp-rdisc.c:234] send_rs(): (eth0): scheduling router solicitation retry in 10 seconds.
Feb 23 11:03:26 localhost NetworkManager[2782]: <debug> [1393175006.263067] [rdisc/nm-lndp-rdisc.c:226] send_rs(): (eth0): sending router solicitation
Feb 23 11:03:26 localhost NetworkManager[2782]: <debug> [1393175006.263211] [rdisc/nm-lndp-rdisc.c:234] send_rs(): (eth0): scheduling router solicitation retry in 10 seconds.

Those messages occur with both libnl3-3.2.24-1 and libnl3-3.2.24-2.test01.

Comment 27 Thomas Haller 2014-02-23 17:54:40 UTC
(In reply to Hugh Sutherland from comment #26)

> The only thing I noticed that hasn't been mentioned is router solicitation
> failures. I assume these are not related. My router is an old 2Wire 1701HG
> DSL modem (Software 3.17.5).
> 
> Feb 23 11:03:16 localhost NetworkManager[2782]: <debug> [1393174996.263058]
> [rdisc/nm-lndp-rdisc.c:226] send_rs(): (eth0): sending router solicitation
> Feb 23 11:03:16 localhost NetworkManager[2782]: <debug> [1393174996.263212]
> [rdisc/nm-lndp-rdisc.c:234] send_rs(): (eth0): scheduling router
> solicitation retry in 10 seconds.
> Feb 23 11:03:26 localhost NetworkManager[2782]: <debug> [1393175006.263067]
> [rdisc/nm-lndp-rdisc.c:226] send_rs(): (eth0): sending router solicitation
> Feb 23 11:03:26 localhost NetworkManager[2782]: <debug> [1393175006.263211]
> [rdisc/nm-lndp-rdisc.c:234] send_rs(): (eth0): scheduling router
> solicitation retry in 10 seconds.

That looks like normal debugging output and does not indicate any failure to me. Do you have problems with SLAAC? (please consider opening a separate BZ).

NM sends a router solicitation every 10 seconds and logs these lines while doing so.

Comment 28 Hugh Sutherland 2014-02-23 22:50:46 UTC
Thanks Thomas. Apologies for the wasted bandwidth. (No problems with SLAAC to my knowledge.)

Comment 29 Thomas Haller 2014-03-13 14:37:56 UTC
The following upstream issue of libnl3 seems to be the same issue: https://github.com/thom311/libnl/issues/56

Comment 30 Niels de Vos 2014-03-25 17:20:40 UTC
I'm hitting this too, running on an armv7hl with a 3.4.6 kernel (no functional support for imx51 in newer kernels).

I had to rebuild the test-packages (thanks for the attached patch) because koji trashed the scratch build. Using the current NetworkManager and the test packages, my system gets its IPv4 and IPv6 address again.

Comment 31 Pavel Kankovsky 2014-04-01 09:52:44 UTC
Same here on F20 with kernel 3.4.75 for sunxi (https://github.com/jwrdegoede/linux-sunxi/tree/fedora-20-07022014). Let's call it the "old kernel".

The old kernel does not implement IFA_FLAGS and it seems to freak out and return EINVAL when it gets an unknown attribute (such as IFA_FLAGS or a completely bogus attr #99).

On the other hand, 3.11.10-301.fc20.x86-64, let's call it the "new kernel", understands IFA_FLAGS *and* disregards any unknown attributes (like the aforementioned 99).

Tested with "src/.libs/nl-addr-add -d wlan0 --family=inet --broadcast=192.168.11.255 -a 192.168.11.11/24 192.168.11.11" or a similar command.

I have to admit I am confused by the difference in the handling of unknown attributes because I have failed to find any significant difference between their code so far and it appears to me both versions are supposed to ignore anything they do not recognize (see rtm_to_ifaddr() in net/ipv4/devinet.c and nla_parse() in lib/nlattr.c). But that behaviour makes me somewhat nervous. Is it really desired that the kernel *silently* ignores any attributes it does not recognize?
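A toy model of the difference described above, assuming a simplified attribute layout that mirrors struct nlattr (16-bit length, 16-bit type, payload padded to 4 bytes). This is not kernel code: sketch_parse() and sketch_put_u32() are hypothetical helpers, and the strict flag merely stands in for the old kernel rejecting the whole request with EINVAL on an unknown attribute, while the lenient mode skips it as the newer kernel does.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified attribute header with the same layout as struct nlattr. */
struct sketch_attr {
    uint16_t len;   /* header + payload length in bytes */
    uint16_t type;  /* attribute type */
};

/* Attributes are padded to a multiple of 4 bytes. */
#define SKETCH_ALIGN(n) (((size_t)(n) + 3u) & ~(size_t)3u)

/* Append a 32-bit attribute at offset off; returns the next free offset.
 * The caller must make sure buf is large enough. */
static size_t sketch_put_u32(unsigned char *buf, size_t off,
                             uint16_t type, uint32_t value)
{
    struct sketch_attr a = { (uint16_t)(sizeof a + sizeof value), type };
    memcpy(buf + off, &a, sizeof a);
    memcpy(buf + off + sizeof a, &value, sizeof value);
    return off + SKETCH_ALIGN(a.len);
}

/* Walk all attributes in buf. Types above max_known are "unknown":
 * strict != 0 rejects the whole message (old kernel behaviour),
 * strict == 0 skips them silently (newer kernel behaviour).
 * On success, *n_known holds the number of recognised attributes. */
static int sketch_parse(const unsigned char *buf, size_t buflen,
                        uint16_t max_known, int strict, unsigned *n_known)
{
    size_t off = 0;

    *n_known = 0;
    while (off + sizeof(struct sketch_attr) <= buflen) {
        struct sketch_attr a;
        memcpy(&a, buf + off, sizeof a);
        if (a.len < sizeof a || a.len > buflen - off)
            return -EINVAL;            /* malformed attribute */
        if (a.type > max_known) {
            if (strict)
                return -EINVAL;        /* unknown attribute: fail request */
        } else {
            (*n_known)++;              /* known attribute: would be used */
        }
        off += SKETCH_ALIGN(a.len);
    }
    return 0;
}
```

With one known attribute followed by a bogus type 99, the strict mode returns -EINVAL for the whole message, while the lenient mode returns 0 and counts one known attribute; that mirrors the observed difference between the "old" and "new" kernels.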

Anyway, I think you can both eat the cake (make libnl3 able to work on older kernels) and keep it (preserve its ability to use 32-bit flags when they are needed and when the kernel supports them) if you change

        NLA_PUT_U32(msg, IFA_FLAGS, tmpl->a_flags);

in build_addr_msg() to

        if (tmpl->a_flags & ~0xff)
                NLA_PUT_U32(msg, IFA_FLAGS, tmpl->a_flags);

A more sophisticated approach would detect whether the kernel supports IFA_FLAGS and use the result to make the decision.
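The proposed rule can be sketched as a small standalone function. This is an illustration under assumptions, not libnl3's build_addr_msg(): struct sketch_addr_req and sketch_build_addr_req() are hypothetical, the 8-bit hdr_flags field plays the role of the fixed ifa_flags header byte, and the kernel_has_ifa_flags parameter stands for the result of the "more sophisticated" detection, whose mechanism is left open. Refusing extended flags outright when the kernel lacks IFA_FLAGS is a design choice made here for clarity.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of an "add address" request. */
struct sketch_addr_req {
    uint8_t  hdr_flags;      /* 8-bit flags byte in the fixed header */
    bool     has_ifa_flags;  /* is a 32-bit IFA_FLAGS-style attribute attached? */
    uint32_t ifa_flags_attr; /* its value, if attached */
};

/* Attach the 32-bit attribute only when some flag does not fit into the
 * 8-bit header byte. kernel_has_ifa_flags models the result of a (not shown)
 * kernel capability check. Returns 0 on success, -1 if extended flags were
 * requested but the kernel cannot accept them. */
static int sketch_build_addr_req(uint32_t flags, bool kernel_has_ifa_flags,
                                 struct sketch_addr_req *req)
{
    req->hdr_flags = (uint8_t)(flags & 0xffu);  /* low 8 bits always go here */
    req->has_ifa_flags = false;
    req->ifa_flags_attr = 0;

    if ((flags & ~0xffu) != 0) {
        if (!kernel_has_ifa_flags)
            return -1;
        req->has_ifa_flags = true;
        req->ifa_flags_attr = flags;             /* full 32-bit value */
    }
    return 0;
}
```

Flags that fit in 8 bits (for example 0x80) never produce the attribute, so an old kernel sees the same request it always accepted; only callers asking for extended flags (0x100 and above) depend on kernel support.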

Comment 32 Thomas Haller 2014-04-03 15:39:03 UTC
(In reply to Pavel Kankovsky from comment #31)
> Same here on F20 with kernel 3.4.75 for sunxi
> (https://github.com/jwrdegoede/linux-sunxi/tree/fedora-20-07022014). Let's
> call it the "old kernel".
> 
> The old kernel does not implement IFA_FLAGS and it seems to freak out and
> return EINVAL when it gets an unknown attribute (such as IFA_FLAGS or a
> completely bogus attr #99).
> 
> On the other hand, 3.11.10-301.fc20.x86-64, let's call it the "new kernel",
> understands IFA_FLAGS *and* disregards any unknown attributes (like the
> aforementioned 99).
> 
> Tested with "src/.libs/nl-addr-add -d wlan0 --family=inet
> --broadcast=192.168.11.255 -a 192.168.11.11/24 192.168.11.11" or a similar
> command.
> 
> I have to admit I am confused by the difference in the handling of unknown
> attributes because I have failed to find any significant difference between
> their code so far and it appears to me both versions are supposed to ignore
> anything they do not recognize (see rtm_to_ifaddr() in net/ipv4/devinet.c
> and nla_parse() in lib/nlattr.c). But that behaviour makes me somewhat
> nervous. Is it really desired that the kernel *silently* ignores any
> attributes it does not recognize?

I think the kernel should very much ignore unknown attributes, because that is what makes the netlink protocol extendible. And the problem seems to be that the kernel indeed does not ignore the unknown attribute.

So, this looks very much like a kernel bug to me. Of course libnl should try to play nice.

But I did not track this bug down yet. (Open for suggestions.)


> Anyway, I think you can both eat the cake (make libnl3 able to work on older
> kernels) and keep it (preserve its ability to use 32-bit flags when they are
> needed and when the kernel supports them) if you change
> 
>         NLA_PUT_U32(msg, IFA_FLAGS, tmpl->a_flags);
> 
> in build_addr_msg() to
> 
>         if (tmpl->a_flags & ~0xff)
>                 NLA_PUT_U32(msg, IFA_FLAGS, tmpl->a_flags);
> 
> A more sophisticated approach would detect whether the kernel supports
> IFA_FLAGS and use the result to make the decision.


I think this is a good workaround which will help in many cases. It will still break if a new application uses libnl and tries to set extended flags...

I will send a patch to the libnl mailing list for that...

Comment 33 Pavel Kankovsky 2014-04-03 21:43:59 UTC
(In reply to Thomas Haller from comment #32)
> I think the kernel should very much ignore unknown attributes, because that
> is what makes the netlink protocol extendible. And the problem seems to be
> that the kernel indeed does not ignore the unknown attribute.

IMHO, it should be up to the program to declare whether a certain attribute is optional and can be ignored or important and the operation should fail if the kernel does not implement it. Cf. the critical flag in X.509. But I guess we are already stuck with ABI that would make it difficult to pass such a flag.
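Pavel's alternative can be sketched with a hypothetical "critical" bit in the attribute type, analogous to X.509 critical extensions: unknown attributes are skipped unless the sender marks them critical. This is not how netlink works; real netlink already uses the two top bits of the attribute type for NLA_F_NESTED and NLA_F_NET_BYTEORDER, which illustrates why adding such a flag to the existing ABI would be awkward. SKETCH_ATTR_CRITICAL and sketch_check_attr() are invented for this example.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical protocol: the top bit of the type marks an attribute as
 * critical. (Real netlink already uses this bit for NLA_F_NESTED.) */
#define SKETCH_ATTR_CRITICAL  0x8000u
#define SKETCH_ATTR_TYPE_MASK 0x7fffu

/* Decide how a receiver treats one attribute.
 * Returns 1 if it is understood, 0 if it may be skipped,
 * -EOPNOTSUPP if it is unknown but marked critical by the sender. */
static int sketch_check_attr(uint16_t type, uint16_t max_known)
{
    uint16_t t = (uint16_t)(type & SKETCH_ATTR_TYPE_MASK);

    if (t <= max_known)
        return 1;
    if (type & SKETCH_ATTR_CRITICAL)
        return -EOPNOTSUPP;
    return 0;
}
```

Under this scheme an old receiver would skip a new optional attribute, as Thomas argues it should, yet still refuse a request whose sender declared the attribute essential.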

> So, this looks very much like a kernel bug to me. [...]
> But I did not track this bug down yet. (Open for suggestions.)

I have used ftrace to find out that inet_rtm_newaddr() is not called by rtnetlink_rcv_msg() when the old kernel encounters an unknown attribute. Ftrace rules!

It seems rtnetlink_rcv_msg() itself used to refuse any message containing an unknown attribute, but the check was removed approximately a year ago:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/net/core/rtnetlink.c?id=661d2967b3f1b34eeaa7e212e7b9bbe8ee072b59

The nature of the patch makes me wonder whether it was really Thomas Graf's intention to make the kernel silently ignore unknown attributes...

Comment 34 Thomas Haller 2014-04-04 14:02:42 UTC
(In reply to Pavel Kankovsky from comment #33)
> (In reply to Thomas Haller from comment #32)
> > I think the kernel should very much ignore unknown attributes, because that
> > is what makes the netlink protocol extendible. And the problem seems to be
> > that the kernel indeed does not ignore the unknown attribute.
> 
> IMHO, it should be up to the program to declare whether a certain attribute
> is optional and can be ignored or important and the operation should fail if
> the kernel does not implement it. Cf. the critical flag in X.509. But I
> guess we are already stuck with ABI that would make it difficult to pass
> such a flag.

I don't agree. The protocol is extendible precisely because older kernels (should) ignore unknown parts. Of course, the protocol must be extended in such a manner that ignoring the unknown parts doesn't cause problems.


> > So, this looks very much like a kernel bug to me. [...]
> > But I did not track this bug down yet. (Open for suggestions.)
> 
> I have used ftrace to find out that inet_rtm_newaddr() is not called
> rtnetlink_rcv_msg() when the old kernel encounters an unknown attribute.
> Ftrace rules!
> 
> It seems rtnetlink_rcv_msg() itself used refuse any message containing an
> unknown attribute to but the check was removed approximately a year ago:
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/net/
> core/rtnetlink.c?id=661d2967b3f1b34eeaa7e212e7b9bbe8ee072b59
> 
> The nature of the patch makes me wonder whether it was a really Thomas
> Graf's intention to make the kernel silently ignore unknown attributes...

Awesome, thanks for tracking this down.




I sent a new patch to the libnl mailing list with your workaround.
Once it gets merged, I will patch the libnl-f20 package.

Also, NetworkManager must be fixed not to set these additional flags if the kernel does not support them.

Comment 35 Thomas Haller 2014-04-04 14:39:09 UTC
libnl workaround pushed to upstream as:
https://github.com/thom311/libnl/commit/5206c050504f8676a24854519b9c351470fb7cc6

Comment 36 Thomas Haller 2014-04-04 14:47:52 UTC
Pushed NetworkManager patch for review:

th/rh1063885_libnl_workaround_for_older_kernel

Comment 37 Dan Williams 2014-04-04 17:21:23 UTC
Can you put some () around the "flags & ~0xFF" bit?  Makes it a bit clearer.

Other than that, patch looks good.

Comment 39 Fedora Update System 2014-04-04 20:30:46 UTC
libnl3-3.2.24-2.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/libnl3-3.2.24-2.fc20

Comment 40 Fedora Update System 2014-04-08 20:12:37 UTC
NetworkManager-0.9.9.0-34.git20131003.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/NetworkManager-0.9.9.0-34.git20131003.fc20

Comment 41 Fedora Update System 2014-04-09 13:24:05 UTC
Package NetworkManager-0.9.9.0-34.git20131003.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing NetworkManager-0.9.9.0-34.git20131003.fc20'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-4964/NetworkManager-0.9.9.0-34.git20131003.fc20
then log in and leave karma (feedback).

Comment 42 Ole Dalgaard 2014-04-09 19:47:00 UTC
Works for me with libnl3-3.2.24-2.fc20 and NetworkManager-0.9.9.0-34.git20131003.fc20 from updates-testing on a Odroid U2. Thanks!

Comment 43 Fedora Update System 2014-04-10 09:28:51 UTC
NetworkManager-0.9.9.0-35.git20131003.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/NetworkManager-0.9.9.0-35.git20131003.fc20

Comment 44 Fedora Update System 2014-04-14 14:49:06 UTC
NetworkManager-0.9.9.0-36.git20131003.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/NetworkManager-0.9.9.0-36.git20131003.fc20

Comment 45 Fedora Update System 2014-04-15 11:57:20 UTC
NetworkManager-0.9.9.0-37.git20131003.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/NetworkManager-0.9.9.0-37.git20131003.fc20

Comment 46 Fedora Update System 2014-04-15 15:41:20 UTC
libnl3-3.2.24-2.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 47 Fedora Update System 2014-04-17 16:41:48 UTC
NetworkManager-0.9.9.0-38.git20131003.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/NetworkManager-0.9.9.0-38.git20131003.fc20

Comment 48 Fedora Update System 2014-04-22 03:59:39 UTC
NetworkManager-0.9.9.0-38.git20131003.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

