Description of problem:
The last F20 update has somehow caused NetworkManager dhclient to be broken on our laptop systems using wlan.

Version-Release number of selected component (if applicable):
0.9.9.0-28

How reproducible:
Always.

Steps to Reproduce:
1. Upgrade F20 to latest packages.
2. Attempt to connect to wifi network.
3. Check for assigned IPv4 address with ifconfig.

Actual results:
No IPv4 address is configured.

Expected results:
An IPv4 address should be assigned.

Additional info:
IPv6 is disabled in NM. Running dhclient as root obtains an IP address and allows the systems to work normally. It appears the issue is a communication issue between NetworkManager/dhclient and the kernel. A downgrade of the NetworkManager and dhcp packages did not resolve the issue. A netlink error message is in the logs. I will attach a log and additional information in a subsequent post.
The bug was actually caused by the update of libnl3-3.2.21-2.fc20 to libnl3-3.2.24-1.fc20. A downgrade to the previously installed Netlink3 (3.2.21) library resolved the issue. The offending package was located on an i686 system. A previous nl3 downgrade on x86_64 systems did not resolve the issue on them, but the NetworkManager/dhclient packages had also been downgraded beforehand on those systems. I will try an upgrade on those systems, followed by a downgrade of libnl3, and will report whether that succeeds there.
A downgrade to libnl3-3.2.21-2.fc20 on the x86_64 systems resolved the issue on them also. Please change bug report to reflect a libnl3 issue.

The pertinent lines in the system logs are as follows:

NetworkManager[2058]: <error> [1392136334.898755] [platform/nm-linux-platform.c:1127] add_object(): Netlink error: Invalid input data or parameter
NetworkManager[2058]: <error> [1392136334.948568] [platform/nm-linux-platform.c:1127] add_object(): Netlink error: Unspecific failure

The package downgrade of libnl3 removes those lines from the logs. libnl3-3.2.24-1.fc20 needs to be removed from the repos ASAP!
Could you please provide more logs? With debug logging enabled?

You enable debug logging in the config file, see `man NetworkManager.conf`:

[logging]
level=DEBUG
domains=ALL
I can confirm this bug on arm, too (Cubietruck). Downgrading libnl3-3.2.24-1.fc20 fixed the issue.
(In reply to Thomas Haller from comment #3) > Could you please provide more logs? With debug logging enabled? > > You enable debug logging in the config file, see `man NetworkManager.conf`: > > [logging] > level=DEBUG > domains=ALL I read the man page and entered this into /etc/NetworkManager/NetworkManager.conf but I am unable to find the log. What should I send?
(In reply to Dr. Tilmann Bubeck from comment #5)
> (In reply to Thomas Haller from comment #3)
> > Could you please provide more logs? With debug logging enabled?
> >
> > You enable debug logging in the config file, see `man NetworkManager.conf`:
> >
> > [logging]
> > level=DEBUG
> > domains=ALL
>
> I read the man page and entered this into
> /etc/NetworkManager/NetworkManager.conf but I am unable to find the log.
> What should I send?

After you change the config, you have to restart NetworkManager. On Fedora you do this with `systemctl restart NetworkManager.service`

Then reproduce the issue. You can find the logfile (on Fedora) in the journal. Try

    journalctl _SYSTEMD_UNIT=NetworkManager.service -b

You can redirect output to a file:

    journalctl _SYSTEMD_UNIT=NetworkManager.service -b > ~/nm-logfile

and attach the file ~/nm-logfile

Downgrading libnl3 might help, but the bug is probably in NetworkManager.

Thank you!
Created attachment 863819 [details] NetworkManager logfile with DEBUG enabled I can confirm this bug on ARM (Odroid X2, I know not an officially supported platform) as well. The NM logfile with DEBUG enabled is attached.
(In reply to Frank Danapfel from comment #7) Thank you Frank... though I still don't understand what the problem is :( Could somebody with this problem please test the new version NetworkManager-0.9.9.0-30.git20131003.fc20? https://admin.fedoraproject.org/updates/NetworkManager-0.9.9.0-30.git20131003.fc20 It has some added debug logging, that might help to identify the problem. Thank you!
Created attachment 863893 [details] New NetWorkManager log file with DEBUG enabled Thomas, here is the full Network manager log file with DEBUG enabled from before and after I upgraded NM to NetworkManager-glib-0.9.9.0-30.git20131003 (since the log had grown to 1.8MB I zipped it). I've also kept libnl3 at version 3.2.24-1: $ rpm -qa|grep -i libnl3 libnl3-cli-3.2.24-1.fc20.armv7hl libnl3-3.2.24-1.fc20.armv7hl $ rpm -qa|grep -i networkmanager NetworkManager-glib-0.9.9.0-30.git20131003.fc20.armv7hl NetworkManager-0.9.9.0-30.git20131003.fc20.armv7hl $ cat /etc/NetworkManager/NetworkManager.conf [main] plugins=ifcfg-rh [logging] level=DEBUG domains=ALL The log with the NetworkManager-glib-0.9.9.0-30.git20131003 package starts at: Feb 16 23:10:43 odroid NetworkManager[23057]: <info> NetworkManager (version 0.9.9.0-30.git20131003.fc20) is starting. Unfortunately I'm still seeing the same issue with the new NM package.
Hi Frank,

I still cannot reproduce this issue (or know what the problem is).

Could you please provide another logfile? But this time run NetworkManager in the terminal. Please do the following steps:

    systemctl mask NetworkManager.service
    systemctl stop NetworkManager.service

    # enable debugging output for libnl
    export NLDBG=10

    NetworkManager --debug --log-level=DEBUG --log-domains=ALL 2>&1 | tee /tmp/nm-log.txt

    #>> reproduce the error

    # Afterwards, kill NetworkManager (CTRL+C) and undo your changes:

    systemctl unmask NetworkManager.service
    systemctl restart NetworkManager

Attach the logfile /tmp/nm-log.txt

Thank you!!
Created attachment 864249 [details] NetworkManager log file from manual run Thomas, as requested here is the log file from manually running NetworkManager. I used the same versions of both NetworkManager and libnl3 as before.
(In reply to Thomas Haller from comment #10) > Hi Frank, > > I still cannot reproduce this issue (or know what the problem is). > > > Could you please provide another logfile? But this time run NetworkManager > in the terminal. Please do the following steps: > > > > systemctl mask NetworkManager.service > systemctl stop NetworkManager.service > > # enable debugging output for libnl > export NLDBG=10 > > NetworkManager --debug --log-level=DEBUG --log-domains=ALL 2>&1 | tee > /tmp/nm-log.txt > > #>> reproduce the error > > # Afterwards, kill NetworkManager (CTRL+C) and undo your changes: > > systemctl unmask NetworkManager.service > systemctl restart NetworkManager > > > Attach the logfile /tmp/nm-log.txt > > Thank you!! Same issue here. To reproduce it the IP assignment has to be configured as static.
(In reply to Jonatan Sastre Hernández from comment #12) What is your system? Which kernel? So, it also happens for you with static IP addresses? Can I assume, that it is unrelated to DHCP, and it happens basically always when NM tries to configure an IP address (be it static or DHCP)? Does it also happen with IPv6 autoconf? Or IPv6 static?
(In reply to Thomas Haller from comment #13) > (In reply to Jonatan Sastre Hernández from comment #12) > > What is your system? Which kernel? > > So, it also happens for you with static IP addresses? Can I assume, that it > is unrelated to DHCP, and it happens basically always when NM tries to > configure an IP address (be it static or DHCP)? > > Does it also happen with IPv6 autoconf? Or IPv6 static? I am unable to reproduce it again on an x86_64 machine. This must be the Heisenbug thing ;) The recent kernel 3.13 update has been pushed to stable, and libnl3-3.2.24-1 and the new kernel are likely working together fine now. My ARM machine is using a 3.4 kernel and is not updated. The problem still persists there, so it may be related to the kernel version and libnl3, not NetworkManager (the problem appeared no matter what version of NetworkManager was running). By the way 0.9.9.0-29.git20140131.fc20 produces no warning nor error in the logs, as opposed to git20140131.
amendment to my previous message: 20c20 < By the way 0.9.9.0-29.git20140131.fc20 produces no warning nor error in the logs, as opposed to git20140131. --- > By the way 0.9.9.0-29.git20140131.fc20 produces no warning nor error in the logs, as opposed to git20131003.
Thomas, Sorry I could not reply earlier. I am currently travelling and often do not have access to the net. I also do not currently have access to the machines in question. So I can't do much to help you with logs. I would like to point out, however, that all three of the machines in question were running older kernels at the time. Therefore I would assume, considering the comments above, that the issue is caused by a communication problem between the older kernels and libnl3, as Jonatan seems to have surmised. Since the problem seems to be an issue with the new libnl3 and older kernels, I can imagine some people would prefer just to dismiss it with a WILL NOT FIX and advice to upgrade to a newer kernel. In the past I would have tended to agree, but Fedora now has armhfp as a primary arch. That changes things. I work a lot with ARM systems and I can say from a lot of ARM experience that it is very often not possible to update the kernel to the latest release. Therefore any updated packages should be compatible with an older kernel, or a workaround/fix for the issue somehow needs to be VERY prominently noted so people do not inadvertently break their systems, as happened here. Changes which break things when using an older kernel should therefore only occur in a next Fedora release, where they can be duly noted in the release notes. In any case, as I said before, the problem is with libnl3, not NetworkManager.
I checked with our guys back at the office and one of the laptops in question has recently had the kernel updated to 3.13. I asked them to also do a package update of libnl3 and reboot. The issue does not occur with the newer kernel. So it definitely is an issue between libnl3 and older kernels. No logs needed for that. The previous kernel on that machine was 3.7.1. So the Netlink communication issue definitely occurs with kernels <=3.7.1. Many ARM platforms are still stuck on 3.4 kernels. Therefore an issue like this can cause a lot of problems. I'll try to get more info, including logs, if possible. In the meantime, to reproduce the issue, just downgrade the kernel to <=3.7.1. Then upgrade step by step to see which kernel resolves the issue. A diff of the libnl3 changes would also help to determine where the problem is located.
(In reply to Russ from comment #17) As you said, it is very much intended that NM <-> libnl3 <-> kernel can work together with arbitrary versions (within reasonable limits). So, this problem should be fixed in any case, regardless of who is the culprit. I could not reproduce it until now (also tried on a VM with fc20-armv7hl). Will now try with an older kernel there... The error in any case seems to be that NM cannot add the IPv4 address using libnl3. The later error, about not being able to add the route, is probably just a follow-up error of the previous one, because you cannot add gateway routes if you don't have the proper addresses configured. So, if you guys have this problem, does it happen ~always~ or just sometimes? And does it only affect IPv4 or IPv6 too? And I assume it happens regardless of DHCP or static configuration? And if I see right, it only happened on 32bit systems (any affected x86_64?) In general, please also state the kernel+architecture, libnl3 version and NM version. Thanks
Created attachment 865026 [details] NM log from armv7hl (kernel-3.4.79 + NM-0.9.9.0-29.git20140131.fc20 + libnl3-3.2.24)

armv7hl
-------
kernel-3.4.79 + NetworkManager-0.9.9.0-{28,29,30} + libnl3-3.2.21 : Functional, better with NM-*-git20140131
kernel-3.4.79 + NetworkManager-0.9.9.0-{28,29,30} + libnl3-3.2.24 : Persistent issue, IPv4 not assigned. DHCP works well. (an IPv6 address assigned but not tested, seems ok)

x86_64
------
kernel-3.12-?? + NetworkManager-0.9.9.0-{28,29,30} + libnl3-3.2.21 : Functional (some warnings in the logs), better with NM-*-git20140131
kernel-3.12-?? + NetworkManager-0.9.9.0-{28,29,30} + libnl3-3.2.24 : The problem has been reproduced here too, apparently persistent but not well tested (libnl3 was downgraded as a workaround)
kernel-3.13-3 + NetworkManager-0.9.9.0-{28,30} + libnl3-3.2.24 : Functional, still some errors/warnings (NM-*-git20140131 may resolve this)
The ARM system I've noticed this issue on is also running an older kernel version (3.8.13.16), and I can reproduce this issue any time libnl3-3.2.24-1.fc20 is installed, but not when libnl3-3.2.21-2.fc20 is on the system. Unfortunately I'm stuck on this older kernel on this system since the kernel patches to support this platform (ODROID X2) have not made it into the upstream kernel yet.
I'm still on this. Now I seem to be able to reproduce it, with kernel 3.6.10-8.fc18.armv7hlm, libnl3-upstream and NM-upstream... now bisecting libnl3... (gosh, this VM is so slow//)
The offending line in libnl3 is https://github.com/thom311/libnl/blob/master/lib/route/addr.c#L601 from commit https://github.com/thom311/libnl/commit/42c41336000e1ff781a91c6ec397fd787aae3124 In that case, rtnl_addr_add() returns -7 (Invalid input data or parameter)... which would be NLE_INVAL, which can be one of ENOPROTOOPT, EFAULT, EINVAL. It fails for me with kernel "kernel-3.6.10-8.fc18.armv7hl.rpm" ... *why* that happens is unclear... it seems not to happen with 3.11.0-300.fc20.armv7hl or on my fc20.x86_64
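To illustrate why a return value of -7 does not identify the underlying kernel errno, here is a minimal sketch of the many-to-one mapping described above. It is not libnl's own code: the function name and the SKETCH_ constants are hypothetical, the numeric values are assumed to match the codes seen in this report (7 for "Invalid input data or parameter", 1 for "Unspecific failure"), and the catch-all default branch is a simplification.

```c
#include <errno.h>

/* Assumption: these values mirror the library's numbering as seen in this
 * report (-7 was returned for "Invalid input data or parameter"). */
enum {
    SKETCH_NLE_SUCCESS = 0,
    SKETCH_NLE_FAILURE = 1, /* "Unspecific failure" */
    SKETCH_NLE_INVAL   = 7  /* "Invalid input data or parameter" */
};

/* Hypothetical helper: collapse a kernel errno into a library-style code.
 * Accepts either errno or -errno. */
static int sketch_errno_to_nle(int err)
{
    if (err < 0)
        err = -err;

    switch (err) {
    case 0:
        return SKETCH_NLE_SUCCESS;
    case ENOPROTOOPT:
    case EFAULT:
    case EINVAL:
        /* Three distinct kernel errors end up as the same code. */
        return SKETCH_NLE_INVAL;
    default:
        /* Simplification: everything else is treated as unspecific. */
        return SKETCH_NLE_FAILURE;
    }
}
```

Because three different errno values map to the same code, the caller only learns that "something about the request was invalid", which matches the difficulty of diagnosing this bug from the NetworkManager log alone.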
Created attachment 866200 [details] dist-git patch for libnl3 scratch-build, based on 53601fc4dd142bf39ffa529cb839fad94174e59b I made a scratch-build of libnl, which removes the offending line. http://koji.fedoraproject.org/koji/taskinfo?taskID=6557421 Could the affected people please verify that with libnl3-3.2.24-2.test01 this problem no longer happens? Thanks. (Note that with libnl3-3.2.24-2.test01 the NetworkManager bugs #1047139 and #1045118 are unresolved again -- but they are also unresolved if you run pre-libnl3-3.2.24 versions.)
It works. There are still some warnings from NetworkManager, but they are not related to libnl3. Any chance that NM-0.9.9.0-29.git20140131 will be pushed to stable again?
libnl3-3.2.24-2.test01 works for me as well: [odroid]$ uname -r 3.8.13.16 [odroid]$ rpm -qa|grep -i libnl3 libnl3-3.2.24-2.test01.fc20.armv7hl libnl3-cli-3.2.24-2.test01.fc20.armv7hl [odroid]$ rpm -qa|grep -i NetworkManager NetworkManager-glib-0.9.9.0-28.git20131003.fc20.armv7hl NetworkManager-0.9.9.0-28.git20131003.fc20.armv7hl [odroid]$ systemctl status NetworkManager.service NetworkManager.service - Network Manager Loaded: loaded (/usr/lib/systemd/system/NetworkManager.service; enabled) Active: active (running) since Sun 2014-02-23 13:04:57 CET; 6min ago Main PID: 3953 (NetworkManager) CGroup: /system.slice/NetworkManager.service ├─3953 /usr/sbin/NetworkManager --no-daemon └─4008 /sbin/dhclient -d -sf /usr/libexec/nm-dhcp-helper -pf /var/run/dhclient-wlan0.pid -lf /var/lib/NetworkManager/dhclient-43a6b2ef-d206-4e9f-941d-6c74d35a6424-wlan0.lease -cf /var/lib/NetworkManager/dhclient-wlan0.conf... Feb 23 13:05:19 odroid NetworkManager[3953]: <info> Activation (wlan0) Stage 5 of 5 (IPv4 Configure Commit) scheduled... Feb 23 13:05:19 odroid NetworkManager[3953]: <info> Activation (wlan0) Stage 5 of 5 (IPv4 Commit) started... Feb 23 13:05:19 odroid NetworkManager[3953]: <info> (wlan0): device state change: ip-config -> ip-check (reason 'none') [70 80 0] Feb 23 13:05:19 odroid NetworkManager[3953]: <info> Activation (wlan0) Stage 5 of 5 (IPv4 Commit) complete. Feb 23 13:05:19 odroid NetworkManager[3953]: <info> (wlan0): device state change: ip-check -> secondaries (reason 'none') [80 90 0] Feb 23 13:05:19 odroid NetworkManager[3953]: <info> (wlan0): device state change: secondaries -> activated (reason 'none') [90 100 0] Feb 23 13:05:19 odroid NetworkManager[3953]: bound to 192.168.1.102 -- renewal in 369734 seconds. Feb 23 13:05:19 odroid NetworkManager[3953]: <info> NetworkManager state is now CONNECTED_GLOBAL Feb 23 13:05:19 odroid NetworkManager[3953]: <info> Policy set 'WLAN' (wlan0) as default for IPv4 routing and DNS. 
Feb 23 13:05:19 odroid NetworkManager[3953]: <info> Activation (wlan0) successful, device activated. [odroid]$ ip a 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000 link/ether 32:2d:8d:51:a0:85 brd ff:ff:ff:ff:ff:ff 3: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000 link/ether e8:4e:06:0a:99:be brd ff:ff:ff:ff:ff:ff inet 192.168.1.102/24 brd 192.168.1.255 scope global wlan0 inet6 fe80::ea4e:6ff:fe0a:99be/64 scope link valid_lft forever preferred_lft forever
I can confirm the results Frank Danapfel has reported for OdroidX2; same hardware, same kernel and library versions, same everything. libnl3-3.2.24-2.test01.fc20.armv7hl and libnl3-cli-3.2.24-2.test01.fc20.armv7hl fix the problem for me as well.

[root@localhost odroid]# uname -a
Linux localhost 3.8.13.16 #1 SMP PREEMPT Sat Feb 8 17:52:39 BRST 2014 armv7l armv7l armv7l GNU/Linux

The only thing I noticed that hasn't been mentioned is router solicitation failures. I assume these are not related. My router is an old 2Wire 1701HG DSL modem (Software 3.17.5).

Feb 23 11:03:16 localhost NetworkManager[2782]: <debug> [1393174996.263058] [rdisc/nm-lndp-rdisc.c:226] send_rs(): (eth0): sending router solicitation
Feb 23 11:03:16 localhost NetworkManager[2782]: <debug> [1393174996.263212] [rdisc/nm-lndp-rdisc.c:234] send_rs(): (eth0): scheduling router solicitation retry in 10 seconds.
Feb 23 11:03:26 localhost NetworkManager[2782]: <debug> [1393175006.263067] [rdisc/nm-lndp-rdisc.c:226] send_rs(): (eth0): sending router solicitation
Feb 23 11:03:26 localhost NetworkManager[2782]: <debug> [1393175006.263211] [rdisc/nm-lndp-rdisc.c:234] send_rs(): (eth0): scheduling router solicitation retry in 10 seconds.

Those messages occur both with libnl3-3.2.24-1 and libnl3-3.2.24-2.test01.
(In reply to Hugh Sutherland from comment #26) > The only thing I noticed that hasn't been mentioned is router solicitation > failures. I assume these are not related. My router is an old 2Wire 1701HG > DSL modem (Software 3.17.5). > > Feb 23 11:03:16 localhost NetworkManager[2782]: <debug> [1393174996.263058] > [rdisc/nm-lndp-rdisc.c:226] send_rs(): (eth0): sending router solicitation > Feb 23 11:03:16 localhost NetworkManager[2782]: <debug> [1393174996.263212] > [rdisc/nm-lndp-rdisc.c:234] send_rs(): (eth0): scheduling router > solicitation retry in 10 seconds. > Feb 23 11:03:26 localhost NetworkManager[2782]: <debug> [1393175006.263067] > [rdisc/nm-lndp-rdisc.c:226] send_rs(): (eth0): sending router solicitation > Feb 23 11:03:26 localhost NetworkManager[2782]: <debug> [1393175006.263211] > [rdisc/nm-lndp-rdisc.c:234] send_rs(): (eth0): scheduling router > solicitation retry in 10 seconds. That looks like normal debugging output and does not indicate any failure to me. Do you have problems with SLAAC? (please consider opening a separate BZ). NM sends every 10 seconds a router solicitation and logs these lines while doing so.
Thanks Thomas. Apologies for the wasted bandwidth. (No problems with SLAAC to my knowledge.)
The following upstream issue of libnl3 seems to be the same issue: https://github.com/thom311/libnl/issues/56
I'm hitting this too. Running on an armv7hl with a 3.4.6 kernel (no functional support for imx51 in newer kernels). I had to rebuild the test packages (thanks for the attached patch) because koji trashed the scratch build. Using the current NetworkManager and the test packages, my system gets its IPv4 and IPv6 addresses again.
Same here on F20 with kernel 3.4.75 for sunxi (https://github.com/jwrdegoede/linux-sunxi/tree/fedora-20-07022014). Let's call it the "old kernel".

The old kernel does not implement IFA_FLAGS and it seems to freak out and return EINVAL when it gets an unknown attribute (such as IFA_FLAGS or a completely bogus attr #99).

On the other hand, 3.11.10-301.fc20.x86-64, let's call it the "new kernel", understands IFA_FLAGS *and* disregards any unknown attributes (like the aforementioned 99).

Tested with "src/.libs/nl-addr-add -d wlan0 --family=inet --broadcast=192.168.11.255 -a 192.168.11.11/24 192.168.11.11" or a similar command.

I have to admit I am confused by the difference in the handling of unknown attributes because I have failed to find any significant difference between their code so far and it appears to me both versions are supposed to ignore anything they do not recognize (see rtm_to_ifaddr() in net/ipv4/devinet.c and nla_parse() in lib/nlattr.c). But that behaviour makes me somewhat nervous. Is it really desired that the kernel *silently* ignores any attributes it does not recognize?

Anyway, I think you can both eat the cake (make libnl3 able to work on older kernels) and keep it (preserve its ability to use 32-bit flags when they are needed and when the kernel supports them) if you change

    NLA_PUT_U32(msg, IFA_FLAGS, tmpl->a_flags);

in build_addr_msg() to

    if (tmpl->a_flags & ~0xff)
        NLA_PUT_U32(msg, IFA_FLAGS, tmpl->a_flags);

A more sophisticated approach would detect whether the kernel supports IFA_FLAGS and use the result to make the decision.
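The masking check proposed above can be sketched in isolation. The idea is that the address message header carries an 8-bit flags field, so the separate 32-bit IFA_FLAGS attribute is only needed when some flag bit does not fit into those 8 bits. The helper names below are hypothetical and the snippet does not use libnl; it only models the decision.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true only when at least one flag bit lies above the
 * low 8 bits, i.e. when the flags cannot be expressed in the 8-bit header
 * field and the extra 32-bit attribute would be required. */
static bool sketch_needs_ifa_flags_attr(uint32_t flags)
{
    return (flags & ~UINT32_C(0xff)) != 0;
}

/* Hypothetical helper: the part of the flags that fits in the header. */
static uint8_t sketch_header_flags(uint32_t flags)
{
    return (uint8_t)(flags & UINT32_C(0xff));
}
```

With this check, a request that only uses the classic 8-bit flags never carries the new attribute, so an old kernel that rejects unknown attributes is not affected. A request with extended flags would still fail on such a kernel, which is the limitation discussed in the following comments.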
(In reply to Pavel Kankovsky from comment #31) > Same here on F20 with kernel 3.4.75 for sunxi > (https://github.com/jwrdegoede/linux-sunxi/tree/fedora-20-07022014). Let's > call it the "old kernel". > > The old kernel does not implement IFA_FLAGS and it seems to freak out and > return EINVAL when it gets an unknown attribute (such as IFA_FLAGS or a > completely bogus attr #99). > > On the other hand, 3.11.10-301.fc20.x86-64, let's call it the "new kernel", > understands IFA_FLAGS *and* disregards any unknown attributes (like the > aforementioned 99). > > Tested with "src/.libs/nl-addr-add -d wlan0 --family=inet > --broadcast=192.168.11.255 -a 192.168.11.11/24 192.168.11.11" or a similar > command. > > I have to admit I am confused by the difference in the handling of unknown > attributes because I have failed to find any significant difference between > their code so far and it appears to me both versions are supposed to ignore > anything they do not recognize (see rtm_to_ifaddr() in net/ipv4/devinet.c > and nla_parse() in lib/nlattr.c). But that behaviour makes me somewhat > nervous. Is it really desired that the kernel *silently* ignores any > attributes it does not recognize? I think the kernel should very much ignore unknown attributes, because that is what makes the netlink protocol extendible. And the problem seems to be that the kernel indeed does not ignore the unknown attribute. So, this looks very much like a kernel bug to me. Of course libnl should try to play nice. But I did not track this bug down yet. (Open for suggestions.) 
> Anyway, I think you can both eat the cake (make libnl3 able to work on older
> kernels) and keep it (preserve its ability to use 32-bit flags when they are
> needed and when the kernel supports them) if you change
>
> NLA_PUT_U32(msg, IFA_FLAGS, tmpl->a_flags);
>
> in build_addr_msg() to
>
> if (tmpl->a_flags & ~0xff)
> NLA_PUT_U32(msg, IFA_FLAGS, tmpl->a_flags);
>
> A more sophisticated approach would detect whether the kernel supports
> IFA_FLAGS and use the result to make the decision.

I think this is a good workaround which will help in many cases. It will still break if a new application uses libnl and tries to set extended flags... I will send a patch to the libnl mailing list for that...
(In reply to Thomas Haller from comment #32)
> I think the kernel should very much ignore unknown attributes, because that
> is what makes the netlink protocol extendible. And the problem seems to be
> that the kernel indeed does not ignore the unknown attribute.

IMHO, it should be up to the program to declare whether a certain attribute is optional and can be ignored or important and the operation should fail if the kernel does not implement it. Cf. the critical flag in X.509. But I guess we are already stuck with an ABI that would make it difficult to pass such a flag.

> So, this looks very much like a kernel bug to me.
[...]
> But I did not track this bug down yet. (Open for suggestions.)

I have used ftrace to find out that inet_rtm_newaddr() is not called by rtnetlink_rcv_msg() when the old kernel encounters an unknown attribute. Ftrace rules!

It seems rtnetlink_rcv_msg() itself used to refuse any message containing an unknown attribute, but the check was removed approximately a year ago: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/net/core/rtnetlink.c?id=661d2967b3f1b34eeaa7e212e7b9bbe8ee072b59

The nature of the patch makes me wonder whether it was really Thomas Graf's intention to make the kernel silently ignore unknown attributes...
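The difference between the two kernel behaviours described above can be modelled with a small, self-contained sketch. This is a simplified illustration, not kernel code: the function name, the `strict` switch and the attribute numbers used in the example are assumptions made for demonstration.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified model (not kernel code) of how a receiver may treat attribute
 * types it does not know.
 *
 *   strict != 0: reject the whole message with -EINVAL as soon as an
 *                attribute type above max_known is seen (old behaviour
 *                as described in this report).
 *   strict == 0: skip unknown attributes and return how many were skipped
 *                (newer behaviour as described in this report).
 */
static int sketch_check_attrs(const unsigned int *types, size_t n,
                              unsigned int max_known, int strict)
{
    int skipped = 0;

    for (size_t i = 0; i < n; i++) {
        if (types[i] > max_known) {
            if (strict)
                return -EINVAL; /* handler is never reached */
            skipped++;          /* silently ignored */
        }
    }
    return skipped;
}
```

For example, with a receiver that knows attribute types up to 7, a message carrying types 1, 2 and 8 is rejected in strict mode, while in lenient mode it is accepted with one attribute ignored. In strict mode the message never reaches the address handler, which is consistent with the ftrace observation above.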
(In reply to Pavel Kankovsky from comment #33)
> (In reply to Thomas Haller from comment #32)
> > I think the kernel should very much ignore unknown attributes, because that
> > is what makes the netlink protocol extendible. And the problem seems to be
> > that the kernel indeed does not ignore the unknown attribute.
>
> IMHO, it should be up to the program to declare whether a certain attribute
> is optional and can be ignored or important and the operation should fail if
> the kernel does not implement it. Cf. the critical flag in X.509. But I
> guess we are already stuck with an ABI that would make it difficult to pass
> such a flag.

I don't agree. The protocol is extendible precisely because older kernels (should) ignore unknown parts. Of course the protocol must be extended in such a manner that ignoring the unknown parts doesn't cause problems.

> > So, this looks very much like a kernel bug to me. [...]
> > But I did not track this bug down yet. (Open for suggestions.)
>
> I have used ftrace to find out that inet_rtm_newaddr() is not called by
> rtnetlink_rcv_msg() when the old kernel encounters an unknown attribute.
> Ftrace rules!
>
> It seems rtnetlink_rcv_msg() itself used to refuse any message containing an
> unknown attribute, but the check was removed approximately a year ago:
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/net/core/rtnetlink.c?id=661d2967b3f1b34eeaa7e212e7b9bbe8ee072b59
>
> The nature of the patch makes me wonder whether it was really Thomas
> Graf's intention to make the kernel silently ignore unknown attributes...

Awesome, thanks for tracking this down. I sent a new patch to the libnl mailing list with your workaround. Once it gets merged, I will patch the libnl-f20 package. Also, NetworkManager must be fixed not to set these additional flags if the kernel does not support them.
libnl workaround pushed to upstream as: https://github.com/thom311/libnl/commit/5206c050504f8676a24854519b9c351470fb7cc6
Pushed NetworkManager patch for review: th/rh1063885_libnl_workaround_for_older_kernel
Can you put some () around the "flags & ~0xFF" bit? Makes it a bit clearer. Other than that, patch looks good.
NetworkManager patch pushed upstream: http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=dac51747ab5853b00557d7d97d4b2eae05968c03
libnl3-3.2.24-2.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/libnl3-3.2.24-2.fc20
NetworkManager-0.9.9.0-34.git20131003.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/NetworkManager-0.9.9.0-34.git20131003.fc20
Package NetworkManager-0.9.9.0-34.git20131003.fc20: * should fix your issue, * was pushed to the Fedora 20 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing NetworkManager-0.9.9.0-34.git20131003.fc20' as soon as you are able to. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2014-4964/NetworkManager-0.9.9.0-34.git20131003.fc20 then log in and leave karma (feedback).
Works for me with libnl3-3.2.24-2.fc20 and NetworkManager-0.9.9.0-34.git20131003.fc20 from updates-testing on an Odroid U2. Thanks!
NetworkManager-0.9.9.0-35.git20131003.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/NetworkManager-0.9.9.0-35.git20131003.fc20
NetworkManager-0.9.9.0-36.git20131003.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/NetworkManager-0.9.9.0-36.git20131003.fc20
NetworkManager-0.9.9.0-37.git20131003.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/NetworkManager-0.9.9.0-37.git20131003.fc20
libnl3-3.2.24-2.fc20 has been pushed to the Fedora 20 stable repository. If problems still persist, please make note of it in this bug report.
NetworkManager-0.9.9.0-38.git20131003.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/NetworkManager-0.9.9.0-38.git20131003.fc20
NetworkManager-0.9.9.0-38.git20131003.fc20 has been pushed to the Fedora 20 stable repository. If problems still persist, please make note of it in this bug report.