Bug 1302037 - Spurious NEWLINK netlink message after DELLINK when removing wifi module
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 23
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-01-26 15:29 UTC by Beniamino Galvani
Modified: 2016-02-23 19:48 UTC

Fixed In Version: kernel-4.3.5-300.fc23 kernel-4.3.5-200.fc22
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-02-08 03:22:45 UTC
Type: Bug
Embargoed:



Description Beniamino Galvani 2016-01-26 15:29:37 UTC
Reproducible on Fedora 23 (kernel 4.2.6-300.fc23.x86_64)

When the wifi module is removed, the kernel sends a spurious NEWLINK
netlink message after DELLINK:

 # ip monitor link &
 [1] 6793

 # modprobe iwlwifi
 59: wlan0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
     link/ether 00:11:22:33:44:55 brd ff:ff:ff:ff:ff:ff
 59: wlp4s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
     link/ether 00:11:22:33:44:55 brd ff:ff:ff:ff:ff:ff
 59: wlp4s0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state UNKNOWN group default
     link/ether 00:11:22:33:44:55 brd ff:ff:ff:ff:ff:ff
 59: wlp4s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DORMANT group default
     link/ether 00:11:22:33:44:55 brd ff:ff:ff:ff:ff:ff
 59: wlp4s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DORMANT group default
     link/ether 00:11:22:33:44:55 brd ff:ff:ff:ff:ff:ff
 59: wlp4s0: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN group default
     link/ether 00:11:22:33:44:55 brd ff:ff:ff:ff:ff:ff
 59: wlp4s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default
     link/ether 00:11:22:33:44:55 brd ff:ff:ff:ff:ff:ff
 59: wlp4s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default
     link/ether 00:11:22:33:44:55 brd ff:ff:ff:ff:ff:ff
 59: wlp4s0: <NO-CARRIER,BROADCAST,MULTICAST,UP>
     link/ether
 59: wlp4s0: <NO-CARRIER,BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state DORMANT group default
     link/ether 00:11:22:33:44:55 brd ff:ff:ff:ff:ff:ff
 59: wlp4s0: <NO-CARRIER,BROADCAST,MULTICAST,UP,LOWER_UP>
     link/ether
 59: wlp4s0: <NO-CARRIER,BROADCAST,MULTICAST,UP,LOWER_UP>
     link/ether
 59: wlp4s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default
     link/ether 00:11:22:33:44:55 brd ff:ff:ff:ff:ff:ff

 # modprobe -r iwlmvm iwlwifi
 59: wlp4s0: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN group default
     link/ether 00:11:22:33:44:55 brd ff:ff:ff:ff:ff:ff
 Deleted 59: wlp4s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
     link/ether 00:11:22:33:44:55 brd ff:ff:ff:ff:ff:ff
 59: wlp4s0: <BROADCAST,MULTICAST,UP>
     link/ether

Note that the last message arrives after the "Deleted" event. As a
consequence, userspace applications such as NetworkManager, which rely
on netlink messages to build an internal state of links, believe that
the interface has appeared again.

The log above was captured with NetworkManager running, which brings
up and configures the wlp4s0 device.

No message regarding ifindex 59 should be sent after the DELLINK one.
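
As a rough illustration of that consequence, the sketch below replays the tail of the removal log through a naive link cache. `LinkCache`, `replay` and `REMOVAL_SEQUENCE` are hypothetical names made up for this example and are not NetworkManager code; only the RTM_NEWLINK/RTM_DELLINK values (16 and 17, from <linux/rtnetlink.h>) are real. It assumes a consumer that adds an entry on every NEWLINK and drops it on DELLINK.

```python
# Hypothetical link cache keyed by ifindex, roughly what a netlink listener
# might keep. Not taken from NetworkManager.
RTM_NEWLINK = 16  # values from <linux/rtnetlink.h>
RTM_DELLINK = 17


class LinkCache:
    def __init__(self):
        self.links = {}  # ifindex -> interface name

    def handle(self, msg_type, ifindex, ifname):
        if msg_type == RTM_NEWLINK:
            # Any NEWLINK (re)creates the entry, even for a deleted ifindex.
            self.links[ifindex] = ifname
        elif msg_type == RTM_DELLINK:
            self.links.pop(ifindex, None)


def replay(events):
    """Feed (msg_type, ifindex, ifname) tuples to a fresh cache; return its state."""
    cache = LinkCache()
    for msg_type, ifindex, ifname in events:
        cache.handle(msg_type, ifindex, ifname)
    return cache.links


# The three messages printed after "modprobe -r" in the log above.
REMOVAL_SEQUENCE = [
    (RTM_NEWLINK, 59, "wlp4s0"),  # interface goes down
    (RTM_DELLINK, 59, "wlp4s0"),  # "Deleted 59: wlp4s0"
    (RTM_NEWLINK, 59, "wlp4s0"),  # spurious message after the delete
]
```

Replaying the full sequence leaves ifindex 59 in the cache, while replaying only the first two messages leaves it empty, which is the state a consumer should end up in.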

Comment 1 Josh Boyer 2016-01-26 16:04:06 UTC
Does this happen with the 4.3.3 update or with a rawhide kernel?

Comment 2 Beniamino Galvani 2016-01-26 16:23:31 UTC
(In reply to Josh Boyer from comment #1)
> Does this happen with the 4.3.3 update or with a rawhide kernel?

Upgraded to 4.3.4-300.fc23.x86_64, still happens.

Comment 3 Johannes Berg 2016-01-26 22:08:08 UTC
Would you be able to rebuild "ip" (iproute2) with a change to print out n->nlmsg_pid somewhere at the beginning of accept_msg() in ip/ipmonitor.c ?

Comment 4 Beniamino Galvani 2016-01-27 08:18:45 UTC
(In reply to Johannes Berg from comment #3)
> Would you be able to rebuild "ip" (iproute2) with a change to print out
> n->nlmsg_pid somewhere at the beginning of accept_msg() in ip/ipmonitor.c ?

It's always zero:

(pid 0) 17: wlp4s0: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN group default
    link/ether 00:11:22:33:44:55 brd ff:ff:ff:ff:ff:ff
(pid 0) Deleted 17: wlp4s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
    link/ether 00:11:22:33:44:55 brd ff:ff:ff:ff:ff:ff
(pid 0) 17: wlp4s0: <BROADCAST,MULTICAST,UP>
    link/ether

Comment 5 Johannes Berg 2016-01-27 08:22:51 UTC
That indicates that the message is indeed coming from the kernel. Very odd; I don't even see how it could be generated without the MAC address etc.
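
The same check can be sketched outside iproute2. This is a minimal example, assuming the raw bytes of one netlink message are already in hand (for instance read from an AF_NETLINK socket); `parse_nlmsghdr` and `sent_by_kernel` are hypothetical helper names, and the field layout follows struct nlmsghdr in <linux/netlink.h>. A zero nlmsg_pid is treated here as "sent by the kernel", which is the interpretation used in this thread.

```python
import struct

# struct nlmsghdr: u32 nlmsg_len, u16 nlmsg_type, u16 nlmsg_flags,
#                  u32 nlmsg_seq, u32 nlmsg_pid (native byte order, 16 bytes)
NLMSG_HDR = struct.Struct("=IHHII")


def parse_nlmsghdr(buf):
    """Decode the 16-byte netlink header at the start of buf."""
    length, msg_type, flags, seq, pid = NLMSG_HDR.unpack_from(buf)
    return {"len": length, "type": msg_type, "flags": flags, "seq": seq, "pid": pid}


def sent_by_kernel(buf):
    """True when the sender port ID (nlmsg_pid) in the header is 0."""
    return parse_nlmsghdr(buf)["pid"] == 0
```

A header packed with `NLMSG_HDR.pack(16, 16, 0, 0, 0)` is reported as kernel-originated; one carrying a non-zero port ID is not.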

Comment 6 Johannes Berg 2016-01-27 10:00:17 UTC
I can't reproduce it - if you have some time, can you ping me on IRC ("johill" on freenode or OFTC)?

I think it might also be a wext message. Can you print something like

  printf("ifla_wireless=%d\n", !!tb[IFLA_WIRELESS]);

in print_linkinfo() in ip/ipaddress.c - after parse_rtattr()?

Comment 7 Beniamino Galvani 2016-01-27 11:11:53 UTC
Right, the last message has IFLA_WIRELESS set:

  wireless=0 42: wlp4s0: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN group default
      link/ether 00:11:22:33:44:55 brd ff:ff:ff:ff:ff:ff
  wireless=0 Deleted 42: wlp4s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
      link/ether 00:11:22:33:44:55 brd ff:ff:ff:ff:ff:ff
  wireless=1 42: wlp4s0: <BROADCAST,MULTICAST,UP>
      link/ether

and the caller seems to be:

 0xffffffff81768fe0 : wireless_send_event+0x0/0x400 [kernel]
 0xffffffffa0983185 : __cfg80211_disconnected+0x235/0x300 [cfg80211]
 0xffffffffa097ed3a : cfg80211_process_deauth+0xca/0xf0 [cfg80211]
 0xffffffffa097efef : cfg80211_tx_mlme_mgmt+0xaf/0xc0 [cfg80211]
 0xffffffffa0896843 : ieee80211_report_disconnect+0x63/0x130 [mac80211]
 0xffffffffa089c952 : ieee80211_mgd_deauth+0x132/0x220 [mac80211]
 0xffffffffa08666e8 : ieee80211_deauth+0x18/0x20 [mac80211]
 0xffffffffa097f9b2 : cfg80211_mlme_deauth+0xd2/0x130 [cfg80211]
 0xffffffffa097fbfb : cfg80211_mlme_down+0x6b/0x90 [cfg80211]
 0xffffffffa0983a45 : cfg80211_disconnect+0x175/0x190 [cfg80211]
 0xffffffffa0958ecd : __cfg80211_leave+0x8d/0x120 [cfg80211]
 0xffffffffa0958f8b : cfg80211_leave+0x2b/0x40 [cfg80211]
 0xffffffffa0959333 : cfg80211_netdev_notifier_call+0x393/0x5b0 [cfg80211]
 0xffffffff810bfc8a : notifier_call_chain+0x4a/0x70 [kernel]
 0xffffffff810bfe06 : raw_notifier_call_chain+0x16/0x20 [kernel]
 0xffffffff81665fb5 : call_netdevice_notifiers_info+0x35/0x60 [kernel]
 0xffffffff816662ca : __dev_close_many+0x5a/0x100 [kernel]
 0xffffffff816663f7 : dev_close_many+0x87/0x130 [kernel]
 0xffffffff81668745 : dev_close.part.77+0x45/0x70 [kernel]
 0xffffffff8166878a : dev_close+0x1a/0x20 [kernel]

I can reproduce this every time NM or wpa_supplicant is managing the interface and the module is removed. I'll ping you on IRC later, thanks.
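
For anyone who wants to tell such wext events apart from ordinary link updates without patching ip, here is a minimal sketch that walks the top-level attributes of one link message and looks for IFLA_WIRELESS. `link_attr_types` and `is_wext_event` are hypothetical helper names; the sizes and the IFLA_WIRELESS value (11) come from <linux/netlink.h>, <linux/rtnetlink.h> and <linux/if_link.h>. It assumes the buffer holds exactly one RTM_NEWLINK or RTM_DELLINK message, and it is not the fix discussed in this bug, which was made in the kernel.

```python
import struct

NLMSG_HDRLEN = 16   # sizeof(struct nlmsghdr)
IFINFOMSG_LEN = 16  # sizeof(struct ifinfomsg)
IFLA_WIRELESS = 11  # attribute carrying wireless extensions event data
RTA_HDR = struct.Struct("=HH")  # struct rtattr: u16 rta_len, u16 rta_type


def link_attr_types(msg):
    """Return the set of top-level attribute types in one link message."""
    (msg_len,) = struct.unpack_from("=I", msg)
    end = min(msg_len, len(msg))
    offset = NLMSG_HDRLEN + IFINFOMSG_LEN
    types = set()
    while offset + RTA_HDR.size <= end:
        rta_len, rta_type = RTA_HDR.unpack_from(msg, offset)
        if rta_len < RTA_HDR.size or offset + rta_len > end:
            break  # malformed attribute: stop instead of looping forever
        types.add(rta_type)
        offset += (rta_len + 3) & ~3  # attributes are padded to 4 bytes
    return types


def is_wext_event(msg):
    """True when the message carries an IFLA_WIRELESS attribute."""
    return IFLA_WIRELESS in link_attr_types(msg)
```

A consumer could, for example, skip cache updates for messages where `is_wext_event()` returns true; whether that is appropriate depends on the application.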

Comment 8 Johannes Berg 2016-01-27 11:13:58 UTC
Ah, you were connected. Perhaps with that information I can reproduce it, let me try.

Comment 9 Johannes Berg 2016-01-27 11:48:45 UTC
fix: https://p.sipsolutions.net/926eac7feec5a6a5.txt

Comment 10 Josh Boyer 2016-01-27 14:22:59 UTC
(In reply to Johannes Berg from comment #9)
> fix: https://p.sipsolutions.net/926eac7feec5a6a5.txt

We'd likely want both patches in the series you sent to netdev, correct?

Comment 11 Johannes Berg 2016-01-27 14:24:24 UTC
Yes. After sending, I realized that there was another issue with the "UP" ordering; fixing that required the second patch.

Comment 12 Josh Boyer 2016-01-28 20:09:18 UTC
I've added both to all branches in Fedora.  Thanks for such a quick fix, Johannes!

Comment 13 Josh Boyer 2016-01-29 13:32:26 UTC
Looks like kernel test robot found an issue with the first patch.  Should I hold off on including these?

http://thread.gmane.org/gmane.linux.kernel/2139378

Comment 14 Johannes Berg 2016-01-29 16:14:41 UTC
Ahrg. I'd fixed that issue, but discarded the change of approach (and introduced the second patch) and forgot to carry over the fix...

I've updated my tree at https://git.kernel.org/cgit/linux/kernel/git/jberg/mac80211.git/ to fix this issue.

Comment 15 Josh Boyer 2016-01-29 17:00:17 UTC
(In reply to Johannes Berg from comment #14)
> Ahrg. I'd fixed that issue, but discarded the change of approach (and
> introduced the second patch) and forgot to carry over the fix...
> 
> I've updated my tree at
> https://git.kernel.org/cgit/linux/kernel/git/jberg/mac80211.git/ to fix this
> issue.

Could you point out the change you forgot to carry over?  I looked at your updated tree, and I don't see any difference in the patches there vs. the ones you sent to netdev.

Comment 16 Josh Boyer 2016-01-29 17:01:12 UTC
(In reply to Josh Boyer from comment #15)
> Could you point out the change you forgot to carry over?  I looked at your
> updated tree, and I don't see any difference in the patches there vs. the
> ones you sent to netdev.

Oh, wait.  I might have had a stale cached copy via gitweb.  Let me review again.

Comment 17 Josh Boyer 2016-01-29 17:02:24 UTC
(In reply to Josh Boyer from comment #16)
> Oh, wait.  I might have had a stale cached copy via gitweb.  Let me review
> again.

Yes, that was it.  I see the difference now.  I'll update the patches in Fedora git.

Comment 18 Fedora Update System 2016-02-02 02:26:05 UTC
kernel-4.3.5-200.fc22 has been pushed to the Fedora 22 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-16a5625f33

Comment 19 Fedora Update System 2016-02-02 02:27:06 UTC
kernel-4.3.5-300.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-fd30ad26a9

Comment 20 Fedora Update System 2016-02-08 03:22:36 UTC
kernel-4.3.5-300.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.

Comment 21 Fedora Update System 2016-02-23 19:48:45 UTC
kernel-4.3.5-200.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report.

