Bug 1379767 - unregister_netdevice: waiting for lo to become free. Usage count = 1 [NEEDINFO]
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 24
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2016-09-27 15:35 UTC by Stefan Assmann
Modified: 2017-04-28 20:09 UTC

Last Closed: 2017-04-28 17:23:26 UTC
Type: Bug
jforbes: needinfo?



Description Stefan Assmann 2016-09-27 15:35:25 UTC
Description of problem:
Sometimes when I use Docker, the kernel logs a message saying that unregister_netdevice is waiting for the lo device to become free:

[343948.870623] docker0: port 1(veth0f10bc6) entered disabled state
[344088.137146] unregister_netdevice: waiting for lo to become free. Usage count = 1
[344098.176868] unregister_netdevice: waiting for lo to become free. Usage count = 1
[344179.258056] docker0: port 1(vethfaf0cca) entered blocking state

Version-Release number of selected component (if applicable):
docker-1.10.3-26.git1ecb834.fc24.x86_64
kernel-4.7.2-201.fc24.x86_64

How reproducible:
happens occasionally

Steps to Reproduce:
1. start/stop docker container

Comment 1 Bernardo Donadio 2016-10-11 13:11:45 UTC
This was corrected upstream, as per the kernel changelog https://cdn.kernel.org/pub/linux/kernel/v4.x/ChangeLog-4.4.22

I request this fix to be backported to RHEL7.2 3.10 kernel.

commit 8b18e0e49804ad6d481482a6663b18d99510fdfe
Author: Wei Yongjun <weiyongjun1>
Date:   Mon Sep 5 16:06:31 2016 +0800

    ipv6: addrconf: fix dev refcont leak when DAD failed

commit 751eb6b6042a596b0080967c1a529a9fe98dac1d upstream.

In general, when DAD detected IPv6 duplicate address, ifp->state
will be set to INET6_IFADDR_STATE_ERRDAD and DAD is stopped by a
delayed work, the call tree should be like this:

ndisc_recv_ns
  -> addrconf_dad_failure        <- missing ifp put
     -> addrconf_mod_dad_work
       -> schedule addrconf_dad_work()
         -> addrconf_dad_stop()  <- missing ifp hold before call it

addrconf_dad_failure() called with ifp refcont holding but not put.
addrconf_dad_work() call addrconf_dad_stop() without extra holding
refcount. This will not cause any issue normally.

But the race between addrconf_dad_failure() and addrconf_dad_work()
may cause ifp refcount leak and netdevice can not be unregister,
dmesg show the following messages:

IPv6: eth0: IPv6 duplicate address fe80::XX:XXXX:XXXX:XX detected!
...
unregister_netdevice: waiting for eth0 to become free. Usage count = 1

Fixes: c15b1ccadb32 ("ipv6: move DAD and addrconf_verify processing
to workqueue")
Signed-off-by: Wei Yongjun <weiyo...>
Signed-off-by: David S. Miller <da...>
Signed-off-by: Greg Kroah-Hartman <gre...>

---
 net/ipv6/addrconf.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1898,6 +1898,7 @@ errdad:
 	spin_unlock_bh(&ifp->lock);
 
 	addrconf_mod_dad_work(ifp, 0);
+	in6_ifa_put(ifp);
 }
 
 /* Join to solicited addr multicast group.
@@ -3609,6 +3610,7 @@ static void addrconf_dad_work(struct wor
 		addrconf_dad_begin(ifp);
 		goto out;
 	} else if (action == DAD_ABORT) {
+		in6_ifa_hold(ifp);
 		addrconf_dad_stop(ifp, 1);
 		goto out;
 	}

Comment 2 Josh Boyer 2016-10-11 13:23:56 UTC
(In reply to Bernardo Donadio from comment #1)
> This was corrected upstream, as per the kernel changelog
> https://cdn.kernel.org/pub/linux/kernel/v4.x/ChangeLog-4.4.22
> 
> I request this fix to be backported to RHEL7.2 3.10 kernel.

Please use the RHEL process to request fixes and backports.  The Fedora and RHEL bugzilla products are separate and are handled entirely differently.

Comment 3 Stefan Assmann 2016-10-11 13:29:14 UTC
I'm seeing this issue with a Fedora 24 kernel, so it is not fixed by 8b18e0e49804ad6d481482a6663b18d99510fdfe. Also note that in my case it concerns the lo interface.

Comment 4 Bernardo Donadio 2016-10-13 02:49:31 UTC
Josh, re comment #2: sorry. I had RHEL in mind (since my production systems are also affected by this bug), but I meant the current Fedora kernel (I'm seeing the issue on my F24 desktop). Too much coffee, I guess...

Stefan, re comment #3: this fix was applied to the currently supported branches of the kernel by Linus. However, since it is recent, there's a good chance that it hasn't reached Fedora 24 stable yet. I will verify this as soon as I have a bit of spare time.

In the meantime, there's quite a bit of discussion on this issue and how it affects docker in the following link:
https://github.com/docker/docker/issues/5618

Comment 5 flex 2017-01-15 03:04:31 UTC
I also hit this issue on the newest kernel with Docker:

[Sun Jan 15 10:53:44 2017] unregister_netdevice: waiting for lo to become free. Usage count = 1
[Sun Jan 15 10:54:07 2017] unregister_netdevice: waiting for lo to become free. Usage count = 1
[Sun Jan 15 10:54:17 2017] unregister_netdevice: waiting for lo to become free. Usage count = 1

kernel: 3.10.0-514.2.2.el7.x86_64
docker: 1.12.5

Comment 6 David MARTIN 2017-02-13 06:01:32 UTC
Is there any hope of seeing this backport come to CentOS 7/Atomic? I'm also hitting this problem from time to time.
It appears randomly, on the very latest version of Atomic.

Comment 7 Justin M. Forbes 2017-04-11 14:48:26 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 24 kernel bugs.

Fedora 24 has now been rebased to 4.10.9-100.fc24.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 26, and are still experiencing this issue, please change the version to Fedora 26.

If you experience different issues, please open a new bug report for those.

Comment 8 Justin M. Forbes 2017-04-28 17:23:26 UTC
*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 2 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.

