Bug 116982

Summary: ip address flush deadlocks doing netlink communication over and over on 2.6 kernel
Product: Fedora
Reporter: Arkadiusz Miskiewicz <arekm>
Component: iproute
Assignee: Phil Knirsch <pknirsch>
Status: CLOSED UPSTREAM
QA Contact: Brock Organ <borgan>
Severity: medium
Priority: medium
Version: rawhide
CC: alex.kiernan, rvokal
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2004-03-17 14:34:56 UTC
Bug Blocks: 114963    

Description Arkadiusz Miskiewicz 2004-02-27 00:39:29 UTC
With iproute-2.4.7-11 on a 2.6 kernel:

ip a a 192.168.0.1/24 dev eth0
ip link set eth0 down
ip a flush dev eth0

Here on my vanilla 2.6.2 kernel the last command hangs, eating CPU: it
repeats the netlink communication over and over. The hang doesn't happen
when the interface is in the UP state. It also doesn't happen on 2.4 kernels.

This could also be a kernel bug...

Comment 1 Bill Nottingham 2004-03-02 04:19:38 UTC
What adapter?

Comment 2 Arkadiusz Miskiewicz 2004-03-02 10:19:19 UTC
Doesn't really matter (but it's a 3c905C-TX/TX-M using 3c59x and some 
RTL-8139/8139C/8139C+ using 8139too). It even happens on the lo device, so 
it should be easy to reproduce. Doesn't the "how to reproduce" recipe work 
for you?

Comment 3 Arkadiusz Miskiewicz 2004-03-04 01:57:16 UTC
It seems that in the for(;;) loop in ipaddr_list_or_flush(), the 
rtnl_dump_filter() function executes filter++ on each pass, so it 
never leaves that for(;;) loop.

On 2.6 one thing happens differently from 2.4 kernels: 
rtnl_wilddump_request() sends a request with rth->dump == 1078365203 
but first gets a reply with h->nlmsg_seq == 1078365202, and only on the 
next while (NLMSG_OK(h, status)) pass does it get the right one, with 
h->nlmsg_seq == 1078365203.

On 2.4 it always gets the right reply. Maybe this has nothing to do with 
the problem, or maybe it does.
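
To make the sequence-number observation above concrete, here is a minimal sketch of how a receive loop might tell a stale reply from the one it is waiting for. It is a simplified model, not the libnetlink code: struct fake_msg and count_matching_replies() are hypothetical names, and the seq field only stands in for nlmsg_seq.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a netlink message header; only the
 * sequence number matters for this illustration. */
struct fake_msg {
    uint32_t seq;
};

/*
 * Count the replies in one receive batch that carry the sequence number
 * of the outstanding dump request. Replies with any other sequence
 * number are skipped instead of being processed.
 */
size_t count_matching_replies(const struct fake_msg *msgs, size_t n,
                              uint32_t expected)
{
    size_t matched = 0;

    for (size_t i = 0; i < n; i++) {
        if (msgs[i].seq != expected)
            continue; /* stale reply, e.g. left over from an earlier request */
        matched++;
    }
    return matched;
}
```

With a batch holding sequence numbers 1078365202 and 1078365203 and an expected value of 1078365203, only the second message is counted, which mirrors the two passes described above.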

Comment 4 Arkadiusz Miskiewicz 2004-03-04 22:26:33 UTC
For now I've just limited the number of loop passes to 10k (a hack).

diff -urN iproute2.org/ip/ipaddress.c iproute2/ip/ipaddress.c
--- iproute2.org/ip/ipaddress.c 2004-03-04 23:00:41.050515248 +0100
+++ iproute2/ip/ipaddress.c     2004-03-04 23:08:08.810575433 +0100
@@ -603,7 +603,7 @@
                                fprintf(stderr, "Flush terminated\n");
                                exit(1);
                        }
-                       if (filter.flushed == 0) {
+                       if (filter.flushed == 0 || round > 10000) {
                                if (round == 0) {
                                        fprintf(stderr, "Nothing to flush.\n");
                                } else if (show_stats)
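
The effect of the cap in the patch above can be sketched in isolation. This is a simplified model under stated assumptions, not the ipaddress.c code: flush_until_done(), flush_pass_fn, stuck_pass() and draining_pass() are hypothetical names, and each callback stands in for one dump pass that reports how many addresses it flushed.

```c
#include <stddef.h>

#define MAX_ROUNDS 10000 /* same cap as in the patch above */

/* Hypothetical callback: performs one pass and returns the number of
 * addresses flushed during that pass. */
typedef int (*flush_pass_fn)(void *ctx);

/*
 * Repeat passes until one flushes nothing or the round counter exceeds
 * MAX_ROUNDS. Returns the value of the round counter on exit.
 */
int flush_until_done(flush_pass_fn pass, void *ctx)
{
    int round = 0;

    for (;;) {
        int flushed = pass(ctx);

        if (flushed == 0 || round > MAX_ROUNDS)
            break; /* finished, or gave up after too many rounds */
        round++;
    }
    return round;
}

/* Example pass that never reports completion, like the hang in this bug. */
int stuck_pass(void *ctx)
{
    (void)ctx;
    return 1;
}

/* Example pass that flushes one address per call until *ctx reaches 0. */
int draining_pass(void *ctx)
{
    int *left = ctx;

    if (*left == 0)
        return 0;
    (*left)--;
    return 1;
}
```

A pass that drains normally ends the loop as soon as nothing is left to flush, while a pass that always reports progress is cut off once the counter passes the cap instead of spinning forever. The cap only hides the symptom and does not explain why the pass keeps reporting flushed addresses.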


Comment 5 Arkadiusz Miskiewicz 2004-03-17 14:34:56 UTC
Fixed in the kernel:
http://oss.sgi.com/projects/netdev/archive/2004-03/msg00190.html