Bug 541224 - net: possible leak of dst_entry (ipv4)
Summary: net: possible leak of dst_entry (ipv4)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.4
Hardware: All
OS: Linux
low
high
Target Milestone: rc
: ---
Assignee: Thomas Graf
QA Contact: Dayong Tian
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2009-11-25 09:55 UTC by Vitaliy Gusev
Modified: 2014-06-18 08:30 UTC (History)
9 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-01-13 20:56:05 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
fix-dst-entry-leak.patch (996 bytes, patch)
2009-11-25 11:12 UTC, Vitaliy Gusev
no flags Details | Diff
fix-dst-entry-leak2.patch (792 bytes, patch)
2009-11-25 12:15 UTC, Vitaliy Gusev
no flags Details | Diff
initial revision of a patch (3.30 KB, patch)
2010-01-26 11:19 UTC, Thomas Graf
no flags Details | Diff
second revision of patch (3.29 KB, patch)
2010-01-26 17:24 UTC, Thomas Graf
no flags Details | Diff
updated patch (3.65 KB, patch)
2010-08-30 14:39 UTC, Thomas Graf
no flags Details | Diff


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2011:0017 0 normal SHIPPED_LIVE Important: Red Hat Enterprise Linux 5.6 kernel security and bug fix update 2011-01-13 10:37:42 UTC

Description Vitaliy Gusev 2009-11-25 09:55:19 UTC
Description of problem:

 On high load servers dst_entry can be leaked and we can see messages (during
 network driver removal):


   unregister_netdevice: waiting for eth0 to become free. Usage count = 3
   unregister_netdevice: waiting for eth0 to become free. Usage count = 3
   unregister_netdevice: waiting for eth0 to become free. Usage count = 3

Version-Release number of selected component (if applicable):

     kernel-2.6.18-164.6.1.el5

How reproducible:

  A few
   
Steps to Reproduce:
1.  Do many connection to server from different clients.
2.  Try remove ethernet driver

 

Additional info:

Comment 1 Vitaliy Gusev 2009-11-25 10:55:31 UTC
Bug is in linux-2.6-net-allow-for-on-demand-emergency-route-cache-flushing.patch :



+		 */
+		if (*rthp && compare_hash_inputs(&(*rthp)->fl, &rt->fl))
+			rthi = rth;
 	}
 
 	if (cand) {
@@ -989,6 +1088,16 @@ restart:
 			*candp = cand->u.rt_next;
 			rt_free(cand);
 
                ^^^^^^^^^^^^^^^^^^^^^^
                Here 'cand' can be equal 'rthi'. Therefore 'rh' will be added
                to list that is out of hash.

 
 		}

       ...
 	}
 
 	/* Try to bind route to arp only if it is output
@@ -1026,7 +1135,11 @@ restart:
 		}
 	}
 
-	rt->u.rt_next = rt_hash_table[hash].chain;
+	if (rthi)
+		rt->u.rt_next = rthi->u.rt_next;
+	else
+		rt->u.rt_next = rt_hash_table[hash].chain;

Comment 2 Vitaliy Gusev 2009-11-25 11:12:57 UTC
Created attachment 373725 [details]
fix-dst-entry-leak.patch

Patch that fixes this bug

Comment 3 Vitaliy Gusev 2009-11-25 12:15:36 UTC
Created attachment 373730 [details]
fix-dst-entry-leak2.patch

Patch that fixes this bug.

Changes: 
         Don't reset rthi in rt_emergency_hash_rebuild, as 
        1. rt_emergency_hash_rebuild() in RHEL raises deadlock (access to rt_hash_lock_addr(hash))

        2. If  rt_emergency_hash_rebuild() will be fixed, rthi isn't need to be reset , as
            fixed version doesn't rebuild hash immediately.

Comment 4 Vitaliy Gusev 2009-11-26 09:55:58 UTC
Mainstream kernel reverted this patch. See commit 1ddbcb005c395518c2cd0df504cff3d4b5c85853 :


commit 1ddbcb005c395518c2cd0df504cff3d4b5c85853
Author: Eric Dumazet <dada1>
Date:   Tue May 19 20:14:28 2009 +0000

    net: fix rtable leak in net/ipv4/route.c
    
    Alexander V. Lukyanov found a regression in 2.6.29 and made a complete
    analysis found in http://bugzilla.kernel.org/show_bug.cgi?id=13339
    Quoted here because its a perfect one :
    
    begin_of_quotation
     2.6.29 patch has introduced flexible route cache rebuilding. Unfortunately the
     patch has at least one critical flaw, and another problem.
    
     rt_intern_hash calculates rthi pointer, which is later used for new entry
     insertion. The same loop calculates cand pointer which is used to clean the
     list. If the pointers are the same, rtable leak occurs, as first the cand is
     removed then the new entry is appended to it.
    
     This leak leads to unregister_netdevice problem (usage count > 0).
    
     Another problem of the patch is that it tries to insert the entries in certain
     order, to facilitate counting of entries distinct by all but QoS parameters.
     Unfortunately, referencing an existing rtable entry moves it to list beginning,
     to speed up further lookups, so the carefully built order is destroyed.
    
     For the first problem the simplest patch it to set rthi=0 when rthi==cand, but
     it will also destroy the ordering.
    end_of_quotation
    
    Problematic commit is 1080d709fb9d8cd4392f93476ee46a9d6ea05a5b
    (net: implement emergency route cache rebulds when gc_elasticity is exceeded)
    
    Trying to keep dst_entries ordered is too complex and breaks the fact that
    order should depend on the frequency of use for garbage collection.
    
    A possible fix is to make rt_intern_hash() simpler, and only makes
    rt_check_expire() a litle bit smarter, being able to cope with an arbitrary
    entries order. The added loop is running on cache hot data, while cpu
    is prefetching next object, so should be unnoticied.
    
    Reported-and-analyzed-by: Alexander V. Lukyanov <lav>
    Signed-off-by: Eric Dumazet <dada1>
    Acked-by: Neil Horman <nhorman>
    Signed-off-by: David S. Miller <davem>

Comment 5 ilyas 2010-01-16 13:06:31 UTC
This bug probably may give kernel oops: 

http://forum.openvz.org/index.php?t=rview&th=8210&goto=38493#msg_38493

I catch kernel panics on few servers after kernel upgrade.

Rollback to kernel without rt_emergency_hash_rebuild() (from file net/ipv4/route.c) solve problem.

Comment 6 Thomas Graf 2010-01-26 11:19:23 UTC
Created attachment 386804 [details]
initial revision of a patch

Backports of:
1ddbcb005c395518c2cd0df504cff3d4b5c85853
00269b54edbf25f3bb0dccb558ae23a6fc77ed86

Comment 7 Thomas Graf 2010-01-26 17:24:56 UTC
Created attachment 386883 [details]
second revision of patch

Comment 9 Konstantin Khorenko 2010-03-29 12:23:14 UTC
Thomas, please, take a look at the upstream commit cf8da764fc6959b7efb482f375dfef9830e98205
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=cf8da764fc6959b7efb482f375dfef9830e98205

This is just another bug in the same code:
net: fix length computation in rt_check_expire()

rt_check_expire() computes average and standard deviation of chain lengths,
but not correclty reset length to 0 at beginning of each chain.
This probably gives overflows for sum2 (and sum) on loaded machines instead
of meaningful results.

Signed-off-by: Eric Dumazet <dada1>
Acked-by: Neil Horman <nhorman>
Signed-off-by: David S. Miller <davem>

Please, take a look at the resulted combined patch, hope it may be useful: http://1371.bugzilla.openvz.org/attachment.cgi?id=1098

Thank you.

Comment 10 Thomas Graf 2010-08-24 08:19:55 UTC
Looks good.

Comment 11 Thomas Graf 2010-08-30 14:39:03 UTC
Created attachment 441954 [details]
updated patch

Integrated commit cf8da764fc6959b7efb482f375dfef9830e98205

Comment 13 RHEL Program Management 2010-09-07 23:19:10 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 15 Jarod Wilson 2010-09-15 13:59:59 UTC
in kernel-2.6.18-221.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 17 Dayong Tian 2010-12-06 03:31:01 UTC
I was trying to verify the bug these days. I followed your steps to reproduce the issue many times, remove the ethernet driver under high load, and also route cache had grew with huge number of different destination IPs, but couldn't trigger the issue. Any ideas? 
It's a similar issue with bug 566104, are there any other ways to trigger the issue? Thanks!

Comment 18 Dayong Tian 2010-12-08 02:06:57 UTC
No responses from Vitaliy. Added Pavel, he had ideas to reproduce bug 566104.
Hi Pavel, do you have some good ideas about how to trigger this issue? Thanks!

Comment 19 Pavel Emelyanov 2010-12-08 11:01:19 UTC
Yes - invent such ip addresses and send them in proper order, so that respective dst entries happen in one chain at desired time.

But frankly, I wouldn't try to trigger that, since it's too heavy to catch this situation.

Comment 22 Dayong Tian 2010-12-14 03:23:27 UTC
The patch was included and applied in kernel 2.6.18-236.el5:

[root@intel-s3e8132-01 SPECS]# grep 541224 kernel-2.6.spec
- [net] ipv4: fix leak, rcu and length in route cache gc (Thomas Graf) [541224]

[root@intel-s3e8132-01 SPECS]# grep -i "Patch25678" kernel-2.6.spec
Patch25678: linux-2.6-net-ipv4-fix-leak-rcu-and-length-in-route-cache-gc.patch
%patch25678 -p1

Also passed ip route function sanity checks.

Comment 24 errata-xmlrpc 2011-01-13 20:56:05 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html


Note You need to log in before you can comment on or make changes to this bug.