Description of problem:
On heavily loaded servers a dst_entry can be leaked, and the following messages appear during network driver removal:

unregister_netdevice: waiting for eth0 to become free. Usage count = 3
unregister_netdevice: waiting for eth0 to become free. Usage count = 3
unregister_netdevice: waiting for eth0 to become free. Usage count = 3

Version-Release number of selected component (if applicable):
kernel-2.6.18-164.6.1.el5

How reproducible:
A few

Steps to Reproduce:
1. Open many connections to the server from different clients.
2. Try to remove the Ethernet driver.

Additional info:
The bug is in linux-2.6-net-allow-for-on-demand-emergency-route-cache-flushing.patch:

+		 */
+		if (*rthp && compare_hash_inputs(&(*rthp)->fl, &rt->fl))
+			rthi = rth;
 	}

 	if (cand) {
@@ -989,6 +1088,16 @@ restart:
 		*candp = cand->u.rt_next;
 		rt_free(cand);
 		^^^^^^^^^^^^^^^^^^^^^^
 		Here 'cand' can be equal to 'rthi'. Therefore 'rt' will be
 		linked after an entry that is no longer in the hash chain.
 	}
 	...
 	}

 	/* Try to bind route to arp only if it is output
@@ -1026,7 +1135,11 @@ restart:
 		}
 	}

-	rt->u.rt_next = rt_hash_table[hash].chain;
+	if (rthi)
+		rt->u.rt_next = rthi->u.rt_next;
+	else
+		rt->u.rt_next = rt_hash_table[hash].chain;
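To make the failure mode concrete, here is a minimal userspace sketch in C of the insertion logic. It is not the kernel code: struct entry, insert_entry(), reachable(), the freed marker and the guard flag are hypothetical names used only for illustration, and the step that links rthi to the new entry is assumed to mirror what upstream 2.6.29 rt_intern_hash() does. The guard shows one possible way to avoid the leak (forget rthi when it equals cand); the attached patches may do something different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rtable; only the chain link matters. */
struct entry {
    struct entry *next;
    bool freed; /* set instead of really freeing, so the result can be inspected */
};

/* Returns true if target can be reached by walking the chain from head. */
static bool reachable(const struct entry *head, const struct entry *target)
{
    for (const struct entry *e = head; e != NULL; e = e->next)
        if (e == target)
            return true;
    return false;
}

/*
 * Simplified tail of the insertion path: unlink cand (if any), then link
 * rt after rthi (if any) or at the chain head. When guard is true, rthi
 * is dropped if it is the entry that is about to be removed.
 */
static void insert_entry(struct entry **head, struct entry **candp,
                         struct entry *cand, struct entry *rthi,
                         struct entry *rt, bool guard)
{
    assert(head != NULL && rt != NULL);

    if (cand) {
        if (guard && rthi == cand)
            rthi = NULL;
        *candp = cand->next;  /* unlink cand from the chain */
        cand->freed = true;   /* rt_free(cand) in the kernel */
    }

    if (rthi) {
        rt->next = rthi->next;
        rthi->next = rt;      /* if rthi was cand, rt hangs off an unlinked entry */
    } else {
        rt->next = *head;
        *head = rt;
    }
}

/*
 * Builds the chain a -> b -> c, uses b as both cand and rthi, inserts rt,
 * and reports whether rt is still reachable from the chain head.
 */
static bool rt_reachable_after_insert(bool guard)
{
    struct entry c = { NULL, false };
    struct entry b = { &c, false };
    struct entry a = { &b, false };
    struct entry rt = { NULL, false };
    struct entry *head = &a;

    insert_entry(&head, &a.next, &b, &b, &rt, guard);
    return reachable(head, &rt);
}
```

Calling rt_reachable_after_insert(false) models the buggy path: the new entry is linked behind the removed one, so nothing in the hash chain points to it and its device reference is never dropped. With the guard enabled the entry goes to the chain head and stays reachable.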
Created attachment 373725 [details] fix-dst-entry-leak.patch Patch that fixes this bug
Created attachment 373730 [details] fix-dst-entry-leak2.patch
Patch that fixes this bug.

Changes: don't reset rthi in rt_emergency_hash_rebuild(), because:
1. rt_emergency_hash_rebuild() in RHEL causes a deadlock (on access to rt_hash_lock_addr(hash)).
2. If rt_emergency_hash_rebuild() is fixed, rthi does not need to be reset, as the fixed version doesn't rebuild the hash immediately.
Mainstream kernel reverted this patch. See commit 1ddbcb005c395518c2cd0df504cff3d4b5c85853:

commit 1ddbcb005c395518c2cd0df504cff3d4b5c85853
Author: Eric Dumazet <dada1>
Date:   Tue May 19 20:14:28 2009 +0000

    net: fix rtable leak in net/ipv4/route.c

    Alexander V. Lukyanov found a regression in 2.6.29 and made a complete
    analysis found in http://bugzilla.kernel.org/show_bug.cgi?id=13339
    Quoted here because its a perfect one :

    begin_of_quotation
     2.6.29 patch has introduced flexible route cache rebuilding. Unfortunately the
     patch has at least one critical flaw, and another problem.

     rt_intern_hash calculates rthi pointer, which is later used for new entry
     insertion. The same loop calculates cand pointer which is used to clean the
     list. If the pointers are the same, rtable leak occurs, as first the cand is
     removed then the new entry is appended to it.

     This leak leads to unregister_netdevice problem (usage count > 0).

     Another problem of the patch is that it tries to insert the entries in certain
     order, to facilitate counting of entries distinct by all but QoS parameters.
     Unfortunately, referencing an existing rtable entry moves it to list beginning,
     to speed up further lookups, so the carefully built order is destroyed.

     For the first problem the simplest patch it to set rthi=0 when rthi==cand, but
     it will also destroy the ordering.
    end_of_quotation

    Problematic commit is 1080d709fb9d8cd4392f93476ee46a9d6ea05a5b
    (net: implement emergency route cache rebulds when gc_elasticity is exceeded)

    Trying to keep dst_entries ordered is too complex and breaks the fact that
    order should depend on the frequency of use for garbage collection.

    A possible fix is to make rt_intern_hash() simpler, and only makes
    rt_check_expire() a litle bit smarter, being able to cope with an arbitrary
    entries order. The added loop is running on cache hot data, while cpu
    is prefetching next object, so should be unnoticied.

    Reported-and-analyzed-by: Alexander V. Lukyanov <lav>
    Signed-off-by: Eric Dumazet <dada1>
    Acked-by: Neil Horman <nhorman>
    Signed-off-by: David S. Miller <davem>
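The idea of letting rt_check_expire() cope with an arbitrary entry order can be sketched in userspace C roughly as follows. This is only an illustration under stated assumptions: struct entry, same_hash_inputs() and distinct_chain_length() are hypothetical names, and the two address fields are simplified stand-ins for the flow keys that the kernel's compare_hash_inputs() actually compares. An entry is counted only if no earlier entry in the same chain has the same hash inputs, so the result does not depend on where the duplicates sit.

```c
#include <stddef.h>

/* Hypothetical, simplified chain entry. */
struct entry {
    struct entry *next;
    unsigned int daddr; /* stand-in for a hash input */
    unsigned int saddr; /* stand-in for a hash input */
    unsigned int tos;   /* attribute that is not a hash input (QoS) */
};

/* Simplified stand-in for the kernel's compare_hash_inputs(). */
static int same_hash_inputs(const struct entry *a, const struct entry *b)
{
    return a->daddr == b->daddr && a->saddr == b->saddr;
}

/*
 * Counts the entries of a chain, counting entries with equal hash inputs
 * only once. For each entry, the chain is rescanned from the head: if the
 * scan reaches the entry itself first, it is the first of its kind and is
 * counted; if an equal entry is found earlier, it is skipped.
 */
static unsigned int distinct_chain_length(const struct entry *head)
{
    unsigned int length = 0;

    for (const struct entry *e = head; e != NULL; e = e->next) {
        for (const struct entry *aux = head; ; aux = aux->next) {
            if (aux == e) {
                length++;
                break;
            }
            if (same_hash_inputs(aux, e))
                break;
        }
    }
    return length;
}

/* Five entries in interleaved order, three distinct sets of hash inputs. */
static unsigned int demo_distinct_length(void)
{
    struct entry e4 = { NULL, 5, 6, 0 };
    struct entry e3 = { &e4, 3, 4, 16 };
    struct entry e2 = { &e3, 1, 2, 8 };
    struct entry e1 = { &e2, 3, 4, 0 };
    struct entry e0 = { &e1, 1, 2, 0 };

    return distinct_chain_length(&e0);
}
```

The inner rescan is quadratic in the chain length, which is acceptable only because chains are expected to be short; that matches the commit's remark that the added loop runs on cache-hot data.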
This bug may also cause a kernel oops:
http://forum.openvz.org/index.php?t=rview&th=8210&goto=38493#msg_38493

I hit kernel panics on a few servers after a kernel upgrade. Rolling back to a kernel without rt_emergency_hash_rebuild() (in net/ipv4/route.c) solved the problem.
Created attachment 386804 [details] initial revision of a patch
Backports of:
1ddbcb005c395518c2cd0df504cff3d4b5c85853
00269b54edbf25f3bb0dccb558ae23a6fc77ed86
Created attachment 386883 [details] second revision of patch
Thomas, please take a look at the upstream commit cf8da764fc6959b7efb482f375dfef9830e98205:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=cf8da764fc6959b7efb482f375dfef9830e98205

This is another bug in the same code:

    net: fix length computation in rt_check_expire()

    rt_check_expire() computes average and standard deviation of chain lengths,
    but not correclty reset length to 0 at beginning of each chain.
    This probably gives overflows for sum2 (and sum) on loaded machines instead
    of meaningful results.

    Signed-off-by: Eric Dumazet <dada1>
    Acked-by: Neil Horman <nhorman>
    Signed-off-by: David S. Miller <davem>

Please also take a look at the resulting combined patch; I hope it is useful:
http://1371.bugzilla.openvz.org/attachment.cgi?id=1098

Thank you.
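A small userspace sketch in C shows why the missing reset matters. It is a simplification under stated assumptions: struct chain_stats, accumulate() and the demo helpers are hypothetical names, chain lengths are passed in as plain integers, and the fixed-point arithmetic used by the kernel is left out. Without the reset, each chain's length includes all previously scanned chains, so sum and especially sum2 grow far faster than the real values.

```c
#include <stddef.h>

/* Running totals used for the average and standard deviation. */
struct chain_stats {
    unsigned long sum;  /* sum of chain lengths */
    unsigned long sum2; /* sum of squared chain lengths */
};

/*
 * Accumulates statistics over nchains chains. chain_len[i] is the number
 * of entries in chain i. If reset_per_chain is zero, the running length
 * is carried over from one chain to the next, as in the buggy code.
 */
static struct chain_stats accumulate(const unsigned int *chain_len,
                                     size_t nchains, int reset_per_chain)
{
    struct chain_stats s = { 0, 0 };
    unsigned long length = 0;

    for (size_t i = 0; i < nchains; i++) {
        if (reset_per_chain)
            length = 0; /* the fix: start each chain from zero */

        /* Walking the chain: one increment per entry. */
        for (unsigned int k = 0; k < chain_len[i]; k++)
            length++;

        s.sum += length;
        s.sum2 += length * length;
    }
    return s;
}

/* Three chains with 2, 3 and 1 entries. */
static unsigned long demo_sum(int reset_per_chain)
{
    const unsigned int lens[] = { 2, 3, 1 };
    return accumulate(lens, 3, reset_per_chain).sum;
}

static unsigned long demo_sum2(int reset_per_chain)
{
    const unsigned int lens[] = { 2, 3, 1 };
    return accumulate(lens, 3, reset_per_chain).sum2;
}
```

For these three chains the correct totals are sum = 6 and sum2 = 14, while the carried-over length gives sum = 13 and sum2 = 65. On a machine with many long chains the same effect compounds, which is consistent with the overflow the commit message describes.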
Looks good.
Created attachment 441954 [details] updated patch Integrated commit cf8da764fc6959b7efb482f375dfef9830e98205
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
in kernel-2.6.18-221.el5

You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.
I have been trying to verify this bug over the last few days. I followed your steps to reproduce many times: I removed the Ethernet driver under high load, with the route cache grown to a huge number of different destination IPs, but couldn't trigger the issue. Any ideas? This looks similar to bug 566104. Are there any other ways to trigger it? Thanks!
No responses from Vitaliy. Added Pavel, he had ideas to reproduce bug 566104. Hi Pavel, do you have some good ideas about how to trigger this issue? Thanks!
Yes: choose IP addresses and send packets to them in a particular order, so that the respective dst entries land in the same hash chain at the desired time. But frankly, I wouldn't try to trigger that, since this situation is too hard to catch.
The patch was included and applied in kernel 2.6.18-236.el5:

[root@intel-s3e8132-01 SPECS]# grep 541224 kernel-2.6.spec
- [net] ipv4: fix leak, rcu and length in route cache gc (Thomas Graf) [541224]
[root@intel-s3e8132-01 SPECS]# grep -i "Patch25678" kernel-2.6.spec
Patch25678: linux-2.6-net-ipv4-fix-leak-rcu-and-length-in-route-cache-gc.patch
%patch25678 -p1

Also passed ip route function sanity checks.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html