528525 – ipv4 route cache flushing causing unacceptable latency

Summary: ipv4 route cache flushing causing unacceptable latency

Keywords:
Status:	CLOSED DUPLICATE of bug 461655
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.4
Hardware:	All
OS:	Linux
Priority:	urgent
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Neil Horman
QA Contact:	Red Hat Kernel QE team
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2009-10-12 15:57 UTC by Casey Dahlin
Modified:	2014-06-18 08:46 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2009-10-12 19:01:33 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Description Casey Dahlin 2009-10-12 15:57:10 UTC

The customer has been noticing sudden increases in latency every 10 minutes on their production server. The cause has been identified as the periodic route cache flushing. Presently the customer is adjusting rt_secret_interval as a workaround.

This problem appears to be addressed by the commit below, and we could use some advice on backporting it.

commit 1080d709fb9d8cd4392f93476ee46a9d6ea05a5b
Author: Neil Horman <nhorman>
Date:   Mon Oct 27 12:28:25 2008 -0700

   net: implement emergency route cache rebuilds when gc_elasticity is exceeded
  
   This is a patch to provide on demand route cache rebuilding.  Currently, our
   route cache is rebuilt periodically regardless of need.  This introduced
   unneeded periodic latency.  This patch offers a better approach.  Using code
   provided by Eric Dumazet, we compute the standard deviation of the average hash
   bucket chain length while running rt_check_expire.  Should any given chain
   length grow to larger than average plus 4 standard deviations, we trigger an
   emergency hash table rebuild for that net namespace.  This allows for the common
   case in which chains are well behaved and do not grow unevenly to not incur any
   latency at all, while those systems (which may be being maliciously attacked),
   only rebuild when the attack is detected.  This patch takes 2 other factors into
   account:
   1) chains with multiple entries that differ by attributes that do not affect the
   hash value are only counted once, so as not to unduly bias the system toward
   rebuilding if features like QOS are heavily used
   2) if rebuilding crosses a certain threshold (which is adjustable via the added
   sysctl in this patch), route caching is disabled entirely for that net
   namespace, since constant rebuilding is less efficient than no caching at all
  
   Tested successfully by me.
  
   Signed-off-by: Neil Horman <nhorman>
   Signed-off-by: Eric Dumazet <dada1>
   Signed-off-by: David S. Miller <davem>
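
For readers unfamiliar with the check the commit message describes, here is a minimal user-space sketch of the "average plus 4 standard deviations" test. It is not the kernel code: the function name `chain_needs_rebuild` and the plain array of chain lengths are hypothetical, it uses floating point rather than whatever arithmetic the kernel uses, and it leaves out both extra factors from the commit message (counting entries that differ only in non-hash attributes once, and disabling caching after too many rebuilds).

```c
#include <stddef.h>

/*
 * Hypothetical illustration only; names and types are not from the kernel.
 * Returns 1 if any chain length exceeds (average + 4 * standard deviation)
 * over all chains, 0 otherwise.
 */
static int chain_needs_rebuild(const unsigned int *lengths, size_t n)
{
    double sum = 0.0, sum_sq = 0.0;
    size_t i;

    if (n == 0)
        return 0;

    /* Accumulate the sum and the sum of squares in one pass. */
    for (i = 0; i < n; i++) {
        sum += lengths[i];
        sum_sq += (double)lengths[i] * (double)lengths[i];
    }

    double avg = sum / (double)n;
    double var = sum_sq / (double)n - avg * avg;   /* population variance */
    if (var < 0.0)
        var = 0.0;                                 /* guard against rounding */

    /*
     * x > avg + 4*sd  is equivalent to  x > avg && (x - avg)^2 > 16 * var,
     * which avoids needing a square root.
     */
    for (i = 0; i < n; i++) {
        double d = (double)lengths[i] - avg;
        if (d > 0.0 && d * d > 16.0 * var)
            return 1;
    }
    return 0;
}
```

With evenly filled chains the function returns 0, so no rebuild (and no latency) would be triggered; a single chain that grows far beyond the rest, as might happen under a hash collision attack, makes it return 1.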

Comment 1 Neil Horman 2009-10-12 19:01:33 UTC

Already done as part of bz 461655. In any kernel after -139.el5 you should be able to set the secret interval to zero and stop the delays.
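
A rough sketch of setting the interval to zero from a program, for anyone who wants to script it. The helper name `write_sysctl_value` is hypothetical, and the proc path in the usage note is an assumption (this bug only names the tunable as rt_secret_interval), so check the path on the target kernel before relying on it.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write a single integer followed by a newline to
 * the given file. Returns 0 on success, -1 on any failure.
 */
static int write_sysctl_value(const char *path, long value)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (f == NULL)
        return -1;

    ok = fprintf(f, "%ld\n", value) > 0;

    /* fclose flushes the buffer, so a failed write can surface here. */
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

Usage would look like `write_sysctl_value("/proc/sys/net/ipv4/route/secret_interval", 0)`, run as root; that path is assumed, not taken from this report.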

*** This bug has been marked as a duplicate of bug 461655 ***
