Bug 528525

Summary: ipv4 route cache flushing causing unacceptable latency
Product: Red Hat Enterprise Linux 5
Reporter: Casey Dahlin <cdahlin>
Component: kernel
Assignee: Neil Horman <nhorman>
Status: CLOSED DUPLICATE
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high
Priority: urgent
Version: 5.4
CC: tao, vanhoof
Target Milestone: rc
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-10-12 19:01:33 UTC

Description Casey Dahlin 2009-10-12 15:57:10 UTC
The customer has been noticing sudden increases in latency every 10 minutes on their production server. The cause has been identified as the periodic route cache flushing. Presently the customer is adjusting rt_secret_interval as a workaround.

This problem appears to be addressed by the upstream commit quoted here, and we would appreciate advice on backporting it.

commit 1080d709fb9d8cd4392f93476ee46a9d6ea05a5b
Author: Neil Horman <nhorman@tuxdriver.com>
Date:   Mon Oct 27 12:28:25 2008 -0700

   net: implement emergency route cache rebuilds when gc_elasticity is exceeded
  
   This is a patch to provide on demand route cache rebuilding.  Currently, our
   route cache is rebuilt periodically regardless of need.  This introduces
   unneeded periodic latency.  This patch offers a better approach.  Using code
   provided by Eric Dumazet, we compute the standard deviation of the average hash
   bucket chain length while running rt_check_expire.  Should any given chain
   length grow to larger than average plus 4 standard deviations, we trigger an
   emergency hash table rebuild for that net namespace.  This allows for the common
   case in which chains are well behaved and do not grow unevenly to not incur any
   latency at all, while those systems (which may be being maliciously attacked)
   only rebuild when the attack is detected.  This patch takes 2 other factors into
   account:
   1) chains with multiple entries that differ by attributes that do not affect the
   hash value are only counted once, so as not to unduly bias the system toward
   rebuilding if features like QoS are heavily used
   2) if rebuilding crosses a certain threshold (which is adjustable via the added
   sysctl in this patch), route caching is disabled entirely for that net
   namespace, since constant rebuilding is less efficient than no caching at all
  
   Tested successfully by me.
  
   Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
   Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
   Signed-off-by: David S. Miller <davem@davemloft.net>
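
For readers weighing the backport, the heuristic described in the commit message can be sketched in userspace C. This is only an illustrative model under stated assumptions, not the kernel code: the names chain_too_long, ns_cache_state and note_emergency_rebuild are hypothetical, the floor_len parameter is assumed to play the role of gc_elasticity, and floating point is used for clarity where kernel code would use integer arithmetic.

```c
#include <stddef.h>

/*
 * Hypothetical userspace model of the check described in the commit
 * message. Returns 1 if `length` exceeds both `floor_len` (assumed to
 * stand in for gc_elasticity) and mean + 4 standard deviations of the
 * sampled chain lengths, otherwise 0.
 */
int chain_too_long(const unsigned *samples, size_t n,
                   unsigned length, unsigned floor_len)
{
    double sum = 0.0, sum2 = 0.0, avg, var, d;
    size_t i;

    if (n == 0 || length <= floor_len)
        return 0;

    for (i = 0; i < n; i++) {
        sum += samples[i];
        sum2 += (double)samples[i] * samples[i];
    }
    avg = sum / n;
    var = sum2 / n - avg * avg;   /* population variance */
    if (var < 0.0)
        var = 0.0;                /* guard against rounding error */

    /* length > avg + 4*sd, written without sqrt: d > 0 and d^2 > 16*var */
    d = length - avg;
    return d > 0.0 && d * d > 16.0 * var;
}

/* Hypothetical per-namespace state for the rebuild threshold (factor 2). */
struct ns_cache_state {
    unsigned rebuilds;        /* emergency rebuilds seen so far */
    unsigned rebuild_limit;   /* assumed tunable, like the added sysctl */
    int caching_enabled;      /* cleared once the limit is crossed */
};

/* Record one emergency rebuild; disable caching past the limit. */
void note_emergency_rebuild(struct ns_cache_state *s)
{
    s->rebuilds++;
    if (s->rebuilds > s->rebuild_limit)
        s->caching_enabled = 0;
}
```

In this model a well-behaved table never trips the check, so it pays no rebuild cost, while a namespace that keeps tripping it eventually stops caching altogether.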

Comment 1 Neil Horman 2009-10-12 19:01:33 UTC
Already done as part of bz 461655. In any kernel after -139.el5 you should be able to tune the secret interval to zero and stop the delays.
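
A minimal sketch of applying that setting from a program, assuming the secret interval is exposed as the net.ipv4.route.secret_interval sysctl at the procfs path in the code. The helper name write_sysctl_int is hypothetical, and writing to the real path requires root:

```c
#include <stdio.h>

/* Assumed procfs path for the net.ipv4.route.secret_interval sysctl. */
#define SECRET_INTERVAL_PATH "/proc/sys/net/ipv4/route/secret_interval"

/*
 * Hypothetical helper: write an integer to a sysctl-style file.
 * Returns 0 on success, -1 if the file cannot be opened or written.
 */
int write_sysctl_int(const char *path, int value)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (f == NULL)
        return -1;
    ok = fprintf(f, "%d\n", value) > 0;
    if (fclose(f) != 0)   /* write errors may only surface on close */
        ok = 0;
    return ok ? 0 : -1;
}
```

Calling write_sysctl_int(SECRET_INTERVAL_PATH, 0) as root would then request an interval of zero.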

*** This bug has been marked as a duplicate of bug 461655 ***