Bug 485163 - (CVE-2009-0778) CVE-2009-0778 kernel: rt_cache leak leads to lack of network connectivity
Status: CLOSED ERRATA
Product: Security Response
Classification: Other
Component: vulnerability
Version: unspecified
Platform: All Linux
Priority: high  Severity: high
Assigned To: Red Hat Product Security
public=20080326,source=bugzilla,repor...
Keywords: Security
Depends On: 439670 489253 489254 549237
Blocks:
Reported: 2009-02-11 19:31 EST by Hector Herrera
Modified: 2013-03-15 12:11 EDT (History)
16 users

Doc Type: Bug Fix
Last Closed: 2010-12-21 12:58:53 EST

Attachments
ip_dst_cache (in blue) VS `route -Cn | wc -l` (in green) over time (3.02 KB, image/png), 2009-02-11 19:31 EST, Hector Herrera
ip_dst_cache (in blue) vs. `ip route ls cache | wc -l` (in green) (3.17 KB, image/png), 2009-02-12 12:27 EST, Hector Herrera
rt_cache values over an extended time period (4.09 KB, image/png), 2009-02-24 14:52 EST, Hector Herrera
stap script to help diagnose potential router leak (804 bytes, text/plain), 2009-02-25 13:58 EST, Neil Horman
Stdout of stap script (15.80 KB, application/x-gzip), 2009-02-25 21:45 EST, Hector Herrera
Stderr of stap script (8.86 KB, application/x-gzip), 2009-02-25 21:45 EST, Hector Herrera
sosreport with kernel 2.6.18-128.el5 (1.35 MB, application/octet-stream), 2009-02-27 18:51 EST, Hector Herrera
sosreport before issue (kernel 2.6.18-128) (1.36 MB, application/octet-stream), 2009-02-28 22:57 EST, Hector Herrera
sosreport after issue (kernel 2.6.18-128) (1.36 MB, application/octet-stream), 2009-02-28 22:59 EST, Hector Herrera
patch to revert a previous leak fix (976 bytes, patch), 2009-03-02 15:55 EST, Neil Horman
patch to fix the dst_entry leak (1.29 KB, patch), 2009-03-06 20:05 EST, Neil Horman


External Trackers
Tracker ID: CentOS 2990; Priority: None; Status: None; Summary: None; Last Updated: Never
Description Hector Herrera 2009-02-11 19:31:32 EST
Created attachment 331637 [details]
ip_dst_cache (in blue) VS `route -Cn | wc -l` (in green) over time

Description of problem:
The value of ip_dst_cache (in /proc/slabinfo) grows constantly, even though the cached route table remains fairly constant.  Eventually ip_dst_cache reaches the value of /proc/sys/net/ipv4/route/max_size.  When this happens, the kernel complains with 'dst cache overflow' and the server no longer responds to any network activity.

Version-Release number of selected component (if applicable): kernel 2.6.18-92.1.22.el5


How reproducible: a live system is currently being affected by this issue.


Steps to Reproduce:
1. Configure test machine as a router between two networks
2. Send packets from network A to network B with a large number of different source/destination IPs
3. Watch the values of ip_dst_cache and rt_cache
  
Actual results:
ip_dst_cache continues to grow while rt_cache grows and shrinks with the traffic

Expected results:
ip_dst_cache and rt_cache follow each other closely.  Values return to zero after traffic stops and route cache entries expire.
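One minimal way to watch the two counters from step 3 side by side is a loop like the following (an editor's sketch, not from the original report; assumes a RHEL 5-era router, root privileges, and the iproute `ip` tool):

```shell
# Compare ip_dst_cache active objects (2nd field in /proc/slabinfo)
# with the number of entries in the kernel route cache, once a minute.
while sleep 60; do
    dst=$(awk '/^ip_dst_cache / { print $2 }' /proc/slabinfo)
    rt=$(ip -o route ls cache | wc -l)
    echo "$(date '+%H:%M') ip_dst_cache=$dst rt_cache=$rt"
done
```

On an affected kernel the ip_dst_cache count climbs toward /proc/sys/net/ipv4/route/max_size while the rt_cache count merely tracks traffic.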

Additional info:
Comment 1 Hector Herrera 2009-02-12 12:23:31 EST
Leak rate is slower during higher traffic loads.
Comment 2 Hector Herrera 2009-02-12 12:27:20 EST
Created attachment 331719 [details]
ip_dst_cache (in blue) vs. `ip route ls cache | wc -l` (in green)

Higher network traffic loads are present between 17:00 and 04:00
Comment 3 Hector Herrera 2009-02-18 15:04:33 EST
Confirmed the leak is present in 2.6.18-131 (hence it's also present in 5.3).
Comment 4 Hector Herrera 2009-02-20 19:46:09 EST
Report sent to oCERT with details of remote DoS exploiting this issue.
Comment 5 Eugene Teo (Security Response) 2009-02-20 20:04:56 EST
(In reply to comment #4)
> Report sent to oCERT with details of remote DoS exploiting this issue.

Hector, is this bug considered public? I noticed the bz was created without the Security keyword. https://bugzilla.redhat.com/show_activity.cgi?id=485163

Can you please share with us the report for oCERT if it has additional information not already in this bz? Thanks.
Comment 7 Eugene Teo (Security Response) 2009-02-21 03:37:50 EST
(In reply to comment #6)
> Created an attachment (id=332786) [details]
> Report sent to oCERT with regards to this issue
> 
> Public disclosure to follow oCERT disclosure guidelines and only upon agreement
> with oCERT and reporter

Thanks Hector. The reason I asked is that this bug was first created as a public bug, and it was made private many days later. So I am not sure why this bug should be kept private if it was already public.
Comment 8 Eugene Teo (Security Response) 2009-02-21 03:40:19 EST
(In reply to comment #7)

Btw, have you tested the upstream kernel?
Comment 10 Hector Herrera 2009-02-21 14:02:40 EST
I changed it to a security bug because, once I figured out how the bug reacted to network traffic, I was able to come up with a simple remote DoS attack, which in my mind exposed vulnerable servers to a high risk.

My initial intent was to report a bug that was present on some of the servers I manage.

I haven't tested the issue with any mainline kernels.  I tried looking for a howto/guide on how to compile mainline into RHEL or CentOS but I didn't find anything too useful.  If you can give me pointers I can try it on my virtual machines.

I realize that there are probably very few servers which are vulnerable, thus making this a minor issue.  But those servers would most likely be part of network infrastructure (routers for ISPs, for example) and the effects of an attack on one of those would affect a larger user base.
Comment 13 Neil Horman 2009-02-24 13:08:47 EST
I'm starting to have second thoughts about the seriousness of this, as I look over the code and the analysis thereof.  In looking at this, you're adding a route of type RTN_UNREACHABLE to the fib, and then sending a ton of packets through the system, and observing that the number of dst cache entries on the slab is growing unboundedly while the actual number of dst entries in the route cache remains constant at or near its max value.  I agree that on the surface that looks like a leak, but the path from creating a dst entry in ip_route_input_slow, at the local_input label for routes of type RTN_UNREACHABLE, to where they are hashed into place in rt_intern_hash is very short and concise, and I don't see any way we can leak a dst entry out of there.

That said, I started to think about the data above, and it's missing a bit.  Depending on which field you looked at in /proc/slabinfo, that data could be perfectly valid.  If dst entries are freed back to their slab cache, the active number for that cache will go down, but the total number will stay until the kernel shrinks the cache, which may not happen if there is sufficient memory in the system.

Some questions:
1) If you stop traffic on the system, and wait for a gc_cycle on the router, does it become possible to pass traffic again?

2) If the answer to one is yes, can this problem be avoided by 
    a) increasing /proc/sys/net/ipv4/route/max_size
    b) lowering /proc/sys/net/ipv4/route/gc_elasticity
    c) lowering /proc/sys/net/ipv4/route/gc_interval
    d) raising /proc/sys/net/ipv4/route/gc_thresh

That should instruct the garbage collector for the route cache to be much more aggressive in its collection.
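Expressed as commands, the four knobs above live under net.ipv4.route (the values below are illustrative guesses by the editor, not recommendations from this bug; run as root):

```shell
sysctl -w net.ipv4.route.max_size=262144   # (a) raise the route cache hard limit
sysctl -w net.ipv4.route.gc_elasticity=2   # (b) lower elasticity: collect more eagerly
sysctl -w net.ipv4.route.gc_interval=30    # (c) run the garbage collector more often
sysctl -w net.ipv4.route.gc_thresh=65536   # (d) raise the threshold dst_alloc checks
```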


Bear in mind that I'm not suggesting this isn't a bug; I'm just trying to get straight about whether this is a true leak, or if perhaps it's simply a condition in which the garbage collector isn't as aggressive as it needs to be.  It's fairly clear in ip_route_input_slow that when we go to allocate a dst, if dst_alloc fails (which it will if the route cache entries are greater than gc_thresh), we will not route the frame, so I think we likely just need to make sure that if dst_alloc is going to fail, we get much more aggressive in our garbage collection.
Comment 14 Hector Herrera 2009-02-24 14:49:30 EST
The value of /proc/sys/net/ipv4/route/gc_interval is 60

I performed the following test:

(all times are in minutes)

00: Sent packet storm
01: Ceased all network traffic
14: observed the count of active routes drop to zero
14: value of ip_dst_cache is 32767
14: confirmed server is still offline by sending 1 ping
19: rt_cache count is 2 (probably from ping at time 14)
19: value of ip_dst_cache is 32768
19: confirmed server is still offline by sending 1 ping
28: value of rt_cache is 0
28: value of ip_dst_cache is 32768
28: confirmed server is still offline by sending 1 ping

I'll also attach a graph of our live router values showing the history for the last few days.
Comment 15 Hector Herrera 2009-02-24 14:52:22 EST
Created attachment 333083 [details]
rt_cache values over an extended time period

The blips on Mon, Tue and Wed are reboots.  The blue line tracks the value of ip_dst_cache, the green area tracks the value of `ip -o route ls cache | wc -l`.

When I learned that the issue was caused by the "REJECT" route, I removed the route on Thursday.  Since then the value of ip_dst_cache has remained constant.
Comment 16 Neil Horman 2009-02-24 15:24:39 EST
What value are you looking at when you say that ip_dst_cache is 32767?  Is it the first column or the second?  I would not be surprised if the second column stays high; in fact it should, until there is a good deal of memory pressure on the system and the cache needs to shrink.  If that's the first column, on the other hand, then yes, that seems to be an issue, especially if the route cache entries go to 2.  That would suggest that you do in fact have a leak somewhere.
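For reference, the two columns in question are the first two numeric fields after the cache name in /proc/slabinfo (active objects vs. total objects); a quick way to print both (an editor's sketch):

```shell
# Print active vs. total object counts for the ip_dst_cache slab.
awk '/^ip_dst_cache / { printf "active=%s total=%s\n", $2, $3 }' /proc/slabinfo
```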

What would be really helpful would be if you could capture and submit a vmcore, so that I could look through the route cache by hand.  Also, when you sent those pings through the system, did you continue to get the dst cache overflow messages?  That would be of great interest.  If you didn't, then you hit an existing route in the table, and the frame should have been processed further, which in turn suggests this might not be a routing problem.  It would be good if, after you performed the above test, you captured /proc/net/snmp.  That would give us a better idea of where you were dropping frames.

In summary:
1) details about how you are tracking ip_dst_cache size, so that we can confirm that active objects are or are not being reclaimed.
2) a vmcore if possible, so that I can look through the kernel memory image by hand, and get some idea of where these errant dst entries are living.
3) capture of /proc/net/snmp after the test in comment #14 is performed, so that we can get a better idea of where these frames are getting dropped.

Also, how are you generating your various input frames?  Are you using a hw solution like an Ixia or SmartBits box, or are you doing it in software?  Just trying to get an idea of the volume and variation of traffic I need to start working on a reproducer here.  Thanks.
Comment 17 Hector Herrera 2009-02-24 15:57:48 EST
If you look at the report I sent to oCERT (it's attached to this bug), you will find details on the setup I'm using to test as well as my test scripts.

The packet generator is the pktgen kernel module configured to send randomized src IPs to random destination IPs inside the "REJECT"'d route.

I'm looking at the first value of ip_dst_cache.

When I ping the server and it doesn't respond, I do see a 'dst cache overflow' message, precisely one console message for each packet received, no matter the source or destination IP.

I'll perform the test again and obtain a vmcore and dump of /proc/net/snmp and post it here once it's done.
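For readers without access to the oCERT report, a pktgen configuration along the lines Hector describes would look roughly like the following (an editor's sketch; the interface name and address ranges are hypothetical, and it requires root and a kernel with pktgen support):

```shell
modprobe pktgen
PG=/proc/net/pktgen
echo "add_device eth1" > $PG/kpktgend_0        # bind eth1 to pktgen thread 0
echo "count 100000"    > $PG/eth1              # packets per run
echo "flag IPSRC_RND"  > $PG/eth1              # randomize source addresses
echo "flag IPDST_RND"  > $PG/eth1              # randomize destination addresses
echo "src_min 10.0.0.1"        > $PG/eth1
echo "src_max 10.0.255.254"    > $PG/eth1
echo "dst_min 192.168.100.1"   > $PG/eth1      # range inside the REJECT'd route
echo "dst_max 192.168.100.254" > $PG/eth1
echo "start" > $PG/pgctrl                      # begin transmitting
```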
Comment 18 Neil Horman 2009-02-24 16:27:27 EST
Ok, thanks.  It'll take me a few days to get the systems together to set up a reproducer, but I think that's going to be the best way to figure out exactly what's going on here.
Comment 20 Neil Horman 2009-02-25 13:58:33 EST
Created attachment 333204 [details]
stap script to help diagnose potential router leak

hey there, while I'm getting the systems together to recreate this, would you mind running this systemtap script on your router there, and providing me with the output?  It would help me diagnose the problem I think.  Note it will fill up your console with messages and may reduce your performance a bit.  Thanks!
Comment 21 Hector Herrera 2009-02-25 21:45:03 EST
Created attachment 333266 [details]
Stdout of stap script

Executed the stap script as requested, split the stdout and stderr into two files.  stap.1.gz is stdout, stap.2.gz is stderr.
Comment 22 Hector Herrera 2009-02-25 21:45:55 EST
Created attachment 333267 [details]
Stderr of stap script

Executed the stap script as requested, split the stdout and stderr into two files.  stap.1.gz is stdout, stap.2.gz is stderr.
Comment 23 Neil Horman 2009-02-27 13:00:00 EST
So, I've got my reproduction environment set up here, and I've got some bad news of sorts: it's working just fine.  I'm using RHEL 5.3, with the -128.el5 kernel, and your script from your oCERT submission.  I'm repeatedly sending 100000 packets over and over again (about every 10 seconds), and when I check the active objects in the ip_dst_cache against the number of cached routes, they always track fairly closely against both each other and the number of packets I'm sending (as I would expect).  Then when I discontinue packet sends to the unreachable route, and let the router quiesce, both the slab cache and the route cache quickly shrink again, also as I would expect.  So it would seem that there is some more subtle nuance in your system that is triggering this issue.  Can you send me a sysreport of your system?  I'd like to compare your tunables to mine to see if anything is off between our setups.
Comment 24 Hector Herrera 2009-02-27 18:51:21 EST
Created attachment 333557 [details]
sosreport with kernel 2.6.18-128.el5

I got a hold of kernel-2.6.18-128.el5 and I can still reproduce the bug with it.
Comment 25 Neil Horman 2009-02-28 19:10:07 EST
Thank you.  Was this taken on the system after the problem had occurred, or prior to it?  It looks like it was taken prior, but I'd like to be sure.  When the problem does happen, do you see any stats change in /proc/net/snmp, netstat, or ifconfig?
Comment 26 Hector Herrera 2009-02-28 22:55:30 EST
The sosreport was taken after the issue appeared.
Comment 27 Hector Herrera 2009-02-28 22:57:17 EST
Created attachment 333635 [details]
sosreport before issue (kernel 2.6.18-128)

sosreport immediately after a reboot.
Comment 28 Hector Herrera 2009-02-28 22:59:51 EST
Created attachment 333636 [details]
sosreport after issue (kernel 2.6.18-128)

sosreport taken after 'dst cache overflow' messages appear.  `ip route ls cache | wc -l` has returned to zero and ip_dst_cache is 32767.  At the time this report was taken, any network packet received by the system produced a 'dst cache overflow' message on the console.
Comment 29 Neil Horman 2009-03-02 09:26:23 EST
Note to self: found something interesting; the problem seems to hinge on the addition of a default route.  I've now managed to reproduce the problem, and I can only do it if I add a default route to the route cache (as per the oCERT docs).  Previously I had forgotten to add a default route, and everything worked like a charm: the slab cache and route cache grew and shrank as you would expect.  But as soon as we added a default route, the cache filled up and overflowed.  The route cache shrinks again, and it appears the slab cache is shrinking as well, although I need to monitor it to see if it returns to its expected size in correlation with the route cache.

It's interesting to note that even after the route cache is back to a steady state size for my environment (between 2 and 10 routes), I don't see dst cache overflow messages, but I do see Neighbour table overflow messages, indicating something has gone wrong with the arp table as well (perhaps the overuse of src macs with multiple IP sources via pktgen, I'm not sure).

I'm getting the feeling that we're not looking up the reject route properly when we have a default gw route (it's like the default gw is at a higher priority or something).  I'll try a few tests and update again this afternoon.
Comment 30 Neil Horman 2009-03-02 10:48:46 EST
Note to self: Just tried to reproduce with 2.6.18-8.el5 and was unable to, so this problem was introduced sometime during one of the RHEL update cycles.  I'm going over the changelog now to see if anything stands out
Comment 31 Neil Horman 2009-03-02 15:55:46 EST
Created attachment 333786 [details]
patch to revert a previous leak fix

Hey, would you please build a kernel with this patch?  I'm going through our RHEL5 changelog and found this patch.  I'm not sure, but I think it's keeping my system up here (it reverts a fix for another leak, so you'll still see some leaked entries, which is why I'm not sure of it), but I seem able to keep my system up with this patch.  It's not a final fix, since it reverts a previous change that fixed a leak, but I'd like to confirm that it does something for you too.  Thanks!
Comment 32 Eugene Teo (Security Response) 2009-03-02 22:06:26 EST
(In reply to comment #31)
> Created an attachment (id=333786) [details]
> patch to revert a previous leak fix
> 
> Hey, would you please build a kernel with this patch.  I'm going through our
> RHEL5 changelog and found this patch, I'm not sure, but I think its keeping my
> system up here (it reverts a fix for another leak, so you'll still seem some
> leaked entries, which is why I'm not sure of it), but I seem able to keep my
> system up with this patch here.  Its not a final fix, since it does revert a
> previous change that fixed a leak, but I'd like to confirm that it does
> something for you too.  Thanks!

I'm building the kernel with this patch for Hector. I will post the rpm soon.
Comment 33 Eugene Teo (Security Response) 2009-03-03 00:22:55 EST
(In reply to comment #32)
> I'm building the kernel with this patch for Hector. I will post the rpm soon.

Hector, you can download them here: http://people.redhat.com/eteo/485163/
Comment 34 Hector Herrera 2009-03-03 03:26:59 EST
Thank you, I was still getting my build environment setup when I saw the posting by Eugene.

I downloaded the kernel-2.6.18-131.el5.bz485163.rpm (and kernel-devel too) and installed both.  Rebooted to the new kernel, then proceeded to follow my usual testing process.

At time 0, I flooded the server until I received dst cache overflow messages.  After a few seconds the values of rt_cache and ip_dst_cache settled at 32767.

After 5 minutes, both rt_cache and ip_dst_cache dropped to 32766.

After another 8 minutes, rt_cache dropped to zero and ip_dst_cache remained at 32766.  A ping test at this time showed that the server was unreachable from the network.  Each ping packet caused one dst cache overflow message.

Would you like sosreports for this kernel test?
Comment 35 Neil Horman 2009-03-03 06:31:37 EST
No, thank you, it's not going to show me anything.  I can't understand why my cache is leaking so much less than yours.

By the way, when I reproduce this, I occasionally get Neighbour table overflow indications rather than dst cache overflow notifications (which, odd as it may seem, are both attributable to route cache overflows).  Do you see those, or are yours strictly dst cache overflow messages?
Comment 36 Hector Herrera 2009-03-03 13:09:34 EST
I only see 'dst cache overflow' messages.  I checked /var/log/messages but I found nothing else.
Comment 37 Neil Horman 2009-03-04 13:16:29 EST
ok, making slow progress here

I'm able to observe the leak with a single packet.  If I send in a single packet using the pktflood script from the oCERT document, and then flush the cache via /proc/sys/net/ipv4/route/flush, the route cache is cleaned but a single slab cache entry remains active.

I've instrumented the kernel and found that on flush, both of the route cache entries that are added (one from the host to the unreachable network via lo, and another from the local router interface to the sending system) are removed.  That tells me we're dealing with an out-of-whack ref count when entering rt_intern_hash.  That's good progress.  I'll report more soon.
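The single-packet check described above can be scripted roughly as follows (an editor's sketch; a single ping stands in for the pktflood script, the destination address is hypothetical, and it must be run as root on the router):

```shell
# Count ip_dst_cache active objects before and after one packet plus a flush.
before=$(awk '/^ip_dst_cache / { print $2 }' /proc/slabinfo)
ping -c 1 -W 1 192.168.100.1 >/dev/null 2>&1   # one packet toward the unreachable net
echo 1 > /proc/sys/net/ipv4/route/flush        # flush the route cache
sleep 2
after=$(awk '/^ip_dst_cache / { print $2 }' /proc/slabinfo)
echo "dst entries left behind: $((after - before))"
```

On an affected kernel the difference stays above zero even though the route cache itself is empty after the flush.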
Comment 38 Eugene Teo (Security Response) 2009-03-05 09:14:14 EST
Update:
Neil is able to reproduce the leak in single-packet quantities using the setup described in the oCERT doc. It requires exactly what the setup says: an unreachable route and a default route.

It is definitely a remote DoS attack, as anyone with the described configuration can have their router's ability to forward traffic terminated. It is possible to work around this issue until it is fixed, including the loopback route mentioned in the ticket, as well as an iptables solution.
Comment 39 Andrea Barisani 2009-03-06 04:11:14 EST
Hi everybody,

am I correct to assume that with a recent kernel this issue is not present? I'm trying to understand if it deserves a ping to the kernel maintainers as well (they would likely ignore it if it affects only 2.6.18, which is quite old).

Other than that, while the full impact discussion might not be public, the bug itself is. Considering that there is a workaround, I think this bug deserves to be opened (and, if we feel it necessary, a preliminary advisory could be released). Hoping of course that a patch will be available asap.

Thoughts?
Comment 41 Neil Horman 2009-03-06 14:22:49 EST
Almost certain I've found the problem.  It's fixed in upstream commit 7c0ecc4c4f8fd90988aab8a95297b9c0038b6160.  The problem is that we fail to dst_release the result of a route lookup when sending the icmp host unreachable response to frames that are directed along the unreachable route.  I'm building a test kernel for it now that we can all use to verify it.
Comment 42 Neil Horman 2009-03-06 20:05:39 EST
Created attachment 334377 [details]
patch to fix the dst_entry leak

I've confirmed it: the backport patch I'm attaching here solves the leak.  I've got an x86_64 build if you would like to play with it:
http://people.redhat.com/nhorman/rpms/kernel-2.6.18-133.el5.bz485163.x86_64.rpm
I'll post this for inclusion Monday afternoon.  Please test the kernel and make sure it solves the problem for you as well.  If I don't hear from you by Monday afternoon, I'll go ahead and post.
Comment 43 Eugene Teo (Security Response) 2009-03-09 00:16:20 EDT
(In reply to comment #42)
> Created an attachment (id=334377) [details]
> patch to fix the dst_entry leak
> 
> I've confirmed it, the backport patch I'm attaching here solves the leak.  I've
> got an x86_64 build if you would like to play with it:
> http://people.redhat.com/nhorman/rpms/kernel-2.6.18-133.el5.bz485163.x86_64.rpm
> I'll post this for inclusion monday afternoon.  Please test the kernel and make
> sure if solves the problem for you as well.  If I don't hear from you by monday
> afternoon, I'll go ahead and post.  

Or if you prefer the 32-bit rpms:
http://people.redhat.com/eteo/485163b/kernel-2.6.18-134.el5.bz485163b.i686.rpm

Please test the kernel to ensure that the issue is resolved.
Comment 48 Hector Herrera 2009-03-09 14:00:42 EDT
The issue does not appear in kernel-2.6.18-134.el5.bz485163b.i686.rpm.  Thank you all.  Andrea, I'll leave it up to oCERT and the Red Hat security team to determine if and how this issue is to be disclosed.
Comment 49 Eugene Teo (Security Response) 2009-03-09 21:43:50 EDT
(In reply to comment #48)
> The issue does not appear in kernel-2.6.18-134.el5.bz485163b.i686.rpm  Thank
> you all.  Andrea, I'll leave it up to oCERT and the Red Hat security team to
> determine if and how this issue is to be disclosed.  

Thanks Hector!

Eugene
Comment 52 Andrea Barisani 2009-03-10 18:38:31 EDT
Thanks from oCERT too Hector.

Eugene, do you feel oCERT should release an advisory about this? Anyone else we need to reach out/investigate if affected?
Comment 53 Eugene Teo (Security Response) 2009-03-10 20:48:28 EDT
(In reply to comment #52)
> Thanks from oCERT too Hector.
> 
> Eugene, do you feel oCERT should release an advisory about this? Anyone else we
> need to reach out/investigate if affected?  

Andrea, if you want to, it's fine with us. But I will send out a note in oss-security@ anyway with or without the advisory. I am not sure who else is affected.
Comment 54 Eugene Teo (Security Response) 2009-03-19 00:12:25 EDT
CVSS2 score of high, 7.1 (AV:N/AC:M/Au:N/C:N/I:N/A:C)
Comment 55 errata-xmlrpc 2009-04-01 04:30:47 EDT
This issue has been addressed in following products:

  Red Hat Enterprise Linux 5

Via RHSA-2009:0326 https://rhn.redhat.com/errata/RHSA-2009-0326.html
Comment 58 errata-xmlrpc 2010-02-02 16:01:30 EST
This issue has been addressed in following products:

  Red Hat Enterprise Linux 5.2 Z Stream

Via RHSA-2010:0079 https://rhn.redhat.com/errata/RHSA-2010-0079.html
