From Bugzilla Helper: User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0 Description of problem: A firewall box I manage was dropping 90% of packets today. I managed to issue a reboot over ssh. It never came back up. I went onsite and found a panic screen (I'll attach the panic output). The only weird things in the logs are lots of "kernel: dst cache overflow" starting about 4 days ago and occurring a few times every 5 minutes or so. They seem to coincide with a watchdog script of mine (the cache error occurs 1 second after my watchdog starts nmap). I have a script that runs "nmap -S 192.168.100.1 -sP -PE 192.168.100.2 192.168.100.100-254" every few minutes to make sure that other (XP) machines on the LAN are reachable. I know that causes a lot of conntrack entries to be made. I'm not sure if this is related. This bug appears to be similar to bug 149427, bug 138040 and bug 64472. My kernel is not tainted with non-FC modules. However, the kernel I run has had one extra patch applied: the patch that fixes NAT over 2.6 native ipsec (bug 143374). Other than that it is a pure stock kernel. I can't easily "just upgrade to the latest kernel" as the patch doesn't apply to 2.6.11 yet. Version-Release number of selected component (if applicable): kernel-2.6.10-1.766_FC3 How reproducible: Didn't try Additional info:
Oh, I should also say that this box and its sister (other side of the ipsec VPN) have been working 100% fine with this kernel and setup for many months now. This is the first time this has happened.
- You didn't mention which kernel driver. - You should retest it with the latest FC3 kernel update, and try to replicate the conditions that caused the panic.
How do I tell which "kernel driver"? I have the panic text I handwrote on paper I will transcribe shortly, if that helps. I'll eventually get the latest FC3 kernel in once I can find the patch I need in a version that works on 2.6.11, or when the kernel guys get to putting the patch in the mainstream. My boxes cannot function without the patch as I *need* NAT through ipsec. The box (and many others that are almost identical) never once showed this behaviour before. I think it's a mem leak related thing since it seems to have occurred only once the uptime got very long and the box was all of a sudden put under relatively high network ipsec loads.
If it is hand written, then it is likely missing most of the beginning of the panic dump and wont be useful at all. It is unlikely that RH can help you with your problem because it is extremely rare, and this is not nearly enough information to possibly diagnose the problem. You should go to upstream kernel.org mailing lists and bugzilla for help.
If you are applying the huge NAT patch I think you are, you are asking a lot from us to debug this with that patch applied. Please tell us exactly what patch you have applied on top of the stock kernel tree. If it's huge and invasive, you're going to be on your own, sorry.
We have recently started to see this same problem here. Our firewall box is experiencing the same troubles described above, but we are running a stock kernel 2.6.10-1.770_FC3 (no patches of any kind). I googled this problem and found that it could be related to net.ipv4.route.max_size which on our firewall was set to 2048 by default. We are using Quagga/OSPF to do dynamic routing and at last check the slabinfo shows ip_dst_cache 825 825 256 15 1 : tunables 120 60 0 : slabdata 55 55 0 This is the peak that it has reached since I've been watching it, but it has grown from less than 400 when I first looked at it a few hours ago. I wasn't around the last time the firewall crashed so I couldn't verify this at the time, but I'm theorizing that we reached a point where ip_dst_cache reached 2048 and couldn't proceed. I've increased max_size to 32752 to hopefully prolong the life of the firewall, but if this is indicative of a leak of some sort, eventually we'll get to a point where it crashes again.
I have continued googling this problem and have found that there is likely a kernel bug that causes this problem. See the thread: http://lkml.org/lkml/2005/1/21/141 Towards the end of the thread they do actually post a patch: http://lkml.org/lkml/2005/1/30/87 Since the thread indicates that the problem is present in 2.6.11-rc1, I'm willing to bet it's also present in 2.6.10-1.770_FC3 but I haven't yet confirmed this. I have confirmed that the patch posted is present in the 2.6.11-1.27_FC3 kernel now available on fedora updates. I am about to apply this update to our firewall to find out if it solves our problem.
Re: comment #5: I'm not sure how to specify what patch I'm running. It's from the lartc/netfilter mailing lists and the patchfiles are around 900 lines over 4 patch files that modify a couple of dozen source files like netfilter.c, ip_forward.c, etc. AFAIK it's the only patch that enables NAT on native ipsec. I'm not sure if the problem is something do with the patch. I'd give it a 50/50 chance. I can't run without this patch as I need NAT over ipsec. I was (up until yesterday) under the impression that this patch was going to get put in the mainstream kernel, but new talk in bug 143374 indicates that this probably won't be the case. Re: comment #6: net.ipv4.route.max_size appears to be default based on your RAM size. This could be a good reason for why only one (of 3) sets of boxes I administer with this setup has had the issue so far: it's the one with only 256MB of RAM and a default max_size of 8k (the others are 16k). How do you check what the current value is? You mention "the slabinfo" but I can't figure out what you mean -- it doesn't appear to be an installed command. Let me know and I'll watch the values on the various boxes I administer (with and without the NAT patch) to see if it is growing over time.
Created attachment 115125 [details] text transcribed from the panic screen Hope this is somewhat helpful. I believe the top part had NOT scrolled off the screen yet so this should be the entire text of the panic. This panic occurred during the execution of the "reboot" command and occurred right after it started "unmounting file systems".
We are doing only static routing. My googling revealed the same links you found which surprised me since our routing tables really don't change much at all AFAIK except for the odd time interfaces go down, etc. If the latest FC3 update does fix the issue, let us know. Then I'll try to do the NAT patch on 2.6.11 and see if this doesn't happen again.
To check the current size of the ip_dst_cache, you can grep ip_dst_cache from /proc/slabinfo.. The current value is the first number, I'm not 100% sure what the second number is, but it appears to be the peak of the first value since the most recent flush (or something along those lines). My digging has also indicated that ip_dst_cache is not directly tied to the number of routes you see in a typical ip route list, but rather "ip route list cache" which includes the routes the system creates for each host it knows about. In other words, every packet that comes through the system has a source and destination.. rather than calculating the route for each new packet, the system caches the calculated route for each source/destination pair. Thus, if you have a lot of traffic going through your router from a lot of different hosts to a lot of different hosts, you can have a very large cache. By the way, since applying the updated kernel yesterday around 3pm, there have been no new crashes and the current ip_dst_cache is down around where it should be for 7:30 in the morning. Of course, now I'm getting "eth0: Too much work in interrupt, status 8401." in my syslog which may or may not be related.. I've got some more googling to do it appears.
I've added ip_dst_cache to my cricket monitoring and will report back here in a few days once I have some good visual idea of how that parameter behaves over time. I'll be able to directly compare boxes I administer that both have and don't have the ipsec nat patch.
Created attachment 115824 [details] cricket graph of ip_dst_cache value from slabinfo (The graph takes a snapshot value every 5 mins and so does not record transient spikes unless it is by chance.) I had a different (other pair) of ipsec machines go mental in what I believe was the same way. I don't think there was a panic this time though as it was rebooted fairly early once the symptoms started. However, this time I had my customized cricket grapher running and the attached shows the machine I think went mental first. The sharp drop off Wed morning is when the symptoms started and the blank area is when I rebooted; it took me a while to realize I had to recompile cricket's data files before it started graphing again. The dropoff on Sunday is after another reboot. It is obvious that this value seems to grow without bounds over time. However, it is hard to see how this relates to /proc/sys/net/ipv4/route/max_size because on this box that value is 16384 and this ip_dst_cache value never gets close to that. I will keep an eye on it to see how it behaves when it reaches the high values again. Since I really have no idea what these values represent I will leave it to the experts to interpret.
Created attachment 115826 [details] the other side of the ipsec tunnel I'm not positive that last graph's machine is the one that "went mental" first so I am here including the graph from the other machine in the 2-machine ipsec VPN. In fact, from what the end-user described, this may very well be the machine that went nuts first. The reboot times were within a few hours between these machines.
Trevor, if these are all connected to Shaw cablemodems and are subject to the regular DDoS / portscan traffic we all get :-( then ip_dst *will* in fact grow without bound until some sort of garbage collection happens. I don't know enough about how Linux does that to predict how/when GC happens. My first guess would be when max_size is reached... The DST stuff is very similar to Cisco Express Forwarding, most modern IP stacks have something along these lines nowadays. It can't be a pure ip_dst problem, otherwise thousands of linux-based hosts worldwide would be crashing on a regular basis. My only suggestion would be to work around the problem, by preventing the dst table from filling up in the first place. Limit as strictly as possible the number of IP src/dst pairs the routing code ever gets to see, possibly going as far as using ebtables as well as iptables, if the host doesn't expose any public services to the world.
Looking at net/ipv4/route.c, it occurs to me that rt_check_expire never actually calls rt_garbage_collect, which appears to be responsible for cleaning up ip_dst_cache... in fact, I can't see anywhere that GC occurs except in the middle of the in_ and out_ paths. So, one suggestion would be: instead of INCREASING the max_size parameter, DECREASE it to force more aggressive GC to happen, and see if that prevents overflows. I'm guessing there should be an inflection point (i.e. ip_dst_cache size should have a nonlinear response to net.ipv4.route.max_size somewhere around the actual *legitimate* working set size) but where that point is will be highly dependent on the specific traffic patters each system handles. The other piece of information that would be nice to capture in concert with the current slabinfo for ip_dst_cache would be a) the size of, and b) the contents of /proc/net/rt_cache. (There are also some related statistics in /proc/net/stat/rt_cache which should correlate perfectly with the size of ip_dst_cache, if I understand the routing code correctly.) Trevor: I'm assuming this is Dr. Nick's office you're talking about... the total set of valid internal IP addresses should be very small, on the order of 20 or so (?), I would try setting max_size to something ridiculously small like 64 or 32. Beware, however, that if it gets too small you'll probably lose connectivity entirely - you may need to experiment onsite, or at least make changes dynamically with a scheduled reboot pending in ~10min to recover. Also consider that you need a large enough rt_cache to hold all the "local" entries in addition to remote hosts. I would particularly pay attention to the "use" column of /proc/net/rt_cache - what sort of statistical distribution are you seeing? In a perfect world, you should have significant y-axis clustering somewhere near the mean/median of the data set with a handful of significant outliers. If you're seeing clustering near the median but NOT near the arithmetic mean of the Use column, your max_size parameter is too *big*, not too small. I suspect you'll find your nmap script is causing a large number of entries with low Use counts (0..10), whereas IP addresses corresponding to real workstations will see a secondary concentration of Use values that increase monotonically with firewall uptime. I agree that there seems to be a kernel bug of some sort involved, I'm focused more on characterizing the problem and understanding how to work around it than fixing it.
Created attachment 115921 [details] readings from the use column of /proc/net/rt_cache As per the "use" column of /proc/net/rt_cache, see the attached for some samples from some machines.
Re: Decreasing max_size: I can try that when next onsite and will report back. It may be hard to get a bead on what value to set it to to match the system since the linecount of /proc/net/rt_cache varies quite dramatically (1000+ to 20) over a very short period of time. I have programmed cricket to capture the line count of /proc/net/rt_cache and we'll see how that varies over time after a day or two. If you meant something other than the line count, let me know. I also am dumping the contents every 5 mins to files so if the bug symptom reoccurs we will have the contents near that exact time. Location: Yes, it's Dr.N's and other locations. Simple small network with a 2 office native ipsec VPN. About 15-22 machines at each location. The other pair that has had the bug symptoms is only 13/5 machines, but with half the RAM on the server (so smaller max_size), and the 13 one was the first to show the symptoms. Yes, the nmap script appears to cause a large number of entries with low counts. However, I have yet to see the symptoms at the 12+ locations I run the nmap scripts with no VPN. 2 of the 3 pairs of locations I run a VPN on have shown the symptoms, and the third is so low traffic that it's not a big surprise it hasn't shown up yet. The very strange thing is, the patched 2.6.10 kernel I am running ran fine since Mar 1 through till the first problem around May 25. Sure, I might have rebooted the systems once in a while, but it's a good bet they ran at least 30-60 days in some cases. Also, prior to that I was running a patched 2.6.9 from Dec '04, and that never showed a problem either. It could be coincidence, or it could be something introduced in 2.6.10 or even a non-kernel rpm update that triggered it. Do you think the bug could be dependent on the ipsec NAT patch I am using as some have suggested? You would think the gc would be independent of that.
Comment #7: jphillps, anything to report after running patched for nearly a month? Is there anything you were/are doing that is "weird" like in my setup (ipsec VPN, nmap, etc)? I'm looking for commonalities to explain why you and I see the bug but most of the linux population does not. If we can find a common "weirdness" then maybe we can have a better undestanding.
Comment #15: One pair of machines was on cablemodem, the other on DSL PPPoE, both see the usual noise and both have strict iptables blocking it all. The servers do expose smtp + http to the world. What if the ip_dst_cache is a red herring in the sense that it's the only thing that's complaining but has not much to do with the root cause? As you said, if it was a big deal, everyone's system would be flaking after a month or two.
Since patching, we have not had a single case of the network failing on the router. I would have to say that the patch did succeed in doing what it needed to do. At least for us. Keep in mind though, this is an internal router with nothing fancy going on. The only non-out-of-the-box thing we are doing is that we are using Quagga/OSPF to keep our routing tables updated with our various gateway servers (we have a large number of tunnels between us and our customers and use OSPF to keep track of all the routes).
An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which may contain a fix for your problem. Please update to this new kernel, and report whether or not it fixes your problem. If you have updated to Fedora Core 4 since this bug was opened, and the problem still occurs with the latest updates for that release, please change the version field of this bug to 'fc4'. Thank you.
I've now been running 3 production pairs of systems for 1.5, 1 and .5 months using 2.6.12-1.1376_FC3 with ipsec/nat patches, but otherwise a straight up FC3 kernel. I have had zero issues or crashes with regards to this bug. I think this bug is fixed, though I would probably give it another 2-3 months before declaring absolute victory considering that previously it took many months for the bug to show. Interestingly, the slabinfo/ip_dst_cache data I'm tracking/graphing with cricket show a completely different pattern compared to the 2.6.10 kernel I was using before. If the bug was due to ip_dst_cache then this could be significant.
Created attachment 118959 [details] cricket graph of ip_dst_cache using 1376 kernel Here's the new graph with the new kernel. This is from the same host as and makes a good comparison with attachment #115824 [details]. The main difference that I think is important is the fact that after the peak, the troughs in the new kernel all return to a low value. In 2.6.10 the troughs would slowly creep up higher and higher (and so would the peaks).
This is a mass-update to all currently open Fedora Core 3 kernel bugs. Fedora Core 3 support has transitioned to the Fedora Legacy project. Due to the limited resources of this project, typically only updates for new security issues are released. As this bug isn't security related, it has been migrated to a Fedora Core 4 bug. Please upgrade to this newer release, and test if this bug is still present there. This bug has been placed in NEEDINFO_REPORTER state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug. Thank you.
I have not seen this bug for many months now (I think... but we have some new bugs now). If Jim agrees then I think this should be marked as closed and fixed.
I haven't seen this bug in quite awhile either. I think it's probably safe to call it fixed.