From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041111 Firefox/1.0

Description of problem:
I upgraded a mail server with 256M of RAM from FC1 to FC3. SpamAssassin was working fine before, but now it is causing swap storms. Several spamd processes are running, each taking about 80M virtual and 50M-60M resident, with only 6M shared. Swap usage is at 285M of 512M. (This is a very low-volume mail server that does nothing else and has no GUI, so that is a lot of memory.) The machine is barely usable enough to bring up vmstat, which shows about 750k/sec being swapped in and swapped out. SpamAssassin and all other packages are up to date.

May be related to: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=139491

Version-Release number of selected component (if applicable):
spamassassin-3.0.0-3

How reproducible:
Sometimes

Steps to Reproduce:
1. service spamassassin start
2. Wait several hours for symptoms to occur.

Actual Results:
Swap storms.

Expected Results:
No swap storms.

Additional info:
Already pushed 3.0.1 to FC3 update.
I updated to 3.0.1 and restarted spamassassin. Within minutes the server was on its knees with 255M of swap used. I can't even ssh into it now, until the OOM killer decides to act.
Ok, then this is an upstream issue. I am not able to reproduce this kind of behavior myself. Go to spamassassin.org's bugzilla.
OK. I'll report it upstream. Were you testing on a ~256M system? And was there real mail coming through? I think it may take real (spam) activity to trigger the problem. A workaround seems to be to add "-m1" or "-m1 --max-conn-per-child=1" to the options in /etc/sysconfig/spamassassin. The first allows only one child process; the second additionally kills the child and reforks after every connection. With either setting, spamd behaves very nicely.
1GB RAM 40 user system, roughly 2 incoming mail per second processed by spamd.
Could you please look at this upstream bug to see if it appears to be the same thing: http://bugzilla.spamassassin.org/show_bug.cgi?id=3981 And also look at the patch that is attached to http://bugzilla.spamassassin.org/show_bug.cgi?id=3983 to see if it helps?
The output of "top" when the system is fine and not using any swap at all, and when it is swapping like hell and unusable, is pretty much the same. 6 processes (1 parent and 5 children) are all showing about 80M/process virt, 66M/process res. (Large, but not *that* large...) When the system starts swapping excessively, the numbers don't change much at all. This does sound like the same problem.
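To put the figures above in perspective, here is a rough back-of-the-envelope sketch. It assumes the "res" and "shared" numbers from top can be taken at face value (which may not hold) and that shared pages are counted once; the function name is made up for illustration.

```python
def worst_case_resident_mb(processes, res_mb, shared_mb=0):
    """Rough upper bound on RAM needed if every process is fully resident.

    Private pages are counted once per process; shared pages are
    counted once in total. All inputs are in megabytes.
    """
    private_mb = res_mb - shared_mb
    return processes * private_mb + shared_mb


# Figures reported in this bug: 6 spamd processes (1 parent, 5 children),
# about 66M resident each, about 6M shown as shared, on a 256M machine.
needed_mb = worst_case_resident_mb(processes=6, res_mb=66, shared_mb=6)
print(needed_mb, "MB needed vs. 256 MB installed")
```

If those figures are accurate, the working set exceeds physical RAM as soon as all children are active at once, which would be consistent with top looking the same before and during the swap storm.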
What kernel?
kernel-2.6.9-1.681_FC3
> This does sound like the same problem Yes, best to take the discussion over there, then, and do try out the patch I mentioned.
yep, please do; the patch on bug 3983 is almost definitely what you need.
Yes, I'll be trying that out, although I am realizing that on a low-volume, low-memory server like this one it is really most efficient just to set --max-children=1 and --max-conn-per-child=1. As the minimum system requirements for FC3 are about what I have, and spamd seems to be such a memory hog even when it is working "right", perhaps the init script should tune the number of children based on system memory at runtime? One thing that puzzles me, though, is that spamassassin 2.x seemed to use about 20M virt memory/process. The thread on the upstream bug mentions 20M virt for 3.0. What I am seeing is 80M+ virt and 66M+ res. If I don't set --max-conn-per-child=1, the numbers for the child start out there and then climb with each successive connection. My other FC3 machine, which was a fresh install with 1GB RAM, shows similar (80M) numbers. Why do my particular installations have so much more memory mapped?
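The idea of tuning the child count from system memory could look roughly like the sketch below. This is not what the FC3 init script does; the function names, the 66M per-child estimate, the 96M reserve for the rest of the system, and the ceiling of 5 children are all assumptions chosen for illustration.

```python
def parse_mem_total_kb(meminfo_text):
    """Return the MemTotal value (in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def suggest_max_children(mem_total_kb, per_child_mb=66, reserve_mb=96, ceiling=5):
    """Suggest a value for spamd's --max-children.

    per_child_mb, reserve_mb and ceiling are illustrative guesses,
    not measured or recommended values.
    """
    usable_mb = mem_total_kb // 1024 - reserve_mb
    # One slot is set aside for the spamd parent process.
    children = usable_mb // per_child_mb - 1
    return max(1, min(ceiling, children))


def suggest_for_this_host(path="/proc/meminfo"):
    """Read the local meminfo file and return a suggested child count."""
    with open(path) as handle:
        return suggest_max_children(parse_mem_total_kb(handle.read()))
```

With these assumed defaults a 256M machine comes out at one child and a 1GB machine hits the ceiling of five, which roughly matches the behaviour described in this thread.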
are you using third-party rulesets? some of those greatly increase memory consumption in the children, it seems. try without *any* third party rulesets and see what the RAM usage is like. in 3.0.x, we added preforking -- which (as discussed in the upstream bug) means that large memory consumption of each child becomes a big deal. in 2.6x it wasn't so serious, but with 3.0.x, you now have N children *always* running -- so thrashing starts a lot earlier. btw also note the upstream comments about "top" output being incorrect regarding how much of the memory is shared.
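Since top may misreport how much memory is shared, one way to cross-check is to sum the per-mapping counters the kernel exposes. The sketch below assumes a kernel that provides /proc/<pid>/smaps, which is newer than the 2.6.9 kernel reported in this bug, so treat it as an illustration rather than something to run on that machine. The function name is hypothetical.

```python
def summarize_smaps(smaps_text):
    """Sum shared and private page counts (in kB) from smaps-style text."""
    totals = {"shared_kb": 0, "private_kb": 0}
    for line in smaps_text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        if parts[0] in ("Shared_Clean:", "Shared_Dirty:"):
            totals["shared_kb"] += int(parts[1])
        elif parts[0] in ("Private_Clean:", "Private_Dirty:"):
            totals["private_kb"] += int(parts[1])
    return totals


# Made-up sample in the smaps layout, for illustration only.
example = """\
08048000-080bc000 r-xp 00000000 03:02 13130 /usr/bin/perl
Size:                464 kB
Rss:                 424 kB
Shared_Clean:        400 kB
Shared_Dirty:          0 kB
Private_Clean:        16 kB
Private_Dirty:         8 kB
"""
print(summarize_smaps(example))
```

Comparing private_kb across the spamd children, with and without third-party rulesets loaded, would show how much each child really costs beyond the pages it shares with the parent.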
I seem to have the same problem at the office. I just tried the same setup at home, and the problem doesn't exist here. Since one of the most significant differences between the two setups is the spam filter, which only runs at the office, your suggestion makes sense. I will continue checking this tomorrow at the office. In the meantime, I take issue with your suggestion to avoid 3rd party rulesets. I use a third party ruleset from ... the kmail binary shipped with fc3. Could that be the culprit? If so, could you look into the kmail-generated rulesets?