spamassassin occasionally takes several minutes of CPU time to process a message:

Jun 27 09:06:40 localhost spamd[29375]: checking message <2D11BC10-C74A-11D8-BE07-000A95C4B3A0.ac.uk> for ralph:500.
Jun 27 09:09:23 localhost spamd[29375]: clean message (-4.9/5.0) for ralph:500 in 163.4 seconds, 6070 bytes.
Jun 27 09:09:23 localhost spamd[29375]: result: . -4 - BAYES_00,SUBJ_ALL_CAPS scantime=163.4,size=6070,mid=<2D11BC10-C74A-11D8-BE07-000A95C4B3A0.ac.uk>,bayes=0,autolearn=no

I verified with top that spamassassin was taking 100% CPU. The message is a perfectly innocuous plain-text conference announcement from a mailing list, and I have seen this on a number of occasions. The machine is a 933 MHz P3: not the latest and greatest, but not entirely obsolete. Mail is downloaded via POP3 into Evolution.

As far as I can see, this is potentially exploitable by an attacker who wants to make spamassassin useless. Ten messages a day at a few minutes each would be a major pain in the butt; a few hundred messages a day at a few minutes of CPU each would make it physically impossible to use spamassassin.

I have the message and a copy of my .spamassassin directory if required.

PS. Even "normal" spamassassin performance (a few seconds per message, plus a shitload of memory) is worth a grumble.
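To spot how often this happens, the scantime= field in spamd's "result:" log lines can be scanned for outliers. The following is a minimal sketch, assuming the field order seen in the log above (scantime, then size, then mid); the function name slow_scans and the 30-second default threshold are hypothetical choices for illustration, not anything spamd provides.

```python
import re
import sys
from typing import Iterable, Iterator, NamedTuple

# Assumed format, based on the spamd "result:" line quoted above:
#   ... spamd[PID]: result: ... scantime=163.4,size=6070,mid=<...>,...
RESULT_RE = re.compile(
    r"spamd\[\d+\]: result: .*?"
    r"scantime=(?P<scantime>[\d.]+),size=(?P<size>\d+),mid=(?P<mid><[^>]*>)"
)


class SlowScan(NamedTuple):
    mid: str        # Message-ID as logged by spamd
    scantime: float  # seconds spent scanning
    size: int        # message size in bytes


def slow_scans(lines: Iterable[str], threshold: float = 30.0) -> Iterator[SlowScan]:
    """Yield every logged scan whose scantime is at least `threshold` seconds."""
    for line in lines:
        m = RESULT_RE.search(line)
        if m is None:
            continue  # not a spamd result line
        scantime = float(m.group("scantime"))
        if scantime >= threshold:
            yield SlowScan(m.group("mid"), scantime, int(m.group("size")))


if __name__ == "__main__":
    # Usage (hypothetical): python slow_scans.py < /var/log/maillog
    for scan in slow_scans(sys.stdin):
        print(f"{scan.scantime:8.1f}s  {scan.size:8d} bytes  {scan.mid}")
```

If the slow scans line up with particular times of day rather than particular messages, that points at something periodic in spamassassin rather than at message content.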
Please attach the message to this report.
Created attachment 101448: Email message that caused the problem
I'd like to see the output from "spamassassin -D -t < temp.txt". For me it completes in 5 seconds, including network tests with a "cold" DNS cache.
Just tried "time spamassassin -t < temp.txt":

real 0m19.323s
user 0m3.347s
sys 0m0.255s

DNS lookups took a while, but CPU consumption is OK. So it looks like something other than the message contents triggered the CPU usage. Is there any spamassassin logging I can turn on permanently to try and track this down?
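The reasoning above rests on the gap between wall-clock ("real") time and CPU ("user" + "sys") time: here user + sys is about 3.6 s out of 19.3 s real, so most of the run was spent waiting (e.g. on DNS), not computing. A minimal sketch of the same measurement in Python, assuming a Unix system (the resource module is Unix-only); wall_and_cpu is a hypothetical helper name:

```python
import resource
import subprocess
import sys
import time


def wall_and_cpu(cmd, stdin_path=None):
    """Run `cmd` to completion and return (wall_seconds, child_cpu_seconds).

    CPU time is taken from RUSAGE_CHILDREN, which only counts children that
    have exited and been waited for; subprocess.run waits, so the delta
    covers exactly this command.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    if stdin_path is None:
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    else:
        with open(stdin_path, "rb") as f:
            subprocess.run(cmd, stdin=f, check=True, stdout=subprocess.DEVNULL)
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # user + sys, the same two figures the shell's `time` reports
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return wall, cpu


if __name__ == "__main__":
    # Usage (hypothetical): python timeit_sa.py temp.txt
    wall, cpu = wall_and_cpu(["spamassassin", "-t"], sys.argv[1])
    print(f"wall {wall:.1f}s, cpu {cpu:.1f}s, cpu/wall {cpu / wall:.0%}")
```

A low cpu/wall ratio means the process was mostly blocked on I/O or the network; a ratio near 100%, as in the original 163-second scan, means it was genuinely busy computing.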
Yes, the "-D" switch turns on debug output. It's voluminous, but it will most likely track down the problem, so it would definitely be worthwhile. BTW, I think this may have been a Bayes expiration run: periodically, spamassassin expires unused tokens from the Bayes databases to keep their size down. This should happen fairly infrequently (somewhere between once a day and once a week, I'd guess), and it can take a minute or two to complete.
Ok, the expiration run explains the behaviour I'm seeing.