From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.2) Gecko/20040809
Description of problem:
[root@livre ~]# ps 3617
PID TTY STAT TIME COMMAND
3617 ? S 2:17 spamd child
[root@livre ~]# lsof -p 3617
[...]
spamd 3617 root 87u REG 253,0 10821632 2852859 /l/aoliva/mail/Mail/.nobackup/.spamassassin/bayes_toks
spamd 3617 root 88u REG 253,0 10821632 2852859 /l/aoliva/mail/Mail/.nobackup/.spamassassin/bayes_toks
spamd 3617 root 89u REG 253,0 10821632 2852859 /l/aoliva/mail/Mail/.nobackup/.spamassassin/bayes_toks
spamd 3617 root 90u REG 253,0 10821632 2852859 /l/aoliva/mail/Mail/.nobackup/.spamassassin/bayes_toks
spamd 3617 root 91u REG 253,0 10821632 2852859 /l/aoliva/mail/Mail/.nobackup/.spamassassin/bayes_toks
Same for all other spamd processes, without any e-mail delivery having taken place for the past several minutes. I don't think it should be keeping this file open, especially not so many times.
Version-Release number of selected component (if applicable): spamassassin-3.0-4.pre4
How reproducible: Always
Steps to Reproduce:
1. Feed a lot of email for yourself to spamc
2. Run lsof on the spamd processes
Actual Results: Lots of descriptors pointing to bayes_toks.
Expected Results: None, unless mail is being delivered.
Additional info: I suspect it might eventually exhaust the number of file descriptors available and start failing to check for spam, especially because when spamd fails spamc doesn't fall back to spamassassin as I'd hope.
does the number of open fds increase as more messages are scanned? also, could you attach the output of spamd with the -D switch?
It grows as more messages are scanned, yes. After being left running overnight, which probably amounts to 2-3k messages, I had 451 open descriptors pointing to bayes_toks (surely excessive even if it was just for caching; more than one per process is absolutely pointless IMHO). I checked that the local mail queue was empty and restarted the spamassassin service. At this point, I had 0 bayes_toks file descriptors open. Then, I got fetchmail running again, and it brought in 51 messages. After they had all been delivered, I had 41 bayes_toks descriptors open among the spamd processes. I'll look into how to get spamd started with the -D flag.
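For anyone who wants to track the count over time without reading lsof output by eye, here is a minimal sketch in Python. It assumes a Linux /proc filesystem; the function name count_fds_for and the proc_root parameter are made up for illustration and are not part of spamd or lsof.

```python
import os

def count_fds_for(pid, target_name, proc_root="/proc"):
    """Count open descriptors of `pid` whose target file is named `target_name`.

    Assumes a Linux-style /proc/<pid>/fd directory of symlinks.
    `proc_root` is a hypothetical parameter so the function can be
    pointed at a fake tree for testing.
    """
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    count = 0
    for entry in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, entry))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        if os.path.basename(target) == target_name:
            count += 1
    return count
```

Calling count_fds_for(3617, "bayes_toks") as root before and after a batch of deliveries should show whether the number keeps growing, as described above.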
ok, that sounds bad. also, could you check the versions of the following packages: libdb libdb-devel perl-DB_File perl
btw, this probably exists as a bug upstream. it might be better to open an issue on http://bugzilla.spamassassin.org/ accordingly.
I commented on this on 8-12, but my comment apparently never made it into the ticket. :( What I wrote was: This looks exactly like http://bugzilla.spamassassin.org/show_bug.cgi?id=3326 and here's my post explaining what I found: http://bugzilla.spamassassin.org/show_bug.cgi?id=3326#c7 In short, there's apparently a bug in DB_File/libdb which causes untie() to fail internally, neither throwing an error nor closing the fd. Doing a "db_upgrade" or "db_dump|db_load" to upgrade the file to the latest DB version fixed the issue for me.
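The "db_dump|db_load" route mentioned above can be sketched as a small Python wrapper. This is only an illustration, not a tested recipe: the function name rebuild_db and its parameters are hypothetical, it assumes the Berkeley DB utilities are on the PATH under those names, and it writes to a new file rather than replacing the original. It is presumably safest to stop spamd first so nothing writes to the database during the dump.

```python
import subprocess

def rebuild_db(src, dst, dump_cmd="db_dump", load_cmd="db_load"):
    """Pipe `dump_cmd src` into `load_cmd dst`, like `db_dump src | db_load dst`.

    Returns True only if both commands exit with status 0.
    The command names are parameters (an assumption of this sketch)
    so that differently named binaries can be substituted.
    """
    dump = subprocess.Popen([dump_cmd, src], stdout=subprocess.PIPE)
    load = subprocess.run([load_cmd, dst], stdin=dump.stdout)
    dump.stdout.close()  # release our copy of the pipe's read end
    dump_rc = dump.wait()
    return dump_rc == 0 and load.returncode == 0
```

For example, rebuild_db("bayes_toks", "bayes_toks.new") would leave the original file untouched, so the rebuilt copy can be checked before it is moved into place.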
So running "db_verify bayes_toks" may be interesting, based on what Theo had seen in bug 3326:
<felicity> db_verify: Page 3981: non-empty page in unused hash bucket 3333
<felicity> db_verify: Page 0: page 1273 encountered a second time on free list
<felicity> db_verify: DB->verify: bayes_seen: DB_VERIFY_BAD: Database verification failed
Created attachment 102750: spamd debugging output. This is a log of the delivery of 15 e-mails, with spamd started with the following arguments: -D -c -m1 -H At the end, there were 9 file descriptors associated with my bayes_toks file.
looks a lot like what Theo found, then, since all the "untie-ing db_toks" lines are there, valid, and do not indicate any errors from DB_File. could you try the db_verify operation?
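If the check needs to be repeated over several database files, the db_verify step can be wrapped as below. A rough sketch only: db_is_consistent and the verify_cmd parameter are invented names, and it relies on db_verify exiting with status 0 when verification succeeds and non-zero otherwise.

```python
import subprocess

def db_is_consistent(path, verify_cmd="db_verify"):
    """Run `verify_cmd path` and report (ok, output).

    `ok` is True when the command exits with status 0.
    `output` is whatever the command printed, for the bug report.
    """
    result = subprocess.run([verify_cmd, path], capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr
```

Running it on both bayes_toks and bayes_seen would show whether either file fails verification the way Theo's did.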
db_verify failed. This may explain it. I'm running sa-learn --import to recreate the databases, then I'll restart spamd and see if it stops leaking fds. If so, this should probably get reassigned to perl-DBI.
# rpm -q perl perl-DBI db4 db4-devel
perl-5.8.5-2
perl-DBI-1.40-5
db4-4.2.52-5
db4-devel-4.2.52-5
Hmm... sa-learn --reassign didn't create a database that passed db_verify like I hoped. But db_dump|db_load did, so I'm going with that. I suppose the database corruption may have been caused by faulty memory/kernel/firewire controller/whatever that has plagued my desktop box. I'll use my notebook for the next few days and verify that the database remains consistent; I wouldn't blame the database package for the corruption for now.
Err... I meant sa-learn --import. The "--reassign" was me thinking about reassigning the bug report :-)
Looks like this fixed it. I'm guessing as to whether perl-DBI is the component to blame. Please reassign if you know any better. Thanks to those who helped track it down.
actually, it's just the main perl package -- the DB_File module is part of that now. reassigning
Is this still an issue in FC3, RHEL4, or FC4?
No response in 3 months, assuming fixed. REOPEN if this is still an issue with FC3+ or RHEL4.