Red Hat Bugzilla – Bug 129726
hidden error in DB_File::untie causes file descriptor leak
Last modified: 2007-11-30 17:10:47 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.2)
Description of problem:
[root@livre ~]# ps 3617
PID TTY STAT TIME COMMAND
3617 ? S 2:17 spamd child
[root@livre ~]# lsof -p 3617
spamd 3617 root 87u REG 253,0 10821632 2852859
spamd 3617 root 88u REG 253,0 10821632 2852859
spamd 3617 root 89u REG 253,0 10821632 2852859
spamd 3617 root 90u REG 253,0 10821632 2852859
spamd 3617 root 91u REG 253,0 10821632 2852859
Same for all other spamd processes, without any e-mail delivery having
taken place for the past several minutes.
I don't think it should be keeping this file open, especially not so
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Feed a lot of email for yourself to spamc
2. Run lsof on the spamd processes
Actual Results: Lots of descriptors pointing to bayes_toks.
Expected Results: None, unless mail is being delivered.
I suspect it might eventually exhaust the number of file descriptors
available and start failing to check for spam, especially because when
spamd fails spamc doesn't fall back to spamassassin as I'd hope.
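
For anyone wanting to repeat the lsof check above without eyeballing the listing, here is a minimal sketch that counts how many descriptors of one process point at a given file. It assumes a Linux /proc filesystem; the function name count_fds_for is made up for illustration and is not part of spamd or SpamAssassin.

```python
import os


def count_fds_for(pid, filename):
    """Count open descriptors of process `pid` whose target is named `filename`.

    Hypothetical helper: reads the symlinks under /proc/<pid>/fd (Linux only).
    """
    fd_dir = f"/proc/{pid}/fd"
    count = 0
    for entry in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, entry))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        if os.path.basename(target) == filename:
            count += 1
    return count
```

Calling count_fds_for(3617, "bayes_toks") as root before and after a batch of mail would show whether the number keeps growing. Inspecting another user's process needs the same privileges lsof does.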
does the number of open fds increase as more messages are scanned?
also, could you attach the output of spamd with the -D switch?
It grows as more messages are scanned, yes. After being left running
overnight, which probably amounts to 2-3k messages, I had 451 open
descriptors pointing to bayes_toks (surely excessive even if it was
just for caching; more than one per process is absolutely pointless IMHO).
I checked that the local mail queue was empty and restarted the
spamassassin service. At this point, I had 0 bayes_toks file
descriptors open. Then, I got fetchmail running again, and it brought
in 51 messages. After they had all been delivered, I had 41 bayes_toks
opened among spamd processes. I'll look into how to get spamd started
with the -D flag.
ok, that sounds bad. also, could you check the versions of the
btw, this probably exists as a bug upstream. it might be better to
open an issue on http://bugzilla.spamassassin.org/ accordingly.
I commented on this on 8-12, but my comment apparently never made it
into the ticket. :( What I wrote was:
This looks exactly like
http://bugzilla.spamassassin.org/show_bug.cgi?id=3326 and here's my
post explaining what I found:
In short, there's apparently a bug in DB_File/libdb which causes
untie() to fail internally, not throwing an error and also not closing
the fd. Doing a "db_upgrade" or "db_dump|db_load" to upgrade the file
to the latest DB version fixed the issue for me.
So running "db_verify bayes_toks" may be interesting, based on what
Theo had seen in bug 3326:
<felicity> db_verify: Page 3981: non-empty page in unused hash bucket 3333
<felicity> db_verify: Page 0: page 1273 encountered a second time on
<felicity> db_verify: DB->verify: bayes_seen: DB_VERIFY_BAD: Database
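
The check-then-rebuild workaround described above ("db_verify", then "db_dump|db_load") can be scripted roughly as follows. This is only a sketch: the function names and the *_cmd parameters are invented for illustration, and it assumes the Berkeley DB utilities are on PATH under their plain names (some distributions install them with a version prefix or suffix). It relies on db_verify exiting with status 0 when the file is clean, on db_dump writing to standard output, and on db_load reading from standard input.

```python
import subprocess


def verify(db_path, verify_cmd="db_verify"):
    """Return True if the verify tool exits 0 for db_path.

    verify_cmd is a hypothetical parameter so the tool name can be overridden.
    """
    result = subprocess.run([verify_cmd, db_path])
    return result.returncode == 0


def dump_and_load(db_path, new_path, dump_cmd="db_dump", load_cmd="db_load"):
    """Equivalent of `db_dump db_path | db_load new_path`.

    Returns True only if both commands exit 0. The original file is left
    untouched; moving new_path into place is up to the caller.
    """
    dump = subprocess.Popen([dump_cmd, db_path], stdout=subprocess.PIPE)
    load = subprocess.run([load_cmd, new_path], stdin=dump.stdout)
    dump.stdout.close()
    dump_status = dump.wait()
    return dump_status == 0 and load.returncode == 0
```

A cautious sequence would be: stop spamd, run verify("bayes_toks"), and if it fails, run dump_and_load("bayes_toks", "bayes_toks.new"), verify the new file, and only then swap it in.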
Created attachment 102750 [details]
spamd debugging output
This is a log of the delivery of 15 e-mails, with spamd started with the
following arguments: -D -c -m1 -H
At the end, there were 9 file descriptors associated with my bayes_toks file.
looks a lot like what Theo found, then, since all the "untie-ing
db_toks" lines are there, valid, and do not indicate any errors from
could you try the db_verify operation?
db_verify failed. This may explain it. I'm running sa-learn --import
to recreate the databases, then I'll restart spamd and see if it stops
leaking fds. If so, this should probably get reassigned to perl-DBI.
# rpm -q perl perl-DBI db4 db4-devel
Hmm... sa-learn --reassign didn't create a database that passed
db_verify like I hoped. But db_dump|db_load did, so I'm going with
that. I suppose the database corruption may have been caused by
faulty memory/kernel/firewire controller/whatever that has plagued my
desktop box. I'll use my notebook for the next few days and verify
that the database remains consistent; I wouldn't blame the database
package for the corruption for now.
Err... I meant sa-learn --import. '--reassign' was me thinking
about reassigning the bug report :-)
Looks like this fixed it. I'm guessing as to whether perl-DBI is the
component to blame. Please reassign if you know any better. Thanks
to those who helped track it down.
actually, it's just the main perl package -- the DB_File module is
part of that now. reassigning
Is this still an issue in FC3, RHEL4, or FC4?
No response in 3 months, assuming fixed. REOPEN if this is still an issue with
FC3+ or RHEL4.