Bug 129726 - hidden error in DB_File::untie causes file descriptor leak
Status: CLOSED WORKSFORME
Product: Fedora
Classification: Fedora
Component: perl
Version: rawhide
Hardware: All Linux
Priority: medium, Severity: high
Assigned To: Warren Togami
Reported: 2004-08-12 03:08 EDT by Alexandre Oliva
Modified: 2007-11-30 17:10 EST

Doc Type: Bug Fix
Last Closed: 2005-08-22 01:00:51 EDT


Attachments
spamd debugging output (32.78 KB, application/x-bzip2)
2004-08-15 16:17 EDT, Alexandre Oliva
Description Alexandre Oliva 2004-08-12 03:08:50 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.2) Gecko/20040809

Description of problem:
[root@livre ~]# ps 3617
  PID TTY      STAT   TIME COMMAND
 3617 ?        S      2:17 spamd child
[root@livre ~]# lsof -p 3617
[...]
spamd   3617 root   87u   REG      253,0 10821632 2852859 /l/aoliva/mail/Mail/.nobackup/.spamassassin/bayes_toks
spamd   3617 root   88u   REG      253,0 10821632 2852859 /l/aoliva/mail/Mail/.nobackup/.spamassassin/bayes_toks
spamd   3617 root   89u   REG      253,0 10821632 2852859 /l/aoliva/mail/Mail/.nobackup/.spamassassin/bayes_toks
spamd   3617 root   90u   REG      253,0 10821632 2852859 /l/aoliva/mail/Mail/.nobackup/.spamassassin/bayes_toks
spamd   3617 root   91u   REG      253,0 10821632 2852859 /l/aoliva/mail/Mail/.nobackup/.spamassassin/bayes_toks

Same for all other spamd processes, without any e-mail delivery having
taken place for the past several minutes.

I don't think it should be keeping this file open, especially not so
many times.

Version-Release number of selected component (if applicable):
spamassassin-3.0-4.pre4

How reproducible:
Always

Steps to Reproduce:
1. Feed a lot of email for yourself to spamc
2. Run lsof on the spamd processes

Actual Results:  Lots of descriptors pointing to bayes_toks.

Expected Results:  None, unless mail is being delivered.

Additional info:

I suspect it might eventually exhaust the number of file descriptors
available and start failing to check for spam, especially because when
spamd fails spamc doesn't fall back to spamassassin as I'd hope.
Comment 1 Justin Mason 2004-08-12 12:50:03 EDT
does the number of open fds increase as more messages are scanned?
also, could you attach the output of spamd with the -D switch?
Comment 2 Alexandre Oliva 2004-08-15 14:23:13 EDT
It grows as more messages are scanned, yes.  After being left running
overnight, which probably amounts to 2-3k messages, I had 451 open
descriptors pointing to bayes_toks (surely excessive even if it was
just for caching; more than one per process is absolutely pointless IMHO).

I checked that the local mail queue was empty and restarted the
spamassassin service.  At this point, I had 0 bayes_toks file
descriptors open.  Then, I got fetchmail running again, and it brought
in 51 messages.  After they had all been delivered, I had 41 bayes_toks
opened among spamd processes.  I'll look into how to get spamd started
with the -D flag.
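
For anyone who wants to track this count without reading through lsof
output, here is a minimal sketch that counts how many descriptors of one
process point at a given file.  It assumes a Linux /proc filesystem and
enough privilege to read the target process's fd directory.  count_fds is
a hypothetical helper name, not something shipped with SpamAssassin.

```shell
#!/bin/sh
# count_fds PID PATH
# Print the number of open descriptors of process PID that refer to PATH.
# Assumption: Linux, where /proc/PID/fd/N is a symlink to the open file.
count_fds() {
    pid=$1
    target=$2
    n=0
    for fd in /proc/"$pid"/fd/*; do
        # Skip entries that cannot be read or that vanished meanwhile.
        link=$(readlink "$fd" 2>/dev/null) || continue
        if [ "$link" = "$target" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

Running it in a loop over the spamd PIDs before and after a batch of
mail (for example "for pid in $(pgrep spamd); do count_fds $pid
/path/to/bayes_toks; done") should show whether the count keeps growing.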
Comment 3 Justin Mason 2004-08-15 14:45:30 EDT
ok, that sounds bad.  also, could you check the versions of the
following packages:

libdb
libdb-devel
perl-DB_File
perl
Comment 4 Justin Mason 2004-08-15 14:47:50 EDT
btw, this probably exists as a bug upstream.  it might be better to
open an issue on http://bugzilla.spamassassin.org/ accordingly.
Comment 5 Theo Van Dinter 2004-08-15 14:53:10 EDT
I commented on this on 8-12, but my comment apparently never made it
into the ticket.  :(   What I wrote was:

This looks exactly like
http://bugzilla.spamassassin.org/show_bug.cgi?id=3326 and here's my
post explaining what I found:

http://bugzilla.spamassassin.org/show_bug.cgi?id=3326#c7

In short, there's apparently a bug in DB_File/libdb which causes
untie() to fail internally, not throwing an error and also not closing
the fd.  Doing a "db_upgrade" or "db_dump|db_load" to upgrade the file
to the latest DB version fixed the issue for me.
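
The "db_dump|db_load" route can be sketched roughly as below.  This is
only an outline under some assumptions: the Berkeley DB command-line
utilities (db_verify, db_dump, db_load) are installed, nothing else has
the file open while it runs (so spamd should be stopped first), and
rebuild_db plus the ".corrupt" and ".new" suffixes are hypothetical names
picked for illustration.

```shell
#!/bin/sh
# rebuild_db FILE
# If db_verify rejects FILE, keep a copy of it, then dump and reload it
# into a fresh database and swap that in once it verifies cleanly.
rebuild_db() {
    db=$1
    if db_verify "$db"; then
        echo "$db: verification passed, nothing to do"
        return 0
    fi
    # Keep the damaged file around in case the reload loses data.
    cp -p "$db" "$db.corrupt" || return 1
    # The pipeline status is that of db_load, so a partial dump from a
    # badly damaged file can still "succeed"; check the result by hand.
    db_dump "$db" | db_load "$db.new" || return 1
    # Only replace the original when the rebuilt file verifies.
    db_verify "$db.new" && mv "$db.new" "$db"
}
```

Usage would be something like "rebuild_db bayes_toks" from inside the
.spamassassin directory, repeated for any other database file there
that fails verification.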
Comment 6 Justin Mason 2004-08-15 14:58:52 EDT
So running "db_verify bayes_toks" may be interesting, based on what
Theo had seen in bug 3326:

<felicity> db_verify: Page 3981: non-empty page in unused hash bucket 3333
<felicity> db_verify: Page 0: page 1273 encountered a second time on free list
<felicity> db_verify: DB->verify: bayes_seen: DB_VERIFY_BAD: Database verification failed
Comment 7 Alexandre Oliva 2004-08-15 16:17:42 EDT
Created attachment 102750 [details]
spamd debugging output

This is a log of the delivery of 15 e-mails, with spamd started with the
following arguments: -D -c -m1 -H

At the end, there were 9 file descriptors associated with my bayes_toks file.
Comment 8 Justin Mason 2004-08-15 16:42:33 EDT
looks a lot like what Theo found, then, since all the "untie-ing
db_toks" lines are there, valid, and do not indicate any errors from
DB_File.

could you try the db_verify operation?
Comment 9 Alexandre Oliva 2004-08-15 16:52:07 EDT
db_verify failed.  This may explain it.  I'm running sa-learn --import
to recreate the databases, then I'll restart spamd and see if it stops
leaking fds.  If so, this should probably get reassigned to perl-DBI.

# rpm -q perl perl-DBI db4 db4-devel
perl-5.8.5-2
perl-DBI-1.40-5
db4-4.2.52-5
db4-devel-4.2.52-5

Hmm...  sa-learn --reassign didn't create a database that passed
db_verify like I hoped.  But db_dump|db_load did, so I'm going with
that.  I suppose the database corruption may have been caused by
faulty memory/kernel/firewire controller/whatever that has plagued my
desktop box.  I'll use my notebook for the next few days and verify
that the database remains consistent; I wouldn't blame the database
package for the corruption for now.
Comment 10 Alexandre Oliva 2004-08-15 16:53:50 EDT
Err...   I meant sa-learn --import.  '--reassign' was me thinking
about reassigning the bug report :-)
Comment 11 Alexandre Oliva 2004-08-15 16:57:45 EDT
Looks like this fixed it.  I'm guessing as to whether perl-DBI is the
component to blame.  Please reassign if you know any better.  Thanks
to those who helped track it down.
Comment 12 Justin Mason 2004-08-15 18:38:13 EDT
actually, it's just the main perl package -- the DB_File module is
part of that now.  reassigning
Comment 13 Warren Togami 2005-05-28 02:30:13 EDT
Is this still an issue in FC3, RHEL4, or FC4?
Comment 14 Warren Togami 2005-08-22 01:00:51 EDT
No response in 3 months, assuming fixed.  REOPEN if this is still an issue with
FC3+ or RHEL4.
