Bug 583523

Summary: spamass-milter stops working, causing mail rejects
Product: [Fedora] Fedora Reporter: Solomon Peachy <pizza>
Component: spamass-milterAssignee: Paul Howarth <paul>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent Docs Contact:
Priority: low    
Version: 11CC: paul
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: spamass-milter-0.3.1-19.fc11 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-04-28 01:14:09 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Solomon Peachy 2010-04-19 02:17:16 UTC
Description of problem:

After running for a while, usually less than a day, spamass-milter stops working, emitting errors like this in the logs:

 Apr 18 21:55:13 stuffed spamass-milter[24192]: Thrown error: fork error: Resource temporarily unavailable
 Apr 18 21:55:13 stuffed sendmail[20261]: o3J1tDGJ020261: Milter: data, reject=451 4.3.2 Please try again later
 Apr 18 21:55:13 stuffed sendmail[20261]: o3J1tDGJ020261: to=<XXXXX>, delay=00:00:00, pri=34309, stat=Please try again later

or alternatively:

 Apr 18 21:55:48 stuffed spamass-milter[24192]: Thrown error: fork error: Resource temporarily unavailable
 Apr 18 21:55:48 stuffed sendmail[20274]: o3J1tgmg020274: Milter: data, reject=451 4.3.2 Please try again later
 Apr 18 21:55:48 stuffed sendmail[20274]: o3J1tgmg020274: to=<XXXXX>, delay=00:00:03, pri=31734, stat=Please try again later

One moment it's accepting mail and working, and the next, it's failing.  Nothing in the logs, either maillog or syslog, that indicates what's going on.

Here's one such transition:

 Apr 18 13:25:39 stuffed sendmail[2016]: o3IHPajU002016: Milter add: header: X-Spam-Level: *
 Apr 18 13:25:39 stuffed sendmail[2016]: o3IHPajU002016: Milter add: header: X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on stuffed.shaftnet.org
 Apr 18 13:25:39 stuffed sendmail[2031]: o3IHPajU002016: to=<XXXXX>, delay=00:00:00, xdelay=00:00:00, mailer=local, pri=32637, dsn=2.0.0, stat=Sent
 Apr 18 13:25:40 stuffed spamd[17423]: prefork: child states: II
 Apr 18 13:26:16 stuffed sendmail[2046]: NOQUEUE: SYSERR(sa-milt): can not chdir(/var/spool/mqueue/): Permission denied
 Apr 18 13:26:17 stuffed sendmail[2043]: o3IHQEGs002043: from=<notification+z29f9jjc>, size=2160, class=0, nrcpts=1, msgid=<fe2120e5daee098f56586da6f49d7e9f.com>, proto=ESMTP, daemon=MTA, relay=outmail011.snc1.tfbnw.net [69.63.178.170]
 Apr 18 13:26:17 stuffed sendmail[2043]: o3IHQEGs002043: Milter add: header: X-Virus-Scanned: clamav-milter 0.95.3 at stuffed
 Apr 18 13:26:17 stuffed sendmail[2043]: o3IHQEGs002043: Milter add: header: X-Virus-Status: Clean
 Apr 18 13:26:17 stuffed spamass-milter[24192]: Thrown error: fork error: Resource temporarily unavailable
 Apr 18 13:26:17 stuffed sendmail[2043]: o3IHQEGs002043: Milter: data, reject=451 4.3.2 Please try again later
 Apr 18 13:26:17 stuffed sendmail[2043]: o3IHQEGs002043: to=<XXXX>, delay=00:00:01, pri=32160, stat=Please try again later



To fix it, restart spamass-milter (service restart) and it's good to go for another day, give or take.

It used to be okay -- when I rebooted the server about a week ago to install a new kernel and fsck the drives, this problem started, but I suspect that was simply because spamass-milter (or spamassassin or sendmail or something) was updated but not restarted until then.

Version-Release number of selected component (if applicable):

Fedora 11 x86_64
Linux stuffed 2.6.32.10-44.fc11.x86_64 #1 SMP Tue Mar 16 23:06:49 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

sendmail-8.14.3-5.fc11.x86_64
spamass-milter-0.3.1-18.fc11.x86_64
spamassassin-3.3.1-2.fc11.x86_64

SELinux is OFF.

How reproducible:

I can't recreate it on demand; instead it fails after a period of time, roughly 24 hours -- I believe it depends on incoming mail volumen.

Additional info:

/etc/sysconfig/spamass-milter contains this:

EXTRA_FLAGS="-I -u sa-milt -x -r 9"

and /etc/mail/sendmail.mc includes:

INPUT_MAIL_FILTER(`clmilter',`S=local:/var/run/clamav-milter/clamav-milter.socket,F=,T=S:4m;R:4m')dnl

INPUT_MAIL_FILTER(`spamassassin', `S=unix:/var/run/spamass-milter/spamass-milter.sock, F=, T=C:15m;S:4m;R:4m;E:10m')dnl
define(`confMILTER_MACROS_ENVRCPT',`r, v, Z')dnl 
define(`confMILTER_MACROS_CONNECT',`t, b, j, _, {daemon_name}, {if_name}, {if_addr}')dnl
define(`confMILTER_MACROS_HELO',`s, {tls_version}, {cipher}, {cipher_bits}, {cert_subject}, {cert_issuer}')dnl

Again, things were working fine at one point, until the last system reboot.  I've added an hourly spamass-milter restart to the crontab as a workaround in the mean time, but this is a serious regression.

Comment 1 Solomon Peachy 2010-04-19 03:52:28 UTC
I also see these:

Apr 18 21:37:56 stuffed spamass-milter[24192]: SpamAssassin: thread_create() failed: 11, try again

Comment 2 Paul Howarth 2010-04-19 09:38:03 UTC
This regression was probably introduced with the fix for CVE-2010-1132, in which calls to popen() to support the -b and -x options was replaced by a new, local function using fork() and exec(). I'll take a look and see if I can spot what's wrong with the patch.

In the meantime, a few comments:

1. Does the failure happen when you get a burst of emails at around the same time?

2. Once it's broken, does it stay broken - the error messages appear to suggest a resource limit being hit that could go away if tried again?

3. The -x option does not work unless you run the milter as root, which wasn't possible using the supplied initscript until the last update - do you really need this option? You may find that the problem doesn't happen if you don't use that option.

4. You should add the `b' macro to confMILTER_MACROS_ENVRCPT (see the spamass-milter README).

Comment 3 Paul Howarth 2010-04-19 09:43:17 UTC
The milter appears to be creating zombie (defunct) processes and eventually running out of threads I believe. You can see these if you do:

# ps -dfa

I'll see if I can figure out what's going wrong.

Comment 4 Solomon Peachy 2010-04-19 12:36:09 UTC
(In reply to comment #2)
> 1. Does the failure happen when you get a burst of emails at around the same
> time?

I can't say; I haven't tried to correlate the mail rate vs when it fails, but after I restart the milter a whole bunch of backlogged mail comes through pretty quickly.

> 2. Once it's broken, does it stay broken - the error messages appear to suggest
> a resource limit being hit that could go away if tried again?

It stays broken until I restart it -- eventually sendmail gives up on the milter and bypasses it altogether.

> 3. The -x option does not work unless you run the milter as root, which wasn't
> possible using the supplied initscript until the last update - do you really
> need this option? You may find that the problem doesn't happen if you don't use
> that option.

If it wasn't working properly before, that may be why things started failing now..  I thought I saw some benefit from that option, but if it wasn't working before I may have been mistaken about its "benefit"

> 4. You should add the `b' macro to confMILTER_MACROS_ENVRCPT (see the
> spamass-milter README).    

Okay, added -- for what it's worth I cut-n-paste the existing macro line from the spamass-milter documentation once upon a time.

Comment 5 Paul Howarth 2010-04-19 12:39:35 UTC
The same problem has been reported in Debian; the suggested fix there works for me too so I'll update the Fedora package similarly.

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=575019

Comment 6 Solomon Peachy 2010-04-19 12:43:12 UTC
(In reply to comment #3)
> The milter appears to be creating zombie (defunct) processes and eventually
> running out of threads I believe. You can see these if you do:
> 
> # ps -dfa
> 
> I'll see if I can figure out what's going wrong.    

Ah, you're correct:

sa-milt  16810  0.0  0.0      0     0 ?        Z    08:32   0:00 [sendmail] <defunct>
sa-milt  16857  0.0  0.0      0     0 ?        Z    08:33   0:00 [sendmail] <defunct>

since the last restart of the milter about 30 minutes ago, I have 26 of these zombies.

I've nuked my auto-restart cron job and reduced the EXTRA_FLAGS to '-I -r 9' and I'll see how this goes...

Comment 7 Fedora Update System 2010-04-19 13:37:51 UTC
spamass-milter-0.3.1-19.fc13 has been submitted as an update for Fedora 13.
http://admin.fedoraproject.org/updates/spamass-milter-0.3.1-19.fc13

Comment 8 Fedora Update System 2010-04-19 13:39:06 UTC
spamass-milter-0.3.1-19.fc12 has been submitted as an update for Fedora 12.
http://admin.fedoraproject.org/updates/spamass-milter-0.3.1-19.fc12

Comment 9 Fedora Update System 2010-04-19 13:40:10 UTC
spamass-milter-0.3.1-19.fc11 has been submitted as an update for Fedora 11.
http://admin.fedoraproject.org/updates/spamass-milter-0.3.1-19.fc11

Comment 10 Fedora Update System 2010-04-19 13:40:57 UTC
spamass-milter-0.3.1-19.el5 has been submitted as an update for Fedora EPEL 5.
http://admin.fedoraproject.org/updates/spamass-milter-0.3.1-19.el5

Comment 11 Fedora Update System 2010-04-19 13:41:35 UTC
spamass-milter-0.3.1-19.el4 has been submitted as an update for Fedora EPEL 4.
http://admin.fedoraproject.org/updates/spamass-milter-0.3.1-19.el4

Comment 12 Fedora Update System 2010-04-20 13:33:23 UTC
spamass-milter-0.3.1-19.fc11 has been pushed to the Fedora 11 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update spamass-milter'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/spamass-milter-0.3.1-19.fc11

Comment 13 Fedora Update System 2010-04-28 01:14:04 UTC
spamass-milter-0.3.1-19.fc11 has been pushed to the Fedora 11 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 14 Fedora Update System 2010-04-28 01:14:27 UTC
spamass-milter-0.3.1-19.fc13 has been pushed to the Fedora 13 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 15 Fedora Update System 2010-04-28 01:19:43 UTC
spamass-milter-0.3.1-19.fc12 has been pushed to the Fedora 12 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 16 Bug Zapper 2010-04-28 12:02:37 UTC
This message is a reminder that Fedora 11 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 11.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '11'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 11's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 11 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 17 Fedora Update System 2010-05-06 00:52:04 UTC
spamass-milter-0.3.1-19.el5 has been pushed to the Fedora EPEL 5 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 18 Fedora Update System 2010-05-06 00:52:58 UTC
spamass-milter-0.3.1-19.el4 has been pushed to the Fedora EPEL 4 stable repository.  If problems still persist, please make note of it in this bug report.