If you feed the attached mail into spamd for spamassassin-2.44-11.8.x, you'll get errors:

Mar 12 11:54:59 poincare spamd[32111]: connection from localhost.localdomain [127.0.0.1] at port 51842
Mar 12 11:55:00 poincare spamd[1513]: info: setuid to otaylor succeeded
Mar 12 11:55:00 poincare spamd[1513]: processing message <1047474641.3234.18.camel@thor> for otaylor:2181, expecting 5768 bytes.
Mar 12 11:55:00 poincare spamd[1513]: bad protocol: header error: (Content-length mismatch: 5768 vs. 5764)

This is because 'spamd' is doing (essentially)

  while (<IN>) { $len += length; }

to get the length of the message that spamc sends it. That count is apparently in characters, but spamc's Content-Length: header is in bytes.

If you add 'use bytes' to spamd, the length check is fixed, but you then get:

Mar 12 12:10:17 poincare spamd[1549]: connection from localhost.localdomain [127.0.0.1] at port 52038
Mar 12 12:10:17 poincare spamd[1718]: info: setuid to otaylor succeeded
Mar 12 12:10:18 poincare spamd[1718]: processing message <1047474641.3234.18.camel@thor> for otaylor:2181, expecting 5768 bytes.
Mar 12 12:10:18 poincare spamd[1718]: Malformed UTF-8 character (unexpected non-continuation byte 0x6e, immediately after start byte 0xe4) in transliteration (tr///) at /usr/lib/perl5/vendor_perl/5.8.0/Mail/SpamAssassin/EvalTests.pm line 1786, <STDIN> line 140.
Mar 12 12:10:18 poincare spamd[1718]: Malformed UTF-8 character (unexpected non-continuation byte 0x6e, immediately after start byte 0xe4) in transliteration (tr///) at /usr/lib/perl5/vendor_perl/5.8.0/Mail/SpamAssassin/EvalTests.pm line 1786, <STDIN> line 140.
Mar 12 12:10:18 poincare spamd[1718]: Malformed UTF-8 character (unexpected non-continuation byte 0x6e, immediately after start byte 0xe4) in transliteration (tr///) at /usr/lib/perl5/vendor_perl/5.8.0/Mail/SpamAssassin/EvalTests.pm line 1787, <STDIN> line 140.
Mar 12 12:10:18 poincare spamd[1718]: Malformed UTF-8 character (unexpected non-continuation byte 0x6e, immediately after start byte 0xe4) in transliteration (tr///) at /usr/lib/perl5/vendor_perl/5.8.0/Mail/SpamAssassin/EvalTests.pm line 1787, <STDIN> line 140.
Mar 12 12:10:18 poincare spamd[1718]: clean message (-0.9/5.0) for otaylor:2181 in 0.4 seconds, 5768 bytes.

These warnings are less harmful, but should be fixed too.

It may be that starting spamd with a non-default system encoding (LANG=C?) will fix these problems. I don't have a very good understanding of Perl's encoding handling.
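To make the "Malformed UTF-8 character" warning a bit more concrete, here is a small sketch in Python (not the Perl that spamd runs) that decodes the two bytes named in the log, 0xe4 followed by 0x6e. The helper name try_decode is made up for this illustration. The result is consistent with the message being in a single-byte encoding such as ISO-8859-1, though that is only a guess about the attachment.

```python
# The two bytes quoted in the warning: start byte 0xe4, then 0x6e ('n').
raw = b"\xe4n"


def try_decode(data: bytes, encoding: str) -> str:
    """Decode data, or describe why it cannot be decoded (hypothetical helper)."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"not valid {encoding}: {exc.reason}"


# In UTF-8, 0xe4 opens a three-byte sequence, so an ASCII 'n' right
# after it is not a valid continuation byte.
as_utf8 = try_decode(raw, "utf-8")

# In ISO-8859-1 (latin-1) every byte is one character, so the same
# two bytes decode cleanly to an a-umlaut followed by 'n'.
as_latin1 = try_decode(raw, "latin-1")

print(as_utf8)
print(as_latin1)
```

So the bytes themselves are not broken. They are just not UTF-8, and code that treats them as UTF-8 will complain.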
Created attachment 90574: Message with non-ASCII causing spamd to error
Can you test this with spamassassin-2.50-2.8.x? It fixes a number of UTF-8/non-ASCII issues.
I tried this with 2.53 (original tar.gz, not the RPM package) and the length is now calculated correctly.

SpamAssassin 2.44 calculates the length using:

  for (<STDIN>) {
    ....
    $actual_length += length;
  }

This returns the number of characters rather than the number of bytes. spamc, however, sends the number of bytes. The mismatch between the two numbers (which is normal if LANG is set to utf8 AND the email actually contains non-ASCII characters) causes spamd to report a protocol error and simply return 1:

  if ($actual_length != $expected_length) {
    protocol_error ("(Content-length mismatch: $expected_length vs. $actual_length)");
    return 1;
  }

SpamAssassin 2.53 reads from a socket rather than STDIN:

  $server = new IO::Socket::INET(...);
  ...
  $client = $server->accept;
  ...
  while ($_ = $client->getline()) {
    ....
    $actual_length += length;
  }

which returns the correct number of bytes regardless of what is in LANG.

The only reason I went this far in investigating this problem is that I ACTUALLY LOST ABOUT A DOZEN MAILS since I moved to Red Hat Linux 9 because of this bug. This is a terrible problem and needs an urgent ERRATA!
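For anyone who wants to see the character/byte discrepancy in isolation, here is a minimal Python sketch (not the actual spamd code). It sums line lengths over the same data twice: once read as raw bytes, once read as UTF-8 decoded text. The function name count_lengths and the sample text are invented for the example.

```python
import io


def count_lengths(raw: bytes):
    """Sum per-line lengths of raw, first as bytes and then as UTF-8 text."""
    # Byte stream: each line's length is a byte count, which is what a
    # Content-Length header refers to.
    byte_len = sum(len(line) for line in io.BytesIO(raw))

    # Text stream decoded as UTF-8: each line's length is a character
    # count, so every multi-byte character is counted only once.
    text = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline="")
    char_len = sum(len(line) for line in text)
    return byte_len, char_len


# Sample body with four non-ASCII characters, each two bytes in UTF-8.
body = "Grüße aus Köln\nnaive café\n".encode("utf-8")
byte_len, char_len = count_lengths(body)

expected_length = len(body)  # what the sending side would announce
print(byte_len == expected_length)  # byte count matches the header
print(char_len == expected_length)  # character count does not
```

Any comparison of a character count against a byte-based header will fail as soon as the body contains a multi-byte character, which matches the "Content-length mismatch" error above.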
I forgot to say why mail gets lost. If you are using SpamAssassin with a procmail rule like this:

  :0fw
  | spamc

it seems that although spamd exits with a non-zero error code, spamc doesn't, so procmail doesn't try to rescue the mail. THIS is the really ugly part.
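As a rough illustration of the defensive behaviour one would want from a filter step, here is a Python sketch of a wrapper that falls back to the original message when the filter looks like it failed. The function name filter_or_passthrough is hypothetical, and so are the failure criteria (non-zero exit status or empty output): they are assumptions for this sketch, not documented spamc behaviour.

```python
import subprocess
import sys
from typing import Sequence


def filter_or_passthrough(message: bytes, command: Sequence[str]) -> bytes:
    """Pipe message through command; return the original message on failure.

    "Failure" here means the command could not be started, exited with a
    non-zero status, or produced no output at all. These criteria are
    assumptions made for this sketch.
    """
    try:
        result = subprocess.run(
            list(command), input=message, stdout=subprocess.PIPE, check=False
        )
    except OSError:
        return message  # command could not be started
    if result.returncode != 0 or not result.stdout:
        return message  # keep the mail rather than lose it
    return result.stdout


if __name__ == "__main__":
    # Stand-in filter that swallows its input and prints nothing.
    silent = [sys.executable, "-c", "import sys; sys.stdin.buffer.read()"]
    print(filter_or_passthrough(b"Subject: test\n\nbody\n", silent))
```

The point is only that a filter which returns success with no output should never be allowed to replace the message.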
Actually, the lost mail problem is more precisely bug 86029, which I filed at the same time.
closing due to inactivity and ancient software