Bug 86028 - spamd and non-ASCII input
Status: CLOSED WONTFIX
Product: Red Hat Linux
Classification: Retired
Component: spamassassin
Version: 9
Hardware: i386 Linux
Priority: medium, Severity: medium
Assigned To: Warren Togami
Reported: 2003-03-12 12:20 EST by Owen Taylor
Modified: 2007-04-18 12:51 EDT
Last Closed: 2005-04-03 06:30:45 EDT

Attachments:
Message with non-ASCII causing spamd to error (5.63 KB, text/plain)
2003-03-12 12:25 EST, Owen Taylor

Description Owen Taylor 2003-03-12 12:20:09 EST
If you feed the attached mail into spamd for spamassassin-2.44-11.8.x
you'll get errors:

Mar 12 11:54:59 poincare spamd[32111]: connection from localhost.localdomain
[127.0.0.1] at port 51842
Mar 12 11:55:00 poincare spamd[1513]: info: setuid to otaylor succeeded
Mar 12 11:55:00 poincare spamd[1513]: processing message
<1047474641.3234.18.camel@thor> for otaylor:2181, expecting 5768 bytes.
Mar 12 11:55:00 poincare spamd[1513]: bad protocol: header error:
(Content-length mismatch: 5768 vs. 5764)

This is because 'spamd' computes the length of the message that spamc
sends it by doing (essentially)

 while (<IN>) {
   $len += length;
 }

which counts characters, but spamc's Content-Length: header is in bytes.
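
A minimal sketch of the mismatch, assuming the input handle carries a UTF-8 decoding layer (which is what the later comments suggest happens when LANG names a UTF-8 locale). The sample string and the total_length helper are hypothetical, not taken from spamd:

```perl
use strict;
use warnings;

# Hypothetical sample: "Grüße\n" stored as raw UTF-8 bytes.
# That is 6 characters but 8 bytes (ü and ß take 2 bytes each).
my $raw = "Gr\xC3\xBC\xC3\x9Fe\n";

# Sum length() over every line read through the given PerlIO layer,
# the same way the loop above does.
sub total_length {
    my ($layer) = @_;
    open(my $in, "<$layer", \$raw) or die "open: $!";
    my $len = 0;
    while (<$in>) {
        $len += length;
    }
    close($in);
    return $len;
}

my $chars = total_length(':encoding(UTF-8)');  # decoded: counts characters
my $bytes = total_length(':raw');              # undecoded: counts bytes

print "characters: $chars, bytes: $bytes\n";   # characters: 6, bytes: 8
```

The log above shows the same kind of gap: spamc announced 5768 bytes and spamd counted 5764.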

If you add 'use bytes' to spamd, then that gets fixed, but, you
then get:

Mar 12 12:10:17 poincare spamd[1549]: connection from localhost.localdomain
[127.0.0.1] at port 52038
Mar 12 12:10:17 poincare spamd[1718]: info: setuid to otaylor succeeded
Mar 12 12:10:18 poincare spamd[1718]: processing message
<1047474641.3234.18.camel@thor> for otaylor:2181, expecting 5768 bytes.
Mar 12 12:10:18 poincare spamd[1718]: Malformed UTF-8 character (unexpected
non-continuation byte 0x6e, immediately after start byte 0xe4) in
transliteration (tr///) at
/usr/lib/perl5/vendor_perl/5.8.0/Mail/SpamAssassin/EvalTests.pm line 1786,
<STDIN> line 140.
Mar 12 12:10:18 poincare spamd[1718]: Malformed UTF-8 character (unexpected
non-continuation byte 0x6e, immediately after start byte 0xe4) in
transliteration (tr///) at
/usr/lib/perl5/vendor_perl/5.8.0/Mail/SpamAssassin/EvalTests.pm line 1786,
<STDIN> line 140.
Mar 12 12:10:18 poincare spamd[1718]: Malformed UTF-8 character (unexpected
non-continuation byte 0x6e, immediately after start byte 0xe4) in
transliteration (tr///) at
/usr/lib/perl5/vendor_perl/5.8.0/Mail/SpamAssassin/EvalTests.pm line 1787,
<STDIN> line 140.
Mar 12 12:10:18 poincare spamd[1718]: Malformed UTF-8 character (unexpected
non-continuation byte 0x6e, immediately after start byte 0xe4) in
transliteration (tr///) at
/usr/lib/perl5/vendor_perl/5.8.0/Mail/SpamAssassin/EvalTests.pm line 1787,
<STDIN> line 140.
Mar 12 12:10:18 poincare spamd[1718]: clean message (-0.9/5.0) for otaylor:2181
in 0.4 seconds, 5768 bytes.

These warnings are less harmful, but should be fixed too. Starting
spamd with a non-default system encoding (LANG=C?) might fix both
problems, but I don't have a very good understanding of Perl's
encoding handling.
Comment 1 Owen Taylor 2003-03-12 12:25:18 EST
Created attachment 90574
Message with non-ASCII causing spamd to error
Comment 2 Chip Turner 2003-03-15 11:20:40 EST
can you test this with spamassassin-2.50-2.8.x?  it fixes a number of
utf8/non-ascii issues
Comment 3 Branimir Dolicki 2003-05-02 13:29:17 EDT
I tried this with 2.53 (original tar.gz, not RPM package) and the length is now
calculated correctly.

Spamassassin 2.44 calculates length using:

  for (<STDIN>) {
      ....
      $actual_length += length;
  }

This returns the number of characters rather than the number of bytes.  spamc,
however, sends the number of bytes.  The mismatch between the two numbers (which
is normal if LANG is set to a UTF-8 locale AND the email actually contains
non-ASCII characters) causes spamd to simply return 1:

  if ($actual_length != $expected_length) {
      protocol_error("(Content-length mismatch: $expected_length vs. $actual_length)");
      return 1;
  }


Spamassassin 2.53 reads from a socket rather than STDIN:

  $server = new IO::Socket::INET(...);
  ...
  $client = $server->accept;
  ...
  while ($_ = $client->getline()) {
      ....
      $actual_length += length;
  }

which returns the correct number of bytes regardless of what is in LANG.
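
For a handle that does carry a decoding layer, one way to get a byte count is to call binmode on it before reading. This is only a sketch of that idea, not how 2.53 does it; count_bytes and the sample message are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper, not spamd's actual code: force the handle into
# raw (byte) mode, then sum the length of every line read from it.
sub count_bytes {
    my ($fh) = @_;
    binmode($fh) or die "binmode: $!";   # removes any :utf8/:encoding layer
    my $actual_length = 0;
    while (my $line = <$fh>) {
        $actual_length += length($line);
    }
    return $actual_length;
}

# Hypothetical message held as raw bytes; it contains two 2-byte characters.
my $message = "Subject: gr\xC3\xBC\xC3\x9Fe\n\nbody\n";
my $expected_length = length($message);  # raw bytes, so this is a byte count

# Open with a decoding layer on purpose, to mimic the problematic STDIN.
open(my $in, '<:encoding(UTF-8)', \$message) or die "open: $!";
my $actual_length = count_bytes($in);
close($in);

print $actual_length == $expected_length
    ? "lengths match\n"
    : "Content-length mismatch\n";
```

Without the binmode call the loop would count decoded characters and the comparison would fail for this message.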

The only reason why I went this far in investigating this problem is that
I ACTUALLY LOST ABOUT A DOZEN MAILS since I moved to Red Hat Linux 9
because of this bug.

This is a terrible problem and needs an urgent ERRATUM!
Comment 4 Branimir Dolicki 2003-05-02 13:46:19 EDT
I forgot to say why mail gets lost: if you are using spamassassin with a
procmail rule like this:

:0fw
| spamc

It seems that although spamd exits with a non-zero error code, spamc doesn't,
so procmail doesn't try to rescue the mail. THIS is the really ugly part.
Comment 5 Owen Taylor 2003-05-02 15:17:53 EDT
Actually, the lost mail problem is more precisely bug 86029, which I 
filed at the same time.
Comment 6 Warren Togami 2005-04-03 06:30:45 EDT
closing due to inactivity and ancient software
