Bug 86028 - spamd and non-ASCII input
Summary: spamd and non-ASCII input
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: spamassassin
Version: 9
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Warren Togami
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2003-03-12 17:20 UTC by Owen Taylor
Modified: 2007-04-18 16:51 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-04-03 10:30:45 UTC
Embargoed:


Attachments
Message with non-ASCII causing spamd to error (5.63 KB, text/plain)
2003-03-12 17:25 UTC, Owen Taylor

Description Owen Taylor 2003-03-12 17:20:09 UTC
If you feed the attached mail into spamd for spamassassin-2.44-11.8.x
you'll get errors:

Mar 12 11:54:59 poincare spamd[32111]: connection from localhost.localdomain
[127.0.0.1] at port 51842
Mar 12 11:55:00 poincare spamd[1513]: info: setuid to otaylor succeeded
Mar 12 11:55:00 poincare spamd[1513]: processing message
<1047474641.3234.18.camel@thor> for otaylor:2181, expecting 5768 bytes.
Mar 12 11:55:00 poincare spamd[1513]: bad protocol: header error:
(Content-length mismatch: 5768 vs. 5764)

This is because spamd works out the length of the message that spamc
sends it with (essentially):

 while (<IN>) {
   $len += length;
 }

which can count characters rather than bytes, whereas spamc's
Content-length: header is in bytes.
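
To make the character/byte distinction concrete, here is a minimal sketch. The sample string is made up for illustration (it is not taken from the attached message), and it assumes the input is valid UTF-8 that Perl has decoded into characters:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "Gruesse" spelled with u-umlaut and sharp s, as raw UTF-8 octets.
# Each of those two letters takes two bytes on the wire.
my $octets = "Gr\xC3\xBC\xC3\x9Fe\n";

# Roughly what a filehandle with a UTF-8 layer hands to the program:
# a string of characters, not octets.
my $chars = decode('UTF-8', $octets);

my $char_len = length($chars);                   # counts characters
my $byte_len = length(encode('UTF-8', $chars));  # counts octets, as a byte-based header would

printf "characters: %d, bytes: %d, difference: %d\n",
    $char_len, $byte_len, $byte_len - $char_len;
```

Every multi-byte character widens the gap between the two counts, which is the kind of mismatch the "5768 vs. 5764" error reports.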

If you add 'use bytes' to spamd, that gets fixed, but you
then get:

Mar 12 12:10:17 poincare spamd[1549]: connection from localhost.localdomain
[127.0.0.1] at port 52038
Mar 12 12:10:17 poincare spamd[1718]: info: setuid to otaylor succeeded
Mar 12 12:10:18 poincare spamd[1718]: processing message
<1047474641.3234.18.camel@thor> for otaylor:2181, expecting 5768 bytes.
Mar 12 12:10:18 poincare spamd[1718]: Malformed UTF-8 character (unexpected
non-continuation byte 0x6e, immediately after start byte 0xe4) in
transliteration (tr///) at
/usr/lib/perl5/vendor_perl/5.8.0/Mail/SpamAssassin/EvalTests.pm line 1786,
<STDIN> line 140.
Mar 12 12:10:18 poincare spamd[1718]: Malformed UTF-8 character (unexpected
non-continuation byte 0x6e, immediately after start byte 0xe4) in
transliteration (tr///) at
/usr/lib/perl5/vendor_perl/5.8.0/Mail/SpamAssassin/EvalTests.pm line 1786,
<STDIN> line 140.
Mar 12 12:10:18 poincare spamd[1718]: Malformed UTF-8 character (unexpected
non-continuation byte 0x6e, immediately after start byte 0xe4) in
transliteration (tr///) at
/usr/lib/perl5/vendor_perl/5.8.0/Mail/SpamAssassin/EvalTests.pm line 1787,
<STDIN> line 140.
Mar 12 12:10:18 poincare spamd[1718]: Malformed UTF-8 character (unexpected
non-continuation byte 0x6e, immediately after start byte 0xe4) in
transliteration (tr///) at
/usr/lib/perl5/vendor_perl/5.8.0/Mail/SpamAssassin/EvalTests.pm line 1787,
<STDIN> line 140.
Mar 12 12:10:18 poincare spamd[1718]: clean message (-0.9/5.0) for otaylor:2181
in 0.4 seconds, 5768 bytes.

Which is less harmful, but should be fixed too. It may be possible
that starting up spamd with a non-default system encoding (LANG=C?)
will fix these problems. I don't have a very good understanding of
Perl's encoding handling.
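
Another option, besides changing the locale, might be to make the input handle deliver raw bytes. The following is only a sketch of that idea, untested against spamd 2.44; `count_bytes` is a hypothetical helper, and it assumes the character counting comes from an encoding layer on the handle (as far as I know, Perl 5.8.0 could apply a UTF-8 layer by default under a UTF-8 locale):

```perl
use strict;
use warnings;

# Sum the size, in bytes, of everything readable from $fh.
# Hypothetical helper for illustration; not part of spamd.
sub count_bytes {
    my ($fh) = @_;
    binmode($fh);    # drop any encoding layer so reads return raw octets
    my $len = 0;
    while (my $line = <$fh>) {
        $len += length($line);
    }
    return $len;
}

# Demo on an in-memory handle holding Latin-1 style bytes
# (0xE4 followed by "n", the byte pair named in the warnings above).
my $data = "M\xE4nner\nsecond line\n";
open(my $in, '<', \$data) or die "cannot open in-memory handle: $!";
print count_bytes($in), "\n";
```

With the handle in binary mode the count matches a byte-based Content-length, and bytes such as 0xE4 are not interpreted as UTF-8 start bytes while reading.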

Comment 1 Owen Taylor 2003-03-12 17:25:18 UTC
Created attachment 90574
Message with non-ASCII causing spamd to error

Comment 2 Chip Turner 2003-03-15 16:20:40 UTC
can you test this with spamassassin-2.50-2.8.x?  it fixes a number of
utf8/non-ascii issues


Comment 3 Branimir Dolicki 2003-05-02 17:29:17 UTC
I tried this with 2.53 (original tar.gz, not RPM package) and the length is now
calculated correctly.

Spamassassin 2.44 calculates length using:

  for (<STDIN>) {
      ....
      $actual_length += length;
  }

This returns the number of characters rather than the number of bytes.  spamc,
however, sends the number of bytes.  The mismatch between the two numbers (which
is normal if LANG is set to a UTF-8 locale AND the email actually contains
non-ASCII characters) causes spamd to report a protocol error and simply
return 1:

  if ($actual_length != $expected_length) {
      protocol_error("(Content-length mismatch: $expected_length vs. $actual_length)");
      return 1;
  }


Spamassassin 2.53 reads from a socket rather than STDIN:

  $server = new IO::Socket::INET(...);
  ...
  $client = $server->accept;
  ...
  while ($_ = $client->getline()) {
      ....
      $actual_length += length;
  }

which returns the correct number of bytes regardless of what is in LANG.

The only reason why I went this far in investigating this problem is that
I ACTUALLY LOST ABOUT A DOZEN MAILS since I moved to Red Hat Linux 9
because of this bug.

This is a terrible problem and needs an urgent ERRATA!

Comment 4 Branimir Dolicki 2003-05-02 17:46:19 UTC
I forgot to say why mail gets lost: if you are using spamassassin with a
procmail rule like this:

:0fw
| spamc

It seems that although spamd exits with a non-zero error code, spamc doesn't,
so procmail doesn't try to rescue the mail. THIS is the really ugly part.
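
One way to guard against this would be a small wrapper that falls back to the original message whenever the filter fails or returns nothing. This is only a sketch of the decision logic: `choose_delivery` is a hypothetical helper, not part of spamc or procmail, and the assumption that a failed run yields empty output is mine (the actual behaviour is tracked separately in bug 86029):

```perl
use strict;
use warnings;

# Decide which text to hand on for delivery.
# Hypothetical helper for illustration only.
sub choose_delivery {
    my ($original, $filtered, $exit_status) = @_;

    # The filter reported failure: keep the original message.
    return $original if $exit_status != 0;

    # The filter claimed success but produced nothing: keep the original.
    return $original if !defined $filtered || length($filtered) == 0;

    return $filtered;
}

my $original = "Subject: hello\n\nbody\n";

# Exit status 0 but empty output, the case assumed above.
print choose_delivery($original, '', 0) eq $original
    ? "kept original\n"
    : "used filter output\n";
```

The point of checking the output as well as the exit status is that an exit status of zero alone does not show that the message came back intact.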

Comment 5 Owen Taylor 2003-05-02 19:17:53 UTC
Actually, the lost mail problem is more precisely bug 86029, which I 
filed at the same time.

Comment 6 Warren Togami 2005-04-03 10:30:45 UTC
closing due to inactivity and ancient software

