Bug 175513 - UTF-8 error from sa-learn
Summary: UTF-8 error from sa-learn
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: spamassassin
Version: 4
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Warren Togami
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2005-12-12 06:31 UTC by Ilpo Nyyssonen
Modified: 2007-11-30 22:11 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-04-17 17:28:30 UTC
Type: ---
Embargoed:


Attachments
mail that causes the bug (6.13 KB, application/octet-stream)
2005-12-12 16:01 UTC, Ilpo Nyyssonen

Description Ilpo Nyyssonen 2005-12-12 06:31:46 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050923 Fedora/1.7.12-1.5.1

Description of problem:
$ sa-learn --spam Maildir/.training.spam/cur/
Parsing of undecoded UTF-8 will give garbage when decoding entities at /usr/lib/perl5/vendor_perl/5.8.6/Mail/SpamAssassin/HTML.pm line 182.


Version-Release number of selected component (if applicable):
spamassassin-3.0.4-2.fc4

How reproducible:
Always

Steps to Reproduce:
1. Invoke sa-learn
  

Actual Results:  Got this message.

Expected Results:  Shouldn't have got it.

Additional info:

Comment 1 Warren Togami 2005-12-12 06:35:18 UTC
It would be helpful if you could isolate the message that causes that error,
save it into its own mbox file, and attach that mbox file.  We need the complete
message including headers and everything intact.
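
One rough way to do that isolation, sketched in Perl. The helper names
find_offending_messages and run_sa_learn are made up for this sketch, and the
runner assumes that sa-learn accepts a single message file after --spam (as in
the report) and prints the warning on stderr. Note that each message is still
learned as spam while it is being checked.

```perl
use strict;
use warnings;

# Return the files in $dir for which $run->($file) produces the warning.
# $run is a caller-supplied callback (an assumption of this sketch) that
# returns the combined stdout/stderr of processing one message.
sub find_offending_messages {
    my ($dir, $run) = @_;
    opendir(my $dh, $dir) or die "cannot open $dir: $!";
    my @names = readdir $dh;
    closedir $dh;

    my @files;
    for my $name (sort @names) {
        my $path = "$dir/$name";
        push @files, $path if -f $path;    # skip ".", ".." and subdirectories
    }
    return grep { $run->($_) =~ /Parsing of undecoded UTF-8/ } @files;
}

# Hypothetical runner: feed one message to sa-learn and capture its output.
sub run_sa_learn {
    my ($file) = @_;
    open(my $out, '-|', 'sh', '-c', 'exec sa-learn --spam "$1" 2>&1', 'sh', $file)
        or die "cannot run sa-learn: $!";
    local $/;                              # slurp everything at once
    my $text = <$out>;
    close $out;
    return defined $text ? $text : '';
}
```

Something like
print "$_\n" for find_offending_messages('Maildir/.training.spam/cur', \&run_sa_learn);
should then list the candidate files, any of which can be attached as is.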

Comment 2 Sidney Markowitz 2005-12-12 08:48:06 UTC
This looks like upstream bug
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4046
which is fixed in the svn trunk, but not the 3.0 branch.

Comment 3 Warren Togami 2005-12-12 08:57:44 UTC
Hmm, would that patch apply in 3.0.5?  I personally see this error often in my
3.0.5 testing.

Comment 4 Sidney Markowitz 2005-12-12 10:24:07 UTC
The comments in that bug are confusing. None of the patches attached to that bug
were applied. The bug was closed because other changes in trunk made the warning
go away, and Justin determined that the problem was only cosmetic. The warnings
did not affect rule hits.

The relevant code in the trunk version of HTML.pm looks like this:

 # Ignore stupid warning that can't be suppressed: 'Parsing of
 # undecoded UTF-8 will give garbage when decoding entities at ..' (bug 4046)
 {
   local $SIG{__WARN__} = sub {
     warn @_ unless (defined $_[0] && $_[0] =~ /^Parsing of undecoded UTF-/);
   };

   $self->SUPER::parse($text);
 }

In 3.0 there is a call to

  $hp->parse(pack ('C0A*', $text));

at or near line 182, instead of a call to $self->SUPER::parse($text);
and that's what you would wrap the block around.
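
As a rough, untested-against-3.0 sketch of that wrapping: the filter below is
the same one as in trunk, pulled into two helpers whose names
(is_utf8_entity_warning, without_utf8_entity_warning) are invented here.
FakeParser is only a stand-in for the real $hp object so that the snippet runs
on its own; in HTML.pm the existing $hp would be used instead.

```perl
use strict;
use warnings;

# True only for the one warning that the trunk code deliberately drops.
sub is_utf8_entity_warning {
    my ($msg) = @_;
    return defined $msg && $msg =~ /^Parsing of undecoded UTF-/ ? 1 : 0;
}

# Run $code with a temporary __WARN__ handler that swallows that warning.
# Perl disables the __WARN__ hook while the handler itself runs, so the
# inner warn prints normally instead of recursing.
sub without_utf8_entity_warning {
    my ($code) = @_;
    local $SIG{__WARN__} = sub {
        warn @_ unless is_utf8_entity_warning($_[0]);
    };
    return $code->();
}

# Hypothetical stand-in for the parser object; it just emits the warning.
{
    package FakeParser;
    sub new { return bless {}, shift }
    sub parse {
        warn "Parsing of undecoded UTF-8 will give garbage when decoding entities\n";
        return $_[0];
    }
}

my $hp   = FakeParser->new;
my $text = "caf\xc3\xa9 &amp; more";

# The 3.0 call from line 182, wrapped so that the warning is filtered out.
my $ret = without_utf8_entity_warning(sub { $hp->parse(pack('C0A*', $text)) });
```

Any other warning raised during the parse still gets through, since the
handler re-emits everything that does not match the pattern.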

Comment 5 Ilpo Nyyssonen 2005-12-12 16:01:50 UTC
Created attachment 122138
mail that causes the bug

$ sa-learn --spam sa-learn-bug-mail 
Parsing of undecoded UTF-8 will give garbage when decoding entities at
/usr/lib/perl5/vendor_perl/5.8.6/Mail/SpamAssassin/HTML.pm line 182.
Learned from 0 message(s) (1 message(s) examined).

Comment 6 Orion Poplawski 2006-05-17 15:11:25 UTC
Seems like all relevant information has been provided.

Comment 7 Christian Iseli 2007-01-20 00:23:02 UTC
This report targets the FC3 or FC4 products, which have now been EOL'd.

Could you please check that it still applies to a current Fedora release, and
either update the target product or close it?

Thanks.

