Bug 175513

Summary: UTF-8 error from sa-learn
Product: [Fedora] Fedora Reporter: Ilpo Nyyssonen <iny>
Component: spamassassin Assignee: Warren Togami <wtogami>
Status: CLOSED INSUFFICIENT_DATA QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: 4 CC: felicity, jm, kms, parkerm, perl-devel, reg+redhat, wtogami
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2007-04-17 17:28:30 UTC Type: ---
Attachments:
Description Flags
mail that causes the bug none

Description Ilpo Nyyssonen 2005-12-12 06:31:46 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050923 Fedora/1.7.12-1.5.1

Description of problem:
$ sa-learn --spam Maildir/.training.spam/cur/
Parsing of undecoded UTF-8 will give garbage when decoding entities at /usr/lib/perl5/vendor_perl/5.8.6/Mail/SpamAssassin/HTML.pm line 182.


Version-Release number of selected component (if applicable):
spamassassin-3.0.4-2.fc4

How reproducible:
Always

Steps to Reproduce:
1. Invoke sa-learn
  

Actual Results:  Got this message.

Expected Results:  Shouldn't have got it.

Additional info:

Comment 1 Warren Togami 2005-12-12 06:35:18 UTC
It would be helpful if you could isolate the message that causes that error,
save it into its own mbox file, and attach that mbox file.  We need the complete
message including headers and everything intact.

Comment 2 Sidney Markowitz 2005-12-12 08:48:06 UTC
This looks like upstream bug
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4046
which is fixed in the svn trunk, but not the 3.0 branch.

Comment 3 Warren Togami 2005-12-12 08:57:44 UTC
Hmm, would that patch apply in 3.0.5?  I personally see this error often in my
3.0.5 testing.

Comment 4 Sidney Markowitz 2005-12-12 10:24:07 UTC
The comments in that bug are confusing. No patches that are in that bug were
applied. The bug was closed because other changes in trunk made the warning go
away, and Justin determined that the problem was only cosmetic. The warnings did
not affect rule hits.

The relevant code in the trunk version of HTML.pm looks like this:

 # Ignore stupid warning that can't be suppressed: 'Parsing of
 # undecoded UTF-8 will give garbage when decoding entities at ..' (bug 4046)
 {
   local $SIG{__WARN__} = sub {
     warn @_ unless (defined $_[0] && $_[0] =~ /^Parsing of undecoded UTF-/);
   };

   $self->SUPER::parse($text);
 }

In 3.0 there is a call to

  $hp->parse(pack ('C0A*', $text));

at or near line 182, instead of a call to $self->SUPER::parse($text);
and that is the call you would wrap in the block.
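
As a rough illustration only (not a tested patch against 3.0.4), the same
filter wrapped around a 3.0-style call could look something like the sketch
below. FakeParser and parse_quietly are hypothetical names made up for this
example; FakeParser merely stands in for the $hp parser object so that the
snippet runs without SpamAssassin installed.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the parser object ($hp); it just emits the
# warnings it was given so the filter can be tried without SpamAssassin.
package FakeParser;
sub new {
  my ($class, @warnings) = @_;
  return bless { warnings => [@warnings] }, $class;
}
sub parse {
  my ($self, $text) = @_;
  warn "$_\n" for @{ $self->{warnings} };
  return $self;
}

package main;

# Hypothetical helper: the 3.0-style call wrapped in the trunk filter.
sub parse_quietly {
  my ($hp, $text) = @_;
  # Drop only the "undecoded UTF-" warning; re-emit everything else.
  local $SIG{__WARN__} = sub {
    warn @_ unless (defined $_[0] && $_[0] =~ /^Parsing of undecoded UTF-/);
  };
  return $hp->parse(pack('C0A*', $text));
}

parse_quietly(
  FakeParser->new('Parsing of undecoded UTF-8 will give garbage',
                  'some other warning'),
  '<p>caf&eacute;</p>',
);
```

Because the handler is installed with local, it is only in effect for the
duration of the parse call, so warnings raised elsewhere are unaffected.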

Comment 5 Ilpo Nyyssonen 2005-12-12 16:01:50 UTC
Created attachment 122138 [details]
mail that causes the bug

$ sa-learn --spam sa-learn-bug-mail 
Parsing of undecoded UTF-8 will give garbage when decoding entities at
/usr/lib/perl5/vendor_perl/5.8.6/Mail/SpamAssassin/HTML.pm line 182.
Learned from 0 message(s) (1 message(s) examined).

Comment 6 Orion Poplawski 2006-05-17 15:11:25 UTC
Seems like all relevant information has been provided.

Comment 7 Christian Iseli 2007-01-20 00:23:02 UTC
This report targets the FC3 or FC4 products, which have now been EOL'd.

Could you please check that it still applies to a current Fedora release, and
either update the target product or close it?

Thanks.