Bug 159051

Summary: GET prints error message on http://www.britishairwaysband.com/veday.htm
Product: [Fedora] Fedora
Reporter: Nigel Horne <njh>
Component: perl-libwww-perl
Assignee: Jason Vas Dias <jvdias>
Status: CLOSED NOTABUG
Severity: low
Priority: medium
Version: 3
CC: perl-devel
Hardware: i386
OS: Linux
URL: http://www.britishairwaysband.com/veday.htm
Doc Type: Bug Fix
Last Closed: 2005-12-21 16:11:26 UTC
Attachments:
Description: file with non-7-bit-ascii characters stripped
Flags: none

Description Nigel Horne 2005-05-28 08:04:11 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050513 Fedora/1.7.8-1.3.1

Description of problem:
GET http://www.britishairwaysband.com/veday.htm says

Parsing of undecoded UTF-16 at /usr/lib/perl5/site_perl/5.8.3/LWP/Protocol.pm line 114.

Version-Release number of selected component (if applicable):
perl-libwww-perl-5.79-5

How reproducible:
Always

Steps to Reproduce:
1. GET http://www.britishairwaysband.com/veday.htm

Actual Results:  This message appears

Parsing of undecoded UTF-16 at /usr/lib/perl5/site_perl/5.8.3/LWP/Protocol.pm line 114.

Expected Results:  The message shouldn't have appeared.

Additional info:

Comment 1 Jason Vas Dias 2005-12-21 16:11:26 UTC
We apologize for the delay in processing this bug report.


I've just downloaded the page in question with both:
 $ wget http://www.britishairwaysband.com/veday.htm
and
 $ lwp-request  http://www.britishairwaysband.com/veday.htm > /tmp/veday.htm

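It appears the message from LWP/Protocol.pm is not in error:
the veday.htm file is not plain 7-bit or 8-bit text. Its first line starts
with the bytes 377 376 (0xFF 0xFE, a UTF-16 byte order mark), and every
character is followed by a NUL byte: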

 $ head -1 < /tmp/veday.htm | od -cx
0000000 377 376   <  \0   H  \0   T  \0   M  \0   L  \0   >  \0  \r  \0
        feff 003c 0048 0054 004d 004c 003e 000d
0000020  \n  \0
        000a
0000021

Perl does its best to figure out what kind of encoding is being used, but
there is no encoding in which all the 8-bit sequences in this file are legal.
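
For readers who want to check the byte order mark themselves, here is a minimal sketch. It assumes a sample rebuilt from the od dump above (not the real veday.htm) and that iconv is installed. It covers only that first line and says nothing about the rest of the file.

```shell
# Rebuild the first line from the od dump (sample data, not the real file).
sample=$(mktemp)
printf '\377\376<\0H\0T\0M\0L\0>\0\r\0\n\0' > "$sample"

# Print the first two bytes in hex; ff fe is the UTF-16 little-endian BOM.
bom=$(head -c 2 "$sample" | od -An -tx1 | tr -d ' \n')
echo "BOM bytes: $bom"

# Decode as UTF-16 (iconv uses the BOM to pick the byte order) and drop the CR.
decoded=$(iconv -f UTF-16 -t UTF-8 "$sample" | tr -d '\r')
echo "Decoded: $decoded"

rm -f "$sample"
```

If the whole file decodes cleanly this way, converting it with iconv would keep any non-ASCII text that a plain 7-bit filter discards.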
 
I suggest converting the file to 7-bit ASCII:

perl -ne 'foreach $c ( split //, $_ )
{ # Drop control characters (other than \n, \r, \t, \v) and bytes above 0x7f.
  next if ( (ord($c) < 0x20) && !($c =~ /[\n\r\t\v]/) ) || (ord($c) > 0x7f);
  print $c;
}' < /tmp/veday.htm > /tmp/veday_ascii.htm
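
To see what this kind of filter leaves behind, here is a rough equivalent written with tr instead of Perl. It is a sketch of the same idea, not the command used for the attachment, and it runs on a sample rebuilt from the od dump above, not on the real file.

```shell
# Sample bytes taken from the od dump in this comment (not the real file).
sample=$(mktemp)
printf '\377\376<\0H\0T\0M\0L\0>\0\r\0\n\0' > "$sample"

# Keep tab, LF, VT (octal 11-13), CR (octal 15) and 0x20-0x7f (octal 40-177);
# delete every other byte. LC_ALL=C makes tr work on raw bytes.
stripped=$(LC_ALL=C tr -cd '\11-\13\15\40-\177' < "$sample" | tr -d '\r')
count=$(LC_ALL=C tr -cd '\11-\13\15\40-\177' < "$sample" | wc -c)

echo "Text after stripping: $stripped"
echo "Bytes kept (including CR and LF): $count"

rm -f "$sample"
```

The BOM bytes and the NUL padding are removed, so only the ASCII half of each 16-bit unit survives.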

I've attached the /tmp/veday_ascii.htm file so you can see the differences.


Comment 2 Jason Vas Dias 2005-12-21 16:12:46 UTC
Created attachment 122493
file with non-7-bit-ascii characters stripped