Red Hat Bugzilla – Bug 159051
GET prints error message on http://www.britishairwaysband.com/veday.htm
Last modified: 2007-11-30 17:11:06 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050513 Fedora/1.7.8-1.3.1

Description of problem:
GET http://www.britishairwaysband.com/veday.htm says

    Parsing of undecoded UTF-16 at /usr/lib/perl5/site_perl/5.8.3/LWP/Protocol.pm line 114.

Version-Release number of selected component (if applicable):
perl-libwww-perl-5.79-5

How reproducible:
Always

Steps to Reproduce:
1. GET http://www.britishairwaysband.com/veday.htm

Actual Results:
This message appears:

    Parsing of undecoded UTF-16 at /usr/lib/perl5/site_perl/5.8.3/LWP/Protocol.pm line 114.

Expected Results:
The message shouldn't have appeared.
We apologize for the delay in processing this bug report.

I've just downloaded the page in question, with both:

    $ wget http://www.britishairwaysband.com/veday.htm

and

    $ lwp-request http://www.britishairwaysband.com/veday.htm > /tmp/veday.htm

It appears the message from LWP/Protocol.pm is not in error - the veday.htm file is full of illegal binary characters - e.g. the first line:

    $ head -1 < /tmp/veday.htm | od -cx
    0000000 377 376   <  \0   H  \0   T  \0   M  \0   L  \0   >  \0  \r  \0
               feff    003c    0048    0054    004d    004c    003e    000d
    0000020  \n  \0
               000a
    0000021

Perl does its best to figure out what kind of encoding is being used, but there is no encoding in which all the 8-bit sequences in this file are legal.

I suggest converting the file to 7-bit ASCII:

    perl -ne 'foreach $c ( split //, $_ ) { if( ((ord($c) < 0x20) && !( $c =~ /[\n\r\t\v]/)) || (ord($c) > 0x7f) ) { next; }; print $c; };' < /tmp/veday.htm > /tmp/veday_ascii.htm

I've attached the /tmp/veday_ascii.htm file so you can see the differences.
Created attachment 122493
file with non-7-bit-ascii characters stripped
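A note on the od dump in the comment above: the two leading bytes, 377 376 (hex FF FE), match the UTF-16 little-endian byte-order mark, and every character after them is followed by a NUL byte. That is consistent with the "undecoded UTF-16" wording of the warning. Assuming the rest of the page is well-formed UTF-16 (only the first line was inspected here), decoding the file might be an alternative to stripping bytes. The following is a minimal sketch under that assumption. It builds a small sample file with the same leading bytes as the dump rather than fetching the real page, the file names are made up for illustration, and it assumes an iconv that supports UTF-16 is installed.

```shell
#!/bin/sh
# Sketch only: the working directory and file names are hypothetical.
workdir=$(mktemp -d)
sample="$workdir/sample_utf16.htm"

# Recreate the leading bytes from the od dump: FF FE (octal 377 376),
# then "<HTML>\r\n" with each character followed by a NUL byte (UTF-16LE).
printf '\377\376<\000H\000T\000M\000L\000>\000\r\000\n\000' > "$sample"

# With the source encoding given as plain UTF-16, iconv uses the
# byte-order mark to pick the byte order. The output is UTF-8.
iconv -f UTF-16 -t UTF-8 "$sample" > "$workdir/sample_utf8.htm"

# Dump the converted bytes for comparison with the original od output.
od -c "$workdir/sample_utf8.htm"
```

Unlike the ASCII-stripping one-liner, this approach would keep any characters above 0x7f instead of dropping them, at the cost of failing outright if the input turns out not to be valid UTF-16.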