Bug 39985

Summary: wget does not parse &amp;
Product: [Retired] Red Hat Linux Reporter: Need Real Name <mal>
Component: wget Assignee: Trond Eivind Glomsrød <teg>
Status: CLOSED RAWHIDE QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: 7.0   
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2001-06-06 16:36:42 UTC Type: ---

Description Need Real Name 2001-05-10 02:15:56 UTC
When wget extracts URLs from HTML pages, e.g.
<A HREF="http://url.com/?param=a&amp;b">text</A>
it does not HTML-decode &amp; in the URL http://url.com/?param=a&amp;b
This is a bug. The HTML should be parsed first (&amp; replaced with &),
and only then should the URL be extracted.

Netscape, Explorer and other browsers do this correctly;
wget does not.
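
The decode-then-use order described above can be illustrated with a minimal Python sketch. This is not wget's code; the regular expression and the variable names are assumptions made for illustration, and only the sample anchor tag comes from this report.

```python
import html
import re

# Sample anchor tag quoted in the description.
page = '<A HREF="http://url.com/?param=a&amp;b">text</A>'

# Take the attribute text verbatim, without decoding
# (roughly the behaviour the report complains about).
raw_href = re.search(r'HREF="([^"]*)"', page, re.IGNORECASE).group(1)

# Decode HTML character references in the attribute value,
# then treat the result as the URL to fetch.
decoded_href = html.unescape(raw_href)

print(raw_href)      # http://url.com/?param=a&amp;b
print(decoded_href)  # http://url.com/?param=a&b
```

The first value still contains the literal text &amp; while the second is the URL a browser would request.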

rpm -q wget
wget-1.5.3-10

Comment 1 Alan Cox 2001-05-19 20:16:44 UTC
Well, well, that explains why my wget on www.linux.org.uk was failing. You are
100% right: ?a=b&amp;c=d is the right HTML for ?a=b&c=d to be posted.


Comment 2 Trond Eivind Glomsrød 2001-06-06 16:15:15 UTC
Could you please try wget-1.7 (available from http://people.redhat.com/teg/wget/
for a limited time, soon from Rawhide) and see if this solves the problem?

If not, please give me a test case.

Comment 3 Need Real Name 2001-06-06 16:36:38 UTC
wget-1.7-1 seems OK with this; it parses &amp;

wget --span-hosts -r http://127.0.0.1/jj.html
--12:42:36--  http://127.0.0.1/jj.html
           => `127.0.0.1/jj.html'
Connecting to 127.0.0.1:80... connected!
HTTP request sent, awaiting response... 200 OK
Length: 111 [text/html]

    0K                                                       100% @ 108.40 KB/s

12:42:36 (108.40 KB/s) - `127.0.0.1/jj.html' saved [111/111]

Loading robots.txt; please ignore errors.
--12:42:36--  http://people.redhat.com/robots.txt
           => `people.redhat.com/robots.txt'
Connecting to people.redhat.com:80... connected!
HTTP request sent, awaiting response... 404 Not Found
12:42:36 ERROR 404: Not Found.

--12:42:36--  http://people.redhat.com/teg/wget/?HH=jj&hw=iww&KK=OO
           => `people.redhat.com/teg/wget/?HH=jj&hw=iww&KK=OO'
Connecting to people.redhat.com:80... connected!
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

    0K                                         
==============================================================================


wget-1.5.3-10 does not parse &amp;

wget --span-hosts -r http://127.0.0.1/jj.html
--12:47:00--  http://127.0.0.1:80/jj.html
           => `127.0.0.1/jj.html'
Connecting to 127.0.0.1:80... connected!
HTTP request sent, awaiting response... 200 OK
Length: 111 [text/html]

    0K ->                                                        [100%]

12:47:00 (2.31 KB/s) - `127.0.0.1/jj.html' saved [111/111]

Loading robots.txt; please ignore errors.
--12:47:01--  http://people.redhat.com:80/robots.txt
           => `people.redhat.com/robots.txt'
Connecting to people.redhat.com:80... connected!
HTTP request sent, awaiting response... 404 Not Found
12:47:01 ERROR 404: Not Found.

--12:47:01--  http://people.redhat.com:80/teg/wget/?HH=jj&amp;hw=iww&amp;KK=OO
           => `people.redhat.com/teg/wget/?HH=jj&amp;hw=iww&amp;KK=OO'
Connecting to people.redhat.com:80... connected!
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

    0K ->
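
The two logs above differ in the last URL requested: wget-1.7-1 asks for ?HH=jj&hw=iww&KK=OO, while wget-1.5.3-10 asks for the undecoded ?HH=jj&amp;hw=iww&amp;KK=OO. As a rough illustration of the expected result, here is a sketch using Python's standard html.parser, which decodes character references in attribute values. The contents of jj.html are not included in this report, so test_page is a hypothetical stand-in, and LinkCollector is a name invented for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags (attribute values arrive decoded)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names are lowercased by the parser.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# Hypothetical stand-in for jj.html; the real file is not shown in the report.
test_page = (
    "<html><body>"
    '<A HREF="http://people.redhat.com/teg/wget/?HH=jj&amp;hw=iww&amp;KK=OO">x</A>'
    "</body></html>"
)

collector = LinkCollector()
collector.feed(test_page)
collector.close()
print(collector.links)
```

The collected link contains plain & separators, matching the URL that wget-1.7-1 requests in the first log.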