Bug 39985 - wget does not parse &
Summary: wget does not parse &
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: wget
Version: 7.0
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Trond Eivind Glomsrød
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2001-05-10 02:15 UTC by Need Real Name
Modified: 2008-05-01 15:38 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2001-06-06 16:36:42 UTC
Embargoed:



Description Need Real Name 2001-05-10 02:15:56 UTC
When wget extracts URLs from an HTML page such as
<A HREF="http://url.com/?param=a&amp;b">text</A>
it does not HTML-decode &amp; and requests http://url.com/?param=a&amp;b as is.
This is a bug. The HTML should be parsed first (&amp; replaced with &),
and only then should the URL be extracted.
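
The order described above (decode the HTML first, then take the URL) can be sketched in Python. This is only an illustration of the expected behaviour, not wget's code; the class name HrefCollector and the function extract_hrefs are made up for this example. It relies on the standard html.parser module, which decodes character references in attribute values before handing them to the handler.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects href values from <a> tags (hypothetical helper for this sketch)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names and has already
        # decoded character references such as &amp; in the attribute value.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def extract_hrefs(page):
    """Return the decoded href of every <a> element in the page."""
    collector = HrefCollector()
    collector.feed(page)
    collector.close()
    return collector.hrefs


if __name__ == "__main__":
    page = '<A HREF="http://url.com/?param=a&amp;b">text</A>'
    print(extract_hrefs(page))  # ['http://url.com/?param=a&b']
```

A crawler that takes the attribute text verbatim would instead request ?param=a&amp;b, which is the behaviour reported here for wget-1.5.3-10.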

Netscape, Explorer and other browsers do this correctly,
wget does not.

rpm -q wget
wget-1.5.3-10

Comment 1 Alan Cox 2001-05-19 20:16:44 UTC
Well, well, that explains why my wget on www.linux.org.uk was failing. You are
100% right: ?a=b&amp;c=d is the right HTML for ?a=b&c=d to be posted.
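
The same point seen from the page author's side, as a small Python sketch: a literal & in a URL has to be written as &amp; inside an HTML attribute, and decoding it gives back the original query string. The function name href_attribute is invented for this example; html.escape and html.unescape are from the Python standard library.

```python
from html import escape, unescape


def href_attribute(url):
    """Return an href attribute with the URL escaped for use in HTML."""
    # quote=True also escapes double and single quotes inside the value.
    return 'href="%s"' % escape(url, quote=True)


if __name__ == "__main__":
    query = "?a=b&c=d"
    print(href_attribute(query))           # href="?a=b&amp;c=d"
    print(unescape(escape(query)))         # ?a=b&c=d
```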


Comment 2 Trond Eivind Glomsrød 2001-06-06 16:15:15 UTC
Could you please try wget-1.7 (available from http://people.redhat.com/teg/wget/
for a limited time, soon from Rawhide) and see if this solves the problem?

If not, please give me a test case.

Comment 3 Need Real Name 2001-06-06 16:36:38 UTC
wget-1.7-1  seems ok with this, it parses &amp;

wget --span-hosts -r http://127.0.0.1/jj.html
--12:42:36--  http://127.0.0.1/jj.html
           => `127.0.0.1/jj.html'
Connecting to 127.0.0.1:80... connected!
HTTP request sent, awaiting response... 200 OK
Length: 111 [text/html]

    0K                                                       100% @ 108.40 KB/s

12:42:36 (108.40 KB/s) - `127.0.0.1/jj.html' saved [111/111]

Loading robots.txt; please ignore errors.
--12:42:36--  http://people.redhat.com/robots.txt
           => `people.redhat.com/robots.txt'
Connecting to people.redhat.com:80... connected!
HTTP request sent, awaiting response... 404 Not Found
12:42:36 ERROR 404: Not Found.

--12:42:36--  http://people.redhat.com/teg/wget/?HH=jj&hw=iww&KK=OO
           => `people.redhat.com/teg/wget/?HH=jj&hw=iww&KK=OO'
Connecting to people.redhat.com:80... connected!
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

    0K                                         
==============================================================================


wget-1.5.3-10 does not parse &amp;

wget --span-hosts -r http://127.0.0.1/jj.html
--12:47:00--  http://127.0.0.1:80/jj.html
           => `127.0.0.1/jj.html'
Connecting to 127.0.0.1:80... connected!
HTTP request sent, awaiting response... 200 OK
Length: 111 [text/html]

    0K ->                                                        [100%]

12:47:00 (2.31 KB/s) - `127.0.0.1/jj.html' saved [111/111]

Loading robots.txt; please ignore errors.
--12:47:01--  http://people.redhat.com:80/robots.txt
           => `people.redhat.com/robots.txt'
Connecting to people.redhat.com:80... connected!
HTTP request sent, awaiting response... 404 Not Found
12:47:01 ERROR 404: Not Found.

--12:47:01--  http://people.redhat.com:80/teg/wget/?HH=jj&amp;hw=iww&amp;KK=OO
           => `people.redhat.com/teg/wget/?HH=jj&amp;hw=iww&amp;KK=OO'
Connecting to people.redhat.com:80... connected!
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

    0K ->


