Bug 39985 - wget does not parse &amp;
Status: CLOSED RAWHIDE
Product: Red Hat Linux
Classification: Retired
Component: wget
Version: 7.0
Hardware: i386 Linux
Priority: medium
Severity: medium
Assigned To: Trond Eivind Glomsrød
Reported: 2001-05-09 22:15 EDT by Need Real Name
Modified: 2008-05-01 11:38 EDT (History)
Doc Type: Bug Fix
Last Closed: 2001-06-06 12:36:42 EDT

Description Need Real Name 2001-05-09 22:15:56 EDT
When wget extracts URLs from an HTML page containing
<A HREF="http://url.com/?param=a&amp;b">text</A>
it does not HTML-decode &amp; and requests http://url.com/?param=a&amp;b verbatim.
This is a bug. The HTML should be parsed first (&amp; replaced with &),
and only then should the URL be extracted.

Netscape, Internet Explorer and other browsers do this correctly;
wget does not.

rpm -q wget
wget-1.5.3-10
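
To illustrate the order argued for above (decode HTML entities first, then
treat the result as the URL), here is a minimal Python sketch. It is not
wget's code: LinkExtractor and extract_links are hypothetical names, and
Python's standard html.parser only stands in for wget's own HTML parser. The
sketch relies on HTMLParser passing attribute values to handle_starttag with
character references already decoded.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect HREF values from <A> tags (hypothetical helper, not wget code)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, and it decodes
        # entities such as &amp; in attribute values before calling this hook.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(page):
    """Return the decoded HREF targets found in an HTML string."""
    parser = LinkExtractor()
    parser.feed(page)
    parser.close()
    return parser.links


# The anchor quoted in the description above.
page = '<A HREF="http://url.com/?param=a&amp;b">text</A>'
links = extract_links(page)
print(links)
```

For the anchor from the description, the extracted link has the query string
param=a&b, which is what a browser would request.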
Comment 1 Alan Cox 2001-05-19 16:16:44 EDT
Well, well, that explains why my wget on www.linux.org.uk was failing. You are
100% right: ?a=b&amp;c=d is the correct HTML for ?a=b&c=d to be sent.
Comment 2 Trond Eivind Glomsrød 2001-06-06 12:15:15 EDT
Could you please try wget-1.7 (available from http://people.redhat.com/teg/wget/
for a limited time, soon from Rawhide) and see if this solves the problem?

If not, please give me a test case.
Comment 3 Need Real Name 2001-06-06 12:36:38 EDT
wget-1.7-1 seems OK with this; it parses &amp;

wget --span-hosts -r http://127.0.0.1/jj.html
--12:42:36--  http://127.0.0.1/jj.html
           => `127.0.0.1/jj.html'
Connecting to 127.0.0.1:80... connected!
HTTP request sent, awaiting response... 200 OK
Length: 111 [text/html]

    0K                                                       100% @ 108.40 KB/s

12:42:36 (108.40 KB/s) - `127.0.0.1/jj.html' saved [111/111]

Loading robots.txt; please ignore errors.
--12:42:36--  http://people.redhat.com/robots.txt
           => `people.redhat.com/robots.txt'
Connecting to people.redhat.com:80... connected!
HTTP request sent, awaiting response... 404 Not Found
12:42:36 ERROR 404: Not Found.

--12:42:36--  http://people.redhat.com/teg/wget/?HH=jj&hw=iww&KK=OO
           => `people.redhat.com/teg/wget/?HH=jj&hw=iww&KK=OO'
Connecting to people.redhat.com:80... connected!
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

    0K                                         
==============================================================================


wget-1.5.3-10 does not parse &amp;

wget --span-hosts -r http://127.0.0.1/jj.html
--12:47:00--  http://127.0.0.1:80/jj.html
           => `127.0.0.1/jj.html'
Connecting to 127.0.0.1:80... connected!
HTTP request sent, awaiting response... 200 OK
Length: 111 [text/html]

    0K ->                                                        [100%]

12:47:00 (2.31 KB/s) - `127.0.0.1/jj.html' saved [111/111]

Loading robots.txt; please ignore errors.
--12:47:01--  http://people.redhat.com:80/robots.txt
           => `people.redhat.com/robots.txt'
Connecting to people.redhat.com:80... connected!
HTTP request sent, awaiting response... 404 Not Found
12:47:01 ERROR 404: Not Found.

--12:47:01--  http://people.redhat.com:80/teg/wget/?HH=jj&amp;hw=iww&amp;KK=OO
           => `people.redhat.com/teg/wget/?HH=jj&amp;hw=iww&amp;KK=OO'
Connecting to people.redhat.com:80... connected!
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

    0K ->
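
As a quick cross-check of the two logs, the sketch below decodes the URL that
wget-1.5.3-10 requested and compares it with the one wget-1.7-1 requested.
Both URLs are copied from the logs above. The normalize helper is a
hypothetical name introduced here, and treating a missing port as 80 is an
assumption that holds only for http URLs. Python's html.unescape stands in for
whatever decoding wget 1.7 performs internally.

```python
from html import unescape
from urllib.parse import urlsplit

# Third request in each log above.
old_request = "http://people.redhat.com:80/teg/wget/?HH=jj&amp;hw=iww&amp;KK=OO"
new_request = "http://people.redhat.com/teg/wget/?HH=jj&hw=iww&KK=OO"


def normalize(url):
    """Split a URL so that an explicit :80 and an omitted port compare equal.

    Assumption: only http URLs are passed in, so the default port is 80.
    """
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port or 80, parts.path, parts.query)


# Decoding the entities in the old request yields the new request.
same_after_decoding = normalize(unescape(old_request)) == normalize(new_request)
# Without decoding, the query strings differ.
same_without_decoding = normalize(old_request) == normalize(new_request)
print(same_after_decoding, same_without_decoding)
```

Apart from the explicit :80, the HTML entity decoding is the only difference
between the two requests.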
