wget, when extracting URLs from HTML pages, e.g. <A HREF="http://url.com/?param=a&amp;b">text</A>, does not HTML-decode the &amp; to produce the URL http://url.com/?param=a&b. This is a bug. The HTML should be parsed first (&amp; replaced with &), then the URL should be extracted. Netscape, Explorer and other browsers do this correctly; wget does not.

rpm -q wget
wget-1.5.3-10
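The order of operations described above (decode HTML entities, then take the URL) can be illustrated with a minimal Python sketch. This is not wget's code; `LinkExtractor`, `extract_links` and `naive_links` are hypothetical names chosen for this illustration. It relies on the standard library's `html.parser`, which unescapes character references in attribute values, and contrasts that with a naive regex that copies the raw attribute text.

```python
import re
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags (hypothetical helper, not wget code)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names and hands over
        # attribute values with entities such as &amp; already decoded.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html_text):
    """Parse the HTML first, then return the decoded link targets."""
    parser = LinkExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.links


def naive_links(html_text):
    """Copy the raw attribute text without decoding, as in the reported bug."""
    return re.findall(r'href="([^"]*)"', html_text, flags=re.IGNORECASE)


page = '<A HREF="http://url.com/?param=a&amp;b">text</A>'
print(naive_links(page))    # ['http://url.com/?param=a&amp;b']
print(extract_links(page))  # ['http://url.com/?param=a&b']
```

The second result is the URL a browser would request; the first is what a client fetches if it skips the decoding step.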
Well, well, that explains why my wget on www.linux.org.uk was failing. You are 100% right: ?a=b&amp;c=d is the right HTML for ?a=b&c=d to be posted.
Could you please try wget-1.7 (available from http://people.redhat.com/teg/wget/ for a limited time, soon from Rawhide) and see if this solves the problem? If not, please give me a test case.
wget-1.7-1 seems OK with this; it parses &amp;:

wget --span-hosts -r http://127.0.0.1/jj.html
--12:42:36--  http://127.0.0.1/jj.html
           => `127.0.0.1/jj.html'
Connecting to 127.0.0.1:80... connected!
HTTP request sent, awaiting response... 200 OK
Length: 111 [text/html]

    0K 100% @ 108.40 KB/s

12:42:36 (108.40 KB/s) - `127.0.0.1/jj.html' saved [111/111]

Loading robots.txt; please ignore errors.
--12:42:36--  http://people.redhat.com/robots.txt
           => `people.redhat.com/robots.txt'
Connecting to people.redhat.com:80... connected!
HTTP request sent, awaiting response... 404 Not Found
12:42:36 ERROR 404: Not Found.

--12:42:36--  http://people.redhat.com/teg/wget/?HH=jj&hw=iww&KK=OO
           => `people.redhat.com/teg/wget/?HH=jj&hw=iww&KK=OO'
Connecting to people.redhat.com:80... connected!
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

    0K

==============================================================================

wget-1.5.3-10 does not parse &amp;:

wget --span-hosts -r http://127.0.0.1/jj.html
--12:47:00--  http://127.0.0.1:80/jj.html
           => `127.0.0.1/jj.html'
Connecting to 127.0.0.1:80... connected!
HTTP request sent, awaiting response... 200 OK
Length: 111 [text/html]

    0K -> [100%]

12:47:00 (2.31 KB/s) - `127.0.0.1/jj.html' saved [111/111]

Loading robots.txt; please ignore errors.
--12:47:01--  http://people.redhat.com:80/robots.txt
           => `people.redhat.com/robots.txt'
Connecting to people.redhat.com:80... connected!
HTTP request sent, awaiting response... 404 Not Found
12:47:01 ERROR 404: Not Found.

--12:47:01--  http://people.redhat.com:80/teg/wget/?HH=jj&hw=iww&KK=OO
           => `people.redhat.com/teg/wget/?HH=jj&hw=iww&KK=OO'
Connecting to people.redhat.com:80... connected!
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

    0K ->