Bug 674817 - wget: unable to resolve host address "http://..."
Summary: wget: unable to resolve host address "http://..."
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: wget
Version: rawhide
Hardware: All
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Karsten Hopp
QA Contact: Fedora Extras Quality Assurance
Reported: 2011-02-03 13:07 UTC by Milos Malik
Modified: 2011-02-04 00:34 UTC
CC List: 2 users

Doc Type: Bug Fix
Last Closed: 2011-02-03 23:41:04 UTC
Type: ---



Description Milos Malik 2011-02-03 13:07:59 UTC
Description of problem:


Version-Release number of selected component (if applicable):
wget-1.12-2.fc12.i686 (Fedora 12)
wget-1.12-1.4.el6.x86_64 (RHEL-6.0)

How reproducible:
always

Steps to Reproduce:
$ wget -c http%3A%2F%2Ftdn.howestreet.com%2Faudio%2Fjohn_rubino_2011_0202.mp3
--2011-02-03 14:03:10--  http://[http://tdn.howestreet.com/audio/john_rubino_2011_0202.mp3]/
Resolving http://tdn.howestreet.com/audio/john_rubino_2011_0202.mp3... failed: Name or service not known.
wget: unable to resolve host address “http://tdn.howestreet.com/audio/john_rubino_2011_0202.mp3”
$ echo $?
4

Actual results:
the file is not downloaded

Expected results:
the file is downloaded

Additional info:
It seems that wget thinks that “http://tdn.howestreet.com/audio/john_rubino_2011_0202.mp3” is a host address.

Comment 1 Micah Cowan 2011-02-03 18:26:53 UTC
This is obviously not a bug. You're asking for a website whose "hostname" begins with "http://". No browser I've ever seen would handle your broken URL any differently from wget. Wget thinks it's a host address because you told it so, by percent-encoding it.

Presumably, you didn't actually want to percent-encode all that. Try it again with

  wget -c http://tdn.howestreet.com/audio/john_rubino_2011_0202.mp3

which is what I assume you actually meant to do.
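The behaviour described in Comment 1 can be modeled with a short C sketch. It assumes a simplified parser, not wget's actual `url.c`: a schemeless argument is split into host and path at the first literal `/`, and `%XX` escapes in the host are decoded only after that split. The helpers `percent_decode` and `split_schemeless` are hypothetical names used for illustration. Because the reporter's argument encodes every `:` and `/`, nothing is left to split on, so the whole decoded string ends up as the host.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: decode %XX escapes in place. An independent
   sketch of the idea, not wget's url_unescape. */
static void percent_decode(char *s)
{
    char *out = s;
    for (const char *in = s; *in; in++, out++) {
        if (in[0] == '%' && isxdigit((unsigned char)in[1])
            && isxdigit((unsigned char)in[2])) {
            char hex[3] = { in[1], in[2], '\0' };
            *out = (char)strtol(hex, NULL, 16);
            in += 2;                     /* skip the two hex digits */
        } else {
            *out = *in;
        }
    }
    *out = '\0';
}

/* Simplified model of a schemeless argument (an assumption, not wget's
   real parser): the host is everything before the first literal '/',
   and %XX escapes in the host are decoded only after that split.
   Writes the decoded host into host and returns the path ("/" if the
   argument contains no literal '/'). */
const char *split_schemeless(const char *arg, char *host, size_t hostsz)
{
    const char *slash = strchr(arg, '/');
    size_t n = slash ? (size_t)(slash - arg) : strlen(arg);
    if (n >= hostsz)
        n = hostsz - 1;                  /* truncate rather than overflow */
    memcpy(host, arg, n);
    host[n] = '\0';
    percent_decode(host);                /* "%3A%2F%2F" becomes "://" only here */
    return slash ? slash : "/";
}
```

With a 256-byte buffer, calling `split_schemeless` on the reporter's argument leaves `http://tdn.howestreet.com/audio/john_rubino_2011_0202.mp3` in `host` and returns `/`, which matches the bracketed host and trailing `/` in the wget output above. A schemeless argument with literal slashes, such as `tdn.howestreet.com/audio/john_rubino_2011_0202.mp3`, yields host `tdn.howestreet.com` and path `/audio/john_rubino_2011_0202.mp3` instead.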

Comment 2 Karsten Hopp 2011-02-03 22:56:12 UTC
Micah: Would it be that bad to run the URL through url_unescape before trying to figure out the hostname and path?
The following works for me:

diff -urN wget-1.12/src/url.c wget-1.12_new/src/url.c
--- wget-1.12/src/url.c 2009-09-22 05:05:53.000000000 +0200
+++ wget-1.12_new/src/url.c     2011-02-03 23:48:51.000000000 +0100
@@ -547,6 +547,10 @@
   if (url_scheme (url) != SCHEME_INVALID)
     return NULL;
 
+  if (strchr (url, '%'))
+    {
+      url_unescape (url);
+    }
   /* Look for a ':' or '/'.  The former signifies NcFTP syntax, the
      latter Netscape.  */
   p = strpbrk (url, ":/");

Comment 3 Micah Cowan 2011-02-03 23:18:38 UTC
The problem with that is it breaks legitimate URLs in order to support illegitimate ones. URLs are allowed to have percent-encoded characters in the hostname portion: this can particularly happen if the URL was translated from an IRI (internationalized resource identifier), without automatically punycoding the hostname.

There are also people out there who set up DNS registrations with bizarre hostnames, which could conceivably include colons and slashes (I wouldn't be surprised if there are hostnames out there that include those characters). Percent encoding would be the only way to reach such a server (even though such hostnames are obviously non-conforming, there are plenty of those).

In other words, it's even conceivable that someone (silly) really would have a hostname that starts with "http://", and percent-encoding it would be the legal way to specify it.
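The objection in Comment 3 can be illustrated with a second sketch, under the same simplifying assumption that the host ends at the first literal `/`. It compares splitting before decoding against decoding the whole argument first, which is roughly what the patch in Comment 2 does for schemeless input. The hostname `odd%2Fname.example` is a hypothetical non-conforming name containing an encoded slash, and all function names are illustrative, not wget's.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: decode %XX escapes in place. An independent
   sketch, not wget's url_unescape. */
static void percent_decode(char *s)
{
    char *out = s;
    for (const char *in = s; *in; in++, out++) {
        if (in[0] == '%' && isxdigit((unsigned char)in[1])
            && isxdigit((unsigned char)in[2])) {
            char hex[3] = { in[1], in[2], '\0' };
            *out = (char)strtol(hex, NULL, 16);
            in += 2;                     /* skip the two hex digits */
        } else {
            *out = *in;
        }
    }
    *out = '\0';
}

/* Copy everything before the first literal '/' of s into host. */
static void host_part(const char *s, char *host, size_t hostsz)
{
    size_t n = strcspn(s, "/");
    if (n >= hostsz)
        n = hostsz - 1;                  /* truncate rather than overflow */
    memcpy(host, s, n);
    host[n] = '\0';
}

/* Split first, then decode: an encoded "%2F" stays inside the host. */
void host_split_then_decode(const char *arg, char *host, size_t hostsz)
{
    host_part(arg, host, hostsz);
    percent_decode(host);
}

/* Decode first, then split (roughly the Comment 2 approach): the decoded
   '/' is now indistinguishable from a real path separator. */
void host_decode_then_split(const char *arg, char *host, size_t hostsz)
{
    char buf[1024];
    size_t n = strlen(arg);
    if (n >= sizeof buf)
        n = sizeof buf - 1;
    memcpy(buf, arg, n);
    buf[n] = '\0';
    percent_decode(buf);
    host_part(buf, host, hostsz);
}
```

With `odd%2Fname.example/index.html`, `host_split_then_decode` yields the host `odd/name.example`, while `host_decode_then_split` yields `odd`: once the escape is decoded up front, the information that the slash belonged to the hostname is lost.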

Comment 4 Karsten Hopp 2011-02-03 23:41:04 UTC
@Micah: Thanks for the explanation. I didn't know that those 'bizarre hostnames' are allowed. 

@Milos: FYI: Micah is the upstream maintainer of wget and monitors our wget bug reports. Thanks a lot for that!

Closing per comment #1

Comment 5 Micah Cowan 2011-02-04 00:34:37 UTC
"Former maintainer", actually. :)

Giuseppe Scrivano is the current maintainer of wget (I'm no longer active in its development). So if there are any lingering doubts, it'd be best to bring this up to him (via the bug-wget mailing list, not personal email, of course); just be sure to link here for context.

