| Summary: | wget: unable to resolve host address “http://...” | ||
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Milos Malik <mmalik> |
| Component: | wget | Assignee: | Karsten Hopp <karsten> |
| Status: | CLOSED NOTABUG | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | rawhide | CC: | karsten, micah |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2011-02-03 23:41:04 UTC | Type: | --- |
Description
Milos Malik
2011-02-03 13:07:59 UTC
This is obviously not a bug. You're asking for a website whose "hostname" begins with "http://". No browser I've ever seen would handle your broken URL any differently from wget. Wget thinks it's a host address because you told it it's a host address (by percent-encoding it). Presumably, you didn't actually want to percent-encode all that. Try it again with

wget -c http://tdn.howestreet.com/audio/john_rubino_2011_0202.mp3

which is what I assume you actually meant to do.

Micah: Would it be that bad to run the URL through url_unescape before trying to figure out the hostname and path?
The following works for me:
diff -urN wget-1.12/src/url.c wget-1.12_new/src/url.c
--- wget-1.12/src/url.c 2009-09-22 05:05:53.000000000 +0200
+++ wget-1.12_new/src/url.c 2011-02-03 23:48:51.000000000 +0100
@@ -547,6 +547,10 @@
   if (url_scheme (url) != SCHEME_INVALID)
     return NULL;
+  if (strchr (url, '%'))
+    {
+      url_unescape (url);
+    }
   /* Look for a ':' or '/'. The former signifies NcFTP syntax, the
      latter Netscape. */
   p = strpbrk (url, ":/");
The problem with that is it breaks legitimate URLs in order to support illegitimate ones. URLs are allowed to have percent-encoded characters in the hostname portion: this can particularly happen if the URL was translated from an IRI (internationalized resource identifier) without automatically punycoding the hostname. There are also people out there who set up DNS registrations with bizarre hostnames, which could conceivably include colons and slashes (I wouldn't be surprised if there are hostnames out there that include those characters). Percent-encoding would be the only way to reach such a server (such hostnames are obviously non-conforming, but there are plenty of those). In other words, it's even conceivable that someone (silly) really would have a hostname that starts with "http://", and this would be the legal way to specify that.

@Micah: Thanks for the explanation. I didn't know that those 'bizarre hostnames' are allowed.

@Milos: FYI: Micah is the upstream maintainer of wget who is monitoring our wget bugzillas. Thanks a lot for that!

Closing per comment #1

"Former maintainer", actually. :) Giuseppe Scrivano is the current maintainer of wget (I'm no longer active in its development). So if there are any lingering doubts, it'd be best to bring this up to him (via the bug-wget mailing list, not personal email, of course); just be sure to link here for context.
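To make the objection to the patch concrete, here is a rough sketch of why decoding before splitting host from path loses information. It is not wget code: percent_decode and host_length are hypothetical helpers written for this illustration (wget's own url_unescape and its URL parser are more involved), and the hostnames are made up.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: numeric value of one hex digit.
   The caller must already have checked isxdigit(). */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower((unsigned char) c) - 'a' + 10;
}

/* Hypothetical stand-in for an unescape routine: decode %XX in place.
   Malformed escapes are copied through unchanged. */
static void percent_decode(char *s)
{
    assert(s != NULL);
    char *out = s;
    while (*s) {
        if (s[0] == '%' && isxdigit((unsigned char) s[1])
            && isxdigit((unsigned char) s[2])) {
            *out++ = (char) (hex_value(s[1]) * 16 + hex_value(s[2]));
            s += 3;
        } else {
            *out++ = *s++;
        }
    }
    *out = '\0';
}

/* Hypothetical helper: length of the host part of a scheme-less
   "host/path" string, i.e. everything before the first literal '/'. */
static size_t host_length(const char *s)
{
    return strcspn(s, "/");
}
```

With the made-up input "weird%2Fhost.example/file", host_length sees the host as "weird%2Fhost.example", because the encoded slash is not a separator. After percent_decode the string becomes "weird/host.example/file" and the same split yields the host "weird", so the request would go to a different server. Splitting first and decoding each component afterwards avoids that ambiguity, which is the point made in the reply above.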