Bug 51539 - -I/foo/bar/../baz not same as -I/foo/baz (all versions of wget)
Summary: -I/foo/bar/../baz not same as -I/foo/baz (all versions of wget)
Alias: None
Product: Red Hat Raw Hide
Classification: Retired
Component: wget
Version: 1.0
Hardware: i386
OS: Linux
Target Milestone: ---
Assignee: Trond Eivind Glomsrød
Reported: 2001-08-12 08:48 UTC by j. alan eldridge
Modified: 2008-05-01 15:38 UTC (History)
Last Closed: 2001-08-12 08:48:53 UTC

Description j. alan eldridge 2001-08-12 08:48:49 UTC
Description of Problem:

Args to -I, -X aren't normalized. It's debatable how far this should go, since URLs 
are not exactly the same as paths in terms of semantics. In particular, whether a 
target URL should be normalized is iffy, at best. However, path args to -I, -X most 
likely correspond directly to real paths in the server space, and should be subject 
to common pathname normalization techniques. By this I mean, elimination of 
/foo/.. sequences, and elimination of "." components. 
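
A minimal sketch of the lexical normalization described above, assuming the argument is an absolute, slash-separated directory. The helper name `normalize_dir` is hypothetical and is not part of wget; it only illustrates collapsing "." and ".." components as pure string manipulation.

```python
def normalize_dir(path: str) -> str:
    """Lexically collapse "." and ".." components of an absolute path.

    Hypothetical helper for illustration; it never touches the filesystem.
    """
    parts = []
    for comp in path.split("/"):
        if comp in ("", "."):
            # Skip empty components (duplicate or trailing slashes) and ".".
            continue
        if comp == "..":
            # Drop the previous component; ".." at the root stays at the root.
            if parts:
                parts.pop()
            continue
        parts.append(comp)
    return "/" + "/".join(parts)


print(normalize_dir("/foo/bar/../baz"))  # /foo/baz
print(normalize_dir("/foo/./baz/"))      # /foo/baz
```

With this, `-I/foo/bar/../baz` and `-I/foo/baz` would reduce to the same string before any comparison takes place.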

Without this, the feature doesn't really work right, since inclusion/exclusion is done 
by string comparison. This affects all versions of wget >= 1.6 in all applicable RH 
releases. I'll work up a patch against 1.7 and send it in sometime in the next couple of days 
(it's not that hard ... I've patched wget enough that I know my way around the code 
pretty well. It's kinda messy, due to complexity, but well written).
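
To illustrate why a string comparison misses the non-normalized form, here is a rough sketch. The function `is_included` is a hypothetical stand-in for a prefix-style directory check and is not wget's actual matching code; `posixpath.normpath` from the Python standard library does the lexical cleanup.

```python
import posixpath
from typing import Sequence


def is_included(url_dir: str, include_dirs: Sequence[str]) -> bool:
    """Hypothetical string-based include check (not wget's implementation)."""
    for d in include_dirs:
        prefix = d.rstrip("/") + "/"
        # Match the directory itself or anything below it, by string only.
        if url_dir == d or url_dir.startswith(prefix):
            return True
    return False


raw = ["/foo/bar/../baz"]                          # as passed to -I
normalized = [posixpath.normpath(d) for d in raw]  # ["/foo/baz"]

print(is_included("/foo/baz/pkg", raw))         # False
print(is_included("/foo/baz/pkg", normalized))  # True
```

The same directory is rejected or accepted depending only on how the argument happened to be spelled.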

Note that this only is an issue when you're mechanically generating args to wget, 
like when doing "form scraping", since I don't think most people are perverse 
enough to deliberately enter in a non-normalized path as an arg to -I or -X, and it's 
definitely something you'd have to consciously make an effort to do.

Comment 1 Trond Eivind Glomsrød 2001-08-13 01:19:23 UTC
I'm not sure I think it should be normalized - if you can convince the wget
authors, it will be, but until then I don't see it as a problem. 

Also, if the level is implemented as a symlink, the two may not correspond:
foo/../bar may be different from bar/
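
The symlink caveat can be reproduced on a local POSIX filesystem. This is only a sketch: the directory names are made up, and it assumes the server maps URL paths directly onto a filesystem, which a real server need not do.

```python
import os
import tempfile


def symlink_demo() -> tuple:
    """Return (lexical, physical) results for foo/../bar when foo is a symlink."""
    with tempfile.TemporaryDirectory() as root:
        root = os.path.realpath(root)
        os.makedirs(os.path.join(root, "real", "deep"))
        os.makedirs(os.path.join(root, "real", "bar"))
        os.makedirs(os.path.join(root, "bar"))
        # foo is a symlink pointing at real/deep.
        os.symlink(os.path.join(root, "real", "deep"), os.path.join(root, "foo"))

        target = os.path.join(root, "foo", "..", "bar")
        lexical = os.path.normpath(target)   # string-only: collapses foo/..
        physical = os.path.realpath(target)  # follows the symlink first
        return os.path.relpath(lexical, root), os.path.relpath(physical, root)


lexical, physical = symlink_demo()
print(lexical)   # bar
print(physical)  # real/bar on POSIX
```

Lexical normalization lands in `bar`, while the filesystem resolves the same path to `real/bar`, so collapsing ".." in an argument can change which directory it names.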

Comment 2 j. alan eldridge 2001-08-13 01:26:20 UTC
OK, that last bit (which was what I was getting at with URLs having different 
semantics) convinces me it's not a good idea.

As far as convincing the wget authors of anything goes, I've never been able to convince 
them to even answer email, or acknowledge a bug report or a patch, let alone agree with 
something. Maybe it's just several isolated occurrences (isn't that a contradiction?), but 
they don't seem to acknowledge any contact from the outside world at all. 

Comment 3 Trond Eivind Glomsrød 2001-08-13 01:30:06 UTC
Last time I sent in a (trivial) patch, I got a response after a couple of months,
so they are acking, just with a high latency. You might have better luck on the
wget mailing list, if anything like that exists.
