Red Hat Bugzilla – Bug 51539
-I/foo/bar/../baz not same as -I/foo/baz (all versions of wget)
Last modified: 2008-05-01 11:38:00 EDT
Description of Problem:
Args to -I, -X aren't normalized. It's debatable how far this should go, since URLs
are not exactly the same as paths in terms of semantics. In particular, whether a
target URL should be normalized is iffy, at best. However, path args to -I, -X most
likely correspond directly to real paths in the server space, and should be subject
to common pathname normalization techniques. By this I mean, elimination of
/foo/.. sequences, and elimination of "." components.
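To make the proposal concrete, here is a minimal sketch of that kind of purely lexical normalization. It is written in Python for brevity and is not wget's code or the patch mentioned in this report. The helper name `normalize_dir_arg` is hypothetical, and it assumes the directory arguments are plain POSIX-style paths.

```python
import posixpath


def normalize_dir_arg(path):
    """Collapse "." components and "foo/.." sequences in a directory argument.

    Hypothetical helper for illustration only. posixpath.normpath works on
    the string alone and never consults a filesystem or a server.
    """
    return posixpath.normpath(path)


raw = "/foo/bar/../baz"   # the form from the bug title
wanted = "/foo/baz"

# A plain string comparison treats the two spellings as different directories.
print(raw == wanted)
# After lexical normalization they compare equal.
print(normalize_dir_arg(raw) == wanted)
```

Because the helper works on the string alone, it cannot know how the server actually resolves each path level.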
Without this, the feature doesn't really work right, since inclusion/exclusion is done
by string comparison. This affects all versions of wget >= 1.6 in all applicable RH
I'll work up a patch against 1.7 and send it in sometime in the next couple of days
(it's not that hard ... I've patched wget enough I know my way around the code
pretty well. It's kinda messy, due to complexity, but well written.)
Note that this is only an issue when you're mechanically generating args to wget,
like when doing "form scraping", since I don't think most people are perverse
enough to deliberately enter in a non-normalized path as an arg to -I or -X, and it's
definitely something you'd have to consciously make an effort to do.
I'm not sure I think it should be normalized - if you can convince the wget
authors, it will be, but until then I don't see it as a problem.
Also, if a level is implemented as a symlink, the two may not correspond: foo/../bar
may be different from bar/
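The symlink objection can be reproduced on a local filesystem. The sketch below is an illustration under assumptions, not a statement about how any particular web server maps URLs to files: the directory layout and the helper name `lexical_vs_real` are invented for the demo, and it needs a platform that permits creating symlinks.

```python
import os
import posixpath
import tempfile


def lexical_vs_real(root):
    """Build a tiny tree where foo is a symlink and compare two resolutions.

    Hypothetical layout (all names invented for this demo):
      root/real/deep/        real directory
      root/real/bar/         real directory
      root/bar/              real directory
      root/foo -> real/deep  symlink
    """
    os.makedirs(os.path.join(root, "real", "deep"))
    os.makedirs(os.path.join(root, "real", "bar"))
    os.makedirs(os.path.join(root, "bar"))
    os.symlink(os.path.join(root, "real", "deep"), os.path.join(root, "foo"))

    path = os.path.join(root, "foo", "..", "bar")
    lexical = posixpath.normpath(path)  # string-only: drops "foo/.."
    actual = os.path.realpath(path)     # follows the symlink before applying ".."
    return lexical, actual


with tempfile.TemporaryDirectory() as tmp:
    # Resolve the temp dir itself first so only our own symlink matters.
    base = os.path.realpath(tmp)
    lexical, actual = lexical_vs_real(base)
    print(os.path.relpath(lexical, base))  # path chosen by lexical normalization
    print(os.path.relpath(actual, base))   # path the filesystem actually reaches
```

The lexical result names `bar` directly under the root, while following the symlink lands in `real/bar`, so collapsing `foo/..` would silently change which directory is meant.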
OK, that last bit (which was what I was getting at with URLs having different
semantics) convinces me it's not a good idea.
As for convincing the wget authors of anything, I've never been able to convince them to
even answer email, or acknowledge a bug report or a patch, let alone agree with
something. Maybe it's just several isolated occurrences (isn't that a contradiction?), but
they don't seem to acknowledge any contact from the outside world at all.
Last time I sent in a (trivial) patch, I got a response after a couple of months
so they are acking, just with a high latency. You might have better luck on the
wget mailing list, if anything like that exists.