Hide Forgot
Description of problem: _join_url does not work as intended. rel_url is put after a http get variable. For example: When trying to join url parts ( https://myMirror/update?TOKEN', repodata/repomd.xml ), we receive 'https://myMirror/update/?TOKEN/repodata/repomd.xml' instead of 'https://myMirror/update/repodata/repomd.xml?TOKEN' Version-Release number of selected component (if applicable): 3.10-7 How reproducible: Always reproducible Steps to Reproduce: 1. Parse urls using the _join_url method from python-urlgrabber/mirror.py _join_url('https://myMirror/update?TOKEN', 'repodata/repomd.xml') Actual results: 'https://myMirror/update/?TOKEN/repodata/repomd.xml' Expected results: 'https://myMirror/update/repodata/repomd.xml?TOKEN' Additional info: Following patch provides a possible solution. diff --unified -u -r urlgrabber-3.9.1.orig/urlgrabber/mirror.py urlgrabber-3.9.1/urlgrabber/mirror.py --- urlgrabber-3.9.1.orig/urlgrabber/mirror.py 2014-09-16 14:44:54.582048746 +0200 +++ urlgrabber-3.9.1/urlgrabber/mirror.py 2014-09-16 14:49:24.138034099 +0200 @@ -88,6 +88,7 @@ import random +import urlparse import thread # needed for locking to make this threadsafe from grabber import URLGrabError, CallbackObject, DEBUG @@ -366,11 +367,12 @@ # by overriding the configuration methods :) def _join_url(self, base_url, rel_url): - if base_url.endswith('/') or rel_url.startswith('/'): - return base_url + rel_url + (scheme, netloc, path, query, fragid) = urlparse.urlsplit(base_url) + if path.endswith('/') or rel_url.startswith('/'): + return urlparse.urlunsplit((scheme, netloc, path + rel_url, query, fragid)) else: - return base_url + '/' + rel_url - + return urlparse.urlunsplit((scheme, netloc, path + '/' + rel_url, query, fragid)) + def _mirror_try(self, func, url, kw): gr = GrabRequest() gr.func = func
Hello Philipp, (In reply to Philipp Joos from comment #0) > _join_url does not work as intended. Could you please elaborate on this ^^^? Also, what's your motivation for having _join_url() behave this way? Thanks!
Hello Michal, > Could you please elaborate on this ^^^? > Also, what's your motivation for having _join_url() behave this way? As stated in the bug report, the position of the GET variables when joined using '_join_url()' is wrong. This results in wrong HTTP requests etc.
(In reply to Philipp Joos from comment #3) > Hello Michal, > > > Could you please elaborate on this ^^^? > > Also, what's your motivation for having _join_url() behave this way? > > As stated in the bug report, the position of the GET variables when joined > using '_join_url()' is wrong. > > This results in wrong HTTP requests etc. Hey Philipp, Thanks for the response. My concern is that the behavior of _join_url() you're requesting doesn't really seem to be justified by anything other than your specific use case (for which I can imagine it makes a lot of sense). Note that _join_url() doesn't follow RFC 2396 for constructing absolute URIs from a base and relative URI. However, even if it did (much like urlparse.urljoin()), the correct result of joining "https://myMirror/update?TOKEN" and "repodata/repomd.xml" would rather be: "https://myMirror/repodata/repomd.xml". Please let me know if I understand you correctly.
In other words, it sounds like an RFE to me, rather than a bug. Is that correct?
Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available. The official life cycle policy can be reviewed here: http://redhat.com/rhel/lifecycle This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Note that a strong business justification will be required for re-evaluation. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL: https://access.redhat.com/