Bug 1318715

Summary: Urlgrabber.mirror._join_url() can not handle http get requests in base_url
Product: Red Hat Enterprise Linux 6 Reporter: Philipp Joos <joos>
Component: python-urlgrabberAssignee: Valentina Mukhamedzhanova <vmukhame>
Status: CLOSED WONTFIX QA Contact: BaseOS QE - Apps <qe-baseos-apps>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 6.9CC: bnater, hlawatschek, joos, mdomonko, packaging-team-maint
Target Milestone: rcFlags: mdomonko: needinfo? (joos)
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-12-06 11:02:24 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Philipp Joos 2016-03-17 15:05:48 UTC
Description of problem:

_join_url does not work as intended. rel_url is put after a  http get variable. 

For example:
When trying to join url parts ( https://myMirror/update?TOKEN', repodata/repomd.xml ), we receive 

'https://myMirror/update/?TOKEN/repodata/repomd.xml'

instead of 

'https://myMirror/update/repodata/repomd.xml?TOKEN'

Version-Release number of selected component (if applicable):
3.10-7

How reproducible:
Always reproducible


Steps to Reproduce:
1. Parse urls using the _join_url method from python-urlgrabber/mirror.py
_join_url('https://myMirror/update?TOKEN', 'repodata/repomd.xml')


Actual results:
'https://myMirror/update/?TOKEN/repodata/repomd.xml'

Expected results:
'https://myMirror/update/repodata/repomd.xml?TOKEN'

Additional info:

Following patch provides a possible solution.

diff --unified -u -r urlgrabber-3.9.1.orig/urlgrabber/mirror.py urlgrabber-3.9.1/urlgrabber/mirror.py
--- urlgrabber-3.9.1.orig/urlgrabber/mirror.py	2014-09-16 14:44:54.582048746 +0200
+++ urlgrabber-3.9.1/urlgrabber/mirror.py	2014-09-16 14:49:24.138034099 +0200
@@ -88,6 +88,7 @@
 
 
 import random
+import urlparse
 import thread  # needed for locking to make this threadsafe
 
 from grabber import URLGrabError, CallbackObject, DEBUG
@@ -366,11 +367,12 @@
     # by overriding the configuration methods :)
 
     def _join_url(self, base_url, rel_url):
-        if base_url.endswith('/') or rel_url.startswith('/'):
-            return base_url + rel_url
+        (scheme, netloc, path, query, fragid) = urlparse.urlsplit(base_url)
+        if path.endswith('/') or rel_url.startswith('/'):
+            return urlparse.urlunsplit((scheme, netloc, path + rel_url, query, fragid))
         else:
-            return base_url + '/' + rel_url
-        
+            return urlparse.urlunsplit((scheme, netloc, path + '/' + rel_url, query, fragid))
+
     def _mirror_try(self, func, url, kw):
         gr = GrabRequest()
         gr.func = func

Comment 2 Michal Domonkos 2016-08-16 13:22:27 UTC
Hello Philipp,

(In reply to Philipp Joos from comment #0)
> _join_url does not work as intended.

Could you please elaborate on this ^^^?

Also, what's your motivation for having _join_url() behave this way?

Thanks!

Comment 3 Philipp Joos 2016-09-05 11:24:22 UTC
Hello Michal,

> Could you please elaborate on this ^^^?
> Also, what's your motivation for having _join_url() behave this way?

As stated in the bug report, the position of the GET variables when joined using '_join_url()' is wrong.

This results in wrong HTTP requests etc.

Comment 4 Michal Domonkos 2016-09-05 12:50:36 UTC
(In reply to Philipp Joos from comment #3)
> Hello Michal,
> 
> > Could you please elaborate on this ^^^?
> > Also, what's your motivation for having _join_url() behave this way?
> 
> As stated in the bug report, the position of the GET variables when joined
> using '_join_url()' is wrong.
> 
> This results in wrong HTTP requests etc.

Hey Philipp,

Thanks for the response.

My concern is that the behavior of _join_url() you're requesting doesn't really seem to be justified by anything other than your specific use case (for which I can imagine it makes a lot of sense).

Note that _join_url() doesn't follow RFC 2396 for constructing absolute URIs from a base and relative URI.  However, even if it did (much like urlparse.urljoin()), the correct result of joining
  "https://myMirror/update?TOKEN" and "repodata/repomd.xml"
would rather be:
  "https://myMirror/repodata/repomd.xml".

Please let me know if I understand you correctly.

Comment 5 Michal Domonkos 2016-09-05 12:53:07 UTC
In other words, it sounds like an RFE to me, rather than a bug.  Is that correct?

Comment 6 Jan Kurik 2017-12-06 11:02:24 UTC
Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available.

The official life cycle policy can be reviewed here:

http://redhat.com/rhel/lifecycle

This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Note that a strong business justification will be required for re-evaluation. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL:

https://access.redhat.com/