Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1318715

Summary: Urlgrabber.mirror._join_url() can not handle http get requests in base_url
Product: Red Hat Enterprise Linux 6 Reporter: Philipp Joos <joos>
Component: python-urlgrabberAssignee: Valentina Mukhamedzhanova <vmukhame>
Status: CLOSED WONTFIX QA Contact: BaseOS QE - Apps <qe-baseos-apps>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 6.9CC: bnater, hlawatschek, joos, mdomonko, packaging-team-maint
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-12-06 11:02:24 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Philipp Joos 2016-03-17 15:05:48 UTC
Description of problem:

_join_url does not work as intended. rel_url is put after a  http get variable. 

For example:
When trying to join url parts ( https://myMirror/update?TOKEN', repodata/repomd.xml ), we receive 

'https://myMirror/update/?TOKEN/repodata/repomd.xml'

instead of 

'https://myMirror/update/repodata/repomd.xml?TOKEN'

Version-Release number of selected component (if applicable):
3.10-7

How reproducible:
Always reproducible


Steps to Reproduce:
1. Parse urls using the _join_url method from python-urlgrabber/mirror.py
_join_url('https://myMirror/update?TOKEN', 'repodata/repomd.xml')


Actual results:
'https://myMirror/update/?TOKEN/repodata/repomd.xml'

Expected results:
'https://myMirror/update/repodata/repomd.xml?TOKEN'

Additional info:

Following patch provides a possible solution.

diff --unified -u -r urlgrabber-3.9.1.orig/urlgrabber/mirror.py urlgrabber-3.9.1/urlgrabber/mirror.py
--- urlgrabber-3.9.1.orig/urlgrabber/mirror.py	2014-09-16 14:44:54.582048746 +0200
+++ urlgrabber-3.9.1/urlgrabber/mirror.py	2014-09-16 14:49:24.138034099 +0200
@@ -88,6 +88,7 @@
 
 
 import random
+import urlparse
 import thread  # needed for locking to make this threadsafe
 
 from grabber import URLGrabError, CallbackObject, DEBUG
@@ -366,11 +367,12 @@
     # by overriding the configuration methods :)
 
     def _join_url(self, base_url, rel_url):
-        if base_url.endswith('/') or rel_url.startswith('/'):
-            return base_url + rel_url
+        (scheme, netloc, path, query, fragid) = urlparse.urlsplit(base_url)
+        if path.endswith('/') or rel_url.startswith('/'):
+            return urlparse.urlunsplit((scheme, netloc, path + rel_url, query, fragid))
         else:
-            return base_url + '/' + rel_url
-        
+            return urlparse.urlunsplit((scheme, netloc, path + '/' + rel_url, query, fragid))
+
     def _mirror_try(self, func, url, kw):
         gr = GrabRequest()
         gr.func = func

Comment 2 Michal Domonkos 2016-08-16 13:22:27 UTC
Hello Philipp,

(In reply to Philipp Joos from comment #0)
> _join_url does not work as intended.

Could you please elaborate on this ^^^?

Also, what's your motivation for having _join_url() behave this way?

Thanks!

Comment 3 Philipp Joos 2016-09-05 11:24:22 UTC
Hello Michal,

> Could you please elaborate on this ^^^?
> Also, what's your motivation for having _join_url() behave this way?

As stated in the bug report, the position of the GET variables when joined using '_join_url()' is wrong.

This results in wrong HTTP requests etc.

Comment 4 Michal Domonkos 2016-09-05 12:50:36 UTC
(In reply to Philipp Joos from comment #3)
> Hello Michal,
> 
> > Could you please elaborate on this ^^^?
> > Also, what's your motivation for having _join_url() behave this way?
> 
> As stated in the bug report, the position of the GET variables when joined
> using '_join_url()' is wrong.
> 
> This results in wrong HTTP requests etc.

Hey Philipp,

Thanks for the response.

My concern is that the behavior of _join_url() you're requesting doesn't really seem to be justified by anything other than your specific use case (for which I can imagine it makes a lot of sense).

Note that _join_url() doesn't follow RFC 2396 for constructing absolute URIs from a base and relative URI.  However, even if it did (much like urlparse.urljoin()), the correct result of joining
  "https://myMirror/update?TOKEN" and "repodata/repomd.xml"
would rather be:
  "https://myMirror/repodata/repomd.xml".

Please let me know if I understand you correctly.

Comment 5 Michal Domonkos 2016-09-05 12:53:07 UTC
In other words, it sounds like an RFE to me, rather than a bug.  Is that correct?

Comment 6 Jan Kurik 2017-12-06 11:02:24 UTC
Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available.

The official life cycle policy can be reviewed here:

http://redhat.com/rhel/lifecycle

This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Note that a strong business justification will be required for re-evaluation. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL:

https://access.redhat.com/

Comment 7 Red Hat Bugzilla 2023-09-14 03:19:41 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days