Description of problem: In the urllib.parse (or urlparse on Python 2.X) module there is this function: 157 def urlsplit(url, scheme='', allow_fragments=True): 158 """Parse a URL into 5 components: 159 <scheme>://<netloc>/<path>?<query>#<fragment> 160 Return a 5-tuple: (scheme, netloc, path, query, fragment). 161 Note that we don't break the components up in smaller bits 162 (e.g. netloc is a single string) and we don't expand % escapes.""" 163 allow_fragments = bool(allow_fragments) 164 key = url, scheme, allow_fragments, type(url), type(scheme) 165 cached = _parse_cache.get(key, None) 166 if cached: 167 return cached 168 if len(_parse_cache) >= MAX_CACHE_SIZE: # avoid runaway growth 169 clear_cache() 170 netloc = query = fragment = '' 171 i = url.find(':') 172 if i > 0: 173 if url[:i] == 'http': # optimize the common case 174 scheme = url[:i].lower() 175 url = url[i+1:] 176 if url[:2] == '//': 177 netloc, url = _splitnetloc(url, 2) 178 if allow_fragments and '#' in url: 179 url, fragment = url.split('#', 1) 180 if '?' in url: 181 url, query = url.split('?', 1) 182 v = SplitResult(scheme, netloc, url, query, fragment) 183 _parse_cache[key] = v 184 return v 185 for c in url[:i]: 186 if c not in scheme_chars: 187 break 188 else: 189 scheme, url = url[:i].lower(), url[i+1:] 190 191 if url[:2] == '//': 192 netloc, url = _splitnetloc(url, 2) 193 if allow_fragments and '#' in url: 194 url, fragment = url.split('#', 1) 195 if '?' in url: 196 url, query = url.split('?', 1) 197 v = SplitResult(scheme, netloc, url, query, fragment) 198 _parse_cache[key] = v 199 return v There is an issue here (or a few of them) as follows: * if url[:1] is already lowercase (equals "http") (line 173) then .lower() on line 174 is reduntant: 174 scheme = url[:i].lower() # <--- no need for .lower() b/c value is "http" * OTOH line 173 could refactor the condition and match URLs where the scheme is uppercase. For example 173 if url[:i].lower() == 'http': # optimize the common case * The code as is returns the same results (as far as I've tested it) for both: urlsplit("http://github.com/atodorov/repo.git?param=value#myfragment") urlsplit("HTTP://github.com/atodorov/repo.git?param=value#myfragment") urlsplit("HTtP://github.com/atodorov/repo.git?param=value#myfragment") but the last 2 invocations also go through lines 185-199 * Lines 174-184 are essentially the same as lines 189-199. The only optimization I can see is avoiding the for loop around lines 185-187 which checks for valid characters in the URL scheme and executes only a few loops b/c scheme names are quite short usually. My personal vote goes for removal of lines 173-184. Version-Release number of selected component (if applicable): This is present in both Python 3 and Python 2 on all versions I have access to: python3-libs-3.4.1-16.fc21.x86_64.rpm python-libs-2.7.8-5.fc21.x86_64.rpm python-libs-2.7.5-16.el7.x86_64.rpm python-libs-2.6.6-52.el6.x86_64 Lets resolve in Rawhide first and then I'll clone it for other versions/products.
Just to make sure I understand this correctly, this is not about the function being broken. It's about refactoring/removing duplicated code/improving performance, is that right? If so, would you consider opening an upstream bug? I think you pretty much outlined the solution here and I wouldn't want to get credit for your solution (although I can do that if you insist). I suggest you create upstream patch and link it here. Once it's accepted, I'll be happy to backport it to our downstream repos.
This bug appears to have been reported against 'rawhide' during the Fedora 22 development cycle. Changing version to '22'. More information and reason for this action is here: https://fedoraproject.org/wiki/Fedora_Program_Management/HouseKeeping/Fedora22
This package has changed ownership in the Fedora Package Database. Reassigning to the new owner of this component.
Fedora 22 changed to end-of-life (EOL) status on 2016-07-19. Fedora 22 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.