Bug 261961

Summary: Yum does not like non-ascii package names
Product: [Fedora] Fedora Reporter: Nicolas Mailhot <nicolas.mailhot>
Component: yum Assignee: Jeremy Katz <katzj>
Status: CLOSED CURRENTRELEASE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: low Docs Contact:
Priority: medium    
Version: rawhide CC: a.badger, bruno, herrold, james.antill, pmatilai, tim.lauridsen
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: 3.2.16-2.fc9 Doc Type: Bug Fix
Last Closed: 2008-05-29 02:37:15 UTC Type: ---
Bug Blocks: 235704    
Attachments (description, flags):
iri to uri function from Django (flags: none)
Sample yum output plus rpm -qa output (flags: none)
encode to utf-8 just before calling urlgrabber (flags: none)
only convert unicode objects to utf8 (flags: none)

Description Nicolas Mailhot 2007-08-28 22:50:50 UTC
Testing with a package whose name includes an accented character:
(http://nim.fedorapeople.org/%c3%a9colier-fonts-1.00-0.1.20070628.fc8.src.rpm)
1. rpmbuild, build-in-mock & rpmlint are happy
2. createrepo seems fine too
3. yum search works if you don't include the accented character in the search string:

yum search colier
Loading "skip-broken" plugin
Excluding Packages in global exclude list
Finished



écolier-fonts.noarch                     1.00-0.1.20070628.fc8  local           
Matched from:
écolier-fonts
Écolier court fonts
Écolier are a set of latin fonts created by Jean-Marie Douteau to mimick the
traditionnal cursive writing French children are taught in school.

He kindly released two of them under the OFL, which are redistributed in this
package.
http://perso.orange.fr/jm.douteau/page_ecolier.htm

4. If you do include it, yum search dies with:

yum search écolier
Loading "skip-broken" plugin
Excluding Packages in global exclude list
Finished
Cleaning up Everything
Loading "skip-broken" plugin         
Excluding Packages in global exclude list
Finished
Traceback (most recent call last):
  File "/usr/bin/yum", line 29, in <module>
    yummain.main(sys.argv[1:])
  File "/usr/share/yum-cli/yummain.py", line 102, in main
    result, resultmsgs = base.doCommands()
  File "/usr/share/yum-cli/cli.py", line 272, in doCommands
    return self.yum_cli_commands[self.basecmd].doCommand(self, self.basecmd,
self.extcmds)
  File "/usr/share/yum-cli/yumcommands.py", line 343, in doCommand
    return base.search(extcmds)
  File "/usr/share/yum-cli/cli.py", line 829, in search
    for (po, matched_value) in matching:
  File "/usr/lib/python2.5/site-packages/yum/__init__.py", line 1240, in
searchGenerator
    if value and value.lower().find(s.lower()) != -1:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal
not in range(128)

5. yum install initially works, then dies too:

 yum install "écolier-fonts"
Loading "skip-broken" plugin
Excluding Packages in global exclude list
Finished
Setting up Install Process
Parsing package install arguments
Resolving Dependencies
--> Running transaction check
---> Package écolier-fonts.noarch 0:1.00-0.1.20070628.fc8 set to be updated
--> Finished Dependency Resolution

Dependencies Resolved

=============================================================================
 Package                 Arch       Version          Repository        Size 
=============================================================================
Installing:
 écolier-fonts           noarch     1.00-0.1.20070628.fc8  local              66 k

Transaction Summary
=============================================================================
Install      1 Package(s)         
Update       0 Package(s)         
Remove       0 Package(s)         

Total download size: 66 k
Is this ok [y/N]: y
Downloading Packages:
/usr/lib64/python2.5/urllib.py:1205: UnicodeWarning: Unicode equal comparison
failed to convert both arguments to Unicode - interpreting them as being unequal
  res = map(safe_map.__getitem__, s)
Traceback (most recent call last):
  File "/usr/bin/yum", line 29, in <module>
    yummain.main(sys.argv[1:])
  File "/usr/share/yum-cli/yummain.py", line 180, in main
    base.doTransaction()
  File "/usr/share/yum-cli/cli.py", line 310, in doTransaction
    problems = self.downloadPkgs(downloadpkgs) 
  File "/usr/lib/python2.5/site-packages/yum/__init__.py", line 833, in downloadPkgs
    cache=po.repo.http_caching != 'none',
  File "/usr/lib/python2.5/site-packages/yum/yumRepo.py", line 605, in getPackage
    cache=cache
  File "/usr/lib/python2.5/site-packages/yum/yumRepo.py", line 583, in _getFile
    http_headers=headers,
  File "/usr/lib/python2.5/site-packages/urlgrabber/mirror.py", line 411, in urlgrab
    return self._mirror_try(func, url, kw)
  File "/usr/lib/python2.5/site-packages/urlgrabber/mirror.py", line 397, in
_mirror_try
    return func_ref( *(fullurl,), **kwargs )
  File "/usr/lib/python2.5/site-packages/urlgrabber/grabber.py", line 893, in
urlgrab
    (url,parts) = opts.urlparser.parse(url, opts) 
  File "/usr/lib/python2.5/site-packages/urlgrabber/grabber.py", line 671, in parse
    parts = self.quote(parts)
  File "/usr/lib/python2.5/site-packages/urlgrabber/grabber.py", line 707, in quote
    path = urllib.quote(path)
  File "/usr/lib64/python2.5/urllib.py", line 1205, in quote
    res = map(safe_map.__getitem__, s)
KeyError: u'\xe9'

Comment 1 Toshio Ernie Kuratomi 2007-08-28 23:51:55 UTC
Created attachment 177681 [details]
iri to uri function from Django

Licensed under three clause, new style BSD.

Comment 2 Seth Vidal 2007-08-29 05:11:23 UTC
Toshio,
 Is that license GPL-compatible?

I don't want to look at the attachment unless it is.



Comment 3 Toshio Ernie Kuratomi 2007-08-29 05:36:14 UTC
Understood.  It is GPL 2 & 3 compatible according to:
http://fedoraproject.org/wiki/Licensing

BSD License (no advertising) aka 3 Clause BSD.  

Although reading the license again, the whole license text probably needs to be
included at the top of the file if you use it:

http://code.djangoproject.com/browser/django/trunk/LICENSE


Comment 4 Bruno Wolff III 2008-03-25 15:00:06 UTC
Created attachment 299034 [details]
Sample yum output plus rpm -qa output

This seemed to be the closest of three bugs involving unicode and yum.
yum search is consistently failing on my x86_64 machine, which has the kitchen
sink installed. It doesn't happen on either of two other machines on which I am
running up-to-date rawhide; both of those are x86.

Comment 5 Nicolas Mailhot 2008-03-25 15:36:17 UTC
To my knowledge there are no packages with unicode names in Fedora right now. So
this is probably the wrong bug. However, there are multiple packages with unicode
descriptions or changelogs, which may trigger the bug you hit. Please open a
separate ticket.

Comment 6 Bruno Wolff III 2008-03-25 16:26:16 UTC
OK, I'll do that.

Comment 7 Tim Lauridsen 2008-03-25 16:35:27 UTC
Bruno, please check
https://bugzilla.redhat.com/show_bug.cgi?id=438633
Your issue is fixed upstream.


Comment 8 Toshio Ernie Kuratomi 2008-04-23 22:32:36 UTC
I just took another look at this and the problem is not as bad as I thought at
first.  urlgrabber handles non-ASCII filenames fine; it's internationalized
domain names where it needs help.
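
For illustration only, here is a minimal sketch of what converting an internationalized host name to its ASCII form can look like. It is written for Python 3 and uses the standard library's "idna" codec; it is not code from yum or urlgrabber. The helper name host_to_ascii and the host bücher.example are made up for the example, and the sketch ignores any user:password part of the URL.

```python
# Hypothetical helper, not part of yum or urlgrabber.
from urllib.parse import urlsplit, urlunsplit


def host_to_ascii(url):
    """Return url with its host name in ASCII-compatible (punycode) form."""
    parts = urlsplit(url)
    # The built-in "idna" codec converts each non-ASCII label to xn--... form.
    ascii_host = parts.hostname.encode("idna").decode("ascii")
    # Re-attach an explicit port if there was one (userinfo is ignored here).
    if parts.port is None:
        netloc = ascii_host
    else:
        netloc = "%s:%d" % (ascii_host, parts.port)
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(host_to_ascii("http://bücher.example/repo/"))
```

Only the host name is touched; the path still has to be handled separately, as byte strings in whatever encoding the server uses.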

The solution is to realize that urlgrabber doesn't know how to deal with
unicode, which is understandable because it's providing you an interface to
something with no explicit encoding.  So it's dealing with things at the
byte-encoded level, not the abstract unicode level.

Assuming that all filenames will be in utf-8 on the filesystem in question, all
you need to do is convert to utf-8 before passing the url to urlgrabber::

url = repourl + packagename
type(url)
<type 'unicode'>
urlgrabber.urlgrab(url.encode('utf-8')

What do you do if the remote server does not encode its filesystem filenames in
utf-8?  You fail.  Unless you can query the server to find out what encoding the
filenames are using, there's no way to make this translation.
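
To make the encoding dependence concrete, here is a small sketch in Python 3 (the thread itself concerns Python 2.5, so this is an analogy, not yum's code). It percent-encodes the package file name from the description with urllib.parse.quote after encoding it two different ways. The latin-1 case is an assumed example of a server that does not use utf-8.

```python
from urllib.parse import quote

name = "écolier-fonts-1.00-0.1.20070628.fc8.src.rpm"

# Encode to bytes first, then percent-encode the bytes.
utf8_path = quote(name.encode("utf-8"))
latin1_path = quote(name.encode("latin-1"))  # assumed non-utf-8 server

print(utf8_path)    # starts with %C3%A9, the same bytes as in the URL in the description
print(latin1_path)  # starts with %E9, a different request path for the same name
```

The two results name different files as far as the server is concerned, which is why the client has to know, or assume, the server's filename encoding.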

Comment 9 Toshio Ernie Kuratomi 2008-04-23 22:33:46 UTC
- urlgrabber.urlgrab(url.encode('utf-8')
+ urlgrabber.urlgrab(url.encode('utf-8'))

Comment 10 Toshio Ernie Kuratomi 2008-05-03 01:12:40 UTC
Created attachment 304434 [details]
encode to utf-8 just before calling urlgrabber

Here's a patch against git HEAD to encode the url to utf-8 just before calling
urlgrabber.  Tested on a file:// and http:// repo fine.

Comment 11 Toshio Ernie Kuratomi 2008-05-03 01:15:31 UTC
Also looked at the search problem in step 4 of the description, and that seems
to have been fixed in git HEAD by decoding the args and the database results to
unicode strings before using them.
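
A rough sketch of that idea, in Python 3 terms where bytes and str play the roles of Python 2's str and unicode. This is not the code from git HEAD; the helper names to_text and matches are invented for the illustration, and utf-8 is assumed as the encoding of any byte strings.

```python
def to_text(value, encoding="utf-8"):
    """Decode byte strings to text; pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding, "replace")
    return value


def matches(search_term, field):
    """Case-insensitive substring match after normalizing both sides to text."""
    return to_text(search_term).lower() in to_text(field).lower()


# A command-line argument arriving as utf-8 bytes, compared with a text field.
print(matches("écolier".encode("utf-8"), "Écolier court fonts"))
```

Normalizing both operands before comparing avoids the implicit ASCII decode that produced the UnicodeDecodeError in the traceback.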

Comment 12 Toshio Ernie Kuratomi 2008-05-03 04:37:49 UTC
Created attachment 304439 [details]
only convert unicode objects to utf8

So apparently, we sometimes hand off unicode objects to urlgrabber and
sometimes we hand off YumRepository objects.

This new patch creates a to_utf8() method like to_unicode() that only converts
if the object passed to it is unicode.  Using it in _get_file() does the right
thing whether we are handing in a unicode object or a YumRepository object.
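
As a sketch of the behaviour described, not the attached patch itself: the function below is a Python 3 approximation in which str stands in for Python 2's unicode type. FakeRepo is a hypothetical stand-in for a repository object.

```python
def to_utf8(obj):
    """Encode text to utf-8 bytes; return anything else untouched."""
    if isinstance(obj, str):
        return obj.encode("utf-8")
    return obj


class FakeRepo:
    """Hypothetical placeholder for a non-string object such as a repository."""


repo = FakeRepo()
print(to_utf8("écolier-fonts"))   # text is encoded to bytes
print(to_utf8(repo) is repo)      # other objects pass through as-is
```

Checking the type first means the caller does not need to know which kind of object it is holding.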

Comment 13 James Antill 2008-05-03 22:37:59 UTC
 Ok, that last patch looks fine. Applying.


Comment 14 Fedora Update System 2008-05-16 19:05:50 UTC
yum-3.2.16-1.fc9 has been submitted as an update for Fedora 9

Comment 15 Fedora Update System 2008-05-18 15:16:57 UTC
yum-3.2.16-2.fc9 has been submitted as an update for Fedora 9

Comment 16 Fedora Update System 2008-05-29 02:37:02 UTC
yum-3.2.16-2.fc9 has been pushed to the Fedora 9 stable repository.  If problems still persist, please make note of it in this bug report.