Bug 261961 - Yum does not like non-ascii package names
Summary: Yum does not like non-ascii package names
Alias: None
Product: Fedora
Classification: Fedora
Component: yum
Version: rawhide
Hardware: All
OS: All
Target Milestone: ---
Assignee: Jeremy Katz
QA Contact: Fedora Extras Quality Assurance
Depends On:
Blocks: F8Target
Reported: 2007-08-28 22:50 UTC by Nicolas Mailhot
Modified: 2014-01-21 22:59 UTC
CC List: 6 users

Fixed In Version: 3.2.16-2.fc9
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2008-05-29 02:37:15 UTC
Type: ---

Attachments
iri to uri function from Django (1.60 KB, text/plain)
2007-08-28 23:51 UTC, Toshio Ernie Kuratomi
Sample yum output plus rpm -qa output (143.70 KB, text/plain)
2008-03-25 15:00 UTC, Bruno Wolff III
encode to utf-8 just before calling urlgrabber (1.21 KB, patch)
2008-05-03 01:12 UTC, Toshio Ernie Kuratomi
only convert unicode objects to utf8 (1.88 KB, patch)
2008-05-03 04:37 UTC, Toshio Ernie Kuratomi

Description Nicolas Mailhot 2007-08-28 22:50:50 UTC
Testing with a package whose name includes an accented character:
1. rpmbuild, build-in-mock & rpmlint are happy
2. createrepo seems fine too
3. yum search works if you don't include the accented character in the search

yum search colier
Loading "skip-broken" plugin
Excluding Packages in global exclude list

écolier-fonts.noarch                     1.00-0.1.20070628.fc8  local           
Matched from:
Écolier court fonts
Écolier are a set of latin fonts created by Jean-Marie Douteau to mimick the
traditionnal cursive writing French children are taught in school.

He kindly released two of them under the OFL, which are redistributed in this

4. If you do include it, yum dies with:

yum search écolier
Loading "skip-broken" plugin
Excluding Packages in global exclude list
Cleaning up Everything
Loading "skip-broken" plugin         
Excluding Packages in global exclude list
Traceback (most recent call last):
  File "/usr/bin/yum", line 29, in <module>
  File "/usr/share/yum-cli/yummain.py", line 102, in main
    result, resultmsgs = base.doCommands()
  File "/usr/share/yum-cli/cli.py", line 272, in doCommands
    return self.yum_cli_commands[self.basecmd].doCommand(self, self.basecmd,
  File "/usr/share/yum-cli/yumcommands.py", line 343, in doCommand
    return base.search(extcmds)
  File "/usr/share/yum-cli/cli.py", line 829, in search
    for (po, matched_value) in matching:
  File "/usr/lib/python2.5/site-packages/yum/__init__.py", line 1240, in
    if value and value.lower().find(s.lower()) != -1:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

5. yum install initially works, then dies too:

 yum install "écolier-fonts"
Loading "skip-broken" plugin
Excluding Packages in global exclude list
Setting up Install Process
Parsing package install arguments
Resolving Dependencies
--> Running transaction check
---> Package écolier-fonts.noarch 0:1.00-0.1.20070628.fc8 set to be updated
--> Finished Dependency Resolution

Dependencies Resolved

 Package                 Arch       Version          Repository        Size 
 écolier-fonts           noarch     1.00-0.1.20070628.fc8  local              66 k

Transaction Summary
Install      1 Package(s)         
Update       0 Package(s)         
Remove       0 Package(s)         

Total download size: 66 k
Is this ok [y/N]: y
Downloading Packages:
/usr/lib64/python2.5/urllib.py:1205: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  res = map(safe_map.__getitem__, s)
Traceback (most recent call last):
  File "/usr/bin/yum", line 29, in <module>
  File "/usr/share/yum-cli/yummain.py", line 180, in main
  File "/usr/share/yum-cli/cli.py", line 310, in doTransaction
    problems = self.downloadPkgs(downloadpkgs) 
  File "/usr/lib/python2.5/site-packages/yum/__init__.py", line 833, in downloadPkgs
    cache=po.repo.http_caching != 'none',
  File "/usr/lib/python2.5/site-packages/yum/yumRepo.py", line 605, in getPackage
  File "/usr/lib/python2.5/site-packages/yum/yumRepo.py", line 583, in _getFile
  File "/usr/lib/python2.5/site-packages/urlgrabber/mirror.py", line 411, in urlgrab
    return self._mirror_try(func, url, kw)
  File "/usr/lib/python2.5/site-packages/urlgrabber/mirror.py", line 397, in
    return func_ref( *(fullurl,), **kwargs )
  File "/usr/lib/python2.5/site-packages/urlgrabber/grabber.py", line 893, in
    (url,parts) = opts.urlparser.parse(url, opts) 
  File "/usr/lib/python2.5/site-packages/urlgrabber/grabber.py", line 671, in parse
    parts = self.quote(parts)
  File "/usr/lib/python2.5/site-packages/urlgrabber/grabber.py", line 707, in quote
    path = urllib.quote(path)
  File "/usr/lib64/python2.5/urllib.py", line 1205, in quote
    res = map(safe_map.__getitem__, s)
KeyError: u'\xe9'

Comment 1 Toshio Ernie Kuratomi 2007-08-28 23:51:55 UTC
Created attachment 177681 [details]
iri to uri function from Django

Licensed under the three-clause (new-style) BSD license.
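The attachment itself is not reproduced here. As a rough, independent sketch of the same technique (not Django's code), mapping an IRI to a URI as described in RFC 3987 section 3.1 comes down to encoding the text as UTF-8 and percent-encoding every byte that is not an allowed ASCII character. The helper name and the exact set of characters left unescaped are assumptions, and this uses Python 3's urllib.parse rather than the Python 2.5 urllib seen in the tracebacks.

```python
from urllib.parse import quote

# Assumed set of characters left as-is: URI delimiters plus '%', so that
# sequences that are already percent-escaped are not escaped a second time.
SAFE_CHARS = "/#%[]=:;$&()+,!?*@'~"


def iri_to_uri(iri: str) -> str:
    """Sketch of an IRI -> URI conversion (hypothetical helper).

    quote() encodes the str as UTF-8 by default, then percent-encodes each
    resulting byte that is not in SAFE_CHARS or among the always-safe
    ASCII letters, digits and '_.-~'.
    """
    return quote(iri, safe=SAFE_CHARS)


if __name__ == "__main__":
    print(iri_to_uri("http://example.com/repo/écolier-fonts.noarch.rpm"))
    # http://example.com/repo/%C3%A9colier-fonts.noarch.rpm
```

This only escapes characters; it does not convert a non-ASCII host name, which would need IDNA encoding instead.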

Comment 2 Seth Vidal 2007-08-29 05:11:23 UTC
Is that license GPL-compatible?

I don't want to look at the attachment unless it is.

Comment 3 Toshio Ernie Kuratomi 2007-08-29 05:36:14 UTC
Understood.  It is GPL 2 & 3 compatible according to:

BSD License (no advertising) aka 3 Clause BSD.  

Rereading the license, though, the whole license text probably needs to be
included at the top of the file if you use it:


Comment 4 Bruno Wolff III 2008-03-25 15:00:06 UTC
Created attachment 299034 [details]
Sample yum output plus rpm -qa output

This seemed to be the closest of three bugs involving unicode and yum.
yum search consistently fails on my machine with the kitchen sink installed,
which also happens to be x86_64. This doesn't happen on either of the two other
machines I am running up-to-date rawhide on; those two machines are x86.

Comment 5 Nicolas Mailhot 2008-03-25 15:36:17 UTC
To my knowledge there are no packages with unicode names in Fedora right now, so
this is probably the wrong bug. However, there are multiple packages with unicode
descriptions or changelogs, which may trigger the bug you hit. Please open a
separate ticket.

Comment 6 Bruno Wolff III 2008-03-25 16:26:16 UTC
OK, I'll do that.

Comment 7 Tim Lauridsen 2008-03-25 16:35:27 UTC
Bruno, please check 
your issue is fixed upstream

Comment 8 Toshio Ernie Kuratomi 2008-04-23 22:32:36 UTC
I just took another look at this, and the problem is not as bad as I first
thought. urlgrabber handles non-ASCII filenames fine; it's internationalized
domain names where it needs help.
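To illustrate the distinction drawn above: a non-ASCII path can travel as UTF-8 percent escapes, but a non-ASCII host name has to be converted to its IDNA ("xn--") form before it can be looked up. The sketch below does both with Python 3's standard library. The helper name and the example host are hypothetical, it assumes the URL has no user:password@ part, and Python's built-in "idna" codec implements IDNA 2003, not IDNA 2008.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def ascii_url(url: str) -> str:
    """Return an ASCII-only form of url (hypothetical helper).

    Assumes the URL has no user:password@ component.
    """
    parts = urlsplit(url)
    netloc = parts.netloc
    if parts.hostname:
        # The built-in "idna" codec turns each non-ASCII label into its
        # "xn--" form and leaves ASCII labels alone
        netloc = parts.hostname.encode("idna").decode("ascii")
        if parts.port is not None:
            netloc += f":{parts.port}"
    # Non-ASCII path characters become UTF-8 percent escapes
    path = quote(parts.path)
    return urlunsplit((parts.scheme, netloc, path, parts.query, parts.fragment))


print(ascii_url("http://bücher.example/repo/écolier-fonts.noarch.rpm"))
# http://xn--bcher-kva.example/repo/%C3%A9colier-fonts.noarch.rpm
```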

The solution is to realize that urlgrabber doesn't understand how to deal with
unicode objects. That is understandable, because it provides an interface to
something with no explicit encoding, so it deals with things at the
byte-encoded level, not the abstract unicode level.

Assuming that all filenames will be in utf-8 on the filesystem in question,
all you need to do is convert to utf-8 before passing the url to urlgrabber:

>>> url = repourl + packagename
>>> type(url)
<type 'unicode'>
>>> urlgrabber.urlgrab(url.encode('utf-8'))

What do you do if the remote server does not encode its filesystem filenames in
utf-8?  You fail.  Unless you can query the server to find out what encoding the
filenames are using, there's no way to make this translation.
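The reason the encoding has to match the server can be seen by percent-encoding the same file name under two encodings: the resulting request paths differ, so a server that stored the name in ISO-8859-1 would not find the UTF-8 path. The file name below is assumed, built from the package shown in the description.

```python
from urllib.parse import quote

# Assumed file name, built from the package in the bug description
NAME = "écolier-fonts-1.00-0.1.20070628.fc8.noarch.rpm"

# The same name percent-encoded under two possible server-side encodings
as_utf8 = quote(NAME.encode("utf-8"))
as_latin1 = quote(NAME.encode("latin-1"))

print(as_utf8)               # %C3%A9colier-fonts-1.00-0.1.20070628.fc8.noarch.rpm
print(as_latin1)             # %E9colier-fonts-1.00-0.1.20070628.fc8.noarch.rpm
print(as_utf8 == as_latin1)  # False
```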

Comment 9 Toshio Ernie Kuratomi 2008-04-23 22:33:46 UTC
- urlgrabber.urlgrab(url.encode('utf-8')
+ urlgrabber.urlgrab(url.encode('utf-8'))

Comment 10 Toshio Ernie Kuratomi 2008-05-03 01:12:40 UTC
Created attachment 304434 [details]
encode to utf-8 just before calling urlgrabber

Here's a patch against git HEAD that encodes the url to utf-8 just before
calling urlgrabber. Tested fine on both a file:// and an http:// repo.

Comment 11 Toshio Ernie Kuratomi 2008-05-03 01:15:31 UTC
I also looked at the search problem in #4; that seems to have been fixed in git
HEAD by decoding the args and the database results to unicode strings before
using them.
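In Python 3 terms, the fix described above amounts to decoding every byte string to text at the boundary and then comparing only text. A minimal sketch, with hypothetical helper names and an in-memory list standing in for yum's package database:

```python
def to_unicode(value, encoding="utf-8"):
    """Decode bytes to str; return str and other objects unchanged (hypothetical helper)."""
    if isinstance(value, bytes):
        # 'replace' keeps the search from crashing on badly encoded metadata
        return value.decode(encoding, "replace")
    return value


def search(packages, term):
    """Return names of packages whose name or summary contains term, ignoring case."""
    needle = to_unicode(term).lower()
    matches = []
    for name, summary in packages:
        name, summary = to_unicode(name), to_unicode(summary)
        if needle in name.lower() or needle in summary.lower():
            matches.append(name)
    return matches


# Mixed bytes/str metadata, as might come back from a database (assumed data)
PACKAGES = [
    (b"\xc3\xa9colier-fonts", "\u00c9colier court fonts"),
    ("bash", b"The GNU Bourne Again shell"),
]

print(search(PACKAGES, "écolier".encode("utf-8")))  # ['écolier-fonts']
print(search(PACKAGES, "ÉCOLIER"))                  # ['écolier-fonts']
```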

Comment 12 Toshio Ernie Kuratomi 2008-05-03 04:37:49 UTC
Created attachment 304439 [details]
only convert unicode objects to utf8

So apparently we sometimes hand off unicode objects to urlgrabber and
sometimes YumRepository objects.

This new patch creates a to_utf8() method, like to_unicode(), that only converts
the object passed to it if it is unicode. Using it in _getFile() does the right
thing whether we are handing in a unicode object or a YumRepository.
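A Python 3 analogue of the helper described above might look like the following: only text objects are encoded, and anything else (bytes, or a repository-like object) is passed through untouched, so a caller can hand either kind to the download layer. The names are hypothetical and this is not yum's actual code.

```python
def to_utf8(obj, errors="replace"):
    """Encode str to UTF-8 bytes; return every other object unchanged.

    Hypothetical Python 3 counterpart of a to_utf8() helper; in Python 2
    the check would be against unicode rather than str.
    """
    if isinstance(obj, str):
        return obj.encode("utf-8", errors)
    return obj


class FakeRepo:
    """Stand-in for a repository object that must not be converted (hypothetical)."""


repo = FakeRepo()
print(to_utf8("écolier-fonts.rpm"))  # b'\xc3\xa9colier-fonts.rpm'
print(to_utf8(repo) is repo)         # True
```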

Comment 13 James Antill 2008-05-03 22:37:59 UTC
 Ok, that last patch looks fine. Applying.

Comment 14 Fedora Update System 2008-05-16 19:05:50 UTC
yum-3.2.16-1.fc9 has been submitted as an update for Fedora 9

Comment 15 Fedora Update System 2008-05-18 15:16:57 UTC
yum-3.2.16-2.fc9 has been submitted as an update for Fedora 9

Comment 16 Fedora Update System 2008-05-29 02:37:02 UTC
yum-3.2.16-2.fc9 has been pushed to the Fedora 9 stable repository.  If problems still persist, please make note of it in this bug report.
