Bug 963698

Summary: UnicodeEncodeError in pkcon install-local
Product: [Fedora] Fedora
Reporter: Kamil Páral <kparal>
Component: PackageKit
Assignee: Richard Hughes <rhughes>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 19
CC: awilliam, jonathan, rdieter, rhughes, robatino, smparrish, vbocek
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2013-05-30 09:06:24 UTC
Type: Bug
Bug Blocks: 834090

Description Kamil Páral 2013-05-16 12:16:40 UTC
Description of problem:

[kparal@dhcp-29-206 ~]$ pkcon install-local Downloads/rpmfusion-free-release-branched.noarch.rpm
Instalují se soubory         [=========================]         
Čeká se v řadě            [=========================]         
Spouští se                  [=========================]         
Provádí se                  [=========================]         
Vážná chyba: Error Type: <type 'exceptions.UnicodeEncodeError'>
Error Value: 'ascii' codec can't encode character u'\u0159' in position 2: ordinal not in range(128)
  File : /usr/share/PackageKit/helpers/yum/yumBackend.py, line 3590, in <module>
    main()
  File : /usr/share/PackageKit/helpers/yum/yumBackend.py, line 3587, in main
    backend.dispatcher(sys.argv[1:])
  File : /usr/lib/python2.7/site-packages/packagekit/backend.py, line 711, in dispatcher
    self.dispatch_command(args[0], args[1:])
  File : /usr/lib/python2.7/site-packages/packagekit/backend.py, line 616, in dispatch_command
    self.install_files(transaction_flags, files_to_inst)
  File : /usr/share/PackageKit/helpers/yum/yumBackend.py, line 2115, in install_files
    self.error(ERROR_MISSING_GPG_SIGNATURE, _to_unicode(e), exit=False)
  File : /usr/lib/python2.7/site-packages/packagekit/backend.py, line 156, in error
    print("error\t%s\t%s" % (err, _to_utf8(description)))
  File : /usr/lib/python2.7/site-packages/packagekit/backend.py, line 48, in _to_utf8
    return str(txt)


If I run the command in en_US.UTF-8 locale, everything works.
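
The last frame of the traceback calls str() on a unicode object, which in Python 2 implicitly encodes it with the ASCII codec. The sketch below reproduces the same error message by encoding explicitly; it is written so that it also runs on Python 3. The sample word is an assumption chosen only because it has u'\u0159' at position 2; it is not taken from the actual yum message.

```python
# Hypothetical sample text ("Verejny" with Czech diacritics), written with
# escapes so the file itself stays pure ASCII. Not the real yum message.
sample = u"Ve\u0159ejn\u00fd"


def try_ascii(text):
    """Encode text as ASCII; return the UnicodeEncodeError if that fails."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        return exc
    return None


err = try_ascii(sample)
# The exception records the codec and the offset of the offending character.
print(err.encoding, err.start, repr(sample[err.start]))

# Encoding explicitly as UTF-8 succeeds and round-trips without loss.
utf8_bytes = sample.encode("utf-8")
print(utf8_bytes.decode("utf-8") == sample)
```

A message made up only of ASCII characters would pass through the same call without raising, which would be consistent with the command working under en_US.UTF-8.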

Version-Release number of selected component (if applicable):
PackageKit-0.8.8-2.fc19.x86_64

How reproducible:
always

Steps to Reproduce:
1. have cs_CZ.UTF-8 locale
2. run pkcon install-local rpmfusion-free-release-branched.noarch.rpm (or some other third-party RPM)

Additional info:
I think this could be a duplicate of bug 960687.

Comment 1 Kamil Páral 2013-05-16 12:19:28 UTC
We don't have a proper Final release criterion for third-party packages, but installing them is such a common operation that this should warrant at least a blocker bug discussion. Nominating.

Expected impact: all languages for which the yum backend returns a non-ASCII string. That is probably a lot of languages.

Comment 2 Kamil Páral 2013-05-16 12:22:38 UTC
This might be the same bug with a different traceback:
http://lists.fedoraproject.org/pipermail/test/2013-May/115495.html

Comment 3 Richard Hughes 2013-05-16 14:37:41 UTC
This reproducer illustrates the problem:

Save this as gpg-lang-test.py:

#!/usr/bin/python

import yum
from yum.packages import YumLocalPackage

yb = yum.YumBase()

# Turn on every GPG check option that this yum version knows about.
for attrname in ("gpgcheck", "repo_gpgcheck", "localpkg_gpgcheck"):
    if hasattr(yb.conf, attrname):
        setattr(yb.conf, attrname, True)

po = YumLocalPackage(ts=yb.rpmdb.readOnlyTS(), filename="/home/hughsie/Downloads/rpmfusion-free-release-branched.noarch.rpm")
try:
    yb._checkSignatures([po], None)
except yum.Errors.YumGPGCheckError, e:
    # Printing the localized error message is where the failure shows up.
    print str(e)
print "all working"

Running it as "sudo LANG=cs_CZ.UTF-8 ./gpg-lang-test.py" gives:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u0159' in position 2: ordinal not in range(128)

Even doing this "print unicode(e, 'utf-8', errors='replace')" produces:

TypeError: coercing to Unicode: need string or buffer, YumGPGCheckError found

Which seems to suggest something like this is required?

diff --git a/backends/yum/yumBackend.py b/backends/yum/yumBackend.py
index 8e04999..289f10f 100755
--- a/backends/yum/yumBackend.py
+++ b/backends/yum/yumBackend.py
@@ -44,6 +44,7 @@ from yum.callbacks import *
 from yum.misc import prco_tuple_to_string, unique
 from yum.packages import YumLocalPackage, parsePackages
 from yum.packageSack import MetaSack
+from yum.Errors import YumBaseError
 import rpmUtils
 import exceptions
 import types
@@ -120,7 +121,9 @@ def sigquit(signum, frame):
     sys.exit(1)
 
 def _to_unicode(txt, encoding='utf-8'):
-    if isinstance(txt, basestring):
+    if isinstance(txt, YumBaseError):
+        txt = unicode(txt)
+    elif isinstance(txt, basestring):
         if not isinstance(txt, unicode):
             txt = unicode(txt, encoding, errors='replace')
     return txt

This certainly solves the bug for me, but I'm kinda wondering why YumBaseError.__str__() would return anything other than unicode.

Comment 4 Zdeněk Pavlas 2013-05-17 07:51:02 UTC
def _to_unicode(txt, encoding='utf-8'):
    if isinstance(txt, str):
        # decode str to unicode
        return unicode(txt, encoding, errors='replace')
    # exception instances
    return unicode(txt)

I don't think we need isinstance(txt, YumBaseError) check there.
This is simpler, and works as expected in all 4 cases:

>>> _to_unicode(u'ěšč')
u'\u011b\u0161\u010d'
>>> _to_unicode('ěšč')
u'\u011b\u0161\u010d'
>>> _to_unicode(yum.Errors.YumBaseError('ěšč'))
u'\u011b\u0161\u010d'
>>> _to_unicode(yum.Errors.YumBaseError(u'ěšč'))
u'\u011b\u0161\u010d'
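
For readers without a Python 2 interpreter, here is a rough Python 3 analogue of the same idea: decode byte strings explicitly, and let everything else (including exception instances) go through the text constructor. The name to_text is hypothetical and ValueError stands in for YumBaseError; this is a sketch of the technique, not the PackageKit code.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding byte strings with the given encoding."""
    if isinstance(value, bytes):
        # Undecodable bytes become U+FFFD instead of raising.
        return value.decode(encoding, errors="replace")
    # Text passes through unchanged; exceptions yield their message.
    return str(value)


czech = "\u011b\u0161\u010d"

print(to_text(czech) == czech)
print(to_text(czech.encode("utf-8")) == czech)
print(to_text(ValueError(czech)) == czech)
```

One difference from the Python 2 version is that an exception constructed from a byte string is not decoded here; str() on it gives the repr of the bytes, so such values would need to be unwrapped before calling the helper.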

> but I'm kinda wondering why YumBaseError.__str__() would return anything other than unicode.

__str__() is called when Python needs bytes, and yes, it should always work and not raise encoding errors.  The recommended practice is to do all string formatting in __unicode__() and have __str__() just return self.__unicode__().encode('utf-8').  This implements it.

http://lists.baseurl.org/pipermail/yum-devel/2013-May/010157.html
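
A minimal sketch of the pattern described above, assuming a hypothetical exception class LocalizedError (this is not yum's YumBaseError and not the linked patch). All formatting happens in __unicode__(), and on Python 2 __str__() only encodes that result as UTF-8. The version check is there so the same file also runs on Python 3, where __str__() must return text.

```python
import sys

PY2 = sys.version_info[0] == 2
# Only the selected branch is evaluated, so "unicode" is never looked up
# on Python 3.
text_type = unicode if PY2 else str  # noqa: F821


class LocalizedError(Exception):
    """Hypothetical exception carrying a possibly non-ASCII message."""

    def __init__(self, value):
        Exception.__init__(self, value)
        self.value = value

    def __unicode__(self):
        # All string formatting lives here and always yields text.
        if isinstance(self.value, bytes):
            return self.value.decode("utf-8", "replace")
        return text_type(self.value)

    if PY2:
        def __str__(self):
            # Python 2 expects bytes from __str__(); encode explicitly.
            return self.__unicode__().encode("utf-8")
    else:
        def __str__(self):
            return self.__unicode__()


message = u"\u011b\u0161\u010d"
from_text = LocalizedError(message)
from_bytes = LocalizedError(message.encode("utf-8"))
print(str(from_text) == str(from_bytes))
```

With this arrangement str() never falls back to the ASCII codec, whether the exception was built from text or from UTF-8 bytes.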

Comment 5 Richard Hughes 2013-05-17 11:54:18 UTC
(In reply to comment #4)
> self.__unicode__().encode('utf-8').  This implements it.
> http://lists.baseurl.org/pipermail/yum-devel/2013-May/010157.html

Awesome. Can you ping me when this gets into a released yum version? I'll then drop the hack in PK and dep on the new yum version. Thanks!

Comment 6 Adam Williamson 2013-05-29 18:24:48 UTC
How does this relate to https://bugzilla.redhat.com/show_bug.cgi?id=963810 ? Does the PackageKit build from that bug (PackageKit-0.8.9-1.fc19) fix the problem?

Comment 7 Adam Williamson 2013-05-29 18:33:06 UTC
Discussed at 2013-05-29 blocker review meeting: http://meetbot.fedoraproject.org/fedora-blocker-review/2013-05-29/f19final-blocker-review-1.2013-05-29-16.02.log.txt . We'd at least like to see the results of testing with the latest PK before making a call on this.

Comment 8 Vojtěch Boček 2013-05-30 08:23:21 UTC
It does not crash for me with PackageKit-0.8.9-1.fc19.x86_64.

Comment 9 Kamil Páral 2013-05-30 09:06:24 UTC
Verified fixed with PackageKit-0.8.9-1.fc19.x86_64 and yum-3.4.3-83.fc19.noarch. Closing.

Richard, if you're still waiting for something from Zdeněk, feel free to reopen to track it.