614767 – encoding problems

Summary: encoding problems

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Fedora EPEL
Classification:	Fedora
Component:	koji
Sub Component:
Version:	el6
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Dennis Gilmore
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2010-07-15 08:17 UTC by Florian La Roche
Modified:	2017-02-21 16:16 UTC
CC List:	3 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2017-02-21 16:16:38 UTC
Type:	---
Embargoed:

Description Florian La Roche 2010-07-15 08:17:33 UTC

Description of problem:

With koji-1.4 running on RHEL6-beta2, some email notifications
are not sent out; the notification task fails with:

Traceback (most recent call last):
  File "/usr/sbin/kojid", line 1437, in runTask
    response = (handler.run(),)
  File "/usr/sbin/kojid", line 1513, in run
    return self.handler(*self.params,**self.opts)
  File "/usr/sbin/kojid", line 3649, in handler
    message = message.encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 546: ordinal not in range(128)

This seems to happen, e.g., with dejavu-fonts, ctags, crontabs,
cronie, amanda, and akonadi.
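
In Python 2, calling .encode('utf-8') on a byte str first decodes it implicitly with the default 'ascii' codec, which is why an encode call raises UnicodeDecodeError. A minimal Python 3 sketch of that implicit step, assuming the notification text is already UTF-8 encoded bytes (the helper name py2_style_encode and the sample subject line are hypothetical):

```python
def py2_style_encode(message: bytes) -> bytes:
    """Hypothetical helper mimicking Python 2's str.encode('utf-8').

    Python 2 decodes the byte string with the default ASCII codec
    before encoding it, so any byte >= 0x80 fails.
    """
    return message.decode('ascii').encode('utf-8')


# U+2013 (en dash) is b'\xe2\x80\x93' in UTF-8, so byte 0xe2 appears
# right after the 8 ASCII characters 'Summary '.
subject = 'Summary \u2013 fixed'.encode('utf-8')

try:
    py2_style_encode(subject)
    error = None
except UnicodeDecodeError as err:
    error = err

print(error)
# 'ascii' codec can't decode byte 0xe2 in position 8: ordinal not in range(128)
```

The message matches the one in the traceback apart from the position, which depends on where the first non-ASCII byte falls in the text.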

The following change seems to work (tested with ctags and crontabs):

--- builder/kojid
+++ builder/kojid
@@ -3455,7 +3455,7 @@ Status: %(status)s\r

         message = self.message_templ % locals()
         # ensure message is in UTF-8
-        message = message.encode('utf-8')
+        message = koji.fixEncoding(message)

         server = smtplib.SMTP(options.smtphost)
         #server.set_debuglevel(True)
@@ -3646,7 +3646,7 @@ Build Info: %(weburl)s/buildinfo?buildID
         subject = self.subject_templ % locals()
         message = self.message_templ % locals()
         # ensure message is in UTF-8
-        message = message.encode('utf-8')
+        message = koji.fixEncoding(message)

         server = smtplib.SMTP(options.smtphost)
         # server.set_debuglevel(True)
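
The patch presumably avoids the implicit ASCII decode because fixEncoding() examines the value before re-encoding it. Below is a hypothetical Python 3 sketch of the pre-revision helper, reconstructed only from details mentioned in this report (a fallback keyword, an early "if not value: return value", and a hardcoded fallback charset). The function name, the UTF-8-first check and the 'iso8859-15' default are assumptions, not koji's verified code:

```python
def fix_encoding_old_style(value, fallback='iso8859-15'):
    """Hypothetical sketch of the pre-revision fixEncoding() (assumed logic).

    Returns UTF-8 bytes, except that empty values are returned unchanged.
    """
    if not value:
        # Early return: '' comes back as text, b'' as bytes.
        return value
    if isinstance(value, str):
        # Text is encoded directly.
        return value.encode('utf-8')
    try:
        # Bytes that are already valid UTF-8 pass through unchanged,
        # with no ASCII decode involved.
        return value.decode('utf-8').encode('utf-8')
    except UnicodeDecodeError:
        # Otherwise assume the hardcoded fallback charset.
        return value.decode(fallback).encode('utf-8')


message = 'Summary \u2013 fixed'.encode('utf-8')
print(fix_encoding_old_style(message) == message)  # True
print(fix_encoding_old_style(b'caf\xe9'))          # b'caf\xc3\xa9'
```

Under these assumptions, a UTF-8 notification like the one in the traceback passes through intact, while bytes that are not valid UTF-8 are silently reinterpreted in whatever charset is hardcoded.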



Comment 1 Florian La Roche 2010-08-31 08:00:32 UTC

Another fix suggestion for this problem from Toshio:
http://lists.fedoraproject.org/pipermail/buildsys/2010-August/003223.html

regards,

Florian La Roche

Comment 2 Toshio Ernie Kuratomi 2010-08-31 16:14:19 UTC

As Florian suggests, using fixEncoding() for that section of code and changing how fixEncoding works is probably the best outcome.  Here's my revised fixEncoding()::

import warnings
def fixEncoding(value, from_encoding=None, fallback=None):
    # fallback is used for backwards compatibility
    if not from_encoding:
        if fallback:
            warnings.warn('fixEncoding() no longer takes a fallback'
                ' keyword arg.  Use from_encoding instead.',
                DeprecationWarning, stacklevel=2)
            from_encoding = fallback
        else:
            from_encoding = 'utf8'

    if isinstance(value, unicode):
        # value is already unicode, so just convert it
        # to a utf8-encoded str
        # Note: with python3, this can fail unless you use an error
        # argument because a unicode string could have been created using
        # the surrogateescape error handler.
        return value.encode('utf8', 'replace')
    else:
        # value is a str but may not be valid utf8 (encoded in latin1, for
        # instance).  Note that the string is almost certain to be mangled
        # in these instances unless you know what encoding the string is in
        # and have set from_encoding to that encoding.
        return value.decode(from_encoding, 'replace').encode('utf8', 'replace')
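
Under Python 3 the same approach carries over, with unicode becoming str and str becoming bytes. A sketch of such a port, assuming the helper should always return UTF-8 bytes; the name fix_encoding is hypothetical and this is not code from koji:

```python
import warnings


def fix_encoding(value, from_encoding=None, fallback=None):
    """Python 3 sketch of the revised fixEncoding(): always return UTF-8 bytes."""
    # fallback is still accepted for backwards compatibility, with a warning
    if not from_encoding:
        if fallback:
            warnings.warn('fix_encoding() no longer takes a fallback'
                          ' keyword arg.  Use from_encoding instead.',
                          DeprecationWarning, stacklevel=2)
            from_encoding = fallback
        else:
            from_encoding = 'utf-8'

    if isinstance(value, str):
        # Text: encode directly; 'replace' turns lone surrogates (e.g. from
        # the surrogateescape error handler) into '?' instead of raising.
        return value.encode('utf-8', 'replace')
    # Bytes: decode with the declared encoding, substituting U+FFFD for
    # undecodable sequences, then re-encode as UTF-8.
    return value.decode(from_encoding, 'replace').encode('utf-8', 'replace')


print(fix_encoding(''))                               # b''
print(fix_encoding(b'caf\xe9'))                       # b'caf\xef\xbf\xbd'
print(fix_encoding(b'caf\xe9', from_encoding='latin-1'))  # b'caf\xc3\xa9'
```

Because there is no early return, fix_encoding('') yields b'' rather than '', so callers always get the same type back.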

Note that there are three separate issues, of varying severity:

1) UnicodeError known to be thrown for certain notifications.  Florian's fix in this bug's description will solve that.

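2) Potential for UnicodeError to be thrown with empty strings: the early ``if not value: return value`` in fixEncoding() hands an empty value back unchanged, possibly as unicode rather than str.  Removing that line from fixEncoding() will fix that.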

3) Cosmetically and for debuggability, using replacement characters is better than substituting characters from a hardcoded charset.  My fixEncoding() will fix that.
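
The difference behind item 3 shows up when the bytes are in neither UTF-8 nor the guessed charset. A small Python 3 illustration, assuming iso8859-15 as the hardcoded fallback and using Windows-1251 Cyrillic text as the unexpected input (both choices are illustrative, not taken from the affected packages):

```python
# Cyrillic 'mir' ("world") encoded in Windows-1251: b'\xec\xe8\xf0'
raw = '\u043c\u0438\u0440'.encode('cp1251')

# Hardcoded-charset guess (assumed iso8859-15): decoding succeeds,
# but yields unrelated Latin characters with no sign of an error.
guessed = raw.decode('iso8859-15')

# Replacement approach: undecodable byte sequences become U+FFFD, so
# the damage is visible in the notification and easy to trace back.
replaced = raw.decode('utf-8', 'replace')

print(ascii(guessed))   # '\xec\xe8\xf0'
print(ascii(replaced))
```

The guessed text looks plausible but is wrong, whereas the replacement characters make it obvious that the original encoding was not known.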

Comment 4 Dennis Gilmore 2017-02-21 16:16:38 UTC

Closing this bug. If the issues still persist, please reopen it.
