Red Hat Bugzilla – Bug 663674
BugzillaBase.openattachment fails with UTF-8 attachment file names (patch included)
Last modified: 2013-03-03 18:02:19 EST
Created attachment 469147
I downloaded some attachments and got a traceback:
> # here bz is a RHBugzilla object
> file = bz.openattachment(attachment_id)
File "/usr/lib/python2.7/site-packages/bugzilla/base.py", line 681, in openattachment
(dummy,filename) = filename_parm.split('=')
ValueError: too many values to unpack
The filename_parm contained the following string:
So the problem was that the filename contained '=' characters and the code wasn't expecting this. This was fixed by limiting the split count:
(dummy,filename) = filename_parm.split('=', 1)
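The effect of the split limit can be seen in a small standalone sketch. The header parameter below is a made-up illustration, not the string from the actual traceback; only the two split() calls mirror the code in base.py.

```python
# Hypothetical header parameter; the encoded value itself contains '=' characters.
filename_parm = 'filename="=?UTF-8?Q?zpr=C3=A1vy?=.pdf"'

# An unlimited split cuts at every '=', so unpacking into two names
# raises "ValueError: too many values to unpack".
parts = filename_parm.split('=')

# Limiting the split to a single cut keeps everything after the first '=' intact.
(dummy, filename) = filename_parm.split('=', 1)
```

With this input, parts has more than two elements, while the limited split always yields exactly the parameter name and the untouched value.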
This way the filename became "right":
I saw a comment in the openattachment() method about RFC 2045. The filename here uses the =?charset?encoding?text?= encoded-word syntax (defined in RFC 2047), which the email.header.decode_header() function from the standard Python library can decode. Unfortunately the simplest way fails:
email.header.decode_header() was not able to handle the ".pdf" suffix after the encoded sequence. This is discussed in Python upstream: http://bugs.python.org/issue1079. Some Python developers seem to think this is not a bug on the Python side. However, this is what we get from Bugzilla.
So the easiest solution was to transform one encoded sequence at a time:
# email.header.decode_header cannot handle strings not ending with '?=',
# so let's transform one =?...?= part at a time
while True:
    match = re.search(r"=\?.*?\?=", filename)
    if match is None:
        break
    # decode_header() returns a list of (decoded_string, charset) pairs
    filename = filename[:match.start()] + email.header.decode_header(match.group(0))[0][0] + filename[match.end():]
After that change the file name of the attachment was ok:
Meteorologické zprávy 5_04.pdf
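The one-encoded-word-at-a-time idea described above can also be sketched as a standalone Python 3 helper. This is only an illustration, not the attached patch: the function name decode_encoded_words, the stricter regular expression, and the Q-encoded sample string are all assumptions, since the real header value is not shown in this report.

```python
import email.header
import re

# Matches a single encoded word of the form =?charset?encoding?text?=
# (stricter than "=\?.*?\?=" so the "?=" inside "?Q?=C3..." cannot end a match early).
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=")


def decode_encoded_words(value):
    """Decode each encoded word in value, leaving the surrounding text untouched."""

    def _decode(match):
        # decode_header() returns a list of (data, charset) pairs;
        # an isolated encoded word yields exactly one pair.
        data, charset = email.header.decode_header(match.group(0))[0]
        if isinstance(data, bytes):
            return data.decode(charset or "ascii", errors="replace")
        return data

    return ENCODED_WORD.sub(_decode, value)


# Hypothetical encoded filename with a plain ".pdf" suffix after the encoded word.
sample = "=?UTF-8?Q?Meteorologick=C3=A9_zpr=C3=A1vy_5=5F04?=.pdf"
decoded = decode_encoded_words(sample)
```

Because only the matched encoded words are passed to decode_header(), the result does not depend on how a given Python version treats text that directly follows an encoded word.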
Steps to Reproduce:
1. Try to download the second attachment of bug #586615 using python-bugzilla
Actual results:
Exception is thrown
Expected results:
Attachment downloaded with proper file name.
Please consider the attached patch. It makes it possible to work with attachments with UTF-8 names that are returned by Red Hat Bugzilla.
Created attachment 470041
Better proposed patch
I saw this approach in pymailheaders; it's a much better solution.
Created attachment 502352
An empty test file with a utf-8 filename
This file is being used to test fixes for this bug.
Fix pushed to git master:
F14 is EOL. It doesn't look like this fix ended up there, but the fixed version is in F16.