Bug 663674 - BugzillaBase.openattachment fails with UTF-8 attachment file names (patch included)
Product: Fedora
Classification: Fedora
Component: python-bugzilla
Unspecified Unspecified
Priority: low
Severity: medium
Assigned To: Will Woods
QA Contact: Fedora Extras Quality Assurance
whiteboard test
Reported: 2010-12-16 10:28 EST by Karel Klíč
Modified: 2013-03-03 18:02 EST

Doc Type: Bug Fix
Last Closed: 2012-02-04 17:39:28 EST

Attachments
Proposed patch (1.36 KB, application/octet-stream)
2010-12-16 10:28 EST, Karel Klíč
Better proposed patch (1.32 KB, patch)
2010-12-21 12:57 EST, Karel Klíč
An empty test file with a utf-8 filename (45 bytes, text/plain)
2011-06-01 14:57 EDT, Will Woods

Description Karel Klíč 2010-12-16 10:28:01 EST
Created attachment 469147
Proposed patch

I downloaded some attachments and got a traceback:
> # here bz is a RHBugzilla object
> file = bz.openattachment(attachment_id)
File "/usr/lib/python2.7/site-packages/bugzilla/base.py", line 681, in openattachment
    (dummy,filename) = filename_parm.split('=')
ValueError: too many values to unpack

The filename_parm contained the following string:

So the problem was that the filename contained '=' characters and the code wasn't expecting this. This was fixed by limiting the split count:
    (dummy,filename) = filename_parm.split('=', 1)
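
To illustrate the difference, here is a minimal sketch. The parameter value is a hypothetical stand-in, not the actual string from this bug, and split_filename_parm is an illustrative helper name rather than a function in python-bugzilla.

```python
def split_filename_parm(filename_parm):
    """Split 'filename=<value>' at the first '=' only."""
    # maxsplit=1 keeps any '=' characters inside the value intact
    dummy, filename = filename_parm.split("=", 1)
    return filename


# Hypothetical parameter whose value is an encoded-word containing '='
example_parm = "filename==?UTF-8?Q?a=3Db?=.pdf"

# An unlimited split yields more than two pieces, which is what makes
# the two-element tuple assignment raise "too many values to unpack"
pieces = example_parm.split("=")

print(len(pieces))
print(split_filename_parm(example_parm))
```

With the split limited to one cut, everything after the first '=' is kept as the file name, encoded or not.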

This way the filename became "right":

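I saw a comment in the openattachment() method about RFC 2045. The email.header.decode_header() function from the standard Python library can decode such encoded strings (the =?charset?encoding?text?= encoded-word form, which is specified in RFC 2047). Unfortunately, the simplest approach fails: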

email.header.decode_header() was not able to handle the ".pdf" suffix after the encoded sequence. This is discussed in Python upstream: http://bugs.python.org/issue1079. Some Python developers seem to think this is not a bug on the Python side. However, this is what we get from Bugzilla.

So the easiest solution was to transform one encoded sequence at a time:
# email.header.decode_header cannot handle strings not ending with '?=',
# so let's transform one =?...?= part at a time
while True:
    match = re.search(r"=\?.*?\?=", filename)
    if match is None:
        break
    filename = filename[:match.start()] + email.header.decode_header(match.group(0))[0][0] + filename[match.end():]

After that change the file name of the attachment was ok:
Meteorologické zprávy 5_04.pdf
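
For reference, the same idea can be sketched as a self-contained function. This is an adaptation for Python 3 (where decode_header() returns bytes for encoded parts), not the attached patch; decode_filename and the sample encoded string are hypothetical and chosen only for illustration.

```python
import re
from email.header import decode_header

# Matches one encoded-word such as =?UTF-8?Q?...?= (RFC 2047 syntax)
ENCODED_WORD = re.compile(r"=\?[^?]+\?[bBqQ]\?[^?]*\?=")


def decode_filename(filename):
    """Decode each encoded-word separately, leaving other text untouched."""
    def _decode(match):
        raw, charset = decode_header(match.group(0))[0]
        # Encoded parts come back as bytes together with their charset
        if isinstance(raw, bytes):
            return raw.decode(charset or "ascii", errors="replace")
        return raw

    return ENCODED_WORD.sub(_decode, filename)


# Hypothetical Q-encoded name followed by a plain ".pdf" suffix
encoded = "=?UTF-8?Q?Meteorologick=C3=A9_zpr=C3=A1vy_5=5F04?=.pdf"
print(decode_filename(encoded))
```

Passing only the matched encoded-word to decode_header() means the trailing suffix never reaches the decoder, so the behaviour discussed upstream does not come into play.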

How reproducible:

Steps to Reproduce:
Try to download the second attachment of bug #586615 using python-bugzilla
Actual results:
Exception is thrown

Expected results:
Attachment downloaded with proper file name.

Please consider the attached patch. It makes it possible to work with attachments with UTF-8 names that are returned by Red Hat Bugzilla.
Comment 1 Karel Klíč 2010-12-21 12:57:36 EST
Created attachment 470041
Better proposed patch
Comment 2 Karel Klíč 2010-12-21 12:59:35 EST
I saw this approach in pymailheaders; it's a much better solution.
Comment 3 Will Woods 2011-06-01 14:57:50 EDT
Created attachment 502352
An empty test file with a utf-8 filename

This file is being used to test fixes for this bug.
Comment 4 Will Woods 2011-06-01 17:30:30 EDT
Fix pushed to git master:

Comment 5 Cole Robinson 2012-02-04 17:39:28 EST
F14 is EOL. It doesn't look like this fix ended up there, but the fixed version is in F16.
