Bug 663674 - BugzillaBase.openattachment fails with UTF-8 attachment file names (patch included)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: python-bugzilla
Version: 14
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Will Woods
QA Contact: Fedora Extras Quality Assurance
URL: http://example.com
Whiteboard: whiteboard test
Depends On:
Blocks:
Reported: 2010-12-16 15:28 UTC by Karel Klíč
Modified: 2013-03-03 23:02 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-02-04 22:39:28 UTC
Type: ---
Embargoed:


Attachments:
Proposed patch (1.36 KB, application/octet-stream)
2010-12-16 15:28 UTC, Karel Klíč
Better proposed patch (1.32 KB, patch)
2010-12-21 17:57 UTC, Karel Klíč
An empty test file with a utf-8 filename (45 bytes, text/plain)
2011-06-01 18:57 UTC, Will Woods

Description Karel Klíč 2010-12-16 15:28:01 UTC
Created attachment 469147
Proposed patch

I downloaded some attachments and got a traceback:
> # here bz is a RHBugzilla object
> file = bz.openattachment(attachment_id)
File "/usr/lib/python2.7/site-packages/bugzilla/base.py", line 681, in openattachment
    (dummy,filename) = filename_parm.split('=')
ValueError: too many values to unpack


The filename_parm contained the following string:
filename="=?UTF-8?Q?Meteorologick=C3=A9=20zpr=C3=A1vy=205=5F04?=.pdf"

The problem is that the filename contains '=' characters, which the code does not expect. Limiting the split count fixes this:
    (dummy,filename) = filename_parm.split('=', 1)

With this change the filename is extracted intact, although it is still encoded:
=?UTF-8?Q?Meteorologick=C3=A9=20zpr=C3=A1vy=205=5F04?=.pdf
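
To make the difference between the two split calls concrete, here is a minimal sketch using the filename_parm value quoted above. The helper names split_unbounded and split_once are hypothetical and not part of python-bugzilla, and stripping the surrounding quotes is assumed to happen as a separate step.

```python
# The header parameter exactly as quoted in this report.
filename_parm = 'filename="=?UTF-8?Q?Meteorologick=C3=A9=20zpr=C3=A1vy=205=5F04?=.pdf"'


def split_unbounded(param):
    # Hypothetical helper: mirrors the failing line from the traceback.
    # Every '=' in the value produces another piece, so unpacking
    # into two names raises ValueError.
    dummy, filename = param.split('=')
    return filename


def split_once(param):
    # Hypothetical helper: splits only at the first '=', which separates
    # the parameter name from its value.
    dummy, filename = param.split('=', 1)
    # Assumption: the surrounding double quotes are removed afterwards.
    return filename.strip('"')


if __name__ == "__main__":
    print(split_once(filename_parm))
```

Only the first '=' separates the parameter name from its value, so a maximum split count of 1 keeps the rest of the value in one piece.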

I saw a comment in the openattachment() method about RFC 2045. The email.header.decode_header() function from the standard Python library can decode encoded words of this form (the =?charset?encoding?text?= syntax is defined in RFC 2047). Unfortunately, the simplest approach fails:
email.header.decode_header("=?UTF-8?Q?Meteorologick=C3=A9=20zpr=C3=A1vy=205=5F04?=.pdf")

email.header.decode_header() cannot handle the ".pdf" suffix that follows the encoded sequence. This is discussed upstream in Python: http://bugs.python.org/issue1079. Some Python developers seem to think this is not a bug on the Python side; nevertheless, this is the form Bugzilla returns.
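
Because the handling of text that directly follows an encoded word may differ between Python versions, it can help to check what a given interpreter does. The following is a rough sketch written for Python 3. The helper name leaves_encoded_word is hypothetical, and the check is only a heuristic: it looks for an "=?...?=" sequence in the parts that decode_header reports as having no charset.

```python
from email.header import decode_header


def leaves_encoded_word(value):
    """Return True if decode_header appears to leave an encoded word undecoded.

    Hypothetical helper; this is a heuristic, not an RFC 2047 parser.
    """
    for payload, charset in decode_header(value):
        # decode_header may return bytes or str depending on the input.
        if isinstance(payload, bytes):
            text = payload.decode("latin-1")
        else:
            text = payload
        # A part without a charset was passed through as plain text.
        if charset is None and "=?" in text and "?=" in text:
            return True
    return False


if __name__ == "__main__":
    name = "=?UTF-8?Q?Meteorologick=C3=A9=20zpr=C3=A1vy=205=5F04?=.pdf"
    print(leaves_encoded_word(name))
```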

So the easiest solution was to transform one encoded sequence at a time:
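import email.header
import re

# email.header.decode_header cannot handle strings not ending with '?=',
# so let's transform one =?...?= part at a time
while True:
    # Raw string so that \? reaches the regex engine unchanged
    match = re.search(r"=\?.*?\?=", filename)
    if match is None:
        break
    # decode_header returns a list of (decoded_string, charset) pairs; take the first string
    decoded = email.header.decode_header(match.group(0))[0][0]
    filename = filename[:match.start()] + decoded + filename[match.end():]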

After that change, the attachment's file name was decoded correctly:
Meteorologické zprávy 5_04.pdf
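
For comparison, here is a sketch of the same one-word-at-a-time idea written for Python 3, where decode_header returns bytes for encoded parts. It is not the attached patch. The names ENCODED_WORD, _decode_word and decode_filename are hypothetical, and the pattern is an assumption that is deliberately stricter than "=\?.*?\?=", so that an encoded text beginning with '=' is not cut short.

```python
import re
from email.header import decode_header

# Assumed pattern for one encoded word: =?charset?encoding?text?=
# The text part may not contain '?' or whitespace.
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=")


def _decode_word(match):
    # The match is exactly one encoded word, so the first
    # (payload, charset) pair holds the whole decoded value.
    payload, charset = decode_header(match.group(0))[0]
    if isinstance(payload, bytes):
        # Fall back to ASCII if no charset is reported.
        return payload.decode(charset or "ascii", errors="replace")
    return payload


def decode_filename(filename):
    # Replace each encoded word in place; the surrounding text,
    # such as a ".pdf" suffix, is left untouched.
    return ENCODED_WORD.sub(_decode_word, filename)


if __name__ == "__main__":
    print(decode_filename("=?UTF-8?Q?Meteorologick=C3=A9=20zpr=C3=A1vy=205=5F04?=.pdf"))
```

Using re.sub with a callback visits each encoded word once, so a word that cannot be decoded does not cause it to be searched again. An unknown charset name would still raise LookupError in this sketch.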

How reproducible:
always

Steps to Reproduce:
Try to download the second attachment of bug #586615 using python-bugzilla
  
Actual results:
Exception is thrown

Expected results:
Attachment downloaded with proper file name.



Please consider the attached patch. It makes it possible to work with attachments with UTF-8 names that are returned by Red Hat Bugzilla.

Comment 1 Karel Klíč 2010-12-21 17:57:36 UTC
Created attachment 470041
Better proposed patch

Comment 2 Karel Klíč 2010-12-21 17:59:35 UTC
I saw this approach in pymailheaders; it's a much better solution.

Comment 3 Will Woods 2011-06-01 18:57:50 UTC
Created attachment 502352
An empty test file with a utf-8 filename

This file is being used to test fixes for this bug.

Comment 4 Will Woods 2011-06-01 21:30:30 UTC
Fix pushed to git master:

http://git.fedorahosted.org/git/?p=python-bugzilla.git;a=commitdiff;h=38d0834

Comment 5 Cole Robinson 2012-02-04 22:39:28 UTC
F14 is EOL. It doesn't look like this fix ended up there, but the fixed version is in F16.

