663674 – BugzillaBase.openattachment fails with UTF-8 attachment file names (patch included)

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	python-bugzilla
Sub Component:
Version:	14
Hardware:	Unspecified
OS:	Unspecified
Priority:	low
Severity:	medium
Target Milestone:	---
Assignee:	Will Woods
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:	http://example.com
Whiteboard:	whiteboard test
Depends On:
Blocks:

Reported:	2010-12-16 15:28 UTC by Karel Klíč
Modified:	2013-03-03 23:02 UTC
CC List:	4 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2012-02-04 22:39:28 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
Proposed patch (1.36 KB, application/octet-stream) 2010-12-16 15:28 UTC, Karel Klíč	no flags
Better proposed patch (1.32 KB, patch) 2010-12-21 17:57 UTC, Karel Klíč	no flags
An empty test file with a utf-8 filename (45 bytes, text/plain) 2011-06-01 18:57 UTC, Will Woods	no flags

Description Karel Klíč 2010-12-16 15:28:01 UTC

Created attachment 469147
Proposed patch

I downloaded some attachments and got a traceback:
> # here bz is a RHBugzilla object
> file = bz.openattachment(attachment_id)
File "/usr/lib/python2.7/site-packages/bugzilla/base.py", line 681, in openattachment
    (dummy,filename) = filename_parm.split('=')
ValueError: too many values to unpack


The filename_parm contained the following string:
filename="=?UTF-8?Q?Meteorologick=C3=A9=20zpr=C3=A1vy=205=5F04?=.pdf"

So the problem was that the filename contained '=' characters and the code wasn't expecting this. This was fixed by limiting the split count:
    (dummy,filename) = filename_parm.split('=', 1)

This way the filename became "right":
=?UTF-8?Q?Meteorologick=C3=A9=20zpr=C3=A1vy=205=5F04?=.pdf
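The difference between the two split calls can be reproduced in isolation. This is only a sketch using the filename_parm string quoted above; stripping the surrounding double quotes with strip('"') is an assumption about how the quotes get removed, not something taken from base.py.

```python
# The parameter string reported in this bug, copied verbatim.
filename_parm = 'filename="=?UTF-8?Q?Meteorologick=C3=A9=20zpr=C3=A1vy=205=5F04?=.pdf"'

# An unlimited split yields more than two pieces, so unpacking fails.
error = None
try:
    dummy, filename = filename_parm.split('=')
except ValueError as exc:
    error = str(exc)

# Limiting the split to one cut keeps every later '=' inside the value.
dummy, filename = filename_parm.split('=', 1)

# Assumption: the surrounding double quotes are dropped afterwards.
filename = filename.strip('"')
```

Only the first '=' separates the parameter name from its value, so a maxsplit of 1 is enough regardless of how many '=' characters the encoded name contains.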

I saw a comment in the openattachment() method about RFC 2045. The email.header.decode_header() function from the standard Python library can decode such encoded strings (the =?charset?encoding?text?= encoded-word syntax itself is defined in RFC 2047). Unfortunately, the simplest way fails:
email.header.decode_header("=?UTF-8?Q?Meteorologick=C3=A9=20zpr=C3=A1vy=205=5F04?=.pdf")

email.header.decode_header() was not able to handle the ".pdf" suffix after the encoded sequence. This is discussed in Python upstream: http://bugs.python.org/issue1079. Some Python developers seem to think this is not a bug on the Python side. However, this is the form of file name we get from Bugzilla.

So the easiest solution was to transform one encoded sequence at a time:
import email.header
import re

# email.header.decode_header cannot handle strings not ending with '?=',
# so let's transform one =?...?= part at a time
while True:
    match = re.search(r"=\?.*?\?=", filename)
    if match is None:
        break
    # decode_header returns a list of (decoded_string, charset) pairs
    decoded = email.header.decode_header(match.group(0))[0][0]
    filename = filename[:match.start()] + decoded + filename[match.end():]

After that change the file name of the attachment was ok:
Meteorologické zprávy 5_04.pdf
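For readers on Python 3, the same one-encoded-word-at-a-time idea could look roughly like the sketch below. It is not the patch attached to this bug. The names ENCODED_WORD, _decode_word and decode_filename are hypothetical, and the stricter pattern (charset, then B or Q, then text without '?') is an assumption about what an encoded word looks like. On Python 3, decode_header() returns bytes for an encoded word, so the result is decoded explicitly.

```python
import re
from email.header import decode_header

# Hypothetical pattern for a single encoded word: =?charset?B-or-Q?text?=
ENCODED_WORD = re.compile(r"=\?[^?]+\?[bBqQ]\?[^?]*\?=")


def _decode_word(match):
    """Decode one =?...?= sequence to text."""
    raw, charset = decode_header(match.group(0))[0]
    if isinstance(raw, bytes):
        # Python 3 hands back bytes plus the declared charset.
        return raw.decode(charset or "ascii", errors="replace")
    return raw


def decode_filename(filename):
    """Replace every encoded word, leaving any plain suffix untouched."""
    return ENCODED_WORD.sub(_decode_word, filename)


name = decode_filename(
    "=?UTF-8?Q?Meteorologick=C3=A9=20zpr=C3=A1vy=205=5F04?=.pdf"
)
```

Because each match passed to decode_header() ends with '?=', the sketch does not depend on how decode_header() treats text that follows an encoded word.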

How reproducible:
always

Steps to Reproduce:
Try to download the second attachment of bug #586615 using python-bugzilla
  
Actual results:
Exception is thrown

Expected results:
Attachment downloaded with proper file name.



Please consider the attached patch. It makes it possible to work with attachments with UTF-8 names that are returned by Red Hat Bugzilla.

Comment 1 Karel Klíč 2010-12-21 17:57:36 UTC

Created attachment 470041
Better proposed patch

Comment 2 Karel Klíč 2010-12-21 17:59:35 UTC

I saw this approach in pymailheaders; it's a much better solution.

Comment 3 Will Woods 2011-06-01 18:57:50 UTC

Created attachment 502352
An empty test file with a utf-8 filename

This file is being used to test fixes for this bug.

Comment 4 Will Woods 2011-06-01 21:30:30 UTC

Fix pushed to git master:

http://git.fedorahosted.org/git/?p=python-bugzilla.git;a=commitdiff;h=38d0834

Comment 5 Cole Robinson 2012-02-04 22:39:28 UTC

F14 is EOL. It doesn't look like this fix ended up there, but the fixed version is in F16.
