481786 – pydot does not support unicode :)

Summary: pydot does not support unicode :)

Keywords:
Status:	CLOSED DEFERRED
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	pydot
Sub Component:
Version:	10
Hardware:	All
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	---
Assignee:	Tom "spot" Callaway
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2009-01-27 16:22 UTC by Pierre-Yves Chibon
Modified:	2009-08-31 16:59 UTC
CC List:	2 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2009-08-31 16:59:59 UTC
Type:	---
Embargoed:
Dependent Products:

Description Pierre-Yves Chibon 2009-01-27 16:22:18 UTC

Description of problem:

#!/usr/bin/python
#-*- coding: UTF-8 -*-

import pydot
n1 = u"Thérèse Doe"
n2 = u"Jean-Pierre Toué"

# Does not work: passing unicode objects makes pydot crash
g = pydot.Dot()
g.add_edge(pydot.Edge(n1, n2))
g.write_jpeg('test.jpg')

# Works :) encode to UTF-8 byte strings before passing them to pydot
g = pydot.Dot()
g.add_edge(pydot.Edge(n1.encode('UTF-8'), n2.encode('UTF-8')))
g.write_jpeg('test.jpg')

Version-Release number of selected component (if applicable):
pydot-1.0.2-1.fc10.noarch


How reproducible:
Always

Steps to Reproduce:
1. Run the code given above.

Actual results:
Crashed

Expected results:
The first part of the code should work just like the second part :)

Additional info:
None that I can foresee.

Comment 1 Tom "spot" Callaway 2009-02-03 20:27:34 UTC

Unicode in Python gives me a headache. All my attempts to fix this just made it worse. Hopefully upstream will have better luck than I did.

Filed upstream here:
http://code.google.com/p/pydot/issues/detail?id=24

Comment 2 Pierre-Yves Chibon 2009-02-03 20:35:50 UTC

I've had quite a headache trying to understand it too :)

Thanks for reporting it.

Comment 3 Toshio Ernie Kuratomi 2009-05-29 07:26:44 UTC

This doesn't look like a bug to me. Rather, it's a request for an API change.

Right now, pydot accepts the str type but not the unicode type, so users must convert their unicode strings into byte strings before passing them to a pydot function. That's why n1.encode('UTF-8') is necessary.

This makes some sense, as pydot must interact with the world outside of Python through the /usr/bin/dot command. pydot communicates with that command by writing the input for /usr/bin/dot to a temporary file and then having /usr/bin/dot operate on that file. To create the temporary file, pydot must deal in byte strings (str). In the current code, the user gives pydot byte strings and pydot writes them directly to the file; the user performs the conversion from the unicode type to a UTF-8 encoded byte string.
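
A minimal sketch of the boundary described above, assuming Python 3, where `str` plays the role of Python 2's `unicode` and `bytes` the role of Python 2's `str`. The helper name `write_dot_file` is hypothetical and not part of pydot; it only illustrates that the text must become bytes (here UTF-8) before it reaches the temporary file that /usr/bin/dot reads.

```python
import os
import tempfile


def write_dot_file(dot_source: str) -> str:
    """Write DOT source text to a temporary file as UTF-8 and return its path.

    Hypothetical helper (not pydot API). The caller is responsible for
    deleting the file once /usr/bin/dot has processed it.
    """
    # The file on disk holds bytes; this explicit encode is the same
    # conversion the reporter currently does by hand with .encode('UTF-8').
    data = dot_source.encode("utf-8")
    fd, path = tempfile.mkstemp(suffix=".dot")
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path
```

For example, `write_dot_file('digraph G { "Thérèse Doe" -> "Jean-Pierre Toué"; }')` produces a file containing the UTF-8 encoding of that text; running something like `dot -Tjpg` on it is left to the caller.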

For pydot to handle unicode strings instead of byte strings, it would need to perform the conversion that the user is currently doing. That shouldn't be too hard, as /usr/bin/dot accepts UTF-8 and all unicode strings can be encoded to UTF-8. However, for the sake of upstream's sanity, pydot should probably stop accepting byte strings when it makes this switch. End-user code similar to this would then start to fail:
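
A sketch of what a text-only API could look like at the point where names are turned into file contents, again assuming Python 3. `to_dot_bytes` is a hypothetical helper, not something pydot provides: it encodes text to UTF-8 on the user's behalf and rejects byte strings outright, which is the API break described above.

```python
def to_dot_bytes(value: str) -> bytes:
    """Encode a node name or label for the DOT file (hypothetical helper).

    Accepts only text (str); byte strings are rejected so there is a
    single, unambiguous input type.
    """
    if isinstance(value, bytes):
        raise TypeError("pass text (str), not bytes; decode it first")
    if not isinstance(value, str):
        raise TypeError(f"expected str, got {type(value).__name__}")
    # Any str without lone surrogates encodes to UTF-8 without error.
    return value.encode("utf-8")
```

Under this design the caller no longer writes `.encode('UTF-8')`, but code that still passes byte strings gets a TypeError instead of working.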

g.add_edge(pydot.Edge('Th\xe9r\xe8se Doe', 'Jean-Pierre Tou\xe9'))

If pydot upstream chooses to accept both byte strings and unicode strings, it will have to handle the case where the user mixes unicode strings with byte strings that are not valid UTF-8. If upstream isn't careful, pydot will get confused about what it needs to do in this situation and either crash or output garbage.
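
One way to accept both types while avoiding the crash-or-garbage outcome, sketched in Python 3 with a hypothetical `normalize_name` helper (not pydot API): normalize every input to text first, decoding byte strings strictly as UTF-8 and failing loudly when they are not valid UTF-8. The byte string in the example above, `'Th\xe9r\xe8se Doe'`, is Latin-1 encoded (é is 0xE9, è is 0xE8), so it is exactly the kind of input that gets rejected here.

```python
from typing import Union


def normalize_name(value: Union[str, bytes]) -> str:
    """Return a node name as text, accepting either str or UTF-8 bytes.

    Hypothetical helper: byte strings that are not valid UTF-8 raise
    ValueError instead of being written out as garbage.
    """
    if isinstance(value, str):
        return value
    if isinstance(value, bytes):
        try:
            # Strict decoding: any invalid UTF-8 sequence raises.
            return value.decode("utf-8")
        except UnicodeDecodeError as exc:
            raise ValueError(f"byte string is not valid UTF-8: {value!r}") from exc
    raise TypeError(f"expected str or bytes, got {type(value).__name__}")
```

Normalizing to text up front keeps one code path, with a single encode when the file is written. Guessing a fallback codec such as Latin-1 would be another possible design, at the cost of silently misreading some inputs.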

Making this sort of API change should only be done by upstream.

Comment 4 Tom "spot" Callaway 2009-08-31 16:59:59 UTC

This is filed upstream, but upstream seems to be gone. :/

http://code.google.com/p/pydot/issues/detail?id=24

Since I agree with Toshio that we shouldn't do a one-off fix here, especially one that breaks the API, I'm closing this ticket.
