Bug 173990 - Exported Python ABI not upstream compatible
Summary: Exported Python ABI not upstream compatible
Alias: None
Product: Fedora
Classification: Fedora
Component: python
Version: 4
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Mihai Ibanescu
QA Contact: Brock Organ
Depends On:
Reported: 2005-11-23 14:35 UTC by Mike Hearn
Modified: 2007-11-30 22:11 UTC
Clone Of:
Last Closed: 2005-11-23 16:12:16 UTC

Description Mike Hearn 2005-11-23 14:35:09 UTC
Python in FC4 is compiled with UCS4 Unicode strings instead of the upstream
default of UCS2. This changes the exported ABI and renders Fedora's libpython
incompatible with other distributions. That's a serious problem: it means nobody
who wishes to distribute binaries on Linux can realistically embed Python or use
Python C modules in their app!
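
For anyone trying to confirm which variant an interpreter was built with, a
minimal sketch follows. It assumes a Python 2 era interpreter, where
sys.maxunicode is 0xFFFF on a UCS2 build and 0x10FFFF on a UCS4 build; the
helper name unicode_build is hypothetical and not part of Python. On Python 3.3
and later the check always reports the wide value, because the build-time
choice no longer exists there.

```python
import sys


def unicode_build(maxunicode=sys.maxunicode):
    """Classify an interpreter by its largest representable code point.

    unicode_build is a hypothetical helper used for illustration only.
    0xFFFF means 2-byte (UCS2) storage; anything larger means 4-byte (UCS4).
    """
    return "UCS2" if maxunicode == 0xFFFF else "UCS4"


if __name__ == "__main__":
    # Report the variant of the interpreter running this script.
    print(unicode_build())
```

A binary C module built against one variant will not load on the other, so a
check like this is mainly useful for picking the right prebuilt binary.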

Comment 1 Mihai Ibanescu 2005-11-23 16:12:16 UTC
Red Hat has been shipping Python with UCS4 support since Red Hat Linux 9, if
memory serves. Doing so gives us access to large character sets, and it
also uncovered a number of bugs that have since been fixed upstream. I don't
think we are in a position at this point to go back to UCS2, since we perceive
the future to be UCS4.

The way people solve the ABI incompatibility problem is by shipping packages for
specific distributions (which is probably the right thing to do in general).

Comment 2 Rahul Sundaram 2005-11-23 16:16:47 UTC
If the future is UCS4, why isn't the upstream Python package providing that by
default? Have these patches been pushed there? Distribution-specific packages
for all third-party software embedding Python are a no-go.

Comment 3 Mihai Ibanescu 2005-11-23 16:42:43 UTC
I believe the answer is the amount of memory you are willing to use.
Python stores a Unicode string as an array of fixed-size characters. This has the
advantage of simplifying a lot of the string operations, but it doubles or
quadruples the memory requirements if what you use is mostly ASCII. The
alternative would have been to use UTF-8 for the internal representation of
Unicode strings, which saves memory when you use only ASCII (or mostly ASCII
with very few non-ASCII characters), but then string operations would be slowed
down, since computing the length of a string would require examining each
character in the string to see if it's a single-byte or multi-byte char.
UCS2 uses 2 bytes for each char, UCS4 uses 4 bytes. Unicode defines more
than 65536 characters, so a 2-byte representation is not enough (although, if I
understand correctly, characters outside of UCS2 are not that frequently used).
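
To put rough numbers on the trade-off above, here is a minimal sketch, assuming
Python 3 and using the standard codecs as stand-ins for the internal storage
formats. It is an approximation, not the interpreter's actual layout: utf-16-le
encodes characters beyond the 2-byte range as surrogate pairs, which strict UCS2
cannot do. The helper names storage_bytes and utf8_length are hypothetical.

```python
def storage_bytes(text):
    """Return the encoded size of text in bytes under three schemes."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "2-byte units": len(text.encode("utf-16-le")),
        "4-byte units": len(text.encode("utf-32-le")),
    }


def utf8_length(data):
    """Count characters in UTF-8 bytes by scanning every byte.

    Continuation bytes have the bit pattern 10xxxxxx and are skipped, so the
    cost grows with the size of the string instead of being a stored field.
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


if __name__ == "__main__":
    print(storage_bytes("hello"))        # ASCII-only text
    print(storage_bytes("\U0001D11E"))   # one character beyond the 2-byte range
    print(utf8_length("h\u00e9llo".encode("utf-8")))
```

For the ASCII string, the 2-byte and 4-byte forms take two and four times the
space of UTF-8, which is the memory cost described above.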

Moving from UCS2 to UCS4 (or the other way around) is a rather major undertaking;
we decided to make that move some time ago, and we are trying to preserve the
ABI within Red Hat's products and avoid the exact type of problem you describe
_within_ Fedora/RHEL.

There is no one-size-fits-all, unfortunately: people who don't care about
characters outside of UCS2 would probably want the extra memory back; OTOH, some
people want the ability to represent those chars.

That being said: which distros are still shipping UCS2? As far as I know, SuSE
has shipped UCS4 since 9, and Debian ships UCS4. Mandriva 2006 seems to still be UCS2.
