327501 – Confusion regarding some characters in UTF-8 encoding used for Romanian

Summary: Confusion regarding some characters in UTF-8 encoding used for Romanian

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	glibc
Sub Component:
Version:	rawhide
Hardware:	All
OS:	Linux
Priority:	low
Severity:	high
Target Milestone:	---
Assignee:	Jakub Jelinek
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:	http://www.secarica.ro
Whiteboard:
Depends On:
Blocks:

Reported:	2007-10-11 11:39 UTC by Răzvan Sandu
Modified:	2007-11-30 22:12 UTC (History)
CC List:	0 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2007-10-11 21:18:27 UTC
Type:	---
Embargoed:
Dependent Products:

Description Răzvan Sandu 2007-10-11 11:39:47 UTC

Description of problem:

Because of very old (historical) incorrect implementations by Microsoft, four
characters of the Romanian national character set are confused with similar
ones when UTF-8 encoding is used:

- "S with comma below" (U+0218) - incorrectly implemented as "S with
cedilla" (U+015E)
- "s with comma below" (U+0219) - incorrectly implemented as "s with
cedilla" (U+015F)
- "T with comma below" (U+021A) - incorrectly implemented as "T with
cedilla" (U+0162)
- "t with comma below" (U+021B) - incorrectly implemented as "t with
cedilla" (U+0163)
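
For reference, the four pairs listed above map to the UTF-8 byte sequences printed by the sketch below. This is only an illustration: the variable names and the utf8_hex and show helpers are made up for this example and do not belong to any package. Octal escapes are used because POSIX printf does not guarantee \x escapes.

```shell
#!/bin/sh
# Hex-dump stdin as lowercase hex digits without separators, e.g. "c898".
utf8_hex() {
    od -An -tx1 | tr -d ' \n'
}

# Correct Romanian letters (comma below).
COMMA_CAP_S=$(printf '\310\230')      # U+0218
COMMA_SMALL_S=$(printf '\310\231')    # U+0219
COMMA_CAP_T=$(printf '\310\232')      # U+021A
COMMA_SMALL_T=$(printf '\310\233')    # U+021B

# Legacy look-alikes (cedilla).
CEDILLA_CAP_S=$(printf '\305\236')    # U+015E
CEDILLA_SMALL_S=$(printf '\305\237')  # U+015F
CEDILLA_CAP_T=$(printf '\305\242')    # U+0162
CEDILLA_SMALL_T=$(printf '\305\243')  # U+0163

# Print a label followed by the UTF-8 bytes of the given character.
show() {
    printf '%-8s %s\n' "$1" "$(printf '%s' "$2" | utf8_hex)"
}

show U+0218 "$COMMA_CAP_S"
show U+0219 "$COMMA_SMALL_S"
show U+021A "$COMMA_CAP_T"
show U+021B "$COMMA_SMALL_T"
show U+015E "$CEDILLA_CAP_S"
show U+015F "$CEDILLA_SMALL_S"
show U+0162 "$CEDILLA_CAP_T"
show U+0163 "$CEDILLA_SMALL_T"
```

The comma-below letters all start with byte c8, and the cedilla letters all start with byte c5.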

The Romanian National Standard SR 13392:2004 explicitly removes this confusion.

The problem was finally corrected in Microsoft Windows Vista.


Version-Release number of selected component (if applicable):


How reproducible:
Always.

Steps to Reproduce:
1.
2.
3.
  
Actual results:
Some programs, even UTF-8-aware ones, use the incorrect "cedilla" characters.
Example: the Fedora/Red Hat system initialization scripts, when Romanian is set
as the system-wide language.


Expected results:
glibc and all programs that depend on it should remove the confusion and use
the "comma below" characters everywhere.

Additional info:
Additional information, mainly Microsoft-related, is available at
http://www.secarica.ro (partly in Romanian).

Comment 1 Jakub Jelinek 2007-10-11 21:18:27 UTC

This has nothing to do with glibc.  You need to file bugs against badly
translated packages.
grep -l "`echo -e '[\xc8\x98\xc8\x99\xc8\x9a\xc8\x9b]'`" \
/usr/share/locale/ro/LC_MESSAGES/*.mo
/usr/share/locale/ro/LC_MESSAGES/evolution-2.10.mo
/usr/share/locale/ro/LC_MESSAGES/GConf2.mo
/usr/share/locale/ro/LC_MESSAGES/gnome-games.mo
/usr/share/locale/ro/LC_MESSAGES/gnome-session-2.0.mo
/usr/share/locale/ro/LC_MESSAGES/gnome-utils-2.0.mo
/usr/share/locale/ro/LC_MESSAGES/gtk20.mo
/usr/share/locale/ro/LC_MESSAGES/libgnomeui-2.0.mo
/usr/share/locale/ro/LC_MESSAGES/metacity.mo
/usr/share/locale/ro/LC_MESSAGES/system-config-network.mo
/usr/share/locale/ro/LC_MESSAGES/usermode.mo
which, going by your description, are the correct characters, while
grep -l "`echo -e '[\xc5\x9e\xc5\x9f\xc5\xa2\xc5\xa3]'`" \
/usr/share/locale/ro/LC_MESSAGES/*.mo
shows what you say is bad, right?

If so, file bugs against the packages listed by
rpm -qf $( grep -l "`echo -e '[\xc5\x9e\xc5\x9f\xc5\xa2\xc5\xa3]'`" \
/usr/share/locale/ro/LC_MESSAGES/*.mo ) | LC_ALL=C sort -u
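
For whoever ends up fixing an affected translation, the replacement itself could be sketched roughly as below. This is a hypothetical helper, not something shipped by gettext or glibc. It assumes the input is Romanian text in UTF-8 (the cedilla forms of S are legitimate in other languages such as Turkish) and that it is run on text sources, not on compiled .mo files.

```shell
#!/bin/sh
# Hypothetical filter: rewrite the four cedilla letters to their
# comma-below equivalents. Reads stdin, writes stdout.
# LC_ALL=C makes sed match raw bytes regardless of the caller's locale;
# octal escapes keep the printf calls portable.
to_comma_below() {
    LC_ALL=C sed \
        -e "s/$(printf '\305\236')/$(printf '\310\230')/g" \
        -e "s/$(printf '\305\237')/$(printf '\310\231')/g" \
        -e "s/$(printf '\305\242')/$(printf '\310\232')/g" \
        -e "s/$(printf '\305\243')/$(printf '\310\233')/g"
}
```

Usage would look something like `to_comma_below < ro.po > ro.fixed.po`, where ro.po stands for whatever translation source the package actually uses.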