Red Hat Bugzilla – Bug 327501
Confusion regarding some characters in UTF-8 encoding used for Romanian
Last modified: 2007-11-30 17:12:18 EST
Description of problem:
Because of a long-standing incorrect implementation by Microsoft, four
characters of the Romanian national character set are commonly confused with
look-alike characters when UTF-8 encoding is used:
- "S with comma below" (U+0218) - incorrectly implemented as "S with
cedilla" (U+015E)
- "s with comma below" (U+0219) - incorrectly implemented as "s with
cedilla" (U+015F)
- "T with comma below" (U+021A) - incorrectly implemented as "T with
cedilla" (U+0162)
- "t with comma below" (U+021B) - incorrectly implemented as "t with
cedilla" (U+0163)
The Romanian National Standard SR 13392:2004 explicitly removes this confusion.
The problem was finally corrected in Microsoft Windows Vista.
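To illustrate the mapping between the four character pairs listed above, here is a minimal sketch of a filter that rewrites the cedilla forms to the comma-below forms. The function name fix_ro_diacritics and the variable names are hypothetical and not part of any package mentioned in this report. It assumes UTF-8 input on stdin and covers only these four pairs.

```shell
# Hypothetical helper (not from this report): read UTF-8 text on stdin and
# replace the four cedilla code points with their comma-below counterparts.
# The function body runs in a subshell so the variables do not leak.
# Octal escapes are used because POSIX printf does not guarantee \x support.
fix_ro_diacritics() (
    S_CED=$(printf '\305\236')   # U+015E, UTF-8 bytes c5 9e
    s_ced=$(printf '\305\237')   # U+015F, UTF-8 bytes c5 9f
    T_CED=$(printf '\305\242')   # U+0162, UTF-8 bytes c5 a2
    t_ced=$(printf '\305\243')   # U+0163, UTF-8 bytes c5 a3
    S_COM=$(printf '\310\230')   # U+0218, UTF-8 bytes c8 98
    s_com=$(printf '\310\231')   # U+0219, UTF-8 bytes c8 99
    T_COM=$(printf '\310\232')   # U+021A, UTF-8 bytes c8 9a
    t_com=$(printf '\310\233')   # U+021B, UTF-8 bytes c8 9b

    # One substitution per pair; every other byte passes through unchanged.
    sed -e "s/$S_CED/$S_COM/g" \
        -e "s/$s_ced/$s_com/g" \
        -e "s/$T_CED/$T_COM/g" \
        -e "s/$t_ced/$t_com/g"
)
```

It could be used as a filter, for example `fix_ro_diacritics < input.txt > output.txt`, where both file names are placeholders. Translation sources would still need to be corrected in the packages that ship them; a filter like this only helps with checking or converting individual text files.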
Version-Release number of selected component (if applicable):
Steps to Reproduce:
Some programs, even UTF-8-aware ones, use the incorrect characters.
Example: Fedora/Red Hat system initialization scripts, when one sets Romanian as
glibc and all programs that depend on it should remove the confusion and use
the "comma below" characters everywhere.
For additional information (mainly Microsoft-related, partly in Romanian), see
http://www.secarica.ro
This has nothing to do with glibc. You need to file bugs against badly
grep -l "`echo -e '[\xc8\x98\xc8\x99\xc8\x9a\xc8\x9b]'`" \
which, from your description, is correct, while
grep -l "`echo -e '[\xc5\x9e\xc5\x9f\xc5\xa2\xc5\xa3]'`" \
shows what you say is bad, right?
If so, file a bug against packages listed in
rpm -qf $( grep -l "`echo -e '[\xc5\x9e\xc5\x9f\xc5\xa2\xc5\xa3]'`" \
/usr/share/locale/ro/LC_MESSAGES/*.mo ) | LC_ALL=C sort -u