Red Hat Bugzilla – Bug 456188
Incorrect characters for Romanian in gucharmap
Last modified: 2013-01-09 23:44:25 EST
Description of problem:
There are 4 incorrect characters in gucharmap's Latin section for the Romanian language.
Here are the characters that are incorrect for the Romanian language:
- Incorrect "S with cedilla below" (Unicode 015E) instead of correct "S with
comma below" (Unicode 0218);
- Incorrect "s with cedilla below" (Unicode 015F) instead of correct "s with
comma below" (Unicode 0219);
- Incorrect "T with cedilla below" (Unicode 0162) instead of correct "T with
comma below" (Unicode 021A);
- Incorrect "t with cedilla below" (Unicode 0163) instead of correct "t with
comma below" (Unicode 021B).
Please note that cedilla-below characters *are not* part of the Romanian
alphabet at all (it is simply a historical bug).
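For reference, the four pairs listed above can be checked against the Unicode character names with Python's standard `unicodedata` module. This is only a small sketch; the name `ROMANIAN_PAIRS` is made up for illustration and is not part of gucharmap.

```python
import unicodedata

# Cedilla-below code point -> comma-below code point used in Romanian.
# ROMANIAN_PAIRS is an illustrative name, not something gucharmap defines.
ROMANIAN_PAIRS = {
    0x015E: 0x0218,  # capital S
    0x015F: 0x0219,  # small s
    0x0162: 0x021A,  # capital T
    0x0163: 0x021B,  # small t
}

# Print the official character name on each side of every pair.
for wrong, right in ROMANIAN_PAIRS.items():
    print(
        f"U+{wrong:04X} {unicodedata.name(chr(wrong))} -> "
        f"U+{right:04X} {unicodedata.name(chr(right))}"
    )
```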
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Launch gucharmap
2. On the left, choose the Latin set of characters.
3. Select "t with cedilla below" (U+0163)
4. At the bottom, the text refers to the Romanian language
5. Verify the same issue for the other three characters
6. There are no characters with comma-below in the Latin set
Actual results:
- wrong characters (cedilla-below) are referred to as being part of the Romanian
alphabet, instead of comma-below ones;
- there are no correct comma-below characters in the Latin set, to be used in
Expected results:
- character map should include comma-below characters in the Latin set, along
with the cedilla-below ones;
- correct (comma-below) characters should be referred to as being part of the
Romanian alphabet, instead of the cedilla-below ones.
Not sure what you are complaining about here.
Certainly both variants of the character are present.
Are you complaining that the Unicode standard got them wrong ?
There is nothing we can do about that here, you'll have to complain to the
Unicode consortium at www.unicode.org
Thank you for your response !
When one selects, for example, U+0163 (small t with cedilla below) the text in
gucharmap (bottom line) identifies it as a character used in Romanian.
The four characters with cedilla below are simply *not a part* of the Romanian
language/alphabet - only the comma-below ones are.
This confusion is due to a very old bug in Windows implementations (pre-Linux,
before 1993...). It was corrected in Windows Vista; patches are also available
for pre-Vista versions of Windows.
The bug in Windows has led to a very large number of Romanian documents, webpages,
UIs, etc. containing wrong characters (cedilla-below instead of comma-below
ones); correctly generated documents are displayed incorrectly because of the
lack of fonts that are correct for Romanian, etc.
Please help eliminate for good this confusion about the Romanian language and
documents, which is not easily "visible" to non-Romanian speakers/developers.
How does ancillary (informative only, no actual usage of it) text change
anything about the characters used in documents?
By being misleading (i.e. by helping to maintain a very old and "popular"
confusion among Romanian users, especially non-technical ones).
If not offered (or not aware of) other direct means (like a national keyboard
layout) of inserting Romanian-specific characters in his documents, a
non-technical user seeks an easy-to-use way to do it. He finds gucharmap among
the graphical tools in his standard Gnome menus and copies and pastes these
characters into the document.
If he uses the cedilla-below chars (because of the informative text below)
instead of the comma ones, he inserts wrong Unicode in the document.
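To make the consequence concrete, here is a rough sketch of how text that already contains the cedilla-below characters could be detected and converted. The helper names `find_cedilla` and `fix_romanian` are hypothetical, and the conversion is only safe on text known to be Romanian, because s with cedilla is a legitimate letter in other languages such as Turkish.

```python
# Hypothetical helpers; the mapping covers only the four pairs from this report.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015E": "\u0218",  # S with cedilla -> S with comma below
    "\u015F": "\u0219",  # s with cedilla -> s with comma below
    "\u0162": "\u021A",  # T with cedilla -> T with comma below
    "\u0163": "\u021B",  # t with cedilla -> t with comma below
})

CEDILLA_CHARS = "\u015E\u015F\u0162\u0163"


def find_cedilla(text: str) -> list[tuple[int, str]]:
    """Return (index, character) for every cedilla-below character found."""
    return [(i, ch) for i, ch in enumerate(text) if ch in CEDILLA_CHARS]


def fix_romanian(text: str) -> str:
    """Replace cedilla-below characters with their comma-below counterparts.

    Only apply this to text known to be Romanian.
    """
    return text.translate(CEDILLA_TO_COMMA)


sample = "\u0163ar\u0103 \u015Fi ora\u015F"  # typed with the cedilla forms
print(find_cedilla(sample))
print(fix_romanian(sample))
```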
*Especially* because this confusion is very old and affects a large number of
users, we seek ways to eliminate it in all aspects/components of Linux (distro
doesn't matter): fonts, tools, keyboard maps, etc.
Thanks again for your kind help,
I'm sorry, but you'll have to complain to the Unicode consortium. The text that
gucharmap displays in the statusbar is taken directly from the Unicode character
If one compares these two files from the Unicode Consortium:
http://www.unicode.org/charts/PDF/U0180.pdf (Latin Extended-B)
http://www.unicode.org/charts/PDF/U0100.pdf (Latin Extended-A)
one will note (in the first document, page 6) that the comma-below characters are
given as the *preferred* ones for the Romanian language.
They still didn't *entirely* remove the historical error in the second document,
since they still list cedilla-below characters as *valid* (i.e. acceptable) for
the Romanian language. This is simply a bug in the standard for which I'll file
a bug report.
According to the Romanian Academy rules (and to the common, day-to-day practice
in Romanian schools, that every Romanian pupil knows), the only acceptable
characters are s and t with *comma* below. Of course, this is not so evident to
non-Romanian speakers, so the bug in the Unicode standard is easy to understand.
Please also note that this bug is *very* old (since 1988, I think), when
Microsoft made the first implementations for the Romanian language in
DOS/Windows, without any possibility to officially consult the Romanian
(Communist) Academy or authorities of the era.
But there is no valid reason to perpetuate this bug today (i.e. continue to
produce *new* documents and webpages with the wrong characters).
The problem *was fixed* in Windows Vista; drivers for a correct Romanian
keyboard for pre-Vista versions of Windows are available at
http://www.secarica.ro. An *official* national standard for the Romanian
keyboard, with comma-below characters, does exist, i.e. the SR 13392:2004
standard.
So please help us fix this longstanding bug (and the *confusion* related to it
among users), even though the official correction in the Unicode documents will
still take a while...
gucharmap pulls its data directly from the Unicode standard.
That's not going to change...
For the record, here's the official answer I've received from the Unicode Consortium when I've reported this issue:
Dear Mr. Sandu,
My apologies for taking so long to respond to your email.
Unfortunately there is nothing we can do about this. It isn't a "subtle error" and it is perfectly obvious to us non-Romanian speakers by the way -- in fact this is a *notorious* issue, and was long ago decided by ISO ballot in WG2. Please see page 228 in TUS 5.0, (http://www.unicode.org/versions/Unicode5.0.0/ch07.pdf) which talks about this issue, and is the best we can do at this point. We already have appropriate annotations for all these characters.
Sr. Administrative Director
The Unicode Consortium
Now the presence of these "foreign" characters in Romanian setups seems to be a *well-known issue* that goes far beyond the gucharmap issue I reported in this bug.
Until the Unicode Consortium fixes the actual text of the standard (which may take years...), can we make the Fedora Project developers aware of this and eliminate this bug (foreign characters in a language...) at least throughout the Fedora distro?
Thanks a lot,
This is not a distribution-level decision, period.
If the correct characters are not in a font? Fix the font package.
If the unicode standard isn't correct? Fix the standard. (If the packages aren't willing to fork the standard from upstream, well, I understand that. Ergo, restoring previous component and resolution.)
If existing documents use the wrong characters... well, there's not much we can do to help that at the distribution level.