456188 – Incorrect characters for Romanian in gucharmap

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	gucharmap
Sub Component:
Version:	rawhide
Hardware:	All
OS:	Linux
Priority:	low
Severity:	high
Target Milestone:	---
Assignee:	Matthias Clasen
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:	http://www.secarica.ro
Whiteboard:
Depends On:
Blocks:

Reported:	2008-07-22 01:59 UTC by Răzvan Sandu
Modified:	2013-01-10 04:44 UTC
CC List:	5 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2008-08-25 19:00:34 UTC
Type:	---
Embargoed:
Dependent Products:

Description Răzvan Sandu 2008-07-22 01:59:27 UTC

Description of problem:

Hello,

There are 4 incorrect characters in gucharmap's Latin section for the Romanian
language.

Here are the characters that are incorrect for the Romanian language:

- Incorrect "S with cedilla below" (Unicode 015E) instead of correct "S with
comma below" (Unicode 0218);

- Incorrect "s with cedilla below" (Unicode 015F) instead of correct "s with
comma below" (Unicode 0219);

- Incorrect "T with cedilla below" (Unicode 0162) instead of correct "T with
comma below" (Unicode 021A);

- Incorrect "t with cedilla below" (Unicode 0163) instead of correct "t with
comma below" (Unicode 021B).

Please note that cedilla-below characters *are not* part of the Romanian
alphabet at all (it is simply a historical bug).
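
The four pairs listed above can be written down as a small translation table. This is only a sketch: the function name `to_comma_below` is made up for illustration, and the mapping assumes the input is known to be Romanian text (S with cedilla is a legitimate letter in Turkish, for example, so the table must not be applied blindly).

```python
# Sketch: map the four cedilla-below code points from this report to their
# comma-below counterparts. Assumption: the text is known to be Romanian.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015E": "\u0218",  # S with cedilla -> S with comma below
    "\u015F": "\u0219",  # s with cedilla -> s with comma below
    "\u0162": "\u021A",  # T with cedilla -> T with comma below
    "\u0163": "\u021B",  # t with cedilla -> t with comma below
})


def to_comma_below(text: str) -> str:
    """Return text with the cedilla forms replaced by the comma-below forms."""
    return text.translate(CEDILLA_TO_COMMA)


if __name__ == "__main__":
    # "stiinta" typed with the cedilla forms, printed with the comma-below forms.
    print(to_comma_below("\u015Ftiin\u0163\u0103"))
```

Characters outside the table are left untouched, so the function is safe to run on text that already uses the comma-below forms.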


Version-Release number of selected component (if applicable):
gucharmap-2.22.1-1.fc9.x86_64
gucharmap-2.22.1-1.fc9.i386

How reproducible:
Always.

Steps to Reproduce:
1. Launch gucharmap
2. On the left, choose the Latin set of characters.
3. Select "t with cedilla below" (U+0163).
4. The text at the bottom refers to the Romanian language.
5. Verify the same issue for the other three characters.
6. Note that there are no comma-below characters in the Latin set.

  
Actual results:

- wrong characters (cedilla-below) are referred to as being part of the Romanian
alphabet, instead of the comma-below ones;

- there are no correct comma-below characters in the Latin set, to be used in
Romanian language.


Expected results:

- character map should include comma-below characters in the Latin set, along
with the cedilla-below ones;

- the correct (comma-below) characters should be referred to as being part of the
Romanian alphabet, instead of the cedilla-below ones.


Best regards,
Răzvan

Comment 1 Matthias Clasen 2008-07-25 03:44:22 UTC

Not sure what you are complaining about here. 
Certainly both variants of the character are present.
Are you complaining that the Unicode standard got them wrong ? 
There is nothing we can do about that here, you'll have to complain to the 
Unicode consortium at www.unicode.org

Comment 2 Răzvan Sandu 2008-07-25 12:48:36 UTC

Thank you for your response!

When one selects, for example, U+0163 (small t with cedilla below), the text in
gucharmap (bottom line) identifies it as a character used in Romanian.

The four characters with cedilla below are simply *not a part* of the Romanian
language/alphabet - only the comma-below ones are.

This confusion is due to a very old bug in Windows implementations (pre-Linux,
before 1993...). It was corrected in Windows Vista; patches are also available
for pre-Vista versions of Windows.

The bug in Windows led to a very large number of Romanian documents, webpages,
UIs, etc. containing the wrong characters (cedilla-below instead of comma-below
ones); correctly generated documents are displayed incorrectly because of the
lack of fonts that are correct for Romanian, etc.

Please help eliminate for good this confusion about the Romanian language and
documents, which is not easily "visible" to non-Romanian speakers/developers.


Best regards,
Răzvan

Comment 3 Bill Nottingham 2008-07-25 15:05:23 UTC

How does ancillary (informative only, no actual usage of it) text change
anything about the characters used in documents?

Comment 4 Răzvan Sandu 2008-07-25 16:50:56 UTC

By being misleading (i.e. by helping to maintain a very old and "popular"
confusion among Romanian users, especially non-technical ones).

If a non-technical user is not offered (or is not aware of) other direct means
(like a national keyboard layout) of inserting Romanian-specific characters in
his documents, he looks for an easy way to do it. He finds gucharmap among the
graphical tools in his standard GNOME menus and copies and pastes these
characters into the document. If he uses the cedilla-below characters (because
of the informative text at the bottom) instead of the comma-below ones, he
inserts the wrong Unicode code points in the document.

*Especially* because this confusion is very old and affects a large number of
users, we seek ways to eliminate it in all aspects/components of Linux (distro
doesn't matter): fonts, tools, keyboard maps, etc.


Thanks again for your kind help,
Răzvan

Comment 5 Matthias Clasen 2008-08-01 04:11:07 UTC

I'm sorry, but you'll have to complain to the Unicode consortium. The text that
gucharmap displays in the statusbar is taken directly from the Unicode character
database.
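
The names recorded in the Unicode character database for the eight code points in question can be checked with Python's standard `unicodedata` module. This is only an illustration: the `describe` helper is made up here, and the module exposes character names only, not the informative annotations (such as the language notes) that gucharmap shows in its status bar.

```python
import unicodedata

# The four cedilla-below code points from this report, followed by the four
# comma-below code points proposed as the correct ones for Romanian.
CODE_POINTS = [0x015E, 0x015F, 0x0162, 0x0163, 0x0218, 0x0219, 0x021A, 0x021B]


def describe(code_point: int) -> str:
    """Format a code point together with its Unicode character name."""
    return f"U+{code_point:04X} {unicodedata.name(chr(code_point))}"


if __name__ == "__main__":
    for code_point in CODE_POINTS:
        print(describe(code_point))
```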

Comment 6 Răzvan Sandu 2008-08-01 06:44:10 UTC

Hello,

If one compares these two files from the Unicode Consortium:

http://www.unicode.org/charts/PDF/U0180.pdf (Latin Extended-B)
http://www.unicode.org/charts/PDF/U0100.pdf (Latin Extended-A)

one will note (in the first document, page 6) that the comma-below characters
are given as the *preferred* ones for the Romanian language.

They still haven't *entirely* removed the historical error in the second
document, since it still lists the cedilla-below characters as *valid* (i.e.
acceptable) for the Romanian language. This is simply a bug in the standard, for
which I'll file a bug report.

According to the Romanian Academy rules (and to the common, day-to-day practice
in Romanian schools, which every Romanian pupil knows), the only acceptable
characters are s and t with *comma* below. Of course, this is not so evident to
non-Romanian speakers, so the bug in the Unicode standard is easy to understand.

Please also note that this bug is *very* old (dating from 1988, I think), when
Microsoft made the first implementations for the Romanian language in
DOS/Windows, without any possibility of officially consulting the Romanian
(Communist) Academy or the authorities of the era.

But there is no valid reason to perpetuate this bug today (i.e. continue to
produce *new* documents and webpages with the wrong characters).

The problem *was fixed* in Windows Vista; drivers for a correct Romanian
keyboard for pre-Vista versions of Windows are available at
http://www.secarica.ro. An *official* national standard for the Romanian
keyboard, with comma-below characters, does exist: the SR 13392:2004 standard.

So please help us fix this longstanding bug (and the *confusion* related to it
among users), even though the official correction in the Unicode documents will
still take a while...

Many thanks,
Răzvan

Comment 7 Matthias Clasen 2008-08-01 17:45:34 UTC

gucharmap pulls its data directly from the Unicode standard. 
That's not going to change...

Comment 8 Răzvan Sandu 2008-08-19 06:17:27 UTC

Hello,

For the record, here's the official answer I received from the Unicode Consortium when I reported this issue:

-------------------------------------------------------------------

Dear Mr. Sandu,

My apologies for taking so long to respond to your email.
Unfortunately there is nothing we can do about this. It isn't a "subtle error" and it is perfectly obvious to us non-Romanian speakers by the way -- in fact this is a *notorious* issue, and was long ago decided by ISO ballot in WG2. Please see page 228 in TUS 5.0, (http://www.unicode.org/versions/Unicode5.0.0/ch07.pdf) which talks about this issue, and is the best we can do at this point. We already have appropriate annotations for all these characters.

Best regards,
---------------------------
Magda Danish
Sr. Administrative Director
The Unicode Consortium
650.693.3921
magda

-------------------------------------------------------------------


Now the presence of these "foreign" characters in Romanian setups seems to be a *well-known issue* that goes far beyond the gucharmap issue I reported in this bug.

Until the Unicode Consortium fixes the actual text of the standard (which may take years...), can we make the Fedora Project developers aware of this and eliminate this bug (foreign characters in a language...) throughout the Fedora distribution, at least?


Thanks a lot,
Răzvan

Comment 9 Bill Nottingham 2008-08-25 19:00:34 UTC

This is not a distribution-level decision, period.

If the correct characters are not in a font? Fix the font package.

If the unicode standard isn't correct? Fix the standard. (If the packages aren't willing to fork the standard from upstream, well, I understand that. Ergo, restoring previous component and resolution.)

If existing documents use the wrong characters... well, there's not much we can do to help that at the distribution level.
