Description of problem:
Hello, There are 4 incorrect characters in gucharmap's Latin section for the Romanian language. Here are the characters that are incorrect for the Romanian language:
- Incorrect "S with cedilla below" (U+015E) instead of correct "S with comma below" (U+0218);
- Incorrect "s with cedilla below" (U+015F) instead of correct "s with comma below" (U+0219);
- Incorrect "T with cedilla below" (U+0162) instead of correct "T with comma below" (U+021A);
- Incorrect "t with cedilla below" (U+0163) instead of correct "t with comma below" (U+021B).
Please note that cedilla-below characters *are not* part of the Romanian alphabet at all (it is simply a historical bug).

Version-Release number of selected component (if applicable):
gucharmap-2.22.1-1.fc9.x86_64
gucharmap-2.22.1-1.fc9.i386

How reproducible:
Always.

Steps to Reproduce:
1. Launch gucharmap.
2. At left, choose the Latin set of characters.
3. Select "t with cedilla below" (U+0163).
4. At bottom, the text refers to the Romanian language.
5. Verify the same issue for the other three characters.
6. There are no characters with comma below in the Latin set.

Actual results:
- wrong characters (cedilla-below) are referred to as being part of the Romanian alphabet, instead of the comma-below ones;
- there are no correct comma-below characters in the Latin set to be used for the Romanian language.

Expected results:
- the character map should include comma-below characters in the Latin set, along with the cedilla-below ones;
- the correct (comma-below) characters should be referred to as being part of the Romanian alphabet, instead of the cedilla-below ones.

Best regards, Răzvan
Not sure what you are complaining about here. Certainly both variants of the character are present. Are you complaining that the Unicode standard got them wrong? There is nothing we can do about that here; you'll have to complain to the Unicode Consortium at www.unicode.org.
Thank you for your response! When one selects, for example, U+0163 (small t with cedilla below), the text in gucharmap (bottom line) identifies it as being a character used in Romanian. The four characters with cedilla below are simply *not a part* of the Romanian language/alphabet; only the comma-below ones are. This confusion is due to a very old bug in Windows implementations (pre-Linux, before 1993...). It was corrected in Windows Vista; patches are also available for pre-Vista versions of Windows. The bug in Windows has led to a very large number of Romanian documents, webpages, UIs, etc. containing wrong characters (cedilla-below instead of comma-below ones); correctly generated documents are incorrectly displayed because of the lack of fonts that are correct for Romanian, etc. Please help eliminate for good this confusion about the Romanian language and documents, which is not easily "visible" to non-Romanian speakers/developers. Best regards, Răzvan
How does ancillary (informative only, no actual usage of it) text change anything about the characters used in documents?
By being misleading (i.e. by helping to maintain a very old and "popular" confusion among Romanian users, especially non-technical ones). If he is not offered (or is not aware of) other direct means (like a national keyboard layout) of inserting Romanian-specific characters in his documents, a non-technical user seeks an easy-to-use way to do it. He finds gucharmap among the graphical tools in his standard Gnome menus and cuts and pastes these characters into the document. If he uses the cedilla-below chars (because of the informative text below) instead of the comma-below ones, he inserts wrong Unicode in the document. *Especially* because this confusion is very old and affects a large number of users, we seek ways to eliminate it in all aspects/components of Linux (distro doesn't matter): fonts, tools, keyboard maps, etc. Thanks again for your kind help, Răzvan
I'm sorry, but you'll have to complain to the Unicode consortium. The text that gucharmap displays in the statusbar is taken directly from the Unicode character database.
Hello, If one compares these two files from the Unicode Consortium: http://www.unicode.org/charts/PDF/U0180.pdf (Latin Extended-B) http://www.unicode.org/charts/PDF/U0100.pdf (Latin Extended-A) one will note (in the first document, page 6) that the comma-below characters are given as the *preferred* ones for the Romanian language. They still didn't *entirely* remove the historical error in the second document, since they still list cedilla-below characters as *valid* (i.e. acceptable) for the Romanian language. This is simply a bug in the standard, for which I'll file a bug report. According to the Romanian Academy rules (and to the common, day-to-day practice in Romanian schools, which every Romanian pupil knows), the only acceptable characters are s and t with *comma* below. Of course, this is not so evident to non-Romanian speakers, so the bug in the Unicode standard is easy to understand. Please also note that this bug is *very* old (since 1988, I think), when Microsoft made the first implementations for the Romanian language in DOS/Windows, without any possibility to officially consult the Romanian (Communist) Academy or the authorities of the era. But there is no valid reason to perpetuate this bug today (i.e. continue to produce *new* documents and webpages with the wrong characters). The problem *was fixed* in Windows Vista; drivers for a correct Romanian keyboard for pre-Vista versions of Windows are available at http://www.secarica.ro. An *official* national standard for the Romanian keyboard, with comma-below characters, does exist, i.e. the SR 13392:2004 standard. So please help us fix this longstanding bug (and the *confusion* related to it among users), even though the official correction in the Unicode documents will still take a while... Many thanks, Răzvan
gucharmap pulls its data directly from the Unicode standard. That's not going to change...
Hello, For the record, here's the official answer I received from the Unicode Consortium when I reported this issue: ------------------------------------------------------------------- Dear Mr. Sandu, My apologies for taking so long to respond to your email. Unfortunately there is nothing we can do about this. It isn't a "subtle error" and it is perfectly obvious to us non-Romanian speakers by the way -- in fact this is a *notorious* issue, and was long ago decided by ISO ballot in WG2. Please see page 228 in TUS 5.0, (http://www.unicode.org/versions/Unicode5.0.0/ch07.pdf) which talks about this issue, and is the best we can do at this point. We already have appropriate annotations for all these characters. Best regards, --------------------------- Magda Danish Sr. Administrative Director The Unicode Consortium 650.693.3921 magda ------------------------------------------------------------------- So the presence of these "foreign" characters in Romanian setups seems to be a *well-known issue* that goes far beyond the gucharmap issue I reported in this bug. Until the Unicode Consortium fixes the actual text of the standard (which may take years...), can we make the Fedora Project developers aware of this and eliminate this bug (foreign characters in a language...) at least throughout the Fedora distro? Thanks a lot, Răzvan
This is not a distribution-level decision, period. If the correct characters are not in a font? Fix the font package. If the Unicode standard isn't correct? Fix the standard. (If the packagers aren't willing to fork the standard from upstream, well, I understand that. Ergo, restoring previous component and resolution.) If existing documents use the wrong characters... well, there's not much we can do to help that at the distribution level.
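For reference, the four code-point pairs discussed in this report can be checked with Python's standard unicodedata module. This is only a minimal sketch and is not part of gucharmap or any Fedora package; the table name CEDILLA_TO_COMMA and the helper to_comma_below are hypothetical names chosen for illustration. It prints the official Unicode character names and shows how text known to be Romanian could be converted from the cedilla forms to the comma-below forms.

```python
import unicodedata

# Hypothetical mapping for illustration: legacy cedilla code points
# mapped to the comma-below code points named in this report.
CEDILLA_TO_COMMA = {
    0x015E: 0x0218,  # S with cedilla -> S with comma below
    0x015F: 0x0219,  # s with cedilla -> s with comma below
    0x0162: 0x021A,  # T with cedilla -> T with comma below
    0x0163: 0x021B,  # t with cedilla -> t with comma below
}


def to_comma_below(text: str) -> str:
    """Replace the four cedilla letters with their comma-below forms.

    Assumption: the caller already knows the text is Romanian.
    """
    return text.translate(CEDILLA_TO_COMMA)


if __name__ == "__main__":
    # Print each pair with its name from the Unicode character database.
    for old, new in CEDILLA_TO_COMMA.items():
        print(f"U+{old:04X} {unicodedata.name(chr(old))} -> "
              f"U+{new:04X} {unicodedata.name(chr(new))}")
    # Convert a short sample written with the cedilla forms.
    print(to_comma_below("\u015fi \u0163ar\u0103"))
```

Such a conversion is only safe for text known to be Romanian: S with cedilla (U+015E, U+015F) is a legitimate letter in Turkish, for example, so a blanket replacement across arbitrary documents would corrupt them.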