Bug 83082 - gedit has problems with 3 gb18030 testing files
Status: CLOSED WONTFIX
Product: Red Hat Linux
Classification: Retired
Component: gedit
Version: 9
Hardware: i386
OS: Linux
Priority: high
Severity: medium
Assigned To: Havoc Pennington
: Triaged
Blocks: 79579
Reported: 2003-01-29 20:42 EST by Yu Shao
Modified: 2007-03-27 00:00 EDT
CC List: 3 users

Doc Type: Bug Fix
Last Closed: 2003-05-06 03:11:00 EDT


Attachments
Screenshot of the problem part of double1.txt (155.39 KB, image/png)
2003-02-11 02:24 EST, David Joo
another problem part in double2.txt (233.89 KB, image/png)
2003-02-11 02:28 EST, David Joo
Description Yu Shao 2003-01-29 20:42:43 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20020830

Description of problem:
gedit has problems displaying three standard GB18030 test files, while `cat`-ing
them in gnome-terminal works fine.

1. double1.txt: there is one character whose GB code is A9F0 (U+E801). It should
appear as a "Unicode square" in gedit, but right now it shows a strange character
that looks like Tibetan. It is the first character of the last line.

2. user1.txt: this file should display entirely as Unicode squares in gedit, but
right now the first two paragraphs show strange characters and spaces.

3. wei.txt: strange, wrong bold characters appear at random.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. GinGin-re0128.nightly
2. Open those three test files from here: http://junk.brisbane.redhat.com/gb18030/

Additional info:
Comment 1 Yu Shao 2003-01-29 20:47:46 EST
Sorry, about double1.txt: it should be 3 characters, not one, although they look
like one large character. The code points are U+E801 to U+E803.
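The code points named in Comment 1 (U+E801 to U+E803) fall in Unicode's Private Use Area, which is what the rest of the thread turns on. A minimal Python sketch for spotting such characters, assuming the test files are decoded with Python's built-in `gb18030` codec; the helper names `private_use_chars` and `scan_gb18030_file` are hypothetical and not part of gedit or GTK:

```python
import unicodedata


def private_use_chars(text: str) -> list:
    """Return (index, "U+XXXX") pairs for each Private Use Area character.

    PUA code points (U+E000-U+F8FF, plus planes 15 and 16) have the Unicode
    general category "Co": the standard assigns them no meaning, so any
    glyph shown for them comes from whichever font happens to define one.
    """
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"
    ]


def scan_gb18030_file(path: str) -> list:
    """Hypothetical helper: decode a GB18030 file and list its PUA characters."""
    with open(path, encoding="gb18030") as f:
        return private_use_chars(f.read())


if __name__ == "__main__":
    # Stand-in for the last line of double1.txt (the file itself is not
    # reproduced here); U+E801..U+E803 are the code points from Comment 1.
    print(private_use_chars("\ue801\ue802\ue803"))
    # [(0, 'U+E801'), (1, 'U+E802'), (2, 'U+E803')]
```

Anything this reports has no standard glyph, so how it renders depends entirely on which installed fonts happen to define that code point.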
Comment 2 Leon Ho 2003-01-29 21:15:58 EST
Owen, is it because we have some new fonts that cover these code points?
gnome-terminal looks uniform across the screen because it uses a single font
throughout.
Comment 3 Havoc Pennington 2003-02-03 15:49:55 EST
Is this gedit-specific or does it happen in any GtkTextView widget?
Comment 4 Leon Ho 2003-02-10 19:58:00 EST
It happens in any GtkTextView widget.
Comment 5 Paul Gampe 2003-02-10 21:46:24 EST
This is a blocker for our GB18030 compliance test, so I'm raising the priority.
To meet the current GM schedule, we must ship CDs for testing to the PRC by Thursday.
Comment 6 Owen Taylor 2003-02-10 22:44:54 EST
I assume user1.txt consists of PUA (Private Use Area) characters. There is no
definition of these at all, so why do you expect them to show as Unicode
hex squares?

Yes, the glyphs presumably come from some font on the system
that has glyphs in the PUA. If you run it under `strace -e open`,
you should be able to figure out which one fairly easily.
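Comment 6's `strace -e open` suggestion produces a log with one line per `open` call. A minimal Python sketch for pulling the font files out of such a log, assuming the usual strace line format `open("path", flags) = fd` (or `openat(dirfd, "path", flags) = fd`); the regex, the suffix list, and the function name `fonts_opened` are illustrative assumptions, not a standard tool:

```python
import re

# Matches successful open()/openat() calls in strace output, e.g.
#   open("/usr/share/fonts/x.ttf", O_RDONLY) = 5
#   openat(AT_FDCWD, "/usr/share/fonts/x.ttf", O_RDONLY) = 5
# Failed calls ("= -1 ENOENT ...") do not match, because the return value
# must be a non-negative integer.
OPEN_RE = re.compile(r'open(?:at)?\((?:[^,"]*, )?"([^"]+)",.*\)\s*=\s*(\d+)')

# Assumed set of font file extensions worth reporting.
FONT_SUFFIXES = (".ttf", ".ttc", ".otf", ".pcf", ".pcf.gz", ".pfa", ".pfb")


def fonts_opened(lines):
    """Return font file paths that were opened successfully, in first-seen order."""
    found = []
    for line in lines:
        m = OPEN_RE.search(line)
        if m and m.group(1).endswith(FONT_SUFFIXES) and m.group(1) not in found:
            found.append(m.group(1))
    return found


if __name__ == "__main__":
    # Hypothetical sample lines; in practice, read them from a trace file.
    sample = [
        'open("/usr/share/fonts/a/cjk.ttf", O_RDONLY) = 7',
        'open("/usr/share/fonts/b/extra.pcf.gz", O_RDONLY) = 8',
        'open("/usr/share/fonts/c/gone.ttf", O_RDONLY) = -1 ENOENT (No such file or directory)',
    ]
    for path in fonts_opened(sample):
        print(path)
```

For example, `strace -e open -o gedit.trace gedit double1.txt` writes the log to `gedit.trace`, whose lines can be passed to `fonts_opened`; on systems whose C library calls `openat` instead, `-e trace=open,openat` is needed. The result lists every font the process opened, so the fallback font still has to be picked out from the preferred ones by hand.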
Comment 7 Yu Shao 2003-02-10 22:58:30 EST
Yes, user1.txt is in the PUA. When the testing centre opens user1.txt, there is
no corresponding glyph (in GB18030), so they expect to see it rendered entirely
as Unicode squares, like user2 and user3.

Comment 8 David Joo 2003-02-11 02:24:49 EST
Created attachment 89987
Screenshot of the problem part of double1.txt
Comment 9 David Joo 2003-02-11 02:28:18 EST
Created attachment 89988
another problem part in double2.txt

This is another problem that eng-asia just discovered. In double1.txt the
character looks like Hebrew, but in double2.txt it looks more like Arabic
characters.
Comment 10 Owen Taylor 2003-02-11 15:31:18 EST
double1.txt and double2.txt are just like user1.txt. Since these
codes are not assigned in GB18030, they aren't in the font, so fontconfig
falls back to whatever font on the system does contain them, and some
of our Hebrew and Arabic fonts have miscellaneous glyphs at various
points in the PUA space.

I don't see how this is incorrect behavior. But can't you remove
fonts-hebrew and fonts-arabic for GB18030 compliance testing?

The stray characters appearing in wei.txt have various causes:

 - In some cases Zysong doesn't contain assigned Unicode code points
   (such as Arabic digits or Arabic presentation forms)
 - In other cases, some of the fonts on the system contain glyphs
   at unassigned code points for one reason or another

In no case does it appear that characters present in
Zysong.ttf are being displayed in a different font.

[I'd really like to see us in the future shipping a GB18030
font without the "copied from the Unicode book" glyphs for Arabic 
and other similar languages. It causes considerable problems to 
have a font that pretends to cover languages, but doesn't really 
cover them.]
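Owen's explanation in Comment 10 amounts to per-character fallback: a code point missing from the preferred font is drawn with any other installed font that happens to define it, and only if none does is a hex box shown. The following Python sketch is a deliberately simplified, hypothetical model of that behavior, not fontconfig's API: real fontconfig sorts candidate fonts by how well they match the requested pattern, whereas here a fixed list stands in for that ordering, and the font names and coverage sets are made up.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy model of per-character font fallback. A fixed list,
# preferred font first, replaces fontconfig's real match-based ordering.


def pick_font(cp: int, fonts: List[str], coverage: Dict[str, Set[int]]) -> Optional[str]:
    """Return the first font whose coverage includes code point cp, else None."""
    for name in fonts:
        if cp in coverage.get(name, set()):
            return name
    return None


def render_plan(text: str, fonts: List[str], coverage: Dict[str, Set[int]]) -> List[tuple]:
    """Map each character to the font that would draw it, or to "hex box"."""
    return [
        (f"U+{ord(ch):04X}", pick_font(ord(ch), fonts, coverage) or "hex box")
        for ch in text
    ]


if __name__ == "__main__":
    # Hypothetical fonts: a CJK font without PUA glyphs, plus two fonts that
    # happen to define glyphs at U+E801 and U+E802.
    coverage = {"cjk": {0x4E2D}, "hebrew": {0xE801}, "arabic": {0xE802}}
    text = "\u4e2d\ue801\ue802\ue803"
    print(render_plan(text, ["cjk", "hebrew", "arabic"], coverage))
    # With the "hebrew" and "arabic" fonts removed, every PUA character
    # falls through to a hex box.
    print(render_plan(text, ["cjk"], coverage))
```

In this model, dropping a font from the list is the equivalent of uninstalling its package, which is why removing the fonts that carry stray PUA glyphs turns those characters back into hex boxes.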
Comment 11 Leon Ho 2003-02-11 19:38:31 EST
Jeremy, can we exclude those font packages in comps for zh_CN?
Comment 12 David Joo 2003-02-11 21:06:30 EST
The font packages that we have problems with are:
double1.txt - fonts-hebrew
double2.txt - fonts-arabic
user1.txt - bitmap-fonts

I uninstalled those packages and retested the files, and the display is 'fine' after that.
Comment 13 Leon Ho 2003-05-06 03:11:00 EDT
Closing it for now. 
