Bug 59866

Summary: gedit has a problem with characters whose Unicode code points are unassigned
Product: [Retired] Red Hat Linux
Reporter: Yu Shao <yshao>
Component: libc
Assignee: Jakub Jelinek <jakub>
Status: CLOSED RAWHIDE
Severity: medium
Priority: high
Version: 7.3
CC: drepper, eng-asia-list, otaylor, pcormier
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2002-06-26 04:27:09 UTC
Attachments:
  four.txt (flags: none)

Description Yu Shao 2002-02-14 00:15:28 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.2.1) Gecko/20010901

Description of problem:
If a text file contains a Chinese GB18030 character whose Unicode code point is
unassigned, gedit shows a blank page and none of the file's contents are displayed.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Set the locale to zh_CN.GB18030 in /etc/sysconfig/i18n.
2. Start GNOME and use gedit to open http://junk.brisbane.redhat.com/gb18030/four.txt
3. gedit shows a blank screen.
	

Actual Results:  blank screen

Expected Results:  should be able to see the whole file

Additional info:

If we delete the first line, which contains 0x8139EE38 (U+33FF), everything
works. U+33FF is an unassigned Unicode code point.
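
The relationship between the byte sequence and the code point above can be checked with a short Python sketch. It is not part of the original report. The helper name gb18030_linear_index is hypothetical, and Python's built-in gb18030 codec stands in for a GB18030 converter here; it is not the glibc converter this bug is filed against. The sketch computes the linear index of a four-byte sequence from the standard's byte ranges (first and third bytes 0x81-0xFE, second and fourth bytes 0x30-0x39) and then decodes the reported sequence and the one that follows it.

```python
def gb18030_linear_index(seq: bytes) -> int:
    """Return the linear index of a four-byte GB18030 sequence (hypothetical helper)."""
    if len(seq) != 4:
        raise ValueError("expected exactly four bytes")
    b1, b2, b3, b4 = seq
    if not (0x81 <= b1 <= 0xFE and 0x30 <= b2 <= 0x39
            and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
        raise ValueError("not a well-formed four-byte GB18030 sequence")
    # Mixed-radix number: 126 values for bytes 1 and 3, 10 values for bytes 2 and 4.
    return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)


reported = bytes.fromhex("8139ee38")   # sequence quoted in the report
following = bytes.fromhex("8139ee39")  # the next four-byte sequence

# Adjacent byte sequences have adjacent linear indices.
print(gb18030_linear_index(reported), gb18030_linear_index(following))

# Decode with Python's codec (an assumption, standing in for any converter).
for seq in (reported, following):
    ch = seq.decode("gb18030")
    print(seq.hex(), "-> U+%04X" % ord(ch))
```

The sequence from the report sits immediately before the first code of the CJK Extension A range, which is why a converter that only accepts assigned characters stops at the very first line of four.txt.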

four.txt comes from the Chinese government and is a standard test file for the
new Chinese GB18030 locale.

Comment 1 Yu Shao 2002-02-14 00:16:37 UTC
Created attachment 45623 [details]
four.txt

Comment 2 Havoc Pennington 2002-02-14 15:24:33 UTC
According to Owen, the problem is that libc treats unassigned Unicode code points
as invalid GB18030, and there was recently a big flamewar about this on the libc
list. Owen's opinion is that unassigned Unicode code points should not be treated
as invalid.

Moving to gedit2 using GTK 2 in the release after Hampton will resolve this issue
if nothing else does.
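
The two policies discussed in Comment 2 can be illustrated with a toy Python model. This is only a sketch under stated assumptions and is not glibc's converter code: the function decode_gb18030 and its reject_unassigned flag are hypothetical, and "unassigned" is approximated by the Unicode general category Cn as reported by Python's unicodedata module. Later Unicode versions assigned U+33FF, so the sketch uses U+0378 instead, which is assumed to be unassigned in the Unicode database bundled with the Python running it.

```python
import unicodedata


def decode_gb18030(data: bytes, reject_unassigned: bool) -> str:
    """Decode GB18030 bytes, optionally rejecting unassigned code points.

    reject_unassigned=True models the strict policy described in the comment;
    reject_unassigned=False models the lenient policy Owen argues for.
    """
    text = data.decode("gb18030")
    if reject_unassigned:
        for ch in text:
            # Category "Cn" means "not assigned" in the bundled Unicode database.
            if unicodedata.category(ch) == "Cn":
                raise UnicodeError("unassigned code point U+%04X" % ord(ch))
    return text


# A short text containing one code point assumed to be unassigned (U+0378).
sample = "ok \u0378".encode("gb18030")

print(repr(decode_gb18030(sample, reject_unassigned=False)))
try:
    decode_gb18030(sample, reject_unassigned=True)
except UnicodeError as exc:
    print("strict policy rejected the input:", exc)
```

Under the strict policy a single unassigned character makes the whole conversion fail, which matches the symptom in the report: the application gets no text at all rather than a file with one unrenderable character.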


Comment 3 Jay Turner 2002-03-19 22:17:21 UTC
Deferring.

Comment 4 Yu Shao 2002-06-26 04:19:02 UTC
Tested four.txt with the gedit from Milan Beta2; gedit now reports "couldn't open
because it contains invalid UTF-8 data".


Comment 5 Yu Shao 2002-08-27 02:40:00 UTC
Fixed with the latest glibc gb18030 converter patch.