Bug 59866 - gedit has a problem with characters whose Unicode code points are unassigned
Status: CLOSED RAWHIDE
Product: Red Hat Linux
Classification: Retired
Component: libc
Version: 7.3
Hardware: All
OS: Linux
Priority: high
Severity: medium
Assigned To: Jakub Jelinek
Reported: 2002-02-13 19:15 EST by Yu Shao
Modified: 2013-01-10 16:38 EST
CC List: 4 users

Doc Type: Bug Fix
Last Closed: 2002-06-26 00:27:09 EDT


Attachments
four.txt (26.82 KB, text/plain)
2002-02-13 19:16 EST, Yu Shao
Description Yu Shao 2002-02-13 19:15:28 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.2.1) Gecko/20010901

Description of problem:
If a text file contains a Chinese GB18030 character whose Unicode code point is
unassigned, gedit shows a blank page; nothing is displayed.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Set the locale to zh_CN.GB18030 in /etc/sysconfig/i18n.
2. Start GNOME and open http://junk.brisbane.redhat.com/gb18030/four.txt in gedit.
3. gedit shows a blank screen.

Actual Results:  blank screen

Expected Results:  should be able to see the whole file

Additional info:

If we delete the first line, which contains 0x8139EE38 (U+33FF), then everything
is OK. U+33FF is an unassigned Unicode code point.
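
For reference, the mapping quoted above can be checked with a short Python
sketch. It assumes Python's built-in gb18030 codec and the Unicode 3.2 tables
exposed as unicodedata.ucd_3_2_0, which serve here only as a rough stand-in for
the Unicode data current when this bug was filed. The helper name describe is
made up for illustration and is not part of gedit or glibc.

```python
import unicodedata

# Four-byte GB18030 sequence quoted in the report (from the first line of four.txt).
RAW = bytes.fromhex("8139ee38")


def describe(raw):
    """Decode one GB18030 sequence and report its code point and categories.

    Returns (code point label, category in the current Unicode tables,
    category in the Unicode 3.2 tables). Category "Cn" means "unassigned".
    """
    ch = raw.decode("gb18030")
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    return (
        "U+%04X" % ord(ch),
        unicodedata.category(ch),
        unicodedata.ucd_3_2_0.category(ch),
    )


if __name__ == "__main__":
    print(describe(RAW))
```

On a current Python the second field may show an assigned category, because
later Unicode versions may have assigned this code point; the third field
reflects the older tables.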

four.txt comes from the Chinese government; it is a standard test file for the
new Chinese locale GB18030.
Comment 1 Yu Shao 2002-02-13 19:16:37 EST
Created attachment 45623
four.txt
Comment 2 Havoc Pennington 2002-02-14 10:24:33 EST
According to Owen, the problem is that libc considers unassigned Unicode code
points to be invalid GB18030, and there was recently a big flamewar about this on
the libc list. Owen's opinion is that unassigned Unicode code points should not
be treated as invalid.

Moving to gedit2 using GTK 2 in the release after Hampton will resolve this issue
if nothing else does.
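
The behavior Owen describes can be modeled in a few lines of Python. This is
only a sketch of the idea, not glibc's converter: strict_decode and
lenient_decode are hypothetical names, and the Unicode 3.2 tables in
unicodedata.ucd_3_2_0 stand in for whatever assignment data the converter
consulted. It shows how a single unassigned code point makes a strict converter
fail for the whole input, while a lenient one passes it through.

```python
import unicodedata

# Unicode 3.2 tables shipped with Python; an assumption standing in for the
# assignment data a 2002-era converter would have used.
UCD_OLD = unicodedata.ucd_3_2_0


def lenient_decode(raw):
    """Decode GB18030 and keep every character, assigned or not."""
    return raw.decode("gb18030")


def strict_decode(raw):
    """Hypothetical strict converter: reject characters unassigned in Unicode 3.2."""
    text = raw.decode("gb18030")
    for index, ch in enumerate(text):
        if UCD_OLD.category(ch) == "Cn":  # "Cn" = unassigned
            raise ValueError(
                "unassigned code point U+%04X at character %d" % (ord(ch), index)
            )
    return text


if __name__ == "__main__":
    # First line holds the sequence from the report; second line is ordinary text.
    sample = bytes.fromhex("8139ee38") + "\n\u4e2d\u6587\n".encode("gb18030")
    print(repr(lenient_decode(sample)))
    try:
        strict_decode(sample)
    except ValueError as err:
        print("strict converter failed:", err)
```

With the strict variant, one offending character on the first line is enough to
reject the whole file, which is consistent with removing that line making the
rest load.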
Comment 3 Jay Turner 2002-03-19 17:17:21 EST
Deferring.
Comment 4 Yu Shao 2002-06-26 00:19:02 EDT
Tested four.txt with gedit from Milan Beta2; gedit reports "couldn't open
because it contains invalid UTF-8 data".
Comment 5 Yu Shao 2002-08-26 22:40:00 EDT
Fixed with the latest glibc gb18030 converter patch.
