Red Hat Bugzilla – Bug 59866
gedit has problems with characters whose Unicode code points are unassigned
Last modified: 2013-01-10 16:38:07 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.2.1) Gecko/20010901
Description of problem:
If a text file contains a Chinese GB18030 character whose Unicode code point is
unassigned, gedit shows a blank page; none of the file's contents are displayed.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Set the locale to zh_CN.GB18030 in /etc/sysconfig/i18n
2. Start GNOME and use gedit to open http://junk.brisbane.redhat.com/gb18030/four.txt
3. You will see a blank screen
Actual Results: blank screen
Expected Results: should be able to see the whole file
If we delete the first line, which contains 0x8139ee38 (U+33FF), then everything
is fine. U+33FF is an unassigned Unicode value.
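For anyone who wants to check the mapping without the test file, here is a minimal sketch using Python's built-in gb18030 codec and unicodedata module. It is not what gedit or glibc does internally; the describe() helper and the RAW constant are names made up for this example. It decodes the four bytes quoted above and looks up the resulting code point's general category, both in the Unicode 3.2 snapshot that Python ships and in the interpreter's current database.

```python
import unicodedata

# Byte sequence quoted in the report (first line of four.txt).
RAW = bytes.fromhex("8139ee38")


def describe(raw: bytes) -> dict:
    """Decode one GB18030 sequence and report its Unicode status.

    Hypothetical helper for illustration only.
    """
    # A converter that rejects the sequence would fail here with
    # UnicodeDecodeError; Python's codec maps the full four-byte BMP range.
    text = raw.decode("gb18030")
    ch = text[0]
    return {
        "code_point": f"U+{ord(ch):04X}",
        # General category 'Cn' means "unassigned" in that Unicode version.
        "category_3_2": unicodedata.ucd_3_2_0.category(ch),
        "category_now": unicodedata.category(ch),
        # True if encoding the character gives back the original bytes.
        "round_trip": text.encode("gb18030") == raw,
    }


if __name__ == "__main__":
    print(describe(RAW))
```

The point of the round-trip check is that the byte sequence is well-formed GB18030 whether or not the target code point has been assigned a character, which is the distinction the converter has to make.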
four.txt comes from the Chinese government; it is a standard test file for the new Chinese
Created attachment 45623
According to Owen, the problem is that libc considers unassigned Unicode code
points to be invalid GB18030, and there was recently a big flamewar about this
on the libc list. Owen's opinion is that unassigned Unicode should not be treated as
Moving to gedit2 using GTK 2 in the release after Hampton will resolve this issue
if nothing else does.
Tested four.txt with the gedit from Milan Beta2; gedit reports "couldn't open
because it contains invalid UTF-8 data".
Fixed with the latest glibc gb18030 converter patch.