Bug 107107 - Simplified Chinese support
Summary: Simplified Chinese support
Alias: None
Product: Red Hat Raw Hide
Classification: Retired
Component: vim
Version: 1.0
Hardware: i386
OS: Linux
Target Milestone: ---
Assignee: Karsten Hopp
QA Contact: David Lawrence
Depends On:
Reported: 2003-10-15 03:38 UTC by Need Real Name
Modified: 2007-04-18 16:58 UTC (History)
0 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2003-10-29 14:29:41 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Description Need Real Name 2003-10-15 03:38:47 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030701

Description of problem:
Two problems in gvim with the simplified Chinese locales:

1. The width of some punctuation chars, such as the Unicode char U+201C
(UTF-16LE with BOM [0xff 0xfe 0x1c 0x20], or UTF-8 [0xe2 0x80 0x9c]), is
handled incorrectly.

2. gvim recognizes only the locale 'zh_CN.GB2312'; other locales
such as 'zh_CN.GB18030' (the default locale for simplified Chinese
in RH9) and 'zh_CN.GBK' are not handled correctly.
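
For readers checking the byte sequences quoted in problem 1, here is a
minimal sketch (not vim code; both function names are hypothetical) showing
that the two sequences name the same code point, U+201C. That character has
East Asian Width class 'A' (ambiguous), the class mentioned in the comment
that the patch below touches.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one three-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx).
 * Illustration only: the input is assumed to be well formed. */
static uint32_t utf8_decode3(const unsigned char *s)
{
    return ((uint32_t)(s[0] & 0x0F) << 12)
         | ((uint32_t)(s[1] & 0x3F) << 6)
         |  (uint32_t)(s[2] & 0x3F);
}

/* Return the first UTF-16LE code unit, skipping a leading FF FE byte
 * order mark if present. Returns 0 when the input is too short. */
static uint32_t utf16le_first_unit(const unsigned char *s, size_t len)
{
    if (len >= 2 && s[0] == 0xFF && s[1] == 0xFE) {
        s += 2;
        len -= 2;
    }
    if (len < 2)
        return 0;
    return (uint32_t)s[0] | ((uint32_t)s[1] << 8);
}
```

Feeding the bytes e2 80 9c to utf8_decode3() and the bytes ff fe 1c 20 to
utf16le_first_unit() yields 0x201C in both cases.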

I should report these problems to the developers of vim, but
I cannot connect to www.vim.org. Maybe someone here can help
by forwarding this message to them.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
To reproduce the second problem:

env LANG=zh_CN.GB18030 gvim -U NONE any_file_with_Chinese_chars

The Chinese characters will be incorrectly displayed.
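
The reproduction depends on how the codeset part of LANG is matched against
a table of known encoding names. The following is a minimal sketch of that
kind of lookup, not vim's actual code: the table, the ENC_* constants, and
both function names are hypothetical. With only "euccn" and "gb2312" in the
table, 'zh_CN.GB18030' and 'zh_CN.GBK' fall through to "unknown".

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum { ENC_UNKNOWN = -1, ENC_EUC_CN = 0 };   /* hypothetical identifiers */

struct enc_alias {
    const char *name;   /* normalized codeset name */
    int id;
};

/* Hypothetical alias table holding only the two simplified Chinese
 * names that appear as context lines in the patch below. */
static const struct enc_alias enc_aliases[] = {
    {"euccn",  ENC_EUC_CN},
    {"gb2312", ENC_EUC_CN},
};

/* Copy the codeset part of a locale name (the text after '.', up to an
 * optional '@modifier') into out, lowercased, keeping only letters and
 * digits. The result is empty if the locale has no codeset part. */
static void normalize_codeset(const char *locale, char *out, size_t outsize)
{
    const char *p = strchr(locale, '.');
    size_t n = 0;

    if (outsize == 0)
        return;
    if (p != NULL) {
        for (++p; *p != '\0' && *p != '@' && n + 1 < outsize; ++p) {
            if (isalnum((unsigned char)*p))
                out[n++] = (char)tolower((unsigned char)*p);
        }
    }
    out[n] = '\0';
}

/* Look up the encoding for a locale name; ENC_UNKNOWN if no alias matches. */
static int lookup_encoding(const char *locale)
{
    char name[32];
    size_t i;

    normalize_codeset(locale, name, sizeof name);
    for (i = 0; i < sizeof enc_aliases / sizeof enc_aliases[0]; ++i) {
        if (strcmp(name, enc_aliases[i].name) == 0)
            return enc_aliases[i].id;
    }
    return ENC_UNKNOWN;
}
```

In this sketch, adding a {"gb18030", ENC_EUC_CN} entry would make the lookup
succeed for 'zh_CN.GB18030', which mirrors the first hunk of the patch.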

Additional info:

Below is a patchfile which solves the problems.

--- vim62/src/mbyte.c   2003-06-01 00:12:56.000000000 +0800
+++ vim62.new/src/mbyte.c       2003-07-30 23:06:57.000000000 +0800
@@ -267,6 +267,7 @@
     {"5601",           IDX_EUC_KR},    /* Sun: KS C 5601 */
     {"euccn",          IDX_EUC_CN},
     {"gb2312",         IDX_EUC_CN},
+    {"gb18030",                IDX_EUC_CN},
     {"euctw",          IDX_EUC_TW},
 #if defined(WIN3264) || defined(WIN32UNIX) || defined(MACOS)
     {"japan",          IDX_CP932},
@@ -959,9 +960,19 @@
  * When p_ambw is "double", return 2 for a character with East Asian Width
  * class 'A'(mbiguous).
-    int
-    int                c;
+static int _utf_char2cells(int c);
+int utf_char2cells(c)
+#if 0
+    fprintf(stderr, "enc_dbcs=%d(%d), c = %x, ret = %d\n",
+           enc_dbcs, DBCS_CHSU, c, _utf_char2cells(c));
+    return (c >= 0x80 && enc_dbcs == DBCS_CHSU) ? 2 : _utf_char2cells(c);
+static int _utf_char2cells(c)
+  int          c;
     /* sorted list of non-overlapping intervals of East Asian Ambiguous
      * characters, generated with:
@@ -4997,6 +5008,12 @@
     int                from_prop;
     int                to_prop;
+    if (from != NULL && !strcmp(from, "euc-cn"))
+        from = "gb18030";
+    if (to != NULL && !strcmp(to, "euc-cn"))
+        to = "gb18030";
     /* Reset to no conversion. */
 # ifdef USE_ICONV
     if (vcp->vc_type == CONV_ICONV && vcp->vc_fd != (iconv_t)-1)

Comment 1 Karsten Hopp 2003-10-29 14:29:41 UTC
Patch was rejected by the upstream maintainer:
> The patch suggests changing something in the Unicode function for when
> the encoding is using double-byte characters.  That can't be right...

> GBK contains GB2312 by design as a proper sub-set of GBK in both
> characters and encoding, so using euc-cn for GBK is an ok short term
> fix as long as only characters from GB2312 are used.  I don't have
> complete information on gb18030, but again, it seems that GBK is a
> sub-set of it so again euc-cn would be an ok short term fix with the
> above proviso.
> More background for those so inclined here:
> http://www.anycities.com/gb18030/introduce.htm

> I don't like including half a solution.  I would prefer that someone
> who knows the differences between the encodings can come up with a
> good solution and verify that it works as expected.  This also
> requires adding some documentation.
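
To illustrate the quoted proviso that euc-cn only works "as long as only
characters from GB2312 are used", here is a minimal sketch that classifies
a byte sequence by the byte ranges of the three encodings. It is not part
of vim or of the rejected patch; the enum and function names are
hypothetical, and it checks byte ranges only, not whether a code is
actually assigned.

```c
#include <stddef.h>

enum gb_class {
    GB_ASCII,          /* single byte below 0x80 */
    GB_GB2312,         /* two bytes inside the EUC-CN range used for GB2312 */
    GB_GBK_ONLY,       /* two bytes valid in GBK but outside that range */
    GB_GB18030_4BYTE,  /* four-byte form that exists only in GB18030 */
    GB_INVALID
};

/* Classify the character that starts at s (len bytes available). */
static enum gb_class classify_gb(const unsigned char *s, size_t len)
{
    if (len == 0)
        return GB_INVALID;
    if (s[0] < 0x80)
        return GB_ASCII;
    if (s[0] < 0x81 || s[0] > 0xFE || len < 2)
        return GB_INVALID;

    /* GB18030 four-byte form: lead, digit 0x30..0x39, lead, digit. */
    if (s[1] >= 0x30 && s[1] <= 0x39) {
        if (len >= 4 && s[2] >= 0x81 && s[2] <= 0xFE
                && s[3] >= 0x30 && s[3] <= 0x39)
            return GB_GB18030_4BYTE;
        return GB_INVALID;
    }

    /* EUC-CN two-byte range commonly used for GB2312:
     * lead 0xA1..0xF7, trail 0xA1..0xFE. */
    if (s[0] >= 0xA1 && s[0] <= 0xF7 && s[1] >= 0xA1 && s[1] <= 0xFE)
        return GB_GB2312;

    /* Remaining GBK two-byte range: trail 0x40..0xFE except 0x7F. */
    if (s[1] >= 0x40 && s[1] <= 0xFE && s[1] != 0x7F)
        return GB_GBK_ONLY;

    return GB_INVALID;
}
```

Text for which every character classifies as GB_ASCII or GB_GB2312 is the
case the short-term euc-cn fix would cover; the other two classes are where
a double-byte EUC-CN treatment would be expected to go wrong.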
