107107 – Simplified Chinese support

Summary: Simplified Chinese support

Keywords:
Status:	CLOSED UPSTREAM
Alias:	None
Product:	Red Hat Raw Hide
Classification:	Retired
Component:	vim
Sub Component:
Version:	1.0
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Karsten Hopp
QA Contact:	David Lawrence
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2003-10-15 03:38 UTC by Need Real Name
Modified:	2007-04-18 16:58 UTC
CC List:	0 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2003-10-29 14:29:41 UTC
Embargoed:

Description Need Real Name 2003-10-15 03:38:47 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030701

Description of problem:
Two problems in gvim with the Simplified Chinese locales:

1. The display width of some punctuation characters is calculated
incorrectly, for example U+201C (UTF-16LE [0x1c 0x20] after the BOM
[0xff 0xfe], or UTF-8 [0xe2 0x80 0x9c]).

2. gvim only recognizes the locale 'zh_CN.GB2312'; other locales
such as 'zh_CN.GB18030' (the default locale for Simplified Chinese
in RH9) and 'zh_CN.GBK' are not handled correctly.
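
As a cross-check of the byte sequences quoted in item 1, the following
minimal C sketch decodes both forms to a code point. The helper names
(decode_utf8_3, decode_utf16le_unit) are hypothetical and are not part
of vim. It assumes the first two bytes of the four-byte form are a
UTF-16 little-endian byte order mark.

```c
/* Decode one three-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx).
 * Returns the code point, or -1 if the bytes do not have that form.
 * Hypothetical helper for illustration only; not vim code. */
long decode_utf8_3(const unsigned char *s)
{
    if ((s[0] & 0xF0) != 0xE0 || (s[1] & 0xC0) != 0x80
            || (s[2] & 0xC0) != 0x80)
        return -1;
    return ((long)(s[0] & 0x0F) << 12)
         | ((long)(s[1] & 0x3F) << 6)
         |  (long)(s[2] & 0x3F);
}

/* Decode one UTF-16 little-endian code unit (low byte first).
 * Surrogate pairs are not handled in this sketch. */
long decode_utf16le_unit(const unsigned char *s)
{
    return (long)s[0] | ((long)s[1] << 8);
}
```

Calling decode_utf8_3 on [0xe2 0x80 0x9c] and decode_utf16le_unit on
the two bytes that follow the BOM both give 0x201C, that is, U+201C
LEFT DOUBLE QUOTATION MARK.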

I should report these problems to the developers of vim, but
I cannot connect to www.vim.org. Maybe someone here can
forward this message to them.


Version-Release number of selected component (if applicable):
vim-6.2.98-1.1

How reproducible:
Always

Steps to Reproduce:
To reproduce the second problem:

env LANG=zh_CN.GB18030 gvim -U NONE any_file_with_Chinese_chars

The Chinese characters are displayed incorrectly.

Additional info:

Below is a patch that fixes both problems.

--- vim62/src/mbyte.c   2003-06-01 00:12:56.000000000 +0800
+++ vim62.new/src/mbyte.c       2003-07-30 23:06:57.000000000 +0800
@@ -267,6 +267,7 @@
     {"5601",           IDX_EUC_KR},    /* Sun: KS C 5601 */
     {"euccn",          IDX_EUC_CN},
     {"gb2312",         IDX_EUC_CN},
+    {"gb18030",                IDX_EUC_CN},
     {"euctw",          IDX_EUC_TW},
 #if defined(WIN3264) || defined(WIN32UNIX) || defined(MACOS)
     {"japan",          IDX_CP932},
@@ -959,9 +960,19 @@
  * When p_ambw is "double", return 2 for a character with East Asian Width
  * class 'A'(mbiguous).
  */
-    int
-utf_char2cells(c)
-    int                c;
+
+static int _utf_char2cells(int c);
+int utf_char2cells(c)
+{
+#if 0
+    fprintf(stderr, "enc_dbcs=%d(%d), c = %x, ret = %d\n",
+           enc_dbcs, DBCS_CHSU, c, _utf_char2cells(c));
+#endif
+    return (c >= 0x80 && enc_dbcs == DBCS_CHSU) ? 2 : _utf_char2cells(c);
+}
+
+static int _utf_char2cells(c)
+  int          c;
 {
     /* sorted list of non-overlapping intervals of East Asian Ambiguous
      * characters, generated with:
@@ -4997,6 +5008,12 @@
     int                from_prop;
     int                to_prop;
 
+    if (from != NULL && !strcmp(from, "euc-cn"))
+        from = "gb18030";
+
+    if (to != NULL && !strcmp(to, "euc-cn"))
+        to = "gb18030";
+
     /* Reset to no conversion. */
 # ifdef USE_ICONV
     if (vcp->vc_type == CONV_ICONV && vcp->vc_fd != (iconv_t)-1)
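
For context on the second hunk: the comment it touches refers to
characters of East Asian Width class 'A' (ambiguous), whose cell count
depends on a setting. The following is a minimal standalone sketch of
that idea, not vim's code. The tables are tiny excerpts chosen for
illustration, the names (cells_for, ambiguous_is_double) are
hypothetical, and every character outside the listed ranges is treated
as one cell, which is a simplification.

```c
#include <stddef.h>

struct interval { unsigned long first, last; };

/* Excerpt only: curly quotes, which are East Asian Ambiguous. */
static const struct interval ambiguous_excerpt[] = {
    {0x2018, 0x2019},
    {0x201C, 0x201D},
};

/* Excerpt only: the CJK Unified Ideographs block, which is Wide. */
static const struct interval wide_excerpt[] = {
    {0x4E00, 0x9FFF},
};

static int in_table(unsigned long c, const struct interval *t, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (c >= t[i].first && c <= t[i].last)
            return 1;
    return 0;
}

/* Return the number of display cells for code point c.
 * ambiguous_is_double is a hypothetical flag standing in for the
 * "double" setting mentioned in the patched comment. */
int cells_for(unsigned long c, int ambiguous_is_double)
{
    if (in_table(c, wide_excerpt,
                 sizeof(wide_excerpt) / sizeof(wide_excerpt[0])))
        return 2;
    if (in_table(c, ambiguous_excerpt,
                 sizeof(ambiguous_excerpt) / sizeof(ambiguous_excerpt[0])))
        return ambiguous_is_double ? 2 : 1;
    return 1; /* sketch: everything else is treated as narrow */
}
```

In this sketch U+201C occupies one cell unless the flag is set, while an
ideograph such as U+4E2D always occupies two. The patch above instead
forces two cells for every character >= 0x80 when a double-byte Chinese
encoding is active.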

Comment 1 Karsten Hopp 2003-10-29 14:29:41 UTC

Patch was rejected by the upstream maintainer:
> The patch suggests changing something in the Unicode function for when
> the encoding is using double-byte characters.  That can't be right...

> GBK contains GB2312 by design as a proper sub-set of GBK in both
> characters and encoding, so using euc-cn for GBK is an ok short term
> fix as long as only characters from GB2312 are used.  I don't have
> complete information on gb18030, but again, it seems that GBK is a
> sub-set of it so again euc-cn would be an ok short term fix with the
> above proviso.
> More background for those so inclined here:
> http://www.anycities.com/gb18030/introduce.htm

> I don't like including half a solution.  I would prefer that someone
> who knows the differences between the encodings can come up with a
> good solution and verify that it works as expected.  This also
> requires adding some documentation.
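
The subset relationship mentioned in the quoted reply can be
illustrated by the byte ranges each encoding uses. This is a rough
sketch, assuming the commonly documented ranges: EUC-CN (GB2312) uses
0xA1-0xFE for both bytes; GBK uses lead bytes 0x81-0xFE and trail bytes
0x40-0xFE except 0x7F; GB18030 adds four-byte sequences whose second
and fourth bytes are 0x30-0x39. The enum and function names are
hypothetical, and the code checks byte structure only, not whether a
code is actually assigned.

```c
#include <stddef.h>

enum gb_kind {
    GB_INVALID,     /* not a well-formed sequence in this sketch */
    GB_ASCII,       /* single byte below 0x80 */
    GB_EUC_CN,      /* two bytes, both within the EUC-CN range */
    GB_GBK_ONLY,    /* two bytes, valid for GBK but outside EUC-CN */
    GB_18030_FOUR   /* four-byte form that only GB18030 has */
};

/* Classify the sequence starting at s (len bytes available). */
enum gb_kind gb_classify(const unsigned char *s, size_t len)
{
    if (len == 0)
        return GB_INVALID;
    if (s[0] < 0x80)
        return GB_ASCII;
    if (s[0] < 0x81 || s[0] > 0xFE || len < 2)
        return GB_INVALID;

    /* A digit in second position marks a GB18030 four-byte sequence. */
    if (s[1] >= 0x30 && s[1] <= 0x39) {
        if (len >= 4 && s[2] >= 0x81 && s[2] <= 0xFE
                && s[3] >= 0x30 && s[3] <= 0x39)
            return GB_18030_FOUR;
        return GB_INVALID;
    }

    if (s[0] >= 0xA1 && s[1] >= 0xA1 && s[1] <= 0xFE)
        return GB_EUC_CN;
    if (s[1] >= 0x40 && s[1] <= 0xFE && s[1] != 0x7F)
        return GB_GBK_ONLY;
    return GB_INVALID;
}
```

Text for which gb_classify returns only GB_ASCII or GB_EUC_CN is the
case where treating GBK or GB18030 input as euc-cn loses nothing; the
other two results mark the sequences that such a short-term fix would
not cover.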
