From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030701

Description of problem:
There are two problems in gvim with the Simplified Chinese locales:

1. The display width of some punctuation characters is calculated incorrectly, for example U+201C LEFT DOUBLE QUOTATION MARK (UTF-16LE with byte order mark: [0xff 0xfe 0x1c 0x20]; UTF-8: [0xe2 0x80 0x9c]).
2. gvim only recognizes the locale 'zh_CN.GB2312'. Other Simplified Chinese locales, such as 'zh_CN.GB18030' (the default Simplified Chinese locale in Red Hat Linux 9) and 'zh_CN.GBK', are not handled correctly.

I should report these problems to the Vim developers, but I cannot connect to www.vim.org. Perhaps someone here can forward this message to them.

Version-Release number of selected component (if applicable):
vim-6.2.98-1.1

How reproducible:
Always

Steps to Reproduce:
To reproduce the second bug, run:

    env LANG=zh_CN.GB18030 gvim -U NONE any_file_with_Chinese_chars

The Chinese characters are displayed incorrectly.

Additional info:
Below is a patch that solves both problems.

--- vim62/src/mbyte.c	2003-06-01 00:12:56.000000000 +0800
+++ vim62.new/src/mbyte.c	2003-07-30 23:06:57.000000000 +0800
@@ -267,6 +267,7 @@
     {"5601",     IDX_EUC_KR},  /* Sun: KS C 5601 */
     {"euccn",    IDX_EUC_CN},
     {"gb2312",   IDX_EUC_CN},
+    {"gb18030",  IDX_EUC_CN},
     {"euctw",    IDX_EUC_TW},
 #if defined(WIN3264) || defined(WIN32UNIX) || defined(MACOS)
     {"japan",    IDX_CP932},
@@ -959,9 +960,19 @@
  * When p_ambw is "double", return 2 for a character with East Asian Width
  * class 'A'(mbiguous).
  */
-    int
-utf_char2cells(c)
-    int c;
+
+static int _utf_char2cells(int c);
+int utf_char2cells(c)
+{
+#if 0
+    fprintf(stderr, "enc_dbcs=%d(%d), c = %x, ret = %d\n",
+            enc_dbcs, DBCS_CHSU, c, _utf_char2cells(c));
+#endif
+    return (c >= 0x80 && enc_dbcs == DBCS_CHSU) ? 2 : _utf_char2cells(c);
+}
+
+static int _utf_char2cells(c)
+    int c;
 {
     /* sorted list of non-overlapping intervals of East Asian Ambiguous
      * characters, generated with:
@@ -4997,6 +5008,12 @@
     int from_prop;
     int to_prop;
 
+    if (from != NULL && !strcmp(from, "euc-cn"))
+        from = "gb18030";
+
+    if (to != NULL && !strcmp(to, "euc-cn"))
+        to = "gb18030";
+
     /* Reset to no conversion. */
 # ifdef USE_ICONV
     if (vcp->vc_type == CONV_ICONV && vcp->vc_fd != (iconv_t)-1)
The patch was rejected by the upstream maintainer:

> The patch suggest changing something in the Unicode function for when
> the encoding is using double-byte characters. That can't be right...
>
> GBK contains GB2312 by design as a proper sub-set of GBK in both
> characters and encoding, so using euc-cn for GBK is an ok short term
> fix as long as only characters from GB2312 are used. I don't have
> complete information on gb18030, but again, it seems that GBK is a
> sub-set of it so again euc-cn would be an ok short term fix with the
> above proviso.
>
> More background for those so inclined here:
> http://www.anycities.com/gb18030/introduce.htm
>
> I don't like including half a solution. I would prefer that someone
> who knows the differences between the encodings can come up with a
> good solution and verify that it works as expected. This also
> requires adding some documentation.