1187469 – Can't input non-ASCII UTF-8 characters in mysql 10.0.x client in Fedora 21

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	mariadb
Sub Component:
Version:	21
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Jan Staněk
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:	1202022 1460155
Blocks:

Reported:	2015-01-30 06:18 UTC by LiuYan
Modified:	2017-06-09 09:19 UTC
CC List:	5 users
Fixed In Version:
Clone Of:
Clones:	1202022
Environment:
Last Closed:	2015-03-19 04:06:07 UTC
Type:	Bug
Embargoed:
Dependent Products:

Description LiuYan 2015-01-30 06:18:00 UTC

Description of problem:
When I type or paste Chinese characters into the mysql client, the Chinese characters are not entered.

My `locale` result:
    LANG=zh_CN.utf8
    LC_CTYPE="zh_CN.utf8"
    LC_NUMERIC="zh_CN.utf8"
    LC_TIME="zh_CN.utf8"
    LC_COLLATE="zh_CN.utf8"
    LC_MONETARY="zh_CN.utf8"
    LC_MESSAGES="zh_CN.utf8"
    LC_PAPER="zh_CN.utf8"
    LC_NAME="zh_CN.utf8"
    LC_ADDRESS="zh_CN.utf8"
    LC_TELEPHONE="zh_CN.utf8"
    LC_MEASUREMENT="zh_CN.utf8"
    LC_IDENTIFICATION="zh_CN.utf8"
    LC_ALL=
     

My `localectl` result:
       System Locale: LANG=zh_CN.UTF-8
           VC Keymap: us
          X11 Layout: us,cn
         X11 Variant: ,
     

Version-Release number of selected component (if applicable):
10.0.x


How reproducible:
Always.

Steps to Reproduce:
1. Run the mysql client.
2. Copy and paste the following statement into the mysql client:
   SELECT 'Chinese characters <汉字> are stripped';
3. The characters '汉字' are stripped from the input in the mysql client.

Actual results:
MariaDB [(none)]> SELECT 'Chinese characters <> are stripped';
+------------------------------------+
| Chinese characters <> are stripped |
+------------------------------------+
| Chinese characters <> are stripped |
+------------------------------------+


Expected results:
MariaDB [(none)]> SELECT 'Chinese characters <汉字> are stripped';
+------------------------------------------+
| Chinese characters <汉字> are stripped   |
+------------------------------------------+
| Chinese characters <汉字> are stripped   |
+------------------------------------------+
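
The actual result above is consistent with an input layer that discards every byte outside the 7-bit ASCII range. That mechanism is an assumption made for illustration only; nothing in this report confirms it. A minimal Python sketch (the helper name `strip_non_ascii_bytes` is hypothetical) reproduces the observed text:

```python
def strip_non_ascii_bytes(line: str) -> str:
    """Drop every UTF-8 byte >= 0x80, keeping only 7-bit ASCII.

    ASSUMPTION: this only imitates the observed symptom; it is not
    taken from the mysql client or libedit source code.
    """
    raw = line.encode("utf-8")
    kept = bytes(b for b in raw if b < 0x80)
    return kept.decode("ascii")


typed = "SELECT 'Chinese characters <汉字> are stripped';"
print(strip_non_ascii_bytes(typed))
# prints: SELECT 'Chinese characters <> are stripped';
```

Every byte of a multi-byte UTF-8 sequence has its high bit set, so a filter like this removes whole non-ASCII characters and leaves the surrounding ASCII text intact.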


Additional info:
The binaries from mariadb.org work correctly:
1. win32 version works ok
2. Linux x86_64 works ok
3. Linux glibc_2.14+ x86_64 works ok

A user in the #mariadb IRC channel also reports that it works on Ubuntu.

The 5.5.x version in Fedora 20 also works.

However, I can recall a command from the history (one that was issued in the Linux x86_64 version from mariadb.org and contains Chinese characters), and the recalled command works correctly.

Comment 1 Jan Staněk 2015-02-03 13:03:44 UTC

Hi, the trouble seems to originate from the fact that, unlike the upstream-provided binaries, Fedora's mariadb packages use libedit for input handling in the CLI client. You can verify that by invoking mysql -V; the output should end with "using EditLine wrapper" if libedit is in use. I also tried to input some Czech characters outside the ASCII range (č, š, etc.) and was not able to, so this issue is not limited to Chinese.

According to a quick Google search, this library has had issues handling UTF-8 input in the past. The official page [1], however, claims UTF-8 support if the library is compiled with the correct configure options (which the version included in Fedora is).
So this might be the fault of libedit itself, or mariadb may have wrapped it badly. Asking for input from Boris Ranto, the maintainer of libedit.

Boris, do you know if the current version of libedit truly supports UTF-8 input?

[1] http://thrysoee.dk/editline/
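
The `mysql -V` check described in this comment can be scripted. The following is a minimal Python sketch, not part of the original report: the function names are hypothetical, and the sample strings in the usage note are illustrative rather than captured from the reporter's machine.

```python
import shutil
import subprocess


def uses_editline(version_output: str) -> bool:
    """Return True if `mysql -V` output names the EditLine (libedit) wrapper."""
    # Collapse runs of whitespace so that spacing differences do not matter.
    normalized = " ".join(version_output.split()).lower()
    return "editline wrapper" in normalized


def client_version_output(binary: str = "mysql") -> str:
    """Run `<binary> -V` and return what it prints on standard output."""
    result = subprocess.run([binary, "-V"], capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Only query the client if it is actually installed on this machine.
    if shutil.which("mysql"):
        output = client_version_output()
        print(output.strip())
        print("libedit in use:", uses_editline(output))
    else:
        print("mysql client not found on PATH")
```

For example, `uses_editline("... using  EditLine wrapper")` returns True, while a string ending in "using readline 5.1" returns False.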

Comment 2 Boris Ranto 2015-02-26 15:59:35 UTC

Hi, it does support UTF-8, but the support is not as straightforward as you might expect. The source code needs to be modified in order to support UTF-8. Please see the file examples/wtc1.c provided in the dist tarball [1] for more details on how to benefit from the UTF-8 support.

[1] http://thrysoee.dk/editline/libedit-20141030-3.1.tar.gz

Comment 3 LiuYan 2015-03-09 07:18:27 UTC

I found this feature request for the gnuplot project through a search engine:
https://sourceforge.net/p/gnuplot/feature-requests/265/#112c
https://sourceforge.net/p/gnuplot/feature-requests/265/#c343

>> Feeding it a UTF8 string causes a segfault.
So far as I can make out, it does not even try to support UTF8. Instead it supports MS-style wide characters (16-bit fixed width), which is not very useful in a UTF8 environment. 

>> I see no mention of UTF encodings in the editline source code. I think there is a typo or misunderstanding in the description that you saw. It would in any case do us no good to convert UTF-8 on input to widechar, because the internal gnuplot code and the support libraries (at least on linux) do not use widechar. We really do want the UTF-8 byte sequence passed through without any conversion.

I'm not sure if this is useful, because the replies are dated 2010-09-09 (about 4.5 years ago). Maybe editline has added UTF-8 support since then; I don't know.

But if those replies still hold true, then maybe mariadb needs to be built against readline, as the official mariadb binaries are, to support UTF-8 characters.
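
For reference, the difference between a UTF-8 byte sequence and fixed-width 16-bit units, which the quoted replies discuss, can be seen directly. The sketch below uses only Python's standard codecs; the function name `encoded_sizes` is made up for this illustration.

```python
def encoded_sizes(text: str) -> dict:
    """Count code points, UTF-8 bytes, and UTF-16 code units for text."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        # Each UTF-16 code unit occupies two bytes.
        "utf16_units": len(text.encode("utf-16-le")) // 2,
    }


print(encoded_sizes("汉字"))
# prints: {'code_points': 2, 'utf8_bytes': 6, 'utf16_units': 2}
print("汉字".encode("utf-8").hex(" "))
# prints: e6 b1 89 e5 ad 97
```

Each of the two characters takes three bytes in UTF-8 but a single 16-bit unit in UTF-16, so code that reads input one byte at a time sees six separate values for this string.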

Comment 4 Honza Horak 2015-03-14 10:45:23 UTC

I think it could be fixed either directly in mariadb or in libedit; we just need to find the best way. Just reported: https://mariadb.atlassian.net/browse/MDEV-7777

Comment 5 Honza Horak 2015-03-14 19:39:54 UTC

Cross-reporting to MySQL upstream as well, since they have the same issue there:
bugs.mysql.com/bug.php?id=76324
