Bug 1202022 - Allow input of non-ASCII UTF-8 characters in readline() function
Summary: Allow input of non-ASCII UTF-8 characters in readline() function
Alias: None
Product: Fedora
Classification: Fedora
Component: libedit
Version: 21
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Boris Ranto
QA Contact: Fedora Extras Quality Assurance
Depends On:
Blocks: 1180403 1187469 1197479 1460155
Reported: 2015-03-14 15:58 UTC by Honza Horak
Modified: 2017-06-09 09:19 UTC
CC List: 9 users

Fixed In Version: libedit-3.1-10.20141030cvs.fc21
Doc Type: Bug Fix
Doc Text:
Clone Of: 1187469
Clones: 1460155
Last Closed: 2015-03-21 04:53:28 UTC
Type: Bug

Attachments
patch that allows non-ascii characters in readline() (1003 bytes, patch)
2015-03-14 15:59 UTC, Honza Horak
patch that allows non-ascii characters in readline() (612 bytes, patch)
2015-03-15 09:43 UTC, Honza Horak
proposed patch to support a few basic key bindings (1.53 KB, patch)
2015-03-15 09:46 UTC, Honza Horak

Description Honza Horak 2015-03-14 15:58:57 UTC
There are actually two issues reported against mariadb in Fedora. Bug #1187469 is that non-ASCII characters don't work in interactive mode, and bug #1180403 is that backward search doesn't work. Both are caused by features missing from libedit's readline() function, in comparison to GNU readline's readline() :)

The problem is that the mysql binary uses only the readline() function from the libedit library, and this function offers only basic functionality. So the possible fix is either to use libedit's el_wgets() function and configure the backward search feature in the mysql binary, or (imho better, since it would help many more components) to fix libedit's readline() function so that it offers this by default (or as another function, but still with the simple usage that the readline library offers).
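
For readers unfamiliar with the backward search mentioned above: in GNU readline it is bound to Ctrl-R by default and looks backwards through the history for the most recent entry containing the typed text. The following is a minimal Python sketch of that lookup. It assumes a plain list of history lines, and the function name `reverse_search` is hypothetical. It is not libedit's or readline's implementation.

```python
from typing import Optional, Sequence


def reverse_search(history: Sequence[str], needle: str,
                   start: Optional[int] = None) -> Optional[int]:
    """Return the index of the newest history entry at or before `start`
    that contains `needle`, or None if nothing matches (hypothetical helper)."""
    if not history:
        return None
    if start is None or start >= len(history):
        start = len(history) - 1  # default: begin at the newest entry
    for i in range(start, -1, -1):  # walk from newer to older entries
        if needle in history[i]:
            return i
    return None


history = ["SELECT 1;", "SELECT '汉字';", "SHOW TABLES;"]
hit = reverse_search(history, "SEL")             # 1: newest match
older = reverse_search(history, "SEL", hit - 1)  # 0: pressing Ctrl-R again continues from the previous entry
print(history[hit], "|", history[older])
```

An interactive implementation would rerun this lookup on every keystroke as the search string grows. The sketch shows only the core backwards scan.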

+++ This bug was initially created as a clone of Bug #1187469 +++

Description of problem:
When I type or paste Chinese characters in the mysql client, the Chinese characters are not entered.

My `locale` result:

My `localectl` result:
       System Locale: LANG=zh_CN.UTF-8
           VC Keymap: us
          X11 Layout: us,cn
         X11 Variant: ,

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. run mysql client
2. copy & paste the following clauses in mysql client:
   SELECT 'Chinese characters <汉字> are stripped';
3. '汉字' is stripped out by the mysql client

Actual results:
MariaDB [(none)]> SELECT 'Chinese characters <> are stripped';
| Chinese characters <> are stripped |
| Chinese characters <> are stripped |

Expected results:
MariaDB [(none)]> SELECT 'Chinese characters <汉字> are stripped';
| Chinese characters <汉字> are stripped   |
| Chinese characters <汉字> are stripped   |
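
The stripping can be reproduced with a simplified model. This model assumes, for illustration only, that the line editor silently discards every character outside the ASCII range. That is an assumption about the visible behaviour, not libedit's actual code path. The helper `strip_non_ascii` is hypothetical.

```python
def strip_non_ascii(line: str) -> str:
    """Hypothetical model of the symptom: keep only characters U+0000..U+007F."""
    return "".join(ch for ch in line if ord(ch) < 0x80)


query = "SELECT 'Chinese characters <汉字> are stripped';"
print(strip_non_ascii(query))  # SELECT 'Chinese characters <> are stripped';
# Each of the two Chinese characters takes 3 bytes in UTF-8, so 6 bytes are lost.
print(len("汉字".encode("utf-8")))  # 6
```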

Additional info:
The binaries from mariadb.org work OK:
1. win32 version works ok
2. Linux x86_64 works ok
3. Linux glibc_2.14+ x86_64 works ok

A user in the #mariadb IRC channel also reports that it works on Ubuntu.

The 5.5.x version in Fedora 20 also works.

However, if I recall a command from history (one that was entered with the Linux x86_64 build from mariadb.org and contains Chinese characters), that history command works OK.

--- Additional comment from Jan Staněk on 2015-02-03 08:03:44 EST ---

Hi, the trouble seems to originate from the fact that, unlike the upstream-provided binaries, Fedora's mariadb packages use libedit for CLI client input handling. You can verify this by running mysql -V: if libedit is used, the output should end with "using EditLine wrapper". I also tried to input some Czech characters outside the ASCII range (č, š, etc.) and was not able to input them, so this issue is not limited to Chinese.
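
The following is a small helper for this check. It is a sketch that assumes a `mysql` binary on PATH and only inspects the version string. The real output quoted later in this report contains two spaces (`using  EditLine wrapper`), so the sketch matches only on the `EditLine wrapper` suffix. The function name `uses_libedit` is hypothetical.

```python
import shutil
import subprocess


def uses_libedit(version_output: str) -> bool:
    """Return True if `mysql -V` output says the client uses libedit.

    Matches only the "EditLine wrapper" suffix because the real output may
    contain extra whitespace, e.g. "using  EditLine wrapper".
    """
    return version_output.rstrip().endswith("EditLine wrapper")


if __name__ == "__main__":
    if shutil.which("mysql"):  # only run the check when a client is installed
        out = subprocess.run(["mysql", "-V"], capture_output=True, text=True).stdout
        print("libedit" if uses_libedit(out) else "not libedit")
```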

According to a quick Google search, this library has had issues handling UTF-8 input in the past. The official page [1], however, claims UTF-8 support if compiled with the correct configure options (which the version included in Fedora is).
So this might be the fault of either libedit itself, or it could be that mariadb has wrapped it badly. Asking for input from Boris Ranto, maintainer of libedit.

Boris, do you know if current version of libedit truly supports UTF-8 input?

[1] http://thrysoee.dk/editline/

--- Additional comment from Boris Ranto on 2015-02-26 10:59:35 EST ---

Hi, it does support UTF-8, but the support is not as straightforward as you might expect. The calling application's source code needs to be modified in order to use it. Please see the file examples/wtc1.c provided in the dist tarball [1] for more details on how to benefit from the UTF-8 support.

[1] http://thrysoee.dk/editline/libedit-20141030-3.1.tar.gz

--- Additional comment from LiuYan on 2015-03-09 03:18:27 EDT ---

I found this feature request for the gnuplot project via a search engine:

>> Feeding it a UTF8 string causes a segfault.
So far as I can make out, it does not even try to support UTF8. Instead it supports MS-style wide characters (16-bit fixed width), which is not very useful in a UTF8 environment. 

>> I see no mention of UTF encodings in the editline source code. I think there is a typo or misunderstanding in the description that you saw. It would in any case do us no good to convert UTF-8 on input to widechar, because the internal gnuplot code and the support libraries (at least on linux) do not use widechar. We really do want the UTF-8 byte sequence passed through without any conversion.

I'm not sure if it's useful, because the reply is dated 2010-09-09 (about 4.5 years ago). Maybe editline has added UTF-8 support since then; I don't know.

But if those replies are still true, then maybe mariadb needs to be built against readline, as the official mariadb binaries are, to support UTF-8 characters.
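
The concern in the quoted gnuplot thread is that converting input to wide characters would alter the UTF-8 byte sequence. The following sketch is not gnuplot's or libedit's code. It shows that, for valid UTF-8, decoding to one code point per character and encoding back yields identical bytes. An API that works on wide characters internally and converts back to UTF-8 for the caller therefore need not change what the application receives. On Linux with glibc, wchar_t is 32 bits wide, so every Unicode code point fits in a single wchar_t, unlike the 16-bit MS-style wide characters described in the quote. The helper name `roundtrip_utf8` is hypothetical.

```python
def roundtrip_utf8(data: bytes) -> bytes:
    """Decode UTF-8 into a list of code points (one per character, like a
    32-bit wchar_t string) and encode it back to UTF-8 (hypothetical helper)."""
    code_points = [ord(ch) for ch in data.decode("utf-8")]
    return "".join(chr(cp) for cp in code_points).encode("utf-8")


sample = "SELECT '汉字', 'č', 'š';".encode("utf-8")
print(roundtrip_utf8(sample) == sample)  # True: byte-for-byte identical
# 2 characters become 2 code points but 6 UTF-8 bytes
print(len("汉字"), len("汉字".encode("utf-8")))  # 2 6
```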

--- Additional comment from Honza Horak on 2015-03-14 06:45:23 EDT ---

I think it may be fixed directly in mariadb or in libedit, we just need to find the best way. Just reported: https://mariadb.atlassian.net/browse/MDEV-7777

Comment 1 Honza Horak 2015-03-14 15:59:55 UTC
Created attachment 1001719 [details]
patch that allows non-ascii characters in readline()

Comment 2 Honza Horak 2015-03-15 09:43:17 UTC
Created attachment 1001894 [details]
patch that allows non-ascii characters in readline()

I've found that the upstream tarball of the community-mysql package bundles a libedit that actually supports non-ASCII characters and also supports backward search and other key bindings.

The attached patch does the trick: it handles wide characters if the input encoding is UTF-8, and it seems to be a more correct patch than the previous one. By the way, this patch is proven to work, since it has been used in the mysql tarball for some time already.
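
The idea of handling wide characters only when the input is UTF-8 can be modeled as follows. This is a hypothetical Python sketch, not the patch itself, which changes libedit's C code. `read_line` and its `charset_is_utf8` parameter are invented for illustration.

In the model, bytes arrive one at a time, as from a terminal. In UTF-8 mode an incremental decoder assembles multi-byte sequences into single characters. In the other mode non-ASCII bytes are dropped, which reproduces the reported symptom.

```python
import codecs
from typing import Iterable


def read_line(byte_stream: Iterable[int], charset_is_utf8: bool) -> str:
    """Hypothetical model of a line reader that receives input one byte at a time.

    In UTF-8 mode, bytes go through an incremental decoder, so a multi-byte
    sequence (e.g. the 3 bytes encoding '汉') becomes a single character.
    Otherwise every byte >= 0x80 is ignored, reproducing the reported symptom.
    """
    # errors="replace" turns malformed input into U+FFFD instead of raising
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    chars = []
    for b in byte_stream:
        if b == 0x0A:  # a newline byte ends the line
            break
        if charset_is_utf8:
            # Returns "" until a complete UTF-8 sequence has been seen
            chars.append(decoder.decode(bytes([b])))
        elif b < 0x80:
            chars.append(chr(b))
    if charset_is_utf8:
        # Flush: a truncated sequence at the end becomes U+FFFD
        chars.append(decoder.decode(b"", final=True))
    return "".join(chars)


line = "SELECT '汉字';\n".encode("utf-8")
print(read_line(line, charset_is_utf8=True))   # SELECT '汉字';
print(read_line(line, charset_is_utf8=False))  # SELECT '';
```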

Comment 3 Honza Horak 2015-03-15 09:46:54 UTC
Created attachment 1001895 [details]
proposed patch to support a few basic key bindings

This is also part of the libedit bundled with mysql; among other things, it enables backward search in readline().

Comment 4 Boris Ranto 2015-03-17 14:53:05 UTC
The first patch looks perfectly reasonable, and it looks like it really fixes a bug in the libedit sources (the function el_getc already did that sort of thing; I'm not sure why el_gets did not), so I'll go ahead and apply it.

The other patch looks a bit more convoluted and I'm not sure whether this is a thing that should be handled in libedit or mariadb. I'll contact libedit upstream to get their opinion on this matter.

Comment 5 Fedora Update System 2015-03-18 18:13:03 UTC
libedit-3.1-10.20141030cvs.fc22 has been submitted as an update for Fedora 22.

Comment 6 Fedora Update System 2015-03-18 18:13:12 UTC
libedit-3.1-10.20141030cvs.fc21 has been submitted as an update for Fedora 21.

Comment 7 LiuYan 2015-03-19 03:40:05 UTC
Looks like it fixed the mariadb/mysql UTF-8 character input issue (although I don't know why it does not work for the Asterisk package; maybe that's another story).

# mysql -V
mysql  Ver 15.1 Distrib 10.0.17-MariaDB, for Linux (x86_64) using  EditLine wrapper

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 8
Server version: 10.0.17-MariaDB MariaDB Server

Copyright (c) 2000, 2015, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> select '汉字';
| 汉字   |
| 汉字   |
1 row in set (0.01 sec)

Comment 8 Fedora Update System 2015-03-19 18:38:09 UTC
Package libedit-3.1-10.20141030cvs.fc21:
* should fix your issue,
* was pushed to the Fedora 21 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing libedit-3.1-10.20141030cvs.fc21'
as soon as you are able to.
Please go to the following url:
then log in and leave karma (feedback).

Comment 9 Fedora Update System 2015-03-21 04:53:28 UTC
libedit-3.1-10.20141030cvs.fc22 has been pushed to the Fedora 22 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 10 Fedora Update System 2015-03-21 04:53:56 UTC
libedit-3.1-10.20141030cvs.fc21 has been pushed to the Fedora 21 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 11 Boris Ranto 2015-03-25 11:05:45 UTC
BTW: good news, both patches were accepted upstream, so the next rebase will also contain the key bindings patch.

Comment 12 Boris Ranto 2015-03-26 11:30:53 UTC
I've created the update requests for the rebased packages that contain the keyboard bindings patch:


Please test the updated packages and upvote them if you want to see them in stable ASAP.

Comment 13 LiuYan 2015-03-27 08:00:56 UTC
Karma +1
