Bug 788091

Summary: message charset conversion not working
Product: Red Hat Enterprise Linux 6
Reporter: Karel Volný <kvolny>
Component: mysql
Assignee: Tom Lane <tgl>
Status: CLOSED WONTFIX
QA Contact: qe-baseos-daemons
Severity: unspecified
Priority: unspecified
Version: 6.2
CC: byte, hhorak
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2012-02-14 17:38:30 UTC

Description Karel Volný 2012-02-07 13:17:53 UTC
Description of problem:
When mysqld uses a message language whose text is not latin1 (for example Czech), error messages are not converted to the client's character set.

Version-Release number of selected component (if applicable):
mysql-server-5.1.61-1.el6_2.1

How reproducible:
always

Steps to Reproduce:
1. start mysqld with "--language=czech --character-set-server=utf8"
2. mysql --default-character-set=utf8
3. mysql> select bla;
  
Actual results:
ERROR 1054 (42S22): NeznB�m� sloupec 'bla' v field list

Expected results:
ERROR 1054 (42S22): Neznámý sloupec 'bla' ...

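("Neznámý sloupec 'bla' v field list" is Czech for "Unknown column 'bla' in field list". The half-translated "v field list" is also incorrect, but that is a separate issue.)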
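A minimal Python sketch of one way this kind of garbling arises. It assumes, without confirmation from this report, that the Czech message text is stored in ISO-8859-2 (latin2) and is sent unconverted to a client whose terminal decodes UTF-8. It illustrates the mechanism only and does not claim to reproduce the exact bytes in the actual result above.

```python
# Assumption (not confirmed here): the server keeps the Czech message in
# ISO-8859-2 (latin2) and forwards those bytes without converting them.
expected = "Neznámý sloupec 'bla'"
sent_bytes = expected.encode("iso8859_2")  # "á" -> 0xE1, "ý" -> 0xFD

# A UTF-8 terminal cannot decode the single latin2 bytes for "á" and "ý",
# so each one is rendered as the replacement character U+FFFD.
shown = sent_bytes.decode("utf-8", errors="replace")
# shown == "Nezn\ufffdm\ufffd sloupec 'bla'"

# Converting from the source character set recovers the intended text,
# which is what a converting server would do before sending it.
converted = sent_bytes.decode("iso8859_2")
```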

Additional info:
mysql> \s
--------------
mysql  Ver 14.14 Distrib 5.1.61, for redhat-linux-gnu (i386) using readline 5.1

Connection id:          2
Current database:
Current user:           root@localhost
SSL:                    Not in use
Current pager:          stdout
Using outfile:          ''
Using delimiter:        ;
Server version:         5.1.61 Source distribution
Protocol version:       10
Connection:             Localhost via UNIX socket
Server characterset:    utf8
Db     characterset:    utf8
Client characterset:    utf8
Conn.  characterset:    utf8
UNIX socket:            /var/lib/mysql/mysql.sock
Uptime:                 2 min 4 sec

Threads: 1  Questions: 7  Slow queries: 0  Opens: 15  Flush tables: 1  Open tables: 8  Queries per second avg: 0.56
--------------

Comment 1 Tom Lane 2012-02-07 16:01:16 UTC
This looks like your terminal window is set to use some encoding other than what you told mysql to use (ie, utf8).

Comment 2 Karel Volný 2012-02-08 09:49:08 UTC
(In reply to comment #1)
> This looks like your terminal window is set to use some encoding other than
> what you told mysql to use (ie, utf8).

My locale settings are forwarded from my local machine:

.qa.[root@x86-64-6s-m1 tps]# locale
LANG=cs_CZ.UTF-8
LC_CTYPE="cs_CZ.UTF-8"
LC_NUMERIC="cs_CZ.UTF-8"
LC_TIME="cs_CZ.UTF-8"
LC_COLLATE="cs_CZ.UTF-8"
LC_MONETARY="cs_CZ.UTF-8"
LC_MESSAGES="cs_CZ.UTF-8"
LC_PAPER="cs_CZ.UTF-8"
LC_NAME="cs_CZ.UTF-8"
LC_ADDRESS="cs_CZ.UTF-8"
LC_TELEPHONE="cs_CZ.UTF-8"
LC_MEASUREMENT="cs_CZ.UTF-8"
LC_IDENTIFICATION="cs_CZ.UTF-8"
LC_ALL=

UTF-8 output also displays correctly in this terminal (the test string is a Czech pangram):

.qa.[root@x86-64-6s-m1 tps]# echo -e "\x50\xC5\x99\xC3\xAD\x6C\x69\xC5\xA1\x20\xC5\xBE\x6C\x75\xC5\xA5\x6F\x75\xC4\x8D\x6B\xC3\xBD\x20\x6B\xC5\xAF\xC5\x88\x20\xC3\xBA\x70\xC4\x9B\x6C\x20\xC4\x8F\xC3\xA1\x62\x65\x6C\x73\x6B\xC3\xA9\x20\x6B\xC3\xB3\x64\x79\x2E"
Příliš žluťoučký kůň úpěl ďábelské kódy.

In addition, connecting to mysql without specifying a character set produces the same output:

.qa.[root@x86-64-6s-m1 tps]# mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.1.61 Source distribution

Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> \s
--------------
mysql  Ver 14.14 Distrib 5.1.61, for redhat-linux-gnu (x86_64) using readline 5.1

Connection id:          2
Current database:
Current user:           root@localhost
SSL:                    Not in use
Current pager:          stdout
Using outfile:          ''
Using delimiter:        ;
Server version:         5.1.61 Source distribution
Protocol version:       10
Connection:             Localhost via UNIX socket
Server characterset:    utf8
Db     characterset:    utf8
Client characterset:    latin1
Conn.  characterset:    latin1
UNIX socket:            /var/lib/mysql/mysql.sock
Uptime:                 17 min 7 sec

Threads: 1  Questions: 5  Slow queries: 0  Opens: 15  Flush tables: 1  Open tables: 8  Queries per second avg: 0.4
--------------

mysql> select bla;
ERROR 1054 (42S22): NeznB�m� sloupec 'bla' v field list

Comment 3 Karel Volný 2012-02-10 00:01:02 UTC
Just for the record, the same happens in RHEL 5 with mysql-5.0.95-1.el5_7.1.
(Not cloning yet: if this doesn't get resolved in RHEL 6, I doubt there is any chance of getting it into RHEL 5.)

Comment 4 Honza Horak 2012-02-13 16:16:48 UTC
I've reproduced it with the default configuration on a fresh RHEL-6 install. I also tried many combinations of settings, but haven't found one that works.

It seems to be the same as upstream bug report [1], which was fixed in mysql-5.4. I tried it with mysql-5.5.20, which currently ships in all maintained Fedora releases, and apart from [2] it works fine there.

Also, the 5.1 documentation [3] mentions possible issues with error-message encoding and points users to MySQL 5.5, where the issue is fixed:
"The preceding method of error-message construction can result in messages that contain a mix of character sets unless all items involved contain only ASCII characters. This issue is resolved in MySQL 5.5, in which error messages are constructed internally within the server using UTF-8 and returned to the client in the character set specified by the character_set_results system variable." [3]
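The mixing problem described in the quote, and the 5.5-style approach, can be modeled in a few lines of Python. This is a toy model, not MySQL's implementation: the template text, the assumption that it is stored in latin2, and the helper names `build_51_style` and `build_55_style` are hypothetical.

```python
# Toy model only; not MySQL source code.
# Assumption: the Czech template for error 1054 is stored in latin2.
TEMPLATE = "Neznámý sloupec '%s' v %s"
CODECS = {"utf8": "utf-8", "latin1": "latin-1", "latin2": "iso8859_2"}


def build_51_style(identifier: bytes, context: bytes) -> bytes:
    """Hypothetical 5.1-like behavior: splice raw bytes from different
    character sets (latin2 template, client-encoded identifier) unconverted."""
    parts = TEMPLATE.encode(CODECS["latin2"]).split(b"%s")
    return parts[0] + identifier + parts[1] + context + parts[2]


def build_55_style(identifier: str, context: str, character_set_results: str) -> bytes:
    """Hypothetical 5.5-like behavior: build the message as Unicode text,
    then encode it once in the character set the client asked for."""
    text = TEMPLATE % (identifier, context)
    return text.encode(CODECS[character_set_results])


# A utf8 client refers to a column whose name is not ASCII.
mixed = build_51_style("čaj".encode("utf-8"), b"field list")
clean = build_55_style("čaj", "field list", "utf8")
```

With an ASCII-only identifier such as `bla` the mix is invisible in the identifier itself; a non-ASCII identifier leaves no single character set in which the whole 5.1-style message reads correctly, which is the situation the documentation warns about.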

Unfortunately, I haven't found a patch that could be applied easily. This looks like a more complicated issue, which upstream will probably not fix in 5.1 anymore :(

[1] http://bugs.mysql.com/bug.php?id=1406
[2] http://bugs.mysql.com/bug.php?id=64310
[3] http://dev.mysql.com/doc/refman/5.1/en/charset-errors.html

Comment 5 Tom Lane 2012-02-13 16:34:18 UTC
I'm inclined to consider this a WONTFIX.  Even if we could extract a reasonably-sized patch from mysql 5.5, I would be hesitant to apply it because it would amount to a significant behavioral change, which is exactly the kind of thing our users don't want in a stable RHEL release.  It's not hard to imagine that there are apps out there that are looking at error message texts and will be broken by a change that affects their encoding, even if the new behavior is "more correct".
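The compatibility risk can be illustrated with a hypothetical client that matches on the exact bytes of the error text. This sketch assumes the current server sends the Czech text in latin2 and a fixed server would send it in utf8; the function name `looks_like_unknown_column` is invented for illustration.

```python
# Hypothetical application code that matches on the raw bytes of the error
# text it has always received (assumed here to be latin2-encoded Czech).
LEGACY_PATTERN = "Neznámý sloupec".encode("iso8859_2")


def looks_like_unknown_column(raw_message: bytes) -> bool:
    return LEGACY_PATTERN in raw_message


text = "Neznámý sloupec 'bla' v field list"
before_fix = text.encode("iso8859_2")  # unconverted bytes, as today (assumed)
after_fix = text.encode("utf-8")       # converted for a utf8 connection

# The check matches the old bytes but silently stops matching the new ones,
# even though the new message is the "more correct" one.
```

Matching on the error number (1054) or SQLSTATE (42S22) instead of the message text would not be affected by an encoding change, but existing applications cannot be assumed to do that.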