Bug 788091 - message charset conversion not working
Summary: message charset conversion not working
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: mysql
Version: 6.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Tom Lane
QA Contact: qe-baseos-daemons
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-02-07 13:17 UTC by Karel Volný
Modified: 2012-02-14 17:38 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-02-14 17:38:30 UTC
Target Upstream Version:



Description Karel Volný 2012-02-07 13:17:53 UTC
Description of problem:
When mysqld is started with a non-latin1 message language, the character set conversion of its error messages does not work.

Version-Release number of selected component (if applicable):
mysql-server-5.1.61-1.el6_2.1

How reproducible:
always

Steps to Reproduce:
1. start mysqld with "--language=czech --character-set-server=utf8"
2. mysql --default-character-set=utf8
3. mysql> select bla;
  
Actual results:
ERROR 1054 (42S22): NeznB�m� sloupec 'bla' v field list

Expected results:
ERROR 1054 (42S22): Neznámý sloupec 'bla' ...

("v field list" is also incorrect, but that is a separate issue.)

Additional info:
mysql> \s
--------------
mysql  Ver 14.14 Distrib 5.1.61, for redhat-linux-gnu (i386) using readline 5.1

Connection id:          2
Current database:
Current user:           root@localhost
SSL:                    Not in use
Current pager:          stdout
Using outfile:          ''
Using delimiter:        ;
Server version:         5.1.61 Source distribution
Protocol version:       10
Connection:             Localhost via UNIX socket
Server characterset:    utf8
Db     characterset:    utf8
Client characterset:    utf8
Conn.  characterset:    utf8
UNIX socket:            /var/lib/mysql/mysql.sock
Uptime:                 2 min 4 sec

Threads: 1  Questions: 7  Slow queries: 0  Opens: 15  Flush tables: 1  Open tables: 8  Queries per second avg: 0.56
--------------

Comment 1 Tom Lane 2012-02-07 16:01:16 UTC
This looks like your terminal window is set to use some encoding other than what you told mysql to use (ie, utf8).

Comment 2 Karel Volný 2012-02-08 09:49:08 UTC
(In reply to comment #1)
> This looks like your terminal window is set to use some encoding other than
> what you told mysql to use (ie, utf8).

My locale settings are forwarded from my local machine:

.qa.[root@x86-64-6s-m1 tps]# locale
LANG=cs_CZ.UTF-8
LC_CTYPE="cs_CZ.UTF-8"
LC_NUMERIC="cs_CZ.UTF-8"
LC_TIME="cs_CZ.UTF-8"
LC_COLLATE="cs_CZ.UTF-8"
LC_MONETARY="cs_CZ.UTF-8"
LC_MESSAGES="cs_CZ.UTF-8"
LC_PAPER="cs_CZ.UTF-8"
LC_NAME="cs_CZ.UTF-8"
LC_ADDRESS="cs_CZ.UTF-8"
LC_TELEPHONE="cs_CZ.UTF-8"
LC_MEASUREMENT="cs_CZ.UTF-8"
LC_IDENTIFICATION="cs_CZ.UTF-8"
LC_ALL=

And it seems to work:

.qa.[root@x86-64-6s-m1 tps]# echo -e "\x50\xC5\x99\xC3\xAD\x6C\x69\xC5\xA1\x20\xC5\xBE\x6C\x75\xC5\xA5\x6F\x75\xC4\x8D\x6B\xC3\xBD\x20\x6B\xC5\xAF\xC5\x88\x20\xC3\xBA\x70\xC4\x9B\x6C\x20\xC4\x8F\xC3\xA1\x62\x65\x6C\x73\x6B\xC3\xA9\x20\x6B\xC3\xB3\x64\x79\x2E"
Příliš žluťoučký kůň úpěl ďábelské kódy.

In addition, connecting to mysql without a charset specified leads to the same output:

.qa.[root@x86-64-6s-m1 tps]# mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.1.61 Source distribution

Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> \s
--------------
mysql  Ver 14.14 Distrib 5.1.61, for redhat-linux-gnu (x86_64) using readline 5.1

Connection id:          2
Current database:
Current user:           root@localhost
SSL:                    Not in use
Current pager:          stdout
Using outfile:          ''
Using delimiter:        ;
Server version:         5.1.61 Source distribution
Protocol version:       10
Connection:             Localhost via UNIX socket
Server characterset:    utf8
Db     characterset:    utf8
Client characterset:    latin1
Conn.  characterset:    latin1
UNIX socket:            /var/lib/mysql/mysql.sock
Uptime:                 17 min 7 sec

Threads: 1  Questions: 5  Slow queries: 0  Opens: 15  Flush tables: 1  Open tables: 8  Queries per second avg: 0.4
--------------

mysql> select bla;
ERROR 1054 (42S22): NeznB�m� sloupec 'bla' v field list

Comment 3 Karel Volný 2012-02-10 00:01:02 UTC
Just for the record, the same happens in RHEL 5 with mysql-5.0.95-1.el5_7.1.
(I am not cloning this yet; if it doesn't get resolved in RHEL6, I doubt there would be any chance of getting it fixed in RHEL5 ...)

Comment 4 Honza Horak 2012-02-13 16:16:48 UTC
I've reproduced it with default configuration after a fresh install in RHEL-6. What's more, I've tried many combinations, but haven't found a working configuration. 

It seems to be the same as an upstream bug report [1], which has been fixed in mysql-5.4. I tried it in mysql-5.5.20, which is currently in all maintained Fedora releases, and apart from [2] it works fine there.

Also, the 5.1 documentation mentions [3] possible issues with error message encoding and points users to the current mysql-5.5, where this is fixed:
"The preceding method of error-message construction can result in messages that contain a mix of character sets unless all items involved contain only ASCII characters. This issue is resolved in MySQL 5.5, in which error messages are constructed internally within the server using UTF-8 and returned to the client in the character set specified by the character_set_results system variable." [3]

Unfortunately, I haven't found a patch that could be easily applied. It looks like a more complicated issue that probably won't be fixed by upstream in 5.1 any more :(

[1] http://bugs.mysql.com/bug.php?id=1406
[2] http://bugs.mysql.com/bug.php?id=64310
[3] http://dev.mysql.com/doc/refman/5.1/en/charset-errors.html
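
As a rough illustration of the symptom discussed in this comment, the following Python sketch shows what happens when message bytes in a single-byte encoding are displayed by a UTF-8 terminal without conversion. It assumes, which this report does not state, that the Czech message text is stored as ISO-8859-2 (latin2). The result is similar in kind to the reported output but not necessarily byte-for-byte identical to it.

```python
# Assumption (not confirmed in this report): the Czech message text is
# stored in ISO-8859-2 and reaches the client without being converted.
message = "Neznámý sloupec 'bla'"

# Bytes as they would look in the assumed single-byte encoding.
raw = message.encode("iso8859_2")

# A UTF-8 terminal cannot interpret the accented bytes and shows
# replacement characters in their place.
shown = raw.decode("utf-8", errors="replace")
print(shown)

# Converting from the source encoding before display gives readable text,
# which is conceptually what the 5.5 behaviour quoted above provides.
converted = raw.decode("iso8859_2").encode("utf-8")
print(converted.decode("utf-8"))
```

The pure-ASCII part of the message survives either way, which matches the documentation's remark that the problem only appears when non-ASCII characters are involved.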

Comment 5 Tom Lane 2012-02-13 16:34:18 UTC
I'm inclined to consider this a WONTFIX.  Even if we could extract a reasonably-sized patch from mysql 5.5, I would be hesitant to apply it because it would amount to a significant behavioral change, which is exactly the kind of thing our users don't want in a stable RHEL release.  It's not hard to imagine that there are apps out there that are looking at error message texts and will be broken by a change that affects their encoding, even if the new behavior is "more correct".
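
For applications of the kind mentioned here, one way to avoid depending on the message text or its encoding is to match on the numeric error code and SQLSTATE, which are plain ASCII. The Python sketch below is only an illustration under that assumption. The helper name parse_mysql_error is hypothetical, and the sample line uses made-up non-UTF-8 bytes in place of the accented characters.

```python
import re

# Matches the "ERROR <errno> (<sqlstate>):" prefix printed by the mysql client.
# The pattern works on bytes, so the encoding of the message text is irrelevant.
ERROR_PREFIX = re.compile(rb"^ERROR (\d+) \(([0-9A-Z]{5})\):")


def parse_mysql_error(line: bytes):
    """Return (errno, sqlstate) from a client error line, or None if absent."""
    match = ERROR_PREFIX.match(line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2).decode("ascii")


# Sample line whose message part is not valid UTF-8 (illustrative bytes).
garbled = b"ERROR 1054 (42S22): Nezn\xe1m\xfd sloupec 'bla' v field list"
print(parse_mysql_error(garbled))
```

Error 1054 with SQLSTATE 42S22 identifies the unknown-column condition regardless of which language the server uses for the message text.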

