Bug 113231

Summary: [PostgreSQL] - ERROR: Invalid UNICODE character sequence found (0xc000)
Product: [Fedora] Fedora Reporter: Antonio <fedora>
Component: postgresqlAssignee: Tom Lane <tgl>
Status: CLOSED ERRATA QA Contact: David Lawrence <dkl>
Severity: medium Docs Contact:
Priority: medium    
Version: 1CC: hhorak
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2004-05-12 04:52:56 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Antonio 2004-01-10 01:56:48 UTC
Description of problem:

Unable to query a UNICODE database with LIKE 'z%'

Version-Release number of selected component (if applicable):

Linux Fedora Core 1.

uname -r returns:
2.4.22-1.2140.nptl

In psql \encoding reports me:
UNICODE

Using psql SELECT version() returns:

PostgreSQL 7.3.4-RH on i386-redhat-linux-gnu, compiled by GCC
i386-redhat-linux-gcc (GCC) 3.3.2 20031022 (Red Hat Linux 3.3.2-1)

How reproducible:

ALWAYS

Steps to Reproduce:

1.Create a Database using UNICODE encoding:
   createdb -E UNICODE test
2. Run psql:
   psql test
3. Create a test table:
  CREATE TABLE auth_role
  (
    rol_id  int4  not null,
    rol_name    varchar(50)   unique not null,
    rol_enable  boolean       default true,

    primary key(rol_id)
  );

INSERT INTO auth_role(rol_id, rol_name) VALUES (1, 'admin');
INSERT INTO auth_role(rol_id, rol_name) VALUES (2, 'zorro');

4. Execute this query:

SELECT * FROM auth_role WHERE rol_name LIKE 'z%';

or 

SELECT * FROM auth_role WHERE rol_name LIKE 'Z%';
 
Actual results:

ERROR:  Invalid UNICODE character sequence found (0xc000):

Expected results:

0 or more rows without error.

Additional info:

If you replace 'z%' with 'za%' or 'a%' you don't get the error.

Seems like the offending sequence is a 'z' char (upper or lower case)
followed by a '%' char.

I already contacted people at postgres and they said the problem is
only reproducible on Fedora. In other Linux distros even in RedHat9 it
is not reproducible at all.

I think we need to upgrade to postgres or recompile it.

Hope this help.

Comment 1 Antonio 2004-01-11 04:16:28 UTC
Here is the thread in PostgreSQL maillist that concluded the problem
is only with rpm fedora version:

http://archives.postgresql.org/pgsql-jdbc/2004-01/msg00060.php

Comment 2 Tom Lane 2004-02-01 03:39:01 UTC
I've spent quite a bit of time trying to reproduce this, without any
success.  Where exactly did you get the Postgres binaries you are
using?  Also, what locale are you using in the database
(pg_controldata can be used to check)?

Comment 3 Tom Lane 2004-02-02 03:10:19 UTC
Okay, further info from Paul Thomas helped me isolate the problem. 
The failure only occurs when using a multibyte encoding (such as Unicode)
and a non-C locale (probably not all non-C locales will show it).
Paul's example was en_GB.UTF-8.

I've committed a patch into the upstream CVS repository.  This is post
7.3.5 and it's unlikely that there will be any more 7.3.* community
releases, but possibly we can sneak it into Red Hat's SRPM as a patch.
(That's assuming Red Hat makes any more 7.3.* SRPMs; we may go to 7.4.)

Comment 4 Antonio 2004-02-05 01:28:24 UTC
I got the binaries from the Fedora distribution. As you pointed out
maybe the problem is related to locale. My locale is es_NI.UTF-8
pg_controldata returns:

pg_controldata /var/lib/pgsql/data
numero de version de pg_control:        72
numero de version de Catalog:           200211021
Database cluster state:               in production
ultima modificación de pg_control:      mié 04 feb 2004 17:59:59 CST
ID de archivo de log actual:            0
El siguiente segmento archivo log:      78
Latest checkpoint location:           0/4D5AE848
Prior checkpoint location:            0/4D5AC784
Latest checkpoint's REDO location:    0/4D5AE848
Latest checkpoint's UNDO location:    0/0
Latest checkpoint's StartUpID:        105
Latest checkpoint's NextXID:          2703308
Latest checkpoint's NextOID:          1285772
Time of latest checkpoint:            mié 04 feb 2004 17:59:57 CST
Tamaño de bloque de la base de datos:   8192
Bloques/segmento en relacion grande:    131072
Longitud maxima de los identificadores: 64
Numero maximo de argumentos de función: 32
Tipo de almacenamiento de Date/Time:    Punto flotante
Longitud maxima de nombre locales:      128
LC_COLLATE:                                             es_NI.UTF-8
LC_CTYPE:                               es_NI.UTF-8

If you need more info, please let know. I will be glad to help to
resolve this bug.

Comment 5 Tom Lane 2004-03-05 22:53:56 UTC
A fix slated for RHEL3 Update 2 is now in QA.

Comment 6 John Flanagan 2004-05-12 04:52:56 UTC
An errata has been issued which should help the problem described in this bug report. 
This report is therefore being closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, please follow the link below. You may reopen 
this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-116.html