Description of problem: Unable to query a UNICODE database with LIKE 'z%' Version-Release number of selected component (if applicable): Linux Fedora Core 1. uname -r returns: 2.4.22-1.2140.nptl In psql \encoding reports me: UNICODE Using psql SELECT version() returns: PostgreSQL 7.3.4-RH on i386-redhat-linux-gnu, compiled by GCC i386-redhat-linux-gcc (GCC) 3.3.2 20031022 (Red Hat Linux 3.3.2-1) How reproducible: ALWAYS Steps to Reproduce: 1.Create a Database using UNICODE encoding: createdb -E UNICODE test 2. Run psql: psql test 3. Create a test table: CREATE TABLE auth_role ( rol_id int4 not null, rol_name varchar(50) unique not null, rol_enable boolean default true, primary key(rol_id) ); INSERT INTO auth_role(rol_id, rol_name) VALUES (1, 'admin'); INSERT INTO auth_role(rol_id, rol_name) VALUES (2, 'zorro'); 4. Execute this query: SELECT * FROM auth_role WHERE rol_name LIKE 'z%'; or SELECT * FROM auth_role WHERE rol_name LIKE 'Z%'; Actual results: ERROR: Invalid UNICODE character sequence found (0xc000): Expected results: 0 or more rows without error. Additional info: If you replace 'z%' with 'za%' or 'a%' you don't get the error. Seems like the offending sequence is a 'z' char (upper or lower case) followed by a '%' char. I already contacted people at postgres and they said the problem is only reproducible on Fedora. In other Linux distros even in RedHat9 it is not reproducible at all. I think we need to upgrade to postgres or recompile it. Hope this help.
Here is the thread in PostgreSQL maillist that concluded the problem is only with rpm fedora version: http://archives.postgresql.org/pgsql-jdbc/2004-01/msg00060.php
I've spent quite a bit of time trying to reproduce this, without any success. Where exactly did you get the Postgres binaries you are using? Also, what locale are you using in the database (pg_controldata can be used to check)?
Okay, further info from Paul Thomas helped me isolate the problem. The failure only occurs when using a multibyte encoding (such as Unicode) and a non-C locale (probably not all non-C locales will show it). Paul's example was en_GB.UTF-8. I've committed a patch into the upstream CVS repository. This is post 7.3.5 and it's unlikely that there will be any more 7.3.* community releases, but possibly we can sneak it into Red Hat's SRPM as a patch. (That's assuming Red Hat makes any more 7.3.* SRPMs; we may go to 7.4.)
I got the binaries from the Fedora distribution. As you pointed out maybe the problem is related to locale. My locale is es_NI.UTF-8 pg_controldata returns: pg_controldata /var/lib/pgsql/data numero de version de pg_control: 72 numero de version de Catalog: 200211021 Database cluster state: in production ultima modificación de pg_control: mié 04 feb 2004 17:59:59 CST ID de archivo de log actual: 0 El siguiente segmento archivo log: 78 Latest checkpoint location: 0/4D5AE848 Prior checkpoint location: 0/4D5AC784 Latest checkpoint's REDO location: 0/4D5AE848 Latest checkpoint's UNDO location: 0/0 Latest checkpoint's StartUpID: 105 Latest checkpoint's NextXID: 2703308 Latest checkpoint's NextOID: 1285772 Time of latest checkpoint: mié 04 feb 2004 17:59:57 CST Tamaño de bloque de la base de datos: 8192 Bloques/segmento en relacion grande: 131072 Longitud maxima de los identificadores: 64 Numero maximo de argumentos de función: 32 Tipo de almacenamiento de Date/Time: Punto flotante Longitud maxima de nombre locales: 128 LC_COLLATE: es_NI.UTF-8 LC_CTYPE: es_NI.UTF-8 If you need more info, please let know. I will be glad to help to resolve this bug.
A fix slated for RHEL3 Update 2 is now in QA.
An errata has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2004-116.html