Bug 457064

Summary: pcre is configured with no support for Unicode properties
Product: Red Hat Enterprise Linux 5
Reporter: orensol
Component: pcre
Assignee: Petr Pisar <ppisar>
Status: CLOSED ERRATA
QA Contact: Ondrej Moriš <omoris>
Severity: high
Priority: high
Version: 5.5
CC: alain.portal, artms, ddevaraj, fumiyas, gerwinkrist, heil, jon, jorton, jwest, kwizart, omoris, ovasik, philip.r.schaffner, pm-rhel, rchidamb, redhat-bugzilla, robert.scheck, rvokal, tao, tis
Target Milestone: rc
Keywords: FutureFeature, Triaged
Hardware: All
OS: Linux
Fixed In Version: pcre-6.6-6.el5
Doc Type: Enhancement
Doc Text:
Unicode properties have been enabled to support \p{..}, \P{..}, and \X escape sequences.
Last Closed: 2011-01-13 22:09:20 UTC
Bug Blocks: 502912, 554476, 577088
Attachments:
Fix for looping on pattern compilation in non-UTF-8 (flags: none)
Test case for the loop problem (flags: none)

Description orensol 2008-07-29 14:37:04 UTC
Description of problem:
The pcre package is built with UTF-8 support, but without Unicode properties
support.

Version-Release number of selected component (if applicable):
6.6-2

How reproducible:
always

Steps to Reproduce:
1. Install the pcre package from the repository.
2. Run pcretest -C.

Actual results:
> pcretest -C
PCRE version 6.6 06-Feb-2006
Compiled with
  UTF-8 support
  No Unicode properties support
  Newline character is LF
  Internal link size = 2
  POSIX malloc threshold = 10
  Default match limit = 10000000
  Default recursion depth limit = 10000000
  Match recursion uses stack


Expected results:
PCRE version 6.6 06-Feb-2006
Compiled with
  UTF-8 support
  Unicode properties support
  Newline character is LF
  Internal link size = 2
  POSIX malloc threshold = 10
  Default match limit = 10000000
  Default recursion depth limit = 10000000
  Match recursion uses stack

Additional info:
Add the configure option --enable-unicode-properties when building the package.
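
A quick way to check any given build for this is sketched below. The helper name has_ucp is made up for illustration; it only inspects the text that pcretest -C prints, in the format shown in the results above. The configure line in the comment uses the upstream PCRE options and is an assumption about how a rebuild would look, not a copy of the RHEL spec file.

```shell
#!/bin/sh
# has_ucp (hypothetical helper): reads `pcretest -C` output on stdin and
# succeeds only if a line reads "Unicode properties support". The line
# "No Unicode properties support" starts with "No", so it does not match.
has_ucp() {
    grep -q '^[[:space:]]*Unicode properties support'
}

# Typical use on a machine with pcretest installed:
#   pcretest -C | has_ucp && echo "UCP enabled" || echo "UCP missing"
#
# When rebuilding from source, the upstream configure options would be
# (assumption: plain upstream build, not the distribution spec file):
#   ./configure --enable-utf8 --enable-unicode-properties
```

The function exits with grep's status, so it can be used directly in an if statement or a packaging sanity check.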

Comment 1 Gerwin Krist 2008-08-29 08:30:02 UTC
Same problem here. We badly need it for the Zend Framework on sites with non-Latin character sets.

Comment 2 Thomas Heil 2008-11-23 17:06:24 UTC
Is this about to be fixed, so that an overlay package repository would no longer be needed?

Comment 3 Gerwin Krist 2008-11-23 19:38:46 UTC
No, still not fixed. RH has marked it as a feature request (is it?!?).

Comment 4 Robert Scheck 2009-01-08 09:17:24 UTC
We have exactly the same problem here: in order to use the search engine
delivered with TYPOlight webCMS (PHP), we really need that feature enabled.
I also wonder why this wasn't enabled a long time ago - or is Red Hat less
interested in Unicode than Fedora has been for ages now?

I'm adding RHEL Product and Program Management on Cc to maybe get this
solved for RHEL 5.3, as it seems to be a tiny and minor change to me (and I
also hope that the e-mail address isn't just a dummy one). I'm adding Joe
as he's the PHP downstream maintainer and should also know about the
issue/the missing feature in PHP.

Comment 5 Stepan Kasal 2009-01-08 13:46:59 UTC
*** Bug 461712 has been marked as a duplicate of this bug. ***

Comment 6 Gerwin Krist 2009-01-24 09:49:40 UTC
May I hope that this issue is fixed in the 5.3 release?

Comment 7 Robert Scheck 2009-01-24 11:21:08 UTC
Not as far as I can see. Gerwin, you may want to ask your sales contact as
well to get this prioritized.

Comment 8 Gerwin Krist 2009-01-28 18:02:54 UTC
For your information: I escalated the support ticket about this to high.

Comment 10 RHEL Program Management 2009-03-26 16:48:19 UTC
This request was evaluated by Red Hat Product Management for
inclusion, but this component is not scheduled to be updated in
the current Red Hat Enterprise Linux release. If you would like
this request to be reviewed for the next minor release, ask your
support representative to set the next rhel-x.y flag to "?".

Comment 11 intel352 2009-04-14 14:33:03 UTC
This page details how to build a fixed PCRE with Unicode support, or you can just download the RPM that has already been compiled for CentOS 5.2 (= RHEL 5.2):

http://gaarai.com/2009/01/31/unicode-support-on-centos-52-with-php-and-pcre/

Comment 12 RHEL Program Management 2009-04-16 17:07:13 UTC
This request was evaluated by Red Hat Product Management for
inclusion, but this component is not scheduled to be updated in
the current Red Hat Enterprise Linux release. If you would like
this request to be reviewed for the next minor release, ask your
support representative to set the next rhel-x.y flag to "?".

Comment 16 RHEL Program Management 2009-05-05 13:07:26 UTC
This request was evaluated by Red Hat Product Management for
inclusion, but this component is not scheduled to be updated in
the current Red Hat Enterprise Linux release. If you would like
this request to be reviewed for the next minor release, ask your
support representative to set the next rhel-x.y flag to "?".

Comment 22 Alain Portal 2010-01-25 18:12:45 UTC
Can Red Hat Product Management explain why it doesn't want to enable Unicode properties?

Doesn't it think that enabling UTF-8 support without Unicode properties is contradictory?

Comment 24 intel352 2010-06-29 14:26:07 UTC
Wow, it's been two years now... This is an easily fixed issue that still hasn't been rectified in two years?

I'm glad I moved on to Ubuntu.

Comment 26 Robert Scheck 2010-07-18 11:01:18 UTC
I've cross-filed this issue as Service Request 2041330.

Comment 31 Martin Prpič 2010-11-15 15:00:44 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Unicode properties have been enabled to support \p{..}, \P{..}, and \X escape sequences.

Comment 33 Tuomo Soini 2011-01-12 16:20:52 UTC
Please test for PHP bug 38600. Our internal build with Unicode properties caused this bug to hit again. We needed an extra patch to address this infinite loop in pcre.

Test code for the issue is at http://bugs.php.net/bug.php?id=38600

Please test before releasing new pcre.

Comment 34 Petr Pisar 2011-01-13 12:49:10 UTC
(In reply to comment #33)
> Please test for php bug38600. Our internal build with unicode properties caused
> this bug to hit again. We needed extra patch to address this infinite loop in
> pcre.
> 
> Testing code for the issue is in http://bugs.php.net/bug.php?id=38600
> 
> Please test before releasing new pcre.

I can confirm that pcre-6.6-6.el5 does not terminate when compiling the pattern:

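#!/bin/sh
# Pattern that triggers the non-terminating compilation.
PATTERN='/(?<!\w)(0x[\p{N}]+[lL]?|[\p{Nd}]+(e[\p{Nd}]*)?[lLdDfF]?)(?!\w)/'
TEXT='bla bla bla'
# Pass pattern and subject as printf arguments, not as the format string,
# so the backslashes in the pattern are never treated as printf escapes.
printf '%s\n%s\n' "$PATTERN" "$TEXT" | pcretest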
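
Because an affected build never returns, a script like the one above hangs any automated check. A minimal sketch of a bounded variant follows, assuming timeout(1) from GNU coreutils is available (it may be missing on older systems). The function names and the 5-second limit are illustrative choices, not part of the original test.

```shell
#!/bin/sh
# Sketch only: assumes timeout(1) from GNU coreutils is installed.
# The function names and the 5-second limit are illustrative.

# run_bounded SECONDS CMD [ARGS...]
# Runs CMD with stdin passed through and kills it after SECONDS.
# timeout(1) exits with status 124 when the limit is reached.
run_bounded() {
    limit=$1
    shift
    timeout "$limit" "$@"
}

# check_pattern PATTERN SUBJECT
# Feeds one pattern line and one subject line to pcretest and reports
# whether pcretest finished within the limit.
check_pattern() {
    status=0
    printf '%s\n%s\n' "$1" "$2" | run_bounded 5 pcretest >/dev/null 2>&1 || status=$?
    case $status in
        0)   echo "finished" ;;
        124) echo "timed out (possible compile loop)" ;;
        *)   echo "pcretest failed or is not installed (status $status)" ;;
    esac
}
```

Calling check_pattern with the pattern and the subject text from the script above as its two arguments then reports a timeout instead of hanging.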

Comment 37 Petr Pisar 2011-01-13 15:33:32 UTC
Created attachment 473346 [details]
Fix for looping on pattern compilation in non-UTF-8

I believe I have found a fix for the loop problem. This attachment should resolve it.

Comment 38 Petr Pisar 2011-01-13 15:36:39 UTC
Created attachment 473347 [details]
Test case for the loop problem

This C program tests the loop problem. If the problem is fixed, the program exits with a success code in finite time. Otherwise it never halts.

Comment 39 Petr Pisar 2011-01-13 16:49:46 UTC
The infinite loop problem is tracked in the new bug #669413. Thanks for the careful testing.

Comment 40 errata-xmlrpc 2011-01-13 22:09:20 UTC
An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2011-0022.html