Bug 759475

Summary: UTF-8 caseless match misses pairs with different encoding length
Product: Red Hat Enterprise Linux 6 Reporter: Petr Pisar <ppisar>
Component: pcre Assignee: Petr Pisar <ppisar>
Status: CLOSED ERRATA QA Contact: Lukáš Zachar <lzachar>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 6.2 Keywords: Patch
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
URL: http://bugs.exim.org/show_bug.cgi?id=1179
Whiteboard:
Fixed In Version: pcre-7.8-6.el6 Doc Type: Bug Fix
Doc Text:
Cause: A case-less pattern is matched in UTF-8 mode (e.g. `/ⱥ/8i').
Consequence: The pattern fails to match a character at the end of the input text if that character's encoding is shorter than the encoding of the corresponding character in the pattern (e.g. `Ⱥ').
Fix: The pcre library has been changed to count the length of matched characters correctly.
Result: Case-less patterns now match characters with a different encoding length correctly, even at the end of the input string.
Story Points: ---
Clone Of: 756675 Environment:
Last Closed: 2012-09-07 10:48:58 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 836160    
Attachments:
Description Flags
Upstream fix ported to pcre-7.8 none
Test case none

Description Petr Pisar 2011-12-02 13:31:38 UTC
+++ This bug was initially created as a clone of Bug #756675 +++

From upstream <http://bugs.exim.org/show_bug.cgi?id=1179>:

$ pcretest 
PCRE version 8.12 2011-01-15

  re> /ⱥ/8i
data> ⱥ
 0: \x{2c65}
data> Ⱥ
No match
data> Ⱥ_
 0: \x{23a}

The lowercase variant (U+2C65) occupies 3 bytes in UTF-8, while the uppercase variant (U+023A) occupies only 2 bytes. If the input contains padding after the character, the match succeeds.
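
The byte counts can be checked independently of PCRE. The following is a minimal sketch in Python, used here only for illustration and not part of the original report; the code points are taken from the pcretest output above.

```python
# Code points as reported by pcretest: \x{2c65} and \x{23a}.
lower = "\u2c65"  # LATIN SMALL LETTER A WITH STROKE
upper = "\u023a"  # LATIN CAPITAL LETTER A WITH STROKE

# Number of bytes each character occupies when encoded as UTF-8.
lower_len = len(lower.encode("utf-8"))
upper_len = len(upper.encode("utf-8"))

# The two characters are a case pair under Unicode lowercasing.
is_case_pair = upper.lower() == lower

print(lower_len, upper_len, is_case_pair)  # prints: 3 2 True
```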

--- Additional comment from ppisar on 2011-12-02 08:06:04 GMT ---

Upstream states this issue has been fixed by commit:

r778 | ph10 | 2011-12-01 18:38:47 +0100 (Thu, 01 Dec 2011) | 3 lines

Fix bug with caseless matching of characters of different lengths when the 
shorter is right at the end of the subject.
-----
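
One way to picture the failure described in the commit message is a bounds check that uses the byte length of the pattern character instead of the length of the subject character actually being compared. The Python sketch below is a hypothetical model of that idea, not PCRE's code; the names `char_len`, `match_at_buggy` and `match_at_fixed` are invented for illustration.

```python
def char_len(lead_byte: int) -> int:
    """Byte length of a UTF-8 sequence, derived from its lead byte."""
    if lead_byte < 0x80:
        return 1
    if lead_byte >> 5 == 0b110:
        return 2
    if lead_byte >> 4 == 0b1110:
        return 3
    return 4


def match_at_buggy(pattern_char: str, subject: bytes, pos: int) -> bool:
    """Hypothetical faulty check: demands as many bytes as the PATTERN char has."""
    need = len(pattern_char.encode("utf-8"))
    if pos + need > len(subject):
        # Wrongly rejects a shorter other-case character at the very end.
        return False
    n = char_len(subject[pos])
    ch = subject[pos:pos + n].decode("utf-8")
    return ch.lower() == pattern_char.lower()


def match_at_fixed(pattern_char: str, subject: bytes, pos: int) -> bool:
    """Hypothetical corrected check: uses the SUBJECT character's own length."""
    if pos >= len(subject):
        return False
    n = char_len(subject[pos])
    if pos + n > len(subject):
        return False
    ch = subject[pos:pos + n].decode("utf-8")
    return ch.lower() == pattern_char.lower()


pattern = "\u2c65"                  # 3 bytes in UTF-8
bare = "\u023a".encode("utf-8")     # 2 bytes, character at end of subject
padded = "\u023a_".encode("utf-8")  # 3 bytes, one byte of padding

print(match_at_buggy(pattern, bare, 0))    # False, as in "No match"
print(match_at_buggy(pattern, padded, 0))  # True, padding hides the problem
print(match_at_fixed(pattern, bare, 0))    # True
```

Under this model, the padded input matches only because the extra byte happens to satisfy the length check, which mirrors the pcretest session in the description.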

RHEL-6 (pcre-7.8-3.1.el6) is affected.

Comment 1 Petr Pisar 2011-12-02 13:33:16 UTC
Created attachment 539635
Upstream fix ported to pcre-7.8

The tests in this patch apply on top of the patch for bug #756105.

Comment 2 Petr Pisar 2011-12-02 13:34:44 UTC
Created attachment 539636
Test case

Comment 4 Suzanne Logcher 2012-02-14 23:22:31 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 13 errata-xmlrpc 2012-09-07 10:48:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-1240.html