Bug 759475 - UTF-8 caseless match misses pairs with different encoding length
Summary: UTF-8 caseless match misses pairs with different encoding length
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: pcre
Version: 6.2
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
Assignee: Petr Pisar
QA Contact: Lukáš Zachar
URL: http://bugs.exim.org/show_bug.cgi?id=...
Depends On:
Blocks: 836160
Reported: 2011-12-02 13:31 UTC by Petr Pisar
Modified: 2012-09-07 10:48 UTC (History)

Fixed In Version: pcre-7.8-6.el6
Doc Type: Bug Fix
Doc Text:
Cause: A case-less pattern is matched in UTF-8 mode (e.g. `/ⱥ/8i').
Consequence: The pattern fails to match a character at the very end of the input text if that character's UTF-8 encoding is shorter than the encoding of the character in the pattern (e.g. `Ⱥ').
Fix: The pcre library now counts the length of matched characters correctly.
Result: Case-less patterns now correctly match characters with a different encoding length, even at the end of the input string.
Clone Of: 756675
Last Closed: 2012-09-07 10:48:58 UTC
Target Upstream Version:

Attachments
Upstream fix ported to pcre-7.8 (3.10 KB, patch)
2011-12-02 13:33 UTC, Petr Pisar
Test case (223 bytes, application/octet-stream)
2011-12-02 13:34 UTC, Petr Pisar

System: Red Hat Product Errata
ID: RHBA-2012:1240
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: pcre bug fix release
Last Updated: 2012-09-07 14:47:32 UTC

Description Petr Pisar 2011-12-02 13:31:38 UTC
+++ This bug was initially created as a clone of Bug #756675 +++

From upstream <http://bugs.exim.org/show_bug.cgi?id=1179>:

$ pcretest 
PCRE version 8.12 2011-01-15

  re> /ⱥ/8i
data> ⱥ
 0: \x{2c65}
data> Ⱥ
No match
data> Ⱥ_
 0: \x{23a}

The lower-case variant (U+2C65) occupies 3 bytes in UTF-8, while the upper-case variant (U+023A) occupies only 2. If at least one more character follows the upper-case variant in the input, the match succeeds.
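
The difference in encoding length can be checked independently of PCRE. The following is a minimal Python sketch, not part of the upstream report; the variable names are illustrative only.

```python
# The two characters from the pcretest session, written as escapes
# so that the example does not depend on the source file's encoding.
lower = "\u2c65"  # ⱥ, LATIN SMALL LETTER A WITH STROKE
upper = "\u023a"  # Ⱥ, LATIN CAPITAL LETTER A WITH STROKE

for ch in (lower, upper):
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {encoded.hex()} ({len(encoded)} bytes)")

# Prints:
# U+2C65 -> e2b1a5 (3 bytes)
# U+023A -> c8ba (2 bytes)
```

Appending a single ASCII character such as `_` to `Ⱥ` brings the subject to 3 bytes, which is the case that matched in the session above.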

--- Additional comment from ppisar@redhat.com on 2011-12-02 08:06:04 GMT ---

Upstream states this issue has been fixed by commit:

r778 | ph10 | 2011-12-01 18:38:47 +0100 (Thu, 01 Dec 2011) | 3 lines

Fix bug with caseless matching of characters of different lengths when the 
shorter is right at the end of the subject.

RHEL-6 (pcre-7.8-3.1.el6) is affected.
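
One way this class of bug can arise is a bounds check that uses the encoded length of the pattern character instead of the length of the character actually present in the subject. The Python sketch below models that idea only. It is an assumption made for illustration, not the PCRE source and not the upstream patch, and all function names are hypothetical.

```python
def utf8_len(lead: int) -> int:
    """Length in bytes of a UTF-8 sequence, judged from its lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("not a UTF-8 lead byte")


def decode_at(subject: bytes, pos: int) -> str:
    """Decode the single character that starts at byte offset pos."""
    return subject[pos:pos + utf8_len(subject[pos])].decode("utf-8")


def match_buggy(pattern_char: str, subject: bytes, pos: int = 0) -> bool:
    # Hypothetical flaw: require as many subject bytes as the pattern
    # character occupies, although its other-case form may be shorter.
    needed = len(pattern_char.encode("utf-8"))
    if len(subject) - pos < needed:
        return False
    return decode_at(subject, pos).lower() == pattern_char.lower()


def match_fixed(pattern_char: str, subject: bytes, pos: int = 0) -> bool:
    # Only require that some character starts at pos, then compare
    # using the length of the character actually found in the subject.
    if pos >= len(subject):
        return False
    return decode_at(subject, pos).lower() == pattern_char.lower()


lower, upper = "\u2c65", "\u023a"  # ⱥ and Ⱥ

print(match_buggy(lower, lower.encode("utf-8")))          # True
print(match_buggy(lower, upper.encode("utf-8")))          # False
print(match_buggy(lower, (upper + "_").encode("utf-8")))  # True
print(match_fixed(lower, upper.encode("utf-8")))          # True
```

The three `match_buggy` results mirror the three inputs of the pcretest session in the description. The sketch assumes well-formed UTF-8 input; a truncated sequence would raise `UnicodeDecodeError`.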

Comment 1 Petr Pisar 2011-12-02 13:33:16 UTC
Created attachment 539635
Upstream fix ported to pcre-7.8

The tests in this patch apply on top of the patch for bug #756105.

Comment 2 Petr Pisar 2011-12-02 13:34:44 UTC
Created attachment 539636
Test case

Comment 4 Suzanne Logcher 2012-02-14 23:22:31 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support

Comment 13 errata-xmlrpc 2012-09-07 10:48:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

