Bug 1582229

Summary: glibc: regex functions ignore character equivalents
Product: [Fedora] Fedora
Reporter: Jaroslav Škarvada <jskarvad>
Component: glibc
Assignee: Florian Weimer <fweimer>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 28
CC: aoliva, arjun, codonell, dj, fweimer, law, mfabian, pfrankli, rth, siddhesh
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glibc-2.27-30.fc28 glibc-2.27.9000-38.fc29
Last Closed: 2018-07-17 15:17:40 UTC
Type: Bug
Bug Blocks: 1582219
Attachments: Reproducer

Description Jaroslav Škarvada 2018-05-24 15:19:21 UTC
Created attachment 1441096
Reproducer

Description of problem:
For example, the regex '[[=a=]]' does not match 'á' in the Czech locale, as it should. This seems to be a regression, because it worked in glibc-2.26 and older. All locales seem to be affected, not only Czech.

Version-Release number of selected component (if applicable):
glibc-2.27-14.fc28.x86_64

How reproducible:
Always

Steps to Reproduce:
1. gcc -o regex regex.c
2. ./regex

Actual results:
locale: cs_CZ.UTF-8
regcomp: 0
regexec: 1

Expected results:
locale: cs_CZ.UTF-8
regcomp: 0
regexec: 0

Additional info:
This is blocking the grep rebuild.

Comment 1 Florian Weimer 2018-05-24 15:21:43 UTC
*** Bug 1582224 has been marked as a duplicate of this bug. ***

Comment 2 Jaroslav Škarvada 2018-05-24 15:32:58 UTC
This is not limited to 'á' in cs_CZ.UTF-8 or en_US.UTF-8. Other matches that used to work now fail as well, e.g.:

$ echo 'é' | LC_ALL=fr_FR.UTF-8 grep '[[=e=]]'
$ echo 'è' | LC_ALL=fr_FR.UTF-8 grep '[[=e=]]'
$ echo 'ê' | LC_ALL=fr_FR.UTF-8 grep '[[=e=]]'
...

Comment 3 Florian Weimer 2018-07-09 12:03:56 UTC
This appears to be a deliberate change in character equivalences. As part of the updates for https://sourceware.org/bugzilla/show_bug.cgi?id=14095, most accented and non-accented characters are no longer considered equivalent. I do not know if this is the intent of the current Unicode version.

Comment 4 Florian Weimer 2018-07-09 15:35:18 UTC
This may be an algorithmic issue after all, not a data problem.

Comment 5 Fedora Update System 2018-07-12 15:41:42 UTC
glibc-2.27-30.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-85c0ff9183

Comment 6 Fedora Update System 2018-07-13 19:29:20 UTC
glibc-2.27-30.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-85c0ff9183

Comment 7 Fedora Update System 2018-07-17 15:17:40 UTC
glibc-2.27-30.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.