Bug 1582229

Summary: glibc: regex functions ignore character equivalents
Product: [Fedora] Fedora
Reporter: Jaroslav Škarvada <jskarvad>
Component: glibc
Assignee: Florian Weimer <fweimer>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
CC: aoliva, arjun, codonell, dj, fweimer, law, mfabian, pfrankli, rth, siddhesh
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glibc-2.27-30.fc28 glibc-2.27.9000-38.fc29
Last Closed: 2018-07-17 15:17:40 UTC
Type: Bug
Bug Blocks: 1582219
Attachments: Reproducer (flags: none)

Description Jaroslav Škarvada 2018-05-24 15:19:21 UTC

Created attachment 1441096
Reproducer

Description of problem:
For example, the regex '[[=a=]]' does not match 'á' in the Czech locale, although it should. This seems to be a regression, because it worked in glibc-2.26 and older. All locales seem to be affected, not only Czech.

Version-Release number of selected component (if applicable):
glibc-2.27-14.fc28.x86_64

How reproducible:
Always

Steps to Reproduce:
1. gcc -o regex regex.c
2. ./regex

Actual results:
locale: cs_CZ.UTF-8
regcomp: 0
regexec: 1

Expected results:
locale: cs_CZ.UTF-8
regcomp: 0
regexec: 0

Additional info:
It is blocking the grep rebuild.

Comment 1 Florian Weimer 2018-05-24 15:21:43 UTC

*** Bug 1582224 has been marked as a duplicate of this bug. ***

Comment 2 Jaroslav Škarvada 2018-05-24 15:32:58 UTC

It is not only about 'á' in cs_CZ.UTF-8 or en_US.UTF-8. There are more matches that used to work and no longer do, e.g.:

$ echo 'é' | LC_ALL=fr_FR.UTF-8 grep '[[=e=]]'
$ echo 'è' | LC_ALL=fr_FR.UTF-8 grep '[[=e=]]'
$ echo 'ê' | LC_ALL=fr_FR.UTF-8 grep '[[=e=]]'
...

Comment 3 Florian Weimer 2018-07-09 12:03:56 UTC

This appears to be a deliberate change in character equivalences. As part of the updates for https://sourceware.org/bugzilla/show_bug.cgi?id=14095, most accented and non-accented characters are no longer considered equivalent. I do not know if this is the intent of the current Unicode version.

Comment 4 Florian Weimer 2018-07-09 15:35:18 UTC

This may be an algorithmic issue after all, not a data problem.

Comment 5 Fedora Update System 2018-07-12 15:41:42 UTC

glibc-2.27-30.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-85c0ff9183

Comment 6 Fedora Update System 2018-07-13 19:29:20 UTC

glibc-2.27-30.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-85c0ff9183

Comment 7 Fedora Update System 2018-07-17 15:17:40 UTC

glibc-2.27-30.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.