1582229 – glibc: regex functions ignore character equivalents

Summary: glibc: regex functions ignore character equivalents

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	glibc
Sub Component:
Version:	28
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Florian Weimer
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1582224
Depends On:
Blocks:	1582219

Reported:	2018-05-24 15:19 UTC by Jaroslav Škarvada
Modified:	2018-07-17 15:17 UTC
CC List:	10 users
Fixed In Version:	glibc-2.27-30.fc28 glibc-2.27.9000-38.fc29
Clone Of:
Environment:
Last Closed:	2018-07-17 15:17:40 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments
Reproducer (364 bytes, text/x-csrc) 2018-05-24 15:19 UTC, Jaroslav Škarvada	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	1551009	0	unspecified	CLOSED	glibc: collation update and sync with cldr	2021-02-22 00:41:40 UTC
Sourceware	23036	0	None	None	None	2018-05-24 15:19:21 UTC

Internal Links: 1551009

Description Jaroslav Škarvada 2018-05-24 15:19:21 UTC

Created attachment 1441096
Reproducer

Description of problem:
For example, the regex '[[=a=]]' does not match 'á' in the Czech locale, although it should. This seems to be a regression, because it worked in glibc-2.26 and older. All locales seem to be affected, not only Czech.

Version-Release number of selected component (if applicable):
glibc-2.27-14.fc28.x86_64

How reproducible:
Always

Steps to Reproduce:
1. gcc -o regex regex.c
2. ./regex

Actual results:
locale: cs_CZ.UTF-8
regcomp: 0
regexec: 1

Expected results:
locale: cs_CZ.UTF-8
regcomp: 0
regexec: 0

Additional info:
It is blocking the grep rebuild.

Comment 1 Florian Weimer 2018-05-24 15:21:43 UTC

*** Bug 1582224 has been marked as a duplicate of this bug. ***

Comment 2 Jaroslav Škarvada 2018-05-24 15:32:58 UTC

It is not only about 'á' in cs_CZ.UTF-8 or en_US.UTF-8. Other matches that used to work no longer do, e.g.:

$ echo 'é' | LC_ALL=fr_FR.UTF-8 grep '[[=e=]]'
$ echo 'è' | LC_ALL=fr_FR.UTF-8 grep '[[=e=]]'
$ echo 'ê' | LC_ALL=fr_FR.UTF-8 grep '[[=e=]]'
...

Comment 3 Florian Weimer 2018-07-09 12:03:56 UTC

This appears to be a deliberate change in character equivalences. As part of the updates for https://sourceware.org/bugzilla/show_bug.cgi?id=14095, most accented and non-accented characters are no longer considered equivalent. I do not know if this is the intent of the current Unicode version.

Comment 4 Florian Weimer 2018-07-09 15:35:18 UTC

This may be an algorithmic issue after all, not a data problem.

Comment 5 Fedora Update System 2018-07-12 15:41:42 UTC

glibc-2.27-30.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-85c0ff9183

Comment 6 Fedora Update System 2018-07-13 19:29:20 UTC

glibc-2.27-30.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-85c0ff9183

Comment 7 Fedora Update System 2018-07-17 15:17:40 UTC

glibc-2.27-30.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.
