Bug 1582229
| Summary: | glibc: regex functions ignore character equivalents | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Jaroslav Škarvada <jskarvad> |
| Component: | glibc | Assignee: | Florian Weimer <fweimer> |
| Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 28 | CC: | aoliva, arjun, codonell, dj, fweimer, law, mfabian, pfrankli, rth, siddhesh |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | glibc-2.27-30.fc28 glibc-2.27.9000-38.fc29 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-07-17 15:17:40 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1582219 | | |
| Attachments: | | | |
*** Bug 1582224 has been marked as a duplicate of this bug. ***

It's not only about 'á' in cs_CZ.UTF-8 or en_US.UTF-8. There are more matches that used to work and no longer do, e.g.:

    $ echo 'é' | LC_ALL=fr_FR.UTF-8 grep '[[=e=]]'
    $ echo 'è' | LC_ALL=fr_FR.UTF-8 grep '[[=e=]]'
    $ echo 'ê' | LC_ALL=fr_FR.UTF-8 grep '[[=e=]]'
    ...

This appears to be a deliberate change in character equivalences. As part of the updates for https://sourceware.org/bugzilla/show_bug.cgi?id=14095, most accented and non-accented characters are no longer considered equivalent. I do not know if this is the intent of the current Unicode version.

This may be an algorithmic issue after all, not a data problem.

glibc-2.27-30.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-85c0ff9183

glibc-2.27-30.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-85c0ff9183

glibc-2.27-30.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.
Created attachment 1441096: Reproducer

Description of problem:
For example, the regex '[[=a=]]' does not match 'á' in the Czech locale, as it should. This seems to be a regression, because it worked in glibc-2.26 and older. All locales seem to be affected, not only Czech.

Version-Release number of selected component (if applicable):
glibc-2.27-14.fc28.x86_64

How reproducible:
Always

Steps to Reproduce:
1. gcc -o regex regex.c
2. ./regex

Actual results:
locale: cs_CZ.UTF-8
regcomp: 0
regexec: 1

Expected results:
locale: cs_CZ.UTF-8
regcomp: 0
regexec: 0

Additional info:
This is blocking the grep rebuild.