Bug 1986428

Summary: A character range too hungry in C.UTF-8 with glibc-2.33.9000-53.fc35
Product: [Fedora] Fedora
Reporter: Petr Pisar <ppisar>
Component: sed
Assignee: Jakub Martisko <jamartis>
Status: CLOSED DUPLICATE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: rawhide
CC: fpokorny, fweimer, hhorak, jamartis, jmoskovc, jpacner, m, ovasik, pbonzini, pstodulk
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2021-07-27 14:04:19 UTC
Type: Bug

Description Petr Pisar 2021-07-27 14:01:37 UTC
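There is a regression between Fedora 34 and Fedora 35 in how a bracket range is matched in the C.UTF-8 locale. The range [4-4] should match only the character 4, but on Fedora 35 it also consumes the following A: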

Fedora 34:

$ printf '4A\n' | LC_ALL=C.UTF-8 sed 's/[4-4]\+/X/'
XA

Fedora 35:

$ printf '4A\n' | LC_ALL=C.UTF-8 sed 's/[4-4]\+/X/'
X

It depends on the locale. With en_US.UTF-8 on Fedora 35 the result is correct:

$ printf '4A\n' | LC_ALL=en_US.UTF-8 sed 's/[4-4]\+/X/'
XA

It also depends on the regular expression. For example, a character set without a range works:

$ printf '4A\n' | LC_ALL=C.UTF-8 sed 's/[4]\+/X/'
XA
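
The cases above can be run side by side with a small script. This is only a sketch: try_case is a hypothetical helper name, and the script assumes GNU sed and that both locales are installed. On an unaffected system every row should print XA; on an affected system the C.UTF-8 row with the [4-4] range is expected to print X, as shown above.

```shell
#!/bin/sh
# Run the substitution from this report under a given locale and pattern.
# try_case is a hypothetical helper, not part of sed or glibc.
try_case() {
    locale_name=$1
    pattern=$2
    printf '4A\n' | LC_ALL="$locale_name" sed "s/$pattern/X/"
}

# Compare the range and non-range patterns in both locales.
for loc in C.UTF-8 en_US.UTF-8; do
    for pat in '[4-4]\+' '[4]\+'; do
        printf '%-12s %-10s -> %s\n' "$loc" "$pat" "$(try_case "$loc" "$pat")"
    done
done
```

Passing the locale and the pattern as arguments keeps the input and the sed command fixed, so any difference between rows comes from the locale or the bracket expression alone.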

This is very probably triggered by recent glibc changes. Tested with:

glibc-common-2.33.9000-53.fc35.x86_64
glibc-gconv-extra-2.33.9000-53.fc35.x86_64
glibc-langpack-cs-2.33.9000-53.fc35.x86_64
glibc-langpack-en-2.33.9000-53.fc35.x86_64
glibc-langpack-tr-2.33.9000-53.fc35.x86_64
glibc-2.33.9000-53.fc35.x86_64
glibc-headers-x86-2.33.9000-53.fc35.noarch
glibc-devel-2.33.9000-53.fc35.x86_64
glibc-static-2.33.9000-53.fc35.x86_64
sed-4.8-7.fc34.x86_64
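
To compare another system against the list above, the installed versions can be queried with rpm. This is a minimal sketch: report_versions is a hypothetical helper name, and only three of the packages listed above are queried.

```shell
#!/bin/sh
# Print the installed versions of the packages most relevant to this report.
# report_versions is a hypothetical helper; rpm -q is the standard RPM query.
report_versions() {
    if command -v rpm >/dev/null 2>&1; then
        # rpm -q exits non-zero if a package is missing; keep going anyway.
        rpm -q glibc glibc-common sed || true
    else
        echo "rpm not available on this system"
    fi
}

report_versions
```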

Comment 1 Petr Pisar 2021-07-27 14:04:19 UTC

*** This bug has been marked as a duplicate of bug 1986421 ***