Bug 809726

Summary:

Characters omitted from range a-z in Finnish locale.

Product:

Red Hat Enterprise Linux 6

Reporter:

Antti Siira <antti>

Component:

glibc

Assignee:

Jeff Law <law>

Status:

CLOSED ERRATA

QA Contact:

qe-baseos-tools-bugs

Severity:

medium

Docs Contact:

Priority:

unspecified

Version:

6.2

CC:

fweimer, mbriza, mcermak, mfranc, ovasik, pfrankli

Target Milestone:

Target Release:

---

Hardware:

i686

OS:

Linux

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Cause: Locale data for the characters in the range a-z was incorrect in the Finnish locale.
Consequence: Some characters in the range a-z were not printing correctly in the Finnish locale.
Fix: The Finnish locale data was updated to provide the correct output for these characters.
Result: Characters in the Finnish locale should now print correctly.

Story Points:

---

Clone Of:

Environment:

Last Closed:

2013-02-21 07:04:37 UTC

Type:

Bug

Attachments:

Description	Flags
Potential patch, various fixes to collation in fi_FI locale	none

Description Antti Siira 2012-04-04 08:25:29 UTC

Description of problem:
Characters omitted from range a-z in Finnish locale.


Version-Release number of selected component (if applicable):
4.2.1

How reproducible:
Always

Steps to Reproduce:
1. export LC_COLLATE=fi_FI.UTF-8
2. echo "avb" | sed -e 's/\([a-z]*\).*/\1/'
  
Actual results:
Output: a

Expected results:
Output: avb

Additional info:
Version 4.2.1 of sed on Ubuntu and version 4.1.5 of sed on RHEL5.8 works as expected.
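
The steps above can be wrapped in a small script that runs the same sed expression under a chosen collation locale. This is only a sketch: the function name run_reproducer is made up for illustration, and emptying LC_ALL is an added precaution (not part of the original report) so that LC_COLLATE is not overridden. It assumes the fi_FI.UTF-8 locale is installed.

```shell
#!/bin/sh
# run_reproducer LOCALE
# Hypothetical helper: runs the sed expression from the report with
# LC_COLLATE set to LOCALE for the sed process only.
run_reproducer() {
    # LC_ALL is emptied (an assumption, not in the report) because a
    # non-empty LC_ALL would take precedence over LC_COLLATE.
    echo "avb" | LC_ALL= LC_COLLATE="$1" sed -e 's/\([a-z]*\).*/\1/'
}

# Baseline: the C locale, where [a-z] covers every ASCII lowercase letter.
printf 'C: %s\n' "$(run_reproducer C)"

# Locale from the report; requires fi_FI.UTF-8 to be installed.
printf 'fi_FI.UTF-8: %s\n' "$(run_reproducer fi_FI.UTF-8)"
```

On an affected system the second line would be expected to show "a" rather than "avb", matching the actual result reported above.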

Comment 2 Ondrej Vasik 2012-04-23 13:39:16 UTC

Strange... I'm getting correct results on my RHEL-6.
$ LC_COLLATE=fi_FI.UTF-8 echo "avb" | ./sed -e 's/\([a-z]*\).*/\1/'
avb

with sed-4.2.1-7.el6.i686 and sed-4.2.1-9.el6.i686. Maybe something more specific to your system (glibc version?)?

Comment 4 Antti Siira 2012-04-24 09:55:19 UTC

Strange indeed. I'm not certain that it is the locale causing the problem, but
something is off.

$ echo "avb" | sed -e 's/\([a-z]*\)/\1/'
avb
$ echo "avb" | sed -e 's/\([a-z]*\).*/\1/'
a
$ echo "avb" | sed -e 's/\([a-z]*\)b/\1/'
av

My glibc version is the current rhel6 glibc.
$ rpm -q glibc
glibc-2.12-1.47.el6_2.9.x86_64

Comment 5 Vojtech Vitek 2012-05-10 14:24:00 UTC

I couldn't reproduce the behavior on RHEL-6.2 either.

# rpm -q glibc
glibc-2.12-1.47.el6_2.9.x86_64
# rpm -q sed
sed-4.2.1-7.el6.x86_64

# LC_COLLATE=fi_FI.UTF-8 echo "avb" | sed -e 's/\([a-z]*\)/\1/'
avb
# LC_COLLATE=fi_FI.UTF-8 echo "avb" | sed -e 's/\([a-z]*\).*/\1/'
avb


Do you have any environment variables set to non-default values?

Comment 6 Antti Siira 2012-05-23 08:09:14 UTC

I booted the machine into rescue mode using rhel-server-6.2-x86_64-dvd.iso to ensure a known environment.

Choose a Language: English
Keyboard Type: fi

# locale; echo 'avb' | sed -e 's/\([a-z]*\)./\1/'; LC_COLLATE=fi_FI.UTF-8; locale; echo 'avb' | sed -e 's/\([a-z]*\)./\1/';
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=C
LC_TIME="en_US.UTF-8"
LC_COLLATE=C
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
av
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=C
LC_TIME="en_US.UTF-8"
LC_COLLATE=fi_FI.UTF-8
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
ab
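
The two results are consistent with "v" falling outside [a-z] under the Finnish collation. With LC_COLLATE=C the group takes "av" and the trailing "." consumes "b", leaving "av". With LC_COLLATE=fi_FI.UTF-8 the group stops after "a", the "." consumes "v", and "b" is left untouched, giving "ab". A rough way to list which ASCII lowercase letters a locale leaves out of the range is sketched below. The helper name letters_outside_range is hypothetical, and emptying LC_ALL is an added precaution so that it cannot override LC_COLLATE.

```shell
#!/bin/sh
# letters_outside_range LOCALE
# Hypothetical helper: prints each ASCII lowercase letter that the
# bracket expression [a-z] fails to match under the given LC_COLLATE.
letters_outside_range() {
    for c in a b c d e f g h i j k l m n o p q r s t u v w x y z; do
        # sed -n with /.../p prints the line only when the pattern matches.
        # LC_ALL is emptied (an assumption) so it cannot override LC_COLLATE.
        hit=$(printf '%s\n' "$c" | LC_ALL= LC_COLLATE="$1" sed -n '/^[a-z]$/p')
        [ -n "$hit" ] || printf '%s ' "$c"
    done
    printf '\n'
}

# The C locale should report no letters.
letters_outside_range C

# Requires fi_FI.UTF-8 to be installed; otherwise the C behavior is used.
letters_outside_range fi_FI.UTF-8
```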

Comment 7 Martin Bříza 2012-06-06 13:16:12 UTC

The component has changed its owner to me, so I'm assigning the bug to myself.

Comment 8 Martin Bříza 2012-06-07 14:55:09 UTC

I was able to reproduce this bug in Fedora with sed from RHEL. Will investigate this further.

Comment 9 Martin Bříza 2012-06-11 13:09:40 UTC

Well, now I'm sure this bug is not related to sed itself. I'll quote part of the BUGS file that ships with the sed source distribution:

  Another problem is that [a-z] tries to use collation symbols.  This
  only happens if you are on the GNU system, using GNU libc's regular
  expression matcher instead of compiling the one supplied with GNU sed.
  In a Danish locale, for example, the regular expression `^[a-z]$'
  matches the string `aa', because `aa' is a single collating symbol that
  comes after `a' and before `b'; `ll' behaves similarly in Spanish
  locales, or `ij' in Dutch locales.

I believe this is related to your issue.
However, I'll discuss this with my team, as there are some limiting factors:
Compiling with the regular expression matcher from sed is quite a big change to the package and could cause other issues.
And I'm quite sure it won't be possible to change the behavior of the glibc one, as it is (as far as I can tell) considered correct under certain circumstances.

Comment 10 Martin Bříza 2012-06-19 11:43:07 UTC

Hello, I discussed this and decided to switch the component to glibc as it is the origin of the behavior.

Note: The reproducer command is as follows:
echo "avb" | LC_COLLATE=fi_FI.UTF-8 sed -e 's/\([a-z]*\).*/\1/'
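
The placement of the assignment matters here: in a shell pipeline, a VAR=value prefix applies only to the command it directly precedes. In the form used in comments 2 and 5 the prefix sits on echo, so sed runs with the unchanged locale, which may explain why those attempts did not reproduce the problem. A minimal sketch of the scoping follows, with sh -c standing in for sed so that it does not depend on any particular locale being installed:

```shell
#!/bin/sh
# Start from a known state so the caller's environment does not
# influence the demonstration.
unset LC_COLLATE

# Stand-in for sed: drains stdin, then reports the LC_COLLATE it sees.
probe='cat >/dev/null; echo "${LC_COLLATE:-unset}"'

# Prefix on echo (as in comments 2 and 5): the command after the pipe
# does not receive the assignment.
wrong=$(LC_COLLATE=fi_FI.UTF-8 echo "avb" | sh -c "$probe")

# Prefix on the command after the pipe (as in the reproducer above).
right=$(echo "avb" | LC_COLLATE=fi_FI.UTF-8 sh -c "$probe")

echo "prefix on echo:           $wrong"
echo "prefix on second command: $right"
```

The first line reports "unset" and the second reports "fi_FI.UTF-8".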

Comment 11 RHEL Program Management 2012-07-10 08:27:33 UTC

This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 12 RHEL Program Management 2012-07-10 23:17:37 UTC

This request was erroneously removed from consideration in Red Hat Enterprise Linux 6.4, which is currently under development.  This request will be evaluated for inclusion in Red Hat Enterprise Linux 6.4.

Comment 13 Jeff Law 2012-07-11 22:28:56 UTC

Created attachment 597676
Potential patch, various fixes to collation in fi_FI locale

Comment 16 errata-xmlrpc 2013-02-21 07:04:37 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0279.html