Bug 809726 - Characters omitted from range a-z in Finnish locale.
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: glibc
Version: 6.2
Hardware: i686 Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Jeff Law
QA Contact: qe-baseos-tools
 
Reported: 2012-04-04 04:25 EDT by Antti Siira
Modified: 2016-11-24 10:47 EST
CC List: 6 users

Doc Type: Bug Fix
Doc Text:
Cause: The collation data for some characters in the range a-z was incorrect in the Finnish (fi_FI) locale.
Consequence: Regular expression range expressions such as [a-z] did not match some of these characters (for example, "v") in the Finnish locale.
Fix: The Finnish locale data was updated to correct the collation of these characters.
Result: Range expressions such as [a-z] should now match the expected characters in the Finnish locale.
Last Closed: 2013-02-21 02:04:37 EST
Type: Bug


Attachments
Potential patch, various fixes to collation in fi_FI locale (2.92 KB, patch)
2012-07-11 18:28 EDT, Jeff Law

Description Antti Siira 2012-04-04 04:25:29 EDT
Description of problem:
Characters omitted from range a-z in Finnish locale.


Version-Release number of selected component (if applicable):
4.2.1

How reproducible:
Always

Steps to Reproduce:
1. export LC_COLLATE=fi_FI.UTF-8
2. echo "avb" | sed -e 's/\([a-z]*\).*/\1/'
  
Actual results:
Output: a

Expected results:
Output: avb

Additional info:
Version 4.2.1 of sed on Ubuntu and version 4.1.5 of sed on RHEL 5.8 work as expected.
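The reproducer above can be wrapped in a small script that runs the same sed expression under the C locale and under the Finnish locale, so the two results can be compared side by side. This is a sketch, not part of the original report: the helper name first_az_run is hypothetical. In the C locale it should print avb; on a system affected by this bug the fi_FI.UTF-8 line would print a, as reported. If fi_FI.UTF-8 is not installed, sed typically falls back to the C locale and prints avb for both.

```shell
#!/bin/sh
# Hypothetical helper: print the leading run of input characters that the
# bracket expression [a-z]* matches when sed collates in locale $1.
first_az_run() {
    # The assignments are attached to sed (not to printf) so that sed, which
    # does the matching, sees them. LC_ALL is set to the empty string so that
    # a non-empty LC_ALL inherited from the caller cannot override LC_COLLATE.
    printf '%s\n' "$2" | LC_ALL= LC_COLLATE="$1" sed -e 's/\([a-z]*\).*/\1/'
}

# Compare the C locale with the Finnish locale from the report.
for loc in C fi_FI.UTF-8; do
    printf '%s: %s\n' "$loc" "$(first_az_run "$loc" avb)"
done
```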
Comment 2 Ondrej Vasik 2012-04-23 09:39:16 EDT
Strange... I'm getting correct results on my RHEL-6.
$ LC_COLLATE=fi_FI.UTF-8 echo "avb" | ./sed -e 's/\([a-z]*\).*/\1/'
avb

with sed-4.2.1-7.el6.i686 and sed-4.2.1-9.el6.i686. Maybe something more specific to your system (glibc version?)?
Comment 4 Antti Siira 2012-04-24 05:55:19 EDT
Strange indeed. I'm not certain that it is the locale causing the problem, but
something is off.

$ echo "avb" | sed -e 's/\([a-z]*\)/\1/'
avb
$ echo "avb" | sed -e 's/\([a-z]*\).*/\1/'
a
$ echo "avb" | sed -e 's/\([a-z]*\)b/\1/'
av

My glibc version is the current rhel6 glibc.
$ rpm -q glibc
glibc-2.12-1.47.el6_2.9.x86_64
Comment 5 Vojtech Vitek 2012-05-10 10:24:00 EDT
I couldn't reproduce the behavior on RHEL-6.2 either.

# rpm -q glibc
glibc-2.12-1.47.el6_2.9.x86_64
# rpm -q sed
sed-4.2.1-7.el6.x86_64

# LC_COLLATE=fi_FI.UTF-8 echo "avb" | sed -e 's/\([a-z]*\)/\1/'
avb
# LC_COLLATE=fi_FI.UTF-8 echo "avb" | sed -e 's/\([a-z]*\).*/\1/'
avb


Do you have any other environment variable set otherwise than default?
Comment 6 Antti Siira 2012-05-23 04:09:14 EDT
I booted the machine to rescue mode using rhel-server-6.2-x86_64-dvd.iso to ensure known environment.

Choose a Language: English
Keyboard Type: fi

# locale; echo 'avb' | sed -e 's/\([a-z]*\)./\1/'; LC_COLLATE=fi_FI.UTF-8; locale; echo 'avb' | sed -e 's/\([a-z]*\)./\1/';
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=C
LC_TIME="en_US.UTF-8"
LC_COLLATE=C
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
av
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=C
LC_TIME="en_US.UTF-8"
LC_COLLATE=fi_FI.UTF-8
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
ab
Comment 7 Martin Bříza 2012-06-06 09:16:12 EDT
The component has changed its owner to me, so I'm assigning the bug to myself.
Comment 8 Martin Bříza 2012-06-07 10:55:09 EDT
I was able to reproduce this bug in Fedora with sed from RHEL. Will investigate this further.
Comment 9 Martin Bříza 2012-06-11 09:09:40 EDT
Well, now I'm sure this bug is not related to sed itself. I'll quote a part of the BUGS file that is shipped with the source code distribution of sed:

  Another problem is that [a-z] tries to use collation symbols.  This
  only happens if you are on the GNU system, using GNU libc's regular
  expression matcher instead of compiling the one supplied with GNU sed.
  In a Danish locale, for example, the regular expression `^[a-z]$'
  matches the string `aa', because `aa' is a single collating symbol that
  comes after `a' and before `b'; `ll' behaves similarly in Spanish
  locales, or `ij' in Dutch locales.

I believe this is related to your issue.
However, I'll discuss this with my team, as there are some limiting factors:
Building sed with its own bundled regular expression matcher instead of the glibc one is quite a big change to the package and could cause other issues.
I'm also quite sure it won't be possible to change the behavior of the glibc matcher, as it is (as far as I can tell) considered correct under certain circumstances.
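Because the meaning of a range such as [a-z] depends on the collation order of the current locale, scripts that need exactly the ASCII lowercase letters often avoid relying on it. Neither approach below is proposed in this report; they are commonly used alternatives, sketched here for illustration. The first runs sed in the C locale, where [a-z] covers only the 26 ASCII lowercase letters. The second keeps the user's locale but uses the character class [[:lower:]], which is defined by character classification (LC_CTYPE) rather than by collation order. The two are not equivalent: in a UTF-8 locale [[:lower:]] also matches non-ASCII lowercase letters such as ä and ö.

```shell
#!/bin/sh
# Option 1: force the C locale for sed only; there [a-z] means the
# 26 ASCII lowercase letters, independent of the user's locale settings.
echo "avb" | LC_ALL=C sed -e 's/\([a-z]*\).*/\1/'

# Option 2: keep the user's locale but match with a character class,
# which does not depend on collation order.
echo "avb" | sed -e 's/\([[:lower:]]*\).*/\1/'
```

Both commands should print avb.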
Comment 10 Martin Bříza 2012-06-19 07:43:07 EDT
Hello, I discussed this and decided to switch the component to glibc as it is the origin of the behavior.

Note: The reproducer command is as follows:
echo "avb" | LC_COLLATE=fi_FI.UTF-8 sed -e 's/\([a-z]*\).*/\1/'
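In a POSIX shell, a variable assignment written before a command name is placed only in the environment of that one command. In the form `LC_COLLATE=fi_FI.UTF-8 echo "avb" | sed ...`, used in Comments 2 and 5, the Finnish collation setting reaches echo but not sed, which is a likely reason those attempts printed avb; the reproducer above attaches the assignment to sed instead. The sketch below shows the same effect with a hypothetical variable DEMO_VAR standing in for LC_COLLATE.

```shell
#!/bin/sh
# Start from a known state: DEMO_VAR is a hypothetical variable.
unset DEMO_VAR

# Assignment prefixed to the left-hand command: only echo receives it,
# so the command on the right of the pipe reports "unset".
DEMO_VAR=set echo avb | sh -c 'read -r line; echo "$line ${DEMO_VAR:-unset}"'

# Assignment prefixed to the right-hand command: the reader receives it.
echo avb | DEMO_VAR=set sh -c 'read -r line; echo "$line ${DEMO_VAR:-unset}"'
```

The first pipeline prints "avb unset" and the second prints "avb set".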
Comment 11 RHEL Product and Program Management 2012-07-10 04:27:33 EDT
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.
Comment 12 RHEL Product and Program Management 2012-07-10 19:17:37 EDT
This request was erroneously removed from consideration in Red Hat Enterprise Linux 6.4, which is currently under development.  This request will be evaluated for inclusion in Red Hat Enterprise Linux 6.4.
Comment 13 Jeff Law 2012-07-11 18:28:56 EDT
Created attachment 597676
Potential patch, various fixes to collation in fi_FI locale
Comment 16 errata-xmlrpc 2013-02-21 02:04:37 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0279.html
