Bug 809726

Summary: Characters omitted from range a-z in Finnish locale.
Product: Red Hat Enterprise Linux 6
Reporter: Antti Siira <antti>
Component: glibc
Assignee: Jeff Law <law>
Status: CLOSED ERRATA
QA Contact: qe-baseos-tools-bugs
Severity: medium
Docs Contact:
Priority: unspecified
Version: 6.2
CC: fweimer, mbriza, mcermak, mfranc, ovasik, pfrankli
Target Milestone: rc   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: Locale data for the characters in the range a-z was incorrect in the Finnish locale.
Consequence: Some characters in the range a-z were not printing correctly in the Finnish locale.
Fix: Finnish locale data was updated to provide the correct output for these characters.
Result: Characters in the Finnish locale should now print correctly.
Last Closed: 2013-02-21 07:04:37 UTC
Type: Bug
Attachments:
Potential patch, various fixes to collation in fi_FI locale (flags: none)

Description Antti Siira 2012-04-04 08:25:29 UTC
Description of problem:
Characters omitted from range a-z in Finnish locale.


Version-Release number of selected component (if applicable):
sed 4.2.1

How reproducible:
Always

Steps to Reproduce:
1. export LC_COLLATE=fi_FI.UTF-8
2. echo "avb" | sed -e 's/\([a-z]*\).*/\1/'
  
Actual results:
Output: a

Expected results:
Output: avb

Additional info:
Version 4.2.1 of sed on Ubuntu and version 4.1.5 of sed on RHEL 5.8 work as expected.

Comment 2 Ondrej Vasik 2012-04-23 13:39:16 UTC
Strange... I'm getting correct results on my RHEL-6.
$ LC_COLLATE=fi_FI.UTF-8 echo "avb" | ./sed -e 's/\([a-z]*\).*/\1/'
avb

Tested with sed-4.2.1-7.el6.i686 and sed-4.2.1-9.el6.i686. Maybe it is something specific to your system (glibc version?).

Comment 4 Antti Siira 2012-04-24 09:55:19 UTC
Strange indeed. I'm not certain that the locale is causing the problem, but
something is off.

$ echo "avb" | sed -e 's/\([a-z]*\)/\1/'
avb
$ echo "avb" | sed -e 's/\([a-z]*\).*/\1/'
a
$ echo "avb" | sed -e 's/\([a-z]*\)b/\1/'
av

My glibc version is the current rhel6 glibc.
$ rpm -q glibc
glibc-2.12-1.47.el6_2.9.x86_64

Comment 5 Vojtech Vitek 2012-05-10 14:24:00 UTC
I couldn't reproduce the behavior on RHEL-6.2 either.

# rpm -q glibc
glibc-2.12-1.47.el6_2.9.x86_64
# rpm -q sed
sed-4.2.1-7.el6.x86_64

# LC_COLLATE=fi_FI.UTF-8 echo "avb" | sed -e 's/\([a-z]*\)/\1/'
avb
# LC_COLLATE=fi_FI.UTF-8 echo "avb" | sed -e 's/\([a-z]*\).*/\1/'
avb


Do you have any environment variables set to non-default values?

Comment 6 Antti Siira 2012-05-23 08:09:14 UTC
I booted the machine into rescue mode using rhel-server-6.2-x86_64-dvd.iso to ensure a known environment.

Choose a Language: English
Keyboard Type: fi

# locale; echo 'avb' | sed -e 's/\([a-z]*\)./\1/'; LC_COLLATE=fi_FI.UTF-8; locale; echo 'avb' | sed -e 's/\([a-z]*\)./\1/';
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=C
LC_TIME="en_US.UTF-8"
LC_COLLATE=C
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
av
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=C
LC_TIME="en_US.UTF-8"
LC_COLLATE=fi_FI.UTF-8
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
ab
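
The two runs above differ only in LC_COLLATE, so one way to narrow things down is to test each letter against [a-z] on its own. The following is a minimal sketch, not something that was run in this report: the helper name unmatched_letters is made up for illustration, it assumes a grep that honors LC_COLLATE for bracket ranges, and the result for fi_FI.UTF-8 depends on the installed glibc locale data.

```shell
#!/bin/sh
# Hypothetical helper: print every ASCII lowercase letter that the
# bracket expression [a-z] fails to match under the given LC_COLLATE.
unmatched_letters() {
  for c in a b c d e f g h i j k l m n o p q r s t u v w x y z; do
    # LC_ALL would override LC_COLLATE, so clear it in a subshell first.
    ( unset LC_ALL
      printf '%s\n' "$c" | LC_COLLATE="$1" grep -q '^[a-z]$' ) ||
      printf '%s ' "$c"
  done
  echo
}

unmatched_letters C            # baseline: prints an empty line
unmatched_letters fi_FI.UTF-8  # any letters listed here fall outside the range
```

If the Finnish run lists letters that the C run does not, the range expression itself is locale dependent on that system, independent of sed.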

Comment 7 Martin Bříza 2012-06-06 13:16:12 UTC
The component has changed its owner to me, so I'm assigning the bug to myself.

Comment 8 Martin Bříza 2012-06-07 14:55:09 UTC
I was able to reproduce this bug in Fedora with sed from RHEL. Will investigate this further.

Comment 9 Martin Bříza 2012-06-11 13:09:40 UTC
Well, now I'm sure this bug is not related to sed itself. I'll quote part of the BUGS file that ships with the sed source distribution:

  Another problem is that [a-z] tries to use collation symbols.  This
  only happens if you are on the GNU system, using GNU libc's regular
  expression matcher instead of compiling the one supplied with GNU sed.
  In a Danish locale, for example, the regular expression `^[a-z]$'
  matches the string `aa', because `aa' is a single collating symbol that
  comes after `a' and before `b'; `ll' behaves similarly in Spanish
  locales, or `ij' in Dutch locales.

I believe this is related to your issue.
However, I'll discuss this with my team, as there are some limiting factors:
Compiling with the regular expression matcher supplied with sed is quite a big change to the package and could cause other issues.
And I'm quite sure it won't be possible to change the behavior of the glibc matcher, as it is (as far as I can tell) considered correct under certain circumstances.
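
Since the quoted BUGS text ties the behavior of [a-z] to collation, a script that only cares about ASCII letters can sidestep the locale instead of depending on it. The sketch below shows two common ways to do that; it is an illustration under stated assumptions, not the fix applied for this bug, and the function names are hypothetical. It assumes a POSIX sed, where character classes such as [[:lower:]] follow LC_CTYPE rather than LC_COLLATE.

```shell
#!/bin/sh
# Hypothetical helper: force the C locale for this one sed invocation,
# so the range a-z is evaluated by byte value.
strip_with_c_collation() {
  printf '%s\n' "$1" | LC_ALL=C sed -e 's/\([a-z]*\).*/\1/'
}

# Hypothetical helper: use a character class instead of a range,
# which does not depend on collation order.
strip_with_class() {
  printf '%s\n' "$1" | sed -e 's/\([[:lower:]]*\).*/\1/'
}

strip_with_c_collation "avb"   # prints: avb
strip_with_class "avb123"      # prints: avb
```

Forcing LC_ALL=C also changes character handling for non-ASCII input, so the character class variant is usually the less intrusive choice when the rest of the pipeline needs the user's locale.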

Comment 10 Martin Bříza 2012-06-19 11:43:07 UTC
Hello, I discussed this and decided to switch the component to glibc as it is the origin of the behavior.

Note: The reproducer command is as follows:
echo "avb" | LC_COLLATE=fi_FI.UTF-8 sed -e 's/\([a-z]*\).*/\1/'
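
The placement of the assignment in this reproducer matters: a VAR=value prefix applies only to the command it precedes, so in the earlier attempts of the form LC_COLLATE=... echo ... | sed the setting reached echo, not sed. The sketch below illustrates this with a stand-in consumer (sh -c) that simply reports what it sees; the function names are hypothetical, and a shell may print a locale warning on stderr if fi_FI.UTF-8 is not installed.

```shell
#!/bin/sh
# Hypothetical helper: the prefix is on echo, so the consumer on the
# right side of the pipe does not receive LC_COLLATE.
collate_seen_when_prefixing_echo() {
  ( unset LC_COLLATE
    LC_COLLATE=fi_FI.UTF-8 echo "avb" |
      sh -c 'cat >/dev/null; echo "${LC_COLLATE:-unset}"' )
}

# Hypothetical helper: the prefix is on the consumer, which is where
# sed sits in the reproducer above.
collate_seen_when_prefixing_consumer() {
  ( unset LC_COLLATE
    echo "avb" |
      LC_COLLATE=fi_FI.UTF-8 sh -c 'cat >/dev/null; echo "${LC_COLLATE:-unset}"' )
}

collate_seen_when_prefixing_echo      # prints: unset
collate_seen_when_prefixing_consumer  # prints: fi_FI.UTF-8
```

Exporting the variable beforehand, as in the original steps to reproduce, affects every command in the pipeline and therefore also reaches sed.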

Comment 11 RHEL Program Management 2012-07-10 08:27:33 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 12 RHEL Program Management 2012-07-10 23:17:37 UTC
This request was erroneously removed from consideration in Red Hat Enterprise Linux 6.4, which is currently under development.  This request will be evaluated for inclusion in Red Hat Enterprise Linux 6.4.

Comment 13 Jeff Law 2012-07-11 22:28:56 UTC
Created attachment 597676 [details]
Potential patch, various fixes to collation in fi_FI locale

Comment 16 errata-xmlrpc 2013-02-21 07:04:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0279.html