Bug 809726
Summary: Characters omitted from range a-z in finnish locale.
Product: Red Hat Enterprise Linux 6
Reporter: Antti Siira <antti>
Component: glibc
Assignee: Jeff Law <law>
Status: CLOSED ERRATA
QA Contact: qe-baseos-tools-bugs
Severity: medium
Priority: unspecified
Version: 6.2
CC: fweimer, mbriza, mcermak, mfranc, ovasik, pfrankli
Target Milestone: rc
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Doc Text:
Cause: Locale data for the characters in the range a-z was incorrect in the Finnish locale.
Consequence: Some characters in the range a-z were not printing correctly in the Finnish locale.
Fix: Finnish locale data was updated to provide the correct output for these characters.
Result: Characters in the Finnish locale should now print correctly.
Last Closed: 2013-02-21 07:04:37 UTC
Type: Bug
Description
Antti Siira
2012-04-04 08:25:29 UTC
Strange... I'm getting correct results on my RHEL-6.

    $ LC_COLLATE=fi_FI.UTF-8 echo "avb" | ./sed -e 's/\([a-z]*\).*/\1/'
    avb

with sed-4.2.1-7.el6.i686 and sed-4.2.1-9.el6.i686. Maybe something more specific to your system (glibc version?)?

Strange indeed. I'm not certain that it is the locale causing the problem, but something is off.

    $ echo "avb" | sed -e 's/\([a-z]*\)/\1/'
    avb
    $ echo "avb" | sed -e 's/\([a-z]*\).*/\1/'
    a
    $ echo "avb" | sed -e 's/\([a-z]*\)b/\1/'
    av

My glibc version is the current rhel6 glibc.

    $ rpm -q glibc
    glibc-2.12-1.47.el6_2.9.x86_64

I couldn't reproduce the behavior on RHEL-6.2 either.

    # rpm -q glibc
    glibc-2.12-1.47.el6_2.9.x86_64
    # rpm -q sed
    sed-4.2.1-7.el6.x86_64
    # LC_COLLATE=fi_FI.UTF-8 echo "avb" | sed -e 's/\([a-z]*\)/\1/'
    avb
    # LC_COLLATE=fi_FI.UTF-8 echo "avb" | sed -e 's/\([a-z]*\).*/\1/'
    avb

Do you have any environment variables set to non-default values?

I booted the machine to rescue mode using rhel-server-6.2-x86_64-dvd.iso to ensure a known environment.

    Choose a Language: English
    Keyboard Type: fi

    # locale; echo 'avb' | sed -e 's/\([a-z]*\)./\1/'; LC_COLLATE=fi_FI.UTF-8; locale; echo 'avb' | sed -e 's/\([a-z]*\)./\1/';
    LANG=en_US.UTF-8
    LC_CTYPE="en_US.UTF-8"
    LC_NUMERIC=C
    LC_TIME="en_US.UTF-8"
    LC_COLLATE=C
    LC_MONETARY="en_US.UTF-8"
    LC_MESSAGES="en_US.UTF-8"
    LC_PAPER="en_US.UTF-8"
    LC_NAME="en_US.UTF-8"
    LC_ADDRESS="en_US.UTF-8"
    LC_TELEPHONE="en_US.UTF-8"
    LC_MEASUREMENT="en_US.UTF-8"
    LC_IDENTIFICATION="en_US.UTF-8"
    LC_ALL=
    av
    LANG=en_US.UTF-8
    LC_CTYPE="en_US.UTF-8"
    LC_NUMERIC=C
    LC_TIME="en_US.UTF-8"
    LC_COLLATE=fi_FI.UTF-8
    LC_MONETARY="en_US.UTF-8"
    LC_MESSAGES="en_US.UTF-8"
    LC_PAPER="en_US.UTF-8"
    LC_NAME="en_US.UTF-8"
    LC_ADDRESS="en_US.UTF-8"
    LC_TELEPHONE="en_US.UTF-8"
    LC_MEASUREMENT="en_US.UTF-8"
    LC_IDENTIFICATION="en_US.UTF-8"
    LC_ALL=
    ab

The component has changed its owner to me, so I'm assigning the bug to myself. I was able to reproduce this bug in Fedora with sed from RHEL. Will investigate this further.
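A note on the commands in this thread: in a POSIX shell, a VAR=value prefix applies only to the single command it precedes. In the form LC_COLLATE=fi_FI.UTF-8 echo "avb" | sed ..., the setting therefore reaches echo and not sed. That would be consistent with those runs not reproducing the problem, although the thread does not say so explicitly. The sketch below illustrates the scoping without needing the Finnish locale installed; the probe command and the names left and right are illustrative only.

```shell
#!/bin/sh
# Sketch: shows which side of a pipe receives a VAR=value prefix.
# The probe and the names "left"/"right" are illustrative, not from the thread.
# Some shells may warn on stderr if fi_FI.UTF-8 is not installed.

unset LC_COLLATE

# Drains stdin, then prints the LC_COLLATE value visible to the reading process.
probe='cat >/dev/null; echo "${LC_COLLATE:-unset}"'

# Prefix placed before echo: only echo sees the assignment.
left=$(LC_COLLATE=fi_FI.UTF-8 echo "avb" | sh -c "$probe")

# Prefix placed before the reader: this is the slot sed occupies in the pipeline.
right=$(echo "avb" | LC_COLLATE=fi_FI.UTF-8 sh -c "$probe")

echo "prefix on echo   -> reader sees: $left"
echo "prefix on reader -> reader sees: $right"
```

With the prefix on echo the reader reports "unset"; with the prefix on the reader it reports "fi_FI.UTF-8".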
Well, now I'm sure this bug is not related to sed itself. I'll quote a part of the BUGS file that is packed with the source code distribution of sed:

    Another problem is that [a-z] tries to use collation symbols. This only happens if you are on the GNU system, using GNU libc's regular expression matcher instead of compiling the one supplied with GNU sed. In a Danish locale, for example, the regular expression `^[a-z]$' matches the string `aa', because `aa' is a single collating symbol that comes after `a' and before `b'; `ll' behaves similarly in Spanish locales, or `ij' in Dutch locales.

I believe this is related to your issue. However, I'll discuss this with my team, as there are some limiting factors: compiling with the regular expression matcher from sed is quite a big change to the package and could cause other issues. And I'm quite sure it won't be possible to change the behavior of the glibc one, as it is (as far as I can tell) considered correct under certain circumstances.

Hello, I discussed this and decided to switch the component to glibc, as it is the origin of the behavior.

Note: The reproducer command is as follows:

    echo "avb" | LC_COLLATE=fi_FI.UTF-8 sed -e 's/\([a-z]*\).*/\1/'

This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.

This request was erroneously removed from consideration in Red Hat Enterprise Linux 6.4, which is currently under development. This request will be evaluated for inclusion in Red Hat Enterprise Linux 6.4.

Created attachment 597676
Potential patch, various fixes to collation in fi_FI locale
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0279.html
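For systems that cannot take the updated glibc, two general workarounds are sketched below; neither is proposed in the thread itself. The first runs sed under LC_ALL=C, which overrides LC_COLLATE; this matches the rescue-mode run in the thread, where LC_COLLATE=C gave the expected result. The second avoids the range expression by listing the letters explicitly. The variable names are illustrative.

```shell
#!/bin/sh
# Sketch only: general workarounds, assuming ASCII input such as "avb".
input="avb"

# 1. Force the C locale for sed alone, so [a-z] is not evaluated
#    against the Finnish collation order.
c_locale=$(echo "$input" | LC_ALL=C sed -e 's/\([a-z]*\).*/\1/')

# 2. List the letters instead of using a range, so no collation
#    order is consulted to decide what lies between 'a' and 'z'.
explicit=$(echo "$input" | sed -e 's/\([abcdefghijklmnopqrstuvwxyz]*\).*/\1/')

echo "LC_ALL=C range: $c_locale"
echo "explicit list:  $explicit"
```

Both commands print avb for this input. Note that LC_ALL=C also changes how sed treats multibyte characters, so the first form is best suited to ASCII-only data.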