Bug 1184168

Summary: locale -a output is binary according to grep because of bokmal
Product: [Fedora] Fedora
Reporter: Barry Scott <barry.scott>
Component: glibc
Assignee: Florian Weimer <fweimer>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 22
CC: arjun.is, codonell, edgar.hoch, fweimer, jakub, law, mkolman, mnewsome, pfrankli, van.de.bugger
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glibc-2.21-11.fc22
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-02-17 12:50:37 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Barry Scott 2015-01-20 17:47:21 UTC
Description of problem:

We have scripts that grep the output of locale -a. It seems that
the entry after bokmal has a 0xe5 byte in it, but it's not UTF-8.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. locale -a | grep bok

Actual results:

grep reports the input as binary.

After adding -a to grep you can get to the bokmal entry.

Here is the od -cx output for the odd entry.

0002140   b   o   k   m 345   l  \n   
           6f62    6d6b    6ce5    620a

What encoding is the 0xe5 in?

Expected results:

The output of locale -a should be treated as text.

Additional info:

Comment 1 Barry Scott 2015-01-20 18:18:40 UTC
My locale is set to en_GB.UTF-8.

The 0xe5 is the right Unicode code point, but it should be output as
the pair of bytes 0xc3 0xa5 for a UTF-8 locale.
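
To make the byte-level difference concrete, here is a minimal Python sketch. It assumes the single 0xe5 byte comes from a Latin-1 style encoding of the name; the helper is_valid_utf8 is a hypothetical name introduced only for this illustration.

```python
# "bokmål" contains U+00E5 (LATIN SMALL LETTER A WITH RING ABOVE).
name = "bokm\u00e5l"

# Assumption: the alias is stored in a single-byte Latin-1 style encoding.
latin1_bytes = name.encode("latin-1")  # the ring-a becomes the single byte 0xe5
utf8_bytes = name.encode("utf-8")      # the ring-a becomes the pair 0xc3 0xa5


def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Print both byte sequences and whether each one is well-formed UTF-8.
print(latin1_bytes.hex(" "), is_valid_utf8(latin1_bytes))
print(utf8_bytes.hex(" "), is_valid_utf8(utf8_bytes))
```

A lone 0xe5 followed by an ASCII letter is not well-formed UTF-8, which would be consistent with grep treating the stream as binary in a UTF-8 locale.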

Comment 2 Carlos O'Donell 2015-01-20 18:39:50 UTC
(In reply to Barry Scott from comment #0)
> 0002140   b   o   k   m 345   l  \n   
>            6f62    6d6b    6ce5    620a
> 
> what encoding is the 0xe5 in?

It is ISO-8859-1, which matches the encoding of the locale in question.

The names in general match the encodings of their localizations.

Therefore if you want to use grep in this way you need to do the following:

[carlos@athas glibc]$ export LANG=C
[carlos@athas glibc]$ locale -a | grep bok
bokmal
bokm�l

This problem is not new.

The output is in ISO-8859-1 because a bokmal application would be using that encoding to search for and reference the particular ISO-8859-1 localization by name (the alternate one with the diacritic ring over the a).

We might *add* an entry in UTF-8 for another alias, but I don't see it possible to remove an aliased entry already in the list.

Therefore I don't think this is a bug. It simply reflects the fact that this output can't be processed with grep this way without using a very generic locale, e.g. C.

I'm marking this as CLOSED/NOTABUG.
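
For scripts that cannot switch to the C locale, one alternative is to treat the output as bytes and decode each name separately. The following is only a sketch under that assumption: decode_locale_names is a hypothetical helper, and the sample bytes stand in for real locale -a output.

```python
def decode_locale_names(raw):
    """Decode each line of `locale -a` style output on its own.

    Lines that are not valid UTF-8 fall back to Latin-1, which maps every
    byte to a code point and therefore never fails.
    """
    names = []
    for line in raw.splitlines():
        try:
            names.append(line.decode("utf-8"))
        except UnicodeDecodeError:
            names.append(line.decode("latin-1"))
    return names


# Sample input standing in for real output; the second name has a raw 0xe5.
sample = b"bokmal\nbokm\xe5l\nen_GB.utf8\n"
matches = [n for n in decode_locale_names(sample) if "bok" in n]
print(matches)
```

In a real script the raw bytes could come from subprocess.run(["locale", "-a"], capture_output=True).stdout, so that no decoding happens before the per-line fallback.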

Comment 3 Carlos O'Donell 2015-03-05 22:08:29 UTC
This is a much more complicated issue than one would expect:
https://sourceware.org/ml/libc-alpha/2015-01/msg00379.html

Comment 4 Carlos O'Donell 2015-03-05 22:09:05 UTC
*** Bug 1199117 has been marked as a duplicate of this bug. ***

Comment 5 Carlos O'Donell 2015-03-05 22:09:16 UTC
*** Bug 1196391 has been marked as a duplicate of this bug. ***

Comment 6 Carlos O'Donell 2015-03-05 22:13:19 UTC
We need to:
- Keep the old non-ASCII aliases so existing programs keep working.
- Make sure all non-ASCII aliases have ASCII aliases.
- Stop displaying non-ASCII aliases in locale output.
- Prevent new locales with non-ASCII names from being created (impose the restriction at locale compile time).
- Verify you can use the old non-ASCII aliases, and that they are not displayed.
- Fix the bug that fails to print locale information in the present character set.
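
The first few points in this list can be sketched in Python. This is not how glibc implements aliases; the table, its target values, and the function names below are all hypothetical and exist only to illustrate "keep resolving, stop displaying".

```python
# Hypothetical alias table keyed by the raw bytes of each alias name.
# The target locale names are illustrative only.
ALIASES = {
    b"bokmal": "nb_NO.ISO-8859-1",
    b"bokm\xe5l": "nb_NO.ISO-8859-1",
    b"french": "fr_FR.ISO-8859-1",
    b"fran\xe7ais": "fr_FR.ISO-8859-1",
}


def resolve(alias):
    """Old non-ASCII aliases still resolve, so existing programs keep working."""
    return ALIASES.get(alias)


def displayed_aliases():
    """Only ASCII aliases are listed in the output."""
    return sorted(k.decode("ascii") for k in ALIASES if k.isascii())


def non_ascii_without_ascii_twin():
    """Return non-ASCII aliases whose target has no ASCII alias at all."""
    ascii_targets = {v for k, v in ALIASES.items() if k.isascii()}
    return [k for k, v in ALIASES.items()
            if not k.isascii() and v not in ascii_targets]


print(displayed_aliases())
print(resolve(b"bokm\xe5l"))
print(non_ascii_without_ascii_twin())
```

The last function corresponds to the check that every non-ASCII alias has an ASCII counterpart; an empty result means the table satisfies it.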

Comment 7 Fedora End Of Life 2015-11-04 10:31:34 UTC
This message is a reminder that Fedora 21 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 21. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '21'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 21 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 8 Edgar Hoch 2015-11-04 10:49:06 UTC
This problem still seems to exist in Fedora 22. It should be solved.

Please update the Fedora version of this bug to 22 or rawhide.

Comment 9 Martin Kolman 2015-11-04 11:31:42 UTC
(In reply to Edgar Hoch from comment #8)
> This problem still seems to exist in Fedora 22. It should be solved.
> 
> Please update the Fedora version of this bug to 22 or rawhide.

Indeed, this is still relevant and should be fixed.

Comment 10 Carlos O'Donell 2015-11-10 02:36:25 UTC
This is fixed upstream and in F23 via:

commit 333e1ba4e53456a603621274177ae9393b9d5385
Author: Paul Eggert <eggert.edu>
Date:   Fri May 22 14:57:11 2015 -0700

    Remove obsolete aliases that broke 'locale -a'
    
    [BZ #18412]
    * intl/locale.alias: Remove obsolete aliases "bokmål" and "français"
    which caused 'locale -a' to output Latin-1 data in UTF-8 locales,
    breaking some applications that use 'locale -a' output.
    Change the encoding of this file from Latin-1 to ASCII to avoid
    other potential problems with people grepping this file.

We have not yet fixed this in F22.

Comment 11 Fedora Update System 2016-02-08 15:54:35 UTC
glibc-2.21-10.fc22 has been submitted as an update to Fedora 22. https://bodhi.fedoraproject.org/updates/FEDORA-2016-0480defc94

Comment 12 Fedora Update System 2016-02-10 11:55:20 UTC
glibc-2.21-10.fc22 has been pushed to the Fedora 22 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-0480defc94

Comment 13 Fedora Update System 2016-02-16 16:10:49 UTC
glibc-2.21-11.fc22 has been submitted as an update to Fedora 22. https://bodhi.fedoraproject.org/updates/FEDORA-2016-0480defc94

Comment 14 Fedora Update System 2016-02-17 06:28:16 UTC
glibc-2.21-11.fc22 has been pushed to the Fedora 22 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-0480defc94

Comment 15 Fedora Update System 2016-02-17 12:49:30 UTC
glibc-2.21-11.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report.