Bug 74701 - Locale rules are inconsistent between X and non-X environments
Summary: Locale rules are inconsistent between X and non-X environments
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: bash
Version: 8.0
Hardware: athlon
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Tim Waugh
QA Contact: David Lawrence
URL:
Whiteboard:
Duplicates: 77115
Depends On:
Blocks:
 
Reported: 2002-10-01 00:25 UTC by Ed Halley
Modified: 2007-04-18 16:46 UTC (History)
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2002-10-17 07:45:02 UTC
Embargoed:


Attachments
Patch hopefully fixing the bash L* variable weirdness (4.39 KB, patch)
2002-10-10 02:52 UTC, Miloslav Trmac


Links
System: Red Hat Product Errata
ID: RHBA-2003:140
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Updated bash packages fix several bugs
Last Updated: 2003-06-23 04:00:00 UTC

Description Ed Halley 2002-10-01 00:25:15 UTC
Description of Problem:

  With all of the defaults of a fresh RHL8.0 setup, the locale and
  collation rules are different for processes parented by X versus
  processes not parented by X.

  This is *not* a bug report about whether locale en_US vs locale
  POSIX should be made the default value for Red Hat Linux.

  This is a bug report that shows that the default locale is treated
  differently under XFree86 than it is outside of XFree86.

Steps to Reproduce:

  1. Fresh English/USA install of Psyche.  Fresh user account.

  2. mkdir example ; cd example ;
     touch Apple apple Banana banana Cherry cherry Zod zod

  3. Run any terminal process under X.
     Some examples tested were gnome-terminal, xterm,
     and xchat's terminal tab.

  4. (cd example ; ls ; echo --- ; ls [a-z]* ; echo --- ; ls [A-Z]*)

  5. Switch to a text virtual terminal.

  6. (cd example ; ls ; echo --- ; ls [a-z]* ; echo --- ; ls [A-Z]*)

  7. Compare results from #4 and #6.
     Results will differ.

  8. Edit /etc/sysconfig/i18n LANG="POSIX"

  9. Restart X.

 10. Repeat steps #3 through #7.
     Results will match.

Actual Results:

  #4 under X with default global locale settings (en_US.UTF-8)

     apple Apple banana Banana cherry Cherry zod Zod
     ---
     apple Banana banana Cherry cherry Zod zod
     ---
     Apple apple Banana banana Cherry cherry Zod

  Note that [a-z]* misses Apple but includes Banana, Cherry, Zod.
  Note that [A-Z]* misses zod but includes apple, banana, cherry.
  Sort and globbing rules appear to interleave letters aAbBcC...zZ.
     
  #7 no X with default global locale settings (en_US.UTF-8)

     apple Apple banana Banana cherry Cherry zod Zod
     ---
     apple banana cherry zod
     ---
     Apple Banana Cherry Zod

  Note that #4 above does not match #7 above.
  The difference between the two is XFree86.

  =================
     
  #4 under X with adjusted global locale settings of POSIX or C

     Apple Banana Cherry Zod apple banana cherry zod
     ---
     apple banana cherry zod
     ---
     Apple Banana Cherry Zod
     
  #7 no X with adjusted global locale settings of POSIX or C

     Apple Banana Cherry Zod apple banana cherry zod
     ---
     apple banana cherry zod
     ---
     Apple Banana Cherry Zod

  Note that the adjusted #4 above does match the adjusted #7 above.
  Once the POSIX (or C) rules are chosen as the locale, XFree86 no
  longer acts differently from non-XFree86 environments.

Expected Results:

  All processes should have the same locale and collation rules whether
  running under XFree86 or not.  The first output from step #4 above
  should have matched the first output from step #7 above.  XFree86
  should not cause any locale differences in the behavior of any non-
  graphical application run within its environment.

Additional Information:

  Other locales may be inconsistent between XFree86 and non-XFree86
  environments as well.
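
For anyone re-testing, the comparison in steps #4 through #7 can be scripted so that the locale is fixed in the child shell's environment instead of being inherited from whatever terminal is in use. This is only a minimal sketch and not part of the original report: `glob_in_locale` is a hypothetical helper, and the script assumes a case-sensitive filesystem and an installed en_US.UTF-8 locale. Only the POSIX results are predictable; what en_US.UTF-8 prints depends on the bash and glibc versions.

```shell
# Scratch directory holding the same eight files as step 2 of the report.
workdir=$(mktemp -d)
( cd "$workdir" && touch Apple apple Banana banana Cherry cherry Zod zod )

# glob_in_locale LOCALE PATTERN
# Start a fresh bash whose environment contains only PATH and LC_ALL=LOCALE,
# then let that bash expand PATTERN inside the scratch directory.
glob_in_locale() {
    env -i PATH="$PATH" LC_ALL="$1" \
        bash -c 'cd "$1" && echo $2' _ "$workdir" "$2"
}

posix_lower=$(glob_in_locale POSIX '[a-z]*')
posix_upper=$(glob_in_locale POSIX '[A-Z]*')
# Result varies by system; stderr is dropped in case the locale is missing.
utf8_lower=$(glob_in_locale en_US.UTF-8 '[a-z]*' 2>/dev/null)

rm -rf "$workdir"

printf 'POSIX       [a-z]*: %s\n' "$posix_lower"
printf 'POSIX       [A-Z]*: %s\n' "$posix_upper"
printf 'en_US.UTF-8 [a-z]*: %s\n' "$utf8_lower"
```

Because the locale is passed in the environment of a newly started bash, this takes the login path (X versus text console) out of the comparison.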

Comment 1 Ali-Reza Anghaie 2002-10-01 00:52:39 UTC
I am able to replicate this problem. Comparing systems with the locale set to
POSIX/C in /etc/sysconfig/i18n against systems that have the default, the only
significant difference is that the trace shows hits to
/usr/X11R6/lib/X11/locale/en_US.UTF-8/ ..
 
The bug report, again, isn't an LC_COLLATE complaint. It's just that
en_US.UTF-8 seems to behave differently under X vs. console.
 
What drives you insane is that once the gnome-terminal or what-not was
launched, 'export LANG="POSIX"' (or C) did ~nothing~ to the above. Completely
unexpected. So the only way to get it to work properly is to get the env set
before X starts.
 
From memory I'm thinking that X is behaving properly for the given locale 
except I don't understand why the 'export' does nothing to change it. So this 
bug might be shared between X and GLIBC. 
 
Cheers, -Ali 


Comment 2 Miloslav Trmac 2002-10-01 17:19:28 UTC
Any differences in the output of 'locale' between X and the text console?

Comment 3 Ali-Reza Anghaie 2002-10-01 17:24:50 UTC
Nope, no differences at all. And setting them once gnome-terminal was already 
launched did nothing to help this problem. The guess is that the X locale info 
is ~different~ than the GLIBC stuff that would be used otherwise. It was 
really bizarre and Halley and I might have to demo it for Mharris at some 
point. It took a while to characterize it ~consistently~. -Ali

Comment 4 Owen Taylor 2002-10-01 22:02:39 UTC
Note that I've never had any luck at all getting 'locale' to
produce anything useful. I'd advise ignoring it.

I suspect something is setting LC_COLLATE or LC_CTYPE differently
in the two circumstances.

Comment 5 Ali-Reza Anghaie 2002-10-01 22:10:58 UTC
That is where I think the X locale directory specified above is futzing with
things: a difference in the way X and GLIBC handle the same locale.
Cheers, -Ali

Comment 6 Miloslav Trmac 2002-10-03 04:01:12 UTC
Ah, I know what happens: nothing to do with X, but a genuine bash bug.
[ Ed, would you mind reassigning? ]

The problem is that bash uses the LANG/LC_* variable values from the time
it starts. When it is started from gnome-terminal, it gets LANG=en_US from its
environment as everything else, and works as expected. When started on
login in console, it *doesn't*, LANG is set *inside* the shell and bash
doesn't reflect it.

To verify it, run a subshell in console (i.e. type 'bash') and in this subshell
(started with LANG=en_US), things behave as in X.

This is a bash bug: POSIX 1003.1-2002 (Shell and Utilities, Issue 6) requires in
section 2.5.3 Shell Variables, that "The following [shell, not environment]
variables shall affect the execution of the shell" ... LANG, LC_*, ...
- i.e. changing the value *inside the shell* is required to affect it.
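
One way to see the two values this comment distinguishes (the LANG a shell was started with versus the LANG currently set inside it) is to compare the process's initial environment with the shell variable. The following is a sketch under assumptions, not something from the original thread: it relies on the Linux-specific /proc/PID/environ file, and the variable names are made up for illustration.

```shell
# Start a bash with LANG=C in its environment, change LANG inside it, then
# print both the startup value and the current value.
report=$(
    env -i PATH="$PATH" LANG=C bash -c '
        LANG=POSIX
        # /proc/$$/environ holds the NUL-separated environment the process
        # was exec()ed with; later assignments do not change it.
        startup_lang=$(tr "\0" "\n" < /proc/$$/environ | sed -n "s/^LANG=//p")
        echo "startup=$startup_lang current=$LANG"
    '
)
echo "$report"
```

Running the inner two lines in a console login shell and in a gnome-terminal shell would show whether LANG arrived through the environment or was only set later inside the shell.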

Comment 7 Ali-Reza Anghaie 2002-10-03 04:13:56 UTC
I see what you're saying but as I noted above, even once in the shell (under  
gnome-terminal) doing an 'export LANG=<foo>' wouldn't change the behavior  
either.. that isn't right. And that only seemed to be broken under  
gnome-terminal and I was ~guessing~ based on traces where it inherited that ( 
as opposed to traces under konsole). More noted above.  
  
Hrmm. I think I'll wait for Mharris to comment. He mentioned some things to me  
earlier today that he thought might be related. 
 
Then again, perhaps I don't quite grasp the vastness of your comment. I'm 
Forrest-Gumpish at times.  ;-)  
  
Cheers, -Ali  
   


Comment 8 Miloslav Trmac 2002-10-03 11:37:52 UTC
Ali, the whole point is that bash ignores changes of LC_* and LANG which
are done inside it:

in konsole:
touch a A b B c C
LANG=POSIX bash
ls [a-c] .... a b c
export LANG=cs_CZ
ls [a-c] ... a b c
exit
LANG=cs_CZ bash
ls [a-c] ... a B b C c
export LANG=POSIX
ls [a-c] ... a B b C c
exit

see?

LANG=cs_CZ bash shows one behavior,
LANG=POSIX bash shows the other,
even if *in* both of them, LANG is then set to the same value.
(after the shell is exec()ed, changing LANG has no effect)

And gnome-terminal runs it with LANG=en_US, but getty/login with LANG=C
(or unset)
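
The transcript above can be turned into a non-interactive check. This is a hedged sketch rather than the test anyone in this thread ran: it assumes en_US.UTF-8 is installed (if it is not, the child starts in the C locale and the check cannot tell a fixed bash from a broken one), and the variable names are illustrative.

```shell
# Scratch directory with the same six files as the transcript.
workdir=$(mktemp -d)
( cd "$workdir" && touch a A b B c C )

# The child bash starts with LANG=en_US.UTF-8 and no LC_* variables. LANG is
# switched to POSIX inside the shell, and [a-c] is expanded only afterwards.
after_change=$(
    env -i PATH="$PATH" LANG=en_US.UTF-8 bash -c '
        cd "$1" || exit 1
        export LANG=POSIX
        echo [a-c]
    ' _ "$workdir" 2>/dev/null
)

rm -rf "$workdir"
echo "after switching to POSIX inside the shell: $after_change"
```

A bash that honors the in-shell change should print only the lowercase names; the behavior described in this comment would leave some uppercase names in the list.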


Comment 9 Ali-Reza Anghaie 2002-10-03 13:48:04 UTC
Ah. I see what you meant now, didn't understand it before. So Ed, remember   
when I was 'LANG=<foo> gnome-terminal' and that all worked? Ok. Hrmm.   
   
Something still bothers me here. Switching from UTF to Latin fixes the display   
problems in programs like 'man' and 'screen' (UNICODE growing pains). And   
those programs certainly noticed the change. So ~BASH~ is ignoring it.
  
SOB, you know why I didn't get your behavior before? Your POSIX comment  
should've jogged my memory, I thought I saw some anomaly while testing Ed but
YOU certainly grasped it, I didn't. Watch:  
  
[ali@damascus mitr]$ LANG="en_US.UTF-8" bash --posix  
  
bash-2.05b$ ls  
a  A  b  B  c  C  d  D  
  
bash-2.05b$ ls [a-z]*  
a  A  b  B  c  C  d  D  
  
bash-2.05b$ export LANG="POSIX"  
  
bash-2.05b$ ls [a-z]*  
a  b  c  d  
  
  
Note that launching bash as 'sh' defaults to this behavior I understand. Hrmm. 
 
Now I don't believe there is a difference between POSIX 1003.2 (from man page 
vs. classic (mine above)) but I haven't found a firm answer to that yet. Mitr 
or RH might have that stuff/difference handy. 
 
Talk about bloody confusing. I'm never submitting a Bugzilla report again.  
:-/ 
 
Cheers, -Ali 


Comment 10 Mike A. Harris 2002-10-07 05:59:05 UTC
This seems to be a valid bug IMHO.  After tracing things a bit, I
believe it might be a bash bug.  I'm CC'ing the bash maintainer, and
Jakub and Uli for consensus/advice/info.


Comment 11 Ulrich Drepper 2002-10-07 07:01:36 UTC
All these discussions do not provide all the needed information.

In each and every situation mentioned also run 'locale' and post the output. 
This will show what programs should see.

Comment 12 Ed Halley 2002-10-07 14:32:59 UTC
In each usage case above, #4 and #7 as en_US.UTF-8, and #4 and #7 as POSIX (or
C), the output from locale was unanimously as expected.  All locale variables
report the en_US.UTF-8 or POSIX or C setting.

First run of #4 and #7:
$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Second run of #4 and #7:
$ locale
LANG=POSIX
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=


Comment 13 Tim Waugh 2002-10-07 14:57:45 UTC
mitr seems to have hit the nail on the head here.  The quote is quite 
clear that LANG et al should behave in a similar way to HOME.
 
c.f.: 
HOME=foo 
echo ~ 
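
Spelled out as a runnable check (a small sketch, with an illustrative variable name): assigning HOME inside the shell changes tilde expansion immediately, without starting a new process, which is the behavior being asked of LANG and LC_* here.

```shell
# HOME is assigned inside the child shell; the very next tilde expansion
# already uses the new value.
tilde_after=$(bash -c 'HOME=foo; echo ~')
echo "$tilde_after"
```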


Comment 14 Ulrich Drepper 2002-10-07 16:50:21 UTC
I know that there is already special code in bash which traces certain envvars
and directly uses their values.  HOME is one example, but there is also LANG.

At least some of the LC_* variables are missing although I'm sure that some
were handled in the patch.  In any case, quite easy to locate and probably fix.

Comment 15 Miloslav Trmac 2002-10-10 02:50:29 UTC
OK, here is a patch. It fixes the problem for me, but it changes most of
the locale variable handling (which was originally really strange IMHO,
LANG would override LC_*, assuming that libc sees bash variables), and we
already know these issues are quite brittle. Something for Red Hat QA team
to spend a day or two on :-( All I can say is I'm currently running
the patched bash and the system seems to shut down, boot and run cleanly.

The original code ignores (sort of) most L* variable changes, but LC_ALL
is honored. So a workaround for those who don't need to set individual
LC_* values seems to be to use LC_ALL instead of LANG in /etc/sysconfig/i18n.
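
The LC_ALL workaround relies on the usual precedence among the locale variables: LC_ALL overrides the individual LC_* variables, which in turn override LANG. The sketch below only models that precedence in plain shell; `effective_locale` is a hypothetical helper, it does not call setlocale or inspect bash internals, and falling back to POSIX when nothing is set is an assumption.

```shell
# effective_locale CATEGORY
# Print the locale name that applies to CATEGORY (for example LC_COLLATE),
# following the precedence LC_ALL > LC_<category> > LANG.
effective_locale() {
    category=$1
    # Read the variable whose name is stored in $category.
    eval "category_value=\${$category-}"
    if [ -n "${LC_ALL-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "$category_value" ]; then
        printf '%s\n' "$category_value"
    elif [ -n "${LANG-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf '%s\n' POSIX   # assumed default when nothing is set
    fi
}

# LC_ALL wins over both LC_COLLATE and LANG.
with_lc_all=$(LC_ALL=C LC_COLLATE=POSIX LANG=POSIX effective_locale LC_COLLATE)
# With LC_ALL and LC_COLLATE empty, LANG decides.
lang_only=$(LC_ALL= LC_COLLATE= LANG=POSIX effective_locale LC_COLLATE)
echo "with LC_ALL: $with_lc_all, LANG only: $lang_only"
```

This also shows the cost of the workaround: once LC_ALL is set, individual LC_* values no longer have any effect.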

As a second note, either bash needs a BuildPrereq on texinfo, or the implicit
build requirements should be documented somewhere ;-)

Comment 16 Miloslav Trmac 2002-10-10 02:52:46 UTC
Created attachment 79724 [details]
Patch hopefully fixing the bash L* variable weirdness

Comment 17 Tim Waugh 2002-10-11 10:45:12 UTC
mitr: You beat me to reporting it upstream.. ;-)

Comment 18 Miloslav Trmac 2002-10-17 07:44:53 UTC
Two more comments:
a) From discussion with bash maintainer: The original intent was that
   setlocale (LC_*, "") works, because bash "overrides" getenv (). This was
   AFAIK never guaranteed to work, and breaks with glibc 2.3, which always
   calls internal getenv (this means PLT reduction and smaller run-time-linking
   overhead).
b) In case the QA team does not notice: this means that rc.sysinit has
   LANG=cs_CZ.UTF-8 (or whatever) set from the start, and all messages
   translated into Czech are printed translated, but console is set to UTF-8
   mode rather late, which means that most of the messages have "random"
   pairs of characters instead of non-ASCII characters. This is quite ugly.
   I would like to humbly propose it's time to fix bug #30469 ;-)

Comment 19 Tim Waugh 2002-10-17 09:32:54 UTC
This should be fixed in bash-2.05b-6.

Comment 20 Tim Waugh 2002-11-02 16:27:46 UTC
*** Bug 77115 has been marked as a duplicate of this bug. ***

Comment 21 Tim Waugh 2003-06-23 14:52:19 UTC
An errata has been issued which should help the problem described in this bug report. 
This report is therefore being closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, please follow the link below. You may reopen 
this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2003-140.html


