Description of Problem:

With all of the defaults of a fresh RHL8.0 setup, the locale and collation rules are different for processes parented by X versus processes not parented by X.

This is *not* a bug report about whether locale en_US vs locale POSIX should be made the default value for Red Hat Linux. This is a bug report that shows that the default locale is treated differently under XFree86 than it is outside of XFree86.

Steps to Reproduce:

1. Fresh English/USA install of Psyche. Fresh user account.
2. mkdir example ; cd example ; touch Apple apple Banana banana Cherry cherry Zod zod
3. Run any terminal process under X. Some examples tested were gnome-terminal, xterm, and xchat's terminal tab.
4. (cd example ; ls ; echo --- ; ls [a-z]* ; echo --- ; ls [A-Z]*)
5. Switch to a text virtual terminal.
6. (cd example ; ls ; echo --- ; ls [a-z]* ; echo --- ; ls [A-Z]*)
7. Compare results from #4 and #6. Results will differ.
8. Edit /etc/sysconfig/i18n LANG="POSIX"
9. Restart X.
10. Repeat steps #3 through #7. Results will match.

Actual Results:

#4 under X with default global locale settings (en_US.UTF-8)

    apple Apple banana Banana cherry Cherry zod Zod
    ---
    apple Banana banana Cherry cherry Zod zod
    ---
    Apple apple Banana banana Cherry cherry Zod

Note that [A-Z]* misses apple but includes banana, cherry, zod.
Note that [a-z]* misses zod but includes apple, banana, cherry.
Sort and globbing rules appear to interleave letters aAbBcC...zZ.

#7 no X with default global locale settings (en_US.UTF-8)

    apple Apple banana Banana cherry Cherry zod Zod
    ---
    apple banana cherry zod
    ---
    Apple Banana Cherry Zod

Note that #4 above does not match #7 above. The difference between the two is XFree86.
=================

#4 under X with adjusted global locale settings of POSIX or C

    Apple Banana Cherry Zod apple banana cherry zod
    ---
    apple banana cherry zod
    ---
    Apple Banana Cherry Zod

#7 no X with adjusted global locale settings of POSIX or C

    Apple Banana Cherry Zod apple banana cherry zod
    ---
    apple banana cherry zod
    ---
    Apple Banana Cherry Zod

Note that the adjusted #4 above does match the adjusted #7 above. Once the POSIX (or C) rules are chosen as the locale, XFree86 no longer acts differently from non-XFree86 environments.

Expected Results:

All processes should have the same locale and collation rules whether running under XFree86 or not. The first output from step #4 above should have matched the first output from step #7 above. XFree86 should not cause any locale differences in the behavior of any non-graphical application run within its environment.

Additional Information:

Other locales may be inconsistent between XFree86 and non-XFree86 environments as well.
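For anyone who wants to repeat the comparison without switching between X and a text console, the steps above can be sketched as a small script. This is only a sketch under assumptions: the helper name glob_in_locale is made up for illustration, it needs bash and mktemp on a case-sensitive filesystem, and the en_US.UTF-8 locale has to be installed for the second line of each pair to mean anything (otherwise bash falls back to C and warns on stderr). Each glob runs in a fresh bash that already has the locale in its environment at startup.

```shell
#!/bin/bash
# Scratch copy of the example directory from step 2.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
( cd "$workdir" && touch Apple apple Banana banana Cherry cherry Zod zod )

# Expand a glob in a fresh bash that has the locale in its environment at
# startup, so the result does not depend on how the calling shell was started.
# $1 = locale name, $2 = glob pattern (hypothetical helper, not from the report)
glob_in_locale() {
    ( cd "$workdir" && LC_ALL=$1 bash -c "echo $2" )
}

for pattern in '[a-z]*' '[A-Z]*'; do
    echo "POSIX  $pattern: $(glob_in_locale POSIX "$pattern")"
    echo "en_US  $pattern: $(glob_in_locale en_US.UTF-8 "$pattern")"
done
```

The POSIX lines should list only lowercase or only uppercase names. What the en_US lines show depends on the bash version and its range-matching options, so treat them as something to observe rather than a fixed expectation.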
I am able to replicate this problem. Comparing systems with the locale set to POSIX/C in /etc/sysconfig/i18n against systems that have the default, the only significant difference is that the trace shows hits to /usr/X11R6/lib/X11/locale/en_US.UTF-8/ ..

The bug report, again, isn't a LC_COLLATE complaint. It's just that en_US.UTF-8 seems to behave differently under X vs. console. What drives you insane is that once the gnome-terminal or what-not was launched, 'export LANG="POSIX"' (or C) did ~nothing~ to the above. Completely unexpected. So the only way to get it to work properly is to get the env set before X starts.

From memory, I'm thinking that X is behaving properly for the given locale, except I don't understand why the 'export' does nothing to change it. So this bug might be shared between X and GLIBC.

Cheers, -Ali
Any differences in the output of 'locale' between X and text console?
Nope, no differences at all. And setting them once gnome-terminal was already launched did nothing to help this problem. The guess is that the X locale info is ~different~ than the GLIBC stuff that would be used otherwise. It was really bizarre and Halley and I might have to demo it for Mharris at some point. It took a while to characterize it ~consistently~. -Ali
Note that I've never had any luck at all getting 'locale' to produce anything useful. I'd advise ignoring it. I suspect something is setting LC_COLLATE or LC_CTYPE different in the two circumstances.
That is where I think the X locale directory specified above is futzing with things: a difference in the way X and GLIBC handle the same locale. Cheers, -Ali
Ah, I know what happens: nothing to do with X, but a genuine bash bug. [ Ed, would you mind reassigning? ]

The problem is that bash uses the LANG/LC_* variable values from the time it starts. When it is started from gnome-terminal, it gets LANG=en_US from its environment like everything else, and works as expected. When started on login in console, it *doesn't*: LANG is set *inside* the shell and bash doesn't reflect it. To verify it, run a subshell in console (i.e. type 'bash'), and in this subshell (started with LANG=en_US) things behave as in X.

This is a bash bug. POSIX 1003.1-2002 (Shell and Utilities, Issue 6) requires, in section 2.5.3 Shell Variables, that "The following [shell, not environment] variables shall affect the execution of the shell" ... LANG, LC_*, ... - i.e. changing the value *inside the shell* is required to affect it.
I see what you're saying but as I noted above, even once in the shell (under gnome-terminal) doing an 'export LANG=<foo>' wouldn't change the behavior either.. that isn't right. And that only seemed to be broken under gnome-terminal and I was ~guessing~ based on traces where it inherited that ( as opposed to traces under konsole). More noted above. Hrmm. I think I'll wait for Mharris to comment. He mentioned some things to me earlier today that he thought might be related. Then again, perhaps I don't quite grasp the vastness of your comment. I'm Forrest-Gumpish at times. ;-) Cheers, -Ali
Ali, the whole point is that bash ignores changes of LC_* and LANG which are done inside it. In konsole:

    touch a A b B c C
    LANG=POSIX bash
    ls [a-c]            ... a b c
    export LANG=cs_CZ
    ls [a-c]            ... a b c
    exit
    LANG=cs_CZ bash
    ls [a-c]            ... a B b C c
    export LANG=POSIX
    ls [a-c]            ... a B b C c
    exit

See? LANG=cs_CZ bash shows one behavior, LANG=POSIX bash shows the other, even if *in* both of them LANG is then set to the same value (after the shell is exec()ed, changing LANG has no effect). And gnome-terminal runs it with LANG=en_US, but getty/login with LANG=C (or unset).
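The same comparison can be scripted so it is easy to rerun against a patched bash. This is a rough sketch, not part of the original test: it assumes bash and mktemp are available, and cs_CZ only makes a difference if that locale is installed (otherwise both shells start in the C locale and the check passes trivially). The variable names from_env and in_shell are made up for illustration.

```shell
#!/bin/bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
( cd "$workdir" && touch a A b B c C )

# Case 1: POSIX is already in the environment when bash starts.
from_env=$(cd "$workdir" &&
    LC_ALL= LC_COLLATE= LANG=POSIX bash -c 'echo [a-c]')

# Case 2: bash starts under another locale and LANG is changed inside the shell.
in_shell=$(cd "$workdir" &&
    LC_ALL= LC_COLLATE= LANG=cs_CZ bash -c 'export LANG=POSIX; echo [a-c]')

echo "LANG=POSIX at startup:       $from_env"
echo "LANG=POSIX set inside shell: $in_shell"

if [ "$from_env" = "$in_shell" ]; then
    echo "bash honors the in-shell LANG change"
else
    echo "bash ignores the in-shell LANG change"
fi
```

On a bash that follows the POSIX wording quoted earlier, both lines should show the same list; on the bash described in this report, the second case would keep the collation it started with.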
Ah. I see what you meant now, didn't understand it before. So Ed, remember when I was 'LANG=<foo> gnome-terminal' and that all worked? Ok. Hrmm. Something still bothers me here. Switching from UTF to Latin fixes the display problems in programs like 'man' and 'screen' (UNICODE growing pains). And those programs certainly noticed the change. So ~BASH~ is ignoring it.

SOB, you know why I didn't get your behavior before? Your POSIX comment should've jogged my memory. I thought I saw some anomaly while testing, Ed, but YOU certainly grasped it, I didn't. Watch:

    [ali@damascus mitr]$ LANG="en_US.UTF-8" bash --posix
    bash-2.05b$ ls
    a A b B c C d D
    bash-2.05b$ ls [a-z]*
    a A b B c C d D
    bash-2.05b$ export LANG="POSIX"
    bash-2.05b$ ls [a-z]*
    a b c d

Note that launching bash as 'sh' defaults to this behavior, as I understand it. Hrmm. Now I don't believe there is a difference between POSIX 1003.2 (from the man page) vs. classic (mine above), but I haven't found a firm answer to that yet. Mitr or RH might have that stuff/difference handy.

Talk about bloody confusing. I'm never submitting a Bugzilla report again. :-/

Cheers, -Ali
This seems to be a valid bug IMHO. After tracing things a bit, I believe it might be a bash bug. I'm CC'ing the bash maintainer, and Jakub and Uli for consensus/advice/info.
All these discussions do not provide all the needed information. In each and every situation mentioned also run 'locale' and post the output. This will show what programs should see.
In each usage case above, #4 and #7 as en_US.UTF-8, and #4 and #7 as POSIX (or C), the output from locale was unanimously as expected. All locale variables report the en_US.UTF-8 or POSIX or C setting.

First run of #4 and #7:

    $ locale
    LANG=en_US.UTF-8
    LC_CTYPE="en_US.UTF-8"
    LC_NUMERIC="en_US.UTF-8"
    LC_TIME="en_US.UTF-8"
    LC_COLLATE="en_US.UTF-8"
    LC_MONETARY="en_US.UTF-8"
    LC_MESSAGES="en_US.UTF-8"
    LC_PAPER="en_US.UTF-8"
    LC_NAME="en_US.UTF-8"
    LC_ADDRESS="en_US.UTF-8"
    LC_TELEPHONE="en_US.UTF-8"
    LC_MEASUREMENT="en_US.UTF-8"
    LC_IDENTIFICATION="en_US.UTF-8"
    LC_ALL=

Second run of #4 and #7:

    $ locale
    LANG=POSIX
    LC_CTYPE="POSIX"
    LC_NUMERIC="POSIX"
    LC_TIME="POSIX"
    LC_COLLATE="POSIX"
    LC_MONETARY="POSIX"
    LC_MESSAGES="POSIX"
    LC_PAPER="POSIX"
    LC_NAME="POSIX"
    LC_ADDRESS="POSIX"
    LC_TELEPHONE="POSIX"
    LC_MEASUREMENT="POSIX"
    LC_IDENTIFICATION="POSIX"
    LC_ALL=
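To rule out the earlier suspicion that a single category such as LC_COLLATE or LC_CTYPE differs between the two environments, output like the above can be checked mechanically instead of by eye. A minimal sketch, assuming awk is available and that the LANG line comes first, as it does in the listings here; the function name locale_mismatches is made up for illustration.

```shell
# Read `locale` output on stdin and print every LC_* category whose value
# differs from LANG. Prints nothing when all categories agree with LANG.
locale_mismatches() {
    awk -F= '
        { gsub(/"/, "", $2) }                  # strip the quotes locale adds
        $1 == "LANG"   { lang = $2; next }     # remember the LANG value
        $1 == "LC_ALL" { next }                # empty unless explicitly set
        $1 ~ /^LC_/ && $2 != lang { print $1 "=" $2 }
    '
}
```

Running `locale | locale_mismatches` under X and again on the text console should print nothing in both places if the environments really agree, which is what the listings above indicate.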
mitr seems to have hit the nail on the head here. The quote is quite clear that LANG et al should behave in a manner similar to HOME. c.f.: HOME=foo echo ~
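The HOME analogy can be made concrete with a separate assignment followed by a tilde expansion. A small sketch, with the function name tilde_after_assignment made up for illustration; it uses a standalone assignment rather than the prefix form because, as far as I understand, in the prefix form the tilde is expanded before the temporary assignment takes effect.

```shell
# Assign HOME inside a fresh bash, then let that same shell expand ~.
# $1 = directory to assign to HOME
tilde_after_assignment() {
    bash -c 'HOME=$1; echo ~' bash "$1"
}

tilde_after_assignment /tmp/foo
```

The shell picks up the new HOME immediately for its own expansion, without being restarted. The POSIX text quoted earlier asks for the same treatment of LANG and LC_*.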
I know that there is already special code in bash which traces certain envvars and directly uses their values. HOME is one example, but there is also LANG. At least some of the LC_* variables are missing, although I'm sure that some were handled in the patch. In any case, quite easy to locate and probably fix.
OK, here is a patch. It fixes the problem for me, but it changes most of the locale variable handling (which was originally really strange IMHO: LANG would override LC_*, assuming that libc sees bash variables), and we already know these issues are quite brittle. Something for the Red Hat QA team to spend a day or two on :-( All I can say is I'm currently running the patched bash and the system seems to shut down, boot and run cleanly.

The original code ignores (sort of) most L* variable changes, but LC_ALL is honored. So a workaround for those who don't need to set individual LC_* values seems to be to use LC_ALL instead of LANG in /etc/sysconfig/i18n.

As a second note, either bash needs a BuildPrereq on texinfo, or the implicit build requirements should be documented somewhere ;-)
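The reason the LC_ALL workaround covers every category is the standard precedence among the locale variables: LC_ALL first, then the specific LC_* variable, then LANG. A minimal sketch of that lookup order, purely illustrative: the function name effective_locale is made up, it only compares strings and never calls setlocale, and printing POSIX for the all-empty case is an assumption about what the implementation default looks like.

```shell
# Print the locale name a category would effectively use.
# $1 = LC_ALL value, $2 = the category's own value (e.g. LC_COLLATE), $3 = LANG
effective_locale() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"      # LC_ALL overrides everything
    elif [ -n "$2" ]; then
        printf '%s\n' "$2"      # then the specific LC_* variable
    elif [ -n "$3" ]; then
        printf '%s\n' "$3"      # then LANG
    else
        printf 'POSIX\n'        # assumption: implementation default shown as POSIX
    fi
}

effective_locale "" "" "en_US.UTF-8"         # only LANG set
effective_locale "POSIX" "" "en_US.UTF-8"    # LC_ALL set, as in the workaround
```

The trade-off noted above follows directly: once LC_ALL is set, individual LC_* values no longer have any effect.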
Created attachment 79724: Patch hopefully fixing the bash L* variable weirdness
mitr: You beat me to reporting it upstream.. ;-)
Two more comments:

a) From discussion with the bash maintainer: The original intent was that setlocale (LC_*, "") works, because bash "overrides" getenv (). This was AFAIK never guaranteed to work, and breaks with glibc 2.3, which always calls its internal getenv (this means PLT reduction and smaller run-time-linking overhead).

b) In case the QA team does not notice: this means that rc.sysinit has LANG=cs_CZ.UTF-8 (or whatever) set from the start, and all messages translated into Czech are printed translated, but the console is set to UTF-8 mode rather late, which means that most of the messages have "random" pairs of characters instead of non-ASCII characters. This is quite ugly. I would like to humbly propose it's time to fix bug #30469 ;-)
This should be fixed in bash-2.05b-6.
*** Bug 77115 has been marked as a duplicate of this bug. ***
An errata has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2003-140.html