From Bugzilla Helper:
User-Agent: Mozilla/5.0 (compatible; Konqueror/3; Linux)

Description of problem:
When your locale uses UTF-8, there is a problem with collation. Try the following under bash:

    mkdir c
    cd c
    > a
    > b
    echo [A-Z]

The output of echo is:

    b

which is entirely wrong. If you have LC_COLLATE=C in the environment, the bug does not occur. This bug will cause shell scripts, at least, to behave incorrectly.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Run bash interactively.
2. Run "locale" to make sure "UTF-8" appears in the LC_COLLATE variable, e.g.

    locale | grep LC_COLLATE

   The output should be something like:

    LC_COLLATE=en_US.UTF-8

3. Run the following commands:

    mkdir c
    cd c
    > a
    > b
    echo [A-Z]

Actual Results:
The output of echo is:

    b

which is wrong.

Expected Results:
The output of echo should be:

    [A-Z]

Additional info:
If you set the environment variable LC_COLLATE=C, the result of running the example is correct:

    [A-Z]

However, it must be set when the shell is started, so run

    LC_COLLATE=C bash

and then run the example from this new shell.
Range expressions (e.g. [A-Z]) must behave according to the LC_COLLATE setting, i.e. use dictionary order (AaBbCc or aAbBcC in most locales). This is mandated by POSIX; if you need the old behavior, export LC_COLLATE=C. The only problem is that bash doesn't react to the change immediately (you need to start a new shell with LC_COLLATE=C bash), and this is fixed in bash-2.05b-7 in rawhide.
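For anyone who wants to check the C-collation behaviour without touching their login environment, here is a minimal sketch. It assumes bash and mktemp are available; the scratch directory and the variable names tmpdir and result are only for illustration. It uses LC_ALL=C rather than LC_COLLATE=C because LC_ALL overrides LC_COLLATE if it happens to be set already, and it starts a child shell so the locale is in place before the glob is expanded.

```shell
# Create two lowercase file names in a scratch directory.
tmpdir=$(mktemp -d)
: > "$tmpdir/a"
: > "$tmpdir/b"

# Expand the glob in a child shell started with C collation.
# Nothing matches [A-Z] there, so the pattern is left as-is.
result=$(cd "$tmpdir" && LC_ALL=C bash -c 'echo [A-Z]')
echo "$result"    # prints [A-Z]

# Clean up the scratch directory.
rm -rf "$tmpdir"
```

Running the same bash -c command without LC_ALL=C in a dictionary-order locale is what produces the "b" from the original report.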
I understand, thanks. However, won't existing shell scripts break? Perhaps LC_COLLATE should be set to "C" in /etc/profile.d/glib2.* (or something like it) to avoid this? This was spotted because a friend of mine noticed that an existing shell script broke under Red Hat 8.0.
You should set LC_COLLATE=C (or LC_ALL=C) right before invoking a program which requires the C collation. Say you want to do an ASCII sort; then you do

    somecommand | LC_ALL=C sort

This is nothing new in 8.0. Exactly the same behaviour was there in the 7.x series (though in that case the locales weren't en_US.UTF-8 etc., but en_US or en_US.ISO-8859-15).
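A small sketch of the per-command override described above, with printf standing in for somecommand (the input letters and the variable name sorted are made up for illustration). The assignment applies only to the sort process, so the rest of the script keeps the user's locale.

```shell
# Sort four sample lines in byte (ASCII) order for this one command only.
sorted=$(printf '%s\n' b A a B | LC_ALL=C sort)
printf '%s\n' "$sorted"
# prints, one per line: A B a b
```

In a dictionary-order locale the same pipeline without LC_ALL=C would typically interleave upper and lower case instead of listing all capitals first.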