The following is very, very wrong and will break a lot of scripts:

% touch A B C a b c
% echo [A-Z]
A a B b C c
% echo [a-z]
a B b C C

I traced it to LC_ALL="en_US" in /etc/sysconfig/i18n. If I do not set LC_ALL, but instead set LC_COLLATE="C", the tcsh patterns work as expected. This comes down to a bug in strcoll() in glibc, or the locale files in /usr/share/locale are screwed up. Either way, this is a big bug! I am running on 6.1 with all the updates.
*** Bug 5980 has been marked as a duplicate of this bug. ***

Found a bug in tcsh:

Example: (typed in a running tcsh)

pc-9:~ mkdir t
pc-9:~ cd t
pc-9:~/t touch A B c
pc-9:~/t echo [A-Z]*
A B c
pc-9:~/t sh
[zut@pc-9 t]$ echo [A-Z]*
A B
[zut@pc-9 t]$

sh parses the patterns correctly, tcsh gets them wrong!

------- Additional Comments From ilh.mit.edu 10/15/99 16:31 -------

I think this is a serious bug having to do with glibc and locale. With LC_ALL=en_US, this bug always happens for me in 6.1. With LC_ALL undefined and LC_COLLATE=C, it works fine. This bug is going to break a lot of scripts!
I just stumbled on this as well. very annoying since i use ls -d [A-Z]* to list docs in large dirs. this seems to be slow to fix...
if you run the following program (strcoll2.c; cc -o strcoll2 strcoll2.c) like so:

% sh
% cd /usr/share/i18n/locales
% for f in *; do
strcoll2 $f
done > /tmp/strcoll2.out

you'll see that every locale gets the ordering wrong (except the unsupported ones which revert to the C locale)

#include <locale.h>
#include <string.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    /* locale to test; defaults to en_US when no argument is given */
    const char *lcl = (argc > 1) ? argv[1] : "en_US";
    const char *res;

    /* baseline: the C locale compares by byte value */
    res = setlocale(LC_ALL, "C");
    printf("setlocale(LC_ALL, \"C\") yields %s\n", res ? res : "(null)");
    printf("strcoll(%s, %s) yields %d\n", "a", "B", strcoll("a", "B"));

    /* setlocale() returns NULL for an unsupported locale and leaves
       the C locale in effect, so guard the %s argument */
    res = setlocale(LC_ALL, lcl);
    printf("setlocale(LC_ALL, \"%s\") yields %s\n", lcl, res ? res : "(null)");
    printf("strcoll(%s, %s) yields %d\n", "a", "B", strcoll("a", "B"));
    return 0;
}
oops, forgot these additional notes: it seems the string collation is done via /usr/share/i18n/locales/en_DK - the others i checked seem to copy from there. i've emailed the author listed in the file: Keld.Simonsen to see if he/she has any ideas. it might be a parsing/logic bug in glibc, so changing these files won't necessarily help. either way this is going to affect more than just tcsh, so this needs to get fixed...
This is a feature, not a bug. It comes from the rule that all forms of one letter, such as upper and lower case versions, are to be sorted before the next letter. Thus all "a"'s come before all "b"'s. The behaviour is recognized by ISO and IEEE and The Open Group, and it is being set in concrete in the new ISO standard 14651, and in a Unicode TR.

[A-Z] is not a good way to say uppercase letters. Rather use the [:upper:] class (written [[:upper:]] inside a bracket expression) - this also includes accented letters. Probably sorting behaviour is actually not the problem, but rather that the [] notation is widely used and dependent on the locale collation. Standardizers have discussed new syntax for regular expressions, also to include the full 10646 repertoire in a portable way, but no consensus has been found yet.

Kind regards
Keld Simonsen
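For anyone who wants to try the character-class approach from C rather than from the shell, here is a minimal sketch using POSIX fnmatch(3). The function name starts_with_upper is hypothetical, and whether a given shell of this vintage accepts [[:upper:]] in its own globbing is a separate question that this sketch does not answer.

```c
#include <fnmatch.h>

/* Hypothetical helper: returns 1 if name begins with an uppercase
 * letter according to the current locale's character classes,
 * 0 otherwise. Unlike [A-Z], the [:upper:] class does not depend
 * on where letters fall in the collation order. */
int starts_with_upper(const char *name)
{
    return fnmatch("[[:upper:]]*", name, 0) == 0;
}
```

A program that never calls setlocale() runs in the C locale, where only A through Z count as uppercase; after setlocale(LC_ALL, "") the class should follow whatever the environment's locale defines, accented letters included.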
ok, so strcoll() won't change its behaviour. A note in the man page would be nice. I'll do a patch. I can either try to keep strcoll and add a check for lower/upper, or dump it for strcmp. anyone have a vote? can't do it now though, but i'll do it in a few hours. suppose i should ask the tcsh people...
ok, this seems to work for me. these lines get added to glob.c:

    if (islower(c1) && isupper(c2))
        return (1);

now you can't just put this anywhere! see below for where it goes.

int
globcharcoll(c1, c2)
    int c1, c2;
{
#if defined(NLS) && defined(LC_COLLATE) && !defined(NOSTRCOLL)
    char s1[2], s2[2];

    if (c1 == c2)
        return (0);
    if (islower(c1) && isupper(c2))
        return (1);
    s1[0] = c1;
    s2[0] = c2;
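To see the effect of the added check in isolation, here is a minimal standalone sketch. The names char_coll and in_range are hypothetical and not taken from tcsh, and the tail of the comparison (terminating the two one-character strings and calling strcoll()) is an assumption about how such a function would finish, not a quote from glob.c.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical stand-in for a per-character collation compare.
 * Assumption: it ends by calling strcoll() on two one-character
 * strings. */
int char_coll(int c1, int c2)
{
    char s1[2], s2[2];

    if (c1 == c2)
        return 0;
    /* The added check: a lowercase letter is always reported as
     * sorting after an uppercase one, whatever the locale says. */
    if (islower(c1) && isupper(c2))
        return 1;
    s1[0] = (char)c1;
    s2[0] = (char)c2;
    s1[1] = s2[1] = '\0';
    return strcoll(s1, s2);
}

/* Hypothetical range test: does c fall inside [lo-hi]? */
int in_range(int c, int lo, int hi)
{
    return char_coll(lo, c) <= 0 && char_coll(c, hi) <= 0;
}
```

With the check in place, 'b' is rejected for [A-Z] because comparing it against the upper bound 'Z' returns 1 before strcoll() is consulted, and 'B' is rejected for [a-z] because the lower bound 'a' compares as greater than it. Note that the check is one-sided: an uppercase first argument against a lowercase second still goes through strcoll().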
i've emailed the author of tcsh with the changes (christos (Christos Zoulas)), and he's accepted them. i think 6.09.00 is current, and this change would be a further version along. before making an additional patch file to be applied within the rpm, i'd suggest the new version of tcsh. if the other people who have seen this bug could report back with any adverse behaviours i'd appreciate it. likewise success. :)
bug ids 6244 & 6398 relate to this one. if this gets closed, both of those do as well. (at least if the fix involves snarfing the new tcsh)
bash2 has the same annoying problem (being at home, with no access to bugzilla db, it took me two hours of digging to trace the problem :-)
*** Bug 6398 has been marked as a duplicate of this bug. ***
*** Bug 6244 has been marked as a duplicate of this bug. ***
Fixed (at least tcsh) in tcsh-6.09-1 in Raw Hide. Changing component to bash2 ...
*** Bug 10473 has been marked as a duplicate of this bug. ***
You can use LANG="C" to turn off this feature.
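One caveat: LANG only takes effect when neither LC_ALL nor LC_COLLATE is set, so with LC_ALL="en_US" in /etc/sysconfig/i18n (as in the original report) setting LANG alone would not be enough. A minimal C sketch of the POSIX lookup order for the collation category follows. Both function names are hypothetical, and falling back to "C" when nothing is set is an assumption, since POSIX leaves that default to the implementation.

```c
#include <stdlib.h>

/* Hypothetical helper: return the first value that is set and
 * non-empty, in the order LC_ALL, LC_COLLATE, LANG.
 * Assumption: "C" is used when none of them is set. */
const char *pick_collate_locale(const char *lc_all,
                                const char *lc_collate,
                                const char *lang)
{
    if (lc_all && *lc_all)
        return lc_all;
    if (lc_collate && *lc_collate)
        return lc_collate;
    if (lang && *lang)
        return lang;
    return "C";
}

/* Same lookup, reading the values from the environment. */
const char *effective_collate_locale(void)
{
    return pick_collate_locale(getenv("LC_ALL"),
                               getenv("LC_COLLATE"),
                               getenv("LANG"));
}
```

So pick_collate_locale("en_US", "C", "C") still yields "en_US"; to get byte-order ranges back, either unset LC_ALL and set LC_COLLATE or LANG to "C", or set LC_ALL itself to "C".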