Bug 6000
Summary: | tcsh patterns broken due to locale settings | ||
---|---|---|---|
Product: | [Retired] Red Hat Linux | Reporter: | ilh |
Component: | bash | Assignee: | Bernhard Rosenkraenzer <bero> |
Status: | CLOSED NOTABUG | QA Contact: | |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | 6.1 | CC: | kevin, santini, zut |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i386 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2000-07-27 18:30:22 UTC | Type: | --- |
Description
ilh
1999-10-15 21:28:48 UTC
*** Bug 5980 has been marked as a duplicate of this bug. ***

Found a bug in tcsh. Example (typed in a running tcsh):

    pc-9:~ mkdir t
    pc-9:~ cd t
    pc-9:~/t touch A B c
    pc-9:~/t echo [A-Z]*
    A B c
    pc-9:~/t sh
    [zut@pc-9 t]$ echo [A-Z]*
    A B
    [zut@pc-9 t]$

sh parses the patterns correctly, tcsh gets them wrong!

------- Additional Comments From ilh.mit.edu 10/15/99 16:31 -------

I think this is a serious bug having to do with glibc and locale. With LC_ALL=en_US, this bug always happens for me in 6.1. With LC_ALL undefined and LC_COLLATE=C, it works fine. This bug is going to break a lot of scripts!

I just stumbled on this as well. very annoying, since i use ls -d [A-Z]* to list docs in large dirs. this seems to be slow to fix...

if you run the following program (strcoll2.c; cc -o strcoll2 strcoll2.c) like so:

    % sh
    % cd /usr/share/i18n/locales
    % for f in *; do strcoll2 $f; done > /tmp/strcoll2.out

you'll see that every locale gets the ordering wrong (except the unsupported ones, which revert to the C locale).

    #include <locale.h>
    #include <string.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        char *lcl;

        /* locale to test: first argument, defaulting to en_US */
        lcl = argv[1] ? argv[1] : "en_US";

        /* baseline: compare "a" and "B" in the C locale */
        printf("setlocale(LC_ALL, \"C\") yields %s\n", setlocale(LC_ALL, "C"));
        printf("strcoll(%s, %s) yields %d\n", "a", "B", strcoll("a", "B"));

        /* same comparison in the requested locale */
        printf("setlocale(LC_ALL, \"%s\") yields %s\n", lcl, setlocale(LC_ALL, lcl));
        printf("strcoll(%s, %s) yields %d\n", "a", "B", strcoll("a", "B"));
        return 0;
    }

oops, forgot these additional notes: it seems the string collation is done via /usr/share/i18n/locales/en_DK - the others i checked seem to copy from there. i've emailed the author listed in the file, Keld.Simonsen, to see if he/she has any ideas. it might be a parsing/logic bug in glibc, so changing these files won't necessarily help. either way this is going to affect more than just tcsh, so this needs to get fixed...

This is a feature, not a bug.
It comes from the rule that all forms of one letter, such as upper and lower case versions, are to be sorted before the next letter. Thus all "a"'s come before all "b"'s. The behaviour is recognized by ISO and IEEE and The Open Group, and it is being set in concrete in the new ISO standard 14651, and in a Unicode TR. [A-Z] is not a good way to say uppercase letters. Rather use the [:upper:] class (written [[:upper:]] inside a glob or bracket expression) - this also includes accented letters. Probably sorting behaviour is actually not the problem, but rather that the [] range notation is widely used and depends on the locale collation. Standardizers have discussed new syntax for regular expressions, also to include the full 10646 repertoire in a portable way, but no consensus has been found yet.

Kind regards
Keld Simonsen

ok, so strcoll() won't change its behaviour. A note in the man page would be nice. I'll do a patch. I can either try to keep strcoll and add a check for lower/upper, or dump it for strcmp. anyone have a vote? can't do it now though, but i'll do it in a few hours. suppose i should ask the tcsh people...

ok, this seems to work for me. these lines get added to glob.c:

    if (islower(c1) && isupper(c2))
        return (1);

now you can't just put this anywhere! see below for where it goes.

    int
    globcharcoll(c1, c2)
        int c1, c2;
    {
    #if defined(NLS) && defined(LC_COLLATE) && !defined(NOSTRCOLL)
        char s1[2], s2[2];

        if (c1 == c2)
            return (0);
        if (islower(c1) && isupper(c2))    /* the added check */
            return (1);
        s1[0] = c1;
        s2[0] = c2;

i've emailed the author of tcsh with the changes (christos (Christos Zoulas)), and he's accepted them. i think 6.09.00 is current, and this change would go into a later version. rather than making an additional patch file to be applied within the rpm, i'd suggest picking up the new version of tcsh. if the other people who have seen this bug could report back with any adverse behaviours i'd appreciate it. likewise success. :)

bug ids 6244 & 6398 relate to this one. if this gets closed, both of those do as well.
(at least if the fix involves snarfing the new tcsh)

bash2 has the same annoying problem (being at home, with no access to the bugzilla db, it took me two hours of digging to trace the problem :-)

*** Bug 6398 has been marked as a duplicate of this bug. ***

*** Bug 6244 has been marked as a duplicate of this bug. ***

Fixed (at least tcsh) in tcsh-6.09-1 in Raw Hide. Changing component to bash2 ...

*** Bug 10473 has been marked as a duplicate of this bug. ***

You can use LANG="C" to turn off this feature.
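
For anyone landing here later, the two workarounds discussed in this thread (matching on the [:upper:] character class instead of the [A-Z] range, and checking case before falling back to strcoll()) can be sketched in C roughly as follows. This is only an illustration, not the code that went into tcsh: the names starts_upper and case_first_coll are hypothetical, and the second isupper/islower branch is an assumption added here for symmetry; the patch posted above contains only the first check.

```c
#include <ctype.h>
#include <fnmatch.h>
#include <string.h>

/* Hypothetical helper: returns 1 if name begins with a character in the
 * current locale's "upper" class, 0 otherwise. Uses the POSIX character
 * class rather than the locale-dependent range [A-Z]. */
int starts_upper(const char *name)
{
    return fnmatch("[[:upper:]]*", name, 0) == 0;
}

/* Hypothetical helper: compare two single characters so that an uppercase
 * letter always orders before a lowercase one, and defer to strcoll()
 * for everything else. */
int case_first_coll(int c1, int c2)
{
    char s1[2], s2[2];

    if (c1 == c2)
        return 0;
    if (islower(c1) && isupper(c2))   /* same check as the posted patch */
        return 1;
    if (isupper(c1) && islower(c2))   /* assumption: mirror case, not in the patch */
        return -1;

    s1[0] = (char)c1; s1[1] = '\0';
    s2[0] = (char)c2; s2[1] = '\0';
    return strcoll(s1, s2);
}
```

A caller would normally run setlocale(LC_ALL, "") first so that both helpers follow the user's locale settings. Without that call the program stays in the "C" locale, where starts_upper("c") is 0 and case_first_coll('a', 'B') is positive.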