Bug 6000 - tcsh patterns broken due to locale settings
Summary: tcsh patterns broken due to locale settings
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: bash
Version: 6.1
Hardware: i386
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Bernhard Rosenkraenzer
QA Contact:
URL:
Whiteboard:
Duplicates: 5980 6244 6398 10473
Depends On:
Blocks:
Reported: 1999-10-15 21:28 UTC by ilh
Modified: 2008-05-01 15:37 UTC
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2000-07-27 18:30:22 UTC
Embargoed:


Description ilh 1999-10-15 21:28:48 UTC
The following is very, very wrong and will break a lot of
scripts:

% touch A B C a b c
% echo [A-Z]
A a B b C c
% echo [a-z]
a B b C c

I traced it to LC_ALL="en_US" in /etc/sysconfig/i18n.  If I
do not set LC_ALL, but instead set LC_COLLATE="C", the tcsh
patterns work as expected.  This comes down to either a bug in
strcoll() in glibc or screwed-up locale files in /usr/share/locale.
Either way, this is a big bug!

I am running on 6.1 with all the updates.

Comment 1 Bill Nottingham 1999-11-08 16:09:59 UTC
*** Bug 5980 has been marked as a duplicate of this bug. ***

Found a bug in tcsh:

Example: (typed in a running tcsh)
pc-9:~ mkdir t
pc-9:~ cd t
pc-9:~/t touch A B c
pc-9:~/t echo [A-Z]*
A B c
pc-9:~/t sh
[zut@pc-9 t]$ echo [A-Z]*
A B
[zut@pc-9 t]$

sh expands the patterns correctly; tcsh gets them wrong!

------- Additional Comments From ilh.mit.edu  10/15/99 16:31 -------
I think this is a serious bug having to do with glibc and the locale.
With LC_ALL=en_US, this bug always happens for me in 6.1.  With LC_ALL
undefined and LC_COLLATE=C, it works fine.

This bug is going to break a lot of scripts!

Comment 2 kevin lyda 1999-12-27 16:43:59 UTC
I just stumbled on this as well.  Very annoying, since I use ls -d [A-Z]* to list
docs in large dirs.  This seems to be slow to fix...

Comment 3 kevin lyda 1999-12-28 16:21:59 UTC
If you save the following program as strcoll2.c, compile it (cc -o strcoll2
strcoll2.c), put it somewhere in your PATH and run it like so:

% sh
% cd /usr/share/i18n/locales
% for f in *; do
strcoll2 $f
done > /tmp/strcoll2.out

You'll see that every locale gets the ordering wrong (except the unsupported
ones, for which setlocale() fails and the C locale stays in effect).

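#include <locale.h>
#include <string.h>
#include <stdio.h>

/* strcoll2.c: print strcoll("a", "B") in the C locale and in the
 * locale named by argv[1] (default "en_US"). */
int
main(int argc, char *argv[])
{
    const char *lcl = (argc > 1) ? argv[1] : "en_US";
    const char *res;

    res = setlocale(LC_ALL, "C");
    printf("setlocale(LC_ALL, \"C\") yields %s\n", res ? res : "(null)");
    /* C locale: plain byte order, 'a' (97) > 'B' (66), so positive. */
    printf("strcoll(%s, %s) yields %d\n", "a", "B", strcoll("a", "B"));

    /* NULL means the locale is not installed; C then stays in effect. */
    res = setlocale(LC_ALL, lcl);
    printf("setlocale(LC_ALL, \"%s\") yields %s\n", lcl,
           res ? res : "(null)");
    printf("strcoll(%s, %s) yields %d\n", "a", "B", strcoll("a", "B"));
    return 0;
}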

Comment 4 kevin lyda 1999-12-28 16:52:59 UTC
Oops, forgot these additional notes:

It seems the collation order comes from /usr/share/i18n/locales/en_DK; the
other locales I checked seem to copy it from there.  I've emailed the author
listed in the file, Keld Simonsen, to see if he/she has any ideas.  It might be
a parsing/logic bug in glibc, so changing these files won't necessarily help.

Either way, this is going to affect more than just tcsh, so it needs to get
fixed...

Comment 5 keld 1999-12-30 11:56:59 UTC
This is a feature, not a bug.

It comes from the rule that all forms of one letter, such as the uppercase
and lowercase versions, are to be sorted before the next letter. Thus all
"a"s come before all "b"s.

The behaviour is recognized by ISO, IEEE and The Open Group, and it is
being set in concrete in the new ISO standard 14651 and in a Unicode
Technical Report. [A-Z] is not a good way to say uppercase letters.
Rather, use the [:upper:] class (written [[:upper:]] in a pattern); this
also includes accented letters.

Probably the sorting behaviour is not actually the problem; rather, the
problem is that the [] range notation is widely used and depends on the
locale's collation. Standardizers have discussed a new syntax for regular
expressions, also to cover the full ISO 10646 repertoire in a portable way,
but no consensus has been found yet.

Kind regards
Keld Simonsen

Comment 6 kevin lyda 1999-12-30 12:08:59 UTC
Ok, so strcoll() won't change its behaviour.  A note in the man page would be
nice.  I'll do a patch: I can either keep strcoll and add a check for
lower/upper case, or dump it for strcmp.  Anyone have a vote?  I can't do it now,
but I'll do it in a few hours.  I suppose I should ask the tcsh people...

Comment 7 kevin lyda 2000-01-01 13:05:59 UTC
Ok, this seems to work for me.  These lines get added to glob.c:

    if (islower(c1) && isupper(c2))
        return (1);

Now, you can't just put them anywhere!  Here is where they go, near the top
of globcharcoll() (excerpt):

int
globcharcoll(c1, c2)
    int c1, c2;
{
#if defined(NLS) && defined(LC_COLLATE) && !defined(NOSTRCOLL)
    char s1[2], s2[2];

    if (c1 == c2)
        return (0);
    /* new: a lowercase c1 always sorts after an uppercase c2 */
    if (islower(c1) && isupper(c2))
        return (1);
    s1[0] = c1;
    s2[0] = c2;

Comment 8 kevin lyda 2000-01-01 22:08:59 UTC
I've emailed the changes to the author of tcsh, Christos Zoulas, and he's
accepted them.  I think 6.09.00 is current, so this change would land in a
later version.  Rather than adding another patch file to be applied within the
RPM, I'd suggest moving to the new version of tcsh.

If the other people who have seen this bug could report back with any adverse
behaviours, I'd appreciate it.  Likewise successes.  :)

Comment 9 kevin lyda 2000-01-03 00:31:59 UTC
Bugs 6244 and 6398 relate to this one.  If this gets closed, both of those can be
closed as well (at least if the fix involves snarfing the new tcsh).

Comment 10 santini 2000-01-06 22:48:59 UTC
bash2 has the same annoying problem (being at home, with no access to the Bugzilla
db, it took me two hours of digging to trace the problem :-)

Comment 11 Jeff Johnson 2000-01-10 20:18:59 UTC
*** Bug 6398 has been marked as a duplicate of this bug. ***

Comment 12 Jeff Johnson 2000-01-10 20:19:59 UTC
*** Bug 6244 has been marked as a duplicate of this bug. ***

Comment 13 Jeff Johnson 2000-01-10 20:51:59 UTC
Fixed (at least tcsh) in tcsh-6.09-1 in Rawhide.

Changing component to bash2 ...

Comment 14 Andy Newsam 2000-03-31 09:41:59 UTC
*** Bug 10473 has been marked as a duplicate of this bug. ***

Comment 15 Bernhard Rosenkraenzer 2000-08-08 15:23:30 UTC
You can use LANG="C" to turn off this feature.

