Bug 146253 - unexpected locale-dependent behaviour with -name \*
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: 3
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Jakub Jelinek
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2005-01-26 14:46 UTC by John Smith
Modified: 2007-11-30 22:10 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2005-01-28 14:42:14 UTC
Type: ---
Embargoed:



Description John Smith 2005-01-26 14:46:16 UTC
Description of problem:

Depending on the locale settings, "find -name \*" fails to find
files whose names are not valid in the locale's character encoding.


How reproducible:
Always.

Steps to Reproduce:

Type, in bash:

mkdir /tmp/foo
cd /tmp/foo
touch $(echo -e \\351)
LC_ALL=en_US.UTF-8 find -name \* | wc -l
LC_ALL=en_US.iso88591 find -name \* | wc -l

  
Actual results:

/tmp/foo $ LC_ALL=en_US.UTF-8 find -name \* | wc -l
0
/tmp/foo $ LC_ALL=en_US.iso88591 find -name \* | wc -l
1


Expected results:

/tmp/foo $ LC_ALL=en_US.UTF-8 find -name \* | wc -l
1
/tmp/foo $ LC_ALL=en_US.iso88591 find -name \* | wc -l
1


Additional info:

Admittedly, this filename (the single byte 0xE9) is not a valid
UTF-8 string, but it is still a valid POSIX filename.

Comment 1 John Smith 2005-01-26 21:28:01 UTC
Apparently this is the correct behaviour specified by POSIX.

Comment 2 John Smith 2005-01-26 23:11:49 UTC
Sorry, I take back my previous comment.  This is a bug according
to POSIX:

SUSv3 says (Shell & Utilities, 2.13.2):

The asterisk ( '*' ) is a pattern that shall match any string,
including the null string.

And it also says (Base definitions, 3.367):
String: A contiguous sequence of bytes terminated by and including the
first null byte.

Comment 3 John Smith 2005-01-26 23:12:31 UTC
Forgot to reopen. Soooorry.

Comment 4 Tim Waugh 2005-01-28 14:32:46 UTC
"find" just calls fnmatch().

#include <fnmatch.h>
#include <locale.h>
int main ()
{
        setlocale (LC_ALL, "");
        return fnmatch ("*", "\351", FNM_PERIOD);
}

setlocale(6, "")                                 = "en_GB.UTF-8"
fnmatch("*", "\351", 4)                          = -1
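
For contrast, here is a minimal sketch (not taken from find or glibc) of the
same call made with LC_CTYPE temporarily set to "C", where every byte is
treated as a single character. The helper name fnmatch_c_locale is
hypothetical. With the pattern "*" and the name "\351" it is expected to
return 0 (a match), which corresponds to running find under LC_ALL=C.

```c
#include <fnmatch.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: call fnmatch() with LC_CTYPE temporarily set to "C",
 * so the name is compared byte by byte. Returns fnmatch()'s result, or -1
 * if the current locale name could not be saved. */
int fnmatch_c_locale(const char *pattern, const char *name, int flags)
{
        const char *cur = setlocale(LC_CTYPE, NULL);
        char *saved;
        size_t len;
        int rc;

        if (cur == NULL)
                return -1;

        /* Copy the name: the buffer returned by setlocale() may be
         * overwritten by the next setlocale() call. */
        len = strlen(cur) + 1;
        saved = malloc(len);
        if (saved == NULL)
                return -1;
        memcpy(saved, cur, len);

        setlocale(LC_CTYPE, "C");
        rc = fnmatch(pattern, name, flags);

        /* Restore the caller's locale before returning. */
        setlocale(LC_CTYPE, saved);
        free(saved);
        return rc;
}
```

Calling fnmatch_c_locale ("*", "\351", FNM_PERIOD) in place of the fnmatch()
call above would show whether the difference comes from the locale alone.
Note that setlocale() changes process-wide state, so this sketch is not
suitable for multi-threaded programs.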


Comment 5 Jakub Jelinek 2005-01-28 14:42:14 UTC
I disagree with that interpretation.
E.g. 2.13.1 says:
An asterisk is a pattern that shall match multiple characters, as described in
Patterns Matching Multiple Characters.

\351 is not a character in UTF-8, so it shall not be matched by fnmatch.
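
To illustrate why "\351" is not a character in UTF-8: the byte 0xE9 is the
lead byte of a three-byte sequence and must be followed by two continuation
bytes of the form 10xxxxxx. Below is a minimal sketch of a structural check,
under the hypothetical name utf8_well_formed. It is simplified (it does not
reject every overlong or surrogate encoding) and it is not how glibc itself
decides this.

```c
#include <stddef.h>

/* Hypothetical helper: return 1 if the NUL-terminated string s consists of
 * structurally well-formed UTF-8 sequences, 0 otherwise. Only lead bytes
 * and continuation-byte counts are checked. */
int utf8_well_formed(const char *s)
{
        const unsigned char *p = (const unsigned char *)s;

        while (*p) {
                int extra;

                if (*p < 0x80)
                        extra = 0;      /* ASCII */
                else if (*p >= 0xC2 && *p <= 0xDF)
                        extra = 1;      /* two-byte sequence */
                else if (*p >= 0xE0 && *p <= 0xEF)
                        extra = 2;      /* three-byte sequence */
                else if (*p >= 0xF0 && *p <= 0xF4)
                        extra = 3;      /* four-byte sequence */
                else
                        return 0;       /* stray continuation or invalid lead */
                p++;

                /* Each continuation byte must look like 10xxxxxx. A NUL
                 * here fails the test, so the loop never reads past the
                 * end of the string. */
                while (extra-- > 0) {
                        if ((*p & 0xC0) != 0x80)
                                return 0;
                        p++;
                }
        }
        return 1;
}
```

With this check, "\351" alone is rejected because the string ends where two
continuation bytes are required, while "\303\251" (the UTF-8 encoding of the
same Latin-1 letter) is accepted.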

Comment 6 John Smith 2005-01-28 19:06:08 UTC
I have three comments:

1) I still think my interpretation was right. The sentence
ends with "as described in Patterns Matching Multiple Characters",
so what is written in that section has precedence over what you
quote. And in that section, SUSv3 says "the asterisk is a pattern
that shall match any string".

2) If you still disagree with my interpretation, then SUSv3 must
be considered inconsistent, and that should probably be reported
upstream.

3) Assuming you are right, all the shells that come with FC3
are broken (sh, ash, bsh, ksh, pdksh, tcsh, csh, zsh).
Indeed, with all of them, "echo \*" does display filenames which
are not sequences of characters according to the locale.
Should I file a bug report against each of them?

