Bug 146253 - unexpected locale-dependent behaviour with -name \*
Product: Fedora
Classification: Fedora
Component: glibc
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Jakub Jelinek
Reported: 2005-01-26 09:46 EST by John Smith
Modified: 2007-11-30 17:10 EST
CC List: 2 users

Doc Type: Bug Fix
Last Closed: 2005-01-28 09:42:14 EST

Description John Smith 2005-01-26 09:46:16 EST
Description of problem:

Depending on the locale settings, "find -name \*" sometimes
fails to find some files.

How reproducible:

Steps to Reproduce:

Type, in bash:

mkdir /tmp/foo
cd /tmp/foo
touch $(echo -e \\351)
LC_ALL=en_US.UTF-8 find -name \* | wc -l
LC_ALL=en_US.iso88591 find -name \* | wc -l

Actual results:

/tmp/foo $ LC_ALL=en_US.UTF-8 find -name \* | wc -l
/tmp/foo $ LC_ALL=en_US.iso88591 find -name \* | wc -l

Expected results:

/tmp/foo $ LC_ALL=en_US.UTF-8 find -name \* | wc -l
/tmp/foo $ LC_ALL=en_US.iso88591 find -name \* | wc -l

Additional info:

OK, this filename is not a valid UTF-8 string, but it
is still a valid POSIX filename.
Comment 1 John Smith 2005-01-26 16:28:01 EST
Apparently this is the correct behaviour specified by POSIX.
Comment 2 John Smith 2005-01-26 18:11:49 EST
Sorry, I take back my previous comment. This is a bug according to the standard.

SUSv3 says (Shell & Utilities, 2.13.2):

The asterisk ( '*' ) is a pattern that shall match any string,
including the null string.

And it also says (Base definitions, 3.367):
String: A contiguous sequence of bytes terminated by and including the
first null byte.
Comment 3 John Smith 2005-01-26 18:12:31 EST
Forgot to reopen. Soooorry.
Comment 4 Tim Waugh 2005-01-28 09:32:46 EST
"find" just calls fnmatch().

#include <fnmatch.h>
#include <locale.h>

int main (void)
{
        setlocale (LC_ALL, "");   /* use the locale from the environment */
        return fnmatch ("*", "\351", FNM_PERIOD);
}

setlocale(6, "")                                 = "en_GB.UTF-8"
fnmatch("*", "\351", 4)                          = -1
Comment 5 Jakub Jelinek 2005-01-28 09:42:14 EST
I disagree with that interpretation.
E.g. 2.13.1 says:
An asterisk is a pattern that shall match multiple characters, as described in
Patterns Matching Multiple Characters.

\351 is not a character in UTF-8, so it shall not be matched by fnmatch.
Comment 6 John Smith 2005-01-28 14:06:08 EST
I have three comments:

1) I still think my interpretation was right. The sentence
ends with "as described in Patterns Matching Multiple Characters",
so what is written in that section has precedence over what you
quote. And in that section, SUSv3 says "the asterisk is a pattern
that shall match any string".

2) If you still disagree with my interpretation, then you must
consider that SUSv3 is inconsistent, and that should probably
be reported upstream.

3) Assuming you are right, all the shells that come with FC3
are broken (sh, ash, bsh, ksh, pdksh, tcsh, csh, zsh).
Indeed, with all of them, "echo \*" does display filenames which
are not sequences of characters according to the locale.
Should I file a bug report against each of them?
