Bug 146253

Summary: unexpected locale-dependent behaviour with -name \*
Product: [Fedora] Fedora
Component: glibc
Version: 3
Hardware: All
OS: Linux
Status: CLOSED NOTABUG
Severity: medium
Priority: medium
Reporter: John Smith <johnsmith7219>
Assignee: Jakub Jelinek <jakub>
CC: drepper, twaugh
Doc Type: Bug Fix
Last Closed: 2005-01-28 14:42:14 UTC

Description John Smith 2005-01-26 14:46:16 UTC
Description of problem:

Depending on the locale settings, "find -name \*" sometimes
fails to find some files.


How reproducible:
Always.

Steps to Reproduce:

Type, in bash:

mkdir /tmp/foo
cd /tmp/foo
touch $(echo -e \\351)
LC_ALL=en_US.UTF-8 find -name \* | wc -l
LC_ALL=en_US.iso88591 find -name \* | wc -l

  
Actual results:

/tmp/foo $ LC_ALL=en_US.UTF-8 find -name \* | wc -l
0
/tmp/foo $ LC_ALL=en_US.iso88591 find -name \* | wc -l
1


Expected results:

/tmp/foo $ LC_ALL=en_US.UTF-8 find -name \* | wc -l
1
/tmp/foo $ LC_ALL=en_US.iso88591 find -name \* | wc -l
1


Additional info:

Admittedly, this filename (the single byte 0xE9) is not a valid
UTF-8 string, but it is still a valid POSIX filename.

Comment 1 John Smith 2005-01-26 21:28:01 UTC
Apparently this is the correct behaviour specified by POSIX.

Comment 2 John Smith 2005-01-26 23:11:49 UTC
Sorry, I take back my previous comment.  This is a bug according
to POSIX:

SUSv3 says (Shell & Utilities, 2.13.2):

The asterisk ( '*' ) is a pattern that shall match any string,
including the null string.

And it also says (Base definitions, 3.367):
String: A contiguous sequence of bytes terminated by and including the
first null byte.
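
To make the byte-oriented reading above concrete, here is a minimal sketch of a matcher that treats both pattern and string as plain byte sequences. The name bytewise_match is hypothetical (it is not part of glibc or findutils), it handles only '*' and '?', and it ignores bracket expressions, backslash escapes and the FNM_PERIOD rule.

```c
#include <stddef.h>

/* Hypothetical helper, not part of glibc or findutils.
   Matches str against pat one byte at a time, with no locale involved.
   Only '*' (any run of bytes) and '?' (any single byte) are special.
   Returns 1 on a match, 0 otherwise. */
int bytewise_match(const char *pat, const char *str)
{
    const char *star = NULL;   /* most recent '*' seen in pat */
    const char *retry = NULL;  /* position in str where that '*' began */

    while (*str != '\0') {
        if (*pat == '*') {
            /* Remember the star; first try letting it match nothing. */
            star = pat++;
            retry = str;
        } else if (*pat == '?' || *pat == *str) {
            pat++;
            str++;
        } else if (star != NULL) {
            /* Mismatch: let the last '*' absorb one more byte. */
            pat = star + 1;
            str = ++retry;
        } else {
            return 0;
        }
    }
    /* The string is exhausted; only trailing stars may remain. */
    while (*pat == '*')
        pat++;
    return *pat == '\0';
}
```

Under this reading, bytewise_match("*", "\351") matches regardless of LC_ALL, which is the result the report expects from find.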

Comment 3 John Smith 2005-01-26 23:12:31 UTC
Forgot to reopen. Soooorry.

Comment 4 Tim Waugh 2005-01-28 14:32:46 UTC
"find" just calls fnmatch().

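#include <fnmatch.h>
#include <locale.h>
int main ()
{
        /* Take the locale from the environment (LC_ALL, LANG, ...). */
        setlocale (LC_ALL, "");
        /* The exit status is fnmatch's result: 0 means "matched". */
        return fnmatch ("*", "\351", FNM_PERIOD);
}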

setlocale(6, "")                                 = "en_GB.UTF-8"
fnmatch("*", "\351", 4)                          = -1
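
For comparing locales from a single process, a small wrapper along the following lines may help. It is only a sketch: fnmatch_in_locale is a hypothetical name, and the locale names passed to it are assumed to be installed on the system. It switches LC_ALL, calls fnmatch() with FNM_PERIOD as in the test above, and restores the previous locale.

```c
#include <fnmatch.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: run fnmatch() under a named locale.
   Returns 0 and stores fnmatch's return value in *result on success,
   or -1 if the locale could not be selected (e.g. it is not installed). */
int fnmatch_in_locale(const char *locale, const char *pattern,
                      const char *name, int *result)
{
    /* Copy the current locale name; setlocale may reuse its buffer. */
    const char *current = setlocale(LC_ALL, NULL);
    char *saved = NULL;

    if (current != NULL) {
        size_t len = strlen(current) + 1;
        saved = malloc(len);
        if (saved != NULL)
            memcpy(saved, current, len);
    }

    if (setlocale(LC_ALL, locale) == NULL) {
        free(saved);
        return -1;
    }

    *result = fnmatch(pattern, name, FNM_PERIOD);

    /* Restore whatever locale was active before the call. */
    if (saved != NULL) {
        setlocale(LC_ALL, saved);
        free(saved);
    }
    return 0;
}
```

In the "C" locale every byte is a character, so "*" matches "\351" there. What a UTF-8 locale returns for the same call depends on the glibc version in use, so it is best checked on the system in question rather than assumed.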


Comment 5 Jakub Jelinek 2005-01-28 14:42:14 UTC
I disagree with that interpretation.
E.g. 2.13.1 says:
An asterisk is a pattern that shall match multiple characters, as described in
Patterns Matching Multiple Characters.

\351 is not a character in UTF-8, so it shall not be matched by fnmatch.
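
The claim that \351 is not a character in UTF-8 can be checked without any locale support. The sketch below is a hand-written well-formedness check (is_valid_utf8 is a hypothetical name, not a glibc function): the byte 0xE9 announces a three-byte sequence, so on its own it is incomplete.

```c
/* Hypothetical helper: returns 1 if the NUL-terminated byte string s
   is well-formed UTF-8, 0 otherwise. Independent of the current locale. */
int is_valid_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    while (*p != 0) {
        unsigned long cp;    /* code point being decoded */
        unsigned long min;   /* smallest value allowed for this length */
        int extra;           /* continuation bytes still expected */

        if (*p < 0x80) {
            cp = *p; extra = 0; min = 0;
        } else if ((*p & 0xE0) == 0xC0) {
            cp = *p & 0x1F; extra = 1; min = 0x80;
        } else if ((*p & 0xF0) == 0xE0) {
            cp = *p & 0x0F; extra = 2; min = 0x800;
        } else if ((*p & 0xF8) == 0xF0) {
            cp = *p & 0x07; extra = 3; min = 0x10000;
        } else {
            return 0;        /* stray continuation or invalid lead byte */
        }
        p++;

        while (extra-- > 0) {
            /* Also fails if the string ends in mid-sequence. */
            if ((*p & 0xC0) != 0x80)
                return 0;
            cp = (cp << 6) | (*p & 0x3F);
            p++;
        }

        if (cp < min)                       /* overlong encoding */
            return 0;
        if (cp > 0x10FFFF)                  /* beyond Unicode range */
            return 0;
        if (cp >= 0xD800 && cp <= 0xDFFF)   /* UTF-16 surrogate */
            return 0;
    }
    return 1;
}
```

Here is_valid_utf8("\351") reports the name as malformed, while is_valid_utf8("\303\251"), the UTF-8 encoding of the same Latin-1 letter, is accepted.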

Comment 6 John Smith 2005-01-28 19:06:08 UTC
I have three comments:

1) I still think my interpretation was right. The sentence
ends with "as described in Patterns Matching Multiple Characters",
so what is written in that section takes precedence over what you
quote. And in that section, SUSv3 says "the asterisk is a pattern
that shall match any string".

2) If you still disagree with my interpretation, then you must
conclude that SUSv3 is inconsistent, and that should probably
be reported upstream.

3) Assuming you are right, all the shells that come with FC3
are broken (sh, ash, bsh, ksh, pdksh, tcsh, csh, zsh).
Indeed, with all of them, "echo \*" does display filenames which
are not sequences of characters according to the locale.
Should I file a bug report against each of them?