Description of problem:
Depending on the locale settings, "find -name \*" sometimes fails to find some files.

How reproducible:
Always.

Steps to Reproduce:
Type, in bash:

    mkdir /tmp/foo
    cd /tmp/foo
    touch $(echo -e \\351)
    LC_ALL=en_US.UTF-8 find -name \* | wc -l
    LC_ALL=en_US.iso88591 find -name \* | wc -l

Actual results:

    /tmp/foo $ LC_ALL=en_US.UTF-8 find -name \* | wc -l
    0
    /tmp/foo $ LC_ALL=en_US.iso88591 find -name \* | wc -l
    1

Expected results:

    /tmp/foo $ LC_ALL=en_US.UTF-8 find -name \* | wc -l
    1
    /tmp/foo $ LC_ALL=en_US.iso88591 find -name \* | wc -l
    1

Additional info:
OK, this filename is not a valid UTF-8 string, but it is still a valid POSIX filename.
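The same effect can be reproduced without find. What follows is a rough C approximation of one directory level of `find -name \*`. It creates the file from the steps above in a fresh temporary directory. It then counts the entries for which fnmatch() reports a match.

The helpers make_repro_dir() and count_matches() are hypothetical names, and this is not how findutils is structured. The sketch only assumes that find hands each name to fnmatch(), as stated further down in this thread.

```c
#define _XOPEN_SOURCE 700   /* expose mkdtemp() from POSIX.1-2008 */
#include <dirent.h>
#include <fnmatch.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: create a fresh directory from a template such as
   "/tmp/fooXXXXXX" and put one empty file in it whose name is the single
   byte 0xE9 (octal 351), like `touch $(echo -e \\351)`.
   Returns 0 on success, -1 on failure. */
int make_repro_dir(char *tmpl)
{
    char path[4096];
    FILE *f;

    if (mkdtemp(tmpl) == NULL)
        return -1;
    snprintf(path, sizeof path, "%s/\351", tmpl);
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    fclose(f);
    return 0;
}

/* Hypothetical helper: count the entries of DIR (skipping "." and "..")
   whose names fnmatch() reports as matching PATTERN under the current
   locale. Returns -1 if DIR cannot be opened. */
int count_matches(const char *dir, const char *pattern, int flags)
{
    DIR *d = opendir(dir);
    struct dirent *e;
    int n = 0;

    if (d == NULL)
        return -1;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        /* Any non-zero result (no match or error) means "not counted". */
        if (fnmatch(pattern, e->d_name, flags) == 0)
            n++;
    }
    closedir(d);
    return n;
}
```

Call setlocale(LC_ALL, "") first to make count_matches() follow LC_ALL as in the commands above. In the default "C" locale, the file is counted. Passing FNM_PERIOD as the flags mirrors the fnmatch() call traced further down.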
Apparently this is the correct behaviour specified by POSIX.
Sorry, I take back my previous comment. This is a bug according to POSIX. SUSv3 says (Shell & Utilities, 2.13.2):

    "The asterisk ( '*' ) is a pattern that shall match any string, including the null string."

And it also says (Base Definitions, 3.367):

    "String: A contiguous sequence of bytes terminated by and including the first null byte."
Forgot to reopen. Soooorry.
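"find" just calls fnmatch(). This small program shows the same behaviour:

    #include <fnmatch.h>
    #include <locale.h>

    int main(void)
    {
        /* Take the locale from the environment (LC_ALL, LANG, ...). */
        setlocale(LC_ALL, "");
        /* Match the single byte 0xE9 (octal 351) against "*". */
        return fnmatch("*", "\351", FNM_PERIOD);
    }

Tracing its library calls gives (in glibc, 6 is LC_ALL and 4 is FNM_PERIOD):

    setlocale(6, "") = "en_GB.UTF-8"
    fnmatch("*", "\351", 4) = -1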
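The -1 above is not FNM_NOMATCH. POSIX specifies three kinds of result for fnmatch():

- 0 on a match.
- FNM_NOMATCH when there is no match.
- Some other non-zero value if an error occurs.

So the call is reporting an error rather than a failed match. The sketch below makes that distinction explicit. It runs the call under a named locale with newlocale()/uselocale(), then restores the thread's previous locale. The wrapper fnmatch_under() and its enum are hypothetical names.

```c
#define _XOPEN_SOURCE 700   /* expose newlocale()/uselocale() from POSIX.1-2008 */
#include <fnmatch.h>
#include <locale.h>

/* The three kinds of result POSIX allows fnmatch() to return, plus a
   marker for a locale that is not installed on this system. */
enum fnmatch_outcome {
    OUTCOME_MATCH,      /* fnmatch() returned 0 */
    OUTCOME_NO_MATCH,   /* fnmatch() returned FNM_NOMATCH */
    OUTCOME_ERROR,      /* any other non-zero value, e.g. the -1 above */
    OUTCOME_NO_LOCALE   /* newlocale() could not load the locale */
};

/* Hypothetical wrapper: call fnmatch() while the calling thread is
   temporarily switched to LOCALE_NAME, then restore its previous locale. */
enum fnmatch_outcome fnmatch_under(const char *locale_name,
                                   const char *pattern,
                                   const char *string, int flags)
{
    locale_t loc = newlocale(LC_ALL_MASK, locale_name, (locale_t)0);
    locale_t old;
    int r;

    if (loc == (locale_t)0)
        return OUTCOME_NO_LOCALE;
    old = uselocale(loc);
    r = fnmatch(pattern, string, flags);
    uselocale(old);
    freelocale(loc);

    if (r == 0)
        return OUTCOME_MATCH;
    if (r == FNM_NOMATCH)
        return OUTCOME_NO_MATCH;
    return OUTCOME_ERROR;
}
```

With glibc, where the "C" locale uses one byte per character, fnmatch_under("C", "*", "\351", FNM_PERIOD) reports a match. Trying the UTF-8 locale from the trace shows which outcome the installed C library gives for the same call.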
I disagree with that interpretation. For example, 2.13.1 says: "An asterisk is a pattern that shall match multiple characters, as described in Patterns Matching Multiple Characters." The lone byte \351 is not a character in UTF-8, so it shall not be matched by fnmatch().
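This objection rests on \351 not being a character in UTF-8. 0xE9 is the lead byte of a three-byte sequence, and here it is followed directly by the terminating null byte.

Whether a byte string is a valid sequence of characters in a given locale can be checked with mbsrtowcs(). With a null destination, mbsrtowcs() only counts characters and returns (size_t)-1 on an invalid sequence. In the sketch below, is_valid_in_locale() is a hypothetical helper. Under a UTF-8 locale, it returns 0 for "\351" and 1 for "\303\251", the two-byte UTF-8 encoding of "é".

```c
#define _XOPEN_SOURCE 700   /* expose newlocale()/uselocale() from POSIX.1-2008 */
#include <locale.h>
#include <wchar.h>

/* Hypothetical helper: is S a valid multibyte string (a sequence of
   characters) in the locale LOCALE_NAME? Returns 1 if yes, 0 if it
   contains an invalid or truncated byte sequence, and -1 if the locale
   is not installed. The calling thread's locale is restored afterwards. */
int is_valid_in_locale(const char *locale_name, const char *s)
{
    locale_t loc = newlocale(LC_ALL_MASK, locale_name, (locale_t)0);
    locale_t old;
    mbstate_t state = {0};   /* initial conversion state */
    const char *p = s;
    size_t n;

    if (loc == (locale_t)0)
        return -1;
    old = uselocale(loc);
    /* Null destination: only count characters, do not store them. */
    n = mbsrtowcs(NULL, &p, 0, &state);
    uselocale(old);
    freelocale(loc);
    return n != (size_t)-1;
}
```

For example, is_valid_in_locale("C.UTF-8", "\351") returns 0 on a system where that locale is installed.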
I have three comments:

1) I still think my interpretation was right. The sentence ends with "as described in Patterns Matching Multiple Characters", so what is written in that section takes precedence over what you quote. And in that section, SUSv3 says "the asterisk is a pattern that shall match any string".
2) If you still disagree with my interpretation, then you must consider SUSv3 inconsistent, and that should probably be reported upstream.
3) Assuming you are right, all the shells that come with FC3 are broken (sh, ash, bsh, ksh, pdksh, tcsh, csh, zsh). Indeed, with all of them, "echo *" displays filenames that are not sequences of characters according to the locale. Should I file a bug report against each of them?