4.10-2.EL4.3

Using the provided testcase:

$ ./file-cannot-handle-utf-1.sh
expected result: symbolic link to '/tmp/¥µÃÐÒµÓ严弥漥'
$ file /tmp/sln.test
/tmp/sln.test: symbolic link to `/tmp/\302\245\302\265\303\205\320\205\322\265\323\225\344\270\245\345\274\245\346\274\245'
$ file -r /tmp/sln.test
/tmp/sln.test: symbolic link to `/tmp/¥µÃÐÒµÓ严弥漥'

file munges what it considers "non-printable" characters in file_getbuffer() in src/funcs.c. Removing the for loop and returning ms->o.buf instead of ms->o.pbuf "fixes" the problem. file shouldn't use "isprint()" to check if a character is printable.
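For anyone who wants to see the effect outside of file itself, here is a minimal sketch of a byte-wise isprint() escape of the kind described above. It is not the code from src/funcs.c; the function name escape_isprint and its signature are made up for illustration. The point is only that isprint() looks at one byte at a time, so every byte of a multi-byte UTF-8 character is treated as non-printable and comes out as an octal escape.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from file's sources: copy `in` to `out`,
 * replacing every byte that isprint() rejects with a \ooo octal escape.
 * Returns 0 on success, -1 if `out` is too small.
 */
int escape_isprint(const char *in, char *out, size_t outsz)
{
    assert(in != NULL && out != NULL && outsz > 0);

    size_t pos = 0;
    for (const unsigned char *p = (const unsigned char *)in; *p; p++) {
        if (isprint(*p)) {
            if (pos + 2 > outsz)        /* one char plus the final NUL */
                return -1;
            out[pos++] = (char)*p;
        } else {
            if (pos + 5 > outsz)        /* "\ooo" plus the final NUL */
                return -1;
            snprintf(out + pos, 5, "\\%03o", (unsigned)*p);
            pos += 4;
        }
    }
    out[pos] = '\0';
    return 0;
}
```

Fed the two UTF-8 bytes of the yen sign (0xC2 0xA5) in the default "C" locale, this produces the same kind of \302\245 output shown in the transcript above.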
Created attachment 121536 file-cannot-handle-utf-1.sh
Well, this solves the problem with UTF, but what if the file had \n embedded in it, or other terminal escape sequences? Also, what if the string did not come from a symlink, but from a %s magic? Is it really UTF then?
What about using iswctype(), after converting each mb sequence to a wchar_t, instead of using isprint()?
Created attachment 121591 file-escape-mb-sequences.patch

Try to parse the output buffer as a multi-byte sequence. If it fails at any point, fall back on the old ASCII-based escape.
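The approach can be sketched roughly as follows. This is not the attached patch, just an illustration of the idea under some assumptions: the names escape_mb, escape_ascii and put_octal are hypothetical, and I use iswprint() as the wide-character check (iswctype() with wctype("print") would be the equivalent spelling). Each multi-byte sequence is decoded with mbrtowc(); printable wide characters are copied through unchanged, non-printable ones (including \n and ESC) are still escaped, and if decoding fails anywhere the whole string is redone with the byte-wise escape.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Append n bytes as \ooo escapes; returns the new length or (size_t)-1. */
static size_t put_octal(char *out, size_t pos, size_t outsz,
                        const unsigned char *in, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (pos + 5 > outsz)            /* "\ooo" plus the final NUL */
            return (size_t)-1;
        snprintf(out + pos, 5, "\\%03o", (unsigned)in[i]);
        pos += 4;
    }
    return pos;
}

/* Old behaviour: byte-wise isprint() escape, used as the fallback. */
static int escape_ascii(const char *in, char *out, size_t outsz)
{
    size_t pos = 0;
    for (const unsigned char *p = (const unsigned char *)in; *p; p++) {
        if (isprint(*p)) {
            if (pos + 2 > outsz)
                return -1;
            out[pos++] = (char)*p;
        } else if ((pos = put_octal(out, pos, outsz, p, 1)) == (size_t)-1) {
            return -1;
        }
    }
    out[pos] = '\0';
    return 0;
}

/*
 * Hypothetical multi-byte aware escape. Decodes `in` in the current
 * LC_CTYPE locale; returns 0 on success, -1 if `out` is too small.
 */
int escape_mb(const char *in, char *out, size_t outsz)
{
    assert(in != NULL && out != NULL && outsz > 0);

    mbstate_t st;
    memset(&st, 0, sizeof st);
    const char *p = in;
    size_t left = strlen(in), pos = 0;

    while (left > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, p, left, &st);
        if (n == (size_t)-1 || n == (size_t)-2 || n == 0)
            return escape_ascii(in, out, outsz);    /* not valid text here */

        if (iswprint((wint_t)wc)) {
            if (pos + n + 1 > outsz)
                return -1;
            memcpy(out + pos, p, n);                /* keep the sequence as-is */
            pos += n;
        } else if ((pos = put_octal(out, pos, outsz,
                                    (const unsigned char *)p, n)) == (size_t)-1) {
            return -1;
        }
        p += n;
        left -= n;
    }
    out[pos] = '\0';
    return 0;
}
```

The caller is expected to have called setlocale(LC_CTYPE, "") beforehand so that mbrtowc() decodes in the user's encoding; without that, the "C" locale is in effect and non-ASCII bytes are still escaped as before. A string that came from a %s magic and is not valid in the current encoding simply takes the fallback path.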
Slightly modified patch applied on rawhide, file-4.16-4
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2006-0340.html