Bug 174348 - file munges non-ASCII filenames in output
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: file
Version: 4.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Radek Vokal
QA Contact: Mike McLean
Blocks: 187538

Reported: 2005-11-28 06:43 EST by Bastien Nocera
Modified: 2007-11-30 17:07 EST
Fixed In Version: RHBA-2006-0340
Doc Type: Bug Fix
Last Closed: 2006-05-17 15:39:21 EDT

Attachments
file-cannot-handle-utf-1.sh (435 bytes, text/plain)
2005-11-28 06:43 EST, Bastien Nocera
file-escape-mb-sequences.patch (1.36 KB, patch)
2005-11-29 11:05 EST, Bastien Nocera

Description Bastien Nocera 2005-11-28 06:43:56 EST
Package version: file-4.10-2.EL4.3

Using the provided testcase:

$ ./file-cannot-handle-utf-1.sh
expected result:
symbolic link to '/tmp/¥µÅЅҵӕ严弥漥'

$ file /tmp/sln.test
/tmp/sln.test: symbolic link to `/tmp/\302\245\302\265\303\205\320\205\322\265\323\225\344\270\245\345\274\245\346\274\245'

$ file -r /tmp/sln.test
/tmp/sln.test: symbolic link to `/tmp/¥µÅЅҵӕ严弥漥'

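file escapes what it considers "non-printable" characters in file_getbuffer()
in src/funcs.c.
Removing the for loop there and returning ms->o.buf instead of ms->o.pbuf
"fixes" the problem.

file shouldn't use isprint() to check whether a character is printable:
isprint() looks at one byte at a time, so every byte of a multi-byte UTF-8
sequence is treated as non-printable and replaced with an octal escape.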
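
For illustration, the byte-wise escaping described above can be sketched as follows. This is not the code from src/funcs.c; escape_bytes is a hypothetical helper that only mimics the observed behaviour (octal escapes for every byte that isprint() rejects).

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not taken from file's sources: copy `in` to `out`,
 * replacing every byte that isprint() rejects with a \ooo octal escape.
 * Output is truncated if it does not fit. Returns the length written. */
size_t escape_bytes(const char *in, char *out, size_t outsz)
{
    size_t o = 0;

    assert(out != NULL && outsz > 0);
    for (const unsigned char *p = (const unsigned char *)in; *p != '\0'; p++) {
        if (isprint(*p)) {
            if (o + 1 >= outsz)          /* need room for byte + NUL */
                break;
            out[o++] = (char)*p;
        } else {
            if (o + 4 >= outsz)          /* need room for \ooo + NUL */
                break;
            o += (size_t)snprintf(out + o, outsz - o, "\\%03o", (unsigned)*p);
        }
    }
    out[o] = '\0';
    return o;
}
```

Fed the UTF-8 bytes 0xC2 0xA5 (the yen sign from the test case), a loop like this emits \302\245, which matches the start of the munged output shown above.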
Comment 1 Bastien Nocera 2005-11-28 06:43:56 EST
Created attachment 121536
file-cannot-handle-utf-1.sh
Comment 3 Radek Vokal 2005-11-29 10:08:55 EST
Well, this solves the problem with UTF-8, but what if the file had \n
embedded in it, or other terminal escape sequences? Also, what if the string
did not come from a symlink, but from a %s magic? Is it really UTF-8 then?
Comment 4 Bastien Nocera 2005-11-29 10:21:30 EST
What about using iswctype(), after converting each mb sequence to a wchar_t,
instead of using isprint()?
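
A minimal sketch of that idea, assuming the string is decoded with mbrtowc() in the current locale and each character is checked against the "print" class. mb_all_printable is a hypothetical name, not something from file's sources.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: returns 1 if `s` decodes cleanly in the current
 * locale and every character is printable, 0 if some character is not
 * printable, and -1 if `s` is not a valid multi-byte string. */
int mb_all_printable(const char *s)
{
    mbstate_t st;
    wctype_t print_class = wctype("print");
    size_t left = strlen(s);

    assert(print_class != (wctype_t)0);
    memset(&st, 0, sizeof st);
    while (left > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, left, &st);

        if (n == (size_t)-1 || n == (size_t)-2)
            return -1;      /* invalid or truncated sequence */
        if (n == 0)
            break;          /* decoded NUL; guard only, input is strlen-bounded */
        if (!iswctype((wint_t)wc, print_class))
            return 0;       /* e.g. '\n' or ESC */
        s += n;
        left -= n;
    }
    return 1;
}
```

A caller would normally run setlocale(LC_CTYPE, "") first so that mbrtowc() decodes in the user's encoding. Control characters such as \n or ESC are still rejected, which speaks to the concern raised in comment 3.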
Comment 5 Bastien Nocera 2005-11-29 11:05:23 EST
Created attachment 121591
file-escape-mb-sequences.patch

Try to parse the output buffer as a multi-byte sequence. If that fails at any
point, fall back on the old ASCII-based escape.
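
The attached patch is not reproduced here. The following is only a rough sketch of the approach just described, under the assumption that printable characters are copied through unchanged and everything else is octal-escaped. All function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Append one byte as a \ooo octal escape; returns 0 if out of space. */
static int put_octal(unsigned char c, char *out, size_t outsz, size_t *o)
{
    if (*o + 4 >= outsz)
        return 0;
    *o += (size_t)snprintf(out + *o, outsz - *o, "\\%03o", (unsigned)c);
    return 1;
}

/* Multi-byte pass: returns 0 on an undecodable sequence or lack of space. */
static int escape_mb(const char *in, char *out, size_t outsz)
{
    mbstate_t st;
    size_t left = strlen(in), o = 0;

    memset(&st, 0, sizeof st);
    while (left > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, in, left, &st);

        if (n == (size_t)-1 || n == (size_t)-2 || n == 0)
            return 0;
        if (iswprint((wint_t)wc)) {
            if (o + n >= outsz)
                return 0;
            memcpy(out + o, in, n);   /* keep the original bytes */
            o += n;
        } else {
            for (size_t i = 0; i < n; i++)
                if (!put_octal((unsigned char)in[i], out, outsz, &o))
                    return 0;
        }
        in += n;
        left -= n;
    }
    out[o] = '\0';
    return 1;
}

/* Byte-wise pass (the old behaviour); truncates if `out` is too small. */
static void escape_ascii(const char *in, char *out, size_t outsz)
{
    size_t o = 0;

    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        if (isprint(c)) {
            if (o + 1 >= outsz)
                break;
            out[o++] = (char)c;
        } else if (!put_octal(c, out, outsz, &o)) {
            break;
        }
    }
    out[o] = '\0';
}

/* Try the multi-byte pass first; fall back to the byte-wise escape. */
void escape_for_display(const char *in, char *out, size_t outsz)
{
    assert(out != NULL && outsz > 0);
    if (!escape_mb(in, out, outsz))
        escape_ascii(in, out, outsz);
}
```

With setlocale(LC_CTYPE, "") called at startup in a UTF-8 locale, a name like the one in the test case would be expected to pass through unescaped, while \n and terminal escape sequences are still rendered as octal.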
Comment 7 Radek Vokal 2005-11-30 02:34:38 EST
A slightly modified patch has been applied in rawhide, file-4.16-4.
Comment 13 Red Hat Bugzilla 2006-05-17 15:39:21 EDT
An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0340.html