Bug 174348 - file munges non-ASCII filenames in output
Summary: file munges non-ASCII filenames in output
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: file
Version: 4.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Radek Vokál
QA Contact: Mike McLean
URL:
Whiteboard:
Depends On:
Blocks: 187538
Reported: 2005-11-28 11:43 UTC by Bastien Nocera
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version: RHBA-2006-0340
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-05-17 19:39:21 UTC
Target Upstream Version:
Embargoed:


Attachments
file-cannot-handle-utf-1.sh (435 bytes, text/plain)
2005-11-28 11:43 UTC, Bastien Nocera
file-escape-mb-sequences.patch (1.36 KB, patch)
2005-11-29 16:05 UTC, Bastien Nocera


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2006:0340 0 normal SHIPPED_LIVE file bug fix update 2006-05-17 04:00:00 UTC

Description Bastien Nocera 2005-11-28 11:43:56 UTC
Version: file-4.10-2.EL4.3

Using the provided testcase:

$ ./file-cannot-handle-utf-1.sh
expected result:
symbolic link to '/tmp/¥µÅЅҵӕ严弥漥'

$ file /tmp/sln.test
/tmp/sln.test: symbolic link to
`/tmp/\302\245\302\265\303\205\320\205\322\265\323\225\344\270\245\345\274\245\346\274\245'

$ file -r /tmp/sln.test
/tmp/sln.test: symbolic link to `/tmp/¥µÅЅҵӕ严弥漥'

file escapes every byte it considers "non-printable" in file_getbuffer() in
src/funcs.c, which breaks up multi-byte UTF-8 sequences.
Removing the for loop and returning ms->o.buf instead of ms->o.pbuf "fixes" the
problem.

file shouldn't use isprint() to check whether a character is printable, because
isprint() only ever looks at a single byte.
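
For illustration, here is a minimal C sketch of a byte-wise isprint() escape of
the kind described above. It is not the code from src/funcs.c; the helper name
escape_bytewise and its buffer-size contract are assumptions made for this
example only.

```c
#include <ctype.h>

/* Hypothetical helper (not from file's sources): copy src into dst,
 * replacing every byte that isprint() rejects with a 3-digit octal
 * escape such as \302. dst must hold at least 4 * strlen(src) + 1 bytes. */
static void escape_bytewise(const char *src, char *dst)
{
    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;

        if (isprint(c)) {
            *dst++ = (char)c;
        } else {
            /* Three octal digits, most significant first. */
            *dst++ = '\\';
            *dst++ = (char)('0' + ((c >> 6) & 3));
            *dst++ = (char)('0' + ((c >> 3) & 7));
            *dst++ = (char)('0' + (c & 7));
        }
    }
    *dst = '\0';
}
```

In the default "C" locale, passing the bytes of "/tmp/¥" (0xC2 0xA5 for the yen
sign) produces /tmp/\302\245, because each byte of the UTF-8 sequence is
tested on its own and neither is printable ASCII.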

Comment 1 Bastien Nocera 2005-11-28 11:43:56 UTC
Created attachment 121536 [details]
file-cannot-handle-utf-1.sh

Comment 3 Radek Vokál 2005-11-29 15:08:55 UTC
Well, this solves the problem with UTF-8, but what if the file had \n
embedded in it, or other terminal escape sequences? Also, what if the string
did not come from a symlink, but from a %s magic? Is it really UTF-8 then?

Comment 4 Bastien Nocera 2005-11-29 15:21:30 UTC
What about using iswctype(), after converting each mb sequence to a wchar_t,
instead of using isprint()?

Comment 5 Bastien Nocera 2005-11-29 16:05:23 UTC
Created attachment 121591 [details]
file-escape-mb-sequences.patch

Try to parse the output buffer as a multi-byte sequence. If it fails at any
point, fall back on the old ASCII-based escaping.

Comment 7 Radek Vokál 2005-11-30 07:34:38 UTC
A slightly modified version of the patch has been applied in rawhide, in file-4.16-4.

Comment 13 Red Hat Bugzilla 2006-05-17 19:39:21 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0340.html


