Bug 174348

Summary: file munges non-ASCII filenames in output
Product: Red Hat Enterprise Linux 4
Component: file
Version: 4.0
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Reporter: Bastien Nocera <bnocera>
Assignee: Radek Vokál <rvokal>
QA Contact: Mike McLean <mikem>
CC: jlaska, tao
Fixed In Version: RHBA-2006-0340
Doc Type: Bug Fix
Last Closed: 2006-05-17 19:39:21 UTC
Bug Blocks: 187538
Attachments:
file-cannot-handle-utf-1.sh (flags: none)
file-escape-mb-sequences.patch (flags: none)

Description Bastien Nocera 2005-11-28 11:43:56 UTC
file version: 4.10-2.EL4.3

Using the provided testcase:

$ ./file-cannot-handle-utf-1.sh
expected result:
symbolic link to '/tmp/¥µÃÐÒµÓ严弥漥'

$ file /tmp/sln.test
/tmp/sln.test: symbolic link to `/tmp/\302\245\302\265\303\205\320\205\322\265\323\225\344\270\245\345\274\245\346\274\245'

$ file -r /tmp/sln.test
/tmp/sln.test: symbolic link to `/tmp/¥µÃÐÒµÓ严弥漥'

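file_getbuffer() in src/funcs.c escapes every byte that file considers
"non-printable", which munges multi-byte filenames.
Removing the for loop and returning ms->o.buf instead of ms->o.pbuf "fixes" the
problem.

file shouldn't use isprint() to check whether a character is printable:
isprint() looks at one byte at a time, so every byte of a multi-byte UTF-8
sequence is treated as non-printable.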

Comment 1 Bastien Nocera 2005-11-28 11:43:56 UTC
Created attachment 121536 [details]
file-cannot-handle-utf-1.sh

Comment 3 Radek Vokál 2005-11-29 15:08:55 UTC
Well, this solves the problem with UTF-8, but what if the file name had \n
embedded in it, or terminal escape sequences? Also, what if the string
did not come from a symlink, but from a %s magic? Is it really UTF-8 then?

Comment 4 Bastien Nocera 2005-11-29 15:21:30 UTC
What about using iswctype(), after converting each mb sequence to a wchar_t,
instead of using isprint()?

Comment 5 Bastien Nocera 2005-11-29 16:05:23 UTC
Created attachment 121591 [details]
file-escape-mb-sequences.patch

Try to parse the output buffer as a multi-byte sequence. If parsing fails at
any point, fall back on the old ASCII-based escaping.

Comment 7 Radek Vokál 2005-11-30 07:34:38 UTC
A slightly modified patch has been applied on rawhide in file-4.16-4.

Comment 13 Red Hat Bugzilla 2006-05-17 19:39:21 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0340.html