174348 – file munges non-ASCII filenames in output

Summary: file munges non-ASCII filenames in output

Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	file
Version:	4.0
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Radek Vokál
QA Contact:	Mike McLean
Blocks:	187538

Reported:	2005-11-28 11:43 UTC by Bastien Nocera
Modified:	2007-11-30 22:07 UTC
CC List:	2 users
Fixed In Version:	RHBA-2006-0340
Doc Type:	Bug Fix
Last Closed:	2006-05-17 19:39:21 UTC

Attachments
file-cannot-handle-utf-1.sh (435 bytes, text/plain) 2005-11-28 11:43 UTC, Bastien Nocera
file-escape-mb-sequences.patch (1.36 KB, patch) 2005-11-29 16:05 UTC, Bastien Nocera

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2006:0340	0	normal	SHIPPED_LIVE	file bug fix update	2006-05-17 04:00:00 UTC

Description Bastien Nocera 2005-11-28 11:43:56 UTC

4.10-2.EL4.3

Using the provided testcase:

$ ./file-cannot-handle-utf-1.sh
expected result:
symbolic link to '/tmp/¥µÅЅҵӕ严弥漥'

$ file /tmp/sln.test
/tmp/sln.test: symbolic link to `/tmp/\302\245\302\265\303\205\320\205\322\265\323\225\344\270\245\345\274\245\346\274\245'

$ file -r /tmp/sln.test
/tmp/sln.test: symbolic link to `/tmp/¥µÅЅҵӕ严弥漥'

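file escapes every byte it considers "non-printable" in file_getbuffer() in
src/funcs.c, so each byte of a multi-byte UTF-8 character comes out as an
octal escape.
Removing the for loop and returning ms->o.buf instead of ms->o.pbuf "fixes" the
problem.

file shouldn't use isprint() to check whether a character is printable, because
isprint() only ever looks at a single byte.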
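
A minimal sketch of the behaviour described above, assuming the loop does
nothing more than a per-byte isprint() test. The helper name escape_bytes() is
hypothetical and the code is not taken from src/funcs.c; it only illustrates
why every byte of a UTF-8 sequence ends up as a \ooo escape.

```c
#include <ctype.h>
#include <stdio.h>

/*
 * Hypothetical helper, not the code in src/funcs.c: copy `in` to `out`,
 * replacing every byte that isprint() rejects with a \ooo octal escape.
 * `out` must hold at least 4 * strlen(in) + 1 bytes.
 */
static void escape_bytes(const char *in, char *out)
{
    for (const unsigned char *p = (const unsigned char *)in; *p != '\0'; p++) {
        if (isprint(*p)) {
            *out++ = (char)*p;
        } else {
            /* One input byte becomes four output characters. */
            out += sprintf(out, "\\%03o", (unsigned)*p);
        }
    }
    *out = '\0';
}
```

Fed the two bytes of "¥" (0xC2 0xA5), this produces \302\245, which matches
the start of the escaped name in the output above.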

Comment 1 Bastien Nocera 2005-11-28 11:43:56 UTC

Created attachment 121536
file-cannot-handle-utf-1.sh

Comment 3 Radek Vokál 2005-11-29 15:08:55 UTC

Well, this solves the problem with UTF-8, but what if the file had \n
embedded in it, or other terminal escape sequences? Also, what if the string
did not come from a symlink, but from a %s magic? Is it really UTF-8 then?

Comment 4 Bastien Nocera 2005-11-29 15:21:30 UTC

What about using iswctype(), after converting each mb sequence to a wchar_t,
instead of using isprint()?

Comment 5 Bastien Nocera 2005-11-29 16:05:23 UTC

Created attachment 121591
file-escape-mb-sequences.patch

Try to parse the output buffer as a multi-byte sequence. If it fails at any
point, fall back on the old ASCII-based escape.

Comment 7 Radek Vokál 2005-11-30 07:34:38 UTC

Slightly modified patch applied on rawhide, file-4.16-4

Comment 13 Red Hat Bugzilla 2006-05-17 19:39:21 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0340.html
