Bug 214718 - file reports that a file is an MS-DOS executable (COM), but the file is actually a Japanese e-mail
Status: CLOSED NOTABUG
Product: Fedora
Classification: Fedora
Component: file
Version: 6
Hardware: i386 Linux
Priority: medium Severity: high
Assigned To: Martin Bacovsky
Reported: 2006-11-08 18:28 EST by Ethan Sommer
Modified: 2007-11-30 17:11 EST (History)
Last Closed: 2007-01-26 06:11:35 EST

Attachments
the beginning of an e-mail in japanese (337 bytes, text/plain)
2006-11-08 18:30 EST, Ethan Sommer
Patch to disable COM detection (681 bytes, patch)
2007-01-26 06:09 EST, Martin Bacovsky
Description Ethan Sommer 2006-11-08 18:28:44 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.7) Gecko/20061011 Fedora/1.5.0.7-7.fc6 Firefox/1.5.0.7

Description of problem:
We use MailScanner to filter e-mail. It recently came to our attention that a Japanese professor's e-mails were being dropped because they were falsely identified as being MS-DOS executables. 

This behavior exists in at least FC6, FC5, and FC4 but does not in FC2. 

I am told that it may misclassify Chinese text as well.

Version-Release number of selected component (if applicable):
file-4.17-8

How reproducible:
Always


Steps to Reproduce:
1. Run the file command on the attached file.

Actual Results:
test.txt: DOS executable (COM)

Expected Results:
test.txt: UTF-8 Unicode text, with CRLF line terminators
(results on FC2)

Additional info:
I will attach a sample file once the bug is open.

I have marked the severity as High, as it causes loss of data for users of MailScanner who use typical settings. I suspect that it may cause other filters to drop data in similar circumstances.
Comment 1 Ethan Sommer 2006-11-08 18:30:13 EST
Created attachment 140728
the beginning of an e-mail in japanese

Hopefully I cleared out any sensitive info in the e-mail. (I don't read
Japanese)
Comment 2 Martin Bacovsky 2007-01-26 06:07:04 EST
File first tries to match the given file against the patterns stored in the magic
dictionary. If that is not successful, file examines the given file and guesses
its type from various indicators.

After FC2, a pattern for "DOS executable (COM)" was added. This pattern is
one byte long and matches the beginning of the file. Unfortunately, in your
message the encoding of the first character (鈴) begins with that same byte,
so the pattern matches.
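
To illustrate the collision, here is a minimal Python sketch. It is not file's actual code; the helper name looks_like_com is hypothetical, and the byte value 0xe9 is taken from the magic entry quoted further down in this comment.

```python
# Illustration only: mimics a one-byte magic rule at offset 0.
# The value 0xE9 comes from the "DOS executable (COM)" magic entry
# quoted in this bug; looks_like_com is a hypothetical helper name.
COM_MAGIC_BYTE = 0xE9


def looks_like_com(data: bytes) -> bool:
    """Return True if the first byte equals 0xE9, like the one-byte rule."""
    return len(data) > 0 and data[0] == COM_MAGIC_BYTE


# The first character of the sample e-mail, encoded as UTF-8.
sample = "鈴".encode("utf-8")

print(sample.hex())              # e988b4
print(looks_like_com(sample))    # True
print(looks_like_com(b"Hello"))  # False
```

Any UTF-8 text whose first character encodes to a sequence starting with 0xe9 would trip the same one-byte test.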

Since file has to try the matching patterns first, this is not a bug.

If you need to work around this problem:
- you can use file's MIME type recognition
 $ file -i /tmp/test.txt
 /tmp/test.txt: text/plain; charset=utf-8
- you can disable the pattern in the pattern database by commenting out these lines in
/usr/share/file/magic
 0	byte		0xe9		DOS executable (COM)
 >0x1FE	leshort		0xAA55		\b, boot code
 >6	string		SFX\ of\ LHarc	(%s)
and then running rm magic.mgc;file -C in /usr/share/file (as root)
- or you can build your own file package with the following patch

If you need further help, feel free to reopen this bug.
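
The "comment out these lines" workaround above could be scripted roughly as follows. This is only a sketch: the function name disable_com_entry is hypothetical, it assumes the entry starts with the fields 0, byte, 0xe9 and that its continuation lines start with ">", and it works on the text of the magic file rather than editing /usr/share/file/magic in place. The magic.mgc rebuild described above is still needed afterwards.

```python
def disable_com_entry(magic_text: str) -> str:
    """Prefix the COM entry and its '>' continuation lines with '#'.

    Hypothetical helper: assumes the entry begins with the fields
    0, byte, 0xe9, as in the lines quoted in this bug.
    """
    out = []
    in_entry = False
    for line in magic_text.splitlines():
        fields = line.split()
        if fields[:3] == ["0", "byte", "0xe9"]:
            # Top-level COM rule: comment it out and remember we are inside it.
            in_entry = True
            out.append("#" + line)
        elif in_entry and line.startswith(">"):
            # Continuation line that belongs to the COM rule.
            out.append("#" + line)
        else:
            # Any other line ends the entry and is kept unchanged.
            in_entry = False
            out.append(line)
    return "\n".join(out) + "\n"


example = (
    "0\tbyte\t\t0xe9\t\tDOS executable (COM)\n"
    ">0x1FE\tleshort\t\t0xAA55\t\t\\b, boot code\n"
    ">6\tstring\t\tSFX\\ of\\ LHarc\t(%s)\n"
    "0\tstring\t\tMZ\t\tMS-DOS executable\n"
)
print(disable_com_entry(example), end="")
```

Only the three lines of the COM entry are commented out; unrelated entries, such as the made-up MZ line in the example, pass through untouched.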
Comment 3 Martin Bacovsky 2007-01-26 06:09:32 EST
Created attachment 146673
Patch to disable COM detection

This patch disables DOS executable (COM) detection.
