From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.7) Gecko/20061011 Fedora/1.5.0.7-7.fc6 Firefox/1.5.0.7

Description of problem:
We use MailScanner to filter e-mail. It recently came to our attention that a Japanese professor's e-mails were being dropped because they were falsely identified as MS-DOS executables. This behavior exists in at least FC6, FC5, and FC4, but not in FC2. I am told that file may also misclassify Chinese text.

Version-Release number of selected component (if applicable):
file-4.17-8

How reproducible:
Always

Steps to Reproduce:
1. Run the file command on the attached file.

Actual Results:
test.txt: DOS executable (COM)

Expected Results:
test.txt: UTF-8 Unicode text, with CRLF line terminators
(results on FC2)

Additional info:
I will attach a sample file once the bug is open. I have marked the severity as High because the bug causes loss of data for MailScanner users who run typical settings. I suspect that it may cause other filters to drop data in similar circumstances.
Created attachment 140728
The beginning of an e-mail in Japanese

Hopefully I cleared out any sensitive info in the e-mail. (I don't read Japanese.)
file first tries to match the given file against the patterns stored in its magic database. Only if no pattern matches does it examine the file's contents and guess the type from other indicators. After FC2, a pattern for "DOS executable (COM)" was added. This pattern is one byte long and matches at the beginning of the file. Unfortunately, the first character in your message (鈴) is encoded with that same leading byte (0xe9), so the pattern matches. Because matching patterns always take precedence, this is not a bug.

If you need to work around this problem:

- you can use file's MIME type recognition:

    $ file -i /tmp/test.txt
    /tmp/test.txt: text/plain; charset=utf-8

- you can disable the pattern in the pattern database by commenting out these lines in /usr/share/file/magic:

    0       byte     0xe9             DOS executable (COM)
    >0x1FE  leshort  0xAA55           \b, boot code
    >6      string   SFX\ of\ LHarc   (%s)

  and then running, as root in /usr/share/file:

    rm magic.mgc;file -C

- or you can build your own file with the following patch.

If you need further help, feel free to reopen this bug.
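The one-byte collision described above can be illustrated with a short Python sketch. It is not how file is implemented; `matches_com_rule` and `is_valid_utf8` are hypothetical helpers that only mimic a rule of the form "0 byte 0xe9" and a basic text check, under the assumption that the message is UTF-8 encoded (as the `file -i` output suggests).

```python
# Simplified, hypothetical stand-in for the one-byte magic rule
# "0 byte 0xe9 DOS executable (COM)". This is NOT file's real code.
COM_MAGIC_OFFSET = 0
COM_MAGIC_BYTE = 0xE9


def matches_com_rule(data: bytes) -> bool:
    """Return True if the byte at offset 0 equals 0xE9."""
    return len(data) > COM_MAGIC_OFFSET and data[COM_MAGIC_OFFSET] == COM_MAGIC_BYTE


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The first character of the sample message, encoded as UTF-8.
sample = "鈴".encode("utf-8")

print(sample.hex())              # bytes of the encoded character
print(matches_com_rule(sample))  # the one-byte rule fires on valid text
print(is_valid_utf8(sample))     # yet the data is well-formed UTF-8
```

Because the rule inspects a single byte, any UTF-8 text whose first character happens to start with 0xe9 will satisfy it. A filter that checks the MIME type, or that verifies the text encoding before trusting the description, would avoid this particular false positive.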
Created attachment 146673
Patch to disable COM detection

This patch disables DOS executable (COM) detection.