Bug 474156 - file regression, mismatch some file types
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: file
Version: 10
Hardware: All Linux
Priority: low, Severity: medium
Assigned To: Daniel Novotny
QA Contact: Fedora Extras Quality Assurance
Reported: 2008-12-02 10:22 EST by Luis
Modified: 2009-01-14 10:06 EST

Doc Type: Bug Fix
Last Closed: 2009-01-14 10:06:21 EST


Attachments
a latex file that is detected as a graphviz file (5.25 KB, text/plain)
2008-12-05 04:31 EST, Luis
A latex file detected as graphviz, minimal example (334 bytes, application/x-tex)
2008-12-05 04:32 EST, Luis
A document that file does not detect its charset (74 bytes, text/plain charset=macintosh)
2008-12-05 04:35 EST, Luis

Description Luis 2008-12-02 10:22:22 EST
Description of problem:

file does not correctly detect several file types.

Version-Release number of selected component (if applicable):

file-4.26-3.fc10.i386

How reproducible:

Always

Steps to Reproduce:

I have found two problems: one with LaTeX files, and one with files encoded as "macintosh" (converted with iconv or created on a Mac).

Take any LaTeX file LATEX.tex, or a text file encoded as macintosh, MAC.txt.

  
Actual results:

$ file LATEX.tex
LATEX.tex: graphviz graph text

$ file -i MAC.txt
MAC.txt: text/plain charset=unknown

Expected results:

$ file LATEX.tex
LATEX.tex: LaTeX file

$ file -i MAC.txt
MAC.txt: text/plain charset=macintosh


Additional info:

gnome-libs-1.4.2-10.fc10.i386
mailcap-2.1.28-1.fc9.noarch
file-libs-4.26-3.fc10.i386
Comment 1 Daniel Novotny 2008-12-04 05:14:52 EST
hello Luis,
can you attach those files as test cases? thanks...
Comment 2 Luis 2008-12-05 04:29:48 EST
I have been exploring a little. The problem with LaTeX files appears when the file uses certain commands, for example whenever you use the \bibliography{file.bib} command to add a .bib file to your document.

But there are other cases too.
Comment 3 Luis 2008-12-05 04:31:10 EST
Created attachment 325817 [details]
a latex file that is detected as a graphviz file
Comment 4 Luis 2008-12-05 04:32:25 EST
Created attachment 325818 [details]
A latex file detected as graphviz, minimal example
Comment 5 Luis 2008-12-05 04:35:58 EST
Created attachment 325819 [details]
A document that file does not detect its charset
Comment 6 Daniel Novotny 2008-12-05 08:50:30 EST
About the macintosh problem: is this a regression? Did it work before? As far as I can see, the attached text document has normal 0xA (\n) line endings, and I wonder whether there is any way to guess that it was created on a Mac. A few non-ASCII characters tell you nothing, IMHO...
Comment 7 Daniel Novotny 2008-12-05 08:55:30 EST
the graphviz problem is caused by a regexp specified too widely:

0       regex/100       [\r\n\t\ ]*graph[\r\n\t\ ]*.*\\{        graphviz graph text
!:mime  text/vnd.graphviz
0       regex/100       [\r\n\t\ ]*digraph[\r\n\t\ ]*.*\\{      graphviz digraph text
!:mime  text/vnd.graphviz

So any word that contains "graph" and has a "{" somewhere after it will make the file look like "graphviz graph text". I will correct the regexp.
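
To illustrate the over-match, here is a minimal sketch in Python rather than in libmagic's own regex engine. It assumes the magic-file escapes "\ " and "\\{" stand for a literal space and a literal "{", and that the pattern is searched unanchored. The STRICT pattern and the helper name are hypothetical and only show one way to tighten the match; they are not the fix that went into the package.

```python
import re

# Python translation of the magic pattern quoted above (assumption: "\ " is a
# literal space and "\\{" is a literal "{"; the search is unanchored).
BROAD = re.compile(r"[\r\n\t ]*graph[\r\n\t ]*.*\{")

# Hypothetical stricter pattern, for illustration only: "graph" must be a
# whole word at the start of a line, followed by "{" on that same line.
STRICT = re.compile(r"^[ \t]*graph\b[^{\n]*\{", re.MULTILINE)

latex = (
    "\\documentclass{article}\n"
    "\\begin{document}\n"
    "\\bibliography{file.bib}\n"
    "\\end{document}\n"
)
dot = "graph G {\n  a -- b;\n}\n"


def looks_like_graphviz(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None


for name, text in (("latex", latex), ("dot", dot)):
    print(name,
          "broad:", looks_like_graphviz(BROAD, text),
          "strict:", looks_like_graphviz(STRICT, text))
```

The broad pattern matches the LaTeX sample because "bibliography" contains "graph" and is followed by "{", which agrees with the observation in comment 2.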
Comment 8 Daniel Novotny 2008-12-15 11:29:16 EST
the latex/graphviz issue fixed in rawhide
file-4.26-7.fc11
the macintosh thing is still questionable (NEEDINFO)
Comment 9 Daniel Novotny 2009-01-14 10:06:21 EST
According to http://en.wikipedia.org/wiki/Mac_OS_Roman , the encoding is just one of many 8-bit encodings and cannot be algorithmically distinguished from, say, ISO Latin-1 (and there is no header or anything like that).

...so because the latex vs. graphviz regression is fixed in rawhide, I will close this bug
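
The point about 8-bit encodings can be sketched in Python, assuming its standard mac_roman and latin_1 codecs as stand-ins for the "macintosh" and ISO Latin-1 charsets. The sample string is made up for illustration and is not the attached test file.

```python
# The same bytes decode without error under both 8-bit encodings, so a
# successful decode alone cannot tell them apart.
data = "café".encode("mac_roman")  # made-up sample text

as_mac = data.decode("mac_roman")
as_latin1 = data.decode("latin_1")

print(repr(data))
print(repr(as_mac))
print(repr(as_latin1))

# Both decodings round-trip to the identical byte sequence.
print(as_mac.encode("mac_roman") == as_latin1.encode("latin_1"))
```

Both decodes succeed and give different text, so choosing between the two charsets would take a heuristic about which text is more plausible, not a structural check.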
