Bug 474156 - file regression, mismatch some file types
Summary: file regression, mismatch some file types
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: file
Version: 10
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Daniel Novotny
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2008-12-02 15:22 UTC by Luis
Modified: 2009-01-14 15:06 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-01-14 15:06:21 UTC
Type: ---
Embargoed:


Attachments
a latex file that is detected as a graphviz file (5.25 KB, text/plain)
2008-12-05 09:31 UTC, Luis
A latex file detected as graphviz, minimal example (334 bytes, application/x-tex)
2008-12-05 09:32 UTC, Luis
A document that file does not detect its charset (74 bytes, text/plain charset=macintosh)
2008-12-05 09:35 UTC, Luis

Description Luis 2008-12-02 15:22:22 UTC
Description of problem:

file does not correctly detect several file types.

Version-Release number of selected component (if applicable):

file-4.26-3.fc10.i386

How reproducible:

Always

Steps to Reproduce:

I have found two problems: one with LaTeX files and one with text files encoded as "macintosh" (converted with iconv or created on a Mac).

Take any LaTeX file (LATEX.tex) or a text file encoded as macintosh (MAC.txt) and run file on it.

  
Actual results:

$ file LATEX.tex
LATEX.tex: graphviz graph text

$ file -i MAC.txt
MAC.txt: text/plain charset=unknown

Expected results:

$ file LATEX.tex
LATEX.tex: LaTeX file

$ file -i MAC.txt
MAC.txt: text/plain charset=macintosh


Additional info:

gnome-libs-1.4.2-10.fc10.i386
mailcap-2.1.28-1.fc9.noarch
file-libs-4.26-3.fc10.i386

Comment 1 Daniel Novotny 2008-12-04 10:14:52 UTC
hello Luis,
can you attach those files as test cases? thanks...

Comment 2 Luis 2008-12-05 09:29:48 UTC
I have been exploring a little. The problem with the LaTeX files appears when the file uses certain commands, for example whenever you use the \bibliography{file.bib} command to add a .bib file to your document.

But there are other cases too.

Comment 3 Luis 2008-12-05 09:31:10 UTC
Created attachment 325817
a latex file that is detected as a graphviz file

Comment 4 Luis 2008-12-05 09:32:25 UTC
Created attachment 325818
A latex file detected as graphviz, minimal example

Comment 5 Luis 2008-12-05 09:35:58 UTC
Created attachment 325819
A document that file does not detect its charset

Comment 6 Daniel Novotny 2008-12-05 13:50:30 UTC
about the macintosh problem: is this a regression? did it work before? as far as I can see, the attached text document has normal 0xA (\n) line endings, and I wonder whether there is any way to guess that it was created on a Mac. a few non-ASCII characters tell you nothing, IMHO...

Comment 7 Daniel Novotny 2008-12-05 13:55:30 UTC
the graphviz problem is caused by a regexp that is specified too broadly:

0       regex/100       [\r\n\t\ ]*graph[\r\n\t\ ]*.*\\{        graphviz graph text
!:mime  text/vnd.graphviz
0       regex/100       [\r\n\t\ ]*digraph[\r\n\t\ ]*.*\\{      graphviz digraph text
!:mime  text/vnd.graphviz

so any word that contains "graph" and is followed by a "{" makes the file look like "graphviz graph text" ... I will correct the regexp
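
The over-matching described in comment 7 can be reproduced outside of file. The following is a minimal sketch, assuming the magic pattern is searched anywhere in the text rather than anchored (as comment 7 implies) and that it translates into Python's `re` syntax as shown. The names `GRAPHVIZ_GRAPH`, `looks_like_graphviz`, and the sample strings are made up for illustration and are not part of file or its magic database.

```python
import re

# Approximate translation of the magic pattern from comment 7 into Python's
# `re` syntax: "\ " becomes a plain space and "\\{" becomes a literal "{".
# file does not use Python's regex engine, so details may differ.
GRAPHVIZ_GRAPH = re.compile(r"[\r\n\t ]*graph[\r\n\t ]*.*\{")


def looks_like_graphviz(text: str) -> bool:
    """Return True if the over-broad pattern matches anywhere in text."""
    return GRAPHVIZ_GRAPH.search(text) is not None


# Hypothetical sample inputs: one real graphviz snippet, two LaTeX commands
# that merely contain the substring "graph", and one unrelated line.
samples = {
    "graphviz": "graph G { a -- b; }",
    "latex bibliography": r"\bibliography{file.bib}",
    "latex graphics": r"\includegraphics{figure.pdf}",
    "plain text": "nothing to see here",
}

for name, text in samples.items():
    print(f"{name}: {looks_like_graphviz(text)}")
```

Both LaTeX commands match because "graph" is found inside "bibliography" and "includegraphics", and the ".*" then bridges the remaining characters up to the opening brace.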

Comment 8 Daniel Novotny 2008-12-15 16:29:16 UTC
the latex/graphviz issue is fixed in rawhide:
file-4.26-7.fc11
the macintosh issue is still questionable (NEEDINFO)

Comment 9 Daniel Novotny 2009-01-14 15:06:21 UTC
according to http://en.wikipedia.org/wiki/Mac_OS_Roman , the encoding is just one of many 8-bit encodings and cannot be algorithmically distinguished from, say, ISO Latin1 (and there's no header or anything like that)

...so because the latex vs. graphviz regression is fixed in rawhide, I will close this bug
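
The point in comment 9 about 8-bit encodings can be illustrated with a short sketch. It uses Python's built-in `mac_roman` and `latin-1` codecs as stand-ins for the "macintosh" and ISO Latin1 charsets; the sample string is an arbitrary choice and is not taken from the attached file.

```python
# Encode a short string as Mac OS Roman, then decode the same bytes two ways.
original = "café"
data = original.encode("mac_roman")

as_mac = data.decode("mac_roman")
as_latin1 = data.decode("latin-1")

# Both decodes succeed; only the first reproduces the intended text.
print(repr(as_mac))
print(repr(as_latin1))

# Every possible byte value decodes without error under both codecs, so a
# failed decode can never rule either encoding out.
all_bytes = bytes(range(256))
decodes_everywhere = (
    len(all_bytes.decode("mac_roman")) == 256
    and len(all_bytes.decode("latin-1")) == 256
)
print(decodes_everywhere)
```

Because neither codec ever rejects a byte sequence, a detector would have to rely on heuristics about which resulting characters look plausible, and a file with only a few non-ASCII bytes gives such heuristics very little to work with.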

