Description of problem:
File does not correctly detect several file types.

Version-Release number of selected component (if applicable):
file-4.26-3.fc10.i386

How reproducible:
Always

Steps to Reproduce:
I have found two problems, one with LaTeX files and one with files encoded as "macintosh" (with iconv or from a Mac).
Take any LaTeX file, LATEX.tex, or a text file encoded as macintosh, MAC.txt.

Actual results:
$ file LATEX.tex
LATEX.tex: graphviz graph text
$ file -i MAC.txt
MAC.txt: text/plain charset=unknown

Expected results:
$ file LATEX.tex
LATEX.tex: LaTeX file
$ file -i MAC.txt
MAC.txt: text/plain charset=macintosh

Additional info:
gnome-libs-1.4.2-10.fc10.i386
mailcap-2.1.28-1.fc9.noarch
file-libs-4.26-3.fc10.i386
hello Luis, can you attach those files as test cases? thanks...
I have been exploring a little. The problem with the LaTeX files appears when the file uses certain commands, for example whenever you use the \bibliography{file.bib} command to add a .bib file to your document. There are other cases too.
Created attachment 325817: a LaTeX file that is detected as a graphviz file
Created attachment 325818: a LaTeX file detected as graphviz, minimal example
Created attachment 325819: a document whose charset file does not detect
About the macintosh problem: is this a regression? Did it work before? As far as I can see, the attached text document has normal 0xA (\n) line endings, and I wonder whether there is any way to guess that it was created on a Mac. A few non-ASCII characters tell you nothing, IMHO...
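The line-ending observation above can be checked with a small sketch. Classic Mac OS text files used a bare CR (0x0D) as the line terminator, so a file with only LF (0x0A) endings carries no such hint. The helper name `line_ending_counts` is made up for this illustration and is not part of file.

```python
def line_ending_counts(data: bytes) -> dict:
    """Count CRLF, bare CR and bare LF line terminators in raw bytes."""
    crlf = data.count(b"\r\n")
    # Every CRLF pair also contains one CR and one LF, so subtract the pairs.
    bare_cr = data.count(b"\r") - crlf
    bare_lf = data.count(b"\n") - crlf
    return {"CRLF": crlf, "CR": bare_cr, "LF": bare_lf}


# A file like the attached one: only LF endings, so nothing points to classic Mac OS.
unix_style = line_ending_counts(b"first line\nsecond line\n")

# What a classic Mac OS text file would typically look like.
mac_style = line_ending_counts(b"first line\rsecond line\r")
```

Even a CR-only file would only suggest its origin. The line endings say nothing definite about the character set.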
The graphviz problem is caused by a regexp that is specified too broadly:

0 regex/100 [\r\n\t\ ]*graph[\r\n\t\ ]*.*\\{ graphviz graph text
!:mime text/vnd.graphviz
0 regex/100 [\r\n\t\ ]*digraph[\r\n\t\ ]*.*\\{ graphviz digraph text
!:mime text/vnd.graphviz

So any word that contains "graph" and is followed by a "{" makes the file look like "graphviz graph text"... I will correct the regexp.
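To illustrate why the pattern over-matches, here is a rough Python approximation. It assumes the magic escapes "\ " and "\\{" stand for a literal space and a literal "{", and it ignores the /100 range limit. The `STRICT` pattern is only a hypothetical tightening for comparison, not the fix that actually went into the package.

```python
import re

# Approximation of the original magic regex (assumption: see note above).
LOOSE = re.compile(r"[\r\n\t ]*graph[\r\n\t ]*.*\{")

# Hypothetical stricter pattern: the keyword must be the first token in the
# file and must be a whole word. NOT the actual fix.
STRICT = re.compile(r'\A[\r\n\t ]*(?:di)?graph\b[\r\n\t ]*[A-Za-z0-9_"]*[\r\n\t ]*\{')


def looks_like_graphviz(text: str, pattern: re.Pattern) -> bool:
    """Return True if the pattern is found anywhere in the text."""
    return pattern.search(text) is not None


latex = (
    "\\documentclass{article}\n"
    "\\begin{document}\n"
    "\\bibliography{file.bib}\n"  # "bibliography" contains "graph"
    "\\end{document}\n"
)
dot = "graph G {\n  a -- b;\n}\n"

loose_on_latex = looks_like_graphviz(latex, LOOSE)    # false positive
strict_on_latex = looks_like_graphviz(latex, STRICT)
strict_on_dot = looks_like_graphviz(dot, STRICT)
```

The loose pattern fires on "\bibliography{" because "graph" may appear in the middle of any word. The stricter sketch would still miss DOT files that begin with a comment or the "strict" keyword, so it is only meant to show the direction of a fix.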
The latex/graphviz issue is fixed in rawhide, in file-4.26-7.fc11. The macintosh issue is still questionable (NEEDINFO).
According to http://en.wikipedia.org/wiki/Mac_OS_Roman , the encoding is just one of many 8-bit encodings and cannot be algorithmically distinguished from, say, ISO Latin-1 (and there is no header or anything like that)... So, since the latex vs. graphviz regression is fixed in rawhide, I will close this bug.
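A minimal sketch of the ambiguity, using Python's built-in "mac_roman" and "latin_1" codecs (the helper name `decodes_cleanly` is made up for this illustration): the same bytes decode without error under both encodings and simply yield different characters, so nothing in the data identifies which one was intended.

```python
def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """Return True if the bytes decode under the encoding without an error."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# "é" is byte 0x8E in Mac OS Roman.
sample = "café".encode("mac_roman")

as_mac = sample.decode("mac_roman")    # the text the author meant
as_latin1 = sample.decode("latin_1")   # also valid, but 0x8E becomes a C1 control character

both_valid = decodes_cleanly(sample, "mac_roman") and decodes_cleanly(sample, "latin_1")
```

A detector could at best apply heuristics, such as preferring the reading with fewer control characters, but that would be a guess and not a reliable identification.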