Red Hat Bugzilla – Bug 849621
file is coming back with 'LaTeX document text' instead of 'XML document text'
Last modified: 2014-10-14 04:29:06 EDT
Created attachment 605678 [details]
I've noticed that file is, possibly, coming back with the wrong mime type - I was expecting 'XML document text' but instead get 'LaTeX document text'.
Ubuntu (various releases) comes back with 'XML document text'; why doesn't 6.3?
I've attached an example that produces the 'wrong' response from file.
I think this is not valid XML:
<?version xml="1.0" encoding="UTF-8"?>
The proper XML declaration is:
<?xml version="1.0" encoding="UTF-8"?>
Even with the proper header, though, it doesn't work correctly, because file misdetects "\chapter" as a LaTeX command. The attached patch against file-5.11 fixes that.
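To illustrate the kind of misdetection described above, here is a toy sketch in Python. It is not libmagic's actual rule set and not the attached patch: the `classify` function, its `check_xml_first` flag, and the sample text are all hypothetical. It only shows how a rule that looks for "\chapter" anywhere near the start of a file can win over the XML header unless the XML check runs first.

```python
def classify(text: str, check_xml_first: bool) -> str:
    """Toy content sniffer (hypothetical; NOT how file(1) is implemented)."""
    head = text[:4096]  # only inspect the beginning, as a sniffer typically would
    looks_xml = head.lstrip().startswith("<?xml")
    looks_latex = "\\chapter" in head  # literal backslash followed by "chapter"

    if check_xml_first and looks_xml:
        return "XML document text"
    if looks_latex:
        return "LaTeX document text"
    if looks_xml:
        return "XML document text"
    return "ASCII text"


# Hypothetical sample: a well-formed header, plus a path containing "\chapter".
sample = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<path>C:\\docs\\chapter1</path>\n"
)

print(classify(sample, check_xml_first=False))
print(classify(sample, check_xml_first=True))
```

With the LaTeX rule evaluated first, the sample is reported as LaTeX; giving the XML header priority reports it as XML.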
Created attachment 605696 [details]
Hi Jan - thank you for getting back to me so promptly.
I went into /usr/share/misc/magic, manually applied the patch, and rebuilt a .mgc file via: file -C -m
Before I did this I ran file against the files that are producing this LaTeX problem and got 21 misdetections.
After I manually applied the patch I got 1 misdetection.
I think the proposed patch needs a bit more work.
- apologies for the typo in the first attachment on the metadataInfo element.
- my OS version is 2.6.32-279.5.1.el6.x86_64 if that helps.
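The before/after count described in this comment could be scripted roughly as follows. This is a minimal sketch, assuming `file` is on the PATH. The helper names `describe_with_file` and `misdetected` are hypothetical, and any magic file path passed in is a placeholder. It relies only on the `-b` (brief output) and `-m` (alternate magic file) options of file(1).

```python
import subprocess
from typing import Callable, Iterable, List, Optional


def describe_with_file(path: str, magic: Optional[str] = None) -> str:
    """Return file(1)'s brief description of `path`, optionally with a custom magic file."""
    cmd = ["file", "-b"]
    if magic is not None:
        cmd += ["-m", magic]  # e.g. a locally patched magic file (path is an assumption)
    cmd.append(path)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def misdetected(
    paths: Iterable[str],
    describe: Callable[[str], str] = describe_with_file,
) -> List[str]:
    """Return the paths whose description mentions LaTeX."""
    return [p for p in paths if "LaTeX" in describe(p)]
```

Running `misdetected` once with the default magic database and once with the patched one gives the two counts to compare.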
Can you attach the file for which it's still broken?
Created attachment 605721 [details]
Hi Jan - I can't provide you with any original file; however, by using vi I've managed to strip away our proprietary information yet still keep what's causing 'file' to come out with the 'wrong' answer.
Put another way, when I use 'file' on the attached a.xml I get 'LaTeX' whether I use the default magic database or the one I patched (by following your instructions).
If it helps I can, relatively easily, regression test another patch of yours.
In the last attachment there's still "<?version xml="1.0" encoding="UTF-8"?>". This is not valid XML. If I change it to "<?xml version="1.0" encoding="UTF-8"?>", patched File is able to detect that file.
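A quick way to screen files for this header problem before running file on them is sketched below in Python. The regular expression is a simplified approximation of the XML declaration syntax, not a full implementation of the XML grammar, and the name `has_xml_declaration` is hypothetical.

```python
import re

# Simplified pattern: '<?xml', a version, then optional encoding and standalone parts.
XML_DECL = re.compile(
    r'^<\?xml\s+version\s*=\s*(["\'])1\.[0-9]+\1'
    r'(\s+encoding\s*=\s*(["\'])[A-Za-z][A-Za-z0-9._-]*\3)?'
    r'(\s+standalone\s*=\s*(["\'])(yes|no)\5)?'
    r'\s*\?>'
)


def has_xml_declaration(text: str) -> bool:
    """True if `text` starts with something shaped like an XML declaration."""
    return XML_DECL.match(text) is not None


print(has_xml_declaration('<?xml version="1.0" encoding="UTF-8"?>'))
print(has_xml_declaration('<?version xml="1.0" encoding="UTF-8"?>'))
```

The first header is accepted and the second, with "version" and "xml" swapped, is rejected.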
Hi Jan - after doing some more research I agree with you. The a.xml file is not valid when compared against http://www.w3.org/TR/REC-xml/#sec-prolog-dtd
Hence the patch, at least from my perspective, works.
Thank you for your help.
Any estimate as to when the patch will be released?
Not sure how this bug is related to latex2html. Reassigning back to file(1).
Hi - for the sake of history, just want to record the fact that we've got a similar (magic database) related defect at: https://bugzilla.redhat.com/show_bug.cgi?id=873997
There was a case for this - https://access.redhat.com/support/cases/00742615 - but at the moment we haven't upgraded our support package away from 'self support'.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.