Bug 849621

Summary: file is coming back with 'LaTeX document text' instead of 'XML document text'
Product: Red Hat Enterprise Linux 6
Reporter: james.hn.sears
Component: file
Assignee: Jan Kaluža <jkaluza>
Status: CLOSED ERRATA
QA Contact: Stanislav Zidek <szidek>
Severity: unspecified
Priority: unspecified
Version: 6.3
CC: ksrot, szidek
Target Milestone: rc
Hardware: x86_64
OS: Linux
Fixed In Version: file-5.04-16.el6
Doc Type: Bug Fix
Last Closed: 2014-10-14 08:29:06 UTC
Type: Bug
Bug Blocks: 849641
Attachments (description, flags):
'LaTeX' example, none
proposed patch, none
a.xml, none

Description james.hn.sears 2012-08-20 11:27:17 UTC
Created attachment 605678
'LaTeX' example

I've noticed that file is possibly coming back with the wrong file type: I was expecting 'XML document text' but instead get 'LaTeX document text'.

Ubuntu (various releases) comes back with 'XML document text'. Why doesn't RHEL 6.3?

I've attached an example that produces the 'wrong' response from file.

Comment 2 Jan Kaluža 2012-08-20 12:32:30 UTC
I think this is not valid XML:

<?version xml="1.0" encoding="UTF-8"?>

The proper XML declaration is:

<?xml version="1.0" encoding="UTF-8"?>

Although even with the proper header it doesn't work correctly, because it misdetects "\chapter" as a LaTeX command. The attached patch against file-5.11 fixes that.
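
For anyone wanting to check their own files, here is a minimal Python sketch of the header check described above. It only tests whether the data starts with a well-formed XML declaration. It is not how file(1) or its magic database works, and the function name `has_xml_declaration` and the simplified regular expression are assumptions made for illustration.

```python
import re

# Simplified pattern for an XML declaration at the very start of the data:
# "<?xml", whitespace, then a quoted version number such as "1.0".
# The encoding and standalone parts are not validated here.
_XML_DECL = re.compile(rb'^<\?xml\s+version\s*=\s*(["\'])1\.[0-9]+\1')


def has_xml_declaration(data: bytes) -> bool:
    """Return True if data begins with something like <?xml version="1.0" ...?>."""
    return _XML_DECL.match(data) is not None


if __name__ == "__main__":
    good = b'<?xml version="1.0" encoding="UTF-8"?>\n<doc/>'
    bad = b'<?version xml="1.0" encoding="UTF-8"?>\n<doc/>'
    print(has_xml_declaration(good))
    print(has_xml_declaration(bad))
```

The swapped header from the attachment fails this check because the bytes after "<?" are "version", not "xml".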

Comment 3 Jan Kaluža 2012-08-20 12:33:01 UTC
Created attachment 605696
proposed patch

Comment 4 james.hn.sears 2012-08-20 14:07:52 UTC
Hi Jan - thank you for getting back to me so promptly.

I went into /usr/share/misc/magic, manually applied the patch, and rebuilt a .mgc file via: file -C -m

Before I did this I ran file against the files that are producing this LaTeX problem and got 21 misdetections.

After I manually applied the patch I got 1 misdetection.

I think the proposed patch needs a bit more work.

PS 
- apologies for the typo in the first attachment on the metadataInfo element.
- my kernel version is 2.6.32-279.5.1.el6.x86_64 if that helps.
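
The before/after count above can be reproduced with a short script. This is only a sketch: `describe_with_file` and `count_latex_hits` are hypothetical helper names, the magic file path is whatever you built yourself, and it assumes a file(1) that accepts -b (brief output) and -m (alternate magic file).

```python
import subprocess
from typing import Callable, Iterable, Optional


def describe_with_file(path: str, magic: Optional[str] = None) -> str:
    """Run file(1) on path and return its brief description."""
    cmd = ["file", "-b"]
    if magic is not None:
        cmd += ["-m", magic]  # use an alternate magic database
    cmd.append(path)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def count_latex_hits(paths: Iterable[str],
                     describe: Callable[[str], str] = describe_with_file) -> int:
    """Count how many paths are reported as some kind of LaTeX text."""
    return sum(1 for p in paths if "LaTeX" in describe(p))
```

Calling `count_latex_hits(paths)` once with the default database and once with `lambda p: describe_with_file(p, "patched.mgc")` (a placeholder file name) gives the two numbers to compare.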

Comment 5 Jan Kaluža 2012-08-20 14:20:05 UTC
Can you attach the file for which it's still broken?

Comment 6 james.hn.sears 2012-08-20 15:22:18 UTC
Created attachment 605721
a.xml

Hi Jan - I can't provide you with any original file; however, by using vi I've managed to strip away our proprietary information yet still keep what's causing 'file' to come out with the 'wrong' answer.

Put another way, when I use 'file' on the attached a.xml I get 'LaTeX' whether I use the default magic database or the one I patched (by following your instructions).

If it helps I can, relatively easily, regression test another patch of yours.

Comment 7 Jan Kaluža 2012-08-21 05:47:33 UTC
In the last attachment there's still "<?version xml="1.0" encoding="UTF-8"?>". This is not valid XML. If I change it to "<?xml version="1.0" encoding="UTF-8"?>", the patched file is able to detect that file as XML.
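
If many files carry the same swapped header, the manual change can be scripted. The following Python sketch is an illustration only: `fix_swapped_declaration` is a hypothetical helper, and it assumes the bad header appears at the very start of the data exactly as in the attachment.

```python
import re

# Matches the swapped header "<?version xml=" at the start of the data.
_SWAPPED = re.compile(rb'^<\?version\s+xml\s*=')


def fix_swapped_declaration(data: bytes) -> bytes:
    """Rewrite a leading '<?version xml=' to '<?xml version='.

    Any other input is returned unchanged.
    """
    return _SWAPPED.sub(b'<?xml version=', data, count=1)
```

Only the first few bytes are touched, so the rest of the document is left as it was.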

Comment 8 james.hn.sears 2012-08-21 08:34:57 UTC
Hi Jan - after doing some more research I agree with you. The a.xml file is not valid when compared against http://www.w3.org/TR/REC-xml/#sec-prolog-dtd

Hence the patch, at least from my perspective, works.

Thank you for your help.

Any estimate as to when the patch will be released?

Comment 11 Jindrich Novy 2012-09-08 10:21:09 UTC
Not sure how this bug is related to latex2html. Reassigning back to file(1).

Comment 13 james.hn.sears 2013-01-14 15:16:19 UTC
Hi - for the sake of history, I just want to record that we've got a similar defect, also related to the magic database, at: https://bugzilla.redhat.com/show_bug.cgi?id=873997

There was a case for this - https://access.redhat.com/support/cases/00742615 - but at the moment we haven't upgraded our support package away from 'self support'.

Comment 23 errata-xmlrpc 2014-10-14 08:29:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-1606.html