Bug 675691

Summary: recognition of TeX and LaTeX files needs an improvement
Product: Red Hat Enterprise Linux 6 Reporter: Milos Malik <mmalik>
Component: file Assignee: Jan Kaluža <jkaluza>
Status: CLOSED ERRATA QA Contact: BaseOS QE Security Team <qe-baseos-security>
Severity: low Docs Contact:
Priority: low    
Version: 6.1 CC: ksrot, ovasik, pkovar
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: file-5.04-8.el6 Doc Type: Bug Fix
Doc Text:
Prior to this update, the file patterns for LaTeX checked only the first 400 bytes of a file to determine its type. As a result, files that began with a long run of comments could be recognized incorrectly. Furthermore, the patterns that match Python scripts were tried before the LaTeX patterns, so LaTeX files that included Python source code could be misidentified as Python scripts. With this update, the number of bytes checked for a LaTeX file has been increased to 4096, and the LaTeX patterns are tried before the Python patterns.
Story Points: ---
Clone Of:
: 676543 826898 Environment:
Last Closed: 2011-07-13 08:48:58 UTC Type: ---
Bug Depends On:    
Bug Blocks: 676543, 826898    
Attachments:
Description Flags
proposed patch none

Description Milos Malik 2011-02-07 11:11:10 UTC
Description of problem:


Version-Release number of selected component (if applicable):
file-5.04-6.el6

How reproducible:
always

Steps to Reproduce:
# rpm -ql PyXML | grep -F .tex | xargs file
/usr/share/doc/PyXML-0.8.4/xml-howto.tex: Python script text executable
/usr/share/doc/PyXML-0.8.4/xml-ref.tex:   Python script text executable
# rpm -ql emacs-common | grep -F .tex | xargs file
/usr/share/emacs/23.1/etc/refcards/calccard.tex:      ASCII English text
/usr/share/emacs/23.1/etc/refcards/cs-dired-ref.tex:  ISO-8859 English text
/usr/share/emacs/23.1/etc/refcards/cs-refcard.tex:    TeX document text
/usr/share/emacs/23.1/etc/refcards/cs-survival.tex:   LaTeX document text
/usr/share/emacs/23.1/etc/refcards/de-refcard.tex:    TeX document text
/usr/share/emacs/23.1/etc/refcards/dired-ref.tex:     ASCII English text
/usr/share/emacs/23.1/etc/refcards/fr-dired-ref.tex:  ISO-8859 English text
/usr/share/emacs/23.1/etc/refcards/fr-refcard.tex:    TeX document text
/usr/share/emacs/23.1/etc/refcards/fr-survival.tex:   LaTeX document text
/usr/share/emacs/23.1/etc/refcards/gnus-refcard.tex:  LaTeX 2e document text
/usr/share/emacs/23.1/etc/refcards/orgcard.tex:       ASCII English text
/usr/share/emacs/23.1/etc/refcards/pl-refcard.tex:    ASCII English text
/usr/share/emacs/23.1/etc/refcards/pt-br-refcard.tex: ASCII English text
/usr/share/emacs/23.1/etc/refcards/refcard.tex:       ASCII English text
/usr/share/emacs/23.1/etc/refcards/ru-refcard.tex:    LaTeX 2e document text
/usr/share/emacs/23.1/etc/refcards/sk-dired-ref.tex:  ISO-8859 English text
/usr/share/emacs/23.1/etc/refcards/sk-refcard.tex:    TeX document text
/usr/share/emacs/23.1/etc/refcards/sk-survival.tex:   LaTeX document text
/usr/share/emacs/23.1/etc/refcards/survival.tex:      LaTeX document text
/usr/share/emacs/23.1/etc/refcards/vipcard.tex:       TeX document text
/usr/share/emacs/23.1/etc/refcards/viperCard.tex:     TeX document text
# 

Actual results:


Expected results:

Comment 1 Jan Kaluža 2011-02-07 14:05:15 UTC
There are 2 problems:
1. The file patterns for LaTeX check only the first 400 bytes of a file to determine its type, and some of those files have a long comment at the beginning. I will increase that limit.

2. The patterns which match a Python script are tried before the LaTeX patterns, which is bad when there is Python code in a LaTeX file.

I'll send the patch to the upstream mailing list and then attach it here.
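
The first problem can be illustrated with a small Python sketch. It is only a simplified model of a bounded pattern search, not the actual code or magic syntax used by file; the helper name marker_in_window and the sample content are made up for the example.

```python
def marker_in_window(data: bytes, marker: bytes, window: int) -> bool:
    """Return True if `marker` starts within the first `window` bytes of `data`.

    Simplified model of a bounded search; the real matching logic in file
    is more involved.
    """
    pos = data.find(marker)
    return 0 <= pos < window


# Hypothetical .tex file: 20 comment lines of 29 bytes each (580 bytes),
# followed by the LaTeX preamble.
header = b"% licence and revision notes\n" * 20
sample = header + b"\\documentclass{article}\n"

print(marker_in_window(sample, b"\\documentclass", 400))   # False: marker is at offset 580
print(marker_in_window(sample, b"\\documentclass", 4096))  # True
```

With a 400-byte window the marker is never seen, so the file falls through to a generic text type such as the "ASCII English text" results in the report above.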

Comment 2 Jan Kaluža 2011-02-08 09:19:45 UTC
Created attachment 477581
proposed patch

Check 4096 bytes from a TeX file instead of 400 bytes. Increase the strength of the TeX patterns so that they beat the Python patterns, because it's more usual to have Python in LaTeX than LaTeX in Python.
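
The effect of the strength change can be sketched in Python as follows. This is a toy model that assumes candidate patterns are tried from strongest to weakest and the first match wins; the Rule type, the matcher functions, and the strength numbers are hypothetical and do not come from the patch or from file's real magic database.

```python
from typing import Callable, List, NamedTuple


class Rule(NamedTuple):
    label: str
    strength: int                      # hypothetical values, for illustration only
    matches: Callable[[bytes], bool]


def classify(data: bytes, rules: List[Rule]) -> str:
    """Try rules from strongest to weakest and return the first matching label."""
    for rule in sorted(rules, key=lambda r: r.strength, reverse=True):
        if rule.matches(data):
            return rule.label
    return "ASCII text"


def looks_like_python(data: bytes) -> bool:
    return b"import " in data


def looks_like_latex(data: bytes) -> bool:
    return b"\\documentclass" in data[:4096]


# Hypothetical LaTeX document that quotes a line of Python code.
sample = (
    b"\\documentclass{howto}\n"
    b"\\begin{verbatim}\n"
    b"import xml.dom\n"
    b"\\end{verbatim}\n"
)

before = [
    Rule("Python script text executable", 60, looks_like_python),
    Rule("LaTeX 2e document text", 50, looks_like_latex),
]
# Same rules, with the LaTeX rule made stronger than the Python rule.
after = [
    Rule("Python script text executable", 60, looks_like_python),
    Rule("LaTeX 2e document text", 65, looks_like_latex),
]

print(classify(sample, before))  # Python script text executable
print(classify(sample, after))   # LaTeX 2e document text
```

Both rules match the sample in either configuration; only the order in which they are tried changes, which is why raising the strength is enough to change the reported type.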

Comment 7 Petr Kovar 2011-06-24 11:42:26 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Prior to this update, the file patterns for LaTeX checked only the first 400 bytes of a file to determine its type. As a result, files that began with a long run of comments could be recognized incorrectly. Furthermore, the patterns that match Python scripts were tried before the LaTeX patterns, so LaTeX files that included Python source code could be misidentified as Python scripts. With this update, the number of bytes checked for a LaTeX file has been increased to 4096, and the LaTeX patterns are tried before the Python patterns.

Comment 10 errata-xmlrpc 2011-07-13 08:48:58 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0934.html