Bug 675691 - recognition of TeX and LaTeX files needs an improvement
Summary: recognition of TeX and LaTeX files needs an improvement
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: file
Version: 6.1
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Assignee: Jan Kaluža
QA Contact: BaseOS QE Security Team
URL:
Whiteboard:
Depends On:
Blocks: 676543 826898
 
Reported: 2011-02-07 11:11 UTC by Milos Malik
Modified: 2012-05-31 08:35 UTC (History)

Fixed In Version: file-5.04-8.el6
Doc Type: Bug Fix
Doc Text:
Prior to this update, the file patterns for LaTeX checked only the first 400 bytes of a file to determine its type. As a result, files that began with a long block of comments were not recognized correctly. Furthermore, the patterns that match Python scripts were tried before the LaTeX patterns, so LaTeX files that included Python source code could be misidentified as Python scripts. With this update, the number of bytes checked for a LaTeX file has been increased to 4096, and the LaTeX patterns are tried before the Python patterns.
Clone Of:
Environment:
Last Closed: 2011-07-13 08:48:58 UTC
Target Upstream Version:


Attachments
proposed patch (2.41 KB, patch)
2011-02-08 09:19 UTC, Jan Kaluža


Links
System: Red Hat Product Errata
ID: RHBA-2011:0934
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: file bug fix update
Last Updated: 2011-09-07 13:38:51 UTC

Description Milos Malik 2011-02-07 11:11:10 UTC
Description of problem:


Version-Release number of selected component (if applicable):
file-5.04-6.el6

How reproducible:
always

Steps to Reproduce:
# rpm -ql PyXML | grep -F .tex | xargs file
/usr/share/doc/PyXML-0.8.4/xml-howto.tex: Python script text executable
/usr/share/doc/PyXML-0.8.4/xml-ref.tex:   Python script text executable
# rpm -ql emacs-common | grep -F .tex | xargs file
/usr/share/emacs/23.1/etc/refcards/calccard.tex:      ASCII English text
/usr/share/emacs/23.1/etc/refcards/cs-dired-ref.tex:  ISO-8859 English text
/usr/share/emacs/23.1/etc/refcards/cs-refcard.tex:    TeX document text
/usr/share/emacs/23.1/etc/refcards/cs-survival.tex:   LaTeX document text
/usr/share/emacs/23.1/etc/refcards/de-refcard.tex:    TeX document text
/usr/share/emacs/23.1/etc/refcards/dired-ref.tex:     ASCII English text
/usr/share/emacs/23.1/etc/refcards/fr-dired-ref.tex:  ISO-8859 English text
/usr/share/emacs/23.1/etc/refcards/fr-refcard.tex:    TeX document text
/usr/share/emacs/23.1/etc/refcards/fr-survival.tex:   LaTeX document text
/usr/share/emacs/23.1/etc/refcards/gnus-refcard.tex:  LaTeX 2e document text
/usr/share/emacs/23.1/etc/refcards/orgcard.tex:       ASCII English text
/usr/share/emacs/23.1/etc/refcards/pl-refcard.tex:    ASCII English text
/usr/share/emacs/23.1/etc/refcards/pt-br-refcard.tex: ASCII English text
/usr/share/emacs/23.1/etc/refcards/refcard.tex:       ASCII English text
/usr/share/emacs/23.1/etc/refcards/ru-refcard.tex:    LaTeX 2e document text
/usr/share/emacs/23.1/etc/refcards/sk-dired-ref.tex:  ISO-8859 English text
/usr/share/emacs/23.1/etc/refcards/sk-refcard.tex:    TeX document text
/usr/share/emacs/23.1/etc/refcards/sk-survival.tex:   LaTeX document text
/usr/share/emacs/23.1/etc/refcards/survival.tex:      LaTeX document text
/usr/share/emacs/23.1/etc/refcards/vipcard.tex:       TeX document text
/usr/share/emacs/23.1/etc/refcards/viperCard.tex:     TeX document text
# 

Actual results:


Expected results:

Comment 1 Jan Kaluža 2011-02-07 14:05:15 UTC
There are 2 problems:
1. File patterns for LaTeX check only the first 400 bytes of a file to determine its type, and some of those files have a long comment at the beginning. I will increase that limit.

2. Patterns which match Python scripts are tried before the LaTeX patterns, which is bad when there is Python code in a LaTeX file.

I'll send a patch to the upstream mailing list and then attach it here.
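
To illustrate the first problem, here is a minimal Python sketch of a search limited to the first N bytes of a file. It is not how file(1) is implemented; the marker list and the function name `looks_like_latex` are assumptions made up for this example.

```python
# Illustrative only: a simplified model of a bounded pattern search.
# LATEX_MARKERS and looks_like_latex are hypothetical names for this sketch.
LATEX_MARKERS = (b"\\documentclass", b"\\documentstyle", b"\\begin{document}")


def looks_like_latex(data: bytes, window: int) -> bool:
    """Return True if any marker lies entirely within the first `window` bytes."""
    head = data[:window]
    return any(marker in head for marker in LATEX_MARKERS)


# A file whose comment header is longer than 400 bytes (30 lines of 22 bytes).
header = b"% license header line\n" * 30
sample = header + b"\\documentclass{article}\n\\begin{document}\nHello\n\\end{document}\n"

found_with_400 = looks_like_latex(sample, 400)
found_with_4096 = looks_like_latex(sample, 4096)
print(len(header), found_with_400, found_with_4096)
```

With a 660-byte comment header, the marker falls outside a 400-byte window but inside a 4096-byte one, which is the situation described in point 1.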

Comment 2 Jan Kaluža 2011-02-08 09:19:45 UTC
Created attachment 477581
proposed patch

Check 4096 bytes of a TeX file instead of 400 bytes. Increase the strength of the TeX patterns to beat Python, because it is more usual to have Python in LaTeX than LaTeX in Python.
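
The effect of raising a pattern's strength can be sketched as follows. This is a simplified Python model, not the actual patch or file(1) code: the `Rule` type, the `classify` function, the needles, and all strength numbers are hypothetical. Only the two result labels are taken from the output shown in this report.

```python
from typing import NamedTuple, Optional, Sequence


class Rule(NamedTuple):
    label: str
    needle: bytes
    strength: int  # hypothetical numeric priority; the highest match wins


def classify(data: bytes, rules: Sequence[Rule], window: int = 4096) -> Optional[str]:
    """Return the label of the strongest rule whose needle is in the window."""
    head = data[:window]
    matches = [rule for rule in rules if rule.needle in head]
    if not matches:
        return None
    return max(matches, key=lambda rule: rule.strength).label


# A LaTeX document that embeds a Python snippet.
doc = b"\\documentclass{article}\n\\begin{verbatim}\nimport os\n\\end{verbatim}\n"

python_rule = Rule("Python script text executable", b"import ", 50)
latex_weak = Rule("LaTeX 2e document text", b"\\documentclass", 40)
latex_strong = latex_weak._replace(strength=latex_weak.strength + 15)

before = classify(doc, [python_rule, latex_weak])
after = classify(doc, [python_rule, latex_strong])
print(before)
print(after)
```

In this model both rules match the document, so the outcome depends only on which rule carries the higher strength.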

Comment 7 Petr Kovar 2011-06-24 11:42:26 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Prior to this update, the file patterns for LaTeX checked only the first 400 bytes of a file to determine its type. As a result, files that began with a long block of comments were not recognized correctly. Furthermore, the patterns that match Python scripts were tried before the LaTeX patterns, so LaTeX files that included Python source code could be misidentified as Python scripts. With this update, the number of bytes checked for a LaTeX file has been increased to 4096, and the LaTeX patterns are tried before the Python patterns.

Comment 10 errata-xmlrpc 2011-07-13 08:48:58 UTC
An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0934.html

