Bug 733229

Summary: specific python script looks like PASCAL program
Product: Red Hat Enterprise Linux 6 Reporter: Petr Sklenar <psklenar>
Component: file Assignee: Jan Kaluža <jkaluza>
Status: CLOSED ERRATA QA Contact: BaseOS QE Security Team <qe-baseos-security>
Severity: high Docs Contact:
Priority: high    
Version: 6.1 CC: azelinka, dapospis, ksrot, mfojtik, ovasik
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: file-5.04-12.el6 Doc Type: Bug Fix
Doc Text:
Previously, "magic" patterns for Python were insufficient. The file utility was therefore unable to detect a Python script according to the Python function definition. With this update, detection of Python is improved, and Python scripts are properly recognized.
Story Points: ---
Clone Of:
: 826900 830808 Environment:
Last Closed: 2012-03-15 08:23:37 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 826900    
Attachments:
Description                              Flags
python script which looks like Pascal    none
proposed patch                           none
sample files                             none

Description Petr Sklenar 2011-08-25 08:18:09 UTC
Created attachment 519784
python script which looks like Pascal

Description of problem:
A specific Python script is identified as a Pascal program.

Version-Release number of selected component (if applicable):
file-5.04-9.el6.s390x

How reproducible:
deterministic

Steps to Reproduce:
RHEL5:
# file gtk_label_autowrap.py 
gtk_label_autowrap.py: ASCII English text
# rpm -q file
file-4.17-15.el5_3.1.x86_64

RHEL6:
# file gtk_label_autowrap.py
gtk_label_autowrap.py: ASCII Pascal program text
# rpm -q file
file-5.04-9.el6.s390x


Actual results:
ASCII Pascal program

Expected results:
Python script text executable


Additional info:
# rpm -qf /usr/share/system-config-printer/gtk_label_autowrap.py
system-config-printer-1.1.16-22.el6.s390x

Comment 1 RHEL Program Management 2011-08-25 08:28:03 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 2 Jan Kaluža 2011-11-01 13:40:27 UTC
Created attachment 531134
proposed patch

Comment 6 Jan Kaluža 2012-02-09 07:53:33 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: File magic patterns for Python were insufficient.

Consequence: File was not able to detect Python script according to the Python function definition.

Fix: New magic pattern has been added to detect Python script according to the Python function definition.

Result: Python detection improved.
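
To illustrate the kind of check this note describes, here is a minimal Python sketch that treats text as Python when any line looks like a function definition. The regular expression and the name looks_like_python are assumptions made for illustration; the actual magic pattern from the proposed patch is not quoted in this report and may differ.

```python
import re

# Hypothetical pattern: a line of the form "def name(args):".
# This is NOT the magic entry from the patch, only an approximation of the idea.
DEF_LINE = re.compile(
    r"^\s*def\s+[A-Za-z_][A-Za-z0-9_]*\s*\(.*\)\s*:",
    re.MULTILINE,
)


def looks_like_python(text: str) -> bool:
    """Return True if any line resembles a Python function definition."""
    return DEF_LINE.search(text) is not None
```

A check of this shape also matches documentation that merely quotes Python code, because it looks at a single line rather than the whole file.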

Comment 8 Michal Fojtik 2012-02-13 12:54:27 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,7 +1 @@
-Cause: File magic patterns for Python were insufficient.
+Previously, "magic" patterns for Python were insufficient. The file utility was therefore unable to detect a Python script according to the Python function definition. With this update, detection of Python is improved, and Python scripts are properly recognized.
-
-Consequence: File was not able to detect Python script according to the Python function definition.
-
-Fix: New magic pattern has been added to detect Python script according to the Python function definition.
-
-Result: Python detection improved.

Comment 10 Karel Srot 2012-03-12 15:12:16 UTC
Created attachment 569432
sample files

I am afraid that the fix has introduced some regressions in text file detection.
Although the old file package did not identify all of these files properly, the new version does a worse job with them.

[ksrot@dhcp-30-102 samples]$ rpm -q file
file-5.04-13.el6.x86_64
[ksrot@dhcp-30-102 samples]$ file *
API_CHANGES.txt:   Python script text executable
artist-tmpl.html:  Python script text executable
capi.txt:          Python script text executable
dstat-paper.txt:   Python script text executable
extend.txt:        Python script text executable
FAQ.txt:           Python script text executable
index.txt:         Python script text executable
INTERACTIVE:       Python script text executable
lxml-ep2008.txt:   Python script text executable
lxmlhtml.txt:      Python script text executable
PKG-INFO:          Python script text executable
pkg_resources.txt: Python script text executable
programmers-guide: Python script text executable
README.txt:        Python script text executable
syntax.html:       Python script text executable
tutorial.txt:      Python script text executable


[ksrot@dhcp-30-102 samples]$ rpm -q file
file-5.04-11.el6.x86_64
[ksrot@dhcp-30-102 samples]$ file *
API_CHANGES.txt:   ASCII English text
artist-tmpl.html:  ASCII Java program text
capi.txt:          FORTRAN program
dstat-paper.txt:   ASCII English text
extend.txt:        ASCII English text
FAQ.txt:           ISO-8859 English text
index.txt:         ASCII English text
INTERACTIVE:       ASCII English text
lxml-ep2008.txt:   UTF-8 Unicode English text
lxmlhtml.txt:      ASCII English text
PKG-INFO:          ASCII C++ program text
pkg_resources.txt: ASCII English text
programmers-guide: ASCII English text
README.txt:        ASCII English text
syntax.html:       ASCII English text, with very long lines
tutorial.txt:      ASCII English text
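
The two listings above can be compared mechanically. The following is a minimal sketch, assuming each output line has the form "name: type" and that file names contain no colon; the helper names parse_file_output and changed_types are made up for this example, and the sample data is a two-line excerpt of the listings.

```python
def parse_file_output(output: str) -> dict:
    """Map each file name to the type reported on its 'name: type' line."""
    results = {}
    for line in output.splitlines():
        name, sep, kind = line.partition(":")
        if sep:  # skip lines without a colon (prompts, blank lines)
            results[name.strip()] = kind.strip()
    return results


def changed_types(old: str, new: str) -> dict:
    """Return {name: (old_type, new_type)} for files whose reported type differs."""
    before, after = parse_file_output(old), parse_file_output(new)
    return {
        name: (before[name], after[name])
        for name in before
        if name in after and before[name] != after[name]
    }


# Excerpt of the listings: file-5.04-11 (old) versus file-5.04-13 (new).
old = """API_CHANGES.txt:   ASCII English text
capi.txt:          FORTRAN program"""
new = """API_CHANGES.txt:   Python script text executable
capi.txt:          Python script text executable"""

for name, (was, now) in changed_types(old, new).items():
    print(f"{name}: {was} -> {now}")
```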

Comment 11 Karel Srot 2012-03-13 10:16:45 UTC
OK, to give the previous comment some context:
On my filesystem the new file version fixed recognition for ~150 Python files and introduced a "regression" for 27 files. Most of those text files actually contain pieces of Python code, so the new recognition is not that odd. It might be more obvious from the full paths:

/usr/lib64/python2.6/idlelib/extend.txt:
/usr/lib64/rhythmbox/plugins/context/tmpl/artist-tmpl.html:
/usr/lib/mailman/scripts/driver:
/usr/lib/python2.6/site-packages/DecoratorTools-1.7-py2.6.egg-info/PKG-INFO:
/usr/share/doc/dbus-python-0.83.0/API_CHANGES.txt:
/usr/share/doc/dbus-python-0.83.0/tutorial.txt:
/usr/share/doc/dstat-0.7.0/dstat-paper.txt:
/usr/share/doc/numpy-1.3.0/docs-f2py/FAQ.txt:
/usr/share/doc/pykickstart-1.74.6/programmers-guide:
/usr/share/doc/python-lxml-2.2.3/doc/capi.txt:
/usr/share/doc/python-lxml-2.2.3/doc/FAQ.txt:
/usr/share/doc/python-lxml-2.2.3/doc/lxmlhtml.txt:
/usr/share/doc/python-lxml-2.2.3/doc/performance.txt:
/usr/share/doc/python-lxml-2.2.3/doc/s5/lxml-ep2008.txt:
/usr/share/doc/python-mako-0.3.4/doc/build/content/filtering.txt:
/usr/share/doc/python-mako-0.3.4/doc/build/content/namespaces.txt:
/usr/share/doc/python-mako-0.3.4/doc/build/content/syntax.txt:
/usr/share/doc/python-mako-0.3.4/doc/build/output/syntax.html:
/usr/share/doc/python-mako-0.3.4/doc/build/templates/formatting.html:
/usr/share/doc/python-matplotlib-0.99.1.2/INTERACTIVE:
/usr/share/doc/python-nose-0.10.4/README.txt:
/usr/share/doc/python-setuptools-0.6.10/docs/build/html/_sources/pkg_resources.txt:
/usr/share/doc/python-setuptools-0.6.10/docs/build/html/_sources/setuptools.txt:
/usr/share/doc/python-setuptools-0.6.10/docs/pkg_resources.txt:
/usr/share/doc/python-setuptools-0.6.10/docs/setuptools.txt:
/usr/share/doc/python-simplejson-2.0.9/docs/_sources/index.txt:
/usr/src/debug/xulrunner-1.9.2.17/mozilla-1.9.2/js/src/prmjtime.cpp:

I believe it makes sense to proceed with the fix.

Comment 12 Jan Kaluža 2012-03-13 10:35:56 UTC
The file tool can't be 100% accurate, especially when it comes to detecting source code. I admit this patch brings some regressions, but it still improves detection of Python scripts a lot. Given the way file works, it's not possible to do the kind of in-depth detection that would be beneficial in cases like this one.
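
One way to picture what an in-depth check could look like is to ask a Python parser whether the whole text is valid source, instead of matching a pattern on one line. The sketch below does that with the standard ast module. It is an illustration only, not something the file utility does, and the helper name parses_as_python is made up. It checks against the syntax of the interpreter running it, so Python 2 only code such as print statements would be rejected.

```python
import ast


def parses_as_python(text: str) -> bool:
    """Return True if the whole text is syntactically valid Python."""
    try:
        ast.parse(text)
    except (SyntaxError, ValueError):
        # SyntaxError: not valid Python; ValueError: e.g. embedded null bytes
        # on some interpreter versions.
        return False
    return True
```

A documentation file that quotes a function definition inside ordinary prose fails this check, while a real script passes. The trade-off is that very short texts (a single word, or an empty file) also parse as valid Python, so this could not replace pattern matching on its own.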

Comment 13 errata-xmlrpc 2012-03-15 08:23:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0391.html