Bug 1051598 - Perl scripts misidentified as perl modules
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: file
Version: 20
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Jan Kaluža
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 1026760 1051607 1057718
Reported: 2014-01-10 16:39 UTC by Peter Oliver
Modified: 2014-01-24 16:34 UTC (History)

Fixed In Version: file-5.14-14.fc20
Clone Of:
Environment:
Last Closed: 2014-01-17 05:45:11 UTC
Type: Bug
Embargoed:


Attachments
Test case (75 bytes, application/x-perl)
2014-01-10 16:39 UTC, Peter Oliver

Description Peter Oliver 2014-01-10 16:39:01 UTC
Created attachment 848299
Test case

See the attached perl script.  On Fedora 19, file correctly identifies this as a script, whereas on Fedora 20 it is incorrectly identified as a module.

f19> file example.pl 
example.pl: Perl script, ASCII text executable

f20> file example.pl 
example.pl: Perl5 module source, ASCII text

Undoing the patch http://pkgs.fedoraproject.org/cgit/file.git/tree/file-5.14-perl.patch?h=f20 fixes this.

Comment 1 Jan Kaluža 2014-01-13 09:29:47 UTC
This is a questionable problem. Upstream File currently distinguishes between a Perl module (contains the "package" keyword in proper form) and a Perl script (contains a Perl shebang).

The mentioned patch had to be introduced to fix <http://bugs.gw.com/view.php?id=317>, otherwise there would be a problem with dependency generators in RPM.

The question is whether this script is really a Perl script or a Perl module. Currently File checks for "package" and, if it finds that, says the file is a Perl module. If there's no "package" keyword, it checks for a Perl shebang and says the file is a "Perl script". I can switch the order of these two checks, but I'm not sure it would make more sense...
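
The check order described above can be modelled with a short Python sketch. This is only an illustration, not File's actual implementation: the two regular expressions, the `classify` function, and the sample text are all hypothetical simplifications of the real magic rules.

```python
import re

# Hypothetical, simplified stand-ins for the real magic rules.
PACKAGE_RE = re.compile(r"^\s*package\s+[A-Za-z_][\w:]*\s*;", re.MULTILINE)
SHEBANG_RE = re.compile(r"^#!.*perl")


def classify(text, package_first=True):
    """Return the label of the first check that matches, in the chosen order."""
    checks = [
        ("Perl5 module source", PACKAGE_RE.search(text) is not None),
        ("Perl script", SHEBANG_RE.match(text) is not None),
    ]
    if not package_first:
        checks.reverse()  # shebang check first, "package" check second
    for label, matched in checks:
        if matched:
            return label
    return "unknown"


# A made-up file that has both a Perl shebang and a "package" statement.
example = '#!/usr/bin/perl\npackage Foo;\nprint "hi\\n";\n'

print(classify(example))                       # "package" check wins
print(classify(example, package_first=False))  # shebang check wins
```

With both markers present, the result depends purely on which check runs first, which is why switching the order changes the classification of such files.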

Comment 2 Panu Matilainen 2014-01-13 10:38:14 UTC
The problem with these kind of things is that technically a perl/python/whatever thingie can be *both* a module and an executable script (a common situation is running unit tests of a module when executed).

Whether the changed order makes more sense universally I dunno, but it'd be good from rpm POV. Testing for file executability might also help decide in such situations.

Comment 3 Peter Oliver 2014-01-13 12:14:59 UTC
(In reply to Jan Kaluža from comment #1)
> The mentioned patch had to be introduced to fix
> <http://bugs.gw.com/view.php?id=317>, otherwise there would be a problem with
> dependency generators in RPM.

I think you mean http://bugs.gw.com/view.php?id=166.

Comment 4 Peter Oliver 2014-01-13 12:17:36 UTC
(In reply to Panu Matilainen from comment #2)
> The problem with these kind of things is that technically a
> perl/python/whatever thingie can be *both* a module and an executable script
> (a common situation is running unit tests of a module when executed).

I don't think testing in that way is something I've ever seen in Perl (although I'm sure it would be possible, so someone's probably doing it somewhere).

> Whether the changed order makes more sense universally I dunno, but it'd be
> good from rpm POV. Testing for file executability might also help decide in
> such situations.

I'm not sure if this helps, but the Perl manual defines a module thus:
       A module is just a set of related functions in a library file,
       i.e., a Perl package with the same name as the file.
(http://perldoc.perl.org/perlmod.html#Perl-Modules)

Comment 5 Jan Kaluža 2014-01-15 08:48:16 UTC
(In reply to Peter Oliver from comment #3)
> (In reply to Jan Kaluža from comment #1)
> > The mentioned patch had to be introduced to fix
> > <http://bugs.gw.com/view.php?id=317>, otherwise there would be a problem with
> > dependency generators in RPM.
> 
> I think you mean http://bugs.gw.com/view.php?id=166.

No, I really mean 317. The Awk/HTML patterns in File are stronger than the "Perl module" pattern. That means that if a Perl file contains "BEGIN {" or HTML keywords, File will say it's an Awk/HTML script without even considering that it could be a Perl script. I have increased the strength of the Perl module pattern to overcome that.

Note that this change probably breaks detection of some HTML files which include Perl, because File will detect Perl according to "package" without detecting HTML tags. That's just how File works; because of its design, you can't detect everything 100%.
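
The effect of pattern strength can be illustrated with a small Python sketch. It is a toy model under stated assumptions: the strength numbers, the boost of 20, the substring tests, and the names `make_rules` and `best_match` are invented for illustration and do not come from File's magic database.

```python
def make_rules(module_boost=0, script_boost=0):
    """Build toy rules: label -> (strength, predicate). All numbers are made up."""
    return {
        "awk script text": (50, lambda t: "BEGIN {" in t),
        "Perl5 module source": (40 + module_boost, lambda t: "package " in t),
        "Perl script": (
            45 + script_boost,
            lambda t: t.startswith("#!") and "perl" in t.splitlines()[0],
        ),
    }


def best_match(text, rules):
    """Report only the strongest rule that matches, ignoring weaker matches."""
    hits = [(strength, label) for label, (strength, matches) in rules.items() if matches(text)]
    return max(hits)[1] if hits else "data"


# Made-up inputs: a module with a BEGIN block, and a script that declares a package.
module_with_begin = "package Foo;\nBEGIN { $x = 1 }\n"
script_with_package = "#!/usr/bin/perl\npackage Foo;\n"

print(best_match(module_with_begin, make_rules()))                   # awk rule is strongest
print(best_match(module_with_begin, make_rules(module_boost=20)))    # module rule now wins
print(best_match(script_with_package, make_rules(module_boost=20)))  # module rule also beats script
```

In this model, raising only the module rule fixes the Awk misdetection but also makes the module rule outrank the script rule, which matches the behaviour reported in this bug.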

> I'm not sure if this helps, but the Perl manual defines a module thus:
>       A module is just a set of related functions in a library file,
>       i.e., a Perl package with the same name as the file.
> (http://perldoc.perl.org/perlmod.html#Perl-Modules)

"a Perl package with the same name as the file"

There's no way for File to check the filename without a rewrite, so it only checks whether a valid "package" is there. I don't think the current behaviour is wrong.

Comment 6 Jan Kaluža 2014-01-15 08:55:37 UTC
(In reply to Panu Matilainen from comment #2)
> The problem with these kind of things is that technically a
> perl/python/whatever thingie can be *both* a module and an executable script
> (a common situation is running unit tests of a module when executed).
> 
> Whether the changed order makes more sense universally I dunno, but it'd be
> good from rpm POV. Testing for file executability might also help decide in
> such situations.

It would be the same from the RPM POV. File would return "Perl module" in some cases, and if you are not handling that, you will miss the deps.

Comment 7 Panu Matilainen 2014-01-15 09:39:12 UTC
There are actually two separate issues here for rpm:
1) the file not getting classified as script of any kind now, despite shebang and executable bits
2) the file getting classified as a perl module instead of perl script, causing different rules to be applied

Part 1) is a consistency issue in libmagic: if a file has a shebang, it should mention it somehow, I think. In general, things detected as scripts are reported as "<lang> script, <encoding> text executable", which is kinda saying the same thing twice. Could something similar be done here, e.g. include both classifications: "Perl5 module source, Perl script"? Or something like that?

Part 2) probably needs adjusting on the rpm side anyhow; it can just match anything resembling perl and filter further from that, as provides should only be created for actual modules.
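
The rpm-side idea in part 2) might look roughly like the following Python sketch. It is purely hypothetical and not rpm's actual dependency generator: the function `dependency_actions`, the loose "perl" match, the `.pm` suffix test used as the "actual module" filter, and the sample paths are all assumptions made for illustration.

```python
import re

# Loose match: anything in the file(1) description that resembles Perl.
PERL_RE = re.compile(r"perl", re.IGNORECASE)


def dependency_actions(magic_description, path):
    """Decide which dependency kinds to generate for one packaged file."""
    if not PERL_RE.search(magic_description):
        return set()
    actions = {"requires"}  # every Perl file can require other modules
    # Assumption: treat only *.pm files as actual modules that provide something.
    if path.endswith(".pm"):
        actions.add("provides")
    return actions


# The same executable gets requires under either classification.
print(dependency_actions("Perl script, ASCII text executable", "/usr/bin/git-svn"))
print(dependency_actions("Perl5 module source, ASCII text", "/usr/bin/git-svn"))
print(dependency_actions("Perl5 module source, ASCII text", "/usr/share/perl5/Git/SVN.pm"))
```

The point of the sketch is that requires no longer depend on whether File says "script" or "module", while provides are decided by a separate filter.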

Comment 8 Peter Oliver 2014-01-15 13:02:08 UTC
(In reply to Jan Kaluža from comment #5)
> No, I really mean 317. The Awk/HTML patterns in File are stronger than the
> "Perl module" pattern. That means that if a Perl file contains "BEGIN {" or
> HTML keywords, File will say it's an Awk/HTML script without even
> considering that it could be a Perl script. I have increased the strength
> of the Perl module pattern to overcome that.

Ah, my mistake.  I was confused by the patch being older than the bug.

So, at the risk of teaching my grandmother to suck eggs, would it make sense to increase the strength of the Perl script rules by the same amount, so that the Perl script rule is stronger than the Perl module rule, as it was before?

Comment 9 Jan Kaluža 2014-01-15 13:32:53 UTC
I will do that, but it won't fix problem 2) mentioned by Panu:

> 2) the file getting classified as a perl module instead of perl script, causing different rules to be applied

Comment 10 Fedora Update System 2014-01-15 13:43:02 UTC
file-5.14-14.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/file-5.14-14.fc20

Comment 11 Todd Zullinger 2014-01-15 20:38:32 UTC
I ran a mock build of git with file-5.14-14.fc20 and git-svn picked up the correct dependencies.  I'm not sure that's enough testing to give the update positive karma in bodhi (since it's in the critical path), but I did want to say that it looks to have fixed the issue for me.  Thanks!

<mock-chroot>[root@f20-64 /]# rpm -q file
file-5.14-14.fc20.x86_64

<mock-chroot>[root@f20-64 /]# file /builddir/build/BUILD/git-1.8.5.3/git-svn
/builddir/build/BUILD/git-1.8.5.3/git-svn: Perl script, ASCII text executable

<mock-chroot>[root@f20-64 /]# rpm -qp --requires /builddir/build/RPMS/git-svn-1.8.5.3-1.fc20.x86_64.rpm 
/usr/bin/perl
git = 1.8.5.3-1.fc20
libc.so.6()(64bit)
libc.so.6(GLIBC_2.14)(64bit)
libc.so.6(GLIBC_2.2.5)(64bit)
libc.so.6(GLIBC_2.3)(64bit)
libc.so.6(GLIBC_2.3.4)(64bit)
libc.so.6(GLIBC_2.4)(64bit)
libc.so.6(GLIBC_2.7)(64bit)
libpcre.so.1()(64bit)
libpthread.so.0()(64bit)
libpthread.so.0(GLIBC_2.2.5)(64bit)
libz.so.1()(64bit)
libz.so.1(ZLIB_1.2.0)(64bit)
perl >= 0:5.008
perl(Carp)
perl(Digest::MD5)
perl(File::Basename)
perl(File::Find)
perl(File::Path)
perl(File::Spec)
perl(Getopt::Long)
perl(Git)
perl(Git::SVN)
perl(Git::SVN::Editor)
perl(Git::SVN::Fetcher)
perl(Git::SVN::Log)
perl(Git::SVN::Migration)
perl(Git::SVN::Prompt)
perl(Git::SVN::Ra)
perl(Git::SVN::Utils)
perl(IO::File)
perl(IPC::Open3)
perl(Memoize)
perl(Term::ReadKey)
perl(Term::ReadLine)
perl(lib)
perl(strict)
perl(vars)
perl(warnings)
rpmlib(CompressedFileNames) <= 3.0.4-1
rpmlib(FileDigests) <= 4.6.0-1
rpmlib(PayloadFilesHavePrefix) <= 4.0-1
rtld(GNU_HASH)
subversion
rpmlib(PayloadIsXz) <= 5.2-1

Comment 12 Fedora Update System 2014-01-16 07:04:20 UTC
Package file-5.14-14.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing file-5.14-14.fc20'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-0910/file-5.14-14.fc20
then log in and leave karma (feedback).

Comment 13 Fedora Update System 2014-01-17 05:45:11 UTC
file-5.14-14.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

