Bug 664920 - Review Request: perl-Web-Scraper - Web Scraping Toolkit using HTML and CSS Selectors or XPath expressions
Summary: Review Request: perl-Web-Scraper - Web Scraping Toolkit using HTML and CSS Se...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: Package Review
Version: rawhide
Hardware: All
OS: Linux
medium
medium
Target Milestone: ---
Assignee: Petr Šabata
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On: 664912 809633
Blocks: rt4-dependencies-tracker
TreeView+ depends on / blocked
 
Reported: 2010-12-22 05:49 UTC by Ralf Corsepius
Modified: 2012-12-21 10:37 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-12-21 10:37:07 UTC
Type: ---
Embargoed:
psabata: fedora-review+
gwync: fedora-cvs+


Attachments (Terms of Use)
Specfile update (2.35 KB, patch)
2012-07-10 10:16 UTC, Xavier Bachelot
no flags Details | Diff
Full updated specfile (2.33 KB, text/x-rpm-spec)
2012-07-10 10:17 UTC, Xavier Bachelot
no flags Details

Description Ralf Corsepius 2010-12-22 05:49:02 UTC
Spec URL: http://corsepiu.fedorapeople.org/packages/perl-Web-Scraper.spec
SRPM URL: http://corsepiu.fedorapeople.org/packages/perl-Web-Scraper-0.32-1.fc15.src.rpm

Description: 
Web::Scraper is a web scraper toolkit, inspired by Ruby's equivalent
Scrapi. It provides a DSL-ish interface for traversing HTML documents and
returning a neatly arranged Perl data strcuture.

Comment 1 Xavier Bachelot 2012-04-16 10:17:44 UTC
The build failed for me (tested on both F16 i386 and F18 i386) :
#   Failed test ''
#   at t/10_invalid_xpath.t line 20.
#                   ''
#     doesn't match '(?^:look like)'
# Looks like you failed 1 test of 4.
t/10_invalid_xpath.t .... 
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/4 subtests

Comment 2 Xavier Bachelot 2012-04-16 10:32:28 UTC
Builds fine in both cases after updating to latest upstream version (0.36) and adding BR: perl(Test::Base) and BR: perl(Test::Requires) (although the test output is not completely silent in the F16 case).

Comment 3 Petr Šabata 2012-04-16 11:22:58 UTC
Oh, yes, the dependencies are finally in Fedora.  I'll get to this...

Comment 4 Petr Šabata 2012-04-18 13:14:20 UTC
Using the provided package without custom modifications for the review.
I've tried building for current rawhide without success (see Xavier's comments).
The review is thus incomplete.

Package Review
==============

Key:
- = N/A
x = Pass
! = Fail
? = Not evaluated


==== Generic ====
[x]: MUST Package is licensed with an open-source compatible license and meets
     other legal requirements as defined in the legal section of Packaging
     Guidelines.
[!]: MUST Package successfully compiles and builds into binary rpms on at
     least one supported primary architecture.
[-]: MUST %build honors applicable compiler flags or justifies otherwise.
[x]: MUST All build dependencies are listed in BuildRequires, except for any
     that are listed in the exceptions section of Packaging Guidelines.
[!]: MUST Buildroot is not present
     Note: Buildroot is not needed unless packager plans to package for EPEL5
[x]: MUST Package contains no bundled libraries.
[x]: MUST Changelog in prescribed format.
[!]: MUST Package has no %clean section with rm -rf %{buildroot} (or
     $RPM_BUILD_ROOT)
     Note: Clean is needed only if supporting EPEL
[x]: MUST Sources contain only permissible code or content.
[!]: MUST Each %files section contains %defattr if rpm < 4.4
     Note: defattr(....) present in %files section. This is OK if packaging
     for EPEL5. Otherwise not needed
[-]: MUST Macros in Summary, %description expandable at SRPM build time.
[?]: MUST Package requires other packages for directories it uses.
[x]: MUST Package uses nothing in %doc for runtime.
[x]: MUST Package is not known to require ExcludeArch.
[?]: MUST Permissions on files are set properly.
[x]: MUST Package does not contain duplicates in %files.
[x]: MUST Spec file lacks Packager, Vendor, PreReq tags.
[!]: MUST Package does not run rm -rf %{buildroot} (or $RPM_BUILD_ROOT) at the
     beginning of %install.
     Note: rm -rf is only needed if supporting EPEL5
[-]: MUST Large documentation files are in a -doc subpackage, if required.
[-]: MUST If (and only if) the source package includes the text of the
     license(s) in its own file, then that file, containing the text of the
     license(s) for the package is included in %doc.
[x]: MUST License field in the package spec file matches the actual license.
[x]: MUST Package consistently uses macros (instead of hard-coded directory
     names).
[x]: MUST Package is named according to the Package Naming Guidelines.
[?]: MUST Package does not generate any conflict.
[x]: MUST Package obeys FHS, except libexecdir and /usr/target.
[x]: MUST Package must own all directories that it creates.
[x]: MUST Package does not own files or directories owned by other packages.
[?]: MUST Package installs properly.
[?]: MUST Requires correct, justified where necessary.
[x]: MUST Rpmlint output is silent.
[x]: MUST Sources used to build the package match the upstream source, as
     provided in the spec URL.
/home/contyk/src/review/664920/Web-Scraper-0.32.tar.gz :
  MD5SUM this package     : e647ea0406cce4462abf9ef9262d0fd7
  MD5SUM upstream package : e647ea0406cce4462abf9ef9262d0fd7
[x]: MUST Spec file is legible and written in American English.
[x]: MUST Spec file name must match the spec package %{name}, in the format
     %{name}.spec.
[-]: MUST Package contains a SysV-style init script if in need of one.
[x]: MUST File names are valid UTF-8.
[-]: MUST Useful -debuginfo package or justification otherwise.
[x]: SHOULD Reviewer should test that the package builds in mock.
[-]: SHOULD If the source package does not include license text(s) as a
     separate file from upstream, the packager SHOULD query upstream to
     include it.
[x]: SHOULD Dist tag is present.
[x]: SHOULD No file requires outside of /etc, /bin, /sbin, /usr/bin,
     /usr/sbin.
[?]: SHOULD Final provides and requires are sane (rpm -q --provides and rpm -q
     --requires).
[?]: SHOULD Package functions as described.
[!]: SHOULD Latest version is packaged.
[x]: SHOULD Package does not include license text files separate from
     upstream.
[x]: SHOULD SourceX is a working URL.
[-]: SHOULD Description and summary sections in the package spec file contains
     translations for supported Non-English languages, if available.
[-]: SHOULD Package should compile and build into binary rpms on all supported
     architectures.
[!]: SHOULD %check is present and all tests pass.
[x]: SHOULD Packages should try to preserve timestamps of original installed
     files.
[-]: SHOULD Spec use %global instead of %define.

Issues:
FIX: The package doesn't build.
FIX: The following BRs are missing in the package:
  perl(LWP::UserAgent),
  perl(base),
  perl(Carp),
  perl(List::Util),
  perl(Scalar::Util),
  perl(YAML),
  perl(Test::Base),
  perl(Test::More)
TODO: This isn't clearly going to any EPEL (since the package dependencies
  aren't there).  Please, drop the obsolete BuildRoot and defattr junk.
TODO: Since I can't build the package, I haven't tested the final REQ/PROV.
  However, I guess rpmbuild will fail to generate some deps, e.g.
  perl(LWP::UserAgent) from lib/Web/Scraper.pm:31,
  perl(URI) from lib/Web/Scraper.pm:172,
TODO: v0.36 is out; the package should be updated

Generated by fedora-review 0.2.0git
External plugins:

Comment 5 Xavier Bachelot 2012-04-18 13:30:59 UTC
> TODO: This isn't clearly going to any EPEL (since the package dependencies
>   aren't there).  Please, drop the obsolete BuildRoot and defattr junk.

Please don't, I'll likely be trying to build RT4 on EL6 once all the deps are in Fedora.

Comment 6 Xavier Bachelot 2012-07-09 14:54:42 UTC
(In reply to comment #5)
> > TODO: This isn't clearly going to any EPEL (since the package dependencies
> >   aren't there).  Please, drop the obsolete BuildRoot and defattr junk.
> 
> Please don't, I'll likely be trying to build RT4 on EL6 once all the deps
> are in Fedora.

The comment above is just plain wrong, there's no difference between Fedora and EL6 packages guidelines, so the BuildRoot and defattr can actually be dropped. Sorry for the noise.

Ralf, did you had time to look at this package ? If that helps, I can upload a fixed spec.

Comment 7 Ralf Corsepius 2012-07-10 05:40:49 UTC
(In reply to comment #6)

> Ralf, did you had time to look at this package ? If that helps, I can upload
> a fixed spec.

rt4 or this one?

perl-Web-Scrapper has completely dropped of my desk. I simply forgot about this. Sorry, I'll try too resume this review.

I do have a local fedora rt4 package, but haven't had time to look into it for several weeks ;)

Comment 8 Xavier Bachelot 2012-07-10 10:15:58 UTC
(In reply to comment #7)
> (In reply to comment #6)
> 
> > Ralf, did you had time to look at this package ? If that helps, I can upload
> > a fixed spec.
> 
> rt4 or this one?
> 
This one.

> perl-Web-Scrapper has completely dropped of my desk. I simply forgot about
> this. Sorry, I'll try too resume this review.
> 
No worries. Meanwhile, here are both a specfile patch as well as the full updated spec, in the hope it will spare you some time. It addresses all of Petr's comments above.

> I do have a local fedora rt4 package, but haven't had time to look into it
> for several weeks ;)

No worries, lack of time and too much to do is something we unfortunately all know about.

Comment 9 Xavier Bachelot 2012-07-10 10:16:32 UTC
Created attachment 597282 [details]
Specfile update

Comment 10 Xavier Bachelot 2012-07-10 10:17:15 UTC
Created attachment 597284 [details]
Full updated specfile

Comment 11 Petr Šabata 2012-07-12 14:01:23 UTC
Ok, looks good, Xavier.
Finally approving :)

Comment 12 Xavier Bachelot 2012-07-12 15:13:35 UTC
(In reply to comment #11)
> Ok, looks good, Xavier.
> Finally approving :)

I'm not the one requesting the review, Ralf is. I'm just trying to help moving forward, Ralf will have to approve the changes I made to his spec too.

Comment 13 Petr Šabata 2012-07-13 11:36:21 UTC
(In reply to comment #12)
> (In reply to comment #11)
> > Ok, looks good, Xavier.
> > Finally approving :)
> 
> I'm not the one requesting the review, Ralf is. I'm just trying to help
> moving forward, Ralf will have to approve the changes I made to his spec too.

Of course...  I sort of expected Ralf to automatically accept your SPEC improvements but I realize I might have been wrong.

Okay.  Removing the fedora-review+ flag for now.

Ralf, do you accept Xavier's proposed changes to your package?

Comment 15 Petr Šabata 2012-08-07 09:32:11 UTC
(In reply to comment #14)
> http://corsepiu.fedorapeople.org/packages/perl-Web-Scraper-0.36-1.fc18.src.
> rpm

A typo, I guess. I've used the 0.36-2 URL.

Could you comment on the tests patch?  Both of those links seem to be working and both are images...  I suppose this is just a workaround (hence the 'hacks' filename) for a different Web::Scraper issue but some comment would be nice.

Comment 16 Ralf Corsepius 2012-08-07 10:24:50 UTC
(In reply to comment #15)
> (In reply to comment #14)
> > http://corsepiu.fedorapeople.org/packages/perl-Web-Scraper-0.36-1.fc18.src.
> > rpm
> 
> A typo, I guess. I've used the 0.36-2 URL.
Sorry, yes. This was cut'n'pasto from the initial submission.

> Could you comment on the tests patch?  Both of those links seem to be
> working

Not for me.

* 07-live.t fails, because this test expects a query to a remote website to return a string in lower case letters. For me, the string being returned is in "uppercase" letters. 
I have no idea about the cause of this issue, but would guess on something related to locales/i18n or similar, but I don't know where. It could be in Fedora, in my local setup, inside of the test or on the webserver.

* The other 2 tests also fail for me. AFAIU they faik because the *.gif the test tries to access, does not exist on the remote web-site. I am inclined to believe this is due to the remote site having been modified since this test was written.

Comment 17 Petr Šabata 2012-08-08 12:27:07 UTC
(In reply to comment #16)
> (In reply to comment #15)
> > (In reply to comment #14)
> > > http://corsepiu.fedorapeople.org/packages/perl-Web-Scraper-0.36-1.fc18.src.
> > > rpm
> > 
> > A typo, I guess. I've used the 0.36-2 URL.
> Sorry, yes. This was cut'n'pasto from the initial submission.
> 
> > Could you comment on the tests patch?  Both of those links seem to be
> > working
> 
> Not for me.
> 
> * 07-live.t fails, because this test expects a query to a remote website to
> return a string in lower case letters. For me, the string being returned is
> in "uppercase" letters. 
> I have no idea about the cause of this issue, but would guess on something
> related to locales/i18n or similar, but I don't know where. It could be in
> Fedora, in my local setup, inside of the test or on the webserver.

I get that in uppercase too.  I guess the site has changed.

> * The other 2 tests also fail for me. AFAIU they faik because the *.gif the
> test tries to access, does not exist on the remote web-site. I am inclined
> to believe this is due to the remote site having been modified since this
> test was written.

I can fetch http://b.hatena.ne.jp/images/logo1.gif without any issues.  Maybe it was just temporarily unavailable.  I suppose you could drop this.

Anyway, thanks for the info; approving.

Comment 18 Ralf Corsepius 2012-08-09 03:29:41 UTC
(In reply to comment #17)

> > * The other 2 tests also fail for me. AFAIU they faik because the *.gif the
> > test tries to access, does not exist on the remote web-site. I am inclined
> > to believe this is due to the remote site having been modified since this
> > test was written.
> 
> I can fetch http://b.hatena.ne.jp/images/logo1.gif without any issues. 
Yes, but as far as I understand t/18* and t/19*, these tests try to extract ".../logo1.gif" on the URL "http://b.hatena.ne.jp/". 

Inspecting "http://b.hatena.ne.jp/" however shows, this page does not contain any reference ".../logo1.gif".

With my patch reverted, this happens:

t/17_filter_loop.t ............. ok
#   Failed test 'Absolute URI'
#   at t/18_http_response.t line 27.
#          got: 'http://b.hatena.ne.jp/images/title_hotentry_curvebox-header.gif'
#     expected: 'http://b.hatena.ne.jp/images/logo1.gif'
# Looks like you failed 1 test of 2.
t/18_http_response.t ...........   
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/2 subtests 
#   Failed test 'Absolute URI'
#   at t/19_decode_content.t line 28.
#          got: 'http://b.hatena.ne.jp/images/title_hotentry_curvebox-header.gif'
#     expected: 'http://b.hatena.ne.jp/images/logo1.gif'
# Looks like you failed 1 test of 2.

So, ... my guess is, "http://b.hatena.ne.jp/" has changed its contents and broken these tests - I recall these tests once were working.

> Anyway, thanks for the info; approving.
Thanks. I am currently on vacation, so this will take some time until I can take care about this (Next week or the week after next week).

Comment 19 Ralf Corsepius 2012-08-14 13:13:05 UTC
New Package SCM Request
=======================
Package Name: perl-Web-Scraper
Short Description: Web Scraping Toolkit using HTML and CSS Selectors or XPath expressions
Owners: corsepiu
Branches: f18 f17 f16
InitialCC: perl-sig

Comment 20 Gwyn Ciesla 2012-08-14 13:23:14 UTC
Git done (by process-git-requests).

Comment 21 Ralf Corsepius 2012-12-21 10:37:07 UTC
Closing.

The package has landed in Fedora back in September. No idea, why this bug wasn't closed then.


Note You need to log in before you can comment on or make changes to this bug.