Bug 1002704

Summary: Review Request: boilerpipe - Boilerplate Removal and Fulltext Extraction from HTML pages
Product: [Fedora] Fedora
Reporter: gil cattaneo <puntogil>
Component: Package Review
Assignee: Björn 'besser82' Esser <besser82>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: medium
Version: rawhide
CC: besser82, notting, package-review
Target Milestone: ---
Flags: besser82: fedora-review+, gwync: fedora-cvs+
Target Release: ---
Hardware: All
OS: Linux
Fixed In Version: boilerpipe-1.2.0-1.fc20
Doc Type: Bug Fix
Last Closed: 2013-10-29 03:46:02 UTC
Bug Blocks: 1019650    

Description gil cattaneo 2013-08-29 18:31:37 UTC
Spec URL: http://gil.fedorapeople.org/boilerpipe.spec
SRPM URL: http://gil.fedorapeople.org/boilerpipe-1.2.0-1.fc19.src.rpm
Description:
The boilerpipe library provides algorithms to detect and
remove the surplus "clutter" (boilerplate, templates)
around the main textual content of a web page.

The library already provides specific strategies 
for common tasks (for example: news article extraction) and
may also be easily extended for individual problem settings.

Extracting content is very fast (milliseconds), just needs the
input document (no global or site-level information required) and
is usually quite accurate. 
Fedora Account System Username: gil

Comment 1 Björn 'besser82' Esser 2013-10-16 18:14:43 UTC
taken  ;)

Comment 2 Björn 'besser82' Esser 2013-10-19 08:06:12 UTC
Package has one small issue.  No blockers  :)

#####

Package Review
==============

Legend:
[x] = Pass
[!] = Fail
[-] = Not applicable
[?] = Not evaluated


Issues:
=======
- Packages have proper BuildRequires/Requires on jpackage-utils

  ---> please add them as BuildRequires during import.


===== MUST items =====

Generic:
[x]: Package is licensed with an open-source compatible license and meets
     other legal requirements as defined in the legal section of Packaging
     Guidelines.
[x]: License field in the package spec file matches the actual license.
     Note: Checking patched sources after %prep for licenses. Licenses found:
     "Apache (v2.0)", "Unknown or generated". 10 files have unknown license.
     Detailed output of licensecheck in
     /home/besser82/shared/fedora/review/1002704-boilerpipe/licensecheck.txt

     ---> License-tag is fine.  :)

[x]: License file installed when any subpackage combination is installed.
[x]: Package contains no bundled libraries without FPC exception.
[x]: Changelog in prescribed format.
[x]: Sources contain only permissible code or content.
[-]: Package contains desktop file if it is a GUI application.
[-]: Development files must be in a -devel package
[x]: Package uses nothing in %doc for runtime.
[x]: Package consistently uses macros (instead of hard-coded directory names).
[x]: Package is named according to the Package Naming Guidelines.
[x]: Package does not generate any conflict.
[x]: Package obeys FHS, except libexecdir and /usr/target.
[-]: If the package is a rename of another package, proper Obsoletes and
     Provides are present.
[x]: Requires correct, justified where necessary.
[x]: Spec file is legible and written in American English.
[-]: Package contains systemd file(s) if in need.
[x]: Package is not known to require an ExcludeArch tag.
[-]: Large documentation must go in a -doc subpackage. Large could be size
     (~1MB) or number of files.
     Note: Documentation size is 10240 bytes in 2 files.
[!]: Package complies to the Packaging Guidelines

     ---> missing `BuildRequires: jpackage-utils`.
          Can be fixed during import.

[x]: Package successfully compiles and builds into binary rpms on at least one
     supported primary architecture.
[x]: Package installs properly.
[x]: Rpmlint is run on all rpms the build produces.
     Note: No rpmlint messages.
[x]: If (and only if) the source package includes the text of the license(s)
     in its own file, then that file, containing the text of the license(s)
     for the package is included in %doc.
[x]: Package requires other packages for directories it uses.
[x]: Package must own all directories that it creates.
[x]: Package does not own files or directories owned by other packages.
[x]: All build dependencies are listed in BuildRequires, except for any that
     are listed in the exceptions section of Packaging Guidelines.
[x]: Package uses either %{buildroot} or $RPM_BUILD_ROOT
[x]: Package does not run rm -rf %{buildroot} (or $RPM_BUILD_ROOT) at the
     beginning of %install.
[x]: Each %files section contains %defattr if rpm < 4.4
[x]: Macros in Summary, %description expandable at SRPM build time.
[x]: Package does not contain duplicates in %files.
[x]: Permissions on files are set properly.
[x]: Package uses %makeinstall only when make install DESTDIR=... doesn't
     work.
[x]: Package is named using only allowed ASCII characters.
[x]: Package does not use a name that already exists.
[x]: Package is not relocatable.
[x]: Sources used to build the package match the upstream source, as provided
     in the spec URL.
[x]: Spec file name must match the spec package %{name}, in the format
     %{name}.spec.
[x]: File names are valid UTF-8.
[x]: Packages must not store files under /srv, /opt or /usr/local

Java:
[x]: Javadoc documentation files are generated and included in -javadoc
     subpackage
[x]: Javadoc subpackages should not have Requires: jpackage-utils
[x]: Javadocs are placed in %{_javadocdir}/%{name} (no -%{version} symlink)
[x]: Bundled jar/class files should be removed before build

Maven:
[x]: Pom files have correct Maven mapping
     Note: Some add_maven_depmap calls found. Please check if they are correct
     or update to latest guidelines
[x]: If package contains pom.xml files install it (including depmaps) even
     when building with ant
[x]: Old add_to_maven_depmap macro is not being used
[x]: Package DOES NOT have Requires(post) and Requires(postun) on
     jpackage-utils for %update_maven_depmap macro
[x]: Package DOES NOT use %update_maven_depmap in %post/%postun
[x]: Packages use %{_mavenpomdir} instead of %{_datadir}/maven2/poms

===== SHOULD items =====

Generic:
[-]: If the source package does not include license text(s) as a separate file
     from upstream, the packager SHOULD query upstream to include it.
[x]: Final provides and requires are sane (see attachments).
[-]: Fully versioned dependency in subpackages if applicable.
     Note: No Requires: %{name}%{?_isa} = %{version}-%{release} in
     boilerpipe-javadoc

     ---> False positive.  Documentation should have no Requires
          on main-pkg.

[x]: Package functions as described.
[x]: Latest version is packaged.
[x]: Package does not include license text files separate from upstream.
[x]: Patches link to upstream bugs/comments/lists or are otherwise justified.
[-]: Description and summary sections in the package spec file contains
     translations for supported Non-English languages, if available.
[x]: Package should compile and build into binary rpms on all supported
     architectures.
[-]: %check is present and all tests pass.

     ---> no testsuite available.

[x]: Packages should try to preserve timestamps of original installed files.
[x]: Packager, Vendor, PreReq, Copyright tags should not be in spec file
[x]: Sources can be downloaded from URI in Source: tag
[x]: Reviewer should test that the package builds in mock.
[x]: Buildroot is not present
[x]: Package has no %clean section with rm -rf %{buildroot} (or
     $RPM_BUILD_ROOT)
[x]: Dist tag is present (not strictly required in GL).
[x]: No file requires outside of /etc, /bin, /sbin, /usr/bin, /usr/sbin.
[x]: SourceX tarball generation or download is documented.
[x]: SourceX is a working URL.
[x]: Spec use %global instead of %define unless justified.

Java:
[x]: Package uses upstream build method (ant/maven/etc.)
[x]: Packages are noarch unless they use JNI

===== EXTRA items =====

Generic:
[x]: Rpmlint is run on all installed packages.
     Note: No rpmlint messages.
[x]: Large data in /usr/share should live in a noarch subpackage if package is
     arched.
[x]: Spec file according to URL is the same as in SRPM.


Rpmlint
-------
Checking: boilerpipe-1.2.0-1.fc21.noarch.rpm
          boilerpipe-javadoc-1.2.0-1.fc21.noarch.rpm
          boilerpipe-1.2.0-1.fc21.src.rpm
3 packages and 0 specfiles checked; 0 errors, 0 warnings.




Rpmlint (installed packages)
----------------------------
# rpmlint boilerpipe-javadoc boilerpipe
2 packages and 0 specfiles checked; 0 errors, 0 warnings.



Requires
--------
boilerpipe-javadoc (rpmlib, GLIBC filtered):
    jpackage-utils

boilerpipe (rpmlib, GLIBC filtered):
    java
    javapackages-tools
    jpackage-utils
    nekohtml



Provides
--------
boilerpipe-javadoc:
    boilerpipe-javadoc

boilerpipe:
    boilerpipe
    mvn(de.l3s.boilerpipe:boilerpipe)



Source checksums
----------------
http://boilerpipe.googlecode.com/files/boilerpipe-1.2.0-src.tar.gz :
  CHECKSUM(SHA256) this package     : b87ce6e374081a417bf54016fda504b174445c6c9a275c73735c00b85f7080b4
  CHECKSUM(SHA256) upstream package : b87ce6e374081a417bf54016fda504b174445c6c9a275c73735c00b85f7080b4
http://boilerpipe.googlecode.com/svn/repo/de/l3s/boilerpipe/boilerpipe/1.2.0/boilerpipe-1.2.0.pom :
  CHECKSUM(SHA256) this package     : e25b1effb6835042e98e8f0e1f60c5c0cf8ef7339422b47df7b6d92605d546c0
  CHECKSUM(SHA256) upstream package : e25b1effb6835042e98e8f0e1f60c5c0cf8ef7339422b47df7b6d92605d546c0
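
The checksum comparison above can be repeated by hand. The following is only a minimal sketch, assuming sha256sum from GNU coreutils is available; the helper name same_sha256 and the file names in the usage comment are illustrative and are not part of fedora-review.

```shell
# Hypothetical helper (not part of fedora-review): succeeds only when both
# files exist and have the same SHA-256 digest.
same_sha256() {
    # sha256sum prints "<digest>  <file>"; keep only the digest field.
    a=$(sha256sum "$1" | cut -d' ' -f1)
    b=$(sha256sum "$2" | cut -d' ' -f1)
    # An empty digest means the file could not be read, so treat it as a mismatch.
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage (paths are assumptions):
#   same_sha256 srpm/boilerpipe-1.2.0-src.tar.gz upstream/boilerpipe-1.2.0-src.tar.gz \
#       && echo "sources match upstream"
```

Comparing digests rather than file sizes or timestamps is what makes the "sources match upstream" check meaningful, since a re-rolled tarball would produce a different digest.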


Generated by fedora-review 0.5.0 (920221d) last change: 2013-08-30
Command line: /usr/bin/fedora-review -m fedora-rawhide-x86_64 -b 1002704
Buildroot used: fedora-rawhide-x86_64
Active plugins: Generic, Shell-api, Java
Disabled plugins: C/C++, Python, SugarActivity, Perl, R, PHP, Ruby
Disabled flags: EPEL5, EXARCH, DISTTAG

#####

APPROVED!!!

Comment 3 gil cattaneo 2013-10-19 10:36:42 UTC
BuildRequires: jpackage-utils and Requires: jpackage-utils are not needed,
because jpackage-utils (retired) was replaced by javapackages-tools.
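
A quick way to see which of the two packages a spec file actually pulls in at build time is a plain text search. This is a rough sketch only: the helper name has_buildrequires is made up for illustration, it reads the spec as text without expanding macros, and it only looks at lines that start with "BuildRequires:". On a system with rpm-build installed, rpmspec -q --buildrequires should give the macro-expanded list instead.

```shell
# Hypothetical helper: succeeds if the spec file ($1) has a BuildRequires:
# line naming the package ($2). Macros are NOT expanded.
has_buildrequires() {
    spec=$1
    pkg=$2
    # Keep only BuildRequires lines, then look for the package as a whole word
    # delimited by whitespace, a comma, the tag's colon, or end of line.
    grep -E '^BuildRequires:' "$spec" \
        | grep -Eq "(^|[[:space:],:])${pkg}([[:space:],]|\$)"
}

# Usage (spec path is an assumption):
#   has_buildrequires boilerpipe.spec javapackages-tools && echo "ok"
#   has_buildrequires boilerpipe.spec jpackage-utils || echo "not listed"
```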

Thanks!

New Package SCM Request
=======================
Package Name: boilerpipe
Short Description: Boilerplate Removal and Fulltext Extraction from HTML pages
Owners: gil
Branches: f19 f20
InitialCC: java-sig

Comment 4 Gwyn Ciesla 2013-10-19 21:11:48 UTC
Git done (by process-git-requests).

Comment 5 gil cattaneo 2013-10-19 21:13:15 UTC
Thanks!

Comment 6 Fedora Update System 2013-10-19 22:13:48 UTC
boilerpipe-1.2.0-1.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/boilerpipe-1.2.0-1.fc20

Comment 7 Fedora Update System 2013-10-19 22:20:55 UTC
boilerpipe-1.2.0-1.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/boilerpipe-1.2.0-1.fc19

Comment 8 Fedora Update System 2013-10-20 17:45:46 UTC
boilerpipe-1.2.0-1.fc20 has been pushed to the Fedora 20 testing repository.

Comment 9 Fedora Update System 2013-10-29 03:46:02 UTC
boilerpipe-1.2.0-1.fc19 has been pushed to the Fedora 19 stable repository.

Comment 10 Fedora Update System 2013-11-10 08:00:06 UTC
boilerpipe-1.2.0-1.fc20 has been pushed to the Fedora 20 stable repository.