Bug 1002704 - Review Request: boilerpipe - Boilerplate Removal and Fulltext Extraction from HTML pages
Product: Fedora
Classification: Fedora
Component: Package Review
Hardware: All Linux
Priority: medium Severity: medium
Assigned To: Björn "besser82" Esser
QA Contact: Fedora Extras Quality Assurance
Depends On:
Blocks: 1019650
Reported: 2013-08-29 14:31 EDT by gil cattaneo
Modified: 2013-11-10 03:00 EST

See Also:
Fixed In Version: boilerpipe-1.2.0-1.fc20
Doc Type: Bug Fix
Last Closed: 2013-10-28 23:46:02 EDT
besser82: fedora‑review+
limburgher: fedora‑cvs+

Description gil cattaneo 2013-08-29 14:31:37 EDT
Spec URL: http://gil.fedorapeople.org/boilerpipe.spec
SRPM URL: http://gil.fedorapeople.org/boilerpipe-1.2.0-1.fc19.src.rpm
The boilerpipe library provides algorithms to detect and
remove the surplus "clutter" (boilerplate, templates)
around the main textual content of a web page.

The library already provides specific strategies 
for common tasks (for example: news article extraction) and
may also be easily extended for individual problem settings.

Extraction is very fast (milliseconds), needs only the input
document (no global or site-level information required), and is
usually quite accurate.
Fedora Account System Username: gil
Comment 1 Björn "besser82" Esser 2013-10-16 14:14:43 EDT
taken  ;)
Comment 2 Björn "besser82" Esser 2013-10-19 04:06:12 EDT
Package has one small issue.  No blockers  :)


Package Review

[x] = Pass
[!] = Fail
[-] = Not applicable
[?] = Not evaluated

- Packages have proper BuildRequires/Requires on jpackage-utils

  ---> please add them as BuildRequires during import.

===== MUST items =====

[x]: Package is licensed with an open-source compatible license and meets
     other legal requirements as defined in the legal section of Packaging
     Guidelines.
[x]: License field in the package spec file matches the actual license.
     Note: Checking patched sources after %prep for licenses. Licenses found:
     "Apache (v2.0)", "Unknown or generated". 10 files have unknown license.
     Detailed output of licensecheck in

     ---> License-tag is fine.  :)

[x]: License file installed when any subpackage combination is installed.
[x]: Package contains no bundled libraries without FPC exception.
[x]: Changelog in prescribed format.
[x]: Sources contain only permissible code or content.
[-]: Package contains desktop file if it is a GUI application.
[-]: Development files must be in a -devel package
[x]: Package uses nothing in %doc for runtime.
[x]: Package consistently uses macros (instead of hard-coded directory names).
[x]: Package is named according to the Package Naming Guidelines.
[x]: Package does not generate any conflict.
[x]: Package obeys FHS, except libexecdir and /usr/target.
[-]: If the package is a rename of another package, proper Obsoletes and
     Provides are present.
[x]: Requires correct, justified where necessary.
[x]: Spec file is legible and written in American English.
[-]: Package contains systemd file(s) if in need.
[x]: Package is not known to require an ExcludeArch tag.
[-]: Large documentation must go in a -doc subpackage. Large could be size
     (~1MB) or number of files.
     Note: Documentation size is 10240 bytes in 2 files.
[!]: Package complies with the Packaging Guidelines

     ---> missing `BuildRequires: jpackage-utils`.
          Can be fixed during import.

[x]: Package successfully compiles and builds into binary rpms on at least one
     supported primary architecture.
[x]: Package installs properly.
[x]: Rpmlint is run on all rpms the build produces.
     Note: No rpmlint messages.
[x]: If (and only if) the source package includes the text of the license(s)
     in its own file, then that file, containing the text of the license(s)
     for the package is included in %doc.
[x]: Package requires other packages for directories it uses.
[x]: Package must own all directories that it creates.
[x]: Package does not own files or directories owned by other packages.
[x]: All build dependencies are listed in BuildRequires, except for any that
     are listed in the exceptions section of Packaging Guidelines.
[x]: Package uses either %{buildroot} or $RPM_BUILD_ROOT
[x]: Package does not run rm -rf %{buildroot} (or $RPM_BUILD_ROOT) at the
     beginning of %install.
[x]: Each %files section contains %defattr if rpm < 4.4
[x]: Macros in Summary, %description expandable at SRPM build time.
[x]: Package does not contain duplicates in %files.
[x]: Permissions on files are set properly.
[x]: Package uses %makeinstall only when make install DESTDIR=... doesn't
     work.
[x]: Package is named using only allowed ASCII characters.
[x]: Package does not use a name that already exists
[x]: Package is not relocatable.
[x]: Sources used to build the package match the upstream source, as provided
     in the spec URL.
[x]: Spec file name must match the spec package %{name}, in the format
     %{name}.spec.
[x]: File names are valid UTF-8.
[x]: Packages must not store files under /srv, /opt or /usr/local

[x]: Javadoc documentation files are generated and included in -javadoc
[x]: Javadoc subpackages should not have Requires: jpackage-utils
[x]: Javadocs are placed in %{_javadocdir}/%{name} (no -%{version} symlink)
[x]: Bundled jar/class files should be removed before build

[x]: Pom files have correct Maven mapping
     Note: Some add_maven_depmap calls found. Please check if they are correct
     or update to latest guidelines
[x]: If package contains pom.xml files install it (including depmaps) even
     when building with ant
[x]: Old add_to_maven_depmap macro is not being used
[x]: Package DOES NOT have Requires(post) and Requires(postun) on
     jpackage-utils for %update_maven_depmap macro
[x]: Package DOES NOT use %update_maven_depmap in %post/%postun
[x]: Packages use %{_mavenpomdir} instead of %{_datadir}/maven2/poms

===== SHOULD items =====

[-]: If the source package does not include license text(s) as a separate file
     from upstream, the packager SHOULD query upstream to include it.
[x]: Final provides and requires are sane (see attachments).
[-]: Fully versioned dependency in subpackages if applicable.
     Note: No Requires: %{name}%{?_isa} = %{version}-%{release} in boilerpipe-

     ---> False positive.  Documentation should have no Requires
          on main-pkg.

[x]: Package functions as described.
[x]: Latest version is packaged.
[x]: Package does not include license text files separate from upstream.
[x]: Patches link to upstream bugs/comments/lists or are otherwise justified.
[-]: Description and summary sections in the package spec file contain
     translations for supported Non-English languages, if available.
[x]: Package should compile and build into binary rpms on all supported
     architectures.
[-]: %check is present and all tests pass.

     ---> no testsuite available.

[x]: Packages should try to preserve timestamps of original installed files.
[x]: Packager, Vendor, PreReq, Copyright tags should not be in spec file
[x]: Sources can be downloaded from URI in Source: tag
[x]: Reviewer should test that the package builds in mock.
[x]: Buildroot is not present
[x]: Package has no %clean section with rm -rf %{buildroot} (or
     $RPM_BUILD_ROOT)
[x]: Dist tag is present (not strictly required in GL).
[x]: No file requires outside of /etc, /bin, /sbin, /usr/bin, /usr/sbin.
[x]: SourceX tarball generation or download is documented.
[x]: SourceX is a working URL.
[x]: Spec uses %global instead of %define unless justified.

[x]: Package uses upstream build method (ant/maven/etc.)
[x]: Packages are noarch unless they use JNI

===== EXTRA items =====

[x]: Rpmlint is run on all installed packages.
     Note: No rpmlint messages.
[x]: Large data in /usr/share should live in a noarch subpackage if package is
     arched.
[x]: Spec file according to URL is the same as in SRPM.

Checking: boilerpipe-1.2.0-1.fc21.noarch.rpm
3 packages and 0 specfiles checked; 0 errors, 0 warnings.

Rpmlint (installed packages)
# rpmlint boilerpipe-javadoc boilerpipe
2 packages and 0 specfiles checked; 0 errors, 0 warnings.
# echo 'rpmlint-done:'

boilerpipe-javadoc (rpmlib, GLIBC filtered):

boilerpipe (rpmlib, GLIBC filtered):



Source checksums
http://boilerpipe.googlecode.com/files/boilerpipe-1.2.0-src.tar.gz :
  CHECKSUM(SHA256) this package     : b87ce6e374081a417bf54016fda504b174445c6c9a275c73735c00b85f7080b4
  CHECKSUM(SHA256) upstream package : b87ce6e374081a417bf54016fda504b174445c6c9a275c73735c00b85f7080b4
http://boilerpipe.googlecode.com/svn/repo/de/l3s/boilerpipe/boilerpipe/1.2.0/boilerpipe-1.2.0.pom :
  CHECKSUM(SHA256) this package     : e25b1effb6835042e98e8f0e1f60c5c0cf8ef7339422b47df7b6d92605d546c0
  CHECKSUM(SHA256) upstream package : e25b1effb6835042e98e8f0e1f60c5c0cf8ef7339422b47df7b6d92605d546c0
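
The checksum comparison above can be reproduced by hand. The following is a minimal sketch, assuming GNU coreutils `sha256sum` is available; `same_sha256` is a hypothetical helper (not part of fedora-review), and the two throwaway files merely stand in for the tarball from the SRPM and the copy downloaded from upstream.

```shell
#!/bin/sh
# Sketch: compare the SHA256 of a packaged source with the upstream copy.
# same_sha256 is a hypothetical helper, not something fedora-review provides.
same_sha256() {
    # sha256sum prints "<hash>  <file>"; keep only the hash field.
    a=$(sha256sum "$1" | cut -d' ' -f1)
    b=$(sha256sum "$2" | cut -d' ' -f1)
    [ "$a" = "$b" ]
}

# Demo with two throwaway files standing in for the SRPM and upstream tarballs.
tmp=$(mktemp -d)
printf 'same bytes\n' > "$tmp/packaged.tar.gz"
printf 'same bytes\n' > "$tmp/upstream.tar.gz"

if same_sha256 "$tmp/packaged.tar.gz" "$tmp/upstream.tar.gz"; then
    echo "checksums match"
else
    echo "checksums differ"
fi
rm -rf "$tmp"
```

In a real review the two arguments would be the source extracted from the SRPM and a fresh download of the Source: URL.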

Generated by fedora-review 0.5.0 (920221d) last change: 2013-08-30
Command line :/usr/bin/fedora-review -m fedora-rawhide-x86_64 -b 1002704
Buildroot used: fedora-rawhide-x86_64
Active plugins: Generic, Shell-api, Java
Disabled plugins: C/C++, Python, SugarActivity, Perl, R, PHP, Ruby
Disabled flags: EPEL5, EXARCH, DISTTAG


Comment 3 gil cattaneo 2013-10-19 06:36:42 EDT
BuildRequires: jpackage-utils and Requires: jpackage-utils are not needed,
because jpackage-utils (retired) was replaced by javapackages-tools.
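
One way to check a spec for leftover references to the retired name is sketched below. This is only an illustration: `find_retired_deps` and `sample.spec` are hypothetical and are not taken from the boilerpipe spec.

```shell
#!/bin/sh
# Sketch: flag dependency lines in a spec that still name the retired
# jpackage-utils package. find_retired_deps and sample.spec are hypothetical.
find_retired_deps() {
    # Match BuildRequires:/Requires: lines (including scriptlet-qualified
    # forms such as Requires(post):) that mention jpackage-utils.
    grep -nE '^(Build)?Requires(\([a-z,]+\))?:.*jpackage-utils' "$1"
}

# Throwaway spec fragment used only for this demo.
tmp=$(mktemp -d)
cat > "$tmp/sample.spec" <<'EOF'
Name:          example
BuildRequires: javapackages-tools
Requires:      jpackage-utils
EOF

# Prints matching lines with their line numbers, or a note if none are found.
find_retired_deps "$tmp/sample.spec" || echo "no references to jpackage-utils"
rm -rf "$tmp"
```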


New Package SCM Request
Package Name: boilerpipe
Short Description: Boilerplate Removal and Fulltext Extraction from HTML pages
Owners: gil
Branches: f19 f20
InitialCC: java-sig
Comment 4 Gwyn Ciesla 2013-10-19 17:11:48 EDT
Git done (by process-git-requests).
Comment 5 gil cattaneo 2013-10-19 17:13:15 EDT
Comment 6 Fedora Update System 2013-10-19 18:13:48 EDT
boilerpipe-1.2.0-1.fc20 has been submitted as an update for Fedora 20.
Comment 7 Fedora Update System 2013-10-19 18:20:55 EDT
boilerpipe-1.2.0-1.fc19 has been submitted as an update for Fedora 19.
Comment 8 Fedora Update System 2013-10-20 13:45:46 EDT
boilerpipe-1.2.0-1.fc20 has been pushed to the Fedora 20 testing repository.
Comment 9 Fedora Update System 2013-10-28 23:46:02 EDT
boilerpipe-1.2.0-1.fc19 has been pushed to the Fedora 19 stable repository.
Comment 10 Fedora Update System 2013-11-10 03:00:06 EST
boilerpipe-1.2.0-1.fc20 has been pushed to the Fedora 20 stable repository.