Bug 1002721 - Review Request: tika - A content analysis toolkit
Summary: Review Request: tika - A content analysis toolkit
Alias: None
Product: Fedora
Classification: Fedora
Component: Package Review
Version: rawhide
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Marek Goldmann
QA Contact: Fedora Extras Quality Assurance
Duplicates: 1017546
Depends On:
Blocks: 1016622 1017645
Reported: 2013-08-29 19:10 UTC by gil cattaneo
Modified: 2013-10-16 08:06 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2013-10-16 08:06:30 UTC
mgoldman: fedora-review+
gwync: fedora-cvs+

Description gil cattaneo 2013-08-29 19:10:37 UTC
Spec URL: http://gil.fedorapeople.org/tika.spec
SRPM URL: http://gil.fedorapeople.org/tika-1.4-1.fc19.src.rpm
The Apache Tika toolkit detects and extracts meta-data and
structured text content from various documents using existing
parser libraries.
Fedora Account System Username: gil

Comment 1 Marek Goldmann 2013-10-10 09:03:25 UTC
Is it really necessary to make the spec file so complex? My spec is several times shorter.

Comment 2 gil cattaneo 2013-10-10 09:13:28 UTC
This version requires apache-poi >= 3.9.
Where possible, I will also make the spec file simpler.

Comment 3 Marek Goldmann 2013-10-10 09:18:48 UTC
apache-poi >= 3.9 is already in F20 and Rawhide, which is totally sufficient. We shouldn't add new packages to old and current releases without a reason. Tika is required to build infinispan 6 deps, which will land only in F20+.

Comment 4 gil cattaneo 2013-10-10 10:38:47 UTC
Spec URL: http://gil.fedorapeople.org/tika.spec
SRPM URL: http://gil.fedorapeople.org/tika-1.4-1.fc19.src.rpm

- simplified spec file

Comment 5 Marek Goldmann 2013-10-15 10:35:15 UTC
What requires the tika parsers module? I don't see any packages that are blocked by this tika review, so to me this module is totally unnecessary ATM. I don't want to review 7 packages just to have tika core packaged.

My hibernate-search package (bug 1017645) requires only the core of tika, and that is provided by the ready-to-review tika package in bug 1017546.

Comment 6 gil cattaneo 2013-10-15 10:51:02 UTC
The Bigdata project requires the solr 4.x series, and the solr module needed for that purpose also requires the tika parsers module.

Comment 7 Marek Goldmann 2013-10-15 10:58:01 UTC
How important is the bigdata project? What triggered packaging it?

I won't upgrade solr to version 4.x soon, since version 3.6.x is required by Hibernate.

Comment 8 gil cattaneo 2013-10-15 11:15:05 UTC
This means that we should create a compat solr package (solr4).

Comment 9 Marek Goldmann 2013-10-15 11:26:16 UTC
No, in this case my package should be a compat package called "solr3".

The thing is that the dependency tree of this tika review is huge and it'll take some time to package it. Instead I propose to drop all modules except core, so I can get on with my stuff (hibernate), and to enable parsers afterwards. It's your choice whether to do it in your spec file or mine, but I won't review all the other packages.

I'm running out of time: WildFly 8.0.0.Beta1 has been released and it's still not in Fedora 20. And the deadline is soon.

Comment 10 Marek Goldmann 2013-10-15 13:55:39 UTC
Package Review

[x] = Pass, [!] = Fail, [-] = Not applicable, [?] = Not evaluated
[ ] = Manual review needed

===== MUST items =====

[x]: Package is licensed with an open-source compatible license and meets
     other legal requirements as defined in the legal section of Packaging
     Guidelines.
[x]: License field in the package spec file matches the actual license.
     Note: Checking patched sources after %prep for licenses. Licenses found:
     "Apache (v2.0)", "Unknown or generated". 10 files have unknown license.
[x]: License file installed when any subpackage combination is installed.
[x]: Package contains no bundled libraries without FPC exception.
[x]: Changelog in prescribed format.
[x]: Sources contain only permissible code or content.
[-]: Package contains desktop file if it is a GUI application.
[-]: Development files must be in a -devel package
[x]: Package uses nothing in %doc for runtime.
[x]: Package consistently uses macros (instead of hard-coded directory names).
[x]: Package is named according to the Package Naming Guidelines.
[x]: Package does not generate any conflict.
[-]: Package obeys FHS, except libexecdir and /usr/target.
[-]: If the package is a rename of another package, proper Obsoletes and
     Provides are present.
[x]: Requires correct, justified where necessary.
[x]: Spec file is legible and written in American English.
[-]: Package contains systemd file(s) if in need.
[x]: Package is not known to require an ExcludeArch tag.
[x]: Large documentation must go in a -doc subpackage. Large could be size
     (~1MB) or number of files.
     Note: Documentation size is 102400 bytes in 6 files.
[x]: Package complies to the Packaging Guidelines
[x]: Package successfully compiles and builds into binary rpms on at least one
     supported primary architecture.
[x]: Package installs properly.
[x]: Rpmlint is run on all rpms the build produces.
     Note: No rpmlint messages.
[x]: If (and only if) the source package includes the text of the license(s)
     in its own file, then that file, containing the text of the license(s)
     for the package is included in %doc.
[x]: Package requires other packages for directories it uses.
[x]: Package must own all directories that it creates.
[x]: Package does not own files or directories owned by other packages.
[x]: All build dependencies are listed in BuildRequires, except for any that
     are listed in the exceptions section of Packaging Guidelines.
[x]: Package uses either %{buildroot} or $RPM_BUILD_ROOT
[x]: Package does not run rm -rf %{buildroot} (or $RPM_BUILD_ROOT) at the
     beginning of %install.
[x]: Each %files section contains %defattr if rpm < 4.4
[x]: Macros in Summary, %description expandable at SRPM build time.
[x]: Package does not contain duplicates in %files.
[x]: Permissions on files are set properly.
[x]: Package use %makeinstall only when make install DESTDIR=... doesn't
     work.
[x]: Package is named using only allowed ASCII characters.
[x]: Package do not use a name that already exist
[x]: Package is not relocatable.
[x]: Sources used to build the package match the upstream source, as provided
     in the spec URL.
[x]: Spec file name must match the spec package %{name}, in the format
     %{name}.spec.
[x]: File names are valid UTF-8.
[x]: Packages must not store files under /srv, /opt or /usr/local

[x]: Packages have proper BuildRequires/Requires on jpackage-utils
     Note: Maven packages do not need to (Build)Require jpackage-utils. It is
     pulled in by maven-local
[x]: Javadoc documentation files are generated and included in -javadoc
[x]: Javadoc subpackages should not have Requires: jpackage-utils
[x]: Javadocs are placed in %{_javadocdir}/%{name} (no -%{version} symlink)
[x]: Bundled jar/class files should be removed before build

[x]: If package contains pom.xml files install it (including depmaps) even
     when building with ant
[x]: Pom files have correct Maven mapping
[x]: Maven packages should use new style packaging
[x]: Old add_to_maven_depmap macro is not being used
[x]: Packages DOES NOT have Requires(post) and Requires(postun) on jpackage-
     utils for %update_maven_depmap macro
[x]: Package DOES NOT use %update_maven_depmap in %post/%postun
[x]: Packages use %{_mavenpomdir} instead of %{_datadir}/maven2/poms

===== SHOULD items =====

[-]: If the source package does not include license text(s) as a separate file
     from upstream, the packager SHOULD query upstream to include it.
[x]: Final provides and requires are sane (see attachments).
[x]: Fully versioned dependency in subpackages if applicable.
     Note: No Requires: %{name}%{?_isa} = %{version}-%{release} in tika-
[x]: Package functions as described.
[-]: Latest version is packaged.
[x]: Package does not include license text files separate from upstream.
[x]: Patches link to upstream bugs/comments/lists or are otherwise justified.
[-]: Description and summary sections in the package spec file contains
     translations for supported Non-English languages, if available.
[x]: Package should compile and build into binary rpms on all supported
     architectures.
[-]: %check is present and all tests pass.
[x]: Packages should try to preserve timestamps of original installed files.
[x]: Packager, Vendor, PreReq, Copyright tags should not be in spec file
[x]: Sources can be downloaded from URI in Source: tag
[x]: Reviewer should test that the package builds in mock.
[x]: Buildroot is not present
[x]: Package has no %clean section with rm -rf %{buildroot} (or
     $RPM_BUILD_ROOT)
[x]: Dist tag is present (not strictly required in GL).
[x]: No file requires outside of /etc, /bin, /sbin, /usr/bin, /usr/sbin.
[x]: SourceX tarball generation or download is documented.
[x]: SourceX is a working URL.
[x]: Spec use %global instead of %define unless justified.

[x]: Package uses upstream build method (ant/maven/etc.)
[x]: Packages are noarch unless they use JNI

===== EXTRA items =====

[x]: Rpmlint is run on all installed packages.
     Note: There are rpmlint messages (see attachment).
[x]: Large data in /usr/share should live in a noarch subpackage if package is
     arched.
[x]: Spec file according to URL is the same as in SRPM.
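
The checklist above follows the marker legend given at the top of the report. For anyone who wants a quick summary of such a report, here is a minimal sketch in Python that tallies items per marker. It is not part of fedora-review; the names `LEGEND`, `ITEM_RE`, `tally_review` and the `sample` excerpt are hypothetical and only serve the illustration.

```python
import re
from collections import Counter

# Marker legend as stated at the top of the review report.
LEGEND = {
    "x": "pass",
    "!": "fail",
    "-": "not applicable",
    "?": "not evaluated",
    " ": "manual review needed",
}

# A checklist item starts with a bracketed marker followed by a colon.
# Continuation lines and "Note:" lines are indented and do not match.
ITEM_RE = re.compile(r"^\[([x!\-? ])\]:")


def tally_review(report_text):
    """Count checklist items per outcome in a fedora-review style report."""
    counts = Counter()
    for line in report_text.splitlines():
        match = ITEM_RE.match(line)
        if match:
            counts[LEGEND[match.group(1)]] += 1
    return counts


# Hypothetical short excerpt for illustration; a real report is much longer.
sample = """\
[x]: Changelog in prescribed format.
[-]: Package contains desktop file if it is a GUI application.
[x]: Large documentation must go in a -doc subpackage. Large could be size
     (~1MB) or number of files.
[-]: %check is present and all tests pass.
"""

if __name__ == "__main__":
    for outcome, count in tally_review(sample).items():
        print(f"{outcome}: {count}")
```

A non-zero count for "fail" or "manual review needed" would be the cue to look at the individual items again before approving.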

Checking: tika-1.4-1.fc21.noarch.rpm
3 packages and 0 specfiles checked; 0 errors, 0 warnings.

Rpmlint (installed packages)
# rpmlint tika tika-javadoc
tika.noarch: W: name-repeated-in-summary C Tika
tika.noarch: W: spelling-error %description -l en_US metadata -> meta data, meta-data, metatarsal
2 packages and 0 specfiles checked; 0 errors, 2 warnings.
# echo 'rpmlint-done:'

tika (rpmlib, GLIBC filtered):

tika-javadoc (rpmlib, GLIBC filtered):



Source checksums
http://www.apache.org/dist/tika/tika-1.4-src.zip :
  CHECKSUM(SHA256) this package     : f588ddc41354196fce00052e7d4066b7cf6e7d3ba1017d5b35037621d6992607
  CHECKSUM(SHA256) upstream package : f588ddc41354196fce00052e7d4066b7cf6e7d3ba1017d5b35037621d6992607
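
The two digests above are identical, which is what the "sources match upstream" check relies on. The same comparison can be reproduced by hand; below is a minimal sketch using Python's standard `hashlib`, assuming both the packaged archive and a fresh upstream download are available locally. The function names `sha256_of_file` and `sources_match` are hypothetical and not taken from fedora-review.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sources_match(packaged_path, upstream_path):
    """True when both files have the same SHA-256 digest."""
    return sha256_of_file(packaged_path) == sha256_of_file(upstream_path)
```

For example, `sources_match("srpm/tika-1.4-src.zip", "upstream/tika-1.4-src.zip")` would compare the archive unpacked from the SRPM with one downloaded from the Source URL; both paths are placeholders.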

Generated by fedora-review 0.5.0 (920221d) last change: 2013-08-30
Command line :/usr/bin/fedora-review -m fedora-rawhide-i386 -b 1002721 -v
Buildroot used: fedora-rawhide-i386
Active plugins: Generic, Shell-api, Java
Disabled plugins: C/C++, Python, SugarActivity, Perl, R, PHP, Ruby
Disabled flags: EPEL5, EXARCH, DISTTAG

Looks OK.

Comment 11 Marek Goldmann 2013-10-15 13:56:35 UTC
*** Bug 1017546 has been marked as a duplicate of this bug. ***

Comment 12 gil cattaneo 2013-10-15 14:14:21 UTC
Thanks for the review!

New Package SCM Request
Package Name: tika
Short Description: A content analysis toolkit
Owners: gil
Branches: f19 f20
InitialCC: java-sig

Comment 13 Gwyn Ciesla 2013-10-15 14:59:35 UTC
Git done (by process-git-requests).
