Bug 1109072 - subpackage for the tika standalone app
Summary: subpackage for the tika standalone app
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: tika
Version: 20
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: gil cattaneo
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-06-13 08:06 UTC by Fabrice Bellet
Modified: 2014-06-27 21:49 UTC

Fixed In Version: tika-1.4-4.fc20
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-06-27 21:49:25 UTC
Type: Bug
Embargoed:


Attachments
tike.spec patch (2.61 KB, patch)
2014-06-13 08:08 UTC, Fabrice Bellet
updated spec file with maven-shade-plugin disabled (9.47 KB, patch)
2014-06-18 08:25 UTC, Fabrice Bellet

Description Fabrice Bellet 2014-06-13 08:06:24 UTC
Hi!

It would be great to provide the tika standalone app in a subpackage.

thanks,

Comment 1 Fabrice Bellet 2014-06-13 08:08:01 UTC
Created attachment 908406
tike.spec patch

proposed patch for tika app subpackage

Comment 2 gil cattaneo 2014-06-13 15:24:25 UTC
Hi,
the patch seems to have a wrong reference:
+BuildRequires: mvn(org.apache.felix:maven-shade-plugin:1.6)
maven-shade-plugin should be removed from the pom file; it is not usable,
see http://fedoraproject.org/wiki/Packaging:No_Bundled_Libraries [1]
(installable) tika-1.4/tika-app/target/original-tika-app-1.4.jar
[1] tika-1.4/tika-app/target/tika-app-1.4.jar

Also, please remove the version too (I used it only as a reminder ... e.g. isoparser):
+BuildRequires: mvn(com.google.code.gson:gson:1.7.1)
If you want to use this module, you can create a launcher script

with the following deps, which are merged by the shade plugin (I don't know if all of them are necessary):
[INFO] --- maven-shade-plugin:2.0:shade (default) @ tika-app ---
[INFO] Including org.apache.tika:tika-parsers:jar:1.4 in the shaded jar.
[INFO] Including javax.xml.stream:stax-api:jar:1.0.1 in the shaded jar.
[INFO] Including org.apache.tika:tika-core:jar:1.4 in the shaded jar.
[INFO] Including edu.ucar:netcdf:jar:4.2-min in the shaded jar.
[INFO] Including edu.ucar:udunits:jar:4.4.2 in the shaded jar.
[INFO] Including org.apache.httpcomponents:httpclient:jar:4.2.6 in the shaded jar.
[INFO] Including org.apache.httpcomponents:httpcore:jar:4.2.5 in the shaded jar.
[INFO] Including org.apache.httpcomponents:httpmime:jar:4.2.6 in the shaded jar.
[INFO] Including org.slf4j:jcl-over-slf4j:jar:1.7.5 in the shaded jar.
[INFO] Including joda-time:joda-time:jar:2.2 in the shaded jar.
[INFO] Including org.jdom:jdom2:jar:2.0.4 in the shaded jar.
[INFO] Including net.jcip:jcip-annotations:jar:1.0 in the shaded jar.
[INFO] Including org.quartz-scheduler:quartz:jar:2.2.0 in the shaded jar.
[INFO] Including javax.mail:mail:jar:any in the shaded jar.
[INFO] Including javax.ejb:ejb:jar:any in the shaded jar.
[INFO] Including javax.servlet:servlet-api:jar:any in the shaded jar.
[INFO] Including javax.jms:jms:jar:any in the shaded jar.
[INFO] Including javax.transaction:jta:jar:any in the shaded jar.
[INFO] Including com.mchange:c3p0:jar:0.9.1.1 in the shaded jar.
[INFO] Including com.mchange:mchange-commons-java:jar:0.2.3.4 in the shaded jar.
[INFO] Including com.google.protobuf:protobuf-java:jar:2.5.0 in the shaded jar.
[INFO] Including net.sf.ehcache:ehcache-core:jar:2.6.2 in the shaded jar.
[INFO] Including com.sleepycat:je:jar:4.0.92 in the shaded jar.
[INFO] Including org.ow2.asm:asm:jar:4.1 in the shaded jar.
[INFO] Including org.apache.james:apache-mime4j-core:jar:0.7.2 in the shaded jar.
[INFO] Including org.apache.james:apache-mime4j-dom:jar:0.7.2 in the shaded jar.
[INFO] Including org.apache.commons:commons-compress:jar:1.5 in the shaded jar.
[INFO] Including org.tukaani:xz:jar:1.2 in the shaded jar.
[INFO] Including commons-codec:commons-codec:jar:1.5 in the shaded jar.
[INFO] Including org.apache.pdfbox:pdfbox:jar:1.8.1 in the shaded jar.
[INFO] Including org.apache.pdfbox:fontbox:jar:1.8.1 in the shaded jar.
[INFO] Including org.apache.pdfbox:jempbox:jar:1.8.1 in the shaded jar.
[INFO] Including commons-logging:commons-logging:jar:1.1.1 in the shaded jar.
[INFO] Including avalon-framework:avalon-framework-api:jar:4.3 in the shaded jar.
[INFO] Including avalon-logkit:avalon-logkit:jar:2.1 in the shaded jar.
[INFO] Including org.bouncycastle:bcmail-jdk16:jar:1.45 in the shaded jar.
[INFO] Including org.bouncycastle:bcprov-jdk16:jar:1.45 in the shaded jar.
[INFO] Including org.apache.poi:poi:jar:3.9 in the shaded jar.
[INFO] Including org.apache.poi:poi-scratchpad:jar:3.9 in the shaded jar.
[INFO] Including org.apache.poi:poi-ooxml:jar:3.9 in the shaded jar.
[INFO] Including org.apache.poi:poi-ooxml-schemas:jar:3.9 in the shaded jar.
[INFO] Including org.apache.xmlbeans:xmlbeans:jar:2.3.0 in the shaded jar.
[INFO] Including dom4j:dom4j:jar:1.6.1 in the shaded jar.
[INFO] Including org.ccil.cowan.tagsoup:tagsoup:jar:1.2.1 in the shaded jar.
[INFO] Including org.ow2.asm:asm-all:jar:4.1 in the shaded jar.
[INFO] Including com.drewnoakes:metadata-extractor:jar:2 in the shaded jar.
[INFO] Including xerces:xercesImpl:jar:2.8.1 in the shaded jar.
[INFO] Including xml-apis:xml-apis:jar:1.4.01 in the shaded jar.
[INFO] Including de.l3s.boilerpipe:boilerpipe:jar:1.1.0 in the shaded jar.
[INFO] Including net.sourceforge.nekohtml:nekohtml:jar:1.9.14 in the shaded jar.
[INFO] Including rome:rome:jar:0.9 in the shaded jar.
[INFO] Including jdom:jdom:jar:1.0 in the shaded jar.
[INFO] Including org.gagravarr:vorbis-java-core:jar:0.1 in the shaded jar.
[INFO] Including com.googlecode.juniversalchardet:juniversalchardet:jar:1.0.3 in the shaded jar.
[INFO] Including org.apache.tika:tika-xmp:jar:1.4 in the shaded jar.
[INFO] Including com.adobe.xmp:xmpcore:jar:5.1.2 in the shaded jar.
[INFO] Including org.slf4j:slf4j-log4j12:jar:1.5.6 in the shaded jar.
[INFO] Including org.slf4j:slf4j-api:jar:1.7.4 in the shaded jar.
[INFO] Including log4j:log4j:jar:1.2.17 in the shaded jar.
[INFO] Including com.google.code.gson:gson:jar:1.7.1 in the shaded jar.

regards
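
Editor's sketch, not part of the original comment: one possible way to turn the shade log above into versionless BuildRequires lines, as requested for the gson entry. The function name shade_to_buildrequires is hypothetical, and the sketch assumes every artifact is provided in Fedora under the same groupId:artifactId, which may not hold for all of them.

```shell
# Read a Maven build log on stdin and print one versionless
# "BuildRequires: mvn(groupId:artifactId)" line per shaded dependency.
# Lines that are not "Including ... in the shaded jar." are ignored.
shade_to_buildrequires() {
    sed -n 's/^\[INFO\] Including \([^:]*\):\([^:]*\):jar:.* in the shaded jar\.$/BuildRequires: mvn(\1:\2)/p' |
        sort -u
}

# Demo with one line taken from the log above.
printf '%s\n' '[INFO] Including com.google.code.gson:gson:jar:1.7.1 in the shaded jar.' |
    shade_to_buildrequires
# prints: BuildRequires: mvn(com.google.code.gson:gson)
```

The output is only a starting point: each line still has to be checked against what Fedora actually ships, and some of the shaded dependencies may turn out to be unnecessary.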

Comment 3 gil cattaneo 2014-06-13 15:30:04 UTC
e.g.
%jpackage_script org.apache.tika.cli.TikaCLI "" "" %{name}/%{name}:[list of all required libs, separated by ":"] %{name}-app true

Comment 4 gil cattaneo 2014-06-13 15:30:34 UTC
%jpackage_script org.apache.tika.cli.TikaCLI "" "" %{name}:[list of all required libs, separated by ":"] %{name}-app true
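
Editor's sketch, not part of the original comment: the bracketed placeholder in the macro call is a ":"-separated list of jar names. The helper below shows one way such a line could be assembled. The function names and the jar names in the demo (tika/tika-app, tika/tika-core, poi) are hypothetical; the real list depends on how each dependency is installed in Fedora.

```shell
# Join all arguments with ':' (in POSIX sh, "$*" uses the first character of IFS).
join_jars() (
    IFS=:
    printf '%s\n' "$*"
)

# Print a %jpackage_script line: main class, empty JVM flags and options,
# the joined jar list, the launcher name, and "true" as the last argument.
jpackage_script_line() {
    main_class=$1
    script_name=$2
    shift 2
    printf '%%jpackage_script %s "" "" %s %s true\n' \
        "$main_class" "$(join_jars "$@")" "$script_name"
}

# Demo with hypothetical jar names.
jpackage_script_line org.apache.tika.cli.TikaCLI tika-app tika/tika-app tika/tika-core poi
```

Generating the line this way only saves typing; whether the resulting classpath is complete still has to be verified by running the launcher against sample documents.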

Comment 5 Fabrice Bellet 2014-06-18 08:25:39 UTC
Created attachment 909877
updated spec file with maven-shade-plugin disabled

Ah I was not aware that all required jar files were bundled in the single tika-app jar file thanks to maven-shade-plugin: thanks for the feedback!

I updated the spec file following your suggestion. The standalone tika-app can process *almost* all sample documents from the tika-parsers subdir (except some encrypted Microsoft Office files), so I assume I collected all the dependencies required by tika-parsers.

The problem with this wrapper script is that it cannot be used as-is in the Drupal search_api_attachment module, because the module expects the self-contained tika-app.jar file. That module was my primary motivation for getting tika-app shipped in Fedora, but I can live with that and patch the Drupal module accordingly (it is a third-party module, not packaged in Fedora anyway).

Comment 6 gil cattaneo 2014-06-18 13:39:10 UTC
Please, can you attach a git-format patch?

Comment 7 Fedora Update System 2014-06-18 15:16:03 UTC
tika-1.4-4.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/tika-1.4-4.fc20

Comment 9 Fabrice Bellet 2014-06-18 16:18:04 UTC
great, it works for me, thanks!

Comment 10 Fedora Update System 2014-06-19 22:55:34 UTC
Package tika-1.4-4.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing tika-1.4-4.fc20'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-7507/tika-1.4-4.fc20
then log in and leave karma (feedback).

Comment 11 Fedora Update System 2014-06-27 21:49:25 UTC
tika-1.4-4.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

