Bug 1109072

Summary: subpackage for the tika standalone app
Product: [Fedora] Fedora
Reporter: Fabrice Bellet <fabrice>
Component: tika
Assignee: gil cattaneo <puntogil>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 20
CC: java-sig-commits, puntogil
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: tika-1.4-4.fc20
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-06-27 21:49:25 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
* tike.spec patch (flags: none)
* updated spec file with maven-shade-plugin disabled (flags: none)

Description Fabrice Bellet 2014-06-13 08:06:24 UTC
Hi!

It would be great to provide the tika standalone app in a sub-package.

Thanks,

Comment 1 Fabrice Bellet 2014-06-13 08:08:01 UTC
Created attachment 908406 [details]
tike.spec patch

proposed patch for tika app subpackage

Comment 2 gil cattaneo 2014-06-13 15:24:25 UTC
Hi,
the patch seems to have a wrong reference:
+BuildRequires: mvn(org.apache.felix:maven-shade-plugin:1.6)
maven-shade-plugin should be removed from the POM file; its output is not usable,
see http://fedoraproject.org/wiki/Packaging:No_Bundled_Libraries [1]
(installable) tika-1.4/tika-app/target/original-tika-app-1.4.jar
[1] tika-1.4/tika-app/target/tika-app-1.4.jar

Please also remove the version from the following line (I used versions only as a reminder, e.g. for isoparser):
+BuildRequires: mvn(com.google.code.gson:gson:1.7.1)
If you want to use this module, you can create a launcher script

with the following deps, which the shade plugin merges (I don't know whether all of them are necessary):
[INFO] --- maven-shade-plugin:2.0:shade (default) @ tika-app ---
[INFO] Including org.apache.tika:tika-parsers:jar:1.4 in the shaded jar.
[INFO] Including javax.xml.stream:stax-api:jar:1.0.1 in the shaded jar.
[INFO] Including org.apache.tika:tika-core:jar:1.4 in the shaded jar.
[INFO] Including edu.ucar:netcdf:jar:4.2-min in the shaded jar.
[INFO] Including edu.ucar:udunits:jar:4.4.2 in the shaded jar.
[INFO] Including org.apache.httpcomponents:httpclient:jar:4.2.6 in the shaded jar.
[INFO] Including org.apache.httpcomponents:httpcore:jar:4.2.5 in the shaded jar.
[INFO] Including org.apache.httpcomponents:httpmime:jar:4.2.6 in the shaded jar.
[INFO] Including org.slf4j:jcl-over-slf4j:jar:1.7.5 in the shaded jar.
[INFO] Including joda-time:joda-time:jar:2.2 in the shaded jar.
[INFO] Including org.jdom:jdom2:jar:2.0.4 in the shaded jar.
[INFO] Including net.jcip:jcip-annotations:jar:1.0 in the shaded jar.
[INFO] Including org.quartz-scheduler:quartz:jar:2.2.0 in the shaded jar.
[INFO] Including javax.mail:mail:jar:any in the shaded jar.
[INFO] Including javax.ejb:ejb:jar:any in the shaded jar.
[INFO] Including javax.servlet:servlet-api:jar:any in the shaded jar.
[INFO] Including javax.jms:jms:jar:any in the shaded jar.
[INFO] Including javax.transaction:jta:jar:any in the shaded jar.
[INFO] Including com.mchange:c3p0:jar:0.9.1.1 in the shaded jar.
[INFO] Including com.mchange:mchange-commons-java:jar:0.2.3.4 in the shaded jar.
[INFO] Including com.google.protobuf:protobuf-java:jar:2.5.0 in the shaded jar.
[INFO] Including net.sf.ehcache:ehcache-core:jar:2.6.2 in the shaded jar.
[INFO] Including com.sleepycat:je:jar:4.0.92 in the shaded jar.
[INFO] Including org.ow2.asm:asm:jar:4.1 in the shaded jar.
[INFO] Including org.apache.james:apache-mime4j-core:jar:0.7.2 in the shaded jar.
[INFO] Including org.apache.james:apache-mime4j-dom:jar:0.7.2 in the shaded jar.
[INFO] Including org.apache.commons:commons-compress:jar:1.5 in the shaded jar.
[INFO] Including org.tukaani:xz:jar:1.2 in the shaded jar.
[INFO] Including commons-codec:commons-codec:jar:1.5 in the shaded jar.
[INFO] Including org.apache.pdfbox:pdfbox:jar:1.8.1 in the shaded jar.
[INFO] Including org.apache.pdfbox:fontbox:jar:1.8.1 in the shaded jar.
[INFO] Including org.apache.pdfbox:jempbox:jar:1.8.1 in the shaded jar.
[INFO] Including commons-logging:commons-logging:jar:1.1.1 in the shaded jar.
[INFO] Including avalon-framework:avalon-framework-api:jar:4.3 in the shaded jar.
[INFO] Including avalon-logkit:avalon-logkit:jar:2.1 in the shaded jar.
[INFO] Including org.bouncycastle:bcmail-jdk16:jar:1.45 in the shaded jar.
[INFO] Including org.bouncycastle:bcprov-jdk16:jar:1.45 in the shaded jar.
[INFO] Including org.apache.poi:poi:jar:3.9 in the shaded jar.
[INFO] Including org.apache.poi:poi-scratchpad:jar:3.9 in the shaded jar.
[INFO] Including org.apache.poi:poi-ooxml:jar:3.9 in the shaded jar.
[INFO] Including org.apache.poi:poi-ooxml-schemas:jar:3.9 in the shaded jar.
[INFO] Including org.apache.xmlbeans:xmlbeans:jar:2.3.0 in the shaded jar.
[INFO] Including dom4j:dom4j:jar:1.6.1 in the shaded jar.
[INFO] Including org.ccil.cowan.tagsoup:tagsoup:jar:1.2.1 in the shaded jar.
[INFO] Including org.ow2.asm:asm-all:jar:4.1 in the shaded jar.
[INFO] Including com.drewnoakes:metadata-extractor:jar:2 in the shaded jar.
[INFO] Including xerces:xercesImpl:jar:2.8.1 in the shaded jar.
[INFO] Including xml-apis:xml-apis:jar:1.4.01 in the shaded jar.
[INFO] Including de.l3s.boilerpipe:boilerpipe:jar:1.1.0 in the shaded jar.
[INFO] Including net.sourceforge.nekohtml:nekohtml:jar:1.9.14 in the shaded jar.
[INFO] Including rome:rome:jar:0.9 in the shaded jar.
[INFO] Including jdom:jdom:jar:1.0 in the shaded jar.
[INFO] Including org.gagravarr:vorbis-java-core:jar:0.1 in the shaded jar.
[INFO] Including com.googlecode.juniversalchardet:juniversalchardet:jar:1.0.3 in the shaded jar.
[INFO] Including org.apache.tika:tika-xmp:jar:1.4 in the shaded jar.
[INFO] Including com.adobe.xmp:xmpcore:jar:5.1.2 in the shaded jar.
[INFO] Including org.slf4j:slf4j-log4j12:jar:1.5.6 in the shaded jar.
[INFO] Including org.slf4j:slf4j-api:jar:1.7.4 in the shaded jar.
[INFO] Including log4j:log4j:jar:1.2.17 in the shaded jar.
[INFO] Including com.google.code.gson:gson:jar:1.7.1 in the shaded jar.
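
To turn a log like the one above into a plain list of dependencies, a small filter is enough. The following is a minimal sketch and not part of the patch: the function name shaded_artifacts is hypothetical, and it assumes the log lines have exactly the format shown above.

```shell
#!/bin/sh
# Hypothetical helper (not from the spec file): print groupId:artifactId
# for every dependency that maven-shade-plugin reports as included.
shaded_artifacts() {
    # Matches lines such as:
    #   [INFO] Including org.apache.tika:tika-core:jar:1.4 in the shaded jar.
    # and prints only the "groupId:artifactId" part; other lines are dropped.
    sed -n 's/^\[INFO] Including \([^:]*\):\([^:]*\):jar:.* in the shaded jar\.$/\1:\2/p'
}

# Example with two lines copied from the log above.
printf '%s\n' \
    '[INFO] Including org.apache.tika:tika-core:jar:1.4 in the shaded jar.' \
    '[INFO] Including log4j:log4j:jar:1.2.17 in the shaded jar.' \
    | shaded_artifacts
```

Each printed coordinate still has to be mapped by hand to the Fedora package that provides the jar; the sketch does not attempt that.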

regards

Comment 3 gil cattaneo 2014-06-13 15:30:04 UTC
e.g.
%jpackage_script org.apache.tika.cli.TikaCLI "" "" %{name}/%{name}:[list of all required libs, with separate ":"] %{name}-app true

Comment 4 gil cattaneo 2014-06-13 15:30:34 UTC
%jpackage_script org.apache.tika.cli.TikaCLI "" "" %{name}:[list of all required libs, with separate ":"] %{name}-app true
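
For readers unfamiliar with the macro: it generates a launcher script for the given main class. The sketch below only illustrates the general idea (assemble a classpath from a ":"-separated jar list, then call the main class) and is not the script the macro actually generates. JAVA_DIR, the function names and the jar names in the example call are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: roughly what a launcher for TikaCLI has to do.
# JAVA_DIR and the jar names passed in below are assumptions.
JAVA_DIR=${JAVA_DIR:-/usr/share/java}

# Turn a ":"-separated list of jar names into a ":"-separated classpath.
build_classpath() {
    cp=
    old_ifs=$IFS
    IFS=:
    for jar in $1; do
        cp="${cp:+$cp:}$JAVA_DIR/$jar.jar"
    done
    IFS=$old_ifs
    printf '%s\n' "$cp"
}

# Print the command a launcher would run, so the sketch can be tried
# without java or tika installed.
launcher_command() {
    printf 'java -cp %s org.apache.tika.cli.TikaCLI\n' "$(build_classpath "$1")"
}

# Hypothetical jar list; the real one must cover every required library.
launcher_command "tika/tika-app:tika/tika-core:tika/tika-parsers"
```

A real launcher would execute the command and pass the user's arguments through to it instead of printing it.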

Comment 5 Fabrice Bellet 2014-06-18 08:25:39 UTC
Created attachment 909877 [details]
updated spec file with maven-shade-plugin disabled

Ah, I was not aware that maven-shade-plugin bundles all required jar files into the single tika-app jar file. Thanks for the feedback!

I updated the spec file following your suggestion. The standalone tika-app can process *almost* all sample documents from the tika-parsers subdir (except some encrypted Microsoft Office files), so I assume I collected all the dependencies required by tika-parsers.

The problem with this wrapper script is that it cannot be used as-is by the Drupal search_api_attachment module, which expects the self-contained tika-app.jar file. That module was my primary motivation for getting tika-app shipped in Fedora, but I can live with that and patch the Drupal module accordingly (it is a third-party module, not packaged in Fedora anyway).

Comment 6 gil cattaneo 2014-06-18 13:39:10 UTC
Please, can you attach a git-format patch?

Comment 7 Fedora Update System 2014-06-18 15:16:03 UTC
tika-1.4-4.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/tika-1.4-4.fc20

Comment 9 Fabrice Bellet 2014-06-18 16:18:04 UTC
Great, it works for me, thanks!

Comment 10 Fedora Update System 2014-06-19 22:55:34 UTC
Package tika-1.4-4.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing tika-1.4-4.fc20'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-7507/tika-1.4-4.fc20
then log in and leave karma (feedback).

Comment 11 Fedora Update System 2014-06-27 21:49:25 UTC
tika-1.4-4.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.