Bug 875125 - epub output includes files that are not needed (and not listed in the OPF file)
Product: Publican
Classification: Community
Component: publican
Assigned To: Jeff Fearn
Reported: 2012-11-09 10:20 EST by Raphaël Hertzog
Modified: 2013-12-18 21:46 EST (History)
Fixed In Version: 4.0.0
Doc Type: Bug Fix
Last Closed: 2013-12-18 21:46:37 EST
Type: Bug

Description Raphaël Hertzog 2012-11-09 10:20:11 EST
publican's epub output includes every file in the "images" directory, even those that are not used (for instance, I have .dia files used to generate some .png files, and they end up in the .epub). The same is true for the HTML output, but there at least the end user never downloads those files. In the epub case everything is in a single archive, so the unused files inflate its size for no good reason.

Furthermore those files are not listed in the OPF file and thus lead to warnings emitted by epubcheck 3.0-RC1:

WARNING: /home/rhertzog/x/tdah/publish/en-US/Debian/6.0/epub/debian-handbook/Debian-6.0-debian-handbook-en-US.epub: item (OEBPS/images/etude-cas.dia) exists in the zip file, but is not declared in the OPF file

This error has been reproduced with the Debian Handbook:
$ git clone git://anonscm.debian.org/debian-handbook/debian-handbook.git
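
For anyone who wants to list the offending files without running epubcheck, here is a minimal Python sketch of the same comparison: it reads the OPF location from META-INF/container.xml, collects the manifest hrefs, and reports zip entries that the manifest does not declare. The function name undeclared_entries is hypothetical, and skipping "mimetype", META-INF/ and the OPF itself is my assumption about which entries never need a manifest item; this is not how epubcheck or publican implements the check.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile
from urllib.parse import unquote

CONTAINER_NS = "{urn:oasis:names:tc:opendocument:xmlns:container}"
OPF_NS = "{http://www.idpf.org/2007/opf}"


def undeclared_entries(epub):
    """Return zip entries that the OPF manifest does not declare.

    `epub` is a path or a binary file-like object (hypothetical helper).
    """
    with zipfile.ZipFile(epub) as archive:
        # container.xml points at the package (OPF) file.
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        opf_path = container.find(f".//{CONTAINER_NS}rootfile").attrib["full-path"]
        opf_dir = posixpath.dirname(opf_path)

        # Manifest hrefs are relative to the directory holding the OPF.
        package = ET.fromstring(archive.read(opf_path))
        declared = {
            posixpath.normpath(posixpath.join(opf_dir, unquote(item.attrib["href"])))
            for item in package.iter(f"{OPF_NS}item")
        }

        return sorted(
            name
            for name in archive.namelist()
            if not name.endswith("/")              # directory entries
            and name != "mimetype"                 # assumed exempt
            and not name.startswith("META-INF/")   # assumed exempt
            and name != opf_path                   # the OPF does not list itself
            and name not in declared
        )
```

Calling undeclared_entries("Debian-6.0-debian-handbook-en-US.epub") on the file from the warning above should then include OEBPS/images/etude-cas.dia in its result.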
Comment 1 Jeff Fearn 2013-07-09 03:20:25 EDT
We used to have a feature that removed unreferenced images from the output, and it resulted in a lot of complaints.

So many that we reduced it to:

$ publican print_unused_images
List of unused Image files in en-US

pondering ...
Comment 2 Raphaël Hertzog 2013-07-09 03:32:10 EDT
I don't know what those complaints were… but if they were legitimate, then maybe it means that we need some options?
Comment 3 Jeff Fearn 2013-07-09 03:44:28 EDT
In the specific case of epubs the inclusion is invalid and unused files should be excluded.
Comment 4 Jeff Fearn 2013-07-09 03:45:07 EDT
(In reply to Jeff Fearn from comment #3)
> In the specific case of epubs the inclusion is invalid and unused files
> should be excluded.

In the specific case of epubs the inclusion is invalid and unused images should be excluded.

Comment 5 Jeff Fearn 2013-09-30 02:41:30 EDT
The approach I'm going to take here is to add every file in the OEBPS directory to the manifest. To remove the unused files we'd need to parse every XML, CSS, and JavaScript file. I don't think that is realistic.
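
The "add every file to the manifest" approach can be sketched roughly as follows. This is an illustration in Python under stated assumptions, not publican's actual code: the function name manifest_items, the generated item ids, the small MEDIA_TYPES override table, and the application/octet-stream fallback are all hypothetical.

```python
import mimetypes
import os
from urllib.parse import quote
from xml.sax.saxutils import quoteattr

# Assumed overrides for types that the mimetypes module may not map
# the way an EPUB manifest expects.
MEDIA_TYPES = {
    ".xhtml": "application/xhtml+xml",
    ".ncx": "application/x-dtbncx+xml",
    ".css": "text/css",
}


def manifest_items(oebps_dir, opf_name="content.opf"):
    """Return one <item/> string per file under oebps_dir (hypothetical helper)."""
    entries = []
    for root, dirs, files in os.walk(oebps_dir):
        dirs.sort()  # make the traversal order deterministic
        for name in sorted(files):
            rel = os.path.relpath(os.path.join(root, name), oebps_dir)
            rel = rel.replace(os.sep, "/")
            if rel == opf_name:
                continue  # the OPF file does not list itself
            ext = os.path.splitext(name)[1].lower()
            media_type = (
                MEDIA_TYPES.get(ext)
                or mimetypes.guess_type(name)[0]
                or "application/octet-stream"  # assumed fallback for unknown types
            )
            entries.append((quote(rel), media_type))
    return [
        f"<item id=\"item{n}\" href={quoteattr(href)} media-type={quoteattr(mtype)}/>"
        for n, (href, mtype) in enumerate(entries, start=1)
    ]
```

With this approach an unused file such as images/etude-cas.dia still ships in the archive, but it gets its own item element, so the "not declared in the OPF file" warning no longer applies to it.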
Comment 6 HSS Product Manager 2013-09-30 02:47:40 EDT
HSS-QE has reviewed and declined this request. QE for this bug will be handled by IED.
Comment 7 Jeff Fearn 2013-09-30 20:07:26 EDT
Made the code include all shipped files in the OPF list. Excluding unused content from the output is a more generic problem, as it affects all HTML output.

To ssh://git.fedorahosted.org/git/publican.git
   8711fbe..9d087c5  HEAD -> devel
Comment 8 Ruediger Landmann 2013-10-10 21:12:37 EDT
Unused images still seem to get included by publican-3.9.9-0.fc19.t4.noarch

This images directory contains two images, one used and one not. Both of them get included in the .epub:

$ ls en-US/images/
powertop.png  Sun_Conure_on_perch.jpg

$ publican build --formats epub --langs en-US

$ ls tmp/en-US/epub/OEBPS/images/
powertop.png  Sun_Conure_on_perch.jpg

$ unzip -l tmp/en-US/Red_Hat_Enterprise_Linux-6-Power_Management_Guide-en-US.epub |grep OEBPS/images
   125512  10-11-2013 11:10   OEBPS/images/powertop.png
  3362485  10-11-2013 11:10   OEBPS/images/Sun_Conure_on_perch.jpg
Comment 9 Jeff Fearn 2013-10-10 21:38:01 EDT
To clarify on #7, the fix in this bug is to correctly list all files shipped. A fix for shipping unused files is a much larger issue that covers more than epubs, and it won't be addressed in this bug.
Comment 10 Ruediger Landmann 2013-10-11 23:49:15 EDT
In that case, in an EPUB built with publican-3.9.9-0.fc19.t4.noarch with an unused image:

$ unzip -l tmp/en-US/Red_Hat_Enterprise_Linux-6-Power_Management_Guide-en-US.epub |grep OEBPS/images
   125512  10-12-2013 13:41   OEBPS/images/powertop.png
  3362485  10-12-2013 13:41   OEBPS/images/Sun_Conure_on_perch.jpg

$ grep powertop tmp/en-US/epub/OEBPS/content.opf


$ grep perch tmp/en-US/epub/OEBPS/content.opf

confirm that both of these files are listed in the OPF file; and

$ java -jar epubcheck-3.0.1.jar /home/rlandmann/Documents/books/rhel/Power_Management_Guide/trunk/6-trunk/tmp/en-US/Red_Hat_Enterprise_Linux-6-Power_Management_Guide-en-US.epub

is clean
