Red Hat Bugzilla – Full Text Bug Listing
|Summary:||epub output includes files that are not needed (and not listed in the OPF file)|
|Product:||[Community] Publican||Reporter:||Raphaël Hertzog <raphael>|
|Component:||publican||Assignee:||Jeff Fearn <jfearn>|
|Status:||CLOSED CURRENTRELEASE||QA Contact:||tools-bugs <tools-bugs>|
|Fixed In Version:||4.0.0||Doc Type:||Bug Fix|
|Doc Text:||Story Points:||---|
|Last Closed:||2013-12-18 21:46:37 EST||Type:||Bug|
|oVirt Team:||---||RHEL 7.3 requirements from Atomic Host:|
Description Raphaël Hertzog 2012-11-09 10:20:11 EST
publican's epub output includes all the files in the "images" directory, even those that are not used (for instance, I have .dia files, used to generate some .png files, that end up in the .epub). The same is true for the HTML output, but at least there those files are never downloaded by the end user. In the epub case, since everything is in a single archive, it inflates the size of the file for no good reason.

Furthermore, those files are not listed in the OPF file and thus lead to warnings emitted by epubcheck 3.0-RC1:

WARNING: /home/rhertzog/x/tdah/publish/en-US/Debian/6.0/epub/debian-handbook/Debian-6.0-debian-handbook-en-US.epub: item (OEBPS/images/etude-cas.dia) exists in the zip file, but is not declared in the OPF file

This error has been reproduced with the Debian Handbook:

$ git clone git://anonscm.debian.org/debian-handbook/debian-handbook.git
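The condition behind that epubcheck warning can be approximated with a short script. This is only a sketch, not epubcheck's or Publican's actual logic: the function name `undeclared_files` is hypothetical, and it assumes the archive has a standard META-INF/container.xml pointing at a single OPF file.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET
from urllib.parse import unquote

CONTAINER_NS = "{urn:oasis:names:tc:opendocument:xmlns:container}"
OPF_NS = "{http://www.idpf.org/2007/opf}"


def undeclared_files(epub):
    """Return archive members that the OPF manifest does not declare.

    `epub` is a path or a binary file-like object (hypothetical helper).
    """
    with zipfile.ZipFile(epub) as zf:
        # META-INF/container.xml names the OPF package document.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find(f".//{CONTAINER_NS}rootfile")
        opf_path = rootfile.attrib["full-path"]
        opf_dir = posixpath.dirname(opf_path)

        # Manifest hrefs are relative to the directory holding the OPF file.
        package = ET.fromstring(zf.read(opf_path))
        declared = {
            posixpath.normpath(posixpath.join(opf_dir, unquote(item.attrib["href"])))
            for item in package.iter(f"{OPF_NS}item")
        }

        undeclared = []
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # directory entry, not a file
            if name == "mimetype" or name.startswith("META-INF/") or name == opf_path:
                continue  # container-level files are not manifest items
            if name not in declared:
                undeclared.append(name)
        return sorted(undeclared)
```

Run against an .epub like the one above, this would be expected to report entries such as OEBPS/images/etude-cas.dia, assuming the manifest omits them.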
Comment 1 Jeff Fearn 2013-07-09 03:20:25 EDT
We used to have a feature that removed the non referenced images from the output and it resulted in a lot of complaining. So much so we reduced it to:

$ publican print_unused_images
List of unused Image files in en-US
images/drupal_add_user.png

pondering ...
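For readers without Publican at hand, a rough equivalent of such a report can be sketched as follows. This is not how print_unused_images is implemented; the function name `unused_images` is hypothetical, and the sketch assumes images are referenced only through DocBook `fileref` attributes in the .xml sources, which misses references from CSS or other files.

```python
import os
import re

# Matches fileref="..." or fileref='...' in DocBook sources.
FILEREF = re.compile(r"""fileref\s*=\s*["']([^"']+)["']""")


def unused_images(lang_dir):
    """List files under <lang_dir>/images that no .xml file references."""
    referenced = set()
    for root, _dirs, files in os.walk(lang_dir):
        for fname in files:
            if fname.endswith(".xml"):
                with open(os.path.join(root, fname), encoding="utf-8") as fh:
                    referenced.update(FILEREF.findall(fh.read()))

    unused = []
    images_dir = os.path.join(lang_dir, "images")
    for root, _dirs, files in os.walk(images_dir):
        for fname in files:
            full = os.path.join(root, fname)
            # Compare using the same relative, slash-separated form as fileref.
            rel = os.path.relpath(full, lang_dir).replace(os.sep, "/")
            if rel not in referenced:
                unused.append(rel)
    return sorted(unused)
```

Calling `unused_images("en-US")` from the book's top-level directory would return paths such as images/drupal_add_user.png if nothing references them.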
Comment 2 Raphaël Hertzog 2013-07-09 03:32:10 EDT
I don't know what those complaints were… but if they were legitimate, then maybe it means that we need some options?
Comment 3 Jeff Fearn 2013-07-09 03:44:28 EDT
In the specific case of epubs the inclusion is invalid and unused files should be excluded.
Comment 4 Jeff Fearn 2013-07-09 03:45:07 EDT
(In reply to Jeff Fearn from comment #3)
> In the specific case of epubs the inclusion is invalid and unused files
> should be excluded.

In the specific case of epubs the inclusion is invalid and unused images should be excluded. FTFM
Comment 5 Jeff Fearn 2013-09-30 02:41:30 EDT
The approach I'm going to take here is to add every file in the OEBPS directory to the manifest. To remove the unused files instead, we'd need to parse every XML, CSS, and JavaScript file to find out what is referenced. I don't think that is realistic.
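The idea of listing every file can be illustrated with a small script. This is a sketch only: Publican itself is not written in Python, the function name `manifest_items` and the generated `id` values are hypothetical, and the media types are guessed from file extensions, which a real implementation may handle differently.

```python
import mimetypes
import os
from xml.sax.saxutils import quoteattr

# Extensions the standard mimetypes table may not know (assumed mapping).
EXTRA_TYPES = {
    ".ncx": "application/x-dtbncx+xml",
    ".xhtml": "application/xhtml+xml",
}


def manifest_items(oebps_dir, opf_name="content.opf"):
    """Build one <item> element per file found under oebps_dir."""
    items = []
    for root, dirs, files in os.walk(oebps_dir):
        dirs.sort()  # make the walk order deterministic
        for fname in sorted(files):
            full = os.path.join(root, fname)
            rel = os.path.relpath(full, oebps_dir).replace(os.sep, "/")
            if rel == opf_name:
                continue  # the OPF file does not list itself
            ext = os.path.splitext(fname)[1].lower()
            media_type = (
                EXTRA_TYPES.get(ext)
                or mimetypes.guess_type(fname)[0]
                or "application/octet-stream"
            )
            items.append(
                f'<item id="item{len(items) + 1}" '
                f"href={quoteattr(rel)} media-type={quoteattr(media_type)}/>"
            )
    return items
```

With this approach, unused files still ship in the archive, but each one is declared, so a check for undeclared archive members no longer has anything to report.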
Comment 6 HSS Product Manager 2013-09-30 02:47:40 EDT
HSS-QE has reviewed and declined this request. QE for this bug will be handled by IED.
Comment 7 Jeff Fearn 2013-09-30 20:07:26 EDT
Made the code include all files in the manifest list. Excluding unused content from the output is a more generic problem, as it affects all HTML output.

To ssh://git.fedorahosted.org/git/publican.git
8711fbe..9d087c5 HEAD -> devel
Comment 8 Ruediger Landmann 2013-10-10 21:12:37 EDT
Unused images still seem to get included by publican-3.9.9-0.fc19.t4.noarch.

This images directory contains two images, one used and one not. Both of them get included in the .epub:

$ ls en-US/images/
powertop.png Sun_Conure_on_perch.jpg

$ publican build --formats epub --langs en-US

$ ls tmp/en-US/epub/OEBPS/images/
powertop.png Sun_Conure_on_perch.jpg

$ unzip -l tmp/en-US/Red_Hat_Enterprise_Linux-6-Power_Management_Guide-en-US.epub |grep OEBPS/images
125512 10-11-2013 11:10 OEBPS/images/powertop.png
3362485 10-11-2013 11:10 OEBPS/images/Sun_Conure_on_perch.jpg
Comment 9 Jeff Fearn 2013-10-10 21:38:01 EDT
To clarify on #7, the fix in this bug is to correctly list all files shipped. A fix for shipping unused files is a much larger issue that covers more than epubs, and won't be addressed in this bug.
Comment 10 Ruediger Landmann 2013-10-11 23:49:15 EDT
In that case, in an EPUB built with publican-3.9.9-0.fc19.t4.noarch with an unused image:

$ unzip -l tmp/en-US/Red_Hat_Enterprise_Linux-6-Power_Management_Guide-en-US.epub |grep OEBPS/images
125512 10-12-2013 13:41 OEBPS/images/powertop.png
3362485 10-12-2013 13:41 OEBPS/images/Sun_Conure_on_perch.jpg

$ grep powertop tmp/en-US/epub/OEBPS/content.opf

and

$ grep perch tmp/en-US/epub/OEBPS/content.opf

confirm that both these files are listed in the OPF file; and

$ java -jar epubcheck-3.0.1.jar /home/rlandmann/Documents/books/rhel/Power_Management_Guide/trunk/6-trunk/tmp/en-US/Red_Hat_Enterprise_Linux-6-Power_Management_Guide-en-US.epub

is clean.