Bug 701667

Summary: Improve quality of Publican's epub generation
Product: [Community] Publican Reporter: William Cohen <wcohen>
Component: publican Assignee: Jeff Fearn 🐞 <jfearn>
Status: CLOSED CURRENTRELEASE QA Contact: Ruediger Landmann <rlandman+disabled>
Severity: low Docs Contact:
Priority: unspecified    
Version: 2.5 CC: mmcallis, publican-list, raphael, rlandman, rnewton
Target Milestone: 3.0   
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: 3.0.0 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-10-31 03:10:53 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description                      Flags
Publican generated epub file     none
Output of epubcheck program      none
fixed EPUB for review            none

Description William Cohen 2011-05-03 14:09:22 UTC
Description of problem:

The epub documents generated by publican could be improved. They are very slow to open and page through on Barnes & Noble classic Nook. 

Also, the document details shown on the Nook give very little useful information about the publication compared to other epub documents: no Author, Publisher, or Publication Date.

There are a couple of epub checkers on the web, and they report many errors for the generated document. Below are the URLs for the validators used to check the epub document:

http://code.google.com/p/epubcheck/
http://www.threepress.org/document/epub-validate/



Version-Release number of selected component (if applicable):

Used a RHEL-6 machine to work around bz698012 with:

publican-2.5-1.el6.x86_64
publican-doc-2.5-1.el6.x86_64
publican-fedora-1.7-0.el5.noarch


How reproducible:

Every time


Steps to Reproduce:


Download systemtap srpm from:
 http://koji.fedoraproject.org/koji/buildinfo?buildID=215008
rpm -Uvh systemtap-1.4-2.fc14.src.rpm

As root:
yum install "publican*" -y
yum-builddep systemtap-1.4-2.fc14.src.rpm 

cd ~/rpmbuild/SPECS
rpmbuild -ba systemtap.spec 

cd ~/rpmbuild/BUILD/systemtap-1.4/doc/beginners
publican build --formats=epub --langs=en-US

Check the generated epub document with http://code.google.com/p/epubcheck/:


cd build/en-US/
java -jar /tmp/epubcheck-1.2.jar Systemtap-1.4-SystemTap_Beginners_Guide-en-US.epub 


  
Actual results:

epubcheck reports many errors
Slow, hard to navigate epub document generated.


Expected results:

epubcheck doesn't report any errors.
Speedy, easy to navigate epub document generated.


Additional info:

Comment 1 William Cohen 2011-05-03 14:10:50 UTC
Created attachment 496535 [details]
Publican generated epub file

Comment 2 William Cohen 2011-05-03 14:12:18 UTC
Created attachment 496536 [details]
Output of epubcheck program

Comment 3 William Cohen 2011-05-12 05:16:35 UTC
The errors can be grouped into:
  1) "exists in the zip file, but is not declared in the OPF file"
     Common_Content and some picture files are not included.
  2) "fragment identifier is not defined in ..."
  3) "could not parse OEBPS/...: duplicate id: ..." in ix01.html and toc.ncx
     duplicate id for h1 tagged elements
  4) the ix01.html file is very hard to read; there are virtually no newlines in the file
  5) Some problems with the OEBPS/Common_Content/images/title_logo.svg file
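A rough way to reproduce error group 1 without epubcheck is to compare the zip entries against the OPF manifest. The sketch below is a hypothetical helper (undeclared_files is not part of Publican or epubcheck); it assumes an EPUB 2 container whose META-INF/container.xml names the OPF file, and it skips mimetype and META-INF/*, which are never listed in the manifest.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

CONTAINER_NS = "{urn:oasis:names:tc:opendocument:xmlns:container}"
OPF_NS = "{http://www.idpf.org/2007/opf}"


def find_opf(zf):
    """Return the OPF path named by META-INF/container.xml."""
    root = ET.fromstring(zf.read("META-INF/container.xml"))
    rootfile = root.find(f"{CONTAINER_NS}rootfiles/{CONTAINER_NS}rootfile")
    return rootfile.get("full-path")


def undeclared_files(epub):
    """Return zip entries that the OPF manifest does not declare.

    `epub` may be a path or a binary file object.
    """
    with zipfile.ZipFile(epub) as zf:
        opf_path = find_opf(zf)
        opf_dir = posixpath.dirname(opf_path)
        manifest = ET.fromstring(zf.read(opf_path)).find(f"{OPF_NS}manifest")
        # Manifest hrefs are relative to the OPF file's directory.
        declared = {
            posixpath.normpath(posixpath.join(opf_dir, item.get("href")))
            for item in manifest.findall(f"{OPF_NS}item")
        }
        declared.add(opf_path)  # the OPF file never lists itself
        return sorted(
            name
            for name in zf.namelist()
            if not name.endswith("/")  # directory entries
            and name != "mimetype"  # required by OCF, never in the manifest
            and not name.startswith("META-INF/")
            and name not in declared
        )
```

For example, undeclared_files("Systemtap-1.4-SystemTap_Beginners_Guide-en-US.epub") would list files such as the Common_Content images that the manifest misses.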

Comment 4 Jeff Fearn 🐞 2012-03-13 07:12:31 UTC
Fixed some errors. Some error messages appear to be invalid; I need to verify that the tool is validating correctly.

More fixes are required; it looks like the XSL that builds the file list isn't recursive "sometimes" :(

Pushed To ssh://git.fedorahosted.org/git/publican.git
   a033b42..28f73d8  master -> master

Comment 5 Jeff Fearn 🐞 2012-03-14 03:51:06 UTC
PUG now builds to epub with no validation errors.

$ java -jar ~/Downloads/epubcheck-3.0b2.jar build/en-US/Publican-3.0-Users_Guide-en-US.epub
Epubcheck Version 3.0b2

No errors or warnings detected.

I don't have an ebook reader so I have no idea if this broke anything :(

Pushed To ssh://git.fedorahosted.org/git/publican.git
   4700cf5..403d79f  master -> master

Comment 6 William Cohen 2012-03-14 12:59:52 UTC
Jeff,

If you have an epub document generated with the new version of Publican, I can try it out this evening on my Nook Simple Touch.

Comment 7 Jeff Fearn 🐞 2012-03-15 02:22:26 UTC
Created attachment 570143 [details]
fixed EPUB for review

Comment 8 William Cohen 2012-03-15 16:04:19 UTC
The newly generated Publican-3.0-Users_Guide-en-US.epub seems to be much better behaved on the Nook Simple Touch reader than the old Publican-2.6-Users_Guide-en-US.epub from http://jfearn.fedorapeople.org/en-US/Publican/2.6/html/Users_Guide/ . The Publican-3.0-Users_Guide-en-US.epub starts up in a reasonable amount of time, the pages turn quickly, and the table of contents is reasonably quick.

The test results from epub check for the newly generated file look a lot better than old checks:

$ java -jar /tmp/epubcheck-3.0b4.jar /tmp/Publican-3.0-Users_Guide-en-US.epub 
Epubcheck Version 3.0b4

Validating against EPUB version 2.0
ERROR: /tmp/Publican-3.0-Users_Guide-en-US.epub/OEBPS/Common_Content/css/default.css: 'OEBPS/Common_Content/css/overrides.css': referenced resource missing in the package
ERROR: /tmp/Publican-3.0-Users_Guide-en-US.epub/OEBPS/Common_Content/css/default.css: 'OEBPS/Common_Content/css/lang.css': referenced resource missing in the package
ERROR: /tmp/Publican-3.0-Users_Guide-en-US.epub/OEBPS/Common_Content/css/print.css: 'OEBPS/Common_Content/css/overrides.css': referenced resource missing in the package
ERROR: /tmp/Publican-3.0-Users_Guide-en-US.epub/OEBPS/Common_Content/css/print.css: 'OEBPS/Common_Content/css/lang.css': referenced resource missing in the package

Check finished with warnings or errors!
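All four errors come from stylesheets that pull in overrides.css and lang.css, which are not in the package. Assuming those references are CSS @import rules (an assumption; the log does not show the CSS source), a hypothetical helper like missing_css_imports below lists every @import target that is missing from the zip, which is a quick way to confirm a fix before re-running epubcheck.

```python
import posixpath
import re
import zipfile

# Matches @import "x.css", @import 'x.css', @import url(x.css) and url("x.css").
IMPORT_RE = re.compile(r"""@import\s+(?:url\(\s*)?["']?([^"')\s;]+)""")


def missing_css_imports(epub):
    """Return (stylesheet, target) pairs for @import targets absent from the zip."""
    missing = []
    with zipfile.ZipFile(epub) as zf:
        names = set(zf.namelist())
        for css in (n for n in sorted(names) if n.endswith(".css")):
            text = zf.read(css).decode("utf-8", errors="replace")
            for ref in IMPORT_RE.findall(text):
                # Resolve the import relative to the importing stylesheet.
                target = posixpath.normpath(
                    posixpath.join(posixpath.dirname(css), ref))
                if target not in names:
                    missing.append((css, target))
    return missing
```

It only inspects @import rules; url() references inside ordinary properties (background images, fonts) are not checked.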

Comment 9 Rebecca Newton 2012-05-04 04:17:45 UTC
I get errors too, with both epubcheck and validator.idpf, the same as above comment. It does look prettier in Calibre though.

Comment 10 Rebecca Newton 2012-05-04 04:37:11 UTC
And here's an update: checked with Rudi who suggested it was a brand issue. Made a brand spanking new book and tried again; errors are persistent. 


ERROR: Documentation-0.1-test-for-epub-en-US.epub/OEBPS/Common_Content/css/default.css: 'OEBPS/Common_Content/css/overrides.css': referenced resource missing in the package
ERROR: Documentation-0.1-test-for-epub-en-US.epub/OEBPS/Common_Content/css/default.css: 'OEBPS/Common_Content/css/lang.css': referenced resource missing in the package
ERROR: Documentation-0.1-test-for-epub-en-US.epub/OEBPS/Common_Content/css/print.css: 'OEBPS/Common_Content/css/overrides.css': referenced resource missing in the package
ERROR: Documentation-0.1-test-for-epub-en-US.epub/OEBPS/Common_Content/css/print.css: 'OEBPS/Common_Content/css/lang.css': referenced resource missing in the package

Check finished with warnings or errors

I tried it with and without brand commented out. Any really obvious steps I'm missing? Setting back to Assigned for now.

Comment 11 Raphaël Hertzog 2012-05-04 06:03:25 UTC
With another document of mine, I also got lots of errors on the generated epub. 

Many were due to the fact that publican has a much too restrictive implementation of the "anchor" XSL template. Upstream XSL generates lots of <a name=""> but Publican drops this and forwards the id attribute but only on some specific elements.

We really need to not lose a single identifier that can be used as a target for links. And at the very least all those on chapters/sections/sidebars/admonitions.

I also had a problem where some files were listed twice in the manifest.
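Duplicate manifest entries are easy to spot by counting the href values in the OPF file. A minimal sketch, assuming the standard OPF 2.0 namespace; duplicate_manifest_hrefs is a hypothetical name, not a Publican function.

```python
from collections import Counter
import xml.etree.ElementTree as ET

OPF_NS = "{http://www.idpf.org/2007/opf}"


def duplicate_manifest_hrefs(opf_xml):
    """Return hrefs that appear on more than one manifest <item>."""
    manifest = ET.fromstring(opf_xml).find(f"{OPF_NS}manifest")
    counts = Counter(item.get("href") for item in manifest.findall(f"{OPF_NS}item"))
    return sorted(href for href, n in counts.items() if n > 1)
```

A generator could avoid the problem in the first place by deduplicating on href before writing the manifest.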

Comment 12 Jeff Fearn 🐞 2012-06-24 10:15:11 UTC
Fixed css warnings.

To ssh://git.fedorahosted.org/git/publican.git
   bcedf1e..d123942  master -> master

We won't be fixing the ID issue as part of this bug; it's not specifically an epub issue, and we need to fix the individual tags that lead to that error. Unfortunately, upstream is happy generating HTML that is invalid; I'd be happy to do that too, but it's not what's been decided at this point.

Comment 13 Raphaël Hertzog 2012-06-24 14:59:28 UTC
(In reply to comment #12)
> We won't be fixing the ID issue as part of this bug; it's not specifically
> an epub issue, and we need to fix the individual tags that lead to that
> error. Unfortunately, upstream is happy generating HTML that is invalid; I'd
> be happy to do that too, but it's not what's been decided at this point.

Has this been submitted upstream at least? I would gladly argue for a fix on the upstream side if there were an initial report that I could support.

Comment 14 Ruediger Landmann 2012-07-11 07:06:57 UTC
Verified that the CSS issues are fixed in build t207; the remaining errors relate to IDs and are out-of-scope for now. 

$ java -jar epubcheck-3.0b5.jar ~/Documents/books/rhel/Power_Management_Guide/releases/6.0/tmp/en-US/Red_Hat_Enterprise_Linux-6-Power_Management_Guide-en-US.epub 
Epubcheck Version 3.0b5

Validating against EPUB version 2.0
ERROR: /home/rlandmann/Documents/books/rhel/Power_Management_Guide/releases/6.0/tmp/en-US/Red_Hat_Enterprise_Linux-6-Power_Management_Guide-en-US.epub/OEBPS/PowerTOP.html(12,683): 'fig-PowerTOP': fragment identifier is not defined in 'OEBPS/PowerTOP.html'
ERROR: /home/rlandmann/Documents/books/rhel/Power_Management_Guide/releases/6.0/tmp/en-US/Red_Hat_Enterprise_Linux-6-Power_Management_Guide-en-US.epub/OEBPS/PowerTOP.html(27,484): 'fig-PowerTOP': fragment identifier is not defined in 'OEBPS/PowerTOP.html'
ERROR: /home/rlandmann/Documents/books/rhel/Power_Management_Guide/releases/6.0/tmp/en-US/Red_Hat_Enterprise_Linux-6-Power_Management_Guide-en-US.epub/OEBPS/cpufreq_governors.html(30,142): 'enabling_a_cpufreq_governor': fragment identifier is not defined in 'OEBPS/cpufreq_governors.html'
ERROR: /home/rlandmann/Documents/books/rhel/Power_Management_Guide/releases/6.0/tmp/en-US/Red_Hat_Enterprise_Linux-6-Power_Management_Guide-en-US.epub/OEBPS/cpufreq_governors.html(60,273): 'enabling_a_cpufreq_governor': fragment identifier is not defined in 'OEBPS/cpufreq_governors.html'
ERROR: /home/rlandmann/Documents/books/rhel/Power_Management_Guide/releases/6.0/tmp/en-US/Red_Hat_Enterprise_Linux-6-Power_Management_Guide-en-US.epub/OEBPS/cpufreq_governors.html(62,419): 'enabling_a_cpufreq_governor': fragment identifier is not defined in 'OEBPS/cpufreq_governors.html'

Check finished with warnings or errors
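The remaining errors are links whose fragment (for example #fig-PowerTOP) does not match any id in the target file. The sketch below approximates that check with Python's html.parser; undefined_fragments is a hypothetical helper that takes already-extracted XHTML text keyed by package path, counts only id attributes as link targets, and ignores external URLs and links to files it was not given. It is not a replacement for epubcheck.

```python
import posixpath
from html.parser import HTMLParser


class _Collector(HTMLParser):
    """Collects id targets and <a href> links from one (X)HTML document."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("id"):
            self.ids.add(attrs["id"])
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])


def undefined_fragments(docs):
    """docs maps package paths (e.g. 'OEBPS/PowerTOP.html') to XHTML text.

    Returns (source, target, fragment) for every link to a missing id.
    """
    parsed = {}
    for path, text in docs.items():
        collector = _Collector()
        collector.feed(text)
        collector.close()
        parsed[path] = collector
    errors = []
    for src, collector in sorted(parsed.items()):
        for href in collector.links:
            if "#" not in href or "://" in href:
                continue  # no fragment, or an external link
            file_part, frag = href.split("#", 1)
            target = (posixpath.normpath(
                posixpath.join(posixpath.dirname(src), file_part))
                if file_part else src)
            if target in parsed and frag not in parsed[target].ids:
                errors.append((src, target, frag))
    return errors
```

Run against the Power Management Guide, a check like this should flag the same fig-PowerTOP and enabling_a_cpufreq_governor links, which points at the elements whose ids are being dropped.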