Bug 589333

Summary: Publican doesn't follow standards for ePub files
Product: [Community] Publican Reporter: Jared Smith <jsmith.fedora>
Component: publicanAssignee: Jeff Fearn <jfearn>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: low Docs Contact:
Priority: low    
Version: 1.6CC: jfearn, mmcallis, publican-list, r.landmann
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: 1.6.3 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-06-21 23:13:55 EDT Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Description Jared Smith 2010-05-05 17:20:04 EDT
Description of problem:

When Publican builds the zip file for an ePub file, it does so in a way that is not compliant with the ePub standard.

Version-Release number of selected component (if applicable):

publican-1.6.2-0.fc13.noarch

How reproducible:

Every time

Steps to Reproduce:
1. pushd /tmp/
2. publican create --name testepub
3. publican build -f epub -l all
4. Run resulting epub against the online ePub validator at http://threepress.org/document/epub-validate/, or download epubcheck from http://code.google.com/p/epubcheck/ and run manually.
  
Actual results:

The "mimetype" file is not the first file in the Zip archive, and is most likely compressed.

Expected results:

The "mimetype" file should be the very first file in the Zip archive, and should not be compressed.

Additional info:

From the Linux command line, the proper way to create the Zip archive would be:

zip -q0X  testepub.epub mimetype
zip -qXr9D  testepub.epub *

I obviously don't know much about Perl's Archive::Zip module, and whether or not it will handle the above requirements.  It appears at first glance, however, that it would simply require something like:

        $member = $zip->addFile( "$tmp_dir/$lang/$format/mimetype" );
        $member->desiredCompressionMethod( COMPRESSION_STORED );

immedately after creating the zip archive, and then excluding the "mimetype" file when you call:

        my @filelist = File::Find::Rule->file->in(".");
        foreach my $file (@filelist) {
            $member = $zip->addFile($file);
        }

Thoughts?  Am I going about this the wrong way?
Comment 1 Jeff Fearn 2010-05-05 20:26:11 EDT
Hi, pretty close! I did it this way since it made it clear that mimetype is being treated special and it allows the remaining content to be compressed.

@@ -820,17 +820,18 @@
         $dir = pushd("$tmp_dir/$lang/$format");
 
         my $zip    = Archive::Zip->new();
+
+        my $mimetype = $zip->addFile( "mimetype" );
+        $mimetype->desiredCompressionMethod( COMPRESSION_STORED );
+
         my $member = $zip->addDirectory("OEBPS/");
         $member = $zip->addDirectory("META-INF/");
 
-##     $member = $zip->addFile( "$tmp_dir/$lang/$format/mimetype" );
-
-        my @filelist = File::Find::Rule->file->in(".");
+        my @filelist = File::Find::Rule->file->not_name('mimetype')->in(".");
         foreach my $file (@filelist) {
             $member = $zip->addFile($file);
         }

Also I had to make a few other changes to get rid of some other errors:

A: Changed paths to the CSS and common images since the validation tool chokes on './foo' so it has to be 'foo'.

B: The default xsl spams xml:lang and lang a lot, which is often invalid. So I copied it to our epub.xsl and commented it out since it's not really useful anyway.

C: xsl:template name="opf.manifest" doesn't handle print CSS, so I copied that to our epub.xsl and added xsl:if test="$html.stylesheet.print != ''". This should probably go upstream.

D: Modified xsl:template name="body.attributes" so if class is null it doesn't get set, instead of outputting class="".

E: The validator still complains about 'label' tag, but this is a valid tag, so I'm ignoring it.
Comment 2 Ruediger Landmann 2010-05-06 02:31:05 EDT
Errors noted in the original report no longer visible in EPUBs built with 1.6.3.t150

However, <preface>s and <appendix>es lead to "duplicate id" errors:

Fedora-13-Burning_ISO_images_to_disc-en-US.epub: could not parse OEBPS/appe-Burning_ISO_images_to_disc-Revision_History.html: duplicate id: appe-Burning_ISO_images_to_disc-Revision_History

And books seem to get a lot of "fragment identifier is not defined" errors:

Fedora-13-User_Guide-en-US.epub/OEBPS/toc.ncx(37): 'id377438': fragment identifier is not defined in 'OEBPS/pref-User_Guide-Preface.html'

Fedora-13-User_Guide-en-US.epub/OEBPS/toc.ncx(43): 'id542324': fragment identifier is not defined in 'OEBPS/pref-User_Guide-Preface.html' 

Finally, it looks like we'll have to look closely at the contents of the SVGs used in brands if we want EPUBs to validate, because these show a wide variety of errors.
Comment 3 Jeff Fearn 2010-05-10 02:20:37 EDT
(In reply to comment #2)
> Errors noted in the original report no longer visible in EPUBs built with
> 1.6.3.t150
> 
> However, <preface>s and <appendix>es lead to "duplicate id" errors:
> 
> Fedora-13-Burning_ISO_images_to_disc-en-US.epub: could not parse
> OEBPS/appe-Burning_ISO_images_to_disc-Revision_History.html: duplicate id:
> appe-Burning_ISO_images_to_disc-Revision_History

FIXED.

> And books seem to get a lot of "fragment identifier is not defined" errors:
> 
> Fedora-13-User_Guide-en-US.epub/OEBPS/toc.ncx(37): 'id377438': fragment
> identifier is not defined in 'OEBPS/pref-User_Guide-Preface.html'
> 
> Fedora-13-User_Guide-en-US.epub/OEBPS/toc.ncx(43): 'id542324': fragment
> identifier is not defined in 'OEBPS/pref-User_Guide-Preface.html' 

FIXED
 
> Finally, it looks like we'll have to look closely at the contents of the SVGs
> used in brands if we want EPUBs to validate, because these show a wide variety
> of errors.    

This error is wrong IMHO , the text in the svg is:

   version="1.0"

Which is correct according to:

http://www.w3.org/TR/2002/WD-SVG11-20020108/#version-att
Comment 4 Ruediger Landmann 2010-05-11 00:50:07 EDT
Yeah, "version" should be fine in SVGs according to the spec. Duplicate IDs and "fragment identifier not defined" errors verified fixed in version 1.6.2.t169