Bug 648126

Summary: Non-ASCII characters in HTML filenames damaged when copying to publish directory
Product: [Community] Publican
Reporter: Jesús Franco <jefrancomix>
Component: publican
Assignee: Jeff Fearn 🐞 <jfearn>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Docs Contact:
Priority: low
Version: 2.3
CC: eric, guillermo.gomez, jfearn, kwade, mmcallis, publican-list, rlandman
Target Milestone: ---
Target Release: ---
Hardware: noarch
OS: Unspecified
Fixed In Version: 2.4
Doc Type: Bug Fix
Last Closed: 2010-12-08 03:49:26 UTC
Type: ---

Description Jesús Franco 2010-10-31 06:42:47 UTC
Description of problem:

Some of the in-book navigation links (to jump to the next chapter) take me to the home page of docs.fp.o.

Version-Release number of selected component (if applicable):

Edition 1.0 (Spanish published version HTML navigable).

How reproducible:

Just go to the last page of a chapter, and then on to the next chapter. Even though the links look OK in the status bar before clicking on them, the browser takes me to docs.fp.o.

Steps to Reproduce:
1. Go to http://docs.fedoraproject.org/es-ES/Fedora/14/html/Software_Management_Guide/ch07s07s02.html

2. Look at the status bar before clicking on "Next"; it should show http://docs.fedoraproject.org/es-ES/Fedora/14/html/Software_Management_Guide/Gesti%C3%B3n_avanzada_de_repositorios_yum.html

3. Click on it and see what happens.

Actual results:

Jumps to docs.fp.o

Expected results:

To continue reading the book ;)

Additional info:

I get this error in Chrome on Windows and Chromium on Fedora 13. I have not tested other browsers.

Comment 1 eric@christensenplace.us 2010-10-31 20:12:57 UTC
Thanks for the catch.  I'm not sure why this guide has actually been published, though, as it isn't ready for prime time.

Reassigning to Rudi to troubleshoot Publican.

Comment 2 Ruediger Landmann 2010-11-01 00:07:19 UTC
Thanks Jesús and Eric -- the problem here comes from the non-ASCII characters in the names of some of the HTML files. 

Publican uses the ID attribute on a section or chapter to generate the filename for the HTML page for that section or chapter, and for the URLs of any pages that link to it.

In this case, because the XML is written in Spanish, the IDs have non-ASCII characters in them, and these pages are not getting served correctly.

So in the XML of this book[0] we have:

<chapter id="Gestión_avanzada_de_repositorios_yum">

The HTML page for this chapter is therefore named "Gestión_avanzada_de_repositorios_yum.html", which Publican creates correctly, but somewhere between Publican and the docs server the name is getting turned into "GestiÃ³n_avanzada_de_repositorios_yum.html", which is nonsense, but is accessible.[1]
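
The mangled URL in [1] is consistent with the UTF-8 bytes of the file name being read as Latin-1 and then encoded as UTF-8 a second time. The following Python sketch reproduces that byte pattern. It only illustrates the symptom: it is not Publican's code (Publican is written in Perl), and the Latin-1 misreading is an assumption about the cause.

```python
from urllib.parse import quote

name = "Gestión_avanzada_de_repositorios_yum.html"

# What the link should point at: the UTF-8 bytes of the name, percent-encoded.
correct_url_name = quote(name)

# Assumed failure mode: the UTF-8 bytes are misread as Latin-1 text,
# and that text is then encoded as UTF-8 again (double encoding).
mangled = name.encode("utf-8").decode("latin-1")
mangled_url_name = quote(mangled)

print(correct_url_name)
print(mangled)
print(mangled_url_name)
```

The first percent-encoded name matches the "Next" link from the original report, and the last one matches the file name in the accessible URL [1].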

Short term, the workaround is to remove all accented characters from the IDs of sections and chapters in that book. For example, change

<chapter id="Gestión_avanzada_de_repositorios_yum">

to

<chapter id="Gestion_avanzada_de_repositorios_yum">
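
For a book with many IDs, the accent stripping could be scripted rather than done by hand. This is a minimal Python sketch, assuming that dropping Unicode combining marks is an acceptable way to derive the ASCII ID; the helper names ascii_id and needs_attention are hypothetical and not part of Publican.

```python
import unicodedata

def ascii_id(xml_id: str) -> str:
    """Strip accents from an ID by dropping Unicode combining marks."""
    decomposed = unicodedata.normalize("NFKD", xml_id)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def needs_attention(xml_id: str) -> bool:
    """True if the ID still contains non-ASCII characters after stripping."""
    return not ascii_id(xml_id).isascii()

print(ascii_id("Gestión_avanzada_de_repositorios_yum"))
```

Characters with no decomposition (for example "ß") are left untouched, so needs_attention flags IDs that still have to be fixed manually. Any cross-references that point at a renamed ID would need the same change.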

Long term, maybe Publican should use percent encoding for HTML filenames with non-ASCII characters in them? I'll change this bug to an RFE to that effect and reassign to Jeff for comment.
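
One way to picture the percent-encoding idea is the Python sketch below. The scheme is hypothetical (nothing in this bug says Publican adopted it): the file on disk gets an ASCII-only, percent-encoded name, and links to it escape the literal "%" signs once more.

```python
from urllib.parse import quote, unquote

chapter_id = "Gestión_avanzada_de_repositorios_yum"

# Hypothetical scheme: the name stored on disk is the percent-encoded form,
# so it contains only ASCII characters.
disk_name = quote(chapter_id) + ".html"

# A link to that file has to escape the literal "%" signs again; otherwise a
# server would typically decode %C3%B3 and look for the unencoded name.
href = quote(disk_name)

print(disk_name)
print(href)
```

The trade-off in this design is that file names on disk become harder to read, and every generated link has to be encoded exactly once more than the file name.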


[0] http://git.fedorahosted.org/git/docs/software-management-guide.git (branch "rebase")

[1] http://docs.fedoraproject.org/es-ES/Fedora/14/html/Software_Management_Guide/Gesti%C3%83%C2%B3n_avanzada_de_repositorios_yum.html

Comment 3 Ruediger Landmann 2010-11-03 23:14:33 UTC
*** Bug 649422 has been marked as a duplicate of this bug. ***

Comment 4 Ruediger Landmann 2010-11-04 04:41:10 UTC
So the problem seems to arise when Publican copies the HTML files from the tmp directory to the publish directory.

When I build the book with "publican build --publish", I get:




Comment 5 Jeff Fearn 🐞 2010-11-23 03:31:22 UTC

I have been looking into this and it's quite odd. If you add an ID with a UTF-8 character to any Publican book, then publish the HTML, the file name is broken as described.

However, if you run the command publican uses to copy the files, on the command line, it works properly!


$ perl -e 'use File::Copy::Recursive qw(rcopy);rcopy("tmp/en-US/html", "test");'

The contents of test/ are correct!

I'll continue to debug.
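
A check like the one above can be automated by comparing the raw byte file names before and after a recursive copy. The Python sketch below is a stand-in for the Perl one-liner, not a reproduction of Publican's copy step: shutil.copytree is swapped in for rcopy, and the function name names_survive_copy is hypothetical.

```python
import os
import shutil
import tempfile

def names_survive_copy(filename: str) -> bool:
    """Copy a directory holding `filename` and compare raw byte names."""
    with tempfile.TemporaryDirectory() as work:
        src = os.path.join(work, "src")
        dst = os.path.join(work, "dst")
        os.mkdir(src)
        with open(os.path.join(src, filename), "w", encoding="utf-8") as fh:
            fh.write("<html></html>\n")
        shutil.copytree(src, dst)
        # Listing with a bytes path returns the names as raw bytes,
        # so any re-encoding during the copy shows up as a mismatch.
        before = sorted(os.listdir(os.fsencode(src)))
        after = sorted(os.listdir(os.fsencode(dst)))
        return before == after

print(names_survive_copy("Gestión_avanzada_de_repositorios_yum.html"))
```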

Comment 6 Jeff Fearn 🐞 2010-11-23 05:48:56 UTC
Hi, I have a workaround in place for this issue.

Fixed in build: 2.3-0%{?dist}.t49

Comment 7 Jeff Fearn 🐞 2010-12-08 03:49:26 UTC
Publican 2.4 has shipped with a fix for this issue.

Comment 8 Jesús Franco 2010-12-08 21:52:09 UTC
(In reply to comment #7)
> Publican 2.4 has shipped with a fix for this issue.

Thanks for the fix. I'm not able to test it, but I hope the guide owner republishes the guide ASAP so we can see whether we can successfully read through it now.