Bug 648126 - Non-ASCII characters in HTML filenames damaged when copying to publish directory
Status: CLOSED CURRENTRELEASE
Product: Publican
Classification: Community
Component: publican
Version: 2.3
Hardware: noarch Unspecified
Priority: low Severity: medium
Assigned To: Jeff Fearn
QA Contact: Fedora Extras Quality Assurance
Duplicates: 649422
Reported: 2010-10-31 02:42 EDT by Jesús Franco
Modified: 2010-12-08 16:52 EST
Fixed In Version: 2.4
Doc Type: Bug Fix
Last Closed: 2010-12-07 22:49:26 EST
Description Jesús Franco 2010-10-31 02:42:47 EDT
Description of problem:

Some of the internal navigation links (to jump to the next chapter) take me to the home page of docs.fp.o

Version-Release number of selected component (if applicable):

Edition 1.0 (published Spanish version, navigable HTML).

How reproducible:

Go to the last page of a chapter, then jump to the next chapter. Even though the link looks OK in the status bar before clicking it, the browser takes me to docs.fp.o

Steps to Reproduce:
1. Go to http://docs.fedoraproject.org/es-ES/Fedora/14/html/Software_Management_Guide/ch07s07s02.html

2. Look at the status bar before clicking on "Next"; it should show http://docs.fedoraproject.org/es-ES/Fedora/14/html/Software_Management_Guide/Gesti%C3%B3n_avanzada_de_repositorios_yum.html

3. Click on it and see what happens
  
Actual results:

Jumps to docs.fp.o

Expected results:

The browser continues to the next chapter of the book ;)

Additional info:

I get this error in Chrome on Windows and Chromium on Fedora 13. I have not tested other browsers.
Comment 1 eric@christensenplace.us 2010-10-31 16:12:57 EDT
Thanks for the catch.  I'm not sure why this guide has actually been published, though, as it isn't ready for prime time.

Reassigning to Rudi to troubleshoot Publican.
Comment 2 Ruediger Landmann 2010-10-31 20:07:19 EDT
Thanks Jesús and Eric -- the problem here comes from the non-ASCII characters in the names of some of the HTML files. 

Publican uses the ID attribute on a section or chapter to generate the filename for the HTML page for that section or chapter, and for the URLs of any pages that link to it.

In this case, because the XML is written in Spanish, the IDs have non-ASCII characters in them, and these pages are not getting served correctly.

So in the XML of this book[0] we have:

<chapter id="Gestión_avanzada_de_repositorios_yum">

The HTML page for this chapter is therefore named "Gestión_avanzada_de_repositorios_yum.html", which Publican creates correctly, but somewhere between Publican and the docs server the name is getting turned into "GestiÃ³n_avanzada_de_repositorios_yum.html", which is nonsense, but is accessible.[1]

Short term, the workaround is to remove all accented characters from the IDs of sections and chapters in that book. For example, change 

<chapter id="Gestión_avanzada_de_repositorios_yum">

to 

<chapter id="Gestion_avanzada_de_repositorios_yum">
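
The workaround above can be applied mechanically. This is only a minimal sketch in Python, not anything Publican does itself; the helper name ascii_id is hypothetical, and it assumes that dropping the accent (rather than transliterating) is acceptable for an ID.

```python
import unicodedata


def ascii_id(xml_id: str) -> str:
    """Return xml_id with accents removed (hypothetical helper).

    NFKD splits an accented letter into its base letter plus a combining
    mark; encoding to ASCII with errors="ignore" then drops the mark.
    Characters that have no ASCII base letter are dropped entirely.
    """
    decomposed = unicodedata.normalize("NFKD", xml_id)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(ascii_id("Gestión_avanzada_de_repositorios_yum"))
```

Any xref or link that points at the old ID has to be updated to the new one as well.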

Long term, maybe Publican should use percent encoding for HTML filenames with non-ASCII characters in them? I'll change this bug to an RFE to that effect and reassign to Jeff for comment.

Cheers
Rudi


[0] http://git.fedorahosted.org/git/docs/software-management-guide.git (branch "rebase")

[1] http://docs.fedoraproject.org/es-ES/Fedora/14/html/Software_Management_Guide/Gesti%C3%83%C2%B3n_avanzada_de_repositorios_yum.html
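
For illustration, the URL in [1] is what you get if the UTF-8 bytes of the filename are misread as Latin-1 and then encoded as UTF-8 a second time. The Python sketch below only reproduces that symptom; it is an assumption about the kind of damage, not a statement of where in Publican or the server it happens.

```python
from urllib.parse import quote

name = "Gestión_avanzada_de_repositorios_yum.html"

# Expected URL form: percent-encode the UTF-8 bytes of the name once.
good = quote(name)

# Assumed damage: the UTF-8 bytes are decoded as Latin-1, so the two
# bytes of "ó" (0xC3 0xB3) become two separate characters.
mangled = name.encode("utf-8").decode("latin-1")

# Percent-encoding the mangled name encodes each of those characters
# as UTF-8 again, which doubles the escape sequence.
bad = quote(mangled)

print(good)
print(bad)
```

The first line printed matches the link shown in the browser status bar, and the second matches the URL in [1].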
Comment 3 Ruediger Landmann 2010-11-03 19:14:33 EDT
*** Bug 649422 has been marked as a duplicate of this bug. ***
Comment 4 Ruediger Landmann 2010-11-04 00:41:10 EDT
So the problem seems to arise when Publican copies the HTML files from the tmp directory to the publish directory.

When I build the book with "publican build --publish", I get:

tmp/es-ES/html/Gestión_avanzada_de_repositorios_yum.html

and

publish/es-ES/Fedora/14/html/Software_Management_Guide/GestiÃ³n_avanzada_de_repositorios_yum.html
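
A quick way to spot damaged names in a publish directory is to test whether a name can be "repaired" by reversing a Latin-1 misreading. This is a rough Python sketch under that assumption; the function names are hypothetical and it is a heuristic, not part of Publican.

```python
import os


def looks_double_encoded(name: str) -> bool:
    """Heuristic: True if name reads as UTF-8 bytes mis-decoded as Latin-1."""
    try:
        repaired = name.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable in Latin-1, or the bytes are not valid UTF-8:
        # the name does not fit the double-encoding pattern.
        return False
    return repaired != name


def find_damaged(root: str):
    """Yield paths under root whose filename looks double-encoded."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if looks_double_encoded(filename):
                yield os.path.join(dirpath, filename)


for candidate in ["Gestión_avanzada_de_repositorios_yum.html",
                  "GestiÃ³n_avanzada_de_repositorios_yum.html"]:
    print(candidate, looks_double_encoded(candidate))
```

Running find_damaged("publish") after a build would list any files affected, assuming the damage is of this kind.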
Comment 5 Jeff Fearn 2010-11-22 22:31:22 EST
O_O

I have been looking into this and it's quite odd. If you add an ID with a UTF-8 character to any Publican book, then publish the HTML, the filename is broken as described.

However, if you run the command publican uses to copy the files, on the command line, it works properly!

e.g.

$ perl -e 'use File::Copy::Recursive qw(rcopy);rcopy("tmp/en-US/html", "test");'

The contents of test/ are correct!

I'll continue to debug.
Comment 6 Jeff Fearn 2010-11-23 00:48:56 EST
Hi, I have a workaround in place for this issue.

Fixed in build: 2.3-0%{?dist}.t49
Comment 7 Jeff Fearn 2010-12-07 22:49:26 EST
Publican 2.4 has shipped with a fix for this issue.
Comment 8 Jesús Franco 2010-12-08 16:52:09 EST
(In reply to comment #7)
> Publican 2.4 has shipped with a fix for this issue.

Thanks for the fix. I'm not able to test it, but I hope the guide owner will republish the guide ASAP so we can see whether we can successfully read through it now.
