Bug 648126 - Non-ASCII characters in HTML filenames damaged when copying to publish directory
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Publican
Classification: Community
Component: publican
Version: 2.3
Hardware: noarch
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Jeff Fearn 🐞
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 649422
Depends On:
Blocks:
Reported: 2010-10-31 06:42 UTC by Jesús Franco
Modified: 2010-12-08 21:52 UTC

Fixed In Version: 2.4
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-12-08 03:49:26 UTC
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 661513 0 low CLOSED Links inside navigable HTML published stills broken 2021-02-22 00:41:40 UTC

Internal Links: 661513

Description Jesús Franco 2010-10-31 06:42:47 UTC
Description of problem:

Some internal navigation links (to jump to the next chapter) take me to the home page of docs.fp.o

Version-Release number of selected component (if applicable):

Edition 1.0 (Spanish published version HTML navigable).

How reproducible:

Just go to the last page of a chapter, and then jump to the next chapter. Even if the link looks OK in the status bar before clicking on it, the browser takes me to docs.fp.o

Steps to Reproduce:
1. Go to http://docs.fedoraproject.org/es-ES/Fedora/14/html/Software_Management_Guide/ch07s07s02.html

2. Look at the status bar before clicking on "Next"; it should show http://docs.fedoraproject.org/es-ES/Fedora/14/html/Software_Management_Guide/Gesti%C3%B3n_avanzada_de_repositorios_yum.html

3. Click on it and see what happens
  
Actual results:

Jumps to docs.fp.o

Expected results:

Continuing to read the book ;)

Additional info:

I get this error in Chrome on Windows and Chromium on Fedora 13. I have not tested in other browsers.

Comment 1 eric 2010-10-31 20:12:57 UTC
Thanks for the catch.  I'm not sure why this guide has actually been published, though, as it isn't ready for prime time.

Reassigning to Rudi to troubleshoot Publican.

Comment 2 Ruediger Landmann 2010-11-01 00:07:19 UTC
Thanks Jesús and Eric -- the problem here comes from the non-ASCII characters in the names of some of the HTML files. 

Publican uses the ID attribute on a section or chapter to generate the filename for the HTML page for that section or chapter, and for the URLs of any pages that link to it.

In this case, because the XML is written in Spanish, the IDs have non-ASCII characters in them, and these pages are not getting served correctly.

So in the XML of this book[0] we have:

<chapter id="Gestión_avanzada_de_repositorios_yum">

The HTML page for this chapter is therefore named "Gestión_avanzada_de_repositorios_yum.html", which Publican creates correctly, but somewhere between Publican and the docs server it is getting turned into "GestiÃ³n_avanzada_de_repositorios_yum.html", which is nonsense, but is accessible.[1]
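
For illustration only: the damaged URL in footnote [1] is consistent with the UTF-8 bytes of the filename being misread as Latin-1 and then encoded as UTF-8 a second time. The Python sketch below reproduces that pattern. It assumes this is the mechanism; nothing here shows where in Publican or the publishing chain it actually happens.

```python
from urllib.parse import quote

original = "Gestión_avanzada_de_repositorios_yum.html"

# Assumption: the filename's UTF-8 bytes are misread as Latin-1 somewhere.
utf8_bytes = original.encode("utf-8")
damaged = utf8_bytes.decode("latin-1")

# Percent-encode both names the way a browser would (UTF-8, then %XX).
print(quote(original))  # Gesti%C3%B3n_avanzada_de_repositorios_yum.html
print(quote(damaged))   # Gesti%C3%83%C2%B3n_avanzada_de_repositorios_yum.html
```

The first URL is the one the "Next" link points to, and the second matches the address in footnote [1] where the page is actually being served.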

Short term, the workaround is to remove all accented characters from the IDs of sections and chapters in that book. For example, change 

<chapter id="Gestión_avanzada_de_repositorios_yum">

to 

<chapter id="Gestion_avanzada_de_repositorios_yum">
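
A rough sketch of how that ID cleanup could be scripted, in case the book has many affected IDs. The helper name ascii_id is made up for this example and is not part of Publican. It only strips combining accents, so letters with no accent-free decomposition (for example "ß" or "ø") pass through unchanged.

```python
import unicodedata


def ascii_id(xml_id: str) -> str:
    """Hypothetical helper: drop combining accents from an ID."""
    # NFKD splits "ó" into "o" plus a combining acute accent.
    decomposed = unicodedata.normalize("NFKD", xml_id)
    # Keep everything except the combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(ascii_id("Gestión_avanzada_de_repositorios_yum"))
# Gestion_avanzada_de_repositorios_yum
```

Any cross-reference that points at the old ID would need the same change, otherwise the link target no longer exists.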

Long term, maybe Publican should use percent encoding for HTML filenames with non-ASCII characters in them? I'll change this bug to an RFE to that effect and reassign to Jeff for comment.
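
To make the percent-encoding idea concrete, here is a small Python sketch. The function name percent_encoded_filename is hypothetical and this is not how Publican names files; it only shows what such filenames and the links to them would look like.

```python
from urllib.parse import quote, unquote


def percent_encoded_filename(section_id: str) -> str:
    """Hypothetical helper: ASCII-only filename derived from a section ID."""
    return quote(section_id, safe="") + ".html"


name = percent_encoded_filename("Gestión_avanzada_de_repositorios_yum")
print(name)           # Gesti%C3%B3n_avanzada_de_repositorios_yum.html
print(unquote(name))  # Gestión_avanzada_de_repositorios_yum.html

# If the file on disk literally contains "%", a link to it has to
# escape the "%" itself, because the server decodes the URL once.
href = quote(name)
print(href)           # Gesti%25C3%25B3n_avanzada_de_repositorios_yum.html
```

One thing to weigh with this approach: the names on disk become ASCII-only, but every generated link has to carry the extra level of escaping shown in the last line.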

Cheers
Rudi


[0] http://git.fedorahosted.org/git/docs/software-management-guide.git (branch "rebase")

[1] http://docs.fedoraproject.org/es-ES/Fedora/14/html/Software_Management_Guide/Gesti%C3%83%C2%B3n_avanzada_de_repositorios_yum.html

Comment 3 Ruediger Landmann 2010-11-03 23:14:33 UTC
*** Bug 649422 has been marked as a duplicate of this bug. ***

Comment 4 Ruediger Landmann 2010-11-04 04:41:10 UTC
So the problem seems to arise when Publican copies the HTML files from the tmp directory to the publish directory.

When I build the book with "publican build --publish", I get:

tmp/es-ES/html/Gestión_avanzada_de_repositorios_yum.html

and

publish/es-ES/Fedora/14/html/Software_Management_Guide/GestiÃ³n_avanzada_de_repositorios_yum.html
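
A quick way to spot names damaged like this in a publish directory is sketched below in Python. It assumes the damage is UTF-8 misread as Latin-1 (the pattern the URL in footnote [1] of comment 2 suggests), and the helper name undo_double_encoding is made up for this example.

```python
from typing import Optional


def undo_double_encoding(name: str) -> Optional[str]:
    """Return the repaired name if `name` looks like UTF-8 read as Latin-1."""
    try:
        # Reverse the assumed damage: back to bytes, then decode properly.
        repaired = name.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None  # not this kind of damage
    return repaired if repaired != name else None


names = [
    "Gesti\u00c3\u00b3n_avanzada_de_repositorios_yum.html",  # damaged
    "Gesti\u00f3n_avanzada_de_repositorios_yum.html",        # correct
    "index.html",                                            # plain ASCII
]
for name in names:
    print(repr(name), "->", repr(undo_double_encoding(name)))
```

Only the first name yields a repaired value; the correctly encoded name and the ASCII name both return None, so the check can be run over a whole directory listing without flagging good files.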

Comment 5 Jeff Fearn 🐞 2010-11-23 03:31:22 UTC
O_O

I have been looking into this and it's quite odd. If you add an ID with a UTF-8 character to any Publican book, then publish the HTML, the file is broken as described.

However, if you run the command Publican uses to copy the files directly on the command line, it works properly!

e.g.

$ perl -e 'use File::Copy::Recursive qw(rcopy);rcopy("tmp/en-US/html", "test");'

The contents of test/ are correct!

I'll continue to debug.

Comment 6 Jeff Fearn 🐞 2010-11-23 05:48:56 UTC
Hi, I have a workaround in place for this issue.

Fixed in build: 2.3-0%{?dist}.t49

Comment 7 Jeff Fearn 🐞 2010-12-08 03:49:26 UTC
Publican 2.4 has shipped with a fix for this issue.

Comment 8 Jesús Franco 2010-12-08 21:52:09 UTC
(In reply to comment #7)
> Publican 2.4 has shipped with a fix for this issue.

Thanks for the fix. I'm not able to test it, but I hope the guide owner republishes the guide ASAP so we can see whether we can successfully read through it now.

