Bug 1171670
| Summary: | Fails to build document in a non-ASCII directory | | |
|---|---|---|---|
| Product: | [Community] Publican | Reporter: | Raphaël Hertzog <raphael> |
| Component: | publican | Assignee: | Nobody <nobody> |
| Status: | CLOSED DUPLICATE | QA Contact: | Ruediger Landmann <rlandman> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.2 | CC: | cbredesen, rlandman |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2024-10-24 06:39:06 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Raphaël Hertzog
2014-12-08 10:50:38 UTC
Could be a bug in XML::TreeBuilder where it uses File::Spec->rel2abs.

Yeah, I remember we already had a similar issue in the past (non-regression tests... :)), though I couldn't find the reference to the bug.

648126 & 875021 are similar, but they are Publican-specific. This looks to be deeper in the stack, and while I'm happy to force UTF-8 always on in Publican, I'm not so sure it's a good idea to do it deeper in the stack.

So your guess about XML::TreeBuilder was right. I fixed the problem by replacing the line with the File::Spec->rel2abs() call with this:

    my $relpath = $directories ? File::Spec->catfile($directories, $sysid) : $sysid;
    $file = decode_utf8(abs_path(encode_utf8($relpath)));

This requires adding "use Cwd qw(abs_path);" and "use Encode;" at the top. I'm not sure what's the best way forward.

It looks like I have a better solution. The problem comes from the fact that we pass a UTF-8 Perl string to rel2abs when all the filesystem functions really want raw bytes (independent of encoding). I just added the following two lines before the rel2abs call:

    $sysid = encode_utf8($sysid) if utf8::is_utf8($sysid);
    $directories = encode_utf8($directories) if utf8::is_utf8($directories);

And it works! Then I wanted to test what happens if "$directories" contains a non-ASCII character, so I edited publican.cfg to say « xml_lang: "ené-US" » and called « publican build --formats=html --langs=ené-US ». It failed, but not at the above location:

    DEBUG: Publican: config loaded
    Setting up ené-US
    Processing file tmp/ené-US/xml/Common_Content/Conventions.xml -> tmp/ené-US/xml/Common_Content/Conventions.xml
    Can't open file 'tmp/ené-US/xml/Common_Content/Conventions.xml'
    Couldn't open tmp/ené-US/xml/Common_Content/Conventions.xml: No such file or directory at /usr/share/perl5/XML/TreeBuilder.pm line 315.
     at /usr/share/perl5/Publican/Builder/DocBook5.pm line 480.
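For readers following along, the encode-before-rel2abs idea from the comment above can be sketched in isolation roughly as follows. This is only an illustration under assumptions: path_bytes is a hypothetical helper name (it is not part of XML::TreeBuilder or Publican), and the two input values are stand-ins for strings that arrived as decoded character strings, for example from a configuration file.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);
use File::Spec;

# Hypothetical helper (an assumption for this sketch, not an existing API):
# return the UTF-8 bytes of a path component if Perl currently holds it as a
# UTF-8 flagged character string, otherwise return it unchanged.
sub path_bytes {
    my ($str) = @_;
    return utf8::is_utf8($str) ? encode_utf8($str) : $str;
}

# Simulate values that were decoded into character strings on the way in.
my $directories = decode_utf8("tmp/en\xC3\xA9-US/xml/Common_Content");
my $sysid       = decode_utf8("Conventions.xml");

# Convert to bytes first, so that rel2abs only ever joins byte strings.
# Joining a character string with the byte string of the current directory
# would make Perl upgrade those bytes as if they were Latin-1.
my $relpath = File::Spec->catfile(path_bytes($directories), path_bytes($sysid));
my $file    = File::Spec->rel2abs($relpath);

printf "flagged as characters: %s\n", utf8::is_utf8($file) ? "yes" : "no";
```

The helper mirrors the utf8::is_utf8 check used in the two-line fix; whether that check is the right test for every caller is a separate question that this sketch does not try to settle.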
So any parameter that you use to build a path should really be passed through encode_utf8() first if it's read/received as a UTF-8 string. Obviously you need "use Encode qw(encode_utf8);" for the above fix to work.

(In reply to Raphaël Hertzog from comment #4)
> I'm not sure what's the best way forward.

Just open a CPAN RT ticket against XML::TreeBuilder and make a pull request against https://github.com/jfearn/XML-TreeBuilder if you are keen. I'm on good terms with the upstream guy, most of the time anyway, so it shouldn't be a problem to get a patched version out soon! ;)

:-) Both done:
https://github.com/jfearn/XML-TreeBuilder/pull/1
https://rt.cpan.org/Ticket/Display.html?id=101006

Note that you might want to review Publican's code to make sure that you don't feed Unicode strings down to other modules which expect paths, because everything that comes from configuration files (or even the command line) seems to end up as a Unicode string.

*** This bug has been marked as a duplicate of bug 1187448 ***