My book directory is ~/écriture/systemd (note the "é", which is not ASCII). When I try to build my book, it fails with this:

    $ LANG=C publican build --formats=html --langs=en-US
    FATAL ERROR: en-US/Book_Info.xml: No such file or directory at /usr/lib/x86_64-linux-gnu/perl5/5.20/XML/Parser/Expat.pm line 470. at /usr/bin/publican line 993.

I changed the Publican->new call on line 993 to contain "debug => 1" in the parameters, but it doesn't print any supplementary information.

With strace I get confirmation that the problem is due to mishandling of the path name (besides the fact that it builds if I copy it to /tmp). Compare the getcwd() result with the path in the second open() call:

    stat("en-US/Book_Info.xml", {st_mode=S_IFREG|0644, st_size=1426, ...}) = 0
    open("en-US/Book_Info.xml", O_RDONLY) = 4
    ioctl(4, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7fff9dc9af40) = -1 ENOTTY (Inappropriate ioctl for device)
    lseek(4, 0, SEEK_CUR) = 0
    fstat(4, {st_mode=S_IFREG|0644, st_size=1426, ...}) = 0
    fcntl(4, F_SETFD, FD_CLOEXEC) = 0
    brk(0x499a000) = 0x499a000
    read(4, "<?xml version='1.0' encoding='ut"..., 8192) = 1426
    read(4, "", 8192) = 0
    getcwd("/home/rhertzog/\303\251criture/systemd", 4095) = 33
    open("/home/rhertzog/\303\203\302\251criture/systemd/en-US/systemd-survival-guide.ent", O_RDONLY) = -1 ENOENT (No such file or directory)
    close(4) = 0
    write(2, "FATAL ERROR: en-US/Book_Info.xml"..., 162FATAL ERROR: en-US/Book_Info.xml: No such file or directory at /usr/lib/x86_64-linux-gnu/perl5/5.20/XML/Parser/Expat.pm line 470. at /usr/bin/publican line 993. ) = 162

This is with publican 3.2.6, libxml-parser-perl 2.41-3, libexpat1 2.1.0-6+b3.

I'm not sure that publican is at fault. In fact, it might well be XML::Parser::Expat. If that's the case, it would be nice if you could figure out a simple test case and submit it to the XML::Parser::Expat upstream developers.
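A minimal Perl sketch of what the two paths in the strace output suggest: getcwd() returns the UTF-8 bytes \303\251 for "é", while the failing open() contains \303\203\302\251, which is what results when those two bytes are treated as Latin-1 characters and encoded to UTF-8 a second time. The helper name octal_dump is made up for this illustration and is not part of any module involved here.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical helper: render a byte string the way strace prints it.
sub octal_dump {
    my ($bytes) = @_;
    return join '', map { sprintf '\\%03o', ord($_) } split //, $bytes;
}

# The two raw bytes of "é" in UTF-8, as seen in the getcwd() result.
my $bytes = "\xC3\xA9";

# Encoding a string that already holds UTF-8 bytes treats each byte as a
# Latin-1 character (U+00C3, U+00A9) and encodes both again.
my $double = encode_utf8($bytes);

print octal_dump($bytes),  "\n";   # prints \303\251
print octal_dump($double), "\n";   # prints \303\203\302\251
```

The second line of output matches the path component in the failing open() call, which is consistent with a double encoding somewhere between getcwd() and open().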
This could be a bug in XML::TreeBuilder, where it uses File::Spec->rel2abs.
Yeah, I remember we already had a similar issue in the past (non-regression tests... :)). Though I couldn't find the reference of the bug.
648126 & 875021 are similar, but they are publican specific. This looks to be deeper in the stack, and while I'm happy to force UTF8 always on in publican I'm not so sure it's a good idea to do it deeper in the stack.
So your guess about XML::TreeBuilder was right. I fixed the problem by replacing the line with the File::Spec->rel2abs() call with this:

    my $relpath = $directories ? File::Spec->catfile($directories, $sysid) : $sysid;
    $file = decode_utf8(abs_path(encode_utf8($relpath)));

This requires adding "use Cwd qw(abs_path);" and "use Encode;" at the top.

I'm not sure what's the best way forward.
It looks like I have a better solution. The problem comes from the fact that we pass a UTF-8 Perl string to rel2abs when all the filesystem functions really want raw bytes (independent of encoding). I just added the following two lines before the rel2abs call:

    $sysid = encode_utf8($sysid) if utf8::is_utf8($sysid);
    $directories = encode_utf8($directories) if utf8::is_utf8($directories);

And it works!

Then I wanted to test what happens if "$directories" contains a non-ASCII character, so I edited publican.cfg to say « xml_lang: "ené-US" » and called « publican build --formats=html --langs=ené-US ». It failed, but not at the above location:

    DEBUG: Publican: config loaded
    Setting up ené-US
    Processing file tmp/ené-US/xml/Common_Content/Conventions.xml -> tmp/ené-US/xml/Common_Content/Conventions.xml
    Can't open file 'tmp/ené-US/xml/Common_Content/Conventions.xml'
    Couldn't open tmp/ené-US/xml/Common_Content/Conventions.xml: No such file or directory at /usr/share/perl5/XML/TreeBuilder.pm line 315. at /usr/share/perl5/Publican/Builder/DocBook5.pm line 480.

So any parameter that you use to build a path should really be passed through encode_utf8() first if it's read or received as a UTF-8 string.
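The effect of that guard can be sketched without touching the filesystem. This is only an illustration under stated assumptions: plain string concatenation stands in for the joining done inside rel2abs, the directory is an example value, path_bytes is a hypothetical helper, and encode_utf8() on the flagged result is used as an approximation of the bytes Perl hands to the operating system.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical helper: return raw bytes suitable for filesystem calls.
sub path_bytes {
    my ($s) = @_;
    return utf8::is_utf8($s) ? encode_utf8($s) : $s;
}

# Raw bytes as getcwd() would return them ("é" is \xC3\xA9).
# The directory itself is just an example value.
my $cwd = "/home/user/\xC3\xA9criture/systemd/en-US";

# A system identifier as a parser might hand it over: a character string
# with the UTF8 flag set (forced here with utf8::upgrade).
my $sysid = "systemd-survival-guide.ent";
utf8::upgrade($sysid);

# Unguarded: joining a flagged string with raw bytes upgrades the bytes as
# if they were Latin-1, so "é" becomes the two characters U+00C3 U+00A9.
my $unguarded = "$cwd/$sysid";

# Approximation of what the OS would then receive: the UTF-8 encoding of
# those characters, i.e. a doubly encoded "é".
my $seen_by_os = encode_utf8($unguarded);

# Guarded: encode the flagged part first, so the join stays a byte string.
my $guarded = $cwd . '/' . path_bytes($sysid);

printf "unguarded double-encoded: %s\n",
    index($seen_by_os, "\xC3\x83\xC2\xA9") >= 0 ? 'yes' : 'no';
printf "guarded is raw bytes:     %s\n",
    utf8::is_utf8($guarded) ? 'no' : 'yes';
```

The same helper could be applied to every component that goes into a path, which is the general rule stated above.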
Obviously you need "use Encode qw(encode_utf8);" for the above fix to work.
(In reply to Raphaël Hertzog from comment #4)
> I'm not sure what's the best way forward.

Just open a CPAN-RT against XML::TreeBuilder and make a pull request against https://github.com/jfearn/XML-TreeBuilder if you are keen. I'm on good terms with the upstream guy, most of the time anyway, so it shouldn't be a problem to get a patched version out soon! ;)
:-) Both done:
https://github.com/jfearn/XML-TreeBuilder/pull/1
https://rt.cpan.org/Ticket/Display.html?id=101006

Note that you might want to review Publican's code to make sure that you don't feed Unicode strings down to other modules that expect paths, because everything that comes from configuration files (or even the command line) seems to end up as a Unicode string.