When building the Publican user manual on a Debian system, the build takes ages. Running the program under strace shows that it downloads lots of files (often multiple times) from oasis-open.org. I believe it is downloading http://www.oasis-open.org/docbook/xml/4.5/dbcentx.mod and the files referenced there. This should not happen, as I have all the required files on the local system. That said, I'm not very familiar with how XML catalogs work. I have a feeling that the underlying problem is that the user manual is DocBook 5 and that this specific public identifier is not known in the catalog associated with DocBook 5 (I used to believe that the catalog was system-wide, but it's apparently more complicated than that). So my first question is: is there any reason to include dbcentx.mod in every doctype (at least in DocBook 5 mode)? BTW, it would be nice to have a way to disable network downloads (even if it means failing the build when something required is missing locally).
Could this be a side effect of 1143060? We include it by default because almost all the books we tested on had the DB4 entities in them, and not including it caused a lot of annoyances that would discourage people from migrating. Please open another bug for the --nonetwork option.
It's not a side effect of #1143060 as I was building the manual with the patch applied, otherwise it just fails earlier (during validation, possibly of a docbook 4 test document?). I mention the build of the User Manual but in truth my test is a complete rebuild of the Debian package which includes the build of the user manual but also the test suite. I opened #1144949 for the --nonet option.
Yeah, this is odd; our koji builds also have no network connectivity during the build. Clutching at straws time. Do you have both the docbook5 and docbook4 DTDs & XSL as build deps?
Oh, I wonder if XML::Catalog isn't resolving catalogues referenced from other catalogues.
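To make the suspicion above concrete, here is a rough sketch (in Python, not the Perl stack Publican uses) of what following catalogues referenced from other catalogues involves. It is not XML::Catalog's code, and `resolve_public` and `_local_path` are made-up names. It only handles `public`, `delegatePublic` and `nextCatalog` entries, ignores `xml:base`, and, unlike the OASIS spec, still falls through to `nextCatalog` after a failed delegation.

```python
import os
import xml.etree.ElementTree as ET

# Namespace used by OASIS XML Catalog files.
NS = "{urn:oasis:names:tc:entity:xmlns:xml:catalog}"


def _local_path(base_dir, ref):
    """Turn a catalog reference (relative path or file:// URI) into a path."""
    if ref.startswith("file://"):
        return ref[len("file://"):]
    return os.path.normpath(os.path.join(base_dir, ref))


def resolve_public(catalog_path, public_id, _seen=None):
    """Return the local file mapped to public_id, or None if nothing matches."""
    seen = set() if _seen is None else _seen
    catalog_path = os.path.abspath(catalog_path)
    # Skip missing catalogs and guard against catalogs that reference each other.
    if catalog_path in seen or not os.path.isfile(catalog_path):
        return None
    seen.add(catalog_path)

    base_dir = os.path.dirname(catalog_path)
    root = ET.parse(catalog_path).getroot()

    # 1. A direct <public> entry in this catalog wins.
    for entry in root.iter(NS + "public"):
        if entry.get("publicId") == public_id and entry.get("uri"):
            return _local_path(base_dir, entry.get("uri"))

    # 2. <delegatePublic> hands matching identifiers to another catalog.
    for entry in root.iter(NS + "delegatePublic"):
        prefix, target = entry.get("publicIdStartString"), entry.get("catalog")
        if prefix and target and public_id.startswith(prefix):
            hit = resolve_public(_local_path(base_dir, target), public_id, seen)
            if hit:
                return hit

    # 3. <nextCatalog> entries are tried last.
    for entry in root.iter(NS + "nextCatalog"):
        target = entry.get("catalog")
        if target:
            hit = resolve_public(_local_path(base_dir, target), public_id, seen)
            if hit:
                return hit
    return None
```

If a resolver stops after step 1 on the top-level catalog, an identifier that is only reachable through a delegated catalog comes back unresolved, and the parser would then fall back to the system identifier, i.e. the http:// URL.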
I have both the docbook 4 & 5 DTDs. I noticed I was missing the docbook 5 XSL, but I fixed this and it doesn't change anything. I double-checked, and it's publican itself which is downloading the files, not xsltproc. Here's an strace extract:

[pid 8224] execve("/usr/bin/perl", ["perl", "-CDAS", "-I", "/home/rhertzog/deb/pkg/publican/"..., "/home/rhertzog/deb/pkg/publican/"..., "build", "--formats", "html", "--langs", "de-DE", "--quiet"], [/* 44 vars */]) = 0
[...]
[pid 8224] stat("http://www.oasis-open.org/docbook/xml/4.5/dbcentx.mod", 0x7fff35efc310) = -1 ENOENT (No such file or directory)
[pid 8224] stat("/etc/xml/catalog", {st_mode=S_IFREG|0644, st_size=5013, ...}) = 0
[pid 8224] open("/etc/xml/catalog", O_RDONLY) = 4
[pid 8224] lseek(4, 0, SEEK_CUR) = 0
[pid 8224] open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 6
[pid 8224] fstat(6, {st_mode=S_IFREG|0644, st_size=177974, ...}) = 0
[pid 8224] mmap(NULL, 177974, PROT_READ, MAP_PRIVATE, 6, 0) = 0x7fe4b0f57000
[pid 8224] close(6) = 0
[pid 8224] read(4, "<?xml version=\"1.0\"?>\n<!DOCTYPE "..., 8192) = 5013
[pid 8224] read(4, "", 3179) = 0
[pid 8224] close(4) = 0
[pid 8224] stat("/etc/xml/docbook-xml.xml", {st_mode=S_IFREG|0644, st_size=10313, ...}) = 0
[pid 8224] open("/etc/xml/docbook-xml.xml", O_RDONLY) = 4
[pid 8224] lseek(4, 0, SEEK_CUR) = 0
[pid 8224] read(4, "<?xml version=\"1.0\"?>\n<!DOCTYPE "..., 8192) = 8192
[pid 8224] read(4, "dStartString=\"-//OASIS//ENTITIES"..., 16384) = 2121
[pid 8224] read(4, "", 14263) = 0
[pid 8224] close(4) = 0
[pid 8224] stat("http://www.oasis-open.org/docbook/xml/4.5/dbcentx.mod", 0x7fff35efc310) = -1 ENOENT (No such file or directory)
[…]
[pid 8224] socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 4
[pid 8224] fcntl(4, F_GETFL) = 0x2 (flags O_RDWR)
[pid 8224] fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 8224] connect(4, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("192.237.193.98")}, 16) = -1 EINPROGRESS (Operation now in progress)
[pid 8224] poll([{fd=4, events=POLLOUT}], 1, 60000) = 1 ([{fd=4, revents=POLLOUT}])
[pid 8224] getsockopt(4, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
[pid 8224] sendto(4, "GET /docbook/xml/4.5/dbcentx.mod"..., 94, 0, NULL, 0) = 94

This is the first time it tries to connect to the network in a run of « strace -f perl t/910.publican.Users_Guide.t >/tmp/log 2>&1 ». Before this part of the log, other "catalog files" are loaded, and among those is /usr/share/xml/docbook/schema/dtd/4.5/catalog.xml, which contains the public identifier for dbcentx.mod:

$ grep -A1 Entities /usr/share/xml/docbook/schema/dtd/4.5/catalog.xml
<public publicId="-//OASIS//ENTITIES DocBook Additional General Entities V4.5//EN" uri="dbgenent.mod"/>
--
<public publicId="-//OASIS//ENTITIES DocBook Character Entities V4.5//EN" uri="dbcentx.mod"/>
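For anyone who wants to pick the network activity out of a long `strace -f` log like the one above, a small filter along these lines might help. It is only a sketch: `network_events` is a made-up helper, the regular expressions are modelled on the exact line shapes in this extract (IPv4 `connect` calls and `sendto` calls starting with `GET`), and other strace versions or options may format lines differently.

```python
import re

# IPv4 connect() lines as they appear in the extract above.
CONNECT_RE = re.compile(
    r'\[pid\s+(?P<pid>\d+)\]\s+connect\(\d+, \{sa_family=AF_INET, '
    r'sin_port=htons\((?P<port>\d+)\), '
    r'sin_addr=inet_addr\("(?P<addr>[^"]+)"\)'
)
# sendto() lines whose payload starts with an HTTP GET request.
# strace truncates long strings, so the captured path may be cut short.
GET_RE = re.compile(
    r'\[pid\s+(?P<pid>\d+)\]\s+sendto\(\d+, "GET (?P<path>[^"\s]+)'
)


def network_events(lines):
    """Return (pid, kind, detail) tuples for connect and HTTP GET calls."""
    events = []
    for line in lines:
        m = CONNECT_RE.search(line)
        if m:
            events.append((int(m["pid"]), "connect", f'{m["addr"]}:{m["port"]}'))
            continue
        m = GET_RE.search(line)
        if m:
            events.append((int(m["pid"]), "GET", m["path"]))
    return events
```

Passing it an open file handle on the log (for example `network_events(open("/tmp/log"))`) lists which pid opened which connection, which can then be matched against the `execve` lines to see whether it is perl or xsltproc doing the download.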
Yeah publican uses XML::TreeBuilder for a lot of stuff, which uses XML::Catalog to handle entities, which uses XML::Parser::Expat. I think something in that stack isn't quite working properly. Initial testing looks like there are some calls where the catalog isn't being set properly, not sure if this is entirely accurate or where/how that is happening.
FYI I've opened a bug against XML::Parser to get a better error message. https://rt.cpan.org/Ticket/Display.html?id=99098 FYI2 If I reverse the order in which the docbook entities and the book's ent file are declared, the error message changes for me. I've checked some code into the devel branch to make it easier to switch the order for testing.
Given your answers, I guess that you are reproducing the problem? Good that it's not specific to the versions I have in Debian. You seem to be convinced that it's in XML::TreeBuilder/XML::Catalog/XML::Parser; have you already verified that the network calls are not made via XML::LibXSLT? At least the latter has some support for intercepting network calls via XML::LibXSLT::Security.
A fix for this issue has shipped in publican 4.2.3.