Bug 1143892 - Build of the Users Guide always requires the network
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Publican
Classification: Community
Component: publican
Version: 4.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Jeff Fearn 🐞
QA Contact: Ruediger Landmann
URL:
Whiteboard:
Keywords:
Depends On:
Blocks:
Reported: 2014-09-18 08:24 UTC by Raphaël Hertzog
Modified: 2014-10-07 03:19 UTC
Last Closed: 2014-10-07 03:19:19 UTC


Description Raphaël Hertzog 2014-09-18 08:24:57 UTC
Building the Publican user manual on a Debian system takes ages. Running the build under strace, I see that it downloads lots of files (often several times) from oasis-open.org.

I believe that it is downloading http://www.oasis-open.org/docbook/xml/4.5/dbcentx.mod and the files referenced there.

This should not happen, as I have all the required files on the local system.

That said, I'm not very familiar with how XML catalogs work. I have a feeling that the underlying problem is that the user manual is DocBook 5 and that this specific public identifier is not known in the catalog associated with DocBook 5 (I used to believe that the catalog was system-wide, but it's apparently more complicated than that).

Thus my first question is: is there any reason to include dbcentx.mod in every doctype at all (at least in DocBook 5 mode)?

BTW it would be nice to have a way to disable network downloads (even if it means failing the build when not everything required is available locally).
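
A no-network mode of this kind can be sketched at the parser level. The following is only an illustration in Python using the standard-library expat binding; it is not Publican's code (Publican is written in Perl), and `parse_offline`, `OfflineResolutionError`, and the `local_entities` table are hypothetical names. Every external entity is served from a local table keyed by public identifier, and anything missing from the table fails the parse instead of being fetched.

```python
import xml.parsers.expat as expat


class OfflineResolutionError(Exception):
    """Raised when an external entity has no local copy (hypothetical name)."""


def parse_offline(xml_text, local_entities):
    """Parse xml_text, serving external entities only from local_entities.

    local_entities maps a public identifier to the entity's text.
    Returns the concatenated character data of the document.
    """
    chunks = []
    parser = expat.ParserCreate()
    # Ask expat to report external parameter entities such as %DOCBOOK_ENTS;.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)

    def external_entity(context, base, system_id, public_id):
        body = local_entities.get(public_id)
        if body is None:
            # Fail loudly instead of falling back to an HTTP download.
            raise OfflineResolutionError(
                f"no local copy for {public_id!r} ({system_id})")
        # Parse the local copy in a child parser tied to this entity reference.
        child = parser.ExternalEntityParserCreate(context)
        child.Parse(body, True)
        return 1  # non-zero tells expat the entity was handled

    parser.ExternalEntityRefHandler = external_entity
    parser.CharacterDataHandler = chunks.append
    parser.Parse(xml_text, True)
    return "".join(chunks)
```

With a table that contains the DocBook character entities, the parse succeeds without touching the network; with an empty table it raises `OfflineResolutionError`, which is the "fail the build" behaviour asked for above.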

Comment 1 Jeff Fearn 🐞 2014-09-21 23:35:37 UTC
Could this be a side effect of bug 1143060?

We include it by default because almost all the books we tested on had the DB4 entities in them and not including it caused a lot of annoyances that would discourage people from migrating.

Please open another bug for the --nonetwork option.

Comment 2 Raphaël Hertzog 2014-09-22 06:35:36 UTC
It's not a side effect of #1143060, as I was building the manual with the patch applied; otherwise it just fails earlier (during validation, possibly of a DocBook 4 test document?). I mentioned the build of the User Manual, but in truth my test is a complete rebuild of the Debian package, which includes the build of the user manual as well as the test suite.

I opened #1144949 for the --nonet option.

Comment 3 Jeff Fearn 🐞 2014-09-22 23:09:04 UTC
Yeah, this is odd; our koji builds also have no network connectivity during the builds.

Clutching at straws time.

Do you have both the DocBook 5 and DocBook 4 DTDs and XSL as build deps?

Comment 4 Jeff Fearn 🐞 2014-09-23 00:04:39 UTC
Oh, I wonder if XML::Catalog isn't resolving catalogues referenced from other catalogues.
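
To make the suspected failure mode concrete: a resolver that reads only the top-level catalog will miss any identifier that is registered in a catalog reached through `delegatePublic` or `nextCatalog`. The sketch below, in Python with the standard library only, shows what following such references looks like. It is not XML::Catalog's implementation; `resolve_public` and `_local_path` are hypothetical names, and the sketch ignores `xml:base`, `prefer`, system identifiers, and the ordering rules of the OASIS XML Catalogs specification.

```python
import os
import xml.etree.ElementTree as ET

NS = "{urn:oasis:names:tc:entity:xmlns:xml:catalog}"


def _local_path(ref, base_dir):
    """Map a catalog reference to a local path; return None for remote URIs."""
    if ref.startswith("file://"):
        return ref[len("file://"):]
    if "://" in ref:
        return None
    return os.path.normpath(os.path.join(base_dir, ref))


def resolve_public(catalog_path, public_id, _seen=None):
    """Return the local file registered for public_id, or None.

    Starts at catalog_path and follows delegatePublic and nextCatalog
    references into further catalog files.
    """
    seen = set() if _seen is None else _seen
    catalog_path = os.path.abspath(catalog_path)
    if catalog_path in seen or not os.path.isfile(catalog_path):
        return None  # already visited, or the referenced catalog is missing
    seen.add(catalog_path)

    root = ET.parse(catalog_path).getroot()
    base_dir = os.path.dirname(catalog_path)

    # Direct match in this catalog.
    for entry in root.iter(NS + "public"):
        if entry.get("publicId") == public_id:
            return _local_path(entry.get("uri", ""), base_dir)

    # Otherwise descend into catalogs referenced from this one.
    chained = []
    for entry in root.iter(NS + "delegatePublic"):
        prefix = entry.get("publicIdStartString")
        if prefix and public_id.startswith(prefix):
            chained.append(entry.get("catalog", ""))
    chained += [e.get("catalog", "") for e in root.iter(NS + "nextCatalog")]
    for ref in chained:
        path = _local_path(ref, base_dir)
        if path is not None:
            found = resolve_public(path, public_id, seen)
            if found is not None:
                return found
    return None
```

If the recursive step is dropped, the function returns None for any identifier that lives only in a referenced catalog, and a caller that treats None as "not available locally" would then fall back to the system identifier, i.e. the HTTP URL.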

Comment 5 Raphaël Hertzog 2014-09-23 06:55:44 UTC
I have both the DocBook 4 and 5 DTDs. I noticed I was missing the DocBook 5 XSL, but I fixed this and it doesn't change anything. I double-checked, and it's publican itself that is downloading the files, not xsltproc. Here's an strace extract:

[pid  8224] execve("/usr/bin/perl", ["perl", "-CDAS", "-I", "/home/rhertzog/deb/pkg/publican/"..., "/home/rhertzog/deb/pkg/publican/"..., "build", "--formats", "html", "--langs", "de-DE", "--quiet"], [/* 44 vars */]) = 0
[...]
[pid  8224] stat("http://www.oasis-open.org/docbook/xml/4.5/dbcentx.mod", 0x7fff35efc310) = -1 ENOENT (No such file or directory)
[pid  8224] stat("/etc/xml/catalog", {st_mode=S_IFREG|0644, st_size=5013, ...}) = 0
[pid  8224] open("/etc/xml/catalog", O_RDONLY) = 4
[pid  8224] lseek(4, 0, SEEK_CUR)       = 0
[pid  8224] open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 6
[pid  8224] fstat(6, {st_mode=S_IFREG|0644, st_size=177974, ...}) = 0
[pid  8224] mmap(NULL, 177974, PROT_READ, MAP_PRIVATE, 6, 0) = 0x7fe4b0f57000
[pid  8224] close(6)                    = 0
[pid  8224] read(4, "<?xml version=\"1.0\"?>\n<!DOCTYPE "..., 8192) = 5013
[pid  8224] read(4, "", 3179)           = 0
[pid  8224] close(4)                    = 0
[pid  8224] stat("/etc/xml/docbook-xml.xml", {st_mode=S_IFREG|0644, st_size=10313, ...}) = 0
[pid  8224] open("/etc/xml/docbook-xml.xml", O_RDONLY) = 4
[pid  8224] lseek(4, 0, SEEK_CUR)       = 0
[pid  8224] read(4, "<?xml version=\"1.0\"?>\n<!DOCTYPE "..., 8192) = 8192
[pid  8224] read(4, "dStartString=\"-//OASIS//ENTITIES"..., 16384) = 2121
[pid  8224] read(4, "", 14263)          = 0
[pid  8224] close(4)                    = 0
[pid  8224] stat("http://www.oasis-open.org/docbook/xml/4.5/dbcentx.mod", 0x7fff35efc310) = -1 ENOENT (No such file or directory)
[…]
[pid  8224] socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 4
[pid  8224] fcntl(4, F_GETFL)           = 0x2 (flags O_RDWR)
[pid  8224] fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid  8224] connect(4, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("192.237.193.98")}, 16) = -1 EINPROGRESS (Operation now in progress)
[pid  8224] poll([{fd=4, events=POLLOUT}], 1, 60000) = 1 ([{fd=4, revents=POLLOUT}])
[pid  8224] getsockopt(4, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
[pid  8224] sendto(4, "GET /docbook/xml/4.5/dbcentx.mod"..., 94, 0, NULL, 0) = 94


This is the first time it tries to connect to the network in a run of « strace -f perl t/910.publican.Users_Guide.t >/tmp/log 2>&1 ».
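
For anyone repeating this check, the network activity can be pulled out of such a log mechanically. The following Python sketch is an assumption-laden helper, not part of Publican or strace: `network_activity` is a hypothetical name, and the regular expressions only cover the IPv4 `connect` and `sendto` line shapes that appear in the extract above.

```python
import re

# Matches IPv4 connect() calls as printed by strace -f.
CONNECT_RE = re.compile(
    r'\[pid\s+(?P<pid>\d+)\] connect\(\d+, \{sa_family=AF_INET, '
    r'sin_port=htons\((?P<port>\d+)\), '
    r'sin_addr=inet_addr\("(?P<addr>[^"]+)"\)\}'
)
# Matches the start of an HTTP GET request sent with sendto().
REQUEST_RE = re.compile(
    r'\[pid\s+(?P<pid>\d+)\] sendto\(\d+, "(?P<payload>GET [^"]*)"'
)


def network_activity(log_lines):
    """Yield (pid, kind, detail) for each outbound connect or HTTP GET."""
    for line in log_lines:
        match = CONNECT_RE.search(line)
        if match:
            yield int(match["pid"]), "connect", f'{match["addr"]}:{match["port"]}'
            continue
        match = REQUEST_RE.search(line)
        if match:
            yield int(match["pid"]), "request", match["payload"]
```

Passing an open file object for /tmp/log as `log_lines` and printing the first yielded tuple gives the first network access in the run, together with the pid that made it.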

Earlier in the log, other catalog files are loaded, among them /usr/share/xml/docbook/schema/dtd/4.5/catalog.xml, which contains the public identifier for dbcentx.mod:
$ grep -A1 Entities /usr/share/xml/docbook/schema/dtd/4.5/catalog.xml
<public publicId="-//OASIS//ENTITIES DocBook Additional General Entities V4.5//EN"
        uri="dbgenent.mod"/>
--
<public publicId="-//OASIS//ENTITIES DocBook Character Entities V4.5//EN"
        uri="dbcentx.mod"/>

Comment 6 Jeff Fearn 🐞 2014-09-23 21:55:00 UTC
Yeah, publican uses XML::TreeBuilder for a lot of stuff, which uses XML::Catalog to handle entities, which uses XML::Parser::Expat. I think something in that stack isn't quite working properly.

Initial testing suggests that in some calls the catalog isn't being set properly; I'm not sure whether this is entirely accurate, or where and how it is happening.

Comment 7 Jeff Fearn 🐞 2014-09-24 02:40:36 UTC
FYI I've opened a bug against XML::Parser to get a better error message.

https://rt.cpan.org/Ticket/Display.html?id=99098

FYI2 If I reverse the declaration order of the DocBook entities and the book's ent file, the error message changes for me. I've checked some code in to the devel branch to make it easier to switch the order for testing.

Comment 8 Raphaël Hertzog 2014-09-24 07:17:52 UTC
Given your answers, I guess that you are reproducing the problem? Good that it's not specific to the versions I have in Debian....

You seem to be convinced that it's in XML::TreeBuilder/XML::Catalog/XML::Parser; have you already verified that the network calls are not made via XML::LibXSLT?

At least the latter has some support to intercept network calls via XML::LibXSLT::Security.

Comment 9 Jeff Fearn 🐞 2014-10-07 03:19:19 UTC
A fix for this issue has shipped in publican 4.2.3.

