Bug 703653 - Publican uses several GB when compiling docbook with a large number of invalid xrefs
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Publican
Classification: Community
Component: publican
Version: 3.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Brian Forte
QA Contact: Ruediger Landmann
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-05-11 00:04 UTC by Matthew Casperson
Modified: 2014-08-04 22:25 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2011-05-11 04:41:46 UTC
Embargoed:


Attachments
Sample docbook (728.05 KB, application/zip)
2011-05-11 00:05 UTC, Matthew Casperson

Description Matthew Casperson 2011-05-11 00:04:36 UTC
The attached file is an example of a docbook book that includes a large number of invalid xrefs. This was the result of a tool that attempted to link topics together. Admittedly there are "a lot" of invalid xrefs, but trying to compile the book consumes at least 4 GB of memory (if not closer to 6 or 7 GB), which will kill a system that doesn't have that memory free.

Comment 1 Matthew Casperson 2011-05-11 00:05:25 UTC
Created attachment 498180
Sample docbook

Comment 2 Joshua Wulf 2011-05-11 00:21:17 UTC
To reproduce:

publican build --langs=en-US --formats=html

on the attached book. Publican sucks up 4 or 5 GB of RAM.

Comment 3 Matthew Casperson 2011-05-11 00:31:19 UTC
By "kill" I mean "make so unresponsive that you are forced to reset".

Comment 4 Joshua Wulf 2011-05-11 00:53:41 UTC
Or, if you don't have a swap file, the kernel will kill the publican job. Running two builds of that book in parallel results in:

Beginning work on en-US
Validation failed: 
Killed
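
As a stopgap for the lock-ups described in comments 3 and 4, the build can be started under an address-space limit, so that a runaway validation fails on its own instead of exhausting the machine. A minimal Python sketch, assuming Linux; the helper name `run_with_memory_cap` and the 2 GiB figure are illustrative and not part of Publican:

```python
import resource
import subprocess


def run_with_memory_cap(cmd, max_bytes):
    """Run cmd with its virtual address space capped at max_bytes.

    Returns the child's exit code. Unix only (uses RLIMIT_AS).
    """

    def apply_cap():
        # Runs in the child just before exec. Only the soft limit is
        # lowered, and never above the existing hard limit.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        if hard == resource.RLIM_INFINITY:
            soft = max_bytes
        else:
            soft = min(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

    return subprocess.run(cmd, preexec_fn=apply_cap).returncode


# Example (hypothetical 2 GiB cap around the command from comment 2):
# run_with_memory_cap(
#     ["publican", "build", "--langs=en-US", "--formats=html"],
#     2 * 1024 ** 3,
# )
```

With the cap in place, allocations beyond the limit fail inside the build process, which should then exit with an error rather than pushing the system into swap.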

Comment 5 Brian Forte 2011-05-11 04:41:46 UTC
Even one invalid xref invalidates the DocBook XML, ensuring Publican won’t build the book.

Lots and lots of invalid xrefs just make the XML that much more invalid.

Remove the invalid xrefs and the book will build; the memory consumption and the system killing won’t occur.

Comment 6 Misty Stanley-Jones 2011-05-11 04:56:49 UTC
Then it seems to me that reasonable behavior would be for Publican to fail at the first invalid xref and not to continue using system resources.

Comment 7 Joshua Wulf 2011-05-11 05:00:58 UTC
+1

How about stopping at the first instance of invalidity, rather than killing my system?
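
The "stop at the first invalid xref" behaviour asked for here can be approximated outside Publican with a pre-flight check. The sketch below is only an illustration of that idea, not how Publican validates: it assumes the book has been flattened into a single XML string with no XIncludes or custom entities, and the function name `first_dangling_xref` is made up for the example.

```python
import xml.etree.ElementTree as ET

# DocBook 5 uses xml:id; DocBook 4 uses a plain id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def local_name(tag):
    """Strip any '{namespace}' prefix so DocBook 4 and 5 elements both match."""
    return tag.rsplit("}", 1)[-1]


def first_dangling_xref(xml_text):
    """Return the linkend of the first xref with no matching id, or None."""
    root = ET.fromstring(xml_text)

    # First pass: collect every id declared anywhere in the document.
    ids = set()
    for el in root.iter():
        for attr in ("id", XML_ID):
            if attr in el.attrib:
                ids.add(el.attrib[attr])

    # Second pass: return as soon as one broken link is found.
    for el in root.iter():
        if local_name(el.tag) == "xref":
            target = el.get("linkend")
            if target is not None and target not in ids:
                return target
    return None
```

A wrapper script could call this on the book and skip the build whenever the result is not `None`, reporting just that one linkend.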

Comment 8 Joshua Wulf 2011-05-11 05:08:17 UTC
It's either xmllint or xsltproc doing it. If a document has invalid links, xsltproc probably doesn't get called, because xmllint will fail it first, so it is probably xmllint.
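
One way to test this guess is to run each tool on the book separately and compare peak memory. A rough Python sketch, assuming Linux (where `ru_maxrss` is reported in KiB); the helper name `peak_child_rss_kb` and the file name in the example are hypothetical:

```python
import resource
import subprocess


def peak_child_rss_kb(cmd):
    """Run cmd and return (exit_code, peak_rss).

    peak_rss is the largest resident set size among the child processes
    this Python process has waited for so far, so use a fresh interpreter
    for each tool being measured.
    """
    code = subprocess.run(cmd).returncode
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return code, peak


# Example (file name is illustrative):
# peak_child_rss_kb(["xmllint", "--noout", "--valid", "Book.xml"])
```

If the standalone validation run shows the multi-GB peak, the validation step is the culprit; if not, the cost lies elsewhere in the build.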

Comment 9 Joshua Wulf 2011-05-11 05:08:54 UTC
lol at the "Doing that hurts? Don't do that then" prescription.

Comment 10 Jeff Fearn 🐞 2011-05-11 07:16:49 UTC
FWIW this is probably XML::LibXML::Error spamming error nodes. Finding out where it's doing that and limiting it would take quite a lot of effort, so it's not likely this would get done any time soon.

AFAICT there is no option in LibXML to stop at the first error found. It might be catchable in XML::LibXML::Error, but finding out where and how to do that is again a significant development effort, so it would not happen in any reasonable time frame.

If this is having a significant impact, then it'd be worth it for the people it's affecting to raise these changes with the upstream XML::LibXML maintainer at CPAN.

Cheers, Jeff.
