Red Hat Bugzilla – Full Text Bug Listing
|Summary:||Publican uses several GB when compiling docbook with a large number of invalid xrefs|
|Product:||[Community] Publican||Reporter:||Matthew Casperson <mcaspers>|
|Component:||publican||Assignee:||Brian Forte <bforte>|
|Status:||CLOSED NOTABUG||QA Contact:||Ruediger Landmann <rlandman>|
|Version:||3.0||CC:||bforte, cbredesen, jfearn, jwulf, misty, mmcallis, publican-list, r.landmann|
|Fixed In Version:||Doc Type:||Bug Fix|
|Doc Text:||Story Points:||---|
|Last Closed:||2011-05-11 00:41:46 EDT||Type:||---|
Description Matthew Casperson 2011-05-10 20:04:36 EDT
The attached file is an example of a docbook book that includes a large number of invalid xrefs. This was the result of a tool that attempted to link topics together. Admittedly there are "a lot" of invalid xrefs, but trying to compile the book consumes at least 4 GB of memory (if not closer to 6 or 7 GB), which will kill a system that doesn't have that memory free.
Comment 1 Matthew Casperson 2011-05-10 20:05:25 EDT
Created attachment 498180 [details] Sample docbook
Comment 2 Joshua Wulf 2011-05-10 20:21:17 EDT
To reproduce: run publican build --langs=en-US --formats=html on the attached book. Publican sucks up 4 or 5 GB of RAM.
Comment 3 Matthew Casperson 2011-05-10 20:31:19 EDT
By "kill" I mean "make so unresponsive that you are forced to reset".
Comment 4 Joshua Wulf 2011-05-10 20:53:41 EDT
Or, if you don't have a swapfile, the kernel will kill the publican job. Running two builds of that book in parallel results in:
Beginning work on en-US
Validation failed: Killed
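As a stopgap for the runaway memory use described above, a build can be launched with a hard memory ceiling so that it fails on its own instead of dragging the machine into swap. The following is a minimal sketch, not anything Publican provides: run_capped is a hypothetical helper name, and it assumes a Linux system where RLIMIT_AS is enforced.

```python
import resource
import subprocess
import sys


def run_capped(cmd, max_bytes):
    """Run cmd with its address space capped at max_bytes; return the exit code.

    Hypothetical helper for illustration. Assumes Linux, where RLIMIT_AS
    is enforced by the kernel.
    """
    def limit():
        # Runs in the child process just before exec, so the cap applies
        # to the launched command only, not to this script.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    return subprocess.run(cmd, preexec_fn=limit).returncode


# Example invocation (commented out so the sketch runs without Publican):
# run_capped(["publican", "build", "--langs=en-US", "--formats=html"],
#            2 * 1024**3)  # 2 GiB ceiling, an arbitrary choice
```

With a cap in place, allocations beyond the ceiling fail inside the build process, which should then exit with a non-zero status rather than leave the system unresponsive. The right ceiling depends on the book and the machine.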
Comment 5 Brian Forte 2011-05-11 00:41:46 EDT
Even one invalid xref invalidates the DocBook XML, ensuring Publican won’t build the book. Lots and lots of invalid xrefs just make the XML that much more invalid. Remove the invalid xrefs and the book will build; the memory consumption won’t occur and the system killing won’t happen.
Comment 6 Misty Stanley-Jones 2011-05-11 00:56:49 EDT
Then it seems to me that reasonable behavior would be for Publican to fail at the first invalid xref and not to continue using system resources.
Comment 7 Joshua Wulf 2011-05-11 01:00:58 EDT
+1 How about stopping at the first instance of invalidity, rather than killing my system?
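The "stop at the first invalid xref" behavior requested above can be approximated outside Publican with a pre-build check. This is a rough sketch under stated assumptions, not how Publican or xmllint validates: first_invalid_xref is a hypothetical name, the input is assumed to be un-namespaced DocBook 4-style XML using plain id and linkend attributes, and XIncludes are not resolved.

```python
import xml.etree.ElementTree as ET


def first_invalid_xref(xml_text):
    """Return the linkend of the first xref with no matching id, else None.

    Hypothetical helper for illustration. An xref with no linkend
    attribute is reported as the empty string.
    """
    root = ET.fromstring(xml_text)
    # Collect every id declared anywhere in the document.
    ids = {el.get("id") for el in root.iter() if el.get("id") is not None}
    # Walk xrefs in document order and stop at the first dangling one.
    for xref in root.iter("xref"):
        target = xref.get("linkend", "")
        if target not in ids:
            return target
    return None


sample = """<book id="b">
  <chapter id="ch1">
    <para>See <xref linkend="ch1"/> and <xref linkend="missing"/>.</para>
  </chapter>
</book>"""

print(first_invalid_xref(sample))  # prints: missing
```

Because the function returns at the first dangling reference, its cost does not grow with the number of invalid xrefs that follow. A tool that generates links could run a check like this before handing the book to a build.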
Comment 8 Joshua Wulf 2011-05-11 01:08:17 EDT
It's either xmllint or xsltproc doing it. If a document has invalid links, xsltproc probably doesn't get called, because xmllint will fail it; so probably xmllint.
Comment 9 Joshua Wulf 2011-05-11 01:08:54 EDT
lol at the "Doing that hurts? Don't do that then" prescription.
Comment 10 Jeff Fearn 2011-05-11 03:16:49 EDT
FWIW this is probably XML::LibXML::Error spamming error nodes. Finding out where it's doing that and limiting it would take quite a lot of effort, so it's not likely this would get done any time soon. AFAICT there is no option in LibXML to stop at the first error found. It might be catchable in XML::LibXML::Error, but again, finding out where and how to do that is a significant development effort, so it would not happen in any reasonable time frame. If this is having a significant impact, then it'd be worth it for the people it's affecting to ask the upstream XML::LibXML maintainer at CPAN about these changes.
Cheers, Jeff.