Bug 705956

Summary: publican print_unused parses all files in the en-US and subfolders
Product: [Community] Publican
Reporter: Jared MORGAN <jmorgan>
Component: publican
Assignee: Jeff Fearn 🐞 <jfearn>
Status: CLOSED CURRENTRELEASE
QA Contact: Ruediger Landmann <rlandman+disabled>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 2.5
CC: mmcallis, mmurray, publican-list, rlandman
Target Milestone: ---
Target Release: ---
Hardware: i386
OS: Linux
Whiteboard:
Fixed In Version: 2.6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-07-26 00:42:30 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Jared MORGAN 2011-05-19 00:28:54 UTC
Description of problem:

When checking for unused DocBook XML files in a user guide, the publican print_unused command parses all xi:included files in en-US and the sub-directories below it, regardless of the attributes set on the xi:include. This causes problems for non-DocBook XML files stored in the /extras directory. print_unused detects these as invalid XML and abends.

Version-Release number of selected component (if applicable):

publican v2.5

How reproducible:

100%

Steps to Reproduce:
You can test this on books from Middleware that contain XML configuration samples, such as the Hibernate Core Reference Guide (this guide only contains one configuration sample so should be easy to test).

1. xi:include a file containing an XML configuration sample.
2. Set the parse="text" attribute on the xi:include.
3. Ensure the configuration sample uses the .xml extension (not .xml_sample or some other arbitrary file extension).
4. Execute "publican print_unused" on the document.
  
Actual results:

You get a validation error, and it is difficult to see what the problem is.

Expected results:

The print_unused command does not parse any file whose xi:include is marked as parse="text", and goes on to report the DocBook XML files that are genuinely unused.
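
To make the expected behaviour concrete, here is a minimal Python sketch of an unused-file check that honours parse="text". It is not Publican's implementation (Publican is written in Perl); the function name find_unused and its arguments are hypothetical, and it assumes the book's XML can be read by a plain XML parser (no undeclared entities).

```python
import os
import xml.etree.ElementTree as ET

# Namespace-qualified tag name for <xi:include>.
XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"


def find_unused(lang_dir, main_file):
    """Return the .xml files under lang_dir that are never reached from main_file.

    Hypothetical helper for illustration only.
    """
    seen = set()
    queue = [os.path.normpath(os.path.join(lang_dir, main_file))]
    while queue:
        path = queue.pop()
        if path in seen or not os.path.isfile(path):
            continue
        seen.add(path)
        tree = ET.parse(path)  # only files included as XML are ever parsed
        for inc in tree.iter(XI_INCLUDE):
            href = inc.get("href")
            if not href:
                continue
            target = os.path.normpath(
                os.path.join(os.path.dirname(path), href))
            if inc.get("parse", "xml") == "text":
                # Included verbatim: count it as used, but do not parse it.
                seen.add(target)
            else:
                queue.append(target)

    # Collect every .xml file below lang_dir, including sub-directories.
    all_xml = set()
    for root, _dirs, files in os.walk(lang_dir):
        for name in files:
            if name.endswith(".xml"):
                all_xml.add(os.path.normpath(os.path.join(root, name)))
    return sorted(all_xml - seen)
```

With this approach, a sample in /extras that is not well-formed XML never reaches the parser, yet it is still counted as used because an xi:include refers to it.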

Additional info:

Original email sent out to the list. The information above summarises it better, but the email is included here for extra detail.

==========================
**Short Answer**

Don't name your XML code example files using the .xml filename extension. Choose a consistent alternative filename for your XML files, such as .xml_sample.

**Background**

The useful "publican print_unused" command displays any files that are not used in your XML.

It does this by parsing all files with a .xml extension. Depending on your naming convention, this may also include any XML code examples you have xi:included in your documentation.

If you leave your XML code example files with a .xml extension (instead of something like .xml_sample), the command will fail because it is trying to parse XML that does not contain valid markup. This is particularly relevant if your code samples have ellipses in them to show that content has been removed for readability.

publican print_unused currently disregards the parse="text" parameter set on the xi:include.

**How I Discovered This**

Translation discovered some unused files in the branch, and wanted me to remove them so they didn't waste translation effort.

I tried to execute publican print_unused, but it failed with an error message.

Ryan helped me work out what the problem was, as described in $BACKGROUND. The error message is not that descriptive.

I've just spent a full day going back through the EAP 5.1.0 branch for the translators. 
============================

Comment 2 Jeff Fearn 🐞 2011-05-19 09:47:47 UTC
Stopped print_unused from loading XML files that are included with parse="text".

Committed revision 1774.

Comment 3 Jeff Fearn 🐞 2011-07-07 11:08:49 UTC
Backported to branches/publican-2x.

Committed revision 1813.