Bug 1051921 - Update XML validation to use Docbook 5
Summary: Update XML validation to use Docbook 5
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: PressGang CCMS
Classification: Community
Component: Web-UI
Version: 1.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 1.4
Assignee: Matthew Casperson
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1051919
 
Reported: 2014-01-12 22:15 UTC by Matthew Casperson
Modified: 2014-08-04 22:27 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2014-02-23 23:43:56 UTC
Embargoed:



Description Matthew Casperson 2014-01-12 22:15:54 UTC
The DocBook documentation recommends validating DocBook 5 XML against the RELAX NG schemas (http://www.docbook.org/tdg5/en/html/ch01.html#ex.docbook5). These can be downloaded from http://www.docbook.org/xml/5.0/rng/.

xmllint works with RELAX NG (http://infohost.nmt.edu/~tcc/help/xml/lint.html#rng), so validation should be reasonably straightforward.
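
A minimal sketch of how such a check could be scripted, assuming xmllint is installed and the schema has been saved locally (the helper names and file paths are illustrative, not part of PressGang):

```python
import shutil
import subprocess


def relaxng_command(schema_path, xml_path):
    """Build the xmllint argument list for RELAX NG validation."""
    # --noout suppresses the echoed document; --relaxng names the schema file.
    return ["xmllint", "--noout", "--relaxng", schema_path, xml_path]


def validate_with_relaxng(schema_path, xml_path):
    """Return True/False for valid/invalid, or None if xmllint is not installed."""
    if shutil.which("xmllint") is None:
        return None
    result = subprocess.run(
        relaxng_command(schema_path, xml_path),
        capture_output=True,
        text=True,
    )
    # xmllint exits with a non-zero status when parsing or validation fails.
    return result.returncode == 0
```

Calling validate_with_relaxng("docbook.rng", "topic.xml") would then run the equivalent of `xmllint --noout --relaxng docbook.rng topic.xml`.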

Comment 2 Lee Newson 2014-01-29 06:53:31 UTC
Verified that basic DocBook 5.0 XML will validate. However, the following doesn't validate (or render):

<section xmlns:xl="http://www.w3.org/1999/xlink">
	<title>LS command</title>
	<para>
		<application xl:href="http://www.gnu.org/software/emacs/">Emacs</application>
	</para>
</section>

and gives the following errors:

topic.xml:1: element section: validity error : No declaration for attribute xmlns:xl of element section
topic.xml:4: element application: validity error : No declaration for attribute href of element application

This is the error without the namespace declaration:

topic.xml:4: namespace error : Namespace prefix xl for href on application is not defined
		<application xl:href="http://www.gnu.org/software/emacs/">Emacs</application>
		                                                         ^
topic.xml:4: element application: validity error : No declaration for attribute xl:href of element application

Tested with build 201401291413

Comment 5 Matthew Casperson 2014-01-30 03:50:06 UTC
I think we need the ability to define namespaces at the content spec level, and automatically define any common ones.

It is not as trivial as it looks to know which namespace to use, and defining them in each topic could lead to inconsistency.

For example, the following XML is taken from the Docbook documentation at http://docbook.org/docs/howto/#changes:

<article xmlns="http://docbook.org/ns/docbook" 
         xmlns:xl="http://www.w3.org/1999/xlink" version="5.0">
  <title>Test article</title>

  <para><application xl:href="http://www.gnu.org/software/emacs/emacs.html">Emacs</application> 
    is my favourite text editor.</para>
</article>

If you validate this with xmllint using the docbook.dtd at http://www.docbook.org/xml/5.0/dtd/ with the command:

xmllint --dtdvalid docbook.dtd test.xml

you'll get the errors:

test.xml:2: element article: validity error : No declaration for attribute xmlns:xl of element article
test.xml:5: element application: validity error : No declaration for attribute href of element application

This is why the namespace prefix xlink (as opposed to xl) is defined by the web UI validation routine.

So not even the official Docbook documentation uses namespaces that will validate with common tools. If we automatically define the common ones at the spec level this issue will be largely mitigated.

Comment 6 Matthew Casperson 2014-01-30 03:55:00 UTC
It should be pointed out that xlink is the prefix the W3C itself uses for this namespace: http://www.w3.org/TR/xlink/

Comment 7 Lee Newson 2014-01-30 04:34:34 UTC
As per the Namespaces in XML specification, any prefix may be used (http://www.w3.org/TR/REC-xml-names/#NT-Prefix). More specifically:

-------------------------------------------------------------------

Namespace constraint: Prefix Declared

The namespace prefix, unless it is xml or xmlns, MUST have been declared in a namespace declaration attribute in either the start-tag of the element where the prefix is used or in an ancestor element (i.e., an element in whose content the prefixed markup occurs).

-------------------------------------------------------------------
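
A small sketch of what that constraint means in practice, using Python's standard ElementTree parser as a stand-in for a namespace-aware tool (it is not the validator PressGang uses; sample() and the other helper names are made up for illustration). A namespace-aware parser resolves xl:href and xlink:href to the same expanded name, and rejects a prefix that was never declared:

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"


def sample(prefix):
    """Build a tiny document that binds the given prefix to the XLink namespace."""
    return (
        f'<para xmlns:{prefix}="{XLINK}">'
        f'<application {prefix}:href="http://www.gnu.org/software/emacs/">Emacs</application>'
        "</para>"
    )


def attribute_names(xml_text):
    """Return the expanded attribute names on the <application> element."""
    root = ET.fromstring(xml_text)
    return sorted(root.find("application").attrib)


def is_well_formed(xml_text):
    """Return False if the namespace-aware parser rejects the document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Same markup as above, but with no declaration for the xl prefix.
undeclared = (
    '<para><application xl:href="http://www.gnu.org/software/emacs/">'
    "Emacs</application></para>"
)
```

Here attribute_names(sample("xl")) and attribute_names(sample("xlink")) give the same result, while is_well_formed(undeclared) is False, mirroring the "Namespace prefix xl ... is not defined" error reported earlier.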

Also we should not be validating against the DTD (as you noted in the first comment). We should be validating against the RelaxNG schema, as it looks like their DTD just has hacks to get the namespaces to work in a limited way.

As for declaring it at the content spec level, I disagree, as the topic then becomes 100% bound to the content specs when it should be its own little component (granted, entities already break that concept). I think what we really need here is a global database of pre-defined namespaces that can be used. Custom ones should then be defined on the topics (or on a specific element), as a namespace declaration is only valid for the element and its children, i.e.:

<section xmlns="http://docbook.org/ns/docbook" version="5.0">
        <title>Test</title>
	<informalequation xmlns:mml="http://www.w3.org/1998/Math/MathML">
		<mml:math>
			<mml:msup>
				<mml:mi>x</mml:mi>
				<mml:mn>3</mml:mn>
			</mml:msup>
		</mml:math>
	</informalequation>
        <para xmlns:mms="http://www.w3.org/1999/xlink">
            <link mms:href="http://www.example.com">Example Link</link>
        </para>
</section>

is valid (you can see I used the mms prefix twice). That means that the only conflicts should be those in the global database.

Comment 8 Lee Newson 2014-02-11 05:33:01 UTC
Fixed the issue mentioned in Comment #4 (about only normal topics validating) in 1.4-SNAPSHOT build 201402111518

Comment 9 Lee Newson 2014-02-11 05:36:55 UTC
I just noticed I messed up with copying my above example. It should have been:

<section xmlns="http://docbook.org/ns/docbook" version="5.0">
        <title>Test</title>
	<informalequation xmlns:mml="http://www.w3.org/1998/Math/MathML">
		<mml:math>
			<mml:msup>
				<mml:mi>x</mml:mi>
				<mml:mn>3</mml:mn>
			</mml:msup>
		</mml:math>
	</informalequation>
        <para xmlns:mml="http://www.w3.org/1999/xlink">
            <link mml:href="http://www.example.com">Example Link</link>
        </para>
</section>
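
The scoping behaviour in this example can be checked with a namespace-aware parser. The sketch below uses Python's standard ElementTree (an assumption for illustration, not the PressGang validator) and only shows how the prefixes resolve, not whether the document is schema-valid:

```python
import xml.etree.ElementTree as ET

DOCBOOK = "http://docbook.org/ns/docbook"
MATHML = "http://www.w3.org/1998/Math/MathML"
XLINK = "http://www.w3.org/1999/xlink"

# The corrected example, with whitespace condensed.
TOPIC = """<section xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>Test</title>
  <informalequation xmlns:mml="http://www.w3.org/1998/Math/MathML">
    <mml:math><mml:msup><mml:mi>x</mml:mi><mml:mn>3</mml:mn></mml:msup></mml:math>
  </informalequation>
  <para xmlns:mml="http://www.w3.org/1999/xlink">
    <link mml:href="http://www.example.com">Example Link</link>
  </para>
</section>"""

root = ET.fromstring(TOPIC)

# Inside <informalequation>, the mml prefix resolves to MathML.
math = root.find(f".//{{{MATHML}}}math")

# Inside <para>, the same prefix resolves to XLink instead.
link = root.find(f".//{{{DOCBOOK}}}link")
link_href = link.get(f"{{{XLINK}}}href")
```

Both lookups succeed: math is found in the MathML namespace and link_href holds the example URL, because each declaration applies only to the element that carries it and to that element's descendants.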

Comment 10 Matthew Casperson 2014-02-11 21:09:24 UTC
I've stripped out the code that adds the namespaces. All namespace information will have to be defined manually in the topic when elements like link are used.

Conflicting namespaces are already reported on by the validation code, so that edge case is handled.

There is not a great deal we can do to fix validation errors using the xl namespace, as this is a bug in the libxml library.

Comment 11 Matthew Casperson 2014-02-11 21:38:46 UTC
I was wrong before about the reason why the xlink namespace was required. It is because it is defined in the Docbook DTD.

<!ENTITY % db.common.linking.attributes "
	linkend	IDREF	#IMPLIED
	xmlns:xlink	CDATA	#FIXED	'http://www.w3.org/1999/xlink'	
	xlink:href	CDATA	#IMPLIED
	xlink:type	CDATA	#IMPLIED
	xlink:role	CDATA	#IMPLIED
	xlink:arcrole	CDATA	#IMPLIED
	xlink:title	CDATA	#IMPLIED
	xlink:show	(new|replace|embed|other|none)	#IMPLIED
	xlink:actuate	(onLoad|onRequest|other|none)	#IMPLIED

">

Because of a limitation with Chrome under Linux, the emscripten version of libxml can't run RELAX NG validation. This means we can only support DTD validation in the browser, so topics have to use the namespace prefixes defined in the DTD in order to use the browser-based XML validation and rendering.

If authors want to use namespaces that are not defined in the DTD they are free to do so (the content will still be saved), but validation and rendering will not work.

Comment 12 Matthew Casperson 2014-02-11 21:48:08 UTC
Version 201402120646 pushed to dev server.

Comment 13 Lee Newson 2014-02-11 22:53:09 UTC
Given that we are pretty much forcing users to use the "xlink" namespace prefix, I think that the namespaces should automatically be added. The reason for that is that at the moment we are just causing more work for authors, since they have to add the namespace declaration in every topic manually when it really should be part of the base template.

The other option is that when creating a topic and switching the format, the initial template should change as well (or the namespaces should be added). Ideally when clicking create new topic you should initially be asked what the format and topic type would be, but that is another RFE in itself.

Anyway, everything else appears to work fine, taking into account the fact that we are limited to using the DTD. So I'm going to move this back to ASSIGNED just to address the usability issue above.

Comment 14 Matthew Casperson 2014-02-12 03:22:58 UTC
I have re-enabled the code that adds the DTD-mandated namespaces. Topics can again use xlink attributes without specifying the namespace.

Also updated the code so that the namespaces are not redefined if they are already included in the topic.
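
As a rough illustration of the "add only if missing" behaviour described here (a hypothetical sketch, not the actual PressGang code; ensure_xlink_namespace is an invented name, and the regular expression deliberately ignores edge cases such as ">" inside attribute values or tags inside comments):

```python
import re

XLINK_NS = "http://www.w3.org/1999/xlink"

# Matches the first start tag: "<name", optional attributes, optional "/", then ">".
# It skips "<?xml ...?>" and "<!...>" because the name must start with a letter or "_".
_FIRST_START_TAG = re.compile(r"<([A-Za-z_][\w.\-:]*)((?:\s[^<>]*?)?)(/?)>")


def ensure_xlink_namespace(topic_xml):
    """Declare xmlns:xlink on the root element unless it is already declared there."""
    match = _FIRST_START_TAG.search(topic_xml)
    if match is None:
        return topic_xml
    name, attrs, slash = match.groups()
    if re.search(r"\bxmlns:xlink\s*=", attrs):
        # Already declared by the author, so leave the topic untouched.
        return topic_xml
    new_tag = f'<{name}{attrs.rstrip()} xmlns:xlink="{XLINK_NS}"{slash}>'
    return topic_xml[: match.start()] + new_tag + topic_xml[match.end():]
```

Applying the function twice gives the same result as applying it once, which is the property that keeps an existing declaration from being duplicated.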

Build 201402121313 has been uploaded to the dev server.

Comment 15 Lee Newson 2014-02-12 23:07:00 UTC
Verified and opened BZ#1064593 for asking the user about the topic type/format.

