Bug 825005 - Some global entities still break topics
Product: PressGang CCMS
Classification: Community
Component: Web-UI
Hardware: All
OS: Linux
Priority: low
Severity: medium
Assigned To: Matthew Casperson
Depends On: 739466
Reported: 2012-05-24 15:36 EDT by Stephen Gordon
Modified: 2014-08-04 18:27 EDT
CC List: 3 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 739466
Last Closed: 2013-07-01 19:32:29 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments

  None
Description Stephen Gordon 2012-05-24 15:36:29 EDT
I've cloned an existing bug that was marked CLOSED CURRENT RELEASE, as I think it did indeed fix the cases listed in it. Today, however, when using the csprocessor to build output containing a number of newly imported topics, I received the error "ERROR: Topic doesn't have well-formed xml" in the compiler output.

I couldn't see anything obviously wrong with the topic content, so I exported it and used xmllint to validate it against the DocBook 4.5 DTD; sure enough, it passed as valid.

After a bit of investigation I found that the lines causing issues used &#39;, the decimal character reference for the single quote. This in itself isn't a problem, as I can just replace them with a literal single quote. What I would like to see, however, is:

1) A better error message. The topic was well-formed and validated against the DocBook XML 4.5 DTD.

2) A review of how entities are handled. It seems the previous bug picked up the most common cases, but we still have entities that are valid in DocBook XML 4.5 but don't work via skynet.

The topic ID is 7526 and the revision exhibiting the issue is 97932.
Comment 1 Lee Newson 2012-05-26 04:33:06 EDT
I'll look into it, since I recently changed the entity handling because of another bug (not logged) where "& will fail";" was picked up as an entity in:

   private final String "This is an example that contains an ampersand & will fail";
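The false positive described above happens when everything between an & and the next ; is treated as an entity. A minimal sketch of the difference, assuming a regex-based scanner; the class name, both patterns and the simplified ASCII-only name rule are hypothetical, not the csprocessor's actual code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityScanner {
    // Naive: anything from '&' up to the next ';' (this is what catches "& will fail";).
    static final Pattern NAIVE = Pattern.compile("&[^;]*;");

    // Stricter: only &name;, &#digits; or &#xhex; with no whitespace inside.
    // The name rule is simplified to ASCII letters, digits, '_', '.', ':' and '-'.
    static final Pattern STRICT = Pattern.compile(
            "&(?:[A-Za-z_:][A-Za-z0-9_.:-]*|#[0-9]+|#x[0-9A-Fa-f]+);");

    /** Returns every substring of text that the given pattern treats as an entity. */
    static List<String> find(Pattern pattern, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String line = "private final String \"This is an example that contains an ampersand & will fail\";";
        System.out.println(find(NAIVE, line));   // [& will fail";]
        System.out.println(find(STRICT, line));  // []
        System.out.println(find(STRICT, "It&#39;s &apos;quoted&apos;")); // [&#39;, &apos;, &apos;]
    }
}
```

Requiring the reference to start with a name character or # right after the & is what keeps a bare ampersand followed by a space from being read as an entity.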

As for better error messages, I'll have to take another look: the last time I checked, the library we use didn't report any errors, so I may have to find another one that fits our requirements.
Comment 2 Lee Newson 2012-05-26 07:11:25 EDT
Also, what version were you using? I ask because I originally missed values that used a # and fixed that in 0.24.2. That said, there is another underlying issue even if you are using 0.24.2.

On the subject of better error messages, I found a way to return them with our current library, so the next version will output something like:

ERROR: Topic doesn't have well-formed xml. The content of elements must consist of well-formed character data or markup.

or for missing tags:

ERROR: Topic doesn't have well-formed xml. The element type "para" must be terminated by the matching end-tag "</para>".

While the libraries do return line numbers, they are sometimes very inaccurate depending on the issue, so I'll leave them out for now.
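One way to surface the parser's own message is to catch the SAXParseException it throws and append its text. A minimal sketch, assuming the JDK's built-in JAXP parser (derived from Apache Xerces); the class name WellFormedCheck is hypothetical, and the exact message text depends on the parser version and locale. The last call in main also shows a named HTML entity being rejected when no DTD declares it:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class WellFormedCheck {
    /** Returns null when the XML is well-formed, otherwise an error string carrying the parser's message. */
    static String check(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default handler, which also prints "[Fatal Error] ..." to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) { }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            // e.getLineNumber() is also available, but can point at the wrong line for some errors.
            return "ERROR: Topic doesn't have well-formed xml. " + e.getMessage();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return "ERROR: Could not check topic: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(check("<para>Fish &amp; chips</para>"));     // null
        System.out.println(check("<section><para>Unclosed</section>")); // mismatched end-tag error
        System.out.println(check("<para>Non&nbsp;breaking</para>"));    // undeclared entity error
    }
}
```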
Comment 3 Lee Newson 2012-05-26 08:01:08 EDT
I'll have to talk to Matt about this on Monday, since Xerces doesn't permit HTML entities when parsing. I need to check whether we should throw errors for them or just convert the HTML entities to XML entities. Since xmllint and the other tools we use allow them, I would say the second option is better.
Comment 4 Lee Newson 2012-05-29 01:03:49 EDT
I talked to Matt about using HTML entities in XML, and he said we should throw an error and encourage the use of the correct XML entities.
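Following that decision, a pre-check could report any named entity other than the five that XML predefines, before the topic ever reaches the parser. A minimal sketch; the class name, the regex and the error wording are hypothetical. It deliberately ignores the DocBook DTD, which does declare many more named entities, so it is stricter than xmllint with the DTD loaded:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityPolicy {
    // The five entities XML predefines; a parser accepts these without any DTD.
    static final Set<String> PREDEFINED = Set.of("amp", "lt", "gt", "quot", "apos");

    // Named entity references only (&name;); numeric references are not matched here.
    static final Pattern NAMED_ENTITY = Pattern.compile("&([A-Za-z_:][A-Za-z0-9_.:-]*);");

    /** Returns one (hypothetical) error message per named entity that XML does not predefine. */
    static List<String> check(String text) {
        List<String> errors = new ArrayList<>();
        Matcher m = NAMED_ENTITY.matcher(text);
        while (m.find()) {
            String name = m.group(1);
            if (!PREDEFINED.contains(name)) {
                errors.add("ERROR: \"" + m.group() + "\" is not a predefined XML entity.");
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // Prints one error, for &nbsp;; &amp;, &lt; and &gt; are accepted.
        check("Fish&nbsp;&amp; chips &lt;b&gt;").forEach(System.out::println);
    }
}
```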
Comment 5 Lee Newson 2012-11-23 00:31:27 EST
Marking this as ON_QA, since we now treat HTML decimal notation as invalid. You should use the named (string-based) version instead, as it is more readable and is currently the only format that works with Xerces.
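The rule in this comment can be pictured as a check that flags decimal character references such as &#39; and, where one exists, suggests the named equivalent. A minimal sketch; the class name, the suggestion table (limited here to the five XML predefined entities) and the message wording are all hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumericReferenceCheck {
    // Hypothetical suggestion table: code point -> named XML entity.
    static final Map<Integer, String> NAMED = Map.of(
            38, "&amp;", 60, "&lt;", 62, "&gt;", 34, "&quot;", 39, "&apos;");

    // Decimal character references only; capped at 7 digits so parseInt cannot overflow.
    static final Pattern DECIMAL = Pattern.compile("&#([0-9]{1,7});");

    /** Returns one error per decimal reference, with a named suggestion when the table has one. */
    static List<String> check(String text) {
        List<String> errors = new ArrayList<>();
        Matcher m = DECIMAL.matcher(text);
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1));
            String suggestion = NAMED.get(codePoint);
            String msg = "ERROR: Decimal character reference \"" + m.group() + "\" is not allowed";
            errors.add(suggestion == null ? msg + "." : msg + "; use \"" + suggestion + "\" instead.");
        }
        return errors;
    }

    public static void main(String[] args) {
        // First line suggests &apos; for &#39;; the second has no named suggestion in this table.
        check("It&#39;s a &#8212; test").forEach(System.out::println);
    }
}
```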
