Bug 987252 - INVALID_CHARACTER_ER is not shown when XML editing but only provided in docbuilder error
Status: CLOSED CURRENTRELEASE
Product: PressGang CCMS
Classification: Community
Component: Web-UI
Version: 1.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 1.8
Assigned To: Lee Newson
Reported: 2013-07-23 01:25 EDT by Julie
Modified: 2014-07-13 17:10 EDT

Doc Type: Bug Fix
Last Closed: 2014-07-13 17:10:53 EDT
Type: Bug


Description Julie 2013-07-23 01:25:13 EDT
Description of problem:

When an invalid character ( & ) is in the XML text, the validation message is still 'The XML is well formed.' but the rendered page does not display. It is hard to locate the problem right away. I only found out what was causing the problem when docbuilder displayed the error message:

Topic ID 15007

    INFO: Topic URL
    ERROR: This topic doesn't have well-formed xml. INVALID_CHARACTER_ERR: An invalid or illegal XML character is specified. The processed XML is 


Expected results:
An error message should be displayed during XML editing, so that users don't have to wait for a build to find out about the error.
The error message should tell users to change '&' to '&amp;'.
Comment 1 Lee Newson 2013-07-23 01:38:10 EDT
Hey Julie, can you provide the topic id this occurred for (and preferably a revision if possible)? As I've tried most variations I can think of and I can't replicate it. When I add an ampersand to the XML I got an error every time, so if I could see how it was used that would help.

For what it's worth these are the errors I got when trying it standalone and at the start, end and middle of a string:

1. parser error : EntityRef: expecting ';'
2. parser error : xmlParseEntityRef: no name
3. parser error : Entity 'production' not defined
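
For illustration only, here is a minimal sketch of how a bare ampersand is rejected by an XML parser. It uses Python's standard library parser (expat) rather than the parser used by the PressGang UI, so the exact error wording will differ from the messages listed above. The helper name `well_formed_error` and the sample strings are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def well_formed_error(xml_text):
    """Return None if xml_text parses, otherwise the parser's error message."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as exc:
        return str(exc)


# Sample inputs (hypothetical): an ampersand standalone, inside a word,
# as an undefined entity reference, and correctly escaped.
samples = [
    "<para>R & D</para>",
    "<para>R&D</para>",
    "<para>&production;</para>",
    "<para>R&amp;D</para>",
]

for sample in samples:
    error = well_formed_error(sample)
    print(sample, "->", error if error else "well formed")
```

Only the last sample, where the ampersand is written as `&amp;`, is accepted.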
Comment 2 Julie 2013-07-23 02:49:39 EDT
(In reply to Lee Newson from comment #1)
> Hey Julie, can you provide the topic id this occurred for (and preferably a
> revision if possible)? As I've tried most variations I can think of and I
> can't replicate it. When I add an ampersand to the XML I got an error every
> time, so if I could see how it was used that would help.
> 
> For what it's worth these are the errors I got when trying it standalone and
> at the start, end and middle of a string:
> 
> 1. parser error : EntityRef: expecting ';'
> 2. parser error : xmlParseEntityRef: no name
> 3. parser error : Entity 'production' not defined

Hey Lee,
     I just checked again and can't seem to replicate the error. 
Topic ID 15007

You're right; I see all those error messages when I type a &.
Comment 3 Lee Newson 2013-07-23 03:00:36 EDT
Okay, I was able to find out why you saw what you did. The problem isn't from the ampersand at all and instead is from using an HTML entity in XML content (in this case: ’). This is generally discouraged, which is why it's not implemented in the Java XML parser (Xerces), so I'm going to leave this open to see if we can get this error to show in the UI as well.
Comment 4 Lee Newson 2013-07-23 03:51:56 EDT
Looked into this more and the problem is that the XML spec (http://www.w3.org/TR/REC-xml/#sec-references) refers to the &#...; notation as a character reference, and as such when it's converted to a DOM object it should be converted into a character, unlike an entity reference &...; which references some internal/external entity. So possibly what we should be doing is getting the server/csprocessor to convert the character references to their actual character instead of trying to keep them as entity references.
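
As a rough sketch of the distinction described above, the snippet below parses the same text written with a decimal character reference, a hexadecimal character reference, and a named HTML entity. It uses Python's standard library parser, not Xerces, and the right single quotation mark (U+2019) is chosen as an assumed example character.

```python
import xml.etree.ElementTree as ET


def parse_text(xml_text):
    """Parse a one-element document and return its text content."""
    return ET.fromstring(xml_text).text


# Character references are resolved to the character itself by the parser.
decimal = parse_text("<para>It&#8217;s</para>")
hexadecimal = parse_text("<para>It&#x2019;s</para>")

# A named HTML entity is an entity reference; without a declaration for it
# the document is not well formed and the parser raises an error.
try:
    parse_text("<para>It&rsquo;s</para>")
    named_ok = True
except ET.ParseError:
    named_ok = False

print(decimal == hexadecimal, named_ok)
```

Both character references yield the same text, while the named entity is rejected because nothing declares it.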
Comment 7 Lee Newson 2014-07-01 20:12:58 EDT
Fixed in 1.8-SNAPSHOT build 201407020954

Character references (or HTML entities as they are sometimes known) are now excluded from the entity escaping function, meaning that they will be resolved into an actual character by the XML parser.

To go with this our custom convertNodeToString function has also been updated to escape the reserved characters (this is also done by most xml serializers).
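
The following is a hedged sketch of the general idea, not the actual PressGang code: named entity references are swapped for a placeholder before parsing, character references are left for the parser to resolve, and reserved characters are escaped again when the text is written back out. The placeholder format and the names `protect_entities`, `restore_entities` and `node_text_to_string` are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Named entity references such as &product;. The negative lookahead skips
# character references like &#8217; and &#x2019; so the parser resolves them.
NAMED_ENTITY = re.compile(r"&(?!#)([A-Za-z_][\w.\-]*);")
# The five entities predefined by XML are left for the parser as well.
PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
PLACEHOLDER = re.compile(r"\[\[entity:([^\]]+)\]\]")


def protect_entities(xml_text):
    """Swap custom named entities for a placeholder the parser will not expand."""
    def repl(match):
        name = match.group(1)
        return match.group(0) if name in PREDEFINED else f"[[entity:{name}]]"
    return NAMED_ENTITY.sub(repl, xml_text)


def restore_entities(text):
    """Turn placeholders back into entity references."""
    return PLACEHOLDER.sub(r"&\1;", text)


def node_text_to_string(text):
    """Escape reserved characters (&, <, >), then restore the entities."""
    return restore_entities(escape(text))


source = "<para>&product;&#8217;s R&amp;D</para>"
parsed = ET.fromstring(protect_entities(source)).text
print(node_text_to_string(parsed))
```

In this sketch the custom entity survives the round trip, the character reference comes back as the actual character, and the ampersand is re-escaped on output.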
Comment 10 Matthew Casperson 2014-07-03 18:09:10 EDT
Confirmed that character references like

¬ 
¬ 
« 
«

are rendered in the live preview correctly, and converted to their corresponding characters correctly when saved through the web ui and webdav.
Comment 11 Matthew Casperson 2014-07-03 18:11:32 EDT
Confirmed that topics with extended characters build and preview ok with Publican and csprocessor.
Comment 12 Matthew Casperson 2014-07-03 18:15:22 EDT
Character references build ok when they are added to the content spec directly (like with a section title or the spec product), but the references are not replaced with their associated character.

If we are replacing these entities in topics, it probably makes sense to extend this behaviour to content specs too.
Comment 13 Lee Newson 2014-07-03 19:24:10 EDT
The bigger question here is do we want users to use them in Content Specs, as it is not meant to be XML syntax and is supposed to be clear text?
Comment 14 Lee Newson 2014-07-03 19:27:48 EDT
(In reply to Lee Newson from comment #13)
> The bigger question here is do we want users to use them in Content Specs,
> as it is not meant to be XML syntax and is supposed to be clear text?

With regards to this, I know we have allowed some XML content into the specs (i.e. entities), however I'd like to keep as much out as possible.
Comment 15 Lee Newson 2014-07-06 18:28:34 EDT
I was thinking about this more over the weekend and given it won't be visible to most users (with the exception of invalid specs), we might as well implement this.
Comment 16 Lee Newson 2014-07-06 19:29:07 EDT
Fixed in 1.8-SNAPSHOT build 201407070919

The content spec parser has been updated to resolve XML Character references when parsing.
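
A minimal sketch of what resolving character references in plain content spec text could look like is shown below. It is an assumption about the approach, not the csprocessor implementation; the function name `resolve_char_refs` and the sample spec line are made up for the example.

```python
import re

# Decimal (&#233;) and hexadecimal (&#xE9;) character references.
# XML only allows a lowercase "x" as the hexadecimal prefix.
CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def resolve_char_refs(line):
    """Replace XML character references in a line with the actual characters."""
    def repl(match):
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Leave the reference untouched if it is outside the Unicode range.
        return chr(code) if code <= 0x10FFFF else match.group(0)
    return CHAR_REF.sub(repl, line)


print(resolve_char_refs("Title = Caf&#233; &#xAB;Guide&#xBB;"))
```

Named entity references such as `&product;` are not matched by the pattern and pass through unchanged. The sketch does not check that the resulting code point is a legal XML character.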
Comment 18 Matthew Casperson 2014-07-07 17:43:37 EDT
Verified that character references in a spec edited through the UI or created using csprocessor were replaced as expected.
