Bug 1043350 - Em dash causes parser error
Summary: Em dash causes parser error
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: PressGang CCMS
Classification: Community
Component: Web-UI
Version: 1.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
: 1.4
Assignee: Matthew Casperson
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-12-16 04:32 UTC by Zac Dover
Modified: 2014-08-04 22:27 UTC
3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-02-23 23:44:02 UTC


Attachments

Description Zac Dover 2013-12-16 04:32:41 UTC
Description of problem:
An em dash in the middle of a <para> causes validation to fail with the following error:

  topic.xml:15: parser error : PCDATA invalid Char value 20

Version-Release number of selected component (if applicable):


How reproducible:
Add an em dash to the middle of a <para>. To do this, configure a compose key, then hold down the compose key while pressing the hyphen key three times.

Steps to Reproduce:
1.
2.
3.

Actual results:

 topic.xml:15: parser error : PCDATA invalid Char value 20

Expected results:

 Instead of "topic.xml:15: parser error : PCDATA invalid Char value 20", the message should be "The XML is well-formed."

Additional info:

Comment 1 Lee Newson 2013-12-16 04:42:44 UTC
This works fine if you use the &mdash; entity. I believe this is the preferred approach, so em dashes shouldn't be used directly. However, I'd have to check with Matt on that one.
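The sketch below uses Python's standard-library ElementTree as a stand-in for the xml.js/xmllint validator the editor actually uses. It shows how a conforming XML parser handles the ways of writing an em dash that Comment 1 distinguishes. The helper `parses` and the sample documents are hypothetical.

One detail matters here. `&mdash;` is not one of XML's five predefined entities, so it resolves only when a DTD declares it. In DocBook topics that declaration would presumably come from the DocBook DTD, which this sketch does not load. That is an assumption, not something the report states.

```python
import xml.etree.ElementTree as ET

EM_DASH = "\u2014"  # U+2014 EM DASH


def parses(xml):
    """Return True if ElementTree accepts the document as well-formed."""
    try:
        ET.fromstring(xml)
        return True
    except ET.ParseError:
        return False


# 1. Raw em dash, UTF-8 encoded: accepted by a conforming parser.
raw = ("<para>a" + EM_DASH + "b</para>").encode("utf-8")
# 2. Numeric character reference: core XML syntax, no DTD required.
numeric = "<para>a&#8212;b</para>"
# 3. Named entity with no DTD: rejected as an undefined entity.
named = "<para>a&mdash;b</para>"
# 4. Named entity declared in an internal DTD subset: resolvable.
declared = '<!DOCTYPE para [<!ENTITY mdash "&#8212;">]><para>a&mdash;b</para>'

for label, doc in [("raw", raw), ("numeric", numeric),
                   ("named, no DTD", named), ("named, declared", declared)]:
    print(label, parses(doc))
```

With a correct parser, a raw UTF-8 em dash is just as valid as the entity. The original failure therefore points at how the bytes reached the parser, not at the character itself.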

Comment 2 Matthew Casperson 2014-01-12 20:48:38 UTC
This is probably related to the way the Emscripten virtual file system handles UTF-8 files. It looks like any non-ASCII character will cause validation issues.
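The number in the original error fits this explanation. 20 is 0x14, which is the low byte of U+2014 (EM DASH).

The sketch below illustrates one possible mechanism. It is only a hypothesis, and this report does not confirm it. It assumes the old virtual file system wrote each character as a single byte, keeping only its low 8 bits, instead of encoding the text as UTF-8. Python stands in for the JavaScript/Emscripten code, and the helper names are hypothetical. `is_xml10_char` follows the `Char` production of the XML 1.0 specification. Under that production 0x14 is not allowed, which would match libxml2's "PCDATA invalid Char value 20".

```python
EM_DASH = "\u2014"  # U+2014 EM DASH


def utf8_bytes(text):
    """Correct behaviour: encode the text as UTF-8."""
    return text.encode("utf-8")


def truncated_bytes(text):
    """Hypothetical faulty write: keep only the low 8 bits of each character."""
    return bytes(ord(ch) & 0xFF for ch in text)


def is_xml10_char(cp):
    """XML 1.0 'Char' production: the code points a document may contain."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


print(utf8_bytes(EM_DASH).hex(" "))       # e2 80 94
bad = truncated_bytes(EM_DASH)[0]
print(bad, hex(bad), is_xml10_char(bad))  # 20 0x14 False
```

When the file is written correctly, the parser decodes E2 80 94 back to U+2014, which is a legal XML character. Only the truncated byte produces the reported error.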

Comment 3 Matthew Casperson 2014-01-12 20:53:06 UTC
This bug describes an issue with UTF-8 files and the Emscripten virtual file system: https://github.com/kripken/emscripten/pull/402. It appears to have been fixed a year ago, but the xml.js library we are using (https://github.com/kripken/xml.js) is two years old.

I'll have to recompile xml.js with the latest version of emscripten.

Comment 4 Matthew Casperson 2014-01-15 01:53:16 UTC
Fixed in 201401151143 and deployed to the dev server.

xmllint has been recompiled from libxml2 2.9.1 with the latest version of Emscripten. The library and instructions on the compilation process can be found at https://github.com/pressgang-ccms/xsltproc.js.

All UTF-8 characters, such as the em dash, now validate properly.

Comment 5 Lee Newson 2014-01-28 00:32:37 UTC
Verified that mdash as well as other UTF-8 characters validate correctly.

Note: This also fixed an issue with characters from other languages being marked as invalid.
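The sketch below is a regression check in the spirit of this verification. It writes each sample to a file as UTF-8 and parses the file back, which mirrors the file-system round trip that was the root cause. Python's ElementTree stands in for the recompiled xmllint. The helper `round_trip_is_well_formed` and the sample text are hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

SAMPLES = {
    "em dash": "<para>before \u2014 after</para>",
    # French, Russian and Japanese text (non-ASCII in several scripts).
    "other languages": "<para>Fran\u00e7ais \u0420\u0443\u0441\u0441\u043a\u0438\u0439 \u65e5\u672c\u8a9e</para>",
}


def round_trip_is_well_formed(xml_text):
    """Write xml_text to a temporary file as UTF-8, then parse the file back."""
    fd, path = tempfile.mkstemp(suffix=".xml")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(xml_text.encode("utf-8"))
        ET.parse(path)
        return True
    except ET.ParseError:
        return False
    finally:
        os.remove(path)


for name, text in SAMPLES.items():
    print(name, round_trip_is_well_formed(text))
```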

