Bug 1016338

Summary: Detect invalid UTF-8 in XML
Product: [Community] PressGang CCMS
Component: CCMS-Core
Version: 1.1
Status: NEW
Severity: low
Priority: low
Hardware: Unspecified
OS: Unspecified
Type: Bug
Doc Type: Bug Fix
Reporter: Ruediger Landmann <rlandman>
Assignee: pressgang-ccms-dev
CC: cbredesen
Bug Blocks: 1012194

Description Ruediger Landmann 2013-10-08 00:43:07 UTC
Description of problem:
If a topic contains valid XML character references that are not valid UTF-8, PressGang reports no problem, but builds in DocBuilder and elsewhere fail for no obvious reason.

Version-Release number of selected component (if applicable):
1.1

How reproducible:
100%

Steps to Reproduce:
1. Create a topic
2. Insert &#13; (a carriage return) anywhere in the topic
3. Check the topic's build in DocBuilder
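PressGang presumably accepts the topic because any conforming XML parser does: `&#13;` is a well-formed character reference, so parsing succeeds and the reference expands to a literal carriage return. The following is a minimal illustration using Python's standard-library `xml.etree.ElementTree`. It is not PressGang's own validation code, which this report does not show.

```python
import xml.etree.ElementTree as ET

# A topic fragment containing the character reference from step 2.
topic = "<para>First line&#13;second line</para>"

# Parsing succeeds: &#13; is a well-formed XML character reference.
root = ET.fromstring(topic)

# The reference is expanded, so the parsed text holds a literal CR (U+000D).
print(repr(root.text))  # 'First line\rsecond line'
```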

Actual results:
PressGang reports no problem with the topic, but the topic fails to build.

Expected results:
PressGang warns the user that the topic contains a UTF-8 problem.

Additional info:
PressGang should probably still allow users to write and store valid XML that is not UTF-8 compliant. Such XML will never build in Publican, but users might want to transform it with some other tool that does not require UTF-8 compliance, so PressGang should warn rather than reject.
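The following is a minimal sketch of the warn-but-store behavior. It assumes that C0 control characters other than tab and line feed are the ones that break Publican builds, although the report confirms this only for `&#13;`. `find_build_warnings`, `save_topic`, and `SaveResult` are hypothetical names. The list passed to `save_topic` stands in for PressGang's real storage.

```python
import re
from dataclasses import dataclass, field

# Matches decimal (&#13;) and hexadecimal (&#xD;) numeric character references.
CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")

# Assumption: control characters below U+0020, other than tab and LF,
# are treated as likely to break a downstream (Publican) build.
ALLOWED_CONTROLS = {0x09, 0x0A}


@dataclass
class SaveResult:
    stored: bool
    warnings: list[str] = field(default_factory=list)


def find_build_warnings(xml: str) -> list[str]:
    """Return one warning per character reference likely to break a build."""
    warnings = []
    for match in CHAR_REF.finditer(xml):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if code_point < 0x20 and code_point not in ALLOWED_CONTROLS:
            # Report a 1-based line number so authors can find the reference.
            line = xml.count("\n", 0, match.start()) + 1
            warnings.append(
                f"line {line}: {match.group(0)} is control character U+{code_point:04X}"
            )
    return warnings


def save_topic(xml: str, store: list[str]) -> SaveResult:
    """Hypothetical save hook: always store the topic, but report warnings."""
    store.append(xml)  # stand-in for the real persistence layer
    return SaveResult(stored=True, warnings=find_build_warnings(xml))
```

For example, `save_topic("<para>a&#13;b</para>", [])` stores the topic and returns the single warning `line 1: &#13; is control character U+000D`. The sketch scans the raw text rather than the parsed tree, so each warning points at the reference the author actually typed. A parser would expand `&#13;` silently.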

Only adding this one as a blocker because of its nuisance value; it's easily worked around with sed before a mass upload. This particular carriage return (&#13;) is the only offending character I've hit so far.
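The report does not give the sed command that was used. The following hypothetical Python sketch performs a similar pre-upload cleanup. It deletes decimal and hexadecimal references to U+000D from every `.xml` file under a directory. Deleting the reference outright is an assumption, and replacing it with a space or newline may suit some topics better. `strip_cr_refs` and `clean_directory` are illustrative names.

```python
import re
from pathlib import Path

# Decimal or hexadecimal references to U+000D: &#13;, &#013;, &#xD;, &#x0d;, ...
CR_REF = re.compile(r"&#(?:0*13|x0*[dD]);")


def strip_cr_refs(text: str) -> str:
    """Remove carriage-return character references from a topic's XML."""
    return CR_REF.sub("", text)


def clean_directory(directory: str) -> int:
    """Rewrite each .xml file under `directory` in place; return how many changed."""
    changed = 0
    for path in Path(directory).rglob("*.xml"):
        original = path.read_text(encoding="utf-8")
        cleaned = strip_cr_refs(original)
        if cleaned != original:
            path.write_text(cleaned, encoding="utf-8")
            changed += 1
    return changed
```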

Comment 1 Matthew Casperson 2014-01-12 21:34:44 UTC
Is there some documentation on character codes that are not valid UTF-8?
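Two standards give a partial answer. RFC 3629 excludes the surrogates U+D800 through U+DFFF, and every code point above U+10FFFF, from UTF-8. The XML 1.0 `Char` production further limits which code points may appear in a document. U+000D (`&#13;`) passes both rules, so the build failure in the description presumably comes from a stricter check somewhere in the toolchain rather than from UTF-8 validity itself. That is an inference, and the report does not establish it. The following sketch encodes both rules. `classify` and `encodes_as_utf8` are hypothetical helper names.

```python
def classify(code_point: int) -> list[str]:
    """List the reasons a numeric character reference could be rejected."""
    problems = []
    # RFC 3629: UTF-8 cannot encode surrogates or anything above U+10FFFF.
    if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        problems.append("not encodable as UTF-8")
    # XML 1.0 Char production:
    # #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    is_xml_char = (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )
    if not is_xml_char:
        problems.append("not an XML 1.0 character")
    return problems


def encodes_as_utf8(code_point: int) -> bool:
    """Cross-check the UTF-8 rule against Python's own codec."""
    try:
        chr(code_point).encode("utf-8")
        return True
    except ValueError:  # covers chr() range errors and UnicodeEncodeError
        return False
```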